[ExI] 23andTriangulation

Tue Jul 9 23:33:29 UTC 2013

On Tue, Jul 9, 2013 at 6:46 PM, spike <spike at rainier66.com> wrote:
> Ideas please: as a test, is there a way to get a dozen of us, or fewer, make
> an excel file with columns of 64 numbers, each a random integer between 0
> and 255.  We define as sisters those columns which share an average of 32
> numbers.  Cousin columns share 16, second cousins 8 and so on.  Next I form
> the triads, and see if I can get the algorithm to find which columns are
> related to which.
>
> From that, I should be able to figure out which column is related to which
> by triangulation, ja?

That is not what we mean by the term "Genetic Algorithm"  :)

I don't think it would be difficult to implement your matching
algorithm.  I'm not really clear on how you define the triads.  It
sounds like you are going to do that manually until you have enough
practice with it to describe an algorithm.
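
For what it's worth, here is a rough sketch of the pairwise matching
part, in Python rather than Excel.  It assumes "share" means "the same
value appears somewhere in both columns", which may not be what you
intend.  The helper names (shared_count, make_relative) and the seed
are made up for illustration, and triads are left out since I don't
know how you define them.

```python
import random

def shared_count(a, b):
    """Number of distinct values that appear in both columns."""
    return len(set(a) & set(b))

def make_relative(parent, n_shared, rng, lo=0, hi=255):
    """Copy n_shared entries of `parent` (at random positions) and
    fill every other position with a fresh random integer."""
    size = len(parent)
    keep = set(rng.sample(range(size), n_shared))
    return [parent[i] if i in keep else rng.randint(lo, hi)
            for i in range(size)]

rng = random.Random(2013)  # fixed seed so the run is repeatable
base = [rng.randint(0, 255) for _ in range(64)]
sister = make_relative(base, 32, rng)   # "sisters" copy 32 entries
cousin = make_relative(base, 16, rng)   # "cousins" copy 16 entries
stranger = [rng.randint(0, 255) for _ in range(64)]

for name, column in [("sister", sister), ("cousin", cousin),
                     ("stranger", stranger)]:
    print(name, shared_count(base, column))
```

The counts will not land exactly on 32 or 16: duplicate values within
a column lower them, and accidental matches in the freshly drawn
entries raise them.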

When I read triangulation from two points, I thought it sounded more
like interpolation - but that's an even weirder term to explain.

I also imagined the classical problem solving strategies you might use
as a model.  I considered bin-filling.  I thought about showing
"relatedness" via a graph, then finding the Eulerian path that shows
how everyone gets to be in this big family.  I thought about the
classic 'family' tree notation, but we don't have obvious/intuitive
tools for matching trees like puzzle pieces to see how they overlap
or connect.  We also aren't that good at multi-dimensional Venn
diagrams.  Actually, a Venn diagram is probably the easiest way to
display the trivial case, where the intersection of "my
[grand]parents" and "your [grand]parents" has a set-size of
0, 1, ..., n.
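
The grandparent-intersection idea and the relatedness graph can be
sketched together.  This is only an illustration: the people (p1..p4)
and grandparent labels are invented, and it groups people with a plain
connected-components search rather than the Eulerian path I mentioned.

```python
from collections import deque
from itertools import combinations

def relatedness_graph(grandparents):
    """Link two people when their grandparent sets intersect."""
    graph = {person: set() for person in grandparents}
    for a, b in combinations(grandparents, 2):
        if grandparents[a] & grandparents[b]:  # set-size > 0
            graph[a].add(b)
            graph[b].add(a)
    return graph

def families(graph):
    """Group people into connected components (breadth-first search)."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in graph[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        groups.append(group)
    return groups

# Hypothetical data: p1 and p3 share nobody directly, p2 links them.
grandparents = {
    "p1": {"A", "B", "C", "D"},
    "p2": {"C", "D", "E", "F"},
    "p3": {"E", "F", "G", "H"},
    "p4": {"W", "X", "Y", "Z"},
}
print(families(relatedness_graph(grandparents)))
```

Here p1, p2 and p3 end up in one family even though p1 and p3 have an
intersection of size 0, and p4 stays on its own.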

Since I'm this far down the rabbit hole, let me ask:  Are 64 random
numbers 0-255 going to form sets in a sufficiently realistic way to
model the people you are trying to trace?  I feel like 256/64=4 isn't
a large enough number to prevent everyone from being related to
everyone else - or is that the point you are trying to make?  Does
that scale to 'matching' on the number of data points 23andMe is
reporting from the entire genome?  (All genes divided by a small
number of sampled genes is a large ratio, which is different from
the toy case.)
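
To put a number on that worry, here is a back-of-envelope check.  It
assumes each column is 64 independent uniform draws from 0-255 (with
repeats allowed) and that "shared" means the value appears anywhere in
both columns.  The function names, trial count and seed are mine.

```python
import random

def expected_chance_overlap(n_values=256, n_draws=64):
    """Expected number of distinct values two unrelated columns share."""
    # Probability that a given value shows up at least once in a column.
    p_present = 1 - (1 - 1 / n_values) ** n_draws
    # A value is shared when it is present in both independent columns.
    return n_values * p_present ** 2

def simulated_chance_overlap(trials=2000, n_values=256, n_draws=64,
                             seed=0):
    """Monte Carlo cross-check of the formula above."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = {rng.randrange(n_values) for _ in range(n_draws)}
        b = {rng.randrange(n_values) for _ in range(n_draws)}
        total += len(a & b)
    return total / trials

print(expected_chance_overlap())
print(simulated_chance_overlap())
```

Under those assumptions the formula gives roughly 12.6 shared values
for two unrelated columns, which sits between the second-cousin (8)
and cousin (16) thresholds.  If values only count when they match in
the same position, the chance expectation drops to 64/256 = 0.25.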

Back to your Excel sheet (the hammer to all your computing nails, eh?)
Are you interested in gene-values matching in the same ordinal
position?  Are you interested in more than one match in a row?  I
don't know what any of this would mean, but I'm sure it'd be
discovered as relevant at some point.
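
Both questions are cheap to answer once the columns exist.  A small
sketch, assuming two columns of equal length compared position by
position (the function names and sample values are invented):

```python
def positional_matches(a, b):
    """True/False per position: do the two columns agree there?"""
    return [x == y for x, y in zip(a, b)]

def longest_run(flags):
    """Length of the longest unbroken series of matches."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

a = [1, 2, 3, 4, 5, 6]
b = [1, 2, 9, 4, 5, 6]
flags = positional_matches(a, b)
print(sum(flags))          # 5 positions agree
print(longest_run(flags))  # 3, the run at the end
```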

> In real life, I am working on forming my first actual triad with actual DNA
> relatives, both of which are dedicated genealogists.  I am having a hell of
> a time explaining the concept.

I like the puzzle idea.  Sometimes a family is like a bunch of pieces
all locked together.  Sometimes one puzzle piece locks together two
previously disconnected groups of pieces and you can see the big
picture more clearly.

> Actually anyone here is welcome to either instruct on the
> rightness/wrongness of even thinking about doing something like this, or
> alternately, offer much-needed assistance with the algorithm.

Have you tried Google Docs spreadsheets yet?  Maybe they don't have
all the awesome of Excel, but gdocs are free and can easily be
shared/collaborated upon in realtime.  That's gotta be worth something
in this instance.  :)