## Genetic Probabilities & Code Revisited

### February 23, 2007

‘Someone’ commented that the probability matrix used in the basic genetics code was not necessarily explained very well. I was worried about this because, for some reason, I find it hard to explain. So this post will cover it in more detail.

If you haven’t already been there, I suggest you go over ‘A Bit of Genetics’ to get the background, although I’ll actually go over most of it again here.

I’ve noticed that a lot of people get to this blog through Internet searches for ‘eye colour allele’ (WordPress tells me these things; *they’re watching you!*) So I’ll describe the genetics in this context. Note that in reality eye colour is much more complex than the blue/brown eye example, so the outcomes are not *really* what I describe here; this is a simplified explanation!

So, we have a single genetic locus which contains two copies of the eye colour gene. The ‘a’ gene does nothing to the colour of the eye, while the ‘A’ gene creates brown pigment proteins that cause the iris of the eye to become brown.

There are three possible combinations of this pair of genes:

aa

aA

AA

Because the ‘a’ allele is effectively ‘empty’ and does nothing, while the ‘A’ allele is active, individuals will have brown eyes if they carry even a single copy of the ‘A’ gene. We call the ‘A’ gene **dominant** because it overrides the ‘a’ gene. The ‘a’ gene is *recessive*.

Individuals that have ‘aa’ at the locus are referred to as ‘**homozygous recessive**’. These individuals will have blue eyes.

Homozygous comes from two words (Ancient Greek, I think): ‘homo’ meaning ‘the same’ and ‘zygous’ referring to the zygote, the fertilised egg. Both copies at the locus are the same, so an ‘aa’ parent can only give ‘a’ genes to its offspring.

Individuals that have ‘AA’ at the locus are ‘**homozygous dominant**’. These individuals will have brown eyes. An ‘AA’ parent can only pass on ‘A’ genes to its offspring.

Individuals that have ‘aA’ at the locus are **heterozygous**.

Heterozygous comes from two words: ‘hetero’ meaning ‘different’ and ‘zygous’ referring to the zygote, the fertilised egg. The two copies at the locus differ, so an ‘aA’ parent can pass on *different* genes to its offspring.

When two individuals mate and produce offspring, the offspring inherits one copy of the eye gene from each parent. This determines the genotype (and the eye colour) of the offspring. The specific copy of the gene they inherit is random. It’s a 50/50 chance that it’s either one. For homozygous parents this makes no difference, but for heterozygous parents it means that there’s an equal chance for each of the genes to be passed on for each offspring.
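The 50/50 inheritance rule can be sketched in a few lines of C#. This is only an illustration, assuming each parent is written as a two-character string such as "aA"; the names `Inheritance`, `PassOn` and `Offspring` are made up here and are not from the genetics code.

```csharp
using System;

// Hypothetical helper for illustration; not taken from the genetics code.
public static class Inheritance
{
    // A parent is a two-character genotype such as "aA".
    // Each of its two copies is passed on with equal chance.
    public static char PassOn(string parent, Random random)
    {
        return parent[random.Next(2)]; // Next(2) returns 0 or 1
    }

    // Builds one offspring from one randomly chosen copy per parent,
    // writing 'a' first so that "Aa" and "aA" come out the same.
    public static string Offspring(string parentOne, string parentTwo, Random random)
    {
        char first = PassOn(parentOne, random);
        char second = PassOn(parentTwo, random);
        if (first == 'A' && second == 'a') return "aA";
        return first.ToString() + second.ToString();
    }
}
```

For a homozygous parent the random choice makes no difference, so `Inheritance.Offspring("aa", "AA", new Random())` always gives "aA".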

Typically, to show the possible offspring that can be produced by mating two individuals, a simple outcome matrix is used. This takes the form:

|  | B1 | B2 |
|---|---|---|
| **A1** | A1B1 | A1B2 |
| **A2** | A2B1 | A2B2 |

Parent A has alleles A1 and A2, while parent B has alleles B1 and B2. There are four possible combinations of offspring: A1B1, A2B1, A1B2, A2B2.
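The four cells of the outcome matrix can also be enumerated in code. A minimal sketch, again assuming parents are written as two-character strings; `PunnettSquare` and `Offspring` are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not taken from the genetics code.
public static class PunnettSquare
{
    // Returns the four cells in the order A1B1, A2B1, A1B2, A2B2,
    // where A1/A2 are parent A's alleles and B1/B2 are parent B's.
    public static List<string> Offspring(string parentA, string parentB)
    {
        var cells = new List<string>();
        foreach (char fromB in parentB)
        {
            foreach (char fromA in parentA)
            {
                cells.Add($"{fromA}{fromB}");
            }
        }
        return cells;
    }
}
```

For two ‘aA’ parents, `PunnettSquare.Offspring("aA", "aA")` yields aa, Aa, aA and AA, one entry per cell of the matrix.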

Let’s look at the matrices for each of the six possible mating combinations.

#### 1. An ‘aa’ parent mates with another ‘aa’ parent.

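Here’s the matrix (one parent’s alleles down the side, the other’s across the top):

|  | a | a |
|---|---|---|
| **a** | aa | aa |
| **a** | aa | aa |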

You can see from this that all the offspring will be ‘aa’. Therefore if both parents have blue eyes, the offspring will have blue eyes. (*Remember, reality is more complex so if you and your spouse have blue eyes don’t assume the worst if your children don’t!*)

The probability list for this mating is therefore:

- aa 1
- aA 0
- AA 0

How do we get this? Well, there are four possible outcomes from the matrix and they are all ‘aa’. Therefore the probability of an ‘aa’ offspring is 4/4 = 1.

NOTE: Probabilities are measured from 0 to 1. So a 50% chance is written as a probability of 0.5. This makes the arithmetic for statistical calculations easier. I’m not going to explain probabilities at the moment, just accept that I write them as values between 0 and 1 and *not* as percentages from 0 to 100.

#### 2. An ‘AA’ parent mates with another ‘AA’ parent.

Here’s the matrix:

|  | A | A |
|---|---|---|
| **A** | AA | AA |
| **A** | AA | AA |

You can see from this that all the offspring will be ‘AA’; all of them will have brown eyes.

The probability list for this mating is therefore:

- aa 0
- aA 0
- AA 1

#### 3. An ‘aA’ parent mates with another ‘aA’ parent.

Here’s the matrix:

|  | a | A |
|---|---|---|
| **a** | aa | aA |
| **A** | Aa | AA |

This shows that all offspring types are possible! Some offspring will have blue eyes, but most will have brown eyes.

The probability list for this mating is:

- aa 0.25
- aA 0.5
- AA 0.25

Where do these come from? There are four offspring types possible.

One of these types is ‘aa’. Therefore its probability is 1/4, which is 0.25.

One of these offspring types is ‘AA’. Therefore its probability is 1/4, which is 0.25.

The last two offspring types are actually the same; ‘aA’ is the same as ‘Aa’. Therefore the probability is 2/4, which is 0.5.
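The counting described above works for any pair of parents. Here is a minimal sketch of it, assuming genotypes are written as "aa", "aA" or "AA"; `CrossProbabilities.Of` is a made-up name for illustration. Counting the ‘A’ copies in each cell is what makes ‘aA’ and ‘Aa’ land in the same bucket.

```csharp
using System;

// Hypothetical helper for illustration; not taken from the genetics code.
public static class CrossProbabilities
{
    // Returns { P(aa), P(aA), P(AA) } by counting the four cells
    // of the outcome matrix for the two parents.
    public static double[] Of(string parentOne, string parentTwo)
    {
        var counts = new int[3]; // indexed by the number of 'A' copies in the cell
        foreach (char x in parentOne)
        {
            foreach (char y in parentTwo)
            {
                int copiesOfA = (x == 'A' ? 1 : 0) + (y == 'A' ? 1 : 0);
                counts[copiesOfA]++;
            }
        }
        // Four cells in total, so each count is divided by 4.
        return new[] { counts[0] / 4.0, counts[1] / 4.0, counts[2] / 4.0 };
    }
}
```

`CrossProbabilities.Of("aA", "aA")` gives 0.25, 0.5 and 0.25, matching the list for this cross.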

#### 4. An ‘AA’ parent mates with an ‘aA’ parent.

Here’s the matrix:

|  | a | A |
|---|---|---|
| **A** | aA | AA |
| **A** | aA | AA |

In this case there are two possible outcomes; 2 ‘aA’ and 2 ‘AA’. All of the offspring will have brown eyes.

The probabilities for this mating are:

- aa 0
- aA 0.5
- AA 0.5

#### 5. An ‘aa’ parent mates with an ‘aA’ parent.

Here’s the matrix:

|  | a | A |
|---|---|---|
| **a** | aa | aA |
| **a** | aa | aA |

In this case there are two possible outcomes; ‘aa’ offspring with blue eyes and ‘aA’ offspring with brown eyes. The probabilities are:

- aa 0.5
- aA 0.5
- AA 0

#### 6. An ‘aa’ parent mates with an ‘AA’ parent.

In this case there is only one outcome; all offspring will be ‘aA’ and will have brown eyes. The probabilities are:

- aa 0
- aA 1.0
- AA 0

#### The Probability Line.

OK, so that’s all the *theoretical* outcomes. How do we add these to a simulation? We use the probabilities in an additive fashion to make a probability line. For example, the probability line for an ‘aA’ parent mating with another ‘aA’ parent (cross number three in the above list) would look like this:

| Offspring | Share of the line | Segment |
|---|---|---|
| aa | 0.25 | 0.0 to 0.25 |
| aA | 0.5 | 0.25 to 0.75 |
| AA | 0.25 | 0.75 to 1.0 |

So the ‘aa’ offspring outcome has 25% of the line, the ‘aA’ offspring outcome has 50% of the line and the ‘AA’ offspring outcome has 25% of the line. The line spans from 0.0 to 1.0. Storing the point at which each segment ends, we can put this into an array of floating point numbers as:

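```csharp
// Each entry is the point on the line where a segment ends (aa, aA, AA).
// The F suffix is required: a bare 0.25 is a double and will not convert to float.
float[] probabilities = new float[] { 0.25F, 0.75F, 1.0F };
```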
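Note that the array holds running totals, not the probabilities themselves. The conversion is just a cumulative sum, sketched below; `ProbabilityLine.FromProbabilities` is a hypothetical name used only for this example.

```csharp
using System;

// Hypothetical helper for illustration; not taken from the genetics code.
public static class ProbabilityLine
{
    // Turns a probability list such as { 0.25, 0.5, 0.25 }
    // into the end point of each segment on the line.
    public static float[] FromProbabilities(float[] probabilities)
    {
        var line = new float[probabilities.Length];
        float runningTotal = 0.0F;
        for (int i = 0; i < probabilities.Length; i++)
        {
            runningTotal += probabilities[i];
            line[i] = runningTotal;
        }
        return line;
    }
}
```

Passing in `{ 0.25F, 0.5F, 0.25F }` returns `{ 0.25F, 0.75F, 1.0F }`, the array shown above. An outcome with probability 0 gets a zero-width segment: its end point equals the end point of the segment before it.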

Now if we were to generate a random number between 0 and 1 and test against the values in the probability line, we would get an outcome:

```csharp
System.Random r = new System.Random();
double off = r.NextDouble(); // 0.0 <= off < 1.0
Locus ret;
// Find the segment of the line that contains 'off'.
if (off < probabilities[0])
    ret = locus_enum.aa;
else if (off < probabilities[1])
    ret = locus_enum.aA;
else // the last segment ends at 1.0
    ret = locus_enum.AA;
```

If we ran this code 1000000 times and recorded the results, we would find that *about* 25% of the returned loci were ‘aa’ (blue eyes), *about* 50% were ‘aA’ (brown eyes, but carrying a blue eye gene) and *about* 25% were ‘AA’ (brown eyes).
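That experiment is easy to try. The sketch below is an assumption-laden illustration rather than the blog's code: `DrawDemo.Tally` is a made-up name, and it reuses one seeded `Random` for every draw, since on older .NET Framework versions `Random` objects created in quick succession could end up with the same time-based seed.

```csharp
using System;

// Hypothetical helper for illustration; not taken from the genetics code.
public static class DrawDemo
{
    // Draws 'trials' random numbers against the probability line and
    // returns how many landed in each segment (aa, aA, AA in that order).
    public static int[] Tally(float[] line, int trials, int seed)
    {
        var random = new Random(seed); // one generator, reused for every draw
        var counts = new int[line.Length];
        for (int t = 0; t < trials; t++)
        {
            double off = random.NextDouble(); // 0.0 <= off < 1.0
            int outcome = 0;
            // Step past every segment that ends at or before 'off'.
            while (outcome < line.Length - 1 && off >= line[outcome])
            {
                outcome++;
            }
            counts[outcome]++;
        }
        return counts;
    }
}
```

Calling `DrawDemo.Tally(new float[] { 0.25F, 0.75F, 1.0F }, 1000000, 42)` and dividing each count by 1000000 gives the three observed proportions.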

Note that I said **ABOUT** 25%. Because we’re using random numbers there’s always a chance that the exact outcome (from probability arithmetic) will not happen. I shall explain how this happens in the real world later when I discuss *genetic drift* in another post.

This is the basis for a simulation.

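All that remains is to store the probability lines for all six of the possible parental crosses. We can store these in a 3x3x3 array, which we’ll call the probability matrix. The array is indexed by the genotype of each parent in turn, so the three crosses between unlike parents each appear twice, once for each ordering of the parents.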

In C# this comes out as:

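```csharp
// Prepare the offspring probability matrix.
// Each innermost row is the probability line for one cross.
p_offspring = new float[,,]
{
    {   // parent 1 is aa
        {1.0F , 1.0F , 1.0F },  // x aa
        {0.5F , 1.0F , 1.0F },  // x aA
        {0.0F , 1.0F , 1.0F }   // x AA
    },
    {   // parent 1 is aA
        {0.5F , 1.0F , 1.0F },
        {0.25F, 0.75F, 1.0F },
        {0.0F , 0.5F , 1.0F }
    },
    {   // parent 1 is AA
        {0.0F , 1.0F , 1.0F },
        {0.0F , 0.5F , 1.0F },
        {0.0F , 0.0F , 1.0F }
    }
};
```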

The ‘F’ character just forces the value to be a ‘float’ type. Don’t worry about it.
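Typing 27 numbers by hand invites mistakes, so it can be worth deriving them as a cross-check. The sketch below is one possible way to do that, not how the original matrix was produced. It assumes genotypes are numbered 0 for ‘aa’, 1 for ‘aA’ and 2 for ‘AA’ (the number of ‘A’ copies), and `OffspringMatrix.Build` is a hypothetical name.

```csharp
using System;

// Hypothetical helper for illustration; not taken from the genetics code.
public static class OffspringMatrix
{
    // Builds the 3x3x3 array of probability lines.
    // Indexes: [parent 1, parent 2, offspring], with 0 = aa, 1 = aA, 2 = AA.
    public static float[,,] Build()
    {
        var matrix = new float[3, 3, 3];
        for (int p1 = 0; p1 < 3; p1++)
        {
            for (int p2 = 0; p2 < 3; p2++)
            {
                // A parent with n copies of 'A' passes on 'A' with probability n / 2.
                float passA1 = p1 / 2.0F;
                float passA2 = p2 / 2.0F;
                float aa = (1 - passA1) * (1 - passA2); // both parents pass on 'a'
                float AA = passA1 * passA2;             // both parents pass on 'A'
                matrix[p1, p2, 0] = aa;        // end of the aa segment
                matrix[p1, p2, 1] = 1 - AA;    // end of the aA segment
                matrix[p1, p2, 2] = 1.0F;      // end of the AA segment
            }
        }
        return matrix;
    }
}
```

Every value involved is a multiple of 0.25, which a float stores exactly, so the result can be compared entry by entry with the hand-typed matrix.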

Now, we use a little sleight of hand here: if you read the ‘aa’ genotype as a 0, ‘aA’ as a 1 and ‘AA’ as a 2, you can use the genotypes of the parents as indexes into the 3-dimensional array. Each number is simply the count of ‘A’ alleles in the genotype, which corresponds to the property named ‘A_Count’ on the locus object in our code.

The indexes of the probability array then become:

- p_offspring[A_Count of parent 1, A_Count of parent 2, *0*] for the end of the ‘aa’ segment, which is the probability of an ‘aa’ offspring.
- p_offspring[A_Count of parent 1, A_Count of parent 2, *1*] for the end of the ‘aA’ segment, which is the probability of an ‘aa’ or an ‘aA’ offspring.
- p_offspring[A_Count of parent 1, A_Count of parent 2, *2*] for the end of the ‘AA’ segment, which is always 1.

So we can code as:

```csharp
System.Random r = new System.Random();
double off = r.NextDouble(); // 0.0 <= off < 1.0
Locus ret;
// Strict '<' so a zero-width segment is never chosen, even if off is exactly 0.0.
if (off < p_offspring[parentOne.A_Count, parentTwo.A_Count, 0])
    ret = locus_enum.aa;
else if (off < p_offspring[parentOne.A_Count, parentTwo.A_Count, 1])
    ret = locus_enum.aA;
else // the last entry of every line is 1.0
    ret = locus_enum.AA;
```

Where parentOne and parentTwo are the two parents.

This, with a little extra error checking and statistical data collection, is the content of the Locus.Cross method in the heart of our code.
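For anyone who wants something to paste into a project and step through, here is a self-contained sketch of the same idea. It is not the Locus.Cross method itself: the `Genotype` enum and the `CrossSketch` class are hypothetical stand-ins for the real locus types, and the error checking and statistics are left out.

```csharp
using System;

// Hypothetical stand-in for the locus types; the value is the count of 'A' alleles.
public enum Genotype { aa = 0, aA = 1, AA = 2 }

public static class CrossSketch
{
    // Probability lines, indexed by [parent 1, parent 2, offspring].
    static readonly float[,,] p_offspring =
    {
        { {1.0F, 1.0F, 1.0F}, {0.5F, 1.0F, 1.0F}, {0.0F, 1.0F, 1.0F} },
        { {0.5F, 1.0F, 1.0F}, {0.25F, 0.75F, 1.0F}, {0.0F, 0.5F, 1.0F} },
        { {0.0F, 1.0F, 1.0F}, {0.0F, 0.5F, 1.0F}, {0.0F, 0.0F, 1.0F} }
    };

    // Picks one offspring genotype at random for the given parents.
    // The caller supplies the Random so one generator is reused across crosses.
    public static Genotype Cross(Genotype parentOne, Genotype parentTwo, Random r)
    {
        double off = r.NextDouble(); // 0.0 <= off < 1.0
        int one = (int)parentOne;
        int two = (int)parentTwo;
        if (off < p_offspring[one, two, 0]) return Genotype.aa;
        if (off < p_offspring[one, two, 1]) return Genotype.aA;
        return Genotype.AA;
    }
}
```

A call looks like `CrossSketch.Cross(Genotype.aA, Genotype.aA, random)`, where `random` is a single `Random` instance created once and shared by all crosses.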

Don’t expect to understand this immediately. If you’re a programmer, take the code and try it out. Step through it a few times. It still takes me a little time to get it all straight in my head, even though I’m the one that wrote it.

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

Written while listening to:

Slide In (Dfa remix) – Goldfrapp by Various Artists from the album A-Z Of Annie Mac [UK] Disc 1