‘Someone’ commented that the probability matrix used in the basic genetics code was not necessarily explained very well.  I was worried about this because, for some reason, I find it hard to explain. So this post will cover it in more detail.

If you haven’t already been there, I suggest you go over ‘A Bit of Genetics’ to get the background, although I’ll actually go over most of it again here.

I’ve noticed that a lot of people get to this blog through Internet searches for ‘eye colour allele’ (WordPress tells me these things; they’re watching you!), so I’ll describe the genetics in this context. Note that in reality eye colour is much more complex than the blue/brown example, so the outcomes are not really what I describe here; this is a simplified explanation!

So, we have a single genetic locus which contains two copies of the eye colour gene.  The ‘a’ gene does nothing to the colour of the eye, while the ‘A’ gene creates brown pigment proteins that cause the iris of the eye to become brown.

There are three possible combinations of this pair of genes:

aa
aA
AA

Because the ‘a’ allele is effectively ’empty’ and does nothing, while the ‘A’ allele is active, individuals will have brown eyes if they carry even a single copy of the ‘A’ gene. We call the ‘A’ gene dominant because it overrides the ‘a’ gene.  The ‘a’ gene is recessive.

Individuals that have ‘aa’ at the locus are referred to as ‘homozygous recessive‘.  These individuals will have blue eyes.

Homozygous comes from the two words (Ancient Greek, I think); ‘homo’ meaning ‘the same’ and ‘zygous’ referring to an egg.  This is because an ‘aa’  parent can only give ‘a’ genes to its offspring.

Individuals that have ‘AA’ at the locus are ‘homozygous dominant‘. These individuals will have brown eyes. An ‘AA’ parent can only pass on ‘A’ genes to its offspring.

Individuals that have ‘aA’ at the locus are heterozygous.

Heterozygous comes from the two words; ‘hetero’ meaning ‘different’ and ‘zygous’ referring to an egg. This is because an ‘aA’ parent can pass on different genes to its offspring.

When two individuals mate and produce offspring, the offspring inherits one copy of the eye gene from each parent. This determines the genotype (and the eye colour) of the offspring. The specific copy of the gene they inherit is random.  It’s a 50/50 chance that it’s either one.  For homozygous parents this makes no difference, but for heterozygous parents it means that each of the two genes has an equal chance of being passed on to each offspring.

Typically, to show the possible offspring that can be produced by mating two individuals, a simple outcome matrix is used.  This takes the form:

Inheritance matrix explanation.

Parent A has alleles A1 and A2, while parent B has alleles B1 and B2.  There are four possible combinations of offspring: A1B1, A2B1, A1B2, A2B2.
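If you prefer code to diagrams, the four cells of the matrix can be listed with two nested loops. This is only a rough sketch and not part of the simulation code: the `PunnettSquare` class and its `Offspring` method are names I have made up for illustration, and each parent is assumed to be written as a two-character string such as "aA".

```csharp
using System;

public static class PunnettSquare
{
    // Returns the four offspring combinations in the order
    // A1B1, A2B1, A1B2, A2B2 for two parents such as "aA" and "AA".
    public static string[] Offspring(string parentA, string parentB)
    {
        string[] cells = new string[4];
        int i = 0;
        foreach (char b in parentB)
        {
            foreach (char a in parentA)
            {
                // Lower case sorts after upper case, so the larger character
                // is the recessive 'a'. Writing it first makes "Aa" read as "aA".
                char first = (a >= b) ? a : b;
                char second = (a >= b) ? b : a;
                cells[i++] = new string(new char[] { first, second });
            }
        }
        return cells;
    }
}
```

Calling `PunnettSquare.Offspring("aA", "aA")` gives the cells aa, aA, aA and AA, which is the third cross described further down.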

Let’s look at the matrices for each of the six possible mating combinations.

1. An ‘aa’ parent mates with another ‘aa’ parent.

Here’s the matrix:

Inheritance matrix for aa by aa.

 You can see from this that all the offspring will be ‘aa’. Therefore if both parents have blue eyes, the offspring will have blue eyes. (Remember, reality is more complex so if you and your spouse have blue eyes don’t assume the worst if your children don’t!)

The probability list for this mating is therefore:

  • aa 1
  • aA 0
  • AA 0

How do we get this?  Well, there are four possible outcomes from the matrix and they are all ‘aa’.  Therefore the probability of an ‘aa’ offspring is 4/4 = 1.

NOTE: Probabilities are measured from 0 to 1.  So a 50% chance is written as a probability of 0.5. This makes the arithmetic for statistical calculations easier. I’m not going to explain probabilities at the moment, just accept that I write them as values between 0 and 1 and not as percentages from 0 to 100.

2. An ‘AA’ parent mates with another ‘AA’ parent.

Here’s the matrix:

Inheritance matrix for AA x AA

You can see from this that all the offspring will be ‘AA’; all their offspring will have brown eyes.

The probability list for this mating is therefore:

  • aa 0
  • aA 0
  • AA 1

3. An ‘aA’ parent mates with another ‘aA’ parent.

Here’s the matrix:

Inheritance matrix for aA x aA

This shows that all offspring types are possible! Some offspring will have blue eyes, but most will have brown eyes.

The probability list for this mating is:

  • aa 0.25
  • aA 0.5
  • AA 0.25

Where do these come from?  There are four offspring types possible.

One of these types is ‘aa’.  Therefore its probability is 1/4, which is 0.25.

One of these offspring types is ‘AA’.  Therefore its probability is 1/4, which is 0.25.

The last two offspring types are actually the same; ‘aA’ is the same as ‘Aa’.  Therefore the probability is 2/4, which is 0.5.

4. An ‘AA’ parent mates with an ‘aA’ parent.

Here’s the matrix:

Inheritance matrix for AA x aA

In this case there are two possible outcomes; 2 ‘aA’ and 2 ‘AA’. All of the offspring will have brown eyes.

The probabilities for this mating are:

  • aa 0
  • aA 0.5
  • AA 0.5

5. An ‘aa’ parent mates with an ‘aA’ parent.

Here’s the matrix:

Inheritance matrix for aa x aA

In this case there are two possible outcomes; ‘aa’ offspring with blue eyes and ‘aA’ offspring with brown eyes. The probabilities are:

  • aa 0.5
  • aA 0.5
  • AA 0

6. An ‘aa’ parent mates with an ‘AA’ parent.

Inheritance matrix for aa x AA

 In this case there is only one outcome; all offspring will be ‘aA’ and will have brown eyes. The probabilities are:

  • aa 0
  • aA 1.0
  • AA 0
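All six probability lists can also be worked out from one rule: a parent carrying n copies of ‘A’ (0, 1 or 2) passes on an ‘A’ with probability n/2. The sketch below applies that rule; the `CrossProbabilities` class and its `For` method are hypothetical names used for illustration only.

```csharp
public static class CrossProbabilities
{
    // aCountOne / aCountTwo: number of 'A' genes (0, 1 or 2) in each parent.
    // Returns the probability list in the order { aa, aA, AA }.
    public static double[] For(int aCountOne, int aCountTwo)
    {
        double passAOne = aCountOne / 2.0;   // chance parent one passes on 'A'
        double passATwo = aCountTwo / 2.0;   // chance parent two passes on 'A'

        double aa = (1.0 - passAOne) * (1.0 - passATwo);   // both pass on 'a'
        double AA = passAOne * passATwo;                   // both pass on 'A'
        double aA = 1.0 - aa - AA;                         // everything else

        return new double[] { aa, aA, AA };
    }
}
```

For example, `CrossProbabilities.For(1, 1)` gives 0.25, 0.5 and 0.25 (cross number three), and `CrossProbabilities.For(0, 2)` gives 0, 1 and 0 (cross number six).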

The Probability Line.

OK, so that’s all the theoretical outcomes.  How do we add these to a simulation?  We use the probabilities in an additive fashion to make a probability line. For example, the probability line for an ‘aA’ parent mating with another ‘aA’ parent (cross number three in the above list) would look like this:

Probability Line

 So the ‘aa’ offspring outcome has 25% of the line, the ‘aA’ offspring outcome has 50% of the line and the ‘AA’ offspring outcome has 25% of the line.  The line spans from 0.0 to 1.0.  We can put this into an array of floating point numbers, where each value marks the point on the line where one outcome’s section ends:

float[] probabilities = new float[] { 0.25F, 0.75F, 1.0F };
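The ‘additive fashion’ is just a running total. As a small sketch (the `ProbabilityLine` class and `FromProbabilities` method are made-up names, not part of the simulation code), a probability list can be turned into a probability line like this:

```csharp
public static class ProbabilityLine
{
    // Turns a probability list such as { 0.25, 0.5, 0.25 } into the
    // running totals { 0.25, 0.75, 1.0 } that mark where each section ends.
    public static float[] FromProbabilities(float[] probabilities)
    {
        float[] line = new float[probabilities.Length];
        float total = 0.0F;
        for (int i = 0; i < probabilities.Length; i++)
        {
            total += probabilities[i];
            line[i] = total;
        }
        return line;
    }
}
```

The last entry should always come out as 1.0; if it doesn’t, the probability list was wrong to begin with.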

Now if we were to generate a random number between 0 and 1 and test against the values in the probability line, we would get an outcome:

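System.Random r = new Random();
double off = r.NextDouble();   // a value in the range [0.0, 1.0)
Locus ret;

// Walk along the probability line until we find the section containing 'off'.
if (off < probabilities[0])
    ret = new Locus(locus_enum.aa);
else if (off < probabilities[1])
    ret = new Locus(locus_enum.aA);
else
    ret = new Locus(locus_enum.AA);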

If we ran this code 1000000 times and recorded the results, we would find that about 25% of the returned loci were ‘aa’ (blue eyes), about 50% were ‘aA’ (brown eyes, but carrying a blue eye gene) and about 25% were ‘AA’ (brown eyes).

Note that I said ABOUT 25%. Because we’re using random numbers there’s always a chance that the exact outcome (from probability arithmetic) will not happen.  I shall explain how this happens in the real world later when I discuss genetic drift in another post.
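To see the ‘about’ for yourself, something like the following sketch can be used to repeat the draw many times and count the outcomes. `CrossTally` and `Run` are hypothetical names; the generator is passed in so that one instance is reused for every draw.

```csharp
using System;

public static class CrossTally
{
    // Draws 'trials' offspring from the aA x aA probability line and
    // returns the counts in the order { aa, aA, AA }.
    public static int[] Run(int trials, Random random)
    {
        double[] line = { 0.25, 0.75, 1.0 };
        int[] counts = new int[3];

        for (int i = 0; i < trials; i++)
        {
            double off = random.NextDouble();   // always below 1.0

            // Step along the line until 'off' falls inside a section.
            int outcome = 0;
            while (off >= line[outcome])
            {
                outcome++;
            }
            counts[outcome]++;
        }
        return counts;
    }
}
```

Running `CrossTally.Run(1000000, new Random())` a few times should give counts that hover around 250000, 500000 and 250000 without ever landing on them exactly.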

This is the basis for a simulation.

All that remains is to store the probability lines for all 6 of the possible parental crosses.  We can store this in a 3x3x3 matrix, which we’ll call the probability matrix:

Probability cube

 In C# this comes out as:

// Prepare the offspring probability matrix.
p_offspring = new float[,,] {
                {   {1.0F , 1.0F , 1.0F  },
                    {0.5F , 1.0F , 1.0F  },
                    {0.0F , 1.0F , 1.0F  }   
                },               
                {   {0.5F , 1.0F , 1.0F  },
                    {0.25F, 0.75F, 1.0F  },
                    {0.0F , 0.5F , 1.0F  }   
                },               
                {   {0.0F , 1.0F , 1.0F  },
                    {0.0F , 0.5F , 1.0F  },
                    {0.0F , 0.0F , 1.0F  }   
                }
                };

The ‘F’ character just forces the value to be a ‘float’ type. Don’t worry about it.

Now, we use a little sleight of hand here: if you read the ‘aa’ genotype as a 0, ‘aA’ as a 1 and ‘AA’ as a 2, you can use the genotypes of the parents as indexes into the 3-dimensional array.  This corresponds to the ‘A_Count’ property on the locus object, which returns exactly these numbers.

The indexes of the probability array then become

  • p_offspring[A_Count of parent 1, A_Count of parent 2, 0] for the point on the probability line where the ‘aa’ section ends.
  • p_offspring[A_Count of parent 1, A_Count of parent 2, 1] for the point where the ‘aA’ section ends.
  • p_offspring[A_Count of parent 1, A_Count of parent 2, 2] for the point where the ‘AA’ section ends (always 1.0).

So we can code as:

System.Random r = new Random();
double off = r.NextDouble();   // a value in the range [0.0, 1.0)
Locus ret;

// The strict comparison means a section of zero width can never be chosen.
if (off < p_offspring[parentOne.A_Count,parentTwo.A_Count,0])
    ret = new Locus(locus_enum.aa);
else if (off < p_offspring[parentOne.A_Count,parentTwo.A_Count,1])
    ret = new Locus(locus_enum.aA);
else
    ret = new Locus(locus_enum.AA);

Where parentOne and parentTwo are the two parents.

This, with a little extra error checking and statistical data collection, is the content of the Locus.Cross method in the heart of our code.
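If typing the 27 numbers by hand makes you nervous, the same cube can be generated and compared against the hand-written one. This is a sketch under the same rule as before (a parent with n copies of ‘A’ passes on an ‘A’ with probability n/2); `OffspringCube` and `Build` are names invented for this example.

```csharp
public static class OffspringCube
{
    // Builds the cumulative table indexed as
    // [A_Count of parent 1, A_Count of parent 2, offspring genotype].
    public static float[,,] Build()
    {
        float[,,] cube = new float[3, 3, 3];
        for (int one = 0; one < 3; one++)
        {
            for (int two = 0; two < 3; two++)
            {
                float passAOne = one / 2.0F;   // chance parent 1 passes on 'A'
                float passATwo = two / 2.0F;   // chance parent 2 passes on 'A'
                float aa = (1.0F - passAOne) * (1.0F - passATwo);
                float AA = passAOne * passATwo;

                cube[one, two, 0] = aa;          // end of the 'aa' section
                cube[one, two, 1] = 1.0F - AA;   // end of the 'aA' section
                cube[one, two, 2] = 1.0F;        // end of the 'AA' section
            }
        }
        return cube;
    }
}
```

Every entry is built from halves and quarters, so the generated values can be compared with the literals in the hand-written table using plain equality.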

Don’t expect to understand this immediately.  If you’re a programmer, take the code and try it out.  Step through it a few times.  It still takes me a little time to get it all straight in my head, even though I’m the one that wrote it.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

Written while listening to:

Slide In (Dfa remix) – Goldfrapp by Various Artists from the album A-Z Of Annie Mac [UK] Disc 1


OK, so we can simulate individual genetic loci.  Next we need to group them together into individual members of a population of animals.

The next C# class needed is the individual:

public class Individual

An individual is mainly a data collection.  In this simple example we’re going to use an array of loci to encode an individual’s height:

// Default number of loci in height array
const int arraySize = 10;

// The height array
private Locus[] heightLoci = new Locus[arraySize];

Pretty simple stuff.  We have several options on how to decode this array of loci into a height value.  The simplest is to interpret the loci as additive genes.  The idea is that an ‘a’ allele has no effect on height, while an ‘A’ allele adds 1 unit to the individual’s height. This allows us to take each locus’ A_Count property and add them together:

///<summary>
/// Gets the height of this individual.
///</summary>
public int Height
{
    get
    {
        // Add up the A_Count values of the loci in the array.
        int height = 0;
        foreach (Locus l in heightLoci)
        {
            height += l.A_Count;
        }
        return height;
    }
}

We could add other data to the individual; sex, age or additional genetic characteristics could all be added to increase the complexity of the simulation.  I might show examples of this another time, but for now we’ll simply store the array of loci affecting height.

The complexity comes in initialising the individual.  We have to randomly assign a locus value to each locus in the individual. We could go off and read a lot about random number generators, hand-crafting our own one with all the latest random number generation methods.  Or we could just use the one that comes with .NET.

We’ll use the basic one. We’re not trying to prove anything scientific here, just show the basic mechanisms of evolution, so the quality of the random number generator isn’t really worth worrying over.

Because we’re using enums for the locus values, we can simply generate numbers between 0 and 2. We’ll do that in the default constructor:

///<summary>
/// Initializes a new instance of the <see cref="T:Individual"/> class.
///</summary>
///<remarks>Initialises the locus array with random values.</remarks>
public Individual()
{
    // Initialise random loci.
    System.Random rand = new Random();

    for (int i = 0; i < arraySize; i++)
    {
        // Locus is taken as random number from 0 to 2
        locus_enum loc = (locus_enum)rand.Next(3);

        // Store frequency of generated loci.
        switch (loc)
        {
            case locus_enum.aa:
                aaCount++;
                break;
            case locus_enum.aA:
                aACount++;
                break;
            case locus_enum.AA:
                AACount++;
                break;
            default:
                break;
        }

        // Generate new locus in array
        heightLoci[i] = new Locus(loc);
    }
}

The System.Random class will generate random numbers sufficient for our needs. By default it will seed itself with the system time so we don’t need to worry about setting it up, we just create the instance and use it.
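One caveat worth knowing about: on the .NET Framework the parameterless constructor takes its seed from the system clock, which has a limited resolution. Generators created in quick succession (as happens when a large population is built in a tight loop) can therefore start from the same seed and hand out identical loci. A common way round this is to create one generator and share it. The following is a sketch of that idea only; `SharedRandom` is a made-up name and it assumes a single-threaded simulation, because `Random` is not thread-safe.

```csharp
using System;

public static class SharedRandom
{
    // One generator for the whole simulation, created once.
    // Assumption: only one thread ever uses it.
    public static readonly Random Instance = new Random();

    // Returns a genotype code from 0 to 2, suitable for casting to locus_enum.
    public static int NextGenotype()
    {
        return Instance.Next(3);
    }
}
```

Inside the constructor, the line that creates `rand` could then be dropped in favour of `(locus_enum)SharedRandom.NextGenotype()`.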

Essentially we loop through the locus array, setting the locus value to a random number between 0 and 2. This converts easily to our locus_enum enumerated type.

The switch statement in the middle of the method stores the numbers of each type of generated locus in some static variables in the Individual class.  This allows us to check on them for outrageous errors!

We’ll also use a second constructor to enable us to force a locus array into an individual.  This comes in useful for testing and allows us to force mutations in simulations should we wish to do so.

///<summary>
/// Initializes a new instance of the <see cref="T:Individual"/> class.
///</summary>
///<param name="xLoci">The <see cref="Locus"/> array to use to create the individual.</param>
public Individual(Locus[] xLoci)
{
    // Passed array must be same size as default.
    if (xLoci.Length == arraySize)
    {
        heightLoci = xLoci;
    }
    else
    {
        throw new Exception("Incorrect locus array length in Individual(Locus[])");
    }
}

In the main program, I’ve added some simple code to initialise an array of 1000000 individuals.  This forms a simple population.   I have also added a small assembly with some descriptive statistics code which I then use to display frequency data of the heights in the population.

Individual[] population = new Individual[1000000];
DescriptiveStatistics.Histogram histo = new DescriptiveStatistics.Histogram(0, 20, 1);

for(int i = 0; i < 1000000; i++)
{
    population[i] = new Individual();
    histo.Add(population[i].Height);
}

for (int i = 0; i < histo.BucketCount; i++)
{
    Console.WriteLine("{0}\t{1}", histo.LimitForBucket(i), histo.CountForBucket(i));
}

I manually copied these into Excel and came out with the following histogram:

Histogram of population height.

Yes, it’s Normally Distributed.
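Strictly speaking it is only approximately normal, and the exact shape can be worked out without any random numbers. The sketch below assumes what the default constructor does, namely that every locus is 0, 1 or 2 with equal probability, and adds the loci in one at a time. `HeightDistribution` and `Exact` are hypothetical names.

```csharp
public static class HeightDistribution
{
    // Exact probability of each height from 0 to 2 * lociCount when every
    // locus independently contributes 0, 1 or 2 with probability 1/3 each.
    public static double[] Exact(int lociCount)
    {
        double[] dist = new double[] { 1.0 };   // no loci: height 0 for certain

        for (int locus = 0; locus < lociCount; locus++)
        {
            // Adding one locus spreads each existing height over three heights.
            double[] next = new double[dist.Length + 2];
            for (int h = 0; h < dist.Length; h++)
            {
                for (int add = 0; add <= 2; add++)
                {
                    next[h + add] += dist[h] / 3.0;
                }
            }
            dist = next;
        }
        return dist;
    }
}
```

For 10 loci this gives 21 possible heights with the peak at 10. The mean is 10 and the variance is 10 * 2/3, so the standard deviation is roughly 2.6. Multiplying each probability by 1000000 gives the bar heights to expect in the histogram.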

You can get the source code for this blog entry over at the Channel9 Sandbox.

Next time, I’ll go over a little more theory and then we can start mating individuals together, ready to start simulating evolution.

 


OK, time for some code.

First we have to represent a single genetic locus.  We are going to use the basic dominant/recessive allele model, which gives three possible genotypes at a specific locus: aa, aA & AA.  We’ll use an ‘enum’ for this one:

///<summary>
/// Enum defining possible genotypes at a single locus.
///</summary>
public enum locus_enum
{
    aa = 0,
    aA = 1,
    AA = 2
}

Notice the numeric values assigned to each one.  We can read the numeric values as ‘the number of dominant alleles’.  This can come in handy later.

We’ll wrap this up with an object called ‘Locus’ to add some methods to this data.

public class Locus
{
    private locus_enum _locus;

    public Locus(locus_enum l)
    {
        _locus = l;
    }

    // True when the recessive phenotype is shown, which only
    // happens for the 'aa' genotype.
    public bool Phenotype
    {
        get
        {
            return _locus == locus_enum.aa;
        }
    }

    public int A_Count
    {
        get
        {
            return (int)_locus;
        }
    }

    public override string ToString()
    {
        switch (_locus)
        {
            case locus_enum.aa:
                return "aa";
            case locus_enum.aA:
                return "aA";
            case locus_enum.AA:
                return "AA";
            default:
                return "ERROR!";
        }
    }

    private locus_enum locus
    {
        get
        {
            return _locus;
        }
    }
}

Next, we need to be able to manipulate genes. Specifically, we need to be able to cross them using Mendelian genetics.

Now the simple, brute force method would be to randomly select one allele from each parent’s locus and then combine them to create the offspring’s locus.

We’re not going to do that, we’re going to use a probability matrix.
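For comparison, here is roughly what the brute force version would look like. It is a sketch only, with each parent given as its count of ‘A’ alleles rather than as a Locus object; `BruteForceCross`, `Cross` and `PickAllele` are names invented for this example.

```csharp
using System;

public static class BruteForceCross
{
    // Each parent is given as its count of 'A' alleles (0, 1 or 2).
    // Returns the offspring's count of 'A' alleles.
    public static int Cross(int aCountOne, int aCountTwo, Random random)
    {
        return PickAllele(aCountOne, random) + PickAllele(aCountTwo, random);
    }

    // Picks one of a parent's two alleles at random: 1 for 'A', 0 for 'a'.
    private static int PickAllele(int aCount, Random random)
    {
        // Choose slot 0 or slot 1; a parent with aCount copies of 'A'
        // is treated as holding them in the first aCount slots.
        return random.Next(2) < aCount ? 1 : 0;
    }
}
```

This needs two random numbers per cross where the probability matrix needs one, which is part of the appeal of the matrix.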

We know all the probabilities of producing a specific type of offspring given two parents, so we just need to generate some random numbers and throw them against the probabilities.

We shall take our example from ‘A Bit Of Genetics’:

If both parents are ‘aA’ then the offspring probabilities are:

  • aa — 25%
  • aA — 50%
  • AA — 25%

So we add these probabilities together to create a ‘line of probability’ that goes from 0.0 to 1.0.  We break the line up into chunks of the relative size of the probabilities like this:

Probability line diagram

So we can generate a random number between 0.0 and 1.0, find it on the line and the result is the value for that chunk of the line.  We can do all of this with a probability matrix that covers all the possible parental and offspring values.  The matrix has three dimensions, so it looks like a cube:

Offspring probability cube.

In C# this looks like:

// Prepare the offspring probability matrix.
p_offspring = new float[, ,] {

     {   {1.0F , 1.0F , 1.0F  },
         {0.5F , 1.0F , 1.0F  },
         {0.0F , 1.0F , 1.0F  }   
     },               
     {   {0.5F , 1.0F , 1.0F  },
         {0.25F, 0.75F, 1.0F  },
         {0.0F , 0.5F , 1.0F  }   
     },               
     {   {0.0F , 1.0F , 1.0F  },
         {0.0F , 0.5F , 1.0F  },
         {0.0F , 0.0F , 1.0F  }   
     }
};

We then need a method that will return an offspring when given two parents:

///<summary>
/// Crosses the two specified loci.
///</summary>
///<param name="o"></param>
///<param name="t"></param>
///<returns></returns>
public static Locus Cross(Locus o, Locus t)
{
    System.Random r = new Random();
    double off = r.NextDouble();

    Locus ret;

    /* check to see if the offspring is aa */
    if ((off <= p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aa]) && (p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aa] != 0))
    {
        ret = new Locus(locus_enum.aa);

        switch (o.locus)
        {
            case locus_enum.aa:
                if (t.A_Count == 2)
                {
                    string message = string.Format("Parent one : {0},\tParent two {1}\tOffspring : {2}\n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!\n\n" + message);
                }
                break;
            case locus_enum.aA:
                if (t.A_Count == 2)
                {
                    string message = string.Format("Parent one : {0},\tParent two {1}\tOffspring : {2}\n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!\n\n" + message);
                }
                break;
            case locus_enum.AA:
                {
                    string message = string.Format("Parent one : {0},\tParent two {1}\tOffspring : {2}\n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!\n\n" + message);
                }
                break;

        }

        return ret;
    }

    /* check to see if the offspring is aA */
    if ((off <= p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aA]) && (p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aA] != 0))
    /*    if (off <= p_offspring[o][t][aA])*/
    {
        ret = new Locus(locus_enum.aA);
        switch (o.locus)
        {
            case locus_enum.aa:
                if (t.A_Count == 0)
                {
                    string message = string.Format("Parent one : {0},\tParent two {1}\tOffspring : {2}\n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!\n\n" + message);
                }
                break;
            case locus_enum.aA:
                break;

            case locus_enum.AA:
                if (t.A_Count == 2)
                {
                    string message = string.Format("Parent one : {0},\tParent two {1}\tOffspring : {2}\n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!\n\n" + message);
                }
                break;

        }
        return ret;
    }

    /* check to see if the offspring is AA */
    if (off <= p_offspring[o.A_Count, t.A_Count, (int)locus_enum.AA])
    {
        ret = new Locus(locus_enum.AA);
        switch (o.locus)
        {
            case locus_enum.aa:
                {
                    string message = string.Format("Parent one : {0},\tParent two {1}\tOffspring : {2}\n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!\n\n" + message);
                }
                break;
            case locus_enum.aA:
                if (t.A_Count == 0)
                {
                    string message = string.Format("Parent one : {0},\tParent two {1}\tOffspring : {2}\n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!\n\n" + message);
                }
                break;
            case locus_enum.AA:
                if (t.A_Count == 0)
                {
                    string message = string.Format("Parent one : {0},\tParent two {1}\tOffspring : {2}\n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!\n\n" + message);
                }
                break;

        }
        return ret;
    }

    /* if we got here, there's been an error ! */
    throw new Exception("Probability screw-up in locus_cross()");

    /* return something to keep the compiler happy */
    return new Locus(locus_enum.aa);

}

To make them easily accessible, we’ll embed the probability matrix and the crossing method as static members of the ‘Locus’ class which they act on.
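Stripped of its sanity checks, the crossing logic boils down to a short walk along the probability line. The sketch below is a cut-down illustration, not the code in the ZIP file: `Genotype` stands in for `locus_enum` so that the example compiles on its own, `CompactCross` and `Pick` are made-up names, and it uses one shared generator rather than a new one per call so that calls made in quick succession don’t start from the same clock-based seed.

```csharp
using System;

public enum Genotype { aa = 0, aA = 1, AA = 2 }

public static class CompactCross
{
    // Shared generator, created once. Assumes single-threaded use.
    private static readonly Random random = new Random();

    // Same cumulative values as the p_offspring matrix.
    private static readonly float[,,] table =
    {
        { { 1.0F, 1.0F, 1.0F }, { 0.5F, 1.0F, 1.0F }, { 0.0F, 1.0F, 1.0F } },
        { { 0.5F, 1.0F, 1.0F }, { 0.25F, 0.75F, 1.0F }, { 0.0F, 0.5F, 1.0F } },
        { { 0.0F, 1.0F, 1.0F }, { 0.0F, 0.5F, 1.0F }, { 0.0F, 0.0F, 1.0F } }
    };

    // Crosses two parents using a fresh random number.
    public static Genotype Cross(Genotype one, Genotype two)
    {
        return Pick(one, two, random.NextDouble());
    }

    // Finds the section of the probability line that contains 'off'
    // (a value in the range [0.0, 1.0)).
    public static Genotype Pick(Genotype one, Genotype two, double off)
    {
        for (int g = 0; g < 2; g++)
        {
            // The strict comparison skips sections of zero width.
            if (off < table[(int)one, (int)two, g])
            {
                return (Genotype)g;
            }
        }
        return Genotype.AA;
    }
}
```

Splitting out `Pick` means the table can be unit tested with fixed values of `off`, without any randomness involved.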

I have put the code into a ZIP file at the Channel 9 Sandbox.  There is a single Visual Studio 2005 solution with an executable project that does nothing (yet) and a unit test assembly called SimulationTests. If you don’t have Visual Studio 2005, this project will also work in the Visual Studio C# Express version (which is currently free).  The unit tests are written using NUnit which you will have to install as well. So far you can just read the code and run the tests.

I shall be using this project as the basis for all the demonstrations I build up in the future.

 


Survival of the what?

December 3, 2006

Pretty much everybody knows the phrase ‘the survival of the fittest’, but it’s pretty shocking how few people actually understand what it means.

It’s that word – fitness. People don’t get it. It’s not intuitive.

Evolutionary fitness is NOT about physical fitness. It’s NOT about being the biggest and the meanest. It’s NOT about being the smartest or the wealthiest. It’s NOT about being an A-type personality who always wins.

It’s about how many of your genes you pass on to the next generation.

There are lots of ways to do this. You can have lots of children by one mate. You can have lots of children by lots of mates.

Or, and this is actually quite important, you can help your relatives to have lots of children.

There’s a phrase ‘inclusive fitness’ which relates to a value called ‘Hamilton’s R value’. R stands for relatedness.

When you have children they each get half of their genes from you. They are therefore 0.5 the same as you. They are given an R-value of 0.5. If your children grow up and have children of their own, your grandchildren will each get 0.5 of the genes of your children. That’s 0.5 of 0.5 of your genes. So your grandchildren have 0.25 of your genes, so an R value of 0.25.

If you have 2 children your inclusive fitness is 2 * 0.5 = 1.

If each of your children then has 2 children then you get an inclusive fitness of 2*0.5 + 4*0.25 = 2.

On the other hand, if you only have one child, but that child has 10 children, your inclusive fitness is 0.5 + 10*0.25 = 3.

Inclusive fitness doesn’t just work ‘downwards’ to your own offspring, but it also works ‘sideways’ to your siblings.

Your brothers and sisters are given an R value of 0.5 (you do the maths: you each got a selection of 0.5 of each parent’s genes, but it’s a random 0.5, so you’re likely to share 0.5 of your sibling’s genes).

If your brother/sister has children, passing on 0.5 of their genes, then each niece/nephew will have 0.5*0.5 = 0.25 of your genes.

So if your brother has 3 children, your inclusive fitness through him is: 0.5 + (3*0.25) = 1.25.
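The sums above all follow the same pattern: count the relatives, multiply each by their R value and add up. Here is that arithmetic as a small sketch; `InclusiveFitness`, `Relatedness` and `Total` are names I’ve made up for the example.

```csharp
public static class InclusiveFitness
{
    // R value after a number of 'halving' steps:
    // 1 step for children and siblings, 2 for grandchildren, nieces and nephews.
    public static double Relatedness(int steps)
    {
        double r = 1.0;
        for (int i = 0; i < steps; i++)
        {
            r *= 0.5;
        }
        return r;
    }

    // Adds up relatives with an R value of 0.5 and relatives with an R value of 0.25.
    public static double Total(int relativesAtHalf, int relativesAtQuarter)
    {
        return relativesAtHalf * Relatedness(1) + relativesAtQuarter * Relatedness(2);
    }
}
```

`InclusiveFitness.Total(2, 4)` gives 2 (two children and four grandchildren), `Total(1, 10)` gives 3, and `Total(1, 3)` gives 1.25 for the brother with three children.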

And now a philosophical point

Everyone on the planet is, in some way, related. Therefore if you spend your time helping anyone raise children (for example, by donating to charity to vaccinate children in the 3rd world) your inclusive fitness goes up a little.

The main point

‘Survival of the fittest’ is not an excuse to act like a selfish b*stard. Quite the opposite. If you help people out in life, you get rewarded by increased evolutionary fitness. Think about it.

 


Written while listening to:

Feeling Good by Muse from the album Origin Of Symmetry

Comfortably Numb by Scissor Sisters from the album Scissor Sisters [UK Bonus Tracks]

Drugs by Simple Kid from the album 1

Broken Internet :(

November 27, 2006

My ADSL router at home is broken 😦  Still, it’s under warranty so I’m waiting for a replacement.

That’s why I’ve been quiet recently.  Not supposed to blog from work (shhh, don’t tell anyone).

Herbie

Evolutionary Modeling

October 26, 2006

I thought I’d write a bit about how evolution is modeled, or at least how I model it.

Most models will start out with a verbal version; a plain English explanation of the mechanism.

Unfortunately, I found that many of the more mathematical models missed this bit out.  It seems that mathematical types understand a lot just by looking at equations.  I’m not a mathematical type (despite being able to write simulations). I need the verbal model.

So I’ll first make two definitions:

  1. A Model is a mathematical function.  Given the same parameters it will always give the same answers.
  2. A Simulation is an actual reproduction of a simplified system. Simulations normally use some random numbers to determine probabilities of what will happen next.  Given the same parameters a simulation may give wildly different answers each time.

So what’s the point of a simulation if it always gives different answers? What you do with a simulation is to run it a number of times. Each run is called a replicate. When you have enough replicates you will analyse the results as a group and look for trends and patterns.

Simulations are often called Monte-Carlo models. This refers to the random number element — it’s supposed to be like gambling at a Monte-Carlo casino.  I’ve never been to Monte-Carlo, or gambled at a casino. This is a stupid name.  Before computers were available some simulations were done by throwing dice for randomness.  They should have called it D&D modeling.

Models are often built on the principles of the Normal Distribution and will deal with changes in mean and standard deviation over time, or in regard to some other variable.

I don’t do models.  Actually, I can’t do models.  Like I said, I’m not very mathematical (I’ve tried to learn calculus three times, and I still find it tough going).

What I understand and am going to write about are simulations.

So what’s the basis for the simulations I write?

Firstly there’s the probability stuff.  You decide on the probability of a particular event happening.  Then you generate a random number between 0 and 1 and if that number is less than your probability, then the event happens.

Pretty simple really.

To simulate a coin toss, you would write:

Random random = new Random();

if (random.NextDouble() < 0.5)
    Console.WriteLine("Heads");
else
    Console.WriteLine("Tails");

You would then do this a few hundred (or even thousand) times and see that the coin comes down Heads half the time.
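Repeating the toss is just a loop with a counter. A minimal sketch, with `CoinToss` and `CountHeads` as made-up names and the random number generator passed in so the same one is used for every toss:

```csharp
using System;

public static class CoinToss
{
    // Tosses the simulated coin 'tosses' times and returns the number of heads.
    public static int CountHeads(int tosses, Random random)
    {
        int heads = 0;
        for (int i = 0; i < tosses; i++)
        {
            if (random.NextDouble() < 0.5)
            {
                heads++;
            }
        }
        return heads;
    }
}
```

`CoinToss.CountHeads(1000, new Random())` will usually land somewhere near 500, and a different number each time you run it. That variation between runs is the whole reason for replicates.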

But we’re not simulating coin tossing, we’re simulating evolution.

Evolution is defined as ‘any net change in the genetic makeup of a population’. So we’re going to need a population and some genetics.

In the last post I showed the three genotypes possible for a gene with two alleles: aa, aA and AA.

So that’s a gene.  Put some of these into a structure and we have an individual.  Make an array of these and we have a population with some genetics.  All we need to evolve the population is some selection pressure and a big, repeating loop.

The flow of a simulation program is often like this:

  1. Initialise population with random genes.
  2. Kill off some of the individuals based on their genetics and a probability-based function.
  3. Allow some of the individuals to breed — the probability of breeding will also be a probability-based function.
  4. Record the genetics of the population
  5. Go to 2

This is a simulation with overlapping generations.  Some simpler simulations will simply allow the old population to reproduce into a new, empty population of the same size and then destroy the parent population.  It depends on the type of animal you are basing your simulations on.
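To make the five steps concrete, here is a bare-bones sketch of the overlapping-generations loop. Everything in it is an assumption made for illustration: individuals are a single locus stored as a count of ‘A’ alleles, the death probabilities (0.2 for ‘aa’, 0.1 for the rest) are invented, every survivor is equally likely to breed, and `SimulationLoop`, `Run` and `PassAllele` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SimulationLoop
{
    // Each individual is stored as its count of 'A' alleles (0, 1 or 2).
    // Returns the frequency of the 'A' allele recorded at the end of each cycle.
    public static double[] Run(int size, int cycles, Random random)
    {
        // 1. Initialise population with random genes.
        List<int> population = new List<int>();
        for (int i = 0; i < size; i++)
        {
            population.Add(random.Next(3));
        }

        double[] history = new double[cycles];
        for (int cycle = 0; cycle < cycles; cycle++)
        {
            // 2. Kill off some individuals. The death probabilities are
            //    made-up values that penalise the 'aa' genotype.
            List<int> survivors = new List<int>();
            foreach (int genotype in population)
            {
                double deathProbability = (genotype == 0) ? 0.2 : 0.1;
                if (random.NextDouble() >= deathProbability)
                {
                    survivors.Add(genotype);
                }
            }
            if (survivors.Count == 0)
            {
                break;   // extinct: the remaining history entries stay at 0
            }

            // 3. Breed: randomly chosen survivors refill the population.
            int parentCount = survivors.Count;
            population = survivors;
            while (population.Count < size)
            {
                int one = population[random.Next(parentCount)];
                int two = population[random.Next(parentCount)];
                population.Add(PassAllele(one, random) + PassAllele(two, random));
            }

            // 4. Record the genetics of the population.
            int aAlleles = 0;
            foreach (int genotype in population)
            {
                aAlleles += genotype;
            }
            history[cycle] = aAlleles / (2.0 * population.Count);

            // 5. Go to 2.
        }
        return history;
    }

    // Picks one of a parent's two alleles at random: 1 for 'A', 0 for 'a'.
    private static int PassAllele(int aCount, Random random)
    {
        return random.Next(2) < aCount ? 1 : 0;
    }
}
```

The survivors stay in the population alongside their offspring, which is what makes the generations overlap. Each call to `Run` is one replicate; to look for trends you would call it many times and compare the recorded histories.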

The probability-based functions that decide who dies and who breeds are the selection pressure in the model.  Selection pressure makes populations evolve as they try to maximise their fitness.

What’s fitness?  I’ll tell you next time, it’s getting late.

 


A bit of Genetics

October 15, 2006

I’m going to take a short break from the normal distribution and talk about the basic mechanisms of genetic inheritance.

If you took biology in high-school, this will all sound familiar.

Genes and Alleles.

The genetics I’m going to talk about are for diploid species (such as ourselves). This basically means that an individual has two copies of each gene. Some species are haploid, with only one copy of each gene, but they’re mainly single-celled organisms and we all know how trashy they can get.

When a gene has alternative versions, they are called ‘alleles’. We are only going to look at the case where a gene has two alleles. I’m keeping it simple, stupid.

One of these alleles is dominant. That means if an individual has even one copy of that allele, then they will show the physical attributes coded by that allele.

The other allele is recessive. In order for an individual to display the characteristics of the recessive allele, both of the copies held by that individual must be the recessive allele.

The classic example of dominant and recessive alleles is eye colour. The allele for blue eyes is recessive. The allele for brown eyes is dominant. To have blue eyes, you need two copies of the blue eye allele. If you have one blue eye allele and one brown eye allele, then you’ll have brown eyes.

We will call the dominant allele ‘A’ and the recessive allele ‘a’. See? The lower case is recessive while the upper case is dominant.

OK, so there are three possible combinations of allele:

  • aa
  • aA
  • AA

Notice that aA and Aa are the same; the sequence does not matter.

Breeding.

Yeah. You’ve been waiting for this bit. I know.

So, we’ve got two potential parents. They breed. Give them some privacy, please. We’ll wait while they finish.

OK, they’re done. That was a bit quick, wasn’t it?

So what genes will the child carry? Basically the child will get one randomly selected gene from each parent.

The following table shows the possible offspring types for each parent combination:

                        Parent 1
                 aa         aA            AA
Parent 2   aa    aa         aa, aA        aA
           aA    aa, aA     aa, aA, AA    aA, AA
           AA    aA         aA, AA        AA

For example, if both parents are ‘aa’ then all offspring will be ‘aa’ because they are the only alleles available.

If one parent is ‘aa’ and the other is ‘aA’, then offspring will either be ‘aa’ or ‘aA’; ‘AA’ offspring are not possible because only one parent has the dominant ‘A’ gene to pass down.

By going through the possibilities, we can come to the probabilities.

If both parents are ‘aa’ then the probability of offspring being ‘aa’ is 100%.

If both parents are ‘aA’ then the offspring probabilities are:

  • aa — 25%
  • aA — 50%
  • AA — 25%

We can now build a probability matrix to determine the probabilities of offspring genotypes from any parental genotypes.

I’ll do that in the next installment and show how this probability matrix can be used in a computer simulation of genetics.

 


Written while listening to:

King Of The Mountain by Kate Bush from the album Aerial

She Sells Sanctuary (Long Version) by The Cult from the album Best of Rare Cult

That Lady/Part 1 & 2 by The Isley Brothers; O’Kelly Isley; Ronald Isley; Rudolph Isley; Marvin Isley; Ernie Isley; Chris Jasper from the album Summer Breeze Greatest Hits

Take Your Mama by Scissor Sisters from the album Scissor Sisters [UK Bonus Tracks]