Basic Genetics Simulation Code

December 20, 2006

OK, time for some code.

First we have to represent a single genetic locus.  We are going to use the basic dominant/recessive allele model which shows three versions (alleles) of a specific locus: aa, aA & AA.  We’ll use an ‘enum’ for this one:

 ///<summary>

 /// Enum defining possible genotypes at a single locus.

 ///</summary>

 public enum locus_enum

 {

     aa=0,

     aA=1,

     AA=2

 }

(Apologies for the duff formatting, I can’t seem to get it right on-line).

Notice the numeric values assigned to each one.  We can read the numeric values as ‘the number of dominant alleles’.  This can come in handy later.

We’ll wrap this up with an object called ‘Locus’ to add some methods to this data.

public class Locus
{
   private locus_enum _locus;

   public Locus(locus_enum l)
    {
        _locus = l;
    }

    public bool Phenotype
    {
        get
        {
            return _locus == locus_enum.aa;
        }
    }

    public int A_Count
    {
        get
        {
            return (int)_locus;
        }
    }

    public override string ToString()
    {
        switch (_locus)
        {
            case locus_enum.aa:
                return "aa";
            case locus_enum.aA:
                return "aA";
            case locus_enum.AA:
                return "AA";
            default:
                return "ERROR!";
        }
    }

    private locus_enum locus
    {
        get
        {
            return _locus;
        }
    }
}

Next, we need to be able to manipulate genes. Specifically, we need to be able to cross them using Mendellian genetics.

Now the simple, brute force method would be to randomly select one allele from each locus and then combine them to create an offspring one.

We’re not going to do that, we’re going to use a probability matrix.

We know all the probabilities of producing a specific type of offspring given two parents, so we just need to generate some random numbers and throw them against the probabilities.

We shall take our example from ‘A Bit Of Genetics’:

If both parents are ‘aA’ then the offspring probabilities are:

  • aa — 25%
  • aA — 50%
  • AA — 25%

So we add these probabilities together to create a ‘line of probability’ that goes from 0.0 to 1.0.  We break the line up into chunks of the relative size of the probabilities like this:

Probability line diagram

So we can generate a random number between 0.0 and 1.0, find it on the line and the result is the value for that chunk of the line.  We can do all of this with a probability matrix that covers all the possible parental and offspring values.  The matrix has three dimensions, so it looks like a cube:

Offspring probability cube.

In C# this looks like:

// Prepare the offspring probability matrix.
p_offspring = new float[, ,] {

     {   {1.0F , 1.0F , 1.0F  },
         {0.5F , 1.0F , 1.0F  },
         {0.0F , 1.0F , 1.0F  }   
     },               
     {   {0.5F , 1.0F , 1.0F  },
         {0.25F, 0.75F, 1.0F  },
         {0.0F , 0.5F , 1.0F  }   
     },               
     {   {0.0F , 1.0F , 1.0F  },
         {0.0F , 0.5F , 1.0F  },
         {0.0F , 0.0F , 1.0F  }   
     }
};

We then need a method that will return an offspring when given two parents:

///<summary>
/// Crosses the two specified loci.
///</summary>
///<param name="o"></param>
///<param name="t"></param>
///<returns></returns>
public static Locus Cross(Locus o, Locus t)
{
    System.Random r = new Random();
    double off = r.NextDouble();

    Locus ret;

    /* check to see of the offspring is aa */
    if ((off <= p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aa]) && (p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aa] != 0))
    {
        ret = new Locus(locus_enum.aa);

        switch (o.locus)
        {
            case locus_enum.aa:
                if (t.A_Count == 2)
                {
                    string message = string.Format("Parent one : {0},tParent two {1}tOffspring : {2}n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!nn" + message);
                }
                break;
            case locus_enum.aA:
                if (t.A_Count == 2)
                {
                    string message = string.Format("Parent one : {0},tParent two {1}tOffspring : {2}n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!nn" + message);
                }
                break;
            case locus_enum.AA:
                {
                    string message = string.Format("Parent one : {0},tParent two {1}tOffspring : {2}n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!nn" + message);
                }
                break;

        }

        return ret;
    }

    /* check to see of the offspring is aA */
    if ((off <= p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aA]) && (p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aA] != 0))
    /*    if (off <= p_offspring[o][t][aA])*/
    {
        ret = new Locus(locus_enum.aA);
        switch (o.locus)
        {
            case locus_enum.aa:
                if (t.A_Count == 0)
                {
                    string message = string.Format("Parent one : {0},tParent two {1}tOffspring : {2}n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!nn" + message);
                }
                break;
            case locus_enum.aA:
                break;

            case locus_enum.AA:
                if (t.A_Count == 2)
                {
                    string message = string.Format("Parent one : {0},tParent two {1}tOffspring : {2}n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!nn" + message);
                }
                break;

        }
        return ret;
    }

    /* check to see of the offspring is AA */
    if (off <= p_offspring[o.A_Count, t.A_Count, (int)locus_enum.AA])
    {
        ret = new Locus(locus_enum.AA);
        switch (o.locus)
        {
            case locus_enum.aa:
                {
                    string message = string.Format("Parent one : {0},tParent two {1}tOffspring : {2}n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!nn" + message);
                }
                break;
            case locus_enum.aA:
                if (t.A_Count == 0)
                {
                    string message = string.Format("Parent one : {0},tParent two {1}tOffspring : {2}n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!nn" + message);
                }
                break;
            case locus_enum.AA:
                if (t.A_Count == 0)
                {
                    string message = string.Format("Parent one : {0},tParent two {1}tOffspring : {2}n", o.locus, t.locus, ret);
                    throw new Exception("Locus produced the WRONG offspring!nn" + message);
                }
                break;

        }
        return ret;
    }

    /* if we got here, there's been an error ! */
    throw new Exception("Probability screw-up in locus_cross()");

    /* return something to keep the compiler happy */
    return new Locus(locus_enum.aa);

}

To make them easily accessible, we’ll embed the probability matrix and the crossing method as static methods of the ‘Locus’ class which they act on.

I have put the code into a ZIP file at the Channel 9 Sandbox.  There is a single Visual Studio 2005 solution with an executable project that does nothing (yet) and a unit test assembly called SimulationTests. If you don’t have Visual Studio 2005, this project will also work in the Visual Studio C# Express version (which is currently free).  The unit tests are written using NUnit which you will have to install as well. So far you can just read the code and run the tests.

I shall be using this project as the basis for all the demonstrations I build up in the future.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

6 Responses to “Basic Genetics Simulation Code”

  1. Josh Says:

    /// /// Crosses the two specified loci./// /// /// /// public static Locus Cross(Locus o, Locus t)

    Using one letter and abbreviations of parameter names is always a bad idea. Not only do I have to figure out what you do with t but I also have to figure out what t is. If you are going to have xml documentation why not use it?

  2. Dr Herbie Says:

    You got me Josh, it was just plain laziness. Although in this case I think it’s pretty obvious that they are the two loci to be crossed.

    (o and t actually stand for ‘one’ and ‘two’ — it’s a bad habit from long ago that I can’t seem to shake).

    The XML comments I put in are created by Ghostdoc (http://www.roland-weigelt.de/ghostdoc/) and I often forget to add the parameter descriptions.

    I promise I’ll try and do better. 🙂

    Herbie


  3. […] Dr Herbie’s Blog Just another nobody. « Basic Genetics Simulation Code […]

  4. someone Says:

    Can you explain the probability matrix? I dont understand the matrix (1.0F,etc..)

  5. Dr Herbie Says:

    OK. I was worried that the explanation of the matrix wasn’t done well enough. Let me think about the best way to explain it and I’ll do a new blog entry specifically about it in the next few days.

    Thanks for reading.

    Herbie


  6. […] Probabilities & Code Revisited ‘Someone’ commented that the probability matrix used in the basic genetics code was not necessarily explained very […]


Leave a reply to From Genes to Individuals : more simulation code « Dr Herbie’s Blog Cancel reply