OK, you’ve been very patient.  All this talk of genetics, drift and populations and so far we haven’t seen any ‘survival of the fittest’.  Well, here goes.

The last version of the overlapping population model just randomly killed individuals with a 0.25 probability. By making this probability depend on the individual’s trait, we link survival to genetics, which is the basis of Darwinian evolution.

So, I’ve slightly modified the mortality function in the ComplexPopulation<T> class to call a virtual method to calculate the probability that a specific individual dies:

        ///<summary> 
        /// Does one season of mortality. 
        ///</summary> 
        ///<remarks>Performs probability-based deletions of individuals in the population.</remarks> 
        public void DoOneSeasonOfMortality() 
        { 
            System.Random rand = new Random();   

            for (int i = 0; i < mPopulationSize; i++) 
            { 
                double chance = rand.NextDouble();   

                if (chance <= ProbabilityOfDeath(population[i])) 
                { 
                    // Push index into stack then delete individual. 
                    stackTop++; 
                    stack[stackTop] = i; 
                    population[i] = null; 
                } 
            } 
        }   

        ///<summary> 
        /// Calculates the probability that this individual dies 
        ///</summary> 
        ///<param name="individual">The individual.</param> 
        ///<returns>A value between 0 and 1</returns> 
        public virtual double ProbabilityOfDeath(IIndividual individual) 
        { 
            // 0.25 probability of death for each individual. 
            return 0.25; 
        }

 

This has no effect on the functionality of this class, but it enables us to create a new population class with a different mortality function:

///<summary> 
/// An overlapping population with trait-dependent mortality. 
///</summary> 
///<typeparam name="T">The <see cref="IIndividual"/>-based type in the population.</typeparam> 
class MortalityPopulation<T> : ComplexPopulation<T> where T : class, IIndividual, new() 
{ 
    public MortalityPopulation(int populationSize, int MaxTraitValue) 
        : base(populationSize, MaxTraitValue) 
    { 
    }   

    ///<summary> 
    /// Calculates mortality as a straight line relationship with trait value. 
    ///</summary> 
    ///<param name="individual">The <see cref="IIndividual"/>.</param> 
    ///<returns>A <see cref="double"/> between 0 and 1 representing the probability of mortality for this individual.</returns> 
    public override double ProbabilityOfDeath(IIndividual individual) 
    { 
        int traitValue = individual.TraitValue;   

        return ((double)traitValue / (double)base.mMaxTraitValue) / (double)10; 
    } 
}

 

All this derived class does is override the virtual method to provide a trait-dependent mortality probability.

The equation we’re using is charted as:

Chart of fitness functions.

We’ll come to the three ‘strength’ factors later – for the moment we’ll use the ‘default’ strength of 10. This means that individuals with a trait value of 20 (all AA alleles) have a 0.1 probability of death, while individuals with a trait value of 0 (all aa alleles) have a probability of death of 0 (ie they live forever!).

So what is the result of running a population through this simulation?

Average trait evolving over 1000 generations.

So there’s a marked drop in the average trait value and a gradual decrease in the standard deviation.  This is because the population’s distribution is being ‘squashed’ up against the edge of possible values.

If we look at the trait as a 3D chart of distribution over time we get:

3D image of trait evolving over time.

That’s a bit difficult to see, so here’s the distribution histograms over time:

Histograms of the trait distribution over time.

So you can see, firstly, that at generation 0 before any evolution has occurred, the trait is normally distributed in the population, but as our selective pressure takes hold (generation 50) the distribution stops being normal.

Imagine that the normal distribution is made of jelly.  Evolution is acting like a hand pushing the jelly to the left: while being pushed, the jelly distorts. This hand is the ‘selective pressure’ of evolution. The distribution is moving due to individuals dying and being born, not due to individuals changing while they live.

In this particular example, there is a wall in the way at trait value 0; there is nowhere left to evolve to.  Pushing the jelly against this wall makes it kind of go ‘splat’ against that wall, but the hand doesn’t stop pushing and it ends up holding the jelly against the wall.

If the wall wasn’t there and the hand stopped pushing, the jelly would return to its normal distribution shape.

From this we can draw a conclusion: measuring the distribution of a physical trait in a population might give us a clue as to whether it is under strong selective pressure. If the normally distributed trait is skewed, that’s a clue that strong selective pressure is forcing the population to evolve.

Returning to our graph of mortality probability by trait value, we can turn the ‘strength’ of the selective pressure up or down.

In our code:

return ((double)traitValue / (double)base.mMaxTraitValue) / (double)10;

 

the value ’10’ is the strength of the selective pressure.  If we make it smaller the selective pressure is stronger; if we make it larger the selective pressure is weaker.
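To put numbers on that, here’s a quick standalone sketch (the class and method names are illustrative, not part of the simulation classes) evaluating the mortality equation for the maximum trait value under each strength setting:

```csharp
using System;

class SelectionStrength
{
    // The same equation as ProbabilityOfDeath, with the strength factor exposed.
    public static double DeathProbability(int trait, int maxTrait, double strength)
    {
        return ((double)trait / maxTrait) / strength;
    }

    static void Main()
    {
        // Probability of death for the highest trait value (20 of 20)
        // under the three strength settings used in the charts:
        Console.WriteLine(DeathProbability(20, 20, 1.0));  // strong pressure: 1.0
        Console.WriteLine(DeathProbability(20, 20, 10.0)); // default: 0.1
        Console.WriteLine(DeathProbability(20, 20, 25.0)); // weak pressure: 0.04
    }
}
```

With strength 1, the fittest-trait individuals are certain to die each season; with strength 25, they only die 4% of the time, so selection takes far longer to bite.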

Setting it to 1 gives the following change in mean trait:

Chart of mean trait over time under strong selective pressure.

As you can see, the change in mean is sharper than before.  This is because more individuals are dying each year and so changes occur faster.

Setting the ‘strength’ factor to 25 gives the following:

Chart of mean trait over time under weak selective pressure.

Which shows the changes taking place more slowly.

So there you have a simple evolutionary model based on ‘survival of the fittest’ Darwinian Evolution.

Next I shall show how Sexual Selection can also cause evolutionary change in place of survival of the fittest.

The code for this simulation can be found over in the Channel 9 sandbox.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

I promised, a while ago now, to show a version of the simulation with overlapping generations. This basically means that not all the individuals in the population die every season, so some remain in the next ‘generation’.

OK, so starting with the simple stuff.

Our original non-overlapping generation, only had one method to run every season:

DoOneSeasonOfReproduction()

 

All this did was to reproduce continually into a new, empty population until the new population became full.

Overlapping populations will need an extra method:

DoOneSeasonOfMortality()

 

The mortality phase is when individuals are killed off to make space in the population for new individuals that are created in the reproduction phase.  We could combine this into one method, but somehow I feel they should be kept separate.

So I mentioned creating spaces in the population.  This is the real secret of this simulation.  When I first wrote this type of simulation, I used linked lists to hold the population so that you could remove nodes from the list easily and then add new individuals easily.  It ran like a dog. I mean it was really, really, really slow. Like, start the simulation off and come back tomorrow to see the results of the first couple of runs.

Writing a simulation is not like writing a normal application.  You don’t care about using too much memory and you do care about speed (you’re probably going to run hundreds of replicates, so you don’t want to have to wait weeks for it to complete).

The secret is to use only arrays.  I used an array in the non-overlapping population version and it was the obvious choice because the population size is always static.  Using arrays in the overlapping version is a little more involved because, like I said, we make spaces in the population.

The first version of the array-only simulation was probably the natural first design that anyone would come up with without spending too much time thinking about it.  When adding new offspring to the population, iterate through the array to find the spaces.  This works for small populations, but when you keep having to iterate through an array of 10,000 indices it gets slow again.

So to speed up the finding of spaces, we’re going to use a simple stack.  The stack is also an array-based implementation for speed.  Essentially, we create an integer array of the same length as the population size (so that if the random numbers dictate it, the entire population can die off which would indicate the end of the simulation).

Every time an individual is killed, its index is pushed onto the top of the ‘spaces stack’.  Whenever an individual is added to the population, its new index is popped from the spaces stack.  When the spaces stack is empty our population is full.
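The push/pop mechanics can be sketched as a tiny array-backed stack of their own (the names here are illustrative, not the exact members of ComplexPopulation&lt;T&gt;):

```csharp
using System;

// Minimal sketch of the array-backed 'spaces stack' described above.
class SpacesStack
{
    private readonly int[] stack;
    private int stackTop = -1; // -1 means empty: no spaces, population is full.

    public SpacesStack(int populationSize)
    {
        // Sized to the population so that, if the random numbers dictate it,
        // every individual could die in a single season.
        stack = new int[populationSize];
    }

    public bool IsEmpty { get { return stackTop < 0; } }

    // An individual at 'index' has died: record the vacant slot.
    public void Push(int index)
    {
        stackTop++;
        stack[stackTop] = index;
    }

    // A new offspring needs a slot: hand back the most recent vacancy.
    public int Pop()
    {
        int index = stack[stackTop];
        stackTop--;
        return index;
    }
}
```

Pushing indices 3 and 7 and then popping returns 7 then 3; both operations are O(1), which is the whole point compared with scanning the population array for nulls.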

Here’s a diagram:

Flow of the space stack

So the space stack speeds up the process of finding a space in the population into which new offspring can be inserted, and the use of fixed size arrays means there’s not much by way of pesky memory allocations to slow things down.

So, let’s implement this:

///<summary>
/// Does one season of mortality.
///</summary>
///<remarks>Performs probability-based deletions of individuals in the population.</remarks>
public void DoOneSeasonOfMortality()
{
    System.Random rand = new Random();

    for (int i = 0; i < mPopulationSize; i++)
    {
        // Find random 
        double chance = rand.NextDouble();

        // 0.25 probability of death for each individual.
        if (chance <= 0.25)
        {
            // Push index into stack then delete individual.
            stackTop++;
            stack[stackTop] = i;
            population[i] = null;
        }
    }
}

 

In this example, we’re just using a fixed probability of 0.25 to determine if an individual dies.  I could probably do without actually setting the individual in the population to null and rely entirely on the stack to hold which individuals were dead, but frankly that idea worries me.

So now we’ve got a population with vacant spaces, we need to fill them back up with reproduction.  Here’s the new reproduction code:

///<summary>
/// Does one season of reproduction.
///</summary>
///<remarks>This updates the Population with one season of reproduction.</remarks>
public void DoOneSeasonOfReproduction()
{
    System.Random rand = new Random();

    // While stackTop >= 0 there are still vacant indices in the population
    while (stackTop >= 0)
    {
        // Find two random parents (non-null entries).  Note that
        // Random.Next's upper bound is exclusive, so we pass
        // mPopulationSize itself so the last index can be chosen.
        int idxParentOne = rand.Next(0, mPopulationSize);
        while (population[idxParentOne] == null)
        {
            idxParentOne = rand.Next(0, mPopulationSize);
        }

        int idxParentTwo = rand.Next(0, mPopulationSize);
        while (population[idxParentTwo] == null)
        {
            idxParentTwo = rand.Next(0, mPopulationSize);
        }

        // Populate the space at the top of the stack and then pop that
        // space from the stack.
        population[stack[stackTop]] = population[idxParentOne].Cross(population[idxParentTwo]) as T;
        stack[stackTop] = -1;
        stackTop--;

    }
}

 

This is pretty much the same as for the non-overlapping population code, except we put the new offspring into the same population as the parents and keep going until the population is full.

The final change that I have made is to create an interface for the population objects, so that we can swap between non-overlapping (simple) populations and overlapping (complex) populations.  Here’s the interface:

interface IPopulation<T> where T : class, IIndividual, new()
{
    void DoOneSeasonOfMortality();
    void DoOneSeasonOfReproduction();
    T[] Population { get; }
    int PopulationSize { get; }
    DescriptiveStatistics.Histogram TakeCensus();
}

 

The simple population has a mortality method that does nothing and only exists to satisfy the interface (not great design, but it will do for now).

This interface means that our main simulation loop looks like this :

System.IO.StreamWriter sw = System.IO.File.CreateText("c:\\Log.txt");

stats = new DescriptiveStatistics.Histogram(0, 500, 10);
WriteHistogramHeader(sw, stats);
int repCount = 10;

IPopulation<Individual> population = new ComplexPopulation<Individual>(500, 20);

DescriptiveStatistics.Histogram histo = population.TakeCensus();
int generationCount = 0;
while ((histo.Statistics.StdDev > 0) || (generationCount == 0))
{
    generationCount++;
    population.DoOneSeasonOfMortality();
    population.DoOneSeasonOfReproduction();
    histo = population.TakeCensus();

    Console.WriteLine("{0}\t{1}\t{2}", generationCount, histo.Statistics.Average, histo.Statistics.StdDev);
    sw.Write("{0}\t{1}\t{2}", generationCount, histo.Statistics.Average, histo.Statistics.StdDev);
    WriteHistogram(sw, histo);
    sw.Flush();

}

sw.Close();
sw.Dispose();
}

 

We can easily switch population types by replacing the line

IPopulation<Individual> population = new ComplexPopulation<Individual>(500, 20);

with

IPopulation<Individual> population = new SimplePopulation<Individual>(500, 20);

to change the population type that we are simulating.

I have put all of the code for this post over on the Channel 9 sandbox so you can download and have a play.


Replicates

May 25, 2007

So far, all we’ve done is look at individual runs of the simulation. While this is all well and good to get a feel for evolving populations, we sometimes need to ask wider questions. For example, in my past post on Genetic Drift we stated that Drift has less of an effect in larger populations just by comparing two simulations. If we want to know how population size affects the strength of drift, we need to get a little more scientific and create replicates.

Replicates are multiple answers to the same question.

Why?

In our simulations, we use random numbers in order to simulate effects like Genetic Drift. The downside of this is that these random numbers introduce a certain level of ‘noise’ into the answer. By running the simulation multiple times we can take an average and get a clearer picture.

You can think of it in terms of a ‘population of results’; in order to understand the whole population, we need to take several samples. I’ve taken our simulation from the last entry (which finds the number of generations until Genetic Drift stops changing the population) and run it multiple times, taking the answers from each run and putting them into our histogram object. The output looks like this:

Now, I’m prepared to say that this looks like a normal distribution. It’s complicated by the fact that it’s close to zero on the X-Axis, so it looks skewed.

So the replicates in a simulation are effectively samples taken from the theoretical population of answers, which is normally distributed around a mean value. This is why we run a lot of replicates and then take the mean and standard deviation as our answer.
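As a sketch of the idea (with a made-up ‘noisy answer’ standing in for a full simulation run, and a hypothetical true value of 65), averaging many replicates recovers the underlying value much more reliably than any single run:

```csharp
using System;

class ReplicateDemo
{
    // Runs 'replicates' noisy measurements of a 'true' answer of 65 and
    // returns { mean, standard deviation }. In the real simulation each
    // replicate would be a full run returning generations-to-fixation.
    public static double[] Summarise(int replicates, int seed)
    {
        Random rand = new Random(seed); // fixed seed so the sketch is repeatable
        double sum = 0.0, sumSq = 0.0;

        for (int i = 0; i < replicates; i++)
        {
            // One replicate: the true answer plus uniform noise in [-20, +20].
            double answer = 65.0 + (rand.NextDouble() - 0.5) * 40.0;
            sum += answer;
            sumSq += answer * answer;
        }

        double mean = sum / replicates;
        double stdDev = Math.Sqrt(sumSq / replicates - mean * mean);
        return new double[] { mean, stdDev };
    }

    static void Main()
    {
        // Single replicates can land anywhere from 45 to 85, but the mean
        // of 400 replicates settles close to the true answer of 65.
        double[] few = Summarise(5, 1);
        double[] many = Summarise(400, 1);
        Console.WriteLine("5 replicates:   mean = {0:F1}", few[0]);
        Console.WriteLine("400 replicates: mean = {0:F1}, stddev = {1:F1}", many[0], many[1]);
    }
}
```

The standard deviation tells you how noisy individual replicates are; the mean is the answer we actually report.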

How Many Samples?

Now there’s the question. How do we know when we’ve got the right number of replicates? The only way to find out for sure is to look at the results of a given simulation for different numbers of replicates. Here’s a chart of the means from another set of simulations against the number of replicates run:

As you can see, once there are more than about 230 replicates the mean answer seems to settle at a value between 60 and 70.

There’s a trade-off here; more replicates are more accurate, but take more time to produce. Now, I’m a careful type of guy so I think I’ll use 400 replicates so that there’s a buffer against highly randomised data. I’ll use this number of replicates for all future simulations.

I also collected the frequency histograms for all these simulation runs, which I have plotted here in 3D to show how the distributions change with more replicates:

You can see that with a small number of replicates (towards the back of the 3D area) there is no real peak to the distributions, while for larger numbers of replicates (towards the front of the 3D area) there is a definite peak.

I won’t be going into any real statistical analysis of results for the simulations that appear here; I don’t believe they’re really necessary.

 


Genetic Drift

April 30, 2007

If you’ve managed to follow all the stuff so far about probability matrices and random numbers, you’ll want to know why we do simulations this way.

The answer is to represent genetic drift.

Genetic Drift is an evolutionary mechanism that doesn’t get as much press coverage as selection (survival of the fittest).

The reason Genetic Drift doesn’t get as much attention is that it’s a weak force.  It’s also random. However it’s still important.

To explain Genetic Drift I’m going to start with a ‘thought experiment’. So you’ll have to use your imagination.

Imagine you have a large bowl.  Into that bowl you tip 500 red marbles and 500 blue marbles.

Now put on a blindfold and mix up the marbles in the bowl with your hands, being careful not to spill any. Next, keeping the blindfold on, randomly take 100 marbles out of the bowl.  Don’t drop any, and don’t lose count.

Before you take the blindfold off, guess how many red marbles you took out of the bowl.  If you answered “50 red marbles” you might not be correct.

50 red marbles is the predicted number of red marbles you would expect;  half the marbles in the bowl (the total marble population) were red, so you’d expect a random selection of marbles (the sample) to also be half made up of red marbles.

But it’s not that simple.  Because you couldn’t see the marbles you were choosing, the marbles were chosen randomly.  Because the marbles were chosen randomly there’s a chance that the selected marbles (the sample) was not 50/50.  You might have actually selected 51 red marbles and 49 blue, or 43 red marbles and 57 blue.  There’s even a very tiny chance that you could have selected 100 blue marbles.

Now take the 100 marbles you chose and replace each one with ten identical marbles, so you have 1000 marbles again.  This is your next generation.  Repeat the blindfold/bowl/random choice bit again.  Now how many red marbles did you select? It’s still probably not 50.  It might be even further away from 50 than last time.  Keep doing this a few hundred times and you could find that the number of red marbles in the bowl changes over time.  It drifts in a random fashion.
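The whole thought experiment fits in a few lines of code (a sketch for illustration, not part of the simulation itself):

```csharp
using System;

class MarbleDrift
{
    // One blindfolded draw: sample 'sampleSize' marbles without replacement
    // from a bowl of 'total' marbles, 'red' of which are red.
    public static int DrawReds(int red, int total, int sampleSize, Random rand)
    {
        int redDrawn = 0;
        for (int i = 0; i < sampleSize; i++)
        {
            // Probability the next marble is red = remaining red / remaining total.
            if (rand.Next(total) < red)
            {
                redDrawn++;
                red--;
            }
            total--;
        }
        return redDrawn;
    }

    static void Main()
    {
        Random rand = new Random();
        int redCount = 500; // start with 500 red and 500 blue

        for (int generation = 0; generation < 200; generation++)
        {
            // Draw 100, then 'multiply by ten' to rebuild a 1000-marble bowl.
            redCount = DrawReds(redCount, 1000, 100, rand) * 10;

            // Once a colour is gone it can never come back: fixation.
            if (redCount == 0 || redCount == 1000) break;
        }

        Console.WriteLine("Red marbles after drifting: " + redCount);
    }
}
```

Run it a few times and you’ll see the red count wander away from 500 with no selective pressure at all, sometimes all the way to 0 or 1000.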

That’s what genetic drift is in an evolving population.

Because there’s a random element in which individuals in a population get to mate together, there’s a random element in which genes get passed down to the next generation, which means there’s a random element to the genetic makeup of the population.  And as you should know if you read my first post, a change in the genetic makeup of a population is called evolution.

(EDIT:  I’ve just Googled and found other people are using marbles to explain genetic drift.  Dammit!  I thought I was being original!)

So, Genetic Drift (now I’ve defined what it is I’d better start capitalising the words) is evolution, but it is not Darwinian evolution.

What’s that? Not Darwinian evolution?  Nope.  Darwin didn’t know about genetics, so he could never have thought of Genetic Drift.  This is why current evolutionary science is described as Neo-Darwinian. So when idiots start spouting how Darwinian theory has holes in it, you could answer straight back with “that’s why we study neo-Darwinian theory!”; but that will probably make you sound like an idiot too, so I generally just keep quiet and ignore the lunatic fringe.

So now we have an idea of what Genetic Drift is, let’s look at an example.  I finished my previous post with a graph.  I’ll put another copy here:

Chart of randomly changing population genetics.

First thing to remember:  the Y-axis represents the average trait value. Possible trait values range from 0 to 20.  The X-axis represents time and goes for about 200 generations.

If you look at the very first (left-most) point on the graph, you’ll see that it’s close to a value of 10.  This is roughly what we would expect the average value to be when the population is randomly generated, although due to the randomness of the initialisation you can never be certain.

In another previous post, I showed a chart of the genetic distribution within a randomly generated population. Here it is again:

Distribution of trait within a population.

This is a normal distribution, so keep in mind that the top chart, showing average trait is just the tip of the iceberg; each point is just the top of a normal distribution lurking 95% under the surface. When you look at the trait over time graph, think of it as being like this:

OK, so now how do we interpret the chart?  Well, it’s drifting around.  There’s no real direction to the changes except at the end where it seems to stop at a value of 8.

Why?

It’s due to the phenomenon of ‘fixation’.  Because the alleles in the individuals don’t ‘jump around’ in the gene (our array of Locus instances in the code) it’s possible to reach a state where every individual in the population has the same allele at the same location on the gene.  When this happens, there can be no more change at that locus. My guess is that when the population reaches an unchanging value of 8, 4 loci in the gene are fixed for the ‘AA’ allele (each contributing 2 to the trait, making 8) while the rest are fixed for the ‘aa’ allele.
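A stripped-down cross makes fixation easy to see (this Cross is a hypothetical stand-in for the simulation’s Locus.Cross, not the real code):

```csharp
using System;

class FixationDemo
{
    // Simplified stand-in for Locus.Cross: each parent passes one of its
    // two alleles at random (Mendelian inheritance), with no mutation.
    public static string Cross(string parentOne, string parentTwo, Random rand)
    {
        return "" + parentOne[rand.Next(2)] + parentTwo[rand.Next(2)];
    }

    static void Main()
    {
        Random rand = new Random();

        // A heterozygous cross can produce aa, aA, Aa or AA...
        Console.WriteLine("Aa x Aa gives: " + Cross("Aa", "Aa", rand));

        // ...but once a locus is fixed for 'aa' in every individual,
        // crossing can only ever yield 'aa' again. No mutation, no escape.
        for (int i = 0; i < 1000; i++)
        {
            if (Cross("aa", "aa", rand) != "aa")
            {
                throw new Exception("Fixation broken!"); // never happens
            }
        }
        Console.WriteLine("1000 crosses of aa x aa: every offspring is aa.");
    }
}
```

Without mutation, a fixed locus is a dead end: no amount of further breeding can reintroduce the lost allele.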

Let’s get more data.  I’ll run another simulation, but this time I’ll also record the standard deviation as well as the mean for the trait…

Chart of Average Trait Value over time

As we can see from this chart, the standard deviation decreases slowly until there is none left, at which point the average trait value stops changing at a value of 6.

(Nice chart, huh?  Just got Office 2007 installed 🙂 )

Fixation can be an important mechanism in smaller populations, but when population size increases it becomes less likely.  The last chart is from a population of 1000; this next one is from a population of 10,000:

Drift in a large population.

As you can see, in this larger population (actually running the simulation for twice as long, too) there is little or no reduction in standard deviation, although you can still see it drifting around.

Remember when I said drift was a weak force?  That doesn’t mean it can’t have a large effect.  The following chart came about by chance as I was preparing to generate the last chart:

This was in a large population (10000) and run over 400 generations.  As you can see, the trait value drops to almost zero.  This means that the ‘A’ allele has almost vanished from the population. Once an allele is removed from the population, that allele is extinct.  This can happen purely through genetic drift.

Summary

Genetic Drift is an effect of the randomness of mating.  Although it is a weak force compared to selective pressure, it can have a large effect on the evolution of a population.  Drift is a stronger force in smaller populations.

 


OK, so we’ve got code for genes and code for individuals.  If we want to simulate evolution, we’re going to need a population.

Static Population Size

The first point to make is that we’re going to use a static population size. In reality populations are not necessarily static; populations grow and shrink.  However, in general populations follow a sigmoidal growth curve similar to this (idealised) one:

Population Growth Curve

Now, if we look at the right-hand end of the graph we can see that the population is a constant size. So in our model we’re assuming that our population is mature and constant. This model doesn’t fit all population types – some just don’t follow it – but the majority of animal populations do.

Plus it’s much, much easier to write.
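For reference, the sigmoidal curve above is usually the logistic growth model, where the population N grows by r * N * (1 - N/K) each season and levels off at the carrying capacity K. A quick sketch (with made-up values for r and K) shows the flat right-hand end that our static-size model assumes:

```csharp
using System;

class LogisticGrowth
{
    // Discrete logistic growth: each season the population grows by
    // r * N * (1 - N/K), flattening out at the carrying capacity K.
    public static double Grow(double n, double r, double k, int seasons)
    {
        for (int t = 0; t < seasons; t++)
        {
            n += r * n * (1.0 - n / k);
        }
        return n;
    }

    static void Main()
    {
        // Illustrative values: growth rate 0.5, carrying capacity 1000.
        Console.WriteLine("After 10 seasons: {0:F0}", Grow(10.0, 0.5, 1000.0, 10));
        Console.WriteLine("After 50 seasons: {0:F0}", Grow(10.0, 0.5, 1000.0, 50));
    }
}
```

After enough seasons the population just sits at K, which is exactly the ‘mature and constant’ state our simulations start from.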

Generational Overlap

Populations in simulations can be split into two distinct categories: non-overlapping generations and overlapping generations.

Non-overlapping generations are the simplest model, so we’ll tackle those first.  In a non-overlapping generation model, each generation dies at the end of the ‘year’ to be replaced by a completely new population. In programming terms, we create a second list of individuals of the same length as the original and let individuals from the ‘parent’ population reproduce, placing their offspring into the ‘offspring’ population until the offspring population is full.  We then delete the parent population and replace it with the new one.

Overlapping populations are a little different; individuals in the population can live for more than one year. When individuals do die, they leave ‘gaps’ in the population that can be filled by offspring. In programming terms we perform a mortality operation that kills off some of the individuals (usually in a way related to a measure of their evolutionary fitness), and then fill the empty spaces in the population with offspring.  This is more complex than a non-overlapping model as we need to keep track of the ‘gaps’ that mortality opens up in the population as well as perform fitness-dependent mortality.

Writing Generic Population Code

I’ve decided to write population code for both overlapping and non-overlapping populations using generics in .NET 2.  We can then create a basic ‘individual’ class that we can modify for our own purposes depending on what we want to do with our simulation, while re-using the same population code over and over again.

Firstly we have to deal with the ‘individual’ class that we’ve already created.

Improving the Individual : Interactions

So far our individuals are isolated from one another. In a true population they interact to some degree; at the very least they mate. We have to make a design decision here:  the basic population code will remain the same for almost all simulations, but the code for the individuals in the population will change a lot.  What I’m going to do is use an interface for the individuals, and use generics for the population.

We only currently need two things from an individual: a value for the genetic trait that the individual carries and the ability to cross with another individual. We’ll add more to this as we build up the simulation later.

public interface IIndividual
{
    int TraitValue { get; }
    IIndividual Cross(IIndividual mate);
}

In our code we just need to adapt our Individual class.

Firstly, deriving from the IIndividual interface:

public class Individual : IIndividual

Then, getting the trait value (in this case, just passing back the ‘Height’ property):

 ///<summary>
/// Gets the trait value.
///</summary>
///<value>The trait value.</value>
public int TraitValue
{
    get { return this.Height; }
}

Finally, we need to add code to allow us to cross two individuals:

///<summary>
/// Crosses with the specified mate.
///</summary>
///<param name="mate">The mate.</param>
///<returns>A new <see cref="IIndividual"/> as offspring.</returns>
public IIndividual Cross(IIndividual mate)
{
    // Cast to specific type, error if not possible.
    Individual other = mate as Individual;
    if (other == null)
    {
        throw new System.ArgumentException("mate is not of same type.", "mate");
    }

    // Cross each locus of the gene
    Locus[] offspringGene = new Locus[arraySize];

    for (int i = 0; i < arraySize; i++)
    {
        offspringGene[i] = Locus.Cross(this.heightLoci[i], other.heightLoci[i]);
    }

    // Create new offspring with new genotype.
    return new Individual(offspringGene);
}

First things first.  We cast the IIndividual passed in to our specific Individual type.  If this fails, we throw an exception; the types must be the same.

Next we create a new array of Locus objects.  These will be given to the offspring.

We loop through all the parental Loci, crossing each one and putting the result into the offspring Locus array.

We create a new Individual passing it the new Locus array.  We return the new Individual as the result (it will get cast back to an IIndividual automatically).

Look at how the Locus array gets crossed: the first locus in parent 1 gets crossed with the first locus in parent 2 and the result becomes the first locus in the offspring.  This ordering is important.  Loci do not ‘jump about’ in the gene array, they stay at their original index.  We shall later add code to shuffle things around (called ‘crossing over’, a part of genetic recombination).

Notice also that there is no randomisation on crossover.  Only our probability rules are used to create locus types, they are at no point randomly assigned.  We shall later add some randomness to create ‘point mutations’ within the population.

A Non-overlapping Population

So now that we’ve got the individual interface sorted, let’s look at using it in a population.

First, the class declaration:

///<summary>
/// A Simple, non-overlapping population.
///</summary>
///<typeparam name="T">The <see cref="IIndividual"/> based class.</typeparam>
class SimplePopulation<T> where T : class, IIndividual, new()

The class SimplePopulation uses generics. This will allow us to use the same population class with different versions of the individual class without having to change the population code.

The SimplePopulation class has some constraints on the generic type.  The type ‘T’ must be a class (not a value type or struct), it must implement the IIndividual interface and it must have a default constructor (with no parameters).

The SimplePopulation has only a few internal data members:

// Private holder for the population size.
private int mPopulationSize = 100;
// Maximum possible value for trait (required for census histogram)
private int mMaxTraitValue = 20;

// The actual population.
private T[] population;

and the constructor  simply initialises them:

///<summary>
/// Initializes a new instance of the <see cref="T:SimplePopulation&lt;T&gt;"/> class.
///</summary>
///<param name="populationSize">Size of the population.</param>
///<param name="MaxTraitValue">Maximum possible trait value (required for the census histogram).</param>
public SimplePopulation(int populationSize, int MaxTraitValue)
{
    mPopulationSize = populationSize;
    mMaxTraitValue = MaxTraitValue;

    population = new T[mPopulationSize];

    for (int i = 0; i < mPopulationSize; i++)
    {
        population[i] = new T();
    }
}

Notice the line that uses new T(); this is allowed due to the new() constraint on the SimplePopulation generic type mentioned above.

The next feature of a population is that it must be able to allow breeding.  We will do this through a single season with a method:

///<summary>
/// Does one season of reproduction.
///</summary>
///<remarks>This updates the Population with one season of reproduction.</remarks>
public void DoOneSeasonOfReproduction()
{
    T[] newGeneration = new T[mPopulationSize];

    System.Random rand = new Random();

    for (int i = 0; i < mPopulationSize; i++)
    {
        // Random.Next's upper bound is exclusive, so pass mPopulationSize
        // itself so the last index can be chosen as a parent.
        int idxParentOne = rand.Next(0, mPopulationSize);
        int idxParentTwo = rand.Next(0, mPopulationSize);

        newGeneration[i] = population[idxParentOne].Cross(population[idxParentTwo]) as T;

    }

    population = newGeneration;
}

Here we simply create a new array of individuals and fill it by randomly selecting parents from the original population and mating them to create a single offspring.  We’re just doing random mating for now.  We’ll add trait-based selection later.

Once the new population is full, it replaces the old one.

Simple really.

I have also added a method to take a census of the population as a snapshot.  I won’t go over the code here (it’s in the download) as it’s just housekeeping.

So how to use our simpl population?

static void Main(string[] args)
{
    SimplePopulation<Individual> SimplePop = new SimplePopulation<Individual>(1000, 20);

    System.IO.StreamWriter sw = System.IO.File.CreateText("Log.txt");

    for (int i = 0; i < 999; i++)
    {
        DescriptiveStatistics.Histogram histo = SimplePop.TakeCensus();

        if ((i % 10) == 0)
        {
            Console.WriteLine("{0}   {1}", i, histo.Statistics.Average);
        }

        sw.WriteLine(histo.Statistics.Average);

        SimplePop.DoOneSeasonOfReproduction();
    }

    sw.Close();
    sw.Dispose();

}

We create our population using our Individual class as the template type, passing the population size (1000) and the maximum trait value (20).  The maximum trait value is twice the number of trait loci in the Individual class; there are 10 loci, each contributing at most 2 to the trait, therefore the maximum trait is 20.

Then we simply loop through the population once for each generation (I chose 999 generations pretty arbitrarily). We start each generation with a census which I write to the console and to a log file.  Then we perform one season of reproduction.

Not very complicated, I’m sure you’ll agree.

Running the simulation does not look very exciting.  Numbers appear on the screen and get saved to a text file. Woopee, I hear you cry sarcastically.

I’ve taken the output of running the program and plotted it as a chart to get this:

So what are you looking at?  This is an evolving population.  It’s not Darwinian Selection, it’s  Genetic Drift. I’ll discuss Genetic Drift in the next post and we’ll explore it a little with the code we’ve written so far.

Summary

So what have we got? We have a population of individuals that reproduce randomly.  Genes are inherited by offspring using probability-based Mendelian genetics. When run over a period of time, we get genetic drift.

The code for this entry can be downloaded from Channel9’s sandbox.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

‘Someone’ commented that the probability matrix used in the basic genetics code was not necessarily explained very well.  I was worried about this because, for some reason, I find it hard to explain. So this post will cover it in more detail.

If you haven’t already been there, I suggest you go over ‘A Bit of Genetics’ to get the background, although I’ll actually go over most of it again here.

I’ve noticed that a lot of people get to this blog through Internet searches for ‘eye colour allele’ (WordPress tells me these things; they’re watching you!) So I’ll describe the genetics in this context. It should be noted that, in reality, eye colour is much more complex than the blue/brown eye example, so the outcomes are not really what I describe here; this is a simplified explanation!

So, we have a single genetic locus which contains two copies of the eye colour gene.  The ‘a’ allele does nothing to the colour of the eye, while the ‘A’ allele creates brown pigment proteins that cause the iris of the eye to become brown.

There are three possible combinations of this pair of genes:

aa
aA
AA

Because the ‘a’ allele is effectively ’empty’ and does nothing, while the ‘A’ allele is active, individuals will have brown eyes if they carry even a single copy of the ‘A’ gene. We call the ‘A’ gene dominant because it overrides the ‘a’ gene.  The ‘a’ gene is recessive.

Individuals that have ‘aa’ at the locus are referred to as ‘homozygous recessive‘.  These individuals will have blue eyes.

Homozygous comes from the two words (Ancient Greek, I think); ‘homo’ meaning ‘the same’ and ‘zygous’ referring to an egg.  This is because an ‘aa’  parent can only give ‘a’ genes to its offspring.

Individuals that have ‘AA’ at the locus are ‘homozygous dominant‘. These individuals will have brown eyes. An ‘AA’ parent can only pass on ‘A’ genes to its offspring.

Individuals that have ‘aA’ at the locus are heterozygous.

Heterozygous comes from the two words; ‘hetero’ meaning ‘different’ and ‘zygous’ referring to an egg. This is because an ‘aA’ parent can pass on different genes to its offspring.

When two individuals mate and produce offspring, the offspring inherits one copy of the eye gene from each parent. This determines the genotype (and the eye colour) of the offspring. The specific copy of the gene they inherit is random.  It’s a 50/50 chance that it’s either one.  For homozygous parents this makes no difference, but for heterozygous parents it means that there’s an equal chance for each of the genes to be passed on for each offspring.
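That 50/50 draw can be sketched directly in code before we get to the probability matrices below. This is a brute-force version and all names here (BruteForceCross, DrawAllele) are mine, purely for illustration; parents are written as two-character strings like "aA":

```csharp
using System;

public static class BruteForceCross
{
    // Shared random source for the allele draws.
    private static readonly Random rng = new Random();

    // Each parent passes on one of its two alleles with equal probability.
    public static char DrawAllele(string parent)
    {
        return parent[rng.Next(2)];
    }

    // Combine one allele from each parent into an offspring genotype,
    // normalising "Aa" to "aA" so the heterozygote always reads one way.
    public static string Cross(string parentOne, string parentTwo)
    {
        char first = DrawAllele(parentOne);
        char second = DrawAllele(parentTwo);
        if (first == 'A' && second == 'a')
        {
            return "aA";
        }
        return new string(new char[] { first, second });
    }
}
```

Cross("aA", "aA") called many times produces roughly 25% ‘aa’, 50% ‘aA’ and 25% ‘AA’, which is exactly the distribution the outcome matrices below capture.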

Typically, to show the possible offspring that can be produced by mating two individuals, a simple outcome matrix is used.  This takes the form:

Inheritance matrix explanation.

Parent A has alleles A1 and A2, while parent B has alleles B1 and B2.  There are four possible combinations of offspring: A1B1, A2B1, A1B2, A2B2.

Let’s look at the matrices for each of the six possible mating combinations.

1. An ‘aa’ parent mates with another ‘aa’ parent.

Here’s the matrix:

Inheritance matrix for aa by aa.

 You can see from this that all the offspring will be ‘aa’. Therefore if both parents have blue eyes, the offspring will have blue eyes. (Remember, reality is more complex so if you and your spouse have blue eyes don’t assume the worst if your children don’t!)

The probability list for this mating is therefore:

  • aa 1
  • aA 0
  • AA 0

How do we get this?  Well, there are four possible outcomes from the matrix and they are all ‘aa’.  Therefore the probability of an ‘aa’ offspring is 4/4 = 1.

NOTE: Probabilities are measured from 0 to 1.  So a 50% chance is written as a probability of 0.5. This makes the arithmetic for statistical calculations easier. I’m not going to explain probabilities at the moment, just accept that I write them as values between 0 and 1 and not as percentages from 0 to 100.

2. An ‘AA’ parent mates with another ‘AA’ parent.

Here’s the matrix:

Inheritance matrix for AA x AA

You can see from this that all the offspring will be ‘AA’; all of them will have brown eyes.

The probability list for this mating is therefore:

  • aa 0
  • aA 0
  • AA 1

3. An ‘aA’ parent mates with another ‘aA’ parent.

Here’s the matrix:

Inheritance matrix for aA x aA

This shows that all offspring types are possible! Some offspring will have blue eyes, but most will have brown eyes.

The probability matrix for this mating is:

  • aa 0.25
  • aA 0.5
  • AA 0.25

Where do these come from?  There are four offspring types possible.

One of these types is ‘aa’.  Therefore its probability is 1/4, which is 0.25.

One of these offspring types is ‘AA’.  Therefore its probability is 1/4, which is 0.25.

The last two offspring types are actually the same; ‘aA’ is the same as ‘Aa’.  Therefore the probability is 2/4, which is 0.5.

4. An ‘AA’ parent mates with an ‘aA’ parent.

Here’s the matrix:

Inheritance matrix for AA x aA

In this case there are two possible outcomes: two ‘aA’ and two ‘AA’. All of the offspring will have brown eyes.

The probabilities for this mating are:

  • aa 0
  • aA 0.5
  • AA 0.5

5. An ‘aa’ Parent mates with an ‘aA’ parent.

Here’s the matrix:

Inheritance matrix for aa x aA

In this case there are two possible outcomes; ‘aa’ offspring with blue eyes and ‘aA’ offspring with brown eyes. The probabilities are:

  • aa 0.5
  • aA 0.5
  • AA 0

6. An ‘aa’ parent mates with an ‘AA’ parent.

Inheritance matrix for aa x AA

 In this case there is only one outcome; all offspring will be ‘aA’ and will have brown eyes. The probabilities are:

  • aa 0
  • aA 1.0
  • AA 0
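All six probability lists above come from the same counting argument, and we can check that argument mechanically by enumerating the four allele pairings and tallying the genotypes. This is an illustrative sketch, not code from the project; the class and method names are mine:

```csharp
using System;

// Enumerate the four equally likely allele pairings of a cross and count
// how many produce the requested genotype; the probability is that count
// divided by four. Parents are two-character strings like "aA".
public static class PunnettCounter
{
    public static double Probability(string parentOne, string parentTwo, string genotype)
    {
        int matches = 0;
        foreach (char a in parentOne)
        {
            foreach (char b in parentTwo)
            {
                // Normalise "Aa" to "aA" so the two heterozygote
                // orderings count as one offspring type.
                string offspring = (a == 'A' && b == 'a') ? "aA" : new string(new char[] { a, b });
                if (offspring == genotype) matches++;
            }
        }
        return matches / 4.0;
    }
}
```

For example, Probability("aA", "aA", "aA") counts the ‘aA’ and ‘Aa’ cells of matrix number three and returns 2/4 = 0.5, matching the list above.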

The Probability Line.

OK, so that’s all the theoretical outcomes.  How do we add these to a simulation?  We use the probabilities in an additive fashion to make a probability line. For example, the probability line for an ‘aA’ parent mating with another ‘aA’ parent (cross number three in the above list) would look like this:

Probability Line

So the ‘aa’ offspring outcome has 25% of the line, the ‘aA’ offspring outcome has 50% of the line and the ‘AA’ offspring outcome has 25% of the line.  The line spans from 0.0 to 1.0.  We can put this into an array of floating point numbers as:

float[] probabilities = new float[] { 0.25F, 0.75F, 1.0F };

Now if we were to generate a random number between 0 and 1 and test against the values in the probability line, we would get an outcome:

System.Random r = new Random();
double off = r.NextDouble();
Locus ret;

if (off <= probabilities[0])
    ret = locus_enum.aa;
else if (off <= probabilities[1])
    ret = locus_enum.aA;
else if (off <= probabilities[2])
    ret = locus_enum.AA;

If we ran this code 1000000 times and recorded the results, we would find that about 25% of the returned loci were ‘aa’ (blue eyes), about 50% were ‘aA’ (brown eyes, but carrying a blue eye gene) and about 25% were ‘AA’ (brown eyes).

Note that I said ABOUT 25%. Because we’re using random numbers there’s always a chance that the exact outcome (from probability arithmetic) will not happen.  I shall explain how this happens in the real world later when I discuss genetic drift in another post.
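Here is a sketch of that tallying experiment: sample the cumulative probability line a large number of times and count which chunk each random number lands in. The class and method names are mine, not from the project source:

```csharp
using System;

// Sample a cumulative probability line many times and tally which chunk
// each random number falls into. The line { 0.25, 0.75, 1.0 } is the
// aA x aA cross described above.
public static class LineSampler
{
    private static readonly Random r = new Random();

    public static int[] Tally(double[] line, int samples)
    {
        int[] counts = new int[line.Length];
        for (int i = 0; i < samples; i++)
        {
            double off = r.NextDouble();
            for (int j = 0; j < line.Length; j++)
            {
                // The first boundary at or above 'off' names the outcome.
                if (off <= line[j])
                {
                    counts[j]++;
                    break;
                }
            }
        }
        return counts;
    }
}
```

Calling Tally(new double[] { 0.25, 0.75, 1.0 }, 1000000) gives counts close to, but rarely exactly, 250000 / 500000 / 250000, which is the ‘ABOUT 25%’ point in action.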

This is the basis for a simulation.

All that remains is to store the probability lines for all of the possible parental crosses (nine ordered combinations covering the six distinct matings).  We can store them in a 3x3x3 matrix, which we’ll call the probability matrix:

Probability cube

 In C# this comes out as:

// Prepare the offspring probability matrix.
p_offspring = new float[,,] {
                {   {1.0F , 1.0F , 1.0F  },
                    {0.5F , 1.0F , 1.0F  },
                    {0.0F , 1.0F , 1.0F  }   
                },               
                {   {0.5F , 1.0F , 1.0F  },
                    {0.25F, 0.75F, 1.0F  },
                    {0.0F , 0.5F , 1.0F  }   
                },               
                {   {0.0F , 1.0F , 1.0F  },
                    {0.0F , 0.5F , 1.0F  },
                    {0.0F , 0.0F , 1.0F  }   
                }
                };

The ‘F’ suffix just marks the literal as a ‘float’ value. Don’t worry about it.

Now, we use a little sleight-of-hand here; if you read the ‘aa’ genotype as a 0, ‘aA’ as a 1 and ‘AA’ as a 2, you can use the genotypes of the parents as indexes into the 3-dimensional array.  This correlates to the code we have on the locus object that returns the property named ‘A_Count’.

The indexes of the probability array then become

  • p_offspring[A_Count of parent 1, A_Count of parent 2, 0] for the probability of an ‘aa’ offspring.
  • p_offspring[A_Count of parent 1, A_Count of parent 2, 1] for the probability of an ‘aA’ offspring.
  • p_offspring[A_Count of parent 1, A_Count of parent 2, 2] for the probability of an ‘AA’ offspring.

So we can code as:

System.Random r = new Random();
double off = r.NextDouble();
Locus ret;

if (off <= p_offspring[parentOne.A_Count,parentTwo.A_Count,0])
    ret = locus_enum.aa;
else if (off <= p_offspring[parentOne.A_Count,parentTwo.A_Count,1])
    ret = locus_enum.aA;
else if (off <= p_offspring[parentOne.A_Count,parentTwo.A_Count,2])
    ret = locus_enum.AA;

Where parentOne and parentTwo are the two parents.

This, with a little extra error checking and statistical data collection, is the content of the Locus.Cross method in the heart of our code.

Don’t expect to understand this immediately.  If you’re a programmer, take the code and try it out.  Step through it a few times.  It still takes me a little time to get it all straight in my head, even though I’m the one that wrote it.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

Written while listening to:

Slide In (Dfa remix) – Goldfrapp by Various Artists from the album A-Z Of Annie Mac [UK] Disc 1

OK, so we can simulate individual genetic loci.  Next we need to group them together into individual members of a population of animals.

The next C# class needed is the individual:

public class Individual

An individual is mainly a data collection.  In this simple example we’re going to use an array of loci to encode an individual’s height:

// Default number of loci in height array
const int arraySize = 10;

// The height array
private Locus[] heightLoci = new Locus[arraySize];

Pretty simple stuff.  We have several options on how to decode this array of loci into a height value.  The simplest is to interpret the loci as additive genes.  The idea is that an ‘a’ allele has no effect on height, while an ‘A’ allele adds 1 unit to the individual’s height. This allows us to take each locus’ A_Count property and add them together:

///<summary>
/// Gets the height of this individual.
///</summary>
public int Height
{
    get
    {
        // Add up the A_Count values of the loci in the array.
        int height = 0;
        foreach (Locus l in heightLoci)
        {
            height += l.A_Count;
        }
        return height;
    }
}

We could add other data to the individual; sex, age or additional genetic characteristics could all be added to increase the complexity of the simulation.  I might show examples of this another time, but for now we’ll simply store the array of loci affecting height.

The complexity comes in initialising the individual.  We have to randomly assign a locus value to each locus in the individual. We could go off and read a lot about random number generators, hand-crafting our own one with all the latest random number generation methods.  Or we could just use the one that comes with .NET.

We’ll use the basic one. We’re not trying to prove anything scientific here, just show the basic mechanisms of evolution, so the quality of the random number generator isn’t really worth worrying over.

Because we’re using enums for the locus values, we can simply generate numbers between 0 and 2. We’ll do that in the default constructor:

// Shared random number generator. A new time-seeded Random created in
// each constructor could give many individuals identical seeds (and so
// identical loci) when they are built in a tight loop.
private static readonly Random rand = new Random();

///<summary>
/// Initializes a new instance of the <see cref="T:Individual"/> class.
///</summary>
///<remarks>Initialises the locus array with random values.</remarks>
public Individual()
{
    for (int i = 0; i < arraySize; i++)
    {
        // Locus is taken as random number from 0 to 2
        locus_enum loc = (locus_enum)rand.Next(3);

        // Store frequency of generated loci.
        switch (loc)
        {
            case locus_enum.aa:
                aaCount++;
                break;
            case locus_enum.aA:
                aACount++;
                break;
            case locus_enum.AA:
                AACount++;
                break;
            default:
                break;
        }

        // Generate new locus in array
        heightLoci[i] = new Locus(loc);
    }
}

The System.Random class will generate random numbers sufficient for our needs. By default it seeds itself with the system time. One caveat: because that time source is coarse, instances created in quick succession can receive the same seed and produce identical sequences, so it is safer to create a single instance and share it.

Essentially we loop through the locus array, setting the locus value to a random number between 0 and 2. This converts easily to our locus_enum enumerated type.

The switch statement in the middle of the method stores the numbers of each type of generated locus in some static variables in the Individual class.  This allows us to check on them for outrageous errors!

We’ll also use a second constructor to enable us to force a locus array into an individual.  This comes in useful for testing and allows us to force mutations in simulations should we wish to do so.

///<summary>
/// Initializes a new instance of the <see cref="T:Individual"/> class.
///</summary>
///<param name="xLoci">The <see cref="Locus"/> array to use to create the individual.</param>
public Individual(Locus[] xLoci)
{
    // Passed array must be same size as default.
    if (xLoci.Length == arraySize)
    {
        heightLoci = xLoci;
    }
    else
    {
        // Reject wrongly sized arrays rather than silently truncating them.
        throw new ArgumentException("Incorrect locus array length in Individual(Locus[])");
    }
}

In the main program, I’ve added some simple code to initialise an array of 1000000 individuals.  This forms a simple population.   I have also added a small assembly with some descriptive statistics code which I then use to display frequency data of the heights in the population.

Individual[] population = new Individual[1000000];
DescriptiveStatistics.Histogram histo = new DescriptiveStatistics.Histogram(0, 20, 1);

for(int i = 0; i < 1000000; i++)
{
    population[i] = new Individual();
    histo.Add(population[i].Height);
}

for (int i = 0; i < histo.BucketCount; i++)
{
    Console.WriteLine("{0}\t{1}", histo.LimitForBucket(i), histo.CountForBucket(i));
}

I manually copied these into Excel and came out with the following histogram:

Histogram of population height.

Yes, it’s Normally Distributed. That’s no accident: each height is the sum of ten independent loci, each contributing 0, 1 or 2, so the total follows a binomial distribution, which closely approximates the normal curve.

You can get the sourcecode for this blog entry over at the Channel9 Sandbox.

Next time, I’ll go over a little more theory and then we can start mating individuals together, ready to start simulating evolution.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

OK, time for some code.

First we have to represent a single genetic locus.  We are going to use the basic dominant/recessive allele model, which has two alleles (‘a’ and ‘A’) and therefore three possible genotypes at a locus: aa, aA & AA.  We’ll use an ‘enum’ for this one:

///<summary>
/// Enum defining possible genotypes at a single locus.
///</summary>
public enum locus_enum
{
    aa = 0,
    aA = 1,
    AA = 2
}

Notice the numeric values assigned to each one.  We can read the numeric values as ‘the number of dominant alleles’.  This can come in handy later.

We’ll wrap this up with an object called ‘Locus’ to add some methods to this data.

public class Locus
{
   private locus_enum _locus;

   public Locus(locus_enum l)
    {
        _locus = l;
    }

    // True when the recessive phenotype (blue eyes) is expressed,
    // i.e. when the genotype is 'aa'.
    public bool Phenotype
    {
        get
        {
            return _locus == locus_enum.aa;
        }
    }

    public int A_Count
    {
        get
        {
            return (int)_locus;
        }
    }

    public override string ToString()
    {
        switch (_locus)
        {
            case locus_enum.aa:
                return "aa";
            case locus_enum.aA:
                return "aA";
            case locus_enum.AA:
                return "AA";
            default:
                return "ERROR!";
        }
    }

    private locus_enum locus
    {
        get
        {
            return _locus;
        }
    }
}

Next, we need to be able to manipulate genes. Specifically, we need to be able to cross them using Mendelian genetics.

Now the simple, brute force method would be to randomly select one allele from each parent’s locus and then combine them to create an offspring locus.

We’re not going to do that, we’re going to use a probability matrix.

We know all the probabilities of producing a specific type of offspring given two parents, so we just need to generate some random numbers and throw them against the probabilities.

We shall take our example from ‘A Bit Of Genetics’:

If both parents are ‘aA’ then the offspring probabilities are:

  • aa — 25%
  • aA — 50%
  • AA — 25%

So we add these probabilities together to create a ‘line of probability’ that goes from 0.0 to 1.0.  We break the line up into chunks of the relative size of the probabilities like this:

Probability line diagram

So we can generate a random number between 0.0 and 1.0, find it on the line and the result is the value for that chunk of the line.  We can do all of this with a probability matrix that covers all the possible parental and offspring values.  The matrix has three dimensions, so it looks like a cube:

Offspring probability cube.

In C# this looks like:

// Prepare the offspring probability matrix.
p_offspring = new float[, ,] {

     {   {1.0F , 1.0F , 1.0F  },
         {0.5F , 1.0F , 1.0F  },
         {0.0F , 1.0F , 1.0F  }   
     },               
     {   {0.5F , 1.0F , 1.0F  },
         {0.25F, 0.75F, 1.0F  },
         {0.0F , 0.5F , 1.0F  }   
     },               
     {   {0.0F , 1.0F , 1.0F  },
         {0.0F , 0.5F , 1.0F  },
         {0.0F , 0.0F , 1.0F  }   
     }
};

We then need a method that will return an offspring when given two parents:

// Shared random source. Creating a new time-seeded Random on every call
// would give successive calls in a tight loop identical sequences.
private static readonly Random r = new Random();

///<summary>
/// Crosses the two specified loci.
///</summary>
///<param name="o">The first parent locus.</param>
///<param name="t">The second parent locus.</param>
///<returns>The offspring <see cref="Locus"/>.</returns>
public static Locus Cross(Locus o, Locus t)
{
    double off = r.NextDouble();

    /* check to see if the offspring is aa */
    if ((off <= p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aa]) && (p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aa] != 0))
    {
        // Sanity check: an 'aa' offspring is impossible if either parent is 'AA'.
        if (o.A_Count == 2 || t.A_Count == 2)
        {
            throw new Exception(string.Format("Locus produced the WRONG offspring!\n\nParent one : {0},\tParent two : {1},\tOffspring : aa\n", o.locus, t.locus));
        }
        return new Locus(locus_enum.aa);
    }

    /* check to see if the offspring is aA */
    if ((off <= p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aA]) && (p_offspring[o.A_Count, t.A_Count, (int)locus_enum.aA] != 0))
    {
        // Sanity check: an 'aA' offspring is impossible if both parents
        // are 'aa', or both are 'AA'.
        if ((o.A_Count == 0 && t.A_Count == 0) || (o.A_Count == 2 && t.A_Count == 2))
        {
            throw new Exception(string.Format("Locus produced the WRONG offspring!\n\nParent one : {0},\tParent two : {1},\tOffspring : aA\n", o.locus, t.locus));
        }
        return new Locus(locus_enum.aA);
    }

    /* check to see if the offspring is AA */
    if (off <= p_offspring[o.A_Count, t.A_Count, (int)locus_enum.AA])
    {
        // Sanity check: an 'AA' offspring is impossible if either parent is 'aa'.
        if (o.A_Count == 0 || t.A_Count == 0)
        {
            throw new Exception(string.Format("Locus produced the WRONG offspring!\n\nParent one : {0},\tParent two : {1},\tOffspring : AA\n", o.locus, t.locus));
        }
        return new Locus(locus_enum.AA);
    }

    /* The last cumulative probability is always 1.0 and NextDouble()
       returns a value below 1.0, so we should never fall through to here. */
    throw new Exception("Probability screw-up in Locus.Cross()");
}

To make them easily accessible, we’ll embed the probability matrix and the crossing method as static methods of the ‘Locus’ class which they act on.

I have put the code into a ZIP file at the Channel 9 Sandbox.  There is a single Visual Studio 2005 solution with an executable project that does nothing (yet) and a unit test assembly called SimulationTests. If you don’t have Visual Studio 2005, this project will also work in the Visual Studio C# Express version (which is currently free).  The unit tests are written using NUnit which you will have to install as well. So far you can just read the code and run the tests.

I shall be using this project as the basis for all the demonstrations I build up in the future.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

Survival of the what?

December 3, 2006

Pretty much everybody knows the phrase ‘the survival of the fittest’, but it’s pretty shocking how few people actually understand what it means.

It’s that word – fitness. People don’t get it. It’s not intuitive.

Evolutionary fitness is NOT about physical fitness. It’s NOT about being the biggest and the meanest. It’s NOT about being the smartest or the wealthiest. It’s NOT about being an A-type personality who always wins.

It’s about how many of your genes you pass on to the next generation.

There are lots of ways to do this. You can have lots of children by one mate. You can have lots of children by lots of mates.

Or, and this is actually quite important, you can help your relatives to have lots of children.

There’s a phrase ‘inclusive fitness’ which relates to a value called ‘Hamilton’s R value’. R stands for relatedness.

When you have children they each get half of their genes from you. They are therefore 0.5 the same as you. They are given an R-value of 0.5. If your children grow up and have children of their own, your grandchildren will each get 0.5 of the genes of your children. That’s 0.5 of 0.5 of your genes. So your grandchildren have 0.25 of your genes, so an R value of 0.25.

If you have 2 children your inclusive fitness is 2 * 0.5 = 1.

If each of your children then have 2 children, you get an inclusive fitness of 2*0.5 + 4*0.25 = 2.

On the other hand, if you only have one child, but that child has 10 children, your inclusive fitness is 0.5 + 10*0.25 = 3.

Inclusive fitness doesn’t just work ‘downwards’ to your own offspring, but it also works ‘sideways’ to your siblings.

Your brothers and sisters are given an R value of 0.5 (you do the maths: you each got a random 0.5 of each parent’s genes, so on average you share 0.5 of your sibling’s genes).

If your brother/sister has children, passing on 0.5 of their genes, then each niece/nephew will have 0.5*0.5 = 0.25 of your genes.

So if your brother has 3 children, your inclusive fitness is: 0.5 + (3*0.25) = 1.25.
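The calculations above are all the same operation: a sum of relatedness times count over each class of relative. A throwaway sketch (the class and method names are mine, purely for illustration):

```csharp
using System;

// Inclusive fitness as the sum of (relatedness x number of relatives)
// over each class of relative: children R=0.5, grandchildren and
// nieces/nephews R=0.25, and so on.
public static class InclusiveFitness
{
    public static double Calculate(double[] relatedness, int[] counts)
    {
        double fitness = 0.0;
        for (int i = 0; i < relatedness.Length; i++)
        {
            fitness += relatedness[i] * counts[i];
        }
        return fitness;
    }
}
```

Calculate(new double[] { 0.5, 0.25 }, new int[] { 1, 10 }) reproduces the one-child, ten-grandchildren example: 0.5 + 10*0.25 = 3.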

And now a philosophical point

Everyone on the planet is, in some way, related. Therefore if you spend your time helping anyone raise children (for example, by donating to charity to vaccinate children in the 3rd world) your inclusive fitness goes up a little.

The main point

‘Survival of the fittest’ is not an excuse to act like a selfish b*stard. Quite the opposite. If you help people out in life, you get rewarded by increased evolutionary fitness. Think about it.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

Written while listening to:

Feeling Good by Muse from the album Origin Of Symmetry

Comfortably Numb by Scissor Sisters from the album Scissor Sisters [UK Bonus Tracks]

Drugs by Simple Kid from the album 1

Evolutionary Modeling

October 26, 2006

I thought I’d write a bit about how evolution is modeled, or at least how I model it.

Most models will start out with a verbal version; a plain English explanation of the mechanism.

Unfortunately, I found that many of the more mathematical models missed this bit out.  It seems that mathematical types understand a lot just by looking at equations.  I’m not a mathematical type (despite being able to write simulations). I need the verbal model.

So I’ll first make two definitions:

  1. A Model is a mathematical function.  Given the same parameters it will always give the same answers.
  2. A Simulation is an actual reproduction of a simplified system. Simulations normally use some random numbers to determine probabilities of what will happen next.  Given the same parameters a simulation may give wildly different answers each time.

So what’s the point of a simulation if it always gives different answers? What you do with a simulation is to run it a number of times. Each run is called a replicate. When you have enough replicates you will analyse the results as a group and look for trends and patterns.

Simulations are often called Monte-Carlo models. This refers to the random number element — it’s supposed to be like gambling at a Monte-Carlo casino.  I’ve never been to Monte-Carlo, or gambled at a casino. This is a stupid name.  Before computers were available some simulations were done by throwing dice for randomness.  They should have called it D&D modeling.

Models are often built on the principles of the Normal Distribution and will deal with changes in mean and standard deviation over time, or in regard to some other variable.

I don’t do models.  Actually, I can’t do models.  Like I said, I’m not very mathematical (I’ve tried to learn calculus three times, and I still find it tough going).

What I understand and am going to write about are simulations.

So what’s the basis for the simulations I write?

Firstly there’s the probability stuff.  You decide on the probability of a particular event happening.  Then you generate a random number between 0 and 1 and if that number is less than your probability, then the event happens.

Pretty simple really.

To simulate a coin toss, you would write:

if (randomNumber < 0.5)
    Console.WriteLine("Heads");
else
    Console.WriteLine("Tails");

You would then do this a few hundred (or even thousand) times and see that the coin comes down Heads half the time.
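That repeated experiment might look like this (a sketch; the class and method names are mine):

```csharp
using System;

// Toss a simulated coin 'tosses' times and count the heads. With a fair
// coin the proportion of heads settles near one half as tosses grow.
public static class CoinToss
{
    private static readonly Random rand = new Random();

    public static int CountHeads(int tosses)
    {
        int heads = 0;
        for (int i = 0; i < tosses; i++)
        {
            if (rand.NextDouble() < 0.5) heads++;
        }
        return heads;
    }
}
```

CountHeads(100000) comes back close to 50000 on each run, but almost never exactly 50000, which is the same ‘about’ behaviour we saw with the genotype frequencies.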

But we’re not simulating coin tossing, we’re simulating evolution.

Evolution is defined as ‘any net change in the genetic makeup of a population’. So we’re going to need a population and some genetics.

In the last post I showed the three possible genotypes at a single locus: aa, aA and AA.

So that’s a gene.  Put some of these into a structure and we have an individual.  Make an array of these and we have a population with some genetics.  All we need to evolve the population is some selection pressure and a big, repeating loop.

The flow of a simulation program is often like this:

  1. Initialise population with random genes.
  2. Kill off some of the individuals based on their genetics and a probability-based function.
  3. Allow some of the individuals to breed — the probability of breeding will also be a probability-based function.
  4. Record the genetics of the population
  5. Go to 2

This is a simulation with overlapping generations.  Some simpler simulations will simply allow the old population to reproduce into a new, empty population of the same size and then destroy the parent population.  It depends on the type of animal you are basing your simulations on.

The probability-based functions that decide who dies and who breeds are the selection pressure in the model.  Selection pressure makes populations evolve as they try to maximise their fitness.
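Putting the five steps together, a minimal skeleton might look like this. Everything here is hypothetical: a single number between 0 and 1 stands in for an individual's genetics, offspring are clones of their parent, and the survival function is an arbitrary selection pressure (of my own choosing) that favours high trait values:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical skeleton of the simulation loop described above.
public static class SimulationSketch
{
    private static readonly Random rand = new Random();

    public static List<double> Run(int populationSize, int generations)
    {
        // 1. Initialise the population with random 'genes'.
        List<double> traits = new List<double>();
        for (int i = 0; i < populationSize; i++)
        {
            traits.Add(rand.NextDouble());
        }

        List<double> averages = new List<double>();
        for (int g = 0; g < generations; g++)
        {
            // 2. Mortality: survival probability rises with the trait,
            //    from 0.5 (trait 0.0) up to 0.9 (trait 1.0).
            traits.RemoveAll(t => rand.NextDouble() > 0.5 + 0.4 * t);

            // 3. Breeding: random survivors breed (clonally here) until
            //    the population is back to full size.
            while (traits.Count > 0 && traits.Count < populationSize)
            {
                traits.Add(traits[rand.Next(traits.Count)]);
            }

            // 4. Record the genetics of the population (the mean trait).
            double sum = 0.0;
            foreach (double t in traits)
            {
                sum += t;
            }
            averages.Add(traits.Count > 0 ? sum / traits.Count : 0.0);

            // 5. 'Go to 2' is the enclosing for loop.
        }
        return averages;
    }
}
```

Run it for a few dozen generations and the recorded average trait climbs, because individuals with higher traits survive (and therefore breed) more often; that climb is selection pressure at work.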

What’s fitness?  I’ll tell you next time, it’s getting late.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face