OK, you’ve been very patient.  All this talk of genetics, drift and populations and so far we haven’t seen any ‘survival of the fittest’.  Well, here goes.

The last version of the overlapping population model just randomly killed individuals with a 0.25 probability. By making this probability dependent on the individual's trait, we link survival to genetics, which is the basis of Darwinian evolution.

So, I’ve slightly modified the mortality function in the ComplexPopulation<T> class to call a virtual method to calculate the probability that a specific individual dies:

        ///<summary> 
        /// Does one season of mortality. 
        ///</summary> 
        ///<remarks>Performs probability-based deletions of individuals in the population.</remarks> 
        public void DoOneSeasonOfMortality() 
        { 
            System.Random rand = new Random();   

            for (int i = 0; i < mPopulationSize; i++) 
            { 
                double chance = rand.NextDouble();   

                if (chance <= ProbabilityOfDeath(population[i])) 
                { 
                    // Push index into stack then delete individual. 
                    stackTop++; 
                    stack[stackTop] = i; 
                    population[i] = null; 
                } 
            } 
        }   

        ///<summary> 
        /// Calculates the probability that this individual dies 
        ///</summary> 
        ///<param name="individual">The individual.</param> 
        ///<returns>A value between 0 and 1</returns> 
        public virtual double ProbabilityOfDeath(IIndividual individual) 
        { 
            // 0.25 probability of death for each individual. 
            return 0.25; 
        }

 

This has no effect on the functionality of this class, but it enables us to create a new population class with a different mortality function:

///<summary> 
/// An overlapping population with trait-dependent mortality. 
///</summary> 
///<typeparam name="T">The <see cref="IIndividual"/>-based type in the population.</typeparam> 
class MortalityPopulation<T> : ComplexPopulation<T> where T : class, IIndividual, new() 
{ 
    public MortalityPopulation(int populationSize, int MaxTraitValue) 
        : base(populationSize, MaxTraitValue) 
    { 
    }   

    ///<summary> 
    /// Calculates mortality as a straight line relationship with trait value. 
    ///</summary> 
    ///<param name="individual">The <see cref="IIndividual"/>.</param> 
    ///<returns>A <see cref="double"/> between 0 and 1 representing the probability of mortality for this individual.</returns> 
    public override double ProbabilityOfDeath(IIndividual individual) 
    { 
        int traitValue = individual.TraitValue;   

        return ((double)traitValue / (double)base.mMaxTraitValue) / (double)10; 
    } 
}

 

All this derived class does is override the virtual method to provide a mortality probability calculated from the individual's trait value.

The equation we’re using is charted as:

Chart of fitness functions.

We'll come to the three 'strength' factors later – for the moment we'll use the 'default' strength of 10. This means that individuals with a trait value of 20 (all AA alleles) have a 0.1 probability of death, while individuals with a trait value of 0 (all aa alleles) have a probability of death of 0 (i.e. they live forever!).
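To make those numbers concrete, here's a tiny standalone sketch (not part of the simulation classes – the class and method names are just for illustration) that evaluates the same formula for a few trait values at each of the three strength factors charted above:

// A quick numerical check of the mortality formula for the three
// 'strength' factors charted above. Standalone sketch only; the class
// name is hypothetical and not part of the simulation code.
class MortalityFormulaCheck
{
    static double ProbabilityOfDeath(int traitValue, int maxTraitValue, double strength)
    {
        return ((double)traitValue / (double)maxTraitValue) / strength;
    }

    static void Main()
    {
        double[] strengths = new double[] { 1.0, 10.0, 25.0 };

        foreach (double strength in strengths)
        {
            // Trait 0 always gives a probability of 0; trait 20 gives 1/strength.
            Console.WriteLine("strength {0}: trait 0 -> {1}, trait 10 -> {2}, trait 20 -> {3}",
                strength,
                ProbabilityOfDeath(0, 20, strength),
                ProbabilityOfDeath(10, 20, strength),
                ProbabilityOfDeath(20, 20, strength));
        }
    }
}

With the default strength of 10, the worst possible trait value (20) gives a death probability of 0.1 per season; at strength 1 that rises to 1.0, and at strength 25 it drops to 0.04.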

So what is the result of running a population through this simulation?

Average trait evolving over 1000 generations.

So there’s a marked drop in the average trait value and a gradual decrease in the standard deviation.  This is because the population’s distribution is being ‘squashed’ up against the edge of possible values.

If we look at the trait as a 3D chart of distribution over time we get:

3D image of trait evolving over time.

That’s a bit difficult to see, so here’s the distribution histograms over time:

Trait distribution histograms over time (default selection strength).

So you can see, firstly, that at generation 0 before any evolution has occurred, the trait is normally distributed in the population, but as our selective pressure takes hold (generation 50) the distribution stops being normal.

Imagine that the normal distribution is made of jelly. Evolution is acting like a hand pushing the jelly to the left: while being pushed, the jelly distorts. This hand is the 'selective pressure' of evolution. The distribution moves because individuals die and are born, not because individuals change during their lives.

In this particular example, there is a wall in the way at trait value 0; there is nowhere left to evolve to.  Pushing the jelly against this wall makes it kind of go ‘splat’ against that wall, but the hand doesn’t stop pushing and it ends up holding the jelly against the wall.

If the wall wasn't there and the hand stopped pushing, the jelly would return to its normal distribution shape.

From this we can draw the conclusion that measuring the distribution of a physical trait in a population might give us a clue as to whether it is under strong selective pressure – if the normal distribution is skewed it’s a clue that strong selective pressure is forcing the population to evolve.
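If you wanted to put a number on that 'clue', the sample skewness of the trait values is the obvious candidate: it's roughly zero for a symmetric distribution and clearly non-zero once the distribution gets squashed. The DescriptiveStatistics library used in these posts may not expose this, so here's a hedged, standalone helper:

// Sample skewness of a set of trait values: roughly 0 for a symmetric
// (normal-ish) distribution, clearly non-zero for a 'squashed' one.
// Hypothetical helper; not part of the DescriptiveStatistics library used here.
// (Returns NaN if every value is identical.)
static double Skewness(int[] traitValues)
{
    int n = traitValues.Length;

    double mean = 0;
    foreach (int t in traitValues) mean += t;
    mean /= n;

    double m2 = 0, m3 = 0;
    foreach (int t in traitValues)
    {
        double d = t - mean;
        m2 += d * d;
        m3 += d * d * d;
    }
    m2 /= n;
    m3 /= n;

    // Standardised third moment.
    return m3 / System.Math.Pow(m2, 1.5);
}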

Returning to our graph of mortality probability by trait value, we can turn the ‘strength’ of the selective pressure up or down.

In our code:

return ((double)traitValue / (double)base.mMaxTraitValue) / (double)10;

 

the value '10' is the strength of the selective pressure. If we make it smaller, the selective pressure is stronger; if we make it larger, the selective pressure is weaker.
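If you want to play with this without editing the formula each time, one obvious tweak – just a sketch, not part of the downloadable code, and the class name is hypothetical – is to pass the strength in through the constructor of a derived class:

///<summary>
/// A sketch of a MortalityPopulation variant with the selective pressure
/// 'strength' as a constructor parameter rather than a hard-coded 10.
///</summary>
class TunableMortalityPopulation<T> : ComplexPopulation<T> where T : class, IIndividual, new()
{
    private readonly double mStrength;

    public TunableMortalityPopulation(int populationSize, int MaxTraitValue, double strength)
        : base(populationSize, MaxTraitValue)
    {
        mStrength = strength;
    }

    public override double ProbabilityOfDeath(IIndividual individual)
    {
        // A smaller strength value means stronger selective pressure (higher mortality).
        return ((double)individual.TraitValue / (double)base.mMaxTraitValue) / mStrength;
    }
}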

Setting it to 1 gives the following change in mean trait:

Mean trait over time with the strength factor set to 1 (high selective pressure).

As you can see, the change in mean is sharper than before.  This is because more individuals are dying each year and so changes occur faster.

Setting the ‘strength’ factor to 25 gives the following:

Mean trait over time with the strength factor set to 25 (low selective pressure).

This shows the changes taking place more slowly.

So there you have a simple evolutionary model based on ‘survival of the fittest’ Darwinian Evolution.

Next I shall show how Sexual Selection can also cause evolutionary change in place of survival of the fittest.

The code for this simulation can be found over in the Channel 9 sandbox.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

I promised, a while ago now, to show a version of the simulation with overlapping generations. This basically means that not all the individuals in the population die every season, so some remain in the next ‘generation’.

OK, so starting with the simple stuff.

Our original non-overlapping generation model only had one method to run every season:

DoOneSeasonOfReproduction()

 

All this did was to reproduce continually into a new, empty population until the new population became full.

Overlapping populations will need an extra method:

DoOneSeasonOfMortality()

 

The mortality phase is when individuals are killed off to make space in the population for new individuals that are created in the reproduction phase. We could combine this into one method, but somehow I feel they should be kept separate.

So I mentioned creating spaces in the population.  This is the real secret of this simulation.  When I first wrote this type of simulation, I used linked lists to hold the population so that you could remove nodes from the list easily and then add new individuals easily.  It ran like a dog. I mean it was really, really, really slow. Like, start the simulation off and come back tomorrow to see the results of the first couple of runs.

Writing a simulation is not like writing a normal application. You don't care about using too much memory and you do care about speed (you're probably going to run hundreds of replicates, so you don't want to have to wait weeks for them to complete).

The secret is to use only arrays.  I used an array in the non-overlapping population version and it was the obvious choice because the population size is always static.  Using arrays in the overlapping version is a little more involved because, like I said, we make spaces in the population.

The first version of the array-only simulation was probably the natural first design that anyone would come up with without spending too much time thinking about it. When adding new offspring to the population, iterate through the array to find the spaces. This works for small populations, but when you keep having to iterate through an array of 10,000 indices it gets slow again.

So to speed up the finding of spaces, we’re going to use a simple stack.  The stack is also an array-based implementation for speed.  Essentially, we create an integer array of the same length as the population size (so that if the random numbers dictate it, the entire population can die off which would indicate the end of the simulation).

Every time an individual is killed, its index is pushed onto the top of the 'spaces stack'. Whenever an individual is added to the population, its new index is popped from the spaces stack. When the spaces stack is empty, our population is full.

Here’s a diagram:

Flow of the space stack

So the space stack speeds up the process of finding a space in the population into which new offspring can be inserted, and the use of fixed size arrays means there’s not much by way of pesky memory allocations to slow things down.
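The stack itself is nothing fancy: an int array the same length as the population plus a 'top' index. The field declarations aren't shown in the snippets below, so here's roughly how I'd declare and initialise them (a sketch; the helper method name is just for illustration):

// The 'spaces stack': an int array the same length as the population plus
// an index of the current top. stackTop == -1 means the stack is empty,
// i.e. there are no vacant slots and the population is full.
private int[] stack;
private int stackTop = -1;

// Hypothetical helper: called from the constructor once the population
// array has been filled.
private void InitialiseSpacesStack()
{
    stack = new int[mPopulationSize];

    for (int i = 0; i < mPopulationSize; i++)
    {
        stack[i] = -1;   // -1 marks an unused stack slot.
    }

    stackTop = -1;       // The population starts full, so no spaces yet.
}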

So, let's implement this:

///<summary>
/// Does one season of mortality.
///</summary>
///<remarks>Performs probability-based deletions of individuals in the population.</remarks>
public void DoOneSeasonOfMortality()
{
    System.Random rand = new Random();

    for (int i = 0; i < mPopulationSize; i++)
    {
        // Random chance of death for this individual.
        double chance = rand.NextDouble();

        // 0.25 probability of death for each individual.
        if (chance <= 0.25)
        {
            // Push index into stack then delete individual.
            stackTop++;
            stack[stackTop] = i;
            population[i] = null;
        }
    }
}

 

In this example, we’re just using a fixed probability of 0.25 to determine if an individual dies.  I could probably do without actually setting the individual in the population to null and rely entirely on the stack to hold which individuals were dead, but frankly that idea worries me.

So now we’ve got a population with vacant spaces, we need to fill them back up with reproduction.  Here’s the new reproduction code:

///<summary>
/// Does one season of reproduction.
///</summary>
///<remarks>This updates the Population with one season of reproduction.</remarks>
public void DoOneSeasonOfReproduction()
{
    System.Random rand = new Random();

    // While stackTop >= 0 there are still vacant indices in the population
    while (stackTop >= 0)
    {
        // Find two random parents (non-null entries). Note that Random.Next's
        // upper bound is exclusive, so we pass mPopulationSize to allow the last
        // index to be chosen.
        int idxParentOne = rand.Next(0, mPopulationSize);
        while (population[idxParentOne] == null)
        {
            idxParentOne = rand.Next(0, mPopulationSize);
        }

        int idxParentTwo = rand.Next(0, mPopulationSize);
        while (population[idxParentTwo] == null)
        {
            idxParentTwo = rand.Next(0, mPopulationSize);
        }

        // Populate the space at the top of the stack and then pop that
        // space from the stack.
        population[stack[stackTop]] = population[idxParentOne].Cross(population[idxParentTwo]) as T;
        stack[stackTop] = -1;
        stackTop--;

    }
}

 

This is pretty much the same as for the non-overlapping population code, except we put the new offspring into the same population as the parents and keep going until the population is full.

The final change that I have made is to create an interface for the population objects, so that we can swap between non-overlapping (simple) populations and overlapping (complex) populations.  Here’s the interface:

interface IPopulation<T> where T : class, IIndividual, new()
{
    void DoOneSeasonOfMortality();
    void DoOneSeasonOfReproduction();
    T[] Population { get; }
    int PopulationSize { get; }
    DescriptiveStatistics.Histogram TakeCensus();
}

 

The simple population has a mortality method that does nothing and only exists to satisfy the interface (not great design, but it will do for now).
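For completeness, that empty method looks something like this (a sketch; the downloadable code may differ slightly):

///<summary>
/// Does one season of mortality.
///</summary>
///<remarks>
/// Non-overlapping generations: the whole population is replaced during
/// reproduction, so there is nothing for a mortality phase to do.
///</remarks>
public void DoOneSeasonOfMortality()
{
    // Intentionally empty; this exists only to satisfy IPopulation<T>.
}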

This interface means that our main simulation loop looks like this:

System.IO.StreamWriter sw = System.IO.File.CreateText("c:\\Log.txt");

DescriptiveStatistics.Histogram stats = new DescriptiveStatistics.Histogram(0, 500, 10);
WriteHistogramHeader(sw, stats);
int repCount = 10;

IPopulation<Individual> population = new ComplexPopulation<Individual>(500, 20);

DescriptiveStatistics.Histogram histo = population.TakeCensus();
int generationCount = 0;
while ((histo.Statistics.StdDev > 0) || (generationCount == 0))
{
    generationCount++;
    population.DoOneSeasonOfMortality();
    population.DoOneSeasonOfReproduction();
    histo = population.TakeCensus();

    Console.WriteLine("{0}\t{1}\t{2}", generationCount, histo.Statistics.Average, histo.Statistics.StdDev);
    sw.Write("{0}\t{1}\t{2}", generationCount, histo.Statistics.Average, histo.Statistics.StdDev);
    WriteHistogram(sw, histo);
    sw.Flush();

}

sw.Close();
sw.Dispose();

 

We can easily switch population types by replacing the line

IPopulation<Individual> population = new ComplexPopulation<Individual>(500, 20);

with

IPopulation<Individual> population = new SimplePopulation<Individual>(500, 20);

and the rest of the simulation loop carries on unchanged.

I have put all of the code for this post over on the Channel 9 sandbox so you can download and have a play.

If you're interested in writing software, check out my other blog: Coding at The Coal Face

Replicates

May 25, 2007

So far, all we've done is look at individual runs of the simulation. While this is all well and good for getting a feel for evolving populations, we sometimes need to ask wider questions. For example, in my past post on Genetic Drift we stated that drift has less of an effect in larger populations just by comparing two simulations. If we want to know how population size affects the strength of drift, we need to get a little more scientific and create replicates.

Replicates are multiple answers to the same question.

Why?

In our simulations, we use random numbers in order to simulate effects like Genetic Drift. The downside is that these random numbers introduce a certain level of 'noise' into the answer. By running the simulation multiple times we can take an average and get a clearer picture.

You can think of it in terms of a ‘population of results’; in order to understand the whole population, we need to take several samples. I’ve taken our simulation from the last entry which finds the number of generations until Genetic Drift stops changing the population and run it multiple times, taking the answers from each run and putting them into our histogram object. The output looks like this:

Now, I’m prepared to say that this looks like a normal distribution. It’s complicated by the fact that it’s close to zero on the X-Axis, so it looks skewed.

So the replicates in a simulation are effectively samples taken from the theoretical population of answers, which is normally distributed around a mean value. This is why we run a lot of replicates and then take the mean and standard deviation as our answer.
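In code, running replicates is just the single-run loop from the last entry wrapped in an outer loop, with the answer from each run dropped into a histogram. Here's a hedged sketch; I'm assuming the Histogram class has an Add-style method for recording a single value, which may not match its real API:

// Run a batch of replicates of the drift simulation and collect the
// generation count at which each run fixes.
int repCount = 400;
DescriptiveStatistics.Histogram results = new DescriptiveStatistics.Histogram(0, 500, 10);

for (int rep = 0; rep < repCount; rep++)
{
    IPopulation<Individual> population = new ComplexPopulation<Individual>(500, 20);
    DescriptiveStatistics.Histogram histo = population.TakeCensus();

    int generationCount = 0;
    while ((histo.Statistics.StdDev > 0) || (generationCount == 0))
    {
        generationCount++;
        population.DoOneSeasonOfMortality();
        population.DoOneSeasonOfReproduction();
        histo = population.TakeCensus();
    }

    // One answer per replicate. (Assumes an Add-style method on Histogram.)
    results.Add(generationCount);
}

Console.WriteLine("Mean: {0}\tStdDev: {1}", results.Statistics.Average, results.Statistics.StdDev);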

How Many Samples?

Now there's the question: how do we know when we've got the right number of replicates? The only way to find out for sure is to look at the results of a given simulation for different numbers of replicates. Here's a chart of the means from another set of simulations plotted against the number of replicates run:

As you can see, once there are more than about 230 replicates, the mean answer seems to settle at a value between 60 and 70.
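Checking this is simple in code: keep one answer per replicate and watch the cumulative mean settle as the replicate count grows. A minimal sketch, assuming the answers are already collected in an array:

// Cumulative mean of the replicate answers: with enough replicates this
// settles down to a stable value as the count increases.
// 'answers' is assumed to hold one result per replicate.
static void PrintCumulativeMeans(double[] answers)
{
    double runningTotal = 0;

    for (int count = 1; count <= answers.Length; count++)
    {
        runningTotal += answers[count - 1];
        Console.WriteLine("{0}\t{1}", count, runningTotal / count);
    }
}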

There’s a trade-off here; more replicates are more accurate, but take more time to produce. Now, I’m a careful type of guy so I think I’ll use 400 replicates so that there’s a buffer against highly randomised data. I’ll use this number of replicates for all future simulations.

I also collected the frequency histograms for all these simulation runs, which I have plotted here in 3D to show how the distributions change with more replicates:

You can see that with a small number of replicates (towards the back of the 3D area) there is no real peak to the distributions, while for larger numbers of replicates (towards the front of the 3D area) there is a definite peak.

I won't be going into any real statistical analysis of results for the simulations that appear here: I don't believe it's really necessary.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

Well, it looks like Word 2007 has a blogging feature. I shall investigate further …

Genetic Drift

April 30, 2007

If you've managed to follow all the stuff so far about probability matrices and random numbers, you'll want to know why we do simulations this way.

The answer is to represent genetic drift.

Genetic Drift is an evolutionary mechanism that doesn’t get as much press coverage as selection (survival of the fittest).

The reason Genetic Drift doesn't get as much attention is that it's a weak force. It's also random. However, it's still important.

To explain Genetic Drift I’m going to start with a ‘thought experiment’. So you’ll have to use your imagination.

Imagine you have a large bowl.  Into that bowl you tip 500 red marbles and 500 blue marbles.

Now put on a blindfold and mix up the marbles in the bowl with your hands, being careful not to spill any. Next, keeping the blindfold on, randomly take 100 marbles out of the bowl. Don't drop any, and don't lose count.

Before you take the blindfold off, guess how many red marbles you took out of the bowl.  If you answered “50 red marbles” you might not be correct.

50 red marbles is the number you would expect: half the marbles in the bowl (the total marble population) were red, so you'd expect a random selection of marbles (the sample) to also be half red.

But it’s not that simple.  Because you couldn’t see the marbles you were choosing, the marbles were chosen randomly.  Because the marbles were chosen randomly there’s a chance that the selected marbles (the sample) was not 50/50.  You might have actually selected 51 red marbles and 49 blue, or 43 red marbles and 57 blue.  There’s even a very tiny chance that you could have selected 100 blue marbles.

Now take the 100 marbles you chose and multiply each one by ten, so you have 1000 marbles again.  This is your next generation.  Repeat the blindfold/bowl/random choice bit again.  Now how many red marbles did you select? It’s still probably not 50.  It might be even further away from 50 than last time.  Keep doing this a few hundred times and you could find that the number of red marbles in the bowl changes over time.  It drifts in numbers in a random fashion.
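You can run the whole marble experiment in code, too. Here's a little standalone sketch (nothing to do with the simulation classes) that draws 100 marbles without replacement, scales the sample back up to 1000 and repeats:

// The marble thought experiment as code: 500 red and 500 blue marbles,
// sample 100 without replacement, multiply the sample by ten to refill
// the bowl, and repeat. The red count drifts, and can even fix at 0 or 1000.
static void MarbleDrift()
{
    System.Random rand = new Random();
    int redCount = 500;
    const int bowlSize = 1000;
    const int sampleSize = 100;

    for (int generation = 0; generation < 200; generation++)
    {
        int redLeft = redCount;
        int totalLeft = bowlSize;
        int redDrawn = 0;

        // Draw 100 marbles without replacement.
        for (int draw = 0; draw < sampleSize; draw++)
        {
            if (rand.Next(totalLeft) < redLeft)
            {
                redDrawn++;
                redLeft--;
            }
            totalLeft--;
        }

        // 'Multiply each marble by ten' to get back to 1000.
        redCount = redDrawn * 10;
        Console.WriteLine("Generation {0}: {1} red marbles", generation, redCount);
    }
}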

That’s what genetic drift is in an evolving population.

Because there's a random element in which individuals in a population get to mate together, there's a random element in which genes get passed down to the next generation, which means there's a random element in the genetic makeup of the population. And as you should know if you read my first post, a change in the genetic makeup of a population is called evolution.

(EDIT:  I’ve just Googled and found other people are using marbles to explain genetic drift.  Dammit!  I thought I was being original!)

So, Genetic Drift (now I’ve defined what it is I’d better start capitalising the words) is evolution, but it is not Darwinian evolution.

What’s that? Not Darwinian evolution?  Nope.  Darwin didn’t know about genetics, so he could never have thought of Genetic Drift.  This is why current evolutionary science is described as Neo-Darwinian. So when idiots start spouting how Darwinian theory has holes in it, you could answer straight back with “that’s why we study neo-Darwinian theory!”; but that will probably make you sound like an idiot too, so I generally just keep quiet and ignore the lunatic fringe.

So now we have an idea of what Genetic Drift is, let's look at an example. I finished my previous post with a graph. I'll put another copy here:

Chart of randomly changing population genetics.

First thing to remember:  the Y-axis represents the average trait value. Possible trait values range from 0 to 20.  The X-axis represents time and goes for about 200 generations.

If you look at the very first (left-most) point on the graph, you’ll see that it’s close to a value of 10.  This is roughly what we would expect the average value to be when the population is randomly generated, although due to the randomness of the initialisation you can never be certain.

In another previous post, I showed a chart of the genetic distribution within a randomly generated population. Here it is again:

Distribution of trait within a population.

This is a normal distribution, so keep in mind that the top chart, showing the average trait, is just the tip of the iceberg; each point is just the top of a normal distribution lurking 95% under the surface. When you look at the trait-over-time graph, think of it as being like this:

OK, so now how do we interpret the chart?  Well, it’s drifting around.  There’s no real direction to the changes except at the end where it seems to stop at a value of 8.

Why?

It's due to the phenomenon of 'fixation'. Because the alleles in the individuals don't 'jump around' in the gene (our array of Locus instances in the code), it's possible to reach a state where every individual in the population has the same allele at the same location on the gene. When this happens, there can be no more change at that locus. My guess is that when the population reaches an unchanging value of 8, four loci in the gene are fixed for the 'AA' allele while the rest are fixed for the 'aa' allele (4 × 2 + 6 × 0 = 8).

Let’s get more data.  I’ll run another simulation, but this time I’ll also record the standard deviation as well as the mean for the trait…

Chart of Average Trait Value over time

As we can see from this chart, the standard deviation decreases slowly until there isn't any left, which is when the average trait value stops changing at a value of 6.

(Nice chart, huh?  Just got Office 2007 installed🙂 )

Fixation can be an important mechanism in smaller populations, but as population size increases it becomes less likely. The last chart is from a population of 1000; this next one is from a population of 10000:

Drift in a large population.

As you can see, in this larger population (actually running the simulation for twice as long, too) there is little or no reduction in standard deviation, although you can still see it drifting around.

Remember when I said drift was a weak force?  That doesn’t mean it can’t have a large effect.  The following chart came about by chance as I was preparing to generate the last chart:

This was in a large population (10000) and run over 400 generations.  As you can see, the trait value drops to almost zero.  This means that the ‘A’ allele has almost vanished from the population. Once an allele is removed from the population, that allele is extinct.  This can happen purely through genetic drift.

Summary

Genetic Drift is an effect of the randomness of mating.  Although it is a weak force compared to selective pressure, it can have a large effect on the evolution of a population.  Drift is a stronger force in smaller populations.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

OK, so we’ve got code for genes and code for individuals.  If we want to simulate evolution, we’re going to need a population.

Static Population Size

The first point to make is that we’re going to use a static population size. In reality populations are not necessarily static; populations grow and shrink.  However, in general populations follow a sigmoidal growth curve similar to this (idealised) one:

Population Growth Curve

Now, if we look at the right-hand end of the graph, we can see that the population is a constant size. So in our model we're assuming that our population is mature and constant. This model doesn't fit all population types – some just don't follow it – but the majority of animal populations do.

Plus it’s much, much easier to write.

Generational Overlap

Populations in simulations can be split into two distinct categories: non-overlapping generations and overlapping generations.

Non-overlapping generations are the simplest model, so we'll tackle those first. In a non-overlapping generation model, each generation dies at the end of the 'year' and is replaced by a completely new population. In programming terms, we create a second list of individuals of the same length as the original and let individuals from the 'parent' population reproduce, placing their offspring into the 'offspring' population until the offspring population is full. We then delete the parent population and replace it with the new one.

Overlapping populations are a little different; individuals in the population can live for more than one year. When individuals do die, they leave ‘gaps’ in the population that can be filled by offspring. In programming terms we perform a mortality operation that kills off some of the individuals (usually in a way related to a measure of their evolutionary fitness), and then fill the empty spaces in the population with offspring.  This is more complex than a non-overlapping model as we need to keep track of the ‘gaps’ that mortality opens up in the population as well as perform fitness-dependent mortality.

Writing Generic Population Code

I've decided to write population code for both overlapping and non-overlapping populations using generics in .NET 2. We can then create a basic 'individual' class that we can modify for our own purposes depending on what we want to do with our simulation, while re-using the same population code over and over again.

Firstly we have to deal with the ‘individual’ class that we’ve already created.

Improving the Individual : Interactions

So far our individuals are isolated from one another. In a true population they interact to some degree; at the very least they mate. We have to make a design decision here:  the basic population code will remain the same for almost all simulations, but the code for the individuals in the population will change a lot.  What I’m going to do is use an interface for the individuals, and use generics for the population.

We only currently need two things from an individual: a value for the genetic trait that the individual carries, and the ability to cross with another individual. We'll add more to this as we build up the simulation later.

public interface IIndividual
{
    int TraitValue { get; }
    IIndividual Cross(IIndividual mate);
}

In our code we just need to adapt our Individual class.

Firstly, deriving from the IIndividual interface:

public class Individual : IIndividual

Then, getting the trait value (in this case, just passing back the 'Height' property):

 ///<summary>
/// Gets the trait value.
///</summary>
///<value>The trait value.</value>
public int TraitValue
{
    get { return this.Height; }
}

Finally, we need to add code to allow us to cross two individuals:

///<summary>
/// Crosses with the specified mate.
///</summary>
///<param name="mate">The mate.</param>
///<returns>A new <see cref="IIndividual"/> as offspring.</returns>
public IIndividual Cross(IIndividual mate)
{
    // Cast to specific type, error if not possible.
    Individual other = mate as Individual;
    if (other == null)
    {
        throw new System.ArgumentException("mate is not of same type.", "mate");
    }

    // Cross each locus of the gene
    Locus[] offspringGene = new Locus[arraySize];

    for (int i = 0; i < arraySize; i++)
    {
        offspringGene[i] = Locus.Cross(this.heightLoci[i], other.heightLoci[i]);
    }

    // Create new offspring with new genotype.
    return new Individual(offspringGene);
}

First things first.  We cast the IIndividual interface passed into our specific Individual type.  If this fails, we throw an exception. The types must be the same.

Next we create a new array of Locus objects. These will be given to the offspring.

We loop through all the parental loci, crossing each one and putting the result into the offspring's Locus array.

We create a new Individual passing it the new Locus array.  We return the new Individual as the result (it will get cast back to an IIndividual automatically).

Look at how the Locus array gets crossed: the first locus in parent 1 gets crossed with the first locus in parent 2 and the result becomes the first locus in the offspring. This ordering is important. Loci do not 'jump about' in the gene array; they stay at their original index. We shall later add code to shuffle things around (called 'crossing over', or more formally 'recombination').

Notice also that there is no randomisation on crossover. Only our probability rules are used to create locus types; they are at no point randomly assigned. We shall later add some randomness to create 'point mutations' within the population.

A Non-overlapping Population

So now that we've got the individual interface sorted, let's look at using it in a population.

First, the class declaration:

///<summary>
/// A Simple, non-overlapping population.
///</summary>
///<typeparam name="T">The <see cref="IIndividual"/> based class.</typeparam>
class SimplePopulation<T> where T : class, IIndividual, new()

The class SimplePopulation uses generics. This will allow us to use the same population class with different versions of the individual class without having to change the population code.

The SimplePopulation class has some constraints on the generic type.  The type ‘T’ must be a class (not a value type or struct), it must implement the IIndividual interface and it must have a default constructor (with no parameters).

The SimplePopulation has only a few internal data members:

// Private holder for the population size.
private int mPopulationSize = 100;
// Maximum possible value for trait (required for census histogram)
private int mMaxTraitValue = 20;

// The actual population.
private T[] population;

and the constructor simply initialises them:

///<summary>
/// Initializes a new instance of the <see cref="T:SimplePopulation&lt;T&gt;"/> class.
///</summary>
///<param name="populationSize">Size of the population.</param>
///<param name="MaxTraitValue">The maximum possible trait value (used by the census histogram).</param>
public SimplePopulation(int populationSize, int MaxTraitValue)
{
    mPopulationSize = populationSize;
    mMaxTraitValue = MaxTraitValue;

    population = new T[mPopulationSize];

    for (int i = 0; i < mPopulationSize; i++)
    {
        population[i] = new T();
    }
}

Notice the line that uses new T() — this is allowed due to the new() constraint in the SimplePopulation generic constraints mentioned above.

The next feature of a population is that it must allow breeding. We will do this one season at a time, using a single method:

///<summary>
/// Does one season of reproduction.
///</summary>
///<remarks>This updates the Population with one season of reproduction.</remarks>
public void DoOneSeasonOfReproduction()
{
    T[] newGeneration = new T[mPopulationSize];

    System.Random rand = new Random();

    for (int i = 0; i < mPopulationSize; i++)
    {
        int idxParentOne = rand.Next(0, mPopulationSize);   // Next's upper bound is exclusive.
        int idxParentTwo = rand.Next(0, mPopulationSize);

        newGeneration[i] = population[idxParentOne].Cross(population[idxParentTwo]) as T;

    }

    population = newGeneration;
}

Here we simply create a new array of individuals and fill it by randomly selecting parents from the original population and mating them to create a single offspring. We're just doing random mating for now. We'll add trait-based selection later.

Once the new population is full, it replaces the old one.

Simple really.

I have also added a method to take a census of the population as a snapshot. I won't go over the code here (it's in the download) as it's just housekeeping.

So how do we use our simple population?

static void Main(string[] args)
{
    SimplePopulation<Individual> SimplePop = new SimplePopulation<Individual>(1000, 20);

    System.IO.StreamWriter sw = System.IO.File.CreateText("Log.txt");

    for (int i = 0; i < 999; i++)
    {
        DescriptiveStatistics.Histogram histo = SimplePop.TakeCensus();

        if ((i % 10) == 0)
        {
            Console.WriteLine("{0}   {1}", i, histo.Statistics.Average);
        }

        sw.WriteLine(histo.Statistics.Average);

        SimplePop.DoOneSeasonOfReproduction();
    }

    sw.Close();
    sw.Dispose();

}

We create our population using our Individual class as the template type, passing the population size (1000) and the maximum trait value (20). We get the maximum trait value as twice the number of loci in the Individual class; there are 10 loci, each contributing at most 2 to the trait, so the maximum trait value is 20.

Then we simply loop through the population once for each generation (I chose 999 generations pretty arbitrarily). We start each generation with a census which I write to the console and to a log file.  Then we perform one season of reproduction.

Not very complicated, I’m sure you’ll agree.

Running the simulation does not look very exciting. Numbers appear on the screen and get saved to a text file. Whoopee, I hear you cry sarcastically.

I’ve taken the output of running the program and plotted it as a chart to get this:

So what are you looking at?  This is an evolving population.  It’s not Darwinian Selection, it’s  Genetic Drift. I’ll discuss Genetic Drift in the next post and we’ll explore it a little with the code we’ve written so far.

Summary

So what have we got? We have a population of individuals that reproduce randomly. Genes are inherited by offspring using probability-based Mendelian genetics. When run over a period of time, we get genetic drift.

The code for this entry can be downloaded from Channel9’s sandbox.

 

If you’re interested in writing software, check out my other blog: Coding at The Coal Face

I’ve got a new PC, so there will be a short period of adjustment while I set everything up.