Evolution of Novelty and Diversity: an open research idea

Spurred by the conference I have been attending (see previous post), and specifically by work on the evolution of diversity and complexity, I have decided to post one of my unsuccessful research grant applications as an open research idea. This is an application to the Leverhulme Trust that did not get past the first stage a couple of years ago. I really like this idea, but will never have the time to do it by myself, nor am I likely to find a suitable funder. So I am putting the idea into the public domain: after all, it does no one any good sitting on my hard disk. If you or someone you know would like to pick up on the idea, collaborate with me, maybe even come to work with me (perhaps through a Marie Curie application?) or even work completely independently of me, here it is.

Evolution of unbounded novelty and diversity using computer models of metabolism

Evolution has led to a continuous emergence of novel species, resulting in the diversity and complexity of life that we observe today. It is commonly presumed that the conditions set out by Darwin, of diversity, heredity and selection, are sufficient to explain this emergence. However, this cannot be tested on laboratory timescales, and computer simulations of evolution have been unsuccessful in producing an unending progression of novel, diverse and increasingly complex species, referred to as open-ended evolution (1). The formulation and validation of necessary and sufficient conditions for open-ended evolution is one of the biggest unsolved problems in biology (2).

We aim to address this research gap. Our hypothesis is that to obtain open-ended evolution, there must be positive feedback from the development of novelty in one species leading to the construction of new niches for other species to exploit. We propose that this positive feedback is a missing component from the current formulation of the theory of evolution. We will test this hypothesis using a computer evolution approach, by building a simulation of evolvable artificial single-cell organisms.

The novelty of our proposed approach is the use of real biochemistry to provide a rich and varied context for evolution. In the simulations, different ‘species’ of ‘organism’ will be distinguished because they possess different sets of enzymes. To survive and reproduce, organisms will be required to produce certain quantities and proportions of key chemicals: fatty acids, amino acids, nucleic acids and carbohydrates. Organisms will live in spatial patches, grow on available nutrients, and die to release chemicals that can be reused by other organisms. Fitness in the simulations will be identical to fitness in real biology: species that are better able to survive and reproduce are fitter than those that are not. Diversity will be driven by the gain or loss of enzymes, enabling organisms to use different resources and/or manufacture different biochemicals.
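
As a rough illustration of the idea that species are distinguished by their enzyme sets and that diversity arises through the gain or loss of enzymes, here is a minimal Python sketch. The enzyme names, the mutation probabilities and the rule that an organism keeps at least one enzyme are all assumptions made for the example, and there is no selection or chemistry here: it only shows how enzyme-set mutation generates distinct 'species'.

```python
import random

# Hypothetical enzyme identifiers; a real run would draw these from a
# reaction database such as KEGG or MetaCyc.
ENZYME_POOL = ["E1", "E2", "E3", "E4", "E5", "E6"]


def mutate(enzymes, rng, p_gain=0.5):
    """Return a new enzyme set with one enzyme gained or one lost."""
    enzymes = set(enzymes)
    missing = sorted(set(ENZYME_POOL) - enzymes)
    can_gain = bool(missing)
    can_lose = len(enzymes) > 1  # assumption: keep at least one enzyme
    if can_gain and (not can_lose or rng.random() < p_gain):
        enzymes.add(rng.choice(missing))
    elif can_lose:
        enzymes.remove(rng.choice(sorted(enzymes)))
    return frozenset(enzymes)


def count_species(population):
    """Species are defined here purely by their enzyme set."""
    return len(set(population))


rng = random.Random(0)
population = [frozenset({"E1", "E2"})] * 20
for generation in range(50):
    # Each organism leaves one offspring; a small fraction mutate.
    population = [mutate(org, rng) if rng.random() < 0.1 else org
                  for org in population]
print(count_species(population))
```

In the proposed simulations the interesting part would be what this sketch leaves out: whether a new enzyme set survives depends on the chemicals available in its patch, which in turn depend on what other organisms produce.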

Our approach is radically different from previous attempts to evolve open-ended novelty. Laboratory approaches, commonly using the bacterium Escherichia coli, have been successful in evolving novel carbon source utilization (3) and diversification into two sub-types (4, 5). These results are exciting. However, experimental approaches are fundamentally limited because it is impossible to study millions of years of evolution in a laboratory setting. We cannot make broader generalizations from these results.

Computer simulations allow study of evolution on longer timescales and under more focussed conditions than are possible in the laboratory. The most celebrated examples are Tierra (6) and Avida (7). However, these do not exhibit open-ended evolution, but convergence to a small number of species (8). Other approaches evolve models that already contain complex building blocks, such as neural controllers or locomotive mechanisms (9, 10). Even though these approaches have evolved novelty, they start with considerable granularity and complexity. It is not clear to what extent these systems allow for unbounded novelty beyond the complexity inherent in the components. Moreover, all of these approaches lack the reusability of compounds and functional plasticity of enzymes present in real chemistry, which underpins biological evolution.

Although it is tempting to perceive the diversity of life in terms of the range of plants and animals familiar to us, single-cell organisms (bacteria and archaea) account for over 95% of genetic diversity (11). Even the bacterial species E. coli contains more genetic diversity than that which distinguishes humans from social amoebae (12). Single-cell organisms differ not by their developmental complexity (limbs, organs etc.), but their metabolic complexity: their capacity to use different biochemical sources to provide energy and reproduce. Thus we propose that it is sufficient to consider the richness of the biochemical world of single-cell creatures to obtain open-ended evolution. The positive feedback required by our hypothesis arises because the emergence of a new metabolic pathway in one species, leading to biosynthesis of novel compounds, provides opportunities for other species to evolve to exploit those compounds.

The use of an appropriately rich model chemistry is central to a successful simulation environment. A common approach is to use an artificial chemistry, such as string chemistries (13), or rich chemistries using molecular graphs (14, 15). A novelty of our approach is the use of real biochemistry. This bridges the gap between previous unrealistic computational approaches (6, 7) and real-life evolution (3-5). There are three advantages of moving closer to the biology. First, we know that biology has the open-ended evolution property that we seek to reproduce. Second, there are now considerable data available that we can utilize. Third, the results obtained will be more applicable to the biological world.

Preliminary research from DJS’s group on modelling evolution has focussed on the evolution of transcription regulation. We have found that networks evolve repressor functions and hierarchical regulation to control energy usage (16-19) and that basal expression is necessary to obtain realistic network evolution (17). These results will inform the architecture of the transcription regulatory elements in the proposed simulations.

Technical Programme of Work

  1. Compilation of biochemical compound and enzyme reaction database, to include all known biochemicals and reactions, using relevant sources including KEGG (20) and MetaCyc (21).
  2. Compilation of free energies of formation for each biochemical compound. Measured free energies are available in databases including BRENDA (22) and XPDB (23). Where measurements are unavailable, the group contribution method will be used to make suitable estimates (24, 25).
  3. Development of evolvable system for simulation of a lineage of organisms with a given set of enzymes, and given any input (local concentrations of biochemicals). Ordinary differential equations will be used to model both the metabolic reactions, and the transcription control mechanisms associated with expression of relevant enzymes.
  4. Development of spatial array that includes organisms and biochemical concentrations in each compartment, and appropriate levels of mixing between neighbouring spatial compartments.
  5. Development of evolutionary timescale simulation, to include mutations in enzyme sets and regulatory control, competition for resources, growth and death, and thus selection of most successful strains.
  6. Analysis for diversity and novelty in simulation results, using evolutionary activity statistics (8).
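
To make steps 3 and 4 a little more concrete, here is a minimal Python sketch of one metabolic reaction integrated as an ODE, with mixing between neighbouring spatial compartments. Everything specific in it is an assumption for illustration: a single mass-action reaction A -> B, patches arranged in a ring, an explicit Euler integrator, and the parameter names `k`, `mix` and `dt`. Transcription control, which the proposal would also model with ODEs, is left out.

```python
def step(patches, biomass, k=1.0, mix=0.1, dt=0.01):
    """Advance all patches by one explicit Euler step.

    patches: list of dicts {"A": conc, "B": conc}, arranged in a ring.
    biomass: amount of enzyme-carrying biomass in each patch.
    """
    n = len(patches)
    new = []
    for i, patch in enumerate(patches):
        left, right = patches[(i - 1) % n], patches[(i + 1) % n]
        rate = k * biomass[i] * patch["A"]  # mass-action A -> B
        updated = {}
        for chem, sign in (("A", -1.0), ("B", 1.0)):
            # Mixing with the two neighbouring compartments.
            diffusion = mix * (left[chem] + right[chem] - 2.0 * patch[chem])
            updated[chem] = patch[chem] + dt * (sign * rate + diffusion)
        new.append(updated)
    return new


# Three patches; only the first holds nutrient A and organisms.
patches = [{"A": 1.0, "B": 0.0}, {"A": 0.0, "B": 0.0}, {"A": 0.0, "B": 0.0}]
biomass = [1.0, 0.0, 0.0]
for _ in range(1000):
    patches = step(patches, biomass)
total = sum(p["A"] + p["B"] for p in patches)
print(round(total, 6), [round(p["B"], 3) for p in patches])
```

The product B made in the first patch spreads into its neighbours, which is the ingredient the hypothesis needs: a compound produced by one organism becomes a resource available elsewhere. A serious implementation would use a proper ODE solver and enzyme kinetics constrained by the free energies compiled in step 2.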


1. Bedau, M.A. 2008. In S. Bullock et al. (eds.) Artificial Life XI: Proceedings of the Eleventh International Conference on the Simulation and Synthesis of Living Systems, p. 750. MIT Press, Cambridge, MA.

2. Bedau, M.A. et al. 2000. Artificial Life 6: 363–376.

3. Blount, Z.D. et al. 2008. Proc Natl Acad Sci USA 105:7899-7906.

4. Rozen, D.E. and Lenski, R.E. 2000. Am Nat. 155: 24-35.

5. Rozen, D.E. et al. 2009. Ecol Lett. 12:34-44.

6. Ray, T.S. 1992. In C.G. Langton et al. (eds.) Artificial Life II, pp. 371–408. Addison-Wesley, Redwood City, CA.

7. Ofria, C. and Wilke, C.O. 2004. Artificial Life 10:191-229.

8. Bullock, S. and Bedau, M.A. 2006. Artificial Life 12: 1–5.

9. Channon, A. 2001. In J. Kelemen and P. Sosik (eds.) Advances in Artificial Life 2159, pp. 417-426. Springer-Verlag.

10. Turk, G. 2010. In Hellerman, H. et al. (eds.) Proceedings of the Twelfth International Conference on the Synthesis and Simulation of Living Systems, pp. 496-503. MIT Press.

11. Ciccarelli F.D. et al. 2006. Science 311:1283-1287.

12. Lukjacenko, O. et al. 2010. Microb Ecol. 60: 708-20.

13. Hickinbotham, S. et al. 2010. In Hellerman, H. et al. (eds.) Proceedings of the Twelfth International Conference on the Synthesis and Simulation of Living Systems, pp. 24-31. MIT Press.

14. Benko, G. et al. 2003. J Chem Inf Comput Sci 43: 1085–93.

15. Ullrich, A. et al. 2011. Artificial Life 17: 87-108.

16. Jenkins, D.J. and Stekel, D.J. 2010. Journal of Molecular Evolution 71: 128-40.

17. Jenkins, D.J. and Stekel, D.J. 2010. Journal of Molecular Evolution 70: 215-231.

18. Jenkins, D.J. and Stekel, D.J. 2009. Artificial Life 15: 259-91.

19. Stekel, D.J. and Jenkins, D.J. 2008. BMC Systems Biology 2: 6.

20. Kanehisa, M. et al. 2008. Nucleic Acids Res. 36: D480-484.

21. Caspi, R. et al. 2008. Nucleic Acids Res. 36: D623–631.

22. Scheer, M. et al. 2011. Nucleic Acids Res. 39: D670-676.

23. Goldberg, R.N. et al. 2004. Bioinformatics 20: 2874-2877.

24. Jankowski, M.D. et al. 2008. Biophys. J. 95: 1487-99.

25. http://milolab.webfactional.com

Modelling Biological Evolution 2013: Conference Highlights

Over the last couple of days I have been attending the Modelling Biological Evolution conference at the University of Leicester organized by Andrew Morozov.

For me, the most interesting theme to have emerged is work on evolutionary branching: conditions under which polymorphisms (or even speciation) might arise. These were all talked about in the context of mathematical models (ODE-type formulations based on generalized Lotka-Volterra systems). The best talk I attended was by Andrew White (Heriot-Watt University). He described various systems of parasite-host co-evolution, the most interesting of which demonstrated increases in diversity: a new host could emerge that was resistant to current parasites, following which a new parasite could emerge that would infect that host. He rather nicely linked that work to experimental work from Mike Brockhurst (University of York) on phage infections of bacteria showing similar patterns. The results could of course be interpreted at a speciation level, or, probably more fairly, at the level of molecular diversification (e.g. of MHC types in an immune system). What I really appreciated about this result is that it spoke to the idea that increased diversity can result through a positive feedback mechanism: diversification leads to new niches and thus the potential for further diversification. I have thought for some time that this is the most important mechanism that drives diversification / speciation in natural systems and it was nice to see an example of the mechanism in action.

The other talk I particularly appreciated on the subject was by Claus Rueffler (University of Vienna). He spoke about a result on complexity and diversity in Doebeli and Ispolatov 2010 that also contains this feedback idea. This paper relies on a specific model to obtain its result on conditions for evolutionary branching. Rueffler demonstrated general conditions under which branching might take place that depend only upon the properties of the Hessian matrix associated with key parameters in model space. The important point is that the analysis is model-independent: it only considers the properties of the model forms needed to obtain the result.

Similar ideas were presented by Eva Kisdi (University of Helsinki). She focussed on models that include evolutionary trade-offs (e.g. between virulence and transmissibility): her point was that instead of choosing a function and analyzing its consequences, one could consider desired properties of a model (e.g. branching or limit cycles) and then use “critical function analysis” to derive conditions for possible trade-off functions that would admit the desired behaviour. Eva made the important point that many models make ad hoc choices of functions and thus lead to ad hoc results of little predictive value.

I think Eva’s point really touched on some of the weaknesses that emerged in many of the talks that I attended: there was a great deal of theory (some of which was very good), but very little interface with real biological data. I find this somewhat surprising: modelling in ecology and evolution has been around for very much longer than modelling in, say, molecular biology (where I currently work), and yet seems to be less mature. I think that the field would really benefit from far greater interaction between theoretical and experimental researchers. Ideally, models should be looking to generate empirically falsifiable hypotheses.

Perhaps the most entertaining talks were given by Nadav Shnerb and David Kessler (both Bar-Ilan University). Nadav’s first talk was about power-law-like distributions observed in genus/species distributions. Core to his work is Stephen Hubbell’s neutral theory of biodiversity. Nadav showed that distributions of the number of species within genera could be explained by a neutral model for radiation at the genus and species level, coupled with extinction. Nadav’s most important point was that if you wish to make an argument that a certain observed trait is adaptive, then you have to rule out the null hypothesis that it could arise neutrally through mutation/drift. I hope that is something we addressed with regard to global regulators in gene regulatory networks in Jenkins and Stekel 2010. David also spoke about biodiversity distributions, showing that adaptive forces could explain biodiversity data (they are generally poor at this due to the competitive exclusion that occurs in many models) if the fitness trait is allowed a continuous rather than discrete distribution.

Nadav’s second talk was about first names of babies. This was very interesting – especially as I have a young family (and a daughter with a very old-fashioned name). He looked at the error distribution (easily shown to be binomial-like noise proportional to the square root of the mean) that is superimposed on a deterministic increase and decrease in popularity of a name over a 60-year period. His thesis was that the error distribution due to external events would be proportional to the mean (not its square root), and, as only 5 names in his data set (Norwegian names from roughly the 20th century) did not fit binomial noise, he ruled out external events (e.g. celebrity) as being a major driver. The problem I have with this is that he didn’t rule out external events in the deterministic part of the data (e.g. initiating a rise in popularity of a name that then follows the deterministic feedback law he proposed).
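
The distinction between the two kinds of noise is easy to see in a toy simulation. The Python sketch below is my own illustration, not Nadav's analysis: the number of births, the name frequencies and the 10% relative size of the external shock are all made-up assumptions. Sampling noise comes from each baby independently receiving the name with probability p; external-event noise is modelled as a random multiplicative rescaling of the expected count.

```python
import random
import statistics


def demographic_counts(n_births, p, reps, rng):
    """Each baby independently receives the name with probability p."""
    return [sum(rng.random() < p for _ in range(n_births)) for _ in range(reps)]


def shocked_counts(mean, rel_sd, reps, rng):
    """An external event rescales the expected count multiplicatively."""
    return [mean * (1.0 + rng.gauss(0.0, rel_sd)) for _ in range(reps)]


rng = random.Random(42)
n_births = 2000
for p in (0.01, 0.04, 0.16):
    mean = n_births * p
    sd_demo = statistics.pstdev(demographic_counts(n_births, p, 300, rng))
    sd_shock = statistics.pstdev(shocked_counts(mean, 0.1, 300, rng))
    print(f"mean={mean:6.0f}  demographic sd/sqrt(mean)={sd_demo / mean ** 0.5:.2f}"
          f"  shock sd/mean={sd_shock / mean:.2f}")
```

For the binomial case the standard deviation divided by the square root of the mean stays close to 1 (it is sqrt(1 - p) in expectation), whereas for the shock model it is the standard deviation divided by the mean that stays roughly constant. That difference in scaling is what allows the two sources of noise to be told apart in the data.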