Speaking at Biolog Phenotype Microarray Conference

Today I will be presenting at the Biolog Phenotype Microarrays conference at the University of Florence. Yesterday’s meeting was very good, with many interesting talks on different applications of Biolog arrays, including use of different software (mainly ductApe and opm). A highlight was meeting one of the lead developers of opm, Lea Vaas.

My talk will combine some of Matthias’s results on fitting models to Biolog data, with a demonstration of Mike’s HiPerFit software. Matthias’s work is nearly published (just waiting for a referee to agree to the last minor edit) and Mike’s software is nearly finished (a few bugs to iron out and a few guides to write) so it is all very exciting. The live demo depends on everything working just fine! It worked last night in my hotel room…

Three Posters at Biometals 2012 in Brussels

Together with Hiroki Takahashi, Jon Hobman and Selina Clayton we are attending the Biometals 2012 conference in Brussels. We have three posters between us:

Crossland, R.C., Hobman, J.L. and Stekel, D.J. Mathematical Modelling of Mercury Resistance.

Takahashi, H., Oshima, T., Clayton, S.R., Hobman, J.L., Tobe, T., Kanaya, S., Ogasawara, N. and Stekel, D.J. Mathematical modelling towards understanding of zinc homeostasis in Escherichia coli.

Clayton, S.R., Patel, M.D., Constantinidou, C., Oshima, T., Takahashi, H., Heurlier, K., Stekel, D.J., Hobman, J.L. The role of zinc uptake regulator, Zur, in pathogenic and non-pathogenic Escherichia coli.

These are posters 53, 54 and 55 so if you are in the area or at the conference please do look us up.

Recent Advances in Statistical Inference for Mathematical Biology – report on MBI workshop

Today saw the end of the workshop at MBI on Recent Advances in Statistical Inference for Mathematical Biology. It has been a very enjoyable and thought-provoking workshop – definitely well worth the visit. My own talk received a good number of questions and plenty of interesting discussion. It was definitely at the more ‘applied’ end of the talks given; many of the talks described new methodologies and it is these that were particularly useful.

Perhaps the most interesting feature to emerge from this workshop is the work on identifiability or estimability of the parameters: it is the four talks most focussed on this topic that I will review very briefly below. The difference between these two terms is as follows. Non-identifiability of parameters is a structural issue: no amount of additional data could help. Non-estimability is a feature of the model and the data together: the parameters cannot be estimated from the data at hand, but perhaps with different data they could be. This issue has certainly become an important concern in our own work: situations in which the Markov chain is unable to provide meaningful estimates for one or more parameters. On one level, this is useful, indeed it is one of the reasons why we are using these approaches: if we cannot estimate two parameters but could estimate (say) the ratio of two parameters then we want to know that, and the joint posterior distributions give that information. But in other cases it is holding us back: we have inference schemes that do not converge for one or more parameters, limiting our capacity to make scientific inductions, and we need good methods both to diagnose a problem and to suggest sensible resolutions.
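To make the ratio example concrete, here is a minimal sketch of structural non-identifiability. It assumes a hypothetical model (not one of our own) in which the observed steady-state level depends only on the ratio of a production rate `k` to a degradation rate `d`; the data values and the noise level `sigma` are made up for illustration.

```python
import math

def log_likelihood(k, d, data, sigma=0.1):
    """Gaussian log-likelihood for a hypothetical model whose mean is k / d."""
    mean = k / d
    return sum(
        -0.5 * ((y - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for y in data
    )

data = [1.9, 2.1, 2.05, 1.95]  # made-up observations scattered around 2

# Scaling k and d by the same factor leaves the ratio, and so the likelihood, unchanged.
same_ratio = [log_likelihood(c * 2.0, c * 1.0, data) for c in (0.1, 1.0, 10.0, 100.0)]

# Changing the ratio itself does change the likelihood.
different_ratio = log_likelihood(3.0, 1.0, data)

print(same_ratio)
print(different_ratio)
```

Because every pair with the same ratio is equally likely, more data of the same kind cannot separate `k` from `d`; only the ratio is informed by the data.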

Two talks discussed approaches to simulations based on the geometric structure of the likelihood space. Mark Transtrum’s talk considered Riemannian geometric approaches to search optimization. The solution space often describes a manifold in data coordinates that can have a small number of ‘long’ dimensions and many ‘narrow’ dimensions. The issue he was addressing is that of long canyons of ‘good’ solutions that are difficult for a classical MCMC or optimization scheme to follow. Interestingly, this leads to the classical Levenberg-Marquardt algorithm that allows optimal and rapid searching along the long dimensions – and Mark described an improvement to the algorithm. In discussions afterwards, he also mentioned that following geodesics along the narrow dimensions to the manifold boundary can help identify combinations of parameters that cannot be estimated well from the data. Mark’s paper is Transtrum, M.K. et al. 2011. Phys. Rev. E. 83, 036701.

Similar work was described by Ben Calderhead. He described work trying to do inference on models with oscillatory dynamics, leading to difficult multi-modal likelihood functions. The approach was also to use a Riemannian-manifold MCMC, combined with running chains at parallel temperatures that give different weights to the (difficult) likelihood relative to the (smooth) prior. The aim again is to follow difficult ridges in the solution space, while also being able to escape and explore other regions. Ben’s methodological paper is Girolami, M. and Calderhead, B. 2011. J. Roy. Stat. Soc. B 73: 123-214.
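The tempering idea can be sketched in a few lines. This is only an illustration under stated assumptions: it uses plain random-walk proposals rather than the Riemannian-manifold proposals from the talk, and the bimodal likelihood, the prior, the temperature ladder `betas` and the step size are all invented for the example.

```python
import math
import random

def log_likelihood(x):
    """Made-up bimodal likelihood: equal mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)  # log-sum-exp trick to avoid underflow
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))

def log_prior(x):
    """Smooth, wide prior: N(0, 5^2), up to a constant."""
    return -0.5 * (x / 5.0) ** 2

def accept(rng, log_ratio):
    return rng.random() < math.exp(min(0.0, log_ratio))

def parallel_tempering(n_iter=20000, betas=(1.0, 0.5, 0.2, 0.05), step=1.0, seed=1):
    rng = random.Random(seed)
    states = [-3.0] * len(betas)  # every chain starts in the left-hand mode
    logliks = [log_likelihood(x) for x in states]
    cold_samples = []
    for _ in range(n_iter):
        # Within-chain update: chain i targets prior(x) * likelihood(x) ** betas[i].
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            prop_ll = log_likelihood(proposal)
            log_ratio = (log_prior(proposal) + beta * prop_ll) - (
                log_prior(states[i]) + beta * logliks[i]
            )
            if accept(rng, log_ratio):
                states[i], logliks[i] = proposal, prop_ll
        # Swap move between a random adjacent pair of temperatures (priors cancel).
        i = rng.randrange(len(betas) - 1)
        log_swap = (betas[i] - betas[i + 1]) * (logliks[i + 1] - logliks[i])
        if accept(rng, log_swap):
            states[i], states[i + 1] = states[i + 1], states[i]
            logliks[i], logliks[i + 1] = logliks[i + 1], logliks[i]
        cold_samples.append(states[0])  # betas[0] == 1 is the posterior of interest
    return cold_samples

cold = parallel_tempering()
frac_right = sum(x > 0 for x in cold) / len(cold)
print(f"fraction of cold-chain samples in the right-hand mode: {frac_right:.2f}")
```

The hot chains (small `beta`) see a flattened likelihood and cross between modes easily; the swap moves pass those crossings down to the cold chain, which would otherwise tend to stay in the mode where it started.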

A very different approach was described by Subhash Lele. Here, the issue is diagnosing estimability and convergence of a chain using a simple observation: if you imagine ‘cloning’ the data, i.e. repeating the inference using two or more copies (N say) of your original data, then the more copies of the data you use, the more the process will converge to the maximum likelihood estimate. Fewer copies will weight the prior more. This means that if all is working well: (i) as N increases, the variance of the posterior should decrease; (ii) if you start with different priors, then as N increases, the posteriors should become more similar. If these do not happen, then you have a problem. The really nice thing about this approach is that it is very easy to explain and implement: methods based on Riemannian geometry are not for the faint-hearted and can only really be used by people with a strong mathematical background; data cloning methods are more accessible! Subhash’s papers on data cloning can be downloaded from his web site.
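Both diagnostics can be seen in a toy case where the posterior is available in closed form. The sketch below assumes a conjugate normal model with known noise variance, chosen purely for illustration and not taken from Subhash’s papers; the data values and the two priors are made up.

```python
def cloned_posterior(data, n_clones, prior_mean, prior_var, noise_var=1.0):
    """Posterior mean and variance of a normal mean after cloning the data n_clones times."""
    n = len(data) * n_clones
    total = sum(data) * n_clones
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + total / noise_var) / precision
    return mean, 1.0 / precision

data = [4.8, 5.3, 5.1, 4.6, 5.2]  # made-up observations
mle = sum(data) / len(data)       # maximum likelihood estimate of the mean

results = []
for n_clones in (1, 2, 4, 8, 16, 32):
    # Two deliberately different priors, centred at 0 and at 10.
    mean_a, var_a = cloned_posterior(data, n_clones, prior_mean=0.0, prior_var=1.0)
    mean_b, var_b = cloned_posterior(data, n_clones, prior_mean=10.0, prior_var=1.0)
    results.append((n_clones, mean_a, mean_b, var_a))
    print(f"N={n_clones:2d}  mean A={mean_a:.3f}  mean B={mean_b:.3f}  variance={var_a:.4f}")

print(f"MLE = {mle:.3f}")
```

As the number of clones grows, the posterior variance shrinks and the two posterior means close in on the maximum likelihood estimate from either side. For a parameter that is not estimable, one would expect at least one of these two behaviours to fail.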

Finally, another approach to identifiability was described by Clemens Kreutz. He described ways of producing confidence intervals for parameters that involved following individual parameters and then re-optimizing for the other parameters. Although more computationally intensive, this looks useful for producing more reliable estimates both of parameter and model fit variability. Clemens’s work is available at http://arxiv.org/abs/1107.0013.
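The general idea of fixing one parameter and re-optimizing the others is usually called a profile likelihood. The sketch below is a textbook illustration rather than the specific method from Clemens’s paper: it assumes a normal model in which the mean is the parameter of interest and the variance is the nuisance parameter, so the re-optimization step has a closed form. The data are made up, and the threshold of 1.92 is half the 95% point of a chi-squared distribution with one degree of freedom (about 3.84).

```python
import math

def profile_log_likelihood(mu, data):
    """Fix mu, then re-optimise the nuisance variance (closed form for a normal model)."""
    n = len(data)
    sigma2_hat = sum((y - mu) ** 2 for y in data) / n  # best variance for this mu
    return -0.5 * n * (math.log(2 * math.pi * sigma2_hat) + 1.0)

def profile_interval(data, threshold=1.92, grid_size=2001):
    """Approximate 95% interval: all mu whose profile is within `threshold` of the maximum."""
    n = len(data)
    mu_hat = sum(data) / n
    best = profile_log_likelihood(mu_hat, data)
    spread = max(data) - min(data)
    # Scan a grid of candidate values centred on the maximum likelihood estimate.
    grid = [mu_hat - spread + 2 * spread * i / (grid_size - 1) for i in range(grid_size)]
    inside = [mu for mu in grid if best - profile_log_likelihood(mu, data) <= threshold]
    return mu_hat, min(inside), max(inside)

data = [4.8, 5.3, 5.1, 4.6, 5.2]  # made-up observations
mu_hat, lower, upper = profile_interval(data)
print(f"estimate {mu_hat:.2f}, approximate 95% interval ({lower:.2f}, {upper:.2f})")
```

In a dynamical model the inner step would be a numerical optimization over all the other parameters at each grid point, which is where the extra computational cost comes from. A profile that stays flat in one or both directions is a sign that the parameter cannot be estimated from the data.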

There were also many more applied talks, covering a range of interesting applications and data, that I very much enjoyed. Barbel Finkenstadt gave a talk that included, in part, work carried out by Dafyd Jenkins, and I was filled with an up-welling of pride to see him doing so well! I also particularly appreciated Richard Boys’s honest attempt to build an inference scheme with a messy model and messy data and obtaining mixed results.

All-in-all, an enjoyable and interesting week, well worth the trip, and I look forward to following up on some interesting new methodologies.

Speaking at Workshop: Recent Advances in Statistical Inference for Mathematical Biology

Today I will be presenting at the Mathematical Biosciences Institute at Ohio State University, which this week is hosting the workshop Recent Advances in Statistical Inference for Mathematical Biology. I will be giving a talk about Hiroki’s work (abstract here and below), while Dorota will be presenting a poster about her work.

I am very excited about this workshop as it is the first to my knowledge to bring together mathematical modelling with statistical inference. In my view, this marriage is crucial to the future development of mathematical biology as a field.


Inferring the gap between mechanism and phenotype in dynamical models of gene regulation


Dynamical (differential equation) models in molecular biology are often cast in terms of biological mechanisms such as transcription, translation and protein-protein and protein-DNA interactions. However, most molecular biological measurements are at the phenotypic level, such as levels of gene or protein expression in wild type and chemically or genetically perturbed systems. Mechanistic parameters are often difficult or impossible to measure. We have been combining dynamical models with statistical inference as a means to integrate phenotypic data with mechanistic hypotheses. In doing so we are able to identify key parameters that determine system behaviour, and parameters with insufficient evidence to estimate, and thus make informed predictions for further experimental work. We are also able to use inferred parameters to build stochastic and multi-scale models to investigate behaviour at single-cell level. We apply these ideas to two systems in microbiology: global gene regulation in the antibiotic-resistance bearing RK2 plasmids, and zinc uptake and efflux regulation in Escherichia coli.


Matthias speaking at Computational Biology and Innovation PhD Symposium, Dublin

Today sees the start of the Computational Biology and Innovation PhD Symposium at University College, Dublin. Matthias Gerstgrasser will be giving a presentation in tomorrow’s (Wednesday’s) session.

Title and abstract are:

Parallelising Sequential Metropolis-Hastings: Implementing MCMC in multi-core and GPGPU environments.

Markov Chain Monte Carlo (MCMC) techniques have become popular in recent years to efficiently calculate complex posterior distributions in Bayesian statistics. In computational biology, these methods have a wide range of applications, and in particular lend themselves to parameter estimation in models of complex biological systems. The Metropolis-Hastings algorithm is one widely used routine in this context. (1)

Our research focuses on employing the computational power provided by multi-core CPUs and general-purpose graphics processing units (GPGPUs) to provide a speedup to the operation of this algorithm. Both multi-core and GPGPU architectures offer vast computing power compared to traditional single-core environments, but tapping into these resources presents additional complexities. Yet current computer systems increasingly rely on higher core counts rather than performance per core to provide improvements in computing power, a trend that is almost certain to continue in the future. While (2) provides a GPGPU algorithm applicable to Independent Metropolis-Hastings (IMH), a parallel implementation of general MH instances has proven difficult due to the inherently sequential nature of this algorithm. In our own research, we are investigating possible speedups in automated model fitting and parameter estimation in large phenotype arrays of brewer’s yeast and other microorganisms. Our findings, however, would be equally applicable to other problems in systems biology.

We show how for some types of target distributions we can leverage independence in the structure of these distributions in order to partially parallelise the running of the MH algorithm. We furthermore discuss how this approach can be implemented efficiently on both multi-core CPUs as well as in GPGPU environments. In both cases we divide the workload of computing the acceptance probability in the MH algorithm’s main loop among several threads. Furthermore, we replicate the remaining instructions of the loop among these threads as well in order to minimise overhead incurred by thread creation, synchronisation and deletion. More importantly, in GPGPU environments this modification greatly decreases data transfers between GPU and main memory. Both our implementations show a significant speedup over a single-threaded classical MH algorithm for computationally expensive target distributions. We discuss limitations of these implementations and necessary conditions for them to provide a measurable speedup over single-threaded implementations. 
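The structure of this approach can be sketched as follows, under several assumptions that are not from the abstract: the target is a normal likelihood with unknown mean and a flat prior, the data are simulated, and the work is split with Python’s `ThreadPoolExecutor`. Because of Python’s global interpreter lock, this pure-Python version shows only how the acceptance computation is divided among workers and should not be expected to run faster; it also does not replicate the rest of the loop across threads as the abstract describes.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def chunk_log_likelihood(theta, chunk):
    """Log-likelihood contribution of one chunk of independent N(theta, 1) observations."""
    return sum(-0.5 * (y - theta) ** 2 for y in chunk)

def log_target(theta, chunks, pool):
    """Independence lets the log-target be a sum of per-chunk terms computed in parallel."""
    parts = pool.map(lambda chunk: chunk_log_likelihood(theta, chunk), chunks)
    return sum(parts)  # flat prior assumed, so nothing else to add

def metropolis_hastings(data, n_iter=2000, n_workers=4, step=0.2, seed=0):
    rng = random.Random(seed)
    chunks = [data[i::n_workers] for i in range(n_workers)]  # one chunk per worker
    theta = 0.0
    samples = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:  # pool created once, reused
        current = log_target(theta, chunks, pool)
        for _ in range(n_iter):
            proposal = theta + rng.gauss(0.0, step)
            proposed = log_target(proposal, chunks, pool)
            # The accept/reject step itself stays sequential.
            if rng.random() < math.exp(min(0.0, proposed - current)):
                theta, current = proposal, proposed
            samples.append(theta)
    return samples

data_rng = random.Random(42)
data = [data_rng.gauss(2.0, 1.0) for _ in range(200)]  # simulated data, true mean 2
samples = metropolis_hastings(data)
posterior_mean = sum(samples[500:]) / len(samples[500:])
print(f"posterior mean {posterior_mean:.3f}, sample mean {sum(data) / len(data):.3f}")
```

Creating the pool once outside the main loop mirrors the point made above about avoiding repeated thread creation and deletion.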

In conclusion we compare the performance of parallelising a single instance of the MH algorithm with that of running several instances in parallel, either on a multi-core CPU or in a GPGPU environment. The latter approach is particularly applicable to the common situation of estimating e.g. parameters from a number of distinct, but similar, experiments. We show how GPGPU computing can be used in these situations to provide an even greater speedup compared to single-threaded implementations.

1. Wilkinson, D J. Stochastic Modelling for Systems Biology, 2006.
2. Jacob, P, Robert, CP, Smith, MH. 2011; arXiv:1010.1595v3.

Modelling and Microbiology – Conference at the eScience Institute

Tomorrow is the final meeting at the eScience Institute in Edinburgh: our conference on Modelling and Microbiology. Although I am a co-organizer, I am unable to attend for family reasons. We will be ably represented by Dorota Herman, who will be speaking on Tuesday at 3:05pm.

The entire meeting will be webcast and can be watched here. We have many good speakers, so I am posting the full speaker timetable below.

Monday July 4th

1.45-2 Welcome (Rosalind Allen)

2-3 Robert Austin (Princeton)
Darwin, ecology and the emergence of bacterial resistance: an
attempt at a synthesis

3.30-4.15 Martin Howard (John Innes Centre)
Dissecting the dynamics of low copy number plasmid segregation

4.15-5 Tobias Bollenbach (IST Austria)
Microbial responses to antibiotic combinations

Tuesday July 5th

9.30-10.30 Martin Ackermann (ETH Zuerich)
An evolutionary perspective on phenotypic heterogeneity in bacteria

11-11.45 Peter Lund (Birmingham)
Insights into stress response from laboratory-based evolution

11.45-12.05 Sara Mitri (Oxford)
Social evolution in microbial communities

12.05-12.25 Fatima Drubi (Leiden University)
Do bacteria sporulate as a bet-hedging strategy in stochastic

2-2.45 Alexander Morozov (Edinburgh)
Self-assembled bacterial rotors

2.45-3.05 Bartlomiej Waclaw (Edinburgh)
Simple models for bacterial evolution with migration

3.05-3.25 Dorota Herman (Birmingham)
Mathematical model for transcriptional regulation of RK2 plasmids and
its evolutional optimisation

3.55-4.40 Jan Kreft (Birmingham)
Individual-based modelling of horizontal gene transfer in chemostats
and biofilms

4.40-5.15 Phil Aldridge (Newcastle)
Continuous control of flagellar gene expression by the σ28-FlgM
regulatory circuit in Salmonella enterica

Wednesday July 6th

9.30-10.30 Oskar Hallatschek (MPI Goettingen)
Genetic drift and selection in growing biofilms

11-11.45 Ian Stansfield (Aberdeen)
Negative feedback loops in the translational control of gene

11.45-12.05 Leena Nieminen (Strathclyde)
Modelling metabolic switching in differentiating bacterium
Streptomyces coelicolor

12.05-12.25 David Richards (John Innes)
The mechanistic basis of hyphal branching in Streptomyces

2-2.45 Mamen Romano (Aberdeen)
The dynamics of demand and supply in mRNA translation

2.45-3.05 Dominique Chu (Kent)
Optimisation of gene expression resources in bioprocessing host cell

Thursday July 7th

9.30-10.30 John Little (U. Arizona)
Stochastic modelling of the phage lambda regulatory circuit: prophage
induction and stability

11.00-11.45 Francesco Falciani (Birmingham)
A systems biology approach sheds new light on bacterial acid

11.45-12.30 Kevin Foster (Oxford)
Spatiogenetic structure and cooperation in microbe

Poster at the 8th European Conference on Mathematical and Theoretical Biology

Dorota Herman is at the 8th European Conference on Mathematical and Theoretical Biology in Krakow this week where she is presenting a poster on her work. Poster abstract is below.

Dorota Herman, Chris Thomas and Dov Stekel

Evolutionary optimization of negative and co-operative autoregulation in RK2 plasmids

The central control operon of the RK2 plasmid is negatively and co-operatively autoregulated by dimers of two global plasmid regulators, KorA and KorB. Several roles for negative feedbacks in biosystems have been proposed by many researchers, and these roles include reduction of noise, increased robustness, speeding of response time and reducing burden on host. In this work, we seek to explain the evolutionary adaptation of the RK2 central control operon in terms of these proposed roles, using comparative analyses of the wild type system with a progression of simpler systems. We used a stochastic, multi-scale model that includes negative and co-operative gene autoregulation of the central control operon of the plasmid, plasmid replication and host cell growth and division. Keeping track of an RK2 plasmid line, we can observe the dynamics of protein abundance from entry of the plasmid into a naive host through to steady state. The comparative analyses between the regulation in models of the wild type central control operon and models with simpler, adequate architectures show a speed-up of response time and a decrease in burden for the host, indicated by a decrease in the number of produced mRNAs. In comparison, only minimal increases in robustness and reductions in internal noise in the steady state of the bacterial growth phase were observed in these analyses. We conclude that possible reasons for evolution of the complex negative feedback regulation of the RK2 central control operon are the optimization of fast response times and reduced burden to host, and that it is unlikely that this regulatory system has evolved to reduce noise or increase robustness.