New Publication: Reconstructing promoter activity from Lux bioluminescent reporters

Absolutely delighted to report that our paper has been published:

Iqbal M, Doherty N, Page AML, Qazi SNA, Ajmera I, Lund PA, Kypraios T, Scott DJ, Hill PJ and Stekel DJ (2017) Reconstructing promoter activity from Lux bioluminescent reporters. PLOS Computational Biology 13(9): e1005731.


The bacterial Lux system is used as a gene expression reporter. It is fast, sensitive and non-destructive, enabling high frequency measurements. Originally developed for bacterial cells, it has also been adapted for eukaryotic cells, and can be used for whole cell biosensors, or in real time with live animals without the need for euthanasia. However, correct interpretation of bioluminescent data is limited: the bioluminescence is different from gene expression because of nonlinear molecular and enzyme dynamics of the Lux system. We have developed a computational approach that, for the first time, allows users of Lux assays to infer gene transcription levels from the light output. This approach is based upon a new mathematical model for Lux activity, that includes the actions of LuxAB, LuxEC and Fre, with improved mechanisms for all reactions, as well as synthesis and turn-over of Lux proteins. The model is calibrated with new experimental data for the LuxAB and Fre reactions from Photorhabdus luminescens—the source of modern Lux reporters—while literature data has been used for LuxEC. Importantly, the data show clear evidence for previously unreported product inhibition for the LuxAB reaction. Model simulations show that predicted bioluminescent profiles can be very different from changes in gene expression, with transient peaks of light output, very similar to light output seen in some experimental data sets. By incorporating the calibrated model into a Bayesian inference scheme, we can reverse engineer promoter activity from the bioluminescence. We show examples where a decrease in bioluminescence would be better interpreted as a switching off of the promoter, or where an increase in bioluminescence would be better interpreted as a longer period of gene expression. This approach could benefit all users of Lux technology.

Author summary

Bioluminescent reporters are used in many areas of biology as fast, sensitive and non-destructive measures of gene expression. They have been developed for bacteria, adapted now for other kinds of organisms, and recently been used for whole cell biosensors, and for real-time live animal models for infection without the need for euthanasia. However, users of Lux technologies rely on the light output being similar to the gene expression they wish to measure. We show that this is not the case. Rather, there is a nonlinear relationship between the two: light output can be misleading and so limits the way that such data can be interpreted. We have developed a new computational method that, for the first time, allows users of Lux reporters to infer accurate gene transcription levels from bioluminescent data. We show examples where a small decrease in light would be better interpreted as promoter being switched off, or where an increase in light would be better interpreted as promoter activity for a longer time.


Thanks to all my brilliant collaborators and coauthors. Thanks also to the lovely referees (one of whom signed their review), who described the article as “an extremely important contribution to the field” (Reviewer 1) and “a significant advance” (Reviewer 2), and who provided helpful and constructive feedback.




New publication: A Bayesian approach to analyzing phenotype microarray data enables estimation of microbial growth parameters

Delighted to say that our first paper of 2016 is published. Matthias’s Biolog paper is now online-ready with the Journal of Bioinformatics and Computational Biology. This is also the first output from our Biolog grant – with a second paper detailing our newer software and analysis being planned.

Gerstgrasser M, Nicholls S, Stout M, Smart K, Powell C, Kypraios T and Stekel DJ. 2016. A Bayesian approach to analyzing phenotype microarray data enables estimation of microbial growth parameters. J Bioinform Comput Biol. DOI: 10.1142/S0219720016500074


Biolog phenotype microarrays (PMs) enable simultaneous, high throughput analysis of cell cultures in different environments. The output is high-density time-course data showing redox curves (approximating growth) for each experimental condition. The software provided with the Omnilog incubator/reader summarizes each time-course as a single datum, so most of the information is not used. However, the time courses can be extremely varied and often contain detailed qualitative (shape of curve) and quantitative (values of parameters) information. We present a novel, Bayesian approach to estimating parameters from Phenotype Microarray data, fitting growth models using Markov Chain Monte Carlo (MCMC) methods to enable high throughput estimation of important information, including length of lag phase, maximal “growth” rate and maximum output. We find that the Baranyi model for microbial growth is useful for fitting Biolog data. Moreover, we introduce a new growth model that allows for diauxic growth with a lag phase, which is particularly useful where Phenotype Microarrays have been applied to cells grown in complex mixtures of substrates, for example in industrial or biotechnological applications, such as worts in brewing. Our approach provides more useful information from Biolog data than existing, competing methods, and allows for valuable comparisons between data series and across different models.

Inferring the error variance in Metropolis-Hastings MCMC

One of the great joys of working with two talented post-docs in the research group – Mike Stout and Mudassar Iqbal – as well as a great collaboration with Theodore Kypraios, is that they are often one step ahead of me and I am playing catch-up. Recently, Theo has discussed with them how to estimate the error variance associated with the data used in Metropolis-Hastings MCMC simulations.

The starting point, usually, is that we have some data, let us say y_i for i=1, \ldots, n, and a model – usually, in our case, a dynamical system – which we are trying to fit to the data. For any given set of parameters \theta, our model will provide estimates for the data points that we will call \hat y_i. Now, assuming independent Gaussian errors with a common variance \sigma^2, our likelihood function L(\theta) looks like:

L(\theta) = \prod_{i=1}^n \frac{1}{\sqrt{2 \pi\sigma^2}}e^{-\frac{1}{2}(\frac{y_i - \hat y_i}{\sigma})^2}

where \sigma^2 is the error variance associated with the data. Now, when I first started using MCMC, I naively thought that we could use values for \sigma^2 provided by our experimental collaborators, and so we could use different values of \sigma^2 according to how confident our collaborators were in the measurements, equipment etc. What I found in practice was that these values rarely worked (in terms of convergence of the Markov chain) and we have had to make up error variances using trial and error.
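In code, it is usually safer to work with the logarithm of this likelihood, since the product of many small densities underflows quickly. The following is a minimal sketch of that calculation, not the code used for the simulations in this post; the function name `gaussian_log_likelihood` and its arguments are hypothetical.

```python
import math


def gaussian_log_likelihood(y, y_hat, sigma):
    """Log of L(theta) for independent Gaussian errors with common s.d. sigma.

    y     -- observed data points y_i
    y_hat -- model predictions \\hat y_i for the current parameters theta
    sigma -- error standard deviation (sigma^2 is the error variance)
    """
    n = len(y)
    # Sum of squared residuals between data and model predictions.
    ss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    # log L = -(n/2) log(2 pi sigma^2) - ss / (2 sigma^2)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - ss / (2.0 * sigma ** 2)
```

A fixed `sigma` here is exactly the quantity that had to be chosen by trial and error; the rest of the post is about sampling it instead.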

So I was delighted when I heard that Theo had briefed both Mike and Mudassar about a method for estimating the error variance as part of the MCMC. Since I have not tried it before, I thought I would give it a go. I am posting the theory and some of my simulation results, which I hope will be helpful.


The theory behind estimating \sigma^2 is as follows. First, set

\tau = \frac{1}{\sigma^2}

We can then re-write the likelihood, now for the model parameters \theta and also the unknown value \tau, as

L(\theta, \tau) = \frac{\tau^{n/2}}{(2 \pi)^{n/2}}e^{-\frac{\tau}{2}\sum_{i=1}^n(y_i - \hat y_i)^2}

Now observe that this has the functional form of a Gamma distribution for \tau, as the p.d.f. for a Gamma distribution is given by:

f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}

So if we set a prior distribution for \tau as a Gamma distribution with parameters \alpha and \beta, then the conditional posterior distribution for \tau is given by:

p(\tau | \theta) \propto \tau^{(n/2)+ \alpha - 1}e^{-\tau(\frac{1}{2}\sum_{i=1}^n(y_i - \hat y_i)^2+\beta)}

We observe that this is itself a Gamma distribution, with parameters \alpha' = \alpha + n/2 and \beta' = \beta + \frac{1}{2} \sum_{i=1}^n (y_i - \hat y_i)^2. Thus the parameter \tau can be sampled with a Gibbs step as part of the MCMC simulation (usually using Metropolis-Hastings steps for the other parameters).
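The Gibbs step itself is only a few lines. Below is a minimal sketch using Python's standard library; the function name `sample_tau` and the `rng` argument are hypothetical. One thing to watch is that `random.gammavariate` is parameterised by shape and scale, whereas \beta above is a rate, so the second argument has to be the reciprocal of the updated \beta.

```python
import random


def sample_tau(y, y_hat, alpha, beta, rng=random):
    """Draw tau = 1/sigma^2 from its conditional posterior Gamma distribution.

    alpha, beta -- shape and rate of the Gamma prior on tau
    rng         -- the random module or a random.Random instance
    """
    n = len(y)
    ss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    alpha_post = alpha + n / 2.0   # updated shape
    beta_post = beta + ss / 2.0    # updated rate
    # gammavariate(shape, scale): convert the rate to a scale with 1/rate.
    return rng.gammavariate(alpha_post, 1.0 / beta_post)
```

Each call yields one new value of \tau given the current model predictions; the corresponding error standard deviation is `1 / math.sqrt(tau)`.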


The simulations I have run are with a toy model that I use a great deal for teaching. Consider a constitutively-expressed protein that is produced at constant rate k and degrades (or dilutes) at constant rate \gamma per protein. A differential equation for protein concentration P is given by:

\frac{dP}{dt} = k - \gamma P

This ODE has the closed form solution:

P = \frac{k}{\gamma} + (P_0 - \frac{k}{\gamma}) e^{-\gamma t}

where P_0 is the concentration of protein at t=0. For the purposes of MCMC estimation, mixing is improved by setting P_1 = \frac{k}{\gamma} so that the closed form solution is:

P = P_1 + (P_0 - P_1) e^{-\gamma t}
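As a quick sanity check on the closed form and the reparameterisation, the sketch below evaluates the solution and compares a finite-difference slope with the right-hand side of the ODE. The function names and the numerical values of k, \gamma and P_0 are hypothetical, chosen only for illustration.

```python
import math


def protein(t, p0, p1, gamma):
    """Closed-form solution P(t) = P1 + (P0 - P1) exp(-gamma t)."""
    return p1 + (p0 - p1) * math.exp(-gamma * t)


def rhs(p, k, gamma):
    """Right-hand side of the ODE dP/dt = k - gamma P."""
    return k - gamma * p


# Illustrative values only (not estimates from any data).
k, gamma, p0 = 150.0, 0.06, 800.0
p1 = k / gamma  # steady-state level, the reparameterised quantity P_1

# A central finite difference of the closed form should match the ODE.
t, h = 10.0, 1e-5
slope = (protein(t + h, p0, p1, gamma) - protein(t - h, p0, p1, gamma)) / (2.0 * h)
print(slope, rhs(protein(t, p0, p1, gamma), k, gamma))
```

The two printed numbers should agree to several decimal places, and `protein(0, ...)` returns P_0 while large t approaches P_1.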

Some data I have used for teaching purposes come from the paper Kim, J.M. et al. 2006. Thermal injury induces heat shock protein in the optic nerve head in vivo. Investigative Ophthalmology and Visual Science 47: 4888-94. The data are quantitative Western blots of Hsp70 in the optic nerve of rats, as induced by laser damage. (Apologies for the unpleasantness of the experiment):

Time / hours Protein / au
3 1100
6 1400
12 1700
18 2100
24 2150

The aim is to use a Metropolis-Hastings MCMC, together with a Gibbs step for the \tau parameter, to fit the data. The issue that immediately arises is how to set the parameters \alpha and \beta. This may seem arbitrary, but it is already better than choosing a fixed value for \sigma^2, as the Gamma distribution allows that parameter to be explored. For my first simulation, I thought that \sigma = 100 would be sensible (this turned out to be a remarkably good choice, as we will see). So I set \alpha = 0.01 and \beta = 100, which gives a prior mean for \tau of \alpha / \beta = 10^{-4} = 1/100^2, and lo and behold, the whole MCMC worked beautifully. (Incidentally, I used independent Gaussian proposals for the other three parameters, with standard deviations of 100 for P_0 and P_1 and a standard deviation of 0.01 for \gamma. These parameters were forced to be positive – Darren Wilkinson has an excellent post on doing that correctly. Use of log-normal proposals in this case leads to very poor mixing, with the chain taking some large excursions for the P_1 and \gamma parameters).
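For readers who want to try this, here is a minimal sketch of such a sampler in plain Python. It is not the code used for the results in this post, and several choices are assumptions made for illustration: flat priors on positive values of P_0, P_1 and \gamma (so proposals with a non-positive value are simply rejected), a joint update of all three parameters, the starting values, the seed, the chain length and burn-in, and all function names.

```python
import math
import random
import statistics

# Hsp70 data from the table above (time in hours, protein in arbitrary units).
TIMES = [3.0, 6.0, 12.0, 18.0, 24.0]
PROTEIN = [1100.0, 1400.0, 1700.0, 2100.0, 2150.0]


def model(t, p0, p1, gamma):
    return p1 + (p0 - p1) * math.exp(-gamma * t)


def sum_sq(theta):
    p0, p1, gamma = theta
    return sum((y - model(t, p0, p1, gamma)) ** 2 for t, y in zip(TIMES, PROTEIN))


def run_chain(n_iter=20000, alpha=0.01, beta=100.0, seed=1):
    rng = random.Random(seed)
    n = len(PROTEIN)
    theta = [800.0, 2500.0, 0.07]   # assumed starting values for P0, P1, gamma
    step = [100.0, 100.0, 0.01]     # proposal standard deviations from the text
    tau = 1.0e-4                    # starting precision, i.e. sigma = 100
    ss = sum_sq(theta)
    samples, accepted = [], 0
    for _ in range(n_iter):
        # Metropolis-Hastings step with symmetric Gaussian proposals.
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(theta, step)]
        # Non-positive proposals have zero posterior density: reject them.
        if all(x > 0.0 for x in proposal):
            ss_new = sum_sq(proposal)
            # With tau held fixed, the log acceptance ratio is -tau/2 * (change in SS).
            log_ratio = -0.5 * tau * (ss_new - ss)
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                theta, ss = proposal, ss_new
                accepted += 1
        # Gibbs step: tau | theta ~ Gamma(alpha + n/2, rate = beta + SS/2).
        tau = rng.gammavariate(alpha + n / 2.0, 1.0 / (beta + ss / 2.0))
        samples.append((theta[0], theta[1], theta[2], tau))
    return samples, accepted / n_iter


samples, rate = run_chain()
kept = samples[5000:]  # discard an assumed burn-in period
p0_med, p1_med, gamma_med, tau_med = [statistics.median(col) for col in zip(*kept)]
print("acceptance rate:", rate)
print("medians:", p0_med, p1_med, gamma_med, "sigma:", 1.0 / math.sqrt(tau_med))
```

Because the proposals are symmetric and the priors on the three model parameters are taken to be flat, the acceptance ratio reduces to a likelihood ratio, and the \tau^{n/2} factor cancels since \tau is not changed in that step. Exact numbers will vary with the seed and chain length.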


The median parameter values are P_0 = 786, P_1 = 2526, \gamma = 0.0686 and \tau = 0.000122. The latter corresponds to \sigma = 90.6. With these values, we can see a good fit to the data: below are plotted the data points (in red), the best fit (with median parameter values) in blue, and model predictions from a random sample of 50 parameter sets from the posterior distribution in black.



However, one question obviously arises: how sensitive is this procedure to the choices of \alpha and \beta? I will confess: I use Bayesian approaches fairly reluctantly, being more comfortable with classical frequentist statistics. What I like about Bayesian approaches are firstly the description of unknown parameters with a probability distribution, and secondly the availability of highly effective computer algorithms (i.e. MCMC). What makes me uncomfortable is the potential for introducing bias through the prior distributions. So I have carried out some investigations with different values of \alpha and \beta. In particular, I wanted to know: (i) what happens if I keep the prior mean of \tau (equal to \alpha / \beta) the same but vary the parameters? (ii) what happens if I vary the mean of the distribution? The table below summarizes the results (median parameter values, with \sigma calculated from the median \tau):

alpha beta P0 P1 gamma sigma
0.01 100 786 2526 0.0686 90.6
1 10000 747 2428 0.0795 98.0
0.0001 1 797 2533 0.0681 96.3
0.1 10 822 2603 0.0623 97.9
0.001 1000 760 2455 0.0758 94.8
1 1 792 2539 0.0676 64.9
0 0 805 2565 0.0653 98.3

As you can see (please ignore the last line for now), the results are robust to a very wide range of \alpha and \beta, even producing a good estimate for \sigma when that estimate is a long way from the mean of the prior distribution. But then we can make the following observation. The prior enters the conditional posterior for \tau only by adding \alpha to n/2 and \beta to half the sum of squares. Consider the sum of squares for a ‘best-fit’ model, for example using the parameters for the first row (this is 12748). So as long as \alpha \ll n/2 and \beta \ll 12748/2, the prior will introduce very little bias. But if you try to use values of \alpha and especially \beta very much larger than an estimated sum of squares from well-fitted model parameters, then things might go wrong. For example, when I set \alpha = 1 and \beta = 10^6, my MCMC did not converge properly.
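The arithmetic behind this observation is easy to reproduce. The sketch below recomputes the sum of squares for the first-row parameters and shows how much of the updated \beta comes from the prior rather than from the data; the helper names `posterior_params` and `prior_share` are hypothetical, and the three priors listed are just examples taken from the text.

```python
import math

# Hsp70 data from the table above (time in hours, protein in arbitrary units).
TIMES = [3.0, 6.0, 12.0, 18.0, 24.0]
PROTEIN = [1100.0, 1400.0, 1700.0, 2100.0, 2150.0]


def sum_sq(p0, p1, gamma):
    """Sum of squared residuals for the closed-form model."""
    return sum((y - (p1 + (p0 - p1) * math.exp(-gamma * t))) ** 2
               for t, y in zip(TIMES, PROTEIN))


n = len(PROTEIN)
# Median parameter values from the first row of the table.
ss = sum_sq(786.0, 2526.0, 0.0686)


def posterior_params(alpha, beta):
    """Shape and rate of the conditional posterior for tau at this fit."""
    return alpha + n / 2.0, beta + ss / 2.0


def prior_share(beta):
    """Fraction of the posterior rate that is contributed by the prior."""
    return beta / (beta + ss / 2.0)


print("sum of squares:", round(ss))
for alpha, beta in [(0.01, 100.0), (1.0, 1.0e4), (1.0, 1.0e6)]:
    print(alpha, beta, posterior_params(alpha, beta), prior_share(beta))
```

When the prior share is close to 1, as for \beta = 10^6, the data have almost no say in the value of \tau, which fits with the poor convergence seen for that setting.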

This leads to my final point, and the final row in the table. Would it be possible to remove prior bias altogether? If you look at the conditional posterior for \tau, we observe that if we set \alpha = \beta = 0, we obtain a Gamma distribution whose mean is precisely the reciprocal of the mean squared residual, which is the usual estimate of the error variance, as, in this case,

\frac{\beta'}{\alpha'} = \frac{\sum_{i=1}^n(y_i - \hat y_i)^2}{n}

The algorithm should work perfectly well sampling from this Gamma distribution, and indeed it does, producing results comparable to those obtained with an informative prior.


In summary, I am happy to conclude that this method is good for estimating error variance. Clear advantages are:

  1. It is simple to implement and fairly fast to run – adding a Gibbs step is no big deal.
  2. It is clearly preferable to making up a fixed number for the error variance – which was what we were doing before.
  3. The prior parameters allow you to make use of information you might have from experimental collaborators on likely errors in the data.
  4. The level of bias from the priors is relatively low, and can be eliminated altogether.