New publication: A Bayesian approach to analyzing phenotype microarray data enables estimation of microbial growth parameters

Delighted to say that our first paper of 2016 is published. Matthias’s Biolog paper is now online-ready with the Journal of Bioinformatics and Computational Biology. This is also the first output from our Biolog grant – with a second paper detailing our newer software and analysis being planned.

Gerstgrasser M, Nicholls S, Stout M, Smart K, Powell C, Kypraios T and Stekel DJ. 2016. A Bayesian approach to analyzing phenotype microarray data enables estimation of microbial growth parameters. J Bioinform Comput Biol. DOI: 10.1142/S0219720016500074

Abstract

Biolog phenotype microarrays (PMs) enable simultaneous, high throughput analysis of cell cultures in different environments. The output is high-density time-course data showing redox curves (approximating growth) for each experimental condition. The software provided with the Omnilog incubator/reader summarizes each time-course as a single datum, so most of the information is not used. However, the time courses can be extremely varied and often contain detailed qualitative (shape of curve) and quantitative (values of parameters) information. We present a novel, Bayesian approach to estimating parameters from Phenotype Microarray data, fitting growth models using Markov Chain Monte Carlo (MCMC) methods to enable high throughput estimation of important information, including length of lag phase, maximal “growth” rate and maximum output. We find that the Baranyi model for microbial growth is useful for fitting Biolog data. Moreover, we introduce a new growth model that allows for diauxic growth with a lag phase, which is particularly useful where Phenotype Microarrays have been applied to cells grown in complex mixtures of substrates, for example in industrial or biotechnological applications, such as worts in brewing. Our approach provides more useful information from Biolog data than existing, competing methods, and allows for valuable comparisons between data series and across different models.
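
To give a flavour of the approach in the abstract, here is a minimal sketch, not the code from the paper. It assumes the standard log-scale form of the Baranyi-Roberts curve (the paper's parameterization may differ) and fits it to synthetic data with a plain random-walk Metropolis sampler. The function names, flat prior ranges, proposal scales, noise level and "true" parameter values are all illustrative assumptions.

```python
import math
import random


def baranyi(t, y0, ymax, mu, lag):
    """Baranyi-Roberts growth curve (standard log-scale form).

    y0: starting level, ymax: plateau, mu: maximal rate, lag: lag duration.
    """
    h0 = mu * lag
    # Adjustment function A(t): approaches t - lag once the lag phase is over.
    a = t + math.log(math.exp(-mu * t) + math.exp(-h0) - math.exp(-mu * t - h0)) / mu
    return y0 + mu * a - math.log1p(math.expm1(mu * a) * math.exp(y0 - ymax))


def log_posterior(theta, times, signal, sigma):
    """Gaussian log-likelihood plus flat priors on assumed, bounded ranges."""
    y0, ymax, mu, lag = theta
    in_prior = (0.0 <= y0 < ymax <= 20.0) and (0.01 <= mu <= 5.0) and (0.0 <= lag <= max(times))
    if not in_prior:
        return -math.inf
    sse = sum((y - baranyi(t, y0, ymax, mu, lag)) ** 2 for t, y in zip(times, signal))
    return -0.5 * sse / sigma ** 2


def metropolis(times, signal, start, steps, sigma, seed=0):
    """Random-walk Metropolis over (y0, ymax, mu, lag); start must lie inside the priors."""
    rng = random.Random(seed)
    scales = (0.02, 0.02, 0.01, 0.1)  # proposal standard deviations (hand-picked)
    current = list(start)
    current_lp = log_posterior(current, times, signal, sigma)
    samples = []
    for _ in range(steps):
        proposal = [v + rng.gauss(0.0, s) for v, s in zip(current, scales)]
        proposal_lp = log_posterior(proposal, times, signal, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(tuple(current))
    return samples


# Synthetic time course: one reading per hour for 48 hours, with Gaussian noise.
truth = (0.5, 5.0, 0.4, 6.0)  # y0, ymax, mu, lag (made-up values)
noise = random.Random(1)
times = [float(h) for h in range(49)]
signal = [baranyi(t, *truth) + noise.gauss(0.0, 0.1) for t in times]

start = (0.3, 4.5, 0.3, 4.0)
samples = metropolis(times, signal, start, steps=6000, sigma=0.1)

# Discard the first 2000 draws as burn-in and summarize the rest.
kept = samples[2000:]
means = [sum(s[i] for s in kept) / len(kept) for i in range(4)]
print("posterior means (y0, ymax, mu, lag):", [round(m, 3) for m in means])
```

The posterior draws give an estimate of lag length, maximal rate and maximum output together with their uncertainty, which is what makes comparisons between data series possible. A real analysis would also estimate the noise level, tune the proposals and check convergence.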

Speaking at Biolog Phenotype Microarray Conference

Today I will be presenting at the Biolog Phenotype Microarrays conference at the University of Florence. Yesterday’s meeting was very good, with many interesting talks on different applications of Biolog arrays, including use of different software (mainly ductApe and opm). A highlight was meeting one of the lead developers of opm, Lea Vaas.

My talk will combine some of Matthias’s results on fitting models to Biolog data, with a demonstration of Mike’s HiPerFit software. Matthias’s work is nearly published (just waiting for a referee to agree to the last minor edit) and Mike’s software is nearly finished (a few bugs to iron out and a few guides to write) so it is all very exciting. The live demo depends on everything working just fine! It worked last night in my hotel room…

Biolog data analysis software training events

Are you making the most of your Biolog data?

Our new HiPerFit software allows you to

  • Fit suitable kinetic models to Biolog data
  • Identify key parameters e.g. lag phase or maximal growth rate
  • Identify the best model for your data
  • Visualize, compare and contrast results
  • Run analysis in high throughput

Learn to use our software at two free training events

  • Tuesday 30th June at the University of Leicester
  • Friday 24th July at the University of Surrey

At the event you will learn how to

  • Install and run the Biolog data visualizer on your own laptop
  • Ensure that your data is formatted for the Biolog data analyzer
  • Launch a high throughput analysis run
  • Use the visualizer to get the most out of your data
  • Install the Biolog data analyzer on a suitable server

Register now at http://goo.gl/forms/stL0nLal9F for your free place

Bring and run your own data!

Registration closes on 18th June for Leicester and 10th July for Surrey

Welcome to Mike Stout

This week Mike Stout started work in our group as a research fellow on the BBSRC-funded project to develop systems for high throughput analysis of cell growth data from Biolog phenotype arrays; a lay summary of this project can be found here.

Prior to this, Mike was a PDRA at the Centre for Plant Integrative Biology, University of Nottingham, working with Professor Charlie Hodgman on developing repositories for multi-scale systems biology models and imaging data, and tools for systems biology simulation visualization. Mike’s PhD, also at the University of Nottingham, was on predicting geometric and topological properties of proteins using a range of machine learning systems, in particular Learning Classifier Systems. He has a background in both Biology and Computer Science and, before his PhD, headed the Electronic Journals Group at Oxford University Press, managing transnational projects to develop journal content online.

Mike’s research interests include Complex Systems Science, Evolutionary Computation, Functional Programming, Information Visualization and High Performance Computation using, for example, GPUs.

Mike’s experience and expertise will be particularly valuable for the group and we look forward to working with him.

Research grant award from the BBSRC

On Friday we heard good news from the BBSRC that our research grant application for the analysis of Biolog data has been successful. This is a joint bid with Katherine Smart, Jon Hobman, Helen West and Theodore Kypraios. The relevant quote from the BBSRC is:

Dear Dr Stekel,

I am pleased to inform you that application BB/J01558X/1 – ‘High throughput analysis of cell growth data from phenotype arrays’ submitted to the BBSRC 2011 Responsive Mode Grant Round 3 (RM3) has been successful. We are currently in the process of preparing the grants for announcement.

There will be a postdoctoral position associated with this grant which will be advertised in due course according to usual University of Nottingham procedures.

Lay Summary for the Research Grant

Fifty people died as a result of the recent E. coli outbreak in Germany. Four thousand people were infected. With a growing global human population, how do we ensure that we all have access to safe food? Fossil fuels will run out, and the recent Fukushima disaster highlighted the risks of nuclear energy. How do we provide sustainable sources of fuel to meet our energy and transport needs in the context of a population that is not just growing, but also developing?

These are major challenges, and a key strategy for overcoming them is the study of microbes. In the case of E. coli the disease is caused by harmful bacteria, and we need to understand how harmful bacteria survive in farms, soil, food production, storage and preparation facilities, as well as in animal and human hosts. In the case of fuels, microbes provide an opportunity for a new generation of biofuels. Biofuels are carbon neutral technologies, but conventional biofuels need similar materials or land that could otherwise be used for food. We are now seeking to develop biofuels from plant matter that cannot be used for food and is currently wasted. To do this, we need to find new strains of yeast that can convert this plant matter into fuel.

In recent years, new technologies have been developed that enable us to read the full genome sequence of a microbe in just a day. This is indeed remarkable, but the genome sequence is a set of instructions in a language that we can only begin to understand. What really matters is how a microbe behaves in different environments: on what foods does it thrive, on what foods does it starve? What potential toxins can it survive and what toxins kill it? These questions are essential for understanding how we can combat harmful food-borne bacteria, or develop new bioenergy producing agents. And if we can link these answers to the genome sequence, we have a powerful way of decoding the language of the genes.

This proposal is focussed on a technology, called Biolog Phenotype Microarrays, that precisely measures how well microbes thrive in thousands of conditions, including different food sources and potential toxins. The arrays generate time courses that record each condition at regular time points, with several hundred measurements of cell activity during the course of an experiment. Each time course encodes a wealth of information: how long does it take before the microbes start to become active? How quickly do they grow? Are they able to use more than one food source, and if so, is one better than the other? How much do they grow? Remarkably, there are no analysis methods available that allow users of Biolog arrays to obtain this information from the Biolog output: instead, users typically use a single datum, such as the end-point, or total growth, and discard most of the valuable information.

The aim of this proposal is to bridge this gap. To do so, we intend to build mathematical models that describe cell activity in Biolog arrays; these need to reflect the details of the technology, as well as the complexity of the conditions in which the cells are grown. We propose to develop automated ways of working out which model best fits any given set of data, and identify the key parameters describing microbial behaviour. Automation is essential, because a single experiment can generate 2000 microbial time courses. The methods have to be accessible to the wider scientific community, not just mathematicians, so we need to develop user-friendly interfaces to the methods we develop, and provide training for Biolog users in these methods.

Finally, in our established research programmes, we have generated vast quantities of Biolog data on survival of harmful E. coli strains, microbial soil contamination and the development of new yeast strains for producing biofuel from non-food plant material. We will directly address the food safety and bioenergy challenges by applying our methods to these data.