Good news! Our collaboration with William Atiomo, Dave Barrett and others has paid off, and we have just had a paper published in Metabolomics:
Zeina Haoula, Srinivasarao Ravipati, Dov J. Stekel, Catharine A. Ortori, Charlie Hodgman, Clare Daykin, Nick Raine-Fenning, David A. Barrett and William Atiomo. 2014. Lipidomic analysis of plasma samples from women with polycystic ovary syndrome. Metabolomics DOI: 10.1007/s11306-014-0726-y.
Polycystic ovary syndrome (PCOS) is a common disorder affecting between 5 and 18 % of females of reproductive age and can be diagnosed based on a combination of clinical, ultrasound and biochemical features, none of which on its own is diagnostic. A lipidomic approach using liquid chromatography coupled with accurate mass high-resolution mass-spectrometry (LC-HRMS) was used to investigate if there were any differences in plasma lipidomic profiles in women with PCOS compared with control women at different stages of menstrual cycle. Plasma samples from 40 women with PCOS and 40 controls aged between 18 and 40 years were analysed in combination with multivariate statistical analyses. Multivariate data analysis (LASSO regression and OPLS-DA) of the sample lipidomics datasets showed a weak prediction model for PCOS versus control samples from the follicular and mid-cycle phases of the menstrual cycle, but a stronger model (specificity 85 % and sensitivity 95 %) for PCOS versus the luteal phase menstrual cycle controls. The PCOS vs luteal phase model showed increased levels of plasma triglycerides and sphingomyelins and decreased levels of lysophosphatidylcholines and phosphatidylethanolamines in PCOS women compared with controls. Lipid biomarkers of PCOS were tentatively identified which may be useful in distinguishing PCOS from controls especially when performed during the menstrual cycle luteal phase.
My contribution was to carry out LASSO regression as a second supervised machine learning technique, alongside the OPLS-DA carried out by Srini. It was actually a fair bit of work in the end. Interestingly, people in metabolomics tend to use PLS-based approaches for classification and supervised learning, despite the wide range of options available for such tasks. Separately from this paper, Anna Swan analysed the same data using many different classification algorithms in Weka, to see whether any were clearly better than the others, or whether an ensemble of different algorithms could give better results. She found nothing better than the results reported in this paper.
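For readers who have not met LASSO as a classifier, here is a minimal sketch of the general idea: logistic regression with an L1 penalty, which shrinks most coefficients to exactly zero and so selects a small set of candidate features. This is not the code or the data used in the paper. The data are synthetic, and the function names, penalty strength, step size and the simple proximal gradient fitting method are all assumptions chosen for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero; this is the step that produces exact zeros."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(row, weights, intercept):
    return intercept + sum(w * x for w, x in zip(weights, row))


def fit_lasso_logistic(X, y, penalty=0.1, step=0.3, iterations=500):
    """L1-penalised logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    weights, intercept = [0.0] * p, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            error = sigmoid(linear_score(row, weights, intercept)) - label
            grad_b += error / n
            for j in range(p):
                grad_w[j] += error * row[j] / n
        intercept -= step * grad_b  # the intercept is not penalised
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grad_w)]
    return weights, intercept


def predict(X, weights, intercept):
    return [1 if linear_score(row, weights, intercept) >= 0.0 else 0 for row in X]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 0)
    tn = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 0)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    return tp / (tp + fn), tn / (tn + fp)


def make_synthetic_data(n_samples=80, n_features=20, seed=0):
    """Synthetic stand-in data: only features 0 and 1 differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 1 = hypothetical case, 0 = hypothetical control
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 1.5 * label
        row[1] -= 1.5 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_synthetic_data()
weights, intercept = fit_lasso_logistic(X, y)
selected = [j for j, w in enumerate(weights) if w != 0.0]
sens, spec = sensitivity_specificity(y, predict(X, weights, intercept))
print("selected features:", selected)
print("training sensitivity:", sens, "training specificity:", spec)
```

The sensitivity and specificity printed here are computed on the training data, so they are optimistic. In a real analysis the penalty would normally be chosen by cross-validation and performance estimated on held-out samples, and one would use an established implementation rather than a hand-written fitting loop.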