Rafik and I have just had our second paper on transcription factor binding site prediction published:
Salama, R.A. and Stekel, D.J. 2013. A non-independent energy based multiple sequence alignment improves prediction of transcription factor binding sites. Bioinformatics doi: 10.1093/bioinformatics/btt463.
Motivation: Multiple Sequence Alignments (MSAs) are usually scored under the assumption that the sequences being aligned have evolved by common descent. Consequently, the differences between sequences reflect the impact of insertions, deletions and mutations. However, non-coding DNA binding sequences, such as transcription factor binding sites (TFBS), are frequently not related by common descent, and so the existing alignment scoring methods are not well suited for aligning such sequences.
Results: We present a novel MSA methodology that scores TFBS DNA sequences by including the interdependence of neighboring bases. We introduce two variants based on different underlying hypotheses: one statistical and the other thermodynamic. We assessed the alignments through their performance in TFBS prediction, and both variants show considerable improvements over standard MSA algorithms. Moreover, the thermodynamically based variant (EDNA) outperforms the statistical one, owing to improved stability in the base stacking free energy of the alignment. EDNA can be downloaded from http://sourceforge.net/projects/msa-edna/.
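To make the idea of scoring neighboring-base interdependence a little more concrete, here is a minimal Python sketch. It assumes approximate nearest-neighbor stacking free energies (the unified ΔG°37 set of SantaLucia, 1998) and a toy score that sums, over each pair of adjacent alignment columns, the variance of stacking energy across the aligned sequences, so lower means more energetically consistent. This is not the EDNA scoring function: the names `STACK_DG`, `stacking_dg` and `energy_consistency_score` are hypothetical, and gapped dinucleotides are simply skipped.

```python
from statistics import pvariance

# Approximate unified nearest-neighbor stacking free energies (dG37, kcal/mol),
# after SantaLucia (1998). Keys are 5'->3' dinucleotides on one strand; the other
# six dinucleotides are looked up via their reverse complement.
# Illustrative values for a hypothetical toy score, not EDNA's parameters.
STACK_DG = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def stacking_dg(dinuc):
    """Return the stacking free energy of a dinucleotide, or None if it contains a gap."""
    if len(dinuc) != 2 or any(b not in COMPLEMENT for b in dinuc):
        return None
    if dinuc in STACK_DG:
        return STACK_DG[dinuc]
    # e.g. "TT" is the same stack as "AA" read on the opposite strand
    revcomp = "".join(COMPLEMENT[b] for b in reversed(dinuc))
    return STACK_DG[revcomp]


def column_pair_energies(alignment, i):
    """Stacking energies at alignment columns (i, i+1), skipping gapped sequences."""
    energies = []
    for seq in alignment:
        dg = stacking_dg(seq[i:i + 2])
        if dg is not None:
            energies.append(dg)
    return energies


def energy_consistency_score(alignment):
    """Hypothetical toy score: sum of per-column-pair variances of stacking energy.

    Lower values mean the aligned sequences have more similar stacking energies
    at each position.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    total = 0.0
    for i in range(length - 1):
        energies = column_pair_energies(alignment, i)
        if len(energies) >= 2:
            total += pvariance(energies)
    return total


if __name__ == "__main__":
    print(energy_consistency_score(["ACGT", "ACGT"]))  # identical sequences
    print(energy_consistency_score(["GCGC", "ATAT"]))  # G/C vs A/T stacks
```

An alignment of identical sequences scores 0.0, while aligning GCGC against ATAT gives a positive score because G/C and A/T stacks differ in free energy. A real aligner would use a score of this kind inside its alignment search rather than merely evaluating one fixed alignment.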