Abstract:
Accurate prediction of relative solvent
accessibilities of amino acid residues in proteins may be used to facilitate
protein structure prediction. Toward that goal, we developed a novel method
for improved prediction of relative solvent accessibilities. Unlike
other machine-learning-based methods in the literature, we do not
impose a classification problem with arbitrary (and non-physical) boundaries
between the classes. Rather, we seek a continuous approximation of the
real-valued relative solvent accessibilities using non-linear regression
with several feed-forward and recurrent neural networks, which are combined
into a consensus predictor. For training, we use a set of 890 protein structures
derived from the PFAM database, and we carefully validate
the results on control sets comprising a total of 774 structures that
were continuously derived from new PDB structures and had no homology
to proteins in the training set. We assess the effects of including family
profiles and of the growth of sequence databases, as well as the effect of weighting
the errors by the expected level of variability in relative solvent accessibility
observed for equivalent residues in a family of homologous
structures. The new method outperforms classification-based methods
when the real-valued predictions are projected onto a two-class (i.e.,
exposed vs. buried) classification problem. A classification accuracy of
about 77% is consistently achieved on control sets with a threshold
of 25% relative solvent accessibility (RSA). Moreover, relative to variations
observed at a given level of RSA in families of homologous structures,
consistently high regression accuracy is achieved for a wide range of relative
solvent accessibilities. For example, the mean square error for all buried
and partially buried residues (RSA < 35%) is less than 15% RSA and gradually
increases for more exposed residues, for which the observed level of variability
in protein families is also much higher than for the buried and partially
buried residues. A web server that enables prediction of relative solvent
accessibilities using the new method and provides a customizable graphical
representation of the results is available at http://sable.cchmc.org.
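
The projection of real-valued predictions onto the two-class problem mentioned above can be illustrated with a minimal sketch. This is not the SABLE implementation; the function names, the toy numbers, and the choice to count a residue at exactly 25% RSA as exposed are assumptions made for illustration only.

```python
# Illustrative sketch only: names, toy data, and the ">=" boundary rule
# are assumptions, not part of the method described in the abstract.

THRESHOLD = 25.0  # % RSA, the two-class threshold quoted in the abstract


def to_two_class(rsa_values, threshold=THRESHOLD):
    """Map real-valued RSA (in %) to 0 (buried) or 1 (exposed)."""
    return [1 if value >= threshold else 0 for value in rsa_values]


def two_class_accuracy(predicted_rsa, observed_rsa, threshold=THRESHOLD):
    """Fraction of residues whose predicted and observed classes agree."""
    if len(predicted_rsa) != len(observed_rsa):
        raise ValueError("predicted and observed lists must have equal length")
    if not predicted_rsa:
        raise ValueError("at least one residue is required")
    predicted = to_two_class(predicted_rsa, threshold)
    observed = to_two_class(observed_rsa, threshold)
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(predicted)


# Toy values (invented for this example), one entry per residue.
predicted = [10.0, 30.0, 24.0, 60.0]
observed = [5.0, 40.0, 35.0, 20.0]
accuracy = two_class_accuracy(predicted, observed)
```

Because the predictor outputs a continuous value, the same predictions can be re-scored at any other threshold by passing a different `threshold` argument, without retraining.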
Speaker:
Jaroslaw (Jarek) Meller is an Assistant Professor in the Division of Pediatric
Informatics, Children's Hospital Research Foundation, Cincinnati, USA.
He holds a secondary appointment in the Dept. of Biomedical Engineering, University
of Cincinnati, and is also employed in the Dept. of Informatics, Nicholas Copernicus
University, Torun, Poland (on leave of absence).
Education and professional experience:
PhD in computational chemistry, MSc in physics (major in computer physics),
MSc in mathematics (major in computer science) and undergraduate education
in sociology from the Nicholas Copernicus University in Torun, Poland.
In the past, I worked on new methods for the electronic structure
of molecules (Universite Paul Sabatier, Toulouse, France), atomic simulations
of proteins (Hebrew University, Jerusalem, Israel), electronic structure
of large biomolecules (Kyoto University, Japan), and protein structure prediction
and annotation of genomic sequences (Cornell University, Ithaca, USA).
My current research:
My major fields of interest are computational biology, bioinformatics,
protein structure prediction, functional gene annotation, machine learning,
data mining, and classification problems in genomics and proteomics (see,
for example, LOOPP, SIFT, and SABLE).