_______________________________________________________________________________________________
 


A Novel Method for Predicting Relative Solvent Accessibility Using Neural Networks Based Regression

By

Dr Jaroslaw (Jarek) Meller,
Division of Pediatric Informatics,
Children's Hospital Research Foundation,
Cincinnati, USA

_______________________________________________________________________________________________

Abstract:
Accurate prediction of the relative solvent accessibility (RSA) of amino acid residues in proteins can facilitate protein structure prediction. Toward that goal, we developed a novel method for improved RSA prediction. Unlike other machine-learning-based methods in the literature, we do not impose a classification problem with arbitrary (and non-physical) boundaries between the classes. Rather, we seek a continuous approximation of the real-valued relative solvent accessibilities using non-linear regression with several feed-forward and recurrent neural networks, which are combined into a consensus predictor. For training we use a set of 890 protein structures derived from the PFAM database, and we carefully validate the results on control sets comprising a total of 774 structures that were derived continuously from new PDB structures and had no homology to proteins included in the training set. We assess the effects of including family profiles and of the growth of sequence databases, as well as the effects of weighting the errors by the expected levels of variability in relative solvent accessibility observed for equivalent residues in a family of homologous structures. The new method outperforms classification-based methods when the real-valued predictions are projected onto a two-class (i.e. exposed vs. buried) classification problem. A classification accuracy of about 77% is consistently achieved on the control sets with a threshold of 25% RSA. Moreover, relative to the variations observed at a given level of RSA in families of homologous structures, consistently high regression accuracy is achieved over a wide range of relative solvent accessibility. For example, the mean square error for all buried and partially buried residues (RSA<35%) is less than 15% RSA and gradually increases for more exposed residues, for which the observed level of variability in protein families is also much higher than for the buried and partially buried residues. A Web server that enables prediction of relative solvent accessibilities using the new method and provides a customizable graphical representation of the results is available at http://sable.cchmc.org.
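
As an illustration only, the projection of real-valued predictions onto the two-class problem described in the abstract can be sketched in Python as below. This is a minimal sketch and not the SABLE implementation: the function names are hypothetical, the consensus is assumed to be a plain average over networks, and treating a residue at exactly 25% RSA as exposed is an assumption.

```python
from statistics import mean

# Two-class threshold in percent RSA, as quoted in the abstract.
RSA_THRESHOLD = 25.0


def consensus(predictions_per_network):
    """Combine per-residue RSA predictions from several networks.

    Assumption: a simple mean over networks; the abstract does not say
    how the consensus predictor is actually formed.
    """
    return [mean(values) for values in zip(*predictions_per_network)]


def to_two_class(rsa_values, threshold=RSA_THRESHOLD):
    """Map real-valued RSA (percent) to True (exposed) or False (buried).

    Assumption: a residue exactly at the threshold counts as exposed.
    """
    return [rsa >= threshold for rsa in rsa_values]


def two_class_accuracy(predicted_rsa, observed_rsa, threshold=RSA_THRESHOLD):
    """Fraction of residues whose predicted and observed classes agree."""
    predicted = to_two_class(predicted_rsa, threshold)
    observed = to_two_class(observed_rsa, threshold)
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Made-up numbers for three residues and three networks.
    networks = [
        [10.0, 40.0, 30.0],
        [20.0, 50.0, 10.0],
        [30.0, 60.0, 20.0],
    ]
    observed = [5.0, 70.0, 35.0]
    combined = consensus(networks)
    print(combined)
    print(two_class_accuracy(combined, observed))
```

The values in the usage section are invented for illustration and are not results from the method.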

Speaker
Jaroslaw (Jarek) Meller is an Assistant Professor in the Division of Pediatric Informatics, Children's Hospital Research Foundation, Cincinnati, USA. He holds a secondary appointment in the Dept. of Biomedical Engineering, University of Cincinnati, and is also employed in the Dept. of Informatics, Nicholas Copernicus University, Torun, Poland (on leave of absence).

Education and professional experience:
PhD in computational chemistry, MSc in physics (major in computer physics), MSc in mathematics (major in computer science) and undergraduate education in sociology from the Nicholas Copernicus University in Torun, Poland. In the past I worked on new methods for the electronic structure of molecules (Universite Paul Sabatier, Toulouse, France), atomic simulations of proteins (Hebrew University, Jerusalem, Israel), electronic structure of large biomolecules (Kyoto University, Japan), and protein structure prediction and annotation of genomic sequences (Cornell University, Ithaca, USA).

My current research:
My major fields of interest are computational biology, bioinformatics, protein structure prediction, functional gene annotation, machine learning, data mining and classification problems in genomics and proteomics (see, for example, LOOPP, SIFT, SABLE).

 
_______________________________________________________________________________________________