Identification of structurally conserved residues of proteins in the absence of structural homologs using negative correlation learning
Motivation: So far, various bioinformatics and machine learning techniques have been applied to the identification of conserved residues in proteins. The majority of these techniques have focused only on the prediction of sequence-conserved and functionally conserved residues. Although a few computational methods are available for predicting structurally conserved residues from protein structure, all of them require homologous structural information and structure-based alignments, which remain a bottleneck in protein structure comparison studies. In this work, we developed a neural network approach for identifying structurally important residues from a single protein structure, without using homologous structural information or structural alignment.
Results: A neural network (NN) ensemble method based on negative correlation learning (NCL) was developed for identifying structurally conserved residues (SCRs) in proteins. It uses features that represent amino acid conservation and composition, physico-chemical properties, and structural properties such as solvent accessibility, secondary structure, hydrogen bonding and residue compactness. The NCL-NN algorithm was applied to 6042 structurally conserved residues extracted from 131 protein superfamilies. The method achieved high prediction sensitivity (92.8%) and quality (Matthews correlation coefficient of 0.852) in identifying SCRs using a single representative protein for each superfamily in the test dataset. Further benchmarking on 60 protein domains containing 1651 SCRs that were not part of the training and testing datasets shows that the NCL-NN method can correctly predict SCRs with ~90% sensitivity. These results suggest the usefulness of the NCL-NN approach for identifying SCRs from information derived from a single protein structure and its sequence homologues. This method could therefore be extremely effective in large-scale benchmarking studies where reliable structural homologues and alignments are limited.
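To make the terms above concrete, the following is a minimal Python sketch of the per-member NCL error and of the two reported evaluation measures. It assumes the commonly used NCL formulation, in which each ensemble member's squared error is augmented by a penalty correlating its deviation from the ensemble mean with the deviations of the other members; the abstract does not state the exact formulation or penalty strength used in this work, so the function names, the `lam` value and the example numbers are hypothetical and are not taken from the SCR-NN package or its results.

```python
import math

def ncl_penalties(outputs):
    """Correlation penalty p_i for each ensemble member on one sample.

    p_i = (F_i - F_mean) * sum over j != i of (F_j - F_mean),
    which is algebraically equal to -(F_i - F_mean)^2.
    """
    mean = sum(outputs) / len(outputs)
    devs = [f - mean for f in outputs]
    total = sum(devs)
    # (total - d) is the sum of the other members' deviations.
    return [d * (total - d) for d in devs]

def ncl_member_errors(outputs, target, lam=0.5):
    """Per-member error: squared error plus lam times the penalty.

    lam is a hypothetical penalty strength chosen for illustration.
    """
    penalties = ncl_penalties(outputs)
    return [0.5 * (f - target) ** 2 + lam * p
            for f, p in zip(outputs, penalties)]

def sensitivity(tp, fn):
    """Fraction of true SCRs that are predicted as SCRs."""
    return tp / (tp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

if __name__ == "__main__":
    # Illustrative outputs of three member networks for one residue.
    member_outputs = [0.9, 0.7, 0.2]
    print(ncl_member_errors(member_outputs, target=1.0))
    # Illustrative confusion-matrix counts, not results from the paper.
    print(sensitivity(tp=90, fn=10), mcc(tp=90, tn=85, fp=15, fn=10))
```

Because the penalty is negative whenever a member deviates from the ensemble mean, minimizing this error rewards members whose errors are negatively correlated with the rest of the ensemble, which is the motivation for using NCL in an ensemble predictor.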
Click here to Download SCR-NN package
Click here to Download features for the training, testing and benchmarking dataset