Predicting missing and spurious links and labels of protein-interaction networks

Funded by MOE Tier-2 grant S$482,000 from 01/2017 – 12/2019


Protein-protein interaction networks (PINs) are currently among the most widely available and studied molecular interaction networks. The nodes of a PIN are proteins, and an edge (link) represents a physical interaction between two proteins. PINs are widely used to investigate cellular functions and disease mechanisms, for example to predict protein function, biological and disease pathways, drug targets, protein complexes, and gene ontologies.


Despite their importance and widespread use, PINs have limited coverage and insufficient accuracy for reliable interpretation, owing to experimental limitations and a lack of quality control in small-scale studies. Functional annotation, that is, assigning labels to proteins, is usually done by relating proteins to Gene Ontology (GO) terms. Because GO terms evolve and manual curation is inconsistent, protein labels are incomplete and only partially known. Missing and spurious links and labels pose challenges to every application that uses PINs and undermine the power of the techniques used to analyze them. The aim of this project is to develop computational algorithms and tools to predict missing and spurious links and labels of PINs.


We hypothesize that molecular interaction networks preserve the nested hierarchical modularity of cellular functions. Based on this, we will derive the hierarchical modular architecture of PINs by using random block models. GO provides an organized hierarchy of protein function in terms of molecular function, biological process, and cellular component. Using GO, we will build a hierarchy of ontology terms for the PIN. We will then map the PIN and GO hierarchies onto each other by matching each term in the GO hierarchy to at most one module in the PIN hierarchy while identifying conflicting terms. By aligning the two hierarchies, we will be able to find missing and spurious labels of GO terms and PIN modules. By combining these algorithms, we will develop a computationally efficient technique to predict missing and spurious links and labels of PINs.
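As an illustration only, the term-to-module matching step could be sketched in Python as below. The Jaccard overlap score, the 0.5 threshold, the function names, and the toy protein and term identifiers are all assumptions made for this sketch; they are not the project's actual algorithm. Each GO term is assigned to its best-overlapping module, or left unmatched when no module overlaps well enough, and a matched pair yields candidate missing and spurious labels.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_terms_to_modules(go_terms, modules, min_score=0.5):
    """Map each GO term to at most one module (hypothetical scoring rule).

    go_terms, modules: dicts of name -> set of protein IDs.
    Returns (matches, unmatched): matches maps term -> (module, score);
    unmatched lists terms whose best overlap is below min_score.
    """
    matches, unmatched = {}, []
    for term, proteins in sorted(go_terms.items()):
        best_module, best_score = None, 0.0
        for module, members in sorted(modules.items()):
            score = jaccard(proteins, members)
            if score > best_score:
                best_module, best_score = module, score
        if best_module is not None and best_score >= min_score:
            matches[term] = (best_module, best_score)
        else:
            unmatched.append(term)
    return matches, unmatched


def label_candidates(term_proteins, module_members):
    """Return (possibly missing, possibly spurious) labels for a matched pair."""
    missing = sorted(module_members - term_proteins)
    spurious = sorted(term_proteins - module_members)
    return missing, spurious


# Toy data: the identifiers are placeholders, not real GO terms or proteins.
go_terms = {
    "GO:A": {"P1", "P2", "P3"},
    "GO:B": {"P4", "P5"},
    "GO:C": {"P1", "P5", "P9"},
}
modules = {
    "M1": {"P1", "P2", "P3", "P6"},
    "M2": {"P4", "P5"},
}
matches, unmatched = match_terms_to_modules(go_terms, modules)
missing, spurious = label_candidates(go_terms["GO:A"], modules["M1"])
```

In this toy example the term that straddles both modules stays unmatched, which is one simple way a conflicting term might be flagged for closer inspection.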


We will use text-mining techniques and tools to validate missing and spurious links and labels. We will identify sentences that indicate an interaction for a missing or spurious link in a corpus of abstracts from the PubMed database. Missing labels will be inferred from overrepresented GO annotations and by propagating known labels of parent and child nodes. The software developed in this project will be made publicly available as an app in the popular network analysis tool Cytoscape.
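One common way to test whether a GO annotation is overrepresented in a module is a one-sided hypergeometric test. The Python sketch below assumes that test, a significance level of 0.05, and hypothetical function names and toy protein IDs; the project may use a different statistic or a multiple-testing correction. It also assumes that the module is a subset of the background set.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n proteins are drawn without replacement from N,
    of which K carry the annotation."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def propose_missing_labels(module, annotated, background, alpha=0.05):
    """If the annotation is enriched in the module, propose it for the
    module members that lack it (hypothetical decision rule)."""
    k = len(module & annotated)
    p = hypergeom_sf(k, len(background), len(annotated & background), len(module))
    candidates = sorted(module - annotated) if p < alpha else []
    return p, candidates


# Toy data: 20 background proteins, 5 of them annotated with the term.
background = {f"P{i}" for i in range(1, 21)}
annotated = {"P1", "P2", "P3", "P4", "P5"}
module = {"P1", "P2", "P3", "P6"}

p_value, candidates = propose_missing_labels(module, annotated, background)
```

Here three of the four module members carry the annotation, so the unannotated member becomes a candidate for the missing label; such candidates would still need validation, for example by the text-mining step described above.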


We will perform extensive experiments on yeast and human PINs to evaluate and validate our algorithms. We will demonstrate three applications of our methods: (i) predicting biological pathways, (ii) prioritizing disease targets, and (iii) detecting hierarchies of protein complexes. In particular, because they are of interest to our collaborators, we will demonstrate these applications on two skin diseases: psoriasis and leprosy. We will also demonstrate how other types of data, such as gene expression data, can be integrated into our methods.




Properties and interactions of functional modules of the human brain across the life span

Funded by MOE Tier 1 grant $100,000 from 09/2015 – 04/2017


The human brain is organized into functional modules, each consisting of different brain regions that work together to achieve a specific brain function. Human behavior and cognitive functions change over the life span. We hypothesize that variations in the brain's functional modules underlie the neural mechanisms of cognitive and behavioral change over the life span.


By developing novel analytics for images gathered in resting-state and task-evoked functional MRI experiments, we aim to find the variations of the properties and interactions of functional modules of the brain across the life span. We will develop novel algorithms (i) to parcellate the human brain into functional modules, (ii) to quantify connectional properties and interactions of brain functional modules, and (iii) to determine the variations of properties of functional modules across the life span. We will employ community detection algorithms to cluster functional regions of interest into modules. The interactions of functional modules will be modeled with dynamic Bayesian networks and psychophysiological interactions. We will identify the functional modules that vary significantly with age and relate them to age-related behavioral and cognitive changes.
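Many community detection algorithms search for a partition that maximizes a quality score such as Newman modularity. The Python sketch below only computes that score for a given partition; it is not a detection algorithm, and it is not necessarily the objective this project will use. The toy graph is an assumption: its nodes stand for regions of interest, and its edges stand for region pairs whose functional connectivity exceeds some chosen threshold.

```python
def modularity(edges, partition):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs without duplicates or self-loops.
    partition: list of disjoint node sets that together cover all nodes.
    """
    m = len(edges)
    community_of = {node: idx for idx, nodes in enumerate(partition) for node in nodes}
    internal = [0] * len(partition)    # edges with both ends in the community
    degree_sum = [0] * len(partition)  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree_sum))


# Toy graph: two triangles of regions joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

q_two_modules = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_one_module = modularity(edges, [{0, 1, 2, 3, 4, 5}])
```

Splitting the graph into its two triangles scores higher than keeping all regions in one module, which is the kind of comparison a modularity-based method makes when it searches over partitions.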


Characterizing the modular organization and functional modules of the brain across the life span is important for understanding brain development, maturation, and aging. The techniques for quantifying connectional and interaction properties of functional modules can also be used to compare fMRI images of healthy subjects and patients.