CE7412: Computational and Systems Biology

 

E-learning course for PhD students and MEng students

Offered in Semester 2

Instructor: Professor Jagath Rajapakse

 

SYLLABUS

 

1.     Biological foundations

Basics of molecular biology, genes, proteins, DNA, central dogma of biology, transcription, translation, gene structure, analyzing a genome: PCR, cloning, electrophoresis, gene expression, DNA microarray

 

2.     Probability and information theory

Sets and sequences; functions and spaces; probability theory; Bayes’ theorem; random variables; probability distributions; multidimensional density functions; information theory; sequence alignment by information minimization; mutual information (MI); characterization of splice sites with MI

 

3.     R Programming

Preliminaries, vectors, matrices, data input, control loops, functions, statistical models, graphics, Bioconductor

 

4.     Independent models of bio-sequences

Parameter estimation: ML and MAP estimators; constrained optimization (Lagrange theory); prior models: maximum entropy principles, Gaussian priors, Dirichlet priors; die models of sequences given data/counts; dice models for pairs/multiple sequences; random and match models and log-odds ratios for alignment; hypothesis testing; likelihood ratio tests; model selection

 

5.     Markov chains and random walks of sequences

Markov chains, Markov models of sequences, modeling CpG islands, modeling sequences with higher-order models, modeling repeats, random walks, BLAST

 

6.     Hidden Markov models and gene structure prediction

Definition of hidden Markov models (HMM); dice models of sequences; likelihood of sequences; forward algorithm; backward algorithm; Viterbi algorithm; posterior decoding; ML estimation of parameters; Expectation-Maximization (EM) algorithm; Baum-Welch algorithm; Baldi-Chauvin approach; gene structure prediction: VEIL, GENSCAN; profile HMM for multiple sequence alignment

 

7.     Neural networks and genomic signal prediction

Biological and artificial neurons, perceptrons, feed-forward networks, backpropagation learning, protein secondary structure prediction, PHD method, Markov encoding, signals in genomic sequences, splice sites, transcription start site, and translation initiation site

 

8.     Support vector machines and protein feature prediction

Discrete perceptron, separating hyperplanes, support vector classifier, support vector machines, penalty methods, support vector regression, multi-class SVM, protein solvent accessibility prediction, two-stage SVM approach, predicting solvent accessibility area

 

9.     Classification of gene expressions

Microarrays, gene expression data, classification of gene expression data, decision trees, ensemble methods, bagging and boosting, random forests

 

10.  Clustering gene expressions and gene networks

Cluster analysis, K-means clustering, self-organizing feature maps, hierarchical clustering, feature selection, biclustering, gene regulatory networks, Boolean networks, Bayesian networks

 

11.  Transcription networks

Cognition of the cell, transcription, transcription networks, binding a repressor to an inducer, Michaelis-Menten equation, cooperativity of inducer binding, binding an activator to its DNA site, input function of a gene, dynamics of simple gene regulation, negative autoregulation

 

12.  Network motifs

Random networks, detection of network motifs, autoregulation, the feed-forward loop (FFL), dynamics of coherent type-1 FFL, incoherent type-1 FFL, single-input module motif

 

13.  Developmental networks and signal transduction networks

Topological generalization of motifs, multi-output FFL motif, bi-fans and dense overlapping regulons, developmental transcription networks, two-node positive feedback loops, regulating feedback motif, transcriptional cascade, interlocked feedback loops, signal transduction networks, protein kinase perceptrons, neuronal network motifs

 

 

REFERENCE TEXTS

 

1.     R Bioinformatics Cookbook, Dan MacLean, Packt, 2019

2.     Modern statistics for modern biology, Susan Holmes and Wolfgang Huber, Cambridge University Press, 2019

3.     The elements of statistical learning: data mining, inference, and prediction, T. Hastie, R. Tibshirani, and J. Friedman, Springer, 2009, Second Edition

4.     An introduction to systems biology: design principles of biological circuits, U. Alon, Chapman & Hall/CRC, 2007

5.     Biological sequence analysis: Probabilistic models of proteins and nucleic acids, R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Cambridge University Press, 1998

 

 

ASSESSMENT

 

     Two Assignments 50% (Individual)

     Project 50% (the project is to be proposed and executed by groups of up to three students)