CE7412: Computational and Systems Biology


E-learning course for PhD students, Semester 2

Instructor: Professor Jagath Rajapakse




1.     Biological foundations (3 hrs.)

Basics of molecular biology: genes, proteins, DNA, central dogma of biology, transcription, translation, gene structure; analyzing a genome: PCR, cloning, electrophoresis; gene expression, DNA microarrays


2.     Probability and Information Theory (3 hrs.)

Sets and sequences; functions and spaces; probability theory; Bayes' theorem; random variables; probability distributions; multidimensional density functions; information theory; sequence alignment by information minimization; mutual information (MI); characterization of splice sites with MI
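Mutual information, the quantity used above to characterize splice sites, measures how much the symbol at one alignment position tells us about the symbol at another: I(X;Y) = Σ p(x,y) log2 [p(x,y) / (p(x) p(y))]. A minimal Python sketch, assuming plain relative-frequency estimates without pseudocounts; the function name is chosen here for illustration:

```python
import math
from collections import Counter

def mutual_information(col_x, col_y):
    """Mutual information, in bits, between two aligned columns of symbols.

    Probabilities are plain relative frequencies (no pseudocounts).
    """
    if not col_x or len(col_x) != len(col_y):
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        # p(x,y) * log2(p(x,y) / (p(x) p(y))); unseen pairs contribute nothing
        mi += p_xy * math.log2(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi

# Two positions that co-vary perfectly (like a base pair) vs. two independent ones
print(mutual_information("AACCGGTT", "TTGGCCAA"))  # 2.0
print(mutual_information("AAAATTTT", "ATATATAT"))  # 0.0
```

Strongly dependent positions (for example, near a splice junction) give high MI, while independent positions give MI close to zero.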


3.     R Programming

Preliminaries, vectors, matrices, data input, control structures and loops, functions, statistical models, graphics, Bioconductor.


4.     Independent and Markov models of bio-sequences (3 hrs.)

Parameter estimation: ML and MAP estimators; constrained optimization (Lagrange theory); prior models: maximum entropy principles, Gaussian priors, Dirichlet priors; die models of sequences given data/counts; dice models for pairs/multiple sequences; random and match models and log-odds ratios for alignment; hypothesis testing; likelihood ratio tests; model selection
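For a die model with K faces and counts n_i (total N), maximizing the likelihood under the constraint Σ p_i = 1 with a Lagrange multiplier gives p_i = n_i / N. With a Dirichlet prior with parameters α_i ≥ 1, the posterior mode (MAP estimate) is p_i = (n_i + α_i - 1) / (N + Σ α_j - K). A small Python sketch of both estimators; the function names and the counts are made up for illustration:

```python
def ml_estimate(counts):
    """Maximum-likelihood die model: relative frequencies n_i / N."""
    total = sum(counts.values())
    return {face: n / total for face, n in counts.items()}

def map_estimate(counts, alpha):
    """MAP die model under a Dirichlet(alpha) prior, i.e. the posterior mode.

    Assumes every alpha[face] >= 1 so that the mode lies inside the simplex.
    """
    if any(alpha[face] < 1 for face in counts):
        raise ValueError("this closed form needs alpha >= 1 for every face")
    k = len(counts)
    denom = sum(counts.values()) + sum(alpha[face] for face in counts) - k
    return {face: (counts[face] + alpha[face] - 1) / denom for face in counts}

# Made-up nucleotide counts from a short sequence
counts = {"A": 3, "C": 0, "G": 1, "T": 0}
print(ml_estimate(counts))                           # {'A': 0.75, 'C': 0.0, 'G': 0.25, 'T': 0.0}
print(map_estimate(counts, {b: 2 for b in "ACGT"}))  # {'A': 0.5, 'C': 0.125, 'G': 0.25, 'T': 0.125}
```

With α_i = 2 for every face, the MAP estimate is the same as adding one pseudocount per face, which keeps unseen nucleotides from receiving zero probability.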


5.     Markov chains and random walks of sequences (3 hrs.)

Markov chains, Markov models of sequences, modeling CpG islands, modeling sequences with higher-order models, modeling repeats, random walks, BLAST
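A common way to model CpG islands is to train two first-order Markov chains, one on CpG-island sequence ("+") and one on background sequence ("-"), and to score a query x by the log-odds ratio S(x) = Σ log2 (a+[x_{i-1}, x_i] / a-[x_{i-1}, x_i]); positive scores favor the CpG-island model. The sketch below assumes uppercase ACGT input and add-one pseudocounts, and it trains on tiny made-up sequences rather than real genomic data, so its numbers have no biological meaning; all names are hypothetical:

```python
import math
from itertools import product

BASES = "ACGT"

def train_transitions(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities a[(s, t)] from training sequences."""
    counts = {(a, b): pseudocount for a, b in product(BASES, repeat=2)}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[(a, b)] += 1
    probs = {}
    for a in BASES:
        row_total = sum(counts[(a, b)] for b in BASES)
        for b in BASES:
            probs[(a, b)] = counts[(a, b)] / row_total  # each row sums to 1
    return probs

def log_odds(seq, plus, minus):
    """Log-odds score (bits) of seq under the '+' model versus the '-' model."""
    return sum(math.log2(plus[(a, b)] / minus[(a, b)]) for a, b in zip(seq, seq[1:]))

# Toy training data, made up for illustration only
cpg_model = train_transitions(["CGCGCGGCGC", "GCGCGACGCG"])
background = train_transitions(["ATTATACATG", "TTGCATAAAT"])
print(log_odds("CGCGCG", cpg_model, background) > 0)  # True
print(log_odds("ATATAT", cpg_model, background) < 0)  # True
```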


6.     Hidden Markov models and gene structure prediction (3 hrs.)

Definition of hidden Markov models (HMMs); dice models of sequences; likelihood of sequences; forward algorithm; backward algorithm; Viterbi algorithm; posterior decoding; ML estimation of parameters; Expectation-Maximization (EM) algorithm; Baum-Welch algorithm; Baldi-Chauvin approach; gene structure prediction: VEIL, GENSCAN; profile HMMs for multiple sequence alignment
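The Viterbi algorithm listed above finds the single most probable path of hidden states by dynamic programming. The sketch below applies it to a two-state dice model, a fair die F and a loaded die L that favors sixes; the transition and emission probabilities are illustrative assumptions, and the function name and argument layout are hypothetical:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its log-probability) for a non-empty obs.

    Works in natural-log space so that long sequences do not underflow.
    """
    # v[t][s] = best log-probability of any path that ends in state s at step t
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s] = best predecessor of state s at step t
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda r: v[t - 1][r] + math.log(trans_p[r][s]))
            v[t][s] = (v[t - 1][best_prev] + math.log(trans_p[best_prev][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = best_prev
    # Trace back from the best final state
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], v[-1][last]

# Illustrative fair (F) / loaded (L) die parameters, chosen for the example
states = ("F", "L")
start_p = {"F": 0.5, "L": 0.5}
trans_p = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit_p = {
    "F": {str(face): 1 / 6 for face in range(1, 7)},
    "L": {str(face): (0.5 if face == 6 else 0.1) for face in range(1, 7)},
}

path, logp = viterbi("6666666666", states, start_p, trans_p, emit_p)
print("".join(path))  # LLLLLLLLLL
```

Replacing the `max` over predecessors with a log-sum-exp over predecessors turns the same recursion into the forward algorithm.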


7.     Neural networks and protein structure prediction (3 hrs.)

Biological and artificial neurons, perceptrons, feed-forward networks, backpropagation learning, protein secondary structure prediction, PHD method, Markov encoding, signals in genomic sequences: splice sites, transcription start sites, and translation initiation sites


8.     Support vector machines and protein feature prediction (3 hrs.)

Discrete perceptron, separating hyperplanes, support vector classifier, support vector machines, penalty methods, support vector regression, multi-class SVMs, protein solvent accessibility prediction, two-stage SVM approach, predicting solvent-accessible surface area
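The discrete perceptron listed above is the simplest way to find a separating hyperplane w·x + b = 0: whenever a training point falls on the wrong side, the weights are moved toward it. If the data are linearly separable the rule stops after finitely many updates, but unlike the support vector classifier it returns some separating hyperplane rather than the maximum-margin one. A minimal sketch on made-up 2-D points; the function names are hypothetical:

```python
def train_perceptron(xs, ys, lr=1.0, max_epochs=100):
    """Discrete perceptron rule for labels in {-1, +1}; returns weights w and bias b."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or on) the hyperplane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every training point is correctly classified
            break
    return w, b

def predict(w, b, x):
    """Discrete (sign) output of the perceptron."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Made-up, linearly separable 2-D points
xs = [(2, 2), (3, 1), (2, 3), (-1, -2), (-2, -1), (-3, -2)]
ys = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(xs, ys)
print(w, b)  # [2.0, 2.0] 1.0
```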


9.     Classification of gene expression data (3 hrs.)

Microarrays, gene expression data, classification of gene expression data, ensemble methods, bagging and boosting, random forests


10.  Clustering gene expression data and gene networks (3 hrs.)

Cluster analysis, K-means clustering, self-organizing feature maps, hierarchical clustering, feature selection, biclustering, gene regulatory networks, Boolean networks, Bayesian networks.
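K-means clustering, listed above, alternates two steps until the assignments stop changing: assign each expression profile to its nearest centroid, then move each centroid to the mean of its members. A minimal pure-Python sketch; the deterministic initialization with the first k profiles is a simplifying assumption, and the expression values are made up:

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means for a list of equal-length expression profiles.

    Centroids start at the first k profiles, a deterministic simplification;
    random or k-means++ starts are more usual in practice.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid (Euclidean)
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up log-ratio profiles of five genes over three time points
profiles = [
    [1.0, 2.0, 3.0],
    [-1.0, -2.0, -3.0],
    [1.1, 2.1, 2.9],
    [-0.9, -2.2, -3.1],
    [0.9, 1.8, 3.2],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```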


11.  Transcription networks (3 hrs.)

Cognition of the cell, transcription, transcription networks, binding of an inducer to a repressor, Michaelis-Menten equation, cooperativity of inducer binding, binding of an activator to its DNA site, input function of a gene, dynamics of simple gene regulation, negative autoregulation
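The dynamics of simple regulation and negative autoregulation listed above can be compared numerically. With a constant production rate β and removal rate α, dY/dt = β - αY rises to β/α and reaches half of it at t = ln 2 / α, whatever β is; if Y instead represses its own promoter through a Hill input function β / (1 + (Y/K)^n), it reaches half of its own steady state sooner. The sketch below checks this with forward-Euler integration; the parameter values and function names are illustrative assumptions:

```python
import math

def hill_repressor(x, beta, k, n):
    """Repressor input function: maximal rate beta, repression coefficient k, Hill coefficient n."""
    return beta / (1.0 + (x / k) ** n)

def steady_state(production, alpha, hi, iters=100):
    """Solve production(X) = alpha * X on [0, hi] by bisection (production must be decreasing)."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if production(mid) - alpha * mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def half_rise_time(production, alpha, y_steady, dt=1e-4, t_max=100.0):
    """Forward-Euler integration of dY/dt = production(Y) - alpha * Y from Y = 0;
    returns the first time Y reaches y_steady / 2."""
    y, t = 0.0, 0.0
    while y < 0.5 * y_steady:
        y += dt * (production(y) - alpha * y)
        t += dt
        if t > t_max:
            raise RuntimeError("Y did not reach half its steady state")
    return t

alpha = 1.0                 # removal rate (degradation + dilution), arbitrary units
beta, k, n = 10.0, 1.0, 2   # illustrative parameters, not measured values

def simple(y):
    return beta  # unregulated production

def nar(y):
    return hill_repressor(y, beta, k, n)  # Y represses its own promoter

t_simple = half_rise_time(simple, alpha, beta / alpha)
t_nar = half_rise_time(nar, alpha, steady_state(nar, alpha, beta / alpha))
print(round(t_simple, 3), round(t_nar, 3))
```

For these parameters the autoregulated steady state solves 10 / (1 + X^2) = X, that is X = 2, and the simple-regulation rise time comes out close to ln 2 ≈ 0.693.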


12.  Network motifs (6 hrs.)

Random networks, detection of network motifs, autoregulation, the feed-forward loop (FFL), dynamics of the coherent type-1 FFL, the incoherent type-1 FFL, single-input module motif


13.  Developmental networks and signal transduction networks (3 hrs.)

Topological generalization of motifs, multi-output FFL motif, bi-fans and dense overlapping regulons, developmental transcription networks, two-node positive feedback loops, regulating feedback motif, transcriptional cascades, interlocked feedback loops, signal transduction networks, protein kinase perceptrons, neuronal network motifs

References

1.     Problems and Solutions in Biological Sequence Analysis, M. Borodovsky and S. Ekisheva, Cambridge University Press, 2006

2.     The Elements of Statistical Learning: Data Mining, Inference, and Prediction, T. Hastie, R. Tibshirani, and J. Friedman, Springer, 2009

3.     An Introduction to Systems Biology: Design Principles of Biological Circuits, U. Alon, Chapman & Hall/CRC, 2007

Assessment

Take-home exams 60%

Projects 40%