**CE7412: Computational and Systems Biology**

E-learning course for PhD students, Semester 2

Instructor: Professor Jagath Rajapakse

**SYLLABUS**

1. Biological foundations (3 hrs.)

Basics of molecular biology, genes, proteins, DNA,
central dogma of biology, transcription, translation, gene structure, analyzing
a genome: PCR, cloning, electrophoresis, gene expression, DNA microarrays

2. Probability and information theory (3 hrs.)

Sets and sequences; functions and spaces;
probability theory; Bayes' theorem; random variables; probability
distributions; multidimensional density functions; information theory; sequence
alignment by information minimization, mutual information (MI),
characterization of splice sites with MI

3. R programming

Preliminaries, vectors, matrices, data input,
control loops, functions, statistical models, graphics, Bioconductor

4. Independent and Markov models of bio-sequences (3 hrs.)

Parameter estimation: ML and MAP estimators; constrained optimization (Lagrange theory);
prior models: maximum entropy principles, Gaussian priors, Dirichlet
priors; die models of sequences given data/counts; dice models for pairs of sequences and
multiple sequences; random and match models and log-odds ratios for alignment;
hypothesis testing; likelihood ratio tests; model selection

5. Markov chains and random walks of sequences (3 hrs.)

Markov chains, Markov models of sequences, modeling CpG islands,
modeling sequences with higher-order models, modeling repeats, random walks,
BLAST

6. Hidden Markov models and gene structure prediction (3 hrs.)

Definition of hidden Markov models (HMM); dice models of sequences; likelihood of sequences;
forward algorithm; backward algorithm; Viterbi algorithm; posterior decoding;
ML estimation of parameters; Expectation-Maximization (EM) algorithm; Baum-Welch
algorithm; Baldi-Chauvin approach; gene structure
prediction: VEIL; GENSCAN; profile HMM for multiple sequence alignment

7. Neural networks and protein structure prediction (3 hrs.)

Biological and artificial neurons, perceptrons, feed-forward
networks, backpropagation learning, protein secondary
structure prediction, PHD method, Markov encoding, signals in genomic
sequences: splice sites, transcription start sites, and translation initiation
sites

8. Support vector machines and protein feature prediction (3 hrs.)

Discrete perceptron, separating hyperplanes, support vector classifier, support vector
machines, penalty methods, support vector regression, multi-class SVM, protein
solvent accessibility prediction, two-stage SVM approach, predicting solvent
accessibility area

9. Classification of gene expressions (3 hrs.)

Microarrays, gene expression data, classification
of gene expression data, ensemble methods, bagging and boosting, random forest

10. Clustering gene expressions and gene networks (3 hrs.)

Cluster analysis, K-means clustering, self-organizing
feature maps, hierarchical clustering, feature selection, biclustering,
gene regulatory networks, Boolean networks, Bayesian networks

11. Transcription networks (3 hrs.)

Cognition of the cell, transcription,
transcription networks, binding a repressor to an inducer, Michaelis-Menten
equation, cooperativity of inducer binding, binding
an activator to a DNA site, input function of a gene, dynamics of simple gene
regulation, negative autoregulation

12. Network motifs (6 hrs.)

Random networks, detection of network motifs, autoregulation, the feed-forward loop (FFL), dynamics of coherent
type-1 FFL, incoherent type-1 FFL, single-input module motif

13. Developmental networks and signal transduction networks (3 hrs.)

Topological generalization of motifs, multi-output
FFL motif, bi-fans and dense overlapping regulons, developmental transcription networks, two-node
positive feedback loops, regulating feedback motif, transcriptional cascade,
interlocked feedback loops, signal transduction networks, protein kinase perceptrons, neuronal network motifs

**TEXTS**

1. Problems and solutions in biological sequence analysis, M. Borodovsky and S. Ekisheva, Cambridge University Press, 2006

2. The elements of statistical learning: data mining, inference, and prediction, T. Hastie, R. Tibshirani, and J. Friedman, Springer, 2009

3. An introduction to systems biology: design principles of biological circuits, U. Alon, Chapman & Hall/CRC, 2007

**ASSESSMENT**

Take-home exams: 60%

Projects: 40%