I529, Spring 2013

Instructor: Yuzhen Ye, School of Informatics and Computing, Indiana University, Bloomington

Associate instructor: Abhinav Mathur (abhimath@indiana.edu)

Contact: yye@indiana.edu

Class meets: T/Th 9:05-9:55AM (Informatics East, I122)

Lab meets: F 3:35-4:25PM (Informatics West, I109)

Office hours: Yuzhen (Weds 11AM-12PM, Lindley 301G); Abhinav (Mons 1:45AM-12:45PM, Info West 001); or by appointment

**Prerequisites**

I519 or equivalent knowledge in bioinformatics.

This course is designed for advanced bioinformatics graduate students who have already taken I519 (so students should at least know the Smith-Waterman (SW) algorithm!). Graduate students with a background in biology or in physical/computer science who are interested in bioinformatics applications in molecular biology are also welcome to take this course.

**Description**

Machine learning techniques have been successful in analyzing biological data because they can handle randomness, uncertainty, and noise in the data, and because they generalize well. In this class, we will learn the basics of probabilistic models and machine learning techniques. We will focus on probabilistic models (Markov models, hidden Markov models, and Bayesian networks) for biological sequence analysis and systems biology. Other machine learning techniques, such as naive Bayes, neural networks, and SVMs, will be covered only briefly.

**Programming language**

Python and C (or C++) are the languages we choose for this course (you are welcome to use either one or both). You will also need to know or learn R, which is good for statistical computing and plotting.

**Textbook/References**

**Required textbook**:

- Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1999 (BSA); available at Amazon. Note: some of the topics in the course cannot be found in this book. We will distribute complementary lecture notes and reading materials for these topics throughout the course.

**Course webpage**

http://mendel.informatics.indiana.edu/~yye/lab/teaching/spring2013-I529.php

**Assignments**

We will have 5 take-home assignments and 1 class project.

**Grading**

Combined assignments (30%), one mid-term exam (25%), final exam (25%), class project (20%). Attendance will be considered in borderline cases.

**Primer articles**

- What is a hidden Markov model? (Primer 1)
- What is the expectation maximization algorithm? (Primer 2)
- How does eukaryotic gene prediction work? (Primer 3)
- How does DNA sequence motif discovery work? (Primer 4)
- Inference in Bayesian networks | A primer on learning in Bayesian networks for computational biology (Primer 5 & 6)
- What are decision trees? | What are artificial neural networks? | What is a support vector machine? (Primer 7 & 8 & 9)
- MCMC: Does it work? | Understanding the Metropolis-Hastings algorithm | Explaining the Gibbs Sampler (Primer 10 & 11 & 12)

**Topics & schedule**

| Week | Date | Topics | Slides | Readings |
| --- | --- | --- | --- | --- |
| Week 1 | Jan 8 (T) | Overview of I529 & Knowing your data | slides (handout) | BSA 1.1 - 1.2; BSA 11 |
| | Jan 10 (Th) | Probabilistic modeling | slides (handout) | |
| | Jan 11 (F) | Lab 1: Computing resources at IU | link | |
| Week 2 | Jan 15 (T) | Frequency and profiles | slides (handout) | BSA 1.1 - 1.2 |
| | Jan 17 (Th) | Frequency and profiles (Cont.) | | |
| | Jan 18 (F) | Lab 2: Using R | link | |
| Week 3 | Jan 22 (T) | Markov chains | slides (handout) | BSA Chapter 4 |
| | Jan 24 (Th) | Markov chains | | |
| | Jan 25 (F) | Lab 3: DIY (MEME & WebLogo) | | |
| Week 4 | Jan 29 (T) | Markov chains: variants | | Primer 1; BSA Chapter 4 |
| | Jan 31 (Th) | Hidden Markov models: overview | slides (handout) | |
| | Feb 1 (F) | Lab 4: group discussion on HW1 | | |
| Week 5 | Feb 5 (T) | Hidden Markov models: Viterbi algorithm | | |
| | Feb 7 (Th) | Hidden Markov models: forward & backward algorithms | | |
| | Feb 8 (F) | Lab 5: $1M challenge | | Challenge overview; more details |
| Week 6 | Feb 12 (T) | Generalized HMM | | |
| | Feb 14 (Th) | HMM: parameter estimation | slides (handout) | |
| | Feb 15 (F) | Lab 6: $1M challenge | | |
| Week 7 | Feb 19 (T) | HMM: parameter estimation | | |
| | Feb 21 (Th) | HMM applications in epigenomics | slides (handout) | A presentation on epigenomics |
| | Feb 22 (F) | Lab 7: $1M challenge | | |
| Week 8 | Feb 26 (T) | Profile HMM | slides (handout) | BSA Chapter 5; Protein families |
| | Feb 28 (Th) | Profile HMM (Cont.) | | |
| | March 1 (F) | Lab: group discussion on HW2 | | |
| Week 9 | March 5 (T) | Review | | |
| | March 7 (Th) | Midterm exam | | |
| | March 8 (F) | No lab | | |
| Week 10 | | Spring break; no classes | | |
| Week 11 | March 19 (T) | Phylo-HMM | slides (handout) | |
| | March 21 (Th) | EM algorithm | slides (handout) | Primer 2 & 4 |
| | March 22 (F) | | | |
| Week 12 | March 26 (T) | EM algorithm (Cont.) | | |
| | March 28 (Th) | MCMC | slides (handout) | Primer 10 & 11 & 12 |
| | March 29 (F) | Lab: Using hmmer | link | |
| Week 13 | April 2 (T) | MCMC (Cont.) | | |
| | April 4 (Th) | Bayes Classifier | slides (handout) | |
| | April 5 (F) | | | |
| Week 14 | April 9 (T) | No class | | BigRed II workshop |
| | April 11 (Th) | No class | | |
| | April 12 (F) | Lab: group discussion on HW3 | | |
| Week 15 | April 16 (T) | Bayesian network | slides (handout) | |
| | April 18 (Th) | Bayesian network (Cont.) | | |
| | April 19 (F) | Lab: ML practices | link | |
| Week 16 | April 23 (T) | Module network | slides (handout) | |
| | April 25 (Th) | Project presentation | | |
| | April 26 (F) | Project presentation | | |