Machine Learning in Bioinformatics (I529)/Topics in Artificial Intelligence (B659)
Instructor: Yuzhen Ye
Associate instructor: Moses Stamboulian (mstambou@indiana.edu)
Class meets: MW 2:30PM-3:20PM I232 (Informatics, 232)
Lab meets: F 2:30-3:20PM (Informatics West, I109)
Office hours: Tuesdays 2-3:30 (Yuzhen, Luddy Hall 2046).
Syllabus
- Prerequisites
I519 or equivalent knowledge in bioinformatics.
This course is designed for advanced bioinformatics graduate students who have taken I519 (so students at least know the Smith-Waterman algorithm!). Graduate students with a biology, computer science, or data science background who are interested in bioinformatics applications are also welcome to take this course.
- Description
Machine learning techniques have been successful in analyzing biological data because of their ability to handle randomness, uncertainty, and noise in data and to generalize. In this class, we will learn about classic machine learning techniques, such as Naive Bayes, decision trees, random forests, and neural networks, using biological problems as examples. We will also learn about recent developments in deep networks and their successful applications to some hard biological problems. Finally, we will learn about probabilistic models (including Markov models, hidden Markov models, and Bayesian networks) for biological sequence analysis and systems biology.
- Learning goals/outcomes
- For biology/bioinformatics students: understand the computational formulation of biological problems; understand the algorithms behind the commonly used machine learning approaches to solve biological problems.
- For students with a computational background: understand and appreciate biological problems that can be solved using ML approaches.
- For all students: learn about the ML approaches that are unique to bioinformatics and their applications; implement simple ML approaches; and be able to use ML approaches implemented in R and Python packages (e.g., scikit-learn) to solve real problems.
- Programming language
Python and C (or C++) are the languages we choose for this course (you are welcome to use either one or both). You will also need to know/learn R, which is good for statistical computing and plotting.
- Assignments
We will have 4 take-home assignments and 1 class project. All your work needs to be submitted to IU GitHub (NOT Canvas).
- Grading
Combined assignments (35%), final exam (20%), paper presentation (by group) (10%), group project (35%). Attendance will be considered in borderline cases.
- Textbook/References
- Textbook: Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1999 (BSA) (available at Amazon)
- Online book: Neural Networks and Deep Learning
- Course webpage:
http://homes.soic.indiana.edu/classes/spring2019/info/i529-yye/index.php