I123 Data Fluency


Description: Data is everywhere, and in ever larger volumes. This class provides fundamental skills for the 21st century: understanding data, extracting knowledge, generating predictions, and visualizing the results.

Instructor: Sriraam Natarajan, Informatics East, 257. natarasr@indiana.edu

Class Timings: Mondays and Wednesdays 5:45 PM - 7:00 PM Ballantine Hall (BH) Room 215

Office Hours: Mondays and Wednesdays 3:00 PM - 4:00 PM Informatics East, 257

AI: Mayukh Das, maydas@indiana.edu. Office hours: Tuesdays and Thursdays 4:00 PM - 5:30 PM, Informatics Connector, 2nd Floor.

Information: There is no doubt that we operate in a world of data. We are responsible not only for creating data, but also for securing it, managing it, and deriving actionable intelligence from it. This course provides the tools needed to access, create, manipulate, analyze, and visualize data. The course uses real data sets from a wide variety of disciplines, including health care, business, humanities, and demographics, to make the learning of data fluency contextual for the student. Throughout the course, students will use a variety of computational tools to discover patterns and solve problems. Students will also learn the theories underlying the techniques, so that they better understand the tools and are equipped to solve future problems that extend the key learning outcomes of the course.

Textbook: Class slides will be posted. I highly recommend the book Doing Data Science by Rachel Schutt and Cathy O'Neil (O'Reilly, 2013).

Learning Outcomes: At the end of the class, students will have satisfactorily demonstrated that they can:

  1. Describe the nature of the data and how it is structured, stored, and accessed (relational models and tables).
  2. Derive information from data and support conclusions or recommendations based on evidence existing in the data (data mining and predictive modeling).
  3. Analyze and present the data (visualization).

Assessment: Exams (40%), final exam (20%), quizzes (5%), mini-project (10%), and data analysis homework (25%).

Topics Covered

  1. What is Data Science?
  2. What is Data?
  3. Exploratory Data Analysis
  4. Mining the Data
  5. Big data
  6. Successful Applications
  7. Into the future

Tentative Schedule

Date Topic
1/12 0th quiz - Introduction to Data Fluency
1/14 What is data? Definition and collection
1/19 No class
1/21 ER Models and Introduction to functions (homework 1 given)
1/26 Sampling (homework 1 due)
1/28 1st quiz and Using Excel - Guest Lecture (homework 2 given)
2/2 Sampling and Big Data
2/4 Visualization (homework 2 due)
2/9 Introduction to Probability
2/11 Probability Theory (continued) (homework 3 given)
2/16 Introduction to Machine Learning
2/18 Machine Learning (continued)
2/23 Mid-term Study Guide (homework 3 due)
2/25 1st Mid-term
3/2 Introduction to Weka for data analysis
3/4 Naive Bayes Classifier
3/9 Naive Bayes (continued), Introduction to decision trees (homework 4 given)
3/11 Decision trees (continued)
3/16 & 3/18 Spring break
3/23 No class (Professor stranded in Chicago)
3/25 Decision trees (completed), Introduction to Evaluation (homework 4 due, homework 5 given)
3/30 Evaluation & Overfitting
4/1 Supervised learning review (2nd mid-term review) (homework 5 due)
4/6 2nd Mid-term
4/8 K-nearest neighbors
4/13 Clustering
4/15 Machine Learning wrap-up (homework 6 given)
4/20 & 4/22 Student Presentations (projects and homework 6 due on 4/22)
4/27 Guest Lecture by Dr. Burr Settles from Duolingo
4/29 Final Exam