In this course we will talk about sublinear algorithms, which has its roots in the study of Big Data that occur more and more frequently in various applications, e.g., analyses of financial transactions, internet traffic, social networks, genome sequences, etc. Concretely, we will talk about:
1. Sublinear space algorithms. In particular, data stream algorithms, namely, algorithms that solve a problem by making one pass over the data set while using small memory. These algorithms are important in many application areas such as databases and networking, where data arrives at a high speed and there is no time and/or need to store it for offline processing.
2. Sublinear time algorithms, that is, algorithms that do not even read the whole input when outputting the answers.
3. Sublinear communication algorithms. The data is stored in multiple machines, who want to jointly compute functions defined on the union of the data sets via communication.
4. Random topics: Topics VOTED by students.
Participants are expected to have a basic background in algorithm design and probability.
The evaluation will be based on a set of theoretical exercises and individual project/presentation. The list of questions will be handed out in the middle of the course.
Detailed list of topics is available in the course plan below.
Qin Zhang
Email: qzhangcs@indiana.edu
Office hours: By email appointment
9:30AM10:45AM Mon, Wed.
Informatics East, Room 122.
Week  Date  Section  Content  Literature  Slides  Comments 

1  Aug. 26  0. Introduction  New models for Big Data  slides  
1  Aug. 28  Interesting problems  slides  Basic probability tools  
2  Sep. 2  Labor day. No class  
2  Sep. 4  Basic probabilistic tools  slides  Read Chapter 3, 4 of [MU]  
3  Sep. 9  1. Sublinear in space  Distinct elements, FM sketch  [FM]  slides  Read Section 2 of this notes 
3  Sep. 11  Improvement on FM sketch  [BYJKST]  same  Read Section 3 of this notes  
4  Sep. 16  Heavy hitters, SpaceSaving  [MAA]  same  Guest lecture by Prof. Ergun  
4  Sep. 18  CountMin  [CM]  same  Guest lecture by Prof. Ergun  
5  Sep. 23  Linear sketch  same  Read this note  
5  Sep. 25  Countsketch  [CCF]  same  Read Section 4 of this notes  
6  Sep. 30  Alternative for L2 point query  [GKMS]  same  Read this note  
6  Oct. 2  2. Sublinear in comm.  Connectivity  [AGM1]  slides  Read [AGM1] for details 
7  Oct. 7  L0 sampling  [JST]  same  Read [JST] for details  
7  Oct. 9  Mincut, Bipartiteness, MST  [AGM1], [AGM2]  same  Read [AGM1], [AGM2] for details  
8  Oct. 14  Sparsification  [AGM2]  same  See this note  
8  Oct. 16  3. Sublinear in time  Average degree  [Fei]  slides  Read Section 3.1 in [CS] 
9  Oct. 21  Average degree (cont.)  [GR]  same  See this handwritten note by Ronitt Rubinfeld 

9  Oct. 23  Minimum spanning tree  [CRT]  same  See this note  
10  Oct. 28  4. Random topics VOTED by students 
Compressive sensing, intro  slides  
10  Oct. 30  Matching pursuit, L1/L1 recovery 
same  Read this note  
11  Nov. 4  RIP, L2/L1 recovery  same  Read this note  
11  Nov. 6  Subspace embedding, Lp regression 
[WZ]  slides  
12  Nov. 11  Sampling in distributed monitoring  [CMYZ]  slides  
12  Nov. 13  Lower bounds  slides  
13  Nov. 18  Lower bounds (cont.)  same  
13  Nov. 20  AMS sampling and entropy in data streams 
[CCM]  slides  
14  Nov. 25  Thanksgiving Break. No classes  
14  Nov. 27  Thanksgiving Break. No classes  
15  Dec. 2  5. Homework discussion  
15  Dec. 4  6. Student presentations  Chen: [Space] Trianglecounting Dawoud: [Communication] Distributed monitoring entropy Firoz: [Space] Geometric problems 

16  Dec. 9  Jazayeri: [Time] Average degree Kanakamedala: [Space] Frequent items Narayanan: [Time] Matching 

16  Dec. 11  Taduri: [Space] Frequent items Undrakonda: [Space] Hierarchical heavy hitters 
For assignments, students may discuss answers with anyone, including problem approaches and proofs. But all students must write their own proofs, and writeups. The names of all people that you have talked to should be listed at the beginning of the first page. If a solution comes from existing papers/web/books, they must be properly cited, and you must write the solution in a way that demonstrates your understanding (simply copying the solution will be considered as plagiarism). All deadlines are firm. No late assignments will be accepted unless there are legitimate circumstances.
For projects, you may discuss your project with anyone as well, but if this contributes to your final product, they must be acknowledged. Any outside materials used must be referenced appropriately.
For more details, see Indiana University Code of Student Rights, Responsibilities, and Conduct.