(Computer) Vision for Intelligent Robotics, Fall 2017

Course number: Info I590 / CS B659

Meets: Tuesday/Thursday 4:00-5:15pm

Location: BH 233

Website: http://homes.soic.indiana.edu/classes/fall2017/csci/b659-mryoo/

Instructor: Prof. Michael S. Ryoo

Email: mryoo "at" indiana.edu

Office: Informatics E259

Office hours: Friday 2-3pm

Course description:

In this graduate seminar course, we will review and discuss state-of-the-art computer vision methodologies as well as their applications to robots. Specific topics will include object recognition, activity recognition, deep learning for both images and videos, and first-person vision for wearable devices and robots. The objective of the course is to understand important problems in computer vision and intelligent robotics, discuss advantages and disadvantages of existing approaches, and identify open questions and future research directions.

Prerequisites:

Interest in computer vision; basic programming skills; ability to read and understand conference papers. This course will focus on deep learning techniques and their robotics applications, which will extend topics covered in other computer vision courses including B490/B659. Any previous experience in computer vision, machine learning, and robot vision will be a plus.

Please talk to me if you are unsure if the course is a good match for your background.

(tentative) Schedule:

Date    | Description                                                   | Papers   | Presenters
--------|---------------------------------------------------------------|----------|--------------
8/22    | Course introduction: research overview and general background | –        | Ryoo

1. Understanding images and videos

Date    | Description                                                   | Papers   | Presenters
--------|---------------------------------------------------------------|----------|--------------
8/29    | Image features and object classification                      | [1]      | Ryoo
8/31    | Image features and object classification                      | [2]      | Naha
9/5     | Image features and object classification                      | [3, 4]   | Dhody
9/7     | Object detection/segmentation                                 | [5]      | Ryoo
9/12    | Object detection/segmentation                                 | [6]      | Ryoo
9/14    | Object detection/segmentation                                 | [7, 8]   | Boolchandani
9/19    | Action recognition from videos                                | [9]      | Ryoo
9/21    | Action recognition from videos                                | [10]     | Guo
9/26    | Action recognition from videos                                | [11]     | Khamkar
9/28    | More deep learning models                                     | –        | Guest lecture
10/3    | More deep learning models                                     | [12]     | Karmazyn Raz
10/5    | More deep learning models                                     | [13, 14] | Mishra
10/10   | More deep learning models                                     | [15, 17] | Siddarth
10/12   | More deep learning models                                     | [16]     | Rane

2. Robot learning

Date    | Description                                                   | Papers   | Presenters
--------|---------------------------------------------------------------|----------|--------------
10/17   | Robot perception: first-person recognition                    | [18, 19] | Ryoo
10/19   | Deep reinforcement learning                                   | [20]     | Naha
10/24   | Deep reinforcement learning                                   | [21]     | Mishra
10/26   | Deep reinforcement learning                                   | [22, 23] | Rane
10/31   | Deep reinforcement learning                                   | [24]     | Khamkar
11/2    | Deep reinforcement learning                                   | [25]     | Naha
11/7    | Deep learning for robot action                                | [26]     | Karmazyn Raz
11/9    | Deep learning for robot action                                | [27]     | Dhody
11/14   | Deep learning for robot action                                | [28]     | Ryoo
11/16   | Deep learning for robot action                                | [29]     | Boolchandani
Week 14 | No class (Thanksgiving)                                       | –        | –
11/28   | Deep learning for robot action (cont’d)                       | [30]     | Siddarth
11/30   | Deep learning for robot action (cont’d)                       | [31]     | Guo
12/5    | Final project presentations                                   | –        | –
12/7    | Final project presentations                                   | –        | –

Course requirements and grading:

Paper/experiment presentations (30%): each student is expected to give ~2 presentations over the course of the semester. For each, a student may choose to give either (1) a paper presentation or (2) an experiment presentation (i.e., presenting the results obtained by running the method's code on existing datasets).

Paper review and class participation (20%): students are required to choose one paper per class and submit a short review of it before the class.

Final project (50%): each student will choose an individual research topic and conduct research on it. This can be as simple as implementing several previous methods and comparing them, or as ambitious as proposing new concepts and algorithms, implementing them, and evaluating them on public datasets to advance the state of the art.

References

  1. S. Lazebnik, C. Schmid, and J. Ponce, Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006.
  2. A. Krizhevsky, I. Sutskever, and G. Hinton, Imagenet Classification with Deep Convolutional Neural Networks, 2012.
  3. He et al., Deep Residual Learning for Image Recognition, 2015.
  4. Huang et al., Densely Connected Convolutional Networks, 2016.
  5. P. Felzenszwalb, D. McAllester, and D. Ramanan, A Discriminatively Trained, Multiscale, Deformable Part Model, 2008.
  6. R. Girshick, J. Donahue, T. Darrell, and J. Malik, Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, 2014.
  7. J. Long, E. Shelhamer, T. Darrell, Fully Convolutional Networks for Semantic Segmentation, 2015.
  8. Liu et al., SSD: Single Shot MultiBox Detector, 2015.
  9. P. Dollár, V. Rabaud, G. Cottrell, and S. Belongie, Behavior Recognition via Sparse Spatio-Temporal Features, 2005.
  10. D. Tran et al., Learning Spatiotemporal Features with 3D Convolutional Networks, 2014.
  11. J. Ng et al., Beyond Short Snippets: Deep Networks for Video Classification, 2015.
  12. A. Nguyen et al., Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, 2015.
  13. S. Bell and K. Bala, Learning visual similarity for product design with convolutional neural networks, 2015.
  14. C. Fan, J. Lee, M. Xu, K. K. Singh, Y. J. Lee, D. J. Crandall, and M. S. Ryoo, Identifying First-person Camera Wearers in Third-person Videos, 2017.
  15. K. Gregor, I. Danihelka, A. Graves, D. Jimenez Rezende, and D. Wierstra, DRAW: A Recurrent Neural Network For Image Generation, 2015.
  16. S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei, End-to-end Learning of Action Detection from Frame Glimpses in Videos, 2016.
  17. A. Piergiovanni, C. Fan, and M. S. Ryoo, Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters, 2016.
  18. M. S. Ryoo et al., Robot-Centric Activity Prediction from First-Person Videos: What Will They Do to Me?, 2015.
  19. Y. Yang et al., Robot Learning Manipulation Action Plans by “Watching” Unconstrained Videos from the World Wide Web, 2015.
  20. Mnih et al., Playing Atari with Deep Reinforcement Learning, 2013.
  21. Nagabandi et al., Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, 2017.
  22. Lillicrap et al., Continuous Control with Deep Reinforcement Learning, 2015.
  23. Mnih et al., Asynchronous Methods for Deep Reinforcement Learning, 2016.
  24. Baram et al., End-to-End Differentiable Adversarial Imitation Learning, 2017.
  25. Vezhnevets et al., FeUdal Networks for Hierarchical Reinforcement Learning, 2017.
  26. Levine et al., Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, 2016.
  27. Finn and Levine, Deep Visual Foresight for Planning Robot Motion, 2016.
  28. Lee and Ryoo, Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression, 2017.
  29. Sermanet et al., Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation, 2017.
  30. Gupta et al., Cognitive Mapping and Planning for Visual Navigation, 2017.
  31. Stadie et al., Third-Person Imitation Learning, 2017.