Manashree Rao

Data Scientist
The goal is to turn data into information, and information into insight

Hello! My name is Manashree Rao, and I'm a second-year graduate student pursuing a Master's in Data Science at Indiana University. I'm highly motivated, and I like to use my analytical and technical skills to solve complex, data-driven problems.


A glimpse of the projects I have undertaken or been involved in.

Allstate Claims Severity
  • Built a regression model using a weighted ensemble of boosted trees and neural networks to understand different risk factors and predict the cost of insurance claims.
  • Ranked in the top 5% of 3050+ teams on the private leaderboard after the competition closed.

Technologies Used: XGBoost, Keras, TensorFlow, scikit-learn, pandas

Analysis of Wikipedia
  • Analyzed the English Wikipedia dump, pageviews data, and the real-time edit stream using Apache Spark, SparkSQL, and Spark Streaming to identify patterns.
  • Deployed the system on an OpenStack cluster using Ansible scripts.
  • Displayed results through a load-balanced visualization framework built with sharded MongoDB, D3.js, NodeJS, and Nginx.

Technologies Used: Apache Spark, SparkSQL, Spark Streaming, Sharded MongoDB, D3.js, NodeJS, Nginx, Ansible, OpenStack

Yelp Dataset Challenge
  • Predicted categories of restaurants.
  • Identified the food items or services in high demand, and those that receive the most criticism from customers, to find out which factors influence users' choice of restaurants.
  • Identified influential factors for popular restaurants on a city-wide scale.

Technologies Used: StanfordNLP, Solr, MongoDB, Python, Apache Spark, MLlib

Santander Customer Satisfaction
  • Built a probability prediction model to analyze customer churn and determine which actions typically predict whether a customer is satisfied or dissatisfied with their banking experience, which may lead to customer retention or loss.
  • Engineered the model using algorithms such as K-Means, boosted trees, random forests, and t-SNE; it placed my team in the top 10% of the private leaderboard.

Technologies Used: Python, Theano, scikit-learn, pandas, NumPy

SemEval 2017 - Semantic Textual Similarity
  • Created an algorithm for computing the semantic similarity between two sentences, leveraging lexical, syntactic, and semantic features derived from the text snippets; it achieved a mean Pearson correlation of 0.75.

Technologies Used: Python, scikit-learn, pandas, Word2vec, NLTK

MNIST Digit Classification using Caffe
  • Classified handwritten digits from the MNIST dataset using convolutional neural networks on CPU and GPU, with an average model accuracy of 98%.

Technologies Used: Caffe

Education & Work Experience

Master's in Data Science, Indiana University, Bloomington, United States of America
  • Distributed Machine Learning
  • High Performance Computing
  • Machine Learning & Data Mining
  • Advanced Natural Language Processing & Sentiment Analysis
  • Big Data OSS & Projects
  • Information Retrieval
  • Data Science for Drug Discovery, Health and Translational Medicine
Bachelor of Engineering in Computer Science, University of Pune, India
  • Data Structures and Algorithms
  • Computer Networks
  • Advanced Databases and Distributed Operating Systems
  • Software Architecture
  • Cloud Computing
Python, MongoDB, Spark, Hadoop, Hive, Pig, R, Java, SQL
Data Science Intern, Philips Research North America
  • Worked on a research project involving the collection and pre-processing of data from digital and social media sources.
  • Developed a framework to generate features from images and combine them with textual information about each image for image classification.
  • Used NLP techniques such as topic modelling and word segmentation for feature engineering, classification, and sentiment analysis, and visualized the results.
Developer, Cyberinfrastructure for Network Science Center (CNS)
  • Developed a web service for Cyberinfrastructure Shell that supports plug-and-play of datasets and algorithms, and deployed this service within Docker containers.
  • Worked on Sci2, designed for temporal, geospatial, topical, and network analysis and visualization of scholarly datasets.
Software Engineer, Persistent Systems
  • Developed a RESTful integration system between Lithium and Salesforce CRM to analyze and communicate relevant information using case escalation and an event subscription framework.
  • Implemented a cookie-based SSO system between two cloud platforms and hosted the Java application on the client's web server.
  • Actively involved in an iterative design and development cycle with regular client interaction for requirement gathering and status reporting.
ML Libraries, Indexing & Version Control
scikit-learn, NumPy, pandas, Lucene/Solr, Keras, Caffe, Git, SVN, JavaScript/jQuery, HTML5/CSS3
