Manashree Rao
Data Scientist
Hello! My name is Manashree Rao, and I'm a second-year graduate student pursuing a Master's in Data Science at Indiana University. I'm highly motivated, and I like to use my analytical and technical skills to solve complex, data-driven problems.
A glimpse of the projects I have undertaken or been involved in.
- Built a regression model using a weighted ensemble of boosted trees and neural networks to understand different risk factors and predict the cost of insurance claims.
- Ranked in the top 5% of 3050+ teams on the private leaderboard after the competition closed.
Technologies Used: XGBoost, Keras, TensorFlow, scikit-learn, pandas
- Analyzed the English Wiki dump, pageview data, and the real-time edit stream using Apache Spark, SparkSQL, and Spark Streaming to identify patterns.
- Deployed the system on an OpenStack cluster using Ansible scripts.
- Displayed results through a load-balanced visualization framework built with sharded MongoDB, D3.js, NodeJS, and Nginx.
Technologies Used: Apache Spark, SparkSQL, Spark Streaming, Sharded MongoDB, D3.js, NodeJS, Nginx, Ansible, OpenStack
- Predicted categories of restaurants.
- Identified the food items and services in highest demand, and those that receive the most criticism from customers, to find out which factors influence users' choice of restaurants.
- Identified influential factors for popular restaurants on a city-wide scale.
Technologies Used: StanfordNLP, Solr, MongoDB, Python, Apache Spark, MLlib
- Built a probability prediction model to analyze customer churn and determine which actions typically indicate whether a customer is satisfied or dissatisfied with their banking experience, which may lead to retention or loss of customers.
- Engineered the model using algorithms such as K-Means, boosted trees, randomized forests, and t-SNE; it placed my team in the top 10% of the private leaderboard.
Technologies Used: Python, Theano, scikit-learn, pandas, NumPy
- Created an algorithm for computing the semantic similarity between two sentences using lexical, syntactic, and semantic features derived from the text snippets; it achieved a mean Pearson correlation score of 0.75.
Technologies Used: Python, scikit-learn, pandas, Word2vec, NLTK
- Classified handwritten digits from the MNIST data using Convolutional Neural Networks on CPU and GPU with an average model accuracy of 98%.
Technologies Used: Caffe
Education & Work Experience
- Distributed Machine Learning
- High Performance Computing
- Machine Learning & Data Mining
- Advanced Natural Language Processing & Sentiment Analysis
- Big Data OSS & Projects
- Information Retrieval
- Data Science for Drug Discovery, Health and Translational Medicine
- Data Structures and Algorithms
- Computer Networks
- Advanced Databases and Distributed Operating Systems
- Software Architecture
- Cloud Computing
- Worked on a research project involving the collection and pre-processing of data from digital and social media sources.
- Developed a framework to generate features from images and combine them with textual information about each image for image classification.
- Used NLP techniques such as topic modelling and word segmentation for feature engineering, classification, and sentiment analysis, and visualized the results.
- Developed a web service for the Cyberinfrastructure Shell that supports plug-and-play of datasets and algorithms, and deployed this service within Docker containers.
- Worked on Sci2, designed for temporal, geospatial, topical, and network analysis and visualization of scholarly datasets.
- Developed a RESTful integration system between Lithium and Salesforce CRM to analyze and communicate relevant information using a case escalation and event subscription framework.
- Implemented a cookie-based SSO system between two cloud platforms and hosted the Java application on the client's web server.
- Actively involved in an iterative design and development cycle with regular client interaction for requirement gathering and status reporting.