Shuo Yang

I am a final-year PhD student advised by Prof. Sriraam Natarajan in the School of Informatics and Computing at Indiana University. My research focuses on machine learning and its applications in medical domains, including advice-based learning, cost-sensitive learning, dynamic probabilistic models, continuous-time probabilistic logic models, and statistical relational learning in hybrid domains.



Shuo Yang, Tushar Khot, Kristian Kersting and Sriraam Natarajan, Learning Continuous-Time Bayesian Networks in Relational Domains: A Non-Parametric Approach, 30th AAAI Conference on Artificial Intelligence (AAAI), 2016. Code

Haley MacLeod, Shuo Yang, Kim Oakes, Kay Connelly and Sriraam Natarajan, Identifying Rare Diseases from Behavioural Data: A Machine Learning Approach, First IEEE Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2016.

Shuo Yang, Kristian Kersting, Greg Terry, Jeffrey Carr and Sriraam Natarajan, Modeling Coronary Artery Calcification Levels From Behavioral Data in a Clinical Study, Artificial Intelligence in Medicine (AIME), 2015.

Shuo Yang, Tushar Khot, Kristian Kersting, Gautam Kunapuli, Kris Hauser and Sriraam Natarajan, Learning from Imbalanced Data in Relational Domains: A Soft Margin Approach, International Conference on Data Mining (ICDM), 2014. Code

Shuo Yang and Sriraam Natarajan, Knowledge Intensive Learning: Combining Qualitative Constraints with Causal Independence for Parameter Learning in Probabilistic Models, European Conference on Machine Learning (ECML PKDD), 2013.

Shuo Yang and Desong Bian, Automatic Detection of T-wave End in ECG Signals, International Symposium on Intelligent Information Technology Application, 2008.

Shuo Yang and Desong Bian, Automatic Detection of QRS Onset in ECG Signals, IEEE International Symposium on IT in Medicine and Education, 2008.


Knowledge-Intensive Learning

In many domains, a large number of factors influence the target variable. The dimension of the parameter space of a probabilistic model is then exponential in the number of variables, so a large number of training samples is needed to guarantee reasonable prediction accuracy. In this project, we proposed a way to incorporate domain knowledge about independence of causal influence and qualitative constraints. This greatly improves prediction performance by reducing the dimension of the feature space and constraining the search space.
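To make the parameter-count argument concrete, here is a minimal sketch that uses noisy-OR, one common form of independence of causal influence. Treat it as an illustration under assumptions, not the model from the project: the function names and the inhibition probabilities are hypothetical, and all variables are assumed binary.

```python
from itertools import product


def full_cpt_size(n_parents):
    """Free parameters of a full CPT for a binary child with binary parents."""
    return 2 ** n_parents


def noisy_or(parent_values, inhibit, leak=0.0):
    """P(child = 1 | parents) under noisy-OR.

    parent_values: sequence of 0/1 parent states.
    inhibit: inhibit[i] is the probability that parent i, when active,
             fails to trigger the child (one parameter per parent).
    leak: probability that the child is on with no active parent.
    """
    p_off = 1.0 - leak
    for value, q in zip(parent_values, inhibit):
        if value:
            p_off *= q
    return 1.0 - p_off


def expand_cpt(inhibit, leak=0.0):
    """Rebuild the full table (2^n rows) from the n noisy-OR parameters."""
    n = len(inhibit)
    return {cfg: noisy_or(cfg, inhibit, leak)
            for cfg in product((0, 1), repeat=n)}


if __name__ == "__main__":
    inhibit = [0.2, 0.5, 0.1]  # hypothetical values
    print("full CPT parameters:", full_cpt_size(len(inhibit)))
    print("noisy-OR parameters:", len(inhibit))
    for cfg, p in expand_cpt(inhibit).items():
        print(cfg, round(p, 3))
```

With n parents, the full table needs 2^n entries while the noisy-OR form needs n (plus an optional leak term). The expanded table also satisfies a simple qualitative constraint by construction: switching an extra parent on never lowers the probability of the child.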


Cost-Sensitive Learning

In this project, we consider the problem of incorporating domain knowledge about the different weights of positive and negative samples. One motivation is the class imbalance found in many relational domains, where the decision boundary can easily be dominated by the majority class and overfit its outliers. Hence, it is essential to steer training toward the minority class by assigning different costs to false positives and false negatives. Beyond the requirements imposed by such data properties, there are also practical demands in certain domains, such as diagnosis in medicine, quality checking in manufacturing, and recommendation prediction in recommender systems.
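As a rough illustration of assigning different costs to the two error types, the sketch below uses a class-weighted log loss. This is a generic stand-in, not the soft-margin objective from the ICDM paper; the function name and the cost values are assumptions chosen for the example.

```python
import math


def weighted_log_loss(y_true, p_pred, fn_cost=5.0, fp_cost=1.0):
    """Average log loss with separate costs for the two classes.

    fn_cost scales the penalty on positive examples (missed positives),
    fp_cost scales the penalty on negative examples (false alarms).
    Both defaults are hypothetical.
    """
    eps = 1e-12  # keep log() away from zero
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -fn_cost * math.log(p)
        else:
            total += -fp_cost * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # One positive among many negatives; the model predicts 0.1 everywhere.
    y = [1] + [0] * 9
    p = [0.1] * 10
    print("symmetric costs:", weighted_log_loss(y, p, 1.0, 1.0))
    print("fn_cost = 5    :", weighted_log_loss(y, p, 5.0, 1.0))
```

Raising `fn_cost` makes a missed minority-class example count more than an equally confident mistake on the majority class, which is the steering effect described above.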


Sequence Data Mining

In most realistic domains, variables transition between their possible states over time. The data are generated by dynamic processes, with multiple observations at different time points. Dynamic models are needed to model such transition intensities over time.
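For intuition about transition intensities, the following sketch samples a trajectory from a single-variable continuous-time Markov process, the simplest dynamic model of this kind. The state names, the rates, and the `sample_trajectory` helper are all hypothetical and are not taken from the project.

```python
import random


def sample_trajectory(rates, start, t_end, rng):
    """Sample (time, state) pairs from a continuous-time Markov process.

    rates: dict mapping state -> {next_state: intensity}, with no
           self-transitions. A state with no outgoing rates is absorbing.
    """
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        out = rates[state]
        total = sum(out.values())
        if total == 0:
            break  # absorbing state
        # Dwell time is exponential with rate equal to the total intensity.
        t += rng.expovariate(total)
        if t >= t_end:
            break
        # Choose the next state with probability proportional to its rate.
        r = rng.random() * total
        acc = 0.0
        for nxt, rate in out.items():
            acc += rate
            if r < acc:
                break
        state = nxt  # falls back to the last state on rounding error
        path.append((t, state))
    return path


if __name__ == "__main__":
    rates = {  # hypothetical intensities, per unit time
        "healthy": {"sick": 0.5},
        "sick": {"healthy": 1.0, "critical": 0.2},
        "critical": {},
    }
    for t, s in sample_trajectory(rates, "healthy", 10.0, random.Random(0)):
        print(f"{t:6.3f}  {s}")
```

Higher intensities mean shorter expected dwell times, so the intensities directly control how quickly the variable moves between its states.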


Office Location

919 E 13th St

Bloomington, IN 47408


shuoyang (at)
