Center for Computational Health - overview

Research at the Interface of Data Science and Health

We pursue research in the application of data science to healthcare across the entire continuum from the health of individuals, to that of populations, to the healthcare system itself.

Healthcare is in the midst of dramatic changes on many levels, driven in no small part by the expanding role of data in achieving a deeper understanding of disease, behavior and the interaction of complex systems. New types of data, such as genomic and sensor data, combined with the increasing electronic availability of traditional health data, are having a major impact on conceptual models of how disease is diagnosed and treated.

The Center for Computational Health at IBM T.J. Watson Research Center consists of a multi-disciplinary team of researchers with expertise in machine learning, data mining, visual analytics, biomedical & medical informatics, statistics, behavioral and decision sciences, and medicine. We work on developing cutting-edge methodologies to derive insights from diverse sources of health data, to support use cases in personalized care delivery and management, real world evidence, health behavior modeling, cognitive health decision support, and translational informatics.

Program Director: Jianying Hu

Research Areas

Patient Similarity Analytics

We incorporate diverse patient attributes into similarity analytics, applying advanced machine learning methods to identify precision cohorts. Combined with personalized predictive models that rank risk factors at the patient level, this leads to more targeted and actionable insights.

Predictive Modeling

Advanced machine learning approaches to address challenges in developing effective and efficient predictive models from observational healthcare data in different use cases. Examples include matrix-based methods to address sparsity, feature engineering (e.g., temporal pattern mining, factor analysis), feature selection, a scalable predictive modeling platform, personalized predictive modeling leveraging precision cohorts, and multi-task learning for comprehensive risk assessment.

Disease Progression Modeling

Understanding disease onset, characteristics of disease stages, rate of progression from asymptomatic to symptomatic disease, from earlier to more severe stages, and factors that influence disease progression pathways.   

Translational Informatics

Drug similarity analytics, combined with advanced machine learning methods such as joint matrix factorization, can help pharmaceutical researchers quickly identify drugs that have similar characteristics to target drugs, supporting three distinct but equally important use cases: drug safety, drug repositioning, and personalized medicine.

Visual Analytics and Cognitive Decision Support

Innovative visual analytics platform and user interfaces that accelerate the process of exploring and mining data to derive new insights that can be translated into more effective therapeutics and processes.

Contextual & Behavioral Modeling

Combining real-time data from wearable devices, self-reported activity, and clinical data allows us to model behavior for both prediction and personalized wellness and fitness strategies tailored to an individual’s unique needs.

Recent News and Posts

4/7/17 - IBM granted U.S. Patent 9,536,194: Method and system for exploring the associations between drug side-effects and therapeutic indications. 
IBM press release:
Blog Post:

Articles of interest related to CHF prediction work recently published in Circulation: Cardiovascular Quality and Outcomes:

IEEE Spectrum Article:
Blog Post:


Recent Presentations & Events

7th Digital Health 2017 - 7/2-5/2017, London, England
Keynote: Health Innovation – An IBM Perspective
Presenter: Ching-Hua Chen

American Medical Informatics Association (AMIA) 2016 Annual Symposium, 11/12-16, Chicago, IL

  • Characterizing Physicians Practice Phenotype from Unstructured Electronic Health Records.
    Presenter: Sanjoy Dey
  • Data-Driven Prediction of Beneficial Drug Combinations in Spontaneous Reporting Systems.
    Presenter: Ying Li
  • Predicting Negative Events: Using Post-discharge Data to Detect High-Risk Patients.
    Authors: Lina Sulieman, Daniel Fabbri, Fei Wang, Jianying Hu, Bradley Malin.

Data Analytics Challenge Win: IEEE International Conference on Healthcare Informatics (ICHI), 10/4-7/2016, Chicago, IL
Winner of Data Analytics Challenge - Team HARG, IBM T.J. Watson Research Center
Submitters: Janu Verma, Bum Chul Kwon, Yu Cheng, Soumya Ghosh, Kenney Ng

Best Paper Win: European Semantic Web Conference (ESWC), 5/29-6/2/2016, Anissaras, Crete, Greece
Best In-Use/Industrial Paper Award - Predicting Drug-Drug Interactions through Large-scale Similarity-Based Link Prediction
Authors: Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, and Ping Zhang

Featured: IBM Watson Health Showcases on Tackling Diabetes at American Diabetes Association’s 76th Scientific Sessions, June 10-14, 2016, New Orleans, LA
Personalized predictive modeling work led by Kenney Ng featured in the press release:

2016 SIAM International Conference on Data Mining, May 5-7, 2016, Miami, FL
Tutorial Presentation: Biomedical Data Mining with Matrix Models
Presenter: Ping Zhang

Keynote: 6th International Conference on Digital Health, April 11-13, 2016, Montreal, Quebec, Canada
Keynote Presentation - "Health Innovation - An IBM Perspective"
Presenter: Ching-Hua Chen

Special Session: ENDO 2016, April 1-4, 2016, Boston, MA
Symposium: Advanced Healthcare Informatics Analytics in the Areas of Precision Medicine, Translational Medicine and Population Health
Presenters: Kenney Ng, Yarra Goldschmidt, Ching-Hua Chen

Plenary Speech: 2016 Asian American Engineer of the Year Symposium, March 12, 2016, New Brunswick, NJ
Plenary speech on Data Driven Healthcare Analytics
Plenary Speaker: Jianying Hu

Invited Presentation: CHDI’s 11th Annual HD Therapeutics Conference, February 22–25, 2016, Palm Springs, CA
Invited closing presentation: Understanding Huntington’s disease progression: A multi-level probabilistic modeling approach
Presenter: Jianying Hu

Invited Panel Presentation: SINAInnovations 2015, October 27-28, 2015, New York, NY
Day One Panel Discussion - Precision Medicine
Invited Panel Presenter: Jianying Hu
Program & Video Link:

Machine Learning in Healthcare, August 8-9, 2014, Los Angeles, CA
Keynote: Data Driven Analytics for Personalized Healthcare
Presenter: Jianying Hu
Program & Video Link:

Selected Recent Publications

Identifying and investigating unexpected response to treatment: a diabetes case study.
Michal Ozery-Flato, Liat Ein-Dor, Naama Parush-Shear-Yashuv, Ranit Aharonov, Hani Neuvirth, Martin S Kohn, Jianying Hu
Big Data 4(3), 148-159, Mary Ann Liebert, Inc., 2016

Early detection of heart failure using electronic health records: practical implications for time before diagnosis, data diversity, data quantity, and data density.
Ng K, Steinhubl SR, deFilippi C, Dey S, Stewart WF.
Circulation: Cardiovascular Quality and Outcomes. 2016;9:649-658.

Characterizing physicians practice phenotype from unstructured electronic health records.
Dey S, Wang Y, Byrd R, Ng K, Steinhubl S, deFilippi C, Stewart W
American Medical Informatics Association Annual Symposium (AMIA), 2016.

Data-driven prediction of beneficial drug combinations in spontaneous reporting systems.
Ying Li, Ping Zhang, Zhaonan Sun, Jianying Hu
American Medical Informatics Association Annual Symposium (AMIA), 2016.

Integrated machine learning approaches for predicting ischemic stroke and thromboembolism in atrial fibrillation.
Xiang Li, Haifeng Liu, Xin Du, Ping Zhang, Gang Hu, Guotong Xie, Shijing Guo, Meilin Xu, Xiaoping Xie
American Medical Informatics Association Annual Symposium (AMIA), 2016.

Predicting negative events: Using post-discharge data to detect high-risk patients.
Lina Sulieman, Daniel Fabbri, Fei Wang, Jianying Hu, Bradley Malin.
American Medical Informatics Association Annual Symposium (AMIA), 2016.

DPDR-CPI, a server that predicts drug positioning and drug repositioning via chemical-protein interactome.
Heng Luo, Ping Zhang, Xi Hang Cao, Dizheng Du, Hao Ye, Hui Huang, Can Li, Shengying Qin, Chunling Wan, Leming Shi, Lin He, Lun Yang
Scientific Reports, Nature Publishing Group, 2016.

Deep state space models for computational phenotyping.
Soumya Ghosh, Yu Cheng, and Zhaonan Sun
IEEE International Conference on Healthcare Informatics (ICHI), 2016.

Correlating eligibility criteria generalizability and adverse events using Big Data for clinical trials.
Sen A, Ryan PB, Goldstein A, Chakrabarti S, Wang S, Koski E, Weng C.
Ann N Y Acad Sci. 2016 Sep 6. doi: 10.1111/nyas.13195.

Improving precision medicine using individual patient data from trials.
Cahan A, Cimino JJ.
CMAJ. 2016 Aug 29. pii: cmaj.160267.

Using frequent item set mining and feature selection methods to identify interacted risk factors - The atrial fibrillation case study.
Xiang Li, Haifeng Liu, Xin Du, Gang Hu, Guotong Xie, Ping Zhang
Medical Informatics Europe (MIE), 2016.

Visual assessment of the similarity between a patient and trial population: Is this clinical trial applicable to my patient?
Cahan A, Cimino JJ.
Applied Clinical Informatics, 2016 Jun 8;7(2):477-488.

Predicting drug-drug interactions through large-scale similarity-based link prediction.
Achille Fokoue, Mohammad Sadoghi, Oktie Hassanzadeh, Ping Zhang
Extended Semantic Web Conference (ESWC), 2016

Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models
Krause J, Perer A, and Ng K.
Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016

Integrating population-based patterns with personal routine to re-engage Fitbit use.
Chung C, Danis C.
Proceedings of PervasiveHealth, 2016

Risk prediction with electronic health records: A deep learning approach.
Cheng Y, Wang F, Zhang P, Hu J.
SIAM International Conference on Data Mining (SDM), 2016.

Clustering of elderly patient subgroups to identify medication-related readmission risks.
Catherine H Olson, Sanjoy Dey, Vipin Kumar, Karen A Monsen, Bonnie L Westra
International Journal of Medical Informatics 2016 Jan;85(1):43-52, Elsevier.

Wearable technologies and telehealth in care management for chronic illness.
Xinxin Zhu, Amos Cahan
In: Healthcare Information Management Systems
Charlotte A. Weaver, Marion J. Ball, George R. Kim, and Joan M. Kiel, Eds., Springer International Publishing, 2016.

Mining and exploring care pathways from electronic medical records with visual analytics.
A. Perer, F. Wang, and J. Hu.
Journal of Biomedical Informatics (JBI). 2015

Label Propagation Prediction of Drug-Drug Interactions Based on Clinical Side Effects.
Zhang P, Wang F, Hu J, Sorrentino R.
Sci Rep. 2015 Jul 21;5:12339.

Towards actionable risk stratification: a bilinear approach
Wang X, Wang F, Hu J, Sorrentino R
Journal of Biomedical Informatics (JBI). 2015

Personalized Predictive Modeling and Risk Factor Identification using Patient Similarity
Ng K, Sun J, Hu J, Wang F
AMIA Jt Summits Transl Sci Proc. 2015 Mar 25;2015:132-6.

LINKAGE: An Approach for Comprehensive Risk Prediction for Care Management
Sun Z, Wang F, Hu J
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2015

Early detection of heart failure with varying prediction windows by structured and unstructured data in electronic health records.
Yajuan Wang, Ng K, Byrd RJ, Jianying Hu, Ebadollahi S, Daar Z, deFilippi C, Steinhubl SR, Stewart WF.
Conf Proc IEEE Eng Med Biol Soc. 2015 Aug;2015:2530-3.

Clinicians' evaluation of computer-assisted medication summarization of electronic medical records.
Zhu X, Cimino JJ. 
Comput Biol Med. 2015 Apr;59:221-31.

Prescription Extraction from Clinical Notes: Towards Automating EMR Medication Reconciliation
Wang Y, Steinhubl SR, deFilippi C, Ng K, Ebadollahi S, Stewart WF, Byrd RJ.
AMIA Jt Summits Transl Sci Proc. 2015 Mar 25;2015:188-93.

Relative Patterns Discovery toward Big Data Analytics
Pai H, Wu F, Hsueh PY, Lin G, Chan Y-H.
Proceedings of the 2015 IEEE 12th International Conference on e-Business Engineering (ICEBE), 2015

PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records
Ng K, Ghoting A, Steinhubl SR, Stewart WF, Malin B, Sun J
Journal of Biomedical Informatics (JBI), 2014

Unsupervised Learning of Disease Progression Models
Wang X, Sontag D, Wang F
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), 2014

Predicting changes in hypertension control using electronic health records from a chronic disease management program
Sun J, McNaughton CD, Zhang P, Perer A, Gkoulalas-Divanis A, Denny JC, Kirby J, Lasko T, Saip A, Malin BA
Journal of the American Medical Informatics Association (JAMIA), 2014

From micro to macro: data driven phenotyping by densification of longitudinal electronic medical records
Zhou J, Wang F, Hu J, Ye J
Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD), pages 135-144, 2014

Towards personalized medicine: leveraging patient similarity and drug similarity analytics
Zhang P, Wang F, Hu J, Sorrentino R
Proceedings of AMIA Joint Summits on Translational Sciences, 2014

Exploring joint disease risk prediction
Wang X, Wang F, Hu J, Sorrentino R.
Proceedings of the AMIA Annual Symposium, 2014