To get started, we have identified five major example domains of medical data science. Each domain has a Template Project that lets you quickly recreate an example data analysis from a publication in that domain. Each project contains the preprocessed primary datasets, cyberinfrastructure, and software methods needed for the recreation.

Genomics, Transcriptomics, and Proteomics

Learn about using multi-omics data from The Cancer Genome Atlas (TCGA) for patient stratification, survival analysis, biomarker discovery, and pathway impact. The analysis uses a Jupyter notebook and the web-based KnowEnG platform. More details can be found [here].
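As a rough illustration of the survival-analysis step, the sketch below computes a Kaplan-Meier survival curve in plain Python. It is not taken from the Template Project or from KnowEnG: the function name `kaplan_meier` is hypothetical, and the follow-up times and event flags are made-up toy values, not TCGA data.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)          # patients still under observation
    survival = 1.0
    curve = []
    idx = 0
    while idx < len(data):
        t = data[idx][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while idx < len(data) and data[idx][0] == t:
            deaths += data[idx][1]
            removed += 1
            idx += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= removed
    return curve


# Toy cohort (illustrative values only): months of follow-up and event flags.
toy_durations = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_durations, toy_events)
```

In a stratification setting, one would typically compute one such curve per patient subgroup and compare them.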

Medical Image Analysis

Learn how to assess black-box methods for classifying chest X-rays by pathology using AutoML, the automated deep learning system on Google Cloud Platform. Access to Google Cloud Platform must be requested. More details are found [here].

Personalized Medicine

Learn how to identify which clinical features and which subgroups of sepsis patients will benefit most from specific treatment regimes using the Virtual Twins modelling approach in a Jupyter notebook. More details are found [here].

Electronic Medical Records

Learn to predict patient readmission events from diagnostic codes in the MIMIC-III database using the DoctorAI recurrent neural network. The analysis will require access to the AWS-based Cloud9 command line environment and credentials for the MIMIC data. More details are found [here].

Population Health Modeling

Through multiple Jupyter notebooks, learn how to use SIR and SEIR models of disease transmission to estimate the effect of different COVID-19 policies on disease spread and hospital resource utilization. More details are found [here].
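To give a feel for what an SIR model computes, here is a minimal sketch that integrates the standard SIR equations with forward Euler steps and compares two scenarios. It is not the code from the Template Project notebooks: the function names, the population size, and the values of `beta` (transmission rate) and `gamma` (recovery rate) are illustrative assumptions, with the lower `beta` standing in for a hypothetical contact-reducing policy.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I with forward Euler steps.

    Returns a list of (day, S, I, R) tuples sampled once per day.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


def peak_infected(history):
    """Return the (day, S, I, R) row with the largest infected count."""
    return max(history, key=lambda row: row[2])


# Assumed, illustrative parameters (not fitted to COVID-19 data).
baseline = simulate_sir(beta=0.30, gamma=0.10, s0=9990, i0=10, r0=0, days=200)
distancing = simulate_sir(beta=0.18, gamma=0.10, s0=9990, i0=10, r0=0, days=200)

for label, run in (("baseline", baseline), ("distancing", distancing)):
    day, _, infected, _ = peak_infected(run)
    print(f"{label}: peak of {infected:.0f} infected on day {day}")
```

Under these assumptions, the lower transmission rate produces a smaller and later infection peak. An SEIR model adds an exposed compartment between S and I, and hospital demand could be approximated as an assumed fraction of the infected count.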