Overview

Medical image processing is a rapidly evolving field with an important impact on clinical research and practice (2019 review). The tasks involved are varied and include detection and registration of cellular entities, image segmentation and classification, and computer-aided diagnosis and outcome prediction. For this project, we have selected a tutorial that mimics a recent feasibility study, which applied the Google Cloud AutoML Vision tool to train a deep neural network on over 100,000 NIH Chest X-Ray (CXR) images to predict from among 14 different pathologies. The main purpose of this tutorial is to gain insight into how to evaluate and interpret the quality of black-box deep learning models. The CXR dataset, Google AutoML tools, and ideas presented in the tutorial may be useful in guiding potential data science projects. While reading the related papers and working through this analysis, you may want to consider the critical questions for data analysis related to the DSP Course Competencies.

Tutorial Highlights

Running the Tutorial

To run this tutorial, you will need access to a Google Cloud Platform account and should fill out the [cloud access request form]. Once your account has been set up, you will be able to create personal Google Cloud projects and access Google’s AutoML Vision interfaces for automated architecture search and training of convolutional neural network models. During the tutorial, you will create your own copy of the publicly available NIH Chest X-Ray dataset, which contains over 100,000 labeled images, for model training. Completing the tutorial on the full training set involves several steps that take at least an hour to run, but it results in a model that can predict the pathology for a given chest x-ray image. After the tutorial and any project extensions requiring Google Cloud are complete, you will need to delete your Google Cloud projects to halt recurring costs.

Project Extensions and Future Directions

There are many ways to extend the tutorial into a data science project. Projects might focus on how different ways of defining the training input images, labels, and parameters influence the accuracy and value of the final classification model. Another possibility is using the Google AutoML APIs to build a mobile app that applies the learned model to medical images on a tablet or phone. Alternatively, the Stanford CheXpert and MIT MIMIC datasets are similar, large chest x-ray collections that can be used to examine the generalizability and difficulties of applying deep learning models across different hospital settings. One could also build models from several other public datasets of labeled medical images, including musculoskeletal radiographs, white blood cells, and cancer. A project may also investigate building models with higher-dimensional images (3D scans or time-lapse videos), with additional genomic or clinical inputs, or with a focus on tasks other than classification, such as image segmentation.
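
As a starting point for comparing how such choices affect model quality, the sketch below computes per-pathology precision and recall at a chosen score threshold. It is only an illustration under stated assumptions: it treats the task as multi-label (an image may carry several pathologies), the function name per_label_precision_recall is hypothetical, the scores are made-up toy values rather than AutoML output, and the code does not call any Google Cloud API.

```python
from typing import Dict, List, Set, Tuple


def per_label_precision_recall(
    true_labels: List[Set[str]],
    scores: List[Dict[str, float]],
    labels: List[str],
    threshold: float = 0.5,
) -> Dict[str, Tuple[float, float]]:
    """Return {label: (precision, recall)} for a multi-label classifier.

    Hypothetical helper: an image counts as predicted positive for a label
    when its score for that label is >= threshold.
    """
    results = {}
    for label in labels:
        tp = fp = fn = 0
        for truth, score in zip(true_labels, scores):
            predicted = score.get(label, 0.0) >= threshold
            actual = label in truth
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
        # NaN marks a metric that is undefined (no predictions or no positives).
        precision = tp / (tp + fp) if (tp + fp) else float("nan")
        recall = tp / (tp + fn) if (tp + fn) else float("nan")
        results[label] = (precision, recall)
    return results


# Toy data (invented for illustration): four images, two pathologies.
toy_truth = [{"Effusion"}, {"Effusion", "Atelectasis"}, set(), {"Atelectasis"}]
toy_scores = [
    {"Effusion": 0.9, "Atelectasis": 0.2},
    {"Effusion": 0.4, "Atelectasis": 0.7},
    {"Effusion": 0.6, "Atelectasis": 0.1},
    {"Effusion": 0.1, "Atelectasis": 0.3},
]
toy_labels = ["Effusion", "Atelectasis"]

# Sweep the threshold to see the precision/recall trade-off per pathology.
for t in (0.5, 0.3):
    metrics = per_label_precision_recall(toy_truth, toy_scores, toy_labels, t)
    for name, (p, r) in metrics.items():
        print(f"threshold={t} {name}: precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold in this toy example raises recall for both pathologies while precision for one of them stays below 1, which is the kind of trade-off a project would want to report per pathology rather than as a single overall accuracy figure.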

Potential Project Resources