UIUC Tech Services can help COM students set up an account for these computing environments. If you have additional questions, please contact our course coordinator, Eliot Bethke. If you would like access to this service as soon as it's available to COM students, please fill out this [request form] by Mar 19. Requests received after this date will be processed on a monthly basis.
You are encouraged to explore other public data repositories in the medical domain to help formulate your research question. Here we provide several examples.
MIMIC is an openly available dataset developed by the MIT Lab for Computational Physiology, comprising deidentified health data associated with ~60,000 intensive care unit admissions. It includes demographics, vital signs, laboratory tests, medications, and more. We help coordinate credentialed access: see the instructions on Compass and fill out this [MIMIC Request Access Form].
Health Catalyst is a data provider for the Carle Foundation Hospital. Its database currently contains more than 1.2M patient entries collected from around the country. The richness of this dataset provides a powerful way to formulate hypotheses and look for evidence. All medical school students can set up a free account to access the database through its API. Contact the course instructors for more information. The data management and analytics team at Health Catalyst met with our data science team and demonstrated some example uses of the analytic tools. The video recordings are provided here (PW: a$d6f@T0). They may help new users navigate the database.
Kaggle is a well-known data repository for machine learning. However, not all of its datasets are medically relevant, so you will need to read each data description carefully to determine whether it is appropriate for your project. For example, filtering your search with the healthcare tag may help.
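The advice above (search by tag, then read each description) can be sketched as a small offline filter. This is a minimal sketch, assuming you have copied the title, description, and tags of each search result into Python yourself; `DatasetInfo`, `MEDICAL_KEYWORDS`, `is_candidate`, and `shortlist` are hypothetical names, not part of any Kaggle tool, and the three example datasets are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical record: fields stand in for metadata you copy from a search
# result by hand; this is NOT the schema of any Kaggle API.
@dataclass
class DatasetInfo:
    title: str
    description: str
    tags: list[str] = field(default_factory=list)


# Illustrative keywords only; adjust them to your own research question.
MEDICAL_KEYWORDS = ("patient", "clinical", "diagnosis", "hospital", "ehr")


def is_candidate(ds: DatasetInfo, required_tag: str = "healthcare") -> bool:
    """Keep a dataset only if it has the tag AND its description
    mentions at least one medical keyword (naive substring match)."""
    has_tag = required_tag in (t.lower() for t in ds.tags)
    text = ds.description.lower()
    mentions_medical = any(k in text for k in MEDICAL_KEYWORDS)
    return has_tag and mentions_medical


def shortlist(datasets: list[DatasetInfo]) -> list[str]:
    """Return the titles of datasets that pass the first-pass screen."""
    return [ds.title for ds in datasets if is_candidate(ds)]


# Invented examples: a tag alone is not enough to qualify.
examples = [
    DatasetInfo("Heart Disease", "Clinical measurements per patient", ["healthcare"]),
    DatasetInfo("Gym Steps", "Daily step counts from a fitness app", ["healthcare", "fitness"]),
    DatasetInfo("Readmissions", "Readmission flags per hospital stay", ["insurance"]),
]
print(shortlist(examples))  # prints ['Heart Disease']
```

Keyword matching is only a first pass. It drops the fitness dataset despite its healthcare tag, but it cannot replace actually reading the description and checking the data yourself.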
HealthData.gov is dedicated to making valuable government data discoverable and available to the public, with the goal of better health outcomes for all.
Stats Up AI provides a list of bioinformatics and medical datasets for machine learning and data science.
An article by Open Data Science summarizes 15 Open Datasets for Healthcare.
An article by Rei Morikawa summarizes 18 Open Healthcare and Medical Datasets for Machine Learning.
Berkeley Library’s Health Statistics & Data links to datasets and raw data.
University of Washington Health Science Library provides resources to support research data needs in the health sciences field.
Illinois Library provides links to health-related datasets and statistics.
DREAM Challenges are crowdsourced challenges that address questions in biology and medicine. The challenges produce standardized datasets and benchmarked methods for future comparison, analysis, and development.
A variety of dataset collections on GitHub