Objectives

The main objective of this phase is to prepare the Phase I report. This report will later serve as the dataset(s) description and summary in your final report.

  • Explore the template projects
  • Explore cloud computing resources
  • Find your teammate(s)
  • Find a publicly available dataset and perform data processing and summarization
  • Each student submits the [Student Team Agreement]
  • Contact two mentors and set up a meeting with them to discuss your data and research goal
  • Write and submit the Phase I report as a team through the [Report Submission form]
  • Follow up with mentors and have each submit the [Mentor Confirmation form]

After submitting the report, you may also need to

  • Integrate feedback on your Phase I report
  • Meet with course instructors if you need help finding mentors

Timeline and Key Dates

  • Attend the Kickoff Meeting on Mar 14, 2022
  • [5 Points] Each student submits the [Student Team Agreement] form by Apr 8, 2022
  • [20 Points] Each team submits the Phase I Report by May 6, 2022
  • [Required] Ask your mentors to submit the [Mentor Confirmation form] by May 20, 2022
    • You should have two mentors, and at least one of them must take on at least a guidance role for your project
    • This contact mentor’s evaluations will weigh more in your final grade

Phase I Report Contents

The Phase I Report mainly consists of describing the provenance, characteristics, and quality of the dataset(s) that will likely be the focus of your project. The report should be well organized and answer questions about the suitability of the dataset for a clinically relevant data analysis project. Besides describing any data use agreements/requirements, data processing tools, and data summary statistics, the report should attempt to briefly answer some of the following questions:

  • Data Provenance: Where does the data come from? How and when was it obtained (Velocity)? What assumptions were made in its acquisition? What aspects of reality were captured well/poorly? How was the data transformed for sharing and preparation for data analysis? What information was lost in that transformation?
  • Data Characteristics: What scales apply to the different data features in the dataset? Are they categorical, ordinal, or numerical? What are the underlying distributions of those features (normal, skewed)? What is the variation in the data features and values (Variety)? Are there extreme outliers? How do the different features relate to each other?
  • Data Quality: How can you be assured of the data quality (Validity)? Are there missing features that can be created and would be useful? Are there missing or noisy data that can be imputed/corrected? Are there outside data sources that can validate or expand the set of reliable features?
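
As one possible starting point for the characteristics and quality questions above, here is a minimal sketch that uses only the Python standard library. The toy records, the column names `age` and `sex`, the helper `summarize_column`, and the 1.5 × IQR outlier rule are all illustrative assumptions, not course requirements; a real dataset will likely need a more careful treatment.

```python
import statistics


def summarize_column(values):
    """Summarize one feature: missingness, inferred scale, basic statistics."""
    # Treat None and empty strings as missing (an assumption; real data may
    # encode missing values differently, e.g. "NA" or -999).
    present = [v for v in values if v not in (None, "")]
    summary = {"n": len(values), "missing": len(values) - len(present)}
    try:
        numbers = [float(v) for v in present]
    except (TypeError, ValueError):
        # Non-numeric values: treat the feature as categorical and count levels.
        counts = {}
        for v in present:
            counts[v] = counts.get(v, 0) + 1
        summary.update(kind="categorical", levels=counts)
        return summary

    summary["kind"] = "numerical"
    if len(numbers) >= 2:
        q1, median, q3 = statistics.quantiles(numbers, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        summary.update(
            mean=statistics.mean(numbers),
            median=median,
            # Flag values outside the 1.5 * IQR fences as candidate outliers.
            outliers=[x for x in numbers if x < low or x > high],
        )
    return summary


# Hypothetical toy records standing in for rows of a real dataset.
records = [
    {"age": "30", "sex": "F"},
    {"age": "32", "sex": "M"},
    {"age": "34", "sex": "F"},
    {"age": "", "sex": "F"},
    {"age": "36", "sex": ""},
    {"age": "38", "sex": "M"},
    {"age": "120", "sex": "F"},
]

age_summary = summarize_column([r["age"] for r in records])
sex_summary = summarize_column([r["sex"] for r in records])
print(age_summary)
print(sex_summary)
```

A per-feature summary like this can feed directly into the summary-statistics table of the report. Flagged outliers and missing counts are prompts for follow-up questions about data quality, not automatic grounds for removing records.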

Phase I Report Format

This deliverable should be a PDF in the style of a technical report, no longer than 2 pages (excluding figures, tables, and references). The name of the submitted PDF should contain a short title of the project followed by the last names of the medical student team members (e.g. “Being President - Washington, Adams.pdf”). For suggestions on how to format your report, look to medical journal templates. A useful collection of science/medical journal templates is provided at nextgenediting.com. For style guidance, consider this online guide for writing technical reports by Michael Ernst.