Data Science

The Data Science secondary field is available to any student enrolled in a PhD program in the Graduate School of Arts and Sciences upon approval of a plan of study by the Data Science Program Committee and the director of graduate studies in the student’s home department.

Data Science lies at the intersection of statistical methodology, computational science, and a wide range of application domains. This secondary field offers strong preparation in statistical modeling, machine learning, optimization, management and analysis of massive data sets, and data acquisition. Students completing the Data Science secondary field will be exposed to topics such as reproducible data analysis, collaborative problem solving, visualization and communication, and security and ethical issues that arise in data science.

The Data Science secondary field is overseen by the joint leadership of the Computer Science and Statistics faculties and administered by the Institute for Applied Computational Science (IACS). All questions should be directed to Daniel Weinstock, associate director of graduate studies (ADGS) in Applied Computation.

Admission

Interested students should consult with their director of graduate studies no later than the first semester of the third year of study and reach out to the ADGS to express interest in applying. The ADGS will provide information about the application, which should include a proposed plan of study.

Applications, which must be approved by the home department DGS, may be submitted twice a year, in the spring semester (deadline: March 1) and fall semester (deadline: October 1) for the following academic term. The ADGS will respond to all applications within one month.

Requirements

Each student’s plan of study for the secondary field will include:

1. Core Courses

At least three of the following Data Science core courses:

  • AC 209a*  Data Science 1: Introduction to Data Science
  • AC 209b*  Data Science 2: Advanced Topics in Data Science
  • AM 207    Advanced Scientific Computing: Stochastic Methods for Data Analysis, Inference, and Optimization
  • CS 207    Systems Development for Computational Science
  • AC 221    Critical Thinking in Data Science

*With the permission of the program committee, students may count CS 109a/b in place of AC 209a/b.

2. Electives

Two electives in Computer Science or Statistics, chosen from courses offered by the Computer Science and Statistics faculties.

Alternatively, students may choose to satisfy the elective requirement by taking additional core courses. Students may also choose, as a substitute for one elective, either AC 297r, the IACS Capstone Project course, or AC 298r, the interdisciplinary seminar in Computational and Data Science.

3. Oral Examination

The final requirement is an oral examination by a faculty committee on a data science research topic. Typically, students present part of their dissertation work. Students are evaluated on their ability to explain their work to the interdisciplinary IACS audience and on their command of the Data Science methods they have used. The presentation should also explain how the courses taken to satisfy the Data Science secondary field have informed the student's research.

Advising and Other Activities

Daniel Weinstock, ADGS in Applied Computation, will be responsible for frontline advising of students in the Data Science secondary field. Students interested in the secondary field are encouraged to reach out to Dr. Weinstock before submitting an application. Students enrolled in the secondary field will be able to participate in the activities of the IACS community, including technical and interdisciplinary colloquia, skill-building workshops, and tech-treks to local companies working to apply computation and data science in many different domains.