The Harvard Data Science Initiative (HDSI) represents Harvard University’s commitment to shaping the new science of data. It illuminates the new interdisciplinary pathways that our faculty, students, and partners will use to solve real problems, in a world with critical ethical challenges regarding facts, data, and truth.

Harvard Data Science Initiative Fellowships

Letter from the Harvard Data Science Initiative Executive Director

Dear Friends,

In this report we share with you the remarkable work of the Harvard Data Science Initiative over the past twelve months. I am enormously proud of what our community has accomplished, and excited to build on our success in the future.

If only one word defines the HDSI’s activity in 2022-2023, it is impact: we have taken the measure of our impact over the past five years, refocused our current efforts to amplify and deepen that impact, and launched collaborations and new programming that will scale our impact in the future. Through collaborations with industry colleagues, we have expanded the scope of our activity at scale and laid the groundwork for sustained and meaningful application of data science towards solving some of society’s toughest challenges. I hope you will find the stories of this impact in the following pages as inspiring as I do.

The HDSI enables impact through core activities in the areas of research, education, industry collaboration, and outreach. Through our postdoctoral fellowship program, faculty-led research, Corporate Members Program, and Harvard Data Science Review, we are able to galvanize programming at the intersection of methodology and application in transformative ways.

The range, scope, and complexity of programming that we are able to activate is possible only through the work of a team of very talented staff, who have, over the past five years, expanded the work of the HDSI and HDSR with energy, skill, and good humor. Their efforts are complemented by the commitment and generosity of time of those who serve as speakers, peer reviewers, advisors, and supporters. Our gratitude for their work is profound.

As you read about the 2022-2023 academic year at the HDSI, I hope you will be inspired to get involved in the months ahead. Whether by attending a seminar, reading the Harvard Data Science Review, or exploring collaboration with HDSI scholars, your involvement is welcomed.

Sincerely,
Elizabeth Langdon-Gray
HDSI Executive Director

Letter from the Harvard Data Science Initiative Faculty Co-Directors

“As we move forward into a new era of data science, we remain steadfast in our determination to tackle the world’s most pressing challenges through rigorous research, ethical practices, and responsible data stewardship.”

Dear Friends,

We are delighted to present to you the Harvard Data Science Initiative’s 2023 Annual Report, which highlights just some of the groundbreaking research, innovative collaborations, and transformative advancements that have defined the past year.

In 2022/2023 we made a welcome return to in-person convenings, anchored by the HDSI’s Annual Conference in November 2022. This year’s conference connected expert methodologists, data science professionals, and educators across multiple disciplines (health, education, economics, social policy, business, and the humanities) for two days of workshops, tutorials, and panel discussions. The conference, always a highlight of our programming, was made that much better this year by the contributions of our co-host, the Harvard Business School’s Digital, Data, and Design (D^3) Institute.

Indeed, innovative collaborations enriched the work of the HDSI more broadly and deeply than ever before: in November, we announced a major, multiyear collaboration with Amazon Web Services that you can read about elsewhere in this report. Our alliance with AWS, through the AWS Impact Computing Project at the HDSI, will harness the combined depth of Harvard scholarship and AWS’s unrivalled capabilities in high-performance computing to address far-reaching societal challenges. Early funding will be awarded to faculty-led research on data science as it relates to food security, the social determinants of health, and climate change.

Our commitment to impact extends beyond our collaboration with AWS to our work with our Corporate Members, who partnered with us over the past twelve months to explore issues around trust in science; ESG commitments, metrics, and accountability; data science leadership; and the idea of impact itself (what does it mean for different sectors?). We continue to find great value in these conversations, and remain committed to deepening our ties with our partners in industry and working to advance our mutual goals.

Of course, the HDSI’s activity is built on a foundation of rigorous scholarship. Our faculty affiliates and grant recipients continue to advance groundbreaking work in areas as diverse as designing better algorithms to reduce pretrial incarceration, digitizing and analyzing past proposed amendments to the US Constitution, and improving medical image interpretation. And our postdoctoral fellows engage with the most challenging topics with curiosity, energy, and a commitment to interdisciplinary excellence.

As we move forward into a new era of data science, we remain steadfast in our determination to tackle the world’s most pressing challenges through rigorous research, ethical practices, and responsible data stewardship. We invite you to explore the 2023 Annual Report, where you will discover the transformative potential of data science.

Best,
HDSI Faculty Co-Directors

Francesca Dominici
Clarence James Gamble Professor of Biostatistics, Population and Data Science, Harvard T.H. Chan School of Public Health

David C. Parkes
George F. Colony Professor of Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences

Harvard Data Science Initiative Engagement and Partnerships

HDSI 2022 Event Highlights

HDSI Annual Conference 2022

This year’s Harvard Data Science Initiative (HDSI) Annual Conference took place on November 15 and November 16 in Boston, MA at the Harvard John A. Paulson School of Engineering’s Science and Engineering Complex (SEC) and Harvard Business School’s Klarman Hall. During the two-day event, speakers from across Harvard, academia, and industry showcased data science in research and education through multidisciplinary panel discussions, keynote addresses, a workshop on artificial intelligence, and a tutorial on causal inference. We were delighted to welcome a diverse, multidisciplinary audience from across the University and the public data science community for this annual opportunity to connect with expert methodologists, data science professionals, and educators.

Day 1: November 15, 2022

Workshop: Fairness + Explainability in AI, led by HDSI faculty affiliates Marinka Zitnik + Hima Lakkaraju

Tutorial: Causal Inference, led by HDSI faculty affiliate José R. Zubizarreta

Day 2: November 16, 2022

Four panel discussions

Academic Keynote by Maria De-Arteaga, University of Texas at Austin

Industry Keynote by Martin Tingley, Netflix

Corporate Members Program

Developing deep relationships with industry is critical to fulfilling the HDSI’s mission of transformation through data science. Both the problems we tackle and the students we train must be informed by the most difficult challenges facing industry, challenges that can be solved in partnership with academia. HDSI Corporate Members share our commitment to transformational research that will have widespread impact, and represent sectors including the life sciences, business consulting, technology, finance, data analytics, and publishing. The HDSI Corporate Members Program provides a single point of entry, including facilitated access to Harvard faculty, researchers, and students in data science, and invitations to workshops, seminars, and conferences that showcase the best research and ideas from across the University.

Causal Inference Working Group

The Causal Inference Working Group is a university-wide working group of researchers interested in the methodologies and applications of causal inference, supported by a generous grant from the Alfred P. Sloan Foundation. The group is led by Harvard faculty members Francesca Dominici, Iavor Bojinov, José R. Zubizarreta, Kosuke Imai, and Luke Miratrix, with the goal of building interdisciplinary collaborations between faculty, staff, and students. The group hosts an in-person seminar series, which welcomes faculty from universities across the nation to present their current causal inference research. View the upcoming seminars for the 2023-2024 academic year.

The HDSI Data Science in Industry Seminar series provides the opportunity for Harvard students, faculty, and postdoctoral fellows to hear directly from industry data scientists about the role data science plays in their organization, the methods and techniques being used, and their own career trajectory. Previous speakers have spanned a broad range of sectors, including finance, technology, sports, pharmaceuticals and the life sciences, and media.

2022 Industry Seminar Guests:
• Kelsey Rogers, Chief Information Officer, Acentech
• Tammy Levy, Head of Data Science, Captain.tv
• Liz Grennan, Expert Associate Partner, McKinsey & Company
• Alex Singla, Senior Partner and Global Leader of QuantumBlack, AI by McKinsey

• Amazon
• Bayer
• EARNEST Partners
• Elsevier
• Harmony Analytics
• McKinsey & Company
• Microsoft

McKinsey Special Seminar Series

In Spring 2023, the HDSI and the Harvard John A. Paulson School of Engineering and Applied Sciences hosted a series of virtual seminars on data leadership. During this series, the HDSI community had the chance to hear McKinsey & Company senior leaders, along with Harvard faculty and industry leaders from companies and NGOs, discuss pertinent questions and challenges in AI, digital trust, ethics, data governance, and regulation.

Read about the series:
• Q&A with Liz Grennan + Alex Singla
• View seminar recordings

HDSI + AWS Impact Computing Project

In November 2022, the HDSI and Amazon Web Services (AWS) announced an alliance to support current and future data science research that works to identify potential solutions to complex health, climate, and economic challenges. The collaboration funds projects led by Harvard faculty working to create new data solutions that amplify the University’s social impact.

Read The Harvard Gazette announcement:
• Applying cloud computing to major global problems

Bayer + Trust in Science Project

Trust in Science is a flagship project of the HDSI, conducted in collaboration with the Harvard Kennedy School’s Program on Science, Technology & Society (STS). At a time of seemingly widespread loss of confidence in science and expertise, the Project seeks to illuminate the varied factors that currently impede trusting relations between the producers and users of scientific information. It leverages data science, science and technology studies, and related disciplines to analyze the breakdowns in public trust, and to ask what steps could be taken to promote better mutual understanding. With generous philanthropic support from Bayer, the Project supports faculty-led research efforts, workshops, conferences, symposia, and external engagement to amplify the impact of funded work.

HDSI + Microsoft Azure Funding

HDSI Faculty Affiliate Kosuke Imai is a Professor of Government and Statistics at the Harvard Faculty of Arts & Sciences and a member of the HDSI’s Causal Inference Working Group. With a gift of Azure credits from Microsoft, Imai and his research team developed a tool to detect gerrymandering, which has been used by plaintiffs in gerrymandering cases in several states. Imai’s continued research evaluating the fairness of redistricting rules has revealed that gerrymandering still disempowers Americans at the district level.

Read about Kosuke Imai’s work:
• How to spot a gerrymandered district? Compare it to fair ones.
• Biggest problem with gerrymandering

Letter from the Harvard Data Science Review Editor-in-Chief

“HDSR features not only everything data science, but also data science for everyone. We embrace the idea that this big pie can be palatable, digestible, or even irresistible. After all, shouldn’t we all be data literate in this day and age?”

The data science revolution has its grip on every sector, especially academia. Virtually all universities, big and small, want a piece of the pie, which they angle for by building infrastructures and programs to compete for (and retain) students, scholars, funding, and so on. Harvard is no exception. But given our unique convening power, we also thought, ‘Why not bake another big pie for everyone to share?’ And so in 2018, we set out to create a global digital hub where data science is the common language and curiosity the only passport needed, with the intellectual mission of defining and shaping data science as an artificial ecosystem. After a year of brainstorms, blueprints, and a good number of sleepless nights, HDSR was born on July 2, 2019.

HDSR features not only everything data science, but also data science for everyone. We embrace the idea that this big pie can be palatable, digestible, or even irresistible. After all, shouldn’t we all be data literate in this day and age? We certainly think so, and that’s why we’re working hard on enhancing our digital platform and brewing up ideas such as a conference on data literacy coming in fall of 2023.

I am grateful to the co-directors of the HDSI, Francesca Dominici and David Parkes, who strongly supported these HDSR missions from the get-go, and who were the co-pilots of HDSR (from July 2021 to December 2022) while I was off on a sabbatical expedition. Their dedication allowed me to take a breath, learn new culinary (and fermentation) skills, and bring back new recipes that could serve HDSR and its missions.

Our shared vision also allowed my sabbatical to serve as a strategic opportunity to stress-test HDSR during its formative years. This exercise underscored the importance of our submission model (by invitation, post-screening), revealing its reliance on active content building by the editorial board rather than unsolicited submissions. A comparison of submission numbers from 2022, when active content building was largely on hiatus, and the first three years reiterates the significance of active content building, which accounts for over 60% of our submission pipeline.

This reminder acted as a spark, igniting the HDSR boards to gear up for the upcoming year with renewed vigor. Yes, we’ve got a lot cooking up in the HDSR kitchen. We’re organizing an array of special issues and themes on topics of global interest and impact, such as generative AI, climate change, and data privacy. And that’s not all; we will be launching new columns dedicated to data ethics, reproducibility, and philosophy for data science, which will bring the total number of columns to ten. It’s like a buffet of thought-provoking insights, and there is something (delicious) for everyone.

The excitement of our impending fifth anniversary, still a year away, has already set the gears in motion for planning a delightful theme. While holding the suspense to fuel your anticipation, I can say that it will pair well with the celebration of HDSR and the broader realm of data science.

Before I wrap up, a warm, heartfelt thank you to the HDSI team, Harvard’s Office of the Provost, all HDSR boards and authors, and most importantly, the unsung heroes: our anonymous reviewers, the HDSR editorial office, and our friends at MIT Press and PubPub. Last but not least, our dear readers, thank you! We are here because you are there.

Xiao-Li Meng, Editor-in-Chief, HDSR
Whipple V. N. Jones Professor of Statistics, Harvard University

CHECK OUT THE HDSR PODCAST! Over 27 episodes released + over 60,000 downloads


The publication of issue 5.1 in January 2023 officially marked the end of HDSI Faculty Co-Chairs Francesca Dominici and David Parkes’s joint term as HDSR Interim Co-Editors-in-Chief. Francesca and David performed double duty as they expertly steered both the HDSR and HDSI ships for 18 months while Founding Editor-in-Chief Xiao-Li Meng was on a much-needed sabbatical. Under Francesca and David’s leadership, the editorial board was reorganized and given more responsibility, adding structure to the way the journal runs. The editorial changes reflected both the work that had been falling on Xiao-Li’s shoulders and the anticipation of continued growth in data science and interest in HDSR. The changes included the addition of new faces. Thanks to Francesca’s and David’s recruitment efforts, HDSR welcomed new Co-Editors Ryan Adams, Frauke Kreuter, Greg Lewis, and Susan Paddock as well as nine new Associate Editors.





SPECIAL ISSUE Differential Privacy for the 2020 Census: How Do We Make Data Both Private

SPECIAL THEME Value of Science

SPECIAL THEME Changing the Culture of Data Management and Sharing in Biomedicine


United States, India, United Kingdom


SPECIAL THEME World Migration and Displacement: Data, Disinformation, and Human Mobility

Germany, Pakistan, Australia, Bangladesh, Philippines, Indonesia



Meet the HDSI + HDSR


Elizabeth Langdon-Gray HDSI Executive Director

Francesca Dominici HDSI Faculty Co-Director

Lawrence Weissbach HDSI Scientific Director

Jennifer Chow HDSI Director of External Engagement

Kevin Doyle HDSI Assistant Director of Programs + Operations

David C. Parkes HDSI Faculty Co-Director

Sarah E. McCullough HDSI Events + Engagement Coordinator

Catherine Adcock HDSI Program Assistant

Sam Weiss Evans TiS Research Fellow

Amara Deis HDSR Editorial + Administrative Coordinator

Xiao-Li Meng HDSR Editor-in-Chief

Rebecca McLeod HDSR Managing Director

HDSI Faculty Affiliates


Boston Children’s Hospital Ata Kiapour

Harvard Faculty of Arts and Sciences Jeffrey Schnapp Gabriel Kreindler Fiery Cushman Kelly McConville Christopher Winship Flavio du Pin Calmon

Harvard John A. Paulson School of Engineering and Applied Sciences Hanspeter Pfister

Alyssa A. Goodman Robert Wheeler Willson Professor of Applied Astronomy, Harvard Faculty of Arts & Sciences

Brigham and Women’s Hospital Azam Yazdani

Melanie Weber James Mickens Demba Ba

Isaac Kohane Chair of the Department of Biomedical Informatics; Marion V. Nelson Professor of Biomedical Informatics, Harvard Medical School

S.C. Samuel Kou Chair of Department of Statistics, Harvard Faculty of Arts & Sciences; Professor of Biostatistics, Harvard T.H. Chan School of Public Health

Dana-Farber Cancer Institute Rafael Irizarry

Elena Glassman Stratos Idreos Finale Doshi-Velez

Harvard Business School Jeremy Yang

Joscha Legewie Peter Huybers Dustin Tingley

Harvard Kennedy School of Government Anders Jensen Rema Hanna Soroush Saghafian

Harvard Medical School Nils Gehlenborg Simon Nørrelykke Faisal Mahmood Vesela Kovacheva

Ayelet Israeli Iavor Bojinov Edward McFowland Scott Kominers Eva Ascarza

Gary King Albert J. Weatherhead III University Professor, Harvard University; Director of the Institute for Quantitative Social Science

Xiang Zhou Edo Berger

Mark Glickman Alyssa Goodman

Harvard Graduate School of Education Luke Miratrix

Institute for Applied Computational Science Fabian Wermelinger

Massachusetts General Hospital Jonghye Woo Ibrahim Chamseddine

Kosuke Imai David Yang Sean Eddy

Hanspeter Pfister An Wang Professor of Computer Science; Academic Dean of Computational Sciences and Engineering, Harvard John A. Paulson School of Engineering and Applied Sciences

John Quackenbush Henry Pickering Walcott Professor of Computational Biology and Bioinformatics; Chair of the Biostatistics Department, Harvard T.H. Chan School of Public Health

Harvard T.H. Chan School of Public Health John Quackenbush Rui Duan

Xiaofeng Liu Bill Lotter Hossein Estiri

Xiao-Li Meng Editor-in-Chief, Harvard Data Science Review; Whipple V. N. Jones Professor of Statistics, Harvard University

Adam Haber Grace Chan Marcia Castro Christopher Golden Rachel Nethery Giovanni Parmigiani Nima Hejazi Satchit Balsari

Mohammad Jalali Jose Zubizarreta

Tanujit Dey Yangming Ou Kun-Hsing Yu Michael Baym Bethany Hedt-Gauthier Marinka Zitnik

Harvard Law School Jonathan Zittrain Jared Ellias



+ Public Service Data Science Graduate Fellows

The HDSI Postdoctoral Fellowship supports outstanding postdoctoral scholars with interests in advancing the field of data science. Fellows are independent investigators who pursue their own research in collaboration with Harvard faculty.

The HDSI Public Service Data Science Graduate Fellowship supports master’s students in Harvard’s data science programs (Biomedical Informatics, Health Data Science, Data Science) who want to explore career paths at not-for-profit and public sector organizations through a summer internship.

Melody Huang 2023 Wojcicki Troper HDSI Postdoctoral Fellow

Keyon Vafa 2023 HDSI Postdoctoral Fellow

Johannes Knittel 2023 Wojcicki Troper HDSI Postdoctoral Fellow

Frank Xinnan Cheng Harvard John A. Paulson School of Engineering and Applied Sciences, Watt Time

James Liounis


Harvard John A. Paulson School of Engineering and Applied Sciences, OpenDP

Harvard John A. Paulson School of Engineering and Applied Sciences, The World Bank

Learn more about the HDSI Postdoctoral Fellowship Program

Learn more about the HDSI Graduate Fellowship Program + view past fellows

2023 HDSI Funding Awardees

Faculty Special Projects

• Giovanni Parmigiani + Danielle Braun: Clinical Decision Support Tool for Trans and Gender Diverse Patients on Hereditary Cancer Risk
• Weiwei Pan: Textbook and community building initiative: How Artificial Intelligence can help to solve the world’s largest challenges
• Douglas Finkbeiner: Machine Learning Journal Club
• Serhii Plokhii: Historical Data and the Natural World: A Deep Dive into the Ukrainian Past
• Andrew Witt: 3D Motion Capture of Domestic Interactions for Kinetic Environments Design
• Alyssa Goodman: Cosmic DS
• Margaret McConnell: Effect of household financial distress on children's health
• Vijay Janapa Reddi: TinyML Club

Competitive Research Fund

• Joanna Aizenberg: Rational design of polymeric materials using machine learning
• Marcia Castro: Tracking the Footprint of Mining Extraction in the Brazilian Amazon with a Near Real-Time Artisanal Mining Alert System
• Melanie Weber: Interplay of Symmetry and Scaling in Machine Learning

Summer Program for Undergraduates in Data Science (SPUDS)

The Summer Program for Undergraduates in Data Science (SPUDS) is a ten-week summer program, co-sponsored by Harvard College and the Harvard Data Science Initiative (HDSI), that aims to provide a formative and substantive data science research experience and to promote community, creativity, and scholarship amongst Harvard College students. Rafael Irizarry, Professor of Applied Statistics at Harvard and the Dana-Farber Cancer Institute, is faculty director of this year’s SPUDS program. The HDSI is proud to announce the 2023 participants for the third year of the Summer Program for Undergraduates in Data Science (SPUDS):

2023 SPUDS Fellows

• Karla Avalos
• Qi Qi Chin
• Alexander Glynn
• Kenneth Gu
• Lal Kablan
• Sophie-An Kingsbury Lee
• Joshua Park
• Zoe Shleifer
• Gabriel Sun
• Mark Takken
• Leo Vanciu
• Henry Wu
• Diamante Balcazar

Thank You for Your Support

The HDSI is tremendously proud and deeply grateful to have brought together such an exceptional group of faculty, students, and staff. We thank everyone who has helped the HDSI reach significant milestones and become what it is today. The HDSI offers numerous ways to connect across Harvard's data science community throughout the year. We welcome you to:
• Subscribe to our newsletter
• Attend an event
• Apply for funding
• Learn more about the HDSI Corporate Members Program
• Make a gift

Thank you for being part of our community.

Harvard Data Science Initiative
Harvard University
114 Western Avenue, Boston, MA 02134
datascience@harvard.edu

Design by Sarah E. McCullough

