HDSI Annual Report 2021–2022



Table of Contents

Letter from the Faculty Co-Directors
Equity, Diversity, + Inclusion
Letter from the Executive Director
Year in Review
2021–2022 HDSI Events
HDSI in the News Over the Years
Harvard Data Science Review
HDSI 2022 Postdoctoral Fellows
HDSI Summer 2022 Public Service Data Science Fellows
HDSI Internal Awards
Trust in Science Project
Meet the HDSI Team
HDSI Faculty Affiliates + Corporate Members
HDSI Steering Committee Members
The Future of the HDSI

“Looking ahead, we are excited to build on the successes of the past year to further advance the role data science can play in driving change.”

FROM OUR FACULTY CO-DIRECTORS
Francesca Dominici + David C. Parkes

Dear friends,

This past year has been a remarkable one for the data science community at Harvard. Researchers across the University have used data to drive our response to a global pandemic that has affected every aspect of our society. Students and faculty have expanded our response to the injustice of persistent, systemic racism to look at bias within data science itself. And those responding to the war in Ukraine have embraced data science as a tool to facilitate humanitarian aid. These examples, born of crisis and disruption, have demonstrated the potential of data science to have an impact on the most pressing challenges faced by society. We are deeply proud of the Harvard faculty, students, and staff who have responded to these issues with talent and dedication.

The year 2022 is remarkable in another way: it marks the HDSI’s 5th Anniversary. While we were not able to celebrate in person as we had hoped, we reached thousands of individuals through online seminars, panels, and workshops. These events showcased both methodological innovations (for example, causal inference, visualization, or explainable machine learning) and research that spans disciplines to ensure that data science accelerates advances in all fields. Programmatic efforts like the HDSI’s Trust in Science Project demonstrate the value of investing deeply in topics of critical societal importance.

The past year also saw the launch of new educational efforts like the Summer Program for Undergraduates in Data Science (SPUDS) that build on the enthusiasm of our students to develop their data science skills and explore data-driven approaches to old and new questions alike. And we were able to continue to reach audiences outside of Harvard’s walls through Harvard Data Science Review, HDSI’s award-winning journal, and its newly launched podcast.

Looking ahead, we are excited to build on the successes of the past year to further advance the role data science can play in driving change. At the core of the HDSI’s mission has been a commitment to rigor, reproducibility, and contextual understanding, each of which is important in driving impact. We also remain dedicated to ensuring that methodologies advance hand in hand with careful consideration of the ethical consequences of their application.

Harvard’s breadth of scholarship and the talents of our faculty, students, and staff propel forward our aspirations for the Harvard Data Science Initiative. We have seen how collaborative efforts across disciplines can create knowledge and drive change. We look forward to supporting those efforts in the future and reporting back in 2023 on our work.



David C. Parkes, George F. Colony Professor of Computer Science, Harvard John A. Paulson School of Engineering and Applied Sciences

Francesca Dominici, Clarence James Gamble Professor of Biostatistics, Population, and Data Science, Harvard T.H. Chan School of Public Health

Equity, Diversity, + Inclusion

At the Harvard Data Science Initiative (HDSI), we actively seek, celebrate, and welcome people with diverse backgrounds, experiences, and identities. The HDSI is committed to offering opportunities and programs to everyone in order to help individuals succeed and advance the field of data science. We acknowledge that every person’s ability to achieve professional advancement, fulfillment, and security is impacted by factors like one’s race, gender, health, and economic status. These social inequities prevent a significant portion of the world’s population from accessing and pursuing education or gainful employment.

We at the HDSI actively think about ways in which we can inspire meaningful social change and emphasize that there is strength in diversity. In adherence to Harvard University’s diversity and inclusion values, we will continue to advocate for an increase in diversity and sense of belonging among everyone in our community and beyond, regardless of gender identity and expression, sexual orientation, race, ethnicity, religion, immigration status, nationality, socioeconomic status, or disability.

HOW WE CELEBRATED DIVERSITY THIS YEAR:
• Black History Month + Data Science, February 2022
• Women’s History Month + Data Science, March 2022
• Asian Pacific American Heritage Month, May 2022
• Pride Month + Data Science, June 2022


The Harvard Data Science Initiative has at its core a goal of driving new inquiry across disciplines through data. Since its launch in 2017, the HDSI has worked to connect computer scientists, statisticians, and domain experts from law, business, public policy, education, medicine, public health, and the myriad academic disciplines represented in Harvard’s twelve schools and the Radcliffe Institute. In that time, the HDSI has:






Year in Review

“We serve the data science community at Harvard and beyond, and through this service, we will transform what the world can achieve through data.”

In the 2021–2022 academic year, the HDSI brought together thousands of members of the Harvard community and the broader data science community to explore topics such as artificial intelligence, causal inference, data visualization, and more. This year we also planned a range of events to celebrate the 5th Anniversary of the Initiative.

This breadth of activity has demanded financial, governance, and administrative models that enable cross-school coordination and facilitate interdisciplinary collaboration. As the Initiative has grown, its activities have been supported by almost $20M in extramural support. Its impact has been amplified by the dedication of faculty who have reviewed funding proposals, served on governance committees, and run workshops, seminars, and events. Its complex operations run smoothly because of the efforts of a team of talented staff who have embraced new challenges and innovation.

The first five years of any endeavor present ample opportunities to experiment, succeed, and fail, often all at once. From this period of intense learning, we have emerged with a deep understanding of our guiding principles: We act collaboratively and nimbly. We pair expansive thinking with concrete action. We serve the data science community at Harvard and beyond, and through this service, we will transform what the world can achieve through data.




HDSI events Celebrating community, innovation, + impact.


• Data Visualization
• From Analysis to Action: Engaging Through Storytelling
• Machine Learning-Based Cluster Analysis
• Statistical Cluster Analysis and Space-Time Analysis

PANEL AND KEYNOTE DISCUSSIONS FEATURING SPEAKERS FROM:
• AWS Global Impact Computing
• Brigham and Women’s Hospital
• Dana-Farber Cancer Institute
• data.org
• Elsevier
• Harvard Business School
• Harvard Medical School
• Harvard T.H. Chan School of Public Health
• McKinsey & Company
• The New York Times

INDUSTRY SEMINAR GUESTS:
• DraftKings
• Toyota
• Soroco
• Indigo Ag
• Oxfam
• Montai Health / Flagship Pioneering




• Harvard launches data science initiative (The Harvard Gazette)
• What a difference a year of data science makes (The Harvard Gazette, 2018)
• Data science for a new era (The Harvard Gazette)
• MIT Press and Harvard Data Science Initiative launch the Harvard Data Science Review (MIT News)
• An emergency response team for data? (The Harvard Gazette)
• Harvard journal speaks to publishers’ association (The Harvard Gazette)

HARVARD DATA SCIENCE REVIEW
MILESTONES + ACHIEVEMENTS

INAUGURAL VOLUME (ISSUE 1.1, SUMMER 2019)
Harvard Data Science Review (HDSR) operates as an open access platform of the Harvard Data Science Initiative. The HDSR is dedicated to sharing foundational thinking, research milestones, educational innovations, and major applications, in addition to emphasizing data science as a globally impactful, multidisciplinary field.



Since the release of the Inaugural Volume in 2019, HDSR has published four regular issues per year, plus special issues. The HDSR aims to serve as a new and unique digital platform that reflects the synergistic nature of data science and provides a crossroads at which fundamental data science research and education intersect with societally important applications from industry, governments, NGOs, and others.

NAMED BEST NEW JOURNAL IN JANUARY 2021!
HDSR started 2021 on a very high note with the exciting news that it won the Association of American Publishers’ 2021 PROSE Award for Best New Journal in Science, Technology & Medicine. The annual PROSE Awards recognize the very best in professional and scholarly publishing by celebrating landmark works that have made significant advancements in their respective fields of study each year.

As an open access platform of the Harvard Data Science Initiative, Harvard Data Science Review features foundational thinking, research milestones, educational innovations, and major applications, with a primary emphasis on reproducibility, replicability, and readability. It aims to publish content that helps define and shape data science as a scientifically rigorous and globally impactful multidisciplinary field based on the principled and purposed production, processing, parsing, and analysis of data. By uniting the strengths of a premier research journal, a cutting-edge educational publication, and a popular magazine, HDSR provides a crossroads at which fundamental data science research and education intersect directly with societally important applications from industry, governments, NGOs, and others. By disseminating inspiring, informative, and intriguing articles and media materials, HDSR aspires to be a global forum on everything data science and data science for everyone.


HDSR PARTNERSHIPS

THE NATIONAL CENTER FOR SCIENCE AND ENGINEERING STATISTICS + THE COLERIDGE INITIATIVE
In June 2021, HDSR and HDSI co-hosted the Value of Science: Data Products and Use Conference with the Coleridge Initiative and the National Center for Science and Engineering Statistics. The Value of Science: Data, Products & Use Conference is intended to advance understanding of the value of data by showcasing new data, products, and use resulting from recent data investments. The speakers, who are experts from social science and computer science, discussed how new ways of connecting and linking data can advance the empirical basis of our understanding of the value of science.

A special theme of papers based on the conference will be published in HDSR’s forthcoming spring issue (4.2), due out in late April 2022.

CO-HOSTED SYMPOSIUM WITH IOM + UNHCR IN MAY 2021
In May 2021, HDSR co-hosted a 3-day virtual global event, World Migration & Displacement Symposium: Data, Disinformation and Human Mobility, with the International Organization for Migration (IOM) and the UN Refugee Agency (UNHCR). Across the three days, the event featured high-level speakers from the three organizations as well as guests from industry, NGOs, and academia who contributed to the conversation on the themes of disinformation, vulnerable populations, and COVID-19. The event concluded with a virtual Fireside Chat with Founding Editor-in-Chief Xiao-Li Meng and IOM Director General António Vitorino.

In January 2022, HDSR published a special collection of articles to enable further discussion of the work that came out of the symposium. The guest editors represented both partnering UN organizations.

LAUNCHED MONTHLY PODCAST IN FEBRUARY 2021
The HDSR Podcast aims to show news, policy, and business through the lens of data science. Each episode is a ‘case study’ into how data is used to lead, mislead, manipulate, and inform the important decisions facing us today. Topics have ranged from healthcare data, the future of AI, and dating apps, to sports analytics, pollsters, and the wine industry. 40% of podcast listeners reside outside of the United States.

“Data science opens up thrilling possibilities for understanding and shaping the world, and HDSR offers a powerful platform for sharing those possibilities with an increasingly wide readership.”
Harvard President Larry Bacow

AWARDED SLOAN FOUNDATION GRANT IN NOVEMBER 2021
HDSR’s funding request of $50,000 to support the print and online publication of its forthcoming special issue titled “Differential Privacy for the 2020 U.S. Census: Can We Make Data Both Private and Useful?” was approved by the Alfred P. Sloan Foundation in November 2021.

EDITORIAL + STAFFING CHANGES IN JULY 2021–JANUARY 2022
On July 1, 2021, HDSI faculty Co-Chairs Francesca Dominici and David Parkes became Interim Co-Editors-in-Chief for HDSR while Founding Editor-in-Chief Xiao-Li Meng stepped away for a much-needed sabbatical.

A new roster of co-editors representing major areas of data science has been appointed by David and Francesca, as well as a number of new associate editors.

Amara Deis joined HDSR as the new editorial and administrative coordinator in January 2022, succeeding Paige Sammartino, who left the previous August to attend school full time.

readership stats


From January 2021 through May 2022, HDSR had 915K page views and 458K unique users, 60% of whom are outside of the United States.




submission stats


Between January 2021 and March 2022, the HDSR received 199 total submissions, including 147 invited manuscripts and 52 manuscript proposals.





publication stats


Between January 2021 and June 2022, the HDSR published 6 issues with 3 special themes, which included 91 articles and 57 commentaries.
• Official Statistics from the Changing World of Data Science (October 2021)
• World Migration and Displacement: Data, Disinformation and Human Mobility (January 2022)
• Value of Science (April 2022)






The Harvard Data Science Initiative (HDSI) congratulates and introduces Harvard Master’s students Esther Brown and Yi-Ting Tsai as the Summer 2022 HDSI Public Service Data Science Graduate Fellows. During this internship, Brown and Tsai will gain experience applying responsible data science to address social challenges at not-for-profit and public sector organizations.

ESTHER BROWN Summer 2022 HDSI Public Service Data Science Graduate Fellow

YI-TING TSAI Summer 2022 HDSI Public Service Data Science Graduate Fellow

The Harvard Data Science Initiative (HDSI) welcomes and introduces Ivana Malenica, George Dasoulas, and Esther Rolf as its 2022 HDSI Postdoctoral Fellows. The HDSI Postdoctoral Fellows will work independently over their two-to-three-year fellowships with the guidance and partnership of Harvard University faculty. This year’s Fellows are exceptional early-career researchers with doctoral degrees in biostatistics and computer science whose interests lie at the intersection of machine learning and several different fields.

More information:
• View the list of all previous fellows.
• View resources for prospective fellows.
• Join our mailing list to be notified when the application reopens in Fall 2022.

More information:
• Learn more about all previous awardees.
• Learn more about the Fellowship.

POSTDOCTORAL RESEARCH AWARDS
The Harvard Data Science Initiative Postdoctoral Fellow Research Fund incentivizes and supports cross-disciplinary collaboration between data scientists at the postdoctoral level.


RECIPIENTS OF THE POSTDOCTORAL RESEARCH AWARDS FOR FY 2022 ARE:
• Tian Gu, Department of Biostatistics, Harvard T.H. Chan School of Public Health
• Qianwen Wang, Department of Biomedical Informatics, Harvard Medical School
• Elisabeth Webb, McLean Hospital, Harvard Medical School
• Haichao Wu, Department of Materials Science and Mechanical Engineering, Harvard John A. Paulson School of Engineering and Applied Sciences

FACULTY SPECIAL PROJECTS AWARDS
The Harvard Data Science Initiative Faculty Special Projects Fund, an award fund intended to support one-time data science opportunities for which other funding is not readily available, has been running since 2019.

RECIPIENTS OF THE SPECIAL PROJECTS AWARD FOR FY 2022 ARE:
• Jeffrey Schnapp: TOP @ Harvard
• Karestan Koenen: Climate Change and Mental Health in Madagascar: A Health Systems Ecological Approach
• Kelly McConville: Data Science Book Club
• Natalia Linos: The Constellations Project
• Pavlos Protopapas: Automated Captioning of Data Visualizations
• Ryoko Sato: Workshop to Establish Collaborative Partnership for Costing Study of Innovative Digital Health Technology in Nigeria

COMPETITIVE RESEARCH AWARDS
The Harvard Data Science Initiative Competitive Research Fund provides targeted seed and bridge funding to Harvard faculty who propose novel methods, innovations, or solutions to data science challenges. Since 2017 the HDSI has provided over $1.5 million in funding across the University.

RECIPIENTS OF THE COMPETITIVE RESEARCH AWARDS FOR FY 2022 ARE:
• Jill Lepore: Amend: Rewriting the Constitution
• Sharad Goel: Designing and Evaluating Reinforcement Learning Algorithms to Reduce Pretrial Incarceration
• Pranav Rajpurkar: Self-Supervision for Label-Efficient Medical Image Interpretation





$3,131,307 CUMULATIVE AWARDS TOTAL 2019-2022



Trust in Science Project
HDSI AFFILIATE

The Trust in Science Project is a flagship project of the Harvard Data Science Initiative (HDSI), conducted in collaboration with the Harvard Kennedy School's Program on Science, Technology & Society (STS). At a time of seemingly widespread loss of confidence in science and expertise, the Project seeks to illuminate the varied factors that currently impede trusting relations between the producers and consumers of scientific information. It leverages data science, science and technology studies, and related disciplines to analyze the breakdowns in public trust, and to ask what steps could be taken to promote better mutual understanding.

The Project supports Harvard faculty-led research efforts, workshops, conferences, symposia, and external engagements to amplify the impact of funded work. It is run by Professor Sheila Jasanoff and Dr. Sam Weiss Evans, and overseen by internal and external advisory boards.

To date, the Project has disbursed over $600,000 to faculty across the University on a wide variety of projects. The first round of research focused on trust issues around the COVID-19 pandemic, examining issues ranging from differential privacy for contact tracing to data visualization of information flow to chat bots designed to counter COVID-19 misinformation. Projects in the second round analyzed ways that trust in science might be affected by the role of the “last mile” between the generation and consumption of knowledge on COVID, as well as the ways scientific information is visualized. The Project is currently embarking on a third round of funding focusing on the topic of vaccine hesitancy.

The Project has also hosted a series of events, from a workshop on whether the idea of a “right to truth” should be advanced internationally to a roundtable with four leading experts from academia and public life on the many crises of trust in expertise that marked 2020, from pandemic response to the census to police reform.

The Trust in Science Project is made possible by generous gifts from Bayer and Microsoft.

“Sheila Jasanoff, the Pforzheimer Professor of Science and Technology Studies at Harvard Kennedy School, has been awarded the 2022 Holberg Prize, among the world’s most prestigious awards for academic work in the humanities and social sciences.”
Harvard Kennedy School


Elizabeth Langdon-Gray HDSI Executive Director

Francesca Dominici HDSI Faculty Co-Director

David C. Parkes HDSI Faculty Co-Director

Jennifer Chow HDSI Director of External Engagement

Kevin Doyle HDSI Assistant Director of Programs + Operations

Sarah McCullough HDSI Events + Engagement Coordinator

Xiao-Li Meng HDSR Editor-in-Chief

Rebecca McLeod HDSR Managing Director

Amara Deis HDSR Editorial + Administrative Coordinator

Sheila Jasanoff TiS Project Faculty Lead

Sam Weiss Evans TiS Research Fellow

PEOPLE OF THE HDSI

FACULTY AFFILIATES
The Harvard Data Science Initiative Affiliates Program supports Harvard faculty who are actively engaged in advancing data science methodologies and applications or data science teaching. Learn about the benefits of becoming an HDSI Faculty Affiliate and how to apply.

Eva Ascarza, Demba Ba, Michael Baym, Elena Glassman, Bethany Hedt-Gauthier, Peter Huybers, Scott Kominers, Dustin Tingley, Xiang Zhou, Marinka Zitnik, Edo Berger, Marcia Castro, Mark Glickman, Christopher Golden, Alyssa Goodman, Stratos Idreos, Kosuke Imai, Luke Miratrix, Rachel Nethery, John Quackenbush, Rui Duan, Anders Jensen, Rema Hanna, Kun-Hsing Yu, Soroush Saghafian, Faisal Mahmood, Ata Kiapour, Vesela Kovacheva, Jeffrey Schnapp, Jonathan Zittrain, Christopher Winship, Rafael Irizarry, Hanspeter Pfister, Gabriel Kreindler, Fiery Cushman, Nils Gehlenborg, Sean Eddy, Finale Doshi-Velez, Satchit Balsari, James Mickens, Hossein Estiri, Pierre Jacob, Adam Haber, Flavio du Pin Calmon

CORPORATE MEMBERS
HDSI Corporate Members share our commitment to transformational research that will have widespread impact, and represent sectors including the life sciences, business consulting, technology, finance, data analytics, and publishing. For more information about the HDSI Corporate Members Program, please email datascience@harvard.edu.

CURRENT HDSI CORPORATE MEMBERS:
• Amazon
• Bayer
• EARNEST Partners
• Elsevier
• Harmony Analytics
• McKinsey & Company
• Microsoft

STEERING COMMITTEE MEMBERS

The Harvard Data Science Initiative is advised by university-wide faculty committees, chaired by HDSI Faculty Co-Directors Francesca Dominici and David Parkes.

• Gary King, Albert J. Weatherhead III University Professor, Director of the Institute for Quantitative Social Science
• Hanspeter Pfister, An Wang Professor of Computer Science; Affiliate Faculty Member, Center for Brain Science
• Isaac Kohane, Marion V. Nelson Professor & Chair, Department of Biomedical Informatics, Harvard Medical School
• Alyssa Goodman, Robert Wheeler Wilson Professor of Applied Astronomy, Department of Astronomy, Harvard Faculty of Arts and Sciences
• John Quackenbush, Henry Pickering Walcott Professor of Computational Biology and Bioinformatics; Chair, Department of Biostatistics
• Neil Shephard, Frank B. Baird, Jr. Professor of Science, Economics and Statistics Departments
• Xiao-Li Meng, Whipple V.N. Jones Professor of Statistics, Harvard Faculty of Arts & Sciences; Editor in Chief of Harvard Data Science Review

Giovanni Parmigiani, Mauricio Santillana, David Yang, Jose Zubizarreta, Xiaole Shirley Liu, Mohammad Jalali, Fabian Wermelinger, Joscha Legewie, Iavor Bojinov, Yangming Ou, Tanujit Dey, Chris Tanner, Jeremy Yang

HARVARD DATA SCIENCE INITIATIVE
Celebrating 5 Years of Innovation + Serving the Harvard and Data Science Communities

Harvard Data Science Initiative
Harvard University
44R Brattle Street, Cambridge, MA
datascience@harvard.edu




Over the past five years, the Harvard Data Science Initiative (HDSI) has been at the heart of efforts to advance the field of data science through the development of new methodologies and the application of data science to important societal challenges. Researchers have designed and applied statistics and computer science tools to aid efforts to understand, address, and combat the SARS-CoV-2 virus. Faculty, staff, and students have tackled bias and racial inequities by activating data science at the intersection of policy, and by turning a critical lens on bias in data science itself. Faculty-led research teams have advanced our understanding of everything from disease to history, from regulatory systems to planetary systems. In addition, Harvard Data Science Review provided a platform for debate and amplified impact, publishing a special issue dedicated to COVID-19 that fostered actionable insights and conversation about how to leverage data science in the battle against the pandemic.

These efforts have demonstrated ways in which data science can be applied to meet the challenges facing today’s society and how it can guide us through times of uncertainty. Since the launch of the HDSI in 2017, we have worked across the university to connect leaders from a range of academic disciplines and to amplify data science’s impact.

We look forward to continuing to build our community and to drive impact by advancing methodologies hand in hand with applications. We welcome your interest and encourage members of the Harvard community and beyond to come to us with ideas for how we can do this and what you would like to see from us in the future.

The HDSI is tremendously proud and deeply grateful to have brought together such an exceptional group of faculty, students, and staff. We thank everyone who has helped the HDSI reach significant milestones and become what it is today. If you would like to learn more about our team, our mission, or how to join us, please visit our website. See you next year!

The continued growth of our community is critical to our success. There are many ways to join us, including:
• Become an HDSI Faculty Affiliate.
• Apply for funding from our programs.
• Pitch an idea for an article in Harvard Data Science Review.
• Attend one of our seminars or events.
• Learn about our Corporate Members Program.
• Donate to keep our work growing.
• Subscribe to our weekly newsletter.

