New collaboration led by Notre Dame uses data revolution to solve current challenges in chemistry

by

Olaf Wiest

A multi-university collaboration led by the University of Notre Dame will use data-driven approaches to make the synthesis of complex organic molecules more predictable and efficient.

Olaf Wiest, professor in the Department of Chemistry and Biochemistry, will direct the Center for Computer-Assisted Synthesis (C-CAS). “This will significantly accelerate progress in drug discovery and materials science, where such molecules are critical to fundamental research,” Wiest said.

The goal of C-CAS is to transform how the synthesis of complex organic molecules is planned and executed by applying the principles of data science and machine learning to chemistry. C-CAS will also train a new generation of “data chemists” who can bridge the divide between data science and chemical synthesis by using quantitative, data-driven approaches to chemistry.

“C-CAS provides the opportunity for data scientists to work in alliance with computational and experimental chemists to address the bottleneck in most syntheses: the selection and optimization of individual steps in a rational fashion,” said Wiest.

In addition to Wiest and Nitesh Chawla, the Frank Freimann Professor of Computer Science and Engineering and director of iCeNSA at Notre Dame, collaborators include Richmond Sarpong of the University of California, Berkeley; Robert Paton of Colorado State University; Abigail Doyle of Princeton University; and Matthew Sigman of the University of Utah.

The National Science Foundation (NSF) is supporting C-CAS with $2 million in funding. In 2017, the NSF announced its 10 Big Ideas, a long-term research agenda intended to benefit future generations. Of the 10, C-CAS falls under Harnessing the Data Revolution, and it is supported through the Centers for Chemical Innovation Program of the Division of Chemistry. Nine centers currently exist, and the NSF creates two to three new ones each year. As a Phase One Center, C-CAS will run for three years, with the potential to be extended and expanded into a Phase Two Center.

Each lead investigator has complementary expertise. Wiest uses both computational chemistry and experimental methods to elucidate reaction mechanisms and to perform high-throughput calculations on transition structures. Chawla specializes in machine learning. Paton uses computational algorithms to understand catalytic reaction mechanisms and to enhance performance. Sigman develops physical-organic approaches to understand and predict selectivity in organic reactions. Doyle uses ultra-high-throughput experimentation technology and computational machine learning to predict the outcomes of reactions. Sarpong focuses on total synthesis, converting simpler chemical building blocks into complex, medicinally interesting natural products.

In addition to the researchers at various institutions, the group will work with a number of industrial partners, including large pharmaceutical, chemical and information technology companies. These partnerships will allow the findings of C-CAS to be applied in practice to improve processes in these industries.

Currently, chemistry data is recorded and shared in myriad ways: in laboratory books, proprietary databases or doctoral theses, or published in papers, online PDFs or patents. Wiest and his team are working to build new computational tools to bring all that data together in one accessible place. To do this, they will work in three parallel but interconnected thrusts. They will unify data from a variety of sources, exploit that unified data to represent chemistry in a way that addresses the problems with optimizing chemical reactions, and apply the data to synthesis planning and the synthesis of complex molecules.

C-CAS will provide training in data chemistry and machine learning for a new generation of scholars, who will learn to bridge the gap between data science and the complex challenges of modern synthetic chemistry. It will also offer a number of research opportunities, especially for scientists with disabilities. C-CAS will use mass and social media as well as in-person communication to engage a broader audience in a discussion of the role and impact of machine learning in modern society.

Visit ccas.nd.edu or follow C-CAS on Twitter at @NSF_CCAS for more information.

Originally published by Tammi Freehling at science.nd.edu on Sept. 3.