Using GPUs to Generate Reproducible Workflows to Accelerate Drug Discovery

GPU Technology Conference 2019 | San Jose, CA

Date: March 19, 2019

Speaker: Amanda Minnich, PhD

Title: Using GPUs to Generate Reproducible Workflows to Accelerate Drug Discovery

Abstract: The existing drug discovery process is costly, slow, and in need of innovation. At ATOM, a public-private consortium consisting of LLNL, GSK, UCSF, and FNL, we built an HPC-driven drug discovery pipeline that is supported by GPU-enabled supercomputers and containerized infrastructure. We'll describe the pipeline's infrastructure, including our data lake and model zoo, and share lessons learned along the way. We'll discuss the data-driven modeling pipeline we're using to create thousands of optimized models and the critical role of GPUs in this work. We'll also share model performance results and touch on how these models are integral to ATOM's new drug discovery paradigm. By building GPU-accelerated tools, we aim to transform drug discovery from a time-consuming and sequential process into a highly parallelized and integrated approach.

Accelerating Therapeutics for Opportunities in Medicine

2019 SIAM Conference on Computational Science and Engineering | Spokane, WA

Date: February 28, 2019

Speaker: Jonathan Allen, PhD

Title: Accelerating Therapeutics for Opportunities in Medicine

Abstract: Accelerating Therapeutics for Opportunities in Medicine (ATOM) is a consortium that aims to accelerate drug discovery by using large-scale computational simulations and modeling to guide the design process and reduce the number of costly experiments required for validation. Efforts are underway to build a computational framework for predicting drug-like properties of molecules using a large collection of experimental pharmaceutical data supplemented with molecular simulations. Models are being designed and scaled to screen large collections of compounds using DOE high-performance computers. A particularly challenging aspect of the project is developing models that predict accurately in novel parts of chemical space. The talk will focus on strategies for representing statistical model uncertainty and on using the model’s latent space to evaluate domain-of-applicability metrics, which determine which new molecules can be evaluated using existing models and where new experimental data is needed.
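The abstract does not say which domain-of-applicability metric was used. As a minimal sketch of one common choice, the Python below scores a new molecule by its mean Euclidean distance to its k nearest training molecules in a latent space. The function names, the threshold, and the toy two-dimensional vectors are all assumptions for illustration, not ATOM's implementation.

```python
import math


def knn_distance(query, training, k=3):
    """Mean Euclidean distance from `query` to its k nearest training vectors."""
    if k < 1 or k > len(training):
        raise ValueError("k must be between 1 and the number of training vectors")
    distances = sorted(math.dist(query, vector) for vector in training)
    return sum(distances[:k]) / k


def in_domain(query, training, threshold, k=3):
    """Treat a molecule as in-domain when it lies close to the training data.

    `threshold` is a hypothetical cutoff; in practice it would be calibrated,
    for example from the distribution of distances within the training set.
    """
    return knn_distance(query, training, k) <= threshold


# Toy latent vectors (hypothetical values, two dimensions for readability).
training_latents = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(in_domain((0.5, 0.5), training_latents, threshold=1.0))    # near the training set
print(in_domain((10.0, 10.0), training_latents, threshold=1.0))  # far from the training set
```

Under this heuristic, molecules that fall outside the threshold would be candidates for new experimental measurements rather than for prediction with the existing model.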

Three-Dimensional Molecular Representations for Deep Learning of Bioactivities

13th Women in Machine Learning Workshop at NeurIPS 2018 | Montreal, QC

Date: December 3, 2018

Speaker: Amanda Li, PhD

Authors: Amanda Li [1, 2], Seth Axen [2], Eric Stahlberg [1], Michael Keiser [2]

1 Accelerating Therapeutics for Opportunities in Medicine (ATOM), Frederick National Laboratory for Cancer Research

2 University of California, San Francisco

Title: Three-Dimensional Molecular Representations for Deep Learning of Bioactivities

Abstract: In recent years, deep learning has been increasingly applied to pharmaceutical research and drug discovery. However, to train a deep neural network to predict molecular properties, the input representations of molecular structure must invariably be reduced to one-dimensional vectors, or fingerprints, that are equal in length for all training examples. Two-dimensional (2D) molecular fingerprints are widely used, but there are inherent limitations to the similarity patterns they are able to relate. On the other hand, fingerprints that consider three-dimensional (3D) molecular shape must address the challenges of representing the multi-state and dynamic nature of molecules. In this work, we evaluate a 3D representation of molecular structure (E3FP) against its 2D counterpart (ECFP) in multi-task prediction of drug-target bioactivities, and we assess several strategies for incorporating multiple 3D molecular conformations, including those that take Boltzmann weighting into account. We compare the performance of these representations in training deep neural networks using publicly available bioactivity data (ChEMBL, DrugMatrix).
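The abstract does not specify how conformers were combined. The sketch below shows one way Boltzmann weighting could be applied: each conformer's fingerprint is weighted by exp(-E/RT), normalized over all conformers, and the weighted fingerprints are summed into a single fixed-length vector. The energies, the toy 4-bit fingerprints, and the function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872041  # molar gas constant R in kcal/(mol*K)


def boltzmann_weights(energies_kcal, temperature_k=298.15):
    """Normalized Boltzmann weights for conformer energies given in kcal/mol."""
    # Shifting by the minimum energy leaves the weights unchanged
    # and avoids overflow in exp().
    e_min = min(energies_kcal)
    factors = [
        math.exp(-(energy - e_min) / (GAS_CONSTANT_KCAL * temperature_k))
        for energy in energies_kcal
    ]
    total = sum(factors)
    return [factor / total for factor in factors]


def weighted_fingerprint(fingerprints, weights):
    """Weighted sum of equal-length fingerprint vectors, one per conformer."""
    length = len(fingerprints[0])
    return [
        sum(weight * fp[i] for fp, weight in zip(fingerprints, weights))
        for i in range(length)
    ]


# Hypothetical conformer energies (kcal/mol) and toy 4-bit fingerprints.
energies = [0.0, 0.5, 2.0]
conformer_fps = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
weights = boltzmann_weights(energies)
print(weighted_fingerprint(conformer_fps, weights))
```

Low-energy conformers dominate the combined vector, which is the intended effect: conformations the molecule rarely adopts contribute little to the representation.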

Accelerating cancer drug discovery through accurate safety predictions: one goal of The ATOM consortium

SLAS 2019 Innovation and Application | Washington, DC

Date: February 6, 2019

Speaker: Sarine Markossian, PhD

Authors: Sarine Markossian [1, 2], Kenny Ang [1, 2], Thomas D. Sweitzer [2, 3], Andrew Weber [2, 3], Claire G. Jeong [2, 3], Michelle R. Arkin [1, 2]
1 ATOM, San Francisco, CA 94158, USA
2 Small Molecule Discovery Center and Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94158, USA
3 GlaxoSmithKline, King of Prussia, PA 19406, USA

Title: Accelerating Cancer Drug Discovery through Accurate Safety Predictions: One Goal of the ATOM Consortium

Abstract: The Accelerating Therapeutics for Opportunities in Medicine (ATOM) consortium is an academic, industry, and government partnership with the goal of rapidly accelerating drug discovery by integrating modeling, deep machine learning, and human-relevant complex in vitro models. One of our goals in ATOM is to optimize preclinical safety predictions so that we can incorporate predictive toxicology early in the drug discovery process. Hepatocyte toxicity, or drug-induced liver injury (DILI), is a leading cause of attrition during drug development as well as one of the main reasons drugs are withdrawn from the market. We will describe our efforts to profile multiple 2D and 3D high-content assay formats to measure and predict hepatocyte toxicity. These multi-parametric data, coupled with Quantitative Systems Toxicology (QST) tools and deep machine learning, will allow us to predict DILI from the structure of a proposed drug lead. Currently, drug discovery is a slow and sequential process with a high rate of failure. By integrating high-performance computing and human-relevant in vitro models, we plan to transform drug discovery into a rapid, integrated, and patient-centric model.

Computational Pipelining of in silico Cardio- and Hepatotoxicity Models

GSK Data Science Symposium 2018 | Upper Providence, PA

Date: October 3, 2018

Presenter: Margaret Tse, PhD

Title: Computational Pipelining of in silico Cardio- and Hepatotoxicity Models

Abstract: Accelerating Therapeutics for Opportunities in Medicine (ATOM) is a public-private consortium with the goal of accelerating the drug discovery process to deliver an oncology medicine from target to patient in less than one year. We are combining high-performance computing, mechanistic modelling, diverse biological data, and new biological technologies and tools to reduce the time, financial cost, and need for animal testing during the drug discovery process.

Cardio- and hepatotoxicity are the focus of our initial safety work in ATOM, since they are often key areas limiting the success of new molecular entities (NMEs). Typically, potential candidate compounds are assessed for possible safety liability using a cascading screen of high throughput cross-reactivity screening (eXP), mechanistic in vitro assays, and ultimately, animal model testing. However, traditional toxicity screening can be a slow and costly process that can be under-predictive of human safety liabilities or can potentially eliminate safe and effective candidates.

To enhance the efficiency and accuracy of preclinical screening, we are developing an in silico pipeline for assessing safety liabilities. We propose the use of machine learning (ML) models to predict the structure-activity relationship between NMEs and binding activity against key off-target molecules. To generate in vivo toxicity predictions, the outputs of the ML models are used in Quantitative Systems Toxicology models, such as DILI-sim for hepatotoxicity and the Comprehensive in Vitro Proarrhythmia Assay Initiative (CiPA) in silico ventricular myocyte model for cardiac liabilities. Using the combined results of our in silico safety and physiologically based pharmacokinetic models, we are developing clinically relevant, dose-specific metrics of liability to inform drug candidate selection. Our goal is to establish a new paradigm that reduces preclinical attrition through more comprehensive, accurate, and efficient toxicity predictions and that, once it is comprehensive and accurate enough to earn regulatory buy-in, provides a framework for reducing the need for animal trials.


From Structure to Dose: An in silico Pipeline for Predicting Human Pharmacokinetics

GSK Data Science Symposium 2018 | Upper Providence, PA

Date: October 3, 2018

Presenter: Neha Murad, PhD

Title: From Structure to Dose: An in silico Pipeline for Predicting Human Pharmacokinetics

Abstract: Accelerating Therapeutics for Opportunities in Medicine (ATOM) is a public-private consortium with GSK as one of its founding members. ATOM aims to accelerate drug discovery by applying a multidisciplinary approach that integrates high-performance computing, diverse biological data, and artificial intelligence. As part of ATOM’s goal to go from structure to optimal dose with minimal in vivo or in vitro testing, we present a Physiologically Based Pharmacokinetic (PBPK) pipeline to support this objective.

PBPK modeling investigates the ADME (Absorption, Distribution, Metabolism, and Excretion) properties of a drug over time and provides drug concentration-time profiles that give information about the therapeutic window and help ascertain the optimal dose. Besides information on physiology, other crucial input parameters for a PBPK model are tissue-to-plasma partition coefficients (Kp), hepatic clearance, and renal clearance. These input parameters can be calculated using mechanistic models, which require drug-specific information such as logP, pKa, fu,p, blood:plasma ratio, and intrinsic clearance. We aim to evaluate and validate a series of machine learning and mechanistic models for both sets of input parameters and to deploy the best combination of models into the pipeline. As a first step, the poster presents a preliminary analysis and comparison of current mechanistic Kp prediction models in the context of the PBPK pipeline (implemented in Python) using 1263 compounds from the Obach Lombardo data set. This work will eventually enable us to predict human PK in silico (e.g., volume of distribution, clearance) and will be part of the overall ATOM effort to optimize efficacy, safety, and PK parameters for de novo compounds in drug discovery.
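To illustrate how Kp values feed into PK predictions, the sketch below uses two textbook relationships. The first is a simplified steady-state volume of distribution, Vss = Vp + sum(Vt * Kp,t), which here neglects erythrocyte partitioning. The second is a one-compartment intravenous bolus concentration-time profile, which is far simpler than a full PBPK model. All tissue volumes, Kp values, doses, and function names are hypothetical and are not taken from the poster.

```python
import math


def steady_state_volume(plasma_volume_l, tissue_volumes_l, kp):
    """Simplified Vss (litres): plasma volume plus Kp-weighted tissue volumes."""
    return plasma_volume_l + sum(
        volume * kp[tissue] for tissue, volume in tissue_volumes_l.items()
    )


def iv_bolus_concentration(dose_mg, vd_l, clearance_l_per_h, time_h):
    """One-compartment IV bolus: C(t) = (dose / Vd) * exp(-(CL / Vd) * t)."""
    elimination_rate = clearance_l_per_h / vd_l
    return dose_mg / vd_l * math.exp(-elimination_rate * time_h)


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
tissue_volumes = {"liver": 1.5, "muscle": 30.0}  # litres
partition_coefficients = {"liver": 4.0, "muscle": 1.0}  # unitless Kp
vss = steady_state_volume(3.0, tissue_volumes, partition_coefficients)
print(vss)  # 3.0 + 1.5 * 4.0 + 30.0 * 1.0 = 39.0

for hour in (0, 2, 4, 8):
    print(hour, iv_bolus_concentration(100.0, vss, 5.0, hour))
```

In a pipeline of the kind described, predicted Kp and clearance values would replace the hand-picked numbers, and a multi-compartment PBPK model would replace the single-compartment profile.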


Safety, Reproducibility, Performance: Accelerating cancer drug discovery with cloud, ML, and HPC technologies


4th Computational Approaches for Cancer Workshop at SC18 | Dallas, TX

Date: November 11, 2018

Speaker: Amanda Minnich, PhD

Title: Safety, Reproducibility, Performance: Accelerating cancer drug discovery with cloud, ML, and HPC technologies

Abstract: The drug discovery process is currently costly, slow, and failure-prone. It takes an average of 5.5 years to get to the clinical testing stage, and in this time millions of molecules are tested, thousands are made, and most fail.

The ATOM consortium is working to transform the drug discovery process by utilizing machine learning to pretest many molecules in silico for both safety and efficacy, reducing the costly iterative experimental cycles that are traditionally needed. The consortium comprises Lawrence Livermore National Laboratory, GlaxoSmithKline, Frederick National Laboratory for Cancer Research, and UCSF. Through ATOM’s unique combination of partners, machine learning experts are able to use HPC supercomputers to develop models based on proprietary and public pharma data for over 2 million compounds. The goal of the consortium is to create a new paradigm of drug discovery that would drastically reduce the time from identified drug target to clinical candidate, and we intend to use oncology as the first exemplar of the platform.

To this end, we have created a computational framework to build ML models that generate all key safety and pharmacokinetics parameters needed as input for Quantitative Systems Pharmacology and Toxicology models. Our end-to-end pipeline first ingests raw datasets, curates them, and stores the result in our data lake. Next, it extracts features from these data, trains a model, and saves it to our model zoo. Our pipeline generates a variety of molecular features and both shallow and deep ML models. The HPC-specific module we have developed conducts an efficient parallelized search of the model hyperparameter space and reports the best-performing hyperparameters for each of these feature/model combinations.
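The abstract does not describe how the hyperparameter search is implemented. As a small-scale sketch of the idea, the Python below evaluates every point of a grid in parallel with a thread pool and reports the best one. The objective function is a hypothetical stand-in for training a model and measuring its validation loss, and on an HPC system the work would be distributed across nodes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def toy_validation_loss(params):
    """Hypothetical stand-in for training a model and returning validation loss."""
    return (params["learning_rate"] - 0.01) ** 2 + (params["num_layers"] - 3) ** 2


def grid_search(objective, grid, max_workers=4):
    """Evaluate every grid point in parallel and return (best_params, best_loss)."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so losses[i] belongs to candidates[i].
        losses = list(pool.map(objective, candidates))
    best_index = min(range(len(candidates)), key=losses.__getitem__)
    return candidates[best_index], losses[best_index]


# Hypothetical search space for one feature/model combination.
search_space = {"learning_rate": [0.001, 0.01, 0.1], "num_layers": [1, 3, 5]}
best_params, best_loss = grid_search(toy_validation_loss, search_space)
print(best_params, best_loss)
```

Running one such search per feature/model combination yields a table of best-performing settings like the one the abstract describes.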

To ensure complete traceability of results, we save the training, validation, and testing dataset version IDs, the Git hash of the code used to generate the model, and the OS- and library-related version information. We have set up a Docker/Kubernetes infrastructure, so when a promising model has been identified, we can encapsulate the pipeline that created it, supporting both reproducibility and portability. Our system is designed to handle protected data and support incorporating proprietary models, which allows the framework to be run on real drug design tasks.
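A minimal sketch of this kind of provenance record, assuming the model is trained from a Git checkout, might look like the following. The field names, dataset version strings, and library list are hypothetical; the abstract states what is recorded but not how it is stored.

```python
import json
import platform
import subprocess
import sys
from importlib import metadata


def current_git_hash():
    """Return the commit hash of the working directory, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git is not installed, or this is not a repository
    return result.stdout.strip()


def library_versions(names):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_provenance(dataset_versions, libraries):
    """Collect the information needed to trace a trained model back to its inputs."""
    return {
        "dataset_versions": dataset_versions,
        "git_hash": current_git_hash(),
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "libraries": library_versions(libraries),
    }


# Hypothetical dataset version IDs and library names.
record = build_provenance(
    {"train": "v1", "valid": "v1", "test": "v1"},
    ["numpy", "hypothetical-package-not-installed"],
)
print(json.dumps(record, indent=2))
```

Storing a record like this next to each saved model makes it possible to rebuild the same environment later, for example inside a container image.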

Our models are currently being integrated into an active learning pipeline to aid in de novo compound generation, as well as being sent back to consortium members to incorporate into their drug discovery efforts. Our models and code will also be released to the public at the end of our 3-year proof-of-concept phase. To make these models usable externally, we have built a module that can load a model from our model zoo and generate predictions for a list of compounds on the fly. If ground truth is known, a variety of performance metrics are generated and stored in our model performance tracker database, allowing for easy querying and comparison of model performance.
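The abstract does not list which performance metrics are computed. As an illustration for regression models, the sketch below computes three common ones (RMSE, MAE, and R²) from known ground truth and predictions. The function name and the example values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical measured and predicted property values for three compounds.
metrics = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
print(metrics)
```

A dictionary of this shape could be stored as one row per model and dataset, which keeps later querying and comparison straightforward.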

We are confident that this work will help to transform cancer drug discovery from a time-consuming, sequential, and high-risk process into an approach that is rapid and integrated and that delivers better patient outcomes.