Computational Pipelining of in Silico Cardio- and Hepatotoxicity Models

GSK Data Science Symposium 2018 | Upper Providence, PA

Date: October 3, 2018

Presenter: Margaret Tse, PhD

Abstract: Accelerating Therapeutics for Opportunities in Medicine (ATOM) is a public-private consortium with the goal of accelerating the drug discovery process to deliver an oncology medicine from target to patient in less than one year. We are combining high performance computing, mechanistic modeling, diverse biological data, and new biological technologies and tools to reduce the time, financial cost, and need for animal testing during the drug discovery process.

Cardio- and hepatotoxicity are the focus of our initial safety work in ATOM, since they are often key areas limiting the success of new molecular entities (NMEs). Typically, potential candidate compounds are assessed for possible safety liabilities using a cascading screen of high-throughput cross-reactivity screening (eXP), mechanistic in vitro assays, and ultimately, animal model testing. However, traditional toxicity screening is a slow and costly process that can under-predict human safety liabilities or eliminate safe and effective candidates.

To enhance the efficiency and accuracy of preclinical screening, we are developing an in silico pipeline for assessing safety liabilities. We propose the use of machine learning (ML) models to predict the structure-activity relationship between NMEs and binding activity against key off-target molecules. To generate in vivo toxicity predictions, the ML outputs are fed into Quantitative Systems Toxicology models, such as DILIsym for hepatotoxicity and the Comprehensive in Vitro Proarrhythmia Assay (CiPA) initiative's in silico ventricular myocyte model for cardiac liabilities. Using the combined results of our in silico safety and physiologically based pharmacokinetic models, we are developing clinically relevant, dose-specific metrics of liability to inform drug candidate selection. Our goal is a new paradigm that reduces preclinical attrition through more comprehensive, accurate, and efficient toxicity prediction, and that ultimately provides a framework to reduce the need for animal trials once it is sufficiently comprehensive and accurate to earn regulatory buy-in.


From Structure to Dose: An In Silico Pipeline for Predicting Human Pharmacokinetics

GSK Data Science Symposium 2018 | Upper Providence, PA

Date: October 3, 2018

Presenter: Neha Murad, PhD

Title: From Structure to Dose: An In Silico Pipeline for Predicting Human Pharmacokinetics

Abstract: Accelerating Therapeutics for Opportunities in Medicine (ATOM) is a public-private consortium with GSK as one of its founding members. ATOM aims to accelerate drug discovery by applying a multidisciplinary approach that integrates high performance computing, diverse biological data, and artificial intelligence. To support ATOM's goal of going from structure to optimal dose with minimal in vivo or in vitro testing, we present a Physiologically Based Pharmacokinetic (PBPK) pipeline.

PBPK modeling describes the ADME (Absorption, Distribution, Metabolism, and Excretion) properties of a drug over time and yields concentration-time profiles that inform the therapeutic window and help ascertain the optimal dose. Besides physiological information, other crucial inputs to a PBPK model are tissue-to-plasma partition coefficients (Kp), hepatic clearance, and renal clearance. These input parameters can be calculated using mechanistic models, which require drug-specific information such as logP, pKa, fu,p, blood:plasma ratio, and intrinsic clearance. We aim to evaluate and validate a series of machine learning and mechanistic models for both sets of input parameters and to deploy the best combination of models into the pipeline. As a first step, the poster presents a preliminary analysis and comparison of current mechanistic Kp prediction models in the context of the PBPK pipeline (implemented in Python) using 1263 compounds from the Obach-Lombardo data set. This work will eventually enable us to predict human PK in silico (e.g., volume of distribution, clearance) and will be part of the overall ATOM effort to optimize efficacy, safety, and PK parameters for de novo compounds in drug discovery.
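To make the role of these drug-specific inputs concrete, the following is a minimal sketch of a tissue-composition-based Kp calculation in the spirit of the mechanistic models the poster compares. The function name, the simplified two-term lipid/water composition equation, and the example tissue fractions are illustrative assumptions, not the pipeline's actual implementation.

```python
def kp_tissue_plasma(logp: float, fu_p: float,
                     f_lipid_t: float, f_water_t: float,
                     f_lipid_p: float = 0.003, f_water_p: float = 0.96) -> float:
    """Illustrative tissue-to-plasma partition coefficient (Kp).

    Simplified composition-based form: partitioning into a tissue is
    approximated by its lipid and water fractions, weighted by the
    drug's lipophilicity (P = 10**logP) and scaled by the fraction
    unbound in plasma (fu,p).
    """
    p = 10.0 ** logp                    # octanol:water partition coefficient
    tissue = p * f_lipid_t + f_water_t  # affinity for tissue constituents
    plasma = p * f_lipid_p + f_water_p  # affinity for plasma constituents
    return fu_p * tissue / plasma


# Example: the same moderately lipophilic drug partitions more strongly
# into an adipose-like tissue (high lipid fraction) than a muscle-like
# tissue (high water fraction); fractions here are illustrative.
kp_adipose = kp_tissue_plasma(logp=2.0, fu_p=0.1, f_lipid_t=0.79, f_water_t=0.18)
kp_muscle = kp_tissue_plasma(logp=2.0, fu_p=0.1, f_lipid_t=0.022, f_water_t=0.76)
```

A pipeline would evaluate such a function per tissue compartment and feed the resulting Kp vector, together with clearance terms, into the PBPK model's differential equations.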


Safety, Reproducibility, Performance: Accelerating cancer drug discovery with cloud, ML, and HPC technologies


4th Computational Approaches for Cancer Workshop at SC18 | Dallas, TX

Date: November 11, 2018

Speaker: Amanda Minnich, PhD

Title: Safety, Reproducibility, Performance: Accelerating cancer drug discovery with cloud, ML, and HPC technologies

Abstract: The drug discovery process is currently costly, slow, and failure-prone. It takes an average of 5.5 years to reach the clinical testing stage, and in this time millions of molecules are screened, thousands are synthesized, and most fail.

The ATOM consortium is working to transform the drug discovery process by utilizing machine learning to pretest many molecules in silico for both safety and efficacy, reducing the costly iterative experimental cycles that are traditionally needed. The consortium comprises Lawrence Livermore National Laboratory, GlaxoSmithKline, Frederick National Laboratory for Cancer Research, and UCSF. Through ATOM's unique combination of partners, machine learning experts are able to use HPC resources to develop models based on proprietary and public pharma data for over 2 million compounds. The goal of the consortium is to create a new paradigm of drug discovery that drastically reduces the time from identified drug target to clinical candidate, and we intend to use oncology as the first exemplar of the platform.

To this end, we have created a computational framework to build ML models that generate all key safety and pharmacokinetics parameters needed as input for Quantitative Systems Pharmacology and Toxicology models. Our end-to-end pipeline first ingests raw datasets, curates them, and stores the results in our data lake. Next, it extracts features from these data, trains models, and saves them to our model zoo. The pipeline generates a variety of molecular features and both shallow and deep ML models. The HPC-specific module we have developed conducts an efficient, parallelized search of the model hyperparameter space and reports the best-performing hyperparameters for each feature/model combination.
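The parallelized hyperparameter-search step can be sketched as follows. This is a stdlib-only toy, not ATOM's actual HPC module: the grid contents are invented, and the hypothetical `evaluate` function stands in for a full featurize/train/validate run.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical search space: each key is a hyperparameter, each value
# lists the candidate settings to try for it.
GRID = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "hidden_layers": [1, 2, 3],
    "dropout": [0.0, 0.2, 0.5],
}

def evaluate(params: dict) -> float:
    """Stand-in for 'featurize, train, score on validation set'.

    Returns a validation score (higher is better). A real pipeline
    would train a model here; this toy peaks at a known setting.
    """
    return -(abs(params["learning_rate"] - 1e-3)
             + abs(params["hidden_layers"] - 2)
             + params["dropout"])

def grid_search(grid: dict, max_workers: int = 4) -> tuple[dict, float]:
    """Evaluate every grid point in parallel; return the best setting."""
    keys = list(grid)
    candidates = [dict(zip(keys, combo)) for combo in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate, candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]

best_params, best_score = grid_search(GRID)
```

On an HPC cluster the per-candidate evaluations would be farmed out across nodes rather than threads, but the structure — enumerate the grid, score candidates independently, report the argmax — is the same.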

To ensure complete traceability of results, we save the training, validation, and testing dataset version IDs, the Git hash of the code used to generate the model, and the OS- and library-related version information. We have also set up a Docker/Kubernetes infrastructure, so that when a promising model has been identified, we can encapsulate the pipeline that created it, supporting both reproducibility and portability. Our system is designed to handle protected data and support incorporating proprietary models, which allows the framework to be run on real drug design tasks.
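A minimal sketch of this kind of provenance capture is below; the helper name and field names are illustrative assumptions, since the abstract does not show the tracker's actual schema.

```python
import platform
import subprocess
import sys
from datetime import datetime, timezone

def capture_provenance(dataset_versions: dict) -> dict:
    """Record the metadata needed to trace a trained model.

    dataset_versions maps split name -> dataset version ID, e.g.
    {"train": "dset_v12", "valid": "dset_v12", "test": "dset_v13"}.
    """
    try:
        # Git hash of the code that produced the model.
        git_hash = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True,
            stderr=subprocess.DEVNULL).strip()
    except (OSError, subprocess.CalledProcessError):
        git_hash = "unknown"  # not running inside a git checkout
    return {
        "dataset_versions": dataset_versions,
        "git_hash": git_hash,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

record = capture_provenance(
    {"train": "dset_v12", "valid": "dset_v12", "test": "dset_v13"})
```

Storing such a record alongside each model zoo entry is what makes it possible to later rebuild the exact training environment inside a container.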

Our models are currently being integrated into an active learning pipeline to aid in de novo compound generation, as well as being sent back to consortium members to incorporate into their drug discovery efforts. Our models and code will also be released to the public at the end of our 3-year proof-of-concept phase. To make these models usable externally, we have built a module that can load a model from our model zoo and generate predictions for a list of compounds on the fly. If ground truth is known, a variety of performance metrics are generated and stored in our model performance tracker database, allowing easy querying and comparison of model performance.
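When ground truth is available, the scoring step amounts to standard regression metrics. A stdlib-only sketch follows; the metric set and names are illustrative, not the tracker database's schema.

```python
import math

def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict:
    """Compute RMSE, MAE, and R^2 for predicted vs. measured values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be equal-length and non-empty")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }

# Example: score a small batch of hypothetical predictions.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

A tracker database row would then pair such a metrics dict with the model's provenance metadata, so model versions can be queried and compared side by side.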

We are confident that this work will help transform cancer drug discovery from a time-consuming, sequential, and high-risk process into one that is rapid, integrated, and delivers better patient outcomes.