2021 Spring Allotrope Connect Info & Reg.| Allotrope Foundation

2021 Spring Allotrope™ Connect

Presentations from Days 1 and 2 are now available on YouTube. Day 3 presentations will be posted once they are processed.

Allotrope Connect:

Bringing together Allotrope members and the broader scientific community to discuss how we are delivering faster insights from new and existing data within organizations by improving standardization of data and its interpretation across laboratory and manufacturing operations.

With over 30 data models, and growing, Allotrope is leading the way in data standardization, thereby allowing quicker access to insights within data.

Registration Dates and Links - April 19, 22, 26

Day 1, Monday April 19, 10:00am-12:00pm EDT: Register
Day 2, Thursday April 22, 10:00am-12:00pm EDT: Register
Day 3, Monday April 26, 10:00am-12:00pm EDT: Register

Agenda

Allotrope Connect Agenda

(All times are in EDT (GMT-4). Sessions are 30 minutes long, except lightning talks, which are 10 minutes plus 5 minutes of discussion.)

Apr. 19th

10:00am - Introduction & Product Team Update

Presenters and Co-Authors

Vinny Antonucci (Allotrope Foundation Chair, Director of Informatics, Merck)
Matthew Fox (Product Dir., Allotrope Foundation)

Abstract / Summary

Allotrope Connect introduction, product update and recent Allotrope Board decisions

-----------------------------------------------

10:30am - Harvesting the digital dividends - Allotrope at Bayer

Presenters* and Co-Authors (Bayer)

Michael Rosskopf*
Henning Kayser*
Eric Schuette*
Cihangir Ezel
Helge Klemmer
Matthias Keck

Abstract / Summary

The life science industry is undergoing a critical digital transformation that requires developing new business and innovation models, and adapting existing ones, to a new pace. Because standardized data access patterns have been lacking for decades, analytical and other wet labs face severe challenges in adopting common digitalization trends such as central data aggregation and modularized, centrally orchestrated workflows combined with data modelling.

Allotrope has the potential to be a central pillar in addressing these points. It can help reach sounder scientific decisions in less time and even help prove hypotheses beyond the initial experimental intention. Due to its high level of maturity, it can break up old structures and simplify system architectures.

We at Bayer will show some ideas on how to make use of Allotrope technologies. We will discuss ideas on future data workflows and give insights into initiatives from our Pharmaceuticals and Crop Science divisions. These include standardized ADF-based output formats for lab instrumentation that contain semantic information for data transfer to downstream LIMS and ELN systems and that can also be used for data preservation and analytics.

-----------------------------------------------

11:00am - Increasing the scientific visibility of the Allotrope Foundation

Presenter

Dennis Della Corte (BYU)

Abstract / Summary

Brigham Young University partnered with the Allotrope Foundation in 2020 to write a proceedings and history report for the Fall 2020 Allotrope Connect. The resulting manuscript is currently in review with Drug Discovery Today, and publication is anticipated soon. This presentation will outline the process, show highlights from the paper, and suggest how it can be used for effective internal communication at member and partner companies. In the second part, we will provide an overview of a new manuscript that performs a use-case-based analysis of ADF and three other data standards. We will share the current state of the analysis and invite member and partner companies to contribute to it. Finally, we will provide an outlook on planned research articles that evaluate the Allotrope standard for other use cases.

-----------------------------------------------

11:30am - Unlock Insights from Experimental Data and Empower Data Science

Presenters and Co-Authors

Spin Wang (CTO, TetraScience)
Mike Tarselli (CSO, TetraScience)

Abstract / Summary

The Waters Empower Data Science Link (EDSL), powered by TetraScience, enables scientists and engineers to do more with their valuable experimental data. From enabling chromatography fleet management to deducing new trends across experiments or injections - making lab data FAIR and usable accelerates the move to a digital lab. This presentation will cover how the data science link helps support Allotrope’s mission (universal formats, interoperability, simplified IT and system integration) and promotes improved data science while using existing resources.

This session will focus on:

High-throughput, automatic acquisition and conversion of data into a vendor-agnostic, data-science-compatible format in the cloud (which can be easily converted to the Allotrope Data Format (ADF) through third-party tools)
How bidirectional communication between chromatography instrumentation, software, and tools enables sample set method creation, eliminates manual transcription, and increases data integrity
Easy query of converted data based on sample ID, method name, and other fields using common data science tools
TetraScience’s membership in the Allotrope Partner Network and how it informs analytics on chromatography data system (CDS) data

Apr. 22nd

10:00am - Ontology for Process Chemistry – Giving Context to Instrument Data Structured by the Allotrope Data Model

Presenters and Co-Authors

Wes Schafer, Merck Process Research & Development
Oliver He, University of Michigan Medical School
Anna Dunn, Pharmaceutical Development, GlaxoSmithKline (GSK)
Zachary E. X. Dance, Merck Process Research & Development

Abstract / Summary

Allotrope is providing a solid foundation for structuring and standardizing analytical instrumentation settings and its data. But in order to fully realize machine learning/artificial intelligence and complete data mining, experimental and workflow details need to be linked to the analytical data in order for it to have context. An important domain in the industry that produces a large amount of analytical data is process chemistry, which is the branch of chemistry (including pharmaceutical chemistry) that studies the development and optimization of the production processes by scaling-up laboratory chemical reactions into commercially viable routes. Key concepts for the domain are product quality, process robustness, economics, environmental sustainability, regulatory compliance and safety.

Last year we proposed building the ontological terms needed to tag and structure the contextual data associated with process chemistry workflows. Here we present the initial version of an Ontology for Process Chemistry that covers route scouting, route optimization and route validation with a line of sight to producing regulatory documentation. Use-cases for building the ontology included: reaction and catalyst screening, kinetic and mechanistic studies, fate and purge studies of process impurities as well as polymorph / form studies. Supporting concepts such as additional synthetic roles, equipment and unit operations are also included.

-----------------------------------------------

10:30am - Methods Hub for HPLC opens new gates into the Intelligent Analysis of Analytical Methods and Results

Presenters and Co-Authors

Steve Emrick (USP)
Dana VanDerwall (Bristol Myers Squibb)
Gerhard Noelken (Pistoia Alliance)

Abstract / Summary

The collaboration between the Pistoia Alliance and Allotrope Foundation to produce a digital solution for HPLC-UV analytical methods management is nearing completion. By mid-2021, the Methods Database (Methods Db) with OpenLab and Empower Chromatography Data System (CDS) adapters will be available for early implementation by Pistoia members. The Methods Db will allow you to move machine-readable versions of your valuable HPLC-UV methods between these two CDSs in a vendor-neutral format with consistent context. By the end of 2021, the solution will expand to include analytical result data adapters, the ability to track the historical analytical performance of HPLC-UV methods and chromatographic columns, and the sample preparation process as part of the digital method, and it will propose a data standard for chromatographic columns. Community support for the early implementation of this solution is critical to efficiently mature the software and the underlying AFO and ADM content that it leverages, so please attend to learn more about participating in the future. Recognizing that current methods content is distributed across various media, from instrument software to articles in the literature, the project has also added an important further phase: by mid-2021, a collaboration with CAS, Elsevier, and USP will demonstrate how Natural Language Processing (NLP) can streamline the import of text-based methods to facilitate population of the Methods Db. Finally, a cloud-based Methods Hub will bring all this together and allow scientists to freely exchange analytical methods on demand with global partners to accelerate the pace of R&D.

-----------------------------------------------

11:00am - Integrated Sample Analysis Workflow with IDBS EWorkbook, AGU SDC, and ZONTAL

Presenters* and Co-Authors

Scott Weiss (IDBS)
Klaus Bruch* (AGU)
Wolfgang Colsman* (ZONTAL)
Mike Huang* (IDBS)
Craig Williamson* (IDBS)

Abstract / Summary

Allotrope partner companies IDBS and ZONTAL have teamed up with AGU to provide an end-to-end solution for common problems encountered during the pharmaceutical product lifecycle. Among these problems are the automation of experimental design, data capture, data cleaning, and data reporting. We leverage the Allotrope Data Format to increase data quality by contextualizing the raw data with descriptive information according to Allotrope Data Models. Further, we create a tight connection between the experiment and researchers by removing artificial boundaries and giving full access to reusable data at all times. Through ADF-supported audit trails, we keep track of full data lineage and changes throughout the product lifecycle to reduce workflow-related errors. Our presentation will highlight the strengths of this three-way collaboration and will provide tangible insights through a live demonstration.

-----------------------------------------------

11:30am - Complementing ADF With a Scalable Analysis and Analytics Platform, REVEAL: AnalyticalDevelopment

Presenters and Co-Authors (Paradigm4)

Srikant Sarangi (Paradigm4)
Zachary Pitluk (Paradigm4)

Abstract / Summary

The Allotrope Data Format (ADF) addresses some critical needs in big data management: uniform storage of diverse types of scientific instrument data and associated metadata, and the ability to develop consistent and integrated data processing and analysis pipelines. The ADF is vendor-, platform-, and method-agnostic, and has the capability to store datasets of nearly unlimited size and complexity in a single file, organized as single or multiple n-dimensional arrays (1).

While the ADF has created a remarkable opportunity to efficiently federate data, the scalability challenges associated with accessing and analyzing large datasets stored in files remain. Here, we present a solution that complements the ADF by incorporating its features into an array-native scientific database management platform. REVEAL: AnalyticalDevelopment enables data selection and computation from imported ADF files using Python and R APIs. REVEAL is designed to store data as multi-dimensional arrays, and each array element can hold an arbitrary number of attributes of any data type. REVEAL also supports in-database computation through optimized implementations of algorithms at scale (e.g., regressions, PCA). Additionally, it provides provenance and reproducibility by being an ACID database system.

In this presentation, we demonstrate ingestion, processing, and visualization of a representative LC-UV dataset in ADF format from Agilent that was used for sensitivity analysis (Agilent v0-0-0-2 Performance Data). Data was extracted and loaded into the database using a Python API, and we created a simple R Shiny GUI for visualization of data and data completeness. The dataset consists of measurements on an 8-component mixture separated on the HPLC with 17 replicate separations at each condition with the spectrum from 210 nm to 280 nm sampled at 10 nm increments. Using REVEAL, we perform a DoE analysis to find the combination of experiment variables (detector rate, resolution, and run time) that maximizes peak separation. Higher resolution data, for instance, capturing the rate of increase of absorbance adjacent to absorbance maxima, can also easily be utilized by REVEAL: AnalyticalDevelopment. Amongst the other datasets available through the Allotrope Foundation, we also discuss our work with Bioanalyzer Data provided by Bayer as another example of using the ADF with REVEAL.

In conclusion, REVEAL: AnalyticalDevelopment provides a scalable solution for efficiently working with the ADF and mitigating the challenges posed by in-memory processing of large datasets stored in files.

-----------------------------------------------

Apr. 26th

10:00am - A Platform for Instrument Integration to the Electronic Notebook Enabled by Data Standardization (Lightning Talk w/5min Discussion Following)

Presenters and Co-Authors

Vinny Antonucci (Allotrope Foundation Chair, Director of Informatics, Merck)
Doug West (Merck)

Abstract / Summary

Allotrope Foundation has invested several years to develop robust, automated, and accessible technology to drive community adoption in real-life end-to-end (E2E) workflows. This short presentation will introduce one such E2E application of the technology & data standardization: a scalable platform for instrument integration with the electronic notebook. The platform includes automated instrument data collection into the cloud, conversion to Allotrope Data Format, and a scalable integration pattern between instrument data and an electronic laboratory notebook (ELN) which also is configured with a similar modular and standardized data capture design to better receive the data. Infrastructure will also be described to facilitate advanced data analytics of this connected & contextualized data. This presents an opportunity to extend the scope of Allotrope data governance to include simple experiment-based content which complements the current instrument-based content since both are necessary to standardize critical information across any E2E laboratory workflow. Ongoing status updates on the progress of this effort will be provided in future Allotrope Connect meetings.

-----------------------------------------------

10:15am - Modeling Information with the Allotrope Framework (Technical) (Lightning Talk w/5min Discussion Following)

Presenter

Helge Krieg (OSTHUS)

Abstract / Summary

We propose to focus on data assets first. The power of knowledge graphs lies in their precise description of context. The meaning of data becomes precisely understandable and can be processed by applications. The Allotrope Foundation Ontology adopted the Basic Formal Ontology as its top-level terminology. It is comparable to an overarching umbrella that spans all branches of entities.

From a data engineer’s perspective, it is information content entities that we are dealing with in the first place. It is only information content entities that represent datums that can have values, such as IDs, quantity values, weights, temperatures, text, etc.

From the perspective of the Basic Formal Ontology, information content entities belong to exactly one of eight separate branches. In other words, one of eight entity types is actually directly associated with our data. Almost 90% of entity types are associated with contextual information such as related process, project, people, their roles and so on.

From our legacy data models, we know common patterns and typical relational structures. There is an opportunity to build on that and add precision by leveraging the Allotrope Framework. We propose a generic approach to mapping relational models to Allotrope Data Models, focusing on descriptions first.

The idea is nothing new. Starting from a tabular description which is purely information-focused, we can gradually add contextual information and include further knowledge. Allotrope-aligned description models can be our bridge from tabular models to full graph models.

We benefit from the existing relational hierarchical model and apply simple Allotrope Data Model patterns suggesting a faceted structure of information content entities. Information is key.
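The step from a purely tabular description to a graph description can be illustrated with a minimal Python sketch. The `ex:` prefix, the table and column names, and the `row_to_triples` helper are all hypothetical and chosen for illustration only; they are not actual AFO terms or Allotrope Data Model patterns.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(table: str, key: str, row: Dict[str, object]) -> List[Triple]:
    """Turn one relational row into subject-predicate-object triples.

    The primary key becomes part of the subject identifier; every other
    non-empty column becomes one triple whose predicate is derived from
    the column name.
    """
    subject = f"ex:{table}/{row[key]}"
    triples: List[Triple] = [(subject, "rdf:type", f"ex:{table}")]
    for column, value in row.items():
        if column == key or value is None:
            continue  # the key is already in the subject; skip NULL cells
        triples.append((subject, f"ex:{column}", str(value)))
    return triples


# Hypothetical row from a legacy "weighing_result" table.
row = {"id": 17, "sample_id": "S-001", "mass_mg": 12.5, "operator": None}
triples = row_to_triples("weighing_result", "id", row)
for triple in triples:
    print(triple)
```

Starting from such information-only triples, contextual statements (process, project, people, roles) could then be added gradually without changing the triples that already exist.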

-----------------------------------------------

10:30am - Sample Prep Orchestration: Agilent SLIMS in conjunction with Sartorius Cubis II balance

Presenter and Co-Authors

Heiko Fessenmayr (Agilent)
Maximilian Schiffer (Agilent)
Thomas Schink (Sartorius)

Abstract / Summary

In keeping with the Allotrope mission of enabling end-to-end lab workflows through the Allotrope Framework, we present Agilent SLIMS as an orchestrator that connects to the sample preparation unit operation, using a Sartorius Cubis II balance. We demonstrate how to unify the collected workflow data in the Allotrope Data Format (ADF).

-----------------------------------------------

11:00am - Constructing ADMs with SHACL (Shapes Constraint Language) to Describe, Validate and Achieve Data Interoperability (Technical Session)

Presenter

Amnon Ptashek (Technical Dir., Allotrope Foundation)

Abstract / Summary

Interoperability is fundamental to transforming the acquisition, exchange, and management of laboratory data throughout its complete lifecycle. Using semantic technology, the Allotrope Data Model (ADM) together with a controlled vocabulary provides a mechanism to model simple as well as rich use case scenarios. SHACL, the W3C Shapes Constraint Language for describing and validating RDF graphs, builds on the concept of “Shapes” to describe a data model. SHACL subscribes to the Closed World Assumption (CWA), which typically applies when a system has complete control over the information. As a result, SHACL is a powerful tool for data validation.

The presentation reviews the technical aspects of the "Shape" structure. It covers the technology at a high level but also dives into the fundamentals of SHACL Core, such as types of Shapes, Constraints, Targets, and Focus Nodes. Following the technical review and a sample exercise, a demonstration of several shapes in a Tabular ADM will be presented.
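To make the terms Shape, Target, Focus Node, and Constraint concrete, the following is a toy Python sketch of closed-world cardinality checking. It is not a SHACL engine and not the presenter's material: the shape is a plain dictionary rather than RDF, and the `ex:` terms, `SHAPE`, `DATA`, and both functions are assumptions made up for illustration. Only the ideas of a target class, a property path, and minimum and maximum counts are borrowed from SHACL Core.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical shape, written as a plain dict for illustration. In real SHACL
# the same constraints are expressed in RDF with sh:targetClass, sh:path,
# sh:minCount and sh:maxCount.
SHAPE: Dict = {
    "target_class": "ex:Measurement",
    "properties": [
        {"path": "ex:value", "min_count": 1, "max_count": 1},
        {"path": "ex:unit", "min_count": 1, "max_count": None},
    ],
}


def focus_nodes(graph: List[Triple], target_class: str) -> List[str]:
    """Return every node typed as target_class (the shape's focus nodes)."""
    return sorted({s for s, p, o in graph if p == "rdf:type" and o == target_class})


def validate(graph: List[Triple], shape: Dict) -> List[str]:
    """Check cardinality constraints and return readable violation messages."""
    violations: List[str] = []
    for node in focus_nodes(graph, shape["target_class"]):
        for prop in shape["properties"]:
            path = prop["path"]
            count = sum(1 for s, p, _ in graph if s == node and p == path)
            if count < prop["min_count"]:
                violations.append(f"{node} {path}: {count} < minCount {prop['min_count']}")
            max_count: Optional[int] = prop["max_count"]
            if max_count is not None and count > max_count:
                violations.append(f"{node} {path}: {count} > maxCount {max_count}")
    return violations


DATA: List[Triple] = [
    ("ex:m1", "rdf:type", "ex:Measurement"),
    ("ex:m1", "ex:value", "12.5"),
    ("ex:m1", "ex:unit", "mg"),
    ("ex:m2", "rdf:type", "ex:Measurement"),
    ("ex:m2", "ex:value", "3.1"),
    ("ex:m2", "ex:value", "3.2"),  # a second value breaks the maximum of 1
    # ex:m2 has no ex:unit at all, which breaks the minimum of 1
]

for message in validate(DATA, SHAPE):
    print(message)
```

The missing `ex:unit` on `ex:m2` is reported as a violation rather than treated as unknown, which is the closed-world behaviour that makes shapes useful for validation.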

-----------------------------------------------

11:30am - Developer’s test suite for company-specific extensions of AFO/ADM (Technical Session)

Presenters* and Co-Authors (Merck)

Jan Nespor*
Jan Rosecky*
Jindrich Mynarz

Abstract / Summary

Allotrope ontologies (AFO) and data models (ADM) are designed to cover a commonly agreed subset of data for a particular instrument type. This approach helps to define and promote the Allotrope standards across industry. However, use cases of specific pharmaceutical companies typically require ADF to contain additional data, such as internal system identifiers; or meet stricter constraints, e.g. tighter cardinalities. This can be achieved by defining company-specific extensions of Allotrope artifacts, in particular ADM.

Ensuring the correctness and usability of these extensions is critical: constraints need to be expressed as valid SHACL shapes; impose only actually intended restrictions; and be kept additive, i.e. not contradicting the standard ADM and ontological constraints. This needs to be a part of continuous integration as ADM or AFO evolves.

Because manual checks of the aforementioned are time-consuming and error-prone, we have developed a test suite that executes SHACL QA tools and uses MSD extensions (together with the original ADM and AFO) to validate samples of valid and invalid data. Using example extensions of AFO and ADM, this work shows the integration of automated QA into the development workflow.
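One of the checks described above, that an extension only tightens cardinalities and never contradicts the standard model, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' test suite: cardinalities are modelled as `(min, max)` tuples keyed by property path, and the `ex:` paths, `BASE`, `GOOD_EXT`, `BAD_EXT`, and both functions are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (min_count, max_count); a max_count of None means "unbounded".
Cardinality = Tuple[int, Optional[int]]


def is_tightening(base: Cardinality, ext: Cardinality) -> bool:
    """True if every count allowed by ext is also allowed by base."""
    base_min, base_max = base
    ext_min, ext_max = ext
    if ext_min < base_min:
        return False  # extension would accept fewer values than the standard
    if base_max is not None and (ext_max is None or ext_max > base_max):
        return False  # extension would accept more values than the standard
    if ext_max is not None and ext_min > ext_max:
        return False  # extension can never be satisfied
    return True


def check_extension(base: Dict[str, Cardinality],
                    extension: Dict[str, Cardinality]) -> List[str]:
    """Return the property paths where the extension contradicts the base."""
    problems: List[str] = []
    for path, ext_card in extension.items():
        base_card = base.get(path)
        if base_card is None:
            continue  # assumed additive: a new property on an open base model
        if not is_tightening(base_card, ext_card):
            problems.append(path)
    return problems


# Hypothetical standard model and two company-specific extensions.
BASE = {"ex:value": (1, 1), "ex:unit": (0, None)}
GOOD_EXT = {"ex:unit": (1, 1), "ex:internal_id": (1, 1)}
BAD_EXT = {"ex:value": (0, 2)}

print(check_extension(BASE, GOOD_EXT))
print(check_extension(BASE, BAD_EXT))
```

Run in continuous integration, a check of this kind would flag an extension that silently becomes contradictory when the underlying model changes; a real suite would also validate sample data against the actual SHACL shapes.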

-----------------------------------------------