Verification Of Systems Biology Research: Species Translation And Biological Networks

Authored by E Bilal*, J Binder, S Boue, B Fields*, AW Hayes, A Iskandar, R Kleiman*, P Meyer*, R Norel*, J Park*, C Poussin, K Rhrissorrakrai*, JJ Rice*, J Sprengel*, M Talikka, G Stolovitzky, J Hoeng

Presented at ISMB/ECCB 2013
* This author is not affiliated with PMI.


The success of systems biology in academic and industrial settings hinges on the proper handling and analysis of the large volumes of high-throughput data currently being generated. Research entities, such as companies and academic consortia, often conduct large, multi-year scientific studies that entail the collection and study of thousands of individual experiments, often across many physical sites and with both internal and outsourced components. To extract maximum value, it is critical to verify the accuracy and reproducibility of data and computational methods before such studies are initiated. However, systematic and well-established verification procedures do not exist for many of the automated collection and analysis workflows in systems biology, which could lead to inaccurate conclusions. Industrial Methodology for Process Verification in Research (Improver) was designed to validate industrial research processes related to systems biology by deconstructing an industrial research workflow into individual components, termed building blocks, that can be independently verified.
As a first initiative of the Improver project, the Diagnostic Signature Challenge (DSC) was designed in 2012 to assess and verify computational approaches that classify clinical samples based on transcriptomics data from four disease areas (psoriasis, multiple sclerosis, chronic obstructive pulmonary disease (COPD) and lung cancer). The second initiative, the Species Translation Challenge (STC), was designed in 2013 to address whether or not biological events observed in rodents are “translatable” to humans.
In the next phase we will provide the community with network models of molecular events contributing to the onset of early COPD. These models of key biological processes include access to the underlying scientific literature citations, which have been expertly curated to provide mechanistic substantiation for each molecular relationship present in the network model. We will sponsor two Challenges using innovative crowdsourcing approaches to facilitate biomarker discovery for COPD while leveraging the computational approaches developed in the first Challenge and the translational aspects developed during the second Challenge. Biological network perturbations play a fundamental role in today’s systems-based biology, pharmacology, and toxicology. These network models may consist of qualitative causal relationships between biological entities that represent current scientific knowledge. The purpose of the Network Verification Challenge is to engage the scientific community in reviewing the relationships between molecular entities and in improving the represented biology, which covers fundamental processes involved in respiratory disease. Our Grand Challenge, which will be hosted in 2014, aims to discover molecular biomarkers for early-stage COPD. To this end, we will provide proteomic and transcriptomic data from both human subjects and animal models. These data will be derived from several biological sources (such as induced sputum, whole blood and nasal fluid) from a case-controlled cohort of 240 human subjects and a cigarette smoke-induced COPD C57BL/6 mouse model. By applying the lessons learned from the preceding Challenges to this new dataset, we as a scientific community will gain a greater understanding of the biology that underlies COPD.
At completion, Improver expects to provide an accelerated mechanism for the dissemination and validation of knowledge, better maps of disease, and a forum for reproducible and reusable data and analyses. The platform will provide a mechanism to link model generators with researchers and clinicians who are poised to validate modeling hypotheses and to incorporate modeling results into research directed at understanding physiological or disease states and into therapeutic development efforts. Furthermore, network models in combination with analytics will support the scientific community in identifying biomarkers for lung disease as well as biomarkers relevant to environmental or tobacco smoke exposure.