Implementation of a Systems Biology Data Integration Platform

      Ansari, S.; Hermida, L.; Diehl, S.; Poussin, C.; Sewer, A.; Xiang, Y.; Bonjour, F.; Phanzu, K.; Gubian, S.; O'Neel, B.; Hoeng, J.; Peitsch, M.
      Conference date: Jul 13, 2010
      Conference name: 13th International MGED Meeting

      A systems biology approach to research requires an underlying infrastructure to manage, integrate, and share high-throughput functional genomics data and workflows, from data production through annotation, analysis, and knowledge acquisition. At its core, there should be a comprehensive data management and annotation system and a data repository that fully support publicly established standards for storing and reporting high-throughput functional genomics investigations. Such a system serves as the platform's central hub and integrates with data analysis, visualization, and mining tools. It also enables collaboration between internal teams, publication of internal investigations and data to public repositories, and incorporation of public investigations and data into the platform for internal comparison and analysis. Here we report the implementation of a systems biology data integration and knowledge management platform that supports experimental and computational workflows examining systems response profiles generated in vivo and in vitro (gene expression, microRNA, comparative genomic hybridization, and reverse-phase protein array proteomics data). The platform uses open-source, freely available components where suitable, with caArray and caGrid from the National Cancer Institute cancer Biomedical Informatics Grid (NCI caBIG®) software family as its core data management and annotation infrastructure. The community-standard MAGE-TAB format is used for data exchange. caArray is integrated with GenePattern, an open-source bioinformatics workflow management system for integrative genomics, which is used for quality control, data analysis, and visualization. caArray is also integrated with several commercial data analysis and biological pathway inference systems. Gene-centric, cross-investigation data mining capabilities are provided by the BioMart and InterMine open-source data warehouse systems.
Several other modules are under development to integrate the platform with existing laboratory information management systems, along with additional features to be contributed to the open-source caArray project.
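
As an illustration of the MAGE-TAB data exchange mentioned above, the following minimal sketch reads the sample-to-data-file relationships from an SDRF (Sample and Data Relationship Format) table, which in MAGE-TAB is a tab-delimited text file. It is not the platform's code: the sample names, file names, and helper functions (read_sdrf, data_files_by_source) are hypothetical, and only a handful of common SDRF columns are assumed to be present.

```python
import csv
import io

# Hypothetical SDRF fragment for illustration only. A real SDRF file is
# tab-delimited and usually has many more columns; the sample and file
# names here are invented.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tHybridization Name\tArray Data File\n"
    "sample_1\tHomo sapiens\thyb_1\tsample_1.CEL\n"
    "sample_2\tHomo sapiens\thyb_2\tsample_2.CEL\n"
)


def read_sdrf(text):
    """Parse tab-delimited SDRF text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def data_files_by_source(rows):
    """Map each biological source to the raw data files derived from it."""
    mapping = {}
    for row in rows:
        mapping.setdefault(row["Source Name"], []).append(row["Array Data File"])
    return mapping


rows = read_sdrf(SDRF_TEXT)
files = data_files_by_source(rows)
```

A mapping of this kind is the sort of information an import or export step would need in order to attach annotation to the correct data files; how caArray handles MAGE-TAB internally is not described in the abstract.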