Development of a “Contrast and Gene Set” Toolkit-Integrated Database Platform for Downstream Computational Analysis Supporting Biological Interpretation of Gene Expression Data

      Poussin, C.; Hermida, L.; Sewer, A.; Ansari, S.; Gubian, S.; Hoeng, J.
      Conference date: Aug 28, 2011
      Conference name: International Conference in Systems Biology (ICSB) 2011

      Background: In transcriptomics, identifying the differentially expressed genes (DEGs) associated with the effect(s) or contrast(s) of interest is the central step on which further downstream computational analysis (e.g., gene over-representation/enrichment analysis or reverse engineering) builds to yield mechanistic insights. It is therefore essential to store contrast data adequately and to extract gene sets from these DEG lists automatically, so that these downstream activities are supported efficiently and the data can be exploited over the long term.

      Methods: We report the development of a contrast and gene set toolkit-integrated database that enables: 1) storage of contrast data and metadata (associated with the analysis process) in a simple, standard format (IDMaps); 2) automatic ID conversion of the data via a mapping and collapsing methodology using the latest public annotations (NCBI Gene database); 3) automatic extraction and storage of three gene sets (significantly up-regulated, significantly down-regulated, and both directions combined) derived from the stored contrasts; and 4) integration of extensible downstream computational analysis functionality. Gene set enrichment analysis is currently integrated as a downstream method in the toolkit, which is built within Galaxy, an open-source workflow management and data integration system. The toolkit provides both web access and programmatic access via an R library.

      Results: Several examples illustrate the range of applications and the advantages of the toolkit, for instance internal and external dataset comparison, translation analysis across species and systems (in vitro / in vivo), and biological interpretation in an FDR-threshold-free and platform-independent manner.

      Conclusion: The contrast and gene set toolkit-integrated database platform provides a unique and flexible environment for downstream computational analysis that supports biological interpretation of data. The system was designed to give researchers a simple, efficient, and extensible solution for storing and exploiting analyzed data in a sustainable manner.
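
      The abstract does not specify the exact mapping and collapsing rules or the significance thresholds used to derive the three gene sets. The Python sketch below illustrates one plausible way to implement the ID conversion and gene set extraction steps, assuming that (a) probes mapping to no gene or to several genes are dropped, (b) when several probes map to one gene, the row with the smallest p-value is kept, and (c) a gene is "significant" when its FDR is at most 0.05 and its absolute log2 fold change reaches an optional cutoff. It is not the toolkit's implementation, which is built within Galaxy and accessed via an R library; `ContrastRow`, `map_and_collapse`, `extract_gene_sets`, the thresholds, and all identifiers in the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Mapping, Set


@dataclass(frozen=True)
class ContrastRow:
    """One row of a stored contrast (hypothetical layout)."""
    probe_id: str   # platform-specific identifier, e.g. a microarray probe set
    log_fc: float   # log2 fold change for the contrast
    p_value: float  # unadjusted p-value
    fdr: float      # multiple-testing-adjusted p-value (FDR)


def map_and_collapse(
    rows: Iterable[ContrastRow],
    probe_to_gene: Mapping[str, Set[str]],
) -> Dict[str, ContrastRow]:
    """Map probe IDs to gene IDs and keep one row per gene.

    Assumed rules: probes that map to no gene or to several genes are
    dropped; if several probes map to the same gene, the row with the
    smallest p-value is kept.
    """
    collapsed: Dict[str, ContrastRow] = {}
    for row in rows:
        genes = probe_to_gene.get(row.probe_id, set())
        if len(genes) != 1:
            continue  # unmapped or ambiguous probe
        (gene,) = genes
        current = collapsed.get(gene)
        if current is None or row.p_value < current.p_value:
            collapsed[gene] = row
    return collapsed


def extract_gene_sets(
    collapsed: Mapping[str, ContrastRow],
    fdr_max: float = 0.05,
    min_abs_log_fc: float = 0.0,
) -> Dict[str, Set[str]]:
    """Derive the up-regulated, down-regulated and combined gene sets."""

    def significant(row: ContrastRow) -> bool:
        # Assumed criterion: FDR cutoff plus an optional fold-change cutoff.
        return row.fdr <= fdr_max and abs(row.log_fc) >= min_abs_log_fc

    up = {g for g, r in collapsed.items() if significant(r) and r.log_fc > 0}
    down = {g for g, r in collapsed.items() if significant(r) and r.log_fc < 0}
    return {"up": up, "down": down, "both": up | down}


if __name__ == "__main__":
    # Toy contrast with hypothetical probe and gene identifiers.
    rows = [
        ContrastRow("p1", 2.1, 0.0001, 0.001),
        ContrastRow("p2", 0.4, 0.02, 0.08),
        ContrastRow("p3", -1.5, 0.0005, 0.004),
        ContrastRow("p4", 1.8, 0.003, 0.02),
        ContrastRow("p5", -0.9, 0.01, 0.04),
    ]
    probe_to_gene = {
        "p1": {"GENE_A"},
        "p2": {"GENE_A"},            # second probe for GENE_A
        "p3": {"GENE_B"},
        "p4": {"GENE_D", "GENE_E"},  # ambiguous probe, dropped
        "p5": {"GENE_C"},
    }
    sets = extract_gene_sets(map_and_collapse(rows, probe_to_gene))
    for name in ("up", "down", "both"):
        print(name, sorted(sets[name]))
```

      With the default thresholds, the toy data yields GENE_A as up-regulated (probe p1 is kept over p2 because of its smaller p-value) and GENE_B and GENE_C as down-regulated; raising `min_abs_log_fc` to 1.0 removes GENE_C. Keeping the collapsing step separate from the thresholding step means a stored, gene-level contrast can be re-thresholded later without repeating the ID conversion.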