Divergence Weighted Independence Graphs for the Exploratory Analysis of Biological Expression Data


Authored by Y Xiang, M Talikka, V Belcastro, P Sperisen, M Peitsch, J Hoeng, J Whittaker*

Published in Journal of Health & Medical Informatics     
* This author is not affiliated with PMI.

Abstract

Motivation: Understanding biological processes requires tools for the exploratory analysis of multivariate data generated from in vitro and in vivo experiments. Part of such analyses is to visualise the interrelationships between the observed variables.

Results: We build on recent work using partial correlation, graphical Gaussian models, and stability selection to add divergence weighted independence graphs (DWIGs) to this toolbox. We measure all quantities in information units (bits and millibits), which gives a common quantification of the strength of associations between variables and of the information explained by a fitted graphical model. The marginal mutual information (MI) and conditional MI between variables directly account for components of the information explained. The conditional MIs are displayed as edge weights in the independence graph of the variables, so that even the complete graph is informative about the unique association between each pair of variables. The summary table of the information decomposition ‘total = explained + residual’ provides a simple comparison of the graphical models suggested by different search routines, including stabilised versions. We demonstrate the relevance of the conditional MI statistics to the graphical model of the data by analysing simulated data from the insulin pathway with a known ground truth. Here, thresholding these statistics to suggest a network performs at least as well as several other network search algorithms. In searching a biological data set for novel insight, we contrast the DWIGs from the fitted maximum weight spanning tree and from the fitted model of a stabilised ARACNE network. The DWIG is a powerful tool for displaying properties of the fitted model or of the empirical data directly.
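
As an illustration of the edge weights described in the abstract, the following minimal Python sketch computes marginal and conditional MI in millibits under the assumption that the variables are jointly Gaussian. In that case the MI between two variables is -0.5 log2(1 - r^2) bits, where r is the correlation (marginal MI) or the partial correlation given all other variables (conditional MI). The function names, the simulated chain X -> Y -> Z, and the 50 millibit threshold are hypothetical choices made for illustration only; this is not the authors' implementation.

```python
import numpy as np


def _mi_millibits_from_corr(r):
    """Gaussian MI in millibits for a matrix of (partial) correlations r."""
    r2 = np.clip(r ** 2, 0.0, 1.0 - 1e-12)  # guard against log2(0)
    return 1000.0 * (-0.5 * np.log2(1.0 - r2))


def marginal_mi_millibits(data):
    """Pairwise marginal MI; data has shape (n_samples, n_variables)."""
    corr = np.corrcoef(data, rowvar=False)
    np.fill_diagonal(corr, 0.0)  # self-information is not an edge weight
    return _mi_millibits_from_corr(corr)


def conditional_mi_millibits(data):
    """Pairwise MI conditional on all remaining variables."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    # Partial correlation from the precision matrix K: -K_ij / sqrt(K_ii K_jj)
    partial_corr = -precision / np.outer(d, d)
    np.fill_diagonal(partial_corr, 0.0)
    return _mi_millibits_from_corr(partial_corr)


def threshold_edges(weights, threshold):
    """Suggest a network: keep pairs (i, j), i < j, whose weight exceeds threshold."""
    n = weights.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    return [(int(i), int(j)) for i, j in zip(rows, cols) if weights[i, j] > threshold]


if __name__ == "__main__":
    # Hypothetical simulated chain X -> Y -> Z: X and Z are marginally
    # associated but conditionally independent given Y.
    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)
    y = x + rng.normal(size=5000)
    z = y + rng.normal(size=5000)
    data = np.column_stack([x, y, z])

    print("marginal MI (millibits):")
    print(np.round(marginal_mi_millibits(data), 1))
    print("conditional MI (millibits):")
    print(np.round(conditional_mi_millibits(data), 1))
    print("edges above 50 millibits:",
          threshold_edges(conditional_mi_millibits(data), 50.0))
```

In this sketch the X-Z pair carries substantial marginal MI but a conditional MI close to zero, which is the sense in which the conditional MI edge weight reflects only the unique association between two variables.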