Measuring biological impact

      Measuring biological impact: Phenotypic screening and omics infrastructure

      Our work in systems toxicology requires a considerable, multi-dimensional technological infrastructure. We have built an integrated high-performance scientific computing environment that extends across all our R&D sites worldwide, allowing us to generate, process and store substantial volumes of data and operate at the cutting edge of computational science and bioinformatics.

      Systems toxicology relies heavily on the integration of different data modalities to measure changes at different biological levels. As such, we have established substantial capabilities in both genomics and proteomics to ensure our systems-level investigations cover each of DNA, RNA and protein – the three inter-dependent molecules fundamental to biological functions.



      High Content Screening

      High Content Screening (HCS) is an automated technology used to carry out functional/biological assays in a rapid, robust and cost-effective manner. The technology is based on the integration of three elements:

      1. A state-of-the-art fluorescence microscope that allows the visualisation of cells and cellular structures.
      2. A high-throughput image acquisition system that allows the automatic acquisition of hundreds of microscope images in a rapid and reproducible manner.
      3. A software tool that enables the analysis and quantification of the biological signals in each one of the acquired images.

      Using a combination of antibodies (capable of identifying specific components in the cell) and fluorescent dyes, the HCS technology can detect changes in target intensity, localisation and cellular morphology in live or fixed cells.
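
      As a minimal illustration of the third element (image quantification), the sketch below computes the mean fluorescence intensity of each segmented cell. It is not the analysis software used in the laboratory: the function name, the tiny 4x4 image and the label mask are hypothetical, and a real pipeline would first have to segment the cells.

```python
from collections import defaultdict


def mean_intensity_per_cell(image, labels):
    """Return {cell_id: mean pixel intensity}; label 0 is treated as background."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for image_row, label_row in zip(image, labels):
        for pixel, cell_id in zip(image_row, label_row):
            if cell_id != 0:
                sums[cell_id] += pixel
                counts[cell_id] += 1
    return {cell_id: sums[cell_id] / counts[cell_id] for cell_id in sums}


# Hypothetical fluorescence image (arbitrary units) and matching segmentation mask
image = [
    [10, 12, 0, 0],
    [14, 16, 0, 0],
    [0, 0, 40, 44],
    [0, 0, 36, 40],
]
labels = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]

print(mean_intensity_per_cell(image, labels))
```

      Comparing such per-cell values between treated and control wells is one simple way to express a change in target intensity.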

      At PMI, we use a large battery of HCS assays to investigate how specific substances alter the phenotype of a cell in a particular manner. Currently, we have established more than 10 different assays, allowing us to measure more than 15 different cellular parameters in cell types representing the lung (normal human bronchial epithelial cells) and the cardiovascular system (human coronary artery endothelial cells).

      Watch on JOVE: High content screening analysis to evaluate the toxicological effects of harmful and potentially harmful constituents (HPHCs)


      Genomics is the combination of DNA and RNA sequencing with bioinformatics to analyze the structure and function of the complete set of nucleic acid molecules within cells. It is a crucial component of our systems toxicology programme and we have established a number of genomics capabilities, including Next Generation Sequencing, microarray and gene quantification technologies, which enable us to gain important insights into the potential of smoke-free products.

      Next Generation Sequencing

      Next Generation Sequencing, also known as high-throughput sequencing, is a flexible technology that has a number of applications. Amongst other things, it enables us to determine precise DNA sequences (DNA-seq), to identify modifications of DNA (whole-genome bisulfite sequencing, WGBS-seq) or to quantify the relative abundance of RNA molecules (RNA-seq).

      At PMI, DNA-seq is applied for both plant and mammalian genomics research. It is used for the de novo assembly of the tobacco genome, which is 1.5 times larger than the human genome[1], as well as for the identification of DNA polymorphisms induced in mammalian DNA by exposure to cigarette smoke or to aerosols from reduced-risk products (RRPs).

      The more complex WGBS-seq method aims at identifying the methylation of cytosine, a common modification of DNA involved in the regulation of gene expression. When combined with RNA-seq, it provides important insights into the response of the studied biological system to exposure to cigarette smoke or RRP aerosols.
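
      To make the idea of a methylation measurement concrete, the sketch below computes the methylation level of a single cytosine as the fraction of reads that report it as methylated. This is a common way to summarise bisulfite sequencing data, not a description of the pipeline used here; the function name and the `min_coverage` threshold are assumptions chosen for illustration.

```python
def methylation_level(methylated_reads, unmethylated_reads, min_coverage=5):
    """Fraction of reads supporting methylation at one cytosine.

    Returns None when fewer than `min_coverage` reads cover the site,
    because a ratio based on very few reads is unreliable.
    """
    coverage = methylated_reads + unmethylated_reads
    if coverage < min_coverage:
        return None
    return methylated_reads / coverage


# Hypothetical read counts for two cytosine positions
print(methylation_level(18, 2))  # well covered site
print(methylation_level(1, 1))   # too few reads to call
```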

      The measurement of gene expression by RNA-seq also contributes to the identification of perturbed biological networks.
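
      A simplified view of how relative RNA abundance can be compared between two samples is sketched below: raw read counts are scaled to counts per million (CPM) and then expressed as a log2 fold change. The gene names and counts are invented, the pseudocount is an assumption, and a real study would use replicates and a dedicated statistical package rather than this two-sample calculation.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that the sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(treated_cpm, control_cpm, pseudocount=1.0):
    """log2(treated / control) per gene; the pseudocount avoids division by zero."""
    return {
        gene: math.log2(
            (treated_cpm[gene] + pseudocount) / (control_cpm[gene] + pseudocount)
        )
        for gene in control_cpm
    }


# Hypothetical read counts for three genes
control = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
treated = {"GENE_A": 400, "GENE_B": 300, "GENE_C": 300}

lfc = log2_fold_change(counts_per_million(treated), counts_per_million(control))
for gene, value in lfc.items():
    print(f"{gene}: {value:+.2f}")
```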

      Microarray technology

      Our laboratory is equipped with three QIAcube robots which are able to extract RNA from various samples with a high level of purity, offering the possibility to perform gene expression analysis in human, rat and mouse as well as microRNA profiling using the Affymetrix GeneChip system.

      Recognized as the gold standard for microarrays, the Affymetrix GeneChip system is used to monitor gene expression for thousands of transcripts (47,000 transcripts for a human chip) in a controlled process which offers high data reliability. The system also allows the inclusion of multiple probes to interrogate the same target sequence, providing statistical rigor to our data interpretation.

      We have two further robots which allow us to prepare 600 samples per week for microarrays.

      Gene quantification

      We are also equipped with a real-time polymerase chain reaction (PCR) analysis instrument, allowing us to perform a number of tasks, including gene expression analysis and microRNA profiling. RNA is reverse transcribed and the resulting complementary DNA is used as a template in the PCR reaction to detect and quantitate gene expression products.
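
      One widely used way to turn real-time PCR readouts into relative expression values is the 2^(-ΔΔCt) calculation, sketched below. The text does not state which quantification method is applied, so this is an assumption for illustration only; it presumes roughly 100% amplification efficiency and a stable reference gene, and all cycle threshold (Ct) values are invented.

```python
def relative_expression(ct_target_treated, ct_reference_treated,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene (treated vs control) by the 2^(-ddCt) method."""
    # Normalise the target gene to the reference gene within each sample
    delta_ct_treated = ct_target_treated - ct_reference_treated
    delta_ct_control = ct_target_control - ct_reference_control
    # Compare the normalised values between treated and control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses the threshold two cycles earlier
# in the treated sample, while the reference gene is unchanged
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold_change)
```

      Because each PCR cycle ideally doubles the product, a two-cycle shift corresponds to a four-fold difference in starting material.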


      [1] Sierro, N, et al. The tobacco genome sequence and its comparison with those of tomato and potato. Nat Commun, 2014. 5: p. 3833.


      Proteomics – the systematic approach to characterising all proteins in a cell population – holds significant potential for product assessment by complementing the genome-centric view of biological networks with protein-specific data[1].

      We have optimised a number of proteomics approaches which, alongside data obtained from our genomics work, give us the ability to identify systems response profiles of individual smoke/aerosol constituents and understand the comparative biological impact of cigarette smoke and smoke-free product aerosols.

      Two-dimensional gel electrophoresis and matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) for biomarker discovery

      Two-dimensional gel electrophoresis (2D-GE) enables the discovery of biomarkers from biological samples. The 2D-GE workflow separates proteins by isoelectric point (charge) and size, and can separate and visualise up to 2,000 proteins in one gel. In the first dimension, the proteins are separated through a process known as isoelectric focusing (IEF), in which each protein migrates to the pH value at which it carries no net charge. In the second dimension, the proteins are further separated based on mass. To allow for the visualisation of protein spots, the 2D-GE gels are stained with Sypro Ruby, Coomassie Blue or silver stain. Using state-of-the-art image acquisition and analysis software, such as TotalLab’s SameSpots, we are then able to simultaneously compare control and treated samples to detect differentially expressed proteins. Peptides are extracted from the differentially expressed protein spots and then analysed by MALDI-MS for protein identification.

      Gel-free liquid chromatography mass spectrometry approaches

      Mass spectrometry (MS) is widely considered to be the central technology platform for toxicoproteomics. It has brought many advantages including unsurpassed sensitivity, improved speed and the ability to produce high-throughput datasets. Prior to analysis by MS, the most commonly used method for protein separation is liquid chromatography (LC). The LC approach takes advantage of differences in the physicochemical properties of proteins and peptides, i.e., size, charge and hydrophobicity[2].

      Both label-based and label-free approaches are used for differential protein quantification of our samples. In the label-free approach, proteins or peptides of each sample are separated by LC and subsequently analysed by MS. The main advantages of this approach are:

      • Comparison of multiple samples is possible (no restriction in sample number)
      • It covers a broad dynamic range of concentrations  
      • No further sample treatment is required  

      In the label-based approach, samples are modified prior to analysis. Two of the most common label-based techniques we use are isobaric tags for relative and absolute quantitation (iTRAQ) and tandem mass tags (TMT). These approaches allow for:

      • Simultaneous comparison of large numbers of samples (up to eight for iTRAQ and up to ten for TMT)
      • Reduction of required MS runs (equating to a reduction of analysis time) as samples are pooled before MS analysis
      • Lower probability of introducing experimental errors during analysis, because the samples are pooled and analysed together
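
      The principle behind isobaric labelling can be sketched as follows: because the pooled samples are measured in the same MS run, each peptide yields one reporter intensity per sample, and dividing by a reference channel gives relative abundances. The code below is an illustrative assumption rather than the processing used in the laboratory; the channel names, intensities and the choice of the median as the protein-level summary are hypothetical.

```python
from statistics import median


def peptide_ratios(reporter_intensities, reference_channel):
    """Express every channel of one peptide relative to the reference channel."""
    reference = reporter_intensities[reference_channel]
    if reference <= 0:
        raise ValueError("reference channel has no signal")
    return {channel: value / reference
            for channel, value in reporter_intensities.items()}


def protein_ratios(peptides, reference_channel):
    """Summarise a protein as the median of its peptide-level ratios."""
    per_peptide = [peptide_ratios(p, reference_channel) for p in peptides]
    channels = per_peptide[0].keys()
    return {channel: median(r[channel] for r in per_peptide)
            for channel in channels}


# Hypothetical reporter intensities for three peptides of one protein
peptides = [
    {"control": 1000.0, "treated": 2000.0},
    {"control": 500.0, "treated": 1500.0},
    {"control": 800.0, "treated": 1600.0},
]

print(protein_ratios(peptides, reference_channel="control"))
```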

      Targeted mass spectrometry approaches

      Because systems biology requires accurate quantification of a specified set of proteins across multiple samples, targeted approaches have been developed for biomarker quantification. Parallel reaction monitoring (PRM) and selected reaction monitoring (SRM) were developed to reliably deliver precise quantitative data for defined sets of proteins, across multiple samples using the unique properties of MS.

      The major advantages of the PRM technique that we use for our targeted analysis are[3]:

      • Multiplexing of tens to hundreds of proteins that can be monitored during the same run
      • Absolute and relative quantification is possible
      • The method is highly reproducible
      • The method yields absolute molecular specificity
      • The generated data can be easily interpreted
      • The analysis can be automated
      • It has a high dynamic range
      • Quantitative information can be determined from datasets of complex samples resulting in extraction of high-quality data
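
      Absolute quantification in targeted MS is commonly achieved by spiking a known amount of a stable-isotope-labelled ("heavy") version of the target peptide into the sample and comparing its signal with that of the endogenous ("light") peptide. The text does not describe how absolute values are obtained here, so the sketch below is an assumption for illustration; the function name, peak areas and spiked amount are hypothetical.

```python
def absolute_amount(light_areas, heavy_areas, spiked_amount_fmol):
    """Estimate the endogenous peptide amount from light/heavy peak areas.

    The areas of all monitored fragment ions are summed for each form of
    the peptide, and the light/heavy ratio is scaled by the known amount
    of heavy standard added to the sample.
    """
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("heavy standard was not detected")
    return sum(light_areas) / heavy_total * spiked_amount_fmol


# Hypothetical fragment ion peak areas for one peptide
light = [1.0e6, 1.5e6, 0.5e6]
heavy = [0.5e6, 0.75e6, 0.25e6]

print(absolute_amount(light, heavy, spiked_amount_fmol=50.0))
```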

      Antibody-based approaches

      Antibody-based approaches to protein analysis include reverse-phase protein array (RPA), enzyme-linked immunosorbent assay (ELISA) and Luminex technology. RPA is an example of a key measurement platform for systems biology-based risk assessment. The method has the sensitivity to detect post-translational modifications such as phosphorylation. Using this technology it has been shown, for example, that cancer progression is associated with increased phosphorylation of Akt, suppression of apoptosis pathways and decreased phosphorylation of ERK[5]. Luminex is used for routine measurement of cytokine release both in vitro and in vivo.

      High-performance computing

      We have built a high-performance scientific computing (HPSC) environment, which provides a scalable computing infrastructure to capture, store, manage and analyze the large volumes of data generated by our data production pipelines in genomics, transcriptomics, proteomics and histopathology.

      We have been developing our HPSC environment over the past eight years and the capabilities we have built include:

      • The ability to process and interpret terabytes of data in a very short timeframe, giving us the ability to understand complex biological interactions underlying disease mechanisms
      • Genome sequencing
      • Text mining
      • Computational fluid dynamics
      • Molecular dynamic simulations


      The evolution of our scientific computing capability

      In 2009, we developed our HPSC environment with a 64-node cluster with a total of 500 processing cores. In the same year, we invested in the IBM Blue Gene/P Supercomputer, which comprised 1,024 nodes with 4,096 cores. The ability to analyze large volumes of genomics, transcriptomics, proteomics and metabolomics data transformed the way in which we designed experiments, and gave rise to our ability to conduct computational systems toxicology.

      In addition, our Computational Fluid Dynamics capabilities enable us to model the aerosols from smoke-free products, leading to a better understanding of necessary smoke-free product design features.

      By 2012, novel laboratory techniques had put a strain on our HPSC environment, as the amount of data generated by our systems toxicology approach increased massively. For instance, the processing of Next Generation Sequencing data requires close to 2TB of RAM for each operation. Our bioinformatics workflows often require in excess of 100GB of RAM, and calculating network perturbation amplitudes from our biological data requires approximately 10GB of RAM for each job. Furthermore, our Computational Fluid Dynamics simulations require the continuous use of significant computational capacity for months on end.

      High Performance Scientific Computing infrastructure

      In 2013, we therefore began the construction of our most recent HPSC environment, which now uses virtual servers, large memory servers, GPU servers, flash storage and more than 500TB of hard disk storage. All systems are linked through 56 Gigabit InfiniBand networks.
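
      The memory figures quoted above translate directly into how many jobs can share one server. The short calculation below illustrates this with a hypothetical 3TB large-memory node; the actual server sizes are not given in the text, 1TB is taken as 1,000GB for simplicity, and the 100GB workflow figure is a lower bound.

```python
# RAM per job in GB, taken from the figures quoted in the text
JOB_RAM_GB = {
    "ngs_processing": 2000,                # close to 2TB per operation
    "bioinformatics_workflow": 100,        # "in excess of" 100GB, so a lower bound
    "network_perturbation_amplitude": 10,  # approximately 10GB per job
}

# Hypothetical large-memory server; real node sizes are not stated
NODE_RAM_GB = 3000


def concurrent_jobs(node_ram_gb, job_ram_gb):
    """How many jobs of a given size fit in one node's memory at once."""
    return node_ram_gb // job_ram_gb


for name, ram in JOB_RAM_GB.items():
    print(f"{name}: {concurrent_jobs(NODE_RAM_GB, ram)} concurrent job(s)")
```

      The spread between one job and several hundred jobs per node illustrates why a mix of large memory servers and ordinary virtual servers is useful.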


      Computational Fluid Dynamics

      Cigarette smoke and smoke-free product aerosols are generally complex systems of solid or liquid particles suspended in multicomponent gas mixtures. After being generated in smoking products, smoke/aerosols are transported through the products into the respiratory tract of the user, where they can evolve and be deposited in, or absorbed by, the body. In order to assess the toxicity and biological impact of cigarette smoke and smoke-free product aerosols, it is important to understand where and how much of the smoke/aerosol is deposited in the respiratory system. To achieve this, we use Computational Fluid Dynamics (CFD) simulations, based on the established laws of physics integrated with smoke/aerosol transport, evolution, and deposition mechanisms.
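
      As one small example of the physics that such simulations build on, the sketch below estimates the gravitational settling velocity of a single spherical droplet in still air using Stokes' law. It is not the CFD model described here: it covers only sedimentation, neglects the slip correction and the density of air, and the default droplet density, air viscosity and diameters are assumed textbook-style values.

```python
def stokes_settling_velocity(diameter_m, particle_density=1000.0,
                             air_viscosity=1.81e-5, gravity=9.81):
    """Terminal settling velocity (m/s) of a small sphere in still air.

    Stokes' law: v = rho * d^2 * g / (18 * mu), with density in kg/m^3,
    diameter in m and dynamic viscosity in Pa.s.
    """
    return particle_density * diameter_m ** 2 * gravity / (18.0 * air_viscosity)


# Assumed droplet diameters of 1 and 2 micrometres
v_1um = stokes_settling_velocity(1e-6)
v_2um = stokes_settling_velocity(2e-6)

print(f"1 um droplet: {v_1um:.2e} m/s")
print(f"2 um droplet: {v_2um:.2e} m/s")
```

      The quadratic dependence on diameter shows why particle size strongly influences where in the respiratory tract an aerosol is deposited.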

      CFD simulations allow us to understand and characterize the deposition of smoke/aerosols in the respiratory tract of humans, in vivo models and in vitro exposure systems. They offer detailed, non-invasive information about the physics of the formation, evolution, transport and deposition of smoke/aerosols, and can be used to virtually simulate the functioning of smoking products. In addition, the amounts of deposited compounds and their compositions can be quantified, and the influence of properties such as humidity and inhalation behavior can also be investigated. CFD can also be used to study the generation of smoke/aerosols by smoking products and how physical and chemical smoke/aerosol characteristics are influenced by the design of the smoking products. Moreover, the influence of smoking topographies, environmental conditions, material selections and operating conditions can also be studied using CFD, all of which can assist in designing products which are optimized to reduce risk.