Computer-assisted structure identification (CASI)--an automated platform for high-throughput identification of small molecules by two-dimensional gas chromatography coupled to mass spectrometry.


Authored by A Knorr, A Monge, M Stueber, A Stratmann, D Arndt, E Martin, P Pospisil

Published in Analytical chemistry. 85(23): 11216-11224.
Abstract

Compound identification is widely recognized as a major bottleneck for modern metabolomic approaches and high-throughput nontargeted characterization of complex matrices. To tackle this challenge, an automated platform called computer-assisted structure identification (CASI) was designed and developed to accelerate and standardize the identification of compound structures. In the first step of the process, CASI automatically searches mass spectral libraries for matches using a NIST MS Search algorithm, which proposes structural candidates, each with an associated match factor, for experimental spectra from two-dimensional gas chromatography with time-of-flight mass spectrometry (GC × GC-TOF-MS) measurements. Next, quantitative structure-property relationship (QSPR) models implemented in CASI predict three parameters to increase confidence in correct compound identification: the Kovats Index (KI) for the first dimension (1D) separation, the relative retention time for the second dimension separation (2DrelRT), and the boiling point (BP). To reduce the impact of chromatographic variability on the second dimension retention time, a concept based on hypothetical reference points from linear regressions of a deuterated n-alkane reference system was introduced, providing a more stable relative retention time measurement. Predicted values for KI and 2DrelRT were calculated and matched with experimentally derived values. Boiling points derived from 1D separations were matched with predicted boiling points calculated from the chemical structures of the candidates. As a last step, CASI combines the NIST MS Search match factors (NIST MF) with up to three predicted parameter matches from the QSPR models to generate a combined CASI Score representing the measure of confidence for the identification.
Threshold values applied to the CASI Scores assigned to proposed structures improved the accuracy of classification into true/false positives and true/false negatives. The identification results were validated, and identification using CASI was shown to be more accurate than using NIST MS Search alone. CASI is an easily accessible, web-interfaced software platform that provides an innovative, high-throughput system for fast and accurate identification of constituents in complex matrices, such as those requiring 2D separation techniques.
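
The abstract does not state how the NIST match factor and the parameter matches are combined into the CASI Score, so the following is only an illustrative sketch of the general idea of combining a library match factor with up to three parameter matches and applying a threshold. The linear match function, the equal-weight average, the tolerance values, and all function names are assumptions made for illustration and are not taken from the paper.

```python
from typing import Optional, Sequence

# NIST MS Search match factors are reported on a 0 to 999 scale.
NIST_MF_MAX = 999.0


def parameter_match(predicted: float, experimental: float, tolerance: float) -> float:
    """Hypothetical match score: 1.0 for identical values, falling
    linearly to 0.0 once the deviation reaches `tolerance`."""
    deviation = abs(predicted - experimental)
    return max(0.0, 1.0 - deviation / tolerance)


def combined_score(nist_mf: float, matches: Sequence[Optional[float]]) -> float:
    """Hypothetical combined score: the equal-weight mean of the scaled
    match factor and whichever parameter matches (KI, 2DrelRT, BP) are
    available. Missing parameters are passed as None and skipped."""
    parts = [nist_mf / NIST_MF_MAX]
    parts.extend(m for m in matches if m is not None)
    return sum(parts) / len(parts)


def accept(score: float, threshold: float) -> bool:
    """Classify a candidate as a positive identification when its
    combined score reaches the (assumed) threshold."""
    return score >= threshold


if __name__ == "__main__":
    # Made-up values for one candidate; only the KI match is available.
    ki_match = parameter_match(predicted=1210.0, experimental=1200.0, tolerance=50.0)
    score = combined_score(nist_mf=850.0, matches=[ki_match, None, None])
    print(f"score={score:.3f}, accepted={accept(score, threshold=0.7)}")
```

Skipping unavailable parameters reflects the abstract's wording that "up to three" parameter matches contribute; any real weighting scheme or threshold would have to come from the full paper.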