Prediction models of retention indices: application to gas chromatography coupled to high-resolution mass spectrometry for two column types: DB-624 and HP-5ms


Authored by A G Haiduc, E Dossin, P Diana, P Guy, NV Ivanov, M Peitsch

Presented at Metabolomics 2019    
Volatile and semi-volatile compounds were monitored by gas chromatography (GC) coupled to high-resolution electron ionization mass spectrometry, using both headspace and liquid injection modes on DB-624 and HP-5ms columns. A total of 1’300 reference compounds (n = 400 analyzed on HP-5ms and n = 900 on DB-624), including the n-alkanes (C5 to C30) used as retention index markers, were analyzed, and their experimental linear retention indices (LRI) were determined. The reference compounds were randomly split into training and validation sets.
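
As an illustration of how an experimental LRI can be derived from an n-alkane ladder, the sketch below applies the standard linear (van den Dool and Kratz) interpolation between the two alkanes bracketing the analyte. The function name, the dictionary layout and the retention times are hypothetical and not taken from this work; it assumes a temperature-programmed run in which alkane retention times increase with carbon number.

```python
from bisect import bisect_right


def linear_retention_index(t_x, alkane_times):
    """Linear retention index of an analyte eluting at time t_x.

    alkane_times maps carbon number -> retention time (same unit as t_x).
    Assumption: retention times increase strictly with carbon number.
    """
    carbons = sorted(alkane_times)
    times = [alkane_times[c] for c in carbons]
    if not times[0] <= t_x <= times[-1]:
        raise ValueError("retention time lies outside the n-alkane ladder")
    # Index of the alkane eluting just before (or together with) the analyte;
    # clamp so that an analyte co-eluting with the last alkane is still bracketed.
    i = min(bisect_right(times, t_x) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    # Linear interpolation between the two bracketing alkanes, scaled by 100.
    return 100.0 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))


# Hypothetical ladder (minutes), for illustration only.
ladder = {5: 2.0, 6: 4.0, 7: 7.0}
lri = linear_retention_index(5.5, ladder)  # halfway between C6 and C7
```

An analyte eluting outside the ladder raises an error rather than being extrapolated, which is one possible design choice; other workflows may prefer extrapolation.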

LRI values for all 1’300 reference compounds were predicted with computational Quantitative Structure-Property Relationship (QSPR) models built on calculated 2D descriptors. Several modeling approaches were compared: partial least squares (PLS), Lasso regression, stepwise multiple linear regression (MLR), genetic evolution algorithm predictor selection, and neural networks. PLS gave the fastest calculation and the most accurate predictions. Correlation coefficients between experimental and predicted LRI values were 0.96 for DB-624 and 0.98 for HP-5ms on the training sets, and 0.94 and 0.95, respectively, on the validation sets.
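
The evaluation described above (random split, then correlation between experimental and predicted LRI) can be sketched as follows. This is a minimal illustration under stated assumptions: the split fraction, the seed, the function names and the use of Pearson's r are not specified in the abstract and are chosen here only for the example, and the QSPR model itself is not reproduced.

```python
import math
import random


def split_train_validation(items, validation_fraction=0.2, seed=0):
    """Randomly split items into (training, validation) lists.

    Assumption: the 20 % validation fraction is illustrative only.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical experimental and predicted LRI values, for illustration only.
experimental = [612.0, 745.0, 803.0, 958.0, 1104.0]
predicted = [620.0, 731.0, 815.0, 949.0, 1110.0]
r = pearson_r(experimental, predicted)
```

Computing the coefficient separately on the training and validation subsets, as reported in the text, shows how much of the fit carries over to compounds the model has not seen.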

These models were then used to predict LRI values for several thousand reported metabolites. The predicted LRI values can support column type selection and, by means of the Mahalanobis distance, increase confidence in the identification of unknowns.
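
The abstract does not state which variables enter the Mahalanobis distance. As one possible reading, the sketch below assumes a two-dimensional residual vector (experimental minus predicted LRI on DB-624 and on HP-5ms) and a covariance estimated from residuals of known compounds. All names and numbers are hypothetical.

```python
import math


def mean_and_covariance_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2D point from a distribution (mean, cov)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is not positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Quadratic form d^T S^-1 d, using the closed-form inverse of a 2x2 matrix.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


# Hypothetical LRI residuals (DB-624, HP-5ms) of known compounds.
residuals = [(12.0, 8.0), (-15.0, -9.0), (4.0, 6.0), (-7.0, -2.0), (9.0, -5.0)]
mean, cov = mean_and_covariance_2d(residuals)

# Residual of a candidate structure for an unknown peak (hypothetical).
distance = mahalanobis_2d((60.0, -40.0), mean, cov)
```

Under this reading, a candidate whose residual lies at a small distance from the residual distribution is more consistent with the model than one lying far away; any acceptance threshold would have to be chosen separately.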