GC/MS data reduction using retention time alignment and spectral subtraction

Abstract

The identification of chemical compounds in a complex mixture is a challenge. In the context of drug surveillance for the sporting world, large-scale screening of urine and blood samples is undertaken using methods such as GC/MS with low-resolution mass spectrometers that measure integer values of m/z ratio. The analysis of the GC/MS data can be automated using standard mass spectrometry software to detect peaks in the chromatograms and to search a library of mass spectra of known drugs. Because of noise and the presence of co-eluting compounds, the mass spectra are usually not exactly the same as those in the library. The match quality for a genuine match can be quite low, and the library search settings must be sufficiently sensitive so as not to miss positive samples. Therefore many false matches are reported for checking and validation by human analysts, and, since almost all the samples are negative, this process of checking is tedious, time-consuming and cost-inefficient. The usual technique to remove unwanted background is to subtract the spectrum of an adjacent scan of the same sample. Our proposed method instead subtracts the spectrum of a second similar sample. The intention is that any contributions from a substance common to the two samples will be eliminated, and that any substance that is in the first sample but not in the second will still be recorded in the subtracted dataset. Assuming a suitable second sample is available that does not contain banned substances, those that are present in the first sample can be more easily detected. The subtraction is applied to each scan of the test sample. For this to work, it is essential that retention times are precisely aligned so that a corresponding scan of the second sample can be chosen. Although many methods of alignment are described in the literature, simple linear alignment based on a correlation measure is found to be sufficient. 
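As an illustration of correlation-based alignment, the following is a minimal sketch in Python, not the authors' implementation. It assumes each sample is summarised by a total ion chromatogram (one intensity per scan) and searches only for a constant integer scan offset; a full linear alignment would also allow a stretch factor. The function name `best_shift` and the parameter `max_shift` are hypothetical.

```python
import numpy as np

def best_shift(tic_test, tic_ref, max_shift=20):
    """Return (s, r): the integer scan offset s that maximises the Pearson
    correlation r between tic_test[i] and tic_ref[i + s] on the overlap.

    Assumption: a constant offset is enough to align the two runs.
    """
    tic_test = np.asarray(tic_test, dtype=float)
    tic_ref = np.asarray(tic_ref, dtype=float)
    n = min(len(tic_test), len(tic_ref))
    best_s, best_r = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        # Range of test-scan indices i for which i + s is a valid ref index.
        lo, hi = max(0, -s), min(n, n - s)
        if hi - lo < 3:
            continue  # too little overlap to compute a meaningful correlation
        a = tic_test[lo:hi]
        b = tic_ref[lo + s:hi + s]
        if a.std() == 0 or b.std() == 0:
            continue  # correlation is undefined for a constant signal
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_s, best_r = s, r
    return best_s, best_r
```

With the offset `s` in hand, scan `i` of the test sample would be paired with scan `i + s` of the second sample.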
It is also necessary to scale the spectra being subtracted to allow for differences between the two samples in the concentration of the common compounds. Subtracting a similar dataset will reduce the number of peaks to be considered, and our hypothesis is that a library search of the resulting dataset will produce fewer false matches than the same library search applied to the original data. An experiment was carried out to test this, and the number of false matches was indeed found to be reduced. The more similar the second sample was to the first, the better the result. It was also verified that true matches of compounds of interest are still reported by the library search of the subtracted data.
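The scaled, scan-by-scan subtraction can be sketched as follows. This is an illustrative Python example under stated assumptions, not the method as implemented in the paper: each dataset is taken to be a 2-D array of shape (scans, integer m/z channels), the scale factor for each scan is chosen by least squares (the abstract does not specify how the scaling is computed), and negative residuals are clipped to zero. The name `subtract_reference` and the parameter `shift` are hypothetical.

```python
import numpy as np

def subtract_reference(test, ref, shift=0):
    """Subtract a scaled reference spectrum from each scan of the test sample.

    test, ref: arrays of shape (n_scans, n_mz) holding intensities at
    integer m/z values. Scan i of test is paired with scan i + shift of ref.
    Assumption: the per-scan scale factor k minimises ||test[i] - k * ref[j]||.
    """
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    out = test.copy()
    for i in range(test.shape[0]):
        j = i + shift
        if j < 0 or j >= ref.shape[0]:
            continue  # no corresponding reference scan; leave scan unchanged
        r = ref[j]
        denom = r @ r
        if denom == 0:
            continue  # empty reference scan; nothing to subtract
        k = (test[i] @ r) / denom  # least-squares scale factor
        # Clip at zero so that over-subtraction cannot give negative intensity.
        out[i] = np.clip(test[i] - k * r, 0.0, None)
    return out
```

A least-squares factor is biased upwards when a compound unique to the test sample shares m/z channels with the common background, so a more robust estimate (for example, a median of channel-wise ratios) may be preferable in practice.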
