ABSTRACTS
ON MASS SPECTRAL EXPERT SYSTEMS, MASS OCCURRENCE PROBABILITIES AND MASS SPECTRAL
IDENTIFICATION
REPRINTS FOR PAPERS 8-10 ARE NOT AVAILABLE FROM THIS SITE
6. Large Scale Evaluation of a Pattern Recognition/Expert System for Mass Spectral Molecular Weight Estimation, Scott, Levitsky and Stein -- 1993
A previously developed fast, personal-computer-based method of estimating molecular weights of organic compounds from low-resolution mass spectra has been thoroughly evaluated. The method is a rule-based pattern recognition/expert system approach that iteratively applies empirical linear corrections to two mass spectral features to yield estimates. The technique was evaluated with 400 spectra of volatile and nonvolatile compounds of pharmaceutical interest and with 31378 high-quality NIST spectra of compounds of molecular weight 30-500. Subsets of the NIST spectra were also evaluated, including a volatile set of 23989 spectra. The overall median and average absolute deviations from the true molecular weights of the 400 spectra were 1.5 and 13 daltons, respectively. For the large NIST set and volatile subset, the overall median and average absolute deviations were 1.8-2.0 and 13-17 daltons, respectively. In both sets of spectra the best results were obtained with the nonhalobenzene and unknown classes. Because of misclassification errors, better results were obtained by bypassing the classifier. Median errors for spectra with the molecular ion present were ca. twenty times lower than for those without it. The present system can rapidly produce molecular weight estimates with median absolute errors of 2 (average 15) daltons.

7. Probabilities of Peak Occurrence and Information Content of the NBS/EPA/MSDC Mass Spectral Database, Scott -- 1990
The probabilities of peak occurrence and the binary information contents were calculated for the 43990 mass spectra in the 1987 NBS/EPA/MSDC data base. The median molecular weight of compounds in the data base was 230. Compounds composed of combinations of C, H, N, and O comprised 64% of the data base. The numbers of base peaks per mass channel are tabulated.
A subset of 30480 compounds with low molecular weights was selected as a volatile-compound data base; the median molecular weight of this group was 189. The probabilities and information contents were calculated for the whole set of spectra and for the volatile hydrocarbons, oxygenated hydrocarbons, chlorocarbons and chlorohydrocarbons, and bromohydrocarbons. The most common peak in the entire data base and in the volatile set occurred at mass 41. All peaks in both of these sets of spectra with probabilities greater than 0.50 occurred below mass 78. The probabilities over the total and volatile-compound data bases showed a general decrease with increasing mass channel, with a division into odd- and even-mass curves that converged at high masses. Mass channels with 0.90-1.0 bit information content occurred below ca. mass 100. Information contents decreased with increasing mass, and the odd- and even-mass curves were superimposed on the general trend.

8. Simple Tools for the Computer Aided Interpretation of Mass Spectra, Heuerding and Clerc -- 1993
Simple programs are described that obtain, from low-resolution mass spectra, self-consistent sets of plausible estimates for the molecular mass and for the elemental composition of the more important peaks. The algorithms used are described in detail.

9. Chemical Substructure Identification by Mass Spectral Library Searching, Stein -- 1995
A library-search procedure that identifies structural features of an unknown compound from its electron-ionization mass spectrum is described. Like other methods, this procedure first retrieves library compounds whose spectra are most similar to the spectrum of an unknown compound. It then deduces structural features of the unknown compound from the chemical structures of the retrievals. Unlike other methods, the significance of each retrieved spectrum is weighted according to its similarity to the spectrum of the unknown compound.
Also, a "peaks-in-common" screening step serves to reduce search times, and an optimized dot product function provides the match factor. If the molecular weight of the unknown compound is provided, the identification of certain substructures can be improved by including "neutral loss" peaks. Correlations between the presence of a substructure in a test compound and its presence among library retrievals were derived from the results of searching the NIST/EPA/NIH reference library with a 7891-compound test set. These correlations allow the estimation of probabilities of substructure occurrence and absence in an unknown compound from the results of a library search. This method may be viewed as an optimization of the "K-nearest neighbor" method of Isenhour and co-workers, with improvements that arise from spectrum screening, peak scaling, an optimal distance measure, a relative-distance weighting scheme, and a larger reference library.

10. Estimating Probabilities of Correct Identification from Results of Mass Spectral Library Searches, Stein -- 1994
This work presents a method for using mass spectral match factors reported by library search systems to obtain certain probabilistic indicators of correct identification. The overall probability that a retrieval is correct is formally separated into two independent terms. One of these is the probability that a retrieval is correct, assuming that the correct match is contained in the library. This can be computed from test results. The other term represents the probability that the spectrum of the unknown compound is actually in the library. While the absolute value of this term cannot be computed, a relative value based solely on search results can be derived. This value may, if desired, be used to refine an initial estimate of the overall probability. Parameters used in this calculation are based on changes in test results caused by the logical removal of the test compounds from the library.
These methods were parameterized from results of searching the NIST/EPA/NIH Mass Spectral Database with 12592 good-quality replicate spectra and a simple mass spectral comparison function. The methodology should be equally applicable to other libraries and search systems.
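
The two-term separation described in abstract 10 can be illustrated with a minimal sketch. It assumes the overall probability is the product of the two terms, which is one reading of the abstract's statement that they are independent. The function name, parameter names, and numeric values below are hypothetical and are not taken from the paper.

```python
def overall_probability(p_correct_given_in_library: float,
                        p_in_library: float) -> float:
    """Combine the two terms into an overall probability of correct retrieval.

    Assumption (not stated explicitly in the abstract): the overall
    probability is the product of
      - the probability that the retrieval is correct, given that the
        correct compound is in the library, and
      - the probability that the unknown's spectrum is in the library.
    """
    for p in (p_correct_given_in_library, p_in_library):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_correct_given_in_library * p_in_library


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    print(overall_probability(0.9, 0.8))
```

Under this assumption, a low estimate for either term caps the overall probability, which suggests why an initial estimate of library membership would matter even when the match factor is high.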