ABSTRACTS ON LIMITS TO MASS SPECTRAL MOLECULAR WEIGHTS, EXPERT SYSTEMS AND LIBRARY SEARCH ALGORITHMS

1. Upper Limits for Molecular Weights of Five Classes of Volatile Organic Compounds from Low Resolution Mass Spectra, Scott -- Unpublished, 1996

Simple rules for determining the upper limits to the molecular weights of volatile members of five chemical classes have been established from NIST/EPA/NIH reference mass spectra. The classes are nonhalobenzenes; chlorobenzenes; bromo- and bromochloroalkanes and alkenes; mono- and dichloroalkanes and alkenes; and tri-, tetra- and pentachloroalkanes and alkenes. The rules are based on class constants added to the average of two experimental spectral features, Maxmass and Himax1. The class upper bounds, for 99% of the compounds studied, were ca. 30 to 60 daltons above this average, except for bromo- and bromochloroalkanes and alkenes, for which the bound was 122 daltons above it. The robustness of these bounds is discussed and a strategy for using them in compound identification is outlined.

2. Bounds for Molecular Weights of Organic Compounds from Low Resolution Mass Spectra, Scott -- 1995

Simple empirical rules for determining the upper and lower limits to the molecular weight of organic compounds from low resolution mass spectra have been established and evaluated. The rules are based on the average of two easily determined spectral features, Maxmass and Himax1. The lower limit is this average less five daltons and the upper limit is the average plus 100 daltons. These limits have been evaluated with a set of 400 randomly chosen NIST/NIH/EPA reference spectra, with 400 pharmaceutically related spectra from a Swiss database and with 100 bromine- and chlorine-containing NIST reference spectra. In only 12 cases were the true molecular weights lower than the estimated lower limits, and these were due to contaminated spectra. The upper limit rule includes 98% of the spectra of all volatile compounds tested and 96% of all compounds tested.

3. Pattern Recognition/Expert System for Identification of Toxic Compounds from Low Resolution Mass Spectra, Scott -- 1994

An empirical rule-based pattern recognition/expert system has been developed and evaluated for classifying low resolution mass spectra of toxic and other organic compounds, estimating their molecular weights and identifying the compounds. The system was designed to accommodate low concentration spectra and provide some information for mixtures. It consists of a classifier followed by molecular weight estimators, filters and identification modules. Computed series of allowed molecular weights and selected base peaks for five classes are used in the filters to reduce misclassification and ensure correct identification. The target classes are nonhalobenzenes; chlorobenzenes; bromo- and bromochloroalkanes/alkenes; mono- and dichloroalkanes/alkenes; and tri-, tetra- and pentachloroalkanes/alkenes. The identification module for the 75 target compounds relies upon the high accuracy of the molecular weight estimators and base peak data for unique identification. The total system was extensively tested with reference spectra of 32 potential air pollutants, 99 randomly selected compounds, 37 GC/MS field spectra and 400 pharmaceutical related spectra. Even with incomplete spectra, the classification and identification performance was very good, with accuracies of 97% (test, random and pharmaceutical) and 95% (field GC/MS). The median absolute deviations from the true molecular weights of the test, random, field and pharmaceutical spectra were 1-2 Da and the average absolute deviations were 6-10 Da. The program is very fast and runs on a personal computer.

4. Empirical Pattern Recognition/Expert System for Molecular

A fast, personal-computer based method of estimating molecular weights of organic compounds from low resolution mass spectra has been redesigned and implemented with a rule-based expert system. It has a sequential design with a pattern recognition classifier followed by molecular weight estimator and filter modules for each of six classes. The classes are nonhalobenzenes; chlorobenzenes; bromo- and bromochloroalkanes/alkenes; mono- and dichloroalkanes/alkenes; tri-, tetra- and pentachloroalkanes/alkenes; and unknowns. The classifier was derived from 106 NIST/EPA/MSDC reference spectra. The filters employ computed series of allowed molecular weights and selected base peaks for each class, except unknown, to reduce misclassification. Empirical linear corrections from the training spectra are applied to two mass spectral features, MAXMASS and HIMAX1, to yield estimates and lower limits to the molecular weights. Extensive testing of the system was conducted with 32 test, 99 randomly chosen and 37 field GC/MS spectra, and results were compared to those from STIRS. The median absolute deviations from the true molecular weights of the test, random and field GC/MS spectra with the expert system were all 1 dalton (averages 5.6, 7.3 and 5.9 daltons, respectively). This approach also was evaluated with 400 spectra of volatile and nonvolatile compounds of pharmaceutical interest. The median and average absolute deviations from the true molecular weights of the 400 spectra were 2 and 10 daltons. Classification of the evaluation spectra, including many incomplete spectra, was very good, with accuracies of 97% (test, random and pharmaceutical) and 95% (field GC/MS).

5. Optimization and Testing of Mass Spectral Library Search Algorithms for Compound Identification, Stein and Scott -- 1994

REPRINTS OF THIS PAPER ARE NO LONGER AVAILABLE

Five algorithms proposed in the literature for library search identification of unknown compounds from their low resolution mass spectra were optimized and tested by matching test spectra against reference spectra in the NIST-EPA-NIH Mass Spectral Database. The algorithms were probability-based matching (PBM), dot-product, Hertz et al. similarity index, Euclidean distance, and absolute value distance. The test set consisted of 12,592 alternate spectra of about 8000 compounds represented in the database. Most algorithms were optimized by varying their mass weighting and intensity scaling factors. Rank in the list of candidate compounds was used as the criterion of accuracy. The best performing algorithm (75% accuracy for rank 1) was the dot-product function, which measures the cosine of the angle between spectra represented as vectors. Other methods in order of performance were the Euclidean distance (72%), absolute value distance (68%), PBM (65%), and Hertz et al. (64%). Intensity scaling and mass weighting were important in the optimized algorithms: a square-root intensity scale was nearly optimal, and the best mass weighting power was the square or the cube. Several more complex schemes also were tested, but had little effect on the results. A modest improvement in the performance of the dot-product algorithm was made by adding a term that gave additional weight to relative peak intensities for spectra with many peaks in common.
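
As an illustration of the dot-product search described in abstract 5, the following is a minimal Python sketch, not the authors' implementation. It assumes each spectrum is a dictionary mapping integer mass to intensity, scales intensities by a square root and weights them by the square of the mass (one of the two powers the abstract names), and scores a match as the cosine of the angle between the weighted vectors. The function names, the default parameters and the data layout are hypothetical, and the additional term for spectra with many peaks in common is not modelled.

```python
import math


def weighted(spectrum, mass_power=2.0, intensity_power=0.5):
    """Return {mass: mass**mass_power * intensity**intensity_power}.

    The defaults follow the abstract: square-root intensity scaling and
    a squared mass weighting. Both are assumptions of this sketch.
    """
    return {
        mass: (mass ** mass_power) * (intensity ** intensity_power)
        for mass, intensity in spectrum.items()
        if intensity > 0
    }


def cosine_match(unknown, reference, mass_power=2.0, intensity_power=0.5):
    """Cosine of the angle between two weighted spectra (0.0 to 1.0)."""
    u = weighted(unknown, mass_power, intensity_power)
    r = weighted(reference, mass_power, intensity_power)
    # Only masses present in both spectra contribute to the dot product.
    dot = sum(u[m] * r[m] for m in u.keys() & r.keys())
    norm = math.sqrt(
        sum(v * v for v in u.values()) * sum(v * v for v in r.values())
    )
    return dot / norm if norm else 0.0


def rank_library(unknown, library):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = [(name, cosine_match(unknown, spec)) for name, spec in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With a library given as a dictionary of named reference spectra, `rank_library(unknown, library)[0]` is the rank 1 candidate, which is the position the abstract uses as its accuracy criterion.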
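
The general bounds rule stated in abstract 2 reduces to simple arithmetic, which the Python sketch below makes concrete. It is an illustration only: the abstracts do not define how Maxmass and Himax1 are extracted from a spectrum, so both are taken here as numbers supplied by the caller, and the function name is hypothetical. The class-specific constants of abstract 1 are not reproduced because the abstract gives them only approximately.

```python
def molecular_weight_bounds(maxmass, himax1, lower_offset=5.0, upper_offset=100.0):
    """Return (lower, upper) molecular weight limits in daltons.

    Follows the rule in abstract 2: the lower limit is the average of
    the two spectral features less 5 daltons, and the upper limit is
    the average plus 100 daltons. How maxmass and himax1 are measured
    is outside the scope of this sketch.
    """
    average = (maxmass + himax1) / 2.0
    return average - lower_offset, average + upper_offset


def within_bounds(candidate_weight, maxmass, himax1):
    """True if a candidate molecular weight falls inside the limits."""
    lower, upper = molecular_weight_bounds(maxmass, himax1)
    return lower <= candidate_weight <= upper
```

A check such as `within_bounds` could serve as a filter on library search candidates, discarding those whose molecular weight lies outside the estimated range.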