12-11-96                                   (Peer Reviewed but Unpublished)

                   Upper Limits for Molecular Weights of Five Classes of Volatile
                 Organic Compounds from Low Resolution Mass Spectra

                                 Donald R. Scott

                         Scott Associates, 8505 Bobcat Drive, Round Rock, TX 78681, USA


  Abstract

          Simple rules for determining the upper limits to the molecular weights of volatile members of five chemical classes have been established from NIST/EPA/NIH reference mass spectra. The classes are nonhalobenzenes; chlorobenzenes; bromo- and bromochloro- alkanes and alkenes; mono- and dichloro- alkanes and alkenes; and tri-, tetra- and pentachloro- alkanes and alkenes. The rules are based on class constants added to the average of two experimental spectral features, Maxmass and Himax1. The class upper bounds, for 99% of the compounds studied, were ca. 30 to 60 Daltons above this average, except for bromo- and bromochloro- alkanes and alkenes, where the bound was 122 Daltons. The robustness of these bounds is discussed and a strategy to use them in compound identification is outlined.


  Introduction

     Mass spectrometry is commonly used, normally with a separation technique, to identify organic compounds in complex and other samples. In complex samples containing many compounds, the identification process can be difficult. Additional information about the unknown compound, e.g. possible molecular weight ranges, can be very useful in the spectral interpretation. In previous studies [1-4] a rule-based expert system was developed which estimates molecular weights and their upper and lower bounds from mass spectra. An effective rule for predicting the lower limit to the molecular weight for organic compounds was found, along with a less effective, but general, rule for the upper limit. These bounds can be used to limit the range of candidate compounds in library searches or to help verify the assumed identity of an unknown compound.

  In the present study new, and more effective, upper bounds for 'volatile' members of five important chemical classes were determined from reference spectra to supplement the previous more general bound for all classes. These classes are nonhalobenzenes; chlorobenzenes; bromo- and bromochloro- alkanes and alkenes; mono- and dichloro- alkanes and alkenes; and tri-, tetra- and pentachloro- alkanes and alkenes. These particular chemical classes are important in environmental, forensic and toxicological studies as well as in other areas. A strategy for using the derived upper limits to aid in identification of unknown spectra also is outlined.


  Results and discussion

     In a recent study [4] a general rule for the upper limit for all organic compounds was proposed. This was 100 Daltons above a reference mass, included 96% of all compounds studied and was derived from 900 reference spectra of compounds of molecular weights from 31 to 819 Daltons. The basis of the upper bound estimation is the use of a mass marker in the spectrum which is intense enough to be easily detected above noise levels and also correlates with the molecular weight. The average of two spectral features, Maxmass and Himax1, has been found to be effective as a reference mass. Maxmass, which could also be denoted Himax5, is the largest mass peak in the experimental spectrum with an intensity of at least 5% of the base peak and Himax1 is the largest mass peak with an intensity of at least 1%.  See the following figure of the spectrum of 1,3-butadiene and the Maxmass and Himax1 values for this compound. The compound molecular weights described here were calculated with isotopically averaged atomic weights and are not necessarily equal to the mass of the molecular ion as used by mass spectroscopists.


              [Figure: mass spectrum of 1,3-butadiene with its Maxmass and Himax1 values marked.]
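The definitions of Maxmass and Himax1 given above can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary representation of a spectrum (m/z mapped to intensity) are assumptions made here, and the peak list is invented for the example rather than taken from a reference spectrum.

```python
from typing import Dict


def highest_mass_at_threshold(spectrum: Dict[int, float], fraction: float) -> int:
    """Largest m/z whose intensity is at least `fraction` of the base peak."""
    base_intensity = max(spectrum.values())
    return max(mz for mz, intensity in spectrum.items()
               if intensity >= fraction * base_intensity)


def reference_average(spectrum: Dict[int, float]) -> float:
    """Average of Maxmass (5% threshold) and Himax1 (1% threshold)."""
    maxmass = highest_mass_at_threshold(spectrum, 0.05)
    himax1 = highest_mass_at_threshold(spectrum, 0.01)
    return 0.5 * (maxmass + himax1)


# Hypothetical peak list (m/z -> intensity); NOT a measured reference spectrum.
toy_spectrum = {27: 40.0, 39: 100.0, 53: 60.0, 54: 80.0, 55: 3.0, 56: 0.5}

print(highest_mass_at_threshold(toy_spectrum, 0.05))  # Maxmass: 54
print(highest_mass_at_threshold(toy_spectrum, 0.01))  # Himax1: 55
print(reference_average(toy_spectrum))                # 54.5
```

In this toy list the peak at m/z 55 reaches 1% but not 5% of the base peak, so Himax1 lies one mass unit above Maxmass; the peak at m/z 56 is below both thresholds and is ignored.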

 


To determine more efficient, i.e. lower, upper bounds for the volatile members of the five classes, the differences between the molecular weight and this reference average were examined for all spectra in each class. These more efficient upper limits would be of particular interest in interpreting GC/MS spectra. The general class names describe the major members of the classes, but in two cases there are smaller subsets of related compounds included. Detailed descriptions of the five classes are given below.

     The NIST/EPA/NIH mass spectral database, PC Version 4.0, which contains 62235 spectra, was sequentially searched with appropriate elemental and molecular weight constraints to collect potential class spectra. These candidate spectra were then manually examined for relevant class structures. Plots of the cumulative number of class spectra vs. the difference between the true molecular weight and the reference average were used, together with statistical analysis of the data, to establish the upper limits.
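One plausible way to read an upper limit off such a cumulative distribution is sketched below. The exact statistical treatment used in the study is not specified in the text, so the function `coverage_bound` and its rounding convention are assumptions made for illustration, and the deviations are invented numbers.

```python
import math
from typing import Sequence


def coverage_bound(deviations: Sequence[float], coverage: float) -> float:
    """Smallest observed deviation d such that at least `coverage` of all
    deviations (molecular weight minus reference average) are <= d."""
    if not deviations:
        raise ValueError("at least one deviation is required")
    ordered = sorted(deviations)
    # Number of compounds that must fall at or below the bound; the small
    # tolerance guards against floating-point noise in the product.
    needed = max(1, math.ceil(coverage * len(ordered) - 1e-9))
    return ordered[needed - 1]


# Hypothetical deviations in Daltons for a ten-compound class.
toy_deviations = [-3, 0, 1, 2, 4, 5, 7, 9, 12, 30]

print(coverage_bound(toy_deviations, 0.9))  # 12
print(coverage_bound(toy_deviations, 1.0))  # 30, the maximum deviation
```

The single outlier at 30 Daltons illustrates why a 99% or 95% bound can be much tighter than the maximum class deviation.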

      Nonhalobenzenes

     This class had a total of 487 members and two subclasses. The main group of 330 compounds were alkyl, alkenyl and alkynyl substituted benzenes with only one phenyl group per compound. The prototype for this group was benzene. The largest substituents of all three types had ten carbons. There were a large number of cyclic alkyl and alkenyl substituents with up to eight carbons and a few bicyclic substituents with seven to nine carbons occurred. Some deuterated benzenes also were included. The molecular weight range searched was 1 to 218 Daltons and the members with the highest molecular weight (218 Daltons) were [1-(1,1-dimethylethyl)-3,3-dimethylbutyl]benzene; (1-ethyloctyl)benzene; (1-propylheptyl)benzene; (1-butylhexyl)benzene; decylbenzene; (1-methylnonyl)benzene; 1,4-bis(1-ethylpropyl)benzene; p-di-tert-pentylbenzene; and 1-phenyl-1-nonane.

     One subgroup had 42 members and consisted of cyano and alkyl- and alkenylcyano substituted benzenes as well as some isocyano benzenes with only one phenyl group per compound. The prototype for this group was benzonitrile. Some three- and four- carbon cyclic substituents were present, but no biphenyls or bi- or tri-cyclic substituents occurred. The range searched was 1 to 218 Daltons and the members with the highest molecular weight (185 Daltons) were 1-cyano-4-(5-hexenyl)benzene and 1-cyano-4-cyclohexylbenzene.

     The other subgroup had 115 members and consisted of phenyl aldehydes and ketones with only one phenyl group per compound. The prototypes for this subgroup were acetophenone and benzaldehyde. No biphenyls or bi- or tricyclic substituents were included. The 1 to 218 Daltons range was searched and the ketone with the highest molecular weight (218 Daltons) was 1-phenyl-1-nonanone. The largest aldehyde was 2-(phenylmethylene)octanal at 216 Daltons.

     In the determination of the upper limit for this class all three groups of compounds were included.  An upper bound of 56 Daltons above the reference average included 99% of all the compounds in this class. The four compounds which were not included in this definition were (1,2,2-trimethyl-3-butenyl)benzene; (2-ethenyl-1,4,4-trimethylpentyl)benzene; 7-phenyl-5-hepten-2-one; and 1-phenyl-2-octanone. 95% of the compounds would be included in an upper limit of 21 Daltons above the reference average.

     Chlorobenzenes

     This class had 125 members and each compound contained only carbon, hydrogen, deuterium and chlorine, with only one benzene ring. The prototype for this class was chlorobenzene. Other members were alkyl and alkenyl substituted chlorobenzenes or chloroalkyl and chloroalkenyl substituted benzenes. The largest alkyl and alkenyl substituents had six carbons and there was an alkyne substituent with three carbons. There were no biphenyls or exocyclic compounds included and the highest number of chlorines was three. The 1 to 197 Daltons range was searched and the compounds with the highest molecular weight (197 Daltons) were 2-chloro-1,3-bis(1-methylethyl)benzene and (chloromethyl)pentamethylbenzene.

      An upper bound of only 29 Daltons above the average included 99% of all the compounds in this class. The only compound excluded was 1-chloro-4-(4-methyl-4-pentenyl)benzene. A very low 13 Daltons above the average would include 95% of the compounds.

     Mono- and dichloro- alkanes and alkenes

     This class had 225 members and each compound contained only carbon, hydrogen, chlorine and in the two subsets, oxygen. The prototypes for the main group of 203 members were chloromethane and chloroethene. The largest alkane had eight carbons and the largest alkene and alkyne seven. Cyclic alkanes and alkenes were included up to cyclohexane and cycloheptatriene. Bicyclic and tricyclic compounds also occurred. The range searched was 1 to 155 Daltons and the members with the highest molecular weight (155 Daltons) were 1,2- and 1,6-dichlorohexane.

     There were two oxygen-containing subsets. One had eight chloro and alkylchloro substituted ethylene oxides, e.g. (chloromethyl)oxirane. All members had only one chlorine but alkyl ring substituents included cyclobutyl and cyclopropyl groups as well as up to four carbon acyclic alkyl and alkenyl groups. The molecular weight range searched was 1 to 155 Daltons and the members with the highest molecular weight (149 Daltons) were 2-butyl-2-(chloromethyl)oxirane, 2-(chloromethyl)-2-(2-methylpropyl)oxirane and 2-(chloromethyl)-2-(1,1-dimethylethyl)oxirane.

     The other oxygen containing subset had 14 acyclic chloroalkyl or chloroalkenyl ethers with one or two chlorines, e.g. (2-chloroethoxy)ethene. Six was the largest number of carbons included in a compound. The molecular weight range searched was 1 to 155 Daltons and the members with the highest molecular weight (143 Daltons) were 1,2-dichloro-1-ethoxyethane and s-dichloroethylether.

     In the determination of the upper limit for this class all three groups of compounds were included.  An upper limit of 47 Daltons above the reference average included 99% of all the compounds. The two compounds which were not included in this definition were 1,6-dichlorohexane and 1,3-dichlorocyclopentane. 95% of the compounds would be included in an upper bound of 36 Daltons above the average.

    Tri-, tetra- and pentachloro- alkanes and alkenes

     This class had only 57 members and each was an acyclic or, less commonly, cyclic alkane or alkene with three to five chlorines. The prototypes for this class were chloroform and trichloroethene. No member contained elements other than carbon, hydrogen, and chlorine. The largest alkane had seven carbons and the largest alkene had nine. The molecular weight range searched was 1 to 251 Daltons and the member with the highest molecular weight (238 Daltons) was 1,1,1,7-tetrachloroheptane.

      An upper limit of 53 Daltons above the reference average included 99% of all the compounds. The only compound which was excluded was 1,1,1,7-tetrachloroheptane. The 95% upper limit was 52 Daltons above the reference mass.

     Bromo- and bromochloro- alkanes and alkenes

     This class had 212 members with two subclasses, bromo- alkanes and alkenes with 174 members and bromochloro- alkanes and alkenes with 38. Each compound contained only carbon, hydrogen, and chlorine and/or bromine. The prototypes for this class were bromomethane and bromochloroethene. The largest alkane had 15 carbons, the largest alkene 14 and the largest alkyne 12. Both acyclic and cyclic compounds occurred including bi- and tri-cyclic members. The molecular weight range searched was 1 to 300 Daltons and the member with the highest molecular weight (300 Daltons) was 4,5-dibromodecane.

     Both groups of compounds were included in the upper limit determination.  A high upper limit of 122 Daltons above the reference average included 99% of all the compounds. The only compound which was excluded by this definition was 1-bromopentadecane. The 95% limit was 90 Daltons above the reference average.


  Conclusions

     The 99%, and 95%, upper limits for the five classes are summarized in Table 1 along with the number of compounds in each class, the molecular weight ranges, and the maximum class deviation from the average of Maxmass and Himax1. The upper bounds range from a low 29 Daltons, for chlorobenzenes, to a very high 122 Daltons for the bromo- and bromochloro- alkanes and alkenes. The other three classes have upper bounds of about 50 Daltons, all relative to the average of Maxmass and Himax1. All of these bounds are much lower than the previously proposed one of 100 Daltons for 96% of all compounds [4], except that for the bromo class. The upper bound for all compounds in the previous study at the present 99% level would be a very high 193 Daltons.


                Table 1. Upper Limits for Molecular Weights of Five Classes

          Class (a)                  N (b)    MW          Upper Limit (c)      Max. Deviation (d)
                                                          Daltons              Daltons
                                                          95%       99%

          Nonhalobenzenes            487      78-218      21        56         91
          Chlorobenzenes             125      113-197     13        29         54
          Mono- and dichloro-        225      50-155      36        47         50
            alkanes and alkenes
          Tri-, tetra- and           57       119-238     52        53         72
            pentachloro- alkanes
            and alkenes
          Bromo- and bromochloro-    212      95-300      90        122        133
            alkanes and alkenes

(a) See text for detailed class definitions.

(b) Number of NIST reference spectra in class.

(c) Distance in Daltons above average of Maxmass and Himax1 for specified percent of compounds.

(d) Largest deviation of a true molecular weight in the class from average of Himax1 and Maxmass.

 


    The bounds defined at the 99% level exclude only one to four compounds per class. These high-deviation compounds all have long alkyl or alkenyl groups, but there are also similar class members which do not show this behavior.

     The following simple rules for the molecular weight upper bounds of the 'volatile' components of the five classes are proposed:

           UPPER BOUND = 0.5*[MAXMASS + HIMAX1] + CLASS CONSTANT

where the constant is 56, 29, 47, 53, and 122 Daltons for the nonhalobenzenes; chlorobenzenes; mono- and dichloro- alkanes and alkenes; tri-, tetra- and pentachloro- alkanes and alkenes; and bromo- and bromochloro- alkanes and alkenes, respectively.
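The rule above is simple enough to express directly in code. The sketch below uses the 99% class constants from the text; the dictionary keys, the function name and the example peak values are assumptions introduced here for illustration.

```python
# 99% class constants in Daltons, as given in the text.
CLASS_CONSTANTS = {
    "nonhalobenzenes": 56,
    "chlorobenzenes": 29,
    "mono_dichloro_alkanes_alkenes": 47,
    "tri_tetra_pentachloro_alkanes_alkenes": 53,
    "bromo_bromochloro_alkanes_alkenes": 122,
}


def upper_bound(maxmass: float, himax1: float, chemical_class: str) -> float:
    """UPPER BOUND = 0.5 * (Maxmass + Himax1) + class constant."""
    return 0.5 * (maxmass + himax1) + CLASS_CONSTANTS[chemical_class]


# Hypothetical spectral features for an unknown assumed to be a chlorobenzene.
print(upper_bound(112, 114, "chlorobenzenes"))  # 113.0 + 29 = 142.0
```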

     The robustness of these rules for the class molecular weight upper bounds is very important. Application of these rules does not require the entire mass spectrum, only the largest mass peaks at the 1 and 5% intensity levels. Therefore low-mass truncated spectra can still be examined for molecular weight information.

     Since the limits are determined by class constants added to the reference average, only changes in the average can alter the limits. No decrease in the average can occur upon addition of extra spectral peaks to an initially uncontaminated spectrum, because both Maxmass and Himax1 are defined as the largest mass peaks in the spectrum with specified intensities. Therefore, the upper limit can never be lowered by contamination of a spectrum with impurities; it can only increase or remain unchanged. The upper limit is robust in this sense: additional large mass ions added to the spectrum cannot cause it to be exceeded. The limit may be raised, but it remains a true, although less efficient, upper limit. See the following figure for an illustration.

              [Figure: shift of the upper limit when extra peaks are added to a spectrum.]
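The robustness argument can be checked numerically with a small sketch. It assumes that the added peaks are weaker than the original base peak, so that the 1% and 5% intensity thresholds themselves are unchanged; the helper names and both peak lists are hypothetical.

```python
from typing import Dict


def highest_mass(spectrum: Dict[int, float], fraction: float) -> int:
    """Largest m/z with intensity of at least `fraction` of the base peak."""
    base = max(spectrum.values())
    return max(mz for mz, inten in spectrum.items() if inten >= fraction * base)


def reference_average(spectrum: Dict[int, float]) -> float:
    return 0.5 * (highest_mass(spectrum, 0.05) + highest_mass(spectrum, 0.01))


def add_peaks(spectrum: Dict[int, float], extra: Dict[int, float]) -> Dict[int, float]:
    """Return a new spectrum with contaminant intensities added peak by peak."""
    merged = dict(spectrum)
    for mz, inten in extra.items():
        merged[mz] = merged.get(mz, 0.0) + inten
    return merged


# Hypothetical clean spectrum and hypothetical contaminant peaks
# (all contaminant intensities are below the base peak intensity of 100).
clean = {77: 30.0, 112: 100.0, 114: 33.0}
contaminant = {149: 8.0, 207: 2.0}
contaminated = add_peaks(clean, contaminant)

print(reference_average(clean))         # 114.0
print(reference_average(contaminated))  # 178.0, higher than before
```

With a class constant of 29 Daltons the bound moves from 143 to 207 Daltons: it is less efficient after contamination, but any molecular weight that was below the original bound is still below the new one.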

     It is expected that these rules may be used as library search constraints in identification of spectra of unknown compounds. This use assumes that the class for the spectrum of the unknown compound is known, or can be reasonably assumed. In many cases this class information can be established by the analyst from sample type, previous experience, or from other analytical data. The bromo- and chloro- classes, which constitute the majority of the studied classes, can be established from characteristic isotopic clusters in the spectra.

     If the class of the unknown compound cannot be assumed, then one approach is to use the highest upper bound of four of the five classes, 56 Daltons above the Maxmass and Himax1 average, as the molecular weight constraint. The very high limit for bromo compounds would not be very effective as a constraint and can safely be ignored in most cases, as bromo compounds are rarely found. This 56 Dalton limit is still considerably better than the previous general upper limit of 100 Daltons for all compounds. If the analyst is willing to assume a higher risk of excluding candidate compounds and can assume that no bromo or tri-, tetra- or pentachloro- alkanes or alkenes are involved, then the lower 95% upper limit of 36 Daltons above the reference average could be used. Another approach is to use a pattern recognition based classifier to establish the class. At least two have been described in the literature [3,5].
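The choice of constant described in this paragraph can be written as a small decision function. This is a sketch under the assumptions stated in the text (bromo compounds ignored when the class is unknown; bromo and tri-, tetra- and pentachloro compounds excluded for the 95% fallback). The names `select_constant` and `accept_higher_risk` are hypothetical.

```python
from typing import Optional

# Class limits in Daltons above the reference average, from Table 1.
LIMITS_99 = {"nonhalobenzenes": 56, "chlorobenzenes": 29, "mono_dichloro": 47,
             "tri_tetra_pentachloro": 53, "bromo_bromochloro": 122}
LIMITS_95 = {"nonhalobenzenes": 21, "chlorobenzenes": 13, "mono_dichloro": 36,
             "tri_tetra_pentachloro": 52, "bromo_bromochloro": 90}

# Fallback when the class is unknown: highest 99% limit, bromo class ignored.
FALLBACK_99 = max(v for k, v in LIMITS_99.items() if k != "bromo_bromochloro")
# Riskier fallback: highest 95% limit among the three remaining classes.
FALLBACK_95 = max(LIMITS_95[k] for k in
                  ("nonhalobenzenes", "chlorobenzenes", "mono_dichloro"))


def select_constant(chemical_class: Optional[str] = None,
                    accept_higher_risk: bool = False) -> int:
    """Pick the constant to add to the Maxmass/Himax1 average."""
    if chemical_class is not None:
        return LIMITS_99[chemical_class]
    return FALLBACK_95 if accept_higher_risk else FALLBACK_99


print(select_constant("chlorobenzenes"))         # 29
print(select_constant())                         # 56
print(select_constant(accept_higher_risk=True))  # 36
```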

     Another possible use of these upper bounds and the previously established lower limit of 5 Daltons below the reference average [4] is in quality assurance activities, e.g. in GC/MS analyses. In this case the identity and therefore class will be known. If a class included in this study is indicated, then an appropriate upper bound can be calculated from the rules. This, together with the general lower bound, will produce a narrow range of allowed molecular weights for the unknown spectrum, i.e. ca. 61 Daltons except for bromo compounds. If one of the five classes studied is not indicated, then the broader general upper bound of 100 Daltons will have to be used. This molecular weight range can then be used to verify or reject the assumed compound identity for the spectrum in question.
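A quality assurance check of this kind might look like the following sketch. The lower offset of 5 Daltons and the general upper offset of 100 Daltons are taken from the text; the function names and the example values for Maxmass and Himax1 are hypothetical.

```python
from typing import Optional, Tuple

GENERAL_LOWER_OFFSET = 5    # Daltons below the reference average [4]
GENERAL_UPPER_OFFSET = 100  # Daltons above it, general rule for all classes [4]


def allowed_range(maxmass: float, himax1: float,
                  class_constant: Optional[float] = None) -> Tuple[float, float]:
    """Allowed molecular weight window around the Maxmass/Himax1 average."""
    reference = 0.5 * (maxmass + himax1)
    upper_offset = GENERAL_UPPER_OFFSET if class_constant is None else class_constant
    return reference - GENERAL_LOWER_OFFSET, reference + upper_offset


def identity_is_plausible(molecular_weight: float, maxmass: float, himax1: float,
                          class_constant: Optional[float] = None) -> bool:
    low, high = allowed_range(maxmass, himax1, class_constant)
    return low <= molecular_weight <= high


# Hypothetical spectrum features; class constant 56 (nonhalobenzenes, 99%).
low, high = allowed_range(114, 114, class_constant=56)
print(low, high, high - low)  # 109.0 170.0 61.0

# An assumed identity with molecular weight 200 would be rejected here.
print(identity_is_plausible(200.0, 114, 114, class_constant=56))  # False
```

The window width of 61 Daltons is the sum of the 5 Dalton lower offset and the 56 Dalton class constant, which is the figure quoted in the text for the non-bromo classes.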

 


  References

[1] D. R. Scott, Expert system for estimates of molecular weights of volatile organic compounds from low resolution mass spectra, Analytica Chimica Acta, 246 (1991) 391-403.

[2] D. R. Scott, Rapid and accurate method for estimating molecular weights of organic compounds from low resolution mass spectra, Chemometrics and Intelligent Laboratory Systems, 16 (1992) 193-202.

[3] D. R. Scott, Empirical pattern recognition/expert system for molecular weight estimation of low resolution mass spectra, Analytica Chimica Acta, 285 (1994) 209-222.

[4] D. R. Scott, Bounds for molecular weights of organic compounds from low resolution mass spectra, Chemometrics and Intelligent Laboratory Systems, 27 (1995) 129-133.

[5] W. J. Dunn III, S. L. Emery, W. G. Glenn and D. R. Scott, Preprocessing, variable selection and classification rules in the application of SIMCA pattern recognition to mass spectral data, Environ. Sci. Technol., 23, 1499-1505.