Environmental Project no. 636, 2001

Report on the Advisory list for selfclassification of dangerous substances

Summary

This report describes the Danish Environmental Protection Agency's (EPA) Advisory list for selfclassification of dangerous substances. The substances have been identified by means of computer models, so-called QSAR models (Quantitative Structure-Activity Relationship). The list is intended as an aid to producers / importers in their selfclassification. Part I of this report describes the background of the list, its contents, and its application. Part II comprises a technical description of the QSAR models used, the creation of the list, and its relationship to the criteria for classification of selected dangerous properties. The list can be found on the Danish EPA's homepage (www.mst.dk) under the heading "chemicals".

With the aid of QSAR models, the Danish EPA has examined approximately 47,000 chemical substances, identifying 20,624 substances which are deemed to require classification for one or more of the following dangerous properties: acute oral toxicity, sensitization by skin contact, mutagenicity, carcinogenicity, and danger to the aquatic environment.

According to the classification criteria, classification should be carried out on the basis of the knowledge available, which most often comes from the results of laboratory tests on animals. However, in the experience of the Danish EPA, manufacturers / importers find it difficult to comply with their duty to assess whether a substance they wish to introduce to the market should be classified, because of the lack of available data. Very little information is in fact available on the dangerous properties of chemical substances. The Danish EPA estimates that for approximately 90 per cent of all substances, few or no results from animal testing etc. are available on any dangerous properties to humans or the environment.
In addition to results from animal testing, the criteria for classification also provide opportunities for using alternative methods. These could, for instance, be studies which do not require the use of laboratory animals, but are based on comparisons with other similar chemicals by so-called structure-activity relationships. QSAR modelling is such an alternative method to assess the potential danger of chemical substances. For several years now, the Danish EPA has worked to develop and apply QSAR models in order to predict the properties of chemical substances. The models used here are now reliable enough to predict whether a given substance has one or more of the selected properties with an accuracy of approximately 70-85 per cent.

In spite of the general lack of data, reliable information on the dangerous properties of substances from suitable animal testing, etc. might be available for some substances found in the Advisory list for selfclassification of dangerous substances. To the extent that this is the case, such information should be employed for selfclassification in preference to the recommendations of this list. It should be emphasized that the list is not binding. The responsibility for carrying out correct classification still rests with the manufacturer / importer. The Danish EPA calls upon importers / manufacturers to use the Advisory list for selfclassification of dangerous substances as a tool in their assessment of the dangerous properties of chemicals in cases of insufficient or no data for the selected dangerous properties.

1. Background, contents, and use of the list

1.1 Background

When chemical substances are to be classified in terms of the danger they represent, their inherent properties are assessed on the basis of the knowledge and information available /1,57/. Such assessment is often carried out on the basis of results from animal testing.
Assessment must be carried out individually for each property, which means that extensive animal testing may be required for a single substance. Thus, complete identification of all the properties that are classified at present can entail up to 30 animal studies for just one substance.

Studies have shown that very little information is available on the danger posed to human beings and the environment by chemical substances in the European market. In 1999, the European Commission assessed the scope of available test data for substances which are available on the market in large quantities (more than 1,000 tonnes per manufacturer / importer per year in the EU). The Commission found that the minimum information on dangerous properties of substances required under EU regulations in order to carry out risk assessment of industrial chemicals was available for only 14 per cent of all the substances studied. For 21 per cent of all substances, no test data at all were available as regards their toxicity towards human beings or the environment /2/.

In 2000, the Danish EPA carried out a study to determine the extent of the data available on the danger presented by the approximately 100,000 substances in the EU Inventory of Existing Substances* /3/ in two of the world's largest sources of publicly available test data (RTECS, 2000**; AQUIRE, 1994***). This study showed that test data on selected types of effects were available for the following percentages of all Einecs substances /4/:

Table 1
The criteria on classification describe how available experimental test data (from animal testing, etc.) should be used in assessment and classification of the toxicity of substances for human beings and the environment. These criteria also describe how the danger presented by substances can be assessed by means of comparison to other, similar substances with known toxic properties (SARs, Structure-Activity Relationships). Finally, these criteria include the use of expert judgements, e.g. from practical experience of a given substance, as the basis for classification /1,57/.

In Denmark, as well as internationally in the EU and the OECD, the importance of developing alternative methods which are not based on animal testing is emphasized. Lower organisms such as algae and bacteria are already being used in tests for certain properties, and good results have now been achieved by means of alternative tests rather than tests on animals. A test method for skin irritation which does not require the use of living animals was recently added to the rules on classification /1,57/. As regards many dangerous properties, however, efforts to discover suitable test methods which do not require the use of laboratory animals have not yet succeeded.

1.2 QSAR models - an alternative method for assessment of danger

Quantitative structure-activity relationships (QSAR models) can be used for assessment of dangerous properties as an alternative to animal testing. A QSAR model relates an effect to molecular descriptors found to be tied to this effect. Using information on the relevant molecular descriptors, the models can predict the effect for substances without test data. By exploiting the ability of computers to process large quantities of information, QSAR models have been used in this project to assess a large number of substances. The principle behind structure-activity relationships is that substances with comparable structures possess similar properties.
SARs and QSARs are well-known tools for assessment of chemical substances. These tools are used by authorities in the USA and the EU, as well as by industry, to assess physico-chemical, toxicological, and eco-toxicological properties and to predict the fate of substances in the environment. The criteria for EU classification include the possibility of using expert judgements as well as conclusions based on structural analogies /1,57/. SARs and QSARs have been used for classification of effects on the aquatic environment in cases where no test data on toxicity or degradation in the aquatic environment were available. As regards classification for impacts on human health, SARs have been applied in specific cases, and this tool was recently used in a discussion of two special properties: narcotic effect and defatting properties.

1.3 The Advisory list for selfclassification of dangerous substances

The Danish EPA has carried out work on QSAR models for several years, an area that continues to develop. At present, the Danish EPA has access to reliable models which are capable of predicting whether a substance possesses one or more of the dangerous properties selected in this context. The substances on this list have been assessed for the following dangerous properties: acute oral toxicity, sensitization by skin contact, mutagenicity, carcinogenicity, and danger to the aquatic environment. According to validation results, the models available to the Danish EPA identify the substances which possess these properties with a degree of accuracy of approximately 70-85 per cent, depending on the model used.

The basis for the list was the European Inventory of Existing Substances, Einecs. For technical reasons, the QSAR models can only assess chemical substances with unambiguous chemical structure, so-called discrete substances. The Danish EPA has used validated QSAR models to carry out a systematic assessment of the approximately 47,000 discrete organic substances in Einecs.
Also, the approximately 7,000 chemical substances which have already been classified by EU authorities have not been included in the assessment*. The criteria for computer-model selection of substances for a given property have been defined to match the criteria for classification of chemical substances as closely as possible /1/. For properties where the criteria are open to interpretation, such definitions have been specified in accordance with the Danish EPA's best judgement, with a view to providing the public with an operational list. The preparation of this list is described in more detail in Part II.

The result of the computer-based assessment is this Advisory list for selfclassification of dangerous substances, which comprises 20,624 chemical substances with suggested classifications for one or more of the dangerous properties selected. By making this Advisory list for selfclassification of dangerous substances available to the public, the Danish EPA wishes to offer manufacturers / importers a tool which can be used when carrying out selfclassification of chemical substances for those dangerous properties which are included in the list. Enterprises are encouraged to include the advisory classifications provided in this list in their assessment of chemical substances where no results from animal testing or other reliable data on the relevant dangerous properties are available. The selected dangerous properties and classifications are listed in Table 2.

Table 2
Figure 1 shows how many of the 20,624 substances in this list have been included with advisory classifications for each dangerous property. Figure 1:
At the same time, the list covers only those selected dangerous properties which feature the most reliable computer-generated predictions. Therefore these substances may well possess other dangerous properties. Finally, for each of the selected dangerous properties, only the substances for which the model predictions are most reliable have been included in the list. As a result, substances that were assessed but not included in this list may well possess one or more of the dangerous properties selected. Similarly, if a substance is included in this list and does not have an advisory classification for e.g. carcinogenicity, the substance can nevertheless have this property. The reason for this could be that the carcinogenicity models applied do not have good coverage for this specific chemical substance. If a substance is not included in the list, or is on the list but without one or more advisory classifications, this can be because the models predict that the substance does not possess these dangerous properties, or because the models are not able to give a good prediction in these cases.

Finally, among the substances that a model covers, the model can sometimes erroneously estimate that substances do not have a property which they in fact do have (false negatives). For other substances, the models will attribute a specific property to a substance which does not actually possess that property (false positives). The Advisory list for selfclassification of dangerous substances can be used to identify substances that do possess dangerous properties, in the knowledge that some predictions will be false positives. If the list had contained negative predictions, some of these would also be incorrect (false negatives). In that case, the list would advise against classifying substances which in reality possess dangerous properties. This list, containing only positive predictions, cannot be used to "acquit" substances of dangerous properties.
1.4 The duty of manufacturers and importers to carry out selfclassification

Manufacturers or importers are responsible for investigating the properties of chemical substances and for classifying them in accordance with their inherent dangerous properties before marketing them. Such selfclassification must be carried out on the basis of available information on substances in accordance with the criteria of the Statutory Order on Classification /1,57/. As regards the approximately 7,000 substances for which harmonised classification has been adopted, the classification in the List of dangerous substances shall be applied /5/. For the remaining approximately 93,000 of the 100,000 substances in the EU Inventory of Existing Substances /2/, importers / manufacturers are obliged to assess whether such a substance should be classified as dangerous (selfclassification). Selfclassification must be carried out in accordance with the criteria in Appendix 1 of the Statutory Order on Classification.

The Advisory list for selfclassification of dangerous substances is intended as a tool to help manufacturers / importers fulfil their duty to carry out correct classification in those cases where no other information is available on a given substance. When preparing this list, the Danish EPA did not examine whether data on individual substances are available in the literature. The duty to map available information on substances for selfclassification lies with manufacturers / importers. Reliable test results or relevant specialist knowledge on specific substances should always be used in preference to computer predictions. That is to say, where such information is available and runs contrary to the recommendations of this list, it should be used instead of the classifications featured in this advisory list.
At the same time, it should be emphasised that this advisory list includes only some of the dangerous properties which must be considered by manufacturers / importers in their assessment of substances. Manufacturers / importers should also carry out assessment of other properties regarding flammability, explosivity, and danger to human health and the environment.

Use of the list

It is recommended that the list be used for selfclassification in the following way:
2. Technical description of the creation of the list and the QSAR models used

2.1 Introduction

In a field developing as rapidly as QSARs are today, there will always be better models, better validations and new endpoints becoming available - and consequently never a "right" time to release advisory classifications based on them. It is felt, however, that considerable information has been accumulated which can now be of help in the otherwise difficult task of assessing the toxicology of many thousands of otherwise untested chemicals. This knowledge may also help to direct future testing plans to the areas where testing is most urgently needed.

2.1.1 SAR / QSAR

The concept that similar structures will have similar properties is not new. As early as the 1890s it was discovered, for example, that the anaesthetic potency of substances to aquatic organisms was related to their oil/water solubility ratios, a relationship which led to the use of LogP (octanol/water) as a predictor of this effect. Today it is known that all chemicals will exhibit a minimum or "basal" narcotic effect, which is related to their absorption to cell membranes, and which is well predicted by their lipophilic profile.

SARs and QSARs ((Quantitative) Structure-Activity Relationships) are based on a comparison of the structure and physico-chemical properties (descriptors) with measured parameters or endpoints for a range of chemicals called a training set. The endpoint may, for instance, be another physical-chemical property, or it may be a biological effect. The descriptors may include LogP, molecular indices, quantum mechanical properties, shape, size, charge, distributions, etc. The comparison is often made with statistical tools. The goal is to determine which descriptor(s) are essentially connected with the endpoint in question, and to set up a relationship between these descriptors and the endpoint.
When the result is expressed qualitatively the relationship is a SAR, and when the result is expressed quantitatively the relationship is a QSAR. A QSAR is a relation between the quantitative descriptors for chemical substances and a more or less graduated scale of property or effect. Once a correlation between structure / properties is established, it can be used to predict the endpoints for other chemicals for which the descriptors are known or can be estimated. In general, the correlations are developed and used by means of computers.

2.1.2 The domain of the models

The domain limits the use of a QSAR to the endpoint being modelled and the group of substances for which it is valid. The domain of the QSAR is defined by the selection of the training set; the coverage of the descriptors of the training set defines the "area" of "the chemical universe" for which the model is valid.

2.1.3 Accuracy of the model predictions

In order to check a model's predictive ability, it should be validated. Validation is a trial of the model's performance for a set of substances independent of the training set, but within the domain of the model. The model predictions for these substances are compared with measured endpoints for the substances in order to establish the accuracy of the model. Ideally, all models should be assessed by seeing how well they predict the activity of chemicals which were not used to make them. This is not, however, always simple. In part, valuable information may be left out by setting aside chemicals to be used in such an evaluation, and in part it can be extremely difficult to assess how "external" chemicals relate to the model's domain; that is, whether they represent a random distribution within it and thereby give a fair picture of the performance of the model. This problem is often addressed by using one or another form of cross-validation.
Statistical evaluation is an extremely important method of determining the performance of these models, and in some cases (where little or no test data can be found which was not used to develop the model) it is the only method available. The validation techniques most commonly mentioned in this report include the "drop one" (Q²) procedure, where one substance at a time is removed and then predicted by a model made on the remainder of the training set. This is done once for every substance. While widely used, this form of cross-validation can have a tendency to over-predict goodness of fit. A more robust technique for these data sets is, for example, the "3x10% out" procedure, which consists of removing a random sample of 10% three times, and each time making a new model which is then used to predict the excluded chemicals. Instead of running this process three times, it can be run until all of the chemicals have been estimated. However, three runs will generally be sufficient to establish the correlation /50/.

For the validation of a parametric model, the result can be expressed as the sensitivity, the specificity and the concordance of the model. The sensitivity is a measure of how well the model "catches" the substances with the effect being modelled. A sensitivity of 80% means that 80% of the "true positives" in the validation set were correctly predicted as positives, and that the remaining 20% were falsely predicted as negatives (false negatives). The specificity is a measure of how well the model avoids predicting false positives. A specificity of 80% means that 80% of the "true negatives" in the validation set were correctly predicted as negatives, and the remaining 20% of the negatives were falsely predicted as positives (false positives). The concordance is an overall measure of the correctness of the predictions.
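The three measures defined above can be computed directly from a set of observed and predicted outcomes. The following is a minimal sketch in Python; the function name `validation_metrics` and the example validation set are hypothetical and are not taken from the models used for the list.

```python
def validation_metrics(observed, predicted):
    """Return sensitivity, specificity and concordance for binary outcomes.

    observed  -- list of bool, True if the substance has the effect in tests
    predicted -- list of bool, True if the model predicts the effect
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")

    pairs = list(zip(observed, predicted))
    true_pos = sum(1 for o, p in pairs if o and p)
    false_neg = sum(1 for o, p in pairs if o and not p)
    true_neg = sum(1 for o, p in pairs if not o and not p)
    false_pos = sum(1 for o, p in pairs if not o and p)

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("validation set needs both positives and negatives")

    return {
        # share of true positives that the model catches
        "sensitivity": true_pos / (true_pos + false_neg),
        # share of true negatives that the model leaves alone
        "specificity": true_neg / (true_neg + false_pos),
        # share of all substances predicted correctly
        "concordance": (true_pos + true_neg) / len(pairs),
    }


# Hypothetical validation set: 10 active and 10 inactive substances,
# with 2 false negatives and 2 false positives.
observed = [True] * 10 + [False] * 10
predicted = [True] * 8 + [False] * 2 + [False] * 8 + [True] * 2
metrics = validation_metrics(observed, predicted)
```

In this made-up example all three measures come out at 80%. In general they differ, which is why a single concordance figure can hide an imbalance between false negatives and false positives.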
A concordance of 80% means that 80% of the substances in the validation set were correctly predicted as positives or negatives, and the remaining 20% are the false predictions (false negatives and false positives).

Predictive ability will vary depending on both the method used and the endpoint in question. In general, contemporary QSAR systems can often correctly predict the activity of about 70-85% of the chemicals examined, provided that the query structures are within the domains of the models /53,54/. This also applies to the models described in this paper. Of course, a model can never be more accurate than the test data on which it was based. Therefore it is extremely important to be aware of the accuracy and reproducibility of the test data used for making a model. If a biological test gives the wrong result 17% of the time, the "perfect" model based on these tests would also be wrong 17% of the time.

In addition to assessing the predictive ability of a model, it is also necessary to consider the context in which it will be used. In some cases a large number of "false positives" or "false negatives" may be acceptable, while in others they will not be. In this exercise there was no deliberate attempt to adjust the weight of these factors in either direction. The specific "context" in which these models have been used is simply that where there are no tests or other information available, the alternative is that the substance is not assessed at all for the endpoints covered.

2.1.4 Software

Today numerous computerized systems exist for predicting a large range of effects, ranging from biodegradability to cancer. These include fragment-based* statistical systems such as TOPKAT and M-CASE, as well as three-dimensional modelling of ligand docking** such as Comparative Molecular Field Analysis (COMFA).
Mention should also be made of OASIS /46,47/, a sophisticated program package able to estimate a wide variety of effects using 3-D and quantum mechanical parameters, which is currently being used to estimate binding of chemicals to estrogen receptors /48/.

In essence, these programs do not really do anything "new." They simply group substances with similar structures and similar effects, including use of global or local parameters such as LogP and electrophilicity, in much the same way as an expert might do. However, they do this at very high speed and take account of a large number of factors simultaneously (such as critical inter-atomic distances), which can assist an expert in finding hitherto unobserved relationships. In addition, the programs TOPKAT and M-CASE described below emulate another human characteristic, and reject estimates for chemicals where there is simply not enough information to provide a sound prediction. They accomplish this by iterative statistical methods rather than by human intelligence or intuition.

M-CASE

M-CASE is a knowledge-based artificial intelligence system capable of learning directly from data. Models made in this program can predict various toxic endpoints on the basis of discrete structural fragments found to be statistically relevant to a specific biological activity, either increasing or decreasing it. The program can thus provide a "chemical" explanation of observed biological properties. It assumes that the presence of fragments previously found in a number of active compounds is indicative of potential activity. This fragment-based method is assumed to be a reasonable basis for assessing the activity of new molecules. On the basis of the presence of the fragments in a query molecule, the program will estimate a value for its potency by using "local QSARs" for the various fragments. Where found, "global QSARs" such as the relation between LogP and toxicity to aquatic organisms may also be included in the model.
The program gives a warning if there are fragments in the query molecule that are not found in the training set of the model, indicating that the query molecule is outside the domain of the model /38,43/. Estimates for substances found to be within the domain of the model and for which sound predictions could be made are referred to as AOKs ("All OK chemicals") in this paper.

TOPKAT

TOPKAT assesses the toxicity of chemicals from their molecular structure, utilizing QSTR (Quantitative Structure Toxicity Relationship) models for assessing specific adverse health effects /56/. When the program is queried by entering a code for a chemical structure, it determines the compound class of the structure for those models which have class-specific sub-models. Next, the system computes the descriptors needed for the specific toxicity model. These include, for example, electrotopological state, kappa indices, molecular weight and symmetry indices. Then the program checks whether all the fragments present in the query molecule were present with adequate frequency in the training set for the specific equation. If there are no missing fragments, the program next checks whether the query is within the optimum prediction space of the equation. If this is the case, the training set of the model is searched for the compounds most similar to the query molecule, and the concordance between the actual and predicted values for those compounds is determined /45/. If there is reasonable agreement between observed / predicted values for the four most similar substances, the estimate is accepted and referred to as an AOK in this paper.

Epiwin

This suite of programs, developed by Syracuse Research Corporation, was used to estimate three ecotoxicological parameters: biodegradation, LogP and bioconcentration. Unlike TOPKAT and M-CASE, Epiwin does not attempt to define a predictive space, and all estimates were used "as is".
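The acceptance checks described for TOPKAT can be summarised as a sequence of gates. The sketch below is illustrative only and is not TOPKAT's actual implementation or interface: the function `is_aok`, the data structures, the rectangular descriptor ranges standing in for the optimum prediction space, the Euclidean similarity measure, and the thresholds `min_count` and `tolerance` are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class TrainingCompound:
    descriptors: tuple   # descriptor values for the compound
    observed: float      # measured endpoint value
    predicted: float     # model estimate for the same compound


def is_aok(query_fragments, query_descriptors, fragment_counts,
           descriptor_ranges, training_set,
           min_count=3, n_neighbours=4, tolerance=0.5):
    """Return True if an estimate would be accepted under the assumed gates."""
    # Gate 1: every fragment of the query must occur often enough
    # in the training set (min_count is an assumed threshold).
    for fragment in query_fragments:
        if fragment_counts.get(fragment, 0) < min_count:
            return False

    # Gate 2: the query must lie inside the prediction space, simplified
    # here to a (low, high) range per descriptor.
    for value, (low, high) in zip(query_descriptors, descriptor_ranges):
        if not low <= value <= high:
            return False

    # Gate 3: observed and predicted values must agree for the most
    # similar training compounds (similarity = Euclidean distance here).
    neighbours = sorted(
        training_set,
        key=lambda c: dist(c.descriptors, query_descriptors),
    )[:n_neighbours]
    return all(abs(c.observed - c.predicted) <= tolerance for c in neighbours)


# Hypothetical training data for a two-descriptor model.
fragment_counts = {"C-OH": 12, "C=O": 7, "N-N": 1}
descriptor_ranges = [(0.0, 5.0), (50.0, 400.0)]
training_set = [
    TrainingCompound((1.0, 100.0), 2.1, 2.3),
    TrainingCompound((1.5, 120.0), 2.8, 2.6),
    TrainingCompound((2.0, 150.0), 3.0, 3.2),
    TrainingCompound((2.5, 180.0), 3.4, 3.1),
    TrainingCompound((4.0, 350.0), 1.0, 3.0),
]

accepted = is_aok({"C-OH", "C=O"}, (1.8, 130.0),
                  fragment_counts, descriptor_ranges, training_set)
rare_fragment = is_aok({"C-OH", "N-N"}, (1.8, 130.0),
                       fragment_counts, descriptor_ranges, training_set)
outside_space = is_aok({"C-OH"}, (6.0, 130.0),
                       fragment_counts, descriptor_ranges, training_set)
```

The point of the gating order is that a prediction is discarded as soon as any check fails, so an estimate that is not an AOK carries no information about whether the substance has the effect.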
Chem-X

This program has features for estimating a large number of physical-chemical properties of chemicals, making 2D- and 3D-QSARs, and storing large amounts of data and chemical structures in databases. The Danish EPA has built up a database in Chem-X which contains QSAR predictions for about 166,000 substances /55/, including almost all of the discrete organic chemicals in Einecs, a total of approximately 47,000 substances. Estimates are available for a number of endpoints covering both health and environmental concerns. The QSAR estimates for these chemicals form the background for the recommended selfclassifications. Detailed facilities for searching, displaying and manipulating chemical structures are also available in this data package. This tool was used extensively to compare test data, predictions and selected substructures while performing "expert" assessment of the QSARs. Possibilities for dissemination of this database and the detailed QSAR predictions are currently unclear due to issues of copyright.

2.2 Methodology in making the list

2.2.1 The selected dangerous properties

The following endpoints were addressed:
2.2.2 The evaluated chemical substances

The overall purpose of the project was to evaluate as many as possible of the substances in Einecs (European Inventory of Existing Commercial Chemical Substances) /2/. The list consists of 100,116 entries, covering organic and inorganic substances in both single-substance entries and mixtures. The screening was limited to "discrete organics," meaning that UVCBs (Unknown, Variable Composition and Biologicals) and other ill-defined structures or mixtures were excluded for practical reasons - if you do not know what it is, you cannot really make a model. Exceptions were made where this seemed logical (C12-C16 n-alcohols have been entered as C14 n-alcohol, hydrochloride salts have been entered as the parent compound, etc.). Inorganic substances have likewise not been evaluated. These are usually better approached by simpler methods of evaluating the availability of the respective anions and cations with well-known hazard profiles. "Organo-metallics" have also been excluded as being poor candidates for modelling. Finally, as a matter of resources, only chemicals for which 3-D structural information was available were used /7/. In so far as this was possible using a CAS number comparison, all substances already classified in Annex I of the formal EU list (List of dangerous substances) were also removed, as they should never be the subject of provisional classification. This resulted in a total of 46,707 chemicals, or about half of all Einecs chemicals, which could be subjected to screening.

2.2.3 Test data

For the vast majority of the chemicals no measured data were available. However, if measured data were available as part of the model, these were generally used in preference to the estimates. It is important to stress that no attempt was made to search the world's published or unpublished databases for toxicological information to determine whether a QSAR was even necessary for each endpoint.
This task is the responsibility of the manufacturer / importer of the individual chemicals.

2.2.4 Use of QSAR models

The technical specifications for the models and a description of the criteria for assignment of advisory classifications for each effect are given in the technical sections for the individual endpoints. It should also be stressed that the models available do not predict a "classification" - they predict biological activity that may lead to a classification. Further criteria have therefore been applied to each endpoint to try to link the biological prediction with a risk phrase. Because of the large number of chemicals involved, "rules" were used to achieve this purpose. Such rules are also imperfect, but in essence the process is no different from that imposed upon a human expert forced to use common sense to provide a provisional classification for any given substance for which the desired test data do not exist.

Only model predictions that satisfied formal criteria were used: for M-CASE the predictions had to fall within the optimum prediction space of the model, meaning that there were no unknown fragments, and that there was sufficient knowledge about the known fragments to give an unequivocal prediction. As described in the technical sections, expert inspection has been undertaken where time allowed to confirm the probable activities given by the QSARs. This has included evaluation of the QSAR estimates in comparison with known biological activities and chemical properties. No in-depth toxicological assessment of the individual chemical substances has been undertaken. Questionable QSAR predictions for each endpoint were excluded. The effort spent on expert inspection varied with the endpoint in question. In general, most time was spent on assessing the predictions for mutagenicity and carcinogenicity, and least on allergy and aquatic effects.
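The order of precedence described in sections 2.2.3 and 2.2.4 can be illustrated with acute oral toxicity, where section 2.3 uses a rat oral LD50 limit of 2,000 mg/kg for the advisory classification Xn;R22. The following Python sketch is a simplification under stated assumptions: the function name and its arguments are hypothetical, and the expert inspection step is not modelled.

```python
from typing import Optional

# Limit used in section 2.3 for the advisory classification Xn;R22.
LD50_LIMIT_MG_PER_KG = 2000.0


def advisory_classification(measured_ld50: Optional[float],
                            predicted_ld50: Optional[float],
                            prediction_is_aok: bool) -> Optional[str]:
    """Return "Xn;R22" for a positive result, otherwise None.

    None covers both a negative result and the absence of a reliable
    prediction; the list does not distinguish between the two.
    """
    if measured_ld50 is not None:
        # Measured data take precedence over any estimate.
        value = measured_ld50
    elif predicted_ld50 is not None and prediction_is_aok:
        # Only predictions that passed the formal criteria are used.
        value = predicted_ld50
    else:
        return None

    return "Xn;R22" if value <= LD50_LIMIT_MG_PER_KG else None


# Hypothetical substances (LD50 values in mg/kg).
measured_wins = advisory_classification(500.0, 5000.0, True)
reliable_estimate = advisory_classification(None, 1500.0, True)
unreliable_estimate = advisory_classification(None, 1500.0, False)
negative_estimate = advisory_classification(None, 3000.0, True)
```

The last two cases both return `None` although they mean different things, which is why a missing advisory classification cannot be read as evidence that a substance is not dangerous.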
2.2.5 The result

It is important to understand that the results as given in the Advisory list only represent POSITIVE predictions. No distinction has been made between a negative prediction for an endpoint and an unreliable prediction (a non-AOK prediction), which was simply discarded. Evaluated substances not on the list, or substances which are on the list but without advisory classifications for one or more of the selected dangerous properties, may have been predicted as not having this / these dangerous properties, or the models may not have been valid for this substance. Therefore the advisory list cannot be used to conclude that these substances do not possess dangerous properties. Depending on the endpoint in question, unreliable predictions were obtained for between 5 and 65% of the chemicals examined.

2.3 Acute oral toxicity

EU criteria for classification

The formalized criteria for classification for acute oral toxicity include a number of test options, including the fixed-dose procedure, and the interpretation of various sources of information about acute oral toxicity, but classification is often based on acute LD50 tests in the rat, for which the following classification criteria are used:

Table 3
An advisory classification of Xn;R22 is recommended in all cases where a rat oral LD50 of ≤ 2,000 mg/kg is predicted or is based on measured data. For reasons indicated below, no attempt was made to differentiate between the different levels of acute toxicity, and it is important to recognize that this classification will often be less stringent than classification based on measured data. If test results measured in the rat were readily available (that is, had been used to make the model), these took precedence over any predictions. As acute toxicity data from the mouse following a variety of different routes of administration was also available in some cases, this was used to predict rat oral LD50s using the QSARs, preferentially as follows /8,9/:
Table 4
iv: Intravenous
Biological data consisting of LD50s in mice or rats was available for just over 10% of the chemicals processed. If no biological data were available, the rat oral LD50 was estimated using the QSTR model TOPKAT (v 5.01). According to TOPKAT, the model contains about 4,000 substances, and the producer's own cross-validation for this endpoint indicates that 86-100% of estimates fall within a factor of five of test results /10/. The Danish EPA's external evaluation of this model, using 1,840 chemicals not contained in the TOPKAT data set, gave somewhat poorer results: R² = 0.31. According to this evaluation, 86% of estimates fall within a factor of ten of test results /11/. The distribution can be seen in table 5.
Table 5
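The order of preference among data sources described above for acute oral toxicity can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the argument names, and the use of `None` for "no value available" are hypothetical and not part of the Danish EPA procedure, and the route-specific preferences of Table 4 are not modelled.

```python
from typing import Optional, Tuple

# Rat oral LD50 threshold (mg/kg) quoted in the text for Xn;R22.
R22_LIMIT_MG_PER_KG = 2000.0


def select_rat_oral_ld50(
    measured_rat: Optional[float],
    from_mouse_data: Optional[float],
    topkat_estimate: Optional[float],
) -> Optional[Tuple[str, float]]:
    """Return (source, LD50 in mg/kg) from the highest-priority source.

    Assumed priority, following the text: measured rat data first, then a
    rat oral LD50 predicted from mouse data, then the TOPKAT estimate.
    Pass None for any source that is unavailable or not robust.
    """
    candidates = (
        ("measured rat oral LD50", measured_rat),
        ("QSAR prediction from mouse data", from_mouse_data),
        ("TOPKAT estimate", topkat_estimate),
    )
    for source, value in candidates:
        if value is not None:
            return source, value
    return None  # no usable value: no advisory classification can be derived


def advisory_r22(ld50_mg_per_kg: Optional[float]) -> bool:
    """True if the selected LD50 is at or below the 2,000 mg/kg threshold."""
    return ld50_mg_per_kg is not None and ld50_mg_per_kg <= R22_LIMIT_MG_PER_KG
```

For example, a substance with no rat data, a mouse-derived prediction of 1,500 mg/kg and a TOPKAT estimate of 300 mg/kg would be assessed on the mouse-derived value in this sketch.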
Where TOPKAT was able to make a robust prediction (AOK), it found 57% of all chemicals to have an acute oral LD50 in rat of ≤ 2,000 mg/kg. Of the 12,632 chemicals tested for acute toxicity in rat that are found in the Registry of Toxic Effects of Chemical Substances (RTECS 1998) /52/, 61% had an acute toxicity of ≤ 2,000 mg/kg. That these two percentages are so similar is not surprising, since RTECS data was also the chief source of biological information used to construct the TOPKAT model. A schematic diagram of the systematic evaluation is given in figure 2.
Figure 2
Approximately 10,200 compounds were estimated as having an acute LD50 in rat of 2,000 mg/kg or less*. About 700 were removed by expert judgement in an attempt to exclude amino-acid and protein-type compounds, which were considered likely to break down due to the effects of gastric acidity, and substances for which gastric absorption was expected to be poor. This resulted in 9,538 substances with an advisory classification of Xn;R22.
2.4 Sensitization by skin contact
EU criteria for classification
Classification as sensitizing by skin contact, R43 ("May cause sensitization by skin contact"), is based on animal studies, practical experience, or a combination of the two. The animal criterion is based on either an adjuvant or a non-adjuvant test. Different adjuvant tests exist, but the Magnusson-Kligman method (GPMT: Guinea Pig Maximization Test) is preferred. Response in 30% of the animals results in classification. For a non-adjuvant test (for example the Buehler test), response in 15% of the animals is regarded as positive. The human data can be results from patch testing, case studies or epidemiological studies.
Evaluation based on QSAR models
Two approaches were used to estimate contact sensitization /14,15/. The first approach uses two TOPKAT QSTR models.
The first model was used to predict "Allergy versus non-allergy", and, in cases where this was positive, the second model was used to predict "Strong versus weak/moderate allergy". The models used were primarily related to the GPMT. Only predictions of "Strong allergy" were considered as being likely to fulfill the EU criteria for R43. In a second approach, predictions were also made using M-CASE. The data set used to produce the M-CASE models differed somewhat from the TOPKAT set, in that both data from the GPMT and human data were represented. Only positive predictions with M-CASE scores of > 40 (corresponding to "very active") were selected. Table 6
It is difficult to know how representative New Chemicals are with regard to the universe of Existing Chemicals. Generally New Chemicals are more complex structures with higher molecular weights. Perhaps the most surprising aspect of this exercise was to find that, for over three thousand chemicals that should have been assessed for this endpoint, such a tiny percentage of useful test data could be found. Compounds predicted as positive by either TOPKAT or M-CASE according to the above criteria were selected, provided that they were either AOK in the former, or contained no unknown fragments or equivocal results in the latter. While requiring "positive" in both models was considered as a criterion, in the end this seemed inefficient, not so much due to lack of concordance between model predictions, but because the acceptance domains (AOK or all fragments known) of the two methods differed considerably. No attempt was made to further reduce the list by systematically applying expert judgement. A schematic diagram of the systematic evaluation is given in figure 3.
Figure 3
9,668 chemicals met the above criteria, and for these an advisory classification of R43 is suggested. This strikes many experts as a rather large number of chemicals, and while these models represent the current "state-of-the-art", it may indicate that they are over-sensitive. However, it was very difficult to obtain any reliable indication of how many Existing Chemicals would cause contact allergy if actually tested in animals or humans. Estimates of the percentage of allergens on Einecs ranged from 5-25%, with some preference being expressed for 10%, which is the proportion of Annex I substances currently classified for this effect. It is not possible, however, to estimate the influence of confounders on the distribution represented in Annex I. Positive bias may have been introduced because chemicals testing positive are over-represented.
Negative bias may have been caused by the fact that most of the chemicals have never been tested at all. The question of numbers remains open.
2.5 Mutagenicity
EU criteria for classification
The criteria for classification for mutagenicity are divided into three categories:
Classification as mutagen, category 1 (mut1;R46, may cause heritable genetic damage) is based on evidence of a causal association between human exposure to the substance and heritable genetic damage.
Classification as mutagen, category 2 (mut2;R46, may cause heritable genetic damage) is based on animal studies showing mutagenicity to germ cells, either in assays on germ cells or by demonstrating mutagenic effects in somatic cells in vivo or in vitro together with metabolic proof that the substance reaches the germ cells.
Classification as mutagen, category 3 (mut3;R40, possible risks of irreversible effects) is based either on in vivo mutagenicity tests or on cellular interactions, with in vitro tests acting as supportive evidence. For this classification, it is not necessary to demonstrate germ cell mutations.
Evaluation based on QSAR models
A number of models were applied for this endpoint. The different models predict a number of genotoxicity endpoints. Induction of micronuclei in vivo was required, as this demonstrates chromosomal damage in somatic cells in vivo. The remaining endpoints reflect in vitro genotoxicity, where positive results would not normally lead to classification for this effect. However, positive results for these endpoints provide supporting evidence for data from in vivo estimates.
Table 7
It is not suggested that positive in vitro evidence should also be necessary when classifying substances with positive in vivo test data. However, it was not felt that the QSAR model for the mouse micronucleus test alone was sufficient, and estimates from additional QSARs relevant to the endpoint were therefore used to increase the likelihood of a correct positive prediction. Chemicals for which model estimates were positive for mouse micronucleus and for structural alerts for DNA reactivity (here an exception was made in that predictions with one unknown fragment were also accepted), and which also had two positive genotoxicity endpoints, passed the criteria for the systematic evaluation. Two models for Salmonella (Ames) mutagenicity were used, a TOPKAT module and an M-CASE module. The reason was primarily that the models differed with regard to domain, and often a robust prediction was only available from one model. If robust predictions were available from both models and were in disagreement, this was taken into account on a case-by-case basis during the final evaluation. A schematic diagram of the systematic evaluation is given in figure 4.
Figure 4
2,272 Einecs chemicals met the criteria in the systematic evaluation. As none of these models identifies germ cell mutagenicity, the current QSARs do not allow discrimination between the three EU categories for mutagenic effects, and the lowest classification is therefore assigned as the advisory classification in all cases. Expert judgement was applied to confirm the robustness of the predictions for these 2,272 chemicals. This process included examination of the 2-D or 3-D chemical structure, and visual comparison with test data within structural groups. If this procedure raised any doubt, substances were removed from the list for more detailed consideration in the future. This resulted in a final selection of 1,678 substances with an advisory classification of mut3;R40.
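The systematic evaluation for mutagenicity described above amounts to a conjunction of three conditions, which can be sketched as follows. This is an illustrative sketch only: the class and field names are hypothetical, and it assumes that the "two positive genotoxicity endpoints" are counted among the remaining in vitro endpoints, separately from the micronucleus and structural-alert predictions. The subsequent expert judgement step is not modelled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GenotoxPredictions:
    """Hypothetical container for one chemical's QSAR outputs."""
    micronucleus_positive: bool        # robust positive, mouse micronucleus model
    dna_alert_positive: bool           # structural alerts for DNA reactivity
    dna_alert_unknown_fragments: int   # unknown fragments in that prediction
    other_endpoints_positive: Sequence[bool]  # robust positives, other endpoints


def passes_systematic_evaluation(p: GenotoxPredictions) -> bool:
    """Apply the three conditions described in the text."""
    # The text accepts DNA-reactivity predictions with at most one unknown fragment.
    alert_ok = p.dna_alert_positive and p.dna_alert_unknown_fragments <= 1
    # At least two further positive genotoxicity endpoints are required.
    supporting = sum(1 for positive in p.other_endpoints_positive if positive)
    return p.micronucleus_positive and alert_ok and supporting >= 2
```

A chemical with a positive micronucleus prediction, a positive DNA-reactivity alert with one unknown fragment, and two positive in vitro endpoints would pass in this sketch; the same chemical with only one positive in vitro endpoint would not.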
2.6 Carcinogenicity
EU criteria for classification
This endpoint can result in classification in three different categories:
Classification as carcinogen in category 1 (carc1;R45, Toxic; may cause cancer or carc1;R49, Toxic; may cause cancer by inhalation) is based on a strong causal relationship in humans.
Classification as carcinogen in category 2 (carc2;R45, Toxic; may cause cancer or carc2;R49, Toxic; may cause cancer by inhalation) is based on conclusive animal data from 2 species, or from 1 species with supportive evidence such as genotoxic effects in vitro or in vivo.
Classification as carcinogen in category 3 (carc3;R40, Harmful; possible risks of irreversible effects) is subdivided into two:
Evaluation based on QSAR models
While there are many non-genotoxic carcinogens acting by a wide variety of often-unknown mechanisms, it was chosen to focus here on chemicals likely to cause cancer through a genotoxic mechanism. Therefore a pre-selection criterion for genotoxicity was set up. The criterion for pre-selection for carcinogenicity was a positive estimate for structural alerts for DNA reactivity (AOK or one unknown fragment) and two positive AOK genotoxicity predictions out of five models for genotoxicity. The technical specifications for the models used to predict genotoxicity are given in the chapter "Mutagenicity". In contrast to the selection criteria for mutagenicity, a positive mouse micronucleus test was not required, as not all genotoxic carcinogens are necessarily clastogenic (causing loss, addition or rearrangement of parts of chromosomes). This gave a pre-selection of 3,362 Einecs chemicals. A total of ten cancer models were available, plus four sub-models.
Table 8
* NTP: National Toxicology Program, USA
The accuracy of these models can be difficult to determine, as there are few independent test results, not already used in the construction of the models themselves, that can be used for an independent assessment. This is particularly the case for TOPKAT's models, where the only real estimates consist of the producer's own "1 out" Q² cross-validations. For M-CASE, other statistical methods are available. In a long-running project, several cancer models were used to predict the outcome for NTP chemicals which had not yet been tested; upon completion of these tests (for 45 substances), the general conclusion was that an accuracy of around 70% was achieved for clearly carcinogenic or non-carcinogenic substances /31/. Due to the small number of chemicals in this analysis, it is difficult to know how much weight can be assigned to the conclusion. 3,362 substances met the pre-selection criteria for genotoxicity. For a substance to be selected as a probable carcinogen, it was necessary for the following criterion to be fulfilled: at least two positive predictions (sub-models excluded) for carcinogenicity. An exception was made for the M-CASE CPDB models. Because the data is less homogeneous, both rat and mouse predictions had to be positive to count as one prediction, and in addition the carcinogenic potency had to include TD50s for tumor induction of less than 1,000 mg/kg/day. These two CPDB models were developed by the Danish EPA using M-CASE methodology, which is described for this data set in the following references /34,35,40/. If one or more positive tests could be seen (as part of the training set for the model) for any cancer endpoint, this took precedence over model results and resulted in an overall positive classification recommendation.
While in most cases this resulted in little change (the models are heavily biased towards making a correct prediction for substances used to make them), it was felt that there was no reason to artificially reduce the quality of the advisory classification by neglecting to use data which happen to be present. A schematic diagram of the systematic evaluation is given in figure 5.
Figure 5
According to these criteria, 1,272 substances were selected for advisory classification for carcinogenicity. Expert judgement was then applied to the QSAR predictions. In this process, all data was used, including predictions of TOPKAT FDA Carcinogenicity sub-models, the probability of rapid metabolism or excretion, and, where appropriate, predictions of aryl hydroxylase activity /37/. Where any doubts were raised, substances were removed from this version of the list to be considered in more detail in the future. This resulted in 652 substances selected for advisory classification for carcinogenicity. It is not felt that the models employed allow discrimination between classification in the three categories, so the lowest classification, Carc3;R40, was applied in all cases.
2.7 Danger to the aquatic environment
EU criteria for classification
The classification criteria are composed of three main elements: Biodegradability, Bioconcentration potential, and Toxicity to aquatic organisms. Classifications are assigned according to the following scheme:
Table 9
* The lowest effect concentration for fish, daphnia or algae is used
Evaluation based on QSAR models
Advisory classifications were assigned on the basis of combinations of estimates for biodegradation, bioconcentration and acute toxicity according to the criteria in table 9. Classification with risk phrase R53 alone was not done in this exercise, as the strong co-linearity between water solubility and bioconcentration factor made it redundant.
Biodegradation
Biodegradability was estimated using the Syracuse BIOWIN program (v. 3.02) /17,41/. Only the linear equation for rapid/non-rapid biodegradation was applied. Previous validation of this parameter against MITI "ready/not-ready" results showed that while a number of "not-ready" chemicals were missed, 93% of "not-ready" predictions were correct /18/. In other words, while this model may fail to identify all "not-ready" substances, the number of false predictions of lack of degradability will be acceptably low. A total of about 14,000 Einecs chemicals were found to be "not readily degradable" according to this criterion /51/*.
Bioconcentration
The classification and labelling guidelines prefer measured data for bioconcentration, but as this is seldom available, a LogP (octanol/water) of greater than three is recommended as an indication that BCF will be 100 or greater, in accordance with the linear equation of Vieth and Kosian /41/. While a good rule of thumb, this relation both over- and underestimates BCF for many classes of chemicals, and takes no account of the fact that bioconcentration is a bilinear function of LogP, decreasing when LogP is sufficiently high. Bioconcentration was therefore predicted using Syracuse BCFWIN (v. 2.13), a method based on a combination of LogP (octanol-water) relations and structural fragment categories. This method was evaluated by its authors as having a statistical accuracy of R² = 0.74 (n = 694, S.D. 0.65, mean error = 0.47), which is a significant improvement over the standard equation of Vieth and Kosian (log BCF = 0.85 * log Kow - 0.70), where predictions for the same 694 compounds had a statistical accuracy of R² = 0.32 (S.D. 1.62 and mean error = 1.12) /20/. About 11,000 Einecs chemicals were found with BCF estimates equal to or greater than 100. No attempt was made to further assess bioaccumulation potential caused by possible presence in the diets of aquatic organisms, as it was not felt that an appropriate general model was available.
Acute toxicity
For aquatic toxicity classifications, values (L(E)C50) for fish, daphnia and algae are recommended (although seldom available for most existing chemicals). In the current exercise it was decided to use only predictions for fish, due to their robustness and the availability of high-quality test data for model construction. For acute aquatic toxicity to fish, an M-CASE model developed by the Danish EPA using 96h LD50 data on 569 chemicals from the Duluth Fathead minnow Database /22/ was applied. The model had an R² of 0.85. Cross-validation of this model gave a Q² of 0.735 (3*10% out). A description of the M-CASE methodology used for the Fathead minnow data can be found in the following references /21,42/. Only predictions within the optimum prediction space of the model (no fragment or other warnings) were used. As there was insufficient test data on the Fathead minnow for very lipophilic substances, the M-CASE model was only applied to chemical substances with a LogP of six or less. Another relationship was used for chemicals with a LogP of greater than six.
Here, all substances were assumed to act by non-polar narcosis, and toxicity at equilibrium was estimated from the predicted bioconcentration factor:
LC50 (equilibrium) = 8.15 mmol / BCF
The choice of 8.15 mmol corresponds to the theoretical level inducing aquatic effects represented by the non-polar narcosis fish QSAR recommended in the EU TGD /41/. Non-polar narcosis Lethal Body Burdens for fish are generally assumed to be within the range of about 2-8 mmol /23,58/. While simple LogP (octanol/water) relationships exist for predicting the non-polar narcotic toxicity for fish, daphnia and algae /41/, these do not distinguish specific toxicities unique to any of the three taxa, and were not felt to offer any advantage over using the fish models alone, which also adequately predict non-polar narcosis. For all practical purposes, non-polar narcosis induces effects at the same concentration levels in all three taxa /18/. Using both estimates, about 10,000 Einecs chemicals were found with toxicities to fish of LC50 ≤ 100 mg/l. A schematic diagram of the systematic evaluation is given in figure 6.
Figure 6
A total of 8,731 substances were selected according to one of the four classification categories as indicated above. Considering that the number of robust (AOK) predictions for fish toxicity was just under one-half of the chemicals screened, this number seems in reasonable concordance with what would be expected for Existing Chemicals. The advantages of being able to predict toxic effects specific to fish, daphnia and algae are obvious, and this can hopefully be accomplished in the future. An M-CASE model for acute toxicity to daphnia has recently been completed by the Danish EPA (n = 574, R² = 0.826, 3*10% out Q² = 0.692). It is still being refined, and predictions for all chemicals will soon be available. An M-CASE model for toxicity to algae is under development.
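The equilibrium relationship used in section 2.7 for substances with a LogP above six can be illustrated with a short calculation. This is a sketch under stated assumptions, not the Danish EPA's implementation: it assumes the value 8.15 is a body burden in mmol per kg of fish and that BCF is expressed in l/kg, so that the quotient is a water concentration in mmol/l; the molar mass argument and the function names are introduced here only to convert that result to mg/l for comparison with the 100 mg/l limit mentioned in the text.

```python
# Body burden for non-polar narcosis quoted in the text (assumed unit: mmol/kg).
NARCOSIS_BODY_BURDEN_MMOL_PER_KG = 8.15


def lc50_equilibrium_mmol_per_l(bcf_l_per_kg: float) -> float:
    """LC50 (equilibrium) = 8.15 mmol / BCF, with BCF assumed to be in l/kg."""
    if bcf_l_per_kg <= 0:
        raise ValueError("BCF must be positive")
    return NARCOSIS_BODY_BURDEN_MMOL_PER_KG / bcf_l_per_kg


def lc50_equilibrium_mg_per_l(bcf_l_per_kg: float, molar_mass_g_per_mol: float) -> float:
    """Convert the equilibrium LC50 to mg/l (mmol/l times g/mol gives mg/l)."""
    return lc50_equilibrium_mmol_per_l(bcf_l_per_kg) * molar_mass_g_per_mol


def within_toxicity_limit(lc50_mg_per_l: float, limit_mg_per_l: float = 100.0) -> bool:
    """True if the estimated LC50 is at or below the given limit (default 100 mg/l)."""
    return lc50_mg_per_l <= limit_mg_per_l
```

For a hypothetical substance with a predicted BCF of 1,000 and a molar mass of 300 g/mol, the sketch gives 0.00815 mmol/l, or about 2.4 mg/l, which falls below the 100 mg/l limit.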
3. References