Report on the Advisory list for self-classification of dangerous substances

2. Technical description of the creation of the list and the QSAR models used

2.1 Introduction

In a field developing as rapidly as QSARs are today, there will always be better models, better validations and new endpoints becoming available - and consequently never a "right" time to release advisory classifications based on them. It is, however, felt that considerable information has been accumulated which can now be of help in the otherwise difficult task of assessing the toxicology of many thousands of otherwise untested chemicals. This knowledge may also help to direct future testing towards the areas where it is most urgently needed.

2.1.1 SAR / QSAR

The concept that similar structures will have similar properties is not new. Already in the 1890s it was discovered, for example, that the anaesthetic potency of substances to aquatic organisms was related to their oil/water solubility ratios, a relationship which led to the use of LogP (octanol/water) as a predictor of this effect. Today it is known that all chemicals exhibit a minimum or "basal" narcotic effect, which is related to their absorption into cell membranes and is well predicted by their lipophilic profile. SARs and QSARs ((Quantitative) Structure Activity Relationships) are based on a comparison of the structure and physico-chemical properties (descriptors) with measured parameters or endpoints for a range of chemicals called a training set. The endpoint may for instance be another physico-chemical property, or it may be a biological effect. The descriptors may include LogP, molecular indices, quantum mechanical properties, shape, size, charge distributions, etc. The comparison is often made with statistical tools. The goal is to determine which descriptor(s) are essentially connected with the endpoint in question, and to set up a relationship between these descriptors and the endpoint.
When the result is expressed qualitatively the relationship is a SAR, and when the result is expressed quantitatively the relationship is a QSAR. A QSAR is a relation between the quantitative descriptors for chemical substances and a more or less graduated scale of property or effect. Once a correlation between structure and properties is established, it can be used to predict the endpoints for other chemicals for which the descriptors are known or can be estimated. In general, development and use of the correlations are done by computers.

2.1.2 The domain of the models

The domain limits a QSAR's use to the endpoint being modelled and the group of substances for which it is valid. The domain of the QSAR is defined in the selection of the training set; the coverage of the descriptors of the training set defines the "area" of "the chemical universe" for which the model is valid.

2.1.3 Accuracy of the model predictions

In order to check a model's predictive ability it should be validated. Validation is a trial of the model's performance for a set of substances independent of the training set, but within the domain of the model. The model predictions for these substances are compared with measured endpoints in order to establish the accuracy of the model. Ideally all models should be assessed by seeing how well they predict the activity of chemicals which were not used to make them. This is not, however, always simple. On the one hand, valuable information may be lost by setting aside chemicals for such an evaluation; on the other, it can be extremely difficult to assess how "external" chemicals relate to the model's domain - that is, whether they represent a random distribution within it, thereby giving a fair picture of the performance of the model. This problem is often addressed by using one or another form of cross-validation.
Statistical evaluation is an extremely important method of determining the performance of these models, and in some cases (where there is little or no test data to be found which was not used to develop the model) it is the only method available. The validation techniques most commonly mentioned in this report include the "drop one" "Q2" procedure, where one substance at a time is removed, and then predicted by a model made on the remainder of the training set. This is done once for every substance. While widely used, this form of cross-validation can have a tendency to over-predict goodness of fit. A more robust technique for these data sets is for example the "3x10% out", which consists of removing a random sample of 10% three times, and each time making a new model which is then used to predict the excluded chemicals. Instead of running this process three times it can be run until all of the chemicals have been estimated. However, three runs will generally be sufficient to establish the correlation /50/. For the validation of a parametric model the result can be expressed as the sensitivity, the specificity and the concordance of the model. The sensitivity is a measure of how well the model "catches" the substances with the effect being modeled. A sensitivity of 80% means that 80% of the "true positives" in the validation set were correctly predicted as positives, and that the remaining 20% were falsely predicted as negatives (false negatives). The specificity is a measure of how many false positives the model predicts. A specificity of 80% means that 80% of the "true negatives" in the validation set were correctly predicted as negatives, and the remaining 20% of the negatives were falsely predicted as positives (false positives). The concordance is an overall measure of the correctness of the predictions. 
A concordance of 80% means that 80% of the substances in the validation set were correctly predicted as positives or negatives, and the remaining 20% are the false predictions (false negatives and false positives). Predictive ability will vary depending on both the method used and the endpoint in question. In general, contemporary QSAR systems can often correctly predict the activity of about 70-85% of the chemicals examined, provided that the query structures are within the domains of the models /53,54/. This also applies to the models described in this paper. Of course, a model can never be more accurate than the test data on which it was based. It is therefore extremely important to be aware of the accuracy and reproducibility of the test data used for making a model. If a biological test gives the wrong results 17% of the time, a "perfect" model based on these tests would also be wrong 17% of the time. In addition to assessing the predictive ability of a model, it is also necessary to consider the context in which it will be used. In some cases a large number of "false positives" or "false negatives" may be acceptable, while in others they will not be. In this exercise there was no deliberate attempt to adjust the weight of these factors in either direction. The specific "context" in which these models have been used is simply that where there are no tests or other information available, the alternative is that the substance is not assessed at all for the endpoints covered.

2.1.4 Software

Today numerous computerized systems exist for predicting a large range of effects, ranging from biodegradability to cancer. These include fragment-based* statistical systems such as TOPKAT and M-CASE, as well as three-dimensional modelling of ligand docking** such as Comparative Molecular Field Analysis (CoMFA).
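To illustrate the validation measures defined in section 2.1.3, the sensitivity, specificity and concordance of a classification model can be computed from paired observed/predicted labels. This is an illustrative sketch, not code from the project:

```python
def validation_metrics(observed, predicted):
    """observed, predicted: sequences of booleans (True = positive)."""
    tp = sum(o and p for o, p in zip(observed, predicted))            # true positives
    tn = sum((not o) and (not p) for o, p in zip(observed, predicted))  # true negatives
    fn = sum(o and (not p) for o, p in zip(observed, predicted))      # false negatives
    fp = sum((not o) and p for o, p in zip(observed, predicted))      # false positives
    sensitivity = tp / (tp + fn)             # share of true positives caught
    specificity = tn / (tn + fp)             # share of true negatives caught
    concordance = (tp + tn) / len(observed)  # overall correctness
    return sensitivity, specificity, concordance
```

For example, a validation set where four of five positives and four of five negatives are predicted correctly gives 80% on all three measures, matching the worked figures above.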
Mention should also be made of OASIS /46,47/, a sophisticated program package able to estimate a wide variety of effects using 3-D and quantum mechanical parameters, and which is currently being used to estimate binding of chemicals to estrogen receptors /48/. In essence, these programs don't really do anything "new". They simply group substances with similar structures and similar effects, including use of global or local parameters such as LogP and electrophilicity, in much the same way as an expert might do. However, they do this at very high speed and take account of a large number of factors simultaneously (such as critical inter-atomic distances), which can assist an expert in finding hitherto unobserved relationships. In addition, the programs TOPKAT and M-CASE, described below, emulate another human characteristic and reject estimates for chemicals where there is simply not enough information to provide a sound prediction. They accomplish this by iterative statistical methods rather than by human intelligence or intuition.

M-CASE

M-CASE is a knowledge-based artificial intelligence system capable of learning directly from data. Models made in this program can predict various toxic endpoints on the basis of discrete structural fragments found to be statistically relevant to a specific biological activity, either increasing or decreasing it. The program can thus provide a "chemical" explanation for observed biological properties. It assumes that the presence of fragments previously found in a number of active compounds is indicative of potential activity. This fragment-based method is assumed to be a reasonable basis for assessing the activity of new molecules. On the basis of the fragments present in a query molecule, the program will estimate a value for its potency by using "local QSARs" for the various fragments. Where found relevant, "global QSARs" such as the relation between LogP and toxicity to aquatic organisms may also be included in the model.
The program gives a warning if there are fragments in the query molecule that are not found in the training set of the model, indicating that the query molecule is outside the domain of the model /38,43/. Estimates for substances found to be within the domain of the model, and for which sound predictions could be made, are referred to as AOKs ("All OK chemicals") in this paper.

TOPKAT

TOPKAT assesses the toxicity of chemicals from their molecular structure utilizing QSTR (Quantitative Structure Toxicity Relationship) models for assessing specific adverse health effects /56/. When the program is queried by entering a code for a chemical structure, it determines the compound class of the structure for those models which have class-specific sub-models. Next, the system computes the descriptors needed for the specific toxicity model. These include, for example, electrotopological state, kappa indices, molecular weight and symmetry indices. The program then checks whether all the fragments present in the query molecule were present with adequate frequency in the training set for the specific equation. If there are no missing fragments, the program next checks whether the query is within the optimum prediction space of the equation. If this is the case, the training set of the model is searched for the compounds most similar to the query molecule, and the concordance between the actual and predicted values for those compounds is determined /45/. If there is reasonable agreement between observed and predicted values for the four most similar substances, the estimate is accepted and referred to as AOK in this paper.

Epiwin

This suite of programs, developed by Syracuse Research Corporation, was used to estimate three ecotoxicological parameters: biodegradation, LogP and bioconcentration. Unlike TOPKAT and M-CASE, Epiwin does not attempt to define a predictive space, and all estimates were used "as is".
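The fragment-based domain check that both programs perform can be caricatured in a few lines. This is a hypothetical sketch with invented placeholder fragments, not the actual M-CASE or TOPKAT logic:

```python
# Fragments seen often enough in a model's training set; these strings are
# invented placeholders, not real structural keys from either program.
TRAINING_FRAGMENTS = {"c1ccccc1", "C(=O)O", "N(C)C", "C=C"}

def domain_check(query_fragments):
    """Return "AOK" if every fragment of the query was seen in training,
    otherwise flag the query as outside the model's domain."""
    unknown = sorted(set(query_fragments) - TRAINING_FRAGMENTS)
    if unknown:
        return "outside domain", unknown  # warn: no sound prediction possible
    return "AOK", []
```

A real system would additionally weigh fragment frequencies and similarity to training compounds before accepting an estimate.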
Chem-X

This program has features for making estimates of a large number of physical-chemical properties of chemicals, making 2D- and 3D-QSARs, and storing large amounts of data and chemical structures in databases. The Danish EPA has built up a database in Chem-X which contains QSAR predictions for about 166,000 substances /55/, including almost all of the discrete organic chemicals in Einecs, a total of approximately 47,000 substances. Estimates are available for a number of endpoints covering both health and environmental concerns. The QSAR estimates for these chemicals form the background for the recommended self-classifications. Detailed facilities for searching, displaying and manipulating chemical structures are also available in this data package. This tool was used extensively to compare test data, predictions and selected sub-structures while performing "expert" assessment of the QSARs. Possibilities for dissemination of this database and the detailed QSAR predictions are currently unclear due to issues of copyright.

2.2 Methodology in making the list

2.2.1 The selected dangerous properties

The following endpoints were addressed:
2.2.2 The evaluated chemical substances

The overall purpose of the project was to evaluate as many as possible of the substances in Einecs (European Inventory of Existing Commercial Chemical Substances) /2/. The list consists of 100,116 entries, covering organic and inorganic substances in both single-substance entries and mixtures. The screening was limited to "discrete organics," meaning that UVCBs (Unknown, Variable Composition and Biologicals) and other ill-defined structures or mixtures were excluded for practical reasons: if you don't know what it is, you can't really make a model. Exceptions were made where this seemed logical (C12-C16 n-alcohols have been entered as the C14 n-alcohol, hydrochloride salts have been entered as the parent compound, etc.). Inorganic substances have likewise not been evaluated. These are usually better approached by simpler methods of evaluating the availability of the respective anions and cations with well-known hazard profiles. "Organo-metallics" have also been excluded as being poor candidates for modeling. Finally, as a matter of resources, only such chemicals as were available with 3-D structural information were used /7/. In so far as this was possible using a CAS number comparison, all substances already classified on Annex I of the formal EU list (List of dangerous substances) were also removed, as they should never be the subject of provisional classification. This resulted in a total of 46,707 chemicals, or about half of all Einecs entries, which could be subjected to screening.

2.2.3 Test data

For the vast majority of the chemicals no measured data were available. However, if measured data were available as part of the model, they were generally used in preference to the estimates. It is important to stress that no attempt was made to search the world's published or unpublished databases for toxicological information to determine whether a QSAR was even necessary for each endpoint.
This task is the responsibility of the manufacturer / importer of the individual chemicals.

2.2.4 Use of QSAR models

The technical specifications for the models and a description of the criteria for assignment of advisory classifications for each effect are given in the technical sections for the individual endpoints. It should also be stressed that the models available do not predict a "classification"; they predict biological activity that may lead to a classification. Further criteria have therefore been applied to each endpoint to link the biological prediction with a risk phrase. Because of the large number of chemicals involved, "rules" were used to achieve this purpose. Such rules are also imperfect, but in essence the process is no different from that imposed upon a human expert forced to use common sense to provide a provisional classification for any given substance for which the desired test data do not exist. Only model predictions that satisfied formal criteria were used: for M-CASE the predictions had to fall within the optimum prediction space of the model, meaning that there were no unknown fragments and that there was sufficient knowledge about the known fragments to give an unequivocal prediction. As described in the technical sections, expert inspection has been undertaken where time allowed to confirm the probable activities given by the QSARs. This has included evaluation of the QSAR estimates in comparison with known biological activities and chemical properties. No in-depth toxicological assessment of the individual chemical substances has been undertaken. Questionable QSAR predictions for each endpoint were excluded. The effort spent on expert inspection varied with the endpoint in question. In general, most time was used in assessing the predictions for Mutagenicity and Carcinogenicity, and least on Allergy and Aquatic Effects.
2.2.5 The result

It is important to understand that the results as given in the Advisory list only represent POSITIVE predictions. No distinction has been made between a negative prediction for an endpoint and an unreliable prediction (a non-AOK prediction), which was simply discarded. Evaluated substances not on the list, or substances which are on the list but without advisory classifications for one or more of the selected dangerous properties, may have been predicted as not having these dangerous properties, or the models may not have been valid for the substance. Therefore the advisory list cannot be used to conclude that these substances do not possess dangerous properties. Depending on the endpoint in question, unreliable predictions were obtained for between 5 and 65% of the chemicals examined.

2.3 Acute oral toxicity

EU criteria for classification

The formalized criteria for classification for acute oral toxicity include a number of test options, including the fixed-dose procedure, and interpretation of the various sources of information about acute oral toxicity, but classification is often based on acute LD50 tests in the rat, for which the following criteria are used: Table 3
An advisory classification of Xn;R22 is recommended in all cases where a rat oral LD50 of ≤ 2000 mg/kg is predicted or based on measured data. For reasons indicated below, no attempt was made to differentiate between the different levels of acute toxicity, and it is important to recognize that this classification will often be less stringent than classification based on measured data. If test results measured in the rat were readily available (had been used to make the model), these took precedence over any predictions. As acute toxicity data from the mouse following a variety of different routes of administration were also available in some cases, these data were used preferentially to predict rat oral LD50s using the following QSARs /8,9/: Table 4
iv: Intravenous

Biological data consisting of LD50s in mice or rats were available for just over 10% of the chemicals processed. If no biological data were available, the rat oral LD50 was estimated with the QSTR model TOPKAT (v 5.01). According to TOPKAT, the model contains about 4,000 substances, and their own cross-validation for this endpoint indicates 86-100% of estimations falling within a factor of five of test results /10/. The Danish EPA's external evaluation of this model, using 1,840 chemicals not contained in the TOPKAT data set, gave somewhat poorer results: R2 = 0.31. According to this evaluation, 86% of estimations fall within a factor of ten of test results /11/. The distribution can be seen in table 5. Table 5
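The decision rule applied in this section - measured data taking precedence over the TOPKAT estimate, and a single cut-off at 2,000 mg/kg - can be sketched as follows. This is an illustration, not the project's actual code:

```python
def advisory_acute_oral(predicted_ld50, measured_ld50=None):
    """LD50 values in mg/kg (rat, oral); returns "Xn;R22" or None."""
    # Measured data, where part of the model, takes precedence over estimates.
    ld50 = measured_ld50 if measured_ld50 is not None else predicted_ld50
    # No differentiation between levels of acute toxicity is attempted.
    return "Xn;R22" if ld50 <= 2000 else None
```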
Where TOPKAT was able to make a robust prediction (AOK), it found 57% of all chemicals to have an acute oral LD50 in rat of ≤ 2,000 mg/kg. The percentage of chemicals with acute toxicities of ≤ 2,000 mg/kg among 12,632 chemicals tested for acute toxicity in rat found in the Registry of Toxic Effects of Chemical Substances (RTECS 1998) /52/ was 61%. That these two percentages are so similar is not surprising, since RTECS data was also the chief source of biological information used to construct the TOPKAT model. A schematic diagram of the systematic evaluation is given in figure 2. Figure 2 Approximately 10,200 compounds were estimated as having an acute LD50 in rat of 2,000 mg/kg or less***. About 700 were removed by expert judgement in an attempt to exclude amino-acid and protein-type compounds which were considered likely to break down due to the effects of gastric acidity, or substances for which gastric absorption was expected to be poor. This resulted in 9,538 substances with an advisory classification of Xn;R22.

2.4 Sensitization by skin contact

EU criteria for classification

Classification as sensitizing by skin contact, R43 ("May cause sensitization by skin contact"), is based on animal studies, practical experience, or combinations thereof. The animal criterion is based on either an adjuvant or a non-adjuvant test. Different adjuvant tests exist, but the Magnusson-Kligman method (GPMT: Guinea Pig Maximization Test) is preferred. A response in 30% of the animals results in classification. For a non-adjuvant test (for example the Buehler test), 15% responding animals is regarded as positive. The human data can be results from patch testing, case studies or epidemiological studies.

Evaluation based on QSAR models

Two approaches were used to estimate contact sensitisation /14,15/. The first approach uses two TOPKAT QSTR models.
The first model was used to predict "Allergy versus non-allergy", and, in cases where this was positive, the second model was used to predict "Strong versus weak/moderate allergy". The models used were primarily related to the GPMT. Only predictions of "Strong allergy" were considered as being likely to fulfill the EU criteria for R43. In a second approach, predictions were also made using M-CASE. The data set used to produce the M-CASE models differed somewhat from the TOPKAT set, in that both data from the GPMT and human data were represented. Only positive predictions with M-CASE scores of > 40 (corresponding to "very active") were selected. Table 6
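The resulting selection rule - a TOPKAT "strong allergy" prediction on an AOK estimate, or an M-CASE score above 40 with no unknown fragments, either one sufficing - can be sketched as follows. Argument names are illustrative, not from either program:

```python
def select_r43(topkat_strong, topkat_aok, mcase_score, mcase_fragments_known):
    """A compound qualifies if either model gives a robust positive."""
    topkat_positive = topkat_strong and topkat_aok  # "Strong allergy", AOK estimate
    mcase_positive = mcase_score > 40 and mcase_fragments_known  # "very active"
    return "R43" if (topkat_positive or mcase_positive) else None
```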
It is difficult to know how representative New Chemicals are with regard to the universe of Existing Chemicals. Generally, New Chemicals are more complex structures with higher molecular weights. Perhaps the most surprising aspect of this exercise was to find that, for over three thousand chemicals that should have been assessed for this endpoint, such a tiny percentage of useful test data could be found. Compounds predicted as positive by either TOPKAT or M-CASE according to the above criteria were selected, provided that they were either AOK in the former, or contained no unknown fragments or equivocal results in the latter. While requiring a "positive" in both models was considered as a criterion, in the end this seemed inefficient, not so much due to lack of concordance between model predictions, but because the acceptance domains (AOK or all fragments known) of the two methods differed considerably. No attempt was made to further reduce the list by systematically applying expert judgement. A schematic diagram of the systematic evaluation is given in figure 3. Figure 3 9,668 chemicals met the above criteria, for which an advisory classification of R43 is suggested. This may strike many experts as a rather large number of chemicals, and while these models represent the current "state-of-the-art", it may indicate that they are over-sensitive. However, it was very difficult to obtain any reliable indication of how many Existing Chemicals would cause contact allergy if actually tested in animals or humans. Estimates of the percentage of allergens on Einecs ranged from 5-25%, with some preference being expressed for 10%, which is the percentage of Annex I substances currently classified for this effect. It is not possible, however, to estimate the influence of confounders on the distribution represented in Annex I. Positive bias can have been introduced because chemicals testing positive are over-represented.
Negative bias can have been caused by the fact that most of the chemicals have never been tested at all. The question of numbers remains open.

2.5 Mutagenicity

EU criteria for classification

The criteria for classification for mutagenicity are divided into three categories: Classification as mutagen, category 1 (mut1;R46, may cause heritable genetic damage) is based on evidence of a causal association between human exposure to the substance and heritable genetic damage. Classification as mutagen, category 2 (mut2;R46, may cause heritable genetic damage) is based on animal studies showing mutagenicity to germ cells, either in assays on germ cells or by demonstrating mutagenic effects in somatic cells in vivo or in vitro together with metabolic evidence that the substance reaches the germ cells. Classification as mutagen, category 3 (mut3;R40, possible risks of irreversible effects) is based either on in vivo mutagenicity tests or on cellular interactions in in vitro tests acting as supportive evidence. For this classification it is not necessary to demonstrate germ cell mutations.

Evaluation based on QSAR models

A number of models were applied for this endpoint, predicting a number of genotoxicity endpoints. A positive prediction for induction of micronuclei in vivo was required, as this demonstrates chromosomal damage in somatic cells in vivo. The remaining endpoints reflect in vitro genotoxicity, where positive results would not normally lead to classification for this effect. However, positive predictions for these endpoints provide supporting evidence for the in vivo estimates. Table 7
It is not suggested that positive in vitro evidence should also be necessary when classifying substances with positive in vivo test data. However, the QSAR model for the mouse micronucleus test alone was not felt to be sufficient, and estimates from additional QSARs relevant to the endpoint were therefore used to increase the likelihood of a correct positive prediction. Chemicals for which model estimates were positive for mouse micronucleus induction and for structural alerts for DNA reactivity (here an exception was made in that predictions with one unknown fragment were also accepted), and which also had two positive genotoxicity endpoints, passed the criteria for the systematic evaluation. Two models for Salmonella (Ames) mutagenicity were used, a TOPKAT and an M-CASE module respectively. This related primarily to the fact that the models differed with regard to domain, and often a robust prediction was only available from one model. If robust predictions were available from both models, and they disagreed, this was taken into account on a case-by-case basis during the final evaluation. A schematic diagram of the systematic evaluation is given in figure 4. Figure 4 2,272 Einecs chemicals met the criteria in the systematic evaluation. As none of these models identifies germ cell mutagenicity, the current QSARs do not allow discrimination between the three EU categories for mutagenic effects, and the lowest classification is therefore assigned as the advisory classification in all cases. Expert judgment was undertaken to confirm the robustness of the predictions for these 2,272 chemicals. This process included examination of the 2-D or 3-D chemical structure and visual comparison with test data within structural groups. If this procedure raised any doubt, substances were removed from the list for more detailed consideration in the future. This resulted in a final selection of 1,678 substances with an advisory classification of mut3;R40.
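The systematic mutagenicity screen described in this section reduces to three conditions. A hedged sketch with illustrative names, not the project's code:

```python
def passes_mutagenicity_screen(micronucleus_positive,
                               dna_alert_positive,
                               dna_alert_unknown_fragments,
                               positive_genotox_endpoints):
    """Mouse micronucleus positive, DNA reactivity alert positive (at most
    one unknown fragment tolerated), and >= 2 positive genotoxicity endpoints."""
    dna_alert_ok = dna_alert_positive and dna_alert_unknown_fragments <= 1
    return bool(micronucleus_positive and dna_alert_ok
                and positive_genotox_endpoints >= 2)
```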
2.6 Carcinogenicity

EU criteria for classification

This endpoint can result in classification in three categories: Classification as carcinogen in category 1 (carc1;R45, Toxic; may cause cancer, or carc1;R49, Toxic; may cause cancer by inhalation) is based on a strong causal relationship in humans. Classification as carcinogen in category 2 (carc2;R45, Toxic; may cause cancer, or carc2;R49, Toxic; may cause cancer by inhalation) is based on conclusive animal data from two species, or from one species with supportive evidence such as genotoxic effects in vitro or in vivo. Classification as carcinogen in category 3 (carc3;R40, Harmful; possible risks of irreversible effects) is subdivided into two:
Evaluation based on QSAR models

While there are many non-genotoxic carcinogens acting by a wide variety of often unknown mechanisms, it was chosen to focus here on chemicals likely to cause cancer through a genotoxic mechanism. Therefore pre-selection criteria for genotoxicity were set up. The criterion for the pre-selection for carcinogenicity was a positive estimate for structural alerts for DNA reactivity (AOK or one unknown fragment) and two positive AOK genotoxicity predictions out of five models for genotoxicity. The technical specifications for the models used to predict genotoxicity are given in the chapter "Mutagenicity". As opposed to the selection criteria for mutagenicity, a positive mouse micronucleus prediction was not demanded, as not all genotoxic carcinogens are necessarily clastogenic (cause loss, addition or rearrangement of parts of chromosomes). This gave a pre-selection of 3,362 Einecs chemicals. A total of ten cancer models were available, plus four sub-models. Table 8
* NTP: National Toxicology Program, USA

The accuracy of these models can be difficult to determine, as there are few independent test results that have not already been used in the construction of the models themselves. This is particularly the case for TOPKAT's models, where the only real estimates consist of the producer's own "1 out" Q2 cross-validations. For M-CASE, other statistical methods are available. In a long-running project, where several cancer models predicted the outcome for NTP chemicals which had not yet been tested, upon completion of these tests (for 45 substances) the general conclusion was that an accuracy of around 70% was achieved for clearly carcinogenic or non-carcinogenic substances /31/. Due to the small number of chemicals in this analysis, it is difficult to know how much weight can be assigned to the conclusion. 3,362 substances met the pre-selection criteria for genotoxicity. For a substance to be selected as a probable carcinogen, the following criteria had to be fulfilled: at least two positive predictions (sub-models excluded) for carcinogenicity. An exception was made for the M-CASE CPDB models: because the data are less homogeneous, both rat and mouse predictions had to be positive to count as one prediction, and in addition the predicted carcinogenic potency had to include TD50s for tumor induction of less than 1,000 mg/kg/day. These two CPDB models were developed by the Danish EPA using M-CASE methodology, which is described for this data set in the following references /34,35,40/. If one or more positive tests could be seen (as part of the training set for the model) for any cancer endpoint, this took precedence over model results and resulted in an overall positive classification recommendation.
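This selection logic - two positives among the main models, the CPDB rat/mouse pair counting as a single positive only when both are positive with a TD50 below 1,000 mg/kg/day, and measured positive tests taking precedence - can be sketched as follows; argument names are illustrative:

```python
def select_carcinogen(positive_main_models, cpdb_rat_positive,
                      cpdb_mouse_positive, cpdb_td50_mg_kg_day,
                      measured_positive=False):
    """Return True if the substance passes the carcinogenicity selection."""
    if measured_positive:  # measured test data override model results
        return True
    positives = positive_main_models  # count of positive main-model predictions
    # The CPDB pair counts as one positive only if both species are positive
    # and the predicted potency (TD50) is below 1,000 mg/kg/day.
    if cpdb_rat_positive and cpdb_mouse_positive and cpdb_td50_mg_kg_day < 1000:
        positives += 1
    return positives >= 2
```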
While in most cases this resulted in little change (the models are heavily biased towards making a correct prediction for substances used to make them), it was felt that there was no reason to artificially reduce the quality of the advisory classification by neglecting to use data which happened to be present. A schematic diagram of the systematic evaluation is given in figure 5. Figure 5 According to these criteria, 1,272 substances were selected for advisory classification for carcinogenicity. Expert judgment was performed on the QSARs. In this process, all data were used, including predictions of the TOPKAT FDA carcinogenicity sub-models, the probability of rapid metabolism or excretion, and, where appropriate, predictions of aryl hydroxylase activity /37/. Where any doubts were raised, substances were removed from this version of the list to be considered in more detail in the future. This resulted in 652 substances selected for advisory classification for carcinogenicity. It is not felt that the models employed allow discrimination between the three categories, so the lowest classification, Carc3;R40, was applied in all cases.

2.7 Danger to the aquatic environment

EU criteria for classification

The classification criteria are composed of three main elements: biodegradability, bioconcentration potential, and toxicity to aquatic organisms. Classifications are assigned according to the following scheme: Table 9
* The lowest effect concentration for fish, daphnia or algae is used

Evaluation based on QSAR models

Advisory classifications were assigned on the basis of combinations of estimates for biodegradation, bioconcentration and acute toxicity according to the criteria in table 9. Classification with the risk phrase R53 alone was not done in this exercise, as the strong co-linearity between water solubility and bioconcentration factor made it redundant.

Biodegradation

Biodegradability was estimated using the Syracuse BIOWIN program (v. 3.02) /17,41/. Only the linear equation for rapid/non-rapid biodegradation was applied. A previous validation of this parameter against MITI "ready/not-ready" results showed that while a number of "not-ready" chemicals were missed, 93% of "not-ready" predictions were correct /18/. In other words, while this model may fail to identify all "not-ready" substances, the number of false predictions of lack of degradability will be acceptably low. A total of about 14,000 Einecs chemicals were found to be "not readily degradable" according to this criterion /51/.

Bioconcentration

The classification and labeling guidelines prefer measured data for bioconcentration, but as such data are seldom available, a LogP (octanol/water) of greater than three is recommended as an indication that the BCF will be 100 or greater, in accordance with the linear equation of Veith and Kosian /41/. While a good rule of thumb, this relation both over- and underestimates the BCF for many classes of chemicals, and takes no account of the fact that bioconcentration is a bilinear function of LogP, decreasing when LogP is sufficiently high. Bioconcentration was therefore predicted using Syracuse BCFWIN (v. 2.13), a method based on a combination of LogP (octanol/water) relations and structural fragment categories. This method was evaluated by its authors as having a statistical accuracy of R2 = 0.74 (n = 694, S.D.
0.65, mean error = 0.47), which is a significant improvement over the standard equation of Veith and Kosian (log BCF = 0.85 · log Kow − 0.70), for which predictions for the same 694 compounds had a statistical accuracy of R2 = 0.32 (S.D. 1.62 and mean error = 1.12) /20/. About 11,000 Einecs chemicals were found with BCF estimates equal to or greater than 100. No attempt was made to further assess bioaccumulation potential caused by possible presence in the diets of aquatic organisms, as it was not felt that an appropriate general model was available.

Acute toxicity

For aquatic toxicity classifications, values (L(E)C50) for fish, daphnia and algae are recommended (although these are seldom available for most existing chemicals). In the current exercise it was decided to use predictions for fish only, due to their robustness and the availability of high-quality test data for model construction. For acute aquatic toxicity to fish, an M-CASE model developed by the Danish EPA using 96h LC50 data on 569 chemicals from the Duluth Fathead minnow database /22/ was applied. The model had an R2 of 0.85. Cross-validation of this model gave a Q2 of 0.735 (3*10% out). A description of the M-CASE methodology used for the Fathead minnow data can be found in the following references /21,42/. Only predictions within the optimum prediction space of the model (no fragment or other warnings) were used. As there was insufficient Fathead minnow test data for very lipophilic substances, the M-CASE model was only applied to chemical substances with a LogP of six or less. Another relationship was used for chemicals with a LogP of greater than six.
Here, all substances were assumed to act by non-polar narcosis, and toxicity at equilibrium was estimated from the predicted bioconcentration factor:

LC50 (equilibrium) = 8.15 mmol / BCF

The choice of 8.15 mmol corresponds to the theoretical level inducing aquatic effects represented by the non-polar narcosis fish QSAR recommended in the EU TGD /41/. Non-polar narcosis lethal body burdens for fish are generally assumed to be within the range of about 2-8 mmol/kg /23,58/. While simple LogP (octanol/water) relationships exist for predicting the non-polar narcotic toxicity for fish, daphnia and algae /41/, these do not distinguish specific toxicities unique to any of the three taxa, and were not felt to offer any advantage over using the fish models alone, which also adequately predict non-polar narcosis. For all practical purposes, non-polar narcosis induces effects at the same concentration levels in all three taxa /18/.

Using both estimates, about 10,000 Einecs chemicals were found with toxicities to fish of LC50 ≤ 100 mg/l. A schematic diagram of the systematic evaluation is given in figure 6.

Figure 6

A total of 8,731 substances were selected according to one of the four classification categories as indicated above. Considering that the number of robust (AOK) predictions for fish toxicity covered just under one-half of the chemicals screened, this number seems in reasonable concordance with what would be expected for Existing Chemicals. The advantages of being able to predict toxic effects specific to fish, daphnia and algae alike are obvious, and this can hopefully be accomplished in the future. An M-CASE model for acute toxicity to daphnia has recently been completed by the Danish EPA (n = 574, R2 = 0.826, 3*10% out Q2 = 0.692). It is still being refined, and predictions for all chemicals will soon be available. An M-CASE model for toxicity to algae is under development.
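The equilibrium relation used for the very lipophilic substances (LogP > 6) can be illustrated numerically. The sketch below assumes the 8.15 mmol level is an internal body burden per kg, so that dividing by the BCF (l/kg) gives an external water concentration in mmol/l, which a molar mass converts to mg/l; the BCF and molar mass in the example are invented for illustration only.

```python
def lc50_equilibrium_mg_per_l(bcf, mol_weight_g_mol):
    """Equilibrium LC50 for a substance assumed to act by non-polar
    narcosis, per the relation LC50 (equilibrium) = 8.15 mmol / BCF.

    bcf: predicted bioconcentration factor (l/kg).
    mol_weight_g_mol: molar mass; 1 mmol of substance weighs this many mg.
    """
    lc50_mmol_per_l = 8.15 / bcf          # external concentration, mmol/l
    return lc50_mmol_per_l * mol_weight_g_mol  # converted to mg/l

# Illustrative (invented) values: BCF = 5,000 l/kg, molar mass = 300 g/mol
# gives 8.15 / 5000 * 300 = 0.489 mg/l.
print(lc50_equilibrium_mg_per_l(5000, 300))
```

Note how a high predicted BCF drives the equilibrium LC50 down: the more strongly a substance bioconcentrates, the lower the water concentration needed to reach the narcotic body burden.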