Environmental Project No. 1322, 2010

The Advisory list for self-classification of dangerous substances

Ver. 2.1 (June 2010)

Contents

1 Introduction to classification and (Q)SAR
2 Creation and use of the advisory self-classification list
Technical description of the self-classifications
Annex 2. Analysis of positive predictions of cancer classification

Preface

The current report is a background report for the Danish EPA advisory self-classification list. The list is based on assessments from (Q)SAR researchers from the National Food Institute, Technical University of Denmark. The advisory self-classification list is available as a database via www.mst.dk. This is a consolidated report, which includes documentation on all endpoints presently covered by the Danish EPA advisory self-classification list.
This report provides the following background material:
Summary

All chemical substances marketed in the EU must be classified and labelled according to the regulation on classification and labelling of dangerous substances /7/. Substances with harmonised classifications adopted in the EU are included in the List of harmonised classification and labelling of hazardous substances (Annex VI of 1272/2008/EU). This list covers around 7,000 substances which have been classified for their hazardous properties. However, this also means that about 93,000 of the 100,204 existing substances in the EU (EINECS list) are not classified in a harmonised way. For these substances, it is the manufacturer's or importer's responsibility to carry out an appropriate classification of the dangerous intrinsic properties (“self-classification”). In most cases, however, there are currently no test data (from animal testing, etc.) available on their properties in relation to human health or environmental hazards. To address this issue, the Danish Environmental Protection Agency published the Advisory self-classification list in 2001 /5/. The Advisory self-classification list is created by using (Q)SARs ((Quantitative) Structure-Activity Relationships) to predict the intrinsic properties and harmful effects of chemicals. The updated Advisory self-classification list contains the results of a systematic assessment of 49,292 discrete[1] organic EINECS substances in relation to the following endpoints for which new and/or improved (Q)SAR model predictions were available:
The advisory classifications for mutagenicity, carcinogenicity, and danger to the aquatic environment are 2009 updates of the advisory classifications on the 2001 self-classification list, and reproductive toxicity is a new classification endpoint from 2009 /62/. Acute oral toxicity is a new 2010 update of the advisory classifications from 2001, and skin irritation is a new 2010 endpoint. The Advisory self-classification list also contains the 2001 results of a systematic assessment of approximately 47,000 EINECS substances for the following endpoint /5/:
For the classification endpoint skin sensitisation, the advisory classifications of the Advisory self-classification list (2001) have been maintained. The reason is that technical issues related to new modelling tools prevented the update of these advisory classifications. The updated advisory list is available as an Excel file for download and as an online searchable database from DK-EPA's website (http://www.mst.dk). The consolidated Advisory self-classification list, including the current 2001, 2009 and 2010 advisory classifications, contains 34,292 chemicals with advisory classifications for one or more of the selected endpoints. The advisory classifications are made by using combinations of (Q)SAR models relevant for each classification endpoint. This report describes the basic methodology used and how specific model predictions have been applied. This report is an update of the report published in October 2009 /62/. One further update of the advisory list is planned: to modify the advisory classifications to meet the classification criteria set out in the new CLP regulation for the classification and labelling of chemicals /7/.

[1] Discrete organic substance means an organic substance with an unambiguous 2D structural formula.

Danish summary (Dansk sammenfatning)

All chemical substances marketed in the EU must be classified and labelled according to the rules in the Danish classification order (klassificeringsbekendtgørelsen, Bek. nr. 329 af 16/5 2002) and the list of dangerous substances (1272/2008/EU, Annex VI, Table 3.2). The list of dangerous substances currently covers about 7,000 substances whose hazard classification has been harmonised in the EU. This means that about 93,000 of the 100,204 existing substances in the EU (the EINECS inventory) have not yet had their hazard classification harmonised. For these substances, it is the manufacturer's or importer's responsibility to apply a correct classification for the intrinsic dangerous properties of the substances (“self-classifications”).
However, for most of these substances there are few or no test results (from animal testing, etc.) on their hazards to humans or the environment. To help address this problem, the Danish EPA published the so-called self-classification list in 2001 /5/, and the present report describes the update of this list. The self-classification list is created using (Q)SAR models ((quantitative) structure-activity relationships), which have been used to predict the intrinsic properties and hazardous effects of chemical substances. The models have been applied in a systematic assessment of 49,292 discrete organic substances[2] from the EINECS inventory for the following effects:
The advisory classifications for mutagenicity, carcinogenicity and danger to the aquatic environment are updates of the advisory classifications on the 2001 self-classification list, and the advisory classifications for reproductive toxicity are from 2009 /62/. The advisory classifications for acute oral toxicity are new 2010 updates of the 2001 advisory classifications, and the advisory classifications for skin irritation are new from 2010. The advisory self-classification list also contains the 2001 results of a systematic assessment of approximately 47,000 EINECS substances for the following effect /5/:
The advisory classifications for sensitisation by skin contact from the earlier advisory self-classification list have been retained. This is because technical issues with the new modelling tools have so far prevented updates of these advisory classifications, even though new modelling tools have been developed and validated in the meantime. The updated list of self-classifications is available via the Danish EPA website (www.mst.dk) as an Excel file for download and as a searchable online database. The consolidated advisory self-classification list, including the current 2001, 2009 and 2010 advisory classifications, contains 34,292 chemical substances with advisory classifications for one or more of the selected effects. The advisory classifications are made using combinations of (Q)SARs relevant for each individual classification. The report describes the basic principles for applying such models and how the models have been applied in practice in this project. This report is an update of a report published in October 2009 /62/. A further update of the advisory classifications on the list is planned, to align them with the classification criteria set out in the new CLP regulation for classification and labelling of chemical substances /7/.

[2] This means organic substances with an unambiguous 2D structural formula.

1 Introduction to classification and (Q)SAR

1.1 Background

When chemical substances are classified in terms of the danger they represent, their inherent properties are assessed on the basis of the knowledge and information available /2, 60/. Such assessments are often carried out on the basis of laboratory test results because the hazard classification criteria to a large extent refer to such results. Assessment must be carried out individually for each property, which means that extensive animal testing may often be required for a single substance.
Thus, complete identification of all the properties for which hazard classification criteria exist at present requires results from many animal studies for just one substance. Given the extensive requirements for data from animal studies in chemical hazard and risk assessment, it is not surprising that lack of test data represents a major problem in the assessment of dangerous properties of chemicals. It is a well-known fact that there are currently few or no test data for a very large fraction of the 100,204 chemical substances on the European INventory of Existing Commercial chemical Substances (EINECS) /3, 4, e.g. 36/. This means that many chemical substances on the European market may have unknown dangerous properties even though they have been used for many years. With the new EU chemicals legislation, REACH, new information demands for chemicals have been imposed in the EU. However, especially for chemicals produced in volumes below 10 tonnes per annum (tpa) per manufacturer or importer in the EU, it is unlikely that test data on a broad spectrum of dangerous properties will be available within the foreseeable future.

With the aid of mathematical modelling, so-called (Quantitative) Structure-Activity Relationships, (Q)SARs, for prediction of properties of chemicals can be established. Classifications based on (Q)SAR-predicted dangerous properties can save time and money if used as an alternative to animal testing, and can increase the level of information for chemicals that will not undergo testing. In 2001 the Danish EPA published the first version of the advisory self-classification list of dangerous substances (denoted AL2001 in the current report) /5/, in which 20,624 substances were assigned advisory classifications according to the following dangerous properties: acute oral toxicity, sensitisation by skin contact, mutagenicity, carcinogenicity, and danger to the aquatic environment.
1.2 Classification of chemicals

Criteria for classification, packaging and labelling of dangerous substances and preparations are harmonised in order to protect public health and the environment and to ensure the free movement of such products /6, 7, 60/. Harmonised hazard labelling allows consumers to recognise dangerous substances and preparations easily and to take adequate measures as regards risk avoidance and safe handling and disposal.

Existing regulation

The present regulation for classification and labelling involves an evaluation of the hazard of a substance or preparation in accordance with Council Reg. 1272/2008/EU /7/ and a communication of that hazard via the label. Classification of a substance or preparation is considered in relation to several endpoints concerning physical-chemical properties, health effects or environmental properties. This evaluation must be made for any substance or preparation manufactured within or imported into the EU and placed on the EU market. Classification and labelling is therefore an essential element of risk management measures for chemicals. All marketed substances and preparations must be evaluated for hazard classification and labelling, irrespective of the quantity placed on the market. The label is the first, and in practice often the only, information on the hazards of a chemical that reaches the user, who may be a consumer or a worker. In addition, the hazard classification has a large number of downstream consequences within EU legislation.

New regulation

Since January 2009 the new CLP regulation on classification, labelling and packaging of substances and mixtures has had legal effect in the EU /7/. This regulation will gradually replace the present regulation for classification and labelling. The new regulation will come into force for single substances on December 1st 2010 and for mixtures on June 1st 2015 /7/.
Until December 1st 2010, substances and mixtures shall be classified, labelled and packaged in accordance with the present legislation, or they can be classified according to the CLP regulation. The CLP regulation is based on the Globally Harmonised System of Classification and Labelling of Chemicals (GHS, UN 2007) /61/. The GHS classification criteria are in certain cases slightly different from those of the current legislation /7/.

1.3 (Q)SARs and their use in chemical assessment

Structure-activity relationships (SARs) and quantitative structure-activity relationships (QSARs), collectively referred to as (Q)SARs, are theoretical models that can be used to predict the physico-chemical, biological (e.g. toxicological) and environmental fate properties of molecules based on their chemical structure. (Q)SAR tools are increasingly used by authorities, e.g. in the US and the EU, as well as by industry, to assess physico-chemical, (eco-)toxicological and fate properties of substances.

REACH

Under the new EU chemicals legislation, REACH, all other options, including use of (Q)SARs, should be considered before performing (or requiring) vertebrate testing /1/. Annex XI of REACH contains the following wording regarding (Q)SARs: Results obtained from valid qualitative or quantitative structure-activity relationship models ((Q)SARs) may indicate the presence or absence of a certain dangerous property. Results of (Q)SARs may be used instead of testing when the following conditions are met:
There will be no formal adoption process for (Q)SARs under REACH. QSAR Model Reporting Formats (QMRFs), which compile information on endpoint, training set, validation results, etc. for individual models, will be gathered in a JRC QSAR Model Database. No fixed criteria will be set for how (Q)SARs should perform to receive regulatory acceptance; instead, a learning-by-doing process is intended to build experience and a common understanding of the use of (Q)SARs in chemical assessments /9/. In the hazard and risk assessment process, (Q)SARs are already often used in combination with other sources of information on chemicals, either to prioritise chemicals for further assessment or to supplement or replace testing. With the implementation of REACH it is expected that (Q)SARs will be used increasingly for the direct replacement of test data, as their use, when available and adequate, is in fact an obligation /9/. The goal of assessing many thousands of chemicals under REACH may not be achievable without the use of (Q)SARs and other non-test methods. Especially for low tonnage chemicals, (Q)SARs and other non-test methods may also give further information beyond the standard information requirements of regulations such as REACH.

2 Creation and use of the advisory self-classification list
Following development of new and/or improved (Q)SAR models, the list of advisory self-classification of dangerous substances has been updated and expanded. This chapter of the report presents the methodology applied.

2.1 The selected dangerous properties

The following endpoints were addressed using (Q)SARs:
(Q)SAR-predictions for these endpoints were used to assign the classifications listed in Table 1.
Table 1: Advisory classifications in the consolidated AL

2.2 The evaluated chemical substances

The overall purpose of the current project was to evaluate as many chemical substances as possible with relevance to the existing regulation for chemicals within the EU. Under REACH /1/ all chemicals with tonnages above 1 ton/year should undergo pre-registration between June 1st 2008 and December 1st 2008. It would have been relevant to evaluate all chemicals in this inventory, but at the time of preparation of the advisory classification list no official list existed. It was therefore decided to base the evaluations on the EINECS list /3, 4/. This list consists of 100,204 entries, covering organic and inorganic substances in both single substance entries (mono-constituent substances) and mixtures (multi-constituent substances and UVCBs). The exercise was limited to "discrete organics", meaning that multi-constituent substances and UVCBs (substances of Unknown or Variable composition, Complex reaction products or Biological materials) were excluded for practical reasons: “if you don’t know what it is, you can’t model it”. Inorganic substances have likewise not been evaluated. These are usually better approached by simpler methods that evaluate the availability of the respective anions and cations with well-known hazard profiles. "Organo-metallics" have also been excluded as being poor candidates for modelling. As an error check, only structural representations that could be successfully converted to 3D were used /10/. Where possible by CAS number comparison, all substances already on the list of formal EU harmonised classifications, Annex I of Directive 67/548/EEC (List of dangerous substances /2/), were also removed. However, as there is no official overview of the substances covered by the group entries in Annex I, and because a chemical may have more than one CAS number, a few chemicals covered by Annex I may not have been removed from AL2009.
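The selection steps described above can be sketched as a simple filter. This is only an illustrative sketch: the `Substance` record, its field names, and the `select_for_modelling` function are hypothetical and are not taken from the project's actual tooling, and the identifiers are placeholders rather than real CAS numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substance:
    """Hypothetical inventory record; field names are illustrative only."""
    cas: str
    is_organic: bool
    is_discrete: bool        # unambiguous 2D structure (not multi-constituent or UVCB)
    is_organometallic: bool
    converts_to_3d: bool     # structure passed the 3D-conversion error check


def select_for_modelling(inventory, harmonised_cas):
    """Keep discrete organics that are modellable and not already harmonised."""
    selected = []
    for s in inventory:
        if not (s.is_organic and s.is_discrete):
            continue  # inorganics, multi-constituent substances and UVCBs
        if s.is_organometallic or not s.converts_to_3d:
            continue  # poor modelling candidates or failed structure check
        if s.cas in harmonised_cas:
            continue  # already has a formal EU harmonised classification
        selected.append(s)
    return selected


# Placeholder identifiers, not real CAS numbers.
inventory = [
    Substance("ID-1", True, True, False, True),
    Substance("ID-2", False, True, False, True),   # inorganic
    Substance("ID-3", True, False, False, True),   # UVCB
    Substance("ID-4", True, True, True, True),     # organometallic
    Substance("ID-5", True, True, False, False),   # 3D conversion failed
    Substance("ID-6", True, True, False, True),    # harmonised
]
kept = select_for_modelling(inventory, harmonised_cas={"ID-6"})
print([s.cas for s in kept])
```

Because the harmonised-list check is a plain CAS lookup, a substance registered under a second CAS number would slip through, which mirrors the caveat noted in the text.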
This resulted in a total of 49,292 discrete organic substances, or about half of all EINECS chemicals, which could be subjected to (Q)SAR-based assessment.

2.3 Test data

For the vast majority of the assessed chemicals no test data were available. However, if test data were available as part of the (Q)SAR model, these were generally used in preference to the estimates. It is important to stress that no attempt was made to search published or unpublished databases for toxicological, ecotoxicological or environmental fate information to determine whether a (Q)SAR was necessary for any endpoint assessed.

2.4 Reliability of (Q)SAR-predictions

The reliability of (Q)SAR predictions depends on numerous parameters relating to the mathematical methods used, the number and precision of the underlying data used for developing the model, and how suitable the model is for the particular substance. In general the uncertainty of (Q)SARs has two main sources: a) the inherent variability of the input data used to establish the model (training set); and b) the fact that a model can only be a partial representation of reality (in other words, it does not model all possible mechanisms concerning a given endpoint and it does not cover all types of chemicals). However, as a model averages the uncertainty over all chemicals, it is possible for an individual model estimate to be more accurate than an individual measurement /9/. The reliability of (Q)SAR predictions can be described in many ways. Usually a range of parameters and concepts are used (see e.g. /9/ for a more extensive review). These concepts may not be known to all readers. Annex 1 contains descriptions of the concepts applied in this report.

2.5 Validation

Validation is a trial of the model performance for a set of substances independent of the training set, but within the domain of the model.
The model predictions for these substances are compared with measured endpoints for the substances in order to establish the predictivity of the model. Ideally all models should be assessed by checking how well they predict the activity of chemicals that were not used to build them. This is, however, not always simple. First, valuable information may be left out of the model by setting aside chemicals for such an evaluation. Second, it can be extremely difficult to assess how “external” chemicals relate to the model’s domain, that is, whether they represent a random distribution within this applicability domain and thereby give a fair picture of the predictivity of the model. This problem is often addressed by using cross-validation, where a number of partial models are “externally validated” by splitting the training set into a reduced training set and a testing set. The reduced training set is used to develop a partial model, while the remaining data are used as a test set to evaluate the model predictivity. This is repeated a number of times and the results are used to calculate the predictivity measures for the models: for quantitative models in the form of Q² and SDEP (standard deviation error of prediction), and for qualitative (yes/no) models in the form of sensitivity, specificity and concordance (see Annex 1 and refs /9/ and /11/ for further details). While drawbacks of cross-validation exist /14, 15/, much of the criticism is directed towards a particular form of cross-validation: leave-one-out cross-validation /14/. The validations carried out on the models applied in this project used the more stable leave-many-out (LMO) cross-validation approach, leaving out random pos/neg balanced sets of 50% of the chemicals, repeated ten times (see also the indicated LMO 50% values in the tables in chapter 3) /13/.
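As a minimal sketch of the qualitative predictivity measures and the balanced 50% split mentioned above, the following uses the common definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), concordance = (TP+TN)/N). The function names and the toy data are assumptions made for illustration; they do not reproduce the validation code or results of the project.

```python
import random


def cooper_statistics(observed, predicted):
    """Sensitivity, specificity and concordance for a yes/no model.

    Assumes both classes occur in `observed` (otherwise a division by zero).
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)
    fn = sum(1 for o, p in pairs if o and not p)
    tn = sum(1 for o, p in pairs if not o and not p)
    fp = sum(1 for o, p in pairs if not o and p)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "concordance": (tp + tn) / len(pairs),
    }


def balanced_half_split(labels, rng):
    """Leave out a random 50% of the positives and 50% of the negatives.

    Returns (train_indices, test_indices); in a leave-many-out scheme this
    split would be redrawn for each repetition (ten in the report).
    """
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    rng.shuffle(pos)
    rng.shuffle(neg)
    test = set(pos[: len(pos) // 2] + neg[: len(neg) // 2])
    train = [i for i in range(len(labels)) if i not in test]
    return train, sorted(test)


# Toy data, invented purely to exercise the functions.
observed = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, True, False]
stats = cooper_statistics(observed, predicted)

labels = [True] * 4 + [False] * 6
train_idx, test_idx = balanced_half_split(labels, random.Random(0))
```

For the toy data the confusion matrix is TP = 2, FN = 1, TN = 4, FP = 1, so concordance is 6/8. Fitting a partial model on `train_idx` and scoring it on `test_idx` is left out, since it depends on the modelling software.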
Leaving out 50% of the chemicals in the partial validation models is a large perturbation of the training set, which generally leads to realistic, and often pessimistic, measures of the predictivity of the model. The commercial models for acute oral toxicity and cancer were validated by external validation /24/. Concordance will vary depending on both the method used and the endpoint in question. In general, contemporary (Q)SAR systems can often correctly predict the activity of about 70 to 85% of the chemicals examined, provided that the query structures are within the domains of the models. This also applies to the models described in this paper. QSAR Model Reporting Formats (QMRFs) for all the toxicity models applied in this project, including training sets for the DK models, have been submitted to the EU JRC QSAR Model Database and the OECD QSAR Application Toolbox /35, 37/.

2.6 Applicability domain

When applying (Q)SARs it is important to ensure that an obtained prediction falls within the domain of the model, i.e. that there is sufficient similarity (in relevant descriptors) between the query substance and substances in the training set of the model. There is no single and absolute applicability domain for a given model /9/. Generally, the more broadly the applicability domain is defined, the lower the predictivity that can be expected. The applicability domain should be clearly defined and the validation results should correspond to this defined domain, which is again used when the model is applied for predictions. The applicability domains for MultiCASE models as defined by the US Food and Drug Administration (FDA) /24/ and implemented in the MultiCASE software were used in this project. No warnings in the predictions were accepted, except a warning for one unknown fragment in chemicals where a significant biophore had been detected. Only positive predictions where no significant deactivating fragments were detected were accepted.
For the acute oral toxicity predictions from the Pharma ToxBoxes, reliability indexes (RI) are given by the software. Based on an analysis performed by DTU Food on an external validation set provided by Pharma Algorithms Inc. / ACD/Labs, an RI cut-off of 0.5 was applied in this project. The EPISUITE models for rapid biodegradation /43, 45/ and the bioconcentration factor in fish /42/ do not automatically flag the predictions for domain coverage. No attempts have been made to consider the applicability domain for predictions made by these models. Depending on the endpoint in question, predictions outside the applicability domain were obtained for between 27 and 58% of the chemicals examined by the individual MultiCASE and Pharma ToxBoxes models.

2.7 Application of the models

It is important to note that the applied models in principle do not predict a "classification"; they predict a biological activity that may lead to a classification. Because of the large number of chemicals involved, “rules” were used for each endpoint to link the biological prediction with a risk phrase. In essence the process is no different from that imposed upon a human expert who must interpret the information available in order to comply with the duty to make an assessment and self-classification. The applied models have been used in combinations (batteries) within the chosen classification endpoints to reach a final call, in an attempt to achieve reliability beyond that of individual model predictions and to best comply with the classification criteria.

2.8 The result

The result of the computer-based assessment is this consolidated advisory self-classification list, which comprises 34,292 chemical substances with advisory classifications for one or more of the dangerous properties selected. The results only represent POSITIVE predictions (for quantitative models “positive” here means predicted to have the effect or property as determined in relation to a cut-off point).
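The way a reliability cut-off interacts with the "positives only" policy can be sketched as below. This is an illustration under stated assumptions: the function name `is_listed` is hypothetical, and treating a prediction with RI exactly equal to 0.5 as acceptable is an assumption, since the text only states that a cut-off of 0.5 was applied.

```python
RI_CUTOFF = 0.5  # cut-off stated in the report for the Pharma ToxBoxes predictions


def is_listed(predicted_positive, reliability_index, cutoff=RI_CUTOFF):
    """Return True only for a positive prediction that passes the RI cut-off.

    Negative predictions and unreliable predictions both return False, so the
    two cases cannot be told apart afterwards.
    Assumption: RI equal to the cut-off counts as reliable.
    """
    return bool(predicted_positive) and reliability_index >= cutoff


print(is_listed(True, 0.7))   # reliable positive
print(is_listed(True, 0.3))   # positive, but below the cut-off
print(is_listed(False, 0.9))  # reliable negative
```

The last two calls return the same value, which is why absence from the list cannot be read as evidence that a substance lacks the property.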
No distinction has been made between a negative prediction for an endpoint and an unreliable prediction (a prediction outside the applicability domain of the model), which was simply discarded. Evaluated substances which are not on the list, or substances which are on the list but without advisory classifications for one or more of the selected dangerous properties, may have been predicted as not having the property or properties in question, or the models may not have been valid for the substance (i.e. predictions were outside the applicability domain of the models). Therefore the advisory list cannot be used to conclude that these substances do not possess dangerous properties. Another important point is that the advisory self-classification list represents (Q)SAR-based identifications of possible hazardous properties of the included chemicals; no attempt has been made to evaluate the risk that these chemicals constitute in their current use in the EU. All results are available on the website of the Danish EPA (www.mst.dk), where searches can be made on substance name (in Danish), CAS number, EINECS number, EINECS name and chemical formula. The whole list can also be downloaded via www.mst.dk as an Excel file.

Figure 1: Number of substances with individual advisory classifications

2.9 How the self-classification list can help manufacturers and importers to comply with the classification duties

By making the advisory self-classification list available to the public, the Danish EPA wishes to offer manufacturers and importers a tool which is based on predictions from sophisticated modelling software. The predictions have been interpreted in relation to the hazard classification criteria and transformed into advisory classifications which are easy to use. They can be applied when carrying out self-classification of chemical substances for the dangerous properties included in the list.
If available, reliable test data or predictions using other non-test methods on specific substances should always be considered in parallel to computer predictions and expert judgements in a weight of evidence (WoE) approach to decide on the appropriate classification for a given endpoint. It is recommended that the list is used in the following way in the classifications of chemicals:
[3] For the endpoint sensitisation by skin contact the methodology undertaken was slightly different, with a start list with somewhat fewer chemicals, etc. Details are given in the documentation report from 2001 /5/.

Technical description of the self-classifications
The current chapter gives a detailed description of how the advisory classifications were assigned to the chemicals in the advisory self-classification list. This includes a description of the classification rules and the (Q)SARs used for predicting the dangerous properties of the chemicals.

2.10 Mutagenicity

The criteria for classification for mutagenicity are divided into 3 different categories: Classification as mutagen, category 1 (Mut1;R46, May cause heritable genetic damage) is based on evidence of a causal association between human exposure to the substance and heritable genetic damage. Classification as mutagen, category 2 (Mut2;R46, May cause heritable genetic damage) is based on animal studies showing mutagenicity to germ cells, either in assays on germ cells or by demonstrating mutagenic effects in somatic cells in vivo or in vitro together with metabolic proof that the substance reaches the germ cells. Classification as mutagen, category 3 (Mut3;R68, Possible risks of irreversible effects) is based either on in vivo mutagenicity tests or on cellular interactions, with in vitro tests acting as supportive evidence. For this classification, it is not necessary to demonstrate germ cell mutations.

(Q)SAR based evaluation

Five models predicting genotoxicity in vivo endpoints were applied in the screening. Data for the training sets were obtained from the literature. The technical specifications for the models are given in Table 2.

Drosophila melanogaster Sex-Linked Recessive Lethal (SLRL) (in vivo)

The training set consists of data from Lee et al. /16/. In the experimental method, Drosophila melanogaster males and females are used. Males are treated with the test substance and mated individually to virgin females. The test detects the occurrence of mutations, both point mutations and small deletions, in the germ line of the insect. The mutations are phenotypically expressed in males carrying the mutant gene.
When the mutation is lethal in the hemizygous condition, its presence is inferred from the absence of one class of male offspring out of the two that are normally produced by a heterozygous female. The assay has a low sensitivity for genotoxins other than direct-acting agents and simple promutagens, but a very high specificity, which means that in general a positive result has considerable value for prediction of potential genotoxicity in mammals.

Mutations in mouse micronucleus (in vivo)

The training set includes data from Hayashi et al. /17/, Mavournin et al. /18/, Waters et al. /19/, and Morita et al. /20/. The test detects micronuclei produced by damage to the chromosomes or the mitotic apparatus in red blood cells. Micronuclei are small nuclei produced during cell division. They contain chromosome fragments or whole chromosomes. In the test, mice are exposed to the test substance and young red blood cells (erythrocytes) from the bone marrow are isolated and analysed for micronuclei. The test is especially relevant for assessing mutagenic hazard in that it allows consideration of in vivo metabolism, pharmacokinetics and DNA-repair processes.

Dominant lethal effect in rodents (in vivo)

The training set comprises data from Green et al. /21/ and other references. In the experimental method, mice and rats are used. Treated males are mated to virgin females according to an experimental scheme. Females are sacrificed in the second half of pregnancy and uterine contents are examined to determine the number of implants and live and dead embryos. The category of early embryonic deaths is the most significant index of dominant lethality and as such is used as the endpoint. The test identifies major genetic damage, mainly the induction of structural and numerical chromosomal anomalies.

Sister chromatid exchange in mouse bone marrow (in vivo)

Data from Tucker et al. /22/ are used in the training set.
The sister chromatid exchange (SCE) assay detects interchange of DNA between two sister chromatids of a duplicating chromosome. Mice are exposed to the test chemical. Then a thymidine analogue, bromodeoxyuridine (BrdU), is injected. If DNA exchanges occur, BrdU can be identified by use of a fluorescence technique in chromosomes in the metaphase. The test is considered to be a sensitive method for evaluating mutagenicity and may be an indicator of carcinogenicity.

Comet assay in mouse (in vivo)

The training set includes data from Sasaki et al. /23/ plus a number of physiological chemicals theoretically assumed not to have the effect (such as various amino acids, sugar molecules, fatty acids, etc.). The latter were included to get a better distribution between positives and negatives in the training set for the model. Included in the training set of the model are results from eight tissue types: stomach, colon, liver, kidney, bladder, lung, brain and bone marrow. The comet assay detects DNA strand breaks and can be applied to virtually any organ of interest. In the experimental test, a microgel electrophoretic technique is used for detecting DNA damage at cell level. The tested chemical is positive if it produces breaks in DNA strands, resulting in small fragments of DNA that are able to migrate further in a microgel than intact DNA strands. Under the microscope, damaged DNA is seen as a “comet” while undamaged DNA appears as a dot. If appropriately performed, the test has been shown to be reliable, with high sensitivity for detecting DNA damage in organs that cannot be investigated in other classical mutagenicity assays.
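The five in vivo models described above are combined into a single call for the mutagenicity endpoint. A minimal sketch of such a battery rule follows, based on the criteria stated for this endpoint (positive prediction in two or more models, no significant deactivating fragments, and positive test data in a training set taking precedence). The `ModelResult` record and `mutagenicity_call` function are hypothetical names, and returning the advisory classification directly when experimental data are positive is one reading of "took precedence", not a confirmed implementation detail.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Hypothetical per-model outcome for one substance."""
    positive: bool
    in_domain: bool
    deactivating_fragments: bool = False
    experimental_positive: bool = False  # positive test result found in a training set


def mutagenicity_call(results, min_positive_models=2):
    """Return the advisory classification string, or None if criteria are not met."""
    # Assumed reading: positive test data override the model predictions.
    if any(r.experimental_positive for r in results):
        return "Mut3;R68"
    accepted = [
        r for r in results
        if r.positive and r.in_domain and not r.deactivating_fragments
    ]
    if len(accepted) >= min_positive_models:
        return "Mut3;R68"
    return None  # negative or unreliable; the two are not distinguished


two_hits = [ModelResult(True, True), ModelResult(True, True), ModelResult(False, True)]
one_hit = [ModelResult(True, True), ModelResult(True, False), ModelResult(False, True)]
print(mutagenicity_call(two_hits))
print(mutagenicity_call(one_hit))
```

Requiring two model hits where a single positive in vivo test would suffice reflects the extra caution applied to predictions compared with measured data.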
Table 2: Technical summary for the mutagenicity models
Figure 2: Schematic diagram illustrating the systematic evaluation applied to assign advisory classifications for mutagenicity.
For a substance to be selected as a probable mutagen it was necessary for the following criteria to be fulfilled: a positive prediction in two or more models, accepting only predictions where no significant deactivating fragments were detected. If one or more positive tests could be seen (as part of the training sets for the models) for any genotoxicity endpoint, this took precedence over model predictions. When classification is proposed on the basis of test data, a positive result in a single in vivo test is sufficient evidence on which to base the classification. In contrast, a classification based on model predictions required positive predictions in at least two models. 5,742 of the chemicals investigated in the current project met the criteria in the systematic evaluation and were assigned advisory classifications Mut3;R68.
2.11 Carcinogenicity
This endpoint can result in classification in three different categories: Classification as carcinogen in category 1 (Carc1;R45, Toxic; May cause cancer, or Carc1;R49, Toxic; May cause cancer by inhalation) is based on a strong causal relationship in humans. Classification as carcinogen in category 2 (Carc2;R45, Toxic; May cause cancer, or Carc2;R49, Toxic; May cause cancer by inhalation) is based on conclusive animal data from two species, or from one species with supportive evidence such as genotoxic effects in vitro or in vivo. Classification as carcinogen in category 3 (Carc3;R40, Harmful; Possible risks of irreversible effects) is subdivided into two:
(Q)SAR based evaluation
Four models predicting carcinogenicity in vivo and models predicting three in vitro genotoxicity endpoints were applied in the screening. Commercial MultiCASE training sets constitute the basis of the carcinogenicity models. The technical specifications for the models are given in Table 3.
Carcinogenicity male and female, rats and mice (in vivo)
The models are the MultiCASE commercial models AG1-4 /24/. The training sets were constructed using the NTP (US National Toxicology Program) rodent carcinogenicity database, the Lois Gold Carcinogen Potency Database, FDA/CDER (US Food and Drug Administration / Center for Drug Evaluation and Research) archives, and the scientific literature. Training sets include both non-proprietary and proprietary data. Proprietary (confidential) data constitute around ten percent of the training sets. The open models based on the non-proprietary data were also available and consulted in the screening process. In the experimental test, the test substance is administered by an appropriate route to the animals for a major portion of their lifespan. The highest dose level should elicit signs of toxicity without substantially altering the normal lifespan due to effects other than tumours. During and after exposure, the animals are observed daily to detect signs of toxicity, particularly the development of tumours.
Reverse mutation test, Ames (in vitro)
The training set is from Kazius et al. /25/. The bacterial reverse mutation test detects point mutations, which involve substitution, addition or deletion of one or a few DNA base pairs. Amino acid (histidine) requiring strains of Salmonella typhimurium are used. The test detects mutations that revert the mutations present in the test strains and restore the functional capability of the bacteria to synthesise the amino acid (histidine). Revertants are recognised by their ability to grow in the absence of the histidine required by the parent test strain.
The test is a useful tool as an initial screen for potential in vivo genotoxic activity, and has become the most extensively used in vitro short-term test in the screening for mutagenicity.
Chromosomal aberration CHO/CHL (in vitro)
This model was used by Niemela and Wedebye /28/ to evaluate the OECD principles for development and validation of (Q)SARs /27/. The Chinese Hamster Ovary (CHO) model is the commercial MultiCASE model A61 /26/ and the training set for the Chinese Hamster Lung (CHL) model was taken from Ishidata /28,29/. The in vitro mammalian chromosome aberration test identifies agents that cause structural chromosome aberrations in cultured cells. Chromosome damage is expressed as breakage of single or both chromatids, sometimes followed by reunion between chromatids or of both chromatids at an identical site. Many compounds that are positive in this test are mammalian carcinogens causing DNA damage.
Mutations in mouse lymphoma (in vitro)
The training set comprises data from Grant et al. /30/. The mouse lymphoma assay detects mutations affecting the heterozygous thymidine kinase (TK) locus. It identifies chemicals acting as clastogens (which delete, add, or rearrange chromosome sections) as well as point mutagens. Mutations in genes coding for TK are identified. TK is involved in the phosphorylation of thymidine and subsequently in the formation of DNA. Positive chemicals may give rise to mutations in genes coding for TK. A mutation may result in loss of the ability to phosphorylate pyrimidine analogs, which is detected by the test. The assay has a reputation for high sensitivity and low specificity in detecting genotoxic agents. However, in this exercise the model is used to provide mechanistic information on chemicals already predicted to be carcinogens.
Table 3: Technical summary for the carcinogenicity models
Identification of carcinogenic substances
For a substance to be selected as a probable carcinogen it was necessary for the following criteria to be fulfilled: positive according to the ICSAS methodology /24/, corresponding to two or more positive carcinogenicity predictions, accepting only predictions for chemicals without significant deactivating fragments. If one or more positive tests were observed (as part of the training sets for the models) for any cancer endpoint, this took precedence over model predictions. As the models are heavily biased towards making a correct prediction for the substances used to build them, the latter criterion resulted in only little change. However, it was felt that there was no reason to artificially reduce the quality of the advisory classification by neglecting to use data that happen to be present. One or more negative tests in the training set of each model also took precedence over predictions of that model, except in cases where positive training set tests were present in other cancer models. Employing this carcinogenicity identification algorithm resulted in a list of 3,726 positive predictions.
Figure 3: Schematic diagram illustrating the systematic evaluation applied to assign advisory classifications for carcinogenicity.
Identification of genotoxic carcinogens
While there are many non-genotoxic carcinogens acting by a wide variety of often unknown mechanisms, it was chosen to focus here on chemicals likely to cause cancer through a genotoxic mechanism. Therefore, a further selection criterion for genotoxicity was set up. In contrast to the selection criteria for mutagenicity, it was taken into account that not all genotoxic carcinogens are necessarily clastogenic (cause loss, addition or rearrangement of parts of chromosomes).
To select the genotoxic chemicals from the chemicals already predicted positive for in vivo carcinogenicity, which include genotoxic as well as non-genotoxic carcinogens, a battery of models for sensitive in vitro genotoxicity endpoints was used. The genotoxicity criterion was a positive estimate in one or more of the models for the following in vitro genotoxicity endpoints: reverse mutation test (Ames), chromosomal aberrations (CHO/CHL), or mutations in mouse lymphoma. A schematic diagram of the systematic evaluation is given in Figure 3. According to these criteria, 3,726 of the chemicals assessed in the current project were identified as genotoxic carcinogens and selected for advisory classification for carcinogenicity. It is not felt that the models employed allow discrimination between classification in the three categories, so the lowest classification, Carc3;R40, was applied in all cases.
2.12 Reproductive toxicity
This endpoint can result in classification in three different categories: Classification as toxic to reproduction in category 1 (Rep1;R60, Toxic; May impair fertility, or Rep1;R61, Toxic; May cause harm to the unborn child) is based on a strong causal relationship in humans. Classification as toxic to reproduction in category 2 (Rep2;R60, Toxic; May impair fertility, or Rep2;R61, Toxic; May cause harm to the unborn child) is based primarily on animal data, and secondly on “other relevant information”. Data from in vitro studies, or studies on avian eggs, are regarded as “supportive evidence” and would only exceptionally lead to classification in the absence of in vivo data. Classification as toxic to reproduction in category 3 (Rep3;R62, Harmful; Possible risks of impaired fertility, or Rep3;R63, Harmful; Possible risk of harm to the unborn child) is based primarily on animal data, and secondly on “other relevant information”. Substances in category 3 are insufficiently investigated but raise concern for man.
Classification for reproductive toxicity covers a wide range of effects, either on fertility or on the developing organism before and after birth (structural or functional damage). The (Q)SAR models applied in the current project cover some, but far from all, types of harm to the unborn child. Hence only certain types of mechanisms causing malformations or foetal mortality are covered. No (Q)SAR models were used for effects concerning other types of developmental toxicity and fertility.
(Q)SAR based evaluation
Three models predicting in vivo teratogenicity or fetal lethality related endpoints were applied in the assessment. A commercial MultiCASE training set constitutes the basis of one model. Data for the training sets for the two other models were obtained from the literature. The technical specifications for the models are given in Table 4.
Teratogenic risk (in vivo)
The model is the MultiCASE commercial model A49 /31/. The training set is composed of data taken from TERIS (Teratogen Information System) and a compilation in which the FDA (US Food and Drug Administration) definitions were used to quantify risk of developmental toxicity from drugs used during pregnancy. The training set consists of clinical and epidemiologic data. Many biological mechanisms are involved in the effects.
Drosophila melanogaster SLRL effect (in vivo)
The training set consists of data from Lee et al. (1983) /32/. In the experimental method, Drosophila melanogaster males and females are used. Males are treated with the test substance and mated individually to virgin females. The test detects the occurrence of mutations, both point mutations and small deletions, in the germ line of the insect. The mutations are phenotypically expressed in males carrying the mutant gene. When the mutation is lethal in the hemizygous condition, its presence is inferred from the absence of one class of male offspring out of the two that are normally produced by a heterozygous female.
The assay has a low sensitivity for genotoxins other than direct-acting agents and simple promutagens, but a very high specificity.
Dominant lethal effect in rodents (in vivo)
The training set comprises data from Green et al. (1985) /33/ and other references /21/. In the experimental method, mice and rats are used. Treated males are mated to virgin females according to an experimental scheme. Females are sacrificed in the second half of pregnancy and uterine contents are examined to determine the number of implants and live and dead embryos. The category of early embryonic deaths is the most significant index of dominant lethality and is therefore used as the endpoint. The test identifies major genetic damage, mainly the induction of structural and numerical chromosomal anomalies.
Table 4: Technical summary for the models for reproductive toxicity.
The dominant lethal test in rodents and the Drosophila SLRL test were originally intended to detect genotoxic effects on germ cells, but the resulting effects are early embryonic deaths and lethal effects on offspring, respectively. Therefore, the endpoints are relevant for reproductive toxicity assessment. In many cases, a toxicological threshold is assumed to exist for reproductive toxicity. With mutagenic chemicals this may not be the case.
Figure 4: Schematic diagram illustrating the systematic evaluation applied to assign advisory classifications for reproductive toxicity.
For a substance to be selected as probably toxic to reproduction in the assessment, the criterion was a positive prediction in any of the three models, with no negative prediction in the model for teratogenic risk in humans (see Figure 4) (see also /34/). The screening resulted in a list of 4,036 positive predictions. The models employed do not allow discrimination between classification in the three classification categories, so the lowest classification, Rep3;R63, was applied in all cases.
2.13 Acute oral toxicity
The formalized criteria for classification for acute oral toxicity include a number of test options, including the fixed-dose procedure, and the interpretation of various sources of information about acute oral toxicity. Classification is, however, often based on acute LD50 tests in the rat, for which the following classification criteria are used:
Table 5: EU criteria for classification for acute oral toxicity
2.13.1 (Q)SAR based evaluation
If test results measured in the rat were readily available (had been used to make the model), these took precedence over any predictions. Moreover, as acute toxicity data from the mouse following a variety of different routes of administration were also available in some cases, these were used to predict rat oral LD50 values using QAARs (Quantitative Activity-Activity Relationships), preferentially as follows /63,64/:
iv: intravenous
Table 6: QAAR equations for acute oral toxicity correlating mouse and rat data by different routes
Biological data consisting of LD50 values in mice or rats were available for about 15% of the chemicals processed. If no test data were available, rat oral LD50 was estimated using the Pharma Algorithms Inc. ToxBoxes (vers. 2.9) acute toxicity model for LD50 in rat (oral), which is based on RTECS (Registry of Toxic Effects of Chemical Substances) and ESIS (European chemical Substances Information System) data for 8,631 substances /65, 67/. In the Pharma ToxBoxes, predictions of LD50 are given together with applicability domain estimates in the form of a Reliability Index (RI), which takes into account the similarity of the query compound to the training set, the difference between predicted LD50 and experimental values for similar compounds, and the consistency of experimental values for similar compounds. An external validation of this model using a test set with 2,167 tests from Pharma Algorithms gave a multiple R-squared of 0.524. With an RI cut-off of 0.5, the R-squared went up to 0.639 for the 1,332 tests that met this cut-off. This is a significant improvement over the TOPKAT model, which was applied as the basis for advisory classifications for acute oral toxicity in the 2001 version of the list. This TOPKAT model was evaluated by external validation with 1,840 chemicals, resulting in an R-squared of 0.31.
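The order of precedence described above (measured rat data, then QAAR estimates from mouse data, then a model prediction with an acceptable Reliability Index) can be illustrated with a minimal sketch. The record type, the function names and the RI cut-off of 0.5 as an acceptance rule are assumptions made for illustration only; the text reports the cut-off in the validation exercise but does not state it as the rule used for the list. The LD50 limits in the sketch (25, 200 and 2000 mg/kg) are the limits commonly cited for the former EU criteria and should be checked against Table 5.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed LD50 limits in mg/kg body weight (rat, oral); verify against Table 5.
VERY_TOXIC_LIMIT = 25.0   # at or below this value: Tx;R28
TOXIC_LIMIT = 200.0       # at or below this value: T;R25
HARMFUL_LIMIT = 2000.0    # at or below this value: Xn;R22
RI_CUTOFF = 0.5           # assumed acceptance threshold for model predictions


@dataclass
class AcuteOralData:
    """Hypothetical record; a field is None when the value is not available."""
    measured_rat_oral_ld50: Optional[float] = None
    qaar_rat_oral_ld50: Optional[float] = None  # estimated from mouse data
    predicted_ld50: Optional[float] = None      # (Q)SAR model prediction
    reliability_index: Optional[float] = None


def select_ld50(data: AcuteOralData) -> Optional[float]:
    """Pick the LD50 to use: test data first, then QAAR, then prediction."""
    if data.measured_rat_oral_ld50 is not None:
        return data.measured_rat_oral_ld50
    if data.qaar_rat_oral_ld50 is not None:
        return data.qaar_rat_oral_ld50
    if (data.predicted_ld50 is not None
            and data.reliability_index is not None
            and data.reliability_index >= RI_CUTOFF):
        return data.predicted_ld50
    return None


def classify_acute_oral(data: AcuteOralData) -> Optional[str]:
    """Return an advisory classification, or None if none is assigned."""
    ld50 = select_ld50(data)
    if ld50 is None or ld50 > HARMFUL_LIMIT:
        return None
    if ld50 <= VERY_TOXIC_LIMIT:
        return "Tx;R28"
    if ld50 <= TOXIC_LIMIT:
        return "T;R25"
    return "Xn;R22"
```

In this sketch a measured value always overrides a prediction, so a substance with a measured LD50 of 150 mg/kg is classified T;R25 regardless of what the model predicts.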
Table 7: Technical summary for the Pharma ToxBoxes acute toxicity model
In modern acute oral toxicity tests using small numbers of animals, statistical variation is often within a factor of 2-4, and inter-laboratory variations of up to an order of magnitude are not uncommon /66/. The accuracy of the Pharma model is considered to be sufficient to differentiate between the three different levels of acute toxicity (“harmful”, “toxic” and “very toxic”). A schematic diagram of the systematic evaluation is given in Figure 5. This resulted in 13,873 substances with an advisory classification of Xn;R22, 1,184 substances with an advisory classification of T;R25 and 168 substances with an advisory classification of Tx;R28. In total, 15,225 substances were assigned advisory classifications for acute oral toxicity.
Figure 5: Diagram illustrating the systematic evaluation used to assign advisory classifications for acute oral toxicity
2.14 Sensitisation by skin contact
The current advisory classifications for sensitisation by skin contact originate from 2001 and have not been updated. The general documentation on the assessments undertaken (start list, criteria for application domain etc.) can be found in the documentation report from 2001 /5/. No attempt has been made to search for and exclude chemicals with advisory classifications for skin sensitisation that have received harmonised EU classifications since 2001. The general advice on the use of the advisory self-classification list is to first check whether the substance in question has harmonised EU classifications and, if so, to classify accordingly. Classification as sensitising by skin contact, R43 (“May cause sensitisation by skin contact”), is based either on animal studies or practical experience or combinations thereof. The animal criterion is based on either an adjuvant or a non-adjuvant test. Different adjuvant tests exist, but the Magnusson-Kligman method (GPMT: Guinea Pig Maximization Test) is preferred.
A response in at least 30% of the animals results in classification. For a non-adjuvant test (for example the Buehler test), a response in at least 15% of the animals is regarded as positive. The human data can be results from patch testing, case studies or epidemiological studies.
2.14.1 (Q)SAR based evaluation
Two approaches were used to estimate contact sensitisation /68,69/. The first approach uses two TOPKAT QSTR models. The first model was used to predict “Allergy versus non-allergy” and, in cases where this was positive, the second model was used to predict “Strong versus weak/moderate allergy”. The models used were primarily related to the GPMT. Only predictions of “Strong allergy” were considered likely to fulfil the EU criteria for R43. In the second approach, predictions were made using MultiCASE. The data set used to produce the MultiCASE models differed somewhat from the TOPKAT set, in that both GPMT data and human data were represented. Only positive predictions with MultiCASE scores of > 40 (corresponding to “very active”) were selected.
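The two approaches described above can be written as two small decision functions. This is a sketch only: the function names and the way model outputs are passed in are hypothetical and do not reflect the TOPKAT or MultiCASE software interfaces. Only the decision rules (a second-stage result of “Strong allergy” after a positive first stage, and a MultiCASE score above 40) are taken from the text.

```python
from typing import Callable

MULTICASE_VERY_ACTIVE_SCORE = 40.0  # scores above this value count as "very active"


def topkat_indicates_r43(
    substance: str,
    predicts_allergy: Callable[[str], bool],
    predicts_strong_allergy: Callable[[str], bool],
) -> bool:
    """Two-stage rule: the second model is consulted only if the first is positive.

    Both callables are hypothetical stand-ins for the two TOPKAT models.
    """
    if not predicts_allergy(substance):
        return False
    return predicts_strong_allergy(substance)


def multicase_indicates_r43(score: float) -> bool:
    """A positive prediction is selected only when the score exceeds 40."""
    return score > MULTICASE_VERY_ACTIVE_SCORE
```

Passing the models in as callables keeps the second-stage model from being evaluated for substances that the first stage already predicts to be non-allergens, which mirrors the order of the evaluation.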
Table 8: Technical summary for the models for sensitisation by skin contact
External validation of both TOPKAT and MultiCASE models was also attempted using confidential results from the EU New Chemicals program. Using the two-stage TOPKAT model (n = 64 AOK[4] predictions), 67% of positives and 77% of negatives were correctly identified. For MultiCASE (n = 75 AOK predictions), 45% of positives and 81% of negatives were correctly predicted /70/. It is difficult to know how representative “New Chemicals” are with regard to the universe of Existing Chemicals (EINECS). Generally, “New Chemicals” are more complex structures with higher molecular weights. Perhaps the most surprising aspect of this exercise was to find that, for over three thousand chemicals that should have been assessed for this endpoint, such a tiny percentage of useful test data could be found. Compounds predicted as positive by either TOPKAT or MultiCASE according to the above criteria were selected, provided that they were either AOK in the former, or contained no unknown fragments or equivocal results in the latter. While it was considered to use “positive” in both models as a criterion, in the end this seemed inefficient, not so much due to lack of concordance between model predictions, but because the acceptance domains (AOK or all fragments known) of the two methods differed considerably. No attempt was made to further reduce the list by systematically applying expert judgment. A schematic diagram of the systematic evaluation is given in Figure 6.
Figure 6: Schematic diagram illustrating the systematic evaluation applied to assign advisory classifications for sensitisation by skin contact
9,669 chemicals met the above criteria and were assigned an advisory classification of R43. This may strike many experts as a rather large number of chemicals, and while these models represent the current “state-of-the-art”, it may indicate that they are over-sensitive.
However, it was very difficult to obtain any reliable indication of how many Existing Chemicals would cause contact allergy if actually tested in animals or humans. Estimates of percentages of allergens on EINECS ranged from 5-25%, with some preference being expressed for 10%, which is the proportion of Annex I (now Annex VI of 1272/2008/EU) substances currently classified for this effect. It is not possible, however, to estimate the influence of confounders on the distribution represented in Annex I. Positive bias may have been introduced because chemicals testing positive are over-represented. Negative bias may have been caused by the fact that most of the chemicals have never been tested at all. The question of numbers remains open.
2.15 Skin irritation
Substances which cause significant inflammation of the skin, determined on the rabbit according to the cutaneous irritation Annex V test method (persisting for ≥24 h after an exposure of ≤4 h), should be classified for skin irritation with Xi;R38 (Irritating to skin).
2.15.1 (Q)SAR based evaluation
If test results measured in the rabbit were readily available (had been used to make the model), these took precedence over any predictions. Positive test data for rabbits were available for 213 of the chemicals processed. If no test data were available, skin irritation was estimated according to the DK MultiCASE model for severe skin irritation vs. mild skin irritation. The training set for the model includes data from RTECS /71/ on 701 chemicals[5], HSDB /72/ on 31 chemicals[6], EU Annex I classifications for a total of 56 chemicals[7], and expert judgments for certain groups of chemicals for a total of 49 chemicals[8]. As the model training set contains information on both skin irritation and corrosion, positive predictions from the model may in reality be due to either of the effects.
Table 9: Technical summary for the model for skin irritation The software used in the current project is unable to predict the properties of ionized compounds (salts) and therefore predictions have not been made for ionized compounds, as skin irritation is a local effect, which can be highly sensitive to pH. A schematic diagram of the systematic evaluation is given in figure 7.
Figure 7: Schematic diagram illustrating the systematic evaluation used to assign advisory classifications for skin irritation
This resulted in 8,005 substances, which were assigned an advisory classification of Xi;R38. As the model does not discriminate between strong irritants and corrosive chemicals, the advisory classifications based on the predictions from the model should be considered as “minimum classifications”.
2.16 Danger to the aquatic environment
The classification criteria are composed of three main elements: 1) potential for rapid degradation, 2) bioconcentration potential in fish, and 3) short-term toxicity to aquatic organisms (fish, daphnia, and algae). Classifications are assigned according to the following scheme:
Table 10: EU criteria for classification for danger to the aquatic environment
* The lowest effect concentration, EC50, for fish, daphnia or algae is used
(Q)SAR based evaluation
Advisory classifications were assigned on the basis of combinations of estimates for ready biodegradability, bioconcentration and acute toxicity according to the criteria in Table 10. Classification with risk phrase R53 alone was not done in this exercise, as the strong co-linearity between water solubility and bioconcentration factor made it redundant. It is noted that, whereas the classification criteria allow abiotic degradation (and assessment of primary degradation products for their environmental hazard classification) to be used, only predictions concerning potential for rapid biodegradation were employed here. Furthermore, only predictions for bioconcentration in fish were used, even though the classification criteria refer to the use of log Kow when reliable measured BCF data in fish are not available.
Biodegradation
Biodegradability was estimated using the Syracuse BIOWIN program /43/. Only the non-linear equation for rapid/non-rapid biodegradation (BPP2) was applied. Previous validation of this parameter against 304 MITI “ready/not-ready” (45:259) results showed that, while a relatively high percentage of “not-ready” chemicals were missed (sensitivity was 53%), 97% of “not-ready” predictions were correct (PPV, Positive Predictive Value) in this “chemical universe” of 85% not-ready chemicals /44/. MITI data were also applied by Tunkel et al. /41/, who found a sensitivity of 53%, a specificity of 86% and a PPV of 83% for 884 chemicals (385 ready:499 not-ready).
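The figures reported for the validation by Tunkel et al. are mutually consistent when “not-ready” is treated as the positive class, which can be checked with a short calculation. The sketch below is illustrative only; the function name is hypothetical, and the inputs are the rounded percentages quoted above rather than the underlying counts.

```python
def positive_predictive_value(
    sensitivity: float,
    specificity: float,
    n_positive: int,
    n_negative: int,
) -> float:
    """PPV = true positives / (true positives + false positives)."""
    true_positives = sensitivity * n_positive
    false_positives = (1.0 - specificity) * n_negative
    return true_positives / (true_positives + false_positives)


# Tunkel et al.: 499 not-ready (positive class) and 385 ready chemicals,
# sensitivity 53%, specificity 86%.
ppv = positive_predictive_value(0.53, 0.86, n_positive=499, n_negative=385)
print(round(ppv, 2))
```

With these inputs the result rounds to 0.83, in line with the reported PPV of 83%. Because the PPV depends on the share of not-ready chemicals in the data set, it is higher in a set dominated by not-ready chemicals than in a more balanced one.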
These findings were largely confirmed in a comparison exercise made by the Danish EPA and based on chemicals assessed at OECD (SIAM 11-18), where 128 chemicals (59 ready:69 not-ready) that were not part of the BPP2 training set indicated a sensitivity of 54%, a specificity of 85% and a positive predictive value of 80% /38/. In other words, while this model may fail to identify around half of all “not-ready” substances, the number of false predictions of not-ready biodegradability will be very low. A total of 11,766 chemicals of the 49,292 chemicals studied were found to be “not-readily degradable” according to this criterion.
Bioconcentration
The classification and labelling guidance prefers measured data for bioconcentration, but as these are rarely available, a Log Kow of greater than three is recommended as an indication that BCF will be 100 or greater, in accordance with the linear equation of Veith /55/. While a good rule-of-thumb, this relation both over- and underestimates BCF for many classes of chemicals, and it is only applicable in the Log Kow interval 2-6. Bioconcentration was therefore predicted using Syracuse BCFWIN /42/, a method based on a combination of Log Kow relations and structural fragment categories. This method was evaluated by its authors as having a statistical accuracy of R² = 0.74 (n = 694, S.D. = 0.65, mean error = 0.47), which is a significant improvement over the standard equation of Veith (log BCF = 0.85 * Log Kow – 0.70), where predictions for the same 694 compounds had a statistical accuracy of R² = 0.32 (S.D. = 1.62, mean error = 1.12). No attempt was made to further assess bioaccumulation potential. Among the chemicals predicted to have aquatic toxicity concentrations below 10 mg/L and to be readily biodegradable, 4,662 were predicted to have BCF estimates equal to or greater than 100.
Acute toxicity
For aquatic toxicity classifications, it is recommended to use L(E)C50 values for fish, daphnia and algae.
Aquatic toxicity to fish, daphnia and algae was predicted using three models and a theoretical equation.
Fish
For acute aquatic toxicity to fish, a DK MultiCASE model using 96h LC50 data on 569 chemicals from the Duluth Fathead minnow database was applied /48/. Cross-validation of this model gave an R² of 0.735. As there were insufficient test data for very hydrophobic substances, the MultiCASE model was only applied for chemical substances with Log Kow of 6 or less.
Daphnia
For acute aquatic toxicity to daphnia, a DK MultiCASE model using 48h EC50 data on 641 chemicals from various sources was applied /49/. Cross-validation of this model gave an R² of 0.69. As there were insufficient test data for very hydrophobic substances, the MultiCASE model was only applied for chemical substances with Log Kow up to 7.
Algae
For acute aquatic toxicity to algae, a DK MultiCASE model using EC50 data on 531 chemicals (396 tests made at the Technical University of Denmark for the Danish EPA, plus literature data from various sources) /50/ was applied. Cross-validation of this model gave an R² of 0.74. A regression equation was used on top of the MultiCASE predictions to adjust for the Log Kow contribution to the toxicity:
Log EC50 (μM) = 0.593 * Log EC50 (MultiCASE prediction, μM) – 0.257 * Log Kow + 1.076
N = 343, R² = 0.743, S.E. = 0.853
(Log Kow below –1 were set to –1, Log Kow above 7 and less than or equal to 8 were set to 5, and Log Kow above 8 were set to 1)
As there were insufficient test data for very hydrophobic substances, the MultiCASE model was only applied for chemical substances with Log Kow of up to 8.
Non-polar narcosis predictions for highly hydrophobic substances
Another relationship was used for chemicals with a Log Kow of greater than six.
Here, all substances were assumed to act by non-polar narcosis (minimum or baseline toxicity), and toxicity at dynamic equilibrium (or steady state) was estimated according to a relation to the predicted bioconcentration factor in small fish:
LC50 (equilibrium) = 8.15 mmol / BCF
The choice of 8.15 mmol corresponds to the theoretical level inducing aquatic lethal effects represented by the non-polar narcosis fish (Q)SAR recommended in the REACH guidance /51/. Non-polar narcosis Lethal Body Burdens for fish are generally assumed to be within the range of about 2–8 mmol /53/. While simple Log Kow relationships exist for predicting the non-polar narcotic toxicity for fish, daphnia and algae, these do not distinguish specific toxicities unique to any of the three taxa, and were not felt to offer any advantage over using the fish models alone, which also adequately predict non-polar narcosis. For all practical purposes, non-polar narcosis induces effects at the same concentration levels in all three taxa for chemicals with these high Log Kow values.
Aquatic toxicity screening
Using the three MultiCASE models and the non-polar narcosis equation, 18,809 of the chemicals assessed in the current project had acute aquatic toxicities of ≤ 100 mg/L.
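The two equations used in this section, the Log Kow adjustment of the algae prediction and the non-polar narcosis relation, can be sketched as follows. The function names are hypothetical. The Log Kow substitution rules are implemented exactly as they are stated in the text, and the unit handling for the narcosis relation (8.15 taken as mmol/kg and BCF as L/kg, giving mmol/L) is an assumption, since the text gives the constant only as “8.15 mmol”.

```python
def substituted_log_kow(log_kow: float) -> float:
    """Apply the Log Kow substitutions stated for the algae regression."""
    if log_kow < -1.0:
        return -1.0
    if log_kow > 8.0:
        return 1.0
    if log_kow > 7.0:  # above 7 and less than or equal to 8
        return 5.0
    return log_kow


def adjusted_algae_log_ec50(multicase_log_ec50_um: float, log_kow: float) -> float:
    """Log EC50 (uM) after adjusting the MultiCASE prediction for Log Kow."""
    return (0.593 * multicase_log_ec50_um
            - 0.257 * substituted_log_kow(log_kow)
            + 1.076)


def narcosis_lc50_mmol_per_l(bcf: float) -> float:
    """Equilibrium LC50 from the predicted BCF (assumed units: see lead-in)."""
    if bcf <= 0:
        raise ValueError("BCF must be positive")
    return 8.15 / bcf


def mmol_per_l_to_mg_per_l(conc_mmol_per_l: float, molar_mass_g_per_mol: float) -> float:
    """Convert mmol/L to mg/L for comparison with limits given in mg/L."""
    return conc_mmol_per_l * molar_mass_g_per_mol
```

As an example under these assumptions, a predicted BCF of 1,000 gives an equilibrium LC50 of 0.00815 mmol/L, which for a molar mass of 300 g/mol corresponds to about 2.4 mg/L.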
Table 11: Technical summary for the models used for classification of danger to the aquatic environment.
Advisory classifications
A total of 18,809 of the chemicals assessed in the current project were selected according to one of the four classification categories based on the combination of model predictions as indicated in the classification criteria and shown in Figure 8. The classifications for danger to the aquatic environment were assigned to the following number of chemicals:
Figure 8: Schematic diagram illustrating the systematic evaluation applied to assign advisory classifications for danger to the aquatic environment.
[4] AOK means within applicability domain as defined in /5/.
[5] 291 were positives and 410 were negatives. For the positives, the search criterion in RTECS was the RTECS code “SEV” for severe skin irritation, and no requirements on dose or duration of exposure were made. For the negatives, the search criterion in RTECS was the RTECS code “MLD” for mild skin irritation, and moreover a requirement of 500 mg and 24H exposure was set.
[6] The 31 chemicals from HSDB were all positives: highly irritating or corrosive according to HSDB criteria.
[7] The 56 chemicals with EU classifications were either corrosive with R34 (causes burns) or corrosive with R35 (causes severe burns).
[8] Of the expert judgment groups entered, some consisted of presumably non-irritating chemicals that the model was otherwise confused by and for which experimental data could not be found, together with some well-known groups of positives. A QMRF including the full training set, with statements of source for both test data and expert judgments, will be submitted for inclusion in the EU QMRF inventory and is also available on request.
3 Discussion & Conclusions
This consolidated report contains documentation for all of the current advisory classifications on the DK EPA Advisory self-classification List (AL), i.e.:
The 2009 update of the advisory classifications for cancer and mutagenicity was made using entirely new models; i.e. none of the models used to make the advisory classifications for mutagenicity and carcinogenicity on AL2001 were used in the update. Annex 2 contains examples of how further structural analyses of substances belonging to various chemical classes can be made on top of the predicted properties from this project to visualize and gain further insight into relations between sub-structures and, in this case, the carcinogenicity properties of chemicals.

For the environmental advisory classifications, some of the models used for AL2001 were used again for AL2009 (BCFWIN and the model for aquatic toxicity to fathead minnow), and new models were applied for biodegradation and aquatic toxicity to Daphnia and algae.

Comparisons between AL2001 and AL2009 are made in the following for the individual advisory classifications represented in both lists. The following text originates from the AL2009 report /62/ and has only partly been updated to reflect the 2010 update of acute oral toxicity and the addition of skin irritation.

3.1 Chemicals on AL2010 that were not on AL2001

As shown in figure 9, more chemicals have been assigned advisory classifications for the individual endpoints in the current advisory list than in the former. This is due primarily to the application of entirely different models, in many cases with larger chemical domains than the models applied for AL2001. Also, slightly more substances were included in the start list for AL2009 and AL2010 than for AL2001 (49,292 for AL2009/AL2010 and approximately 47,000 for AL2001).

For the advisory classifications for danger to the aquatic environment, the differences relate more specifically to the addition of aquatic toxicity models for Daphnia and algae, plus the use of the non-linear BIOWIN 2 model instead of the linear BIOWIN 1 model, which was used for AL2001.
BIOWIN 1 has a lower sensitivity than BIOWIN 2.

For the carcinogenicity and mutagenicity endpoints, the increased number of predictions on AL2009 as compared with AL2001 is generally due to the use of new and improved (Q)SAR models with larger applicability domains.

Figure 9 presents an overview of the number of advisory classifications for individual endpoints on AL2001 and the current consolidated AL2010. Reproduction and skin irritation are included although these endpoints were not addressed in AL2001.

Figure 9: Overview of the number of substances for each advisory classification in the current version compared to the 2001 version of the Advisory self-classification list. (Note: Reproductive toxicity and skin irritation were not included in AL2001. The advisory classifications for sensitisation by skin contact, R43, have not been updated, and the number in the current version is therefore the same as in 2001.)

3.2 Chemicals on AL2001 that are not on the current list

There are also substances that were assigned advisory classifications on AL2001 that are not on AL2009/2010. For the individual endpoints, between 11 and 14% of the advisory classifications from AL2001 are not on AL2009/2010. An exception is acute oral toxicity, where around 45% of the 2001 advisory classifications are not on AL2010, primarily due to different applicability domains for the TOPKAT model applied for AL2001 and the Pharma ToxBoxes model applied for the current update. The differences for the other endpoints are also primarily due to the use of new models for AL2009/2010.

Chemicals on AL2001 may not have been included in updates made for AL2009/2010 for one or more of the following reasons:
For the mutagenicity endpoint, for example, where five models were used, many of the chemicals that were included on AL2001 but not on AL2009 did not have robust predictions (within applicability domain) in two or three models, but often had flags in one or more of these models showing that a possible active fragment was identified. Additionally, many have positive predictions in models for in vitro genotoxicity endpoints (which were not included in the evaluation). In total, the majority of the chemicals that were not identified this time appear to be borderline mutagens.

As there were mixed results (negative / out-of-domain / positive) from the battery of models applied within an endpoint, it is not possible to separate the chemicals strictly into those that were not identified this time because they could not be predicted (i.e. outside domain) and those that the models applied in the new selection algorithm for AL2009 predict to be negative for the effect.

A detailed comparison between numbers of chemicals with advisory classifications for carcinogenicity, mutagenicity and danger to the aquatic environment on AL2001 and AL2009 is given in table 12.
* Due to overlap, some chemicals have advisory classifications for more than one CMR endpoint.

Table 12. Overview of the occurrence of substances on AL2001 and AL2009

3.3 Conclusion

Due primarily to the application of combinations of new (Q)SAR models, in many cases with larger applicability domains, the number of substances with advisory classifications has increased considerably for individual classifications as compared to AL2001. Moreover, reproductive toxicity (possible harm to the unborn child), skin irritation, and differentiated advisory classifications for acute toxicity (harmful, toxic and very toxic) were included for the first time.

4 References
Annex 1. Glossary
Annex 2. Analysis of positive predictions of cancer classification

LeadScope is a predictive data-mining tool for exploring and filtering data sets based on both structural features and associated data[9]. This software contains a predefined library of over 27,000 chemical functional groups (medicinal chemistry building blocks), which can be applied in the analysis of structural similarities within data sets. Structural similarities may lead to logical paths linking chemical structures with a biological endpoint. In this example, structural similarities associated with (Q)SAR predictions used for the advisory classifications for cancer were analysed based on a large data set to try to gain further insight into the predictions.

A random set of 21,000 chemicals from the full set of around 185,000 chemicals in the DK (Q)SAR prediction database was imported into LeadScope. The set, whose size was chosen for practical and technical reasons, is judged to be representative of the full database.

The cancer predictions made in the four Multicase FDA cancer models[10] for carcinogenicity to male and female mice and rats, respectively, were entered as the overall call made by the so-called FDA ICSAS methodology[11]. Also entered were predictions from the Multicase Ames mutagenicity model (described in 3.2.2), and an overall prediction of in vivo genotoxicity[12] based on five Multicase models for in vivo genotoxicity endpoints (Drosophila SLRL, mutations in mouse micronucleus, dominant lethal mutations in rodents, sister chromatid exchange in mouse bone marrow, and COMET assay in mouse).

The 21,000 chemicals were organized into groups based on structural features according to the LeadScope library of chemical functional groups. This first rough structural grouping in LeadScope is shown in figure 1. The groups are coloured based on the cancer predictions from the FDA cancer models.
Groups with over-representation of positive predictions have red bars, groups with over-representation of equivocal predictions or predictions that are outside the applicability domain have grey bars, and groups with over-representation of negative predictions have green bars. The interpretation of the colours is indicated in the bottom right corner. The length of the bars indicates the number of chemicals (plotted on a log scale). For each group there are a number of more narrowly defined sub-groups, named clusters, which may have different distributions of positives, negatives and “out-of-domain” chemicals.

Figure 1. First rough structural grouping in LeadScope of the 21,000 chemicals with FDA ICSAS cancer calls

Out of the 21,000 chemicals, 4,705 chemicals were assigned to the group “reactive groups” by LeadScope. This group is marked with blue in figure 1, and was selected for further analysis in this annex.

Identification of a group of genotoxic carcinogens

Within the “reactive groups”, LeadScope made a number of chemical clusters. Figure 2 gives the first part of a list of these clusters, and again clusters with over-representation of positive cancer calls are shown in red. Further down the list are out-of-domain clusters (grey) and negative clusters (green). On the left side of figure 2, the cluster numbered “90” is highlighted in blue. This cluster is shown in red and contains 24 chemicals.

Figure 2. Cluster 90 with positive predictions for cancer

The first 20 chemicals in cluster 90 are given in figure 3. The FDA predictions of cancer are given for each chemical in the upper left corner. An FDACALL of “1.0” means a positive cancer prediction.

Figure 3. Chemicals in cluster 90

From figure 4 it can be seen that all 24 chemicals in cluster 90 are predicted positive for both cancer (yellow column to the left) and Ames mutagenicity (yellow column to the right). On this basis, the chemicals in cluster 90 appear to be genotoxic carcinogens.

Figure 4.
FDA cancer predictions and Ames mutagenicity predictions for chemicals in cluster 90

Identification and mechanistic profile of a group of steroidal carcinogens

If we go back to the clusters within the “reactive groups” and, instead of cluster 90, choose cluster 51, we find a very different group of chemicals. On the left side of figure 5, cluster number 51 is highlighted in blue. This cluster contains 156 chemicals, with over-representation of positive cancer predictions, as can be seen from the red colour of the bar.

Figure 5. Clusters within the “Reactive groups” with cluster 51 highlighted (left)

Cluster 51 is composed of steroids that are likely to be promoters of cancer. The first of the 156 chemical structures are given in figure 6.

Figure 6. Chemicals in cluster 51; steroids which are likely to be promoters

Figure 7 graphs the distribution of cluster 51 chemicals with positive and negative cancer predictions, Ames mutagenicity predictions and in vivo mutagenicity predictions. “0.0” are the negatives and “1.0” are the positives. Approximately half of the chemicals in cluster 51 are predicted positive for carcinogenicity, as can be seen from the graph in the upper left part of figure 7. Almost all chemicals are predicted negative in the Ames model (upper right part), and all chemicals are predicted negative for in vivo genotoxicity (lower left part). That is, according to the predictions from the models for cancer and genotoxicity, some of the chemicals in this steroid cluster are carcinogens, but probably with a non-genotoxic mechanism. It is well known that some steroids can cause cancer through a hormonal non-genotoxic mechanism[13].

Figure 7.
Distribution of cancer predictions (FDACALL), Ames mutagenicity (AMESCALC) predictions and in vivo mutagenicity (M_1) predictions in cluster 51

The picture of a non-genotoxic mechanism is confirmed in figures 8 and 9, where the chemicals predicted to be negative (figure 8) and positive (figure 9), respectively, for Ames mutagenicity are highlighted in yellow. Both the predicted Ames positive and negative chemicals are evenly distributed between the chemicals predicted positive and negative for cancer, i.e. there is no significant relation between positive Ames predictions and positive cancer predictions. This confirms that the chemicals in cluster 51 are not likely to be carcinogenic by a genotoxic mechanism.

Figure 8. Distribution of Ames negatives among the carcinogenicity and in vivo mutagenicity predictions

Figure 9. Distribution of Ames positives among the carcinogenicity and in vivo mutagenicity predictions

The seven chemicals predicted to be positive for Ames mutagenicity are shown in figure 10. All of them contain additional reactive fragments such as the diketone, the hydroperoxy group, and the strained three-membered ring (epoxide). By inspection, the chemicals look like potential genotoxic compounds acting by electrophilic mechanisms, not because of the steroid part of the structures but rather because of the additional reactive fragments.

Figure 10. The seven steroid chemicals predicted positive for Ames mutagenicity

Structural identifiers for carcinogenicity of steroids

In the following, LeadScope was asked to find rules about chemical feature combinations that can be used to discriminate between positive and negative cancer predictions within the cluster of 156 steroid chemicals. Figure 11 shows the generated fragment combination tree. The interpretation of the colours of the boxes is given in the bottom right corner; a red box again means over-representation of chemicals with positive cancer predictions, a green box means over-representation of non-cancer predictions, etc.
Figure 11. A fragment combination tree within the steroids (red means over-representation of positive cancer predictions)

In figure 12, the red box is marked and the rules leading to classification into this box appear in the bottom windows. As can be seen, a positive prediction in the steroid cluster is associated with the 17-hydroxy-steroid skeleton (lower left window) and an unsaturated ketone ring (lower right window). There are 18 chemicals in the selected box.

Figure 12. A positive prediction is associated with the 17-hydroxy-steroid skeleton (left) and an unsaturated ketone ring, cyclohexenone, (right)

The 18 chemicals in the red box are given in figure 13. The highlighted part of the structure is the combination of the two structural features: the steroid fragment and the unsaturated ketone ring. The FDA cancer calls (“1.0”, “0.0” or “?”) are shown in the upper left corner for each chemical. Of the 18 chemicals, 14 are predicted positive for cancer, 3 are predicted negative and 1 is equivocal/out-of-domain. In other words, this simple rule, i.e. a combination of a 17-hydroxy-steroid skeleton and an unsaturated ketone ring, has a discrimination of 14:3 (not including the out-of-domain prediction) for predicting whether a chemical is predicted to be carcinogenic by the Multicase FDA cancer models. Based on the 17 chemicals with robust cancer predictions, this rule therefore has a Positive Predictive Value (PPV) of 14/17 × 100% = 82%.

Figure 13. Overlay of steroids containing the two structural combinations

Characterizing non-carcinogenic steroids

Of the remaining 138 chemicals in cluster 51, 132 were predicted negative for cancer. Some of these are shown in figure 11. For the rule combining the steroid skeleton with cyclohexenone, this gives a discrimination of 132:6 for predicting that a chemical not covered by the rule is not predicted to be carcinogenic by the Multicase FDA cancer models.
In other words, based on the 138 chemicals, this rule has a Negative Predictive Value (NPV) of 132/138 × 100% ≈ 96%.

Figure 14. 132 out of the 138 substances are predicted negative for cancer

LeadScope also identified another rule for discrimination between positive and negative FDA cancer predictions, as shown in figure 15 (red box). This rule combines a distance between two hydrogen bond acceptors (HBA) and a cyclohexenone fragment. Six chemicals had this structure combination, of which 4 were predicted positive for cancer according to the Multicase FDA cancer models. This gives a discrimination for positives of 4:2; in other words, based on the 6 chemicals, this rule has a Positive Predictive Value (PPV) of 4/6 × 100% = 67%.

Figure 15. Another feature combination (6 structures) within cluster 51

The 6 chemicals are given in figure 16, with the FDA cancer calls in the upper left corner for each chemical.

Figure 16. 4 of the 6 structures are predicted positive for cancer

[9] Roberts G., Myatt G.J., Johnson W.P., Cross K.P., Blower P.E., “LeadScope: Software for Exploring Large Sets of Screening Data”, J. Chem. Inf. Comput. Sci., 2000, 40 (6), 1302-1314.

[10] J. Matthews and J.F. Contrera. A new highly specific method for predicting the carcinogenic potential of pharmaceuticals in rodent using enhanced MCASE (Q)SAR-ES software. Reg. Toxicol. and Pharmacol. 28 (1998) pp. 242-264.

[11] Positive according to the FDA ICSAS methodology corresponds to two or more positive cancer calls, accepting only predictions for chemicals without significant deactivating fragments. See footnote 2 for reference.

[12] The criterion for the overall call for genotoxicity is the one used for advisory classifications and described in 3.1 Mutagenicity: a positive experimental test result in at least one training set or positive predictions in at least two models.

[13] E.g. Lima, B.S., Van der Laan, J.W.; “Mechanisms of Nongenotoxic Carcinogenesis and Assessment of the Human Hazard”, Reg. Tox. and Pharm. (2000) 32, 135-143.
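The predictive values quoted in this annex for the steroid rules in cluster 51 follow directly from the counts given in the text, and can be recomputed as a check. The sketch below is illustrative only; the function name is hypothetical, and the counts are the ones reported above (14 of 17 robust predictions positive within the first rule, 132 of the remaining 138 negative, and 4 of 6 positive within the second rule).

```python
# Recompute the predictive values for the LeadScope rules from the reported counts.
# The helper name is hypothetical; only the counts come from the annex.


def predictive_value(agreeing: int, total: int) -> float:
    """Percentage of chemicals whose prediction agrees with the rule."""
    if total <= 0 or not 0 <= agreeing <= total:
        raise ValueError("counts must satisfy 0 <= agreeing <= total and total > 0")
    return 100.0 * agreeing / total


# Rule 1: 17-hydroxy-steroid skeleton plus unsaturated ketone ring.
ppv_rule_1 = predictive_value(14, 17)    # positives among robust predictions in the box
npv_rule_1 = predictive_value(132, 138)  # negatives among the remaining chemicals

# Rule 2: distance between two hydrogen bond acceptors plus cyclohexenone.
ppv_rule_2 = predictive_value(4, 6)

if __name__ == "__main__":
    for name, value in [("PPV rule 1", ppv_rule_1),
                        ("NPV rule 1", npv_rule_1),
                        ("PPV rule 2", ppv_rule_2)]:
        print(f"{name}: {value:.1f}%")
```

Note that these values describe agreement with the Multicase FDA cancer model predictions, not with experimental carcinogenicity data.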