| 
The Advisory list for self-classification of dangerous substances

2 Creation and use of the advisory self-classification list
Following the development of new and/or improved (Q)SAR models, the advisory self-classification list of dangerous substances has been updated and expanded. This chapter of the report presents the methodology applied.

2.1 The selected dangerous properties

The following endpoints were addressed using (Q)SARs:
 (Q)SAR-predictions for these endpoints were used to assign the classifications listed in Table 1. 
Table 1: Advisory classifications in the consolidated AL

2.2 The evaluated chemical substances

The overall purpose of the current project was to evaluate as many chemical substances as possible with relevance to the existing EU regulation of chemicals. Under REACH /1/, all chemicals with tonnages above 1 tonne/year should undergo pre-registration between 1 June 2008 and 1 December 2008. It would have been relevant to evaluate all chemicals in this inventory, but at the time the advisory classification list was prepared no official list existed. It was therefore decided to base the evaluations on the EINECS list /3,4/. This list consists of 100,204 entries, covering organic and inorganic substances both as single-substance entries (mono-constituent substances) and as mixtures (multi-constituent substances and UVCBs).

The exercise was limited to "discrete organics", meaning that multi-constituent substances and UVCBs (substances of Unknown or Variable composition, Complex reaction products or Biological materials) were excluded for practical reasons: “if you don’t know what it is, you can’t model it”. Inorganic substances have likewise not been evaluated. These are usually better approached by simpler methods that evaluate the availability of the respective anions and cations, whose hazard profiles are well known. "Organo-metallics" have also been excluded as being poor candidates for modelling. As an error check, only structural representations that could be successfully converted to 3D were used /10/.

Where a CAS number comparison made it possible, all substances that already have a formal EU harmonised classification in Annex I of Directive 67/548/EEC (List of dangerous substances, /2/) were also removed. However, as there is no official overview of the substances covered by the group entries in Annex I, and because a chemical may have more than one CAS number, a few chemicals covered by Annex I may not have been removed from AL2009.
This resulted in a total of 49,292 discrete organic substances, or about half of all EINECS chemicals, which could be subjected to (Q)SAR-based assessment.

2.3 Test data

For the vast majority of the assessed chemicals no test data were available. However, if test data were available as part of the (Q)SAR model, these were generally used in preference to the estimates. It is important to stress that no attempt was made to search published or unpublished databases for toxicological, ecotoxicological or environmental fate information to determine whether a (Q)SAR was necessary for any endpoint assessed.

2.4 Reliability of (Q)SAR-predictions

The reliability of (Q)SAR predictions depends on numerous parameters relating to the mathematical methods used, the number and precision of the underlying data used for developing the model, and how suitable the model is for the particular substance. In general, the uncertainty of (Q)SARs arises predominantly from two sources: a) the inherent variability of the input data used to establish the model (training set); and b) the fact that a model can only be a partial representation of reality (in other words, it does not model all possible mechanisms concerning a given endpoint and it does not cover all types of chemicals). However, as a model averages the uncertainty over all chemicals, it is possible for an individual model estimate to be more accurate than an individual measurement /9/.

The reliability of (Q)SAR predictions can be described in many ways. Usually a range of parameters and concepts are used (see e.g. /9/ for a more extensive review). These concepts may not be known to all readers. Annex 1 contains descriptions of the concepts applied in this report.

2.5 Validation

Validation is a trial of the model performance for a set of substances independent of the training set, but within the domain of the model.
The model predictions for these substances are compared with measured endpoints for the substances in order to establish the predictivity of the model. Ideally, all models should be assessed by checking how well they predict the activity of chemicals that were not used to build them. This is, however, not always simple. First, valuable information may be lost by setting aside chemicals for such an evaluation. Second, it can be extremely difficult to assess how “external” chemicals relate to the model’s domain; that is, whether they represent a random distribution within this applicability domain and thereby give a fair picture of the predictivity of the model.

This problem is often addressed by using cross-validation, where a number of partial models are “externally validated” by splitting the training set into a reduced training set and a test set. The reduced training set is used to develop a partial model, while the remaining data are used as a test set to evaluate the model predictivity. This is repeated a number of times and the results are used to calculate the predictivity measures for the models: for quantitative models in the form of Q² and SDEP (standard deviation error of prediction), and for qualitative (yes/no) models in the form of sensitivity, specificity and concordance (see Annex 1 and refs /9/ and /11/ for further details).

While cross-validation has drawbacks /14, 15/, much of the criticism is directed towards one particular form, leave-one-out cross-validation /14/. The validations carried out on the models applied in this project used the more stable leave-many-out (LMO) cross-validation approach: random sets of 50% of the chemicals, balanced between positives and negatives, were left out, and this was repeated ten times (see also the indicated LMO 50% values in the tables in chapter 3) /13/.
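The leave-many-out procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the project's models: the one-descriptor threshold model (`fit_threshold`), the function names, the pooling of test-set outcomes over all repeats and the descriptor values are all hypothetical. Only the balanced 50% splits, the ten repeats and the three reported measures are taken from the text.

```python
import random


def balanced_half_split(labels, rng):
    """Leave out a random 50% of the positives and 50% of the negatives."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    test = set(pos[: len(pos) // 2] + neg[: len(neg) // 2])
    train = [i for i in range(len(labels)) if i not in test]
    return train, sorted(test)


def fit_threshold(xs, ys):
    """Toy stand-in model: cut-off midway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def lmo_cross_validation(xs, ys, repeats=10, seed=0):
    """Pool test-set outcomes over `repeats` balanced 50% splits."""
    rng = random.Random(seed)
    tp = tn = fp = fn = 0
    for _ in range(repeats):
        train, test = balanced_half_split(ys, rng)
        cut = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            predicted = 1 if xs[i] > cut else 0
            if predicted == 1 and ys[i] == 1:
                tp += 1
            elif predicted == 0 and ys[i] == 0:
                tn += 1
            elif predicted == 1:
                fp += 1
            else:
                fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "concordance": (tp + tn) / (tp + tn + fp + fn),
    }


# Made-up, cleanly separable descriptor values, for illustration only.
xs = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
result = lmo_cross_validation(xs, ys)
```

The sketch needs at least two positives and two negatives, so that both the reduced training set and the test set contain each class. Real training sets overlap between classes, and the measures will then fall below 1.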
Leaving out 50% of the chemicals in the partial validation models is a large perturbation of the training set, which generally leads to realistic, and often pessimistic, measures of the predictivity of the model. The commercial models for acute oral toxicity and cancer were validated by external validation /24/.

Concordance will vary depending on both the method used and the endpoint in question. In general, contemporary (Q)SAR systems can often correctly predict the activity of about 70-85% of the chemicals examined, provided that the query structures are within the domains of the models. This also applies to the models described in this paper.

QSAR Model Reporting Formats (QMRFs) for all the toxicity models applied in this project, including training sets for the DK models, have been submitted to the EU JRC QSAR Model Database and the OECD QSAR Application Toolbox /35, 37/.

2.6 Applicability domain

When applying (Q)SARs it is important to ensure that an obtained prediction falls within the domain of the model, i.e. that there is sufficient similarity (in relevant descriptors) between the query substance and the substances in the training set of the model. There is no single and absolute applicability domain for a given model /9/. Generally, the more broadly the applicability domain is defined, the lower the predictivity that can be expected. The applicability domain should be clearly defined and the validation results should correspond to this defined domain, which is in turn used when the model is applied for predictions.

The applicability domains for MultiCASE models as defined by the US Food and Drug Administration (FDA) /24/ and implemented in the MultiCASE software were used in this project. No warnings in the predictions were accepted, except a warning for one unknown fragment in chemicals where a significant biophore had been detected. Positive predictions were accepted only where no significant deactivating fragments were detected.
For the acute oral toxicity predictions from the Pharma ToxBoxes, reliability indices (RI) are given by the software. Based on an analysis performed by DTU Food on an external validation set provided by Pharma Algorithms Inc. / ACD/Labs, an RI cut-off of 0.5 was applied in this project.

The EPISUITE models for rapid biodegradation /43, 45/ and the bioconcentration factor in fish /42/ do not automatically flag the predictions for domain coverage. No attempt has been made to consider the applicability domain for predictions made by these models.

Depending on the endpoint in question, predictions outside the applicability domain were obtained for between 27 and 58% of the chemicals examined by the individual MultiCASE and Pharma ToxBoxes models.

2.7 Application of the models

It is important to note that the applied models in principle do not predict a "classification"; they predict a biological activity that may lead to a classification. Because of the large number of chemicals involved, “rules” were used for each endpoint to link the biological prediction with a risk phrase. In essence, the process is no different from that imposed upon a human expert who must interpret the available information in order to comply with the duty to make an assessment and self-classification.

The applied models have been used in combinations (batteries) within the chosen classification endpoints to reach a final call, in an attempt to achieve greater reliability than individual model predictions offer and to best comply with the classification criteria.

2.8 The result

The result of the computer-based assessment is this consolidated advisory self-classification list, which comprises 34,292 chemical substances with advisory classifications for one or more of the dangerous properties selected. The results only represent POSITIVE predictions (for quantitative models “positive” here means predicted to have the effect or property as determined in relation to a cut-off point).
No distinction has been made between a negative prediction for an endpoint and an unreliable prediction (a prediction outside the applicability domain of the model); unreliable predictions were simply discarded. Evaluated substances that are not on the list, or that are on the list but without an advisory classification for one or more of the selected dangerous properties, may have been predicted not to have the property or properties in question, or the models may not have been valid for the substance (i.e. predictions were outside the applicability domain of these models). Therefore the advisory list cannot be used to conclude that these substances do not possess dangerous properties.

Another important point is that the advisory self-classification list represents (Q)SAR-based identifications of possible hazardous properties of the included chemicals; no attempt has been made to evaluate the risk that these chemicals constitute in their current use in the EU.

All results are available on the website of the Danish EPA (www.mst.dk), where searches can be made on substance name (in Danish), CAS number, EINECS number, EINECS name and chemical formula. The whole list can also be downloaded via www.mst.dk as an Excel file.
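The retention rule described in section 2.8 can be illustrated with a minimal Python sketch. The `Prediction` record, the function names and the placeholder substance identifiers are hypothetical. The RI cut-off of 0.5 is taken from section 2.6, but treating a value exactly at the cut-off as reliable is an assumption. The sketch shows why absence from the list is ambiguous: a negative prediction and an out-of-domain prediction leave the same trace, namely none.

```python
from dataclasses import dataclass

RI_CUTOFF = 0.5  # cut-off stated in the text for acute oral toxicity


def ri_in_domain(ri, cutoff=RI_CUTOFF):
    # Assumption: a prediction exactly at the cut-off counts as reliable.
    return ri >= cutoff


@dataclass
class Prediction:
    substance: str   # hypothetical identifier, e.g. a CAS number
    endpoint: str
    positive: bool
    in_domain: bool


def advisory_entries(predictions):
    """Keep only positive, in-domain predictions, grouped by substance."""
    kept = {}
    for p in predictions:
        if p.positive and p.in_domain:
            kept.setdefault(p.substance, []).append(p.endpoint)
    return kept


# Placeholder records, not real substances or real model output.
predictions = [
    Prediction("substance-A", "acute oral toxicity", True, ri_in_domain(0.8)),
    Prediction("substance-B", "acute oral toxicity", False, ri_in_domain(0.8)),
    Prediction("substance-C", "acute oral toxicity", True, ri_in_domain(0.3)),
]
entries = advisory_entries(predictions)
```

Here `substance-B` (negative, in domain) and `substance-C` (positive, out of domain) are both missing from `entries`, so the output alone cannot tell the two cases apart.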
Figure 1: Number of substances with individual advisory classifications

2.9 How the self-classification list can help manufacturers and importers to comply with the classification duties

By making the advisory self-classification list available to the public, the Danish EPA wishes to offer manufacturers and importers a tool based on predictions from sophisticated modelling software. The predictions have been interpreted in relation to the hazard classification criteria and transformed into advisory classifications that are easy to use. They can be applied when carrying out self-classification of chemical substances for the dangerous properties included in the list. If available, reliable test data or predictions from other non-test methods for specific substances should always be considered alongside computer predictions and expert judgement in a weight of evidence (WoE) approach to decide on the appropriate classification for a given endpoint. It is recommended that the list is used in the following way in the classification of chemicals:
[3] For the endpoint sensitisation by skin contact the methodology undertaken was slightly different with a start list with somewhat fewer chemicals, etc. Details are given in the documentation report from 2001 /5/.
 
 |