The Advisory list for self-classification of dangerous substances

Annex 1. Glossary

Term: Description
Training set: The collection of experimental data on a range of chemicals that has been used to develop the (Q)SAR model.
Sensitivity: A measure of how well the model “catches” the substances that have a positive effect for the endpoint being modelled. A sensitivity of 80% means that 80% of the actual positives (substances with a measured positive effect) in the validation set were correctly predicted as positives; the remaining 20% were falsely predicted as negatives (false negatives). The sensitivity does not depend on the prevalence of positives in the “chemical universe”.
Specificity: A measure of how well the model predicts the substances that lack an effect for the endpoint being modelled. A specificity of 80% means that 80% of the actual negatives (substances with no measured effect) in the validation set were correctly predicted as negatives; the remaining 20% were falsely predicted as positives (false positives). The specificity does not depend on the prevalence of negatives in the “chemical universe”.
Concordance: Also referred to as overall accuracy. The concordance is an overall measure of the correctness of the predictions. A concordance of 80% means that 80% of the substances in the validation set were correctly predicted as positives or negatives; the remaining 20% are the false predictions, i.e. false negatives and false positives.
Predictive values: The positive and negative predictive values (PPV and NPV) measure how often the model's positive or negative predictions, respectively, are correct. A PPV of 80% means that 80% of the positive predictions in the validation set were correct (the remaining 20% were false positives). The predictive values depend on the split between positives and negatives in the “chemical universe”.
Applicability domain: The applicability domain (AD) of a (Q)SAR model expresses the limits, set by the training set of the model, within which the model can give predictions for new compounds with the reliability determined in the validation. These limits are expressed by parameters characterising the physico-chemical, structural or biological space of the model. The development of statistical and mathematical methods for defining applicability domains is an active field of research /9/.
Validation: A trial of the model's performance on a set of substances that is independent of the training set but within the domain of the model. The model predictions for these substances are compared with the measured endpoints in order to establish the sensitivity, specificity and overall accuracy (concordance) of the model.
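
As an illustration of the performance measures defined above, the following is a minimal Python sketch that computes sensitivity, specificity, concordance, PPV and NPV from the four possible outcomes of comparing model predictions with measured results. The class ConfusionCounts, the function names and all counts are assumptions made for illustration only; they are not data from any actual (Q)SAR model. The helper expected_counts is likewise hypothetical and only serves to show how the predictive values shift with the share of positives while sensitivity and specificity stay fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a validation set (hypothetical helper type)."""
    tp: float  # measured positive, predicted positive (true positives)
    fn: float  # measured positive, predicted negative (false negatives)
    tn: float  # measured negative, predicted negative (true negatives)
    fp: float  # measured negative, predicted positive (false positives)


def sensitivity(c: ConfusionCounts) -> float:
    """Share of measured positives that were predicted positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of measured negatives that were predicted negative."""
    return c.tn / (c.tn + c.fp)


def concordance(c: ConfusionCounts) -> float:
    """Share of all substances that were predicted correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Share of positive predictions that were correct."""
    return c.tp / (c.tp + c.fp)


def npv(c: ConfusionCounts) -> float:
    """Share of negative predictions that were correct."""
    return c.tn / (c.tn + c.fn)


def expected_counts(sens: float, spec: float, prevalence: float,
                    n: float = 1000.0) -> ConfusionCounts:
    """Expected outcome counts for n substances, given a fixed sensitivity
    and specificity and the fraction of positives (prevalence)."""
    positives = n * prevalence
    negatives = n - positives
    return ConfusionCounts(
        tp=sens * positives,
        fn=(1.0 - sens) * positives,
        tn=spec * negatives,
        fp=(1.0 - spec) * negatives,
    )


# Illustrative validation set: 50 measured positives and 50 measured negatives.
balanced = ConfusionCounts(tp=40, fn=10, tn=40, fp=10)

# Same sensitivity and specificity, but only 10% of the substances are positive.
skewed = expected_counts(sens=0.8, spec=0.8, prevalence=0.1)

for label, counts in (("50% positives", balanced), ("10% positives", skewed)):
    print(
        f"{label}: sensitivity={sensitivity(counts):.2f} "
        f"specificity={specificity(counts):.2f} "
        f"concordance={concordance(counts):.2f} "
        f"PPV={ppv(counts):.2f} NPV={npv(counts):.2f}"
    )
```

In this sketch both sets have a sensitivity and specificity of 80%. With equal numbers of positives and negatives the PPV is also 80%, but when only 10% of the substances are positive the PPV falls to about 31% and the NPV rises to about 97%, which is the dependence on the split between positives and negatives described under "Predictive values".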

Version 1.0 March 2010, © Danish Environmental Protection Agency