ROC

From 太極
Jump to navigation Jump to search

ROC curve

  • Binary case:
    • Y = true positive rate = sensitivity,
    • X = false positive rate = 1-specificity
  • Area under the curve AUC from the wikipedia: the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').

where is the score for a positive instance and is the score for a negative instance, and and are probability densities as defined in previous section.

Survival data

'Survival Model Predictive Accuracy and ROC Curves' by Heagerty & Zheng 2005

  • Recall Sensitivity= , Specificity= ), is binary outcomes, is a prediction, is a criterion for classifying the prediction as positive () or negative ().
  • For survival data, we need to use a fixed time/horizon (t) to classify the data as either a case or a control. Following Heagerty and Zheng's definition (Incident/dynamic), Sensitivity(c, t)= , Specificity= ) where M is a marker value or . Here sensitivity measures the expected fraction of subjects with a marker greater than c among the subpopulation of individuals who die at time t, while specificity measures the fraction of subjects with a marker less than or equal to c among those who survive beyond time t.
  • The AUC measures the probability that the marker value for a randomly selected case exceeds the marker value for a randomly selected control
  • ROC curves are useful for comparing the discriminatory capacity of different potential biomarkers.

Confusion matrix, Sensitivity/Specificity/Accuracy

Predict
1 0
True 1 TP FN Sens=TP/(TP+FN)=Recall
FNR=FN/(TP+FN)
0 FP TN Spec=TN/(FP+TN), 1-Spec=FPR
PPV=TP/(TP+FP)
FDR=FP/(TP+FP)
NPV=TN/(FN+TN) N = TP + FP + FN + TN
  • Sensitivity = TP / (TP + FN) = Recall
  • Specificity = TN / (TN + FP)
  • Accuracy = (TP + TN) / N
  • False discovery rate FDR = FP / (TP + FP)
  • False negative rate FNR = FN / (TP + FN)
  • False positive rate FPR = FP / (FP + TN)
  • True positive rate = TP / (TP + FN)
  • Positive predictive value (PPV) = TP / # positive calls = TP / (TP + FP) = 1 - FDR
  • Negative predictive value (NPV) = TN / # negative calls = TN / (FN + TN)
  • Prevalence = (TP + FN) / N.
  • Note that PPV & NPV can also be computed from sensitivity, specificity, and prevalence:

Precision recall curve

Incidence, Prevalence

https://www.health.ny.gov/diseases/chronic/basicstat.htm

Calculate area under curve by hand (using trapezoid), relation to concordance measure and the Wilcoxon–Mann–Whitney test

genefilter package and rowpAUCs function

  • rowpAUCs function in genefilter package. The aim is to find potential biomarkers whose expression level is able to distinguish between two groups.
# source("http://www.bioconductor.org/biocLite.R")
# biocLite("genefilter")
library(Biobase) # sample.ExpressionSet data
data(sample.ExpressionSet)

library(genefilter)
r2 = rowpAUCs(sample.ExpressionSet, "sex", p=0.1)
plot(r2[1]) # first gene, asking specificity = .9

r2 = rowpAUCs(sample.ExpressionSet, "sex", p=1.0)
plot(r2[1]) # it won't show pAUC

r2 = rowpAUCs(sample.ExpressionSet, "sex", p=.999)
plot(r2[1]) # pAUC is very close to AUC now

Use and Misuse of the Receiver Operating Characteristic Curve in Risk Prediction

http://circ.ahajournals.org/content/115/7/928

Performance evaluation

Some R packages

Comparison of two AUCs

Confidence interval of AUC

How to get an AUC confidence interval. pROC package was used.