The demo below demonstrates aspects of signal detection theory (SDT). SDT applies to a broad range of problems spanning multiple fields -- it can apply almost anywhere a yes/no decision is made on the basis of noisy evidence. In psychology and neuroscience, it is often employed in cases where an observer is trying to detect the presence of something, for example a flash of light or an audible tone. More interesting real-world examples occur in baggage screening (is there something dangerous in this piece of luggage?) or radiology/screening (is the diagnosis positive or negative?).
SDT helps us address a fundamental issue with these sorts of problems: if the problem is difficult, then that means that sometimes "signal-absent" trials (there's no cancer in the image) look like "signal-present" trials (cancer present), and sometimes "signal-present" trials (there is cancer in the image) look like "signal-absent" trials (but it's hard to see). In these cases, tradeoffs must be made!
A problem like this involves four possibilities, because the truth has two states (signal-absent or signal-present), and we have to make a discrete decision (present or absent). The good outcomes are (1) correct rejections: the signal is absent, and the observer says it is absent, and (2) hits: the signal is present, and the observer says it is present. The bad outcomes are (3) false alarms: the observer says the signal is present, but it is not, and (4) misses: the observer says the signal is absent, but it is really there.
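As a minimal sketch of the four outcomes described above, the Python snippet below labels each trial from the true state and the response, then tallies the labels. The function name `classify_trial` and the trial list are made up for illustration; they are not taken from the demo.

```python
from collections import Counter

def classify_trial(signal_present: bool, said_present: bool) -> str:
    """Label one trial with one of the four SDT outcomes."""
    if signal_present:
        return "hit" if said_present else "miss"
    return "false alarm" if said_present else "correct rejection"

# Hypothetical trials for illustration: (signal_present, said_present)
trials = [
    (True, True),    # hit
    (True, False),   # miss
    (False, False),  # correct rejection
    (False, True),   # false alarm
    (True, True),    # hit
    (False, False),  # correct rejection
]

counts = Counter(classify_trial(s, r) for s, r in trials)
print(counts)
```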
Back to the cancer scenario: imagine that you could sum up all of the evidence in a single number. Higher numbers are consistent with the presence of cancer, and lower numbers are consistent with its absence. However, sometimes someone with a relatively low number has cancer, and sometimes someone with a relatively high number does not. Thus, you're either going to have to diagnose a large number of healthy people with cancer (false alarms), or fail to diagnose a large number of people who really have it (misses). It might seem obvious that something as dire as cancer would require us to diagnose readily, but in reality that could be problematic -- treatment could be unnecessarily harmful relative to the risk of going undiagnosed for a bit longer. Assuming that the number can't be improved upon further, one has to adopt a criterion that determines the trade-off one is willing to accept. Sensitivity refers to how severe the overlap problem is. A low-sensitivity problem means that a lot of noise trials look like signal trials and vice versa. A high-sensitivity problem means that noise trials generally look unlike signal trials -- you'll rarely get a noise trial that "looks" like a signal trial (i.e., has a large amount of signal-evidence), and vice versa.
In terms of mere psychology experiments, we as experimenters face the problem of trying to separate our observer's criterion from their sensitivity. Suppose someone comes into the lab, and we are trying to determine their sensitivity to light. We show them dim flashes and ask them if they see each one, and they always say "yes." Without trials in which the signal did not appear, we have no idea whether our participant really is a very sensitive observer, or whether they just like saying "yes" a lot. Thus, we also need to include trials where there is no signal. If the participant always says "no" on these trials, then we can conclude that they really can see the flashes.
This basic problem is exacerbated in very difficult conditions, such as near-threshold presentations where even an earnest, honest observer sometimes just doesn't know if the signal was there or not. This forces people to adopt a criterion, but it's hard to control individuals' internal criteria, and we are often interested in isolating the sensitivity of the observer. Luckily, the tools of signal detection theory allow us, with a few assumptions, to determine and separate sensitivity from criterion, even under very challenging detection conditions.
The key to signal detection theory is to assume that the decision is, like in our cancer example above, based ultimately on a value of evidence that could be quantified (i.e., we could theoretically assign a number to the evidence accumulated by the observer in favor of signal-present on any given trial). That number, in the case of something as complex as the human mind, would inevitably be the result of many processes and subject to "noise." That explains why there is so much variability in the amount of evidence from trial to trial. Below we envision the "noise distribution" and the "signal distribution" as bell curves -- the one on the left represents noise trials and the one on the right represents signal trials. These don't represent what happens on a single trial -- they represent the probabilities of drawing a certain amount of evidence over very many trials. That is, the average no-signal trial will have a value that corresponds to the peak of the left distribution, and the average signal trial will result in an evidence value that corresponds to the peak of the right distribution. But sometimes a noise trial has a high value, and sometimes a signal trial has a low value. The observer has to decide what level of evidence means "signal," and accept that sometimes they will make either type of error (false alarms or misses).
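The idea of drawing an evidence value on every trial can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the demo's own code: both bell curves are taken to be normal with a standard deviation of 1, the noise curve is centered at -d'/2 and the signal curve at +d'/2 (so a criterion of 0 sits in the middle), and the function names are invented here.

```python
import random

def simulate_evidence(signal_present: bool, d_prime: float, rng: random.Random) -> float:
    """Draw one trial's evidence value from the noise or signal curve.

    Assumption: unit-variance normal curves centered at -d'/2 (noise)
    and +d'/2 (signal).
    """
    mean = d_prime / 2 if signal_present else -d_prime / 2
    return rng.gauss(mean, 1.0)

def respond(evidence: float, criterion: float) -> bool:
    """The observer says 'signal present' when evidence exceeds the criterion."""
    return evidence > criterion

rng = random.Random(0)  # fixed seed so the run is repeatable
d_prime, criterion, n_trials = 1.0, 0.0, 10_000

noise_evidence = [simulate_evidence(False, d_prime, rng) for _ in range(n_trials)]
signal_evidence = [simulate_evidence(True, d_prime, rng) for _ in range(n_trials)]

# Proportion of "yes" responses on each kind of trial
false_alarm_rate = sum(respond(e, criterion) for e in noise_evidence) / n_trials
hit_rate = sum(respond(e, criterion) for e in signal_evidence) / n_trials
print(f"hit rate = {hit_rate:.3f}, false alarm rate = {false_alarm_rate:.3f}")
```

Even with the criterion in the middle, some noise trials land above it and some signal trials land below it, which is where false alarms and misses come from.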
By default in this example, the two curves overlap a fair amount, and the criterion is set in the middle. But remember, in a real experiment, we can't "see" these curves. We can only observe responses -- "yes" and "no." Luckily, as the experimenter, we have control over the trials, and we know whether the signal really was there or not. Thus, if we get enough trials, we can estimate both our observer's sensitivity and their criterion. We do this by calculating the proportion of signal-absent trials that were false alarms (aka the false positive rate) and the proportion of signal-present trials that were hits (aka the true positive rate). The false positive rate corresponds to the portion of the left curve that is marked with squares. The true positive rate corresponds to the portion of the right curve that is marked with green.
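One common way to turn the two observed rates into estimates is sketched below. It assumes equal-variance normal curves, under which sensitivity is the difference of the z-transformed rates and the criterion (measured from the midpoint between the peaks) is minus their average. The function name `estimate_sdt` and the trial counts are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def estimate_sdt(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Estimate d' and criterion from response counts.

    Assumption: equal-variance normal noise and signal curves.
    Note: a rate of exactly 0 or 1 has no finite z-score, so inv_cdf
    raises an error in that case; more trials (or a correction) are needed.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return hit_rate, fa_rate, d_prime, criterion

# Hypothetical counts: 100 signal-present and 100 signal-absent trials
hit_rate, fa_rate, d_prime, criterion = estimate_sdt(
    hits=69, misses=31, false_alarms=31, correct_rejections=69
)
print(f"hit rate = {hit_rate:.2f}, false alarm rate = {fa_rate:.2f}")
print(f"estimated d' = {d_prime:.2f}, estimated criterion = {criterion:.2f}")
```

With these made-up counts, the estimate comes out close to d' = 1 with a criterion near 0, which matches the default setting described above.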
In the default condition, the true positive rate is the same as 1 minus the false positive rate (that is, the true negative rate) -- these numbers are calculated below the graph. Anytime this occurs, we could conclude (based on our assumptions) that the observer is unbiased -- they are maximizing the number of trials on which they are correct, regardless of the type of trial. This is true of all sensitivities. Leaving criterion where it is (0), use the second slider to alter sensitivity to values higher or lower than d' = 1. By the way, d' is just a number that indicates the distance between the peaks, measured in standard deviations of the curves -- higher is better, and d' should not go below 0, since at d' = 0 the two curves coincide and the observer is at chance. Notice that the true positive rate always equals the true negative rate as long as the criterion is set right in the middle.
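The claim that a centered criterion gives equal true positive and true negative rates at every sensitivity can be checked numerically. The sketch below assumes unit-variance normal curves centered at -d'/2 and +d'/2, so a criterion of 0 is the midpoint; `predicted_rates` is a name invented for this example.

```python
from statistics import NormalDist

def predicted_rates(d_prime: float, criterion: float):
    """Return (true positive rate, false positive rate) predicted by the model.

    Assumption: noise ~ Normal(-d'/2, 1), signal ~ Normal(+d'/2, 1),
    and the observer says "yes" when evidence exceeds the criterion.
    """
    phi = NormalDist().cdf  # standard normal CDF
    tpr = phi(d_prime / 2 - criterion)
    fpr = phi(-d_prime / 2 - criterion)
    return tpr, fpr

# Keep the criterion in the middle and vary sensitivity
for d_prime in (0.0, 0.5, 1.0, 2.0, 3.0):
    tpr, fpr = predicted_rates(d_prime, criterion=0.0)
    print(f"d' = {d_prime:.1f}: TPR = {tpr:.3f}, FPR = {fpr:.3f}, TNR = {1 - fpr:.3f}")
```

In each row the true positive rate and the true negative rate agree, and the (FPR, TPR) pairs move from (0.5, 0.5) toward (0, 1) as d' grows.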
To preview what the plot on the right indicates, press "Plot point on ROC" for a few different sensitivity values, keeping criterion in the middle. You might notice that they span a line going from (0.5, 0.5) to (0, 1). As long as the criterion stays in the middle, higher sensitivity always leads to a higher true positive rate and a lower false positive rate.
Now pick a fixed value of d' above 0, like 1. Change the criterion and notice that the true positive rate and false positive rate both change. For several different criterion values, press "Plot point." If you do this for enough different criteria, you will see a curve develop. Toggle over to "Plot ROC line for current sensitivity." This will draw the curve for you. This represents the "receiver operating characteristic" or ROC curve, which indicates, for an observer with fixed sensitivity, their range of options. They can choose to be anywhere on this curve by altering the standard of evidence required to say "signal present." But there's always a tradeoff unless the curves are far apart -- increasing the (good) true positive rate also increases the (bad) false positive rate, and decreasing the false positive rate always means decreasing the true positive rate.
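The curve traced by sweeping the criterion can also be sketched in code. As before, this assumes unit-variance normal curves centered at -d'/2 and +d'/2; the function `roc_points` and the particular criterion grid are illustrative choices, not part of the demo.

```python
from statistics import NormalDist

def roc_points(d_prime: float, criteria):
    """Return (false positive rate, true positive rate) for each criterion.

    Assumption: noise ~ Normal(-d'/2, 1), signal ~ Normal(+d'/2, 1).
    """
    phi = NormalDist().cdf
    return [(phi(-d_prime / 2 - c), phi(d_prime / 2 - c)) for c in criteria]

# Sweep the criterion from lenient (-3) to strict (+3) at fixed d' = 1
criteria = [c / 2 for c in range(-6, 7)]
points = roc_points(1.0, criteria)

for c, (fpr, tpr) in zip(criteria, points):
    print(f"criterion = {c:+.1f}: FPR = {fpr:.3f}, TPR = {tpr:.3f}")
```

Reading down the output, a stricter criterion lowers both rates together, which is the tradeoff the ROC curve makes visible.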
Now move the sensitivity around, keeping the "Plot ROC line" option selected. You will see that a sensitivity closer to 0 leads to an ROC that is close to the dashed straight line -- here the tradeoffs are worse. But for large d', the curve is bowed towards (0, 1) -- middle values of the criterion keep the true positive rate high with relatively few false positives.
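To compare ROC curves across sensitivities, one option is to hold the false positive rate fixed and ask what true positive rate each d' allows. Under the equal-variance normal assumption this works out to TPR = Phi(d' + z(FPR)), where Phi is the standard normal CDF and z is its inverse. The function name `tpr_at_fpr` and the fixed rate of 0.1 are arbitrary choices for this sketch.

```python
from statistics import NormalDist

def tpr_at_fpr(d_prime: float, fpr: float) -> float:
    """True positive rate on the ROC curve for d' at a given false positive rate.

    Assumption: equal-variance normal noise and signal curves.
    """
    std_normal = NormalDist()
    return std_normal.cdf(d_prime + std_normal.inv_cdf(fpr))

fixed_fpr = 0.1
for d_prime in (0.0, 0.5, 1.0, 2.0, 3.0):
    tpr = tpr_at_fpr(d_prime, fixed_fpr)
    print(f"d' = {d_prime:.1f}: at FPR = {fixed_fpr}, TPR = {tpr:.3f}")
```

At d' = 0 the true positive rate equals the false positive rate, which is the dashed straight line; larger d' values lift the curve toward (0, 1).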
One of the upshots of this is that knowing just the position of our observer in the plot on the right can give us a good estimate, under most conditions, of what their true sensitivity is. And we can place them on the right-hand plot knowing only their true positive and false positive rates after they complete a number of trials. Of course, the more trials, the better the estimate.