Steve Chapman had a column last week about the benefits of assembling large DNA databases of the populace for the purpose of solving crimes.
The L.A. Times has a story this weekend on why that creates some problems that might not be readily apparent.
The main problem is that the odds of a false match climb steadily as the database grows. Run a DNA sample against a database of hundreds of thousands of people (in Britain, the number is well into the millions) and a coincidental hit becomes far more likely than the headline odds suggest. It's a Bayes' Theorem problem. The problem is exacerbated when you're dealing with decayed DNA from old "cold cases," where you have even fewer markers to compare than in well-preserved DNA samples.
Let's say the U.S. adopts a Great Britain-style policy on collecting DNA, basically a move toward, at some point in the future, having DNA on file for everyone in the country. Well, now the 1 in 1.1 million odds cited against the suspect in the L.A. Times case are being run against a database of 380 million people. The numbers say that you're going to pull up about 345 matches in the U.S. alone. In the California case, the database is obviously much smaller than the entire U.S. population, and only one of those 345 people showed up in the 330,000-person FBI DNA database: the (admittedly unsympathetic) subject of the article. But any of the other 344 potential matches in the U.S. (or the 2,200 matches worldwide) could have committed the crime. They just weren't in the database.
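For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes every person has the same independent 1 in 1.1 million chance of matching the sample (which ignores relatives, who are more likely to match), and it uses the population and database figures quoted above. The function and variable names are mine, purely for illustration.

```python
def expected_matches(group_size: int, match_odds: float) -> float:
    """Expected number of people in a group who match by pure chance.

    Assumes each person independently has a 1-in-`match_odds` chance
    of matching the sample (a simplification; relatives are ignored).
    """
    return group_size / match_odds


RANDOM_MATCH_ODDS = 1_100_000  # "1 in 1.1 million", as quoted in the post
US_POPULATION = 380_000_000    # population figure used in the post
DATABASE_SIZE = 330_000        # database size quoted in the post

us_matches = expected_matches(US_POPULATION, RANDOM_MATCH_ODDS)
db_matches = expected_matches(DATABASE_SIZE, RANDOM_MATCH_ODDS)

print(f"Expected chance matches in the whole population: {us_matches:.0f}")
print(f"Expected chance matches in the database: {db_matches:.2f}")
```

Under those assumptions, the whole-population figure comes out at roughly 345, and the database alone would be expected to produce about 0.3 coincidental hits per search.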
DNA database searches are an excellent starting point for law enforcement. But given the odds of false matches when running DNA against an extensive database, we should be very careful about shifting the burden of proof onto the people who match, forcing them to prove their innocence. It's also unfortunate that the judge in the case profiled in the L.A. Times would only allow the prosecution's miscalculated 1 in 1.1 million chance of a false match into evidence, and not the more statistically sound 1 in 3. Even if one were to accept the idea that the scientific community is divided over the proper way to calculate the probability of a false match (and I'm not convinced there's really that much of a debate), you'd think a judge should at least allow the jury to be made aware of that division of opinion, and of the fact that there are serious statisticians and scientists who would put the chance of a false match much, much higher than the figure suggested by the prosecutors in the case.
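As a rough check on where a figure like 1 in 3 could come from, here is a short Python sketch. It assumes the number is derived by scaling the 1 in 1.1 million random-match odds by the 330,000 profiles searched, with each profile treated as an independent comparison. That derivation is my assumption, not something spelled out above, and the function names are hypothetical.

```python
def simple_false_match_estimate(database_size: int, match_odds: float) -> float:
    """Database size times the per-person match probability (the simple estimate)."""
    return database_size / match_odds


def prob_at_least_one_false_match(database_size: int, match_odds: float) -> float:
    """Chance that at least one innocent profile matches by coincidence.

    Assumes every profile is an independent comparison with the same
    1-in-`match_odds` chance of matching.
    """
    p = 1.0 / match_odds
    return 1.0 - (1.0 - p) ** database_size


DATABASE_SIZE = 330_000        # database size quoted in the post
RANDOM_MATCH_ODDS = 1_100_000  # "1 in 1.1 million", as quoted in the post

simple = simple_false_match_estimate(DATABASE_SIZE, RANDOM_MATCH_ODDS)
exact = prob_at_least_one_false_match(DATABASE_SIZE, RANDOM_MATCH_ODDS)

print(f"Simple estimate: {simple:.2f} (about 1 in {1 / simple:.1f})")
print(f"At least one coincidental match: {exact:.2f} (about 1 in {1 / exact:.1f})")
```

Either way you run it, the answer lands somewhere between 1 in 3 and 1 in 4, which is nowhere near 1 in 1.1 million.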