On Vandalism
(See this post for a little bit of context.)
Vandalism is a problem, Wikipedians are quick to assert, but one that is solved by constant vigilance–Wikipedians are watching recent changes “like hawks”.
“Yes, vandalism is common on Wikipedia,” we read in the recent collaboratively edited press release, “but Wikipedia heals quickly.” After all, “IBM researchers found that most vandalism on Wikipedia was reverted in less than five minutes.”
We see this statement frequently repeated at Wikipedia and elsewhere.
Most vandalism on Wikipedia is reverted in less than five minutes. Let us assume, for the moment, that that statement is true. Does it imply that vandalism is a solved problem for Wikipedia? Well, no. Suppose that 99 out of every 100 articles that get vandalized are reverted within 24 hours. The hundredth is not, so unless that leftover vandalism is eventually caught, there is more vandalism in Wikipedia today than there was yesterday. Without knowing the rate at which uncorrected vandalism is added to Wikipedia, we cannot rule out that the percentage of vandalized articles is greater today than it was yesterday. How quickly most vandalism is reverted isn’t the right question to ask; we should be concerned with whether the amount of vandalism is shrinking or growing.
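To make the arithmetic concrete, here is a minimal sketch of how the backlog behaves under that supposition. Every number in it is hypothetical: the 100 new vandalized articles per day, the 99-in-100 quick-revert figure, and the `slow_revert_rate` at which older, missed vandalism eventually gets cleaned up are illustrative assumptions, not measurements of Wikipedia.

```python
def standing_vandalism(days, new_per_day=100, caught_per_100=99,
                       slow_revert_rate=0.0):
    """Return the number of still-vandalized articles at the end of each day.

    All parameters are hypothetical illustration values:
      new_per_day      -- articles vandalized each day
      caught_per_100   -- how many of every 100 are reverted within 24 hours
      slow_revert_rate -- fraction of the older backlog cleaned up per day
    """
    backlog = 0.0
    history = []
    for _ in range(days):
        # Some of the vandalism that slipped through earlier is found late.
        backlog *= (1.0 - slow_revert_rate)
        # Today's vandalism that was NOT reverted quickly joins the backlog.
        backlog += new_per_day * (100 - caught_per_100) / 100
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Compare "missed vandalism is never found" with "half is found each day".
    for rate in (0.0, 0.5):
        print(rate, standing_vandalism(5, slow_revert_rate=rate))
```

With `slow_revert_rate=0.0` the backlog grows by one article every day, even though 99% of vandalism is reverted within 24 hours. Only when the leftover vandalism is also cleaned up at some rate does the backlog level off, which is why the quick-revert figure alone says nothing about whether the total is shrinking or growing.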
But it gets worse than that. Most vandalism on Wikipedia is reverted in less than five minutes. Is that a meaningful thing to say? In order to know that most vandalism is reverted within minutes, wouldn’t we need to identify all vandalism, at least for a representative sample of Wikipedia articles? At best what we really mean is that most known vandalism is reverted in less than five minutes. Unknown vandalism is, well, unknown.
But wait–there’s more. Most vandalism on Wikipedia is reverted in less than five minutes. Did IBM researchers actually say that? Well, no. As far as I can see, the article to which everyone links seems to have only one paragraph on vandalism, which reads as follows:
“As publicly editable sites, Wikis are vulnerable to vandalism. We’ve examined many pages on Wikipedia that treat controversial topics, and have discovered that most have, in fact, been vandalized at some point in their history. But we’ve also found that vandalism is usually repaired extremely quickly–so quickly that most users will never see its effects. The pictures below tell the story.”
“Visualizing every saved version of the page on “abortion”, with each version getting equal space. The vertical black interruptions indicate times when a visitor has deleted most of the page.”
“Same page on “abortion”, but here horizontal spacing corresponds to time, so that rapid-fire changes show up almost on top of each other. Because vandalism is repaired so quickly, it does not show up in this view of the visualization”
Wait a minute. The IBM tool visualizes (a) the number of lines in the article and (b) who created those lines. It doesn’t give any insight at all into the content of those lines. It seems that they’ve defined “vandalism” as “deleting most of the page”, and that in the articles they’ve examined this is usually repaired “extremely quickly”. Wikipedians don’t even enumerate “deleting most of the page” on their list of common types of vandalism.
Where’s the “most vandalism” part? Or even the “five minutes” part? What IBM researchers really say is that for the controversial articles they have examined, page-wipes are restored quickly.
It seems that this “IBM researchers found most vandalism on Wikipedia is reverted in less than five minutes” line is a complete myth: IBM researchers didn’t actually make that claim, it’s not a meaningful claim to make, and it doesn’t really tell us anything at all about the volume of vandalism within Wikipedia.

