On Vandalism
(See this post for a little bit of context.)
Vandalism is a problem, Wikipedians are quick to assert, but one that is solved by constant vigilance–Wikipedians are watching recent changes “like hawks”.
“Yes, vandalism is common on Wikipedia,” we read in the recent collaboratively edited press release, “but Wikipedia heals quickly.” After all, “IBM researchers found that most vandalism on Wikipedia was reverted in less than five minutes.”
We see this statement frequently repeated at Wikipedia and elsewhere.
Most vandalism on Wikipedia is reverted in less than five minutes. Let us assume, for the moment, that that statement is true. Does it imply that vandalism is a solved problem for Wikipedia? Well, no. Suppose that 99 out of every 100 articles that get vandalized are reverted within 24 hours. If the remaining one is never corrected, then there is more vandalism in Wikipedia today than there was yesterday. Without knowing the rate at which uncorrected vandalism is added to Wikipedia, we cannot rule out that the percentage of vandalized articles is greater today than it was yesterday. How quickly most vandalism is reverted isn’t the right question to ask; we should be concerned with whether the amount of vandalism is shrinking or growing.
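To make the arithmetic concrete, here is a minimal sketch of that thought experiment in Python. Every number and name in it is an assumption made up for illustration (100 articles vandalized per day, 1 in 100 missed by the quick revert, and a hypothetical `later_fix_rate` for vandalism that is eventually noticed); none of it is measured Wikipedia data.

```python
def standing_vandalism(days, new_per_day=100, missed_per_hundred=1, later_fix_rate=0.0):
    """Count of still-vandalized articles at the end of each day.

    All parameters are hypothetical illustration values:
      new_per_day        -- articles vandalized each day
      missed_per_hundred -- how many of every 100 escape the quick revert
      later_fix_rate     -- share of older, missed vandalism fixed per day
    """
    standing = 0.0
    history = []
    for _ in range(days):
        # Some of the older vandalism is eventually noticed and repaired.
        standing -= standing * later_fix_rate
        # Today's vandalism that escaped the quick revert joins the pile.
        standing += new_per_day * missed_per_hundred / 100
        history.append(standing)
    return history


if __name__ == "__main__":
    # Missed vandalism is never fixed: the pile grows every single day.
    print(standing_vandalism(5))
    # Half of the old pile is fixed each day: the pile levels off instead.
    print(standing_vandalism(5, later_fix_rate=0.5))
```

With these made-up numbers, a 99% quick-revert rate still leaves one more vandalized article standing each day if nothing else ever catches it. Whether the pile grows or levels off depends entirely on the later fix rate, which is exactly the figure the "five minutes" statistic says nothing about.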
But it gets worse than that. Most vandalism on Wikipedia is reverted in less than five minutes. Is that a meaningful thing to say? In order to know that most vandalism is reverted within minutes, wouldn’t we need to identify all vandalism, at least for a representative sample of Wikipedia articles? At best what we really mean is that most known vandalism is reverted in less than five minutes. Unknown vandalism is, well, unknown.
But wait–there’s more. Most vandalism on Wikipedia is reverted in less than five minutes. Did IBM researchers actually say that? Well, no. As far as I can see, the article to which everyone links seems to have only one paragraph on vandalism, which reads as follows:
“As publicly editable sites, Wikis are vulnerable to vandalism. We’ve examined many pages on Wikipedia that treat controversial topics, and have discovered that most have, in fact, been vandalized at some point in their history. But we’ve also found that vandalism is usually repaired extremely quickly–so quickly that most users will never see its effects. The pictures below tell the story.”
“Visualizing every saved version of the page on “abortion”, with each version getting equal space. The vertical black interruptions indicate times when a visitor has deleted most of the page.”
“Same page on “abortion”, but here horizontal spacing corresponds to time, so that rapid-fire changes show up almost on top of each other. Because vandalism is repaired so quickly, it does not show up in this view of the visualization”
Wait a minute. The IBM tool visualizes (a) the number of lines in the article and (b) who created those lines. It doesn’t give any insight at all into the content of those lines. It seems that they’ve defined “vandalism” as “deleting most of the page”, and that in articles they’ve examined this is usually repaired “extremely quickly”. Wikipedians don’t even enumerate “deleting most of the page” on their list of common types of vandalism.
Where’s the “most vandalism” part? Or even the “five minutes” part? What IBM researchers really say is that for the controversial articles they have examined, page-wipes are restored quickly.
It seems that this “IBM researchers found most vandalism on Wikipedia is reverted in less than five minutes” line is a complete myth: IBM researchers didn’t actually make that claim, it’s not a meaningful claim to make, and it doesn’t really tell us anything at all about the volume of vandalism within Wikipedia.


Is this your own insight or did you simply read it from one of the relevant postings on wikien-l? Cite your sources :)
Comment by Mathias Schindler — October 24, 2005 @ 6:56 am
Sigh. Now you’re just being petty. The words and ideas above are my own. If you’d like to suggest otherwise, by all means provide links. Note that Joi Ito gets the description of the IBM research right, but not the headline. Otherwise I’m not aware of anyone else identifying the fallacy of this statement, although it’s screamingly obvious to anyone who’s bothered to think about what it would mean to demonstrate anything about “most vandalism”, or bothered to actually read the IBM report.
Comment by eblogger — October 24, 2005 @ 7:52 am
Apart from the IBM site, there are two papers I am aware of by the producers of history flow.
One can be found at http://web.media.mit.edu/~fviegas/papers/history_flow.pdf
It is slightly different to the IBM web site you have linked to.
You might want to grep for “The site is subject to frequent vandalism and inaccuracy, just as skeptics might suspect—but the active Wikipedia community rapidly and effectively repairs most damage. Indeed, one type of malicious edit we examined is typically repaired within two minutes.”
You might want to update your posting including a line with “I am still right since the paper is not talking about all kinds of vandalism” or similar.
The paper itself, which you had not referred to until now, is linked from several places on Wikipedia.
So far, it seems that you didn’t do even a minimal amount of research. Discussing the IBM paper would reveal substantial differences between most “IBM researchers found out” statements, mostly in newspapers not owned by Wikimedia, and the actual context of their sentences. Missing that chance tells us more about this series of misconceptions than it does about the amount and severity of vandalism.
Comment by Mathias Schindler — October 24, 2005 @ 9:32 pm
Mathias,
While the PDF is more detailed (thanks for the link), as you note, it is not substantively different from the previously linked write-up, in that only “mass deletions”–an obvious and relatively insignificant form of vandalism–are considered. That link adds more detail, but not really any new perspective. I’m not sure how that contradicts anything I wrote.
You mentioned “two papers” “apart from the IBM site”, but you only linked to one. Was there a second link you had in mind?
Also, regarding “statements mostly in non-wikimedia owned newspapers”, four of my eleven examples were from Wikipedia itself, not least of which is the heavily scrutinized http://en.wikipedia.org/wiki/WP:ITAAW.
Comment by eblogger — October 24, 2005 @ 9:45 pm
So we are back in the research mode, something that should have been done before writing about misconceptions :)
Comment by Mathias Schindler — October 25, 2005 @ 1:16 am
I don’t follow you. Is there anything you’re actually disputing about what I wrote?
Comment by eblogger — October 25, 2005 @ 3:10 am