EBlogger

November 7, 2005

Nathan Kaiser, fact-checker

Filed under: britannica, wikipedia

There’s a brief interview with Jimmy Wales at nPost.com that has this entertaining little tidbit:

Interviewer: Back to the accuracy of the Wikipedia postings. Because it is much more dynamic than other encyclopedias that are out there, it could be more accurate in some areas.

Wales: That is absolutely true. There are quite a few good examples of that. There is a small scandal going on in Germany. One of the questions on the German version of ‘Who Wants to be a Millionaire’ was wrong. The show had referenced an answer on the German version of Brittanica, which was wrong. It was wrong on Wikipedia as well, but we were able to update it immediately.

Setting aside the piercing insight embedded in the “question” (it could be more accurate; it could also be less accurate, for the very same reason), one should point out that (a) Britannica is spelled B-r-i-t-a-n-n-i-c-a (it’s just not hard to get it right) and that (b) while there are versions of Britannica in Korean, French, Chinese, and Japanese, along with a large number of print translations, there is no “German version of Britannica”. Wales no doubt was referring to Brockhaus.

Shame on you, Nathan Kaiser, for failing to do the least bit of copy-editing or fact checking.

October 24, 2005

On Vandalism

(See this post for a little bit of context.)

Vandalism is a problem, Wikipedians are quick to assert, but one that is solved by constant vigilance–Wikipedians are watching recent changes “like hawks”.

“Yes, vandalism is common on Wikipedia,” we read in the recent collaboratively edited press release, “but Wikipedia heals quickly.” After all, “IBM researchers found that most vandalism on Wikipedia was reverted in less than five minutes.”

We see this statement frequently repeated at Wikipedia and elsewhere.

Most vandalism on Wikipedia is reverted in less than five minutes. Let us assume, for the moment, that that statement is true. Does it imply that vandalism is a solved problem for Wikipedia? Well, no. Suppose that 99 out of every 100 articles that get vandalized are reverted within 24 hours. The one that is missed stays vandalized, so there is more vandalism in Wikipedia today than there was yesterday. Without knowing the rate at which uncorrected vandalism is added to Wikipedia, we cannot rule out that the percentage of vandalized articles is greater today than it was yesterday. How quickly most vandalism is reverted isn’t the right question to ask; we should be concerned with whether the amount of vandalism is shrinking or growing.
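To make the arithmetic concrete, here is a minimal sketch in Python. The figure of 1,000 vandalized articles per day is invented purely for illustration, as is the assumption that whatever is not reverted is never corrected; only the 99-in-100 revert rate comes from the hypothetical above.

```python
def uncorrected_backlog(days, vandalized_per_day, reverted_per_100):
    """Return the running count of vandalized articles left uncorrected.

    Illustrative assumptions (not measured data): the same number of
    articles is vandalized every day, a fixed share of them is reverted,
    and anything not reverted is never fixed later.
    """
    backlog = 0
    history = []
    for _ in range(days):
        # Integer arithmetic: articles that slip past the reverters today.
        missed_today = vandalized_per_day * (100 - reverted_per_100) // 100
        backlog += missed_today
        history.append(backlog)
    return history


# Hypothetical week: 1,000 articles vandalized daily, 99 in 100 reverted.
backlog_by_day = uncorrected_backlog(days=7, vandalized_per_day=1000, reverted_per_100=99)
print(backlog_by_day)  # [10, 20, 30, 40, 50, 60, 70]
```

Even with a 99 percent revert rate, the backlog only ever grows under these assumptions. The speed of the typical revert says nothing about the size of the pile that is left behind.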

But it gets worse than that. Most vandalism on Wikipedia is reverted in less than five minutes. Is that a meaningful thing to say? In order to know that most vandalism is reverted within minutes, wouldn’t we need to identify all vandalism, at least for a representative sample of Wikipedia articles? At best what we really mean is that most known vandalism is reverted in less than five minutes. Unknown vandalism is, well, unknown.

But wait–there’s more. Most vandalism on Wikipedia is reverted in less than five minutes. Did IBM researchers actually say that? Well, no. As far as I can see, the article to which everyone links seems to have only one paragraph on vandalism, which reads as follows:

“As publicly editable sites, Wikis are vulnerable to vandalism. We’ve examined many pages on Wikipedia that treat controversial topics, and have discovered that most have, in fact, been vandalized at some point in their history. But we’ve also found that vandalism is usually repaired extremely quickly–so quickly that most users will never see its effects. The pictures below tell the story.”

The “pictures below” are:

“Visualizing every saved version of the page on “abortion”, with each version getting equal space. The vertical black interruptions indicate times when a visitor has deleted most of the page.”



and

“Same page on “abortion”, but here horizontal spacing corresponds to time, so that rapid-fire changes show up almost on top of each other. Because vandalism is repaired so quickly, it does not show up in this view of the visualization”



Wait a minute. The IBM tool visualizes (a) the number of lines in the article and (b) who created those lines. It doesn’t give any insight at all into the content of those lines. It seems that they’ve defined “vandalism” as “deleting most of the page”, and that in articles they’ve examined this is usually repaired “extremely quickly”. Wikipedians don’t even enumerate “deleting most of the page” on their list of common types of vandalism.

Where’s the “most vandalism” part? Or even the “five minutes” part? What IBM researchers really say is that for the controversial articles they have examined, page-wipes are restored quickly.

It seems that this “IBM researchers found most vandalism on Wikipedia is reverted in less than five minutes” line is a complete myth: IBM researchers didn’t actually make that claim, it’s not a meaningful claim to make, and it doesn’t really tell us anything at all about the volume of vandalism within Wikipedia.

October 20, 2005

Wikipedians on Quality

Filed under: web2.0, wikipedia

In a recent post to a Wikipedia mailing list, Wikipedia co-founder Jimmy Wales described Nick Carr’s post on “The amorality of Web 2.0” (which I, along with much of the blogosphere, previously linked to) as “a valid criticism” and agreed that “the two examples [Carr] puts forward are, quite frankly, a horrific embarassment” and “nearly unreadable crap”.

This sparked several uncharacteristically self-critical responses from Wikipedians:

Although the raw numbers [of editors] are large, the number of articles is even larger, and so there are not enough editors to go around. […] Where are all the subject-matter experts?

We’d like to think that it’s inevitable we’ll asymptotically approach high quality, as Tony defended with [[Eventualism]]. But I think it’s too simplistc.

In my view, wikipedia has to undergo a paradigm change if it really wants to succeed in creating a good encyclopedia. […] We shouldn’t give up the principle of open editing but we should make clear now from the beginning that we seek good writers and knowledgeable people, not anyone. Yes, anyone can edit an article. But not anyone should edit any article.

If Robert Henry [sic] is right (and judging by a number of fine articles now laying in ruins I suspect he is), then WP, should it desire to get finer control on article quality, needs to modify its “completely open” model a little bit.

[Via Andrew Orlowski at the Register]

October 11, 2005

Wikipedia is not Open Source

(See this post for a little bit of context.)

Wikipedia and other “open content” initiatives are often lumped together with “open source” projects.

For instance, a Google search on “wikipedia open source” currently finds over 8 million hits. The expression “open source encyclopedia” currently finds more than 12 million. Wikipedians themselves are fond of drawing a comparison to open source projects, invoking Linus’s Law (also here), citing a benevolent dictator, or comparing the project to Linux or the Apache Web Server.

While the Wikipedia is certainly “open” for editing and is made available under a license derived from one used for open source software, it is managed differently than every open source project on the planet, at least every one I’m aware of.

In an open source software project, one is free to use the software, to obtain and examine the software’s source code, to modify it locally, and with various limitations, to redistribute it in binary or source form. One is encouraged, and in some circumstances required, to make his modifications available for others to use. But there is always someone, or a team of someones, who acts as the maintainer of the software. In the case of the Linux kernel, it was for a long time a single individual, and is now that individual and a team of trusted lieutenants. In the case of the Apache Web Server, it is the “Project Management Committee”, a group, in principle, of the most meritorious contributors (who approve new members by unanimous vote). While there are many contributors to each project, and many proposed contributions, there is always someone (a maintainer, a gatekeeper, an authority, an expert) who reviews and approves each contribution.

While I’ve never followed the day-to-day Linux development, I can tell you that at the Apache Software Foundation there is an extensive, formal, and documented process to ensure that every contribution is carefully reviewed. The Foundation is legally accountable for certain types of copyright and patent infringement, and prides itself on the quality of the software it produces. Reviews, and the “web-of-trust” that determines who is qualified to do such a review, are an important part of the Apache development process. Presumably it is not a coincidence that this process produces the most popular web server in the world, and one that is remarkably secure, robust and stable.

The absence of gatekeepers is not a new complaint about Wikipedia. The obvious retort, of course, is that other contributors will review changes after the fact. This is sometimes known as a “commit then review” protocol in open source circles. But open source projects only allow commit-then-review contributions from a trusted few. The Wikipedia review process, by allowing arbitrary commit-then-review contributions, assumes (a) that someone is actually reviewing the contribution, and (b) that that someone is capable of performing an informed review of that contribution. It is possible for both of these assumptions to be correct. It is worth noting, however, that thus far at least, these are unproven assumptions.

The presence of errors within the Wikipedia (and let’s be honest, the presence of more errors than virtually any “traditional” encyclopedia)–despite its impressive popularity–makes one wonder just how many eyeballs are needed before all bugs become shallow.

Update [11 Oct 2005 20:03 GMT]:

Based on comments here and elsewhere, I seem to have either riled or confused some folks, so perhaps I wasn’t quite clear. Let me restate the above as follows:

1) When people (including Wikipedia contributors) talk about Wikipedia they often appeal to a comparison to open source, and ascribe aspects/virtues of open source initiatives to Wikipedia.

2) Wikipedia is organized differently than other “open” projects, in the sense that every open source project (as opposed to open content) maintains a gatekeeper in one form or another, while Wikipedia does not.

3) As a result, some of the aspects ascribed to Wikipedia via the comparison in point #1 may not apply. Since (among many differences) they follow a different review process, things that are true about Linux or httpd may not be true about Wikipedia.

In other words, the essence of Wikipedia may be different than that of open source projects. (In fact, the essence of Wikipedia is much more like that of Ward’s Wiki than many would seem to like to admit.)

October 5, 2005

Britannica, Wikipedia and General Reference

I’ve been a contributor to open source projects, an employee of Encyclopædia Britannica, and an observer of (and occasional contributor to) Wikipedia for several years now. Over the years, I’ve noticed a number of misconceptions that Wikipedians have about Britannica, that Britannicians have about Wikipedia, and that the public at large seems to have about both Wikipedia and Britannica, and about general reference sources as a category.

Over the coming weeks, I’m going to attempt to address some of these misconceptions in a series of blog posts. As I do, I’ll update this post with links to the subsequent entries, so that this post can serve as a sort of index of related entries.

August 17, 2005

Wikipedia from a Library Science point of view

Over at The Speculative Librarian, Joshua Lambert is blogging a detailed analysis of Wikipedia and electronic reference in general from the perspective of William Katz’s Introduction to Reference Works. Good stuff, through and through.

His latest post is a quite detailed analysis and raises a number of insightful points. Among the interesting discussion points:

“Britannica, the most scholarly of all general purpose encyclopedias, is written for laypeople. Editors take the articles written by scholars and try to ‘rephrase specialized thought into common language’ (Katz, 226)[…] My question is if encyclopedias are written for lay people, why can’t lay people write them?”

and

“Katz says that all good encyclopedia will have an index. Well, Wikipedia does not but it has something better which most electronic encyclopedia have. It uses hyperlinks. Hyperlinks are the equivalent of and index except they are embedded within the article itself. There is not need for a separate index if people can link directly to other articles by clicking a hyperlink. […] The search/query entry may also be seen as replacing the index.”

(Britannica supports hyperlinks and search but still makes extensive use of its award-winning index electronically, and I know quite a few people who would suggest that a search interface is not an adequate replacement for a well-organized index.)

August 5, 2005

Wikipedia to tighten editorial rules

Filed under: wikipedia

Just three days after declaring in a guest post on Lawrence Lessig’s blog that “anti-elitism” is not a “serious objection” to Wikipedia, co-founder Jimmy Wales tells the German newspaper Sueddeutsche Zeitung that Wikipedia plans to impose stricter editorial rules to prevent vandalism and that some form of “commission” might be required to ensure the integrity of the content.

UPDATE: Jimmy Wales later wrote on Lessig’s blog that the Sueddeutsche Zeitung story got it wrong.

July 27, 2005

Madhava of Sangamagramma

Over at the Olin Reference Blog, Lynn mentions that a student was “looking for books or articles” on the mathematician Madhava, and that various specialized and general reference sources weren’t helpful, including Britannica.

While there’s not a lot of coverage of Madhava the mathematician at Britannica.com, a quick search yielded a brief mention of him in the analytic trigonometry section of the trigonometry article and a suitable internet link at Madhava of Sangamagramma, which includes a number of good references in the bibliography.

Lynn, if your patrons have access to school.eb.com or search.eb.com, they’ll find similar information there.

Also, Wikipedia currently has about a paragraph on Madhava and points to the same internet link that Britannica does. The Wikipedia text on Madhava seems largely cribbed from that article.
