Wikipedia is not Open Source
(See this post for a little bit of context.)
Wikipedia and other “open content” initiatives are often lumped together with “open source” projects.
For instance, a Google search on “wikipedia open source” currently finds over 8 million hits. The expression “open source encyclopedia” currently finds more than 12 million. Wikipedians themselves are fond of drawing a comparison to open source projects, invoking Linus’s Law (also here), citing a benevolent dictator, or comparing the project to Linux or the Apache Web Server.
While the Wikipedia is certainly “open” for editing and is made available under a license derived from one used for open source software, it is managed differently than every open source project on the planet, or at least every one I’m aware of.
In an open source software project, one is free to use the software, to obtain and examine the software’s source code, to modify it locally, and, with various limitations, to redistribute it in binary or source form. One is encouraged, and in some circumstances required, to make his modifications available for others to use. But there is always someone, or a team of someones, who acts as the maintainer of the software. In the case of the Linux kernel, it was for a long time a single individual, and is now that individual and a team of trusted lieutenants. In the case of the Apache Web Server, it is the “Project Management Committee”, in principle a group of the most meritorious contributors (who approve new members by unanimous vote). While there are many contributors to each project, and many proposed contributions, there is always someone (a maintainer, a gatekeeper, an authority, an expert) who reviews and approves each contribution.
While I’ve never followed the day-to-day Linux development, I can tell you that at the Apache Software Foundation there is an extensive, formal, and documented process to ensure that every contribution is carefully reviewed. The Foundation is legally accountable for certain types of copyright and patent infringement, and prides itself on the quality of the software it produces. Reviews, and the “web-of-trust” that determines who is qualified to do such a review, are an important part of the Apache development process. Presumably it is not a coincidence that this process produces the most popular web server in the world, and one that is remarkably secure, robust and stable.
The absence of gatekeepers is not a new complaint about Wikipedia. The obvious retort, of course, is that other contributors will review changes after the fact. This is sometimes known as a “commit-then-review” protocol in open source circles. But open source projects only allow commit-then-review contributions from a trusted few. The Wikipedia review process, by allowing commit-then-review contributions from anyone, assumes (a) that someone is actually reviewing each contribution, and (b) that this someone is capable of performing an informed review of it. Both of these assumptions may be correct. It is worth noting, however, that thus far, at least, they are unproven.
The presence of errors within the Wikipedia (and, let’s be honest, the presence of more errors than in virtually any “traditional” encyclopedia), despite its impressive popularity, makes one wonder just how many eyeballs are needed before all bugs become shallow.
Update [11 Oct 2005 20:03 GMT]:
Based on comments here and elsewhere, I seem to have either riled or confused some folks, so perhaps I wasn’t quite clear. Let me restate the above as follows:
1) When people (including Wikipedia contributors) talk about Wikipedia, they often appeal to a comparison to open source, and ascribe aspects/virtues of open source initiatives to Wikipedia.
2) Wikipedia is organized differently than other “open” projects, in the sense that every open source project (as opposed to open content) maintains a gatekeeper in one form or another, while Wikipedia does not.
3) As a result, some of the aspects ascribed to Wikipedia via the comparison in point #1 may not apply. Since (among many differences) they follow a different review process, things that are true about Linux or httpd may not be true about Wikipedia.
In other words, the essence of Wikipedia may be different than that of open source projects. (In fact, the essence of Wikipedia is much more like that of Ward’s Wiki than many would seem to like to admit.)
Britannica isn’t
(disclaimer: I’m not a native English speaker, so every sentence here might be plain wrong. If in doubt, ask for clarification)
A few days ago, the EBlogger announced a series of articles about “misconceptions” relating to Wikipedia…
Trackback by Work in Progress — October 11, 2005 @ 11:11 am
What does your premise of “Wikipedia is not open source” have to do with your conclusion that “[…] despite its impressive popularity-makes one wonder just how many eyeballs are needed before all bugs become shallow.”?
The software that runs Wikipedia is licensed under the GPL and the data is licensed under the GFDL.
Somewhere hidden in there was an argument that Wikipedia is not open source principally because it is more open source than other projects. License-wise this is not true, and it’s hard to fathom how, when someone is more of something, it makes them less of it. And there is a benevolent dictator.
Comment by Brian — October 11, 2005 @ 12:38 pm
Hi EBlogger,
First rule of the attention economy: start with something that draws everybody’s attention, such as “George W. Bush is not a president” or “Wikipedia is not open source”, and then wait for the replies.
Was this posting just a (fruitful) attempt to lower any kind of expectation about the level of this debate? Anyway, I made some comments over in my weblog and trackbacked you. Hopefully it will pass your manual filter.
Has the “disappearing URL” bug in the comment section been fixed?
Cheers,
Mathias
Comment by Mathias Schindler — October 11, 2005 @ 12:50 pm
From the Britannica article about open source (at least what I can access without paying for it ;) ): “computer software whose source code is put into the public domain, subject to the restriction that any improvements or derived software also include the source code and be put into the public domain.” AFAIK (and I freely admit I am no expert), the only requirements of open source are that it is under an open source license and can be freely modified, both of which obviously apply to Wikipedia. Formal reviewing and gatekeepers have nothing to do with open source; they are merely means to an end that many open source projects make use of. Just because Wikipedia is more open than typical open source projects, that doesn’t stop it being one.
Comment by the wub — October 11, 2005 @ 2:31 pm
Brian,
I think you misunderstand me. My argument boils down as follows:
1) Many folks like to ascribe certain virtues of open source projects like Linux to Wikipedia.
2) Among several, a salient difference between Wikipedia and Linux is that their review processes are different. Linux has gatekeepers whose role is to prevent bad edits from making it into the product. Wikipedia does not.
3) Therefore, it is possible that certain virtues of open source ascribed in #1 may not actually apply to wikipedia.
Statements #1 and #2 are observations. Statement #3 is my conclusion.
I didn’t make some sort of strawman argument like “wikipedia is less open because it is more open”. What I argue is that open source projects are different, by construction, than open content ones, like Wikipedia, as it is currently run.
Comment by eblogger — October 11, 2005 @ 7:11 pm
Statement 1 is actually missing any reliable background. Simply googling for some keywords does not help you (even if you choose a more plausible combination of keywords). Again: Who and How many is “Many folks”.
Statement 2: Is this your own observation? If so, how do you explain the concept of watchlists and topic-related WikiProjects? Is a “gatekeeper” only someone who does mandatory checks before code is committed?
3) What language does not see any difference between “is not” and “may not”? :)
Comment by Mathias Schindler — October 11, 2005 @ 8:09 pm
> Again: Who and How many is “Many folks”.
Mathias, whether or not the Google queries are indicative of anything at all, you’re not actually suggesting that Wikipedians don’t draw parallels between Wikipedia and open source, are you? Of course they do. Virtually everyone does. It would be foolish not to.
Comment by eblogger — October 11, 2005 @ 10:03 pm
> Statement 2: Is this your own observation?
Well, yes, but one that is difficult to refute. Clearly Wikipedia follows a commit-then-review process, and I have observed that at times the “review” part is either skipped altogether or dramatically shortchanged. How else could obviously false or inappropriate information remain in the text in such a relatively high proportion? The number of mistakes, false statements, etc. exceeds what simple human error would explain. Clearly not every edit is reviewed, or at least competently reviewed.
Comment by eblogger — October 11, 2005 @ 10:07 pm
> 3) What language does not see any difference between “is not” and “may not”? :)
The same ones that don’t see any difference between “may be” and “is”. ;)
Comment by eblogger — October 11, 2005 @ 10:09 pm
Quoting you in posting #7: “draw parallels between wikipedia and open source”.
Does “drawing parallels between EB and books” make any sense to you?
Comment by Mathias Schindler — October 12, 2005 @ 12:45 am
Eblogger, you need a history lesson. Dissociating Wikipedia from open-source is ludicrous, hand-wavy, and putting me to sleep.
Comment by Brian — October 12, 2005 @ 2:38 am
Brian, again you miss my point, and I’m beginning to think you do so willfully. I’m distinguishing between the review process for “open content” as practiced by Wikipedia, and the review process for “open source”, as practiced by, well, every open source project, and, as your link notes, some open content ones as well.
Comment by eblogger — October 12, 2005 @ 2:40 pm
eblogger: I think we are moving in circles. Let me ask you this way:
Do you think that there is a difference between “open source” and “open source software”?
If the answer is yes, what is this difference?
Comment by Mathias Schindler — October 12, 2005 @ 6:38 pm
Mathias, we are definitely going around in circles.
Comment by eblogger — October 12, 2005 @ 8:51 pm
Then where has the question I asked in #13 already been answered?
Comment by Mathias Schindler — October 12, 2005 @ 9:51 pm
Mathias,
Since I believe you’re truly being genuine here, I’ll try once more to state this as clearly as possible.
When people talk about open source projects like Linux and Apache, they are talking about the license by which the software is available, yes, but they are also talking about more than that. By “open source”, they mean the philosophy behind it, the process by which it is developed, and the emergent effects (like Linus’s law) that are as much the result of the philosophy and process as they are a result of the license.
When I write “Wikipedia is not open source”, clearly I don’t mean that the GNU FDL is so unlike the GPL or the APL as to constitute a different beast. (Indeed, I specifically stated the opposite of that.) Nor, for the most part, do I mean that the philosophy behind the Wikipedia is so unlike the one behind Linux or the Apache web server as to constitute a different beast. (I specifically stated the opposite of that as well.) I do mean that the process by which Wikipedia is developed, or more specifically, the process by which Wikipedia is reviewed, is so unlike the process used to develop and review Linux, Apache and every other open source project in the world that it may very well constitute a different beast: so different that the emergent effects we ascribe to other “open” projects may not apply to Wikipedia.
In other words, Wikipedia is not “open source”, not because of its license and not because of its philosophy, but because of its review process. I’m really not foolish enough to believe that the review process formally defines the phrase “open source”. I am, however, pragmatic enough to recognize that when people say “open source”, they are generally talking about more than a license.
You don’t have to agree with me, but I don’t know any other way to say this.
Comment by eblogger — October 12, 2005 @ 11:31 pm
Eblogger: I really don’t get your point. “Open source” has absolutely nothing to do with the content posted on the web site in question. phpBB, for example, has a series of forums about the site. But nobody at phpBB “carefully reviews” every post to verify their accuracy.
In any event, Wikipedia does contain errors. But no web site, encyclopedia, or other resource is error-free. And when I find an error on Wikipedia, I can fix it within seconds. When I find an error in Britannica, there’s little that I can do, save for writing to the editor and hoping that he fixes the article.
For most of the topics I’ve looked up, Wikipedia has a better, more in-depth article than I might find in Britannica. Wikipedia’s article on Windows XP is a well-written article about the operating system that a good portion of the world’s computers use. I’d be interested to see if Britannica has anything on it. Sure, Wikipedia has its drawbacks. But so does Britannica. And I trust Wikipedia as much as I would Britannica, knowing that thousands of users like myself are constantly looking through articles for vandalism and inaccuracies.
Comment by Ral315 — October 13, 2005 @ 4:10 am
Your statement in #16 was much more precise, and more than ever before it avoided relying on the beliefs of third parties to strengthen your theory.
After all, I can collect from your postings and replies a series of statements you should be able to agree with:
# This posting is not about open source, it’s about development of text (as in source code or content).
# Apache, Linux and the OSS projects you are aware of seem to be using different techniques to maintain the source code.
# Effects that arise from a certain way of developing text/code may not arise if you use a different way of developing text/code.
You know, a lot of companies fell for the illusion of a giant workforce that can be put to work on their own plans for free if they simply release the source code under a free license. That’s how Mozilla and OpenOffice.org started. For years, Mozilla development was sluggish. What’s more, for a very long time, most of the OOo developers were simply Sun employees (I wonder how the ratio has changed since then). Maybe they fell victim to the same misconception you did: that there are some pseudo-Darwinian and pseudo-miraculous powers behind this “open source” thing that turn water into wine and wood into gold, simply because you call it open or free.
Open source projects that did not spin off from a commercial product, but grew out of a group of intelligent people working in their free time, rely heavily on factors that are not requirements of the license, such as a positive social climate, or a project leader who is able to inspire people more than the superior at your paid job does.
I see that your original entry in this blog is still a work in progress. A little change here and there and some annexes .o( claim-then-review protocol, anyone? ) and this posting might gain more plausibility.
While you cite ESR without attribution, I sorely miss any mention of a Bazaar in your posting.
Comment by Mathias Schindler — October 13, 2005 @ 6:54 am
Source code has a review process because broken code doesn’t run and because malicious code is immediately dangerous. But if you wanted to collect as much code as possible, and make it open source, you would just accept code from anyone. If you want a review process, read up on Wikipedia 1.0. If you’re just stuck in the 90s, realize that things change, and this is the third millennium.
Comment by Brian — October 13, 2005 @ 5:06 pm
I think I understand your basic argument, although clearly the term “open source” is malleable enough that it can be defined to include Wikipedia. Your interpretation is a plausible contrarian argument, but since a definitive answer is not possible, the debate could go on endlessly.
Anyway, in trying to clarify your point, you now add, “the essence of Wikipedia is much more like that of Ward’s Wiki than many would seem to like to admit.” I’m puzzled by the implied allegation that we’re trying to deny or downplay this heritage, and I don’t see why it would be something to be embarrassed about.
Discussions and media coverage of Wikipedia often provide historical background, including Ward’s Wiki and the origins of the wiki concept, so if this is some dark mystery, it’s being hidden in plain sight. While Ward’s Wiki and Wikipedia now have important differences in wiki terms, obviously some similarities remain. However, since Wikipedia is for most people their first introduction to wikis, additional reference points are usually necessary. These naturally tend toward open source software and principles like Linus’s law. While the concepts may not translate exactly, I think they’re still useful in understanding Wikipedia.
Comment by Michael Snow — October 13, 2005 @ 6:24 pm
I notice that Andrew Orlowski echoes this theme at the Register: Why Wikipedia isn’t like Linux.
Comment by eblogger — October 27, 2005 @ 11:21 pm