Wikipedia doubling time

The English-language version of Wikipedia had its one-millionth article on 8 March, and has just passed 1.1 million, 50 days later. That gives an implied doubling time of about a year. The doubling time seems to be fairly stable, since the 500 000 mark was reached in March 2005, and 250 000 in April 2004.
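For anyone who wants to check the arithmetic, here is a minimal sketch of the doubling-time calculation. It assumes growth was a constant exponential over the 50 days; the function name is my own and purely illustrative.

```python
import math

def doubling_time_days(start_count, end_count, elapsed_days):
    """Doubling time implied by constant exponential growth between two counts."""
    growth_factor = end_count / start_count
    return elapsed_days * math.log(2) / math.log(growth_factor)

# Figures from the post: 1.0 million to 1.1 million articles in 50 days.
recent = doubling_time_days(1_000_000, 1_100_000, 50)
print(f"Implied doubling time: {recent:.0f} days")  # about 364 days
```

A 10 per cent increase in 50 days works out to a shade under a year per doubling, which is where the "about a year" figure comes from.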

A straightforward extrapolation gives a billion articles in 2016. I’ll open this up for comments now, then give my own thoughts (taking advantage of yours, naturally).

Update over the fold

Lots of fun in the comments, here and at CT, and now I’ll try to be at least semiserious. If a trend can’t be sustained it won’t be. Or, if you prefer, exponential curves eventually become logistic. So, the real point is to work out the constraints on Wikipedia’s growth, and take a guess at what the endpoint of this process will look like.
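To make the exponential-versus-logistic point concrete, here is a rough sketch comparing the two curves from the same starting point. The ceiling `K` of 20 million articles is a made-up number chosen only for illustration, not an estimate; the one-year doubling time is the figure from the post.

```python
import math

N0 = 1_100_000          # article count quoted in the post
R = math.log(2) / 1.0   # growth rate for a one-year doubling time
K = 20_000_000          # hypothetical ceiling, for illustration only

def exponential(t_years):
    """Unconstrained growth: doubles every year forever."""
    return N0 * math.exp(R * t_years)

def logistic(t_years):
    """Same early growth rate, but levelling off as the count nears K."""
    return K / (1 + (K - N0) / N0 * math.exp(-R * t_years))

for t in (0, 2, 4, 6, 8, 10):
    print(f"{t:2d} years: exponential {exponential(t):>15,.0f}   logistic {logistic(t):>12,.0f}")
```

The two curves are close for the first couple of years and then part company, with the logistic one flattening out below the ceiling. Where the ceiling actually sits is exactly the question at issue.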

It’s a pretty safe bet that the current phase of rapid growth will take Wikipedia to 10 million articles, if not within the 3.5 years implied by the recent exponential trend. There’s no shortage of topics, plenty of room for growth in the number of contributors, and no obvious problem handling such an expansion within the current software and scheme of social organization. The result would be a general reference system that would be better for that purpose than anything that’s been seen previously. Among other things, such a system would replace Google for many purposes (though not the ones that make most of Google’s money).
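As a rough check on the extrapolations, the sketch below counts the doublings needed to get from 1.1 million articles to a given target. It assumes the doubling time stays at exactly one year, which is the whole point in dispute; the function name is illustrative.

```python
import math

def years_to_reach(current, target, doubling_time_years=1.0):
    """Years needed to grow from current to target at a fixed doubling time."""
    return doubling_time_years * math.log2(target / current)

current = 1_100_000  # article count quoted in the post
for target in (10_000_000, 1_000_000_000):
    years = years_to_reach(current, target)
    print(f"{target:>13,} articles: about {years:.1f} years")
```

On these assumptions 10 million takes a little over three years and a billion a little under ten, broadly in line with the figures quoted in the post. A slightly longer doubling time stretches both numbers.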

Going beyond that to 100 million articles would imply some radical changes. As far as content is concerned, something on this scale would compete across the board with specialist reference works like national dictionaries of biography, the Palgrave dictionary of economics and so on. An obvious way to approach this goal would be to subsume a number of existing projects, as was done with the 1911 Britannica, and the gazetteer entries on US towns. But that would certainly require new organizational and licensing arrangements, and probably a more complex architecture than exists at present. More importantly, it’s hard to see something on this scale functioning without a substantial number of full-time paid staff. On this scale, Wikipedia coverage of current events would also be directly competitive with the mainstream media.

Another tenfold increase and Wikipedia would be comparable in size with the Internet as a whole (the visible Internet currently has about 10 billion articles). As Joel Turnipseed pointed out in comments, the obvious analogy is with Borges’ map, which was on the same scale as the country it described. Extrapolations to this point and beyond are fun, but probably best left to science fiction.

My best guess is that sometime around 10 million articles, the growth rate will slow, eventually becoming linear. At this point, either some other project will take off, or Wikipedia itself will transform into something radically different.

22 thoughts on “Wikipedia doubling time”

  1. Wikipedia has its obvious flaws, but that shouldn’t make anyone lose sight of its sheer audacious genius. Of course, if you really want an authoritative, definitive answer to something specific, it can’t be relied on, but then, what else can? If you want to quickly get a few facts about virtually anything in the world, it’s just brilliant.

    Many of Wikipedia’s critics don’t appear to understand the meaning of the words “collaborative” and “user-editable”. Rather than bagging its inaccuracies, why don’t they fix them? There may have to be additional solutions put in place at some stage, such as a greater degree of authoritarianism, but on contentious subjects every intelligent reader must know that there are multiple truths.

    As for the doubling time, I’m sure this won’t stay linear forever; it will have to slow down at some stage. But even so, it won’t be long before just about everything gets some sort of coverage.

  2. Wikipedia is an interesting experiment. It surely makes the pipe-dream of commercialising information and knowledge a lot harder to pursue.

    I assume the growth rates mentioned by JQ refer to the English version. There are Wikipedias in other languages. Comparing entries on a topic in different languages can be interesting, too.

    The editing of inaccuracies is a risky job in so far as it may be time consuming and a waste of time because any ‘correction’ is erasable.

    It might be useful to have a date on the entries so that the history of the development of an ‘article’ can be traced. Or, alternatively, the most recent edition could be supplemented by a record of its development.

    Wikipedia doesn’t lend itself to use as a reference because the content evolves. (I found the same problem in an organisation which adopted an intranet without first working out a protocol for archiving the documentation. It is an easy way of generating asymmetric information and confusion.)

    Such are my initial reactions to Wikipedia.

  3. I don’t think most of the critics of Wikipedia are genuinely concerned about its errors. Most criticisms seem to emanate from people who want the truth to fit their views and get angry when inconvenient facts are pointed out. Prior to Wikipedia they could usually point to a website upholding their views, however wrong. Now that there is an easily accessible and relatively reliable source they don’t like it, so the response is to sneer, including the increasingly common charge that it shows a left-wing bias – inevitably true in some cases, but not applicable to most of those million articles.

  4. There’s plenty of opportunity for continued doubling for a while, considering the notability discussion. My own impression is: lots of fields haven’t had their academic experts in to overhaul their area yet (astronomy on Wikipedia is excellent, I believe language is reasonable, maths is good, but I’m told that biology is dreadful); and lots of major if not world-ending news events prior to Wikipedia’s foundation are not covered. Have a look at the new articles about Australia for a sense of the scope for new articles.

    That said, the rate of new articles being added would have to slow at some point. There are currently just over a million registered users: there are almost as many articles as users. (Obviously, a large proportion, almost certainly a majority, are inactive, and conversely some unregistered users are quite active, but that’s a rough guide I guess.) As the growth of new users, or more properly active users, slows, I imagine the number of articles would tend towards increasing linearly rather than doubling. I would be surprised if it started growing more slowly than that unless Wikipedia starts failing, because of the continual supply of notable new events and people that the world supplies us.

  5. Business and finance need some considerable work too. Proud to say I am doing my bit.
    As for using it – I generally use it as a primary reference for an area I am unfamiliar with or where I am simply curious. It cannot be cited as a definitive reference on anything, but it has been a long time (I think) since anyone used any encyclopedia as a sole reference on anything academic.
    Essentially, it is what it calls itself – an encyclopedia. Don’t expect it to be definitive on anything, but it is useful on (just about) everything.
    The South Park episode analyses (they are all there) are particularly funny.

  6. Robert, I am aware of a ‘last modified date’. However I can’t figure out how one can retrieve a past version. Without the retrieval function, referencing by ‘last modified date’ does not help. It seems to me it would be interesting to observe how articles evolve.

  7. As per Mary’s comments, the article count is unlikely to keep doubling until 2016, because we’re going to start running out of English-speaking authors well before then. My guess is that growth will go subexponential sometime in the next few years.

    In my opinion MediaWiki is the single greatest application of the internet – not just for Wikipedia but for internal company use. We use it to document everything from project management to time sheets to you-name-it.

  8. Mary, thank you very much for this useful bit of information. It’s all there. It’s nice.

  9. StephenL: Plenty of leftist writers have left Wikipedia because of perceived “right-wing” bias – a quick google will find all sorts of people complaining about all sorts of perceived bias at Wikipedia.

    According to his own Wikipedia page, Jimmy Wales, the founder of Wikipedia, is an objectivist…

    The reality is that left and rightwing tribalists are always sniffing for bias or triumphantly stating that “the other side fears the truth”.

  10. High growth rates with a doubling time inside 12 months and few significant limits in sight suggest that this is a disruptive technology. In other words it may be lower quality but it is also very much lower cost (ie it is free).

    If anybody was looking when the Internet was growing in the 1980s then maybe they should have seen it coming. I used to use the Internet at Uni in the early 1990s and I remember dismissing it as just a big private network for universities.

    If anybody was looking at credit cards in the 1960s they would have seen massive levels of growth. However because it was growth from a low base it did not really hit mainstream consciousness (and mass market appeal) for several decades.

    I harp on about e-gold because its user base at least doubles every 24 months. And as a payment system, the more users it has, the more benefit it gets from network effects. So it should snowball.

  11. John,

    That’s interesting. I hadn’t encountered anyone who criticised Wikipedia as a whole as too right-wing, although I did at one stage encounter a particular article which was heavily biased to the right (should have edited it, but didn’t have enough confidence in the topic).

    I imagine that in the past something like this would have been attacked at least as much from the left, but in the context of the “Republican war on science”, which is in a sense a war on evidence and facts, I expect that most of the attacks will come from the right, at least for the moment.

  12. Certainly I think Wikipedia is a very hostile environment for the kind of factoid-based talking points that dominate rightwing discussion today. The process of debate makes it quite hard to sustain unsourced claims, and the NPOV norm makes it difficult to use the kind of rhetorical bombast in which rightists trade.

    Of course, this kind of thing isn’t confined to the right. Not that long ago, it was probably more common on the left, and it’s still present in some circles, so I wouldn’t be surprised to find some people attacking Wikipedia as too rightwing.

  13. JQ, if by “right” you mean the American right then I agree with you. In the context of Australian politics, my experience of (non-blogosphere) debate is that argument by factoid is largely a “leftist” trait.

    Perhaps that’s because I visit universities from time to time…

  14. Someone told me the other day that his “left-wing” friends describe the Sydney Morning Herald as having a “right-wing” bias whilst his “right-wing” friends describe it as having a “left-wing” bias.

    The world does seem somewhat trapped by the sometimes arbitrary and occasionally imagined dichotomy of “left” and “right”.

  15. While the numbers grab attention, I’ve noted a nicely increasing number of citations in the last year, IMHO a direct result of robust Wikipedia criticisms. As more works come online, this should increase verifiability (but not accuracy!) far above traditional encyclopedias.

    More impressively, and less noted, the increasing use of categories and organization is adding a whole new level of information. Consider for one impressive example.

    Finally, I love the Featured Pictures : I spend so much of my time at my computer reading text. I even bought some A3 photographs off one Wikipedia photographer based in Victoria.

    With our modern concept of ideas & other intangibles as property, it’s a shock that people would produce them for no payment other than, well, intangibles…Rusty.

  16. I agree, Rusty. Categorisation is a really big deal, though there are still plenty of problems to be worked out. I’m working on the economics categories at the moment, using the JEL codes as a basis.
