I was recently at Wikimedia’s HQ in San Francisco to give a short talk about some of the geographic imbalances in Wikipedia. I was telling a now familiar story: one where not only do we not see many contributions from global margins, but we also don’t see many contributions about global margins. Over lunch, our conversation shifted to Wikimedia’s vision statement:
Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.
Asaf Bartov then kindly directed me to a legitimate effort by Wikipedia user emijrp to quantify the sum of all human knowledge. The page, for instance, notes that there are an estimated 100,000 species of bacteria, 21,000 known drugs, 6,900 human languages, 8,000 sports, 97,000,000 songs, and so on – but of course a far smaller number of Wikipedia articles about (or even mentions of) all of those things.
I should say here that I believe Wikipedia truly is one of the most wonderful human inventions. The platform, the licences used to govern open content, the global network on which it is built, and, of course, the social practices that allow it to be brought into being, have done more to spread information than just about anything else in the history of humanity.
But is ‘the sum of all knowledge’ really the end-point? Not only is the sum of all knowledge not something that we can ever collate into a single platform, but I worry that even attempting to do so can make us lose sight of what might actually be accomplishable.
It isn’t wrong to strive to know the unknown. But if the unknown is so immeasurably large (despite emijrp’s best efforts), and much of human knowledge is necessarily socially or individually contingent, then focusing solely on the percentage of the total amount of universal knowledge that we have been able to capture and codify somewhat loses the plot of what really matters.
Think, for instance, of a city. Which elements of a city, or which processes and practices in it, should be represented in a Wikipedia article? The issue is that there is no objective standard of notability here. Do we want an article about every street in the city? Every house in the city? Every brick in the city?
But let’s think about the city a little differently. What about a city in which there are fundamentally different visions of what the city itself is? Compare, for instance, the English, Arabic, and Hebrew versions of the article on Jerusalem: they all present significantly different visions and versions of the very same place. So do the Estonian and Russian versions of the article on the Bronze Soldier of Tallinn, and so do the articles on countless other objects of interest to people.
My point is that getting closer to the sum of all knowledge is not necessarily the biggest issue. The issue is recognising that knowledge can be contested. Not everything can be supported by a citation. Not everything can be boiled down to the truth. Perhaps what we need to strive for, then, is not just a way to get closer to the sum of all knowledge, but also a way of making sure we always recognise the multiplicity and diversity of knowledges. The sum of all knowledge is a laudable goal, but let’s make sure we focus more explicitly on the diversity of knowledge whilst we get there.
The expression “the sum of all human knowledge” is an unfortunate one. It is a given among the community that the aim of the projects currently under way is not so grandiose as a plain reading of the statement would imply. Nonetheless, we should eschew forward-looking statements of impossibility unless we have thought carefully about them. As a case in point, Google Street View goes a fair way towards having a significant amount of data on every brick in every building on the planet. It is quite possible that we will have an article on every virus that we have identified within a year or so.
The English Wikipedia has some fairly clear criteria for inclusion: information needs to be significant to the topic of the article it is in. For a topic to deserve an article it needs to be notable: either it falls into one of a number of predetermined classes, or it has significant coverage in more than one independent reliable source. The operation of these rules can be seen (not always perfectly, and not always prettily) at “Articles for Deletion” debates.
The question of “whose truth” is a very serious one. It is also not one that requires multi-language Wikipedias to arise. The article “Bronze Soldier of Tallinn” has eight pages of talk archives, but after the 2007 events it became a “neutral article”. This is one advantage the English Wikipedia has, in common with other “large” Wikipedias: the much higher number of editors and readers ensures that partisan statements are less likely to go unchallenged. This is not to say we don’t have issues where the dominant narrative drives out another. These are not always in the areas that our own dominant cultural narrative would lead us to expect.
As to a multiplicity of knowledges, and things that can’t be cited, the latter are simply not suitable for an encyclopedia, which is a tertiary source. We rely on researchers, anthropologists, ethnographers, folklorists, taxonomists, journalists and all the other myriad primary and secondary systematisers of knowledge to collect and assess knowledge. Otherwise we could not, for example, prevent the article on Ebola from stating that it is a disease that doctors give you (a widely held belief in Africa), or prevent other articles from stating that aliens regularly abduct humans (widely believed in the US), that the Queen is secretly a lizard, that there was no Holocaust, that firemen take your blood, that man has not been to the moon, and that the world is flat. All these, and many more, are widely believed (and most are documented on Wikipedia), but they are not supported by reliable sources.
Deviating from requiring information to be verifiable would undermine Wikipedia as a source of knowledge, and specifically place in harm’s way people who rely on Wikipedia for information in critical circumstances.