The idea to write this blog post on Democracy arose out of the need to describe what it is in the context of Brexit. For more on the Brexit referendum itself see this. In this post I am trying to distill my own understanding of Democracy and have included the results of a numerical experiment I ran to quantify some ideas around the concept.
Democracy is essentially an algorithm to correct political error. In that respect Democracy belongs to a special class of algorithms, with Darwinian evolution, scientific peer review or machine learning being other notable members of the same class. The kinship between these disparate and very fundamental processes is not coincidental. It is explained by Popperian epistemology, which makes the existence and mitigation of error central to the idea of any knowledge generation.
Any discussion of the process of knowledge creation may seem like a digression at this point. However, please persevere for the next three paragraphs as setting this context is important for the central thesis on Democracy. According to Popper, knowledge itself can be understood as explanations, i.e. guesses or conjectures with two major criteria for goodness: falsifiability and parsimony. Any knowledge creator (sentient or otherwise) must therefore create knowledge in exactly this manner: creatively produce guesses or conjectures (including even, what look like, wild ones) and criticise them to remove those that are erroneous. Two immediate corollaries of this theory arise: a) existence of error is a permanent feature of any form of knowledge. Claims of knowledge that are perfect (e.g. a manual revealed by so-called prophets) are therefore, for want of a better word, baloney. And b) boundless knowledge-generation must require the ability or enabling culture to air seemingly wild guesses and criticise even ostensibly unimpeachable maxims.
The above meta-theory explains why Darwinian evolution works at all – because mutation takes the role of, as it were, guesswork and natural selection acts as the trenchant critic of those guesses, inexorably optimizing on some measure of fitness to local environment. The actual error-correction itself happens at the level of the DNA molecule, which is where the knowledge created by selection pressure is stored for all life on Earth. The same is true of the growth of scientific (or mathematical) knowledge – a result of human creativity generating the conjectures and peer-review providing the criticism. The same predictor-corrector epistemology has been formalised in various successful computational techniques and algorithms, of which actor-critic reinforcement learning (AI) algorithms are the latest powerful example. In short, Popper compliance is expected of any type of knowledge creator worth its salt.
Parsimony is an equally important attribute of good explanations as falsifiability. Parsimony in this sense does not imply mere simplicity, though it is that too. It implies the quality of being just so and no other way, like an edifice of cards where every individual card is optimally load-bearing right at the tipping point. Any tiny variation or change in structure and the edifice comes crashing down. Great explanations are exactly delicately poised in this manner. Here’s one of my favourite thinkers, the inimitable David Deutsch, explaining the idea of good explanations:
Democracy is another avatar of the same underlying idea applied to politics. It may superficially look as if “Democracy” is the answer to the age-old Platonic question “who should rule?”, as if the collective will of the people were somehow sensible/rational guidance (it almost never is, due to its low signal-to-noise ratio). The label “Democracy” (Gk. people rule) itself helps to reinforce the confusion further. Nonetheless, equating Democracy with literal rule-of-the-people is a completely mistaken and often dangerous (cf. populism) assumption. Democracy works not because popular opinions are better than those of rulers, but because convincing some people of (real/imagined) shortcomings of the ruler is easy.
Human beings are innately risk-averse and familiarity-seeking agents. It takes humans a very long time to agree on what is objectively good for them, even when that good should be obvious (herein lies the root of all tyranny!). Yet in light of the epistemology described above, an “objectively good” political idea has no meaning if the bad ones weren’t tried out and discarded. We have cultural concepts of academic freedom or the freedom of speech (at least functioning in some Western societies) to generate all kinds of good/bad ideas that are then open to scrutiny and review – where both the idea generator and the reviewer agree to abjure violence when playing the actor-critic game. Democracy is the same game to try out political ideas and consign bad ones to the dustbin of history without violence. Note that the act of consigning bad ideas to some “dustbin” is not by fiat. So, binned politics can (and do) get refurbished and replayed. Nonetheless our priors about their badness are updated and their efficacy grows less with each replay (e.g. German or East European Neo-Nazism is a mere shadow of NSDAP’s politics). Also in a functioning democracy, no single person/group actually sits in judgement on what constitutes objectively good or bad politics – though some people (say populist ideologues or utopia-seekers) may think they do. The system on the whole is rigged to be more sensible than the sum of its parts.
In well-designed democratic systems (more on this later) one does not even need to convince the entire population but only a fraction of it (the swing voters), and nor is good reasoning required to convince them. Emotional appeals work just as well. Such a system ensures that any leader, including the worst one (which is what really concerns us) is susceptible to swings of opinion for rational and purely emotional reasons. The better any voting system translates that swing into gain/loss of power, the better it is for hedging against downside political risk. Therefore reducing information asymmetry makes for better democracies, no matter how noisy, chaotic or opinionated people get.
Finally, while an educated and informed electorate is useful for making better guesses of who would make an able leader, a well-designed democratic system is not beholden to that. It is idiot-proofed. In many cases, useful idiots imperil their own existence to save a system that was never designed with them in mind, in much the same way that living organisms act as machinery for their genes. A recent paragon of this “useful idiocy” is the UKIP (United Kingdom Independence Party), the largest party to represent the UK in the European Parliament (with 20 MEPs voted for via Proportional Representation). Yet the same party had exactly zero MPs in the UK Parliament, because the First Past the Post system ensured they could not convert their strong but diffuse electoral appeal into seats. Nonetheless, the Kippers (a fond pejorative for UKIP sympathisers and apologists) strongly lobbied for Brexit, in effect cutting the very branch they sat on and obviating themselves for evermore, in service of an emotive idea of exceptionalism of the British political system.
In the spirit of putting ideas to the test, I created a simulation in Python of a political electoral system in a (hypothetical) country with 500 constituencies (this number is a model input). The toy model is based on the following quasi-realistic assumptions:
- The political spectrum/opinion can be represented on an axis, with normally distributed weights/frequencies corresponding to different intervals. So, say, -0.5 to 0.5 as Centrist, 0.5 to 1.5 as Centre-Right and 1.5 and beyond as Right and symmetrically negative intervals for the Left.
- Each constituency is divided into a section of voters that are ideologically core voters, i.e. not swayed by political headwinds and those that change opinions based on political climate, i.e. swing voters. The core opinions are distributed normally across the country on the whole, yet the number of core voters varies randomly (again normally distributed) across constituencies. E.g. some constituencies are traditionally swing constituencies and others are core. The mean percentage of swing voters is a model input (set at 25% based on UK’s example).
- The swing in political opinion is modelled as a mean-reverting process (cf. Ornstein-Uhlenbeck process), which is a mathematical representation of a random quantity that has the property of reverting to its long-term average over some time-scale (another model input). Here the long-term mean of the swing is 0, i.e. if we wait long enough swing voters concur with core voters even though they may drift away in the short term. The time-scale is generational, i.e. around 7 election time-periods.
- The swing happens similarly across all constituencies, i.e. all constituencies are causally aware of each other (no information asymmetry across constituencies). The swings from election-to-election can be very large or rather tame – a feature controlled by yet another volatility input to the model.
- I assume that the larger the swing, the more concentrated voting is around that opinion. This feature is a simplistic way to represent herd-mentality in swing voters especially when political swings are extremal. So, a swing of zero is as diffuse as core-voter political distribution (in pt 1), but a swing of 2.0 (well into Right-wing territory) implies that swing voters across constituencies will tend to vote right-wing. In mathematical terms, the variance of the swing voter distribution is narrower the larger the swing magnitude.
- Finally, the model is agnostic to actual truth values (assuming such an evaluation is possible at all) of political claims by Leftists or Right-wingers or, for that matter, Centrists. E.g. the political centre of 1930s Weimar Germany was well to the right of modern German politics, even by the most conservative Bavarian-belt standards of today. The assumption here is that whatever the Centre may represent, its relative frequency of core support versus the fringes is a stable normalish distribution. Note that it doesn’t have to be the case and actual distributions may be rather skewed.
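The mean-reverting swing and the herd-mentality assumption in the list above can be sketched roughly as follows. This is not the code behind the results in this post, just a minimal illustration under stated assumptions: the function names `simulate_swing` and `swing_voter_spread` are hypothetical, the default volatility of 1.0 is arbitrary, and the narrowing rule `core_spread / (1 + |swing|)` is one simple choice among many that gets narrower as the swing grows.

```python
import math
import random


def simulate_swing(n_elections, reversion_time=7.0, volatility=1.0, seed=0):
    """Simulate a mean-reverting swing (long-term mean 0), one value per election.

    Uses the exact one-step discretisation of an Ornstein-Uhlenbeck process:
        swing[t+1] = decay * swing[t] + step_std * N(0, 1)
    Expects n_elections >= 1. All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    kappa = 1.0 / reversion_time          # speed of reversion per election
    decay = math.exp(-kappa)              # fraction of the old swing that survives
    step_std = volatility * math.sqrt((1.0 - decay ** 2) / (2.0 * kappa))
    swings = [0.0]                        # start with swing voters agreeing with the core
    for _ in range(n_elections - 1):
        swings.append(decay * swings[-1] + step_std * rng.gauss(0.0, 1.0))
    return swings


def swing_voter_spread(swing, core_spread=1.0):
    """Hypothetical herd rule: equal to the core spread at zero swing,
    and narrower the larger the magnitude of the swing."""
    return core_spread / (1.0 + abs(swing))


if __name__ == "__main__":
    for election, swing in enumerate(simulate_swing(10)):
        spread = swing_voter_spread(swing)
        print(f"election {election}: swing={swing:+.2f}, swing-voter spread={spread:.2f}")
```

Raising `volatility` makes election-to-election swings wilder, while a shorter `reversion_time` pulls swing voters back towards the core more quickly.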
One of my sanity checks for this toy model was to numerically test out Popper’s theory that a democracy with a First Past the Post (FPTP) system is better designed for its political error-correction function, as opposed to systems like Proportional Representation (PR). PR cannot avoid coalitions in which fringe parties still form part of the government (even when the mandate is against them) or remain in a position to affect government policy by acting as king-makers. In other words, PR is not the best democratic system to remove bad leaders. Turns out the guy was making sense and my numerical experiment, at least, bears this out quite nicely.
The results above show how the Left, Right and Centre parties’ seats evolve across elections for Proportional Representation (PR) and First Past the Post (FPTP), where the average swing vote-share is set to 25% and the volatility is high. The results compare the final seats in the two systems for exactly the same voting across all 500 constituencies. It is immediately obvious that under PR, no party ever gets an absolute majority (>250 seats), which implies a polity hobbled with coalition politics for generations. Secondly, signals from voter swings barely register in PR, as opposed to FPTP where the effect is dramatic. This is evident in the results of the last few elections, which represent a clear swing for the Left – an increase of seats from 150 to over 250, whereas the Right is decimated in FPTP. The same swing is also somewhat visible in PR, but the Right still retain around 120 seats, leaving lots of room for a Centre-Right coalition to form the government even though the mandate was to deprive the Right of power. Finally, for most elections where the swings are indecisive, the power remains well with the Centrists with a full majority in FPTP, with very little need to share it with the political fringes. In sharp contrast, PR systems tend to minimize centrist power in times of high political volatility, leading to minimal seats for the centrist party in almost all simulated elections.
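To make the difference between the two systems concrete, here is a small sketch of how the same constituency-level votes can be turned into seats under each rule. It is an illustration rather than the model's actual code: the function names are hypothetical, and for PR I assume a single national list with the largest-remainder method, which is only one of several PR variants.

```python
def fptp_seats(constituency_votes):
    """First Past the Post: each constituency's seat goes to its top-polling party."""
    seats = {}
    for votes in constituency_votes:
        winner = max(votes, key=votes.get)
        seats[winner] = seats.get(winner, 0) + 1
    return seats


def pr_seats(constituency_votes):
    """Proportional Representation (assumed variant: one national list,
    largest-remainder method), with one seat per constituency overall."""
    n_seats = len(constituency_votes)
    totals = {}
    for votes in constituency_votes:
        for party, count in votes.items():
            totals[party] = totals.get(party, 0) + count
    grand_total = sum(totals.values())
    # Integer arithmetic keeps the quotas and remainders exact.
    seats = {p: n_seats * v // grand_total for p, v in totals.items()}
    remainders = {p: n_seats * v % grand_total for p, v in totals.items()}
    leftover = n_seats - sum(seats.values())
    # Hand out the unfilled seats to the parties with the largest remainders.
    for party in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        seats[party] += 1
    return seats


if __name__ == "__main__":
    # Ten identical constituencies with a modest Centrist plurality in each.
    votes = [{"Left": 30, "Centre": 40, "Right": 30}] * 10
    print(fptp_seats(votes))  # {'Centre': 10}
    print(pr_seats(votes))    # {'Left': 3, 'Centre': 4, 'Right': 3}
```

Even in this tiny example the same votes give the Centre every seat under FPTP but only a plurality under PR, which then needs a coalition partner to govern.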
This toy model is quite simplistic and makes some questionable assumptions. But there are a lot of ways to play around with it, not least by extending it to calibrate to real training datasets, i.e. constituency-level historic voting patterns and polling data and get something predictive out of it à la Nate Silver. I am even tempted to have a stab at it, as most of these data, say for the UK, are publicly available and not very difficult to clean up and process. Not sure how much time I’d get to devote to this, but watch this space.