By their words ye shall know them: Evidence of genetic selection against general intelligence and concurrent environmental enrichment in vocabulary usage since the mid 19th century



Ever since Galton (1869) forecast declining intelligence on the basis of the shifting demographics of the Victorian population, there has been controversy about the future of human intelligence.

Early use of IQ testing seemed to confirm Galton’s (1869) predictions, as most studies found that IQ was inversely related to fertility, suggesting directional genetic selection for lower intelligence (Lynn, 2011) – a trend that persists into the present (Lynn and van Court, 2004; Meisenberg, 2010; Reeve et al., 2013; Kanazawa, 2014).

In the West, up until the early to mid 19th century, those with high levels of socioeconomic status, wealth, and education (all of which are proxies for intelligence; Herrnstein and Murray, 1994) had higher numbers of surviving offspring relative to those with comparatively lower levels (Clark, 2007; Skirbekk, 2008), suggesting that higher intelligence may have conferred fitness advantages on individuals having to cope with extremes of cold, disease outbreaks, and conflict (Woodley and Figueredo, 2013). Subsequent increases in global temperature, coinciding with the end of the Little Ice Age in the mid-19th century, reduced environmental harshness, boosting agricultural yields and thus reducing ecological stress and conflict (see: Zhang et al., 2007, 2011 for a demonstration of the inverse historical relationship between temperature and conflict). This would have substantially relaxed selection against those with lower intelligence (Woodley and Figueredo, 2013). This was coupled with advances in medicine (which would have included better means of fertility control, hygiene, nutrition, and medication; Lynn, 2011), and also social innovations such as welfare, mass schooling, and universal healthcare. The combined effect of these was a demographic transition characterized by general reductions in fertility, which were most pronounced among those with higher intelligence (Lynn, 2011). This was mediated primarily by fertility control coupled with the increasing prevalence of opportunities to delay fertility (i.e., higher education, increasing status competition, etc., which disproportionately attenuated the fertility of high-IQ women relative to men; Low et al., 2002; Meisenberg, 2010).

Child mortality was historically concentrated among those with low socioeconomic status (Geary, 2000) and impacted 50% of all children born in some European regions during the Renaissance – dropping to around 1% in the modern era, starting in the 19th century (Volk and Atkinson, 2008). Reductions in child mortality therefore further boosted the reproductive success of those with low IQ relative to those with high IQ. Additionally, historically high levels of child mortality would also have functioned as a source of purifying selection against de novo Single Nucleotide Polymorphisms (SNPs) and other deleterious mutations, which based on present rates of accumulation (i.e., 70 de novo SNPs per diploid genome per generation; Kong et al., 2012) should be associated with reproductive failure rates of a similar predicted magnitude to those that were actually observed historically in the child mortality data (i.e., 88%; Keightley, 2012).

The presence of directional selection favoring lower IQ, coupled with increasing levels of mutation accumulation stemming from the breakdown of purifying selection, should have reduced population-level IQ in the West since the 19th century. Consistent with this expectation, early intelligence researchers predicted that time-series studies of populations would reveal substantial generational declines in intelligence (Lentz, 1927; Cattell, 1937). Despite these predictions, the first such studies revealed that IQ scores had in fact risen across time (e.g., Cattell, 1950). This apparent contradiction was even termed “Cattell’s Paradox” (Higgins et al., 1962) after psychometrician Raymond B. Cattell, who in the 1930s was prominent in predicting declining intelligence due to the lower fertility of those with higher intelligence (e.g., Cattell, 1937).

Debate as to the reality of massive secular gains in IQ scores was put to rest in the 1980s by Flynn (1984, 1987), who documented a steady rise in IQ, averaging three points per decade, across countries and across IQ batteries. The preponderance of the data collected subsequently has reinforced Flynn’s finding (Trahan et al., 2014; Pietschnig and Voracek, 2015). Few now dispute the reality of this ‘Flynn effect’ (Herrnstein and Murray, 1994); however, there remains much debate as to its causes (Williams, 2013; Pietschnig and Voracek, 2015).

In the 1990s Lynn (2011) proposed a solution to Cattell’s Paradox based on the idea that while “genotypic intelligence” (i.e., the theoretical level of intelligence resulting from the action of genes alone) has been declining due to genetic selection, these declines have been massively offset by gains in “phenotypic intelligence” (i.e., the intelligence that results from the interaction between a population’s genes and improved environments). Loehlin (1997) illustrated this with the analogy of rising tides (representing the phenotypic-IQ-boosting effects of improving environments) lifting leaky boats (representing the much smaller losses expected on the basis of selection for lower IQ).

Anomalies Lead to New Findings

Lynn’s model fails to account for certain observations. For example, if intelligence has been rising overall, why were 19th century Western populations much more innovative on a per capita basis across a wide range of fields (science, technology, mathematics, literature, and philosophy) and also much more generative of geniuses than modern populations (Murray, 2003; Huebner, 2005; Simonton, 2013), despite there being more individuals alive today with greater access to education, hygiene, high-quality nutrition, and other proposed elicitors of the Flynn effect (Williams, 2013)? This may seem like a counter-intuitive proposition, as there are many examples of substantial scientific progress in the modern era (e.g., in fields such as computing, genetics, and materials science). Nonetheless, while breakthroughs are still occurring, they are occurring at a lower rate than was the case in the past, as are the geniuses responsible for them.

These indices of innovation and genius demonstrating per capita declines are based on the historiometric method of collating notable developments and individuals across many different encyclopedic reference works, first proposed by Galton (1869). There is a striking degree of agreement among different reference works as to precisely what counts as a major innovation, and who was responsible for it, which suggests that this is a robust method for estimating secular trends (Murray, 2003).

Simulated historical trends in “genotypic intelligence” (the population level intelligence change that would be expected due to selection alone, absent the Flynn effect) predict changing rates of innovation and genius (Woodley, 2012; Woodley and Figueredo, 2013). Clearly, there are factors other than intelligence influencing the innovativeness and creativity of populations, such as the presence or absence of key cultural factors and ‘low-hanging fruit,’ that is, novel innovations and discoveries that are easy to make (e.g., Horgan, 1997; Cowen, 2011). However, the findings of Woodley (2012) and Woodley and Figueredo (2013) suggest that losses in intelligence occurring despite the Flynn effect may nonetheless have been an important contributing factor to these trends.

The Co-occurrence Model: Decreases in g and the Flynn Effect Occur Simultaneously

The general intelligence factor, or g factor (Spearman, 1904), represents the component of intelligence which all tests of mental ability collectively share – it measures the ability to cope with cognitive complexity and is thus extremely important, as it is better at predicting individual differences in various life outcomes than are the relatively narrower cognitive abilities (Jensen, 1998).

The method of correlated vectors (MCV) compares the g loading (i.e., the correlation between a particular ability measure and the g factor) of different subtests of an IQ battery with the magnitudes of a set of associated effect sizes by correlating the vector of one with the vector of the other. This indicates the extent to which a given correlate of IQ is associated with g or narrower cognitive abilities (Jensen, 1998). When using MCV to examine whether the magnitude of genetic selection against IQ is related to the g loadings of subtests, positive correlations have been found (Woodley and Meisenberg, 2013a; Peach et al., 2014). When the same analysis is attempted for the Flynn effect, the effect size is negatively related to subtest g loadings (te Nijenhuis and van der Flier, 2013). The vectors of genetic and biological factors such as subtest heritabilities, inbreeding depression, reaction times, and indicators of mutation load typically correlate positively with g loadings in MCV (Prokosch et al., 2005; Rushton and Jensen, 2010; te Nijenhuis et al., 2014b), whereas the vectors of environmental effects, such as intelligence gains among adopted children, gains via educational interventions and gains via retesting typically correlate negatively with g loadings (te Nijenhuis et al., 2007, 2014a, 2015).
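The core computation in MCV, as described above, is a correlation between two vectors defined over the same subtests. The following is a minimal sketch of that step only. All numbers are hypothetical placeholders chosen for illustration (they are not values from any cited study), and the helper name `pearson_r` is an assumption of this sketch rather than anything used by the authors.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data: one entry per subtest of an IQ battery.
g_loadings = [0.45, 0.55, 0.62, 0.70, 0.81]         # subtest g loadings
selection_effects = [0.02, 0.04, 0.05, 0.07, 0.09]  # e.g., |IQ x fertility| effect sizes
flynn_gains = [0.40, 0.33, 0.30, 0.21, 0.15]        # e.g., secular gain effect sizes

# MCV: correlate the vector of g loadings with each vector of effect sizes.
r_selection = pearson_r(g_loadings, selection_effects)
r_flynn = pearson_r(g_loadings, flynn_gains)

print(f"g loadings vs. selection effects: r = {r_selection:.2f}")
print(f"g loadings vs. Flynn gains:       r = {r_flynn:.2f}")
```

With these made-up inputs the first correlation is positive and the second negative, mirroring the pattern of signs reported in the text. Published MCV analyses typically also apply corrections (for example, for unreliability of the vectors) that this sketch omits.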

This suggests that the highly heritable g factor has been declining historically due to genetic selection and accumulating mutations (thus accounting for the apparent high intellectual productivity of 19th century populations relative to modern ones) whereas more trainable and less heritable specialized abilities exhibiting lower g loadings have been increasing in populations over time in response to educational and environmental improvements. Thus, Flynn effects and declines in g due to genetic selection and mutation accumulation co-occur, albeit hierarchically in that selection and mutation affect the top of the latent cognitive ability hierarchy [i.e., Carroll’s (1993) Stratum III] whereas the Flynn effect is restricted to the narrower abilities and test specificities at the bottom of the hierarchy [i.e., Carroll’s (1993) Stratum I; Woodley and Figueredo, 2013].

Evidence for this model has recently been found in data indicating that performance on tests of simple reaction time has been declining since the late 19th century (Silverman, 2010; Woodley et al., 2014a). Simple reaction time, as a measure related to information processing speed, is a culturally neutral biological marker of g (Jensen, 2006, 2011). Changing population averages may thus reflect the effects of genetic selection and mutation accumulation on g. The change in mean reaction time, when scaled in terms of g-equivalents, suggests a decline of -1.21 points per decade in the US and UK when the study-means are controlled for various sources of between-study methods variance (Woodley et al., 2014a). This observed loss is similar to the predicted loss derived from combining the results of a meta-analysis of 10 estimates of g decline computed on the basis of negative IQ × fertility correlations (-0.39 points per decade) with the additional decline derived from a study examining the effects of paternal age on offspring g as a proxy for the generational effects of de novo SNPs on g (-0.84 points per decade; combined loss = -1.23 points per decade; Woodley of Menie, 2015).
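The comparison between the predicted and observed declines is simple addition, and can be checked directly. The sketch below uses only the three per-decade figures quoted in the preceding paragraph; the variable names are illustrative.

```python
# Per-decade changes in g-equivalent IQ points, as quoted in the text.
selection_loss = -0.39   # from negative IQ x fertility correlations
mutation_loss = -0.84    # from paternal-age effects (proxy for de novo SNPs)
observed_loss = -1.21    # from the reaction-time meta-analysis (US and UK)

# Predicted combined loss is the sum of the two genetic components.
predicted_loss = selection_loss + mutation_loss

# Difference between observed and predicted decline.
gap = observed_loss - predicted_loss

print(f"predicted: {predicted_loss:.2f} points per decade")
print(f"observed:  {observed_loss:.2f} points per decade")
print(f"gap:       {gap:+.2f} points per decade")
```

The predicted and observed values differ by 0.02 points per decade, which is the sense in which the text describes them as similar.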

The co-occurrence model has also been tested using cohorts from the Netherlands born between 1950 and 1990. Using MCV on a sample of 63 subtests, it was found that the decline magnitudes among subtests showing anti- or “reverse” Flynn effects (i.e., IQ losses presumably due to selection and mutation) were borderline significantly and positively related to their g loadings, whereas those showing Flynn effects were not. The average g loading of the subtests exhibiting anti-Flynn effects was furthermore higher than that of the subtests exhibiting Flynn effects, consistent with predictions from the co-occurrence model (Woodley and Meisenberg, 2013b).

A New Test of the Co-occurrence Model

Here a novel test of the co-occurrence model is presented involving an examination of the temporal prevalence of vocabulary words across English-language texts over the past century and a half. Scores on tests of vocabulary, although themselves assessing a specific ability, are typically among the most g-loaded measures of IQ and are also among the most heritable (Kan et al., 2013). Nevertheless, there is considerable heterogeneity in the difficulties of the vocabulary items that comprise these scales (Beaujean and Sheng, 2010); thus there is scope for testing predictions derived from the co-occurrence model using vocabulary measures at the item level.

An excellent vocabulary measure is WORDSUM (developed by Thorndike, 1942) from the General Social Survey (GSS), which has been administered to ‘household’ samples of the American public on a regular basis since 1974. WORDSUM involves showing the respondent a card containing 10 target words. They must find the synonymous term or phrase among five alternatives.

Wolfle (1980) found that WORDSUM performance correlated at 0.71 with full-scale IQ. When this correlation is corrected for the reliability of WORDSUM (0.73; Hout and Hastings, 2012), and psychometric validity (0.90; Jensen, 1998), it rises to 0.93, indicating a very high g loading, as is typical of vocabulary measures (Kan et al., 2013).
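The text does not spell out the correction formula. One plausible reading, offered here as an assumption rather than as the authors' stated procedure, is the standard correction for attenuation, dividing the observed correlation by the square root of the reliability and then by the validity coefficient. The function name `disattenuate` is hypothetical.

```python
from math import sqrt


def disattenuate(r_observed, reliability, validity):
    """Correct an observed correlation for unreliability and imperfect validity.

    ASSUMPTION: divides by sqrt(reliability) and by the validity coefficient.
    The source reports only the inputs and the corrected value, not the formula.
    """
    return r_observed / (sqrt(reliability) * validity)


# Inputs quoted in the text.
corrected = disattenuate(r_observed=0.71, reliability=0.73, validity=0.90)
print(f"corrected correlation: {corrected:.3f}")
```

Under this form the corrected value comes out at roughly 0.92, close to the 0.93 reported; the small difference may reflect rounding of the inputs or a slightly different procedure.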

Attempts at determining whether there is any kind of a secular trend toward changing overall performance on this measure in the GSS have yielded inconsistent results (Haung and Hauser, 1998; Beaujean and Sheng, 2010; Flynn, 2012). Studies by Bowles et al. (2005) and Cor et al. (2012) found that WORDSUM words could be grouped into two classes based on difficulty. Both groups of researchers also found that earlier-born cohorts exhibited higher difficult-vocabulary knowledge relative to more recently-born ones, suggesting declining performance with respect to difficult words. This is consistent with the co-occurrence model, as it is performance on the most difficult (and therefore most g-loaded) words that is declining. These trends would furthermore be congruent with the presence of persistent negative associations between IQ and fertility on WORDSUM vocabulary knowledge, which have been found in GSS birth cohorts dating back to 1880-1899 (van Court and Bean, 1985; Lynn and van Court, 2004).

Studies involving large lexical databases, such as Haung and Hauser (1998), who attempted to determine changes in the frequencies of WORDSUM target words across a 1 million word database, and Roivainen (2014), who used Google Ngram Viewer (Michel et al., 2011) to track the frequencies of WORDSUM, WAIS, WAIS-R, WISC, and WISC-R vocabulary words, have also found evidence for declining usage frequencies, especially among more difficult words. The present study investigates the degree to which WORDSUM item-level difficulties, the negative correlation between item pass rates and fertility (a measure of the strength of genetic selection associated with each item), and changing levels of population literacy predict changes in word prevalence across texts from the mid 19th century to the present, while controlling for various factors. Based on the co-occurrence model, it is predicted that more difficult words should be declining in usage over time and that this decline should in part be predicted by genetic selection. Conversely, there should be a Flynn effect on easier words owing in part to the effects of increasing population level literacy enriching people’s vocabularies coupled with increasing demand for literature containing less cognitively demanding words.

In the present study, Google Ngram Viewer is employed in tracking WORDSUM word frequencies. The Ngram Viewer provides a database of more than 5 million texts (newspapers, works of fiction, non-fiction, technical works, etc.), comprising more than 500 billion words that can be searched using the target WORDSUM words, thus revealing their year-on-year frequencies. The database also has considerable reach in time, spanning from 1500 to nearly the present.
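To make the idea of a year-on-year frequency trend concrete, the sketch below converts raw counts into relative frequencies and fits a least-squares slope across years. It does not query Google Ngram Viewer or any other service: the years, counts, and corpus totals are hypothetical placeholders, and the helper `ols_slope` is an assumption of this sketch, not the authors' analysis.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# HYPOTHETICAL data for a single target word.
years = [1850, 1900, 1950, 2000]
word_counts = [1200, 1500, 1400, 1100]                           # occurrences per year
corpus_totals = [10_000_000, 20_000_000, 40_000_000, 80_000_000]  # all words per year

# Relative frequency (per million words) controls for corpus growth over time.
freqs_per_million = [1e6 * c / t for c, t in zip(word_counts, corpus_totals)]

# Slope in "occurrences per million words, per year"; negative means declining usage.
slope = ols_slope(years, freqs_per_million)

print(freqs_per_million)
print(f"slope: {slope:.3f} per million words per year")
```

Normalizing by the yearly corpus total matters because raw counts can stay flat or rise simply because more text was published, even while a word's share of all usage falls.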

One major advantage of examining the prevalence of WORDSUM words across texts is that it can be reasonably assumed that the authors of the texts were using these words correctly – hence appearance in print is tantamount to the authors effectively ‘passing’ that item in WORDSUM. This is potentially important as scores on psychometric tests with multiple-choice-type answer formats are known to be inflated by factors relating to test wiseness such as guessing (Brand, 1987; Must and Must, 2013). It has been found that people are more likely to utilize guessing on more g-loaded measures of ability (Woodley et al., 2014b). Secular gains due purely to increased guessing therefore potentially weaken the capacity for psychometric tests to directly detect declines in g due to genetic selection and mutation accumulation, as the effects are occurring on the same variance component (Woodley et al., 2014b). Tracking word usage trends across a representative corpus of written texts therefore yields potentially more ecologically-valid data on secular trends, as guessing and other factors associated with the ‘artificiality’ of the testing environment cannot be influencing these trends.
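The inflation that guessing can introduce on a test like WORDSUM (10 items, five alternatives each) can be illustrated with a simple expected-value calculation. The sketch assumes, purely for illustration, that a respondent answers known items correctly and guesses uniformly at random among the five alternatives on every unknown item; real respondents may skip items or guess non-uniformly, and the function name is hypothetical.

```python
N_ITEMS = 10         # WORDSUM target words
N_ALTERNATIVES = 5   # response options per item


def expected_score(words_known):
    """Expected number correct when every unknown item is guessed at random.

    ASSUMPTION: uniform random guessing on all unknown items, no omissions.
    """
    if not 0 <= words_known <= N_ITEMS:
        raise ValueError("words_known must be between 0 and N_ITEMS")
    unknown = N_ITEMS - words_known
    return words_known + unknown / N_ALTERNATIVES


for known in (0, 5, 10):
    print(f"knows {known:2d} words -> expected score {expected_score(known):.1f}")
```

Under these assumptions a respondent who knows none of the words still expects two correct answers, and the inflation shrinks as true knowledge rises. A word appearing in printed text offers no comparable route to a chance "pass", which is the advantage the paragraph above describes.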

