The most recent issue of PNAS includes a report by Galen, et al linking enhanced mutation at a CpG site to altitude adaptation in Andean house wrens (Troglodytes aedon), based on clear biogeographic and biochemical evidence of adaptation. I’ve been waiting for this, both in the narrow sense that I’ve been waiting for this particular study to appear in print, and also in the broader sense that I have been waiting for any paper on mutation-biased adaptation to appear in a prominent venue. Results like these, one hopes, will overturn the “raw materials” doctrine of neo-Darwinism and stimulate the development of a new understanding of the role of mutation in evolution.
In late May, Lior Pachter posted a blog entitled Pachter’s P-value prize, offering a cash prize for providing a probability calculation (P value) based on a “justifiable” null model for the claim of Kellis, Birren and Lander, 2004, hereafter KBL, that some results from an analysis of yeast duplicate genes “strikingly” favored the classic neo-functionalization model of Ohno over the contemporary DDC (duplication-degeneration-complementation) model.
This attracted not just a huge number of views (for a geeky science blog), but also an extensive online discussion among readers, carried out at a very high intellectual level. Dozens of scientists commented on the blog, including Manolis Kellis, the first author of KBL, and scientists well known in the field of comparative genome analysis. Mostly they were discussing what was an appropriate statistical test, but they also discussed the nature of the Ohno and DDC models, the responsibilities of authors, the flaws of the peer-review process, the appropriateness of blogging and tweeting about science, and so on.
Here, I’m going to use graphics and simulated data to illustrate null models in relation to the duplicate gene data from KBL. I also want to make a comment about the expectations of the DDC model. All of the plots and calculations are available as embedded R code in the R-Markdown file kbl_stats.Rmd.
In a previous post called “The revolt of the clay”, I described four different ways to think about the role of variation as a process with a predictably non-random impact on the outcome of evolution. The main point was to draw attention to my favorite idea, about biases in the introduction of variation as a source of orientation or direction, and to provide a list of what (IMHO) represents the best evidence for this idea. I gave anecdotes from four categories of evidence:
mutation-biased laboratory adaptation
mutation-biased parallel adaptation in nature
recurrent evolution of genomic features
miscellaneous evo-devo cases such as worm sperm
With the publication of a study by Couce, et al., 2015 (a team of researchers from Spain, France and Germany), the first category just got stronger. Couce, et al did an experiment that I’ve been trying to talk experimentalists into doing for a long time: they directly compared the spectrum of lab-evolved changes between two strains with a known difference in mutation spectrum.
In brief, they exposed 576 different lines of E. coli to a dose of cefotaxime that started well below the MIC and doubled every 2 days for 12 iterations. The evolved lines fell into 6 groups depending on 3 choices of genetic background— wild-type, mutH, or mutT— and 2 choices of target gene copy number— just 1 copy of the TEM-1 gene, or an extra ~18 plasmid copies.
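As a quick sanity check on the design just described, here is a minimal sketch that enumerates the 6 groups and the dose schedule. The condition labels are my own shorthand, and treating "12 iterations" as 12 doublings from the starting dose is an assumption, not something stated by Couce, et al.

```python
from itertools import product

# Labels are hypothetical shorthand for the conditions described in the text.
backgrounds = ["wild-type", "mutH", "mutT"]    # 3 genetic backgrounds
copy_numbers = ["single-copy", "multi-copy"]   # 1 TEM-1 copy vs. extra plasmid copies
lines_per_group = 96

# Every combination of background and copy number is one experimental group.
groups = list(product(backgrounds, copy_numbers))
total_lines = len(groups) * lines_per_group

# Assumption: the dose doubles 12 times, once every 2 days.
# Doses are expressed relative to the (sub-MIC) starting dose.
n_doublings = 12
relative_dose = [2 ** i for i in range(n_doublings + 1)]
total_days = 2 * n_doublings

for background, copies in groups:
    print(f"{background:10s} {copies:12s} {lines_per_group} lines")
print("total lines:", total_lines)
print("final dose relative to start:", relative_dose[-1], "after", total_days, "days")
```

Under that reading, the 6 groups of 96 account for all 576 lines, and the final dose is several thousand-fold above the starting dose, which fits the observation that most lines went extinct.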
The ultimate dose of antibiotic was so extreme that most of the lines went extinct. The figure shows how many of the 96 lines (for each condition) remain viable at a given concentration.
The mutators were chosen because they have different spectra. The “red” mutator (mutH) greatly elevates the rate of G –> A transitions, and also elevates A –> G transitions. The “blue” mutator (mutT) elevates A –> C transversions. The figure above indicates that the blue mutator fared slightly better. A much stronger effect is that the lines with multiple copies of the TEM-1 gene did better. Very few single-copy strains survived the highest doses of antibiotic.
The authors then did some phenotypic characterization that led them to suspect that it was not just the TEM-1 gene, but other genes that were important in adaptation. In particular, they suspected the gene for PBP3, which is a direct target of cefotaxime.
They sequenced the genes for TEM-1 and PBP3 from a large sample of the surviving strains above, with results shown in this figure:
The left panel, i.e., figure 3a, shows that evolved strains of mutH (red), mutT (blue), and wild-type (black) sometimes have one of the well-known mutants in the TEM-1 gene encoding beta-lactamase. The predominance of red means that more mutH strains evolved the familiar TEM-1 mutations, which makes sense because these mutations happen to be the G–>A and A–>G transitions favored by mutH.
Panel 3b, which refers to mutants in the gene encoding PBP3, is a bit complicated. As before, the histogram bars are colored based on the background in which the particular mutation listed on the x axis is fixed. However, rather than showing all the lines together in one histogram, they are arranged into 3 different histograms depending on whether the mutational pathway matches the pathways preferred by mutT (left sub-panel), by mutH (center sub-panel), or neither (right sub-panel). For instance, blue appears in 16 columns, so there are 16 different mutant PBP3 types that evolved in mutT backgrounds, and 14 of them are in the left sub-panel indicating mutT-favored mutational pathways. Red appears in 18 columns, 15 corresponding to mutH-favored pathways.
Importantly, the density in the columns doesn’t overlap much: nearly all of the blueness is in the left histogram (~50/52), and nearly all the redness is in the center histogram (~26/31). This means that evolved mutT strains overwhelmingly carried mutations favored by the mutT mutator, and likewise for mutH strains and the mutH mutator. If adaptation were unaffected by tendencies of mutation, as the architects of the Modern Synthesis believed, then the colors would be randomly scrambled among the 3 histograms. Because they are not, we know that the regime of mutation imposes a predictable bias on the spectrum of adaptive changes.
The main weakness of the paper, perhaps, is that they did not do reconstructions to show causative mutations. Instead they make plausibility arguments based on having seen some of these in other studies of laboratory or clinical resistance. I think the paper is pretty strong in spite of this. Recurrence of a mutation in independent lines is a very strong sign that it is causative, whereas singletons might be hitch-hikers with no causative significance. Yet if we were to eliminate the 17 non-recurring mutants, we would still have a striking contrast. Of ~43 mutT mutants with recurrently fixed mutations, ~41 are for mutT-favored pathways; of the ~23 mutH mutants with recurrently fixed mutations, ~18 are for mutH-favored pathways.
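To put a rough number on "striking", here is a sketch of a one-sided Fisher's exact test on the approximate counts quoted above. This is my own back-of-the-envelope check, not an analysis from Couce, et al. It assumes that every mutH mutant not on a mutH-favored pathway is on a mutT-favored one, which is the least favorable assumption for the association, and it uses counts read off a figure.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """Upper-tail hypergeometric probability P(X >= a) for the 2x2 table
    [[a, b], [c, d]], holding the row and column totals fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, k_max + 1))
    return tail / comb(n, row1)

# Rows: background (mutT, mutH).
# Columns: mutT-favored pathway vs. any other pathway.
# The mutH row uses the conservative assumption described in the lead-in.
p_all = fisher_one_sided(50, 2, 5, 26)        # all mutants: ~50/52 and ~26/31
p_recurrent = fisher_one_sided(41, 2, 5, 18)  # recurrent only: ~41/43 and ~18/23

# Fraction of each background's mutants that match its own mutator spectrum.
match_mutT, match_mutH = 50 / 52, 26 / 31
match_mutT_rec, match_mutH_rec = 41 / 43, 18 / 23

print(f"all mutants:    mutT {match_mutT:.2f}, mutH {match_mutH:.2f}, p = {p_all:.1e}")
print(f"recurrent only: mutT {match_mutT_rec:.2f}, mutH {match_mutH_rec:.2f}, p = {p_recurrent:.1e}")
```

Even under this conservative reading of the counts, both tail probabilities come out far below conventional significance thresholds, so dropping the singletons does not change the conclusion.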
Unfortunately, due to the language and framing of this paper, casual readers might not recognize that it is actually about mutation-biased adaptation. The term “mutation bias” only occurs once, in the claim that “Mutational biases forcibly impose a high level of divergence at the molecular level.” I’m not sure what that means. What I would describe as “evolution biased by mutation” is described by Couce, et al as “limits to natural selection imposed by genetic constraints”. Sometimes the language of my fellow evolutionists seems like a strange and foreign language to me. But I digress.
To summarize, this work clearly establishes that 2 different mutator strains adapting to increasing antibiotic concentrations exhibit strikingly different spectra of evolved changes that match their mutation spectra, consistent with the a priori expectation of mutation-biased adaptation.
Earlier this month I was contacted by a reporter writing a piece on the role of chance in evolution. I responded that I didn’t work on that topic, but if he was interested in predictable non-randomness due to biases in variation, then I would be happy to talk. We had a nice chat last Friday.
I’m only working on the role of “chance” in the sense that, in our field, referring to “chance” is a placemarker for the demise of an approach based implicitly on deterministic thinking— evolution proceeds to equilibrium, and everything turns out for the best, driven by selection. This justifies the classic view that “the ultimate source of explanation in biology is the principle of natural selection” (Ayala, 1970). Bruce Levin and colleagues mock this idea hilariously in the following passage from an actual research paper:
To be sure, the ascent and fixation of the earlier-occurring rather than the best-adapted genotypes due to this bottleneck-mutation rate mechanism is a non-equilibrium result. On Equilibrium Day deterministic processes will prevail and the best genotypes will inherit the earth (Levin, Perrot & Walker, 2000)
I just had to post the image below because it’s so cool.
Here’s the backstory. A decade ago, Lev Yampolsky and I did a meta-analysis of mutation-scanning experiments– studies that systematically change amino acids in proteins, and measure the effects. Based on about 10,000 experimental exchanges, we computed an “EX” measure of mean exchangeability for use in modeling evolution, and showed its superiority. In negative reviews of the paper, we were told that EX could not be relevant to biology or evolution because it was based on laboratory experiments that are artificial and detached from what happens in the real world.
Ultimately we won this argument because facts are more compelling than hand-waving arguments about what is realistic. We compared EX against other allegedly more reality-based measures (e.g., BLOSUM, Grantham, PAM, etc) for (1) predicting effects of mutation experiments (in a cross-validation), (2) predicting which types of replacements are most likely to be involved in Mendelian diseases, and (3) serving as a basis function (in PAML) for evolutionary acceptability. EX beat all the other measures in the first 2 tests, and tied with BLOSUM in the 3rd test.
Over the past few years, mutation-scanning experiments based on high-throughput methods have exploded. In particular, it is possible to engineer huge numbers of genetic variants– e.g., creating every possible single amino acid change in a protein– express the variants in a mixed culture where growth is dependent on the mutated gene, and sequence the whole mess, assigning fitness based on relative frequency.
Over the past few months I have been processing data from these experiments. The data have many uses, but one of them is to compute a new measure of exchangeability. The first version of EX presented point estimates of exchangeability for each (asymmetric) path from amino acid 1 to amino acid 2, e.g., Alanine to Valine. That is, each EX value was a single number.
Now we have so much data that it is possible to present, for each amino acid replacement, a distribution of effects. That is what is shown in this figure (click exd for a PDF with full resolution). The data are ~100K exchanges from ~30 experiments. For each histogram, the x axis is quantile from 0 to 1 and the y axis is density. Each bar is ~25 observations on average. The amino acids are sorted chemically. Red is hydrophilic-hydrophilic, blue is hydrophobic-hydrophobic, and purple is mixed.
This is asymmetric. The row is “from” and the column is “to”. For instance, exchangeability from Lysine tends to be high consistently because the majority of Lysine residues are found at surface positions, which are more forgiving. So the row for Lysine has a lot of increasing histograms, while the column has quite a few decreasing ones. For instance, K —> M is quite different from M —> K.
Obviously there is a lot of noise, but one encouraging observation from this is that there clearly are different shapes from decreasing to flat or convex to increasing. So, if we imagine moving a threshold T from low (forgiving) to high (stringent), this is obviously going to create a series of quite distinct EX measures for each T. That means there is leverage to distinguish forgiving from stringent (by branch, by site, or by protein when considering different branches, sites or proteins).
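The threshold idea can be illustrated with a toy calculation. In this sketch I assume that an EX value at threshold T is simply the fraction of observed effects (on the 0-to-1 quantile scale) at or above T. That definition is my assumption for illustration, and the three distributions are synthetic stand-ins for the increasing, flat, and decreasing histogram shapes, not real data.

```python
def fraction_above(values, threshold):
    """Fraction of quantile-scaled effects at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)

# Evenly spaced points in (0, 1), pushed through inverse CDFs to get
# deterministic synthetic samples with three different histogram shapes.
n = 1000
u = [(i + 0.5) / n for i in range(n)]
shapes = {
    "increasing": [x ** 0.5 for x in u],            # density rises as 2x
    "flat": list(u),                                # uniform density
    "decreasing": [1 - (1 - x) ** 0.5 for x in u],  # density falls as 2(1 - x)
}

thresholds = (0.25, 0.5, 0.75)
ex_by_threshold = {
    t: {name: fraction_above(vals, t) for name, vals in shapes.items()}
    for t in thresholds
}

for t in thresholds:
    row = ", ".join(f"{name}: {ex:.2f}" for name, ex in ex_by_threshold[t].items())
    print(f"T = {t:.2f} -> {row}")
```

The point of the toy is only that exchanges with the same mean can rank quite differently as T moves, which is where the leverage to separate forgiving from stringent contexts would come from.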
For me, the story started a long time ago with our theoretical demonstration (graph at right) that bias in the introduction of variation (by mutation-and-altered-development) is a fundamental cause of non-randomness in evolution (Yampolsky & Stoltzfus, 2001).
The novelty of this claim bothered me deeply. Why? Here was a basic principle (a causal link between non-randomness in biological inputs, i.e., mutational and developmental biases, and non-randomness in evolutionary outputs) as fundamental as the concept of selection or drift. Yet, this principle was not mentioned in any textbook of evolution or population genetics (indeed, there is even a classical population-genetic argument against a determinative role for mutational biases). I could not even find this principle in the research literature! When it comes to contemplating the impact of biases in variation, evolutionary biologists habitually assume that such an impact is impossible, except in the special case of (1) rigid constraints (i.e., the impossibility of generating form B means we’ll get A or C instead), or (2) neutral evolution. We knew that all of this was incorrect.
This prompted 2 questions. Why wasn’t a general connection between biases in variation and biases in evolution recognized long ago, e.g., by Wright, Haldane or Fisher? And, why— after it was discovered and published in 2001— didn’t this inspire a revolution?
I’m still puzzling over the second, admittedly naive, question. To address the first question, I’ve spent an inordinate amount of time studying the development of evolutionary thought (bookshelf at right).
The short answer is this: the notion that mutation has a dispositional role in evolution, influencing its rate and direction, represents a kind of “internal” causation, an internal source of direction in evolution, that Darwin’s followers rejected as illegitimate. Ever since, it has been a blind spot in evolutionary thinking.
The nature of this rejection is hard to comprehend today, due to a process of amnesia and theory-drift. Nearly all evolutionary biologists today believe that evolutionary biology has a prevailing theory, and that this theory— called the Modern Synthesis or modern neo-Darwinism— came together in the mid-20th century. What few realize is how far the common conception of this theory has drifted from its original intentions. The original Modern Synthesis was held together with Darwinian doctrines that most scientists today do not accept, such as the doctrine of gradualism, the idea that selection is creative, or the rejection of any internal causes of direction. We can think of these as the “soft parts” of the Modern Synthesis, the muscles and connective tissue that gave it shape and motion.
Over time, the Darwinian character of the Modern Synthesis has rotted away, leaving only the more resilient parts. This is why scientists today think of the Modern Synthesis as a kind of open-ended framework for understanding evolution. They are looking at an open-ended skeleton.
Our study of early geneticists revealed that this skeleton predates the Modern Synthesis. There was an earlier Mendelian-Mutationist Synthesis that combined mutation, heredity and selection, without Darwinian doctrinal commitments to gradualism, the creativity of selection, and the “randomness” (non-importance) of mutation. What most scientists today think of as the Modern Synthesis is actually the forgotten Mendelian-Mutationist synthesis. Like scientists today, the early geneticists or “mutationists” welcomed both selection and neutrality, allowed both gradual change and saltations, and welcomed the idea that biases in mutation could be the cause of parallelisms or trends.
The new paper by Stoltzfus and Cable describes what the early geneticists believed about how evolution works, and what they contributed to the foundations of evolutionary thought. It also explains why they rejected Darwin’s theory (another case in which the popular conception of a theory today does not match its historical meaning).
But that’s only half of the story. The other big theme is historiography, the telling of history. The disconnect between what actually happened and what scientists believe is not just a matter of theory-drift.
“History is written by the victors,” Churchill said. In this case, the victorious architects of the Modern Synthesis promulgated a view of early geneticists as bumbling fools who saw mutation and selection as opposing principles, and who couldn’t think synthetically. The period of 1900 to 1920, actually a rich period in which early geneticists laid the foundations of modern evolutionary thought, is described perversely as part of an “eclipse of Darwinism” (a period of darkness when the world was deprived of His light) lasting until Darwinism is re-born in the Modern Synthesis. This story-telling has been so influential that, when contemporary scientists list historically important figures, all key figures of the Mendelian-Mutationist synthesis are removed, Soviet-style (see figure below).[2]
That is, the distorted view of history that evolutionary biologists hold today is not just a matter of passive amnesia, but of a highly successful public relations campaign, what evo-devoist Stuart Newman recently called “an unremitting 90-year campaign to identify ‘evolutionary theory’ with ‘Darwinism’”.
The recent paper on Mendelian-Mutationism is actually an off-shoot of a series of “Mutationism myth” blogs written for SandWalk in 2010. To turn the blogs into a scholarly work worthy of publication in a peer-reviewed historical journal was a major project accomplished over the course of 2 years, by teaming up with a history-of-science graduate student named Kele Cable. Kele recently blogged about our paper on his web site.
Notes
[1] Some of my favorites: Haldane, 1932 (the tattered volume, top, second from right); the 1911 (3rd) edition of Punnett’s Mendelism, the first textbook of genetics (the slimmer of two burgundy volumes, top center); George Williams (1966) Adaptation and Natural Selection (row 2, 9th from right, with the shiny jacket cover); Lewontin, 1974 (row 2, right end, red with gold lettering next to Crow & Kimura 1970).
[2] Other examples could be given. The Oxford Encyclopedia of Evolution (click for searchable online index) has an entry for Mendel, who made no direct contributions to evolutionary thinking, but has no entry for any of the mutationists except Morgan. Importantly, the entry for Morgan says nothing of his evolutionary views, only of his contributions to genetics. Textbooks (e.g., Ridley, 1993, or Freeman & Herron, 1998) and online teaching materials (try a web search on “development” or “history” of evolutionary thought) frequently jump from Darwin to the Modern Synthesis, with the explanation that Darwin’s theory was right but needed a mechanism, and this was supplied when the architects of the Modern Synthesis combined genetics and selection. Early geneticists, if they are mentioned at all, are depicted only for their alleged failure to understand selection, accept small changes, or achieve synthesis.
In a recent QRB paper with David McCandlish, we review the form, origins, uses, and implications of models (e.g., the familiar K = 4Nus) that represent evolutionary change as a 2-step process of (1) the introduction of a new allele by mutation, followed by (2) its fixation or loss.
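For readers who want to see where the familiar K = 4Nus comes from, here is a minimal sketch of the 2-step calculation. It assumes a diploid population whose effective size equals its census size N, and it uses Kimura's diffusion approximation for the fixation probability of a single new mutant. Those details, and the function names, are my choices for illustration rather than anything specified above.

```python
from math import expm1

def fixation_probability(s, N):
    """Kimura's approximation for one new mutant copy (frequency 1/(2N))
    with selection coefficient s, in a diploid population with Ne = N."""
    if s == 0:
        return 1 / (2 * N)  # neutral limit
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for accuracy.
    # Note: strongly deleterious s with large N can overflow; not handled here.
    return expm1(-2 * s) / expm1(-4 * N * s)

def origin_fixation_rate(N, u, s):
    """Step 1: new mutants arise at rate 2Nu. Step 2: each one fixes with
    the probability above. The product is the rate of change K."""
    return 2 * N * u * fixation_probability(s, N)

N, u, s = 10_000, 1e-8, 0.01
K_exact = origin_fixation_rate(N, u, s)
K_approx = 4 * N * u * s  # uses fixation probability ~ 2s for small beneficial s
K_neutral = origin_fixation_rate(N, u, 0.0)

print(f"K (Kimura)  = {K_exact:.3e}")
print(f"K (4Nus)    = {K_approx:.3e}")
print(f"K (neutral) = {K_neutral:.3e}")
```

With a small beneficial s in a large population, the fixation probability is close to 2s, so the two-step product 2Nu × 2s gives 4Nus. With s = 0 the same calculation collapses to K = u, the neutral rate.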
What could be surprising about these “origin-fixation” models, which are invoked in theoretical models of adaptation (e.g., the mutational landscape model) and in widely used methods applied to phylogenetic inference, comparative genomics, detecting selection, modeling codon usage, and so on?
Earlier this year, the Open Tree of Life project made the first public release of its synthetic tree of 2.5 million species (from ~4000 source trees), and announced a web services API (Application Programming Interface) providing programmatic access to a continually updated set of resources:
a synthetic tree covering millions of species
a database of thousands of source trees
a reference taxonomy used to align names from different sources
Tree-for-All participants convene to hear a report from a hackathon team
The API release was timed to coincide with our open call for participation in a “Tree-for-all” hackathon, which took place September 15 to 19 (2014) at University of Michigan, Ann Arbor. The hackathon— organized and funded by OT, the Arbor workflows project and NESCent’s HIP (Hackathons, Interoperability, Phylogenies) working group (Stoltzfus and Pontelli, PIs)— aimed to build capacity to leverage OT’s resources, making expert phylogenetic knowledge more accessible to scientists, educators, and the public.
To find out more about the hackathon, go to Open Tree’s blog (click on the “tree-for-all” tag), where I am guest-blogging about it. Using web services to make phylogenetic knowledge more accessible is the theme of the Phylotastic project described by Stoltzfus, et al. 2013.
Arlin Stoltzfus, 2012 to 2023 (@ArlinStoltzfus). Views expressed here are my own. [CC-BY ]