PREreview of Higher-order epistasis creates idiosyncrasy, confounding predictions in protein evolution

Published
DOI
10.5281/zenodo.7196581
License
CC BY 4.0

Epistasis, a phenomenon where mutations interact rather than behaving independently, has long been of interest to biologists. In the molecular context, it describes a situation where the whole differs from the sum of its parts: the fitness effect of a set of mutations is not equal to the sum of the effects of each mutation measured alone. Epistasis has implications for nearly all aspects of protein and evolutionary biology. The presence of epistasis is often (perhaps naively) taken as evidence of a physical interaction between positions, linking structural constraints to functional measurements. Epistatic interactions may also be important for controlling the accessibility of adaptation on fitness landscapes.
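
To make the definition concrete, the pairwise case can be written out as below. This is one common convention and an assumption on our part: it takes F to be measured on a scale where independent effects add, and the manuscript's exact scale and sign conventions may differ.

```latex
\Delta F_A = F_A - F_{\mathrm{WT}}, \qquad
\Delta F_B = F_B - F_{\mathrm{WT}}, \qquad
\varepsilon_{AB} = \left(F_{AB} - F_{\mathrm{WT}}\right) - \left(\Delta F_A + \Delta F_B\right)
                 = F_{AB} - F_A - F_B + F_{\mathrm{WT}}
```

Under this convention, \varepsilon_{AB} = 0 means the two mutations behave independently, and any nonzero value is pairwise epistasis.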

The increasing availability of large mutational datasets has provided new material for modeling epistasis. This work has progressed along two major lines. One effort attempts to deconvolute epistasis into global and local components. In this framework, the functional form of the underlying genotype-phenotype map can confer a non-specific apparent epistasis which can be transformed away, with remaining epistasis interpreted as specific or local interactions. A second effort has been to examine a potential role for higher-order epistasis (here meaning interactions above pairwise). Some groups have found only marginal improvements to prediction accuracy with higher-order interactions, while others have found pervasive epistasis which makes prediction extremely challenging. Given the variety of experiments, it has been difficult to draw firm conclusions.
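
The idea that a nonlinear genotype-phenotype map can create apparent epistasis that is then "transformed away" can be illustrated with a toy calculation. The sketch below is not taken from the manuscript: the effect sizes are made up, and the exponential map is simply an assumed example of a global nonlinearity.

```python
import math

# Hypothetical additive effects on a latent scale (made-up values).
latent_effects = {"A": 0.5, "B": 1.0}

def observed(genotype):
    """Observed phenotype: exponential of an additive latent trait (assumed map)."""
    latent = sum(latent_effects[m] for m in genotype)
    return math.exp(latent)

def pairwise_epistasis(f):
    """epsilon = F_AB - F_A - F_B + F_WT for a phenotype function f."""
    return f(("A", "B")) - f(("A",)) - f(("B",)) + f(())

# On the raw scale the two mutations appear to interact...
raw_eps = pairwise_epistasis(observed)
# ...but after a log transform the apparent interaction vanishes.
log_eps = pairwise_epistasis(lambda g: math.log(observed(g)))

print(f"epistasis on raw scale: {raw_eps:.3f}")
print(f"epistasis on log scale: {log_eps:.3f}")
```

Here the mutations are additive by construction, so all of the raw-scale epistasis is non-specific. Any epistasis that survives an appropriate transform would be the specific, local component.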

This work by Buda, Miton, and Tokuriki takes some major steps towards clarifying what we do and don’t know about epistasis from experiments. In addition to contributing a new six-position landscape with phosphotriesterase (PTE), they synthesize available experimental datasets to measure the prevalence and nature of epistasis across a broad range of models, phenotypes, and selections. The major successes and drawbacks relate to the two lines above: 1) they convincingly demonstrate that high-order epistasis is pervasive and essential, but 2) the impact of this conclusion is somewhat dulled by potential concerns (and issues with clarity of presentation) about how they normalize their measurements across a variety of phenotypes and systems.

Major results

The authors examine 45 distinct combinatorial landscapes comprising between 4 and 7 mutations (thus, each landscape is 16-128 genotypes in size). For a given WT background, they define two measures for fitness effects: the heterogeneity (the spread of ΔF or ε across all mutants), and the WT idiosyncrasy (the deviation of a particular WT ΔF or ε from the mean across all mutants). The term idiosyncrasy is (all joking aside) a little bit idiosyncratic to this manuscript. It relies on explanations offered in references 20 and 21, and we struggled to work out what it meant. Where the term is introduced, it would help to work through a full example (along with Figure 1). We understand it such that, for a given landscape with n positions, there will be n heterogeneities, 2n WT idiosyncrasies, (n-2) first-order epistatic coefficients, 2n-2 first-order epistatic idiosyncrasies, and so on. The definitions are sound, just a bit hard for the reader to access. These issues distract from the main point: the paper convincingly demonstrates that both single fitness effects and epistatic interactions are sensitive to the chosen background, and that this holds for any given landscape!
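
The kind of worked example we have in mind could look like the sketch below, which follows the two definitions as we summarize them above. It is our reading, not the authors' code: the three-position landscape and its coefficients are made up, and using the population standard deviation as the "spread" is an assumption.

```python
import itertools
import statistics

n = 3  # positions in a toy landscape (made-up example)
genotypes = list(itertools.product((0, 1), repeat=n))

def phenotype(g):
    """Hypothetical phenotype: additive terms plus pairwise and third-order terms."""
    a, b, c = g
    return 1.0 * a + 0.5 * b - 0.2 * c + 0.8 * a * b - 1.0 * a * b * c

F = {g: phenotype(g) for g in genotypes}

def effects_across_backgrounds(pos):
    """Delta F of mutating `pos` (0 -> 1) in every background that lacks it."""
    effects = {}
    for g in genotypes:
        if g[pos] == 0:
            mutant = g[:pos] + (1,) + g[pos + 1:]
            effects[g] = F[mutant] - F[g]
    return effects

def heterogeneity(pos):
    """Spread of a mutation's effect across backgrounds (std. dev. is an assumption)."""
    return statistics.pstdev(list(effects_across_backgrounds(pos).values()))

def idiosyncrasy(pos, background):
    """Deviation of the effect in one background from the mean over all backgrounds."""
    effects = effects_across_backgrounds(pos)
    return effects[background] - statistics.mean(list(effects.values()))

wt = (0, 0, 0)
for pos in range(n):
    print(pos, round(heterogeneity(pos), 3), round(idiosyncrasy(pos, wt), 3))
```

The same two functions could be applied to ε instead of ΔF by replacing the single-mutant difference with a double-mutant cycle.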

The authors then quantify the impact of idiosyncrasy by comparing the predictive power of two models, one of which comprehensively samples a given background and one which averages across all possible backgrounds. The “WT-background” model includes the fitness and epistatic coefficients in a particular “WT” background, and so will suffer from idiosyncrasy, whereas the “global” model is a linear regression that takes the averaged epistatic and fitness effects across the entire landscape as parameters.
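
A first-order version of this comparison can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' implementation: the landscape is made up, only single-mutation terms are included (the manuscript's models extend to higher orders), and the averaged model uses the fact that, on a complete binary landscape, the least-squares first-order coefficients equal the effects averaged over all backgrounds.

```python
import itertools
import statistics

n = 3
genotypes = list(itertools.product((0, 1), repeat=n))

def phenotype(g):
    """Hypothetical phenotype with pairwise and third-order interactions."""
    a, b, c = g
    return 1.0 * a + 0.5 * b - 0.2 * c + 0.8 * a * b - 1.0 * a * b * c

F = {g: phenotype(g) for g in genotypes}
wt = (0, 0, 0)

def single(pos):
    """Genotype carrying only the mutation at `pos`."""
    return tuple(1 if i == pos else 0 for i in range(n))

# "WT-background" model: effects measured only in the chosen WT.
wt_effects = [F[single(i)] - F[wt] for i in range(n)]

def predict_wt(g):
    return F[wt] + sum(e for e, x in zip(wt_effects, g) if x)

# Averaged model: each effect is the mean over all backgrounds.
avg_effects = [
    statistics.mean([F[g] for g in genotypes if g[i] == 1])
    - statistics.mean([F[g] for g in genotypes if g[i] == 0])
    for i in range(n)
]
grand_mean = statistics.mean(list(F.values()))

def predict_avg(g):
    # Centered coding (x - 0.5) keeps the intercept at the grand mean.
    return grand_mean + sum(e * (x - 0.5) for e, x in zip(avg_effects, g))

def rmse(predict):
    return (sum((predict(g) - F[g]) ** 2 for g in genotypes) / len(genotypes)) ** 0.5

print("WT-background RMSE:", round(rmse(predict_wt), 3))
print("averaged RMSE:     ", round(rmse(predict_avg), 3))
```

Note that the WT-background model here uses only n + 1 measurements, whereas the averaged model uses all 2^n, which is why a comparison that accounts for the amount of data would be informative.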

The authors successfully establish that higher-order epistasis generates idiosyncrasy; therefore, as a protein accumulates new mutations, single mutational effects and epistasis are expected to be increasingly idiosyncratic. The WT-background model failed to accurately predict function for most genotypes at all orders of epistasis, demonstrating how idiosyncrasy confounds functional predictions in protein fitness landscapes. Along the most accessible trajectories of each adaptive landscape, new epistatic interactions that affect the accessibility of a genotype were usually idiosyncratic. Through two selected examples, the authors show that idiosyncrasy generated by synergistic or antagonistic higher-order interactions permits or restricts accessibility along adaptive landscapes on a local scale.

Major issues

  1. The clarity of the presentation could be improved a bit. On first read, we found it difficult to grasp what each term meant, and the text is somewhat jargon-heavy. Given the abstract nature of this work, we realize this is challenging. Including clear definitions for each term in a single section of the methods would help with this. We also found figure 1 very helpful for understanding, but have some suggestions for improvement:

    1. Including the “WT idiosyncrasy” in the single mutational effects case would help.

    2. Showing which connections in the example landscape produce the values in the single mutation plot, which cycles produce the epistatic plot, and so on, would also help. Each dot on the idiosyncrasy plot traces back to a very specific set of calculations, and it might be helpful to enumerate them in a supplemental figure.

  2. Can the comparison of the WT and global models be made more quantitative? The authors mention that each includes different amounts of data, but perhaps some sort of information criterion could be used to quantify the relative importance of this data vs. the particular parameters.

  3. The manuscript would benefit from further discussion of the fold-change transformation employed. We are uncertain whether this sufficiently removes any impact from a non-linear genotype-phenotype map. Model misspecification will drastically impact epistasis inferences; the analysis of the spline-based nonlinear map in the SI is suggestive, but a better (albeit much more involved) approach would be to use such a transform with the experimental landscapes and then repeat the analysis. We understand this would be a much more challenging undertaking, but we think the impact of this unique dataset would be greatly improved by doing such a comprehensive comparison.

  4. We feel the manuscript would be strengthened by some more granular analysis.

    1. Currently, most analysis combines all positions across all landscapes, regardless of their size, the phenotype measured, or the system. We wonder what the distribution of the heterogeneity and idiosyncrasy across these would be. In other words, how idiosyncratic is idiosyncrasy? How heterogeneous is heterogeneity?

    2. The model comparison is extremely convincing, but would be made even more so by comparing results across landscape size as well (essentially summarizing figure 5a in the style of 4c).
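
As an illustration of the information-criterion suggestion in major issue 2, one possible form is the Akaike information criterion for a least-squares fit with Gaussian errors. The sketch below is only that: the function name and the example numbers are hypothetical, and whether AIC (rather than BIC or cross-validation) is appropriate depends on how each model is fit, since such criteria assume the models are scored against the same data.

```python
import math

def aic_least_squares(rss, n_obs, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant.

    rss      -- residual sum of squares of the model's predictions
    n_obs    -- number of genotypes the predictions are scored against
    n_params -- number of fitted coefficients in the model
    """
    return n_obs * math.log(rss / n_obs) + 2 * n_params

# Hypothetical numbers for a 64-genotype landscape (not from the manuscript):
# a richer model is preferred only if its fit improves enough to offset its
# extra parameters.
first_order = aic_least_squares(rss=12.0, n_obs=64, n_params=7)
second_order = aic_least_squares(rss=9.0, n_obs=64, n_params=22)
print(round(first_order, 2), round(second_order, 2))
```

Lower values indicate a better trade-off between fit and model size, which would give a single number for comparing models at each order of epistasis.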

Minor issues, questions, suggestions

  1. We found it striking that, for the WT-background model, adding second-order terms decreases accuracy. We would love to see additional comments on that!

  2. Could the authors comment on how the “actual” WT sequences compare with alternative backgrounds? Is there anything privileged about the actually occurring WT sequences, or are they typical of others in these landscapes?

  3. How limited are the results to adaptive landscapes? The distribution of fitness effects in figure 2a is symmetrical around neutrality, but we would expect that a random set of mutations would skew towards loss of function. Does that mean we shouldn’t generalize these results?

  4. In figure 1, the bottom plot should show only 4 epistatic coefficients, to accord with the example landscape.

  5. “Stating function” should read “starting function” in extended data fig 2 caption.

Written by Christian Macdonald, Sonya Lee, and James Fraser, but arising from discussions with the whole Fraser lab.

Competing interests

We share a grant with the Tokuriki lab (Human Frontier Science Program Grant, RGP0054/2020).