🧪 Genomics

All Life Uses 20 Amino Acids. These Bacteria Run Their Most Ancient Machine on 19.

A Columbia-Harvard-MIT team used AlphaFold2, ESM2, and ProteinMPNN to redesign 21 of 52 E. coli ribosomal proteins so they function without isoleucine, achieving more than 90% of wild-type fitness where brute-force substitution managed only about 40%. But the 382 residues replaced are just 0.47% of the roughly 81,000 isoleucines in the proteome, so a true 19-amino-acid organism remains far off.

Dr. Sanjay Mehta

Twenty. For roughly four billion years, that number has been a biological constant more universal than DNA itself. Every bacterium, every redwood, every human cell builds its proteins from the same canonical set of 20 amino acids, specified by the same genetic code and assembled by the same translation machinery. Nobody chose that number. Natural selection arrived at it once, in the ancestor of all living things, and nothing has changed the count since. Until now. A team spanning Columbia, Harvard, and MIT published results in Science on April 30, 2026, demonstrating that much of the ribosome, the molecular machine that translates genetic code into protein in every cell on Earth, can be redesigned to function without isoleucine, one of those 20 canonical building blocks.

They called the resulting strain Ec19.

What They Actually Did

Isoleucine was the target because it is, among the 20, the most frequently substituted amino acid across related bacterial species in nature. It is structurally similar to leucine and valine, both branched hydrophobic amino acids, which gave the team plausible replacement candidates. Starting with E. coli, the team first tried the most obvious approach, brute force: systematically swap every isoleucine in each ribosomal protein for valine, its closest structural cousin, and see what survives. Of the 36 essential ribosomal genes they tested, simple valine substitution killed the cells outright in 22 cases and produced severely crippled bacteria in most of the surviving 14. Fitness hovered around 40% of wild type, which is to say the approach failed.
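The brute-force step is simple enough to sketch in a few lines of Python. The toy sequence and function names below are assumptions made for illustration, not taken from the paper, and the sketch only rewrites a protein string; it says nothing about whether the resulting cell survives.

```python
# Sketch of naive isoleucine-to-valine substitution on a protein sequence
# written in one-letter amino acid code (I = isoleucine, V = valine).


def isoleucine_positions(seq: str) -> list[int]:
    """Return the 0-based positions of every isoleucine in the sequence."""
    return [i for i, residue in enumerate(seq) if residue == "I"]


def naive_valine_swap(seq: str) -> str:
    """Replace every isoleucine with valine, ignoring structural context."""
    return seq.replace("I", "V")


# Hypothetical toy peptide, NOT a real ribosomal protein.
toy_protein = "MKIAVLIGT"

positions = isoleucine_positions(toy_protein)
swapped = naive_valine_swap(toy_protein)

print(positions)  # [2, 6]
print(swapped)    # MKVAVLVGT
```

The swap is blind to everything around each residue, which is one plausible reading of why it failed so often: nothing in the operation checks whether the protein still folds or still fits its neighbors.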

So they brought in three AI systems, each attacking a different dimension of the protein redesign problem. ESM2, Meta's protein language model trained on 250 million sequences, scored which positions could tolerate substitution. It compared the wild-type sequence against hundreds of millions of naturally occurring protein variants to separate residues sitting in evolutionarily flexible regions from those locked in place by billions of years of selective pressure. AlphaFold2 predicted how alternative amino acid sequences would fold in three dimensions, flagging designs likely to misfold or destabilize the ribosome's structure before anyone synthesized a single nucleotide of DNA. ProteinMPNN, developed at the University of Washington, designed entirely new protein sequences optimized to fold into the same structures without using isoleucine at all. Working position by position across 52 ribosomal proteins, the AI-guided approach produced 21 fully isoleucine-free ribosomal proteins, removing a total of 382 isoleucine residues. Fitness of the resulting Ec19 strain exceeded 90% of wild type, meaning the bacteria grew, divided, and synthesized proteins at nearly the same rate as unmodified E. coli despite carrying 21 redesigned ribosomal components.
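The general shape of such a multi-tool workflow, where each stage filters what the previous one let through, can be sketched as a chain of filters. Everything below is a stand-in: the scoring functions are toy heuristics invented for this example and do not call ESM2, AlphaFold2, or ProteinMPNN, the thresholds are arbitrary, and the ordering is not claimed to match the authors' workflow.

```python
from typing import Callable, Iterable

Filter = Callable[[str], bool]

# Hypothetical toy peptide, NOT a real ribosomal protein.
WILD_TYPE = "MKIAVLIGT"


def is_isoleucine_free(seq: str) -> bool:
    """Design goal: no isoleucine (I) anywhere in the sequence."""
    return "I" not in seq


def toy_plausibility(seq: str) -> float:
    """Toy stand-in for a language-model score: identity to the wild type."""
    matches = sum(a == b for a, b in zip(seq, WILD_TYPE))
    return matches / len(WILD_TYPE)


def toy_fold_score(seq: str) -> float:
    """Toy stand-in for a structure check: fraction of former isoleucine
    sites that now hold leucine (L) or valine (V)."""
    sites = [i for i, residue in enumerate(WILD_TYPE) if residue == "I"]
    kept = sum(seq[i] in "LV" for i in sites)
    return kept / len(sites)


def at_least(score: Callable[[str], float], minimum: float) -> Filter:
    """Turn a scoring function into a pass/fail filter."""
    return lambda seq: score(seq) >= minimum


def run_pipeline(candidates: Iterable[str], filters: list[Filter]) -> list[str]:
    """Apply each filter in order, keeping only designs that pass all of them."""
    survivors = list(candidates)
    for keep in filters:
        survivors = [seq for seq in survivors if keep(seq)]
    return survivors


# Hypothetical candidate designs, all the same length as WILD_TYPE.
candidates = ["MKVAVLVGT", "MKLAVLLGT", "MKIAVLVGT", "MKDAVLDGT", "MAVAVAVGT"]
filters = [
    is_isoleucine_free,
    at_least(toy_plausibility, 0.7),  # arbitrary threshold
    at_least(toy_fold_score, 1.0),    # arbitrary threshold
]

survivors = run_pipeline(candidates, filters)
print(survivors)  # ['MKVAVLVGT', 'MKLAVLLGT']
```

The point of the structure is that cheap checks run first and each later stage sees fewer candidates, so the expensive steps (in the real study, structure prediction and wet-lab testing) are spent only on designs that already look viable.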

From 40% to 90%. A 2.25-fold improvement in cellular fitness, attributable to the switch from naive substitution to AI-guided protein redesign.
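The figures quoted so far can be checked with a few lines of arithmetic. The variable names are ours; the inputs are the numbers reported above, with 90% treated as a floor because the reported fitness exceeded it.

```python
# Brute-force valine substitution across the essential ribosomal genes tested.
tested_genes = 36
lethal_cases = 22
surviving_cases = tested_genes - lethal_cases
lethal_share = lethal_cases / tested_genes

# Fitness relative to wild type, in percent.
naive_fitness = 40  # approximate fitness after naive substitution
ai_fitness = 90     # lower bound for the AI-guided Ec19 strain
fold_improvement = ai_fitness / naive_fitness

print(surviving_cases)        # 14
print(f"{lethal_share:.0%}")  # 61%
print(fold_improvement)       # 2.25
```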

Why the Ribosome Matters More Than Any Other Target

Proteins in a cell vary enormously in age and conservation. Some enzymes are recent evolutionary inventions, present in only a handful of species and tolerant of extensive mutation, making them comparatively easy targets for amino acid substitution experiments. Ribosomal proteins are the opposite: they are among the most conserved molecular structures in all of biology, essentially unchanged across 3.5 to 4 billion years of evolution, shared by bacteria, archaea, and eukaryotes alike. If you can strip a canonical amino acid from the ribosome and keep the cell alive, you have passed what is arguably the hardest test available. Every other protein in the genome is presumably easier to modify, because none of them faces the same intensity of evolutionary constraint that ribosomal proteins do.

Tom Ellis, a synthetic biologist at Imperial College London, called the paper "a tour de force of synthetic biology to address a really interesting question that's fundamental to the origin of life on Earth." Julius Fredens, who led a separate effort at the National University of Singapore to build a synthetic E. coli genome with a reduced genetic code, described the results as "very exciting."

0.47%: The Number Nobody Published

Here is the calculation that puts the entire achievement in context. Ec19 removed 382 isoleucine residues from ribosomal proteins. According to the research team's own data, approximately 81,000 isoleucine residues remain scattered across the roughly 4,000 other protein-coding genes in E. coli. The project has therefore addressed 382 of approximately 81,382 isoleucine residues in the proteome, and 382 divided by 81,382 is 0.47%. A true 19-amino-acid organism, one that uses no isoleucine anywhere in its proteome, is 99.53% unfinished.
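For readers who want to reproduce the percentage, here is the arithmetic as a short Python sketch. The variable names are ours, and the 81,000 figure is the approximate count quoted above, so the result is only as precise as that estimate.

```python
removed = 382        # isoleucines replaced in ribosomal proteins
remaining = 81_000   # approximate isoleucines left elsewhere in the proteome

total = removed + remaining
completed_pct = 100 * removed / total
unfinished_pct = 100 - completed_pct

print(total)                     # 81382
print(f"{completed_pct:.2f}%")   # 0.47%
print(f"{unfinished_pct:.2f}%")  # 99.53%
```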

This is not a criticism but rather a map of the remaining distance, and the distance is enormous. At the throughput demonstrated in this study, where years of work by a multi-institution team produced 382 validated replacements, completing the remaining 81,000 residues at the same pace would require roughly 212 sequential iterations of equivalent effort. Even with dramatic improvements in AI model accuracy and DNA synthesis cost (currently about $0.07 per base pair for long synthetic constructs, per Synthetic and Systems Biotechnology benchmarks), the combinatorial challenge of epistatic interactions between simultaneous modifications remains unsolved. Some AI-designed proteins that worked alone in this study killed cells when combined, because changing multiple proteins simultaneously introduces interactions that current models cannot predict.
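The 212 figure is a straight division of the remaining residues by the number replaced in this study. A minimal sketch, assuming each future round of work replaces exactly as many residues as this one did:

```python
residues_per_effort = 382  # validated replacements in this study
remaining = 81_000         # approximate isoleucines still in the proteome

iterations = remaining / residues_per_effort

print(round(iterations, 2))  # 212.04
print(round(iterations))     # 212
```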

What the AI Did That Surprised the Researchers

One redesign stood out. RpsJ, a small ribosomal protein, contained only two isoleucine residues, and replacing them should have been simple. Instead, ProteinMPNN generated a design that introduced eight compensatory mutations, completely remodeling an entire alpha helix to maintain the protein's structural integrity without isoleucine. Harris Wang, the senior author at Columbia, told Scientific American: "Some of these AI designs were really surprising. They didn't look like anything we would have anticipated." After 450 generations of laboratory evolution, none of the AI-designed sequences reverted to isoleucine-containing versions, which suggests the AI found genuine solutions rather than unstable approximations.

Why You Should Be Skeptical

Twenty-one of 52 ribosomal proteins is not a complete ribosome. It is 40% of the ribosomal proteome, and the remaining 31 proteins may include positions where isoleucine is structurally irreplaceable because it participates in hydrophobic core packing or protein-protein interfaces that leucine and valine, despite their structural similarity, cannot adequately fill without destabilizing the entire complex. Beyond the ribosome, E. coli has approximately 4,000 other genes whose proteins collectively contain those 81,000 remaining isoleucine residues. Many of those proteins participate in metabolic pathways, membrane transport, and regulatory networks where the hydrophobic character of isoleucine plays roles that leucine and valine cannot simply fill.

A deeper problem lurks in the epistatic failures the team reported: cases where proteins that functioned perfectly when tested individually killed the cells when combined. The simultaneous removal of isoleucine from multiple ribosomal components disrupted interaction networks that no single-protein test could reveal.

Current AI protein design tools optimize one protein at a time, treating each sequence as an isolated engineering problem disconnected from the thousands of molecular interactions that keep a living cell running. They do not model how dozens of simultaneous modifications propagate through a cell's entire interaction network. That gap between single-protein accuracy and whole-genome coherence is where the path from Ec19 to a true 19-amino-acid organism could permanently stall.
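A quick count shows why testing proteins one at a time cannot cover the combinations. This is our own back-of-the-envelope illustration, not an analysis from the paper, and it treats every combination of redesigned proteins as a distinct case that could in principle behave differently.

```python
import math

total_ribosomal = 52
redesigned = 21

untouched = total_ribosomal - redesigned
share_redesigned = redesigned / total_ribosomal

# Ways to combine the 21 redesigned proteins in a single cell.
single_tests = redesigned
pairwise_combinations = math.comb(redesigned, 2)
# Every subset containing two or more redesigned proteins.
multi_protein_subsets = 2**redesigned - redesigned - 1

print(untouched)                  # 31
print(f"{share_redesigned:.0%}")  # 40%
print(single_tests)               # 21
print(pairwise_combinations)      # 210
print(multi_protein_subsets)      # 2097130
```

Twenty-one single-protein tests say nothing direct about the 210 possible pairs, let alone the roughly two million larger combinations, which is the space where the reported failures appeared.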

What This Analysis Does Not Prove

Our 0.47% completion calculation treats all isoleucine residues as equally difficult to replace. In reality, buried structural isoleucines in enzyme active sites may be far harder to substitute than surface-exposed positions in abundant housekeeping proteins, and the actual difficulty distribution is unknown. Fitness was measured under standard laboratory conditions (rich media, 37°C, no competition), and performance in nutrient-poor or fluctuating environments could differ substantially. Our 212-iteration projection assumes linear scaling of effort. Advances in AI models, cheaper DNA synthesis, and automated strain construction could compress that timeline by orders of magnitude. Conversely, the epistatic interaction problem that killed cells when individually successful modifications were combined could make later stages exponentially harder than earlier ones, in ways that linear projections cannot capture. Cost projections for whole-genome isoleucine removal are not attempted here because too many variables remain unquantified and speculative at this stage of the research, including future DNA synthesis pricing, AI model accuracy gains, and the scaling behavior of epistatic constraints.
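To show how sensitive the projection is to the linear-scaling assumption, here is a small sketch in which each round of work replaces a fixed multiple of the previous round's residues. The growth factors are hypothetical values chosen for illustration, not forecasts, and the sketch covers only the optimistic direction: it does not model epistasis making later rounds harder.

```python
def rounds_needed(remaining: int, first_round: int, growth: float) -> int:
    """Count whole rounds until cumulative replacements cover `remaining`.

    Each round replaces `growth` times as many residues as the one before;
    growth = 1.0 is the linear-scaling assumption.
    """
    if growth < 1.0:
        raise ValueError("growth must be at least 1.0 in this sketch")
    done = 0.0
    batch = float(first_round)
    rounds = 0
    while done < remaining:
        done += batch
        batch *= growth
        rounds += 1
    return rounds


REMAINING = 81_000  # approximate isoleucines left in the proteome
FIRST_ROUND = 382   # replacements achieved in this study

for growth in (1.0, 1.5, 2.0):  # hypothetical per-round throughput multipliers
    print(growth, rounds_needed(REMAINING, FIRST_ROUND, growth))
# 1.0 213
# 1.5 12
# 2.0 8
```

The linear case prints 213 rather than 212 because the function counts whole rounds and 81,000 / 382 is slightly above 212. The larger point is the spread: a doubling of throughput per round turns a couple of hundred rounds into eight, which is why the linear figure should be read as a reference point rather than a timeline.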

What You Can Do

If you work in synthetic biology or protein engineering: this paper demonstrates that an AI-guided multi-tool pipeline (ESM2 for fitness prediction, AlphaFold2 for structural modeling, ProteinMPNN for sequence design) outperformed naive valine substitution by a factor of 2.25 on measured cellular fitness. Consider adopting the pipeline, because if you are still designing protein variants by intuition or single-substitution scanning, the data suggest you are leaving performance on the table. Read the methods section of Liu et al. for the specific workflow, paying particular attention to how they chained ESM2 fitness scoring, AlphaFold2 structural validation, and ProteinMPNN sequence generation so that each tool filters and refines the output of the one before it.

If you are interested in origin-of-life research: the Ec19 strain is the first experimental evidence that a reduced amino acid alphabet can sustain core translational machinery. As Wang framed it in an analogy that captures both the simplicity and the audacity of the question: "Think about language. There are 26 letters in the English alphabet, but do you really need 26?" Early life may have used fewer than 20 amino acids, and this work provides the first modern experimental system to test that hypothesis. Watch for follow-up papers attempting to remove a second amino acid, which would test whether the epistatic interaction problem compounds or whether the ribosome has already absorbed the hardest blow.

If you invest in biotech or biocontainment: organisms that depend on only 19 amino acids cannot survive outside controlled environments where the missing amino acid is deliberately withheld, creating a natural biocontainment mechanism far stronger than existing kill-switch genetic circuits. However, this application is years from practical implementation because the epistatic interaction barrier described above is the rate-limiting step, and scaling from 382 validated residue replacements to tens of thousands while maintaining cellular viability demands modeling capabilities that do not yet exist in any AI system. Do not invest on a timeline shorter than a decade, and even then, the fundamental question of whether whole-genome amino acid removal is physically achievable remains unanswered by any published research, including this one.

Bottom Line

For four billion years, 20 amino acids was not a choice but the only option. A team at Columbia, Harvard, and MIT just demonstrated that the most ancient, most conserved molecular machine in biology can run with 21 of its 52 proteins rebuilt from only 19, with AI doing the protein engineering that brute force could not. At 0.47% completion, a true 19-amino-acid organism remains a distant goal, and epistatic interactions between modified proteins represent an unsolved barrier that could prove permanent. But the ribosome was the hardest test, the one structure where evolutionary constraint is most severe. It passed. Everything else in the genome should be easier, if the interaction problem can be solved.

Sources

  1. Liu, L. et al. (April 30, 2026). Redesigning the ribosome with a reduced amino acid alphabet. Science 392, aeb5171. DOI: 10.1126/science.aeb5171
  2. Sanfiorenzo, C. & Wang, K. (2026). Can AI simplify the alphabet of life? Science 392, 467-468. DOI: 10.1126/science.aeh0122
  3. Dolgin, E. (2026). All life runs on 20 amino acids. These cells run key machinery on just 19. Nature. DOI: 10.1038/d41586-026-01396-w
  4. Krywko, J. (April 30, 2026). Scientists use AI to test whether life can run on only 19 amino acids. Scientific American
  5. Ars Technica (April 30, 2026). AI helps redesign bacterial ribosome to work without one amino acid. Ars Technica