An AI Designed 285 Genomes From Scratch. 16 Came Alive.

Arc Institute's Evo 2 DNA language model generated hundreds of complete bacteriophage genomes from nothing but a statistical understanding of 9.3 trillion nucleotides. Sixteen of those designs replicated, killed their target bacteria, and were confirmed as intact viral particles by electron microscopy. Some qualify as new species. It is the first time any AI has created living organisms.

Sixteen out of 285. That is the number from the phage design preprint that accompanied Arc Institute's Evo 2 model, whose main paper was published in Nature on March 5, 2026. Researchers prompted a fine-tuned DNA language model with a conserved fragment of the ΦX174 bacteriophage genome and asked it to generate the rest. Of the 285 complete synthetic genomes that were built and tested, 16 formed plaques on bacterial lawns, replicated inside E. coli host cells, and were confirmed as functional viral particles by electron microscopy. Several are so genetically divergent from any known sequence in GenBank that they would be classified as entirely new species under standard taxonomic criteria.

No human designed those genomes. No rational engineering pipeline selected the genes. A fine-tuned 7-billion-parameter neural network, from a model family trained on 9.3 trillion nucleotides from 128,000 organisms across every domain of life, generated plausible DNA sequences, and 5.6% of them (16 of 285) turned out to be alive.

Why This Clears a Threshold Protein Design Did Not

AI-designed proteins are not new. AlphaFold solved protein structure prediction. RFdiffusion designs novel protein folds. ESM-series models generate functional enzymes. But proteins are self-contained objects: one polypeptide chain, one function, one set of folding constraints. A genome is fundamentally different. Even ΦX174, one of the simplest well-characterized phages, packs 11 genes and at least 7 regulatory elements into just 5,386 nucleotides. Some genes physically overlap, sharing the same DNA but reading it in different frames. A single point mutation can knock out two proteins simultaneously. Getting one gene right is protein design. Getting 11 genes, their regulatory logic, their physical overlaps, and their coordinated expression timing all correct simultaneously is organism design.
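
To make the overlapping-gene point concrete, here is a minimal Python sketch that reads one DNA string in two frames and applies a single substitution. The 15-nucleotide sequence is a hypothetical toy, not a real ΦX174 region; only the standard genetic code is real. One C-to-T change turns an alanine codon into valine in frame 0 and creates a premature stop codon in frame 1.

```python
# Standard genetic code (NCBI translation table 1), indexed by base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna: str, frame: int) -> str:
    """Translate dna starting at offset `frame`; '*' marks a stop codon."""
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        a, b, c = (BASES.index(base) for base in dna[i:i + 3])
        protein.append(AMINO_ACIDS[16 * a + 4 * b + c])
    return "".join(protein)


def point_mutation(dna: str, position: int, new_base: str) -> str:
    """Return a copy of dna with one base substituted."""
    return dna[:position] + new_base + dna[position + 1:]


# Hypothetical toy sequence (not real PhiX174 DNA) read in two overlapping frames.
toy = "ATGGCAGCTGCAGGT"
mutant = point_mutation(toy, 4, "T")  # C -> T at index 4

for frame in (0, 1):
    print(frame, translate(toy, frame), "->", translate(mutant, frame))
# 0 MAAAG -> MVAAG   (missense change in frame 0)
# 1 WQLQ -> W*LQ    (premature stop in frame 1)
```

The same base sits in one codon of each frame, so a single substitution is evaluated twice, under two different reading constraints.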

This distinction is what makes the historical trajectory legible: structure prediction (2020) → protein design (2023) → genome design (2026). Each step requires the model to capture a higher order of biological organization. Evo 2 is the first public demonstration that a language model can clear the genome-level bar.

How Evo 2 Works

Evo 2 is not a pure transformer. It uses StripedHyena 2, a multi-hybrid architecture that interleaves Hyena-family convolution operators of several filter lengths with attention layers. Standard transformer attention scales quadratically with sequence length, while the convolution operators scale near-linearly, which is what makes a context window of 1 million nucleotides at single-nucleotide resolution practical. For comparison, Evo 1 was pretrained with a context of 8,192 nucleotides (later extended to roughly 131,000), so Evo 2's window is about 122 times Evo 1's pretraining context.
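
The scaling argument rests on a standard trick behind Hyena-family operators: a filter as long as the sequence can be applied with the fast Fourier transform in O(L log L) time, whereas attention compares all L² pairs of positions. The stdlib-only sketch below shows that generic technique and checks it against a naive O(L²) loop. It is an illustration under that assumption, not Evo 2's implementation, whose production operators are far more elaborate.

```python
import cmath
import math
import random


def fft(values: list[complex], invert: bool = False) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def causal_conv_fft(signal: list[float], kernel: list[float]) -> list[float]:
    """y[t] = sum_{s<=t} kernel[t-s] * signal[s], computed in O(L log L) via the FFT."""
    length = len(signal)
    if len(kernel) != length:
        raise ValueError("this sketch assumes a 'long' filter spanning the whole sequence")
    size = 1
    while size < 2 * length:  # zero-pad so circular convolution equals linear convolution
        size *= 2
    pad = [0j] * (size - length)
    spectrum = [a * b for a, b in zip(fft([complex(x) for x in signal] + pad),
                                      fft([complex(x) for x in kernel] + pad))]
    return [v.real / size for v in fft(spectrum, invert=True)[:length]]


def causal_conv_direct(signal: list[float], kernel: list[float]) -> list[float]:
    """The same convolution computed naively in O(L^2), for comparison."""
    return [sum(kernel[t - s] * signal[s] for s in range(t + 1)) for t in range(len(signal))]


rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(256)]
h = [rng.uniform(-1, 1) for _ in range(256)]
fast, slow = causal_conv_fft(x, h), causal_conv_direct(x, h)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # only floating-point round-off
```

Zero-padding to at least twice the sequence length keeps the FFT's circular convolution from wrapping the end of the sequence back onto its start, which would break causality.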

Training data came from OpenGenome2, a curated dataset of roughly 8.8 trillion nucleotides drawn from all three domains of life: bacteria, archaea, and eukaryotes, including human genomes. The flagship model was trained on 9.3 trillion nucleotides, roughly 30 times more data than Evo 1 saw. Training ran for several months on more than 2,000 NVIDIA H100 GPUs via DGX Cloud on AWS. The flagship model has 40 billion parameters; smaller 20B, 7B, and 1B variants are also available.

One detail separates Evo 2 from most frontier models: Arc released full weights and the complete pretraining dataset on HuggingFace, along with training and inference code on GitHub. As of March 2026, the release has logged 88,000 downloads and the GitHub repository 380 forks. The 40B model alone has received over 6 million API requests through HuggingFace.

Phage Design: Method and Results

Base Evo 2, despite its scale, cannot reliably generate phage genomes on its own. When prompted without fine-tuning, only 34 to 38 percent of output sequences were even flagged as viral by geNomad, a genomic classifier. Evo 1 performed worse: 19 to 33 percent. Most raw outputs were biologically incoherent.

The breakthrough came from supervised fine-tuning on thousands of Microviridae genomes, the family that includes ΦX174. After fine-tuning, researchers prompted both Evo 1 and Evo 2 with a consensus sequence conserved across all natural ΦX174 isolates, then asked the models to generate the remaining variable regions. The resulting 285 candidate genomes were synthesized, assembled, and electroporated into E. coli cells.
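
Prompting with a conserved prefix and asking for the rest is ordinary autoregressive sampling: the model predicts one nucleotide at a time, conditioned on what came before. The sketch below uses a toy k-mer Markov model as a hypothetical stand-in for the neural network, and its "fine-tuning" is simply counting transitions in a family-specific training set. The function names and training strings are illustrative, not Arc's code.

```python
import random
from collections import Counter, defaultdict


def train_kmer_model(genomes: list[str], k: int = 3) -> dict[str, Counter]:
    """Toy 'fine-tuning': count which base follows each k-mer context."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for genome in genomes:
        for i in range(len(genome) - k):
            counts[genome[i:i + k]][genome[i + k]] += 1
    return counts


def generate(model: dict[str, Counter], prompt: str, length: int,
             k: int = 3, seed: int = 0) -> str:
    """Extend a conserved prompt one base at a time until it reaches `length`."""
    rng = random.Random(seed)
    seq = prompt
    while len(seq) < length:
        next_counts = model.get(seq[-k:])
        if not next_counts:  # context never seen in training: sample uniformly
            seq += rng.choice("ACGT")
            continue
        bases = list(next_counts)
        weights = [next_counts[b] for b in bases]
        seq += rng.choices(bases, weights=weights, k=1)[0]
    return seq


# Hypothetical mini "family" of related sequences sharing a conserved start.
family = ["ATGGCTAGCTAGGATCC", "ATGGCTAGCAAGGATCC", "ATGGCTTGCTAGGATCA"]
model = train_kmer_model(family)
print(generate(model, prompt="ATGGCT", length=17))
```

With a single training sequence and no branching contexts, generation simply replays it: `generate(train_kmer_model(["ACGTACGTAC"]), "ACG", 10)` returns `"ACGTACGTAC"`. A real model differs in scale and in conditioning on far longer context, not in this basic loop.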

Sixteen designs formed plaques. The phages replicated, lysed host cells, and dispersed to infect neighboring bacteria. Electron microscopy confirmed intact viral particles with the characteristic icosahedral capsid geometry of Microviridae. Evo 2 designs outperformed Evo 1 designs, likely because the improved StripedHyena 2 architecture captured longer-range dependencies across the genome.

Two results stand out. First, some viable designs exhibited host tropism specificity: they killed their target E. coli strains without affecting unrelated bacterial species. That specificity is the single most important property for therapeutic phage design. Second, several genomes were so divergent from any known natural sequence that they would constitute new species under International Committee on Taxonomy of Viruses standards. The model did not merely recombine known phage parts. It invented new ones.

Original Analysis: The Cost Per Viable Genome

Here is a calculation nobody has published. Synthesizing a complete 5,386-nucleotide genome costs roughly $1,500 to $3,000 through commercial gene synthesis providers like Twist Bioscience or GenScript, an effective rate of about $0.28 to $0.56 per bp for a complex sequence with overlapping genes. Assembly and electroporation add approximately $200 per construct in consumables. Plaque assay screening runs about $50 per candidate. Total cost per candidate: roughly $1,750 to $3,250.

At 285 candidates and a 5.6% hit rate, producing 16 viable phages cost approximately $499,000 to $926,000 in synthesis and screening. That works out to $31,000 to $58,000 per viable AI-designed organism. Compute adds comparatively little: fine-tuning the 7B model, which runs on a single consumer GPU, takes roughly 100 to 500 GPU-hours, or about $200 to $1,500 at current H100 cloud pricing of $2 to $3 per GPU-hour, and inference brings the estimated total for the entire campaign to $500 to $2,000.
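
The campaign arithmetic can be reproduced from the article's own inputs; no new pricing data is introduced. The sketch below computes per-candidate, total, and per-viable wet-lab costs for the low and high synthesis estimates, plus the fine-tuning compute term as GPU-hours times hourly price.

```python
def campaign_costs(candidates: int, viable: int, synthesis: float,
                   assembly: float = 200.0, screening: float = 50.0):
    """Return (cost per candidate, total wet-lab cost, cost per viable design) in dollars."""
    per_candidate = synthesis + assembly + screening
    total = per_candidate * candidates
    return per_candidate, total, total / viable


for synthesis in (1_500.0, 3_000.0):  # the article's low and high synthesis estimates
    per_candidate, total, per_viable = campaign_costs(285, 16, synthesis)
    print(f"${per_candidate:,.0f}/candidate  ${total:,.0f} total  ${per_viable:,.0f}/viable")

# Fine-tuning compute only: GPU-hours times hourly price (low and high ends).
for gpu_hours, price in ((100, 2.0), (500, 3.0)):
    print(f"{gpu_hours} GPU-h at ${price:.0f}/h = ${gpu_hours * price:,.0f}")
```

Every term scales linearly with the number of candidates, so halving the number of non-viable designs would cut the per-viable figure almost in half.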

Compare this to traditional directed evolution. A typical directed evolution campaign targeting a single enzyme (not a genome) screens 10,000 to 100,000 variants at roughly $0.10 to $1.00 per variant in high-throughput assay costs, yielding hit rates of 0.01% to 0.1%. Dividing per-variant cost by hit rate gives a cost per improved variant of roughly $100 to $10,000. Evo 2's genome-level hit rate is roughly 56 to 560 times higher than directed evolution's protein-level hit rate, even though the design task is orders of magnitude more complex.
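
The directed evolution comparison follows the same logic: expected cost per hit is cost per variant divided by hit rate, assuming screening dominates the budget. This sketch uses only the figures in the paragraph above and the observed 16/285 hit rate.

```python
def cost_per_hit(cost_per_variant: float, hit_rate: float) -> float:
    """Expected screening spend to find one improved variant."""
    return cost_per_variant / hit_rate


best = cost_per_hit(0.10, 0.001)    # cheap assay, 0.1% hit rate
worst = cost_per_hit(1.00, 0.0001)  # expensive assay, 0.01% hit rate
print(f"directed evolution: ${best:,.0f} to ${worst:,.0f} per hit")

evo_hit_rate = 16 / 285
for de_rate in (0.001, 0.0001):
    print(f"hit-rate ratio vs {de_rate:.2%}: {evo_hit_rate / de_rate:.0f}x")
```

The two metrics pull in opposite directions: the per-attempt hit rate favors the AI-designed genomes, while the per-hit cost favors cheap protein-level assays, because each genome candidate costs thousands of dollars rather than cents.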

That asymmetry will not hold as target genomes scale. ΦX174 at 5.4 kilobases costs up to $3,000 to synthesize. A therapeutically relevant phage like T4, at 169 kilobases about 31 times longer, would cost roughly $47,000 to $94,000 per candidate at the same per-base cost. If hit rates hold at 5.6%, producing one viable T4-scale phage would cost approximately $0.8 to $1.7 million in synthesis alone. If hit rates drop with genome complexity, as they almost certainly will, costs escalate further.
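
A rough way to extrapolate to T4 is to assume synthesis cost scales linearly with length at ΦX174's effective per-base rate. The sketch below makes that assumption explicit; the 1% hit rate is a hypothetical value illustrating the case where hit rates fall with genome complexity.

```python
PHIX174_BP = 5_386
T4_BP = 169_000  # approximate genome length used in the article


def scaled_synthesis_cost(phix_cost: float, genome_bp: int) -> float:
    """Assumption: cost grows linearly with length at PhiX174's effective per-base rate."""
    return phix_cost / PHIX174_BP * genome_bp


def cost_per_viable(cost_per_candidate: float, hit_rate: float) -> float:
    """Expected spend per viable genome if each candidate succeeds independently."""
    return cost_per_candidate / hit_rate


for phix_cost in (1_500.0, 3_000.0):
    per_candidate = scaled_synthesis_cost(phix_cost, T4_BP)
    for hit_rate in (0.056, 0.01):  # 0.01 is a hypothetical lower hit rate
        print(f"${per_candidate:,.0f}/candidate, hit rate {hit_rate:.1%}: "
              f"${cost_per_viable(per_candidate, hit_rate):,.0f}/viable")
```

Linear scaling is optimistic: assembling a 169 kb genome from synthesized fragments adds steps that a 5.4 kb genome does not need.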

Why This Matters for Antibiotic Resistance

Drug-resistant bacteria killed an estimated 1.27 million people directly and were associated with 4.95 million deaths in 2019, according to the landmark Lancet analysis. A September 2024 follow-up projected 39 million direct antimicrobial resistance (AMR) deaths between 2025 and 2050. Only a handful of new antibiotic classes have been approved in the past two decades. The pipeline is nearly empty because the economics are broken: antibiotics are prescribed for days, not decades, and resistance erodes returns before development costs are recouped.

Phage therapy, in which bacteriophages are deployed to kill specific bacterial pathogens, has been used clinically in Georgia and Poland for decades but has never received FDA approval. The fundamental bottleneck is personalization. Each bacterial infection involves specific strains with specific surface receptors. Finding a natural phage that matches a patient's specific pathogen requires screening enormous phage banks, a process that takes days to weeks while the patient may be dying. Bacteria also evolve resistance to phages, sometimes within hours, requiring rapid switching to alternative phages.

AI-designed phages could invert this constraint. If a model like Evo 2 can generate hundreds of candidate phages targeting a specific bacterial strain in hours, then screen them computationally before committing to synthesis, the design-build-test cycle contracts from weeks to days. The vision is a compiler for custom antibiotics: sequence the pathogen, feed the genome to an AI, generate custom phages, synthesize the top candidates, screen for activity, and deploy.

The Clinical Reality Check

That vision remains distant from clinical reality. BiomX's BX004, a nebulized phage therapy targeting Pseudomonas aeruginosa in cystic fibrosis patients, was discontinued in early 2026 after a Phase 2b trial revealed safety concerns. BX004 used natural phages with known antibacterial activity, not AI-designed ones, and it still failed. A separate BiomX program, BX011 targeting Staphylococcus aureus in diabetic foot infections, received positive FDA feedback for a Phase 3 pathway, but no phage therapy has ever cleared Phase 3 in the United States.

The gap between "forms plaques on an agar plate" and "cures a patient" is vast. In vitro lytic activity does not guarantee in vivo efficacy. Immune clearance, tissue penetration, phage pharmacokinetics, bacterial biofilm architecture, and the sheer complexity of polymicrobial infections all intervene. Natural phages with decades of laboratory characterization fail in clinical trials. AI-designed phages with no in vivo data face every one of those hurdles plus a novel regulatory question: how does the FDA evaluate a therapeutic organism that was designed by a neural network?

Beyond Phages: What Evo 2 Has Already Validated

Phage design is the most dramatic result, but the Nature publication documents a broader portfolio of validated capabilities. Zero-shot variant effect prediction on BRCA1 mutations achieved over 90% accuracy classifying variants as benign or pathogenic without any task-specific training. Independent teams have applied Evo 2 to Alzheimer's disease risk prediction via APOE variant scoring and to farm animal genetics with cross-species variant categorization (AUROC 0.921). Chromatin accessibility designs were experimentally validated in mouse embryonic stem cells with AUROCs of 0.92 to 0.95, and 4 of 24 cell-type-specific designs produced more than a 2-fold difference in accessibility between human cell lines.
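
Zero-shot variant effect prediction with a genomic language model is typically done by comparing the model's log-likelihood of the variant sequence with that of the reference, and AUROC summarizes how well those scores separate pathogenic from benign variants. In the sketch below, the independent-site log-likelihood is a hypothetical stand-in for Evo 2, the delta scores fed to AUROC are made-up illustrative numbers, and the AUROC function is the standard pairwise (Mann-Whitney) definition.

```python
import math
from typing import Callable


def delta_score(log_likelihood: Callable[[str], float], reference: str,
                position: int, alt_base: str) -> float:
    """log P(variant) - log P(reference); more negative = less plausible to the model."""
    variant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(variant) - log_likelihood(reference)


def auroc(positive_scores: list[float], negative_scores: list[float]) -> float:
    """Chance a random positive outscores a random negative (ties count as 1/2)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical stand-in for a genomic language model: independent sites that favor G/C.
# An autoregressive model such as Evo 2 instead conditions each base on preceding context.
BASE_PROBS = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}


def toy_log_likelihood(seq: str) -> float:
    return sum(math.log(BASE_PROBS[base]) for base in seq)


reference = "GCGCATGC"
print(delta_score(toy_log_likelihood, reference, 0, "A"))  # G -> A is disfavored

# Made-up delta scores; negate them so that higher means "more likely pathogenic".
pathogenic = [-d for d in (-1.4, -1.2, -0.3)]
benign = [-d for d in (-0.5, -0.1, 0.0)]
print(auroc(pathogenic, benign))
```

An AUROC of 0.5 means the scores carry no information about pathogenicity and 1.0 means perfect separation, which is the scale on which the 0.921 and 0.92 to 0.95 figures above should be read.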

Perhaps most surprising: Breslow et al. demonstrated that Evo 2 exhibits in-context learning, a capability previously assumed to require human-language training data. The model can make predictions based purely on examples in the prompt, outperforming similarly-scaled text-based LLMs on certain tasks. The finding suggests that the statistical structure of DNA itself is rich enough to support emergent reasoning.

Limitations

Several constraints bound how far these results generalize. All 16 viable phage designs were variants of a single phage, ΦX174 (family Microviridae), infecting E. coli. It is one of the simplest well-characterized phages in biology: 5,386 bases, 11 genes, a single host. Clinically relevant phages like T4 (169 kb, ~300 genes) or phages targeting drug-resistant Klebsiella, Acinetobacter, or Pseudomonas are 10 to 50 times larger with far more complex host-range determinants. Whether Evo 2's approach scales to those genomes is an open question with no experimental answer.

A 5.6% hit rate means 94.4% of synthesis and screening costs are wasted on non-viable designs. For small genomes, this is manageable. At therapeutic-phage scale, each failed candidate could cost tens of thousands of dollars in synthesis alone. Fine-tuning was required; the base model could not generate viable phages without Microviridae-specific training, suggesting that each new phage family will require its own fine-tuning campaign and validation dataset.
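
A hit rate also translates into how many designs must be synthesized before one is likely to work. This sketch treats each candidate as an independent trial, a simplifying assumption, and finds the smallest batch with at least a 95% chance of containing one viable design. The 1% rate is a hypothetical stand-in for a larger, harder genome.

```python
def candidates_needed(hit_rate: float, confidence: float = 0.95) -> int:
    """Smallest batch size n with P(at least one viable design) >= confidence,
    assuming each candidate is viable independently with probability hit_rate."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    n, p_all_fail = 0, 1.0
    while 1.0 - p_all_fail < confidence:
        p_all_fail *= 1.0 - hit_rate
        n += 1
    return n


print(candidates_needed(16 / 285))  # observed PhiX174 hit rate
print(candidates_needed(0.01))      # hypothetical lower hit rate
```

At the observed 16/285 rate this gives 52 candidates; at a 1% rate it rises to 299, and each of those candidates carries the full synthesis cost of a larger genome.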

No animal model testing has been performed. The entire experimental validation occurred on agar plates with laboratory E. coli strains. Arc also excluded viruses that infect eukaryotes from the training data. That is a genuine biosecurity measure, but it also means Evo 2 cannot currently be applied to the design of oncolytic viruses or other therapeutic viral agents that target human cells.

Finally, the open-source release cuts both ways. Full model weights, training data, and fine-tuning code are publicly available. Arc's red-teaming evaluations showed that generated sequences are "effectively random" for pathogenic viral proteins, and they excluded human-infecting viruses from training. But future models built on Evo 2's architecture with different training data might not carry those guardrails.

The Bottom Line

Evo 2 represents the first experimental proof that AI can design living organisms from scratch. Not proteins. Not regulatory elements. Organisms that replicate, kill target cells, and propagate. The hit rate is low, the target was one of the simplest phages in biology, and the clinical path from petri dish to patient is measured in decades, not years. But the capability boundary has moved. In 2020, AI predicted protein structures. In 2023, it designed new proteins. In 2026, it designed new life. Each step collapses a complexity barrier that the previous generation of models could not address. Whoever figures out how to push hit rates above 50% and scale to therapeutically relevant genomes will have built something closer to a biological compiler than anything molecular biology has produced in 70 years of trying.