Blog

lf-lean: The frontier of verified software engineering

We present lf-lean, a verified translation of all 1,276 statements of the Logical Foundations textbook from Rocq to Lean, produced by frontier AI with ~2 person-days of human effort versus an estimated ~2.75 person-years manually (a 350x speed-up). We achieve this through task-level specification generators: because many software transformations are semantics-preserving, correctness can be defined once for an entire task class and checked automatically across all instances and codebases. This scales human oversight from 𝒪(𝓃) to 𝒪(1) regardless of program complexity. Placed on METR’s time horizon graph, our result suggests verified software engineering is advancing faster than expected.

Systematically generating tests that would have caught Anthropic’s top‑K bug

We introduce fractional proof decomposition, a technique for scaling testing compute logarithmically, instead of linearly, with bug rarity. We achieve this efficiency by fusing partial evaluation and property-based testing.