AlphaEvolve Automates FHE Optimization on TPUs, Achieving Up to 2.5x Speedup in Homomorphic Encryption

It is curious how a machine, instructed not by dogma but by trial and feedback, has refined the inner workings of encrypted calculation, cutting the time once spent in patient waiting by more than half in the best reported case. The cautious among us will want to verify these figures.
In Plain English:
Sending data to the cloud is risky because it can be seen or stolen. Fully Homomorphic Encryption is a way to do math on encrypted data without ever unlocking it, like solving a math problem inside a locked box. But this method is usually very slow. To fix this, researchers used an AI system that automatically tests and improves how the math is done on special computer chips called TPUs. The AI found ways to make the encrypted math run up to 2.5 times faster than the best hand-designed methods. This could help make secure cloud computing much more practical in real life, like for private medical or financial data.

Summary:
Fully Homomorphic Encryption (FHE) allows computations on encrypted data, preserving privacy, but suffers from high computational overhead that limits its scalability. While hardware accelerators like Google’s Tensor Processing Units (TPUs) offer potential performance gains, effectively mapping complex FHE kernels onto these architectures, especially the systolic array-based Matrix Multiplication Unit (MXU) and Vector Processing Units (VPUs), requires intricate, low-level optimization that existing compiler stacks often fail to deliver. Developers are typically forced into a tedious, manual trial-and-error process, leading to inefficient resource utilization and fragmented execution.

To overcome this bottleneck, the authors adapted AlphaEvolve, an AI-driven evolutionary optimization framework, to automate the discovery of hardware-aware FHE kernel implementations. The system frames optimization as a closed-loop evolutionary search, where Large Language Models (LLMs) generate candidate code variants, which are then compiled, executed on real TPU hardware (TPUv5e), and evaluated for both performance and correctness.
Feedback from actual hardware execution guides subsequent generations of code evolution, enabling the system to navigate the complex trade-offs between cryptographic operations, data movement, and hardware utilization.

The approach was evaluated on core primitives of two major FHE schemes: TFHE (via Jaxite) and CKKS (via CROSS). Within 24 hours of automated exploration, AlphaEvolve discovered optimizations that reduced TFHE bootstrap latency by 2.5x and improved CKKS rotation and multiplication latencies by 1.31x and 1.18x, respectively, compared to state-of-the-art human-engineered implementations. These results demonstrate that AI-driven, hardware-in-the-loop optimization can significantly accelerate the development of efficient FHE software, enabling better co-design across cryptography, compilers, and hardware accelerators (Abraham et al., 2026).

Key Points:
- Fully Homomorphic Encryption (FHE) enables secure computation on encrypted data but is computationally intensive.
- Google Tensor Processing Units (TPUs) can accelerate FHE but require expert-level, low-level optimization for efficiency.
- Manual optimization of FHE kernels on TPUs is slow, error-prone, and often suboptimal due to hardware complexity.
- AlphaEvolve uses LLM-driven code generation and evolutionary search guided by real hardware feedback to automate optimization.
- The system co-optimizes interactions between MXU, VPU, and vector register files for efficient execution.
- Evaluated on TPUv5e using TFHE (Jaxite) and CKKS (CROSS) schemes.
- Achieved 2.5x reduction in TFHE bootstrap latency and 1.31x/1.18x improvements in CKKS rotation and multiplication.
- Optimization process completed within 24 hours of automated exploration.
- Results outperform human-engineered state-of-the-art implementations.
- Demonstrates potential for AI to accelerate systems-level optimization in cryptography and hardware.
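The closed-loop search described in the summary (propose variants, run them, check correctness, keep the fastest) can be illustrated in miniature. The sketch below is not AlphaEvolve and not the authors' code: a random mutation stands in for LLM code generation, a synthetic cost model (`measure_latency`) stands in for timing on a TPU, and a tiled modular dot product stands in for an FHE kernel. All names and constants are hypothetical.

```python
import random

MODULUS = 65537        # toy modulus, not a real FHE parameter
BASELINE_TILE = 8      # hypothetical hand-written starting point
A = [(7 * i + 3) % MODULUS for i in range(1024)]
B = [(11 * i + 5) % MODULUS for i in range(1024)]


def reference_kernel(a, b):
    """Trusted baseline: a modular dot product."""
    return sum(x * y for x, y in zip(a, b)) % MODULUS


def run_candidate(tile, a, b):
    """Candidate variant: the same dot product, accumulated tile by tile."""
    acc = 0
    for start in range(0, len(a), tile):
        chunk = zip(a[start:start + tile], b[start:start + tile])
        acc = (acc + sum(x * y for x, y in chunk)) % MODULUS
    return acc


def measure_latency(tile, n=1024):
    """Synthetic cost model standing in for a timed run on real hardware:
    fixed overhead per tile, plus a penalty once a tile exceeds 128 lanes."""
    tiles = -(-n // tile)  # ceiling division
    return 5.0 * tiles + 2.0 * max(0, tile - 128)


def propose(parent, rng):
    """Random mutation standing in for LLM-generated code variants."""
    return rng.choice([parent * 2, parent // 2, parent + rng.randint(-3, 3)])


def evolve(generations=20, children=8, seed=0):
    rng = random.Random(seed)
    expected = reference_kernel(A, B)
    best_tile, best_latency = BASELINE_TILE, measure_latency(BASELINE_TILE)
    for _ in range(generations):
        for _ in range(children):
            tile = propose(best_tile, rng)
            if tile < 1:
                continue  # malformed variant: discard before running it
            # Correctness gate. Every valid tile size is correct in this toy;
            # the check is kept to mirror the structure of the loop.
            if run_candidate(tile, A, B) != expected:
                continue
            latency = measure_latency(tile)
            if latency < best_latency:  # feedback steers the next round
                best_tile, best_latency = tile, latency
    return best_tile, best_latency
```

Calling `evolve()` returns the best tile size found and its modeled latency; dividing `measure_latency(BASELINE_TILE)` by that latency gives a speedup in the same "Nx" form the article reports. In the real system, the measurement step is an actual compile-and-run on hardware, which is what makes the feedback trustworthy.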
Notable Quotes:
- "To accelerate this development process, we use AlphaEvolve to automate the exploration of hardware-aware cryptographic-kernel optimizations."
- "We frame optimization as an evolutionary search problem, utilizing the closed-loop system provided by AlphaEvolve, that leverages LLM-driven code generation."
- "These results demonstrate that AlphaEvolve can be used to enable researchers to navigate the optimization trade-offs between cryptography, compilers, and hardware accelerators."

Data Points:
- TFHE bootstrap latency improved by 2.5x.
- CKKS rotation latency improved by 1.31x.
- CKKS multiplication latency improved by 1.18x.
- Optimization achieved within 24 hours of automated exploration.
- Experiments conducted on Google Cloud TPUv5e.
- Optimization targeted both TFHE (Jaxite) and CKKS (CROSS) FHE schemes.

Controversial Claims:
- The claim that AlphaEvolve can outperform human experts in low-level cryptographic kernel optimization within just 24 hours may be seen as bold, especially given the depth of domain-specific knowledge typically required in FHE and hardware optimization.
- The reliance on LLM-driven code generation for safety-critical cryptographic implementations raises questions about long-term reliability, auditability, and potential for subtle bugs that pass correctness tests but fail under edge cases.
- The paper implies a paradigm shift toward AI-driven systems programming, which could be interpreted as downplaying the role of human expertise or overlooking maintainability and interpretability concerns in production environments.
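The concern about bugs that pass correctness tests but fail under edge cases can be made concrete with a small, hypothetical example. The sketch below is not taken from the paper, Jaxite, or CROSS. It compares a hand-optimized modular addition against a reference, first on random inputs and then on deliberately chosen boundary inputs. The modulus and all function names are assumptions made for illustration.

```python
import random

Q = 2**31 - 1  # toy modulus chosen for illustration


def add_mod_reference(a, b, q=Q):
    """Slow but obviously correct."""
    return (a + b) % q


def add_mod_fast(a, b, q=Q):
    """Replaces % with one conditional subtraction (valid for 0 <= a, b < q)."""
    s = a + b
    return s - q if s >= q else s


def add_mod_buggy(a, b, q=Q):
    """Same idea with an off-by-one: wrong only when a + b == q exactly."""
    s = a + b
    return s - q if s > q else s


def find_mismatch(candidate, pairs, q=Q):
    """Return the first input pair where candidate and reference disagree."""
    for a, b in pairs:
        if candidate(a, b, q) != add_mod_reference(a, b, q):
            return (a, b)
    return None


def random_pairs(n, seed=0, q=Q):
    rng = random.Random(seed)
    return [(rng.randrange(q), rng.randrange(q)) for _ in range(n)]


def edge_pairs(q=Q):
    """Boundary values, including pairs that sum to exactly q."""
    edges = [0, 1, q // 2, q - q // 2, q - 1]
    return [(a, b) for a in edges for b in edges]
```

With a modulus this large, a random pair sums to exactly `Q` with probability of roughly one in two billion, so `find_mismatch(add_mod_buggy, random_pairs(1000))` will almost always report nothing, while `find_mismatch(add_mod_buggy, edge_pairs())` exposes the fault immediately. A correctness gate is only as strong as the inputs it exercises.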
Technical Terms:
- Fully Homomorphic Encryption (FHE), TFHE, CKKS, Tensor Processing Unit (TPU), Matrix Multiplication Unit (MXU), Vector Processing Unit (VPU), vector register files, cryptographic kernels, hardware acceleration, systolic array, compiler stack, AlphaEvolve, evolutionary search, LLM-driven code generation, closed-loop optimization, hardware-in-the-loop, Jaxite, CROSS, bootstrap, rotation, multiplication, latency, co-optimization, data movement, hardware-aware optimization

Ada H. Pemberley
Dispatch from The Prepared E0
Published May 15, 2026
ai@theqi.news