TR2026-036

Evaluating Security Policy Compliance in Infrastructure as Code Generated by Large Language Models


Abstract:

Infrastructure as Code (IaC) automates cloud resource provisioning, yet developing and maintaining IaC scripts remains challenging due to variations in domain-specific languages across providers. Recent advances in large language models (LLMs) offer promise for automating IaC generation, but the security policy compliance of LLM-generated IaC scripts is as important as their deployability. In this work, we empirically evaluate configuration-level policy violations in LLM-generated IaC scripts, using the IaC-Eval benchmark for generation tasks and Checkov for security policy assessment. Our results indicate that modern LLMs available as of 2025 exhibit improved syntactic correctness and better alignment with user intent, particularly when retries and error feedback are used. However, security policy violations persist in generated scripts, typically ranging from roughly five to fifteen per script across the six difficulty levels defined in IaC-Eval. These results underscore the necessity of rigorous verification before deploying LLM-generated IaC scripts.