Software & Data Downloads — LLMPhy

Parameter-Identifiable Physical Reasoning Combining Large Language Models and Physics Engines

Software and data for parameter-identifiable physical reasoning that combines multimodal large language models and physics engines for simulation-grounded world modeling.

LLMPhy-TraySim is a synthetic benchmark for parameter-identifiable physical reasoning. It is designed to evaluate whether modern large language models (LLMs) and vision-language models (VLMs) can move beyond pattern recognition to recover physical parameters and generalize them across dynamical settings. Specifically, it assesses a model’s ability to (i) infer the intrinsic physical properties of a dynamical system and (ii) predict event outcomes under novel configurations in which the underlying physical parameters remain invariant but the scene layout and external perturbations change. Unlike traditional perception benchmarks, this formulation explicitly tests causal and transferable reasoning rather than memorization of visual patterns. To foster research on this topic, we are publicly releasing the dataset.
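
The two-stage task described above, first identifying a physical parameter and then reusing it in a changed scene, can be illustrated with a deliberately tiny toy problem. The sketch below is not the LLMPhy-TraySim setup or its data format: the sliding-block model, the function names, and all numeric values are assumptions chosen only to make the idea concrete. A block slides on a flat surface under Coulomb friction; the friction coefficient is recovered from one observed slide and then used to predict whether the block leaves the surface under a different initial push.

```python
G = 9.81  # gravitational acceleration in m/s^2


def stopping_distance(v0: float, mu: float) -> float:
    """Distance a block with initial speed v0 slides before friction stops it."""
    return v0 ** 2 / (2.0 * mu * G)


def infer_friction(v0: float, observed_distance: float) -> float:
    """Stage (i): recover the friction coefficient from one observed slide."""
    return v0 ** 2 / (2.0 * G * observed_distance)


def falls_off(v0: float, mu: float, distance_to_edge: float) -> bool:
    """Stage (ii): predict the event outcome in a new configuration."""
    return stopping_distance(v0, mu) > distance_to_edge


# Hypothetical ground truth, hidden from the "model" being evaluated.
true_mu = 0.3

# Stage (i): one observation with a known push of 2.0 m/s.
observed = stopping_distance(2.0, true_mu)
mu_hat = infer_friction(2.0, observed)

# Stage (ii): same surface (same mu), new push and an edge 1.0 m away.
outcome = falls_off(3.0, mu_hat, 1.0)
print(f"estimated mu = {mu_hat:.3f}, block falls off: {outcome}")
```

The point of the toy is the split: the estimate `mu_hat` comes only from the first scene, and the prediction in the second scene is only as good as that estimate, which is what makes the parameter identifiable and the reasoning transferable.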