When a humanoid video is captioned "the robot decides to pick up the cup," the verb is doing unearned work. Figure AI's grant US12638859B2, "Bipedal action model for humanoid robot" (issued May 26, 2026), lets us replace "decides" with something falsifiable: it predicts.
The classification is the tell. This robotics patent carries CPC G06F 40/40 (natural-language processing) alongside G06V vision classes and G05D 1/495 for locomotion control. That pairing is not decorative. It suggests Figure's control stack is built like a sequence model: take in perception and a goal, predict the next action, repeat. "Action model" is the deliberate name; it rhymes with "language model" because it is the same kind of machinery pointed at motors instead of words.
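The loop just described (take in perception and a goal, predict the next action, repeat) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's method: `rollout`, `toy_policy`, and `toy_step` are hypothetical names, and the hand-written policy is only a stand-in for where a trained network would sit.

```python
from typing import Callable, List, Tuple

# Stand-in type for perception features, goals, and joint targets (an assumption).
Vector = Tuple[float, ...]
Policy = Callable[[Vector, Vector, List[Vector]], Vector]
Step = Callable[[Vector, Vector], Vector]


def rollout(policy: Policy, step: Step, obs: Vector, goal: Vector,
            horizon: int) -> List[Vector]:
    """Predict an action, apply it, observe again, repeat for `horizon` steps."""
    history: List[Vector] = []
    for _ in range(horizon):
        # The policy sees the current observation, the goal, and its own past
        # actions, much as a language model sees the prompt plus earlier tokens.
        action = policy(obs, goal, history)
        history.append(action)
        obs = step(obs, action)  # the world (or a simulator) returns a new observation
    return history


def toy_policy(obs: Vector, goal: Vector, history: List[Vector]) -> Vector:
    """Hypothetical placeholder for a learned model: move halfway to the goal.

    It ignores `history`; a real sequence model would condition on it.
    """
    return tuple(0.5 * (g - o) for o, g in zip(obs, goal))


def toy_step(obs: Vector, action: Vector) -> Vector:
    """Hypothetical placeholder environment: the action is added to the state."""
    return tuple(o + a for o, a in zip(obs, action))
```

Calling `rollout(toy_policy, toy_step, (0.0,), (1.0,), 3)` returns `[(0.5,), (0.25,), (0.125,)]`. Nothing in the loop plans or deliberates; each step is one prediction conditioned on what came before.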
Count the actuators and the story changes; read the classification and it changes again. A learned action model is powerful precisely because it generalizes from demonstration data rather than from hand-written rules. But it inherits the same limits as any large predictive model: it is confident inside its training distribution and brittle outside it. A humanoid running an action model will look fluid on tasks it has seen and, on tasks it hasn't, fail in ways that are hard to anticipate.
The demo-versus-filing gap lives right here. A staged video shows the model inside its comfort zone — the curated tasks, the controlled lighting, the rehearsed sequence. The patent describes the mechanism that makes that fluency possible and, by the same token, makes the failure modes statistical rather than logical. The robot is not reasoning about the cup; it is predicting the action sequence that, in training, accompanied scenes that looked like this one.
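A deliberately crude analogy makes "predicting the action that accompanied similar scenes" concrete: a nearest-neighbor lookup over demonstrations. This is an illustration only, not a description of Figure's model. The `DEMOS` table, the two-number scene features, and the action labels are all invented for the example.

```python
import math
from typing import List, Tuple

Scene = Tuple[float, float]

# Hypothetical demonstrations: (scene features, action that accompanied them).
DEMOS: List[Tuple[Scene, str]] = [
    ((0.0, 0.0), "reach_left"),
    ((1.0, 0.0), "reach_right"),
    ((0.0, 1.0), "lift"),
]


def nearest_demo_policy(scene: Scene) -> Tuple[str, float]:
    """Return the action of the closest training scene and the distance to it."""
    demo_scene, action = min(DEMOS, key=lambda demo: math.dist(demo[0], scene))
    return action, math.dist(demo_scene, scene)


# A scene close to the demonstrations and one far from all of them.
near_action, near_dist = nearest_demo_policy((0.9, 0.1))
far_action, far_dist = nearest_demo_policy((5.0, 4.0))
```

Both calls return `"reach_right"`. The policy answers just as readily for the far scene as for the near one; only the distance to the training data, which the action itself does not expose, hints that the second answer is a guess.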
What the grant cannot tell you is the success rate on unseen tasks, the number that matters most for deployment. Method claims describe how the policy is structured, not how often it works in a kitchen it has never seen. That gap is the entire distance between a compelling reel and a dependable worker.
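If such a number were ever published, it would also need an error bar, because a handful of held-out trials says little on its own. A minimal sketch, assuming independent pass/fail trials and using the standard Wilson score interval; the function name and the trial counts below are hypothetical, not data from Figure or the patent.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a success rate (z=1.96 gives roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / trials + z * z / (4 * trials * trials)
    )
    return center - half_width, center + half_width


# Hypothetical example: 18 successes in 20 trials on tasks absent from training.
low, high = wilson_interval(18, 20)
```

For these made-up counts the interval runs from roughly 0.70 to 0.97, so an apparent 90% success rate is consistent with a robot that fails nearly one time in three. A reel of successful takes provides no denominator at all.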
So the honest reframe for anyone tracking Figure: this is impressive learned control, in the lineage of the models that power chatbots, applied to legs and arms. Call it prediction, not cognition, and the filing and the demo finally agree.