arxiv:2606.19419

Playful Agentic Robot Learning

Published on Jun 17

Submitted by Junyi Zhang on Jun 19

#3 Paper of the day

UC Berkeley

Authors: Junyi Zhang, Zirui Wang

Abstract

Embodied robots learn reusable skills through self-directed play and exploration, then apply these skills to improve performance on downstream tasks without additional training.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve held-out downstream tasks over no-play and random-play baselines, with 20.6 and 17.0 percentage-point gains over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents by simply retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 points, respectively, without finetuning the underlying model.
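The play-then-reuse loop described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the paper's implementation: every name here (`Skill`, `SkillLibrary`, `play`, `build_prompt`, and the stub functions) is hypothetical, the LLM agents and the robot are replaced by deterministic stubs, and retrieval uses simple keyword overlap, whereas the abstract only says that skills are retrieved into the context.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class Skill:
    name: str
    description: str
    code: str  # source of a robot-code policy, stored as text


@dataclass
class SkillLibrary:
    skills: dict = field(default_factory=dict)
    frozen: bool = False  # set to True before downstream evaluation

    def add(self, skill: Skill) -> None:
        if self.frozen:
            raise RuntimeError("skill library is frozen at test time")
        self.skills[skill.name] = skill

    def retrieve(self, task: str, k: int = 3) -> list:
        # Assumption: toy relevance score = number of words shared with the task.
        words = set(task.lower().split())
        scored = [(len(words & set(s.description.lower().split())), s)
                  for s in self.skills.values()]
        ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
        return [s for _, s in ranked[:k]]


def play(library: SkillLibrary,
         propose: Callable[[SkillLibrary], Optional[str]],
         write_policy: Callable[[str, Optional[str]], str],
         execute: Callable[[str], Tuple[bool, str]],
         rounds: int = 5,
         max_retries: int = 3) -> None:
    """Propose a task, attempt it with feedback-driven retries, distill on success."""
    for _ in range(rounds):
        task = propose(library)
        if task is None:  # nothing new left to explore
            break
        feedback = None
        for attempt in range(max_retries):
            code = write_policy(task, feedback)
            ok, feedback = execute(code)
            if ok:
                library.add(Skill(name=task, description=task, code=code))
                break


def build_prompt(task: str, library: SkillLibrary, k: int = 1) -> str:
    """Place retrieved skill code into the planner context for a new task."""
    snippets = "\n".join(s.code for s in library.retrieve(task, k))
    return f"# Reusable skills:\n{snippets}\n# New task: {task}\n"


# --- Deterministic stand-ins for the LLM agents and the robot (hypothetical) ---
TASKS = ["pick up the red block", "open the top drawer", "push the mug left"]


def propose(library: SkillLibrary) -> Optional[str]:
    # Stub proposer: the first candidate task not yet covered by the library.
    return next((t for t in TASKS if t not in library.skills), None)


def write_policy(task: str, feedback: Optional[str]) -> str:
    # Stub coder: the revised attempt records the feedback it was given.
    note = "" if feedback is None else f"  # revised after: {feedback}"
    return f"def policy(robot):\n    robot.do({task!r}){note}\n"


def execute(code: str) -> Tuple[bool, str]:
    # Stub verifier: the first attempt fails, the revised attempt succeeds.
    if "revised after" in code:
        return True, "ok"
    return False, "gripper missed the object at step 1"


library = SkillLibrary()
play(library, propose, write_policy, execute, rounds=3)
library.frozen = True  # no further learning at test time
prompt = build_prompt("pick up the blue block", library)
```

In this toy run each task fails once, is retried with the step-level feedback, and is then stored, so the library ends up with three skills. The held-out task "pick up the blue block" retrieves the stored "pick up the red block" policy into the prompt, which mirrors the idea of reusing play-learned code without updating any model weights.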

Community

Junyi42

Paper author, paper submitter, about 16 hours ago

RATs is a multi-agent Code-as-Policy system for lifelong robot skill learning. During free-form play, a team of LLM agents invents its own tasks, writes Code-as-Policy programs, and distills successful executions into a reusable skill library. At evaluation, those skills are reused as planner context: no gradients, no RL, all learning through structured natural-language feedback and code reuse.

noahml

about 3 hours ago

Neat paper. The shift from task-driven agentic systems to using self-directed play for skill building feels like a really natural evolution for Code-as-Policy workflows. It’s impressive that the agents can distill those exploratory attempts into a reusable library that actually helps with downstream performance without needing extra finetuning.

I’m curious, how does the agent decide which exploratory tasks are "learnable" during the play stage to ensure the skill library remains high-quality?

I made a podcast on it with ResearchPod, which makes it easy to get the key concepts on the go:
https://researchpod.app/episode/644799ed-3472-44fb-9dcf-31a87f7aac51