Moving by Looking: Towards Vision-Driven Avatar Motion Generation

Markos Diomataris, Berat Mert Albaba, Giorgio Becherini, Partha Ghosh, Omid Taheri, Michael J. Black

arXiv 2025

What is CLOPS?

Human-like motion requires human-like perception. We create a human motion generation system, named CLOPS, driven purely by egocentric visual observations. CLOPS moves realistically through a scene and uses egocentric vision to find a goal (the red sphere). We achieve this by combining a data-driven low-level motion prior with a Q-learning policy, which together close the loop between visual perception and motion.
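For readers less familiar with Q-learning: the policy is a network that scores candidate actions (here, head goals) given an egocentric observation, and it is trained with a temporal-difference target. The sketch below is generic DQN-style code, not the paper's training setup; the reward, goal space, and tensor shapes are all assumptions:

import torch

def td_loss(q_net, target_net, batch, gamma=0.99):
    # One generic temporal-difference update step. `done` is a float
    # mask (1.0 at episode end) and `action` an integer index into a
    # discretized goal set; CLOPS's actual parameterization may differ.
    obs, action, reward, next_obs, done = batch
    with torch.no_grad():
        # Bootstrapped target: r + gamma * max_a' Q_target(s', a'),
        # zeroed when the episode has terminated.
        target = reward + gamma * (1.0 - done) * target_net(next_obs).max(dim=1).values
    # Q-value of the action that was actually taken
    pred = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    return torch.nn.functional.mse_loss(pred, target)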

This video includes audio narration.

Visual overview of the CLOPS approach, showing motion generation driven by visual observation.

Qualitative Results

The following examples demonstrate CLOPS in action. The Q-network receives egocentric observations at 1 Hz and predicts goal poses for the avatar's head (visualised as coordinate frames). The motion generation network then produces natural motion to reach these head goals, and the loop continues:

Video of qualitative results.
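To make the loop concrete, here is a minimal sketch of the inference cycle, assuming the motion network runs at 30 fps so the Q-network replans once per second; the frame rate and every interface name below are illustrative assumptions, not the released CLOPS API:

FPS = 30  # assumed motion frame rate; the paper specifies only the 1 Hz replanning

def rollout(env, q_net, motion_net, seconds=60):
    obs = env.egocentric_image()               # what the avatar currently sees
    for _ in range(seconds):                   # one iteration per 1 Hz decision
        head_goal = q_net.predict_goal(obs)    # target head pose (position + orientation)
        for _ in range(FPS):                   # motion prior fills in the next second
            pose = motion_net.step(head_goal)  # next full-body pose toward the head goal
            env.apply(pose)
        obs = env.egocentric_image()           # fresh observation closes the loop
        if env.found_goal():                   # e.g. the red sphere has been reached
            break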