Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
The paper introduces Motion Mamba, a framework for efficient, long-sequence motion generation. It brings Mamba, a selective state space model (SSM) architecture, into the domain of motion generation. The main idea is to use the strengths of SSMs, namely long-range dependency modeling and computation that scales well with sequence length, to improve motion generation.
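To make the SSM idea concrete, here is a minimal sketch of the discrete linear recurrence that state space models build on: h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t. The parameter values and the function name `ssm_scan` are illustrative assumptions, not taken from the paper. Mamba additionally makes some of these parameters depend on the input (the "selective" part), which this fixed-parameter sketch leaves out.

```python
import numpy as np

def ssm_scan(x, A_bar, B_bar, C):
    """Run h_t = A_bar @ h_{t-1} + B_bar * x_t, y_t = C @ h_t over a 1-D sequence."""
    h = np.zeros(A_bar.shape[0])
    ys = []
    for x_t in x:                  # one pass: cost grows linearly with len(x)
        h = A_bar @ h + B_bar * x_t
        ys.append(C @ h)
    return np.array(ys)

# Hypothetical toy parameters: a 2-dimensional state with diagonal decay.
A_bar = np.diag([0.9, 0.5])
B_bar = np.array([1.0, 1.0])
C = np.array([1.0, 1.0])

x = np.array([1.0, 0.0, 0.0, 0.0])  # unit impulse at t = 0
y = ssm_scan(x, A_bar, B_bar, C)    # impulse response: 0.9**t + 0.5**t
```

The slowly decaying state component (0.9) is what lets an early input still influence later outputs, which is the intuition behind using SSMs for long sequences.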
Method Overview
The method builds upon the Mamba model and adapts it for motion generation. An overview of the model is shown below:
The Motion Mamba framework consists of two key components:
Hierarchical Temporal Mamba (HTM) block: This block is designed to process temporal motion data by arranging motion frames in sequential order using hierarchically adjusted scanning. It employs a set of SSM modules with varying numbers of scans to capture temporal dependencies at different levels of detail.
Bidirectional Spatial Mamba (BSM) block: This block is designed to unravel the structured latent skeleton by evaluating data from both forward and reverse directions. It aims to ensure the continuity of information flow and enhance the model's ability to generate precise motion by maintaining dense informational exchange.
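The bidirectional idea can be sketched with a toy scan: a causal scan only carries information forward, so running a second scan over the reversed sequence and combining the two lets every position see both sides. This is a simplified illustration under stated assumptions: `causal_scan` is a hypothetical stand-in for an SSM module, the results are combined by summation, and axis 0 stands in for whichever axis the block actually scans.

```python
import numpy as np

def causal_scan(x, decay=0.8):
    """Toy causal scan along axis 0: h_t = decay * h_{t-1} + x_t."""
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(len(x)):
        h = decay * h + x[t]
        out[t] = h
    return out

def bidirectional_scan(x, decay=0.8):
    """Sum of a forward scan and a scan over the reversed sequence (an assumed way to combine them)."""
    forward = causal_scan(x, decay)
    backward = causal_scan(x[::-1], decay)[::-1]  # scan reversed input, then flip back
    return forward + backward

# A single impulse in the middle of a length-5 sequence.
x = np.zeros((5, 1))
x[2] = 1.0

y_fwd = causal_scan(x)         # positions before the impulse stay at zero
y_bi = bidirectional_scan(x)   # the impulse now reaches positions on both sides
```

In the forward-only output, positions 0 and 1 never see the impulse; in the bidirectional output they do, which is the "information flow in both directions" property the BSM description refers to.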
The framework follows a denoising U-Net architecture, with the encoder and decoder blocks comprising the Hierarchical Temporal Mamba (HTM) and Bidirectional Spatial Mamba (BSM) components.
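The overall shape, an encoder and decoder joined by skip connections with a different number of scans per level, can be sketched as follows. Everything here is an assumption made for illustration: the scan counts `(3, 2)` and `1`, the toy `scan` and `block` functions, and the way scans are averaged are not the paper's values or modules. Down/upsampling, diffusion timestep conditioning, and text conditioning are omitted.

```python
import numpy as np

def scan(x, decay):
    """Toy causal scan along axis 0 (hypothetical stand-in for one SSM module)."""
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(len(x)):
        h = decay * h + x[t]
        out[t] = h
    return out

def block(x, num_scans):
    """Stand-in for one HTM + BSM block: average `num_scans` scans, add a residual."""
    decays = np.linspace(0.5, 0.9, num_scans)
    mixed = np.mean([scan(x, d) for d in decays], axis=0)
    return x + mixed

def unet_denoiser(x, encoder_scans=(3, 2), middle_scans=1):
    """U-Net-shaped pass: encoder levels, a middle block, then a mirrored decoder."""
    skips = []
    for k in encoder_scans:                 # encoder: remember each level's output
        x = block(x, k)
        skips.append(x)
    x = block(x, middle_scans)              # middle block
    for k in reversed(encoder_scans):       # decoder: mirror levels, add the skips
        x = block(x + skips.pop(), k)
    return x

rng = np.random.default_rng(0)
noisy = rng.normal(size=(16, 4))            # 16 frames, 4 latent features (toy sizes)
out = unet_denoiser(noisy)
```

The point of the sketch is only the wiring: each decoder level reuses the scan count of its matching encoder level and receives that level's output through a skip connection.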
Results
The paper reports significant improvements achieved by Motion Mamba compared to previous state-of-the-art methods:
Up to a 50% improvement (reduction) in Fréchet Inception Distance (FID) on the HumanML3D dataset. Lower FID means the distribution of generated motions is closer to that of real motions, which reflects both quality and diversity.
Up to 4 times faster inference than previous diffusion-based methods, enabling real-time motion generation.
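For readers unfamiliar with the metric, FID is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated samples: ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)). The sketch below computes that quantity with NumPy. It assumes the features have already been extracted by some pretrained encoder (not shown), and the random arrays are placeholder data, not results from the paper.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))      # placeholder "real" features
same = frechet_distance(real, real)   # identical sets: distance is (numerically) zero
shifted = frechet_distance(real, real + 3.0)  # same covariance, mean moved by 3 per dim
```

Shifting every feature by 3 in 4 dimensions leaves the covariance unchanged, so the distance reduces to the squared mean difference, 4 * 3^2 = 36. A 50% FID improvement in this sense means the reported distance is roughly halved.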
Superior performance in long-sequence motion generation, demonstrated on the HumanML3D-LS dataset.
Conclusion
The Motion Mamba framework introduces a novel approach to motion generation by integrating selective state space models (SSMs) into a specialized architecture designed for efficient and accurate long-sequence motion modeling. For more details, please consult the full paper or the project page.
Congrats to the authors on their work!
Zhang, Zeyu, et al. "Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM." arXiv preprint arXiv:2403.07487, 12 Mar. 2024