ST-GDance: Long-Term and Collision-Free Group Choreography from Music


Jing Xu1, Weiqiang Wang1†, Cunjian Chen1†, Jun Liu2, Qiuhong Ke1
† Corresponding author
1 Department of Data Science and AI, Monash University, Melbourne, Australia
2 School of Computing and Communications, Lancaster University, UK

Abstract

Creating group choreography from music is crucial for applications in entertainment and virtual reality. However, existing methods face challenges such as high computational costs, difficulty in generating long sequences, and limited modeling of interactions between multiple dancers. To address these issues, we propose ST-GDance, a novel framework that decouples spatial and temporal dependencies, allowing us to use lightweight graph convolutions for spatial modeling and sparse attention mechanisms for temporal modeling. This design reduces computational complexity while ensuring smooth and collision-free interactions between dancers. Experiments on the GDance dataset show that ST-GDance outperforms state-of-the-art methods, especially on long-term group dance generation.

Generated Results

2-Dancers

3-Dancers

4-Dancers

5-Dancers


Controllability

[Figure: controllability of group formation via interaction-graph edge weights]

Our method allows control over the generated choreography by adjusting the edge weights of the underlying interaction graph. As shown in the figure, when the graph is fully connected with uniform weights, the generated dancers tend to exhibit highly synchronized movements and maintain equal pairwise distances. This demonstrates the flexibility of our spatial modeling in shaping group formation and motion coherence, and it points toward node-level graph control: structured spatial relationships that provide fine-grained influence over the behavior of individual dancers.
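To make the edge-weight control concrete, the sketch below shows one way a fully connected interaction graph with adjustable weights could drive a lightweight graph-convolution step over dancer features. This is a minimal illustration, not the released implementation: the names `build_interaction_graph` and `GraphMessagePassing`, and the row-normalization choice, are our own assumptions.

```python
# Hypothetical sketch of edge-weight control over an interaction graph.
import torch
import torch.nn as nn

def build_interaction_graph(num_dancers: int, edge_weight: float = 1.0) -> torch.Tensor:
    """Fully connected adjacency with uniform off-diagonal weights.

    Uniform weights encourage synchronized motion and equal pairwise spacing;
    lowering a specific weight weakens the coupling between that pair of dancers.
    """
    adj = torch.full((num_dancers, num_dancers), edge_weight)
    adj.fill_diagonal_(1.0)                      # self-loops keep each dancer's own features
    return adj / adj.sum(dim=-1, keepdim=True)   # row-normalize for stable message passing

class GraphMessagePassing(nn.Module):
    """One lightweight graph-convolution step over dancer features."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_dancers, dim); adj: (num_dancers, num_dancers)
        return self.proj(torch.einsum("ij,bjd->bid", adj, x))

# Usage: uniform weights -> strongly coupled, synchronized group behavior;
# per-edge weights would instead give finer, node-level control.
x = torch.randn(1, 4, 64)                        # 4 dancers, 64-d pose features
adj_uniform = build_interaction_graph(4, edge_weight=1.0)
out = GraphMessagePassing(64)(x, adj_uniform)
```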

Framework

[Figure: overview of the ST-GDance framework with the Spatial Modeling Block (SMB) and Temporal Dependency Module (TDM)]

Our framework comprises two core components: the Spatial Modeling Block (SMB) and the Temporal Dependency Module (TDM). To reduce computational burden while ensuring realistic coordination, we first employ the SMB to capture inter-dancer spatial relations through a lightweight graph convolutional network (GCN), which models spatial distances and suppresses collisions. This enables the model to learn structured formations and avoid dancer ambiguity. The TDM then leverages a hybrid attention design, combining a Local Dependency Transformer (LDT) with Differential Attention, to efficiently model temporal dynamics across extended sequences. The two modules operate in a decoupled manner, with spatial features guiding formation consistency and temporal features ensuring motion coherence. Together, they enable scalable and collision-free group choreography generation.
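For a concrete picture of the decoupling, here is a minimal sketch, under our own simplifying assumptions, of how a spatial block over dancers and a windowed temporal attention module could be stacked. The module names mirror SMB and TDM above, but the internals (feature sizes, window length, residual connections, and the use of standard multi-head attention in place of Differential Attention) are illustrative placeholders rather than the actual implementation.

```python
# Hypothetical sketch of the decoupled spatial-temporal design.
import torch
import torch.nn as nn

class SMB(nn.Module):
    """Spatial Modeling Block: lightweight graph convolution across dancers per frame."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dancers, dim); adj: (dancers, dancers)
        return x + self.proj(torch.einsum("nm,btmd->btnd", adj, x))

class TDM(nn.Module):
    """Temporal Dependency Module: sparse (windowed) self-attention over time."""
    def __init__(self, dim: int, window: int = 32, heads: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dancers, dim) -> attend within a local temporal window per dancer
        b, t, n, d = x.shape
        seq = x.permute(0, 2, 1, 3).reshape(b * n, t, d)           # one sequence per dancer
        idx = torch.arange(t)
        mask = (idx[None, :] - idx[:, None]).abs() > self.window   # True = blocked position
        out, _ = self.attn(seq, seq, seq, attn_mask=mask)
        return x + out.reshape(b, n, t, d).permute(0, 2, 1, 3)

# One decoupled block: spatial mixing per frame, then local temporal attention.
x = torch.randn(2, 120, 4, 64)                 # 120 frames, 4 dancers, 64-d features
adj = torch.full((4, 4), 0.25)                 # uniform interaction graph
y = TDM(64)(SMB(64)(x, adj))
```

The key design point this sketch tries to convey is that spatial mixing touches only the (small) dancer dimension at each frame, while temporal attention is restricted to a local window, so the cost grows gently with sequence length rather than quadratically over the full sequence.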

Citation

If you find this work useful, please consider citing our paper:

@article{xu2025stgdance,
    title={ST-GDance: Long-Term and Collision-Free Group Choreography from Music},
    author={Xu, Jing and Wang, Weiqiang and Chen, Cunjian and Liu, Jun and Ke, Qiuhong},
    journal={arXiv preprint arXiv:2507.21518},
    year={2025}
}