Publications
Listed below are my publications in reverse chronological order.
2025
- [arXiv] Less is More: Improving Motion Diffusion Models with Sparse Keyframes. Jinseok Bae, Inwoo Hwang, Young Yoon Lee, Ziyu Guo, Joseph Liu, Yizhak Ben-Shabat, Young Min Kim, and Mubbasir Kapadia. 2025.
Recent advances in motion diffusion models have led to remarkable progress in diverse motion generation tasks, including text-to-motion synthesis. However, existing approaches represent motions as dense frame sequences, requiring the model to process redundant or less informative frames. The processing of dense animation frames imposes significant training complexity, especially when learning intricate distributions of large motion datasets even with modern neural architectures. This severely limits the performance of generative motion models for downstream tasks. Inspired by professional animators who mainly focus on sparse keyframes, we propose a novel diffusion framework explicitly designed around sparse and geometrically meaningful keyframes. Our method reduces computation by masking non-keyframes and efficiently interpolating missing frames. We dynamically refine the keyframe mask during inference to prioritize informative frames in later diffusion steps. Extensive experiments show that our approach consistently outperforms state-of-the-art methods in text alignment and motion realism, while also effectively maintaining high performance at significantly fewer diffusion steps. We further validate the robustness of our framework by using it as a generative prior and adapting it to different downstream tasks. Source code and pre-trained models will be released upon acceptance.
@article{bae2025less,
  title  = {Less is More: Improving Motion Diffusion Models with Sparse Keyframes},
  author = {Bae, Jinseok and Hwang, Inwoo and Lee, Young Yoon and Guo, Ziyu and Liu, Joseph and Ben-Shabat, Yizhak and Kim, Young Min and Kapadia, Mubbasir},
  year   = {2025},
}
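For intuition only, the sketch below illustrates the keyframe idea behind this paper in isolation: non-keyframe poses are dropped and re-filled by interpolating between sparse keyframes. It is not the released code; the array shapes, the hand-picked keyframe indices, and the use of plain linear interpolation are my own simplifying assumptions.

```python
# Hypothetical illustration of keyframe masking and linear in-filling (NumPy only).
import numpy as np

def interpolate_from_keyframes(motion, keyframe_idx):
    """Fill every non-keyframe by linearly interpolating between the
    surrounding keyframes, per feature dimension.

    motion       : (T, D) array of pose features
    keyframe_idx : sorted 1-D array of frame indices kept as keyframes
                   (must include frame 0 and frame T-1)
    """
    T, D = motion.shape
    frames = np.arange(T)
    filled = np.empty_like(motion)
    for d in range(D):
        filled[:, d] = np.interp(frames, keyframe_idx, motion[keyframe_idx, d])
    return filled

if __name__ == "__main__":
    T, D = 60, 6
    dense = np.cumsum(np.random.randn(T, D) * 0.05, axis=0)   # smooth synthetic motion
    keys = np.array([0, 14, 29, 44, 59])                      # sparse keyframes
    sparse_recon = interpolate_from_keyframes(dense, keys)
    print("mean reconstruction error:", np.abs(sparse_recon - dense).mean())
```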
- [CVPR Workshop (HuMoGen)] Goal-Driven Human Motion Generation in Diverse Tasks. Inwoo Hwang, Jinseok Bae, Donggeun Lim, and Young Min Kim. 2025.
We propose a framework for goal-driven human motion generation, which can synthesize interaction-rich scenarios. Given the goal positions for key joints, our pipeline automatically generates natural full-body motion that approaches the target in cluttered environments. The pipeline handles the complex constraints in a tractable formulation by disentangling motion generation into two stages. The first stage computes trajectories for key joints such as the hands and feet, encouraging the character to naturally approach the target position while avoiding physical violations. We demonstrate that diffusion-based guidance sampling can flexibly adapt to the local scene context while satisfying goal conditions. The second stage then generates plausible full-body motion that follows the key-joint trajectories. The proposed pipeline applies to various scenarios that must concurrently account for 3D scene geometry and body joint configurations.
@article{hwang2025goal,
  title  = {Goal-Driven Human Motion Generation in Diverse Tasks},
  author = {Hwang, Inwoo and Bae, Jinseok and Lim, Donggeun and Kim, Young Min},
  year   = {2025},
}
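The snippet below is a toy illustration of guidance sampling toward a goal position, the mechanism the first stage relies on. The noise schedule stand-in, the guidance weighting, and every function name are invented for demonstration rather than taken from the authors' implementation.

```python
# Toy, self-contained sketch of goal guidance during sampling: at every step the
# trajectory estimate is nudged toward a target position for a key joint.
import numpy as np

def goal_guidance_step(traj, goal, step, n_steps, guidance_scale=0.3):
    """One simplified 'denoising + guidance' update.

    traj : (T, 3) current noisy estimate of a key-joint trajectory
    goal : (3,)   target position the final frame should reach
    """
    # Stand-in for the model's denoising update: shrink the noise a little.
    traj = traj * 0.98
    # Guidance: gradient of 0.5 * ||traj[-1] - goal||^2 w.r.t. the last frame.
    grad = np.zeros_like(traj)
    grad[-1] = traj[-1] - goal
    # Later steps get stronger guidance, mimicking schedule-dependent weighting.
    weight = guidance_scale * (step + 1) / n_steps
    return traj - weight * grad

goal = np.array([2.0, 0.0, 1.0])
traj = np.random.randn(30, 3)
for s in range(50):
    traj = goal_guidance_step(traj, goal, s, 50)
print("final-frame distance to goal:", np.linalg.norm(traj[-1] - goal))
```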
- [Eurographics Short Paper] Audio-aided Character Control for Inertial Measurement Tracking. Hojun Jang, Jinseok Bae, and Young Min Kim. 2025.
Physics-based character control generates realistic motion dynamics by leveraging kinematic priors from large-scale data within a simulation engine. The simulated motion respects physical plausibility, while dynamic cues like contacts and forces guide compelling human-scene interaction. However, leveraging audio cues, which can capture physical contacts in a cost-effective way, has been less explored in animating human motions. In this work, we demonstrate that audio inputs can enhance accuracy in predicting footsteps and capturing human locomotion dynamics. Experiments validate that audio-aided control from sparse observations (e.g., an IMU sensor on a VR headset) enhances the prediction accuracy of contact dynamics and motion tracking, offering a practical auxiliary signal for robotics, gaming, and virtual environments.
@article{jang2025audio,
  title  = {Audio-aided Character Control for Inertial Measurement Tracking},
  author = {Jang, Hojun and Bae, Jinseok and Kim, Young Min},
  year   = {2025},
}
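As a rough picture of why audio helps, the sketch below fuses a short-time audio energy envelope with IMU acceleration to flag footstep contacts on synthetic signals. The thresholds, window size, and AND-style fusion rule are assumptions chosen for demonstration, not the paper's learned model.

```python
# Toy sketch: fusing an audio energy envelope with IMU acceleration to flag
# footstep contacts. Purely illustrative; not the paper's method.
import numpy as np

def short_time_energy(signal, win=256):
    pad = np.pad(signal**2, (win // 2, win // 2 - 1), mode="edge")
    kernel = np.ones(win) / win
    return np.convolve(pad, kernel, mode="valid")

def detect_contacts(audio, imu_acc, audio_thresh=0.5, acc_thresh=1.5):
    """A frame is a contact candidate when BOTH the audio envelope and the
    IMU acceleration magnitude exceed their thresholds."""
    env = short_time_energy(audio)
    env = env / (env.max() + 1e-8)            # normalize to [0, 1]
    return (env > audio_thresh) & (np.abs(imu_acc) > acc_thresh)

# Synthetic demo: impulsive "step" sounds and matching acceleration spikes.
n = 4800
audio = 0.02 * np.random.randn(n)
imu_acc = 0.1 * np.random.randn(n)
for t in [800, 2000, 3200, 4400]:
    audio[t:t + 50] += 0.8
    imu_acc[t:t + 50] += 3.0
contacts = detect_contacts(audio, imu_acc)
print("frames flagged as contact:", int(contacts.sum()))
```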
- [Eurographics] Versatile Physics-based Character Control with Hybrid Latent Representation. Jinseok Bae, Jungdam Won, Donggeun Lim, Inwoo Hwang, and Young Min Kim. 2025.
We present a versatile latent representation that enables a physically simulated character to efficiently utilize motion priors. To build a powerful motion embedding that is shared across multiple tasks, the physics controller should employ a rich latent space that is easily explored and capable of generating high-quality motion. We propose integrating continuous and discrete latent representations to build a versatile motion prior that can be adapted to a wide range of challenging control tasks. Specifically, we build a discrete latent model to capture a distinctive posterior distribution without collapse, and simultaneously augment the sampled vector with continuous residuals to generate high-quality, smooth motion without jittering. We further incorporate Residual Vector Quantization, which not only maximizes the capacity of the discrete motion prior, but also efficiently abstracts the action space during the task learning phase. We demonstrate that our agent can produce diverse yet smooth motions simply by traversing the learned motion prior through unconditional motion generation. Furthermore, our model robustly satisfies sparse goal conditions with highly expressive natural motions, including head-mounted device tracking and motion in-betweening at irregular intervals, which could not be achieved with existing latent representations.
@article{bae2025hybrid,
  title  = {Versatile Physics-based Character Control with Hybrid Latent Representation},
  author = {Bae, Jinseok and Won, Jungdam and Lim, Donggeun and Hwang, Inwoo and Kim, Young Min},
  year   = {2025},
}
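The core quantization idea can be sketched compactly: residual vector quantization (RVQ) stacks several codebooks, and the leftover continuous residual is kept alongside the discrete codes. The codebook sizes and random codebooks below are placeholders; this is a sketch of the general RVQ-plus-residual mechanism, not the paper's network.

```python
# Minimal sketch of the hybrid idea: residual vector quantization (RVQ) over a
# latent code, with the remaining continuous residual kept and added back.
import numpy as np

rng = np.random.default_rng(0)
n_levels, codebook_size, dim = 3, 16, 8
codebooks = rng.normal(size=(n_levels, codebook_size, dim))   # made-up codebooks

def rvq_encode(z, codebooks):
    """Quantize z with stacked codebooks; return code indices, the quantized
    vector, and the leftover continuous residual."""
    residual = z.copy()
    quantized = np.zeros_like(z)
    indices = []
    for book in codebooks:
        dists = np.linalg.norm(book - residual, axis=1)
        idx = int(dists.argmin())
        indices.append(idx)
        quantized += book[idx]
        residual -= book[idx]
    return indices, quantized, residual

z = rng.normal(size=dim)
codes, z_q, z_res = rvq_encode(z, codebooks)
z_hybrid = z_q + z_res          # discrete part + continuous residual
print("codes:", codes)
print("reconstruction error with residual added back:", np.linalg.norm(z_hybrid - z))
```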
2023
- [ICCV] Dynamic Mesh Recovery from Partial Point Cloud Sequence. Hojun Jang, Minkwan Kim, Jinseok Bae, and Young Min Kim. 2023.
The exact 3D dynamics of the human body provides crucial evidence for analyzing the consequences of physical interaction between the body and the environment, which can eventually assist everyday activities in a wide range of applications. However, optimizing for 3D configurations from image observations requires a significant amount of computation, whereas real-world 3D measurements often suffer from noisy observations or complex occlusion. We resolve the challenge by learning a latent distribution representing strong temporal priors. We use a conditional variational autoencoder (CVAE) architecture with a transformer to train the motion priors on large-scale motion datasets. Our feature follower then effectively aligns the feature space of noisy, partial observations with the input required by the pre-trained motion priors, and quickly recovers a complete mesh sequence of motion. We demonstrate that the transformer-based autoencoder can collect the necessary spatio-temporal correlations and remain robust to various corruptions, such as missing temporal frames or noisy observations under severe occlusion. Our framework is general and can be applied to recover the full 3D dynamics of other subjects with parametric representations.
@article{jang2023dynamic,
  title  = {Dynamic Mesh Recovery from Partial Point Cloud Sequence},
  author = {Jang, Hojun and Kim, Minkwan and Bae, Jinseok and Kim, Young Min},
  year   = {2023},
  video  = {https://www.youtube.com/watch?v=OgineYrkgRE},
  page   = {https://hojunjang17.github.io/DynamicMeshRecovery/},
}
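For readers unfamiliar with the backbone, the following is a minimal conditional VAE in PyTorch showing the encode/reparameterize/decode pattern a motion prior of this kind builds on. The flat MLP layers, dimensions, and conditioning vector are placeholders; the paper's transformer architecture and feature follower are not reproduced here.

```python
# Minimal conditional VAE sketch (PyTorch): encode a pose sequence to a latent
# distribution, sample with the reparameterization trick, decode conditioned on
# features from a partial observation. Sizes are placeholders.
import torch
import torch.nn as nn

class TinyCVAE(nn.Module):
    def __init__(self, seq_dim=64, cond_dim=32, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(seq_dim + cond_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, 128), nn.ReLU(), nn.Linear(128, seq_dim)
        )

    def forward(self, motion, cond):
        h = self.encoder(torch.cat([motion, cond], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterize
        recon = self.decoder(torch.cat([z, cond], dim=-1))
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl

model = TinyCVAE()
motion = torch.randn(4, 64)    # flattened pose sequence (placeholder)
partial = torch.randn(4, 32)   # features from a partial point cloud (placeholder)
recon, kl = model(motion, partial)
print(recon.shape, float(kl))
```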
- [SIGGRAPH] PMP: Learning to Physically Interact with Environments using Part-wise Motion Priors. Jinseok Bae, Jungdam Won, Donggeun Lim, Cheol-Hui Min, and Young Min Kim. 2023.
We present a method to animate a character incorporating multiple part-wise motion priors (PMP). While previous works allow creating realistic articulated motions from reference data, the range of motion is largely limited by the available samples. Especially for interaction-rich scenarios, it is impractical to acquire every possible interacting motion, as the combination of physical parameters increases exponentially. The proposed PMP allows us to assemble multiple part skills to animate a character, creating a diverse set of motions from different combinations of existing data. In our pipeline, we can train an agent with a wide range of part-wise priors: each body part can obtain kinematic insight into the style from motion captures, or extract dynamics-related information from additional part-specific simulation. For example, we can first train a general interaction skill, e.g. grasping, only for the dexterous part, and then combine the expert trajectories from the pre-trained agent with the kinematic priors of other limbs. Eventually, our whole-body agent learns a novel physical interaction skill even in the absence of object trajectories in the reference motion sequence.
@article{bae2023pmp,
  title  = {PMP: Learning to Physically Interact with Environments using Part-wise Motion Priors},
  author = {Bae, Jinseok and Won, Jungdam and Lim, Donggeun and Min, Cheol-Hui and Kim, Young Min},
  year   = {2023},
  video  = {https://www.youtube.com/watch?v=WdLGvKdNG-0},
  page   = {https://jinseokbae.github.io/pmp},
}
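A toy sketch of the part-wise idea: each body part gets its own style score against part-specific reference statistics, and the scores are combined into a single whole-body reward. The scoring function, the part grouping, and the averaging rule below are illustrative assumptions, not the paper's adversarial formulation.

```python
# Toy sketch of combining part-wise style rewards. Each body part has its own
# stand-in scoring function; the whole-body reward averages the part scores.
import numpy as np

def part_score(part_obs, part_mean):
    """Stand-in for a per-part prior: higher when the observed part motion is
    close to that part's reference statistics."""
    return float(np.exp(-np.linalg.norm(part_obs - part_mean) ** 2))

def whole_body_style_reward(obs_by_part, ref_means):
    scores = [part_score(obs_by_part[name], ref_means[name]) for name in obs_by_part]
    return float(np.mean(scores))

ref_means = {"arms": np.zeros(6), "legs": np.zeros(6), "torso": np.zeros(3)}
obs = {"arms": 0.1 * np.random.randn(6),
       "legs": 0.1 * np.random.randn(6),
       "torso": 0.1 * np.random.randn(3)}
print("style reward:", whole_body_style_reward(obs, ref_means))
```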
2022
- [AAAI] Neural Marionette: Unsupervised Learning of Motion Skeleton and Latent Dynamics from Volumetric Video. Jinseok Bae, Hojun Jang, Cheol-Hui Min, Hyungun Choi, and Young Min Kim. 2022.
We present Neural Marionette, an unsupervised approach that discovers the skeletal structure from a dynamic sequence and learns to generate diverse motions that are consistent with the observed motion dynamics. Given a video stream of point cloud observation of an articulated body under arbitrary motion, our approach discovers the unknown low-dimensional skeletal relationship that can effectively represent the movement. Then the discovered structure is utilized to encode the motion priors of dynamic sequences in a latent structure, which can be decoded to the relative joint rotations to represent the full skeletal motion. Our approach works without any prior knowledge of the underlying motion or skeletal structure, and we demonstrate that the discovered structure is even comparable to the hand-labeled ground truth skeleton in representing a 4D sequence of motion. The skeletal structure embeds the general semantics of possible motion space that can generate motions for diverse scenarios. We verify that the learned motion prior is generalizable to the multi-modal sequence generation, interpolation of two poses, and motion retargeting to a different skeletal structure.
@article{bae2022neural,
  title  = {Neural Marionette: Unsupervised Learning of Motion Skeleton and Latent Dynamics from Volumetric Video},
  author = {Bae, Jinseok and Jang, Hojun and Min, Cheol-Hui and Choi, Hyungun and Kim, Young Min},
  year   = {2022},
}
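Since the decoded output is a set of relative joint rotations over a discovered skeleton, a small forward-kinematics sketch shows how such rotations plus a parent list turn into global joint positions. The 4-joint chain, bone offsets, and single-axis rotations below are invented for illustration; they only demonstrate the generic FK step, not the paper's model.

```python
# Toy forward-kinematics sketch: relative rotations + a parent list -> global
# joint positions.
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def forward_kinematics(parents, offsets, local_rots):
    """parents[i] is the parent joint index (-1 for the root);
    offsets[i] is the bone vector in the parent frame;
    local_rots[i] is the joint's relative rotation matrix."""
    n = len(parents)
    global_rot = [None] * n
    global_pos = [None] * n
    for i in range(n):                       # parents assumed to precede children
        if parents[i] == -1:
            global_rot[i] = local_rots[i]
            global_pos[i] = offsets[i]
        else:
            p = parents[i]
            global_rot[i] = global_rot[p] @ local_rots[i]
            global_pos[i] = global_pos[p] + global_rot[p] @ offsets[i]
    return np.stack(global_pos)

parents = [-1, 0, 1, 2]                      # a simple 4-joint chain
offsets = np.array([[0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=float)
local_rots = [rot_z(a) for a in (0.0, 0.3, 0.3, 0.3)]
print(forward_kinematics(parents, offsets, local_rots))
```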
2021
- [Eurographics Short Paper] Auto-rigging 3D Bipedal Characters in Arbitrary Poses. Jeonghwan Kim, Hyeontae Son, Jinseok Bae, and Young Min Kim. 2021.
We present an end-to-end algorithm that can automatically rig a given 3D character so that it is ready for 3D animation. The animation of a virtual character requires the skeletal motion defined with bones and joints, and the corresponding deformation of the mesh represented with skin weights. While the conventional animation pipeline requires the initial 3D character to be in a predefined default pose, our pipeline can rig a 3D character in an arbitrary pose. We handle the increased ambiguity by fixing the skeletal topology and solving for the full deformation space. After the skeletal positions and orientations are fully recovered, we can deform the provided 3D character into the default pose, from which we can animate the character with the help of recent motion-retargeting techniques. Our results show that we can successfully animate initially deformed characters, which was not possible with previous works.
@article{kim2021autorigging,
  title  = {Auto-rigging 3D Bipedal Characters in Arbitrary Poses},
  author = {Kim, Jeonghwan and Son, Hyeontae and Bae, Jinseok and Kim, Young Min},
  year   = {2021},
  paper  = {https://diglib.eg.org/handle/10.2312/egs20211023},
  video  = {https://www.youtube.com/watch?v=1UVNbxYLkE8},
}
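The two quantities an auto-rigging pipeline must ultimately produce, per-bone transforms and per-vertex skin weights, feed a standard linear blend skinning step, sketched below on random data. The two-bone setup and all shapes are invented; this is the generic LBS formula, not the paper's network.

```python
# Toy linear-blend-skinning sketch: deform mesh vertices from per-bone
# transforms and per-vertex skin weights.
import numpy as np

def linear_blend_skinning(vertices, weights, bone_rots, bone_trans):
    """vertices: (V, 3), weights: (V, B), bone_rots: (B, 3, 3), bone_trans: (B, 3)."""
    # Transform every vertex by every bone, then blend with the skin weights.
    per_bone = np.einsum("bij,vj->bvi", bone_rots, vertices) + bone_trans[:, None, :]
    return np.einsum("vb,bvi->vi", weights, per_bone)

V, B = 5, 2
verts = np.random.rand(V, 3)
w = np.random.rand(V, B)
w = w / w.sum(axis=1, keepdims=True)          # weights sum to 1 per vertex
rots = np.stack([np.eye(3), np.eye(3)])       # identity rotations for the demo
trans = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(linear_blend_skinning(verts, w, rots, trans))
```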
- [CVPR] GATSBI: Generative Agent-centric Spatio-temporal Object Interaction. Cheol-Hui Min, Jinseok Bae, Junho Lee, and Young Min Kim. 2021.
We present GATSBI, a generative model that can transform a sequence of raw observations into a structured latent representation that fully captures the spatio-temporal context of the agent’s actions. In vision-based decision-making scenarios, an agent faces complex high-dimensional observations where multiple entities interact with each other. The agent requires a good scene representation of the visual observation that discerns essential components and consistently propagates along the time horizon. Our method, GATSBI, utilizes unsupervised object-centric scene representation learning to separate an active agent, static background, and passive objects. GATSBI then models the interactions reflecting the causal relationships among decomposed entities and predicts physically plausible future states. Our model generalizes to a variety of environments where different types of robots and objects dynamically interact with each other. We show GATSBI achieves superior performance on scene decomposition and video prediction compared to its state-of-the-art counterparts.
@article{min2021gatsbi,
  title  = {GATSBI: Generative Agent-centric Spatio-temporal Object Interaction},
  author = {Min, Cheol-Hui and Bae, Jinseok and Lee, Junho and Kim, Young Min},
  year   = {2021},
  video  = {https://www.youtube.com/watch?v=nAf87_0T5CE},
}
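As a rough picture of object-centric decomposition, the sketch below composes a frame from per-slot appearances (agent, object, background) mixed by softmax masks; a model like the one described infers these slots from video rather than drawing them at random. All shapes, slot counts, and names are illustrative assumptions.

```python
# Toy sketch of object-centric composition: a frame is explained as a weighted
# sum of per-slot appearances with softmax mixing masks.
import numpy as np

def compose_frame(appearances, mask_logits):
    """appearances: (K, H, W, 3), mask_logits: (K, H, W) -> composed (H, W, 3)."""
    logits = mask_logits - mask_logits.max(axis=0, keepdims=True)
    masks = np.exp(logits)
    masks = masks / masks.sum(axis=0, keepdims=True)     # softmax over slots
    return (masks[..., None] * appearances).sum(axis=0)

K, H, W = 3, 8, 8                     # slots: agent, object, background
apps = np.random.rand(K, H, W, 3)
logits = np.random.randn(K, H, W)
frame = compose_frame(apps, logits)
print(frame.shape, bool(frame.min() >= 0.0))
```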