Date: 2023-08-06

Many human-centric tasks, such as 3D whole-body mesh recovery and human-object interaction, rely on whole-body pose estimation. Tools like OpenPose and MediaPipe have made capturing human poses for virtual content development and VR/AR more popular. However, their accuracy and efficiency still fall short of what these applications need.

Whole-body pose estimation is more challenging than body-only keypoint detection for several reasons: the hierarchical structure of the human body, the low resolution of hands and faces, the difficulty of matching body parts to the right person in multi-person images (especially under occlusion or with complex hand poses), and the scarcity of whole-body training data with diverse hand and head poses.

To address these challenges, researchers from Tsinghua Shenzhen International Graduate School and the International Digital Economy Academy propose DWPose, a two-stage pose distillation framework. Their base model is RTMPose, a recent pose estimator, trained on COCO-WholeBody. In the first stage, the teacher model’s intermediate-layer features and final logits both guide the student model. Rather than distilling only on visible keypoints, they use the teacher’s complete outputs, covering visible and invisible keypoints alike, as the target logits. A weight-decay strategy, which gradually reduces the weight of the distillation loss during training, is also used to improve efficiency.
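The first-stage objective described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the linear form of the decay schedule, and the `alpha` and `beta` weights are assumptions chosen for illustration, and plain lists stand in for real feature maps and keypoint logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_loss(teacher_feat, student_feat):
    """Mean squared error between two equally sized feature vectors."""
    squared = [(t - s) ** 2 for t, s in zip(teacher_feat, student_feat)]
    return sum(squared) / len(teacher_feat)


def logit_loss(teacher_logits, student_logits):
    """Cross-entropy of the student against the teacher's distribution.

    Each row holds the logits for one keypoint. Every row is used,
    whether or not the keypoint is visible in the image.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_row)
        p_student = softmax(s_row)
        total -= sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    return total / len(teacher_logits)


def decay_factor(epoch, max_epochs):
    """Assumed linear schedule: 1.0 at epoch 1, shrinking toward 0."""
    return 1.0 - (epoch - 1) / max_epochs


def first_stage_loss(task_loss, t_feat, s_feat, t_logits, s_logits,
                     epoch, max_epochs, alpha=1.0, beta=1.0):
    """Task loss plus decayed feature and logit distillation terms."""
    r = decay_factor(epoch, max_epochs)
    distill = (alpha * feature_loss(t_feat, s_feat)
               + beta * logit_loss(t_logits, s_logits))
    return task_loss + r * distill
```

Under this assumed schedule, the distillation terms dominate early in training and fade out later, so the student ends up optimizing mostly its own task loss.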

In the second stage, they apply head-aware self-knowledge distillation (self-KD) to improve the localization capability of the model’s prediction head. They build two identical models and update only the student’s head using logit-based distillation, which yields better results while requiring only about 20% of the training time. Additionally, they incorporate the extra UBody dataset, which provides comprehensive face and hand keypoints, to address data limitations and improve performance.
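The idea of updating only the head can be shown with a toy example. This is a minimal sketch under stated assumptions, not the DWPose training code: the model is reduced to two scalars, the parameter names `backbone.w` and `head.w` are hypothetical, the student is assumed to start from a fresh head, and a squared-error loss on the outputs stands in for the logit-based distillation loss.

```python
import copy


def forward(model, x):
    """Toy two-part model: a scalar backbone followed by a scalar head."""
    feature = model["backbone.w"] * x
    return model["head.w"] * feature, feature


def distill_head(teacher, student, inputs, steps=100, lr=0.1):
    """Update only the student's head so its output matches the teacher's."""
    student = copy.deepcopy(student)  # leave the caller's dict untouched
    for _ in range(steps):
        for x in inputs:
            t_out, _ = forward(teacher, x)
            s_out, feature = forward(student, x)
            # Gradient of (s_out - t_out) ** 2 with respect to head.w.
            grad_head = 2.0 * (s_out - t_out) * feature
            student["head.w"] -= lr * grad_head
            # backbone.w is deliberately never updated (frozen).
    return student


teacher = {"backbone.w": 0.5, "head.w": 1.5}
# Assumption: the student shares the backbone but starts from a fresh head.
student = {"backbone.w": teacher["backbone.w"], "head.w": 0.0}
trained = distill_head(teacher, student, inputs=[2.0])
```

Because gradients are computed for the head alone, each step is cheaper than full training, which is consistent with the reduced training time reported for this stage.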

The work makes three contributions: it explores more comprehensive training data covering diverse and expressive hand gestures and facial expressions; it introduces a two-stage pose knowledge distillation method for efficient and precise whole-body pose estimation; and it demonstrates the effectiveness and efficiency of DWPose in generation tasks.

Advances like these in human pose estimation could improve user-driven content production across a range of applications.