HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

¹ByteDance, ²Peking University, ³Xiamen University, ⁴Tsinghua University
* denotes equal contribution, † denotes project leader, ‡ denotes corresponding author

HumanSplat predicts 3D Gaussian Splatting properties from a single input image in a generalizable manner.

Abstract

Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their application in broader scenarios. To tackle these issues, we present HumanSplat, which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.
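
For readers unfamiliar with the term, the sketch below illustrates what "3D Gaussian Splatting properties" usually means in the standard 3D Gaussian Splatting formulation: each Gaussian carries a center, per-axis scales, a rotation quaternion, an opacity, and a color, and its covariance is assembled as Σ = R S Sᵀ Rᵀ. The class and function names (`Gaussian3D`, `quat_to_matrix`, `covariance`) are hypothetical and chosen for illustration only; this is not HumanSplat's code, and the exact parameterization HumanSplat predicts may differ.

```python
from dataclasses import dataclass
import math


@dataclass
class Gaussian3D:
    """One splat in the standard 3DGS parameterization (illustrative only)."""
    position: tuple  # (x, y, z) center of the Gaussian
    scale: tuple     # (sx, sy, sz) per-axis standard deviations
    rotation: tuple  # quaternion (w, x, y, z); normalized before use
    opacity: float   # blending weight in [0, 1]
    color: tuple     # RGB in [0, 1]


def quat_to_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g):
    """Build the 3x3 covariance Sigma = R diag(s^2) R^T of a Gaussian."""
    R = quat_to_matrix(g.rotation)
    s2 = [s * s for s in g.scale]
    return [
        [sum(R[i][k] * s2[k] * R[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
```

Storing scale and rotation separately, rather than a raw covariance matrix, keeps Σ symmetric and positive semi-definite by construction, which is why the standard formulation uses this factorization.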

Novel View Synthesis

Comparison

Related Links

The following excellent related works also reconstruct 3D humans from images.

SIFU (CVPR 24) is capable of reconstructing a high-quality 3D clothed human model, making it well-suited for practical applications such as scene creation and 3D printing.

TeCH (3DV 24) reconstructs the 3D human by leveraging two components: descriptive text prompts (e.g., garments, colors, hairstyles), which are automatically generated via a garment parsing model and Visual Question Answering (VQA), and a personalized fine-tuned Text-to-Image (T2I) diffusion model, which learns the "indescribable" appearance.

GTA (NeurIPS 23) is a novel transformer-based architecture that reconstructs clothed human avatars from monocular images.

BibTeX

If you find our work helpful, please consider citing:

    @misc{pan2024humansplat,
          title={HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors}, 
          author={Panwang Pan and Zhuo Su and Chenguo Lin and Zhen Fan and Yongjie Zhang and Zeming Li and Tingting Shen and Yadong Mu and Yebin Liu},
          year={2024},
          eprint={2406.12459},
          archivePrefix={arXiv},
          primaryClass={cs.CV}
    }