Given a single image, existing image-to-3D generative models typically synthesize 3D meshes with flawed geometry and RGB textures only. Our method not only equips existing approaches with PBR materials, enabling relighting under various lighting conditions, but also refines the objects' normal maps, capturing more intricate details and aligning better with the given image.
Automatic 3D content creation has gained increasing attention recently, owing to its potential in applications such as video games, film, and AR/VR. Recent advances in diffusion models and multimodal models have notably improved the quality and efficiency of 3D object generation from a single RGB image. However, even 3D objects generated by state-of-the-art methods remain unsatisfactory compared to human-created assets. Producing only RGB textures instead of full materials leaves these methods struggling with photo-realistic rendering, relighting, and flexible appearance editing, and they also suffer from severe misalignment between geometry and high-frequency texture details. In this work, we propose a novel approach to boost the quality of generated 3D objects from the perspective of Physically Based Rendering (PBR) materials. Analyzing the components of PBR materials, we choose to model albedo, roughness, metalness, and bump. For albedo and bump, we leverage Stable Diffusion fine-tuned on synthetic data to extract these values, with novel usages of these fine-tuned models to obtain 3D-consistent albedo and bump UV maps for generated objects. For roughness and metalness, we adopt a semi-automatic process that leaves room for interactive adjustment, which we believe is more practical. Extensive experiments demonstrate that our approach is generally beneficial to various state-of-the-art generation methods, significantly boosting the quality and realism of their generated 3D objects, with natural relighting effects and substantially improved geometry.
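To make the role of each component concrete, the snippet below is a minimal, self-contained sketch of how albedo, roughness, metalness, and a bump-derived normal combine under the standard Cook-Torrance/GGX shading model. It illustrates why full PBR materials enable relighting; it is not the paper's actual renderer.

```python
import numpy as np

def shade_pbr(albedo, roughness, metalness, normal, view_dir, light_dir, radiance):
    """Cook-Torrance GGX shading at one surface point, driven by the four
    PBR components discussed above (albedo, roughness, metalness, and a
    bump-derived normal). Direction vectors are unit-length (3,) arrays."""
    n, v, l = normal, view_dir, light_dir
    h = (v + l) / np.linalg.norm(v + l)            # half vector
    ndl = max(np.dot(n, l), 1e-4)
    ndv = max(np.dot(n, v), 1e-4)
    ndh = max(np.dot(n, h), 0.0)
    hdv = max(np.dot(h, v), 0.0)

    # Base reflectance: dielectrics reflect ~4%, metals tint by albedo.
    f0 = 0.04 * (1.0 - metalness) + albedo * metalness

    # GGX normal distribution term (alpha = roughness^2 remapping).
    a = roughness * roughness
    d = a * a / (np.pi * (ndh * ndh * (a * a - 1.0) + 1.0) ** 2)

    # Smith geometry term with the Schlick-GGX direct-lighting k.
    k = (roughness + 1.0) ** 2 / 8.0
    g = (ndv / (ndv * (1.0 - k) + k)) * (ndl / (ndl * (1.0 - k) + k))

    # Fresnel-Schlick term.
    f = f0 + (1.0 - f0) * (1.0 - hdv) ** 5

    specular = d * g * f / (4.0 * ndv * ndl)
    kd = (1.0 - f) * (1.0 - metalness)             # metals have no diffuse lobe
    return (kd * albedo / np.pi + specular) * radiance * ndl
```

Relighting then amounts to re-evaluating this function under new light directions and radiances, which a single baked RGB texture cannot support.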
Method Overview. Given a single image, we first convert it to an albedo image using our fine-tuned diffusion model. Conditioned on this derived albedo, the base method to be boosted generates multi-view albedo images, which are then fused into a 3D mesh and an albedo UV map. Afterwards, we leverage a 3D semantic mask to obtain complete metalness and roughness UV maps, either by querying VLMs or through 3D artists' manual adjustment. Finally, an iterative normal refinement boosts the originally flawed normals, enabling realistic relighting.
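The roughness and metalness step can be pictured as painting one value per semantic part into UV space. Below is a minimal numpy sketch under assumed inputs: `seg_uv` (a per-texel part-ID map) and the per-part values, whether proposed by a VLM or set by an artist, are hypothetical stand-ins for the semi-automatic process described above.

```python
import numpy as np

# Hypothetical inputs: a 1024x1024 UV-space segmentation with integer part IDs,
# and per-part (roughness, metalness) values from a VLM query or an artist edit.
seg_uv = np.zeros((1024, 1024), dtype=np.int32)      # e.g. 0 = cloth, 1 = armor
part_materials = {
    0: (0.9, 0.0),   # cloth: rough dielectric
    1: (0.3, 1.0),   # armor: smooth metal
}

def paint_material_uv(seg_uv, part_materials):
    """Fill roughness and metalness UV maps with one constant value per
    semantic part, leaving every part open to later interactive adjustment."""
    roughness_uv = np.zeros(seg_uv.shape, dtype=np.float32)
    metalness_uv = np.zeros(seg_uv.shape, dtype=np.float32)
    for part_id, (rough, metal) in part_materials.items():
        mask = seg_uv == part_id
        roughness_uv[mask] = rough
        metalness_uv[mask] = metal
    return roughness_uv, metalness_uv

roughness_uv, metalness_uv = paint_material_uv(seg_uv, part_materials)
```

Keeping the values in a small per-part table is what leaves room for interactive adjustment: an artist can override any single part without touching the rest.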
Columns, left to right: input image, albedo, original normal, boosted normal.
CRM-based
InstantMesh-based
Wonder3D-based
CRM-based
TripoSR-based
InstantMesh-based
The artist-crafted objects presented below are sourced from the Objaverse-XL dataset, a freely accessible, open-source collection of 3D assets.
Columns, left to right: albedo, original normal, boosted normal.
Chunky Knight
Joker
Fallout Car
Darius
Genshin Impact Lumine
Batman
Carnivorous Plant
Hive
Columns, left to right: albedo, PBR, relighting.
Iron Man
Shield
Treasure Chest
Knight
Pistol
Darius
This work is mainly supported by Shanghai Artificial Intelligence Laboratory.
@inproceedings{wang2024boosting3dobjectgeneration,
  author    = {Wang, Yitong and Xu, Xudong and Ma, Li and Wang, Haoran and Dai, Bo},
  title     = {Boosting 3D Object Generation through PBR Materials},
  year      = {2024},
  booktitle = {SIGGRAPH Asia 2024 Conference Papers},
  articleno = {140},
  numpages  = {11},
  series    = {SA '24}
}
@misc{wang2024boosting3dobjectgeneration_arxiv,
  title         = {Boosting 3D Object Generation through PBR Materials},
  author        = {Yitong Wang and Xudong Xu and Li Ma and Haoran Wang and Bo Dai},
  year          = {2024},
  eprint        = {2411.16080},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}