LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation
Abstract
Task:
3D immersive scene generation
Problems with existing methods:
A desired virtual 3D scene should 1) exhibit omnidirectional view consistency, and 2) allow for free exploration in complex scene hierarchies. Existing methods either rely on successive scene expansion via inpainting or employ a panorama representation for large-FOV scene environments. However, the generated scenes suffer from semantic drift during expansion and cannot handle occlusion among scene hierarchies.
Proposed method:
To tackle these challenges, we introduce LAYERPANO3D, a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt. Our key insight is to decompose a reference 2D panorama into multiple layers at different depth levels, where each layer reveals the unseen space from the reference views via diffusion prior.
Method details:
LAYERPANO3D comprises multiple dedicated designs: 1) we introduce a novel text-guided anchor view synthesis pipeline for high-quality, consistent panorama generation. 2) We pioneer the Layered 3D Panorama as the underlying representation to manage complex scene hierarchies and lift it into 3D Gaussians to splat detailed 360° omnidirectional scenes with unconstrained viewing paths.
Experimental validation:
Extensive experiments demonstrate that our framework generates state-of-the-art 3D panoramic scenes with both full-view consistency and an immersive exploratory experience.
Intro
Task background:
The development of spatial computing, including virtual and mixed reality systems, greatly enhances user engagement across various applications, and drives demand for explorable, high-quality 3D environments. We contend that a desired virtual 3D scene should 1) exhibit high-quality and consistent appearance and geometry across the full 360° × 180° view; 2) allow for free exploration among complex scene hierarchies with clear parallax. In recent years, many approaches in 3D scene generation [1], [2], [3] have been proposed to address these needs.
Problems with existing methods:
One branch of works [4], [5], [6], [7], [8], [9] seeks to create extensive scenes by leveraging a "navigate-and-imagine" strategy, which successively applies novel-view rendering and outpaints unseen areas to expand the scene. However, this type of approach suffers from the semantic drift issue: long sequential scene expansion easily produces incoherent results as outpainting artifacts accumulate through iterations, hampering the global consistency and harmony of the generated scene.
Another branch of methods [10], [11], [12], [13], [14], [15] employs the Equirectangular Panorama to represent 360°, large field-of-view (FOV) environments in 2D. However, the absence of large-scale panoramic datasets hinders the capability of panorama generation systems, resulting in low-resolution images with simple structures and sparse assets. Moreover, a 2D panorama [10], [11], [13] does not allow for free scene exploration. Even when lifted to a panoramic scene [16], the simple spherical structure fails to provide complex scene hierarchies with clear parallax, leading to occluded spaces that cause blurry renderings, ambiguity, and gaps in the generated 3D panorama. Some methods [17] typically use an inpainting-based disocclusion strategy to fill in the unseen spaces, but they require specific, predefined rendering paths tailored for each scene, limiting the potential for free exploration.
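For reference, the equirectangular mapping discussed above assigns each 2D panorama pixel a direction on the unit sphere, which is what lets a single image cover the full 360° × 180° view. A minimal sketch (the coordinate convention and function name are our illustration, not from the paper):

```python
import numpy as np

def equirect_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction vector.

    u in [0, width) spans longitude [-pi, pi);
    v in [0, height) spans latitude [pi/2, -pi/2] (top row = zenith).
    """
    lon = (u / width) * 2.0 * np.pi - np.pi   # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / height) * np.pi  # latitude in [pi/2, -pi/2]
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.array([x, y, z])
```

The pixel at the image center maps to the forward direction, which is why a fixed-viewpoint panoramic scene offers full view coverage but no parallax: every pixel encodes only a direction, not a depth.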
Our approach:
To this end, we present LAYERPANO3D, a novel framework that leverages a Multi-Layered 3D Panorama for full-view consistent and freely exploratory scene generation from text prompts. The main idea is to create a Layered 3D Panorama by first generating a reference panorama and treating it as a multi-layered composition, where each layer depicts scene content at a specific depth level. This allows us to create complex scene hierarchies by placing occluded assets in different depth layers at full appearance. Our contributions are two-fold. First, to generate high-quality and coherent 360° × 180° panoramas, we propose a novel text-guided anchor view synthesis pipeline. By finetuning a T2I model [18] to generate 4 orthogonal perspective views as anchors, we prevent semantic drift during panorama generation, while ensuring a consistent horizon level across all views. Furthermore, the anchor views enrich the panorama by incorporating complex structures and detailed features derived from large-scale, pre-trained perspective image generators. Second, we introduce the Layered 3D Panorama representation as a general solution to handle occlusion for different types of scenes with complex scene hierarchies, and lift it to 3D Gaussians [19] to enable free 3D exploration. By leveraging a pre-trained panoptic segmentation prior and K-Means clustering, we streamline an automatic layer construction pipeline to decompose the reference panorama into different depth layers. The unseen space at each layer is synthesized with a finetuned panorama inpainter [11].
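The paper's layer construction combines a panoptic segmentation prior with K-Means clustering over depth. A minimal sketch of just the depth-clustering step, which partitions a panorama's depth map into discrete near-to-far layers (the function name and sklearn usage are our illustration under assumed inputs, not the authors' code):

```python
import numpy as np
from sklearn.cluster import KMeans

def decompose_depth_layers(depth, n_layers=3, seed=0):
    """Cluster per-pixel panorama depth into discrete layers.

    depth: (H, W) float array of estimated depth values.
    Returns an (H, W) integer layer map, where 0 is the nearest layer.
    """
    h, w = depth.shape
    km = KMeans(n_clusters=n_layers, n_init=10, random_state=seed)
    labels = km.fit_predict(depth.reshape(-1, 1))
    # Reorder cluster ids so that layer 0 has the smallest mean depth.
    order = np.argsort(km.cluster_centers_.ravel())
    remap = np.empty(n_layers, dtype=np.int64)
    remap[order] = np.arange(n_layers)
    return remap[labels].reshape(h, w)
```

In the full pipeline, per-pixel assignments like these would additionally be regularized by panoptic masks so that a single object is not split across layers; each layer's occluded region is then completed by the panorama inpainter before lifting to 3D Gaussians.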
Extensive experiments demonstrate the effectiveness of LAYERPANO3D in generating hyper-immersive layered panoramic scenes from a single text prompt. LAYERPANO3D surpasses state-of-the-art methods in creating coherent, plausible, text-aligned 2D panoramas and full-view consistent, explorable 3D panoramic environments. Furthermore, our framework streamlines an automatic pipeline without any scene-specific navigation paths, providing a more user-friendly interface for non-experts. We believe that LAYERPANO3D effectively enhances the accessibility of full-view, explorable AIGC 3D environments for real-world applications.
Related Work about 3D Scene Generation
Due to the recent success of diffusion models, 3D scene generation has also seen significant progress. Scenescape [7] and DiffDreamer [30], for example, explore perpetual view generation through the incremental construction of 3D scenes. One major branch of work employs step-by-step inpainting from pre-defined trajectories. Text2Room [6] creates room-scale 3D scenes based on a text prompt, utilizing textured 3D meshes for scene representation. Similarly, LucidDreamer [4] and WonderJourney [5] can generate domain-free 3D Gaussian splatting scenes from iterative inpainting. However, this line of work often suffers from the semantic drift issue, resulting in unrealistic scenes from artifact accumulation and inconsistent semantics. While some other approaches [3], [31], [32] endeavor to integrate objects with environments, they yield relatively low-quality comprehensive scene generation. Recently, our concurrent works, DreamScene360 [16] and HoloDreamer [17], also employ a panorama as a prior to construct panoramic scenes. However, they only achieve the 360° × 180° field of view at a fixed viewpoint based on a single panorama of low quality and simple structure, and do not support free roaming within the scene. In contrast, our framework leverages the Multi-Layered 3D Panorama representation to construct high-quality, fully enclosed scenes that enable unconstrained navigation paths in 3D.
Thoughts
- Solves the basic layering problem; can serve as a fairly good baseline.
References