Free-viewpoint Indoor Neural Relighting from Multi-view Stereo (TOG 2021) Paper Review

~~Scraped from Notion~~ paper review

Today I'd like to introduce a ToG paper that I personally found really fun to read.

Free-viewpoint Indoor Neural Relighting from Multi-view Stereo (arxiv.org)

We introduce a neural relighting algorithm for captured indoors scenes, that allows interactive free-viewpoint navigation. Our method allows illumination to be changed synthetically, while coherently rendering cast shadows and complex glossy materials.

Like my other reviews, this post was originally written for my own reference, so please excuse it being a bit messy.

Free-viewpoint Indoor Neural Relighting from MVS

Multi-view input; can handle non-Lambertian (specular) surfaces; single light source condition; implicit representation; uses a PBR-based algorithm.

Check Points?

Was training done on synthetic data only?
What does it mean that they build reprojection views from mirror images so that glossy surfaces can be handled?
Which algorithm builds the 3D mesh, and from how many input images?

>> First of all, one of the paper's biggest novelties / ideas is that, instead of using only PBR or only IBR like most papers, it combines the strengths of both.

Introduction

To restate a few characteristics of the paper:

Takes MVS input.
Can handle view-dependent glossy effects (non-Lambertian surfaces).
Does not require inputs under multiple lighting conditions (single light source assumption).
To distinguish it from the relighting we want to do: this paper relights by inserting virtual light sources, and since it relies on the single light source assumption, its input condition is much simpler. (By contrast, we deal with multiple light sources and single-view / MVS input.)
States that it uses both path tracing from PBR and the reprojection / data-blending approach from IBR.

As the paper itself states, it uses wide-baseline multi-view captures at 250x350, a lower resolution than the input images we intend to use. The material / reflectance model it uses basically separates the specular and diffuse components.

We make the simplifying assumption that the lighting can be decomposed into the sum of two components: view-independent diffuse, and view-dependent glossy.

Input Feature Maps with 3 Main Elements

(Diffuse Albedo) Using light-weighted Monte Carlo integration (probably the PBR-based part mentioned earlier), diffuse illumination is stored in per-viewpoint irradiance maps. This yields an approximate albedo stored on the vertices of the MVS-reconstructed mesh. The paper claims that this albedo mesh alone is enough to represent the changes caused by a user-specified target illumination. The paper's first network (I don't know yet whether each input feature is encoded separately) is trained to take the target illumination and adjust the overall energy level, cast shadows, and global illumination accordingly.
(Glossy Effects) Glossy effects are encoded with mirror images. A fast single-ray mirror reflection is computed on the fly, once per scene, and used to separate the view-dependent component from the diffuse component. I'm not sure whether this means casting rays against the MVS mesh or something else.
(View Synthesis) Reprojection is used to encode images at arbitrary viewpoints. The feature maps computed for each input view are reprojected to give the view-synthesis network diverse view information.

** As with the Geometry-aware network, the paper trains only on synthetic data, but uses MVS-reconstructed geometry instead of GT geometry so that it also works well on real-world scenes. Since the application the paper presents can turn off existing lights and add new ones, it is a must-cite for us.

Contributions?

First method to combine relighting with interactive free-viewpoint synthesis for glossy materials.

A hybrid rendering pipeline for indoor scenes that uses PBR.

An IBR reprojection pipeline that uses mirror images.

Multi-view Neural Relighting

To restate the goals once more:

(able to) disable the original light
(able to) add virtual indoor light sources
free-viewpoint navigation
plausible glossy lighting for a realistic appearance.

Limitations?

Only light sources that are visible in the photos can be handled.

To avoid complex PBR, it uses an implicit representation instead of explicit material / light models.

Using mirror images is reasonable only under an isotropic surface assumption. In other words, the isotropic assumption is what lets the reflectance near the mirrored direction be modeled (strictly speaking, since the representation is implicit, this is just the specular component).

To summarize again:

Computation starts from wide-baseline multi-view images and a 3D mesh reconstructed with MVS.
For the source / target illumination estimation needed for relighting, explicit modeling as in prior methods makes the inverse problem harder, so the paper instead uses an implicit representation encoded by a CNN, limited to light sources visible in the photos.
The easy-to-compute feature maps used to represent the source / target illumination are regenerated as illumination maps by a novel hybrid renderer that mixes PBR and IBR.
Like previous inverse rendering papers, it splits the BRDF into a diffuse + specular form.

a view-dependent glossy term, compactly supported in a narrow angular neighborhood of the mirror direction at x.

That is, the specular term has to be assumed to be distributed narrowly around the mirror direction. (Not a phrasing I find 100% convincing.) For the mirror direction assumption to always hold, the surface must be isotropic.

(integrate over incident angles) >> Perhaps this is why the isotropic assumption is needed.

E(x) is the view-independent diffuse term, so every variable other than x is dropped.
S(x, ω_o) is the near-mirror glossy illumination term, so it also depends on the outgoing direction ω_o.
The paper's plan is to do hybrid rendering with purely diffuse and purely mirror guide maps.
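The equation this part of the paper refers to did not make it into these notes. Reconstructed from the E(x) and S(x, ω_o) descriptions above, it presumably has roughly the following form. ρ(x) (diffuse albedo), f_s (glossy lobe) and Ω_m(ω_o) (a narrow cone around the mirror direction of ω_o) are my own notation, not the paper's, and the Lambertian 1/π factor is folded into E so that albedo = image / irradiance holds directly, as in the albedo step of this post.

```latex
\begin{aligned}
L(x,\omega_o) &= \int_{\Omega} f(x,\omega_i,\omega_o)\, L_i(x,\omega_i)\, (n\cdot\omega_i)\, d\omega_i
  \;\approx\; \rho(x)\, E(x) + S(x,\omega_o), \\
E(x) &= \frac{1}{\pi} \int_{\Omega} L_i(x,\omega_i)\, (n\cdot\omega_i)\, d\omega_i, \\
S(x,\omega_o) &= \int_{\Omega_m(\omega_o)} f_s(x,\omega_i,\omega_o)\, L_i(x,\omega_i)\, (n\cdot\omega_i)\, d\omega_i .
\end{aligned}
```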

Scene Illumination as 2D Feature Map

Since resolving light transport involving distant objects with an ordinary CNN alone is difficult, the paper encodes the required E and S (diffuse and specular) as 2D input maps and feeds them to a network that decouples intrinsic material properties from illumination.
For both source and target illumination, the illumination maps are precomputed before the rest of the processing runs.
Then, in the final step, these precomputed per-view 2D maps appear to be projected into the novel view to synthesize it.

Diffuse Source Irradiance

E^{src}_{1...n} is the per-view diffuse source irradiance for each of the n multi-view inputs. Rays are cast for each view and each pixel. When a ray hits the surface (geometry), rays are sampled around the intersection, and the gathered color (diffuse irradiance) is stored as a 16-bit float; this is repeated for every pixel.
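The per-pixel irradiance estimate above is a Monte Carlo integral over the hemisphere. Below is a minimal sketch, assuming cosine-weighted sampling in a local frame whose normal is +z and a caller-supplied `radiance_fn` standing in for the colors gathered from the input views at the secondary hit points. Both are my assumptions; the paper's light-weighted sampling may differ. E folds in the Lambertian 1/π factor here, so albedo = image / E holds directly.

```python
import math
import random


def sample_cosine_hemisphere(rng):
    """Cosine-weighted direction on the hemisphere around +z (local frame, normal = +z)."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1)))


def estimate_irradiance(radiance_fn, num_samples=4096, seed=0):
    """Monte Carlo estimate of E = (1/pi) * integral of L_i(w) cos(theta) dw.

    With cosine-weighted sampling the pdf is cos(theta)/pi, so cos/pdf cancels
    the 1/pi factor and the estimator reduces to the mean incoming radiance.
    radiance_fn is hypothetical: it maps a local direction to incoming radiance.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += radiance_fn(sample_cosine_hemisphere(rng))
    return total / num_samples
```

For a constant environment the estimate equals that constant exactly, which is a handy sanity check; a fixed seed keeps the per-pixel noise reproducible.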

When computing the per-view irradiance maps, a visibility test is applied: if an input view cannot see a given intersection point y, that view's irradiance is not computed for this pixel.

** Each sample ray is paired with the camera view closest to its direction, and that view's color is used as the per-view color value.
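The visibility-filtered view selection above can be sketched as follows. Assumptions: cameras are given only by their centers, "closest" means the smallest angle between the direction from the hit point y back to the ray origin and the direction from y to each camera, and `is_visible` is a hypothetical caller-supplied test (e.g. a depth-map comparison).

```python
import math


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def select_view(hit_point, ray_origin, camera_centers, is_visible):
    """Index of the visible camera whose direction onto hit_point best matches
    the direction the sample ray came from, or None if no camera sees the point."""
    # Direction from the hit point y back toward the ray origin (shading point x).
    to_origin = _normalize(tuple(o - h for o, h in zip(ray_origin, hit_point)))
    best_idx, best_cos = None, -2.0
    for i, center in enumerate(camera_centers):
        if not is_visible(i, hit_point):
            continue  # this view cannot see y, so it contributes nothing
        to_cam = _normalize(tuple(c - h for c, h in zip(center, hit_point)))
        cos_angle = sum(a * b for a, b in zip(to_origin, to_cam))
        if cos_angle > best_cos:
            best_idx, best_cos = i, cos_angle
    return best_idx
```

The selected view's pixel color at the reprojection of y would then serve as the incoming radiance for that sample.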

According to the paper, each irradiance map stores full RGB information. Dividing each input image by its irradiance (I initially misread "divide by" as "classify", which confused me) gives an approximate albedo. When a new light is added, path tracing is run again on the albedo mesh to obtain the target diffuse irradiance.

Diffuse Albedo Mesh

The albedo map is I_i / E^{src}_i, i.e., each per-view source image divided by its irradiance map, so the albedo maps are per-view as well. Reprojecting all of these albedo maps onto the MVS mesh yields the albedo mesh as a weighted average, with weights inversely proportional to camera distance. The result is coarse, but since it is only an intermediate quantity for computing the diffuse irradiance under user-defined lights, that does not matter.
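The two steps above (per-view division, then distance-weighted fusion per vertex) can be sketched like this. It is a single-channel toy on flat lists; apply it per RGB channel. The `eps` clamp for zero irradiance and the function names are my own assumptions.

```python
def albedo_map(image, irradiance, eps=1e-6):
    """Per-pixel albedo I_i / E_i^src for one view (flat lists, one channel).

    eps guards against division by zero in unlit pixels (an assumption here).
    """
    return [i / max(e, eps) for i, e in zip(image, irradiance)]


def fuse_vertex_albedo(samples):
    """Weighted average of per-view albedo samples for one mesh vertex.

    samples: (albedo_value, camera_distance) pairs from views that see the vertex.
    Weights are 1 / distance, i.e. inversely proportional to camera distance.
    """
    weights = [1.0 / d for _, d in samples]
    return sum(w * a for w, (a, _) in zip(weights, samples)) / sum(weights)
```

For example, samples of 0.6 at distance 1 and 0.3 at distance 2 fuse to (0.6 + 0.15) / 1.5 = 0.5, so the nearer camera dominates.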

Path-traced Added Irradiance

For each input view, the irradiance from each added light is precomputed with bidirectional path tracing in the Mitsuba renderer. When several lights are turned on, their contributions are mixed linearly.
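Linear mixing works because light transport is linear in emitted power: for fixed geometry and materials, scaling and summing per-light renders equals rendering all lights together. A minimal sketch, assuming each precomputed map was rendered at unit intensity (my assumption) and stored as a flat per-pixel list:

```python
def mix_added_irradiance(per_light_maps, intensities):
    """Linearly combine precomputed per-light irradiance maps for one view.

    per_light_maps: one flat per-pixel irradiance list per added light,
    each assumed to be rendered at unit intensity.
    intensities: scale for each light; 0 turns that light off.
    """
    n_pixels = len(per_light_maps[0])
    mixed = [0.0] * n_pixels
    for light_map, k in zip(per_light_maps, intensities):
        for p in range(n_pixels):
            mixed[p] += k * light_map[p]
    return mixed
```

This is what makes interactive light editing cheap: changing an intensity only rescales a stored map instead of re-running the path tracer.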

Removed Irradiance

A single global intensity switch variable, alpha, controls how much of the source light intensity is preserved.
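My notes don't spell out exactly how alpha enters the maps. Below is a minimal sketch, assuming the removed map is simply (1 - alpha) times the source irradiance. The naive target (source minus removed plus added) is shown only for illustration; the network input list also treats source, added and removed illumination as separate maps rather than this sum. Function names are hypothetical.

```python
def removed_irradiance(source_irr, alpha):
    """Per-pixel irradiance switched off: (1 - alpha) * E_src.

    alpha = 1 keeps the original lighting, alpha = 0 removes all of it.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1.0 - alpha) * e for e in source_irr]


def naive_target_irradiance(source_irr, removed_irr, added_irr):
    """Naive linear combination E_src - E_removed + E_added, for illustration only."""
    return [s - r + a for s, r, a in zip(source_irr, removed_irr, added_irr)]
```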

Mirror Images for Glossy Reflectance

Ordinary CNNs perform "local operations", so they cannot be expected to encode correlations between distant parts of the scene well. To simplify the problem, the paper precomputes first-bounce mirror images.

Assuming near-mirror reflection, each view of the scene is represented in two versions: the direct image and the pure-mirror hypothesis. The core novelty is that comparing the mirror image with the actual image makes it possible to separate the glossy illumination from the diffuse term.

Both a source mirror image and a target mirror image are reconstructed. The target mirror image accounts for the new view and the user-specified illumination, so that it ultimately helps represent glossy reflections in the new view.

Source Mirror Image

For each view and each pixel, a single ray is cast and its first surface intersection is recorded on the mesh. That intersection is then reprojected into all the other views, and the reprojected color is collected from the view whose direction is most similar to the reflected ray direction.

In other words, among the available views, it seems to take the color from the one located closest to the mirrored direction.

Again: to compute the mirror image, you need the color of the other surface point hit along the mirror direction. In the paper's example mirror image, the floor is assumed to be a mirror surface reflecting the walls, and the walls are assumed to be mirror surfaces reflecting the floor.
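The geometric core of the mirror image is reflecting the camera ray at the first hit and finding where the reflected ray lands. Here is a minimal sketch of that single-ray mirror bounce. A single plane stands in for the MVS mesh (a simplifying assumption), and the color lookup at the second hit would then use the closest input view as described above.

```python
def reflect(d, n):
    """Mirror direction of incoming direction d about unit normal n: r = d - 2 (d.n) n."""
    dn = sum(a * b for a, b in zip(d, n))
    return tuple(a - 2.0 * dn * b for a, b in zip(d, n))


def intersect_plane(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Ray/plane hit point, or None if the ray is parallel or the hit is behind the origin."""
    denom = sum(a * b for a, b in zip(direction, plane_normal))
    if abs(denom) < eps:
        return None
    t = sum((p - o) * n for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    if t <= eps:
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))
```

Matching the floor/wall example: a camera ray travelling along (1, 0, -1) hits the floor (normal +z) at the origin, reflects to (1, 0, 1), and lands on the wall x = 2 at (2, 0, 2). That wall point's color is what the floor's mirror-image pixel shows.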

Target Mirror Image

The target mirror image differs in that it is computed only for the novel view and must account for the user-specified lighting modification.

Interactive Reprojection in the Novel View

The precomputed data described above are stored in virtual reference frames, and for novel view synthesis they must be reprojected into a common coordinate system. Reprojecting every view's reference information is highly redundant and also makes the network heavy.

Instead, the views are combined by weighted averaging (assembled on the fly), i.e., the input is a composite reprojection of the input-view-dependent features and mirror images.
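To make "weighted average of reprojected views" concrete, here is a per-pixel blending sketch. The weighting, max(cos angle, 0) raised to a sharpness power between each input camera direction and the novel camera direction, is my own choice in the spirit of classic angle-based IBR blending, not the paper's exact weights; `sharpness` is a hypothetical parameter.

```python
def blend_reprojected(samples, novel_dir, sharpness=8.0):
    """Blend one pixel's reprojected values from several input views.

    samples: (value, view_dir) pairs; value is None when the view could not
    reproject onto this pixel (occluded or outside its frustum), and view_dir is
    the unit direction from the surface point to that input camera.
    novel_dir: unit direction from the surface point to the novel camera.
    Weights are max(cos angle, 0) ** sharpness, so angularly closer views dominate.
    Returns None when no view contributes.
    """
    num, den = 0.0, 0.0
    for value, view_dir in samples:
        if value is None:
            continue
        cos_a = sum(a * b for a, b in zip(view_dir, novel_dir))
        w = max(cos_a, 0.0) ** sharpness
        num += w * value
        den += w
    return num / den if den > 0.0 else None
```

A larger `sharpness` makes the composite favor the nearest view (sharper but more prone to seams); a smaller one averages more views (smoother but blurrier highlights).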

Which two algorithms are used for the projection?

The algorithms themselves are not the point; what matters is that of the 8 composites, 4 are used to enhance texture quality and the other 4 to provide specular highlight information. In other words, if this compositing is what makes the IBR part work, the PBR-based methods are used earlier, in the ray casting that produces the illumination maps.

For diffuse components such as the irradiance maps, which are not view-dependent, the paper says simple blending is used.

In addition, normalized inverse depth (disparity), surface normals, and the angle between the surface normal and the viewing direction (so that Fresnel reflection can be accounted for) are also used as inputs.
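The extra features above can be sketched as follows. The min-max normalization of inverse depth is my assumption about what "normalized" means. The Schlick term (with a typical dielectric F0 = 0.04) is included only to show why the normal/view angle carries Fresnel information; the network is given the angle, not this formula.

```python
def normalized_inverse_depth(depths):
    """Map positive depths to inverse depth rescaled to [0, 1] (disparity-like)."""
    inv = [1.0 / d for d in depths]
    lo, hi = min(inv), max(inv)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in inv]


def cos_normal_view(normal, to_camera):
    """Cosine of the angle between the unit surface normal and the unit direction to the camera."""
    return sum(a * b for a, b in zip(normal, to_camera))


def schlick_fresnel(cos_theta, f0=0.04):
    """Schlick's approximation, shown only to illustrate why this angle matters."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5
```

Reflectance rises sharply toward grazing angles (cos near 0), which is exactly the effect the angle feature lets the network learn.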

In short, the paper uses more accurate and complex methods such as PBR to reconstruct each component, but leaves their composition implicitly to the network. Considering PhySG too, this seems to be a rough trend in recent inverse rendering and relighting papers (i.e., papers that need to render).

Network Architecture

Input

Composite Image Feature, Source Mirror Image (Composite), Target Mirror Image, Source / Added / Removed Illumination (Diffuse), Extra Feature.

Output

Diffuse Image Component, View-dependent (Specular) Image Components.

** Tonemapping every input and output was found experimentally to make training work better, since it prevents radiance values from being distorted or showing up as dark pixels.

Ground Truths?

Tonemapped ground-truth radiance (diffuse ground truth). The specular output is computed as a residual term in tonemapped space (GT total radiance - GT diffuse).
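A minimal sketch of the residual target in tonemapped space and how the two outputs recombine. The tone curve log(1 + x) is an assumption; these notes don't record which operator the paper uses, and any invertible compressive curve fits the same structure.

```python
import math


def tonemap(x):
    """Compressive tone curve log(1 + x) (assumed; the paper's operator may differ)."""
    return math.log1p(x)


def inverse_tonemap(y):
    """Inverse of tonemap: exp(y) - 1."""
    return math.expm1(y)


def specular_residual_target(total_radiance, diffuse_radiance):
    """Specular GT as a residual in tonemapped space: T(total) - T(diffuse)."""
    return tonemap(total_radiance) - tonemap(diffuse_radiance)


def reconstruct_total(diffuse_tm, specular_residual):
    """Recombine tonemapped diffuse and residual predictions into linear radiance."""
    return inverse_tonemap(diffuse_tm + specular_residual)
```

Because the residual is defined in tonemapped space, adding the two tonemapped outputs and inverting recovers the total radiance exactly. The compression also keeps bright highlights from dominating the loss.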

Dataset

synthetic data with realistic, varied geometry, materials and lighting. Publicly Available + Purchased. Total 16 synthetic scenes for training.

Augmentation

For each scene: 2000 individual viewpoints and 6-7 distinct lighting conditions (including the original lighting). For each training sample, the diffuse and glossy terms were rendered separately as target images. For the MVS mesh, additional images were rendered and used to generate the coarse geometry. Since texture-less and specular surfaces can cause errors in MVS reconstruction, surfaces were replaced with diffuse ones for this step only.

Evaluation?

To the best of our knowledge, no previous method can jointly relight scenes and enable free-viewpoint rendering for indoor scenes with glossy materials. We thus compare to view-synthesis and relighting techniques separately.

Comparison with Previous Work

Complex indoor scenes:

Interactive relighting in single, low-dynamic range images (ToG 2017)

Sorry that this ended up so stream-of-consciousness.

Thank you for reading this humble post.
