Unsupervised Learning for Intrinsic Image Decomposition from a Single Image (CVPR 2020)

Abstract

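The problem with existing intrinsic decomposition methods built on supervised learning: ground truth (gt) is hard or outright impossible to obtain. Earlier work tried to get around this with various priors, but those approaches were either limited in performance or hand-crafted.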

Instead, it directly learns the latent features of reflectance and shading from unsupervised and uncorrelated data.

To learn latent features of reflectance and shading directly from unlabelled, uncorrelated data, the model first has to know what reflectance and shading are in the first place, doesn't it?

>> To solve this problem, the paper uses:

explore independence between reflectance and shading

domain invariant content constraint

physical constraint

Introduction

>> Uses the Lambertian assumption

Intrinsic Image Decomposition → framed as splitting an image into an illumination-variant part and an illumination-invariant part

>> What does the Lambertian assumption buy us?

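A Lambertian surface is a (diffuse) surface that reflects light equally in all directions. Assuming Lambertian surfaces therefore removes direction-dependent reflection from the image formation model, so all lighting effects can be folded into a single shading term. Because the model rests on this assumption, think carefully before modifying it.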

In the image formation model ($I = R \cdot S$), reflectance is the illumination-invariant component and shading is the illumination-variant component.
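
As a minimal sketch of the formation model, assuming a tiny single-channel scene with made-up values (the `render` helper and the array names are illustrative, not from the paper), the same reflectance can be rendered under two different shadings:

```python
import numpy as np

def render(reflectance, shading):
    """Lambertian image formation: per-pixel product I = R * S."""
    return reflectance * shading

# Hypothetical 2x2 single-channel scene (values are made up for illustration).
R = np.array([[0.8, 0.2],
              [0.5, 0.9]])          # reflectance: fixed by the material
S_noon = np.array([[1.0, 1.0],
                   [0.6, 0.6]])     # shading under one lighting condition
S_dusk = np.array([[0.3, 0.5],
                   [0.3, 0.5]])     # shading under another lighting condition

I_noon = render(R, S_noon)
I_dusk = render(R, S_dusk)

# Dividing out the shading recovers the same reflectance in both cases:
# R is illumination-invariant, while S (and therefore I) changes with lighting.
R_from_noon = I_noon / S_noon
R_from_dusk = I_dusk / S_dusk
```

The two rendered images differ, but the recovered reflectance is identical, which is what "illumination-invariant" means here.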

Because of the lack of information, inverting this formation model is an ill-posed problem. Earlier work tried to resolve this with physical priors or with directly learned priors.
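
One way to see the ill-posedness, as a small worked example (the scalar $\alpha$ is introduced here only for illustration): each pixel gives one observed value $I$ but two unknowns $R$ and $S$, so any rescaling of one factor can be absorbed by the other.

```latex
I \;=\; R \cdot S \;=\; (\alpha R) \cdot \left(\tfrac{1}{\alpha} S\right), \qquad \alpha > 0
```

Every choice of $\alpha$ explains the image equally well, which is why some prior or constraint is needed to pick one decomposition.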

>> Bottom line: supervised learning based approaches demand data that is too physically high-level, and there is not enough of such data anyway.

** One of the references mentioned is Learning Non-Lambertian Object Intrinsics across ShapeNet Categories (CVPR 2017).

>> Since it uses ShapeNet, it almost certainly involves some geometry-related prior learning.

Self-Supervised Intrinsic Image Decomposition (NIPS 2017): the paper explicitly states that object shape is required.

https://papers.nips.cc/paper/2017/file/c8862fc1a32725712838863fb1a260b9-Paper.pdf

Main Idea

Reflectance and shading images naturally share the same scene content. The idea is that, because of this, simply collecting three sets of unlabelled, uncorrelated (unpaired) data (natural images, reflectance images, shading images) is enough to solve intrinsic decomposition recast as a form of style transfer.

>> Up to this point it is very close to what I want to do, but there is one problem in applying it to the lighting separation task I have in mind: there are no single-lighting images. And since that task needs gt values, I probably cannot use this as is.

The paper says that, unlike ordinary style transfer, this is a transfer with physical meaning, so it imposes the following physical constraints.

image formation model I = R*S

domain invariant content constraint - object, layout, geometry**

physical independence: shading is illumination-variant, reflectance is illumination-invariant

Contribution?

First physics-based, single-image unsupervised learning method for intrinsic image decomposition, with physical constraints

A network architecture that realizes the contribution above

SOTA among unsupervised learning based methods, and comparable to supervised learning

** InverseRenderNet (which uses MegaDepth) is also an intrinsic decomposition method that relies on geometry

Differences between image-to-image translation and intrinsic decomposition, as noted in Related Work

image-to-image translation is statistics-based, intrinsic image decomposition is physics-based

A translated image is allowed to have multiple modalities, whereas intrinsic image decomposition is a problem with a fixed answer (explicit)

>> How the paper resolves this is one of the main things to look for.

Unsupervised Single Input Intrinsic Image Decomposition

Problem formulation and assumptions

>> As mentioned in the previous section: solving the single-input problem with supervised learning requires triplets of I, R(I), S(I), and these are very hard to obtain. Without them, the image formation model has far more unknowns than knowns and becomes ill-posed.

Unsupervised Intrinsic Image Decomposition

Suppose we collect unlabelled, unrelated samples for each element of the triplet and learn the transfer to each image domain. What is learned is the following.

from unlabelled reflectance images: marginal distribution $p(R_j)$

from unlabelled shading images: marginal distribution $p(S_k)$

from natural images: marginal distribution $p(I_i)$

>> As a result, the learned models can be used to obtain the intrinsic images $R(I_i), S(I_i)$ of a natural image $I_i$. However, some assumptions are needed so that the result respects the physical constraints.
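
A minimal sketch of what "unlabelled, unrelated" sampling looks like in practice, assuming three independently collected pools of random stand-in images (pool sizes, image size, and the `sample_unpaired_batch` helper are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for three independently collected image pools.
# Sizes differ on purpose: there is no index correspondence between them.
natural_pool = rng.random((100, 8, 8))      # samples I_i ~ p(I)
reflectance_pool = rng.random((40, 8, 8))   # samples R_j ~ p(R)
shading_pool = rng.random((60, 8, 8))       # samples S_k ~ p(S)

def sample_unpaired_batch(batch_size):
    """Draw i, j, k independently, so (I_i, R_j, S_k) is not a matched triplet."""
    i = rng.integers(0, len(natural_pool), size=batch_size)
    j = rng.integers(0, len(reflectance_pool), size=batch_size)
    k = rng.integers(0, len(shading_pool), size=batch_size)
    return natural_pool[i], reflectance_pool[j], shading_pool[k]

I_batch, R_batch, S_batch = sample_unpaired_batch(4)
```

Only the three marginal distributions are ever observed; nothing in a batch says which reflectance or shading belongs to which natural image.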

1. Domain invariant content

The three domains (natural image, reflectance, shading) share content, and these object properties are assumed to be latent-codable. The content is further assumed to be encoded in a way that is shared across the three domains: a common, domain-invariant latent space $C$.

2. Reflectance-Shading Independence

Since reflectance and shading each have a different dependency on lighting, their conditional priors are assumed to be learnable independently.

As Figure 2 of the paper shows, latent priors $z_R, z_S$ are defined for reflectance and shading respectively. Each one is encoded from both the reflectance (or shading) domain and the natural image domain.

** Is the "latent prior" here used as guidance, like a prior in the usual sense, or is it an output?

3. The latent code encoders are reversible

This is a setting commonly used in image-to-image translation: both encoding (image → latent code) and decoding (latent code → image) are assumed to be possible.

Implementation

The Content-Sharing Architecture

A separate content encoder is used for each of the three domains: $E^c_I, E^c_R, E^c_S$. A content consistency loss is then imposed so that the content latent codes $c$ extracted by each encoder agree with one another.
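
A minimal sketch of a content consistency loss, under loud assumptions: the three encoders are replaced by toy linear maps, the predicted reflectance and shading are random placeholders, and the L1 distance is my choice for illustration (the notes do not state which distance the paper uses for this loss).

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_CONTENT = 16, 4   # hypothetical flattened-image and content-code sizes

# Toy stand-ins for E^c_I, E^c_R, E^c_S: one linear map per domain.
W_I = rng.normal(size=(D_CONTENT, D_IMG))
W_R = rng.normal(size=(D_CONTENT, D_IMG))
W_S = rng.normal(size=(D_CONTENT, D_IMG))

def encode_content(W, image):
    """Map a flattened image to its content code with a linear encoder."""
    return W @ image

def content_consistency_loss(c_I, c_R, c_S):
    """Mean absolute (L1) gap between the natural-image content code and the
    content codes of the predicted reflectance and shading."""
    return np.mean(np.abs(c_R - c_I)) + np.mean(np.abs(c_S - c_I))

image = rng.random(D_IMG)    # natural image I_i (flattened)
pred_R = rng.random(D_IMG)   # placeholder for the predicted R(I_i)
pred_S = rng.random(D_IMG)   # placeholder for the predicted S(I_i)

c_I = encode_content(W_I, image)
loss = content_consistency_loss(
    c_I,
    encode_content(W_R, pred_R),
    encode_content(W_S, pred_S),
)
```

The loss is zero only when all three content codes coincide, which is the "shared latent space $C$" assumption expressed as a training signal.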

Mapping Module (M module)

For prior code inference, the natural image prior code is extracted first, and a decomposition mapping $f_{dcp}$ then produces the latent priors for reflectance and shading ($z_{R(I_i)}, z_{S(I_i)}$). To constrain the resulting reflectance / shading priors to their respective domains, a KLD loss is computed against real priors sampled from the reflectance (likewise shading) prior domain.

In other words, the (potential) reflectance / shading separated out of the input natural image is trained to resemble samples drawn from the reflectance and shading prior domains that are available during training, so that reflectance / shading values that cannot actually be observed can still be obtained from the natural image.

The KLD loss above is computed for reflectance and for shading separately, and their sum is the total KLD loss.
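
A minimal sketch of the "KLD per domain, then sum" structure. How the paper estimates the divergence is not covered in these notes, so this assumes a diagonal Gaussian is fitted to each batch of prior codes and uses the closed-form Gaussian KL; the direction KL(predicted || real), the `gaussian_kl` helper, and all sample values are assumptions for illustration.

```python
import numpy as np

def gaussian_kl(codes_p, codes_q, eps=1e-6):
    """KL(P || Q) between diagonal Gaussians fitted to two batches of codes.

    codes_p, codes_q: arrays of shape (num_samples, code_dim).
    """
    mu_p, var_p = codes_p.mean(axis=0), codes_p.var(axis=0) + eps
    mu_q, var_q = codes_q.mean(axis=0), codes_q.var(axis=0) + eps
    per_dim = 0.5 * (np.log(var_q / var_p)
                     + (var_p + (mu_p - mu_q) ** 2) / var_q
                     - 1.0)
    return per_dim.sum()

rng = np.random.default_rng(0)
# Hypothetical prior codes: "real" ones encoded from reflectance / shading
# images, and "predicted" ones produced from natural images by f_dcp.
z_R_real = rng.normal(0.0, 1.0, size=(256, 8))
z_R_pred = rng.normal(0.5, 1.0, size=(256, 8))
z_S_real = rng.normal(0.0, 1.0, size=(256, 8))
z_S_pred = rng.normal(-0.5, 2.0, size=(256, 8))

kld_R = gaussian_kl(z_R_pred, z_R_real)
kld_S = gaussian_kl(z_S_pred, z_S_real)
total_kld = kld_R + kld_S   # total KLD loss = reflectance term + shading term
```

Minimizing `total_kld` pulls the predicted prior codes toward the distribution of real ones in each domain independently, matching the reflectance-shading independence assumption.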

Autoencoder

To satisfy the reversibility assumption, three autoencoders are implemented in total.

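Image reconstruction: the loss between the gt image and the image reconstructed by feeding the content code c and the prior code p into the generator.

Prior reconstruction: if an image generated from a content code and a prior code is encoded again, the resulting prior should match the prior used as input. This is computed as an L1 loss for each of I, R, S.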
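
A minimal sketch of the two reconstruction losses for a single domain, assuming toy linear maps in place of the real encoders and generator (all sizes, weights, and helper names are hypothetical; using L1 for the image term as well is my assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_C, D_Z = 6, 3, 2   # hypothetical image, content-code, prior-code sizes

# Toy linear stand-ins for one domain's content encoder, prior encoder, generator.
W_c = rng.normal(size=(D_C, D_IMG))
W_z = rng.normal(size=(D_Z, D_IMG))
W_g = rng.normal(size=(D_IMG, D_C + D_Z))

def encode(x):
    return W_c @ x, W_z @ x                 # (content code, prior code)

def generate(c, z):
    return W_g @ np.concatenate([c, z])     # image from (content, prior)

def l1(a, b):
    return np.mean(np.abs(a - b))

x = rng.random(D_IMG)                       # an image from this domain

# Image reconstruction: image -> codes -> image should return the input.
c, z = encode(x)
image_recon_loss = l1(generate(c, z), x)

# Prior reconstruction: codes -> image -> prior code should return the prior.
z_sampled = rng.normal(size=D_Z)
_, z_back = encode(generate(c, z_sampled))
prior_recon_loss = l1(z_back, z_sampled)
```

With untrained random weights both losses are nonzero; training would drive them toward zero, and the same pair of losses would be repeated for each of the I, R, S domains.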

To make the decomposed intrinsic images indistinguishable from real images
