Neural Denoising with Layer Embeddings [EGSR20]

Theme: Paper
Type: Themed post
Status: Post
Field: Monte Carlo Noise Reduction

My Thoughts

A sample-wise kernel might add a huge computational overhead while still producing blurry results. Using per-sample input features but applying the kernel at the pixel level can be a good compromise.
I cannot fully understand how the layer embedding partitions the samples. I think the paper lacks a good analysis of how the samples are separated.
To achieve real-time denoising, reduced precision and custom CUDA kernels are necessary. For college-level research, doing these things is too much. We should aim for a more interesting approach to denoising rather than chasing real-time performance.

Motivation

Previous work [Gharbi et al. 2019] showed that reconstruction quality improves by working with individual samples instead of pixels. It also showed that constructing a kernel per sample in a splatting manner, instead of gathering, not only improves reconstruction quality but also makes it easier for training to handle outliers.
However, the concerns were that:
1. Runtime and computation cost increase linearly with the number of samples, which makes the previous work heavy even for moderate sample counts.
2. Using a fixed number of bounces for each sample requires too many auxiliary features.
The main contributions of the paper are:
1. A layered architecture for denoising with alpha compositing.
2. A hierarchical kernel design to reduce computation.
3. Runtime performance improvements over other sample-based denoisers.

Method

The layered architecture and alpha compositing are inspired by previous works that use a layered light field to reconstruct sample-wise effects such as defocus and motion blur [GO12], [MVH14].
Note that the network uses fewer auxiliary features (74 → 20) and a single U-net for simplicity.
Auxiliary features: normal, depth, albedo, specular color, roughness, motion vector, circle of confusion, lens position, and time, for a total of 20 floats per sample.
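To make the input layout concrete, here is a minimal sketch of how the per-sample input could be packed. The shapes and the 3-float radiance / 20-float auxiliary split are my assumptions; the paper only states 20 auxiliary floats per sample.

```python
# Hypothetical per-sample input packing; the per-feature channel breakdown is
# not spelled out here, only the 20-float auxiliary total from the description.
import torch

spp, H, W = 4, 720, 1280
radiance = torch.randn(spp, H, W, 3)    # per-sample RGB radiance (assumed separate)
aux = torch.randn(spp, H, W, 20)        # normal, depth, albedo, ..., time (20 floats)
samples = torch.cat([radiance, aux], dim=-1)   # [spp, H, W, 23], fed to the sample reducer
```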
The flow of the architecture is as follows (rough code sketches for the stages follow the list):
1. Sample reducer (per-pixel processing)
From each sample's input radiance and auxiliary features, a fully connected layer maps the sample into a 32-feature embedding space.
2. U-net
The sample embeddings are averaged to one embedding per pixel and passed through the U-net. The output is a kind of per-pixel context aggregation (steps 1-2 are covered by the first sketch below).
3. Sample partitioner
A fully connected layer produces a weight for each layer, position, and sample; these weights partition the samples across multiple layers based on the sample embeddings and the context embedding from the U-net. The paper says that using only two layers already gives most of the improvement.
Here the paper extends the light field reconstruction technique with alpha blending: for each layer, the network accumulates the weight sum, the layer occupancy, and the weighted sum of radiance (see the second sketch below).
4. Kernel generator
A fully connected layer then generates a filter kernel for each layer and applies it to that layer's radiance.
5. Composition
The filtered results of the layers are aggregated with alpha blending factors computed from the weight sums, and the results are normalized by the layer occupancy (steps 4-5 are covered by the third sketch below).
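Below is a minimal PyTorch sketch of steps 1-2, assuming a small fully connected sample reducer and a one-level stand-in for the U-net. The layer sizes, names, and the 3 + 20 input split are my assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SampleReducer(nn.Module):
    """Maps each sample's radiance + auxiliary features to a 32-float embedding."""
    def __init__(self, in_features=23, embed_dim=32):  # 3 radiance + 20 aux floats (assumed)
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_features, 32), nn.ReLU(),
            nn.Linear(32, embed_dim), nn.ReLU(),
        )

    def forward(self, samples):               # samples: [B, S, H, W, in_features]
        return self.mlp(samples)               # -> [B, S, H, W, embed_dim]

class TinyUNet(nn.Module):
    """One-level stand-in for the paper's U-net (per-pixel context aggregation)."""
    def __init__(self, ch=32):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(ch, 64, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.Sequential(nn.ConvTranspose2d(64, ch, 4, stride=2, padding=1), nn.ReLU())

    def forward(self, x):                      # x: [B, ch, H, W], H and W even
        return self.up(self.down(x)) + x       # skip connection keeps per-pixel detail

# Usage: 4 samples per pixel on a 64x64 crop.
samples = torch.randn(1, 4, 64, 64, 23)        # radiance (3) + auxiliary features (20)
emb = SampleReducer()(samples)                 # per-sample embeddings, [1, 4, 64, 64, 32]
per_pixel = emb.mean(dim=1)                    # average over samples,  [1, 64, 64, 32]
context = TinyUNet()(per_pixel.permute(0, 3, 1, 2))   # per-pixel context, [1, 32, 64, 64]
```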
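Second sketch: the sample partitioner (step 3), assuming two layers, a softmax assignment of each sample to a layer, and occupancy defined as the average per-layer weight. The paper's exact formulation of the weights and alpha may differ.

```python
import torch
import torch.nn as nn

class SamplePartitioner(nn.Module):
    """Splits samples into layers and accumulates per-layer statistics."""
    def __init__(self, embed_dim=32, context_dim=32, num_layers=2):
        super().__init__()
        # Predicts one weight per layer for every sample from (embedding, context).
        self.fc = nn.Linear(embed_dim + context_dim, num_layers)

    def forward(self, sample_emb, context, radiance):
        # sample_emb: [B, S, H, W, E], context: [B, H, W, C], radiance: [B, S, H, W, 3]
        S = sample_emb.shape[1]
        ctx = context.unsqueeze(1).expand(-1, S, -1, -1, -1)
        w = torch.softmax(self.fc(torch.cat([sample_emb, ctx], dim=-1)), dim=-1)
        # w: [B, S, H, W, L], how strongly each sample belongs to each layer.
        weight_sum = w.sum(dim=1)                  # [B, H, W, L]
        occupancy = weight_sum / S                 # per-layer alpha (assumed definition)
        radiance_sum = (w.unsqueeze(-1) * radiance.unsqueeze(-2)).sum(dim=1)  # [B, H, W, L, 3]
        return weight_sum, occupancy, radiance_sum
```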
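Third sketch: kernel generation and composition (steps 4-5), assuming a 5x5 pixel-space kernel per layer predicted from the U-net context and front-to-back "over" compositing of two layers. These details, and the omission of the occupancy normalization, are my simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerKernelDenoiser(nn.Module):
    """Predicts one k x k kernel per layer and alpha-composites the filtered layers."""
    def __init__(self, context_dim=32, num_layers=2, ksize=5):
        super().__init__()
        self.num_layers, self.ksize = num_layers, ksize
        self.fc = nn.Linear(context_dim, num_layers * ksize * ksize)

    def filter_layer(self, radiance, kernel):
        # radiance: [B, H, W, 3], kernel: [B, H, W, k*k] (already normalized).
        B, H, W, _ = radiance.shape
        k = self.ksize
        # Gather k x k neighborhoods of the layer radiance, then take the weighted sum.
        patches = F.unfold(radiance.permute(0, 3, 1, 2), k, padding=k // 2)
        patches = patches.view(B, 3, k * k, H, W)
        weights = kernel.permute(0, 3, 1, 2).unsqueeze(1)         # [B, 1, k*k, H, W]
        return (patches * weights).sum(dim=2)                     # [B, 3, H, W]

    def forward(self, context, layer_radiance, occupancy):
        # context: [B, H, W, C], layer_radiance: [B, H, W, L, 3], occupancy: [B, H, W, L]
        kernels = torch.softmax(
            self.fc(context).view(*context.shape[:3], self.num_layers, -1), dim=-1)
        filtered = [self.filter_layer(layer_radiance[..., l, :], kernels[..., l, :])
                    for l in range(self.num_layers)]
        # Front-to-back "over" compositing, assuming num_layers == 2 as in the paper.
        # (The paper also normalizes by layer occupancy; omitted here for brevity.)
        alpha_front = occupancy[..., 0].unsqueeze(1)              # [B, 1, H, W]
        return filtered[0] + (1.0 - alpha_front) * filtered[1]    # [B, 3, H, W]
```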

Why alpha blending?

The paper shows a visual comparison against a simple weighted composition of layers. The alpha-blended result appears to learn more depth-related information. The paper argues that the additional constraint led to a better minimum.
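For two layers the difference can be written roughly as follows (my notation: $c_i$ is the filtered radiance of layer $i$, $\alpha_1$ the front layer's occupancy, $w_i$ unconstrained per-layer weights):

$$c_{\text{alpha}} = c_1 + (1 - \alpha_1)\, c_2 \qquad \text{vs.} \qquad c_{\text{weighted}} = w_1 c_1 + w_2 c_2$$

The alpha version forces the layers into an ordered, partially occluding decomposition, while the weighted sum leaves them unconstrained.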

Results & Discussion

The paper puts considerable effort into comparing the performance of pixel-, sample-, and layer-level denoisers.