Real-time Monte Carlo Denoising with the Neural Bilateral Grid

Status: Read
Field: Monte Carlo Rendering, Denoising, Deep Learning
Conference / Journal: EGSR
Year: 2020
Link:
Created: 2021/01/13 13:56
The paper introduces a neural network that maps a noisy image, together with its auxiliary features, into a bilateral grid for denoising.
Before going over the paper...
The bilateral grid is used for faster bilateral filtering, a nonlinear technique that smooths an image while preserving edges and mitigating outliers. It allows faster and better performance by mapping the image into a 3D grid: two spatial axes plus a third axis indexed by intensity, with weights calculated by a predefined metric (mostly Gaussian). Because pixels on opposite sides of an edge differ strongly in intensity, they land far apart along the third axis, so smoothing the grid does not mix them. Slicing the grid at each pixel's position and intensity then gives a smoothed, edge-preserved image. Other works perform transformations on the bilateral grid for efficient denoising and for other filtering problems.
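To make the construct → blur → slice idea concrete, here is a minimal pure-Python sketch of a classical (non-neural) bilateral grid applied to a 1D signal, so the grid is 2D (position and intensity) rather than 3D. The function name, the `sigma_s` / `sigma_r` sampling parameters, the nearest-cell splat, and the small [1, 2, 1] blur kernel are illustrative assumptions, not details from the paper.

```python
def bilateral_grid_filter_1d(signal, sigma_s=4.0, sigma_r=0.25):
    """Edge-preserving smoothing of a 1D signal with values in [0, 1]."""
    n = len(signal)
    gw = int((n - 1) / sigma_s) + 2   # cells along the spatial axis
    gd = int(1.0 / sigma_r) + 2       # cells along the intensity axis
    # Each cell stores [sum of values, sum of weights].
    grid = [[[0.0, 0.0] for _ in range(gd)] for _ in range(gw)]

    # 1) Construct: splat every sample into its nearest cell.
    for x, v in enumerate(signal):
        gx = int(round(x / sigma_s))
        gz = int(round(v / sigma_r))
        grid[gx][gz][0] += v
        grid[gx][gz][1] += 1.0

    # 2) Denoise on the grid: separable [1, 2, 1] / 4 blur along one axis.
    def blur(g, axis):
        out = [[[0.0, 0.0] for _ in range(gd)] for _ in range(gw)]
        for i in range(gw):
            for j in range(gd):
                for d, k in ((-1, 0.25), (0, 0.5), (1, 0.25)):
                    ii, jj = (i + d, j) if axis == 0 else (i, j + d)
                    if 0 <= ii < gw and 0 <= jj < gd:
                        out[i][j][0] += k * g[ii][jj][0]
                        out[i][j][1] += k * g[ii][jj][1]
        return out

    grid = blur(blur(grid, 0), 1)

    # 3) Slice: bilinear lookup at (position, intensity), then normalize
    #    the value sum by the weight sum.
    result = []
    for x, v in enumerate(signal):
        fx, fz = x / sigma_s, v / sigma_r
        x0, z0 = int(fx), int(fz)
        x1, z1 = min(x0 + 1, gw - 1), min(z0 + 1, gd - 1)
        ax, az = fx - x0, fz - z0
        num = den = 0.0
        for xi, wx in ((x0, 1.0 - ax), (x1, ax)):
            for zi, wz in ((z0, 1.0 - az), (z1, az)):
                num += wx * wz * grid[xi][zi][0]
                den += wx * wz * grid[xi][zi][1]
        result.append(num / den if den > 0 else v)
    return result


# Noisy step edge: a low plateau around 0.1 and a high plateau around 0.9.
noisy = [(0.1 if x < 16 else 0.9) + (0.05 if x % 2 else -0.05) for x in range(32)]
smooth = bilateral_grid_filter_1d(noisy)
```

In this sketch the two plateaus occupy different intensity cells, so the blur averages the noise within each plateau without leaking values across the step.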
The main contributions of the paper are:
A differentiable neural bilateral grid trained end to end
High performance at extremely low spp
Multi-scale grids whose outputs are blended with predicted weights
The flow of denoising a noisy image with the bilateral grid is similar to the traditional one (construct grid → denoise on grid → slice). The neural network contributes at the grid construction step: it predicts an appropriate guide position for each pixel from the noisy radiance and auxiliary features. The network is trained with a loss that compares the denoised result to the ground truth.
GuideNet, the neural network mentioned above, is built from simple conv layers. The number of layers depends on the spp of the input. As the input passes through the layers, the features are concatenated. For each pixel, GuideNet outputs a predicted guide for each of the three bilateral grids, together with the weights.
Next, given a 3D sampling rate, which defines the size of the grid, each pixel is splatted into the grid in a discrete manner. That is, the grid consists of discrete cells, and each pixel is splatted into the cell selected by its position and its guide. Each pixel carries a weight when it is splatted.
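The discrete splatting step can be sketched as follows for a single-channel image. This is only an illustration under stated assumptions: the function name `splat`, the `cell_size` (pixels per cell along x and y) and `depth` (number of cells along the guide axis) parameters, the guide range of [0, 1], and the weighted-average normalization are hypothetical choices, not the paper's exact formulation.

```python
def splat(image, guide, weight, cell_size, depth):
    """Accumulate pixels into the discrete cells of a 3D grid.

    image, guide, weight: H x W nested lists (guide assumed in [0, 1]).
    Returns a grid[gy][gx][z] holding the weighted average of the pixels
    that fell into each cell (0.0 for empty cells).
    """
    h, w = len(image), len(image[0])
    gh, gw = -(-h // cell_size), -(-w // cell_size)  # ceiling division
    val = [[[0.0] * depth for _ in range(gw)] for _ in range(gh)]
    wsum = [[[0.0] * depth for _ in range(gw)] for _ in range(gh)]

    for y in range(h):
        for x in range(w):
            # The pixel position picks the spatial cell, the guide picks the depth cell.
            gy, gx = y // cell_size, x // cell_size
            z = min(int(guide[y][x] * depth), depth - 1)
            val[gy][gx][z] += weight[y][x] * image[y][x]
            wsum[gy][gx][z] += weight[y][x]

    # Normalize each cell by its accumulated weight.
    return [[[val[i][j][k] / wsum[i][j][k] if wsum[i][j][k] > 0 else 0.0
              for k in range(depth)]
             for j in range(gw)]
            for i in range(gh)]
```

Because the guide decides the depth cell, two neighbouring pixels with different guides end up in different cells and are not averaged together, which is what lets a learned guide control where smoothing happens.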
The reconstructed image is then produced by slicing. Trilinear interpolation yields a smooth image from the discrete grid. The image sliced from each grid is weighted by the weights from GuideNet, and the weighted images are combined.
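A rough sketch of slicing and combining, again for a single-channel image: the names `slice_trilinear` and `combine`, the cell-centered coordinate convention, and the assumption that the per-pixel weights are already normalized to sum to one are illustrative choices, not confirmed details of the paper.

```python
def slice_trilinear(grid, guide, cell_size):
    """Read a full-resolution image back from grid[gy][gx][z].

    guide: H x W nested list with values assumed in [0, 1].
    """
    gh, gw, gd = len(grid), len(grid[0]), len(grid[0][0])

    def corners(f, n):
        """Two neighbouring cell indices along one axis with linear weights."""
        f = min(max(f, 0.0), n - 1.0)   # clamp to the grid extent
        i0 = int(f)
        i1 = min(i0 + 1, n - 1)
        t = f - i0
        return ((i0, 1.0 - t), (i1, t))

    out = []
    for y, row in enumerate(guide):
        out_row = []
        for x, g in enumerate(row):
            # Continuous grid coordinates, measured from cell centers.
            cy = corners((y + 0.5) / cell_size - 0.5, gh)
            cx = corners((x + 0.5) / cell_size - 0.5, gw)
            cz = corners(g * gd - 0.5, gd)
            out_row.append(sum(wy * wx * wz * grid[iy][ix][iz]
                               for iy, wy in cy
                               for ix, wx in cx
                               for iz, wz in cz))
        out.append(out_row)
    return out


def combine(images, weights):
    """Per-pixel weighted sum of the images sliced from several grids."""
    h, w = len(images[0]), len(images[0][0])
    return [[sum(wt[y][x] * im[y][x] for im, wt in zip(images, weights))
             for x in range(w)]
            for y in range(h)]
```

Interpolating across the eight surrounding cells is what turns the blocky grid back into a smooth image, and the per-pixel weights let each pixel favor whichever grid scale reconstructs it best.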
On 1 spp input, GuideNet's denoising quality was on par with SOTA denoisers. The most interesting part, however, is its remarkable speed: up to 61 fps at 720p.
NFOR shows the best reconstruction since it uses sophisticated auxiliary features, but it is very slow → need to look up.
Ablation studies show that predicting the guide with a neural network is better than using hand-crafted guides for constructing the bilateral grid. They also show that the multi-scale bilateral grid with weighted composition has an advantage over using a single grid and a simple mean.
However, difficult effects, such as specular effects or effects not seen in the training data, were found hard to reconstruct.