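The theoretical background of Noise2Noise is quite straightforward. The loss only needs the noisy targets to have the right expected value; for the L2 loss, this means the noise must be zero-mean. Under that assumption, minimizing the loss against noisy targets has the same optimum as minimizing it against clean ones. The paper therefore claims that we can replace the ground truth with noisy data, without needing an explicit model of the noise distribution.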
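A toy sketch of that argument, assuming a two-parameter linear "denoiser" fitted by least squares instead of a CNN, and Gaussian noise. The variable names are illustrative only. The same noisy inputs are fitted once against clean targets and once against independent zero-mean noisy targets, and both fits land on nearly the same weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N, NOISE_STD = 200_000, 0.3  # illustrative sample count and noise level (assumptions)

def fit_linear(x, t):
    # Least-squares fit of t ≈ w * x + b, i.e. the minimizer of the L2 loss.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (w, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return w, b

clean = rng.uniform(0.0, 1.0, N)                      # ground truth (unknown in practice)
noisy_input = clean + rng.normal(0.0, NOISE_STD, N)   # what the denoiser sees
clean_target = clean                                  # conventional supervision
noisy_target = clean + rng.normal(0.0, NOISE_STD, N)  # Noise2Noise: independent zero-mean noise

w_clean, b_clean = fit_linear(noisy_input, clean_target)
w_n2n, b_n2n = fit_linear(noisy_input, noisy_target)
print("clean targets:", w_clean, b_clean)
print("noisy targets:", w_n2n, b_n2n)
```

The noise in the targets only adds variance to the fit, not bias. The two solutions therefore converge as the number of training samples grows.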
Section 3.3 of the paper applies the method to denoising Monte Carlo (MC) rendering noise. The paper assumes the MC noise is zero-mean and trains using only 64 spp (samples per pixel) renders. Compared with a network trained on 131k spp ground truth, the Noise2Noise network achieved comparable results. It did not outperform the model trained on ground truth. The interesting point is the speed: scenes rendered with far fewer samples take much less time to generate, and the paper reports that generating the training data was about 2000 times faster. The paper also claims that this speed advantage makes online training possible with low latency (500 ms/frame, i.e. still only 2 fps).
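The numbers above can be sanity-checked with a small sketch. It assumes that "131k spp" means 2^17 = 131,072 samples and that render time scales linearly with spp. It also uses a hypothetical stand-in for a path tracer: each radiance sample is drawn from an exponential distribution whose mean is the true pixel value. Averaging 64 such samples is an unbiased MC estimate, so its error is zero-mean, which is exactly the property Noise2Noise relies on.

```python
import numpy as np

REFERENCE_SPP = 2**17  # assumption: "131k spp" read as 131,072 samples per pixel
TRAINING_SPP = 64
FRAME_TIME_MS = 500

rng = np.random.default_rng(0)

def mc_pixel_estimates(true_radiance, spp, n_pixels):
    # Hypothetical path-tracer stand-in: exponential samples with mean
    # `true_radiance`, averaged over `spp` samples per pixel (unbiased estimator).
    samples = rng.exponential(true_radiance, size=(n_pixels, spp))
    return samples.mean(axis=1)

cost_ratio = REFERENCE_SPP // TRAINING_SPP  # samples per reference / samples per training render
fps = 1000 / FRAME_TIME_MS

estimates = mc_pixel_estimates(2.0, TRAINING_SPP, 50_000)
print(cost_ratio, fps)                     # 2048 2.0
print(estimates.mean(), estimates.std())   # mean close to 2.0, std close to 2.0 / sqrt(64) = 0.25
```

Under the linear-cost assumption, a ratio of 2048 matches the paper's "about 2000 times faster". The 64 spp estimates are individually noisy but correct on average.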
I think this kind of unsupervised learning for denoising is the key to fast MC denoising and to accelerating path tracing. Also, effects the denoiser has never seen before can be trained easily and quickly online, since unsupervised denoising only requires noisy input.
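A rough sketch of what such online training could look like. This is not the paper's setup. The "frames" are 1D signals that drift over time, `render_noisy` is a hypothetical low-spp renderer that adds zero-mean Gaussian noise, and the denoiser is a two-weight linear blend of a pixel and its 5-tap neighbourhood average rather than a CNN. On each frame the loop renders two independent noisy images and takes one SGD step on the L2 loss between them. The clean frame is used only for evaluation after training.

```python
import numpy as np

rng = np.random.default_rng(0)
WIDTH, NOISE_STD, LR, N_FRAMES = 256, 0.2, 0.5, 1000  # illustrative settings (assumptions)

def clean_frame(t):
    # Hypothetical ground truth; the training loop below never looks at it.
    u = np.linspace(0.0, 2.0 * np.pi, WIDTH)
    return 0.5 + 0.4 * np.sin(u + 0.01 * t)

def render_noisy(t):
    # Hypothetical low-spp render: clean frame plus independent zero-mean noise.
    return clean_frame(t) + rng.normal(0.0, NOISE_STD, WIDTH)

def features(x):
    # Noisy pixel and its 5-tap box-filtered neighbourhood (edge-padded).
    blur = np.convolve(np.pad(x, 2, mode="edge"), np.ones(5) / 5, mode="valid")
    return np.stack([x, blur])  # shape (2, WIDTH)

theta = np.array([1.0, 0.0])  # start as the identity "denoiser"
for t in range(N_FRAMES):
    f = features(render_noisy(t))  # noisy input for this frame
    target = render_noisy(t)       # second, independent noisy render as the target
    residual = theta @ f - target
    theta -= LR * 2.0 * (f @ residual) / WIDTH  # SGD step on the mean squared error

# Evaluate against the clean frame only after training.
test_noisy = render_noisy(N_FRAMES)
truth = clean_frame(N_FRAMES)
mse_noisy = np.mean((test_noisy - truth) ** 2)
mse_denoised = np.mean((theta @ features(test_noisy) - truth) ** 2)
print(theta, mse_noisy, mse_denoised)
```

Because every training signal comes from noisy renders, the same loop keeps adapting when the content of the frames changes. This is the property that makes unseen effects trainable online.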