Self-supervised Learning of Graph Neural Networks: A Unified Review

Status
Read
Field
Deep Learning
Conference / Journal
TPAMI
Year
2023
Link
Created
2023/08/29 05:53

Keypoints for Me

Each graph generated from one sample can be viewed as a sampling-based transformation

Critique

Write down parts that I cannot understand or do not fully agree with in the paper
Also used to look up other papers; may be erased later (erase with a strikethrough)

Minor Keypoints

Write down minor points that I would like to investigate further (e.g., analysis/implementation tips)

Reference

Leave links to related pages here, or a title with a link to arXiv

Summary

Motivation

Contrastive Learning
Contrastive Objectives
Graph view generation (see the sketch after this list)
Feature Transformation
Node attribute masking : masking some node features or some nodes
Structure Transformation
Edge perturbation : masking (or adding) edges; not recommended for graphs whose structure carries essential semantics
Graph diffusion : creating new connections via diffusion based on random walks (e.g., personalized PageRank)
Centrality-based edge removal
Sampling-based transformation
Uniform sampling and node dropping : Uniformly sample nodes and create sub-graphs
Ego-nets sampling : used to unify graph-level and node-level contrast. A typical graph encoder produces node representations, which are then used for contrastive learning.
Maybe similar to embedding images with different pretrained encoders?
Random walk sampling : sampling sub-graphs using random walks
Network schema & Meta-path : Use individual ego-net for each node..?
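
A minimal sketch (my own illustration, not the paper's code) of contrastive view generation plus a node-level InfoNCE objective; the masking/drop rates, function names, and temperature are assumptions.

```python
# Minimal contrastive SSL sketch: feature transformation, structure
# transformation, and an InfoNCE-style objective (illustrative, not the
# paper's reference implementation).
import torch
import torch.nn.functional as F

def mask_node_attributes(x: torch.Tensor, mask_rate: float = 0.3) -> torch.Tensor:
    """Feature transformation: zero out a random subset of node feature entries."""
    mask = torch.rand_like(x) < mask_rate
    return x.masked_fill(mask, 0.0)

def drop_edges(edge_index: torch.Tensor, drop_rate: float = 0.2) -> torch.Tensor:
    """Structure transformation (edge perturbation): randomly remove edges.
    edge_index has shape [2, num_edges]."""
    keep = torch.rand(edge_index.size(1)) >= drop_rate
    return edge_index[:, keep]

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Contrastive objective: the same node in the two views is the positive
    pair; every other node in the other view serves as a negative."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = (z1 @ z2.t()) / tau                          # [N, N] similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

Two views would be produced by applying such transformations to the same graph, encoding both with the same GNN, and contrasting the node embeddings.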
Predictive Learning
Training the graph encoder together with a prediction head, supervised by self-generated labels.
Graph reconstruction
Non-probabilistic Graph Autoencoders
Reconstructing the adjacency matrix (a GAE-style sketch follows this block)
GAE, GraphSAGE, SuperGAT, SimP-GCN
MGAE, GALA
Attribute masking (graph completion) : reconstructing masked attributes
Variational Graph Autoencoders
VGAE, ARGA/ARVGA, SIG-VAE
Autoregressive Reconstruction
GPT, GPT-GNN
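
A minimal sketch of the non-probabilistic GAE idea (inner-product decoder, binary cross-entropy against the adjacency matrix); the node embeddings are assumed to come from any GNN encoder, and the function name is mine.

```python
# GAE-style adjacency reconstruction (non-probabilistic); illustrative sketch.
import torch
import torch.nn.functional as F

def adjacency_reconstruction_loss(z: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """z: [N, d] node embeddings from a GNN encoder; adj: [N, N] binary adjacency
    as a float tensor. Decoder: sigmoid(z z^T); loss: element-wise BCE."""
    logits = z @ z.t()
    return F.binary_cross_entropy_with_logits(logits, adj)
```

Since real graphs are sparse, the positive (edge) entries are usually re-weighted in this loss; variational variants (VGAE) additionally place a Gaussian prior on z and add a KL term.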
Representation Invariance Regularization
Computing losses on representations, similar to contrastive learning but without negative pairs/samples.
Training the model to minimize the difference between representations of distorted graphs, pushing it toward distortion-invariant features (see the sketch after this block)
BYOL, BGRL, CCA-SSG, LaGraph
Information Bottleneck principle
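
A minimal BGRL/BYOL-flavored sketch, assuming an online encoder with a predictor and a momentum (target) encoder; module names and the momentum value are my assumptions.

```python
# Negative-sample-free invariance regularization sketch (BYOL/BGRL spirit).
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_target(online: torch.nn.Module, target: torch.nn.Module, momentum: float = 0.99):
    """Exponential moving average update of the target encoder's parameters."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.data = momentum * p_target.data + (1.0 - momentum) * p_online.data

def invariance_loss(online_pred: torch.Tensor, target_proj: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity between the online prediction for view 1 and
    the (stop-gradient) target embedding for view 2; no negative pairs needed."""
    online_pred = F.normalize(online_pred, dim=-1)
    target_proj = F.normalize(target_proj.detach(), dim=-1)
    return -(online_pred * target_proj).sum(dim=-1).mean()
```

The target network is typically initialized as a copy of the online network and updated only via update_target; the predictor, stop-gradient, and momentum update together keep the representations from collapsing.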
Graph Property Prediction
Using predictive tasks based on graph properties rather than on the raw graph data.
S2GRL: k-hop connectivity prediction (see the sketch after this list)
Meta-path prediction : Heterogeneous graph, finding 8 meta-paths
GROVER
Contextual property prediction : predicting atom-bond counts within the k-hop neighborhood
Graph-level motif prediction : predicting functional groups (motifs) in molecules
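
A pure-Python sketch of how k-hop connectivity pseudo-labels (in the spirit of S2GRL) could be constructed: node pairs are classified by their hop distance, truncated at K. The BFS helper and label format are my assumptions.

```python
# Hop-distance pseudo-labels for graph property prediction (illustrative).
from collections import deque

def hop_distances(adj_list, source, max_hops):
    """Bounded BFS from `source`; returns {node: hop distance <= max_hops}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the K-hop boundary
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def khop_labels(adj_list, source, max_hops=3):
    """Pseudo-labels for (source, node) pairs: the class is the hop distance.
    A prediction head on pairs of node embeddings would be trained to predict it."""
    return {v: d for v, d in hop_distances(adj_list, source, max_hops).items() if d > 0}
```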
Self-training with Pseudo-labels
Multi-stage self-training (see the sketch after this list)
Train a model on the labeled dataset
Obtain pseudo-labels from high-confidence predictions on the unlabeled dataset
Train a new model on the enlarged labeled dataset, and repeat
M3S - K-means clustering for pseudo labels
ICF-GCN : iterative pseudo-label refinement in an EM manner
Expansion assumption : correct pseudo-labels can denoise incorrect pseudo-labels
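
A minimal sketch of the multi-stage self-training loop; train_fn, predict_fn, and the confidence threshold are caller-supplied placeholders, not the paper's API.

```python
# Multi-stage self-training with pseudo-labels (illustrative sketch).
import torch

def multi_stage_self_training(train_fn, predict_fn, x_l, y_l, x_u,
                              stages: int = 3, threshold: float = 0.9):
    """train_fn(x, y) returns a fitted model; predict_fn(model, x) returns
    [n, C] class probabilities. Both are hypothetical placeholders."""
    model = None
    for _ in range(stages):
        model = train_fn(x_l, y_l)                 # 1) train on the current labeled set
        probs = predict_fn(model, x_u)             # 2) predict on unlabeled samples
        conf, pseudo = probs.max(dim=-1)
        keep = conf >= threshold                   # 3) keep high-confidence pseudo-labels
        x_l = torch.cat([x_l, x_u[keep]])          # 4) enlarge the labeled set
        y_l = torch.cat([y_l, pseudo[keep]])
        x_u = x_u[~keep]
        if x_u.size(0) == 0:                       # nothing left to pseudo-label
            break
    return model
```

M3S and ICF-GCN from the list above refine this basic loop (cluster-based checking of pseudo-labels and EM-style refinement, respectively).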
Unsupervised pretraining
Auxiliary learning

Main Contributions

Results

Further Thesis