Keypoints for Me
• Each graph generated by one sample can be viewed as a sampling-based transformation
Critique
• Write down parts of the paper that I cannot understand or do not fully agree with
• Further used to look up other papers; might be erasable (erase with a strikethrough)
Minor Keypoints
• Write minor parts that I would like to investigate further (e.g. analysis/implementation tips)
Reference
Leave related page links on this page, or a title with a link to arXiv
Summary
Motivation
• Contrastive Learning
  ◦ Contrastive Objectives
  ◦ Graph view generation
    ▪ Feature Transformation
      • Node attribute masking: masking some node features, or all features of some nodes (see the sketch below)
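A minimal sketch of node attribute masking as a view-generation step, assuming node features are stored in a dense NumPy matrix; the mask rate and zero-fill choice are illustrative, not the paper's exact recipe.

```python
import numpy as np

def mask_node_attributes(x: np.ndarray, mask_rate: float = 0.2, seed: int = 0) -> np.ndarray:
    """Return a view of node features with a random subset of nodes fully masked.

    x: (num_nodes, num_features) node attribute matrix.
    mask_rate: fraction of nodes whose attributes are replaced with zeros.
    """
    rng = np.random.default_rng(seed)
    x_view = x.copy()
    masked = rng.random(x.shape[0]) < mask_rate  # boolean mask over nodes
    x_view[masked] = 0.0                         # zero-fill; a learned [MASK] token is also common
    return x_view

# Usage: generate one corrupted view for contrastive training
x = np.random.rand(5, 4)
x_masked = mask_node_attributes(x, mask_rate=0.4)
```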
    ▪ Structure Transformation
      • Edge perturbation: masking edges! But it is not recommended for graphs whose semantics rely heavily on structure (e.g. molecular graphs); see the sketch after this block
      • Graph diffusion: creating new connections using random walks
      • Centrality-based edge removal
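A minimal sketch of edge perturbation by random edge dropping, assuming the graph is stored as a (2, num_edges) edge-index array; the drop rate is an illustrative hyperparameter.

```python
import numpy as np

def perturb_edges(edge_index: np.ndarray, drop_rate: float = 0.2, seed: int = 0) -> np.ndarray:
    """Drop a random subset of edges to create a structurally perturbed view.

    edge_index: (2, num_edges) array of source/target node indices.
    """
    rng = np.random.default_rng(seed)
    keep = rng.random(edge_index.shape[1]) >= drop_rate   # True for edges that survive
    return edge_index[:, keep]

# Usage: two perturbed views of the same graph form a contrastive pair
edge_index = np.array([[0, 1, 2, 3], [1, 2, 3, 0]])
view_a = perturb_edges(edge_index, drop_rate=0.25, seed=1)
view_b = perturb_edges(edge_index, drop_rate=0.25, seed=2)
```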
    ▪ Sampling-based transformation
      • Uniform sampling and node dropping: uniformly sample nodes and build sub-graphs
      • Ego-nets sampling: used to unify graph-level and node-level contrast. Use typical graph encoders to get node representations, then apply contrastive learning.
        ◦ Maybe similar to embedding images with different pretrained encoders?
      • Random walk sampling: sampling sub-graphs using random walks (see the sketch after this list)
      • Network schema & Meta-path: use an individual ego-net for each node..?
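A minimal sketch of random-walk subgraph sampling, assuming an adjacency-list dict; the walk length and single start node are illustrative choices.

```python
import numpy as np

def random_walk_subgraph(adj_list: dict, start: int, walk_length: int = 10, seed: int = 0) -> set:
    """Sample a sub-graph by collecting the nodes visited along a random walk.

    adj_list: {node: [neighbors]} adjacency list of the full graph.
    Returns the set of visited nodes; the induced sub-graph is the sampled view.
    """
    rng = np.random.default_rng(seed)
    visited = {start}
    current = start
    for _ in range(walk_length):
        neighbors = adj_list.get(current, [])
        if not neighbors:                 # dead end: stop the walk
            break
        current = int(rng.choice(neighbors))
        visited.add(current)
    return visited

# Usage: sample a view rooted at node 0
adj_list = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
nodes = random_walk_subgraph(adj_list, start=0, walk_length=5)
```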
  ◦ Predictive Learning
    Training the graph encoder together with a prediction head under supervision and self-supervision.
    ▪ Graph reconstruction
      • Non-probabilistic Graph Autoencoders
        ◦ Reconstructing the adjacency matrix (see the sketch after this block)
          ▪ GAE, GraphSAGE, SuperGAT, SimP-GCN
          ▪ MGAE, GALA
        ◦ Attribute masking (graph completion): reconstructing masked attributes
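A minimal sketch of the non-probabilistic GAE idea: encode node embeddings, decode the adjacency matrix with an inner product, and train with a reconstruction loss. The one-layer linear "encoder" stands in for a real GNN; treat this as an illustrative PyTorch skeleton, not GAE's exact architecture.

```python
import torch
import torch.nn as nn

class TinyGraphAutoencoder(nn.Module):
    """Encoder + inner-product decoder for adjacency reconstruction."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        # Placeholder encoder: a real GAE would use GCN layers over the graph.
        self.encoder = nn.Linear(in_dim, hid_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)            # node embeddings (N, hid_dim)
        return z @ z.t()               # inner-product decoder -> adjacency logits (N, N)

# Usage: reconstruct a toy adjacency matrix
x = torch.randn(6, 8)                              # 6 nodes, 8 features
adj = (torch.rand(6, 6) > 0.5).float()
adj = torch.triu(adj, 1); adj = adj + adj.t()      # symmetric, no self-loops
model = TinyGraphAutoencoder(8, 16)
loss = nn.BCEWithLogitsLoss()(model(x), adj)       # reconstruction objective
loss.backward()
```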
      • Variational Graph Autoencoders (see the sketch below)
        ◦ VGAE, ARGA/ARVGA, SIG-VAE
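A minimal sketch of the variational piece VGAE adds on top of the GAE above: the encoder outputs a mean and log-variance per node, sampling uses the reparameterization trick, and a KL term regularizes the latent distribution. The linear layers again stand in for GCN layers; an illustrative skeleton, not VGAE's exact implementation.

```python
import torch
import torch.nn as nn

class TinyVariationalEncoder(nn.Module):
    """Encode nodes into a Gaussian latent (mu, logvar) and sample via reparameterization."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, hid_dim)       # stands in for a GCN branch
        self.logvar = nn.Linear(in_dim, hid_dim)   # stands in for a GCN branch

    def forward(self, x: torch.Tensor):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)         # reparameterization trick
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())   # KL to a standard Gaussian
        return z, kl   # z feeds the inner-product decoder; kl joins the reconstruction loss

# Usage
z, kl = TinyVariationalEncoder(8, 16)(torch.randn(6, 8))
```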
      • Autoregressive Reconstruction
        ◦ GPT, GPT-GNN
    ▪ Representation Invariance Regularization
      Computing losses on representations, similar to contrastive learning but without negative pairs/samples.
      Training the model to minimize the difference between representations of distorted graphs, leading it to seek distortion-invariant features (see the sketch after this block).
      • BYOL, BGRL, CCA-SSG, LaGraph
        Information Bottleneck principle
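A minimal sketch of a BYOL/BGRL-style invariance objective: an online encoder plus predictor is trained to match a stop-gradient target encoder's output on a differently distorted view, with no negative samples. Plain linear layers serve as placeholder encoders; illustrative only, not BGRL's exact setup (which also uses an EMA update for the target).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

online_encoder = nn.Linear(8, 16)
predictor = nn.Linear(16, 16)
target_encoder = nn.Linear(8, 16)                  # in BYOL/BGRL: an EMA copy of the online encoder
target_encoder.load_state_dict(online_encoder.state_dict())

def invariance_loss(x_view1: torch.Tensor, x_view2: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity between online prediction and stop-gradient target."""
    p = predictor(online_encoder(x_view1))         # online branch
    with torch.no_grad():
        t = target_encoder(x_view2)                # target branch, no gradients
    return -F.cosine_similarity(p, t, dim=-1).mean()

# Usage: x_view1 / x_view2 would be node features of two distorted views of the same graph
loss = invariance_loss(torch.randn(6, 8), torch.randn(6, 8))
loss.backward()
```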
    ▪ Graph Property Prediction
      Using predictive tasks based on graph properties rather than on the raw graph data.
      • S2GRL: k-hop connectivity prediction (see the sketch after this block)
      • Meta-path prediction: heterogeneous graphs, finding 8 meta-paths
      • GROVER
        ◦ Contextual property prediction: atom-bond counts in the k-hop neighborhood
        ◦ Graph-level motif prediction: predict functional groups in molecules
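A minimal sketch of how k-hop connectivity pseudo-labels (the S2GRL-style task) could be generated: BFS from each node gives hop distances, and node pairs are labeled by their hop count for a prediction head to classify. Purely illustrative label generation, not the paper's exact pipeline.

```python
from collections import deque

def hop_distances(adj_list: dict, source: int) -> dict:
    """BFS hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj_list.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def khop_pseudo_labels(adj_list: dict, max_k: int = 3) -> list:
    """Label node pairs (u, v) with their hop count, capped at max_k."""
    labels = []
    for u in adj_list:
        for v, d in hop_distances(adj_list, u).items():
            if u != v and d <= max_k:
                labels.append((u, v, d))   # the encoder is trained to predict d from (u, v) embeddings
    return labels

# Usage
adj_list = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pairs = khop_pseudo_labels(adj_list, max_k=2)
```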
    ▪ Self-training with Pseudo-labels
      • Multi-stage self-training (see the sketch after this block)
        ◦ Train a model on the labeled dataset
        ◦ Get pseudo-labels from high-confidence predictions on the unlabeled dataset
        ◦ Train a new model on the labeled dataset plus the pseudo-labeled nodes…
      • M3S: k-means clustering for pseudo-labels
      • ICF-GCN: iterative pseudo-label refinement in an EM manner
        Expansion assumption: correct pseudo-labels denoise incorrect pseudo-labels
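A minimal sketch of the pseudo-label selection step in multi-stage self-training: take the model's softmax predictions on unlabeled nodes and keep only those above a confidence threshold (M3S instead keeps the top nodes per cluster/class); the threshold value is illustrative.

```python
import numpy as np

def select_pseudo_labels(probs: np.ndarray, unlabeled_idx: np.ndarray, threshold: float = 0.9):
    """Return (node indices, pseudo-labels) for high-confidence unlabeled predictions.

    probs: (num_nodes, num_classes) softmax outputs of the current model.
    unlabeled_idx: indices of nodes without ground-truth labels.
    """
    conf = probs[unlabeled_idx].max(axis=1)        # confidence of the predicted class
    preds = probs[unlabeled_idx].argmax(axis=1)    # predicted class = pseudo-label
    keep = conf >= threshold
    return unlabeled_idx[keep], preds[keep]

# Usage: selected nodes are added to the labeled set before retraining the next-stage model
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]])
idx, labels = select_pseudo_labels(probs, np.array([0, 1, 2]), threshold=0.9)
```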
• Unsupervised pretraining
• Auxiliary learning