Graph Embeddings
Motivating use case
“A government client wanted to perform flexible, similarity-driven search and outlier detection in cybersecurity data, where network entities (e.g., routers, computers, processes, files, IP addresses) and their relationships (e.g., a process opened a file, or a connection to a local machine was opened from a remote IP address) are represented as very large graphs. The goal was to support queries like ‘show me more processes like this one’ or ‘show me IP addresses with unusual behavior’. The difficulty is that computing similarity between sub-graphs is computationally very expensive and may not capture structure that is important for this specific task.” Retrieved from synaptiq
The Node-embedding goal
Assume that we have an input graph $G=(V,E)$.
The goal of graph embedding is to map the nodes of the graph into a vector space (the encoding process) such that nodes that are dependent on each other in the graph space (e.g., connected, sharing neighbors) are mapped to vectors that are strongly similar under a chosen metric in the embedding space. In other words, we are after the coordinates of vectors in the embedding space that preserve the similarity structure evident in the graph space.
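A common way to formalize this goal (a standard formulation in the shallow node-embedding literature; taking the dot product as the embedding-space similarity is an assumption here) is

$$\text{similarity}(u, v) \;\approx\; \mathbf z_u^\top \mathbf z_v,$$

where $\text{similarity}(u, v)$ measures how related nodes $u$ and $v$ are in the graph space, and $\mathbf z_u, \mathbf z_v$ are their embeddings.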
More explicitly, we need to specify three things: an encoder that maps each node to its embedding, a definition of node similarity in the graph space, and an objective that optimizes the encoder parameters so that embedding-space similarity matches graph-space similarity.
Encoder Design
The encoder function is a simple lookup table (LUT): each row is indexed by a node's graph index and stores the d-dimensional embedding of that node, $\mathbf z_v$.
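A minimal sketch of such a lookup-table encoder, assuming NumPy and nodes indexed 0..|V|-1 (the names `num_nodes`, `embed_dim`, and `encode` are illustrative, not from the original):

```python
import numpy as np

num_nodes = 1000   # |V|: number of nodes in the graph (illustrative)
embed_dim = 64     # d: embedding dimensionality (illustrative)

# The encoder is just a lookup table: one learnable d-dim row per node.
# In practice these rows are the parameters optimized during training.
Z = np.random.randn(num_nodes, embed_dim).astype(np.float32)

def encode(v: int) -> np.ndarray:
    """Return the embedding z_v for node index v (a row lookup)."""
    return Z[v]

z_v = encode(42)    # d-dim embedding of node 42
print(z_v.shape)    # (64,)
```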
Similarity
Random Walks on Graphs
Random walk example
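A minimal sketch of a fixed-length uniform random walk, assuming the graph is stored as an adjacency list (a dict mapping each node to a list of its neighbors); all names here are illustrative:

```python
import random

def random_walk(adj, start, walk_length, rng=random):
    """Simulate a uniform random walk of up to `walk_length` steps
    from `start`, returning the sequence of visited nodes."""
    walk = [start]
    current = start
    for _ in range(walk_length):
        neighbors = adj[current]
        if not neighbors:          # dead end: stop the walk early
            break
        current = rng.choice(neighbors)
        walk.append(current)
    return walk

# Toy graph: a triangle (0, 1, 2) with a pendant node 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk(adj, start=0, walk_length=5))
```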
DeepWalk Embeddings
Why are random walks useful abstraction mechanisms? First, they provide the flexibility of a stochastic definition of node similarity that incorporates both local and higher-order neighborhood information. Second, during training we do not need to consider all node pairs; we only need to consider pairs that co-occur on random walks.
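A minimal end-to-end sketch of DeepWalk, assuming the gensim library for the skip-gram (Word2Vec) step; the toy graph, hyperparameters, and helper names below are illustrative choices, not prescribed by the original:

```python
import random
from gensim.models import Word2Vec

def random_walk(adj, start, walk_length, rng):
    """Uniform random walk of up to `walk_length` steps from `start`."""
    walk = [start]
    for _ in range(walk_length):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

def deepwalk_embeddings(adj, walks_per_node=10, walk_length=40,
                        embed_dim=64, window=5, seed=0):
    """DeepWalk: treat truncated random walks as 'sentences' and
    train a skip-gram (Word2Vec) model over them to embed nodes."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)       # visit start nodes in random order
        for node in nodes:
            # Word2Vec expects string tokens, so stringify node ids.
            walks.append([str(n) for n in
                          random_walk(adj, node, walk_length, rng)])
    model = Word2Vec(sentences=walks, vector_size=embed_dim,
                     window=window, min_count=0, sg=1, seed=seed)
    return {node: model.wv[str(node)] for node in adj}

# Toy graph: two triangles joined by the bridge edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
emb = deepwalk_embeddings(adj)
print(emb[0].shape)   # (64,)
```

Note how this mirrors the second point above: the skip-gram objective only ever compares node pairs that co-occur within a window on some walk, never all node pairs.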