Timo Denk's Blog

Graph Theory Overview

· Timo Denk

In the field of computer science, a graph is an abstract data type that is widely used to represent connections and relationships. This post gives an overview about a selection of definitions, terms, and algorithms, which are related to graphs. The content was put together during preparation for a theoretical computer science test at Cooperative State University Baden-Württemberg and is mostly taken from either Wikipedia or lecture notes.

The featured image shows the S-Bahn map of Stuttgart, Germany. A graph could be used to represent its connections. Image source: Verkehrs- und Tarifverbund Stuttgart (VVS)

Formal Definition

A graph $G$ is an ordered pair consisting of a set of vertices (also called nodes or points) $V$ and a set of edges $E$. In undirected graphs, an edge is an unordered pair of two vertices, which are called adjacent.
$$\begin{align}G=&\left(V,E\right)\\
E\subseteq &\left\{\left\{v_1,v_2\right\}\mid v_1,v_2\in V\right\}\end{align}$$Alternatively, an edge can be considered an ordered pair. For undirected graphs this definition comes with the constraint that each ordered pair has to occur twice, with the elements interchanged, i.e. as $\left(v_1,v_2\right)$ and $\left(v_2,v_1\right)$, unless $v_1=v_2$.

The degree of a vertex is defined by the number of edges that connect to it, where an edge that connects a vertex to itself is counted twice.

A directed graph differs from the more simple, undirected graph, because the elements of $E$ are ordered. Therefore $E\subseteq V\times V$, with $e=\left(v_1,v_2\right)$ meaning, that $e$ is directed from $v_1$ to $v_2$. Directed graphs allow the additional definition of the input and output degree of each vertex. The input degree (indeg) of a vertex is the number of edges that are directed towards the vertex and vice-versa for the output degree (outdeg).

A weighted graph has a cost function $c:E\rightarrow\mathbb{R}$ that assigns a cost to every edge.

A graph $G’=\left(V’,E’\right)$ is a subgraph of $G=\left(V,E\right)$, if
$V’\subseteq V\land E’\subseteq E$.

A bipartite graph is a graph whose vertices $V$ can be split into two disjoint sets $U\subset V$ and $W\subset V$, such that $$\forall {v_1,v_2}\in E[v_1\in U\land v_2\in W],.$$All cycles (with length $n$) in a bipartite graph fulfill the condition $2\mid n$. A bipartite graph is balanced, if $\vert U\vert=\vert V\vert$.

Path

A path is a sequence of edges, which connects two vertices. In most cases an edge may not occur twice in the same path. In a directed graph it is an additional constraint for a directed path, that all edges are pointing into the same direction. A path in an undirected graph from $v_1$ to $v_n$ can also be defined as a sequence of vertices
$$P=(v_1,v_2,\dots,v_n)\in V\times V\times\dots\times V$$where $v_i$ is adjacent to $v_{i+1}$. The length of such a path would be $n-1$ (the number of edges).

Opposed to a path, a walk can repeat edges or vertices multiple times.

A cycle is a special kind of path where the first and the last vertex are equal $v_1=v_n$. A graph is called acyclic, if it does not contain any cycles.

Tree

A tree is an undirected graph $G$, in which any two vertices are connected by exactly one path. That implies that $G$ is connected and acyclic.

A tree called spanning tree $T$ of an undirected graph $G$, is a subgraph that is a tree which includes all of the vertices of $G$, with minimum possible number of edges. For an edge weighted, undirected graph, the minimum spanning tree is connecting all the vertices together, without any cycles and with the minimum possible total edge weight. The total edge weight is defined as $$\sum_{i=1}^{\vert E’\vert}c(E’_i),,$$ with $c$ being the cost function and $E’$ being a subset of the graph’s edges $E$.

binary tree is a directed tree where each vertex has at most two children.

A forest is an undirected graph, all of whose connected components are trees. Connected components are sub-graphs in which any two vertices are connected to each other by paths.

Representation

Adjacency Matrix

An adjacency matrix $A_G^{\left\lvert V\right\rvert\times\left\lvert V\right\rvert}$ is a square matrix, used to represent a finite graph $G$. For a directed, unweighted graph, its fields are defined by
$$a_{ij}=\begin{cases}
1&\text{if }\left(v_i, v_j\right)\in E\\
0&\text{else},.
\end{cases}$$For graphs with a cost function $c$ each item is defined as $a_{ij}=c\left(v_i, v_j\right)$. The adjacency matrix of undirected graphs is symmetric: $A_G=A_G^\text{T}$.

Advantages: Constant memory requirements, $\left\lvert V\right\rvert^2$, simple insert and delete operations. For unweighted graphs each matrix cell requires only one bit of storage capacity. Disadvantages: Time consuming initialization.

The adjacency matrix also allows determination of the number of walks from $v_i$ to $v_j$ with a given length $n$. The number of walks is equal to $\left(A^n\right)_{ij}$. Furthermore, calculating $\sum_{i=0}^{n}A^n$ and subsequent replacement of all fields unequal zero with 1 tells, that $v_i$ is connected to $v_j$ by a walk of length $\le n$, if $a_{ij}=1$.

Adjacency List

An adjacency list is a collection of unordered lists. Each list contains all neighbors (adjacent vertices) of a vertex in the graph.

Advantages: Low memory consumption for graphs with few edges. Disadvantages: Higher memory consumption per edge, compared to the adjacency matrix; insert and delete operation more complex.

Algorithms

Topological Sorting

Topological sorting of a directed, acyclic graph is a linear ordering of its vertices, such that for every directed edge $\left(u,v\right)$ from vertex $u$ to vertex $v$, $u$ comes before $v$ in the ordering.

Partially ordered sets have the following properties.

  • Reflexivity: $a\le a$
  • Antisymmetry: $a\le b\land b\le a\rightarrow a=b$
  • Transitivity: $a\le b\land b\le c\rightarrow a\le c$

The Java snippet topologically sorts the vertices of a graph (which can be seen as a partially ordered set):

public List topologicalSorting() throws InvalidObjectException {
    final List sortedList = new ArrayList<>(vertices.size());
    DirectedGraph editableGraph = new DirectedGraph(this);
    while (sortedList.size() < this.vertices.size()) {
        boolean inputZeroVertexFound = false;
        // determine input degree of vertices in the editable graph
        for (int i = 0; i < editableGraph.vertices.size(); i++) {
            Vertex vertex = editableGraph.vertices.get(i);
            if (vertex.inputDegree(editableGraph.edges) == 0) {
                sortedList.add(vertex);
                editableGraph.removeVertex(vertex);
                inputZeroVertexFound = true;
            }
        }
        if (!inputZeroVertexFound) {
            throw new InvalidObjectException("Graph contains cycles and can not be sorted");
        }
    }
    return sortedList;
}

St-Connectivity

Two vertices $s$ and $t$ are connected in a directed graph $G$, if $t$ is reachable from $s$. STCON is a decision problem, asking whether that is true.

Reflexive Transitive Closure

  • Reflexive relation. $\forall x\in X:xRx$
  • The reflexive closure $S$ of a binary relation $R$ on a set $X$ is the smallest reflexive relation on $X$ that contains $R$. $S=R\cup\underbrace{\left\{\left(x,x\right)\mid x\in X\right\}}_{\text{Identity relation}}$
  • The transitive closure $S$ of a binary relation $R$ on a set $X$ is the smallest transitive relation on $X$ that contains $R$.

$G^*=\left(V,E^*\right)$ is a transitive closure of $G=\left(V,E\right)$. Constructing $G^*$ involves adding edges between every pair of vertices that is connected in $G$ through two or more edges. The transitive closure allows determination of STCON in $O\left(1\right)$. The closure is usually stored as an adjacency matrix $C^{\left\lvert V\right\rvert\times\left\lvert V\right\rvert}$, with
$$c_{ij}=\begin{cases}1&\text{if }\left(v_i,v_j\right)\in E^*\\0&\text{else}\end{cases}$$ telling whether a vertex $v_j$ is reachable from another vertex $v_i$. For a reflexive closure $c_{ij}$ is also set to $1$ for all cases where $i=j$.

An approach of constructing $G^*$ from a acyclic graph: Given a graph $G=\left(V,E\right)$ conduct the following steps.

  1. Topologically sort all vertices $v\in V$.
  2. Initialize an adjacency matrix $C^{\left\lvert V\right\rvert\times\left\lvert V\right\rvert}$ with zeros.
  3. Set all diagonal values (where $i=j$) $c_{ij}=1$. This step is only required for the closure to be reflective.
  4. Iterate over all vertices from back to front (the current vertex being $v_i$). The back-most vertex has an outdeg of 0.
    1. Set $c_{ij}=1$ if $(v_i,v_j)\in V$. That restores all original connections from $V$.
    2. $\forall x\in\left[1,\left\lvert V\right\rvert\right]\left[\underbrace{a_{ix}=1}_{i\text{ con. to }x}\mid \underbrace{a_{jx}=1}_{j\text{ con. to } x} \land\underbrace{(v_i,v_j)\in V}_{i\text{ con. to }j}\right]$.

Tree Traversal

Tree traversal is the process of visiting (checking and/or updating) each vertex in a tree data structure, exactly once. Different algorithms are classified by the order in which they visit the vertices. A general distinction between depth- and breadth-first search is made.

Depth-first search (DFS). Starting at a given vertex, that is taken as the root of a sub-graph, the algorithm explores the sub-graph as far as possible and subsequently starts backtracking to search the vertices above. If the graph contains cycles the algorithm will terminate only, if it stops searching when reaching a vertex that was already visited before.

Breadth-first search (BFS). Starting at a given vertex, BFS searches connected vertices and, subsequently, searches vertices which are connected to these and so on. This can be done by creating a set $S$ and a queue $Q$. The set $S$ holds the visited vertices and is initialized with one element which is the root vertex $r$. All connected vertices are added to $Q$. For each of the added elements all connected elements are appended to $Q$ as well. Therefore, they will not be processed before all vertices that were added in the first place have been visited. The pseudo code illustrates the algorithms way of proceeding:

Breadth-First-Search(Graph, root):
    create empty set S
    create empty queue Q      

    add root to S
    Q.enqueue(root)                      

    while Q is not empty:
        current = Q.dequeue()
        if current is the goal:
            return current
        for each vertex n that is adjacent to current:
            if n is not in S:
                add n to S
                n.parent = current
                Q.enqueue(n)

Shortest Path

The shortest path problem searches for the shortest path $P=(v_1,v_2,\dots,v_n)\in V\times V\times\dots\times V$ in a graph $G=(V,E)$. It is defined as follows:

  • Undirected graph. Finding $P$ with the lowest possible value for $n$ under the condition, $v_i$ is adjacent to $v_{i+1}$.
  • Directed graph. Similar to the undirected graph, $v_i$ needs to be connected to $v_{i+1}$ by an edge pointing from the former to the latter.
  • Weighted graph. Finding a path $P$ so that $\displaystyle\sum_{i=1}^{n-1}c(v_i,v_{i+1})$, that is the sum of all edge’s costs on the path, is minimal.

The critical path is the longest possible path connecting two vertices without revisiting any vertex in between.

Given a connected, undirected graph $G$, a shortest-path tree rooted at vertex $v$ is a spanning tree $T$ of $G$, such that the path distance from root $v$ to any other vertex $u$ in $T$ is the shortest path distance from $v$ to $u$ in $G$.

Dijkstra’s algorithm is an algorithm for finding the shortest path between two vertices.

Minimum Spanning Tree

Kruskal’s algorithm for finding a minimum spanning tree (MST) is applied to weighted, undirected graphs. It works as follows: (1) Sort all edged by their weight in ascending order. (2) Create an empty graph with the vertices of the original graph. (3) Add the edges one after the other, unless adding it would create a cycle in the new graph. (4) Terminate after all edges have been considered. The resulting graph is an MST.

Boruvka’s algorithm. Iterate over all vertices and, for each vertex select the outgoing edge with the lowest weight. Thereby some groups (connected vertices) are formed. Subsequently, iterate over all such groups and look for the connection to another group with the lowest weight. Connect the group to exactly one other group, which is the one with the lowest weighted connection in between. Continue by connecting groups to each other until all vertices are members of the same group.

Prim’s algorithm. Copy the graph without edges. (1) Select a random vertex. (2) From all selected vertices, search for the closest (i.e. originally connected with the lowest weight), not yet connected vertex and add it to the set of selected vertices. Continue with (2) until all vertices are connected.