
Given an undirected unweighted graph $G = (V, E)$, a $(1 \pm \epsilon)$ cut sparsifier is a weighted graph $H = (V, F, w)$ that satisfies the following:

  • $F \subseteq E$
  • For all $\emptyset \subsetneq S \subsetneq V$ it holds that $\delta_w(H, S) \in [(1 - \epsilon)\delta_{\mathbf{1}}(G, S),(1 + \epsilon)\delta_{\mathbf{1}}(G, S)]$

This is a sparsification that preserves the cuts. Compare this to a distance spanner, which (approximately) preserves shortest-path distances.

Some remarks:

  • $\delta_w(G, S)$ is the cut function: the sum of $w(e)$ over all edges $e$ that go between $S$ and $V - S$
  • The original definition of a cut sparsifier does not require $F \subseteq E$, but all algorithms in our scope satisfy that condition, so we simply include it in the definition.
  • We assume that the input graph is unweighted, but all the algorithms in this article apply to weighted graphs as well. We will clarify if this is not trivial.

1. Uniform sampling of the complete graph

All sparsifier algorithms we discuss are randomized. Hence, we only guarantee that the output is a valid sparsifier with high probability (not for each cut separately: with high probability, FOR ALL cuts simultaneously).

In this article, we will only discuss algorithms that look like this (or don't look like this but are equivalent to it anyway):

  • Assign a probability $p_e$ for all edges $e \in E$.
  • Sample each edge independently with probability $p_e$.
  • For each sampled edge, assign it weight $1/p_e$.

It's easy to see that the cuts are preserved in expectation, although that alone doesn't mean too much.
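To make the framework concrete, here is a minimal sketch in Python; the function names (`sparsify`, `cut_weight`) and the edge-list representation are illustrative choices, not taken from any particular paper:

```python
import random

def sparsify(edges, prob):
    """Sampling framework from the list above: keep each edge e independently
    with probability prob[e]; a kept edge receives weight 1/prob[e].
    `edges` is a list of (u, v) pairs; `prob` maps each pair to (0, 1]."""
    sampled = []
    for e in edges:
        p = prob[e]
        if random.random() < p:
            u, v = e
            sampled.append((u, v, 1.0 / p))   # weighted edge of the sparsifier
    return sampled

def cut_weight(weighted_edges, S):
    """Weight of the cut (S, V - S): total weight of edges with exactly
    one endpoint in S."""
    return sum(w for u, v, w in weighted_edges if (u in S) != (v in S))
```

Each kept edge contributes $1/p_e$ with probability $p_e$, so the expected weight of every cut in the output equals its size in $G$, which is exactly the "preserved in expectation" remark above.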

When we say uniform sampling, we mean sampling every edge with the same probability $p$. This is certainly a naive algorithm, and we will need nonuniform sampling in the end, but it is a good starting point. Note that if we set $p$ too high, the cuts are well preserved but the graph is not sparse; if we set $p$ too low, the graph is sparse but the cuts are poorly preserved.

If the graph is complete, setting $p = \frac{c \log n}{\epsilon^2 n}$ (for a sufficiently large constant $c>0$) suffices. For each cut $S$ with $t = |S|$, the expected number of sampled cut edges is $\mu = p t (n - t)$. We want this number to deviate from $\mu$ by no more than a $(1\pm \epsilon)$ factor, and the probability that it deviates more can be bounded conveniently by a Chernoff bound:

  • $2 \exp( -\epsilon^2 (pt (n - t)) / 3)$
  • $\le 2 \exp (-\epsilon^2 p t n / 6)$ (assume $t \leq n/2$ w.l.o.g., so $n - t \ge n/2$)
  • $= 2 \exp (-ct \log n / 6)$
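For reference, the first line above is the multiplicative Chernoff bound applied to $X$, the number of sampled cut edges (a sum of independent Bernoulli variables with mean $\mu$); a standard form, valid for $0 < \epsilon \le 1$, is

$\Pr[\,|X - \mu| \ge \epsilon \mu\,] \le 2 \exp(-\epsilon^2 \mu / 3)$

Since the sparsifier assigns weight $1/p$ to each sampled edge, a $(1 \pm \epsilon)$ deviation of $X$ is exactly a $(1 \pm \epsilon)$ deviation of the cut weight $X/p$ in the sparsifier.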

Let's say $c = 60$. Then, the total failure probability can be bounded by a union bound (there are at most $\binom{n}{t} \le n^t$ cuts with $|S| = t$):
$\sum_{t = 1}^{n/2} 2 \exp(-10 t \log n) \times n^t \le 2\sum_{t = 1}^{n/2} n^{-9t} \le 2n^{-8}$

Hence, we have a cut sparsifier of $O(n \log n / \epsilon^2)$ edges.

(Remark: A graph obtained by sampling each edge of the complete graph independently with the same probability $p$ is called an Erdős–Rényi graph, $G_{n, p}$. This kind of graph is studied very extensively, and we know all sorts of things about it, such as for which $p$ the graph becomes connected.)

2. Uniform sampling of the general graph

The above approach doesn't work for general graphs: if you retrace the argument, you can see that it crucially relies on every cut of the complete graph being as large as possible, so that the per-cut failure probability beats the roughly $n^t$ cuts with $|S| = t$ in the union bound; a general graph may have many small cuts.

Uniform sampling still works if the graph has a large minimum cut, but we need some better insights to prove it. Recall a magical proof of correctness for Karger's algorithm—we need a lemma that generalizes it.


Lemma 1. In a connected undirected graph with minimum cut size $k$, the number of cuts of size at most $\alpha k$ is at most $O(n^{2\alpha})$, for $\alpha \geq 1$.
Proof. We will fix a cut of size at most $\alpha k$ and prove that Karger's algorithm outputs exactly that cut with probability $\Omega(n^{-2\alpha})$. Since these events are disjoint for different cuts, there cannot be more than $O(n^{2\alpha})$ such cuts; otherwise the total probability would exceed $1$.
Karger's algorithm picks a uniformly random un-contracted edge and contracts it, repeating $n - 2$ times until the remaining graph has $2$ vertices.
Karger's algorithm outputs our fixed cut if it never contracts any of its at most $\alpha k$ edges. Let $m_t$ be the number of un-contracted edges when $t$ vertices are left. We have $m_t \geq \frac{tk}{2}$, as every vertex of the contracted graph has degree at least $k$ (its incident edges form a cut of the original graph, which has size at least $k$). Let's try to write down the success probability:
$\prod_{t = 3}^{n} (1 - \frac{\alpha k}{m_t})$
$\geq \prod_{t = 3}^{n} (1 - \frac{\alpha k}{tk/2})$
$= \prod_{t = 3}^{n} (1 - \frac{2\alpha}{t})$
$= \left(\prod_{t = 3}^{n} (t - 2\alpha)\right) / \left(\prod_{t = 3}^n t\right)$
$\geq n^{-2\alpha}$
Unfortunately, this analysis is incorrect for $t \le 2\alpha$, and in fact the premise is just wrong for such small $t$: consider the path on three vertices $1 - 2 - 3$ and the cut $\{1, 3\} / \{2\}$ with $\alpha = 2$; Karger's algorithm can never output it, since contractions keep each side connected. To remedy this, we change Karger's algorithm slightly: as soon as $t \leq 2 \alpha$, we stop contracting and return one of the $2^{t-1}-1$ cuts of the remaining graph uniformly at random. This last step succeeds with probability at least $2^{-(t-1)} \ge 2^{-2\alpha}$, and combined with the (now valid) product over $t > 2\alpha$, the success probability is still $\Omega(n^{-2\alpha})$.
Remark. The case $\alpha = 1$ is exactly the proof of correctness of Karger's algorithm, and it is nicer because the case $t \leq 2\alpha$ does not arise. It shows that there are at most $n(n-1)/2$ minimum cuts in a graph, and this upper bound is attained by a simple cycle. The cactus representation implicitly represents all min-cuts, and its construction can be considered an enumeration of all min-cuts.
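For concreteness, here is a minimal sketch of the contraction algorithm the proof reasons about (plain random contraction, without the modification for $t \le 2\alpha$); the union-find representation is an implementation choice of this sketch:

```python
import random

def karger_contract(n, edges):
    """One run of Karger's contraction on a multigraph with vertices 0..n-1
    and edge list `edges` (pairs (u, v), parallel edges allowed): repeatedly
    contract a uniformly random un-contracted edge until 2 super-vertices
    remain, and return the vertex set of one side of the resulting cut."""
    parent = list(range(n))

    def find(x):                         # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    remaining = n
    while remaining > 2:
        u, v = random.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:                     # skip self-loops of contracted edges;
            parent[ru] = rv              # rejection keeps the choice uniform
            remaining -= 1               # over un-contracted edges
    side = find(0)
    return {v for v in range(n) if find(v) == side}
```

Repeating this $\Theta(n^2 \log n)$ times and keeping the smallest cut found yields the minimum cut with high probability, which is the $\alpha = 1$ case of the lemma.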


Given that lemma, we can do uniform sampling for graphs with a large minimum cut. Let $p = \frac{c \log n}{\epsilon^2 k}$. For each cut $S$ of size $\alpha k$ (i.e., with $\alpha k$ crossing edges), let's compute the probability that it deviates by more than a $(1 \pm \epsilon)$ factor:

  • $\le 2 \exp( -\epsilon^2 (p\alpha k) / 3)$
  • $\le 2 \exp (-c \alpha \log n / 6)$
  • $\le 2 n^{-c\alpha/6}$
Let's say $c = 60$. Then, the total failure probability can be bounded by a union bound, grouping cuts by their size $\alpha k$ and using Lemma 1 to bound their number by $O(n^{2\alpha})$:
$\int_{\alpha =1}^{\infty} O(n^{-10\alpha} \cdot n^{2\alpha}) \, d \alpha \le \int_{\alpha = 1}^{\infty} O(\exp(-8 \alpha \log n)) \, d\alpha = O\!\left(\frac{n^{-8}}{\log n}\right) \le O(n^{-8})$.

Hence, we have a cut sparsifier with $O(n^2 \log n / (k \epsilon^2))$ edges, which is near-linear if $n / k = \text{poly log}(n)$.

Actually, a slightly more general way to make the same argument is the following:

Theorem 2. Given an undirected unweighted graph $G$ and a probability function $p : E(G) \rightarrow [0, 1]$, a skeleton $G^\prime$ is an undirected unweighted graph on the same vertex set, where each edge $e$ is included independently with probability $p_e$. Suppose that every cut has expected size at least $\frac{c \log n}{\epsilon^2}$ in $G^\prime$, for a sufficiently large constant $c$. Then, with high probability, the size of every cut of $G^\prime$ is within a $(1 \pm \epsilon)$ factor of its expectation.

Proof. We use the same argument. Consider the weighted graph $G$ with $p$ as its edge weights. There are at most $O(n^{2\alpha})$ $\alpha$-approximate minimum cuts even in a weighted graph, and the minimum cut of this weighted graph is at least $\frac{c \log n}{\epsilon^2}$. So each cut is violated with probability at most $O(n^{-c \alpha / 6})$, and the union bound goes through as before.

It is easy to see that the uniform sampling result above is a corollary of this theorem.
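As a concrete instance, here is a hedged sketch of uniform sampling when a lower bound $k$ on the minimum cut is known; the constant $c = 60$ mirrors the calculation above, and clamping $p$ at $1$ is a choice of this sketch:

```python
import math
import random

def uniform_skeleton(n, edges, k, eps, c=60.0):
    """Sample each edge independently with p = c*log(n) / (eps^2 * k), where
    k is (a lower bound on) the minimum cut, and weight kept edges by 1/p."""
    p = min(1.0, c * math.log(n) / (eps * eps * k))
    return [(u, v, 1.0 / p) for (u, v) in edges if random.random() < p]
```

In the next section, $k$ is replaced by a lower bound obtained from a deterministic 3-approximation (e.g., a third of the returned value), which only affects the constants.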

3. Computing the exact minimum cut

At this point, uniform sampling looks like a naive algorithm: it requires knowledge of $k$, and it does not produce a sparse graph when $k$ is small. However, this naive algorithm is very important for computing the exact minimum cut in near-linear time.

We pointed out two problems with our cut sparsifier: first, we have to know $k$; second, it isn't sparse in some cases. In our application the second problem is not an issue; we want an algorithm that runs in near-linear time in the number of edges, and what matters is that all cuts are preserved. The first problem can be solved by the following result:

Theorem A (Mat91, GMW19). Given a weighted graph, there exists a near-linear deterministic algorithm for finding a $3$-approximation of the minimum cut. The running time is $O(m)$ in unweighted graphs and $O(m \log^2 n)$ in weighted graphs.

Theorem A allows us to compute $p$. The good thing about this sparsifier is that all edges are weighted equally, so we can treat it as an unweighted graph, and its minimum cut is about $kp = \frac{c \log n}{\epsilon^2} = O(\epsilon^{-2} \log n)$. The same can be said for weighted graphs, although the number of edges blows up to $O(m \log n)$ (we will discuss this later).

In the end, we can obtain the following graph in $O(m \log^2 n)$ time:

  • Unweighted
  • At most $O(m)$ ($O(m \log n)$ in weighted) edges
  • Minimum cut is $O(\epsilon^{-2} \log n)$
  • All cuts are preserved within $(1\pm\epsilon)$ factor

Let's invoke this algorithm with $\epsilon = 1/6$; then, in $O(m \log^2 n)$ time, we obtain a graph in which every cut is preserved within a $[5/6, 7/6]$ factor. If we can somehow enumerate all $7/5$-approximate minimum cuts of this graph, then one of them will be a minimum cut of the original graph, and we are done. (This is how the approximate min-cut algorithm helps us compute the exact min-cut. What follows is a constructive proof that there are at most $O(n^2 \log n)$ $7/5$-approximate minimum cuts in the graph.)
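For completeness, here is where the $7/5$ comes from. Let $k$ be the minimum cut of the original graph and $c$ the minimum cut of the sparsifier; the original minimum cut has weight at most $\frac{7}{6}k$ in the sparsifier, while $c \ge \frac{5}{6}k$, so

$\frac{(7/6)k}{c} \le \frac{(7/6)k}{(5/6)k} = \frac{7}{5}$

i.e., the original minimum cut is a $7/5$-approximate minimum cut of the sparsifier.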

Here, we invoke the following algorithm.

Theorem B (Gab95). Given a directed graph with $n$ vertices, $m$ edges, and minimum cut $c$, a theorem of Edmonds (1972) states that the graph contains $c$ edge-disjoint directed spanning trees. There exists a deterministic algorithm that finds such trees in $O(m c \log n)$ time.

(We note that the algorithm described in Gab95 is rather complicated and there exists a nicer and faster deterministic algorithm of Plotkin, Shmoys, and Tardos - but that algorithm computes something slightly weaker, so we will use something that makes our argument simpler)

Given the $1/6$-sparsifier, we replace every undirected edge $\{u, v\}$ with two directed edges $(u \rightarrow v)$ and $(v \rightarrow u)$. Applying Theorem B, we obtain $c = O(\log n)$ edge-disjoint directed spanning trees, where each undirected edge is covered by at most two trees (once per direction). Let's fix the original minimum cut; in the sparsifier it contains at most $\frac{7}{5} c$ edges.

We claim that some tree crosses our cut in at most two edges. Suppose not, and say every tree crosses our cut at least three times. Then, with $c$ trees, the total number of times the cut edges are covered is at least $3c$. But the cut has at most $\frac{7}{5}c$ edges, each covered by at most two trees, so this total is at most $\frac{14}{5}c < 3c$. Hence, we reach a contradiction.

Finally, it suffices to compute, for each tree, the minimum cut that crosses that tree in at most two edges. A choice of at most two tree edges uniquely determines such a cut, so we can simply try all $O(n^2)$ combinations and take the minimum. This can be optimized by clever data structure techniques:

Theorem C (Kar00, improved by GMW19). Given a weighted graph $G$ and a spanning tree $T$ of $G$, there exists a deterministic algorithm that computes the minimum cut of $G$ among those crossing at most two edges of $T$, in $O(m \log n)$ time.
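Before the clever version, here is a hedged sketch of the naive enumeration described above: for a fixed spanning tree, every cut crossing the tree in at most two edges is determined by one or two tree edges (equivalently, by the subtrees hanging below them), and each candidate can be evaluated directly in the weighted graph in $O(m)$ time, for $O(n^2 m)$ total. The function names and representation are illustrative only.

```python
from itertools import combinations

def naive_two_respecting_min_cut(n, weighted_edges, tree_edges):
    """Minimum cut of the weighted graph (edges (u, v, w) over vertices
    0..n-1) among all cuts crossing the spanning tree `tree_edges`
    (a list of (u, v) pairs) in at most two tree edges."""
    # Root the tree at 0 and compute, for every non-root vertex v, the set
    # of vertices in the subtree hanging below the tree edge (parent[v], v).
    adj = [[] for _ in range(n)]
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    parent, seen, order = [-1] * n, [False] * n, [0]
    seen[0] = True
    for x in order:                      # BFS order from the root
        for y in adj[x]:
            if not seen[y]:
                seen[y] = True
                parent[y] = x
                order.append(y)
    below = [{v} for v in range(n)]
    for v in reversed(order[1:]):        # children before parents
        below[parent[v]] |= below[v]

    def cut_weight(S):
        return sum(w for u, v, w in weighted_edges if (u in S) != (v in S))

    best = float("inf")
    non_root = order[1:]
    # Cuts crossing exactly one tree edge: one side is a subtree.
    for v in non_root:
        best = min(best, cut_weight(below[v]))
    # Cuts crossing exactly two tree edges: one side is either the
    # difference of nested subtrees or the union of disjoint subtrees.
    for v1, v2 in combinations(non_root, 2):
        if below[v2] <= below[v1]:
            side = below[v1] - below[v2]
        elif below[v1] <= below[v2]:
            side = below[v2] - below[v1]
        else:
            side = below[v1] | below[v2]
        best = min(best, cut_weight(side))
    return best
```

Theorem C computes the same minimum over these candidates without the explicit $O(n^2)$ loop.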

The weighted graph in Theorem C is the original input graph (only the trees come from the sparsifier), so we do not pay the extra $O(\log n)$ edge blowup here. Summarizing everything, we have:

  • Theorem A, that takes $O(m \log^2 n)$ time
  • Theorem B, that takes $O(m^\prime c \log n) = O(m \log^3 n)$ time
  • Theorem C, that takes $O(m \log^2 n)$ time (we have $O(\log n)$ trees)

Near-Linear Min Cut (Kar00, improved by GMW19). A minimum cut of a weighted graph can be computed in $O(m \log^3 n)$ time. (Improving Theorem B yields $O(m \log^2 n)$ time.)

The following story hints at the importance of a sparsifier in this algorithm: The only randomized part of this near-linear min-cut algorithm is the sparsifier, but it took about 20 years to derandomize it:

  • In 2019, KT19 gave a deterministic algorithm that works only on unweighted graphs and runs in $O(m \log^{12} n)$ time. This result was later improved to $O(m \log^2 n \log \log^2 n)$. This algorithm is very different from Karger's approach; in particular, it does not compute any sparsifiers.
  • In 2024, HLRW24 gave a deterministic algorithm, presented at SODA'24, that does compute a sparsifier. Their algorithm takes $O(m \log^c n)$ time for some constant $c > 0$. I wonder what the upper bound for $c$ is.

3.1. What changes in weighted graphs?

In a weighted graph, you must contract all edges with weight greater than $O(k)$ after invoking the 3-approximation algorithm. This is valid (such heavy edges cannot cross any minimum cut) and lets us assume $w_i \le O(k)$. Assume that all $w_i$ are integers and think of each edge as $w_i$ parallel copies of a weight-$1$ edge. Then the sparsifier algorithm samples an integer from $\mathrm{Binomial}(w_i, p)$ and creates that many parallel unit-weight copies of the edge. In expectation, the number of edges blows up to $O(m \log n)$, but this is fine.

In the general case of real weights, it suffices to keep the expectation of each edge's multiplicity (its number of unit copies) equal to $w_i \times p$. For example, we can compute that real value and round it up or down at random with the appropriate probabilities. To see why this works, imagine multiplying all edge weights by $10^{100}$ and $p$ by $10^{-100}$; the binomial distribution then essentially converges to this rounding scheme. For the actual argument, you can plug it into Theorem 2.
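Here is a minimal sketch of the two variants described above, assuming $p$ has already been computed; the function names are illustrative:

```python
import math
import random

def binomial_copies(w, p):
    """Integer weight w: each of the w implicit unit copies is kept
    independently with probability p, i.e. a Binomial(w, p) draw
    (done naively here; use a proper binomial sampler for large w)."""
    return sum(1 for _ in range(w) if random.random() < p)

def rounded_copies(w, p):
    """Real weight w: round the expected multiplicity w*p up or down at
    random, so that the expectation is exactly w*p."""
    x = w * p
    lo = math.floor(x)
    return lo + (1 if random.random() < x - lo else 0)

def weighted_skeleton(weighted_edges, p):
    """Unweighted multigraph from Section 3.1: each weighted edge (u, v, w)
    becomes a random number of parallel unit-weight edges."""
    out = []
    for u, v, w in weighted_edges:
        copies = binomial_copies(w, p) if isinstance(w, int) else rounded_copies(w, p)
        out.extend([(u, v)] * copies)
    return out
```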

References
