Authors Howard Heaton*, Samy Wu Fung*, Alex Tong Lin*, Stanley Osher, and Wotao Yin

Problem Background

This blog identifies a new tool, called adversarial projections, for solving inverse problems. An inverse problem may be stated as the task of finding a true signal1 $u^\star$ given a (possibly nonlinear) mapping $A$ and measurement data

$d = A(u^\star) + \varepsilon,$

where $\varepsilon$ is noise. Traditional methods for this task often minimize the addition of a regularization term that incorporates prior knowledge of the structure of $u^\star$ and a fidelity term that measures compliance with measurements (e.g., least squares). However, it has been shown that convex regularization can introduce bias, preventing recovery of the true signal. Our approach avoids this issue by instead projecting signal estimates2 toward a (possibly nonlinear) low dimensional manifold containing true signals.

A cornerstone of analysis in the era of big data is dimension reduction. Here it is assumed that high dimensional data have redundancies in their representation. So, dimension reduction is the transformation of high dimensional data (e.g., images) into a lower dimensional form that still retains the core information. Many works have found efficiacious ways of transforming data in this way (e.g., for interpretation and classification). Our current work, however, takes a fundamentally different approach. Instead of finding a mapping to create lower dimensional data or attempting to minimize the dimension of ours signal’s representation, we leverage assumptions about dimension reduction to improve our ability to solve inverse problems. To emphasize our core insight, we restrict the focus of this blog to what adversarial projections are and how to compute them. (Later blogs will discuss applying this tool to more elaborate optimization schemes).

Core Ideas

1. Signals from a common true data set often share features that can be represented by a low dimensional manifold $\mathcal{M}$.
2. If $\tilde{u}\notin\mathcal{M}$ approximates a true signal $u^\star \in \mathcal{M}$, then the closest point in the manifold $\mathcal{M}$ to $\tilde{u}$ better approximates $u^\star$.
3. The closest point in the manifold $\mathcal{M}$ to $\tilde{u}$ can be computed via adversarial projections.

Goal Project noisy estimate $\tilde{u}$ of a signal $u^\star$ onto the manifold $\mathcal{M}$ of true signals.

Manifolds and Projections

Applying the first core idea, we assume3 the true data is contained in a convex, closed, and bounded subset of $\mathbb{R}^n$. Given an estimate $\tilde{u}$ of a signal $u^\star$, the second core idea is that we can improve our estimate $\tilde{u}$ by finding the closest point in the manifold $\mathcal{M}$, which is precisely the projection defined by

$P_{\mathcal{M}}(u) := \underset{v}{\mbox{argmin}} \| v - u\|.$

That is, in a typical situation,

$\| P_{\mathcal{M}}(\tilde{u}) - u^\star \| \ll \| \tilde{u} - u^\star \|.$

In most practical settings, we do not have direct access to the manifold $\mathcal{M}$ to determine this projection. However, we can indirectly form a projection variation using the pointwise distance function

$d_{\mathcal{M}}(u) := \min_{v} \| v - u\| = \| P_{\mathcal{M}}(u) - u \|.$

If $0 \in \partial d_{\mathcal{M}}(u)$, then $P_{\mathcal{M}}(u) = u$. Otherwise, for $\lambda\in\mathbb{R}$,

$u + \dfrac{\lambda}{d_{\mathcal{M}}(u)} \left( P_{\mathcal{M}}(u) - u\right) = u - \lambda \nabla d_{\mathcal{M}}(u).$

In our experiments, we use numerical differentiation and so, for practical purposes, we can just assume the above equation always holds.

The left hand side of the above equation is known as a relaxed-projection onto the manifold $\mathcal{M}$, and the right hand side is the result of a single gradient descent step.

Loosely speaking, if we assume that our available measurement data is not “too noisy” and sufficiently representative (i.e., enough samples), then the pointwise distance function $d_{\mathcal{M}}$ is a solution to the problem4

$\max_{|f|_L \leq 1}\ \mathbb{E}_{\omega\sim \Omega } \left[ f(u(\omega))\right] - \mathbb{E}_{u \sim \mathcal{D}} \left[ f(u)\right],$

where $\mathcal{D}$ is the distribution of true signals, $\Omega$ is the distribution of estimates, and $| f |_{L} \leq 1$ indicates that $f$ belongs to the set of 1-Lipschitz functions. Thus, our main tasks are to i) generate a projection operator estimate by solving the above problem and ii) apply the projection operation in a way that ensures convergence to what we want.

High Level Formula

The algorithm works by generating a sequence ${u^k}$ such that

$\displaystyle \lim_{k\rightarrow\infty} u^k = P_{\mathcal{M}}(u^1).$

For step size5 $\alpha_k$, the formula for each update is

$u^{k+1} = \dfrac{1}{k} u^1 + \dfrac{k-1}{k}\left( (1-\alpha_k) u^k + \alpha_k P_{\mathcal{M}}(u^k) \right).$

The terms inside the rightmost paretheses yield a point that is along the line segment between $u^k$ and its projection $P_{\mathcal{M}}(u^k)$. Because we can only obtain an approximation of $P_{\mathcal{M}}$, we let $\lambda_k$ be small so that we take small steps. The idea here is that several small steps in the overall right direction is better than a single large step with substantial error. And, to ensure that the noise in our projections doesn’t make ${u^k}$ converge to a random point in the manifold $\mathcal{M}$, we average the terms on the righthand side with the initial iterate $u^1$. This acts like an anchor, pulling us to the desired point in $\mathcal{M}$ that is closest to $u^1$.

We choose a fixed $\lambda_k$ for all the samples, which results in some updates overshooting the manifold (thus the need for anchoring). On average, $\lambda_k$ is chosen to be small so that the entire distribution of samples (think point cloud) flows uniformly to the manifold.

Training

The training process is used to identify multiple estimates of the distance function $d_{\mathcal{M}}$, one estimate for each step of our algorithm. The first training step is to find an estimate of the distance function $d_{\mathcal{M}}$ by solving an unsupervised learning problem. Then we choose a step size $\lambda_k$ that is a small fraction of the average distance $d_{\mathcal{M}}(u^k(\omega))$ to the manifold from the $k$-th iterate of the sample $\omega\in\Omega$. (We use funny notation here: think of $\Omega$ as an indexed set of samples $\lbrace\omega_\ell\rbrace$.) So, we repeat the following steps for $k=1,2,\ldots,K$.

Step 1: Find optimal regularizer $J_{\theta^k} \approx d_{\mathcal{M}}$.

$\theta^k \leftarrow \displaystyle \underset{\theta\in\mathcal{I}}{\mbox{argmin}} \; \mathbb{E}_{u\sim \mathcal{D}_{\mathrm{true}}} \left[ J_\theta(u) \right] - \mathbb{E}_{\omega \sim\Omega} \left[ J_\theta(u^k(\omega)) \right]$

Step 2: For fixed $\alpha \in (0,1)$, assign step size.

$\lambda_k \leftarrow \alpha\cdot \left(\displaystyle \mathbb{E}_{u\sim\mathcal{D}_{\mathrm{true}}} \left[ J_{\theta^k}(u) \right] -\mathbb{E}_{\omega\sim\Omega} \left[ J_{\theta^k}(u^k(\omega)) \right]\right)$

Step 3: For each sample $\omega \in \Omega$, apply the Halpern update.

$u^{k+1}(\omega) \leftarrow \dfrac{1}{k} u^1(\omega) + \dfrac{k-1}{k} \left( u^k(\omega) - \lambda_k \nabla J_{\theta^k}(u^k(\omega))\right)$

Implementation

Given an initial estimate $u^1$ (e.g., from a pseudo-inverse $A^\dagger b$ or variational method), we simply use the terms learned during the training process to compute $u^K$ using the update formula

$u^{k+1} \leftarrow \dfrac{1}{k} u^1 + \dfrac{k-1}{k} \left( u^k - \lambda_k \nabla J_{\theta^k}(u^k)\right).$

Alternate Interpretations

At the level of individual signals, this work may also be interpreted as learned gradient descent with a sequence of expert-like regularizers. And at the aggregate level of distributions, it may be viewed as a subgradient method for minimizing the Wasserstein-1 distance between the distribution of initial estimates and the true distribution. This also introduces connections to optimal transport. See our preprint for further details.

Numerical Illustrations

Below we provide a few numerical examples of our method. (More pictures and details are provided in preprint).

Evolution for Toy Example

The first example is used to illustrate the idea of flowing one distribution (blue) to a manifold (red). This may also be thought of as simultaneously plotting the sequence $\lbrace u^k \rbrace$ for several different starting points. Rather than simply show the initial distribution and final distribution, here we provide an animation that shows our algorithm as it evolves the initial distribution (i.e., $k$ increases). In some cases we can see the trajectory start toward what is not the projection and the ensuing effect of the anchoring term in the algorithm pulling the iterate to a point on the manifold closer to $u^1$. Note that the animation is on test data that is different than what was seen during training.

CT Reconstructions

Below we perform CT reconstructions in two settings. Each problem follows the form presented at the beginning of this blog. The training phase is used to determine the step size $\lambda_k$ and an estimate of $d_{\mathcal{M}}$ at each step $k$ of the algorithm. In the first example, training data consists of ellipse images (more complicated versions of the classic Shepp Logan phantom). The second example uses the LoDoPab data set, consisting of real-world CTs. Two traditional methods are given for reference (FBP and TV). To allow for an apples to apples comparison, TV regularized solutions were used as the initial iterate $u^1$ for both the Adversarial Regularizer and Adversarial Projection methods. We also emphasize here that, unlike Adversarial Regularizers, our projection did not use the measurement data at all! By simply having knowledge of the manifold $\mathcal{M}$, the projection operator inferred finer details of the blurry TV image.

Looking Forward

This blog overviews a new way to perform projections. Namely, given a collection of samples, we can now form projections onto the low dimensional manifold containing those samples (even when the samples don’t entirely cover the manifold). We have demonstrated these projections with CT images, taking a “good” reconstruction and making it “better” by simply projecting the image onto the manifold. This work also carries natural extensions to several other data-driven algorithms (e.g., for variational schemes) to be shared in detail in an upcoming post.

For further details about this work, check out the preprint, code, and slides (linked above). You may also email the author or comment below (via a Github account).

Footnotes

* Equal contribution

1. In this post, we use “signal” and “point” interchangeably to describe the mathematical representation of the objects we are working with. Signal is used here to convey shared information and point is used to indicate that the object can be represented in a Euclidean space.

2. Throughout this blog, we assume a sufficiently good method is available for finding an estimate $\tilde{u}$ of $u^\star$ so that we may leverage our new dimension reduction tools.

3. We keep to these assumptions in this post for simplicity even though the results hold under weaker assumptions.

4. The solution is not unique, but will work for our purposes as long as we evaluate $d_{\mathcal{M}}$ at points $u(\omega)$ with $\omega\in\Omega$.

5. Below we use step size $\lambda_k$ for our implementation, which is connected to this via the relation $\alpha_k d_{\mathcal{M}} = \lambda_k$.

Published:

Updated: