Authors Howard Heaton*, Samy Wu Fung*, Alex Tong Lin*, Stanley Osher, and Wotao Yin

Background

This blog introduces a new tool, called adversarial projections, for solving inverse problems. An inverse problem may be stated as the task of finding a true signal1 $u^\star$ given a (possibly nonlinear) mapping $A$ and measurement data

\[d = A(u^\star) + \varepsilon,\]

where $\varepsilon$ is noise. Traditional methods for this task often estimate $u^\star$ by a signal $\tilde{u}$ that solves something called a variational problem.1 That is, $\tilde{u}$ minimizes the sum of a regularization term $J(u)$ that incorporates prior knowledge of the structure of $u^\star$ and a fidelity term that measures compliance with the measurements, i.e.,

\[\tilde{u} = \underset{u\in\mathbb{R}^n}{\mbox{argmin}}\ \dfrac{1}{2} \|Au - d\|_2^2 + J(u).\]
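To make this concrete, below is a minimal sketch (ours, not code from the paper) of solving such a variational problem by gradient descent for a linear $A$ with a simple hand-chosen regularizer $J(u) = \tfrac{\gamma}{2}\|u\|_2^2$; the function names, toy data, and parameter values are purely illustrative.

```python
import numpy as np

def variational_reconstruction(A, d, gamma=1.0, steps=500):
    """Gradient descent on (1/2)||A u - d||^2 + (gamma/2)||u||^2 (a simple hand-chosen J)."""
    L = np.linalg.norm(A, 2) ** 2 + gamma      # Lipschitz constant of the objective's gradient
    u = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ u - d) + gamma * u   # gradient of fidelity plus regularizer
        u = u - grad / L                       # step size 1/L guarantees descent
    return u

# Toy usage: recover a signal from noisy linear measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 50))
u_true = rng.standard_normal(50)
d = A @ u_true + 0.01 * rng.standard_normal(30)
u_tilde = variational_reconstruction(A, d)
```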

Choosing the regularizer $J(u)$ is the key task of current interest. Classic optimization methods use hand-chosen formulas (e.g., total variation) that theoretically guarantee certain properties of signals. On the other hand, several state-of-the-art methods attempt to learn an ideal regularizer from available data, often with the shortcoming of limited guarantees.

An underlying theme of essentially all regularization approaches is that signals represented in high dimensional spaces often exhibit redundancies. For example, in a piecewise constant image, any given pixel value is often highly correlated with adjacent pixel values. Examples like this lead to the insight that such signals possess intrinsically low dimensional manifold representations. However, explicitly approximating the manifold is highly nontrivial. Thus, a key question remains:

How can we guarantee reconstructed signal estimates are on the manifold of true signals?

This guarantee could be ensured, in theory, using the indicator function $J(u) = \delta_{\mathcal{M}}(u)$, which acts as a hard constraint by taking value zero on the manifold $\mathcal{M}$ and infinity elsewhere. However, the practical concern is that optimization algorithms for solving the variational problem with an indicator function typically require the ability to compute a projection $P_{\mathcal{M}}$ onto the manifold. (See the figure below.)

manifold-projection
Signal estimates (blue) and projections onto 2D manifold $\mathcal{M}$ (red) in 3D space

Core Contribution

Our core result is to demonstrate how unsupervised learning2 can be used to construct an operator (adversarial projection) that projects signal estimates onto the underlying low dimensional manifold of true data. Given samples of true and noisy signals, we prove our method converges in probability to the desired projection of the noisy signals onto the manifold. This can then be applied during projection steps of optimization algorithms. As is common when solving subproblems inside algorithms, a finite number of iterations (typically 10-20) of this method can be used to approximate the projection $P_{\mathcal{M}}$.

  1. True signals share features that can be represented by a low dimensional manifold $\mathcal{M}$, making the ideal regularizer the indicator function $\delta_{\mathcal{M}}(u)$.
  2. Optimization algorithms with $\delta_{\mathcal{M}}(u)$ require computing projection $P_{\mathcal{M}}(u)$, which can be approximated by our adversarial projection algorithm.

Manifolds and Projections

Applying the first core idea, we assume the true data can be represented by a sufficiently “nice” subset $\mathcal{M}$ of $\mathbb{R}^n$. The second core idea addresses the task of finding the closest point in the manifold $\mathcal{M}$ to a provided signal $\tilde{u}$, which is precisely the projection defined by

\[P_{\mathcal{M}}(u) := \underset{v\in\mathcal{M}}{\mbox{argmin}}\ \| v - u\|.\]

In most practical settings, we do not have direct access to the manifold $\mathcal{M}$ to determine this projection. However, we can indirectly form a variation of the projection using the pointwise distance function

\[d_{\mathcal{M}}(u) := \min_{v\in\mathcal{M}} \| v - u\| = \| P_{\mathcal{M}}(u) - u \|.\]

If $0 \in \partial d_{\mathcal{M}}(u)$, then $P_{\mathcal{M}}(u) = u$. Otherwise, for $\lambda\in\mathbb{R}$,

\[u + \dfrac{\lambda}{d_{\mathcal{M}}(u)} \left( P_{\mathcal{M}}(u) - u\right) = u - \lambda \nabla d_{\mathcal{M}}(u).\]

In our experiments, we use numerical differentiation and so, for practical purposes, we can just assume the above equation always holds.

The left hand side of the above equation is known as a relaxed-projection onto the manifold $\mathcal{M}$, and the right hand side is the result of a single gradient descent step.
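As a sanity check of this identity, the snippet below compares the two sides on a toy manifold (the unit circle in $\mathbb{R}^2$, where $P_{\mathcal{M}}$ and $d_{\mathcal{M}}$ have closed forms) with the gradient computed numerically, as in our experiments. This is our own illustration rather than code from the paper.

```python
import numpy as np

# Toy manifold: the unit circle in R^2, where the projection and distance function
# have closed forms (these choices are only for illustration).
P_M = lambda u: u / np.linalg.norm(u)
d_M = lambda u: abs(np.linalg.norm(u) - 1.0)

def numerical_grad(f, u, h=1e-6):
    """Central-difference gradient of a scalar function f at u."""
    g = np.zeros_like(u)
    for i in range(u.size):
        e = np.zeros_like(u); e[i] = h
        g[i] = (f(u + e) - f(u - e)) / (2 * h)
    return g

u, lam = np.array([2.0, 1.0]), 0.3
relaxed_projection = u + (lam / d_M(u)) * (P_M(u) - u)            # left-hand side
gradient_step = u - lam * numerical_grad(d_M, u)                  # right-hand side
print(np.allclose(relaxed_projection, gradient_step, atol=1e-5))  # True
```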

Loosely speaking, if we assume that our available measurement data is not “too noisy” and sufficiently representative (i.e., enough samples), then the pointwise distance function $d_{\mathcal{M}}$ is a solution to the problem

\[\min_{f}\ \mathbb{E}_{u \sim \mathbb{P}^{\mathrm{true}}} \left[ f(u) + \tau\cdot f(u)^p \right] - \mathbb{E}_{u^k\sim \mathbb{P}^k } \left[ f(u^k)\right],\]

where $\mathbb{P}^{\mathrm{true}}$ is the distribution of true signals and $\mathbb{P}^k$ is the distribution of the estimates $u^k$ at the $k$-th step. Note we minimize over all functions $f$ that are nonnegative and 1-Lipschitz. Thus, we may estimate the projection $P_{\mathcal{M}}$ by obtaining an estimate of $d_{\mathcal{M}}$ from the above problem and then successively forming relaxed projections (i.e., taking gradient descent steps).
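Below is a minimal PyTorch sketch of how one might fit a network $J_\theta \approx d_{\mathcal{M}}$ from samples using a loss of this form. The toy samplers, the softplus output (as a stand-in for the nonnegativity constraint), and the gradient penalty (as a stand-in for the 1-Lipschitz constraint) are our illustrative choices here; see the preprint and linked code for the actual training setup.

```python
import math
import torch
import torch.nn as nn

# Illustrative network: softplus output keeps J_theta nonnegative.
J_theta = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())
opt = torch.optim.Adam(J_theta.parameters(), lr=1e-3)
tau, p, gp_weight = 0.1, 2, 10.0

def sample_true(n):                      # toy P^true: points on the unit circle
    t = 2 * math.pi * torch.rand(n, 1)
    return torch.cat([t.cos(), t.sin()], dim=1)

def sample_estimates(n):                 # toy P^k: noisy, off-manifold estimates
    return sample_true(n) + 0.3 * torch.randn(n, 2)

def gradient_penalty(net, x):
    """Penalty encouraging ||grad J_theta|| <= 1 (an approximate 1-Lipschitz constraint)."""
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(net(x).sum(), x, create_graph=True)
    return ((grad.norm(dim=1) - 1.0).clamp(min=0) ** 2).mean()

for _ in range(2000):
    u_true, u_k = sample_true(64), sample_estimates(64)
    loss = (J_theta(u_true) + tau * J_theta(u_true) ** p).mean() \
           - J_theta(u_k).mean() + gp_weight * gradient_penalty(J_theta, u_k)
    opt.zero_grad(); loss.backward(); opt.step()
```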

Adversarial Projection Algorithm

High Level Formula

Given a point $z$, this algorithm works by generating a sequence $\lbrace u^k\rbrace$ with $u^1 = z$ such that

\[\displaystyle \lim_{k\rightarrow\infty} u^k = P_{\mathcal{M}}(z).\]

For step size5 $\alpha_k$, the formula for each update is

\[u^{k+1} = \dfrac{1}{k} z + \dfrac{k-1}{k}\left( (1-\alpha_k) u^k + \alpha_k P_{\mathcal{M}}(u^k) \right).\]

The terms inside the rightmost parentheses yield a point along the line segment between $u^k$ and its projection $P_{\mathcal{M}}(u^k)$. Because we can only obtain an approximation of $P_{\mathcal{M}}$, we let $\alpha_k$ be small so that we take small steps. The idea here is that several small steps in roughly the right direction, which are insensitive to errors, are better than a single large step with substantial error. And, to ensure that the noise in our projections doesn't make $\lbrace u^k\rbrace$ converge to a random point in the manifold $\mathcal{M}$, we average the terms on the right-hand side with the initial iterate $u^1=z$. This acts like an anchor, pulling us toward the point in $\mathcal{M}$ that is closest to $z$.
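The toy sketch below runs this anchored update on the unit circle, where the exact projection is available, and corrupts each projection with noise to mimic the approximation error discussed above (again, our own illustration rather than the paper's code).

```python
import numpy as np

# Anchored update on the unit circle, where the exact projection P_M(u) = u/||u|| is
# known. Each projection is corrupted with noise to mimic the practical setting where
# only an approximate projection is available.
rng = np.random.default_rng(0)
P_M = lambda u: u / np.linalg.norm(u)
noisy_P_M = lambda u: P_M(u) + 0.02 * rng.standard_normal(2)

z = np.array([2.0, 1.0])
u, alpha = z.copy(), 0.1
for k in range(1, 501):
    u = (1.0 / k) * z + ((k - 1.0) / k) * ((1 - alpha) * u + alpha * noisy_P_M(u))

print(u, P_M(z))   # u lands close to the true projection z/||z||, despite the noise
```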

Implementation

Given an initial estimate $u^1$ (e.g., from a pseudo-inverse $A^\dagger d$ or a variational method), we simply use the step sizes $\lambda_k$ and distance estimates $J_{\theta^k}$ learned during training to compute $u^K$ via the update formula

\[u^{k+1} \leftarrow \dfrac{1}{k} u^1 + \dfrac{k-1}{k} \left( u^k - \lambda_k \nabla J_{\theta^k}(u^k)\right).\]
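A minimal PyTorch sketch of this inference loop is below, assuming the trained distance estimates are supplied as a list of networks and the learned step sizes as a list of numbers (the function and variable names are ours, for illustration only).

```python
import torch

def adversarial_projection(u1, J_list, lam_list):
    """Anchored updates u^{k+1} = (1/k) u^1 + ((k-1)/k)(u^k - lam_k * grad J_theta^k(u^k))."""
    u = u1.clone()
    for k, (J, lam) in enumerate(zip(J_list, lam_list), start=1):
        v = u.clone().requires_grad_(True)
        grad, = torch.autograd.grad(J(v).sum(), v)   # gradient of the learned distance estimate
        u = (1.0 / k) * u1 + ((k - 1.0) / k) * (u - lam * grad)
    return u

# Example (reusing the J_theta sketched earlier for every step, purely for illustration):
# u_K = adversarial_projection(torch.tensor([1.5, 0.5]), [J_theta] * 20, [0.1] * 20)
```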

Numerical Illustrations

Below we provide a few numerical examples of our method. (More pictures and details are provided in the preprint.)

Toy Example

The first example is used to illustrate different pieces of this work in 2D.

toy-illustration
2D Toy Problem training setup and inference example. a) Training uses a uniform distribution $\mathbb{P}^1$ and manifold sampling distribution $\mathbb{P}^{\mathrm{true}}$. b) Trajectories $\{z^t\}$ for solving the variational problem are shown with analytic projections (green) and adversarial projections (blue). The analytic method uses full knowledge of the manifold while the adversarial method uses only a finite set of samples. The feasible set is all $z$ such that $Az = d$. The analytic method yields discrete points, but these are joined via a curve for illustrative purposes. c) Estimate of the distance function $d_{\mathcal{M}}$ from training. All plots are over the region $[0.0,3.0] \times [-0.5,2.5]$. We welcome readers to check out and run the code used to generate this figure, which can be run online via this notebook on Google Colab.

CT Reconstructions

Below we perform CT reconstructions in two settings. Each problem follows the form presented at the beginning of this blog. The training phase is used to determine the step size $\lambda_k$ and an estimate of $d_{\mathcal{M}}$ at each step $k$ of the algorithm. In the first example, the training data consists of ellipse images (more complicated versions of the classic Shepp-Logan phantom). The second example uses the LoDoPaB data set, consisting of real-world CT images. Two traditional methods are given for reference (FBP and TV). To allow for an apples-to-apples comparison, TV regularized solutions were used as the initial iterate $u^1$ for both the Adversarial Regularizer and Adversarial Projection methods. We also emphasize here that, unlike Adversarial Regularizers, our projection did not use the measurement data at all! By simply having knowledge of the manifold $\mathcal{M}$, the projection operator inferred finer details of the blurry TV image.

ellipse-CT
Reconstruction on a validation sample obtained with Filtered Back Projection (FBP) method, TV regularization, Adversarial Regularizer, and Adversarial Projections (left to right). Bottom row shows expanded version of corresponding cropped region indicated by red box.
Chest CT with zoom in on lung.
Reconstruction on a validation sample from LoDoPab data set obtained with Filtered Back Projection (FBP) method, TV regularization, Adversarial Regularizer, and Adversarial Projections (left to right). Bottom row shows expanded version of corresponding cropped region indicated by red box.

Looking Forward

This blog overviews a new way to perform projections. Namely, given a collection of samples, we can now form projections onto the low dimensional manifold containing those samples (even when the samples don’t entirely cover the manifold). We have demonstrated these projections with CT images, taking a “good” reconstruction and making it “better” by simply projecting the image onto the manifold. This work also carries natural extensions to several other data-driven algorithms (e.g., for variational schemes) to be shared in detail in an upcoming post.

For further details about this work, check out the preprint, code, and slides (linked above). You may also email the authors or comment below (via a GitHub account).

Footnotes

* Equal contribution

  1. In this post, we use “signal” and “point” interchangeably to describe the mathematical representation of the objects we are working with. Signal is used here to convey shared information and point is used to indicate that the object can be represented in a Euclidean space.

  2. Here unsupervised means that a direct correspondence between noisy signal estimate data and true signal data is not needed (e.g., we may have a different number of samples from each data set).

  3. We keep to these assumptions in this post for simplicity even though the results hold under weaker assumptions.

  4. The solution is not unique, but will work for our purposes as long as we evaluate $d_{\mathcal{M}}$ at points $u(\omega)$ with $\omega\in\Omega$.

  5. Below we use step size $\lambda_k$ for our implementation, which is connected to $\alpha_k$ via the relation $\lambda_k = \alpha_k\, d_{\mathcal{M}}(u^k)$.
