The primary area of research is optimization, with a focus on developing fast and memory-efficient iterative fixed point methods for large-scale computing problems. This includes specialty interests in parallel asynchronous methods and “learning to optimize”. In the latter subject, machine learning is incorporated into algorithmic development for situations where an individual problem must be repeatedly solved, each time with new (but similar) data. Below is a brief overview of the setting of fixed point problems. In this context, summaries are then provided of asynchronous and learned methods. Following this, a brief note is made about variational methods in image processing (an application problem).

*(Published research papers and slides are available at the bottom of this page.)*

Let \(T:\mathcal{H}\rightarrow\mathcal{H}\) be an operator on a Hilbert space \(\mathcal{H}\) with a nonempty fixed point set \( \textrm{fix}(T) \). Many optimization problems may be abstractly written in the form of a fixed point problem

\[ \textrm{Find} \ x^\star \in \textrm{fix}(T). \]

For example, see the operators used for Forward-Backward Splitting (FBS) and Douglas-Rachford Splitting (DRS). Iterative methods for solving this type of problem often use a sequence of operators \( \lbrace T_k \rbrace_{k=1}^\infty \) whose fixed point sets have a common intersection equal to \( \textrm{fix}(T) \). Such methods define a sequence of points \(\lbrace x^k \rbrace_{k=1}^\infty \) with an arbitrary initial point \(x^1\) and updates computed via \[ x^{k+1} := T_k \left( x^k \right), \ \ \forall \ k \geq 1. \] The main result is typically to show that the chosen sequence of operators and update iteration yield \[ \lim_{k\rightarrow\infty} x^k = x^\star \in \textrm{fix}(T), \] noting whether the convergence is weak or strong. Secondarily, results are often sought regarding the rate at which \(x^k\longrightarrow x^\star\). Asynchronous methods and “learned” methods are being developed to further improve the convergence of such fixed point methods, both for massive-scale problems and for tasks that must be completed in real time.

For other examples of operators and fixed point problems, we refer the reader to ADMM and to the method of alternating projections.
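To make the abstraction concrete, the following sketch runs the fixed point iteration \(x^{k+1} := T(x^k)\) for a simple illustrative choice of \(T\): the gradient-descent operator of a least-squares problem, whose fixed point set is exactly the solution set. The matrix, vector, and step size here are arbitrary choices for the demonstration, not taken from the text.

```python
import numpy as np

# Illustrative fixed point iteration: T(x) = x - alpha * A^T (A x - b) is
# the gradient-descent operator for min 0.5||Ax - b||^2, so fix(T) is the
# set of least-squares solutions. A, b, and alpha are made-up sample data.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
alpha = 1.0 / np.linalg.norm(A.T @ A, 2)   # step size making T averaged

T = lambda x: x - alpha * A.T @ (A @ x - b)

x = np.zeros(5)                # arbitrary initial point x^1
for k in range(1000):
    x = T(x)                   # x^{k+1} := T_k(x^k) with constant T_k = T

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
residual = np.linalg.norm(x - x_star)      # measures x^k -> x* in fix(T)
```

Since this \(T\) is a contraction on the relevant subspace, the convergence here is linear; more general averaged operators give the weak convergence noted above.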

Asynchronous algorithms are of practical interest primarily because they open the door to greater utilization of processing power than analogous synchronous algorithms, particularly as the number of computing agents (e.g., GPU cores) increases. Let \(\tau\) be a nonnegative integer identifying how out-of-date (a.k.a. “stale”) the information utilized by processors may be. For synchronous methods \(\tau = 0\), while asynchronous methods assume \(\tau > 0\). Asynchronous methods replace each component \(x_i^k\) of the vector \(x^k\) in the update above with \( {\hat{x}^k_i} = x_i^{k-j} \), where \(j \in \lbrace 0, 1, 2, \ldots, \tau \rbrace \). This means the vector \( \hat{x}^k\) may be out-of-date. The particular value of each \(j\) at individual iteration steps need not be known; it suffices to know that \(j \leq \tau\) for all \(k\). In mathematical terms, this may be written as
\[ x^{k+1} := x^k - \lambda_k S_k \left( \hat{x}^k \right), \ \ \forall \ k \geq 1, \]
where each \(\lambda_k\) gives a step-size and each operator \(S_k\) is related to \(T_k\) above.
In practice, this enables each processor to perform its computations at its own speed rather than remain idle waiting for the slowest processor to catch up, and it also introduces robustness to dropped network transmissions. Under appropriate assumptions on the delays and step sizes, the asynchronous updates generate a sequence that converges to a solution \( x^\star \). Moreover, in some cases, asynchronous algorithms can require *fewer* iterations to converge than their synchronous counterparts.
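The stale-read mechanism can be simulated in a few lines. The sketch below (an illustration, not any specific published scheme) applies the update \(x^{k+1} := x^k - \lambda\, S(\hat{x}^k)\), where each coordinate of \(\hat{x}^k\) is read from an iterate at most \(\tau\) steps old; the operator \(S\), the data, and the conservative step size are all made-up choices.

```python
import numpy as np

# Simulated bounded-delay updates: each coordinate of hat{x}^k may come
# from a different past iterate x^{k-j} with j <= tau. All problem data
# (A, b) and the step size are illustrative, not from the text.
rng = np.random.default_rng(1)
n, tau = 5, 3
A = rng.standard_normal((20, n))
b = rng.standard_normal(20)
S = lambda x: A.T @ (A @ x - b)                       # zero of S <=> least-squares solution
lam = 1.0 / ((tau + 1) * np.linalg.norm(A.T @ A, 2))  # conservative step for stale reads

x = np.zeros(n)
history = [x.copy()]                                  # past iterates visible to "processors"
for k in range(3000):
    # read each coordinate from a (possibly different) stale iterate, j <= tau
    x_hat = np.array([history[-1 - rng.integers(0, min(tau + 1, len(history)))][i]
                      for i in range(n)])
    x = x - lam * S(x_hat)                            # x^{k+1} := x^k - lambda * S(hat{x}^k)
    history.append(x.copy())

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
err = np.linalg.norm(x - x_star)
```

The step size is shrunk by a factor of roughly \(\tau + 1\), reflecting the general pattern that larger allowed delays call for smaller (or more carefully chosen) steps.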

The term “machine learning” is a common buzz phrase nowadays. Although there are many wonderful applications and interesting underlying training algorithms, this section is aimed at the portion of the field that falls under the category of “learning to optimize”. This is a small but growing field of research that focuses on the automatic development of task-specific algorithms that must be repeatedly executed with new (but similar) data. An example of this in the context of reinforcement learning is described by Li and Malik. A distinct method, LISTA, by Gregor and LeCun created a network by unfolding an iterative procedure, truncating it to a fixed number of iterations, and allowing its parameters to be learned. Their work was applied to a specific problem and was followed by the recent ALISTA method of Liu et al., which refined LISTA by using fewer and simpler learned parameters, thereby simplifying and speeding up the training for solving the associated machine learning problem. The underlying idea in these last two works is that in some applications a collection of data is available along with standard general-purpose algorithms for solving a certain type of problem. A machine learning problem can then be formulated by relaxing parameters in the standard algorithm (truncated to a fixed number of iterations) to be learned, and then minimizing a loss function associated with each output of this parameterized algorithm when applied to the available data. The result in these works is a learned algorithm that executes much more rapidly than the original standard algorithm.
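The unfolding idea can be illustrated schematically. The sketch below (not Gregor and LeCun’s exact architecture) truncates ISTA for \(\min_x \tfrac{1}{2}\|Ax-b\|^2 + \mu\|x\|_1\) to \(K\) layers, with each layer’s matrices and threshold treated as parameters that training could then adjust; here they are simply initialized at the classical ISTA values, and all data are illustrative.

```python
import numpy as np

# Schematic unrolled-ISTA forward pass: K layers, each mimicking one step
#   x <- soft_threshold(W1 @ b + W2 @ x, theta),
# where (W1, W2, theta) per layer are the parameters to be learned.
def soft_threshold(z, theta):
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def unrolled_ista(b, params):
    """Forward pass through the truncated, parameterized iteration."""
    x = np.zeros(params[0][1].shape[0])
    for W1, W2, theta in params:
        x = soft_threshold(W1 @ b + W2 @ x, theta)
    return x

rng = np.random.default_rng(2)
m, n, K = 10, 20, 16
A = rng.standard_normal((m, n))        # made-up dictionary
alpha = 1.0 / np.linalg.norm(A.T @ A, 2)
mu = 0.1                               # illustrative sparsity weight
# initialize every layer at the classical ISTA values; training perturbs these
params = [(alpha * A.T, np.eye(n) - alpha * A.T @ A, alpha * mu)] * K
b = rng.standard_normal(m)
x_net = unrolled_ista(b, params)
```

At this initialization the network reproduces \(K\) steps of plain ISTA exactly; the point of learning is that trained parameters can reach comparable accuracy in far fewer layers.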

Let \(\mathcal{H}\) be a Hilbert space of functions \(u:\Omega\rightarrow\mathbb{R}\), where \(\Omega\subset\mathbb{R}^n\). In the context of images, \(u\) is often used rather than \(x\). A variational method defines an energy functional \(J(u)\) whose minimizers possess desirable properties, making the underlying problem \[ \min_{u\in\mathcal{H}} J(u). \] For example, if \(f\) is a blurry and grainy image, the classic Rudin-Osher-Fatemi (ROF) model proposes denoising by minimizing the energy \[ J(u) := \int_\Omega |\nabla u | \ dx + \lambda \int_\Omega |u-f|^2 \ dx. \] Here \(\nabla\) denotes the gradient and \(\lambda > 0\) is a regularization parameter. Many image processing models exist for problems in differing contexts and applications. Nonlocal variational methods, which generalize local operators (e.g., gradients), have also been demonstrated to be quite effective in this field. Several algorithms exist for solving these variational problems, often using fixed point methods.
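A minimal sketch of the ROF idea on a 1D signal follows. It is an illustration rather than a standard solver: it runs plain gradient descent on a smoothed version of \(J(u)\), where a small \(\varepsilon\) rounds off the kink in \(|\nabla u|\); the discretization and all parameters are ad hoc choices.

```python
import numpy as np

# 1D ROF-type denoising by gradient descent on the smoothed energy
#   J_eps(u) = sum_j sqrt((u_{j+1}-u_j)^2 + eps^2) + lam * sum_i (u_i - f_i)^2.
def rof_denoise_1d(f, lam=2.0, eps=1e-2, step=1e-3, iters=3000):
    u = f.copy()
    for _ in range(iters):
        du = np.diff(u)                             # forward differences ~ grad u
        w = du / np.sqrt(du**2 + eps**2)            # derivative of smoothed |du|
        grad_tv = np.concatenate(([-w[0]], w[:-1] - w[1:], [w[-1]]))
        u -= step * (grad_tv + 2.0 * lam * (u - f))  # descend on J_eps(u)
    return u

rng = np.random.default_rng(4)
clean = np.concatenate([np.zeros(50), np.ones(50)])  # piecewise-constant signal
f = clean + 0.1 * rng.standard_normal(100)           # noisy observation
u = rof_denoise_1d(f)
```

The total variation term rewards piecewise-constant output, so the noise in the flat regions is smoothed while the jump is largely preserved; in practice, the fixed point splitting methods discussed above are preferred over this naive descent.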

Below is a paper that presents the asynchronous sequential inertial (ASI) algorithm, which weaves together convex feasibility problems and asynchronous methods in a simple manner while also introducing inertial terms. Additionally, there is a paper on superiorization, a heuristic methodology that lies somewhere between the feasibility-seeking and constrained minimization branches of optimization. This methodology provides a general framework, shown in many papers to be quite efficacious, in which the iterates of a feasibility-seeking algorithm are perturbed to reduce a given cost function \(\phi\). Although the superiorization method generates a sequence \(\lbrace x^k \rbrace_{k=1}^\infty \) that provably converges to a feasible point, limited results are known regarding the cost function values \(\phi(x^k)\) as \(k\longrightarrow \infty\). See Professor Yair Censor’s webpage for a comprehensive bibliography of papers on this topic. More results on the “learning to optimize” subject will be shared later this year.
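The generic superiorization pattern can be sketched as follows (this is an illustration of the methodology, not the ASI algorithm or either paper’s exact scheme): interlace feasibility-seeking projections onto half-spaces \(C_i = \lbrace x : \langle a_i, x\rangle \leq b_i \rbrace\) with summable perturbations that are nonascending for a cost \(\phi\). All problem data below are made up.

```python
import numpy as np

# Generic superiorization sketch: perturb toward lower phi, then apply a
# sweep of sequential half-space projections (the feasibility-seeking step).
rng = np.random.default_rng(3)
A = rng.standard_normal((8, 3))
b = A @ rng.standard_normal(3) + 1.0          # consistent system with interior
phi = lambda x: 0.5 * x @ x                   # illustrative cost to superiorize over
grad_phi = lambda x: x

def project(x, a, bi):
    """Metric projection onto the half-space {x : <a, x> <= bi}."""
    r = a @ x - bi
    return x if r <= 0 else x - (r / (a @ a)) * a

x = rng.standard_normal(3)
beta = 1.0
for k in range(300):
    g = grad_phi(x)
    if np.linalg.norm(g) > 1e-12:             # perturbation: nonascending step for phi
        x = x - beta * g / np.linalg.norm(g)
    beta *= 0.9                               # summable sizes keep feasibility-seeking intact
    for a, bi in zip(A, b):                   # one sweep of sequential projections
        x = project(x, a, bi)

feas_residual = np.max(A @ x - b)             # feasibility violation of the final iterate
```

Because the perturbation sizes are summable, the limit is still a feasible point; as noted above, what can be said about \(\phi(x^k)\) along the way remains a largely open question.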

H. Heaton and Y. Censor, Asynchronous Sequential Inertial Iterations for Common Fixed Points Problems with an Application to Linear Systems.

Journal of Global Optimization, Published February 14, 2019. DOI:10.1007/s10898-019-00747-4.

Y. Censor, H. Heaton, and R.W. Schulte, Derivative-free superiorization with component-wise perturbations.

Numerical Algorithms, Published April 11, 2018. DOI:10.1007/s11075-018-0524-0.