Research direction

Optimization for Machine Learning

Second-order and higher-order optimization methods for scalable learning systems, with emphasis on curvature, quasi-Newton structure, variance reduction, distributed training, and theory.

Second-order methods · Quasi-Newton methods · SARAH · Variance reduction · Cubic regularization · Distributed training · Large-scale ML

Focus

This direction studies curvature-aware algorithms that make modern machine learning more efficient, reliable, and theoretically grounded. The work spans Newton and quasi-Newton methods, cubic regularization, variance-reduced stochastic methods, higher-order methods, Hessian sketches, and communication-efficient second-order algorithms.

Typical Questions

When does curvature information give real gains over first-order training?
How can Newton, cubic Newton, and quasi-Newton methods scale to large models?
How can stochastic recursive gradients reduce variance without expensive full gradients?
How should second-order information be compressed, sketched, or distributed?
Can we obtain clean global rates while preserving practical implementability?

Selected papers

Second-Order Optimization Highlights

3 papers

Decentralized agents exchanging local gradients and Hessians before a consensus cubic Newton step

2026 · Distributed Second-Order Optimization

Decentralized Inexact Cubic Newton Method with Consensus Procedure

Studies decentralized second-order optimization with only neighbor-to-neighbor communication.
Tracks errors from consensus and disagreement among local iterates, gradients, and Hessians.
Matches the iteration complexity of exact Cubic Newton up to a polylogarithmic overhead in communication.
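
For orientation, the single-node cubic Newton step that the decentralized method is compared against can be sketched as follows. This is a minimal illustration and not the paper's algorithm: it assumes a symmetric positive semidefinite Hessian H, a known cubic constant M > 0, and exact local information with no consensus errors. The function name cubic_newton_step is hypothetical.

```python
import numpy as np

def cubic_newton_step(g, H, M, tol=1e-10, max_iter=200):
    """Minimize g@s + 0.5*s@H@s + (M/6)*||s||^3, assuming H is symmetric PSD and M > 0."""
    n = g.shape[0]
    if np.linalg.norm(g) == 0.0:
        return np.zeros(n)
    I = np.eye(n)

    def step_norm(r):
        # Stationarity gives s(r) = -(H + (M*r/2) I)^{-1} g for a trial radius r.
        return np.linalg.norm(np.linalg.solve(H + 0.5 * M * r * I, g))

    # Find the radius r with ||s(r)|| = r; step_norm(r) - r is decreasing in r,
    # so a bracket followed by bisection is enough.
    lo, hi = 0.0, 1.0
    while step_norm(hi) > hi:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return -np.linalg.solve(H + 0.5 * M * hi * I, g)
```

The returned step satisfies the first-order condition g + H s + (M/2)‖s‖ s = 0 up to the bisection tolerance. A decentralized variant would replace g and H with inexact, consensus-averaged estimates.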

Newton convergence curves comparing stepsize schedules and linesearch rates

2026 · Newton Method / Global Rates

Newton Method Revisited: Global Convergence Rates up to O(1/k³) for Stepsize Schedules and Linesearch Procedures

Revisits Newton steps with stepsize schedules and linesearch procedures.
Establishes global convergence rates of up to O(1/k³) for classical second-order methods.
Clarifies when simple Newton-style algorithms can be both stable and fast.
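
A Newton step combined with a linesearch can be sketched as below. This is a generic textbook-style illustration using Armijo backtracking, not the specific stepsize schedules or linesearch procedures analyzed in the paper. It assumes the Hessian is positive definite at every iterate; the function name and the toy objective are hypothetical.

```python
import numpy as np

def newton_backtracking(f, grad, hess, x0, c=1e-4, beta=0.5, tol=1e-8, max_iter=100):
    """Newton's method with Armijo backtracking; assumes hess(x) is positive definite."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = -np.linalg.solve(hess(x), g)   # Newton direction
        slope = g @ d                      # directional derivative, negative here
        t, fx = 1.0, f(x)
        # Shrink the step until the Armijo sufficient-decrease test passes.
        while t > 1e-12 and f(x + t * d) > fx + c * t * slope:
            t *= beta
        x = x + t * d
    return x

# Toy problem: f(x) = sum(exp(x_i) - b_i * x_i), minimized at x_i = log(b_i).
b = np.array([2.0, 0.5])
x_star = newton_backtracking(
    f=lambda x: np.sum(np.exp(x) - b * x),
    grad=lambda x: np.exp(x) - b,
    hess=lambda x: np.diag(np.exp(x)),
    x0=np.array([3.0, -3.0]),
)
```

From the starting point above, a full Newton step overshoots, so the backtracking loop is what keeps the first iterations stable before fast local convergence takes over.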

Curvature pairs forming a quasi-Newton Hessian approximation for an adaptive cubic step

2026 · Cubic Regularized Quasi-Newton

Accelerated Adaptive Cubic Regularized Quasi-Newton Methods

Combines adaptive cubic regularization with quasi-Newton curvature approximation.
Targets second-order behavior while avoiding full Hessian computation.
Connects implementable curvature approximations with convergence theory.

Variance reduction

SARAH and AI-SARAH Line

4 papers

AI-SARAH recursive gradient estimate using local geometry to choose an implicit adaptive stepsize

2023 · TMLR / Variance Reduction

AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods

Extends SARAH with an adaptive, implicit approach to stepsize selection.
Uses local geometry and smoothness estimates to reduce hyperparameter tuning.
Shows faster convergence when the local geometry permits larger progress.

SARAH stochastic recursive gradient update using fresh samples and the previous gradient estimator

2017 · ICML / SARAH

SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient

Introduces the stochastic recursive gradient estimator for finite-sum optimization.
Avoids storing past gradients while retaining variance-reduction benefits.
Establishes linear convergence in the strongly convex setting.
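
The recursive estimator can be sketched on a least-squares problem as follows. This is a minimal illustration under stated assumptions: the objective is f(w) = (1/2n)‖Aw − b‖², each outer loop restarts from the last inner iterate (a common practical choice rather than a randomly selected one), and the function name sarah and its parameters are hypothetical.

```python
import numpy as np

def sarah(A, b, eta, n_outer=20, n_inner=None, seed=0):
    """SARAH sketch for f(w) = (1/2n) * ||A w - b||^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = n_inner or n
    w = np.zeros(d)

    def grad_i(w, i):
        # Gradient of the i-th term 0.5 * (a_i @ w - b_i)^2.
        return A[i] * (A[i] @ w - b[i])

    for _ in range(n_outer):
        v = A.T @ (A @ w - b) / n          # full gradient at the start of the outer loop
        w_prev, w = w, w - eta * v
        for _ in range(m - 1):
            i = rng.integers(n)
            # Recursive estimator: only the previous estimate v is kept,
            # so no table of past per-sample gradients is stored.
            v = grad_i(w, i) - grad_i(w_prev, i) + v
            w_prev, w = w, w - eta * v
    return w
```

Unlike SVRG, the correction term is evaluated at the previous iterate rather than at a fixed snapshot, which is what makes the estimator recursive.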

Inexact SARAH replacing exact full gradients with mini-batch stochastic estimates

2020 · Optimization Methods and Software

Inexact SARAH Algorithm for Stochastic Optimization

Removes the need for an exact outer-loop gradient in SARAH.
Uses mini-batch estimates with variance-reduced recursive updates.
Extends the method beyond finite sums to general stochastic expectation problems.

Random reshuffling across an epoch aggregating stochastic gradients instead of computing a full gradient

2023 · Optimization Letters

Random-reshuffled SARAH does not need full gradient computations

Uses random reshuffling and epoch aggregation to estimate the full gradient.
Reduces the expensive full-gradient step in SARAH-style variance reduction.
Provides theory and experiments for the reshuffled stochastic estimator.

Quasi-Newton methods

Scalable Curvature Approximations

4 papers

Federated clients sending compressed quasi-Newton updates with error feedback corrections

2025 · OPT @ NeurIPS / Federated Learning

Quasi-Newton Methods for Federated Learning with Error Feedback

Combines L-BFGS-style curvature with the EF21 error-feedback mechanism.
Targets communication-efficient federated learning under biased compression.
Obtains O(1/T) nonconvex convergence and linear rates under PL structure.
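
The error-feedback mechanism alone can be sketched as below. This illustration pairs EF21 with a plain gradient step and a Top-k compressor; it omits the L-BFGS curvature that the paper adds, so it should not be read as the paper's method. The names top_k and ef21_gd, and the choice to initialize with exact local gradients, are assumptions made for the example.

```python
import numpy as np

def top_k(v, k):
    """Biased Top-k compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21_gd(grads, x0, gamma, k, n_steps):
    """EF21 with a gradient step; grads is a list of per-client gradient functions."""
    x = np.asarray(x0, dtype=float)
    g_local = [grad(x) for grad in grads]   # each client's running gradient estimate
    g = np.mean(g_local, axis=0)            # server-side aggregate
    for _ in range(n_steps):
        x = x - gamma * g
        # Each client compresses the gap between its fresh gradient and its estimate.
        msgs = [top_k(grad(x) - gi, k) for grad, gi in zip(grads, g_local)]
        g_local = [gi + c for gi, c in zip(g_local, msgs)]
        g = g + np.mean(msgs, axis=0)       # server applies the same corrections
    return x
```

Only the compressed messages travel over the network; the server and clients stay in sync because both apply the same corrections to their estimates.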

A quasi-Newton approximation paired with a simple stepsize schedule and global convergence curves

2025 · Quasi-Newton Theory

Simple Stepsize for Quasi-Newton Methods with Global Convergence Guarantees

Introduces a simple stepsize schedule for globally convergent quasi-Newton methods.
Gives O(1/k) rates for convex objectives and accelerated O(1/k²) rates under controlled approximation error.
Adds an adaptive variant that adjusts to curvature while retaining global guarantees.

Fresh local samples around an iterate forming sampled L-BFGS and sampled LSR1 curvature approximations

2021 · Optimization Methods and Software

Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample

Develops sampled L-BFGS and sampled LSR1 methods for deep learning.
Builds curvature approximations from fresh local samples instead of stale iterate history.
Designed for parallel and distributed training environments.
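
The idea of building curvature pairs from fresh samples can be sketched as below. This is an illustration under assumptions and may differ from the paper's exact construction: pairs are formed from gradient differences at random points on a sphere around the current iterate, then used in the standard L-BFGS two-loop recursion. The names sample_pairs and two_loop, and the radius parameter, are hypothetical.

```python
import numpy as np

def sample_pairs(grad, w, m, radius, rng):
    """Build up to m curvature pairs (s, y) from random points around w."""
    g = grad(w)
    pairs = []
    for _ in range(m):
        s = rng.standard_normal(w.shape[0])
        s *= radius / np.linalg.norm(s)
        y = grad(w + s) - g
        if s @ y > 1e-10 * (s @ s):        # keep only pairs with positive curvature
            pairs.append((s, y))
    return pairs

def two_loop(g, pairs):
    """Return H @ g, where H is the L-BFGS inverse-Hessian approximation from pairs."""
    q = g.copy()
    alphas = []
    for s, y in reversed(pairs):           # newest pair first
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if pairs:
        s, y = pairs[-1]
        q *= (s @ y) / (y @ y)             # standard initial scaling H0 = (s@y / y@y) I
    for (s, y), a in zip(pairs, reversed(alphas)):   # oldest pair first
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return q
```

Because the pairs depend only on the current iterate, they can be computed independently, which is one reason sampling suits parallel settings. A search direction would be -two_loop(grad(w), pairs).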

Distributed workers sending compact sampled SR1 curvature sketches to a matrix-free optimizer

2020 · Distributed Quasi-Newton

Scaling Up Quasi-Newton Algorithms: Communication Efficient Distributed SR1

Scales sampled LSR1 through a communication-efficient distributed implementation.
Reduces communication rounds and keeps the method matrix-free and inverse-free.
Targets neural-network training tasks with better workload balance across nodes.

Martin Takáč

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
