Foundations of Computational Mathematics, volume 18, issue 4, pages 971-1013

Optimal Rates for Regularization of Statistical Inverse Learning Problems

Publication type: Journal Article
Publication date: 2017-06-20
scimago Q1
SJR: 2.546
CiteScore: 6.9
Impact factor: 2.5
ISSN: 1615-3375, 1615-3383
Subject areas: Computational Mathematics, Computational Theory and Mathematics, Applied Mathematics, Analysis
Abstract
We consider a statistical inverse learning (also called inverse regression) problem, where we observe the image of a function f through a linear operator A at i.i.d. random design points $X_i$, superposed with additive noise. The distribution of the design points is unknown and can be very general. We analyze simultaneously the direct (estimation of Af) and the inverse (estimation of f) learning problems. In this general framework, we obtain strong and weak minimax optimal rates of convergence (as the number of observations n grows large) for a large class of spectral regularization methods over regularity classes defined through appropriate source conditions. This improves on or completes previous results obtained in related settings. The optimality of the obtained rates is shown not only in the exponent in n but also in the explicit dependency of the constant factor on the variance of the noise and the radius of the source condition set.
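
As a rough illustration of the setting (not taken from the paper), the sketch below discretizes a smoothing operator A on a grid, samples design points from an arbitrary distribution, and applies a Tikhonov-type spectral estimator; the grid, kernel width, design distribution, noise level, and regularization parameter are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Unknown function f on a grid, and a smoothing integral operator A (Gaussian blur).
t = np.linspace(0.0, 1.0, 200)
f_true = np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)

def A_row(x, h=0.05):
    # quadrature weights approximating (Af)(x) = integral of a Gaussian kernel against f
    w = np.exp(-0.5 * ((x - t) / h) ** 2)
    return w / w.sum()

# i.i.d. design points from an unknown (here: Beta) distribution, plus additive noise.
n, sigma = 500, 0.05
X = rng.beta(2.0, 5.0, size=n)
A_n = np.vstack([A_row(x) for x in X])          # sampled operator, one row per design point
Y = A_n @ f_true + sigma * rng.standard_normal(n)

# Tikhonov (spectral) regularization of the empirical normal equations.
lam = 1e-3
f_hat = np.linalg.solve(A_n.T @ A_n / n + lam * np.eye(t.size), A_n.T @ Y / n)

print("inverse error ||f_hat - f||:", np.linalg.norm(f_hat - f_true) / np.sqrt(t.size))
print("direct error  ||A(f_hat - f)||_n:", np.linalg.norm(A_n @ (f_hat - f_true)) / np.sqrt(n))
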
Blanchard G., Krämer N.
Analysis and Applications scimago Q1 wos Q1
2016-09-09 citations by CoLab: 17 Abstract  
We prove statistical rates of convergence for kernel-based least squares regression from i.i.d. data using a conjugate gradient (CG) algorithm, where regularization against overfitting is obtained by early stopping. This method is related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. Following the setting introduced in earlier related literature, we study so-called “fast convergence rates” depending on the regularity of the target regression function (measured by a source condition in terms of the kernel integral operator) and on the effective dimensionality of the data mapped into the kernel space. We obtain upper bounds, essentially matching known minimax lower bounds, for the ℒ2 (prediction) norm as well as for the stronger Hilbert norm, if the true regression function belongs to the reproducing kernel Hilbert space. If the latter assumption is not fulfilled, we obtain similar convergence rates for appropriate norms, provided additional unlabeled data are available.
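
A minimal sketch of the idea behind this line of work (not the authors' code): conjugate gradient applied to the kernel system K alpha = y, where the number of iterations, rather than an explicit penalty, controls regularization. The Gaussian kernel, its bandwidth, the synthetic data, and the stopping index are illustrative assumptions.

import numpy as np

def gaussian_kernel(X, Z, gamma=20.0):
    return np.exp(-gamma * (X[:, None] - Z[None, :]) ** 2)

def kernel_cg(K, y, n_iter):
    # Conjugate gradient on K alpha = y; stopping after n_iter steps acts as regularization.
    alpha = np.zeros_like(y)
    r = y - K @ alpha
    p = r.copy()
    for _ in range(n_iter):
        Kp = K @ p
        step = (r @ r) / (p @ Kp)
        alpha = alpha + step * p
        r_new = r - step * Kp
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return alpha

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 300)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(300)

alpha = kernel_cg(gaussian_kernel(X, X), y, n_iter=8)   # early stopping index fixed by hand
x_grid = np.linspace(0.0, 1.0, 100)
f_hat = gaussian_kernel(x_grid, X) @ alpha              # prediction on a grid
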
Loustau S., Marteau C.
Bernoulli scimago Q1 wos Q2
2015-02-01 citations by CoLab: 8
Loustau S.
2013-01-01 citations by CoLab: 5 Abstract  
Let $(X,Y)\in\mathcal{X}\times\mathcal{Y}$ be a random couple with unknown distribution $P$. Let $\mathcal{G}$ be a class of measurable functions and $\ell$ a loss function. The problem of statistical learning deals with the estimation of the Bayes rule: \[g^{*}=\arg\min_{g\in\mathcal{G}}\mathbb{E}_{P}\ell(g,(X,Y)).\] In this paper, we study this problem when we deal with a contaminated sample $(Z_{1},Y_{1}),\dots,(Z_{n},Y_{n})$ of i.i.d. indirect observations. Each input $Z_{i}$, $i=1,\dots,n$, is distributed according to a density $Af$, where $A$ is a known compact linear operator and $f$ is the density of the direct input $X$. We derive fast rates of convergence for the excess risk of empirical risk minimizers based on regularization methods, such as deconvolution kernel density estimators or spectral cut-off. These results are comparable to the existing fast rates in Koltchinskii (2006) for the direct case and give some insight into the effect of indirect measurements in the presence of fast rates of convergence.
Wahba G.
2011-09-30 citations by CoLab: 3449
Wang C., Zhou D.
Journal of Complexity scimago Q1 wos Q1
2011-02-01 citations by CoLab: 45 Abstract  
A standard assumption in theoretical study of learning algorithms for regression is uniform boundedness of output sample values. This excludes the common case with Gaussian noise. In this paper we investigate the learning algorithm for regression generated by the least squares regularization scheme in reproducing kernel Hilbert spaces without the assumption of uniform boundedness for sampling. By imposing some incremental conditions on moments of the output variable, we derive learning rates in terms of regularity of the regression function and capacity of the hypothesis space. The novelty of our analysis is a new covering number argument for bounding the sample error.
Caponnetto A., Yao Y.
Analysis and Applications scimago Q1 wos Q1
2010-04-20 citations by CoLab: 47 Abstract  
We consider learning algorithms induced by regularization methods in the regression setting. We show that previously obtained error bounds for these algorithms, using a priori choices of the regularization parameter, can be attained using a suitable a posteriori choice based on cross-validation. In particular, these results prove adaptation of the rate of convergence of the estimators to the minimax rate induced by the "effective dimension" of the problem. We also show universal consistency for this broad class of methods which includes regularized least-squares, truncated SVD, Landweber iteration and ν-method.
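
As an illustration of an a posteriori parameter choice of this kind (a sketch, not the paper's procedure), the code below selects the ridge parameter of kernel ridge regression by minimizing the error on a hold-out split; the kernel, the grid of candidate parameters, and the split fraction are illustrative assumptions.

import numpy as np

def gaussian_kernel(X, Z, gamma=20.0):
    return np.exp(-gamma * (X[:, None] - Z[None, :]) ** 2)

def krr_coef(K, y, lam):
    # regularized least squares: (K + n*lam*I) alpha = y
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def holdout_lambda(X, y, lambdas, train_frac=0.7, seed=0):
    # a posteriori choice: fit on a training split, pick the lambda with smallest hold-out error
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(train_frac * len(X))
    tr, va = idx[:n_tr], idx[n_tr:]
    K_tr, K_va = gaussian_kernel(X[tr], X[tr]), gaussian_kernel(X[va], X[tr])
    errors = [np.mean((K_va @ krr_coef(K_tr, y[tr], lam) - y[va]) ** 2) for lam in lambdas]
    return lambdas[int(np.argmin(errors))]

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, 400)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(400)
best_lam = holdout_lambda(X, y, lambdas=np.logspace(-6, 0, 13))
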
Mendelson S., Neeman J.
Annals of Statistics scimago Q1 wos Q1
2009-12-31 citations by CoLab: 47 Abstract  
Under mild assumptions on the kernel, we obtain the best known error rates in a regularized learning scenario taking place in the corresponding reproducing kernel Hilbert space (RKHS). The main novelty in the analysis is a proof that one can use a regularization term that grows significantly slower than the standard quadratic growth in the RKHS norm.
Tsybakov A.B.
2009-01-01 citations by CoLab: 890
Girosi F., Jones M., Poggio T.
Neural Computation scimago Q1 wos Q3
2008-04-04 citations by CoLab: 968 Abstract  
We had previously shown that regularization principles lead to approximation schemes that are equivalent to networks with one layer of hidden units, called regularization networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known radial basis functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends radial basis functions (RBF) to hyper basis functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, some forms of projection pursuit regression, and several types of neural networks. We propose to use the term generalized regularization networks for this broad class of approximation schemes that follow from an extension of regularization. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to different types of smoothness assumptions. In summary, different multilayer networks with one hidden layer, which we collectively call generalized regularization networks, correspond to different classes of priors and associated smoothness functionals in a classical regularization principle. Three broad classes are (1) radial basis functions that can be generalized to hyper basis functions, (2) some tensor product splines, and (3) additive splines that can be generalized to schemes of the type of ridge approximation, hinge functions, and several perceptron-like neural networks with one hidden layer.
Gerfo L.L., Rosasco L., Odone F., Vito E.D., Verri A.
Neural Computation scimago Q1 wos Q3
2008-02-06 citations by CoLab: 58 Abstract  
We discuss how a large class of regularization methods, collectively known as spectral regularization and originally designed for solving ill-posed inverse problems, gives rise to regularized learning algorithms. All of these algorithms are consistent kernel methods that can be easily implemented. The intuition behind their derivation is that the same principle allowing for the numerical stabilization of a matrix inversion problem is crucial to avoid overfitting. The various methods have a common derivation but different computational and theoretical properties. We describe examples of such algorithms, analyze their classification performance on several data sets and discuss their applicability to real-world problems.
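
A minimal sketch of the common recipe behind spectral regularization (not the authors' implementation): diagonalize the kernel matrix and replace the unstable inverse 1/s by a filter function g(s). Tikhonov, spectral cut-off (truncated SVD), and Landweber filters are shown; kernel, data, and regularization constants are illustrative assumptions.

import numpy as np

def spectral_fit(K, y, filter_fn):
    # Diagonalize the (PSD) kernel matrix and apply a spectral filter g to its eigenvalues:
    # alpha = g(K) y, so the estimator is f_hat(x) = sum_i alpha_i k(x, X_i).
    s, U = np.linalg.eigh(K)
    s = np.maximum(s, 0.0)
    return U @ (filter_fn(s) * (U.T @ y))

def tikhonov(lam):
    return lambda s: 1.0 / (s + lam)

def spectral_cutoff(lam):
    return lambda s: np.where(s >= lam, 1.0 / np.maximum(s, lam), 0.0)

def landweber_filter(n_iter, tau):
    # g(s) = tau * sum_{j<t} (1 - tau*s)^j = (1 - (1 - tau*s)^t) / s, with value tau*t at s = 0
    return lambda s: np.where(s > 0, (1.0 - (1.0 - tau * s) ** n_iter) / np.maximum(s, 1e-12),
                              tau * n_iter)

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(200)
K = np.exp(-20.0 * (X[:, None] - X[None, :]) ** 2)

tau = 1.0 / np.linalg.norm(K, 2)             # step size ensuring a stable Landweber filter
alphas = {name: spectral_fit(K, y, g) for name, g in
          [("tikhonov", tikhonov(1.0)), ("cutoff", spectral_cutoff(1.0)),
           ("landweber", landweber_filter(50, tau))]}
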
Bissantz N., Hohage T., Munk A., Ruymgaart F.
2007-12-07 citations by CoLab: 129 Abstract  
Previously, the convergence analysis for linear statistical inverse problems has mainly focused on spectral cut-off and Tikhonov-type estimators. Spectral cut-off estimators achieve minimax rates for a broad range of smoothness classes and operators, but their practical usefulness is limited by the fact that they require a complete spectral decomposition of the operator. Tikhonov estimators are simpler to compute but still involve the inversion of an operator and achieve minimax rates only in restricted smoothness classes. In this paper we introduce a unifying technique to study the mean square error of a large class of regularization methods (spectral methods) including the aforementioned estimators as well as many iterative methods, such as $\nu$-methods and the Landweber iteration. The latter estimators converge at the same rate as spectral cut-off but require only matrix-vector products. Our results are applied to various problems; in particular we obtain precise convergence rates for satellite gradiometry, $L^2$-boosting, and errors in variable problems.
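
As a minimal illustration of the iterative, matrix-vector-product viewpoint (not taken from the paper), the sketch below runs the Landweber iteration on a toy ill-conditioned design matrix; the matrix, the step-size rule, and the iteration count are assumptions made for the example.

import numpy as np

def landweber(A, y, n_iter, tau=None):
    # f_{k+1} = f_k + tau * A^T (y - A f_k); the iteration count is the regularization parameter.
    if tau is None:
        tau = 1.0 / np.linalg.norm(A, 2) ** 2     # ensures a stable, convergent iteration
    f = np.zeros(A.shape[1])
    for _ in range(n_iter):
        f = f + tau * (A.T @ (y - A @ f))         # only matrix-vector products are needed
    return f

A = np.vander(np.linspace(0.0, 1.0, 100), 30)     # toy ill-conditioned operator/design matrix
f_true = np.zeros(30); f_true[:3] = [1.0, -0.5, 0.25]
y = A @ f_true + 0.01 * np.random.default_rng(13).standard_normal(100)
f_hat = landweber(A, y, n_iter=200)
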
Yao Y., Rosasco L., Caponnetto A.
Constructive Approximation scimago Q1 wos Q1
2007-04-04 citations by CoLab: 574 Abstract  
In this paper we study a family of gradient descent algorithms to approximate the regression function from reproducing kernel Hilbert spaces (RKHSs), the family being characterized by a polynomial decreasing rate of step sizes (or learning rate). By solving a bias-variance trade-off we obtain an early stopping rule and some probabilistic upper bounds for the convergence of the algorithms. We also discuss the implication of these results in the context of classification where some fast convergence rates can be achieved for plug-in classifiers. Some connections are addressed with Boosting, Landweber iterations, and the online learning algorithms as stochastic approximations of the gradient descent method.
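
A rough sketch of the kind of algorithm analyzed here (not the authors' code): gradient descent on the empirical least-squares risk in an RKHS with polynomially decaying step sizes, where the stopping index plays the role of the regularization parameter. The Gaussian kernel, step-size constants, synthetic data, and the way the stopping index is inspected are illustrative assumptions.

import numpy as np

def kernel_gd(K, y, n_iter, eta0=1.0, theta=0.25):
    # alpha_{k+1} = alpha_k + eta_k * (y - K alpha_k) / n  with  eta_k = eta0 * (k+1)^(-theta)
    n = len(y)
    alpha = np.zeros(n)
    path = []
    for k in range(n_iter):
        eta_k = eta0 * (k + 1) ** (-theta)
        alpha = alpha + eta_k * (y - K @ alpha) / n
        path.append(alpha.copy())
    return path

rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(200)
K = np.exp(-20.0 * (X[:, None] - X[None, :]) ** 2)

path = kernel_gd(K, y, n_iter=500)
# Early stopping: in the paper the stopping index comes from a bias-variance trade-off;
# here we simply record the training residual along the path for inspection.
residuals = [np.mean((y - K @ a) ** 2) for a in path]
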
Smale S., Zhou D.
Constructive Approximation scimago Q1 wos Q1
2007-03-15 citations by CoLab: 335 Abstract  
The regression problem in learning theory is investigated with least squares Tikhonov regularization schemes in reproducing kernel Hilbert spaces (RKHS). We follow our previous work and apply the sampling operator to the error analysis in both the RKHS norm and the L2 norm. The tool for estimating the sample error is a Bennett inequality for random variables with values in Hilbert spaces. By taking the Hilbert space to be the one consisting of Hilbert-Schmidt operators in the RKHS, we improve the error bounds in the L2 metric, motivated by an idea of Caponnetto and De Vito. The error bounds we derive in the RKHS norm, together with a Tsybakov function we discuss here, yield interesting applications to the error analysis of the (binary) classification problem, since the RKHS metric controls the one for the uniform convergence.
Temlyakov V.N.
Constructive Approximation scimago Q1 wos Q1
2007-02-09 citations by CoLab: 17 Abstract  
This paper addresses some problems of supervised learning in the setting formulated by Cucker and Smale. Supervised learning, or learning-from-examples, refers to a process that builds, on the basis of available data of inputs $x_i$ and outputs $y_i$, $i = 1,\dots,m$, a function that best represents the relation between the inputs $x \in X$ and the corresponding outputs $y \in Y$. The goal is to find an estimator $f_z$, on the basis of the given data $z := ((x_1,y_1),\dots,(x_m,y_m))$, that approximates well the regression function $f_\rho$ (or its projection) of an unknown Borel probability measure $\rho$ defined on $Z = X \times Y$. We assume that $(x_i,y_i)$, $i = 1,\dots,m$, are independent and distributed according to $\rho$. We discuss the following two problems: I. the projection learning problem (improper function learning problem); II. universal (adaptive) estimators in the proper function learning problem. In the first problem we do not impose any restrictions on the Borel measure $\rho$ except our standard assumption that $|y| \le M$ a.e. with respect to $\rho$. In this case we use the data $z$ to estimate (approximate) the $L_2(\rho_X)$ projection $(f_\rho)_W$ of $f_\rho$ onto a function class $W$ of our choice, where $\rho_X$ is the marginal probability measure. In [KT1,2] this problem has been studied for $W$ satisfying the decay condition $\varepsilon_n(W,B) \le D n^{-r}$ of the entropy numbers $\varepsilon_n(W,B)$ of $W$ in a Banach space $B$, in the case $B = C(X)$ or $B = L_2(\rho_X)$. In this paper we obtain the upper estimates in the case $\varepsilon_n(W, L_1(\rho_X)) \le D n^{-r}$, with the extra assumption that $W$ is convex. In the second problem we assume that the unknown measure $\rho$ satisfies some conditions. Following the standard approach from nonparametric statistics, we formulate these conditions in the form $f_\rho \in \Theta$. Next, we assume that the only a priori information available is that $f_\rho$ belongs to a class $\Theta$ (unknown) from a known collection $\{\Theta\}$ of classes. We want to build an estimator that provides an approximation of $f_\rho$ close to the optimal one for the class $\Theta$. Along with standard penalized least squares estimators, we consider a new method of constructing universal estimators. This method is based on a combination of two powerful ideas for building universal estimators. The first is the use of penalized least squares estimators, which works well in the general setting with rather abstract methods of approximation. The second is the idea of thresholding, which works very well when wavelet expansions are used as an approximation tool. A new estimator, which we call the big jump estimator, uses the least squares estimators and chooses the right model by a thresholding criterion instead of penalization. In this paper we illustrate how ideas and methods of approximation theory can be used in learning theory, both in formulating a problem and in solving it.
Bauer F., Pereverzev S., Rosasco L.
Journal of Complexity scimago Q1 wos Q1
2007-02-01 citations by CoLab: 143 Abstract  
In this paper we discuss a relation between learning theory and regularization of linear ill-posed inverse problems. It is well known that Tikhonov regularization can be profitably used in the context of supervised learning, where it usually goes under the name of the regularized least-squares algorithm. Moreover, the gradient descent algorithm has recently been studied as an analog of the Landweber regularization scheme. In this paper we show that a notion of regularization defined according to what is usually done for ill-posed inverse problems allows one to derive learning algorithms which are consistent and provide fast convergence rates. It turns out that for priors expressed in terms of variable Hilbert scales in reproducing kernel Hilbert spaces, our results for Tikhonov regularization match those in Smale and Zhou [Learning theory estimates via integral operators and their approximations, submitted for publication, retrievable at http://www.tti-c.org/smale.html, 2005] and improve the results for Landweber iterations obtained in Yao et al. [On early stopping in gradient descent learning, Constructive Approximation (2005), submitted for publication]. The remarkable fact is that our analysis shows that the same properties are shared by a large class of learning algorithms, essentially all the linear regularization schemes. The concept of operator monotone functions turns out to be an important tool for the analysis.
Liu J., Shi L.
Inverse Problems scimago Q1 wos Q2
2025-03-20 citations by CoLab: 0 Abstract  
By selecting different filter functions, spectral algorithms can generate various regularization methods to solve statistical inverse problems within the learning-from-samples framework. This paper combines distributed spectral algorithms with Sobolev kernels to tackle the functional linear regression problem. The design and mathematical analysis of the algorithms require only that the functional covariates are observed at discrete sample points. Furthermore, the hypothesis function spaces of the algorithms are the Sobolev spaces generated by the Sobolev kernels, optimizing both approximation capability and flexibility. Through the establishment of regularity conditions for the target function and functional covariate, we derive matching upper and lower bounds for the convergence of the distributed spectral algorithms in the Sobolev norm. This demonstrates that the proposed regularity conditions are reasonable and that the convergence analysis under these conditions is tight, capturing the essential characteristics of functional linear regression. The analytical techniques and estimates developed in this paper also improve existing results in the previous literature.
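
As a sketch of the divide-and-conquer idea behind distributed spectral algorithms (not the paper's functional-data setting), the code below splits the sample across machines, runs a local kernel ridge estimator on each block, and averages the local predictors. The Gaussian kernel stands in for the Sobolev kernels of the paper; data, kernel, and parameters are illustrative assumptions.

import numpy as np

def gaussian_kernel(X, Z, gamma=20.0):
    return np.exp(-gamma * (X[:, None] - Z[None, :]) ** 2)

def distributed_krr(X, y, n_machines, lam, x_eval):
    # Divide-and-conquer: local kernel ridge estimators are simply averaged.
    splits = np.array_split(np.arange(len(X)), n_machines)
    preds = []
    for idx in splits:
        K_loc = gaussian_kernel(X[idx], X[idx])
        alpha = np.linalg.solve(K_loc + len(idx) * lam * np.eye(len(idx)), y[idx])
        preds.append(gaussian_kernel(x_eval, X[idx]) @ alpha)
    return np.mean(preds, axis=0)

rng = np.random.default_rng(5)
X = rng.uniform(0.0, 1.0, 2000)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(2000)
x_grid = np.linspace(0.0, 1.0, 50)
f_hat = distributed_krr(X, y, n_machines=10, lam=1e-4, x_eval=x_grid)
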
Li D., Milz J.
2025-01-23 citations by CoLab: 0
Nguyen M., Mücke N.
2024-12-01 citations by CoLab: 1 Abstract  
We analyze the generalization properties of two-layer neural networks in the neural tangent kernel (NTK) regime, trained with gradient descent (GD). For early stopped GD we derive fast rates of convergence that are known to be minimax optimal in the framework of non-parametric regression in reproducing kernel Hilbert spaces. On our way, we precisely keep track of the number of hidden neurons required for generalization and improve over existing results. We further show that the weights during training remain in a vicinity around initialization, the radius being dependent on structural assumptions such as degree of smoothness of the regression function and eigenvalue decay of the integral operator associated to the NTK.
Wang W., Wang Y., Zhang X.
Management Science scimago Q1 wos Q1 Open Access
2024-12-01 citations by CoLab: 2 Abstract  
Nested simulation concerns estimating functionals of a conditional expectation via simulation. In this paper, we propose a new method based on kernel ridge regression to exploit the smoothness of the conditional expectation as a function of the multidimensional conditioning variable. Asymptotic analysis shows that the proposed method can effectively alleviate the curse of dimensionality on the convergence rate as the simulation budget increases, provided that the conditional expectation is sufficiently smooth. The smoothness bridges the gap between the cubic root convergence rate (that is, the optimal rate for the standard nested simulation) and the square root convergence rate (that is, the canonical rate for the standard Monte Carlo simulation). We demonstrate the performance of the proposed method via numerical examples from portfolio risk management and input uncertainty quantification. This paper was accepted by Baris Ata, stochastic models and simulation. Funding: The authors acknowledge financial support from the National Natural Science Foundation of China [Grant NSFC 12101149] and the Hong Kong Research Grants Council [Grants GRF 17201520 and GRF 17206821]. Supplemental Material: The e-companion and data files are available at https://doi.org/10.1287/mnsc.2022.00204 .
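
A toy sketch of the nested-simulation idea (not the paper's experiments): simulate outer scenarios of the conditioning variable, run a few inner simulations per scenario, regress the inner averages on the scenarios with kernel ridge regression to estimate the conditional expectation, and then evaluate a functional of it. The model, budget split, kernel, and functional are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(6)

def inner_sample(x, m):
    # toy model: Y | X = x is N(x**2, 1), so E[Y | X = x] = x**2
    return x ** 2 + rng.standard_normal(m)

# Outer scenarios and crude inner-sample estimates of the conditional expectation.
n_outer, n_inner = 500, 5
X = rng.standard_normal(n_outer)
Y_bar = np.array([inner_sample(x, n_inner).mean() for x in X])

# Kernel ridge regression of the inner averages on the conditioning variable.
gamma, lam = 1.0, 1e-3
K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)
alpha = np.linalg.solve(K + n_outer * lam * np.eye(n_outer), Y_bar)
cond_mean_hat = K @ alpha

# Functional of the conditional expectation, e.g. a tail probability used in risk management.
c = 1.0
print("estimate of P(E[Y|X] > c):", np.mean(cond_mean_hat > c))
print("plug-in truth on the same scenarios:", np.mean(X ** 2 > c))
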
Lin S.
2024-11-01 citations by CoLab: 2 Abstract  
This paper focuses on parameter selection issues of kernel ridge regression (KRR). Due to special spectral properties of KRR, we find that delicate subdivision of the parameter interval shrinks the difference between two successive KRR estimates. Based on this observation, we develop an early-stopping type parameter selection strategy for KRR according to the so-called Lepskii-type principle. Theoretical verifications are presented in the framework of learning theory to show that KRR equipped with the proposed parameter selection strategy succeeds in achieving optimal learning rates and adapts to different norms, providing a new record of parameter selection for kernel methods.
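
A sketch of a Lepskii-type balancing rule for kernel ridge regression, in the spirit of (but not reproducing) the paper's strategy: among estimators on a decreasing grid of ridge parameters, keep the most regularized one that agrees, up to a noise proxy, with every less regularized one. The variance proxy psi(lambda) proportional to 1/sqrt(n*lambda), its constant kappa, and the kernel are illustrative placeholders, not the paper's calibrated quantities.

import numpy as np

def gaussian_kernel(X, Z, gamma=20.0):
    return np.exp(-gamma * (X[:, None] - Z[None, :]) ** 2)

def lepskii_krr(X, y, lambdas, kappa=0.1):
    # lambdas sorted in decreasing order; psi grows as lambda decreases (crude variance proxy).
    n = len(X)
    K = gaussian_kernel(X, X)
    fits = [K @ np.linalg.solve(K + n * lam * np.eye(n), y) for lam in lambdas]
    psi = [kappa / np.sqrt(n * lam) for lam in lambdas]
    for k in range(len(lambdas)):
        # Lepskii rule: take the most regularized estimate that stays within the noise band
        # of every less regularized estimate.
        if all(np.sqrt(np.mean((fits[k] - fits[i]) ** 2)) <= 4 * psi[i]
               for i in range(k + 1, len(lambdas))):
            return lambdas[k]
    return lambdas[-1]

rng = np.random.default_rng(7)
X = rng.uniform(0.0, 1.0, 300)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(300)
lam_hat = lepskii_krr(X, y, lambdas=np.logspace(0, -6, 13))
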
Bisiacco M., Pillonetto G.
2024-10-01 citations by CoLab: 0
Rastogi A.
Journal of Complexity scimago Q1 wos Q1
2024-06-01 citations by CoLab: 2 Abstract  
In this paper, we study Tikhonov regularization scheme in Hilbert scales for a nonlinear statistical inverse problem with general noise. The regularizing norm in this scheme is stronger than the norm in the Hilbert space. We focus on developing a theoretical analysis for this scheme based on conditional stability estimates. We utilize the concept of the distance function to establish high probability estimates of the direct and reconstruction errors in the Reproducing Kernel Hilbert space setting. Furthermore, explicit rates of convergence in terms of sample size are established for the oversmoothing case and the regular case over the regularity class defined through an appropriate source condition. Our results improve upon and generalize previous results obtained in related settings.
Shi L., Zhang Z.
Analysis and Applications scimago Q1 wos Q1
2024-04-22 citations by CoLab: 0 Abstract  
Kernel methods are popular in nonlinear and nonparametric regression due to their solid mathematical foundations and optimal statistical properties. However, scalability remains the primary bottleneck in applying kernel methods to large-scale data regression analysis. This paper aims to improve the scalability of kernel methods. We combine Nyström subsampling and the preconditioned conjugate gradient method to solve regularized kernel regression. Our theoretical analysis indicates that achieving optimal convergence rates requires only [Formula: see text] memory and [Formula: see text] time (up to logarithmic factors). Numerical experiments show that our algorithm outperforms existing methods in time efficiency and prediction accuracy on large-scale datasets. Notably, compared to the FALKON algorithm [A. Rudi, L. Carratino and L. Rosasco, Falkon: An optimal large scale kernel method, in Advances in Neural Information Processing Systems (Curran Associates, 2017), pp. 3891–3901], which is known as the optimal large-scale kernel method, our method is more flexible (applicable to non-positive definite kernel functions) and has a lower algorithmic complexity. Additionally, our established theoretical analysis further relaxes the restrictive conditions on hyperparameters previously imposed in convergence analyses.
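
A minimal sketch of Nyström subsampling for kernel ridge regression (the preconditioned conjugate gradient solver analyzed in the paper is replaced by a direct solve for brevity): restrict the estimator to the span of m randomly chosen centers and solve the reduced m-dimensional system. Kernel, subsample size, and regularization are illustrative assumptions.

import numpy as np

def gaussian_kernel(X, Z, gamma=20.0):
    return np.exp(-gamma * (X[:, None] - Z[None, :]) ** 2)

def nystrom_krr(X, y, m, lam, seed=0):
    # Minimize (1/n)||K_nm c - y||^2 + lam * c^T K_mm c over c in R^m,
    # i.e. kernel ridge regression restricted to m random centers.
    rng = np.random.default_rng(seed)
    n = len(X)
    centers = rng.choice(n, size=m, replace=False)
    K_nm = gaussian_kernel(X, X[centers])
    K_mm = gaussian_kernel(X[centers], X[centers])
    jitter = 1e-10 * np.eye(m)                    # small jitter for numerical stability
    c = np.linalg.solve(K_nm.T @ K_nm + n * lam * K_mm + jitter, K_nm.T @ y)
    return X[centers], c

rng = np.random.default_rng(8)
X = rng.uniform(0.0, 1.0, 3000)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(3000)
centers, c = nystrom_krr(X, y, m=100, lam=1e-4)
x_grid = np.linspace(0.0, 1.0, 50)
f_hat = gaussian_kernel(x_grid, centers) @ c
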
Lin S., Wang D., Zhou D.
2024-01-25 citations by CoLab: 3
Feng J., Kulick C., Ren Y., Tang S.
Mathematics of Computation scimago Q1 wos Q1
2023-11-15 citations by CoLab: 2 Abstract  
Interacting particle or agent systems that exhibit diverse swarming behaviors are prevalent in science and engineering. Developing effective differential equation models to understand the connection between individual interaction rules and swarming is a fundamental and challenging goal. In this paper, we study the data-driven discovery of a second-order particle swarming model that describes the evolution of $N$ particles in $\mathbb{R}^d$ under radial interactions. We propose a learning approach that models the latent radial interaction function as Gaussian processes, which can simultaneously fulfill two inference goals: one is the nonparametric inference of the interaction function with pointwise uncertainty quantification, and the other is the inference of unknown scalar parameters in the noncollective friction forces of the system. We formulate the learning problem as a statistical inverse learning problem and introduce an operator-theoretic framework that provides a detailed analysis of recoverability conditions, establishing that a coercivity condition is sufficient for recoverability. Given data collected from $M$ i.i.d. trajectories with independent Gaussian observational noise, we provide a finite-sample analysis, showing that our posterior mean estimator converges in a reproducing kernel Hilbert space norm, at an optimal rate in $M$ equal to the one in classical one-dimensional kernel ridge regression. As a byproduct, we show that we can obtain a parametric learning rate in $M$ for the posterior marginal variance using the $L^\infty$ norm, and that the rate could also involve $N$ and $L$ (the number of observation time instances for each trajectory), depending on the condition number of the inverse problem. We provide numerical results on systems exhibiting different swarming behaviors, highlighting the effectiveness of our approach in the scarce, noisy trajectory data regime.
Hucker L., Wahl M.
2023-10-03 citations by CoLab: 0 Abstract  
We analyze the prediction error of principal component regression (PCR) and prove high probability bounds for the corresponding squared risk conditional on the design. Our first main result shows that PCR performs comparably to the oracle method obtained by replacing empirical principal components by their population counterparts, provided that an effective rank condition holds. On the other hand, if the latter condition is violated, then empirical eigenvalues start to have a significant upward bias, resulting in a self-induced regularization of PCR. Our approach relies on the behavior of empirical eigenvalues, empirical eigenvectors and the excess risk of principal component analysis in high-dimensional regimes.
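
A minimal sketch of principal component regression as analyzed here (not the paper's code): project the centered design onto its top empirical principal components and run least squares on the projected scores; the number of components is the tuning parameter. The synthetic design with decaying spectrum and the component count are illustrative assumptions.

import numpy as np

def pcr_fit(X, y, n_components):
    # Principal component regression: least squares on the top principal components of X.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:n_components].T                 # top empirical eigenvectors of the covariance
    scores = Xc @ V_k
    coef_k, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    beta = V_k @ coef_k                       # coefficients in the original coordinates
    return beta, y.mean() - X.mean(axis=0) @ beta

rng = np.random.default_rng(9)
n, d = 500, 50
X = rng.standard_normal((n, d)) @ np.diag(1.0 / np.arange(1, d + 1))   # decaying spectrum
beta_true = np.zeros(d); beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)
beta_hat, intercept = pcr_fit(X, y, n_components=10)
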
Li Y., Zhang H., Lin Q.
Biometrika scimago Q1 wos Q1
2023-08-07 citations by CoLab: 0 Abstract  
One of the most interesting problems in the recent renaissance of the studies in kernel regression might be whether kernel interpolation can generalize well, since it may help us understand the ‘benign overfitting phenomenon’ reported in the literature on deep networks. In this paper, under mild conditions, we show that for any $\varepsilon > 0$, the generalization error of kernel interpolation is lower bounded by $\Omega(n^{-\varepsilon})$. In other words, kernel interpolation generalizes poorly for a large class of kernels. As a direct corollary, we can show that overfitted wide neural networks defined on the sphere generalize poorly.
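
A toy experiment in the spirit of this result (not the paper's setting or proof): compare the minimum-norm kernel interpolant of noisy data (ridge parameter zero) with a ridge-regularized fit on held-out points. The kernel, bandwidth, noise level, and sample sizes are illustrative assumptions, and the printed numbers are only indicative.

import numpy as np

rng = np.random.default_rng(10)

def gaussian_kernel(X, Z, gamma=50.0):
    return np.exp(-gamma * (X[:, None] - Z[None, :]) ** 2)

n = 200
X = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * X) + 0.3 * rng.standard_normal(n)
X_test = rng.uniform(0.0, 1.0, 1000)
f_test = np.sin(2 * np.pi * X_test)

K = gaussian_kernel(X, X)
K_test = gaussian_kernel(X_test, X)

# Kernel interpolation: minimum-norm interpolant of the noisy data (no regularization).
alpha_interp = np.linalg.solve(K + 1e-10 * np.eye(n), y)   # tiny jitter only for conditioning
# Kernel ridge regression with a nonzero regularization parameter, for comparison.
alpha_ridge = np.linalg.solve(K + n * 1e-3 * np.eye(n), y)

print("interpolation test MSE:", np.mean((K_test @ alpha_interp - f_test) ** 2))
print("ridge         test MSE:", np.mean((K_test @ alpha_ridge - f_test) ** 2))
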
Guo Z., Christmann A., Shi L.
2023-07-26 citations by CoLab: 7 Abstract  
In this paper, we study an online learning algorithm with a robust loss function $\mathcal{L}_{\sigma}$ for regression over a reproducing kernel Hilbert space (RKHS). The loss function $\mathcal{L}_{\sigma}$, involving a scaling parameter $\sigma > 0$, can cover a wide range of commonly used robust losses. The proposed algorithm is then a robust alternative for online least squares regression aiming to estimate the conditional mean function. For properly chosen $\sigma$ and step size, we show that the last iterate of this online algorithm can achieve optimal capacity-independent convergence in the mean square distance. Moreover, if additional information on the underlying function space is known, we also establish optimal capacity-dependent rates for strong convergence in the RKHS. To the best of our knowledge, both results are new to the existing literature on online learning.
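
A sketch of the type of algorithm described (not the authors' exact scheme): a single pass of stochastic functional gradient descent in the RKHS with a robust, Huber-type loss whose scale sigma bounds the gradient. The kernel, step-size schedule, loss, and contaminated synthetic data are illustrative assumptions.

import numpy as np

def gaussian_kernel(X, Z, gamma=20.0):
    return np.exp(-gamma * (X[:, None] - Z[None, :]) ** 2)

def robust_score(residual, sigma):
    # derivative of a Huber-type robust loss L_sigma; bounded by sigma
    return np.clip(residual, -sigma, sigma)

def online_robust_regression(X, y, sigma=1.0, eta0=0.5, theta=0.5):
    # One pass over the data: f_t = f_{t-1} - eta_t * L'_sigma(f_{t-1}(x_t) - y_t) * k(x_t, .)
    n = len(X)
    coef = np.zeros(n)                            # coefficient of k(x_t, .) added at step t
    for t in range(n):
        eta_t = eta0 / (t + 1) ** theta
        pred = gaussian_kernel(X[t:t + 1], X[:t]) @ coef[:t] if t > 0 else np.zeros(1)
        coef[t] = -eta_t * robust_score(pred[0] - y[t], sigma)
    return coef

rng = np.random.default_rng(11)
X = rng.uniform(0.0, 1.0, 500)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(500)
y[::25] += 5.0                                    # heavy-tailed contamination
coef = online_robust_regression(X, y)
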
Bubba T.A., Burger M., Helin T., Ratti L.
Inverse Problems and Imaging scimago Q2 wos Q3
2023-04-12 citations by CoLab: 2 Abstract  
We consider a statistical inverse learning problem, where the task is to estimate a function f based on noisy point evaluations of Af, where A is a linear operator. The function Af is evaluated at i.i.d. random design points $u_n$, $n = 1, \dots, N$, generated by an unknown general probability distribution. We consider Tikhonov regularization with general convex and p-homogeneous penalty functionals and derive concentration rates of the regularized solution to the ground truth, measured in the symmetric Bregman distance induced by the penalty functional. We derive concrete rates for Besov norm penalties and numerically demonstrate the correspondence with the observed rates in the context of X-ray tomography.
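
As a rough illustration of Tikhonov regularization with a convex, 1-homogeneous penalty (a simple l1 stand-in for the Besov-type penalties in the paper, not the paper's tomography experiments), the sketch below solves the penalized problem by iterative soft thresholding (ISTA); the operator, penalty weight, step size, and synthetic sparse target are illustrative assumptions.

import numpy as np

def ista_l1(A, y, lam, n_iter=500):
    # Minimize (1/2n)||A f - y||^2 + lam * ||f||_1 by iterative soft thresholding (ISTA).
    n = A.shape[0]
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 / n)          # 1 / Lipschitz constant of the gradient
    f = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ f - y) / n
        z = f - step * grad
        f = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft thresholding
    return f

rng = np.random.default_rng(12)
t = np.linspace(0.0, 1.0, 128)
f_true = np.zeros_like(t); f_true[30:40] = 1.0; f_true[80:85] = -0.5   # sparse, piecewise target
X = rng.uniform(0.0, 1.0, 400)                                          # random design points
A = np.exp(-0.5 * ((X[:, None] - t[None, :]) / 0.03) ** 2)              # sampled smoothing operator
A /= A.sum(axis=1, keepdims=True)
y = A @ f_true + 0.01 * rng.standard_normal(len(X))
f_hat = ista_l1(A, y, lam=1e-4)
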

Top-30 Journals and Top-30 Publishers: interactive charts (data not recoverable from the page text).