Statistical analysis of the optimal transport problem

GONZÁLEZ SANZ, ALBERTO

Supervised by:

Jean-Michel Loubes (Co-supervisor)
Eustasio del Barrio Tellado (Supervisor)

University of defense: Universidad de Valladolid

Date of defense: 18 April 2023

Examination committee:

Axel Munk (Chair)
Rosa María Crujeiras Casais (Secretary)
Jean-Michel Loubes (Member)
Eustasio del Barrio Tellado (Member)
Dolores Romero Morales (Member)
Gabriel Peyré (Member)
Johan Segers (Member)

Type: Doctoral thesis

Teseo: 830023

Abstract

Optimal transportation is a resource allocation problem that arises in fields such as economics, finance, physics and artificial intelligence. From a probabilistic point of view, the optimal transport cost endows the space of probability measures with a metric topology. In particular, this topology is equivalent to the weak topology of probability measures together with the convergence of moments. This makes the transport cost an appropriate tool for measuring discrepancies between distributions. The solution of the transport problem, in turn, is known as the optimal plan: an unambiguous way to relate two distributions according to an optimality criterion. When this optimal plan is deterministic, it is called a transport map. In many cases, however, the probability distribution is a theoretical, unattainable entity, visible to the practitioner only through its empirical version, i.e. a finite data set of size n. This work examines the asymptotic behaviour of the empirical transport problem. In other words, we study the limits of the empirical cost and plans as the sample size grows to infinity. It is well known that the empirical transport cost converges to the population one. Moreover, for continuous measures it does so at a rate that deteriorates as the dimension grows. In this thesis we prove the consistency of the transport map using the topology of set-valued maps. This leads, indirectly, to the result that the fluctuations (the difference between the empirical cost and its expectation) tend to zero at the parametric rate, irrespective of the dimension. Moreover, these fluctuations, rescaled at the parametric rate, converge to a Gaussian random variable. In economics, the transportation problem appears on numerous occasions in its semi-discrete version, i.e. one of the probability distributions is discrete.
In this case, we show that the rate at which the empirical transport cost converges to the population one does not depend on the dimension. We also show that the well-known entropy regularization (or Sinkhorn regularization), apart from simplifying the computation of the transport problem by giving it a differentiable structure, has highly satisfactory statistical properties. In particular, its bias and the divergence that the regularization defines converge faster than the parametric rate; moreover, the empirical regularized plans converge to the population ones at the parametric rate, with a Gaussian process as their limit. The transport map endows a probability measure P with an order with respect to a given reference. This property leads to the successful definition of M. Hallin's multivariate distribution function, obtained by choosing the spherical uniform as the reference measure. This thesis provides sufficient conditions under which this function defines a homeomorphism between the support of the probability measure P and the unit ball, i.e. the support of the spherical uniform. Finally, we provide a conditional version of the multivariate distribution function, with applications to quantile regression.
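
As an illustration only, the two empirical quantities discussed in the abstract can be sketched in a few lines of Python with NumPy. This is not code from the thesis: the function names `empirical_w2_squared_1d` and `sinkhorn_plan`, the squared Euclidean cost, the uniform sample weights and the default values of `eps` and `n_iter` are all assumptions made for this example. The first function computes the empirical quadratic transport cost between two one-dimensional samples of equal size, where the optimal plan simply matches sorted observations. The second runs plain Sinkhorn iterations to approximate the entropy-regularized plan.

```python
import numpy as np


def empirical_w2_squared_1d(x, y):
    """Empirical quadratic transport cost between two 1D samples of equal size.

    In dimension one the optimal plan pairs the sorted observations,
    so the cost is the mean squared gap between order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("both samples must have the same size")
    return float(np.mean((x - y) ** 2))


def sinkhorn_plan(x, y, eps=1.0, n_iter=500):
    """Entropy-regularized plan between uniform empirical measures on x and y.

    Assumptions of this sketch: squared Euclidean cost, uniform weights,
    and a fixed number of plain (non-log-domain) Sinkhorn iterations,
    which can underflow when eps is small relative to the costs.
    Returns the plan P and its transport cost sum(P * C).
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n, m = x.shape[0], y.shape[0]
    # Pairwise squared Euclidean distances, shape (n, m).
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    a = np.full(n, 1.0 / n)  # weights of the first sample
    b = np.full(m, 1.0 / m)  # weights of the second sample
    K = np.exp(-C / eps)     # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)      # rescale to match the row marginals
        v = b / (K.T @ u)    # rescale to match the column marginals
    P = u[:, None] * K * v[None, :]
    return P, float((P * C).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.normal(size=200)
    ys = rng.normal(loc=1.0, size=200)
    print("empirical quadratic cost:", empirical_w2_squared_1d(xs, ys))
    plan, reg_cost = sinkhorn_plan(xs, ys, eps=1.0)
    print("cost of the regularized plan:", reg_cost)
```

Repeating the computation over many independent samples of increasing size n would give a rough empirical picture of the fluctuations of the cost around its mean, although such a simulation is no substitute for the limit theorems the thesis establishes.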