STEM Diary: Lebesgue's Theory of Real Analysis II: A General Framework for Approximation Theory and the Universal Approximation Theorem

"I have to pay a certain sum, which I have collected in my pocket. I take the bills and coins out of my pocket and give them to the creditor in the order I find them until I have reached the total sum. This is the Riemann integral. But I can proceed differently. After I have taken all the money out of my pocket I order the bills and coins according to identical values and then I pay the several heaps one after the other to the creditor. This is my integral." - Henri L. Lebesgue

Author's Commentary (Hong Kong, China, 05/03/2026): This week and next will be quite hectic, so I'm afraid my extracurricular pace, and the way I record my self-study, will differ slightly. After next week, however, there will be one less course to be concerned with.

2 Applications in Approximation Theory: Simple Functions, CNC Machines, and the Universal Approximation Theorem

Suppose we were given the problem of determining a function that describes a particular phenomenon in the sciences or engineering. Unless further information is provided, say, in the form of a general theoretical framework as in Newtonian mechanics, deriving such a function may seem ill-posed, perhaps even impossible. Yet approximation becomes possible once appropriate sampling techniques are available: we sample the values of the hypothetical function at points of its domain so as to recover more and more "nodes", and many approximation methods then become available. As many will know, an entire branch of approximation theory is built on interpolation polynomials; for the purposes of this series, however, we would like to describe simple functions instead. So, consider the following,

Theorem 2.1 Given a measurable function $f$ as defined on $\mathbb{R}$, one can always find a sequence of simple functions $\{ \phi_n \}_{n = 1}^{\infty}$ with $\left| \phi_{n_1} \right| \leq \left| \phi_{n_2} \right|$ whenever $n_1 < n_2$ such that $\lim_{n \rightarrow \infty} \phi_{n} = f$ pointwise almost everywhere.$^{[16]}$

PROOF. Without loss of generality, we can consider the case where $f$ is non-negative: in general, $f$ decomposes as $f = f_1 - f_2$ with $f_1 = \max(f, 0)$ and $f_2 = \max(-f, 0)$, both non-negative and measurable.$^{[17]}$ Thus, for $f$ non-negative, first consider the truncated functions $F_n$, $n \in \mathbb{N}$, defined by $F_n(x) = f(x)$ if $x \in [-n, n]$ and $f(x) \leq n$, by $F_n(x) = n$ if $x \in [-n, n]$ and $f(x) > n$, and by $F_n(x) = 0$ otherwise. Then, for integers $k$ with $0 \leq k \leq n \cdot 2^n$, consider the sets,

$$E_{n, k} = \left\{ x \in [-n, n] \middle| \frac{k}{2^n} \leq F_n(x) < \frac{k + 1}{2^n} \right\}$$

Note that the top index $k = n \cdot 2^n$ gives the set where $F_n(x) = n$, that is, where the truncation is active. So we provide the functions $\phi_n$ in the form of,

$$\phi_n = \sum_{k = 0}^{n \cdot 2^n} \frac{k}{2^n} \mathbb{1}_{E_{n, k}}$$

In the above, $\mathbb{1}_{E_{n, k}}$ denotes the indicator function of the set $E_{n, k}$. To make the argument explicit: by construction, $0 \leq F_n - \phi_n < 2^{-n}$ wherever $F_n < n$. Moreover, since the dyadic grid of mesh $2^{-(n + 1)}$ refines that of mesh $2^{-n}$ and $F_n \leq F_{n + 1}$, we have $\phi_n \leq \phi_{n + 1}$. Finally, fix $x$; once $n \geq |x|$ and $n > f(x)$, we have $F_n(x) = f(x)$, and so $0 \leq f(x) - \phi_n(x) < 2^{-n}$, which tends to $0$. Hence $\phi_n \to f$ at every point, and in particular almost everywhere, with $|\phi_n| = \phi_n$ non-decreasing. For a general $f = f_1 - f_2$, apply the construction to $f_1$ and $f_2$ separately to obtain sequences $\phi_n^{(1)}$ and $\phi_n^{(2)}$, and set $\phi_n = \phi_n^{(1)} - \phi_n^{(2)}$. Since $f_1$ and $f_2$ are never simultaneously positive, $|\phi_n| = \phi_n^{(1)} + \phi_n^{(2)}$, which is non-decreasing, and a simple application of the triangle inequality gives $|f - \phi_n| \leq |f_1 - \phi_n^{(1)}| + |f_2 - \phi_n^{(2)}| \to 0$. $\square$
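The construction in the proof can be checked numerically. Below is a minimal Python sketch, assuming a concrete non-negative test function (the hypothetical `square`, $x \mapsto x^2$) and evaluating $\phi_n$ pointwise: a point $x \in [-n, n]$ lies in $E_{n, k}$ exactly when $k = \lfloor 2^n F_n(x) \rfloor$, so $\phi_n$ is just $F_n$ rounded down to the dyadic grid of mesh $2^{-n}$. The helper names `truncate` and `phi` are illustrative. The sketch lets $k$ run up to $n \cdot 2^n$, so that points where $F_n(x) = n$ receive the value $n$.

```python
import math


def truncate(f, n, x):
    """F_n from the proof: min(f(x), n) on [-n, n], and 0 outside."""
    return min(f(x), n) if -n <= x <= n else 0.0


def phi(f, n, x):
    """phi_n(x) = sum_k (k / 2**n) * 1_{E_{n,k}}(x), evaluated at a single x.

    x lies in E_{n,k} exactly when k = floor(2**n * F_n(x)), so phi_n is F_n
    rounded down to the dyadic grid of mesh 2**-n.  The top index
    k = n * 2**n is reached exactly on the set where F_n(x) = n.
    """
    k = math.floor(2**n * truncate(f, n, x))
    return k / 2**n


def square(x):
    # Hypothetical example: non-negative, measurable, and unbounded on R.
    return x * x


# phi_n(1.3) for a few n: 0 while 1.3 lies outside [-n, n], then dyadic
# lower approximations of 1.3**2 once it lies inside.
for n in (1, 2, 4, 8):
    print(n, phi(square, n, 1.3))
```

Multiplying by a power of two, applying `math.floor`, and dividing by the same power of two are exact in binary floating point, so in this sketch the monotonicity $\phi_n \leq \phi_{n + 1}$ holds exactly rather than only up to rounding.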

Notice that the functions $\phi_n$ provided in the above proof are quite reminiscent of step functions, which is no accident: simple functions generalize step functions. A step function is a finite linear combination of indicator functions of intervals, whereas a simple function allows indicator functions of arbitrary measurable sets. Additionally, we feel it appropriate to comment on a particular subtlety, namely that the function $f$ was required to be measurable. Measurability guarantees that the preimage sets $E_{n, k}$ are measurable, so that their indicator functions are measurable; after all, we were trying to recover simple functions as approximations, and simple functions are finite linear combinations of indicator functions of measurable sets. In terms of Lebesgue's intuitive description, what the proof essentially achieves is to sort the "coins" $x$ by value into the "heaps" $E_{n, k}$. Incidentally, if $f$ were bounded and the domain fixed, so that neither the truncation nor the growing intervals $[-n, n]$ were needed, then the upper values $\phi_n = \sum_{k} \frac{k + 1}{2^n} \mathbb{1}_{E_{n, k}}$ would produce a non-increasing sequence approximating $f$ from above, rather than a non-decreasing sequence approximating it from below.
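To see the comparison between lower and upper values concretely, here is a small Python sketch for a hypothetical bounded function `g` with $0 \leq g \leq 2$ on the fixed interval $[0, 2]$, so that no truncation is needed. For each value $y = g(x)$ it compares the lower value $k/2^n$ with the upper value $(k + 1)/2^n$, where $k = \lfloor 2^n y \rfloor$ indexes the heap $E_{n, k}$ containing $x$. The names `lower`, `upper`, and `g` are illustrative.

```python
import math


def lower(y, n):
    """k / 2**n with k = floor(2**n * y): the value phi_n takes on E_{n,k}."""
    return math.floor(2**n * y) / 2**n


def upper(y, n):
    """(k + 1) / 2**n for the same k: the alternative 'upper' value."""
    return (math.floor(2**n * y) + 1) / 2**n


def g(x):
    # Hypothetical bounded example: 0 <= g <= 2 on the fixed interval [0, 2].
    return 1 + math.sin(3 * x)


y = g(0.7)
for n in (1, 3, 6, 10):
    # lower values increase with n, upper values decrease, both squeeze y
    print(n, lower(y, n), y, upper(y, n))
```

Since $\lfloor 2^{n + 1} y \rfloor$ is either $2k$ or $2k + 1$, refining the grid can only raise the lower value and lower the upper value, and the two always differ by exactly $2^{-n}$.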

Before moving on, I would like to draw the interested reader's attention to a particularly nice demonstration of this mathematical phenomenon in an applied, and indeed industrial, setting. It comes from a kind of machinery sometimes described as "3D CNC metal sheet bending machines", in which a collection of rods exerts varying pressures so as to bend steel sheets into particular surfaces. This is a demonstration, although of a special case, of the fact that measurable functions can always be approximated by simple functions, simple functions being, we repeat, finite linear combinations of indicator functions of appropriate (measurable) subsets of the domain.

In fact, with such machines we desire a certain surface for the metal sheet to conform to, and so a discrete approximation to it, in the form of a simple function, is provided instead and mimicked by a collection of rods extended to different relative heights, somewhat akin to the natural formations of the "Giant's Causeway", just with greater fineness. We can note, finally, that the simple functions realized by such machines are of a special kind: step functions, with one constant height per rod, of exactly the sort that appear in Riemann sums. But then, if Riemann sums suffice, why bother to pursue more general notions of mathematical objects? Are we merely pursuing unnecessary generalizations?$^{[18]}$ Does it make sense to suppose that Riemann sums, and even Riemann integrals, suffice completely in "real life"? Well, in transitioning to the third part of this series, we will soon provide examples where Riemann sums do not suffice.
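The rod-bed picture can be sketched in a few lines. The following Python example is a toy model under stated assumptions: a hypothetical target surface `surface` (a shallow dome over the unit square), and a hypothetical $m \times m$ bed of rods, each set to the height of the surface at the centre of its square cell. The resulting shape is the step function $\sum_{i, j} h_{ij} \mathbb{1}_{C_{ij}}$, with $C_{ij}$ the cells, and the sketch measures how far it deviates from the surface as the rods get finer.

```python
import math


def surface(u, v):
    # Hypothetical target sheet shape: a shallow dome over the unit square.
    return 0.2 * math.sin(math.pi * u) * math.sin(math.pi * v)


def rod_heights(s, m):
    """Heights for a hypothetical m x m rod bed: s sampled at cell centres."""
    return [[s((i + 0.5) / m, (j + 0.5) / m) for j in range(m)] for i in range(m)]


def step_surface(heights, u, v):
    """Evaluate sum_ij h_ij * 1_{C_ij}(u, v), a step function on [0, 1]^2."""
    m = len(heights)
    i = min(int(u * m), m - 1)  # cell index along u (u = 1 goes to the last cell)
    j = min(int(v * m), m - 1)  # cell index along v
    return heights[i][j]


def max_error(s, m, samples=128):
    """Largest |s - step| over a (samples + 1) x (samples + 1) evaluation grid."""
    heights = rod_heights(s, m)
    pts = [k / samples for k in range(samples + 1)]
    return max(abs(s(u, v) - step_surface(heights, u, v)) for u in pts for v in pts)


for m in (4, 8, 16, 32):
    print(m, max_error(surface, m))
```

Sampling at cell centres keeps every point of a cell within half a cell diagonal of its rod, so by the mean value theorem the error is at most $\sup |\nabla s| \cdot \frac{\sqrt{2}}{2m}$ for a target surface $s$; for this dome that is below $0.45/m$.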

But there is an even more impressive result, intuitively and intimately related to Theorem 2.1, sometimes referred to as the "Universal Approximation Theorem" and fundamental in the theory of neural networks; one variation reads,

Theorem 2.2 (Universal Approximation Theorem) Let $f$ be a real continuous function defined on a compact subset $K$ of $\mathbb{R}$, and let $\psi$ be a continuous sigmoidal activation function, that is, $\psi(t) \to 1$ as $t \to +\infty$ and $\psi(t) \to 0$ as $t \to -\infty$. Then for every $\epsilon > 0$ one can find a function $\phi(x) = \sum_{i = 1}^{n} \alpha_i \psi (\omega_i x + \beta_i)$, with $n \in \mathbb{N}$ and $\alpha_i, \beta_i, \omega_i \in \mathbb{R}$, such that $\sup_{x \in K} \left| \phi(x) - f(x) \right| < \epsilon$.
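To see how sums of the form in Theorem 2.2 relate to step functions, here is a short worked identity. It takes the logistic sigmoid $\psi(y) = 1/(1 + e^{-y})$ as an illustrative choice and writes $H$ for the Heaviside function, $H(y) = 1$ for $y \geq 0$ and $H(y) = 0$ otherwise. This is a heuristic only, not Cybenko's proof, which proceeds via the Hahn-Banach and Riesz representation theorems. Let $a = t_0 < t_1 < \cdots < t_n = b$ and let $s$ be the step function equal to $c_j$ on $[t_j, t_{j + 1})$ (and to $c_{n - 1}$ at $b$). Then

$$s(x) = c_0 + \sum_{j = 1}^{n - 1} \left( c_j - c_{j - 1} \right) H(x - t_j), \qquad x \in [a, b],$$

and each jump becomes a sigmoid term in the limit, since

$$\lim_{\omega \rightarrow \infty} \psi(\omega x - \omega t_j) = H(x - t_j) \quad \text{for } x \neq t_j.$$

The constant is itself of the required form, $c_0 = 2 c_0 \, \psi(0 \cdot x + 0)$, because $\psi(0) = 1/2$. So, away from the breakpoints, $s$ is the pointwise limit of functions $\sum_i \alpha_i \psi(\omega_i x + \beta_i)$ with $\alpha_i = c_j - c_{j - 1}$, $\omega_i = \omega$, and $\beta_i = -\omega t_j$. For continuous $f$ the jumps of a fine step approximation are small, which is, heuristically, why the transition regions do not spoil uniform closeness.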

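Instead of providing a proof here, I just remind the reader that a version of this result comes from a seminal 1989 paper by George Cybenko. We see that Theorem 2.2 is actually quite similar in spirit to Theorem 2.1: if $\psi$ were replaced by the Heaviside step function, then the sums in Theorem 2.2 would be precisely step functions, that is, simple functions built from intervals, and a sigmoid with a large weight $\omega_i$ is itself close to a shifted Heaviside function away from its transition point. In this sense, Theorem 2.2 can be viewed through the lens of Theorem 2.1: approximate $f$ first by simple (step) functions, and then approximate each jump by a steep sigmoid. In fact, the following GIF should enlighten the intuition as to why the universal approximation theorem, stated for sigmoid functions, makes sense,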

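An intuitive point is simply this: a steep sigmoid is nearly constant away from its transition region, close to $0$ on one side and close to $1$ on the other, so the difference of two shifted steep sigmoids is a localized "bump" that is nearly zero outside a chosen interval and carries all of its variation, where the derivatives are non-zero, inside it. Scaling and adding such bumps, we can "lift" or "lower" the approximating function in different parts of the domain to mimic the function we intend to approximate.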
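The bump heuristic can be turned into an explicit, if inefficient, network. The following Python sketch is an illustration under stated assumptions, not Cybenko's construction. It takes the logistic sigmoid as $\psi$, approximates $f$ on $[a, b]$ by the step function equal to $f$ at the midpoint of each of $n$ equal cells, and replaces each jump between neighbouring cells by a steep sigmoid of weight $\omega = k/h$, where $h$ is the cell width and $k = 40$ is an arbitrary illustrative steepness. Regrouping the resulting sum (summation by parts) expresses it as a combination of bumps $\psi(\omega(x - t_j)) - \psi(\omega(x - t_{j + 1}))$ weighted by the cell values. The names `build_network`, `network`, and `sup_error` are hypothetical helpers.

```python
import math


def sigmoid(y):
    """Logistic sigmoid 1 / (1 + e^(-y)); the two branches keep math.exp from overflowing."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def build_network(f, a, b, n, k=40.0):
    """Parameters (alpha_i, omega_i, beta_i) of phi(x) = sum_i alpha_i * psi(omega_i x + beta_i).

    Hypothetical construction: the step function equal to f(midpoint) on each of
    n equal cells of [a, b], with every jump H(x - t_j) replaced by the steep
    sigmoid psi(omega * (x - t_j)), omega = k / h.  The constant c_0 is produced
    exactly by alpha = 2 * c_0, omega = 0, beta = 0, since psi(0) = 1/2.
    """
    h = (b - a) / n
    c = [f(a + (j + 0.5) * h) for j in range(n)]  # step values c_0, ..., c_{n-1}
    omega = k / h
    params = [(2.0 * c[0], 0.0, 0.0)]  # 2 * c_0 * psi(0) = c_0
    for j in range(1, n):
        t_j = a + j * h  # breakpoint between cells j - 1 and j
        params.append((c[j] - c[j - 1], omega, -omega * t_j))
    return params


def network(params, x):
    """phi(x) = sum_i alpha_i * psi(omega_i * x + beta_i)."""
    return sum(alpha * sigmoid(w * x + beta) for alpha, w, beta in params)


def sup_error(f, params, a, b, samples=2000):
    """Largest |phi - f| over an evenly spaced evaluation grid on [a, b]."""
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    return max(abs(network(params, x) - f(x)) for x in xs)


a, b = 0.0, 2.0 * math.pi
for n in (10, 40, 160):
    params = build_network(math.sin, a, b, n)
    print(n, sup_error(math.sin, params, a, b))
```

For this construction the error is at most about $L h$, where $L$ is a Lipschitz constant of $f$: at most $L h / 2$ from the step approximation, and at most about $L h / 2$ from the one sigmoid whose transition lies nearest to $x$, the remaining terms being exponentially small. For $\sin$ on $[0, 2\pi]$, the printed errors should therefore shrink roughly like $1/n$.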

Footnotes

[16] We note that the sequence of simple functions is indexed by a countable set. This is important because a countable union of sets of measure $0$ again has measure $0$, whereas if the index set $J$ were uncountable, a union $\bigcup_{j \in J} E_j$ of exceptional sets of measure $0$ would not necessarily have measure $0$ (nor even be measurable). Indeed, this is why we use a sequence as opposed to a "net", that is, a net in the sense of Moore-Smith convergence (closely related to topological filters), whose directed index set is not assumed to be countable.

[17] Actually, the $\max(\cdot, \cdot)$ and $\min(\cdot, \cdot)$ "operators" are sometimes identified in contexts of "lattice operations", and it is known that the class of measurable functions is closed under lattice operations. For instance, see page 15 of Rudin's Real & Complex Analysis, or page 40 of Klenke's Probability Theory: A Comprehensive Course. As an example of a conventional exposition that uses the term "lattice" in such a context, see page 9 of Yosida's Functional Analysis. We also comment that there is such a notion as a "sigma-frame", stated here possibly for future reference.

[18] Consider another example in the notion of "cohomology" in general. Some time ago, I thought there existed quite a few examples illustrating non-trivial practical uses of cohomology in mathematical physics, until I realized that all of the examples I had encountered were merely special cases, for instance de Rham cohomology and the like, arising in differential geometry. As a result, one sometimes cannot help but wonder: if the special cases suffice for applications, how could one justify the more general notion of cohomology in algebraic topology from a utilitarian perspective? Of course, many perspectives and arguments exist, and one of them is that abstraction provides greater information-theoretic efficiency and reduces working-memory load. More to come concerning such discussions.

References

Bogachev, V. I., & Smolyanov, O. G. (2020). Real and Functional Analysis. Springer Cham. https://doi.org/10.1007/978-3-030-38219-3

Courant, R. (1977). Dirichlet's Principle, Conformal Mapping, and Minimal Surfaces. Springer New York. (Original work published 1950). https://doi.org/10.1007/978-1-4612-9917-2

Courant, R., & Hilbert, D. (1989). Methods of Mathematical Physics: Volume I. Wiley-VCH Verlag GmbH & Co. KGaA. (Original work published 1953)

Courant, R., & John, F. (1989). Introduction to Calculus and Analysis: Volume I. Springer New York. (Original work published 1965). https://doi.org/10.1007/978-1-4613-8955-2

Cybenko, G. (1989). Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals, and Systems, 2(4), 303-314.

Jahnke, H. N. (2003). A History of Analysis. American Mathematical Society; London Mathematical Society. (Original work published 1999). https://doi.org/10.1090/hmath/024

Klenke, A. (2020). Probability Theory: A Comprehensive Course (3rd ed.). Springer Cham. https://doi.org/10.1007/978-3-030-56402-5

Rudin, W. (1987). Real & Complex Analysis (3rd ed.). McGraw-Hill.

Stein, E. M., & Shakarchi, R. (2005). Real Analysis: Measure Theory, Hilbert Spaces, & Integration. Princeton University Press.

Yosida, K. (1995). Functional Analysis. Springer Berlin. (Original work published 1980). https://doi.org/10.1007/978-3-642-61859-8

Zorich, V. A. (2016). Mathematical Analysis II (R. Cooke & O. Paniagua, Trans.; 2nd ed.). Springer Berlin. (Original work published 2012). https://doi.org/10.1007/978-3-662-48993-2
