Publications about 'differential geometry'
Articles in journals or book chapters
| Given a "data manifold" $M\subset \mathbb{R}^n$ and "latent space" $\mathbb{R}^\ell$, an autoencoder is a pair of continuous maps consisting of an "encoder" $E\colon \mathbb{R}^n o \mathbb{R}^\ell$ and "decoder" $D\colon \mathbb{R}^\ell o \mathbb{R}^n$ such that the "round trip" map $D\circ E$ is as close as possible to the identity map $\mbox{id}_M$ on $M$. We present various topological limitations and capabilites inherent to the search for an autoencoder, and describe capabilities for autoencoding dynamical systems having $M$ as an invariant manifold. |
Deep neural network autoencoders are routinely used computationally for model reduction. They make it possible to recognize the intrinsic dimension of data that lie in a $k$-dimensional subset $K$ of an input Euclidean space $\mathbb{R}^n$. The underlying idea is to obtain both an encoding layer that maps $\mathbb{R}^n$ into $\mathbb{R}^k$ (called the bottleneck layer or the space of latent variables) and a decoding layer that maps $\mathbb{R}^k$ back into $\mathbb{R}^n$, in such a way that the input data from the set $K$ are recovered when composing the two maps. This is achieved by adjusting parameters (weights) in the network to minimize the discrepancy between the input and the reconstructed output. Since neural networks (with continuous activation functions) compute continuous maps, the existence of a network that achieves perfect reconstruction would imply that $K$ is homeomorphic to a $k$-dimensional subset of $\mathbb{R}^k$, so clearly there are topological obstructions to finding such a network. On the other hand, in practice the technique is found to "work" well, which leads one to ask whether there is a way to explain this effectiveness. We show that, up to small errors, the method is indeed guaranteed to work. This is done by appealing to certain facts from differential geometry. A computational example is also included to illustrate the ideas.
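As a hedged sketch of how the bottleneck reveals intrinsic dimension (PyTorch assumed; the embedded patch, network sizes, and training schedule are invented for illustration and are not the paper's computational example): embed a 2-dimensional patch smoothly into $\mathbb{R}^{10}$ and sweep the bottleneck width. Because the patch is homeomorphic to a subset of $\mathbb{R}^2$, the reconstruction error should drop sharply once the width reaches $k=2$.

```python
# Hypothetical illustration: a bottleneck sweep over a 2-dimensional
# patch K embedded in R^10. Reconstruction error should fall off once
# the latent dimension reaches the intrinsic dimension k = 2.
import torch
import torch.nn as nn

torch.manual_seed(0)
u1, u2 = torch.rand(2048, 1), torch.rand(2048, 1)    # parameters of the patch
X = torch.cat([u1, u2, u1 * u2, torch.sin(3 * u1), torch.cos(3 * u2),
               u1 ** 2, u2 ** 2, torch.sin(u1 + u2),
               u1 - u2, torch.cos(u1 * u2)], dim=1)  # smooth embedding into R^10

def final_mse(k, steps=3000):
    enc = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, k))
    dec = nn.Sequential(nn.Linear(k, 64), nn.Tanh(), nn.Linear(64, 10))
    opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((dec(enc(X)) - X) ** 2).sum(dim=1).mean()
        loss.backward()
        opt.step()
    return loss.item()

for k in (1, 2, 3):
    print(f"bottleneck width {k}: final MSE {final_mse(k):.5f}")
# Expect a clearly larger error at k = 1 than at k = 2 or 3.
```

The first two coordinates of the embedding are the parameters themselves, so the map is injective by construction and the patch genuinely has intrinsic dimension 2.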
Single-cell -omics datasets are high-dimensional and difficult to visualize. A common strategy for exploring such data is to create and analyze 2D projections. Such projections may be highly nonlinear, and implementation algorithms are designed with the goal of preserving aspects of the original high-dimensional shape of data such as neighborhood relationships or metrics. However, important aspects of high-dimensional geometry are known from mathematical theory to have no equivalent representation in 2D, or are subject to large distortions, and will therefore be misrepresented or even invisible in any possible 2D representation. We show that features such as quantitative distances, relative positioning, and qualitative neighborhoods of high-dimensional data points will always be misrepresented in 2D projections. Our results rely upon concepts from differential geometry, combinatorial geometry, and algebraic topology. As an illustrative example, we show that even a simple single-cell RNA sequencing dataset will always be distorted, no matter what 2D projection is employed. We also discuss how certain recently developed computational tools can help describe the high-dimensional geometric features that will be necessarily missing from any possible 2D projections.
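For a concrete instance of the unavoidable distance distortion (a numpy illustration of our own, not the paper's single-cell example): the 11 standard basis vectors of $\mathbb{R}^{11}$ are the vertices of a regular simplex, all at pairwise distance $\sqrt{2}$, yet at most 3 points in the plane can be mutually equidistant, so every 2D projection must spread these equal distances apart.

```python
# Hypothetical illustration: 11 mutually equidistant points (vertices of
# a regular simplex) cannot remain equidistant in any planar picture,
# since at most 3 points in the plane are pairwise equidistant. We
# project with PCA and measure how the equal distances get spread out.
import numpy as np

X = np.eye(11)                       # 11 simplex vertices; all pairwise
                                     # distances equal sqrt(2)

def pairwise(P):
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))[np.triu_indices(len(P), 1)]

Xc = X - X.mean(axis=0)              # center, then PCA via SVD
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:2].T                    # best 2D linear projection

print("original distances:", np.unique(np.round(pairwise(X), 6)))
d2 = pairwise(Y)
print(f"projected distances: min {d2.min():.3f}, max {d2.max():.3f}")
# The single original distance sqrt(2) = 1.4142... spreads into a range.
```

The choice of PCA here is incidental: the obstruction is combinatorial-geometric, so any planar method, linear or nonlinear, faces the same constraint.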
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders.