Publications about 'neural networks'

Publications about 'neural networks'

Articles in journal or book chapters

A. Darabi, Z. An, M.A. Al-Radhawi, W. Cho, M. Siami, and E.D. Sontag. Combining model-based and data-driven models: an application to synthetic biology resource competition. 2025. Note: Submitted. Preprint in bioRxiv 2025/642275. [WWW] Keyword(s): mechanistic models, machine learning, neural networks, resource competition, synthetic biology. Abstract:

This work explores the integration of machine learning (ML) and mechanistic models (MM). While ML has demonstrated remarkable success in data-driven modeling across engineering, biology, and other scientific fields, MM remain essential for their interpretability and capacity to extrapolate beyond observed conditions based on established principles such as chemical kinetics and physiological processes. However, MM can be labor-intensive to construct and often rely on simplifying assumptions that may not fully capture real-world complexity. It is thus desirable to combine MM and ML approaches so as to enable more robust predictions, enhanced system insights, and improved handling of sparse or noisy data. A key challenge when doing so is ensuring that ML components do not disregard mechanistic information, potentially leading to overfitting or reduced interpretability. To address that challenge, this paper introduces the idea of Partially Uncertain Model Structures (PUMS) and investigates conditions that discourage the ML components from ignoring mechanistic constraints. This work also introduces the concept of embedded Physics-Informed Neural Networks (ePINNs), which consist of two loss-sharing neural networks that seamlessly blend ML and MM components. This work arose in the study of the context problem in synthetic biology. Engineered genetic circuits may exhibit unexpected behavior in living cells due to resource sharing. To illustrate the advantages of the ePINNs approach, this paper applies the framework to a gene network model subject to resource competition, demonstrating the effectiveness of this hybrid modeling approach in capturing complex system interactions while maintaining physical consistency.

P. Mestres, J. Cortés, and E.D. Sontag. Neural network-based universal formulas for control. 2025. Note: Submitted. Also arXiv https://arxiv.org/abs/2505.24744. Keyword(s): control-Lyapunov functions, control barrier functions, universal formulas, neural networks. Abstract:

We study the problem of designing a controller that satisfies an arbitrary number of affine inequalities at every point in the state space. This is motivated by the use of guardrails in autonomous systems. Indeed, a variety of key control objectives, such as stability, safety, and input saturation, are guaranteed by closed-loop systems whose controllers satisfy such inequalities. Many works in the literature design such controllers as the solution to a state-dependent quadratic program (QP) whose constraints are precisely the inequalities. When the input dimension and number of constraints are high, computing a solution of this QP in real time can become computationally burdensome. Additionally, the solution of such optimization problems is not smooth in general, which can degrade the performance of the system. This paper provides a novel method to design a smooth controller that satisfies an arbitrary number of affine constraints. This why we refer to it as a universal formula for control. The controller is given at every state as the minimizer of a strictly convex function. To avoid computing the minimizer of such function in real time, we introduce a method based on neural networks (NN) to approximate the controller. Remarkably, this NN can be used to solve the controller design problem for any task with less than a fixed input dimension and number of affine constraints, and is completely independent of the state dimension. Additionally, we show that the NN-based controller only needs to be trained with datapoints from a compact set in the state space, which significantly simplifies the training process. Various simulations showcase the performance of the proposed solution, and also show that the NN-based controller can be used to warmstart an optimization scheme that refines the approximation of the true controller in real time, significantly reducing the computational cost compared to a generic initialization.

E.D. Sontag. Some remarks on gradient dominance and LQR policy optimization. arXiv, 2025. [doi:https://doi.org/10.48550/arXiv.2507.10452] Keyword(s): gradient dominance, gradient flows, LQR, reinforcement learning, machine learning, artificial intelligence, optimal control. Abstract:

Solutions of optimization problems, including policy optimization in reinforcement learning, typically rely upon some variant of gradient descent. There has been much recent work in the machine learning, control, and optimization communities applying the Polyak-Åojasiewicz Inequality (PLI) to such problems in order to establish an exponential rate of convergence (a.k.a. ``linear convergence'' in the local-iteration language of numerical analysis) of loss functions to their minima under the gradient flow. Often, as is the case of policy iteration for the continuous-time LQR problem, this rate vanishes for large initial conditions, resulting in a mixed globally linear / locally exponential behavior. This is in sharp contrast with the discrete-time LQR problem, where there is global exponential convergence. That gap between CT and DT behaviors motivates the search for various generalized PLI-like conditions, and this paper addresses that topic. Moreover, these generalizations are key to understanding the transient and asymptotic effects of errors in the estimation of the gradient, errors which might arise from adversarial attacks, wrong evaluation by an oracle, early stopping of a simulation, inaccurate and very approximate digital twins, stochastic computations (algorithm ``reproducibility''), or learning by sampling from limited data. We describe an ``input to state stability'' (ISS) analysis of this issue. We also discuss convergence and PLI-like properties of ``linear feedforward neural networks'' in feedback control. Much of the work described here was done in collaboration with Arthur Castello B. de Oliveira, Leilei Cui, Zhong-Ping Jiang, and Milad Siami. This is a short paper summarizing the slides presented at my keynote at the 2025 L4DC (Learning for Dynamics \& Control Conference) in Ann Arbor, Michigan, 05 June 2025. A partial bibliography has been added.

A.C.B de Olivera, M. Siami, and E.D. Sontag. Convergence analysis of overparametrized LQR formulations. Automatica, 2025. Note: To appear. Also more details in arXiv 2408.15456, pdf linked here is to the long version. [PDF] Keyword(s): learning theory, singularities in optimization, gradient systems, overparametrization, neural networks, overparametrization, gradient descent, input to state stability, feedback control, LQR. Abstract:

Motivated by the growing use of Artificial Intelligence (AI) tools in control design, this paper takes the first steps towards bridging the gap between results from Direct Gradient methods for the Linear Quadratic Regulator (LQR), and neural networks. More specifically, it looks into the case where one wants to find a Linear Feed-Forward Neural Network (LFFNN) feedback that minimizes a LQR cost. This paper starts by computing the gradient formulas for the parameters of each layer, which are used to derive a key conservation law of the system. This conservation law is then leveraged to prove boundedness and global convergence of solutions to critical points, and invariance of the set of stabilizing networks under the training dynamics. This is followed by an analysis of the case where the LFFNN has a single hidden layer. For this case, the paper proves that the training converges not only to critical points but to the optimal feedback control law for all but a set of measure-zero of the initializations. These theoretical results are followed by an extensive analysis of a simple version of the problem (the ``vector case''), proving the theoretical properties of accelerated convergence and robustness for this simpler example. Finally, the paper presents numerical evidence of faster convergence of the training of general LFFNNs when compared to traditional direct gradient methods, showing that the acceleration of the solution is observable even when the gradient is not explicitly computed but estimated from evaluations of the cost function.

M. D. Kvalheim and E. D. Sontag. Why should autoencoders work?. Transactions on Machine Learning Research, 2024. Note: See also 2023 preprint in https://arxiv.org/abs/2310.02250.[WWW] [PDF] Keyword(s): autoencoders, neural networks, differential topology, model reduction. Abstract:

Deep neural network autoencoders are routinely used computationally for model reduction. They allow recognizing the intrinsic dimension of data that lie in a k-dimensional subset K of an input Euclidean space $\R^n$. The underlying idea is to obtain both an encoding layer that maps $\R^n$ into $\R^k$ (called the bottleneck layer or the space of latent variables) and a decoding layer that maps $\R^k$ back into $\R^n$, in such a way that the input data from the set K is recovered when composing the two maps. This is achieved by adjusting parameters (weights) in the network to minimize the discrepancy between the input and the reconstructed output. Since neural networks (with continuous activation functions) compute continuous maps, the existence of a network that achieves perfect reconstruction would imply that K is homeomorphic to a k-dimensional subset of $\R^k$, so clearly there are topological obstructions to finding such a network. On the other hand, in practice the technique is found to "work" well, which leads one to ask if there is a way to explain this effectiveness. We show that, up to small errors, indeed the method is guaranteed to work. This is done by appealing to certain facts from differential geometry. A computational example is also included to illustrate the ideas.

J. Hanson, M. Raginsky, and E.D. Sontag. Learning recurrent neural net models of nonlinear systems. Proc. of Machine Learning Research, 144:1-11, 2021. [PDF] Keyword(s): machine learning, empirical risk minimization, recurrent neural networks, dynamical systems, continuous time, system identification, statistical learning theory, generalization bounds. Abstract:

This paper considers the following learning problem: given sample pairs of input and output signals generated by an unknown nonlinear system (which is not assumed to be causal or time-invariant), one wishes to find a continuous-time recurrent neural net, with activation function tanh, that approximately reproduces the underlying i/o behavior with high confidence. Leveraging earlier work concerned with matching derivatives up to a finite order of the input and output signals the problem is reformulated in familiar system-theoretic language and quantitative guarantees on the sup-norm risk of the learned model are derived, in terms of the number of neurons, the sample size, the number of derivatives being matched, and the regularity properties of the inputs, the outputs, and the unknown i/o map.

W. Maass, P. Joshi, and E.D. Sontag. Computational aspects of feedback in neural circuits. PLoS Computational Biology, 3:e165 1-20, 2007. [PDF] Keyword(s): machine learning, neural networks, feedback linearization, computation by cortical microcircuits, fading memory. Abstract:

It had previously been shown that generic cortical microcircuit models can perform complex real-time computations on continuous input streams, provided that these computations can be carried out with a rapidly fading memory. We investigate in this article the computational capability of such circuits in the more realistic case where not only readout neurons, but in addition a few neurons within the circuit have been trained for specific tasks. This is essentially equivalent to the case where the output of trained readout neurons is fed back into the circuit. We show that this new model overcomes the limitation of a rapidly fading memory. In fact, we prove that in the idealized case without noise it can carry out any conceivable digital or analog computation on time-varying inputs. But even with noise the resulting computational model can perform a large class of biologically relevant real-time computations that require a non-fading memory.

W. Maass, P. Joshi, and E.D. Sontag. Principles of real-time computing with feedback applied to cortical microcircuit models. In Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, 2006. [PDF] Keyword(s): neural networks. Abstract:

The network topology of neurons in the brain exhibits an abundance of feedback connections, but the computational function of these feedback connections is largely unknown. We present a computational theory that characterizes the gain in computational power achieved through feedback in dynamical systems with fading memory. It implies that many such systems acquire through feedback universal computational capabilities for analog computing with a non-fading memory. In particular, we show that feedback enables such systems to process time-varying input streams in diverse ways according to rules that are implemented through internal states of the dynamical system. In contrast to previous attractor-based computational models for neural networks, these flexible internal states are high-dimensional attractors of the circuit dynamics, that still allow the circuit state to absorb new information from online input streams. In this way one arrives at novel models for working memory, integration of evidence, and reward expectation in cortical circuits. We show that they are applicable to circuits of conductance-based Hodgkin-Huxley (HH) neurons with high levels of noise that reflect experimental data on invivo conditions.

P. Kuusela, D. Ocone, and E.D. Sontag. Learning Complexity Dimensions for a Continuous-Time Control System. SIAM J. Control Optim., 43(3):872-898, 2004. [PDF] [doi:http://dx.doi.org/10.1137/S0363012901384302] Keyword(s): machine learning, theory of computing and complexity, VC dimension, neural networks. Abstract:

This paper takes a computational learning theory approach to a problem of linear systems identification. It is assumed that input signals have only a finite number k of frequency components, and systems to be identified have dimension no greater than n. The main result establishes that the sample complexity needed for identification scales polynomially with n and logarithmically with k.

W. Maass and E.D. Sontag. Neural Systems as Nonlinear Filters. Neural Comput., 12(8):1743-1772, 2000. [PDF] [doi:http://dx.doi.org/10.1162/089976600300015123] Keyword(s): neural networks, Volterra series. Abstract:

We analyze computations on temporal patterns and spatio-temporal patterns in formal network models whose temporal dynamics arises from empirically established quantitative models for short term dynamics at biological synapses. We give a complete characterization of all linear and nonlinear filters that can be approximated by such dynamic network models: it is the class of all filters that can be approximated by Volterra series. This characterization is shown to be rather stable with regard to changes in the model. For example it is shown that synaptic facilitation and one layer of neurons suffices for approximating arbitrary filters from this class. Our results provide a new complexity hierarchy for all filters that are approximable by Volterra series, which appears to be closer related to the actual cost of implementing such filters in neural hardware than preceding complexity measures. Our results also provide a new parameterization for approximations to such filters in terms of parameters that are arguable related to those that are tunable in biological neural systems.

W. Maass and E.D. Sontag. Analog neural nets with Gaussian or other common noise distributions cannot recognize arbitrary regular languages. Neural Comput., 11(3):771-782, 1999. [PDF] [doi:http://dx.doi.org/10.1162/089976699300016656] Keyword(s): machine learning, neural networks. Abstract:

We consider recurrent analog neural nets where the output of each gate is subject to Gaussian noise, or any other common noise distribution that is nonzero on a large set. We show that many regular languages cannot be recognized by networks of this type, and we give a precise characterization of those languages which can be recognized. This result implies severe constraints on possibilities for constructing recurrent analog neural nets that are robust against realistic types of analog noise. On the other hand we present a method for constructing feedforward analog neural nets that are robust with regard to analog noise of this type.

E.D. Sontag and Y. Qiao. Further results on controllability of recurrent neural networks. Systems Control Lett., 36(2):121-129, 1999. [PDF] Keyword(s): machine learning, controllability, recurrent neural networks, neural networks. Abstract:

This paper studies controllability properties of recurrent neural networks. The new contributions are: (1) an extension of the result in "Complete controllability of continuous-time recurrent neural networks" to a slightly different model, where inputs appear in an affine form, (2) a formulation and proof of a necessary and sufficient condition, in terms of local-local controllability, and (3) a complete analysis of the 2-dimensional case for which the hypotheses made in previous work do not apply.

E.D. Sontag. Automata and neural networks. In The handbook of brain theory and neural networks, pages 119-122. MIT Press, Cambridge, MA, USA, 1998. [PDF] Keyword(s): neural networks.

E.D. Sontag. VC dimension of neural networks. In C.M. Bishop, editor, Neural Networks and Machine Learning, pages 69-95. Springer, Berlin, 1998. [PDF] Keyword(s): machine learning, VC dimension, learning, neural networks, shattering. Abstract:

The Vapnik-Chervonenkis (VC) dimension is an integer which helps to characterize distribution-independent learning of binary concepts from positive and negative samples. This paper, based on lectures delivered at the Isaac Newton Institute in August of 1997, presents a brief introduction, establishes various elementary results, and discusses how to estimate the VC dimension in several examples of interest in neural network theory. (It does not address the learning and estimation-theoretic applications of VC dimension, and the applications to uniform convergence theorems for empirical probabilities, for which many suitable references are available.)

P. Koiran and E.D. Sontag. Vapnik-Chervonenkis dimension of recurrent neural networks. Discrete Appl. Math., 86(1):63-79, 1998. [PDF] [doi:http://dx.doi.org/10.1016/S0166-218X(98)00014-6] Keyword(s): machine learning, neural networks, recurrent neural networks. Abstract:

This paper provides lower and upper bounds for the VC dimension of recurrent networks. Several types of activation functions are discussed, including threshold, polynomial, piecewise-polynomial and sigmoidal functions. The bounds depend on two independent parameters: the number w of weights in the network, and the length k of the input sequence. Ignoring multiplicative constants, the main results say roughly the following: 1. For architectures whose activation is any fixed nonlinear polynomial, the VC dimension is proportional to wk. 2. For architectures whose activation is any fixed piecewise polynomial, the VC dimension is between wk and w**2k. 3. For architectures with threshold activations, the VC dimension is between wlog(k/w) and the smallest of wklog(wk) and w**2+wlog(wk). 4. For the standard sigmoid tanh(x), the VC dimension is between wk and w**4 k**2.

E.D. Sontag. A learning result for continuous-time recurrent neural networks. Systems Control Lett., 34(3):151-158, 1998. [PDF] [doi:http://dx.doi.org/10.1016/S0167-6911(98)00006-1] Keyword(s): machine learning, neural networks, VC dimension, recurrent neural networks. Abstract:

The following learning problem is considered, for continuous-time recurrent neural networks having sigmoidal activation functions. Given a ``black box'' representing an unknown system, measurements of output derivatives are collected, for a set of randomly generated inputs, and a network is used to approximate the observed behavior. It is shown that the number of inputs needed for reliable generalization (the sample complexity of the learning problem) is upper bounded by an expression that grows polynomially with the dimension of the network and logarithmically with the number of output derivatives being matched.

P. Koiran and E.D. Sontag. Vapnik-Chervonenkis dimension of recurrent neural networks. In Computational learning theory (Jerusalem, 1997), volume 1208 of Lecture Notes in Comput. Sci., pages 223-237. Springer-Verlag, London, UK, 1997. Keyword(s): machine learning, neural networks, VC dimension, recurrent neural networks.

E.D. Sontag. Recurrent neural networks: Some systems-theoretic aspects. In M. Karny, K. Warwick, and V. Kurkova, editors, Dealing with Complexity: a Neural Network Approach, pages 1-12. Springer-Verlag, London, 1997. [PDF] Keyword(s): machine learning, neural networks, recurrent neural networks, learning, VC dimension. Abstract:

This paper provides an exposition of some recent results regarding system-theoretic aspects of continuous-time recurrent (dynamic) neural networks with sigmoidal activation functions. The class of systems is introduced and discussed, and a result is cited regarding their universal approximation properties. Known characterizations of controllability, observability, and parameter identifiability are reviewed, as well as a result on minimality. Facts regarding the computational power of recurrent nets are also mentioned.

M. J. Donahue, L. Gurvits, C. Darken, and E.D. Sontag. Rates of convex approximation in non-Hilbert spaces. Constr. Approx., 13(2):187-220, 1997. [PDF] Keyword(s): machine learning, neural networks, optimization, approximation theory. Abstract:

This paper deals with sparse approximations by means of convex combinations of elements from a predetermined "basis" subset S of a function space. Specifically, the focus is on the rate at which the lowest achievable error can be reduced as larger subsets of S are allowed when constructing an approximant. The new results extend those given for Hilbert spaces by Jones and Barron, including in particular a computationally attractive incremental approximation scheme. Bounds are derived for broad classes of Banach spaces. The techniques used borrow from results regarding moduli of smoothness in functional analysis as well as from the theory of stochastic processes on function spaces.

P. Koiran and E.D. Sontag. Neural networks with quadratic VC dimension. J. Comput. System Sci., 54(1, part 2):190-198, 1997. Note: (1st Annual Dagstuhl Seminar on Neural Computing, 1994). [PDF] [doi:http://dx.doi.org/10.1006/jcss.1997.1479] Keyword(s): machine learning, neural networks, VC dimension. Abstract:

This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles the open question of whether whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed.

R. Koplon and E.D. Sontag. Using Fourier-neural recurrent networks to fit sequential input/output data. Neurocomputing, 15:225-248, 1997. [PDF] Keyword(s): machine learning, neural networks, recurrent neural networks. Abstract:

This paper suggests the use of Fourier-type activation functions in fully recurrent neural networks. The main theoretical advantage is that, in principle, the problem of recovering internal coefficients from input/output data is solvable in closed form.

E.D. Sontag. Shattering all sets of k points in `general position' requires (k-1)/2 parameters. Neural Comput., 9(2):337-348, 1997. [PDF] Keyword(s): machine learning, neural networks, VC dimension, real-analytic functions. Abstract:

For classes of concepts defined by certain classes of analytic functions depending on k parameters, there are nonempty open sets of samples of length 2k+2 which cannot be shattered. A slighly weaker result is also proved for piecewise-analytic functions. The special case of neural networks is discussed.

E.D. Sontag and H.J. Sussmann. Complete controllability of continuous-time recurrent neural networks. Systems Control Lett., 30(4):177-183, 1997. [PDF] [doi:http://dx.doi.org/10.1016/S0167-6911(97)00002-9] Keyword(s): machine learning, neural networks, recurrent neural networks. Abstract:

This paper presents a characterization of controllability for the class of control systems commonly called (continuous-time) recurrent neural networks. The characterization involves a simple condition on the input matrix, and is proved when the activation function is the hyperbolic tangent.

Y. Yang, E.D. Sontag, and H.J. Sussmann. Global stabilization of linear discrete-time systems with bounded feedback. Systems Control Lett., 30(5):273-281, 1997. [PDF] [doi:http://dx.doi.org/10.1016/S0167-6911(97)00021-2] Keyword(s): discrete-time, saturation, bounded inputs. Abstract:

This paper deals with the problem of global stabilization of linear discrete time systems by means of bounded feedback laws. The main result proved is an analog of one proved for the continuous time case by the authors, and shows that such stabilization is possible if and only if the system is stabilizable with arbitrary controls and the transition matrix has spectral radius less or equal to one. The proof provides in principle an algorithm for the construction of such feedback laws, which can be implemented either as cascades or as parallel connections (``single hidden layer neural networks'') of simple saturation functions.

B. DasGupta and E.D. Sontag. Sample complexity for learning recurrent perceptron mappings. IEEE Trans. Inform. Theory, 42(5):1479-1487, 1996. [PDF] Keyword(s): machine learning, neural networks, VC dimension, recurrent neural networks. Abstract:

Recurrent perceptron classifiers generalize the usual perceptron model. They correspond to linear transformations of input vectors obtained by means of "autoregressive moving-average schemes", or infinite impulse response filters, and allow taking into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data. The results are expressed in the context of the theory of probably approximately correct (PAC) learning.

E.D. Sontag. Critical points for least-squares problems involving certain analytic functions, with applications to sigmoidal nets. Adv. Comput. Math., 5(2-3):245-268, 1996. [PDF] Keyword(s): machine learning, subanalytic sets, semianalytic sets, critical points, approximation theory, neural networks, real-analytic functions. Abstract:

This paper deals with nonlinear least-squares problems involving the fitting to data of parameterized analytic functions. For generic regression data, a general result establishes the countability, and under stronger assumptions finiteness, of the set of functions giving rise to critical points of the quadratic loss function. In the special case of what are usually called "single-hidden layer neural networks", which are built upon the standard sigmoidal activation tanh(x) or equivalently 1/(1+exp(-x)), a rough upper bound for this cardinality is provided as well.

B. DasGupta, H.T. Siegelmann, and E.D. Sontag. On the complexity of training neural networks with continuous activation functions. IEEE Trans. Neural Networks, 6:1490-1504, 1995. [PDF] Keyword(s): machine learning, neural networks, analog computing, theory of computing, neural networks, computational complexity, machine learning. Abstract:

Blum and Rivest showed that any possible neural net learning algorithm based on fixed architectures faces severe computational barriers. This paper extends their NP-completeness result, which applied only to nets based on hard threshold activations, to nets that employ a particular continuous activation. In view of neural network practice, this is a more relevant result to understanding the limitations of backpropagation and related techniques.

H. T. Siegelmann and E.D. Sontag. On the computational power of neural nets. J. Comput. System Sci., 50(1):132-150, 1995. [PDF] [doi:http://dx.doi.org/10.1006/jcss.1995.1013] Keyword(s): machine learning, neural networks, recurrent neural networks, machine learning, analog computing, theory of computing, neural networks, computational complexity, super-Turing computation. Abstract:

This paper deals with finite size networks which consist of interconnections of synchronously evolving processors. Each processor updates its state by applying a "sigmoidal" function to a rational-coefficient linear combination of the previous states of all units. We prove that one may simulate all Turing Machines by such nets. In particular, one can simulate any multi-stack Turing Machine in real time, and there is a net made up of 886 processors which computes a universal partial-recursive function. Products (high order nets) are not required, contrary to what had been stated in the literature. Non-deterministic Turing Machines can be simulated by non-deterministic rational nets, also in real time. The simulation result has many consequences regarding the decidability, or more generally the complexity, of questions about recursive nets.

B. DasGupta, H.T. Siegelmann, and E.D. Sontag. On the Intractability of Loading Neural Networks. In V. P. Roychowdhury, Siu K. Y., and Orlitsky A., editors, Theoretical Advances in Neural Computation and Learning, pages 357-389. Kluwer Academic Publishers, 1994. [PDF] Keyword(s): analog computing, neural networks, computational complexity, machine learning.

W. Maass, G. Schnitger, and E.D. Sontag. A comparison of the computational power of sigmoid and Boolean threshold circuits. In V. P. Roychowdhury, Siu K. Y., and Orlitsky A., editors, Theoretical Advances in Neural Computation and Learning, pages 127-151. Kluwer Academic Publishers, 1994. [PDF] Keyword(s): machine learning, neural networks, boolean systems. Abstract:

We examine the power of constant depth circuits with sigmoid threshold gates for computing boolean functions. It is shown that, for depth 2, constant size circuits of this type are strictly more powerful than constant size boolean threshold circuits (i.e. circuits with linear threshold gates). On the other hand it turns out that, for any constant depth d, polynomial size sigmoid threshold circuits with polynomially bounded weights compute exactly the same boolean functions as the corresponding circuits with linear threshold gates.

F. Albertini and E.D. Sontag. State observability in recurrent neural networks. Systems Control Lett., 22(4):235-244, 1994. [PDF] [doi:http://dx.doi.org/10.1016/0167-6911(94)90054-X] Keyword(s): machine learning, neural networks, recurrent neural networks, observability, identifiability. Abstract:

This paper concerns recurrent networks x'=s(Ax+Bu), y=Cx, where s is a sigmoid, in both discrete time and continuous time. Our main result is that observability can be characterized, if one assumes certain conditions on the nonlinearity and on the system, in a manner very analogous to that of the linear case. Recall that for the latter, observability is equivalent to the requirement that there not be any nontrivial A-invariant subspace included in the kernel of C. We show that the result generalizes in a natural manner, except that one now needs to restrict attention to certain special "coordinate" subspaces.

H. T. Siegelmann and E.D. Sontag. Analog computation via neural networks. Theoret. Comput. Sci., 131(2):331-360, 1994. [PDF] [doi:http://dx.doi.org/10.1016/0304-3975(94)90178-3] Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks, neural networks, computational complexity. Abstract:

We consider recurrent networks with real-valued weights. If allowed exponential time for computation, they turn out to have unbounded power. However, under polynomial-time constraints there are limits on their capabilities, though being more powerful than Turing Machines. Moreover, there is a precise correspondence between nets and standard non-uniform circuits with equivalent resources, and as a consequence one has lower bound constraints on what they can compute. We note that these networks are not likely to solve polynomially NP-hard problems, as the equality "P=NP" in our model implies the almost complete collapse of the standard polynomial hierarchy. We show that a large class of different networks and dynamical system models have no more computational power than this neural (first-order) model with real weights. The results suggest the following Church-like Thesis of Time-bounded Analog Computing: "Any reasonable analog computer will have no more power (up to polynomial time) than first-order recurrent networks."

H.J. Sussmann, E.D. Sontag, and Y. Yang. A general result on the stabilization of linear systems using bounded controls. IEEE Trans. Automat. Control, 39(12):2411-2425, 1994. [PDF] Keyword(s): saturation, neural networks, global stability, nonlinear stability, bounded inputs. Abstract:

We present two constructions of controllers that globally stabilize linear systems subject to control saturation. We allow essentially arbitrary saturation functions. The only conditions imposed on the system are the obvious necessary ones, namely that no eigenvalues of the uncontrolled system have positive real part and that the standard stabilizability rank condition hold. One of the constructions is in terms of a "neural-network type" one-hidden layer architecture, while the other one is in terms of cascades of linear maps and saturations.

F. Albertini, E.D. Sontag, and V. Maillot. Uniqueness of weights for neural networks. In R. Mammone, editor, Artificial Neural Networks for Speech and Vision, pages 115-125. Chapman and Hall, London, 1993. [PDF] Keyword(s): machine learning, neural networks, recurrent neural networks. Abstract:

In this short expository survey, we sketch various known facts about uniqueness of weights in neural networks, including results about recurrent nets, and we provide a new and elementary complex-variable proof of a uniqueness result that applies in the single hidden layer case.

E.D. Sontag. Neural networks for control. In H. L. Trentelman and J. C. Willems, editors, Essays on control: perspectives in the theory and its applications (Groningen, 1993), volume 14 of Progr. Systems Control Theory, pages 339-380. Birkhäuser Boston, Boston, MA, 1993. Note: A longer version (tech report with more details) is here: http://sontaglab.org/FTPDIR/neural-nets-siemens.pdf. [PDF] Keyword(s): neural networks, recurrent neural networks, machine learning, neural networks. Abstract:

This paper has an expository introduction to two related topics: (a) Some mathematical results regarding "neural networks", and (b) so-called "neurocontrol" and "learning control" (each part can be read independently of the other). It was prepared for a short course given at the 1993 European Control Conference.

F. Albertini and E.D. Sontag. For neural networks, function determines form. Neural Networks, 6(7):975-990, 1993. [PDF] Keyword(s): machine learning, neural networks, identifiability, recurrent neural networks, realization theory, observability, neural networks. Abstract:

This paper shows that the weights of continuous-time feedback neural networks x'=s(Ax+Bu), y=Cx (where s is a sigmoid) are uniquely identifiable from input/output measurements. Under very weak genericity assumptions, the following is true: Assume given two nets, whose neurons all have the same nonlinear activation function s; if the two nets have equal behaviors as "black boxes" then necessarily they must have the same number of neurons and -except at most for sign reversals at each node- the same weights. Moreover, even if the activations are not a priori known to coincide, they are shown to be also essentially determined from the external measurements.

E.D. Sontag. Feedback stabilization using two-hidden-layer nets. IEEE Trans. Neural Networks, 3:981-990, 1992. [PDF] Keyword(s): machine learning, neural networks, feedback stabilization. Abstract:

This paper compares the representational capabilities of one hidden layer and two hidden layer nets consisting of feedforward interconnections of linear threshold units. It is remarked that for certain problems two hidden layers are required, contrary to what might be in principle expected from the known approximation theorems. The differences are not based on numerical accuracy or number of units needed, nor on capabilities for feature extraction, but rather on a much more basic classification into "direct" and "inverse" problems. The former correspond to the approximation of continuous functions, while the latter are concerned with approximating one-sided inverses of continuous functions - and are often encountered in the context of inverse kinematics determination or in control questions. A general result is given showing that nonlinear control systems can be stabilized using two hidden layers, but not in general using just one.

E.D. Sontag. Feedforward nets for interpolation and classification. J. Comput. System Sci., 45(1):20-48, 1992. [PDF] [doi:http://dx.doi.org/10.1016/0022-0000(92)90039-L] Keyword(s): machine learning, neural networks, VC dimension, boolean systems. Abstract:

This paper deals with single-hidden-layer feedforward nets, studying various aspects of classification power and interpolation capability. In particular, a worst-case analysis shows that direct input to output connections in threshold nets double the recognition but not the interpolation power, while using sigmoids rather than thresholds allows doubling both. For other measures of classification, including the Vapnik-Chervonenkis dimension, the effect of direct connections or sigmoidal activations is studied in the special case of two-dimensional inputs.

E.D. Sontag. Capabilities and training of feedforward nets. In Neural networks (New Brunswick, NJ, 1990), pages 303-321. Academic Press, Boston, MA, 1991. [PDF] Keyword(s): machine learning, machine learning, neural networks. Abstract:

This paper surveys recent work by the author on learning and representational capabilities of feedforward nets. The learning results show that, among two possible variants of the so-called backpropagation training method for sigmoidal nets, both of which variants are used in practice, one is a better generalization of the older perceptron training algorithm than the other. The representation results show that nets consisting of sigmoidal neurons have at least twice the representational capabilities of nets that use classical threshold neurons, at least when this increase is quantified in terms of classification power. On the other hand, threshold nets are shown to be more useful when approximating implicit functions, as illustrated with an application to a typical control problem.

H. T. Siegelmann and E.D. Sontag. Turing computability with neural nets. Appl. Math. Lett., 4(6):77-80, 1991. [PDF] Keyword(s): machine learning, neural networks, computational complexity, recurrent neural networks. Abstract:

This paper shows the existence of a finite neural network, made up of sigmoidal neurons, which simulates a universal Turing machine. It is composed of less than 100,000 synchronously evolving processors, interconnected linearly. High-order connections are not required. (Note: this paper was placed here by special request. The results in this paper have been by now improved considerably: see the JCSS pape which among other aspects provides a polynomial time simulation. This paper, based on a unary encoding, results in an exponential slowdown).

E.D. Sontag and H.J. Sussmann. Back propagation separates where perceptrons do. Neural Networks, 4(2):243-249, 1991. [PDF] [doi:http://dx.doi.org/10.1016/0893-6080(91)90008-S] Keyword(s): machine learning, neural networks. Abstract:

Feedforward nets with sigmoidal activation functions are often designed by minimizing a cost criterion. It has been pointed out before that this technique may be outperformed by the classical perceptron learning rule, at least on some problems. In this paper, we show that no such pathologies can arise if the error criterion is of a threshold LMS type, i.e., is zero for values ``beyond'' the desired target values. More precisely, we show that if the data are linearly separable, and one considers nets with no hidden neurons, then an error function as above cannot have any local minima that are not global. In addition, the proof gives the following stronger result, under the stated hypotheses: the continuous gradient adjustment procedure is such that from any initial weight configuration a separating set of weights is obtained in finite time. This is a precise analogue of the Perceptron Learning Theorem. The results are then compared with the more classical pattern recognition problem of threshold LMS with linear activations, where no spurious local minima exist even for nonseparable data: here it is shown that even if using the threshold criterion, such bad local minima may occur, if the data are not separable and sigmoids are used. keywords = { neural networks , feedforward neural nets },

E.D. Sontag. Sigmoids distinguish more efficiently than Heavisides. Neural Computation, 1:470-472, 1989. [PDF] Keyword(s): machine learning, neural networks, boolean systems. Abstract:

Every dichotomy on a 2k-point set in Rn can be implemented by a neural net with a single hidden layer containing k sigmoidal neurons. If the neurons were of a hardlimiter (Heaviside) type, 2k-1 would be in general needed.

E.D. Sontag and H.J. Sussmann. Backpropagation can give rise to spurious local minima even for networks without hidden layers. Complex Systems, 3(1):91-106, 1989. [PDF] Keyword(s): machine learning, neural networks. Abstract:

We give an example of a neural net without hidden layers and with a sigmoid transfer function, together with a training set of binary vectors, for which the sum of the squared errors, regarded as a function of the weights, has a local minimum which is not a global minimum. The example consists of a set of 125 training instances, with four weights and a threshold to be learnt. We do not know if substantially smaller binary examples exist.

Conference articles

A.C.B de Olivera, M. Siami, and E.D. Sontag. Remarks on the gradient training of linear neural network based feedback for the LQR Problem. In Proc. 2024 63rd IEEE Conference on Decision and Control (CDC), pages 7846-7852, 2024. [PDF] Keyword(s): neural networks, overparametrization, gradient descent, input to state stability, gradient systems, feedback control, LQR. Abstract:

Motivated by the current interest in using Artificial intelligence (AI) tools in control design, this paper takes the first steps towards bridging results from gradient methods for solving the LQR control problem, and neural networks. More specifically, it looks into the case where one wants to find a Linear Feed-Forward Neural Network (LFFNN) that minimizes the Linear Quadratic Regulator (LQR) cost. This work develops gradient formulas that can be used to implement the training of LFFNNs to solve the LQR problem, and derives an important conservation law of the system. This conservation law is then leveraged to prove global convergence of solutions and invariance of the set of stabilizing networks under the training dynamics. These theoretical results are then followed by and extensive analysis of the simplest version of the problem (the ``scalar case'') and by numerical evidence of faster convergence of the training of general LFFNNs when compared to traditional direct gradient methods. These results not only serve as indication of the theoretical value of studying such a problem, but also of the practical value of LFFNNs as design tools for data-driven control applications.

A.C.B de Olivera, M. Siami, and E.D. Sontag. Dynamics and perturbations of overparameterized linear neural networks. In Proc. 2023 62st IEEE Conference on Decision and Control (CDC), pages 7356-7361, 2023. Note: Extended version is On the ISS property of the gradient flow for single hidden-layer neural networks with linear activations, arXiv https://arxiv.org/abs/2305.09904. [PDF] [doi:10.1109/CDC49753.2023.10383478] Keyword(s): neural networks, overparametrization, gradient descent, input to state stability, gradient systems. Abstract:

Recent research in neural networks and machine learning suggests that using many more parameters than strictly required by the initial complexity of a regression problem can result in more accurate or faster-converging models -- contrary to classical statistical belief. This phenomenon, sometimes known as ``benign overfitting'', raises questions regarding in what other ways might overparameterization affect the properties of a learning problem. In this work, we investigate the effects of overfitting on the robustness of gradient-descent training when subject to uncertainty on the gradient estimation. This uncertainty arises naturally if the gradient is estimated from noisy data or directly measured. Our object of study is a linear neural network with a single, arbitrarily wide, hidden layer and an arbitrary number of inputs and outputs. In this paper we solve the problem for the case where the input and output of our neural-network are one-dimensional, deriving sufficient conditions for robustness of our system based on necessary and sufficient conditions for convergence in the undisturbed case. We then show that the general overparametrized formulation introduces a set of spurious equilibria which lay outside the set where the loss function is minimized, and discuss directions of future work that might extend our current results for more general formulations.

M. Sznaier, A. Olshevsky, and E.D. Sontag. The role of systems theory in control oriented learning. In Proc. 25th Int. Symp. Mathematical Theory of Networks and Systems (MTNS 2022), 2022. Note: Looks like only the abstract was published!. [PDF] Keyword(s): control oriented learning, neural networks, reinforcement learning, feedback control, machine learning. Abstract:

Systems theory can play an important in unveiling fundamental limitations of learning algorithms and architectures when used to control a dynamical system, and in suggesting strategies for overcoming these limitations. As an example, a feedforward neural network cannot stabilize a double integrator using output feedback. Similarly, a recurrent NN with differentiable activation functions that stabilizes a non-strongly stabilizable system must be itself open loop unstable, a fact that has profound implications for training with noisy, finite data. A potential solution to this problem, motivated by results on stabilization with periodic control, is the use of neural nets with periodic resets, showing that indeed systems theoretic analysis is instrumental in developing architectures capable of controlling certain classes of unstable systems. This short conference paper also argues that when the goal is to learn control oriented models, the loss function should reflect closed loop, rather than open loop model performance, a fact that can be accomplished by using gap-metric motivated loss functions.

T. Natschläger, W. Maass, E.D. Sontag, and A. Zador. Processing of time series by neural circuits with biologically realistic synaptic dynamics. In Todd K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13 (NIPS2000), pages 145-151, 2000. MIT Press, Cambridge. [PDF] Keyword(s): neural networks, Volterra series. Abstract:

Experimental data show that biological synapses are dynamic, i.e., their weight changes on a short time scale by several hundred percent in dependence of the past input to the synapse. In this article we explore the consequences that this synaptic dynamics entails for the computational power of feedforward neural networks. It turns out that even with just a single hidden layer such networks can approximate a surprisingly large large class of nonlinear filters: all filters that can be characterized by Volterra series. This result is robust with regard to various changes in the model for synaptic dynamics. Furthermore we show that simple gradient descent suffices to approximate a given quadratic filter by a rather small neural system with dynamic synapses.

W. Maass and E.D. Sontag. A precise characterization of the class of languages recognized by neural nets under Gaussian and other common noise distributions. In Proceedings of the 1998 conference on Advances in neural information processing systems II, Cambridge, MA, USA, pages 281-287, 1999. MIT Press. [PDF] Keyword(s): machine learning, neural networks.

E.D. Sontag and Y. Qiao. Remarks on controllability of recurrent neural networks. In Proc. IEEE Conf. Decision and Control, Tampa, Dec. 1998, IEEE Publications, 1998, pages 501-506, 1998. Keyword(s): machine learning, neural networks, recurrent neural networks.

E.D. Sontag. Some learning and systems-theoretic questions regarding recurrent neural networks. In Proc. Conf. on Information Sciences and Systems (CISS 97), Johns Hopkins, Baltimore, MD, March 1997, pages 630-635, 1997. Keyword(s): machine learning, neural networks, VC dimension, recurrent neural networks.

B. Dasgupta and E.D. Sontag. Sample complexity for learning recurrent perceptron mappings. In D.S. Touretzky, M.C. Moser, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 204-210, 1996. MIT Press, Cambridge, MA. Keyword(s): machine learning, neural networks, VC dimension, recurrent neural networks.

P. Koiran and E.D. Sontag. Neural networks with quadratic VC dimension. In D.S. Touretzky, M.C. Moser, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 197-203, 1996. MIT Press, Cambridge, MA. Keyword(s): machine learning, neural networks, VC dimension.

E.D. Sontag. Critical points for neural net least-squares problems. In Proc. 1995 IEEE Internat. Conf. Neural Networks, IEEE Publications, 1995, pages 2949-2954, 1995. Keyword(s): neural networks.

B. DasGupta, H. T. Siegelmann, and E.D. Sontag. On a learnability question associated to neural networks with continuous activations (extended abstract). In COLT '94: Proceedings of the seventh annual conference on Computational learning theory, New York, NY, USA, pages 47-56, 1994. ACM Press. [doi:http://doi.acm.org/10.1145/180139.181009] Keyword(s): machine learning, analog computing, neural networks, computational complexity.

R. Koplon and E.D. Sontag. Techniques for parameter reconstruction in Fourier-Neural recurrent networks. In Proc. IEEE Conf. Decision and Control, Orlando, Dec. 1994, IEEE Publications, 1994, pages 213-218, 1994. Keyword(s): machine learning, neural networks, recurrent neural networks.

F. Albertini and E.D. Sontag. Identifiability of discrete-time neural networks. In Proc. European Control Conf., Groningen, June 1993, pages 460-465, 1993. Keyword(s): machine learning, neural networks, recurrent neural networks.

F. Albertini and E.D. Sontag. State observability in recurrent neural networks. In Proc. IEEE Conf. Decision and Control, San Antonio, Dec. 1993, IEEE Publications, 1993, pages 3706-3707, 1993. Keyword(s): machine learning, neural networks, observability, recurrent neural networks.

F. Albertini and E.D. Sontag. Uniqueness of weights for recurrent nets. In Systems and Networks: Mathematical Theory and Applications, Proc. MTNS '93, Vol. 2, Akad. Verlag, Regensburg, pages 599-602, 1993. Note: Full version, never submitted for publication, is here: http://sontaglab.org/FTPDIR/93mtns-nn-extended.pdf. [PDF] Keyword(s): machine learning, neural networks, identifiability, recurrent neural networks. Abstract:

This paper concerns recurrent networks x'=s(Ax+Bu), y=Cx, where s is a sigmoid, in both discrete time and continuous time. The paper establishes parameter identifiability under stronger assumptions on the activation than in "For neural networks, function determines form", but on the other hand deals with arbitrary (nonzero) initial states.

J. L. Balcázar, R. Gavaldà, H. T. Siegelmann, and E.D. Sontag. Some structural complexity aspects of neural computation. In Proceedings of the Eighth Annual Structure in Complexity Theory Conference (San Diego, CA, 1993), Los Alamitos, CA, pages 253-265, 1993. IEEE Comput. Soc. Press. [PDF] Keyword(s): machine learning, analog computing, neural networks, computational complexity, super-Turing computation, theory of computing and complexity. Abstract:

Recent work by H.T. Siegelmann and E.D. Sontag (1992) has demonstrated that polynomial time on linear saturated recurrent neural networks equals polynomial time on standard computational models: Turing machines if the weights of the net are rationals, and nonuniform circuits if the weights are real. Here, further connections between the languages recognized by such neural nets and other complexity classes are developed. Connections to space-bounded classes, simulation of parallel computational models such as Vector Machines, and a discussion of the characterizations of various nonuniform classes in terms of Kolmogorov complexity are presented.

C. Darken, M.J. Donahue, L. Gurvits, and E.D. Sontag. Rate of approximation results motivated by robust neural network learning. In COLT '93: Proceedings of the sixth annual conference on Computational learning theory, New York, NY, USA, pages 303-309, 1993. ACM Press. [doi:http://doi.acm.org/10.1145/168304.168357] Keyword(s): machine learning, neural networks, optimization problems, approximation theory.

A. Macintyre and E.D. Sontag. Finiteness results for sigmoidal neural networks. In STOC '93: Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, New York, NY, USA, pages 325-334, 1993. ACM Press. [PDF] [doi:http://doi.acm.org/10.1145/167088.167192] Keyword(s): machine learning, neural networks, theory of computing and complexity, real-analytic functions. Abstract:

This paper deals with analog circuits. It establishes the finiteness of VC dimension, teaching dimension, and several other measures of sample complexity which arise in learning theory. It also shows that the equivalence of behaviors, and the loading problem, are effectively decidable, modulo a widely believed conjecture in number theory. The results, the first ones that are independent of weight size, apply when the gate function is the "standard sigmoid" commonly used in neural networks research. The proofs rely on very recent developments in the elementary theory of real numbers with exponentiation. (Some weaker conclusions are also given for more general analytic gate functions.) Applications to learnability of sparse polynomials are also mentioned.

H.T. Siegelmann and E.D. Sontag. Analog computation via neural networks. In Proc. 2nd Israel Symposium on Theory of Computing and Systems (ISTCS93), IEEE Computer Society Press, 1993, 1993. Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks.

F. Albertini and E.D. Sontag. For neural networks, function determines form. In Proc. IEEE Conf. Decision and Control, Tucson, Dec. 1992, IEEE Publications, 1992, pages 26-31, 1992. Keyword(s): machine learning, neural networks, recurrent neural networks.

H.T. Siegelmann and E.D. Sontag. On the computational power of neural nets. In COLT '92: Proceedings of the fifth annual workshop on Computational learning theory, New York, NY, USA, pages 440-449, 1992. ACM Press. [doi:http://doi.acm.org/10.1145/130385.130432] Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks.

H.T. Siegelmann and E.D. Sontag. Some results on computing with neural nets. In Proc. IEEE Conf. Decision and Control, Tucson, Dec. 1992, IEEE Publications, 1992, pages 1476-1481, 1992. Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks.

H.T. Siegelmann, E.D. Sontag, and C.L. Giles. The Complexity of Language Recognition by Neural Networks. In Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing '92, Volume 1, pages 329-335, 1992. North-Holland. Keyword(s): machine learning, neural networks, computational complexity, machine learning, recurrent neural networks, theory of computing and complexity.

E.D. Sontag. Neural nets as systems models and controllers. In Proc. Seventh Yale Workshop on Adaptive and Learning Systems, Yale University, 1992, pages 73-79, 1992. [PDF] Keyword(s): machine learning, neural networks, recurrent neural networks, neural networks. Abstract:

A conference paper. Placed here because it was requested, but contains little that is not also contained in the survey on neural nets mentioned above.

E.D. Sontag. Systems combining linearity and saturations, and relations to neural nets. In Nonlinear Control Systems Design 1992, IFAC Symposia Series, 1993, M. Fliess Ed., Pergamon Press, Oxford, 1993, pages 15-21, 1992. Note: (Also in Proc. Nonlinear Control Systems Design Symp., Bordeaux, June 1992, M. Fliess, Ed., IFAC Publications, pp. 242-247). Keyword(s): machine learning, neural networks, recurrent neural networks.

W. Maass, G. Schnitger, and E.D. Sontag. On the computational power of sigmoid versus Boolean threshold circuits (extended abstract). In Proceedings of the 32nd annual symposium on Foundations of computer science, Los Alamitos, CA, USA, pages 767-776, 1991. IEEE Computer Society Press. Keyword(s): machine learning, neural networks, theory of computing and complexity.

T. Asano, J. Hershberger, J. Pach, E.D. Sontag, D. Souivaine, and S. Suri. Separating bi-chromatic points by parallel lines. In Proceedings of the Second Canadian Conf. on Computational Geometry, Ottawa, Canada, 1990, pages 46-49, 1990. [PDF] Keyword(s): computational geometry. Abstract:

Given a 2-coloring of the vertices of a regular n-gon P, how many parallel lines are needed to separate the vertices into monochromatic subsets? We prove that floor(n/2) is a tight upper bound, and also provide an O(n log n) time algorithm to determine the direction that gives the minimum number of lines. If the polygon is a non-regular convex polygon, then n-3 lines may be necessary, while n-2 lines always suffice. This problem arises in machine learning and has implications about the representational capabilities of some neural networks.

H. Dewan and E.D. Sontag. Extrapolatory methods for speeding up the BP algorithm. In Proc. Int. Joint Conf. on Neural Networks, Washington, DC, Jan. 1990, Lawrence Erlbaum Associates, Inc., Publishers, ISBN 0-8058-0775-6, pages I.613-616, 1990. [PDF] Keyword(s): machine learning, neural networks. Abstract:

We describe a speedup technique that uses extrapolatory methods to predict the weights in a Neural Network using Back Propagation (BP) learning. The method is based on empirical observations of the way the weights change as a function of time. We use numerical function fitting techniques to determine the parameters of an extrapolation function and then use this function to project weights into the future. Significant computational savings result by using the extrapolated weights to jump over many iterations of the standard algorithm, achieving comparable performance with fewer iterations.

E.D. Sontag. Comparing sigmoids and heavisides. In Proc. Conf. Info. Sci. and Systems, Princeton, 1990, pages 654-659, 1990. Keyword(s): machine learning, neural networks, boolean systems.

E.D. Sontag. Remarks on interpolation and recognition using neural nets. In NIPS-3: Proceedings of the 1990 conference on Advances in neural information processing systems 3, San Francisco, CA, USA, pages 939-945, 1990. Morgan Kaufmann Publishers Inc.. Keyword(s): machine learning, neural networks.

E.D. Sontag and H.J. Sussmann. Remarks on local minima in backpropagation. In Proc. Conf. Info. Sciences and Systems, Johns Hopkins University Press, 1989, pages 432-435, 1989. Keyword(s): machine learning, neural networks.

Internal reports

E.D. Sontag. Sigmoids distinguish more efficiently than Heavisides. Technical report SYCON-89-12, Rutgers Center for Systems and Control, 1989. Keyword(s): machine learning, neural networks.

E.D. Sontag. Some remarks on the backpropagation algorithm for neural net learning. Technical report SYCON-88-02, Rutgers Center for Systems and Control, 1988. [PDF] Keyword(s): machine learning, neural networks. Abstract:

This is a very old informal report that discusses the study of local minima of quadratic loss functions for fitting errors in sigmoidal neural net learning. It also includes several remarks concerning the growth of weights during gradient descent. There is nothing very interesting here - far better knowledge is now available - but the report was placed here by request.

BACK TO INDEX

Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders.

Last modified: Tue Jul 15 23:04:59 2025
Author: sontag.

This document was translated from BibT_EX by bibtex2html