Publications about 'artificial intelligence'

Books and proceedings

E.D. Sontag. Temas de Inteligencia Artificial. PROLAM, Buenos Aires, 1972. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks. Abstract:

Textbook on Artificial Intelligence. (Textbook with an introduction to artificial intelligence.)

Articles in journal or book chapters

P. Mestres, J. Cortés, and E.D. Sontag. Neural network-based universal formulas for control. Systems and Control Letters, 2025. Note: Submitted. Also arXiv https://arxiv.org/abs/2505.24744. Keyword(s): machine learning, artificial intelligence, control-Lyapunov functions, control barrier functions, universal formulas, neural networks. Abstract:

We study the problem of designing a controller that satisfies an arbitrary number of affine inequalities at every point in the state space. This is motivated by the use of guardrails in autonomous systems. Indeed, a variety of key control objectives, such as stability, safety, and input saturation, are guaranteed by closed-loop systems whose controllers satisfy such inequalities. Many works in the literature design such controllers as the solution to a state-dependent quadratic program (QP) whose constraints are precisely the inequalities. When the input dimension and number of constraints are high, computing a solution of this QP in real time can become computationally burdensome. Additionally, the solution of such optimization problems is not smooth in general, which can degrade the performance of the system. This paper provides a novel method to design a smooth controller that satisfies an arbitrary number of affine constraints. This is why we refer to it as a universal formula for control. The controller is given at every state as the minimizer of a strictly convex function. To avoid computing the minimizer of such a function in real time, we introduce a method based on neural networks (NN) to approximate the controller. Remarkably, this NN can be used to solve the controller design problem for any task whose input dimension and number of affine constraints are below fixed bounds, and it is completely independent of the state dimension. Additionally, we show that the NN-based controller only needs to be trained with datapoints from a compact set in the state space, which significantly simplifies the training process. Various simulations showcase the performance of the proposed solution, and also show that the NN-based controller can be used to warm-start an optimization scheme that refines the approximation of the true controller in real time, significantly reducing the computational cost compared to a generic initialization.
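For context, the QP-based baseline that this abstract contrasts against has a closed form when there is only one affine constraint: the controller is the Euclidean projection of a nominal input onto a half-space. The sketch below assumes a single constraint $a^\top u \le b$ and a hypothetical nominal input `u_nom`; it is not the paper's universal formula, which handles many constraints and is smooth.

```python
import numpy as np


def qp_safety_filter(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to a @ u <= b (a single affine constraint).

    With one constraint the QP solution is the projection of u_nom onto the
    half-space {u : a @ u <= b}. Assumes a is a nonzero vector.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    a = np.asarray(a, dtype=float)
    violation = a @ u_nom - b
    if violation <= 0.0:
        # Nominal input already satisfies the constraint: keep it unchanged.
        return u_nom
    # Move along the constraint normal just far enough to reach the boundary.
    return u_nom - (violation / (a @ a)) * a


u = qp_safety_filter([2.0, 0.0], a=[1.0, 1.0], b=1.0)
# u is [1.5, -0.5]: the nominal input projected onto the line a @ u = b
```

The switch at `violation = 0` produces a kink: the map from nominal input to filtered input is continuous but not differentiable there, which is one concrete form of the non-smoothness mentioned above.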

E.D. Sontag. Some remarks on gradient dominance and LQR policy optimization. arXiv 2507.10452, 2025. [PDF] [doi:https://doi.org/10.48550/arXiv.2507.10452] Keyword(s): gradient dominance, gradient flows, LQR, reinforcement learning, machine learning, artificial intelligence, optimal control. Abstract:

Solutions of optimization problems, including policy optimization in reinforcement learning, typically rely upon some variant of gradient descent. There has been much recent work in the machine learning, control, and optimization communities applying the Polyak-Łojasiewicz Inequality (PLI) to such problems in order to establish an exponential rate of convergence (a.k.a. "linear convergence" in the local-iteration language of numerical analysis) of loss functions to their minima under the gradient flow. Often, as is the case with policy iteration for the continuous-time LQR problem, this rate vanishes for large initial conditions, resulting in a mixed globally linear / locally exponential behavior. This is in sharp contrast with the discrete-time LQR problem, where there is global exponential convergence. That gap between CT and DT behaviors motivates the search for various generalized PLI-like conditions, and this paper addresses that topic. Moreover, these generalizations are key to understanding the transient and asymptotic effects of errors in the estimation of the gradient, errors which might arise from adversarial attacks, wrong evaluation by an oracle, early stopping of a simulation, inaccurate and very approximate digital twins, stochastic computations (algorithm "reproducibility"), or learning by sampling from limited data. We describe an "input to state stability" (ISS) analysis of this issue. We also discuss convergence and PLI-like properties of "linear feedforward neural networks" in feedback control. Much of the work described here was done in collaboration with Arthur Castello B. de Oliveira, Leilei Cui, Zhong-Ping Jiang, and Milad Siami. This is a short paper summarizing the slides presented at my keynote at the 2025 L4DC (Learning for Dynamics & Control Conference) in Ann Arbor, Michigan, 05 June 2025. A partial bibliography has been added.
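As a small illustration of the Polyak-Łojasiewicz inequality (not an example from this paper), the commonly cited function $f(x) = x^2 + 3\sin^2 x$ is nonconvex yet satisfies the PLI, so gradient descent still reaches the global minimum $f^* = 0$. The sketch below runs plain gradient descent, a forward-Euler discretization of the gradient flow, with an assumed step size of 0.1 and starting point 3.0.

```python
import math


def f(x):
    # Nonconvex, yet satisfies the PL inequality 0.5 * f'(x)**2 >= mu * (f(x) - f*).
    return x**2 + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # d/dx [x^2 + 3 sin^2 x] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def gradient_descent(x0, step=0.1, iters=200):
    """Fixed-step gradient descent; returns the final iterate and the loss history."""
    x = x0
    history = [f(x)]
    for _ in range(iters):
        x -= step * grad_f(x)
        history.append(f(x))
    return x, history


x_final, losses = gradient_descent(3.0)
print(x_final, losses[-1])  # both close to 0, the global minimizer and minimum value
```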

A.C.B. de Oliveira, M. Siami, and E.D. Sontag. Convergence analysis of overparametrized LQR formulations. Automatica, 182:112504, 2025. Note: Version with more details in arXiv 2408.15456. [PDF] Keyword(s): machine learning, artificial intelligence, learning theory, singularities in optimization, gradient systems, overparametrization, neural networks, gradient descent, input to state stability, feedback control, LQR. Abstract:

Motivated by the growing use of Artificial Intelligence (AI) tools in control design, this paper takes the first steps towards bridging the gap between results from Direct Gradient methods for the Linear Quadratic Regulator (LQR) and neural networks. More specifically, it looks into the case where one wants to find a Linear Feed-Forward Neural Network (LFFNN) feedback that minimizes an LQR cost. This paper starts by computing the gradient formulas for the parameters of each layer, which are used to derive a key conservation law of the system. This conservation law is then leveraged to prove boundedness and global convergence of solutions to critical points, and invariance of the set of stabilizing networks under the training dynamics. This is followed by an analysis of the case where the LFFNN has a single hidden layer. For this case, the paper proves that the training converges not only to critical points but to the optimal feedback control law for all initializations outside a set of measure zero. These theoretical results are followed by an extensive analysis of a simple version of the problem (the "vector case"), proving the theoretical properties of accelerated convergence and robustness for this simpler example. Finally, the paper presents numerical evidence of faster convergence of the training of general LFFNNs when compared to traditional direct gradient methods, showing that the acceleration of the solution is observable even when the gradient is not explicitly computed but estimated from evaluations of the cost function.
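A scalar toy analogue may help fix ideas; all choices here are assumptions for illustration, not the paper's setup. Take $\dot x = a x + b u$ with $a = b = q = r = 1$, cost $J(k) = \int_0^\infty (q x^2 + r u^2)\,dt$ from $x(0) = 1$, and feedback $u = -kx$ overparametrized as a two-layer linear network $k = w_2 w_1$ with one hidden unit. For stabilizing gains ($bk > a$) one has $J(k) = (q + r k^2)/(2(bk - a))$, and the Riccati equation gives $k^* = 1 + \sqrt{2}$, which here also equals the optimal cost. Under the gradient flow the quantity $w_1^2 - w_2^2$ is conserved, a scalar instance of the kind of conservation law mentioned above; with the discrete steps used below it is only approximately preserved.

```python
import math

A, B, Q, R = 1.0, 1.0, 1.0, 1.0  # assumed scalar plant and cost weights


def lqr_cost(k):
    """Cost of u = -k x from x(0) = 1; finite only if A - B*k < 0."""
    assert B * k > A, "feedback must be stabilizing"
    return (Q + R * k**2) / (2.0 * (B * k - A))


def lqr_cost_grad(k):
    # Derivative of lqr_cost with respect to the overall gain k.
    d = B * k - A
    return (2.0 * R * k * d - B * (Q + R * k**2)) / (2.0 * d**2)


def train_two_layer(w1, w2, step=0.05, iters=2000):
    """Gradient descent on J(w2 * w1) over both layer weights."""
    for _ in range(iters):
        g = lqr_cost_grad(w2 * w1)
        # Chain rule: dJ/dw1 = J'(k) * w2 and dJ/dw2 = J'(k) * w1.
        w1, w2 = w1 - step * g * w2, w2 - step * g * w1
    return w1, w2


w1, w2 = train_two_layer(2.0, 1.5)  # initial gain k = 3.0 is stabilizing
k_star = 1.0 + math.sqrt(2.0)       # Riccati solution for A = B = Q = R = 1
print(w2 * w1, k_star)              # learned gain approaches k_star
print(w1**2 - w2**2)                # stays close to its initial value 1.75
```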

L. Cui, Z.P. Jiang, and E. D. Sontag. Small-disturbance input-to-state stability of perturbed gradient flows: Applications to LQR problem. Systems and Control Letters, 188:105804, 2024. [PDF] [doi:https://doi.org/10.1016/j.sysconle.2024.105804] Keyword(s): machine learning, artificial intelligence, gradient systems, direct optimization, input-to-state stability, ISS. Abstract:

This paper studies the effect of perturbations on the gradient flow of a general constrained nonlinear programming problem, where the perturbation may arise from inaccurate gradient estimation in the setting of data-driven optimization. Under suitable conditions on the objective function, the perturbed gradient flow is shown to be small-disturbance input-to-state stable (ISS), which implies that, in the presence of a small-enough perturbation, the trajectory of the perturbed gradient flow must eventually enter a small neighborhood of the optimum. This work was motivated by the question of robustness of direct methods for the linear quadratic regulator problem, and specifically the analysis of the effect of perturbations caused by gradient estimation or round-off errors in policy optimization. Interestingly, we show small-disturbance ISS for three of the most common optimization algorithms: standard gradient flow, natural gradient flow, and Newton gradient flow.
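A minimal numerical sketch of the small-disturbance ISS idea, under assumptions not taken from the paper: an unconstrained quadratic objective $f(x) = \tfrac12 x^\top Q x$, forward-Euler steps of the standard gradient flow, and an additive gradient error of norm at most $\delta$. In this case, whenever the step $\eta$ satisfies $\eta\,\lambda_{\max}(Q) \le 1$, one has $\|x_{k+1}\| \le (1 - \eta\lambda_{\min})\|x_k\| + \eta\delta$, so the iterates eventually enter a ball of radius $\delta/\lambda_{\min}(Q)$ around the optimum $x = 0$.

```python
import numpy as np


def perturbed_gradient_descent(x0, Q, step, delta, iters, seed=0):
    """Gradient descent on f(x) = 0.5 x^T Q x with gradient errors of norm delta."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        e = rng.standard_normal(x.shape)
        e *= delta / np.linalg.norm(e)      # error vector of norm exactly delta
        x = x - step * (Q @ x + e)          # inexact gradient step
    return x


Q = np.diag([1.0, 4.0])                     # lambda_min = 1, lambda_max = 4
x_clean = perturbed_gradient_descent([5.0, -3.0], Q, step=0.2, delta=0.0, iters=500)
x_noisy = perturbed_gradient_descent([5.0, -3.0], Q, step=0.2, delta=0.01, iters=500)
print(np.linalg.norm(x_clean))  # essentially 0: the unperturbed iteration reaches the optimum
print(np.linalg.norm(x_noisy))  # at most delta / lambda_min = 0.01
```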

M. D. Kvalheim and E. D. Sontag. Why should autoencoders work? Transactions on Machine Learning Research, 2024. Note: See also 2023 preprint in https://arxiv.org/abs/2310.02250. [WWW] [PDF] Keyword(s): machine learning, artificial intelligence, autoencoders, neural networks, differential topology, model reduction. Abstract:

Deep neural network autoencoders are routinely used computationally for model reduction. They allow recognizing the intrinsic dimension of data that lie in a k-dimensional subset K of an input Euclidean space $\mathbb{R}^n$. The underlying idea is to obtain both an encoding layer that maps $\mathbb{R}^n$ into $\mathbb{R}^k$ (called the bottleneck layer or the space of latent variables) and a decoding layer that maps $\mathbb{R}^k$ back into $\mathbb{R}^n$, in such a way that the input data from the set K is recovered when composing the two maps. This is achieved by adjusting parameters (weights) in the network to minimize the discrepancy between the input and the reconstructed output. Since neural networks (with continuous activation functions) compute continuous maps, the existence of a network that achieves perfect reconstruction would imply that K is homeomorphic to a k-dimensional subset of $\mathbb{R}^k$, so clearly there are topological obstructions to finding such a network. On the other hand, in practice the technique is found to "work" well, which leads one to ask if there is a way to explain this effectiveness. We show that, up to small errors, indeed the method is guaranteed to work. This is done by appealing to certain facts from differential geometry. A computational example is also included to illustrate the ideas.
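The encoder/decoder composition can be illustrated in the simplest case, where K is a k-dimensional linear subspace of $\mathbb{R}^n$ and both maps are linear. Then there is no topological obstruction, and a linear autoencoder whose bottleneck spans the top-k singular subspace of the data reconstructs K exactly. This sketch (synthetic data, assumed dimensions n = 5 and k = 2) is only a sanity check of the setup, not the nonlinear construction analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 5, 2, 200                      # ambient dim, latent dim, sample count

# Synthetic data lying exactly on a k-dimensional linear subspace K of R^n.
basis = rng.standard_normal((n, k))
X = rng.standard_normal((m, k)) @ basis.T          # rows are points of K

# The top-k right singular vectors span K; use them as the bottleneck.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T                                     # n x k, orthonormal columns


def encode(x):
    return x @ V_k          # R^n -> R^k (latent variables)


def decode(z):
    return z @ V_k.T        # R^k -> R^n


reconstruction_error = np.max(np.abs(decode(encode(X)) - X))
print(reconstruction_error)  # on the order of machine precision
```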

J. Hanson, M. Raginsky, and E.D. Sontag. Learning recurrent neural net models of nonlinear systems. Proc. of Machine Learning Research, 144:1-11, 2021. [PDF] Keyword(s): machine learning, artificial intelligence, empirical risk minimization, recurrent neural networks, dynamical systems, continuous time, system identification, statistical learning theory, generalization bounds. Abstract:

This paper considers the following learning problem: given sample pairs of input and output signals generated by an unknown nonlinear system (which is not assumed to be causal or time-invariant), one wishes to find a continuous-time recurrent neural net, with activation function tanh, that approximately reproduces the underlying i/o behavior with high confidence. Leveraging earlier work concerned with matching derivatives up to a finite order of the input and output signals, the problem is reformulated in familiar system-theoretic language, and quantitative guarantees on the sup-norm risk of the learned model are derived, in terms of the number of neurons, the sample size, the number of derivatives being matched, and the regularity properties of the inputs, the outputs, and the unknown i/o map.

W. Maass, P. Joshi, and E.D. Sontag. Computational aspects of feedback in neural circuits. PLoS Computational Biology, 3:e165 1-20, 2007. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, feedback linearization, computation by cortical microcircuits, fading memory. Abstract:

It had previously been shown that generic cortical microcircuit models can perform complex real-time computations on continuous input streams, provided that these computations can be carried out with a rapidly fading memory. We investigate in this article the computational capability of such circuits in the more realistic case where not only readout neurons, but in addition a few neurons within the circuit have been trained for specific tasks. This is essentially equivalent to the case where the output of trained readout neurons is fed back into the circuit. We show that this new model overcomes the limitation of a rapidly fading memory. In fact, we prove that in the idealized case without noise it can carry out any conceivable digital or analog computation on time-varying inputs. But even with noise the resulting computational model can perform a large class of biologically relevant real-time computations that require a non-fading memory.

W. Maass, P. Joshi, and E.D. Sontag. Principles of real-time computing with feedback applied to cortical microcircuit models. In Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, 2006. Note: Proc. NIPS (NeurIPS) 18, Vancouver 2005, https://proceedings.neurips.cc/paper/2005. [PDF] Keyword(s): NeurIPS, machine learning, artificial intelligence, neural networks. Abstract:

The network topology of neurons in the brain exhibits an abundance of feedback connections, but the computational function of these feedback connections is largely unknown. We present a computational theory that characterizes the gain in computational power achieved through feedback in dynamical systems with fading memory. It implies that many such systems acquire through feedback universal computational capabilities for analog computing with a non-fading memory. In particular, we show that feedback enables such systems to process time-varying input streams in diverse ways according to rules that are implemented through internal states of the dynamical system. In contrast to previous attractor-based computational models for neural networks, these flexible internal states are high-dimensional attractors of the circuit dynamics, which still allow the circuit state to absorb new information from online input streams. In this way one arrives at novel models for working memory, integration of evidence, and reward expectation in cortical circuits. We show that they are applicable to circuits of conductance-based Hodgkin-Huxley (HH) neurons with high levels of noise that reflect experimental data on in vivo conditions.

P. Kuusela, D. Ocone, and E.D. Sontag. Learning Complexity Dimensions for a Continuous-Time Control System. SIAM J. Control Optim., 43(3):872-898, 2004. [PDF] [doi:http://dx.doi.org/10.1137/S0363012901384302] Keyword(s): machine learning, artificial intelligence, theory of computing and complexity, VC dimension, neural networks. Abstract:

This paper takes a computational learning theory approach to a problem of linear systems identification. It is assumed that input signals have only a finite number k of frequency components, and systems to be identified have dimension no greater than n. The main result establishes that the sample complexity needed for identification scales polynomially with n and logarithmically with k.

W. Maass and E.D. Sontag. Neural Systems as Nonlinear Filters. Neural Computation, 12(8):1743-1772, 2000. [PDF] [doi:http://dx.doi.org/10.1162/089976600300015123] Keyword(s): machine learning, artificial intelligence, neural networks, Volterra series. Abstract:

We analyze computations on temporal patterns and spatio-temporal patterns in formal network models whose temporal dynamics arises from empirically established quantitative models for short-term dynamics at biological synapses. We give a complete characterization of all linear and nonlinear filters that can be approximated by such dynamic network models: it is the class of all filters that can be approximated by Volterra series. This characterization is shown to be rather stable with regard to changes in the model. For example, it is shown that synaptic facilitation and one layer of neurons suffice for approximating arbitrary filters from this class. Our results provide a new complexity hierarchy for all filters that are approximable by Volterra series, which appears to be more closely related to the actual cost of implementing such filters in neural hardware than preceding complexity measures. Our results also provide a new parameterization for approximations to such filters in terms of parameters that are arguably related to those that are tunable in biological neural systems.
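For readers unfamiliar with Volterra series, a discrete-time, second-order, finite-memory Volterra filter computes $y_t = h_0 + \sum_i h_1[i]\,u_{t-i} + \sum_{i,j} h_2[i,j]\,u_{t-i}u_{t-j}$. The kernels and the memory length below are arbitrary assumptions chosen for illustration; the results above concern the broader class of filters approximable by such series.

```python
import numpy as np


def volterra2(u, h0, h1, h2):
    """Second-order Volterra filter with memory M = len(h1) and zero initial past."""
    u = np.asarray(u, dtype=float)
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    M = len(h1)
    padded = np.concatenate([np.zeros(M - 1), u])   # u_t = 0 for t < 0
    y = np.empty_like(u)
    for t in range(len(u)):
        # window[i] = u_{t-i} for i = 0, ..., M-1
        window = padded[t : t + M][::-1]
        y[t] = h0 + h1 @ window + window @ h2 @ window
    return y


h1 = np.array([1.0, 0.5])                   # first-order (linear) kernel
h2 = np.array([[0.0, 0.25], [0.25, 0.0]])   # second-order kernel: 0.5 * u_t * u_{t-1}
y = volterra2([1.0, 2.0, 3.0], 0.0, h1, h2)
# y = [1.0, 3.5, 7.0]
```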

W. Maass and E.D. Sontag. Analog neural nets with Gaussian or other common noise distributions cannot recognize arbitrary regular languages. Neural Computation, 11(3):771-782, 1999. [PDF] [doi:http://dx.doi.org/10.1162/089976699300016656] Keyword(s): machine learning, artificial intelligence, neural networks. Abstract:

We consider recurrent analog neural nets where the output of each gate is subject to Gaussian noise, or any other common noise distribution that is nonzero on a large set. We show that many regular languages cannot be recognized by networks of this type, and we give a precise characterization of those languages which can be recognized. This result implies severe constraints on possibilities for constructing recurrent analog neural nets that are robust against realistic types of analog noise. On the other hand we present a method for constructing feedforward analog neural nets that are robust with regard to analog noise of this type.

E.D. Sontag and Y. Qiao. Further results on controllability of recurrent neural networks. Systems Control Lett., 36(2):121-129, 1999. [PDF] Keyword(s): machine learning, artificial intelligence, controllability, recurrent neural networks, neural networks. Abstract:

This paper studies controllability properties of recurrent neural networks. The new contributions are: (1) an extension of the result in "Complete controllability of continuous-time recurrent neural networks" to a slightly different model, where inputs appear in an affine form, (2) a formulation and proof of a necessary and sufficient condition, in terms of local-local controllability, and (3) a complete analysis of the 2-dimensional case for which the hypotheses made in previous work do not apply.

E.D. Sontag. Automata and neural networks. In The handbook of brain theory and neural networks, pages 119-122. MIT Press, Cambridge, MA, USA, 1998. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks.

E.D. Sontag. VC dimension of neural networks. In C.M. Bishop, editor, Neural Networks and Machine Learning, pages 69-95. Springer, Berlin, 1998. [PDF] Keyword(s): machine learning, artificial intelligence, VC dimension, learning, neural networks, shattering. Abstract:

The Vapnik-Chervonenkis (VC) dimension is an integer which helps to characterize distribution-independent learning of binary concepts from positive and negative samples. This paper, based on lectures delivered at the Isaac Newton Institute in August of 1997, presents a brief introduction, establishes various elementary results, and discusses how to estimate the VC dimension in several examples of interest in neural network theory. (It does not address the learning and estimation-theoretic applications of VC dimension, and the applications to uniform convergence theorems for empirical probabilities, for which many suitable references are available.)
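To make shattering concrete: a concept class shatters a finite set of m points if it realizes every one of the $2^m$ labelings of that set, and the VC dimension is the size of the largest shattered set. The brute-force check below uses the classic toy class of closed intervals $[a, b]$ on the real line (our choice, not an example from the lectures), which shatters any 2 distinct points but no 3, so its VC dimension is 2.

```python
from itertools import product


def interval_realizes(points, labels):
    """True if some interval [a, b] labels exactly the points marked 1 as positive."""
    positives = [p for p, y in zip(points, labels) if y == 1]
    if not positives:
        return True                      # an empty interval (a > b) labels nothing
    a, b = min(positives), max(positives)
    # The tightest candidate interval must not contain any negative point.
    return all(not (a <= p <= b) for p, y in zip(points, labels) if y == 0)


def is_shattered(points):
    """Check every labeling in {0, 1}^m by brute force."""
    return all(interval_realizes(points, labels)
               for labels in product([0, 1], repeat=len(points)))


print(is_shattered([0.0, 1.0]))        # True
print(is_shattered([0.0, 1.0, 2.0]))   # False: labeling (1, 0, 1) is impossible
```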

P. Koiran and E.D. Sontag. Vapnik-Chervonenkis dimension of recurrent neural networks. Discrete Applied Mathematics, 86(1):63-79, 1998. [PDF] [doi:http://dx.doi.org/10.1016/S0166-218X(98)00014-6] Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks. Abstract:

This paper provides lower and upper bounds for the VC dimension of recurrent networks. Several types of activation functions are discussed, including threshold, polynomial, piecewise-polynomial and sigmoidal functions. The bounds depend on two independent parameters: the number w of weights in the network, and the length k of the input sequence. Ignoring multiplicative constants, the main results say roughly the following: 1. For architectures whose activation is any fixed nonlinear polynomial, the VC dimension is proportional to $wk$. 2. For architectures whose activation is any fixed piecewise polynomial, the VC dimension is between $wk$ and $w^2 k$. 3. For architectures with threshold activations, the VC dimension is between $w \log(k/w)$ and the smaller of $wk \log(wk)$ and $w^2 + w \log(wk)$. 4. For the standard sigmoid tanh(x), the VC dimension is between $wk$ and $w^4 k^2$.

E.D. Sontag. A learning result for continuous-time recurrent neural networks. Systems Control Lett., 34(3):151-158, 1998. [PDF] [doi:http://dx.doi.org/10.1016/S0167-6911(98)00006-1] Keyword(s): machine learning, artificial intelligence, neural networks, VC dimension, recurrent neural networks. Abstract:

The following learning problem is considered, for continuous-time recurrent neural networks having sigmoidal activation functions. Given a "black box" representing an unknown system, measurements of output derivatives are collected, for a set of randomly generated inputs, and a network is used to approximate the observed behavior. It is shown that the number of inputs needed for reliable generalization (the sample complexity of the learning problem) is upper bounded by an expression that grows polynomially with the dimension of the network and logarithmically with the number of output derivatives being matched.

P. Koiran and E.D. Sontag. Vapnik-Chervonenkis dimension of recurrent neural networks. In Computational learning theory (Jerusalem, 1997), volume 1208 of Lecture Notes in Comput. Sci., pages 223-237. Springer-Verlag, London, UK, 1997. Keyword(s): machine learning, artificial intelligence, neural networks, VC dimension, recurrent neural networks.

E.D. Sontag. Recurrent neural networks: Some systems-theoretic aspects. In M. Karny, K. Warwick, and V. Kurkova, editors, Dealing with Complexity: a Neural Network Approach, pages 1-12. Springer-Verlag, London, 1997. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks, learning, VC dimension. Abstract:

This paper provides an exposition of some recent results regarding system-theoretic aspects of continuous-time recurrent (dynamic) neural networks with sigmoidal activation functions. The class of systems is introduced and discussed, and a result is cited regarding their universal approximation properties. Known characterizations of controllability, observability, and parameter identifiability are reviewed, as well as a result on minimality. Facts regarding the computational power of recurrent nets are also mentioned.

M. J. Donahue, L. Gurvits, C. Darken, and E.D. Sontag. Rates of convex approximation in non-Hilbert spaces. Constr. Approx., 13(2):187-220, 1997. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, optimization, approximation theory. Abstract:

This paper deals with sparse approximations by means of convex combinations of elements from a predetermined "basis" subset S of a function space. Specifically, the focus is on the rate at which the lowest achievable error can be reduced as larger subsets of S are allowed when constructing an approximant. The new results extend those given for Hilbert spaces by Jones and Barron, including in particular a computationally attractive incremental approximation scheme. Bounds are derived for broad classes of Banach spaces. The techniques used borrow from results regarding moduli of smoothness in functional analysis as well as from the theory of stochastic processes on function spaces.
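The incremental scheme in the Hilbert-space setting that this paper extends can be sketched in $\mathbb{R}^d$ with the Euclidean norm. Assuming the target $f$ lies in the convex hull of a finite set $S$ (random vectors below, chosen only for illustration), each step forms $f_n = (1-\alpha) f_{n-1} + \alpha g$, choosing $g \in S$ and $\alpha \in [0, 1]$ to minimize the error; in Hilbert space this kind of greedy scheme is known to achieve error $O(1/\sqrt{n})$ after $n$ terms. The sketch does not cover the Banach-space (non-Hilbert) setting that is the paper's contribution.

```python
import numpy as np


def greedy_convex_approx(f, S, n_terms):
    """Greedy incremental approximation of f from conv(S) in Euclidean R^d.

    S holds one candidate element per row; returns the final approximant and
    the error after each step.
    """
    approx = S[np.argmin(np.linalg.norm(S - f, axis=1))]   # best single element
    errors = [np.linalg.norm(f - approx)]
    for _ in range(n_terms - 1):
        best = None
        for g in S:
            d = g - approx
            dd = d @ d
            if dd == 0.0:
                continue
            # Exact line search for alpha in [0, 1] along the segment towards g.
            alpha = np.clip((f - approx) @ d / dd, 0.0, 1.0)
            candidate = approx + alpha * d
            err = np.linalg.norm(f - candidate)
            if best is None or err < best[0]:
                best = (err, candidate)
        if best is None:
            break
        errors.append(best[0])
        approx = best[1]
    return approx, errors


rng = np.random.default_rng(0)
S = rng.standard_normal((50, 20))            # 50 "basis" elements in R^20
weights = rng.dirichlet(np.ones(50))
f = weights @ S                              # target inside conv(S)
_, errs = greedy_convex_approx(f, S, n_terms=40)
print(errs[0], errs[-1])                     # the error shrinks as terms are added
```

Allowing $\alpha = 0$ makes each step at least as good as the previous one, so the error sequence is non-increasing.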

P. Koiran and E.D. Sontag. Neural networks with quadratic VC dimension. J. Comput. System Sci., 54(1, part 2):190-198, 1997. Note: (1st Annual Dagstuhl Seminar on Neural Computing, 1994). [PDF] [doi:http://dx.doi.org/10.1006/jcss.1997.1479] Keyword(s): machine learning, artificial intelligence, neural networks, VC dimension. Abstract:

This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles the open question of whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed.

R. Koplon and E.D. Sontag. Using Fourier-neural recurrent networks to fit sequential input/output data. Neurocomputing, 15:225-248, 1997. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks. Abstract:

This paper suggests the use of Fourier-type activation functions in fully recurrent neural networks. The main theoretical advantage is that, in principle, the problem of recovering internal coefficients from input/output data is solvable in closed form.

E.D. Sontag. Shattering all sets of k points in 'general position' requires (k-1)/2 parameters. Neural Computation, 9(2):337-348, 1997. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, VC dimension, real-analytic functions. Abstract:

For classes of concepts defined by certain classes of analytic functions depending on k parameters, there are nonempty open sets of samples of length 2k+2 which cannot be shattered. A slightly weaker result is also proved for piecewise-analytic functions. The special case of neural networks is discussed.

E.D. Sontag and H.J. Sussmann. Complete controllability of continuous-time recurrent neural networks. Systems Control Lett., 30(4):177-183, 1997. [PDF] [doi:http://dx.doi.org/10.1016/S0167-6911(97)00002-9] Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks. Abstract:

This paper presents a characterization of controllability for the class of control systems commonly called (continuous-time) recurrent neural networks. The characterization involves a simple condition on the input matrix, and is proved when the activation function is the hyperbolic tangent.

B. DasGupta and E.D. Sontag. Sample complexity for learning recurrent perceptron mappings. IEEE Trans. Inform. Theory, 42(5):1479-1487, 1996. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, VC dimension, recurrent neural networks. Abstract:

Recurrent perceptron classifiers generalize the usual perceptron model. They correspond to linear transformations of input vectors obtained by means of "autoregressive moving-average schemes", or infinite impulse response filters, and allow taking into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data. The results are expressed in the context of the theory of probably approximately correct (PAC) learning.
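A recurrent perceptron, as described here, passes the input sequence through a linear ARMA (IIR) filter and thresholds the result. A minimal sketch, assuming the sign of the filter output at the final time step is the class label and that the filter starts at rest; the coefficient names `ar` (autoregressive) and `ma` (moving-average) are ours.

```python
def recurrent_perceptron(u, ar, ma):
    """Classify sequence u by the sign of an ARMA filter's final output.

    y_t = sum_i ar[i] * y_{t-1-i} + sum_j ma[j] * u_{t-j}, with zero initial state.
    """
    y = []
    for t in range(len(u)):
        feedback = sum(a * y[t - 1 - i] for i, a in enumerate(ar) if t - 1 - i >= 0)
        feedforward = sum(b * u[t - j] for j, b in enumerate(ma) if t - j >= 0)
        y.append(feedback + feedforward)
    return 1 if y[-1] >= 0 else -1


# Same input, different filters: feedback (ar) changes the decision.
print(recurrent_perceptron([1.0, -1.0, 0.2], ar=[0.5], ma=[1.0]))          # -1
print(recurrent_perceptron([1.0, -1.0, 0.2], ar=[], ma=[1.0, 1.0, 1.0]))   # 1
```

With `ar=[]` the model reduces to an ordinary perceptron acting on the most recent `len(ma)` inputs.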

E.D. Sontag. Critical points for least-squares problems involving certain analytic functions, with applications to sigmoidal nets. Adv. Comput. Math., 5(2-3):245-268, 1996. [PDF] Keyword(s): machine learning, artificial intelligence, subanalytic sets, semianalytic sets, critical points, approximation theory, neural networks, real-analytic functions. Abstract:

This paper deals with nonlinear least-squares problems involving the fitting to data of parameterized analytic functions. For generic regression data, a general result establishes the countability, and under stronger assumptions finiteness, of the set of functions giving rise to critical points of the quadratic loss function. In the special case of what are usually called "single-hidden layer neural networks", which are built upon the standard sigmoidal activation tanh(x) or equivalently 1/(1+exp(-x)), a rough upper bound for this cardinality is provided as well.

B. DasGupta, H.T. Siegelmann, and E.D. Sontag. On the complexity of training neural networks with continuous activation functions. IEEE Trans. Neural Networks, 6:1490-1504, 1995. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, analog computing, theory of computing, computational complexity. Abstract:

Blum and Rivest showed that any possible neural net learning algorithm based on fixed architectures faces severe computational barriers. This paper extends their NP-completeness result, which applied only to nets based on hard threshold activations, to nets that employ a particular continuous activation. In view of neural network practice, this result is more relevant to understanding the limitations of backpropagation and related techniques.

H. T. Siegelmann and E.D. Sontag. On the computational power of neural nets. J. Computer System Sciences, 50(1):132-150, 1995. [PDF] [doi:http://dx.doi.org/10.1006/jcss.1995.1013] Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks, analog computing, theory of computing, computational complexity, super-Turing computation. Abstract:

This paper deals with finite size networks which consist of interconnections of synchronously evolving processors. Each processor updates its state by applying a "sigmoidal" function to a rational-coefficient linear combination of the previous states of all units. We prove that one may simulate all Turing Machines by such nets. In particular, one can simulate any multi-stack Turing Machine in real time, and there is a net made up of 886 processors which computes a universal partial-recursive function. Products (high order nets) are not required, contrary to what had been stated in the literature. Non-deterministic Turing Machines can be simulated by non-deterministic rational nets, also in real time. The simulation result has many consequences regarding the decidability, or more generally the complexity, of questions about recursive nets.
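One ingredient behind simulations of this kind is storing an unbounded binary stack in a single rational-valued state and manipulating it with affine maps followed by the saturated-linear sigmoid $\sigma(x) = \min(1, \max(0, x))$. The sketch below uses a base-4 encoding in that spirit, $q = \sum_i (2b_i + 1)/4^i$ for stack contents $b_1 b_2 \dots$ (top first), so that the top bit can be read off by a single saturated-linear unit. It is an illustration only, not the paper's full construction; exact `Fraction` arithmetic stands in for the rational-weight dynamics.

```python
from fractions import Fraction


def sigma(x):
    """Saturated-linear activation: 0 below 0, identity on [0, 1], 1 above 1."""
    return min(Fraction(1), max(Fraction(0), x))


def push(q, bit):
    # Prepend base-4 digit 1 (bit 0) or 3 (bit 1): an affine map of q.
    return q / 4 + Fraction(2 * bit + 1, 4)


def top(q):
    # Leading digit 3 means q >= 3/4; leading digit 1 means 1/4 <= q < 1/2.
    return sigma(4 * q - 2)


def nonempty(q):
    # Any nonempty stack has q >= 1/4; the empty stack is q = 0.
    return sigma(4 * q)


def pop(q):
    # Strip the leading digit: another affine map, using the top bit.
    return 4 * q - (2 * top(q) + 1)


q = Fraction(0)                      # empty stack
for bit in [1, 1, 0]:
    q = push(q, bit)
popped = []
while nonempty(q) == 1:
    popped.append(int(top(q)))
    q = pop(q)
print(popped)  # [0, 1, 1]: last pushed comes out first
```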

B. DasGupta, H.T. Siegelmann, and E.D. Sontag. On the Intractability of Loading Neural Networks. In V. P. Roychowdhury, K. Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning, pages 357-389. Kluwer Academic Publishers, 1994. [PDF] Keyword(s): machine learning, artificial intelligence, analog computing, neural networks, computational complexity.

W. Maass, G. Schnitger, and E.D. Sontag. A comparison of the computational power of sigmoid and Boolean threshold circuits. In V. P. Roychowdhury, K. Y. Siu, and A. Orlitsky, editors, Theoretical Advances in Neural Computation and Learning, pages 127-151. Kluwer Academic Publishers, 1994. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, boolean systems. Abstract:

We examine the power of constant depth circuits with sigmoid threshold gates for computing boolean functions. It is shown that, for depth 2, constant size circuits of this type are strictly more powerful than constant size boolean threshold circuits (i.e. circuits with linear threshold gates). On the other hand it turns out that, for any constant depth d, polynomial size sigmoid threshold circuits with polynomially bounded weights compute exactly the same boolean functions as the corresponding circuits with linear threshold gates.

F. Albertini and E.D. Sontag. State observability in recurrent neural networks. Systems Control Lett., 22(4):235-244, 1994. [PDF] [doi:http://dx.doi.org/10.1016/0167-6911(94)90054-X] Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks, observability, identifiability. Abstract:

This paper concerns recurrent networks x'=s(Ax+Bu), y=Cx, where s is a sigmoid, in both discrete time and continuous time. Our main result is that observability can be characterized, if one assumes certain conditions on the nonlinearity and on the system, in a manner very analogous to that of the linear case. Recall that for the latter, observability is equivalent to the requirement that there not be any nontrivial A-invariant subspace included in the kernel of C. We show that the result generalizes in a natural manner, except that one now needs to restrict attention to certain special "coordinate" subspaces.
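For the linear case recalled in this abstract, the condition that no nontrivial A-invariant subspace lies in the kernel of C is equivalent to the Kalman rank test: the observability matrix stacking $C, CA, \dots, CA^{n-1}$ has full column rank n. A short sketch, with example matrices of our own choosing:

```python
import numpy as np


def is_observable(A, C):
    """Kalman rank test for the linear pair (A, C)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    C = np.atleast_2d(np.asarray(C, dtype=float))
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)      # C, CA, CA^2, ...
    O = np.vstack(blocks)
    return bool(np.linalg.matrix_rank(O) == n)


A = np.array([[0.0, 1.0], [0.0, 0.0]])
print(is_observable(A, [[1.0, 0.0]]))   # True: measuring x1 reveals x2 = x1'
print(is_observable(A, [[0.0, 1.0]]))   # False: span{e1} is A-invariant and lies in ker C
```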

H. T. Siegelmann and E.D. Sontag. Analog computation via neural networks. Theoretical Computer Science, 131(2):331-360, 1994. [PDF] [doi:http://dx.doi.org/10.1016/0304-3975(94)90178-3] Keyword(s): analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks, machine learning, artificial intelligence. Abstract:

We consider recurrent networks with real-valued weights. If allowed exponential time for computation, they turn out to have unbounded power. However, under polynomial-time constraints there are limits on their capabilities, though they are still more powerful than Turing machines. Moreover, there is a precise correspondence between nets and standard non-uniform circuits with equivalent resources, and as a consequence one has lower-bound constraints on what they can compute. We note that these networks are unlikely to solve NP-hard problems in polynomial time, as the equality "P=NP" in our model implies the almost complete collapse of the standard polynomial hierarchy. We show that a large class of different networks and dynamical system models has no more computational power than this neural (first-order) model with real weights. The results suggest the following Church-like Thesis of Time-bounded Analog Computing: "Any reasonable analog computer will have no more power (up to polynomial time) than first-order recurrent networks."

F. Albertini, E.D. Sontag, and V. Maillot. Uniqueness of weights for neural networks. In R. Mammone, editor, Artificial Neural Networks for Speech and Vision, pages 115-125. Chapman and Hall, London, 1993. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks. Abstract:

In this short expository survey, we sketch various known facts about uniqueness of weights in neural networks, including results about recurrent nets, and we provide a new and elementary complex-variable proof of a uniqueness result that applies in the single hidden layer case.

E.D. Sontag. Neural networks for control. In H. L. Trentelman and J. C. Willems, editors, Essays on control: perspectives in the theory and its applications (Groningen, 1993), volume 14 of Progr. Systems Control Theory, pages 339-380. Birkhäuser Boston, Boston, MA, 1993. Note: A longer version (tech report with more details) is here: http://sontaglab.org/FTPDIR/neural-nets-siemens.pdf. [PDF] [doi:https://doi.org/10.1007/978-1-4612-0313-1_10] Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks. Abstract:

This paper has an expository introduction to two related topics: (a) Some mathematical results regarding "neural networks", and (b) so-called "neurocontrol" and "learning control" (each part can be read independently of the other). It was prepared for a short course given at the 1993 European Control Conference.

F. Albertini and E.D. Sontag. For neural networks, function determines form. Neural Networks, 6(7):975-990, 1993. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, identifiability, recurrent neural networks, realization theory, observability. Abstract:

This paper shows that the weights of continuous-time feedback neural networks x'=s(Ax+Bu), y=Cx (where s is a sigmoid) are uniquely identifiable from input/output measurements. Under very weak genericity assumptions, the following is true: given two nets whose neurons all have the same nonlinear activation function s, if the two nets have equal behaviors as "black boxes", then they must have the same number of neurons and, except possibly for sign reversals at each node, the same weights. Moreover, even if the activations are not a priori known to coincide, they are also shown to be essentially determined by the external measurements.
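The "sign reversal" exception can be checked numerically. Because tanh is odd, flipping the sign of a neuron's state, its incoming weights, and its outgoing weights leaves the input/output behavior unchanged: with T = diag(±1), the net (TAT, TB, CT) started at T x0 produces the same outputs as (A, B, C) started at x0. For simplicity this sketch uses a discrete-time net x(t+1) = tanh(Ax(t) + Bu(t)), y = Cx, rather than the continuous-time model of the abstract; tanh as the sigmoid s, the matrices, the initial state, and the input signal are all hypothetical choices.

```python
import math


def simulate(A, B, C, inputs, x0):
    """Run x(t+1) = tanh(A x(t) + B u(t)), y(t) = C x(t) and return the outputs."""
    n = len(x0)
    x = list(x0)
    ys = []
    for u in inputs:
        ys.append([sum(C[i][j] * x[j] for j in range(n)) for i in range(len(C))])
        x = [math.tanh(sum(A[i][j] * x[j] for j in range(n))
                       + sum(B[i][k] * u[k] for k in range(len(u))))
             for i in range(n)]
    return ys


def flip_signs(A, B, C, signs):
    """Apply T = diag(signs): A -> T A T, B -> T B, C -> C T."""
    n = len(signs)
    A2 = [[signs[i] * A[i][j] * signs[j] for j in range(n)] for i in range(n)]
    B2 = [[signs[i] * b for b in B[i]] for i in range(n)]
    C2 = [[row[j] * signs[j] for j in range(n)] for row in C]
    return A2, B2, C2


# Hypothetical 3-neuron net with one input and one output.
A = [[0.5, -0.3, 0.2], [0.1, 0.4, -0.6], [-0.2, 0.3, 0.1]]
B = [[1.0], [-0.5], [0.7]]
C = [[1.0, 2.0, -1.0]]
x0 = [0.1, 0.2, -0.3]
inputs = [[math.sin(0.3 * t)] for t in range(50)]
signs = [1, -1, -1]  # reverse the sign of neurons 2 and 3

A_flip, B_flip, C_flip = flip_signs(A, B, C, signs)
y_original = simulate(A, B, C, inputs, x0)
y_flipped = simulate(A_flip, B_flip, C_flip, inputs, [s * v for s, v in zip(signs, x0)])
```

The two weight sets differ, yet the output sequences coincide, which is why identifiability can only hold up to such sign reversals.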

E.D. Sontag. Feedback stabilization using two-hidden-layer nets. IEEE Trans. Neural Networks, 3:981-990, 1992. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, feedback stabilization. Abstract:

This paper compares the representational capabilities of one-hidden-layer and two-hidden-layer nets consisting of feedforward interconnections of linear threshold units. It is remarked that for certain problems two hidden layers are required, contrary to what one might in principle expect from the known approximation theorems. The differences are not based on numerical accuracy or number of units needed, nor on capabilities for feature extraction, but rather on a much more basic classification into "direct" and "inverse" problems. The former correspond to the approximation of continuous functions, while the latter are concerned with approximating one-sided inverses of continuous functions, and are often encountered in the context of inverse kinematics determination or in control questions. A general result is given showing that nonlinear control systems can be stabilized using two hidden layers, but not in general using just one.

E.D. Sontag. Feedforward nets for interpolation and classification. J. Comput. System Sci., 45(1):20-48, 1992. [PDF] [doi:http://dx.doi.org/10.1016/0022-0000(92)90039-L] Keyword(s): machine learning, artificial intelligence, neural networks, VC dimension, boolean systems. Abstract:

This paper deals with single-hidden-layer feedforward nets, studying various aspects of classification power and interpolation capability. In particular, a worst-case analysis shows that direct input to output connections in threshold nets double the recognition but not the interpolation power, while using sigmoids rather than thresholds allows doubling both. For other measures of classification, including the Vapnik-Chervonenkis dimension, the effect of direct connections or sigmoidal activations is studied in the special case of two-dimensional inputs.

E.D. Sontag. Capabilities and training of feedforward nets. In Neural networks (New Brunswick, NJ, 1990), pages 303-321. Academic Press, Boston, MA, 1991. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks. Abstract:

This paper surveys recent work by the author on learning and representational capabilities of feedforward nets. The learning results show that, of two possible variants of the so-called backpropagation training method for sigmoidal nets, both of which are used in practice, one is a better generalization of the older perceptron training algorithm than the other. The representation results show that nets consisting of sigmoidal neurons have at least twice the representational capabilities of nets that use classical threshold neurons, at least when this increase is quantified in terms of classification power. On the other hand, threshold nets are shown to be more useful when approximating implicit functions, as illustrated with an application to a typical control problem.

H. T. Siegelmann and E.D. Sontag. Turing computability with neural nets. Applied Mathematics Letters, 4(6):77-80, 1991. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, computational complexity, recurrent neural networks. Abstract:

This paper shows the existence of a finite neural network, made up of sigmoidal neurons, which simulates a universal Turing machine. It is composed of fewer than 100,000 synchronously evolving processors, interconnected linearly. High-order connections are not required. (Note: this paper was placed here by special request. The results in this paper have since been improved considerably: see the JCSS paper, which, among other things, provides a polynomial-time simulation. This paper uses a unary encoding, which results in an exponential slowdown.)

E.D. Sontag and H.J. Sussmann. Back propagation separates where perceptrons do. Neural Networks, 4(2):243-249, 1991. [PDF] [doi:http://dx.doi.org/10.1016/0893-6080(91)90008-S] Keyword(s): machine learning, artificial intelligence, neural networks. Abstract:

Feedforward nets with sigmoidal activation functions are often designed by minimizing a cost criterion. It has been pointed out before that this technique may be outperformed by the classical perceptron learning rule, at least on some problems. In this paper, we show that no such pathologies can arise if the error criterion is of a threshold LMS type, i.e., is zero for values "beyond" the desired target values. More precisely, we show that if the data are linearly separable, and one considers nets with no hidden neurons, then an error function as above cannot have any local minima that are not global. In addition, the proof gives the following stronger result, under the stated hypotheses: the continuous gradient adjustment procedure is such that from any initial weight configuration a separating set of weights is obtained in finite time. This is a precise analogue of the Perceptron Learning Theorem. The results are then compared with the more classical pattern recognition problem of threshold LMS with linear activations, where no spurious local minima exist even for nonseparable data: here it is shown that, even when using the threshold criterion, such bad local minima may occur if the data are not separable and sigmoids are used.
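The threshold-LMS setting above (no hidden neurons, a sigmoid output, and an error that vanishes once the output is beyond its target) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's construction: the toy data, the targets 0.9 and 0.1, the learning rate, and the step count are hypothetical, and plain discrete gradient descent stands in for the continuous gradient adjustment procedure.

```python
import math

# Hypothetical toy data: linearly separable points in the plane, labels +1 / -1.
DATA = [((2.0, 2.0), 1), ((1.0, 3.0), 1), ((3.0, 1.0), 1),
        ((-1.0, -1.0), -1), ((-2.0, 0.0), -1), ((0.0, -2.0), -1)]
T_POS, T_NEG = 0.9, 0.1  # assumed target values for the sigmoid output


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient(w, b):
    """Gradient of the threshold-LMS cost, which is zero once an output is beyond its target."""
    gw, gb = [0.0, 0.0], 0.0
    for x, label in DATA:
        s = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # Only undershooting T_POS (positives) or overshooting T_NEG (negatives) is penalized.
        err = min(0.0, s - T_POS) if label == 1 else max(0.0, s - T_NEG)
        dz = 2.0 * err * s * (1.0 - s)  # d(err^2)/dz, using sigmoid'(z) = s(1 - s)
        gw[0] += dz * x[0]
        gw[1] += dz * x[1]
        gb += dz
    return gw, gb


def train(w=(0.0, 0.0), b=0.0, lr=0.1, steps=2000):
    """Plain gradient descent as a discrete stand-in for the continuous gradient flow."""
    w = list(w)
    for _ in range(steps):
        gw, gb = gradient(w, b)
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


def separates(w, b):
    """True if the linear threshold unit sign(w.x + b) classifies every point correctly."""
    return all(label * (w[0] * x[0] + w[1] * x[1] + b) > 0 for x, label in DATA)


w, b = train()
print(w, b, separates(w, b))
```

Because the error is clipped at the targets, points already classified with enough margin stop contributing, which mirrors the perceptron rule's behavior of ignoring correctly classified examples.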

E.D. Sontag. Sigmoids distinguish more efficiently than Heavisides. Neural Computation, 1:470-472, 1989. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, boolean systems. Abstract:

Every dichotomy on a 2k-point set in R^n can be implemented by a neural net with a single hidden layer containing k sigmoidal neurons. If the neurons were of a hardlimiter (Heaviside) type, 2k-1 would in general be needed.

E.D. Sontag and H.J. Sussmann. Backpropagation can give rise to spurious local minima even for networks without hidden layers. Complex Systems, 3(1):91-106, 1989. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks. Abstract:

We give an example of a neural net without hidden layers and with a sigmoid transfer function, together with a training set of binary vectors, for which the sum of the squared errors, regarded as a function of the weights, has a local minimum which is not a global minimum. The example consists of a set of 125 training instances, with four weights and a threshold to be learnt. We do not know if substantially smaller binary examples exist.

Conference articles

M.K. Wafi, A.C.B de Oliveira, and E.D. Sontag. On the (almost) global exponential convergence of overparameterized policy optimization for the LQR problem. In 2026 American Control Conference (ACC), 2025. Note: Submitted. Also arXiv:2510.02140. [PDF] Keyword(s): machine learning, artificial intelligence, gradient dominance, gradient flows, LQR, reinforcement learning. Abstract:

In this work we study the convergence of gradient methods for nonconvex optimization problems, specifically the effect of the problem formulation on the convergence behavior of the solution of a gradient flow. We show through a simple example that, surprisingly, the gradient flow solution can be exponentially or asymptotically convergent, depending on how the problem is formulated. We then deepen the analysis and show that a policy optimization strategy for the continuous-time linear quadratic regulator (LQR) (which is known to present only asymptotic convergence globally) presents almost global exponential convergence if the problem is overparameterized through a linear feed-forward neural network (LFFNN). We prove that this qualitative improvement always occurs for a simplified version of the LQR problem and derive explicit convergence rates for the gradient flow. Finally, we show through numerical simulations that both the qualitative improvement and the quantitative rate gains persist in the general LQR problem.

A.C.B de Oliveira, L. Cui, and E.D. Sontag. Remarks on the Polyak-Lojasiewicz inequality and the convergence of gradient systems. In Proc. 64th IEEE Conference on Decision and Control (CDC), 2025. Note: To appear. Extended version in arXiv:2503.23641. [PDF] [doi:https://doi.org/10.48550/arXiv.2503.23641] Keyword(s): machine learning, artificial intelligence, gradient dominance, gradient flows, LQR, reinforcement learning. Abstract:

This work explores generalizations of the Polyak-Lojasiewicz inequality (PLI) and their implications for the convergence behavior of gradient flows in optimization problems. Motivated by the continuous-time linear quadratic regulator (CT-LQR) policy optimization problem, where only a weaker version of the PLI is characterized in the literature, this work shows that, while weaker conditions are sufficient for global convergence to, and optimality of, the set of critical points of the cost function, the "profile" of the gradient flow solution can change significantly depending on which "flavor" of inequality the cost satisfies. After a general theoretical analysis, we focus on fitting the CT-LQR policy optimization problem to the proposed framework, showing that, in fact, it can never satisfy a PLI in its strongest form. We follow up our analysis with a brief discussion on the difference between continuous- and discrete-time LQR policy optimization, and end the paper with some intuition on the extension of this framework to optimization problems with L1 regularization solved through proximal gradient flows.
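For reference, the textbook (strongest) form of the PLI and the exponential "profile" it forces on the gradient flow follow from a two-line comparison argument. The second part uses an illustrative weaker inequality, chosen here only for contrast and not necessarily one of the variants studied in the paper, to show that a weaker flavor by itself guarantees only an O(1/t) decay.

```latex
% Strong PLI with constant \mu > 0 and minimum value f^*, along the gradient flow \dot{x} = -\nabla f(x):
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr)
\quad\Longrightarrow\quad
\frac{d}{dt}\bigl(f(x(t)) - f^*\bigr) = -\|\nabla f(x(t))\|^2 \;\le\; -2\mu\,\bigl(f(x(t)) - f^*\bigr),
% hence, by Gronwall's inequality,
f(x(t)) - f^* \;\le\; e^{-2\mu t}\,\bigl(f(x(0)) - f^*\bigr).
% Illustrative weaker inequality with c > 0, writing e(t) = f(x(t)) - f^*:
\|\nabla f(x)\|^2 \;\ge\; c\,\bigl(f(x) - f^*\bigr)^2
\quad\Longrightarrow\quad
\dot{e} \;\le\; -c\,e^2
\quad\Longrightarrow\quad
e(t) \;\le\; \frac{e(0)}{1 + c\,e(0)\,t}.
```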

A.C.B de Oliveira, M. Siami, and E.D. Sontag. Remarks on the gradient training of linear neural network based feedback for the LQR Problem. In Proc. 2024 63rd IEEE Conference on Decision and Control (CDC), pages 7846-7852, 2024. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, overparametrization, gradient descent, input to state stability, gradient systems, feedback control, LQR. Abstract:

Motivated by the current interest in using artificial intelligence (AI) tools in control design, this paper takes the first steps towards bridging results from gradient methods for solving the LQR control problem and neural networks. More specifically, it considers the case where one wants to find a Linear Feed-Forward Neural Network (LFFNN) that minimizes the Linear Quadratic Regulator (LQR) cost. This work develops gradient formulas that can be used to implement the training of LFFNNs to solve the LQR problem, and derives an important conservation law of the system. This conservation law is then leveraged to prove global convergence of solutions and invariance of the set of stabilizing networks under the training dynamics. These theoretical results are followed by an extensive analysis of the simplest version of the problem (the "scalar case") and by numerical evidence of faster convergence of the training of general LFFNNs compared to traditional direct gradient methods. These results serve as an indication not only of the theoretical value of studying such a problem, but also of the practical value of LFFNNs as design tools for data-driven control applications.
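A minimal sketch of the scalar case, under stated assumptions: the plant x' = ax + bu with cost ∫(qx² + ru²)dt, x(0) = 1, and gain u = -Kx factored as a two-layer linear network K = w2·w1. Closing the loop gives x' = (a - bK)x, so for a stabilizing gain (bK > a) the cost is J(K) = (q + rK²)/(2(bK - a)). For any loss of the product w2·w1, the gradient flow conserves w1² - w2² (the standard "balancedness" invariant of linear networks); whether this is exactly the conservation law derived in the paper is an assumption here. The plant constants, initial weights, step size, and function names are hypothetical.

```python
import math

# Hypothetical scalar plant x' = a x + b u with cost integral of (q x^2 + r u^2), x(0) = 1.
a, b, q, r = 1.0, 1.0, 1.0, 1.0


def dJ(K):
    """Derivative of J(K) = (q + r K^2) / (2 (b K - a)), valid for a stabilizing gain (b K > a)."""
    return (r * b * K**2 - 2 * r * a * K - b * q) / (2 * (b * K - a) ** 2)


def train(w1, w2, dt=1e-3, steps=20000):
    """Forward-Euler integration of the gradient flow of J(w2 * w1) in the weights (w1, w2)."""
    for _ in range(steps):
        g = dJ(w2 * w1)
        # Chain rule: dJ/dw1 = J'(K) w2 and dJ/dw2 = J'(K) w1 (simultaneous update).
        w1, w2 = w1 - dt * g * w2, w2 - dt * g * w1
    return w1, w2


w1_0, w2_0 = 2.0, 1.5                              # initial factorization, K = 3 (stabilizing)
w1, w2 = train(w1_0, w2_0)
K_star = (a + math.sqrt(a**2 + b**2 * q / r)) / b  # Riccati optimum, 1 + sqrt(2) here

print("learned K:", w1 * w2, "optimal K:", K_star)
print("w1^2 - w2^2 before/after:", w1_0**2 - w2_0**2, w1**2 - w2**2)
```

Under the exact flow the invariant is preserved identically; the forward-Euler step multiplies it by (1 - dt²g²) at each iteration, so only a tiny drift remains for small dt.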

T. Natschläger, W. Maass, E.D. Sontag, and A. Zador. Processing of time series by neural circuits with biologically realistic synaptic dynamics. In Todd K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13 (NIPS2000), pages 145-151, 2000. MIT Press, Cambridge. Note: Proc. NIPS(NeurIPS)-13, Denver, 2000, https://papers.nips.cc/paper_files/paper/2000. [PDF] Keyword(s): NeurIPS, machine learning, artificial intelligence, neural networks, Volterra series. Abstract:

Experimental data show that biological synapses are dynamic, i.e., their weight changes on a short time scale by several hundred percent depending on the past input to the synapse. In this article we explore the consequences that these synaptic dynamics entail for the computational power of feedforward neural networks. It turns out that even with just a single hidden layer such networks can approximate a surprisingly large class of nonlinear filters: all filters that can be characterized by Volterra series. This result is robust with regard to various changes in the model for synaptic dynamics. Furthermore, we show that simple gradient descent suffices to approximate a given quadratic filter by a rather small neural system with dynamic synapses.

W. Maass and E.D. Sontag. A precise characterization of the class of languages recognized by neural nets under Gaussian and other common noise distributions. In Proceedings of the 1998 conference on Advances in Neural Information Processing Systems II, Cambridge, MA, USA, pages 281-287, 1999. MIT Press. Note: Proc. NIPS(NeurIPS)-11, Denver, 1998, https://papers.nips.cc/paper_files/paper/1998. [PDF] Keyword(s): NeurIPS, machine learning, artificial intelligence, neural networks.

E.D. Sontag and Y. Qiao. Remarks on controllability of recurrent neural networks. In Proc. IEEE Conf. Decision and Control, Tampa, Dec. 1998, IEEE Publications, 1998, pages 501-506, 1998. Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks.

E.D. Sontag. Some learning and systems-theoretic questions regarding recurrent neural networks. In Proc. Conf. on Information Sciences and Systems (CISS 97), Johns Hopkins, Baltimore, MD, March 1997, pages 630-635, 1997. Keyword(s): machine learning, artificial intelligence, neural networks, VC dimension, recurrent neural networks.

B. Dasgupta and E.D. Sontag. Sample complexity for learning recurrent perceptron mappings. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 204-210, 1996. MIT Press, Cambridge, MA. Note: Proc. NIPS(NeurIPS)-8, Denver, 1995, https://papers.nips.cc/paper_files/paper/1995. Keyword(s): NeurIPS, machine learning, artificial intelligence, neural networks, VC dimension, recurrent neural networks.

P. Koiran and E.D. Sontag. Neural networks with quadratic VC dimension. In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 197-203, 1996. MIT Press, Cambridge, MA. Note: Proc. NIPS(NeurIPS)-8, Denver, 1995, https://papers.nips.cc/paper_files/paper/1995. Keyword(s): NeurIPS, machine learning, artificial intelligence, neural networks, VC dimension.

E.D. Sontag. Critical points for neural net least-squares problems. In Proc. 1995 IEEE Internat. Conf. Neural Networks, IEEE Publications, 1995, pages 2949-2954, 1995. Keyword(s): machine learning, artificial intelligence, neural networks.

B. DasGupta, H. T. Siegelmann, and E.D. Sontag. On a learnability question associated to neural networks with continuous activations (extended abstract). In COLT '94: Proceedings of the seventh annual conference on Computational learning theory, New York, NY, USA, pages 47-56, 1994. ACM Press. [doi:http://doi.acm.org/10.1145/180139.181009] Keyword(s): machine learning, artificial intelligence, analog computing, neural networks, computational complexity.

R. Koplon and E.D. Sontag. Techniques for parameter reconstruction in Fourier-Neural recurrent networks. In Proc. IEEE Conf. Decision and Control, Orlando, Dec. 1994, IEEE Publications, 1994, pages 213-218, 1994. Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks.

F. Albertini and E.D. Sontag. Identifiability of discrete-time neural networks. In Proc. European Control Conf., Groningen, June 1993, pages 460-465, 1993. Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks.

F. Albertini and E.D. Sontag. State observability in recurrent neural networks. In Proc. IEEE Conf. Decision and Control, San Antonio, Dec. 1993, IEEE Publications, 1993, pages 3706-3707, 1993. Keyword(s): machine learning, artificial intelligence, neural networks, observability, recurrent neural networks.

F. Albertini and E.D. Sontag. Uniqueness of weights for recurrent nets. In Systems and Networks: Mathematical Theory and Applications, Proc. MTNS '93, Vol. 2, Akad. Verlag, Regensburg, pages 599-602, 1993. Note: Full version, never submitted for publication, is here: http://sontaglab.org/FTPDIR/93mtns-nn-extended.pdf. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, identifiability, recurrent neural networks. Abstract:

This paper concerns recurrent networks x'=s(Ax+Bu), y=Cx, where s is a sigmoid, in both discrete time and continuous time. The paper establishes parameter identifiability under stronger assumptions on the activation than in "For neural networks, function determines form", but on the other hand deals with arbitrary (nonzero) initial states.

J. L. Balcázar, R. Gavaldà, H. T. Siegelmann, and E.D. Sontag. Some structural complexity aspects of neural computation. In Proceedings of the Eighth Annual Structure in Complexity Theory Conference (San Diego, CA, 1993), Los Alamitos, CA, pages 253-265, 1993. IEEE Comput. Soc. Press. [PDF] Keyword(s): machine learning, artificial intelligence, analog computing, neural networks, computational complexity, super-Turing computation, theory of computing and complexity. Abstract:

Recent work by H.T. Siegelmann and E.D. Sontag (1992) has demonstrated that polynomial time on linear saturated recurrent neural networks equals polynomial time on standard computational models: Turing machines if the weights of the net are rationals, and nonuniform circuits if the weights are real. Here, further connections between the languages recognized by such neural nets and other complexity classes are developed. Connections to space-bounded classes, simulation of parallel computational models such as Vector Machines, and a discussion of the characterizations of various nonuniform classes in terms of Kolmogorov complexity are presented.

C. Darken, M.J. Donahue, L. Gurvits, and E.D. Sontag. Rate of approximation results motivated by robust neural network learning. In COLT '93: Proceedings of the sixth annual conference on Computational learning theory, New York, NY, USA, pages 303-309, 1993. ACM Press. [doi:http://doi.acm.org/10.1145/168304.168357] Keyword(s): machine learning, artificial intelligence, neural networks, optimization problems, approximation theory.

A. Macintyre and E.D. Sontag. Finiteness results for sigmoidal neural networks. In STOC '93: Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, New York, NY, USA, pages 325-334, 1993. ACM Press. [PDF] [doi:http://doi.acm.org/10.1145/167088.167192] Keyword(s): machine learning, artificial intelligence, neural networks, theory of computing and complexity, real-analytic functions. Abstract:

This paper deals with analog circuits. It establishes the finiteness of VC dimension, teaching dimension, and several other measures of sample complexity which arise in learning theory. It also shows that the equivalence of behaviors, and the loading problem, are effectively decidable, modulo a widely believed conjecture in number theory. The results, the first ones that are independent of weight size, apply when the gate function is the "standard sigmoid" commonly used in neural networks research. The proofs rely on very recent developments in the elementary theory of real numbers with exponentiation. (Some weaker conclusions are also given for more general analytic gate functions.) Applications to learnability of sparse polynomials are also mentioned.

H.T. Siegelmann and E.D. Sontag. Analog computation via neural networks. In Proc. 2nd Israel Symposium on Theory of Computing and Systems (ISTCS93), IEEE Computer Society Press, 1993, 1993. Keyword(s): machine learning, artificial intelligence, analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks.

F. Albertini and E.D. Sontag. For neural networks, function determines form. In Proc. IEEE Conf. Decision and Control, Tucson, Dec. 1992, IEEE Publications, 1992, pages 26-31, 1992. Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks.

H.T. Siegelmann and E.D. Sontag. On the computational power of neural nets. In COLT '92: Proceedings of the fifth annual workshop on Computational learning theory, New York, NY, USA, pages 440-449, 1992. ACM Press. [doi:http://doi.acm.org/10.1145/130385.130432] Keyword(s): machine learning, artificial intelligence, analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks.

H.T. Siegelmann and E.D. Sontag. Some results on computing with neural nets. In Proc. IEEE Conf. Decision and Control, Tucson, Dec. 1992, IEEE Publications, 1992, pages 1476-1481, 1992. Keyword(s): machine learning, artificial intelligence, analog computing, neural networks, computational complexity, super-Turing computation, recurrent neural networks.

H.T. Siegelmann, E.D. Sontag, and C.L. Giles. The Complexity of Language Recognition by Neural Networks. In Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing '92, Volume 1, pages 329-335, 1992. North-Holland. Keyword(s): machine learning, artificial intelligence, neural networks, computational complexity, recurrent neural networks, theory of computing and complexity.

E.D. Sontag. Neural nets as systems models and controllers. In Proc. Seventh Yale Workshop on Adaptive and Learning Systems, Yale University, 1992, pages 73-79, 1992. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks. Abstract:

A conference paper. Placed here because it was requested, but contains little that is not also contained in the survey on neural nets mentioned above.

E.D. Sontag. Systems combining linearity and saturations, and relations to neural nets. In Nonlinear Control Systems Design 1992, IFAC Symposia Series, 1993, M. Fliess Ed., Pergamon Press, Oxford, 1993, pages 15-21, 1992. Note: (Also in Proc. Nonlinear Control Systems Design Symp., Bordeaux, June 1992, M. Fliess, Ed., IFAC Publications, pp. 242-247). Keyword(s): machine learning, artificial intelligence, neural networks, recurrent neural networks.

W. Maass, G. Schnitger, and E.D. Sontag. On the computational power of sigmoid versus Boolean threshold circuits (extended abstract). In Proceedings of the 32nd annual symposium on Foundations of computer science, Los Alamitos, CA, USA, pages 767-776, 1991. IEEE Computer Society Press. Keyword(s): machine learning, artificial intelligence, neural networks, theory of computing and complexity.

H. Dewan and E.D. Sontag. Extrapolatory methods for speeding up the BP algorithm. In Proc. Int. Joint Conf. on Neural Networks, Washington, DC, Jan. 1990, Lawrence Erlbaum Associates, Inc., Publishers, ISBN 0-8058-0775-6, pages I.613-616, 1990. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks. Abstract:

We describe a speedup technique that uses extrapolatory methods to predict the weights in a Neural Network using Back Propagation (BP) learning. The method is based on empirical observations of the way the weights change as a function of time. We use numerical function fitting techniques to determine the parameters of an extrapolation function and then use this function to project weights into the future. Significant computational savings result by using the extrapolated weights to jump over many iterations of the standard algorithm, achieving comparable performance with fewer iterations.
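The abstract does not state which extrapolation function was fitted, so the following Python sketch assumes, purely for illustration, that each weight behaves like w(t) ≈ c0 + c1/t over a recent window of iterations. It fits c0 and c1 per weight by ordinary least squares and projects the weights to a future iteration, from which standard BP training could resume. The model form, the function names fit_inverse_time and extrapolate_weights, and the sample trajectory are all hypothetical.

```python
def fit_inverse_time(ts, ws):
    """Least-squares fit of the assumed model w(t) = c0 + c1 / t to one weight's trajectory."""
    n = len(ts)
    xs = [1.0 / t for t in ts]
    mx, mw = sum(xs) / n, sum(ws) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxw = sum((x - mx) * (w - mw) for x, w in zip(xs, ws))
    c1 = sxw / sxx
    c0 = mw - c1 * mx
    return c0, c1


def extrapolate_weights(ts, history, t_future):
    """Project every weight to iteration t_future; history[k] is the weight vector at ts[k]."""
    projected = []
    for i in range(len(history[0])):
        c0, c1 = fit_inverse_time(ts, [h[i] for h in history])
        projected.append(c0 + c1 / t_future)
    return projected


# Hypothetical trajectory of two weights recorded at iterations 10..19.
ts = list(range(10, 20))
history = [[2.0 + 3.0 / t, -1.0 + 0.5 / t] for t in ts]
jumped = extrapolate_weights(ts, history, t_future=1000)
print(jumped)
```

In a real run the projected weights would only be a starting point: training continues from them, and the fit is repeated on a fresh window if the loss does not improve.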

E.D. Sontag. Comparing sigmoids and heavisides. In Proc. Conf. Info. Sci. and Systems, Princeton, 1990, pages 654-659, 1990. Keyword(s): machine learning, artificial intelligence, neural networks, boolean systems.

E.D. Sontag. Remarks on interpolation and recognition using neural nets. In NIPS-3: Proceedings of the 1990 conference on Advances in neural information processing systems 3, San Francisco, CA, USA, pages 939-945, 1990. Morgan Kaufmann Publishers Inc. Note: Proc. NIPS(NeurIPS)-3, Denver, 1990, https://papers.nips.cc/paper_files/paper/1990. Keyword(s): NeurIPS, machine learning, artificial intelligence, neural networks.

E.D. Sontag and H.J. Sussmann. Backpropagation Separates when Perceptrons Do. In Proc. IEEE Int. Conf. Neural Networks, Washington, DC, June 1989, pages 639-642, 1989. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks.

E.D. Sontag and H.J. Sussmann. Remarks on local minima in backpropagation. In Proc. Conf. Info. Sciences and Systems, Johns Hopkins University Press, 1989, pages 432-435, 1989. Keyword(s): machine learning, artificial intelligence, neural networks.

Internal reports

E.D. Sontag. Some remarks on the backpropagation algorithm for neural net learning. Technical report SYCON-88-02, Rutgers Center for Systems and Control, 1988. [PDF] Keyword(s): machine learning, artificial intelligence, neural networks. Abstract:

This is a very old informal report that discusses the study of local minima of quadratic loss functions for fitting errors in sigmoidal neural net learning. It also includes several remarks concerning the growth of weights during gradient descent. There is nothing very interesting here - far better knowledge is now available - but the report was placed here by request.

Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders.

Last modified: Thu Nov 27 22:38:39 2025
Author: sontag.

This document was translated from BibTeX by bibtex2html