Publications about 'VC dimension' |
Articles in journal or book chapters |
This paper takes a computational learning theory approach to a problem of linear systems identification. It is assumed that input signals have only a finite number k of frequency components, and systems to be identified have dimension no greater than n. The main result establishes that the sample complexity needed for identification scales polynomially with n and logarithmically with k. |
The Vapnik-Chervonenkis (VC) dimension is an integer which helps to characterize distribution-independent learning of binary concepts from positive and negative samples. This paper, based on lectures delivered at the Isaac Newton Institute in August of 1997, presents a brief introduction, establishes various elementary results, and discusses how to estimate the VC dimension in several examples of interest in neural network theory. (It does not address the learning and estimation-theoretic applications of VC dimension, and the applications to uniform convergence theorems for empirical probabilities, for which many suitable references are available.) |
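To make the notion concrete: a concept class shatters a finite set of points if it realizes every possible positive/negative labeling of them, and the VC dimension is the size of the largest set that can be shattered. The following is a minimal illustrative sketch, not taken from the paper, that checks shattering by brute force for a standard textbook class, closed intervals on the real line. The helper names `shatters` and `interval_hypotheses`, and the finite grid of endpoints, are assumptions introduced only for this example.

```python
def shatters(points, hypotheses):
    """Return True if the hypotheses realize every labeling of the distinct points."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)


def interval_hypotheses(grid):
    """Indicator functions of closed intervals [a, b] with endpoints on the grid.

    Pairs with a > b give the empty interval, which labels every point negative.
    """
    return [lambda x, a=a, b=b: a <= x <= b for a in grid for b in grid]


# Hypothetical finite grid of candidate endpoints: -1.0, -0.5, ..., 4.5
grid = [i / 2 for i in range(-2, 10)]
H = interval_hypotheses(grid)

two_points = [1.0, 2.0]
three_points = [1.0, 2.0, 3.0]

print(shatters(two_points, H))    # True: all four labelings are realized
print(shatters(three_points, H))  # False: no interval labels (+, -, +)
```

Two points can be shattered but no three points can, since an interval cannot contain the outer points while excluding the middle one, so the VC dimension of intervals is 2.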
This paper provides lower and upper bounds for the VC dimension of recurrent networks. Several types of activation functions are discussed, including threshold, polynomial, piecewise-polynomial and sigmoidal functions. The bounds depend on two independent parameters: the number w of weights in the network, and the length k of the input sequence. Ignoring multiplicative constants, the main results say roughly the following: 1. For architectures whose activation is any fixed nonlinear polynomial, the VC dimension is proportional to wk. 2. For architectures whose activation is any fixed piecewise polynomial, the VC dimension is between wk and w^2 k. 3. For architectures with threshold activations, the VC dimension is between w log(k/w) and the smallest of wk log(wk) and w^2 + w log(wk). 4. For the standard sigmoid tanh(x), the VC dimension is between wk and w^4 k^2. |
The following learning problem is considered, for continuous-time recurrent neural networks having sigmoidal activation functions. Given a "black box" representing an unknown system, measurements of output derivatives are collected, for a set of randomly generated inputs, and a network is used to approximate the observed behavior. It is shown that the number of inputs needed for reliable generalization (the sample complexity of the learning problem) is upper bounded by an expression that grows polynomially with the dimension of the network and logarithmically with the number of output derivatives being matched. |
This paper provides an exposition of some recent results regarding system-theoretic aspects of continuous-time recurrent (dynamic) neural networks with sigmoidal activation functions. The class of systems is introduced and discussed, and a result is cited regarding their universal approximation properties. Known characterizations of controllability, observability, and parameter identifiability are reviewed, as well as a result on minimality. Facts regarding the computational power of recurrent nets are also mentioned. |
This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles the open question of whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generalization are discussed. |
For classes of concepts defined by certain classes of analytic functions depending on k parameters, there are nonempty open sets of samples of length 2k+2 which cannot be shattered. A slightly weaker result is also proved for piecewise-analytic functions. The special case of neural networks is discussed. |
Recurrent perceptron classifiers generalize the usual perceptron model. They correspond to linear transformations of input vectors obtained by means of "autoregressive moving-average schemes", or infinite impulse response filters, and allow taking into account those correlations and dependences among input coordinates which arise from linear digital filtering. This paper provides tight bounds on sample complexity associated to the fitting of such models to experimental data. The results are expressed in the context of the theory of probably approximately correct (PAC) learning. |
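As a rough illustration of the kind of model described, the sketch below passes an input vector through an autoregressive moving-average (IIR) filter and thresholds the result. It is not the paper's construction: the zero initial conditions, the choice to label by the sign of the final filter output, and the names `arma_filter` and `recurrent_perceptron` are all assumptions made for this example.

```python
def arma_filter(u, a, b):
    """Run y[t] = sum_i a[i]*y[t-1-i] + sum_j b[j]*u[t-j] with zero initial conditions.

    a: autoregressive (feedback) coefficients; b: moving-average coefficients.
    """
    y = []
    for t in range(len(u)):
        ar = sum(a[i] * y[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        ma = sum(b[j] * u[t - j] for j in range(len(b)) if t - j >= 0)
        y.append(ar + ma)
    return y


def recurrent_perceptron(u, a, b):
    """Assumed decision rule: label 1 if the final filter output is positive, else 0."""
    return 1 if arma_filter(u, a, b)[-1] > 0 else 0


# One feedback coefficient and one feedforward coefficient (hypothetical values).
print(arma_filter([1, 0, 0], a=[0.5], b=[1.0]))            # [1.0, 0.5, 0.25]
print(recurrent_perceptron([1, 0, 0], a=[0.5], b=[1.0]))   # 1
print(recurrent_perceptron([-1, 0, 0], a=[0.5], b=[1.0]))  # 0
```

With no feedback coefficients (`a=[]`), the final output is a plain weighted sum of the input coordinates, so the sketch reduces to an ordinary perceptron without a bias term.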
This paper deals with single-hidden-layer feedforward nets, studying various aspects of classification power and interpolation capability. In particular, a worst-case analysis shows that direct input to output connections in threshold nets double the recognition but not the interpolation power, while using sigmoids rather than thresholds allows doubling both. For other measures of classification, including the Vapnik-Chervonenkis dimension, the effect of direct connections or sigmoidal activations is studied in the special case of two-dimensional inputs. |
Conference articles |
This paper deals with analog circuits. It establishes the finiteness of VC dimension, teaching dimension, and several other measures of sample complexity which arise in learning theory. It also shows that the equivalence of behaviors, and the loading problem, are effectively decidable, modulo a widely believed conjecture in number theory. The results, the first ones that are independent of weight size, apply when the gate function is the "standard sigmoid" commonly used in neural networks research. The proofs rely on very recent developments in the elementary theory of real numbers with exponentiation. (Some weaker conclusions are also given for more general analytic gate functions.) Applications to learnability of sparse polynomials are also mentioned. |
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders.
This document was translated from BibTeX by bibtex2html