1. The Biological Paradigm
  1.1 Neural computation
    1.1.1 Natural and artificial neural networks
    1.1.2 Models of computation
    1.1.3 Elements of a computing model
  1.2 Networks of neurons
    1.2.1 Structure of the neurons
    1.2.2 Transmission of information
    1.2.3 Information processing at the neurons and synapses
    1.2.4 Storage of information - learning
    1.2.5 The neuron - a self-organizing system
  1.3 Artificial neural networks
    1.3.1 Networks of primitive functions
    1.3.2 Approximation of functions
    1.3.3 Caveat
  1.4 Historical and bibliographical remarks
2. Threshold Logic
  2.1 Networks of functions
    2.1.1 Feed-forward and recurrent networks
    2.1.2 The computing units
  2.2 Synthesis of Boolean functions
    2.2.1 Conjunction, disjunction, negation
    2.2.2 Geometric interpretation
    2.2.3 Constructive synthesis
  2.3 Equivalent networks
    2.3.1 Weighted and unweighted networks
    2.3.2 Absolute and relative inhibition
    2.3.3 Binary signals and pulse coding
  2.4 Recurrent networks
    2.4.1 Stored state networks
    2.4.2 Finite automata
    2.4.3 Finite automata and recurrent networks
    2.4.4 A first classification of neural networks
  2.5 Harmonic analysis of logical functions
    2.5.1 General expression
    2.5.2 The Hadamard-Walsh transform
    2.5.3 Applications of threshold logic
  2.6 Historical and bibliographical remarks
3. Weighted Networks - The Perceptron
  3.1 Perceptrons and parallel processing
    3.1.1 Perceptrons as weighted threshold elements
    3.1.2 Computational limits of the perceptron model
  3.2 Implementation of logical functions
    3.2.1 Geometric interpretation
    3.2.2 The XOR problem
  3.3 Linearly separable functions
    3.3.1 Linear separability
    3.3.2 Duality of input space and weight space
    3.3.3 The error function in weight space
    3.3.4 General decision curves
  3.4 Applications and biological analogy
    3.4.1 Edge detection with perceptrons
    3.4.2 The structure of the retina
    3.4.3 Pyramidal networks and the neocognitron
    3.4.4 The silicon retina
  3.5 Historical and bibliographical remarks
4. Perceptron Learning
  4.1 Learning algorithms for neural networks
    4.1.1 Classes of learning algorithms
    4.1.2 Vector notation
    4.1.3 Absolute linear separability
    4.1.4 The error surface and the search method
  4.2 Algorithmic learning
    4.2.1 Geometric visualization
    4.2.2 Convergence of the algorithm
    4.2.3 Accelerating convergence
    4.2.4 The pocket algorithm
    4.2.5 Complexity of perceptron learning
  4.3 Linear programming
    4.3.1 Inner points of polytopes
    4.3.2 Linear separability as linear optimization
    4.3.3 Karmarkar's algorithm
  4.4 Historical and bibliographical remarks
5. Unsupervised Learning and Clustering Algorithms
  5.1 Competitive learning
    5.1.1 Generalization of the perceptron problem
    5.1.2 Unsupervised learning through competition
  5.2 Convergence analysis
    5.2.1 The one-dimensional case - energy function
    5.2.2 Multidimensional case - the classical methods
    5.2.3 Unsupervised learning as minimization problem
    5.2.4 Stability of the solutions
  5.3 Principal component analysis
    5.3.1 Unsupervised reinforcement learning
    5.3.2 Convergence of the learning algorithm
    5.3.3 Multiple principal components
  5.4 Some applications
    5.4.1 Pattern recognition
    5.4.2 Image compression
  5.5 Historical and bibliographical remarks
6. One and Two Layered Networks
  6.1 Structure and geometric visualization
    6.1.1 Network architecture
    6.1.2 The XOR problem revisited
    6.1.3 Geometric visualization
  6.2 Counting regions in input and weight space
    6.2.1 Weight space regions for the XOR problem
    6.2.2 Bipolar vectors
    6.2.3 Projection of the solution regions
    6.2.4 Geometric interpretation
  6.3 Regions for two layered networks
    6.3.1 Regions in weight space for the XOR problem
    6.3.2 Number of regions in general
    6.3.3 Consequences
    6.3.4 The Vapnik-Chervonenkis dimension
    6.3.5 The problem of local minima
  6.4 Historical and bibliographical remarks
7. The Backpropagation Algorithm
  7.1 Learning as gradient descent
    7.1.1 Differentiable activation functions
    7.1.2 Regions in input space
    7.1.3 Local minima of the error function
  7.2 General feed-forward networks
    7.2.1 The learning problem
    7.2.2 Derivatives of network functions
    7.2.3 Steps of the backpropagation algorithm
    7.2.4 Learning with backpropagation
  7.3 The case of layered networks
    7.3.1 Extended network
    7.3.2 Steps of the algorithm
    7.3.3 Backpropagation in matrix form
    7.3.4 The locality of backpropagation
    7.3.5 Error during training
  7.4 Recurrent networks
    7.4.1 Backpropagation through time
    7.4.2 Hidden Markov Models
    7.4.3 Variational problems
  7.5 Historical and bibliographical remarks
8. Fast Learning Algorithms
  8.1 Introduction - classical backpropagation
    8.1.1 Backpropagation with momentum
    8.1.2 The fractal geometry of backpropagation
  8.2 Some simple improvements to backpropagation
    8.2.1 Initial weight selection
    8.2.2 Clipped derivatives and offset term
    8.2.3 Reducing the number of floating-point operations
    8.2.4 Data decorrelation
  8.3 Adaptive step algorithms
    8.3.1 Silva and Almeida's algorithm
    8.3.2 Delta-bar-delta
    8.3.3 Rprop
    8.3.4 The Dynamic Adaption algorithm
  8.4 Second-order algorithms
    8.4.1 Quickprop
    8.4.2 QRprop
    8.4.3 Second-order backpropagation
  8.5 Relaxation methods
    8.5.1 Weight and node perturbation
    8.5.2 Symmetric and asymmetric relaxation
    8.5.3 A final thought on taxonomy
  8.6 Historical and bibliographical remarks
9. Statistics and Neural Networks
  9.1 Linear and nonlinear regression
    9.1.1 The problem of good generalization
    9.1.2 Linear regression
    9.1.3 Nonlinear units
    9.1.4 Computing the prediction error
    9.1.5 The jackknife and cross-validation
    9.1.6 Committees of networks
  9.2 Multiple regression
    9.2.1 Visualization of the solution regions
    9.2.2 Linear equations and the pseudoinverse
    9.2.3 The hidden layer
    9.2.4 Computation of the pseudoinverse
  9.3 Classification networks
    9.3.1 An application: NETtalk
    9.3.2 The Bayes property of classifier networks
    9.3.3 Connectionist speech recognition
    9.3.4 Autoregressive models for time series analysis
  9.4 Historical and bibliographical remarks
10. The Complexity of Learning
  10.1 Network functions
    10.1.1 Learning algorithms for multilayer networks
    10.1.2 Hilbert's problem and computability
    10.1.3 Kolmogorov's theorem
  10.2 Function approximation
    10.2.1 The one-dimensional case
    10.2.2 The multidimensional case
  10.3 Complexity of learning problems
    10.3.1 Complexity classes
    10.3.2 NP-complete learning problems
    10.3.3 Complexity of learning with AND-OR networks
    10.3.4 Simplifications of the network architecture
    10.3.5 Learning with hints
  10.4 Historical and bibliographical remarks
11. Fuzzy Logic
  11.1 Fuzzy sets and fuzzy logic
    11.1.1 Imprecise data and imprecise rules
    11.1.2 The fuzzy set concept
    11.1.3 Geometric representation of fuzzy sets
    11.1.4 Fuzzy set theory, logic operators, and geometry
    11.1.5 Families of fuzzy operators
  11.2 Fuzzy inferences
    11.2.1 Inferences from imprecise data
    11.2.2 Fuzzy numbers and inverse operation
  11.3 Control with fuzzy logic
    11.3.1 Fuzzy controllers
    11.3.2 Fuzzy networks
    11.3.3 Function approximation with fuzzy methods
    11.3.4 The eye as a fuzzy system - color vision
  11.4 Historical and bibliographical remarks
12. Associative Networks
  12.1 Associative pattern recognition
    12.1.1 Recurrent networks and types of associative memories
    12.1.2 Structure of an associative memory
    12.1.3 The eigenvector automaton
  12.2 Associative learning
    12.2.1 Hebbian learning - the correlation matrix
    12.2.2 Geometric interpretation of Hebbian learning
    12.2.3 Networks as dynamical systems - some experiments
    12.2.4 Another visualization
  12.3 The capacity problem
  12.4 The pseudoinverse
    12.4.1 Definition and properties of the pseudoinverse
    12.4.2 Orthogonal projections
    12.4.3 Holographic memories
    12.4.4 Translation invariant pattern recognition
  12.5 Historical and bibliographical remarks
13. The Hopfield Model
  13.1 Synchronous and asynchronous networks
    13.1.1 Recursive networks with stochastic dynamics
    13.1.2 The bidirectional associative memory
    13.1.3 The energy function
  13.2 Definition of Hopfield networks
    13.2.1 Asynchronous networks
    13.2.2 Examples of the model
    13.2.3 Isomorphism between the Hopfield and Ising models
  13.3 Convergence to stable states
    13.3.1 Dynamics of Hopfield networks
    13.3.2 Convergence proof
    13.3.3 Hebbian learning
  13.4 Equivalence of Hopfield and perceptron learning
    13.4.1 Perceptron learning in Hopfield networks
    13.4.2 Complexity of learning in Hopfield models
  13.5 Parallel combinatorics
    13.5.1 NP-complete problems and massive parallelism
    13.5.2 The multiflop problem
    13.5.3 The eight rooks problem
    13.5.4 The eight queens problem
    13.5.5 The traveling salesman
    13.5.6 The limits of Hopfield networks
  13.6 Implementation of Hopfield networks
    13.6.1 Electrical implementation
    13.6.2 Optical implementation
  13.7 Historical and bibliographical remarks
14. Stochastic Networks
  14.1 Variations of the Hopfield model
    14.1.1 The continuous model
  14.2 Stochastic systems
    14.2.1 Simulated annealing
    14.2.2 Stochastic neural networks
    14.2.3 Markov chains
    14.2.4 The Boltzmann distribution
    14.2.5 Physical meaning of the Boltzmann distribution
  14.3 Learning algorithms and applications
    14.3.1 Boltzmann learning
    14.3.2 Combinatorial optimization
  14.4 Historical and bibliographical remarks
15. Kohonen Networks
  15.1 Self-organization
    15.1.1 Charting input space
    15.1.2 Topology preserving maps in the brain
  15.2 Kohonen's model
    15.2.1 Learning algorithm
    15.2.2 Mapping high-dimensional spaces
  15.3 Analysis of convergence
    15.3.1 Potential function - the one-dimensional case
    15.3.2 The two-dimensional case
    15.3.3 Effect of a unit's neighborhood
    15.3.4 Metastable states
    15.3.5 What dimension for Kohonen networks?
  15.4 Applications
    15.4.1 Approximation of functions
    15.4.2 Inverse kinematics
  15.5 Historical and bibliographical remarks
16. Modular Neural Networks
  16.1 Constructive algorithms for modular networks
    16.1.1 Cascade correlation
    16.1.2 Optimal modules and mixtures of experts
  16.2 Hybrid networks
    16.2.1 The ART architectures
    16.2.2 Maximum entropy
    16.2.3 Counterpropagation networks
    16.2.4 Spline networks
    16.2.5 Radial basis functions
  16.3 Historical and bibliographical remarks
17. Genetic Algorithms
  17.1 Coding and operators
    17.1.1 Optimization problems
    17.1.2 Methods of stochastic optimization
    17.1.3 Genetic coding
    17.1.4 Information exchange with genetic operators
  17.2 Properties of genetic algorithms
    17.2.1 Convergence analysis
    17.2.2 Deceptive problems
    17.2.3 Genetic drift
    17.2.4 Gradient methods versus genetic algorithms
  17.3 Neural networks and genetic algorithms
    17.3.1 The problem of symmetries
    17.3.2 A numerical experiment
    17.3.3 Other applications of GAs
  17.4 Historical and bibliographical remarks
18. Hardware for Neural Networks
  18.1 Taxonomy of neural hardware
    18.1.1 Performance requirements
    18.1.2 Types of neurocomputers
  18.2 Analog neural networks
    18.2.1 Coding
    18.2.2 VLSI transistor circuits
    18.2.3 Transistors with stored charge
    18.2.4 CCD components
  18.3 Digital networks
    18.3.1 Numerical representation of weights and signals
    18.3.2 Vector and signal processors
    18.3.3 Systolic arrays
    18.3.4 One-dimensional structures
  18.4 Innovative computer architectures
    18.4.1 VLSI microprocessors for neural networks
    18.4.2 Optical computers
    18.4.3 Pulse coded networks
  18.5 Historical and bibliographical remarks