Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." Neural Computation 7 (1995): 219-269.
A thorough introduction to the connection between learning and Regularization Theory. We will refer to this paper often, in this class and the next few.
Vapnik, V. The Nature of Statistical Learning Theory. Springer, 1995.
Chapter 1 is a readable first-hand introduction to the subject.
Further Readings:
Bertero, M. "Regularization Methods for Linear Inverse Problems." In Inverse Problems. Edited by G. Talenti. Lecture Notes in Mathematics. Vol. 1225. 1986, pp. 52-112.
Still a very good survey of the subject.
Tikhonov, A. N. and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, 1977.
Everybody's first book on Regularization Theory.
Vapnik, V. Statistical Learning Theory. Wiley, 1998.
Browse the first chapters of this book if you want to go deeper into the foundations of SLT.
Kolmogorov, A. N., and S. V. Fomine. Elements of the Theory of Functions and Functional Analysis. Dover, 1975.
A classic. Although you should be able to follow the class without it, go through Secs. 5.1, 6.4, and 6.5 of Ch. 2 and Secs. 13.1, 13.2, 13.3, 13.5, 13.6, and 15.1 of Ch. 4, paying particular attention to everything that concerns function spaces.
Strang, G. Calculus. Wellesley-Cambridge Press, 1991.
Chapter 13 contains an excellent exposition of the Lagrange multipliers technique.
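As a quick reminder of the technique itself (notation mine, not Strang's): to find extrema of f subject to a constraint g = 0, one looks for stationary points of the Lagrangian,

```latex
\mathcal{L}(x, y, \lambda) = f(x, y) + \lambda\, g(x, y),
\qquad
\nabla_{x,y}\,\mathcal{L} = 0, \qquad g(x, y) = 0 .
```

For example, maximizing f(x, y) = xy subject to x + y = 1 gives y + \lambda = 0 and x + \lambda = 0, hence x = y = 1/2.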
Further Readings:
Bertero, M. "Regularization Methods for Linear Inverse Problems." In Inverse Problems. Edited by G. Talenti. Lecture Notes in Mathematics. Vol. 1225. 1986, pp. 52-112.
Still a very good survey of the subject.
Tikhonov, A. N., and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, 1977.
Everybody's first book on Regularization Theory.
Strang, G. Introduction to Linear Algebra. Wellesley-Cambridge Press, 1993.
Chapter 6 contains the matrix algebra used in this class (and more!).
Further Readings:
Aronszajn, N. "Theory of Reproducing Kernels." Trans. Amer. Math. Soc. 68 (1950): 337-404.
RKHS the hard way.
Girosi, F. "An Equivalence Between Sparse Approximation and Support Vector Machines." Neural Computation 10 (1998): 1455-1480.
In Appendix A of this paper you find a smooth introduction to RKHS.
Wahba, G. Spline Models for Observational Data. SIAM, 1990.
Chapter 1 introduces you to the world of RKHS.
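For orientation before the readings (standard notation, not tied to any one of them): a reproducing kernel Hilbert space H with kernel K is characterized by the reproducing property, which says that evaluation at a point is an inner product with a kernel section,

```latex
f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}}
\quad \text{for all } f \in \mathcal{H},
\qquad
K(x, x') = \langle K(\cdot, x), K(\cdot, x') \rangle_{\mathcal{H}} .
```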
Strang, G. Calculus. Wellesley-Cambridge Press, 1991.
Chapter 13 contains an excellent exposition of the Lagrange multipliers technique.
Further Readings:
Vapnik, V. N. Estimation of Dependences Based on Empirical Data. Springer, 1982.
Chapter 9 contains a discussion of the Parzen windows method within the framework of Regularization Theory.
Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." Neural Computation 7 (1995): 219-269.
Once more a very good source of information about connections between different approximation techniques.
Further Readings:
Hertz, J., A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Addison Wesley, 1991.
A good book on Neural Networks viewed from the physicist's perspective.
Further Readings:
Hertz, J., A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Addison Wesley, 1991.
A good book on Neural Networks viewed from the physicist's perspective.
Further Readings:
Vapnik, V. Statistical Learning Theory. Wiley, 1998.
Browse the first chapters of this book if you want to go deeper into the foundations of SLT.
Evgeniou, T., M. Pontil, and T. Poggio. "Regularization Networks and Support Vector Machines." Advances in Computational Mathematics 13 (2000): 1-50.
Most of this class can be found in this paper.
Vapnik, V. Statistical Learning Theory. Wiley, 1998.
Chapters 5 and 6 tell you most, but not all, of the story behind the results discussed in this class.
Vapnik, V. Statistical Learning Theory. Wiley, 1998.
This class will cover part of Chapter 10. You may want to go through Chapter 8 to put SVMs in perspective with respect to other techniques.
Girosi, F. "An Equivalence between Sparse Approximation and Support Vector Machines." Neural Computation 10 (1998): 1455-1480.
This is the paper in which the relation between SVMs and Basis Pursuit De-Noising (BPD) is studied.
Vapnik, V. Statistical Learning Theory. Wiley, 1998.
This class will cover part of chapters 11 and 13.
Further Readings:
Chen, S., D. Donoho, and M. Saunders. "Atomic Decomposition by Basis Pursuit." Tech Rep 479. Dept. of Statistics. Stanford University. 1995.
Daubechies, I. "Time-Frequency Localization Operators: A Geometric Phase Space Approach." IEEE Trans. on Information Theory 34 (1988): 605-612.
Mallat, S., and Z. Zhang. "Matching Pursuits with Time-Frequency Dictionaries." IEEE Trans. on Signal Processing 41 (1993): 3397-3415.
Pontil, M., S. Mukherjee, and F. Girosi. "On the Noise Model of Support Vector Machine Regression." CBCL Paper #168, AI Memo #1651, Massachusetts Institute of Technology, Cambridge, MA (1998).
Vapnik, V. Statistical Learning Theory. Wiley, 1998.
You'll find kernels and ideas on kernels throughout Chapters 10, 11, and 12.
Further Readings:
Berg, C., J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer, 1984.
The title is intimidating, but chapter 3 is easy to read and contains a lucid introduction to positive definite functions.
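If you want a concrete handle on what "positive definite" means here, the sketch below (my own illustration, not from Berg et al.; assumes numpy is available) builds a Gram matrix for the Gaussian kernel and checks that it is symmetric positive semidefinite, which is exactly the property a positive definite kernel must guarantee on every finite set of points:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel, a standard example of a positive definite function."""
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

def gram_matrix(kernel, points):
    """Gram matrix K[i, j] = kernel(x_i, x_j) over a finite set of points."""
    n = len(points)
    return np.array([[kernel(points[i], points[j]) for j in range(n)]
                     for i in range(n)])

def is_positive_semidefinite(K, tol=1e-10):
    """A kernel is positive definite iff every Gram matrix it generates
    is symmetric with nonnegative eigenvalues (up to numerical tolerance)."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

rng = np.random.default_rng(0)
points = list(rng.standard_normal((5, 2)))
K = gram_matrix(gaussian_kernel, points)
print(is_positive_semidefinite(K))  # → True: the Gaussian kernel always passes
```

The same check run on an indefinite symmetric matrix (e.g. one with a negative eigenvalue) would return False, which is one quick way to see why not every symmetric function of two arguments is a valid kernel.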
Jaakkola, T., and D. Haussler. "Exploiting Generative Models in Discriminative Classifiers." NIPS 11 (1998).
Niyogi, P., T. Poggio, and F. Girosi. "Incorporating Prior Information in Machine Learning by Creating Virtual Examples." Proceedings of the IEEE 86 (1998): 2196-2209.
Logothetis, N. K., T. Vetter, A. Hulbert, and T. Poggio. "View-Based Models of 3D Object Recognition and Class-Specific Invariances." AI Memo 1473, CBCL Paper 94 (1994).
Riesenhuber, M., and T. Poggio. "Hierarchical Models of Object Recognition in Cortex." Nature Neuroscience 2 (1999): 1019-1025.
Niyogi, P., and F. Girosi. "On the Relationship between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions." Neural Computation 8 (1996): 819-842.
Here you find the material for the discussion on the various types of error.
Is the SVM solution unique?
Burges, C. J. C., and D. J. Crisp. "Uniqueness of the SVM Solution." NIPS 12 (1999).
The Decomposition Method for SVMs:
Osuna, E. Support Vector Machines: Training and Applications. Ph.D. Thesis, Massachusetts Institute of Technology, 1998.
Optimizing over 2 variables at a time:
Platt, J. C. "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines." Microsoft Research Technical Report MSR-TR-98-14 (1998).
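A minimal sketch of the analytic step at the heart of SMO (mine, not Platt's actual pseudocode; the function and variable names are illustrative): with all other multipliers fixed, the pair (alpha1, alpha2) is optimized jointly in closed form, clipped to the box [0, C] while keeping sum_i alpha_i y_i constant:

```python
import numpy as np

def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K22, K12, C):
    """One SMO step: jointly optimize two Lagrange multipliers analytically.
    E_i = f(x_i) - y_i are the current prediction errors; K__ are kernel values."""
    if y1 != y2:                              # feasible segment for the pair
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    eta = K11 + K22 - 2.0 * K12               # curvature along the constraint line
    if eta <= 0 or L == H:
        return a1, a2                         # degenerate pair: skip it
    a2_new = float(np.clip(a2 + y2 * (E1 - E2) / eta, L, H))
    a1_new = a1 + y1 * y2 * (a2 - a2_new)     # preserves sum_i alpha_i y_i
    return a1_new, a2_new

# Toy check with a linear kernel on x1 = (1, 0), x2 = (0, 1), y = (+1, -1):
a1, a2 = smo_pair_update(0.5, 0.5, +1, -1, E1=-0.5, E2=0.5,
                         K11=1.0, K22=1.0, K12=0.0, C=1.0)
# Both multipliers move to 1.0 in a single analytic step.
```

Platt's full algorithm is this update wrapped in heuristics for choosing which pair of multipliers to work on next; see the report above for those details.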
Analysis of the Decomposition Method:
Chang, C.-C., C.-W. Hsu, and C.-J. Lin. "The Analysis of Decomposition Methods for Support Vector Machines." Proceedings of IJCAI 99, SVM Workshop (1999).
Keerthi, S. S., and E. G. Gilbert. "Convergence of a Generalized SMO Algorithm for SVM Classifier Design." Technical Report CD-00-01, Control Division, Dept. of Mechanical and Production Engineering, National University of Singapore (2000).
Keerthi, S. S., S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. "Improvements to Platt's SMO Algorithm for SVM Classifier Design." Technical Report CD-99-14, Control Division, Dept. of Mechanical and Production Engineering, National University of Singapore (1999).
Sparsity Control:
Osuna, Freund, and Girosi. "Reducing Run-time Complexity in SVMs." Proceedings of the 14th Int'l Conference on Pattern Recognition (1998).
Schapire, R. E., Y. Freund, P. Bartlett, and W. S. Lee. "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods." The Annals of Statistics 26 (1998): 1651-1686.
Daubechies, I. Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, 1992.
More advanced but it also contains the basic theoretical results on frames.