A. Ambroladze, E. Parrado-Hernández, and J. Shawe-Taylor, Tighter PAC-Bayes bounds, NIPS, 2006.

Y. Bengio, Learning deep architectures for AI. Foundations and Trends in Machine Learning, vol.2, pp.1-127, 2009.

O. Catoni, Statistical learning theory and stochastic optimization: École d'Été de Probabilités de Saint-Flour XXXI-2001, vol.840, 2003.

O. Catoni, PAC-Bayesian supervised classification: the thermodynamics of statistical learning, Institute of Mathematical Statistics, vol.56, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00206119

C. Cortes and V. Vapnik, Support-vector networks, Machine Learning, vol.20, 1995.

G. K. Dziugaite and D. M. Roy, Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data, UAI, 2017.

P. Germain, A. Lacasse, F. Laviolette, and M. Marchand, PAC-Bayesian learning of linear classifiers, ICML, pp.353-360, 2009.

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

B. Guedj, A primer on PAC-Bayesian learning, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01983732

I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, Binarized neural networks, NIPS, pp.4107-4115, 2016.

I. Hubara, M. Courbariaux, and D. Soudry, Quantized neural networks: Training neural networks with low precision weights and activations, JMLR, vol.18, issue.1, pp.6869-6898, 2017.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, ICLR, 2015.

J. Langford, Tutorial on practical prediction theory for classification, JMLR, vol.6, 2005.

J. Langford and R. Caruana, (Not) Bounding the True Error, NIPS, pp.809-816, 2001.

J. Langford and J. Shawe-Taylor, PAC-Bayes & margins, NIPS, 2002.

D. McAllester, Some PAC-Bayesian theorems, Machine Learning, vol.37, 1999.

D. McAllester, PAC-Bayesian stochastic model selection, Machine Learning, vol.51, 2003.

B. Neyshabur, S. Bhojanapalli, and N. Srebro, A PAC-Bayesian approach to spectrally-normalized margin bounds for neural networks, ICLR, 2018.

E. Parrado-Hernández, A. Ambroladze, J. Shawe-Taylor, and S. Sun, PAC-Bayes bounds with data dependent priors, JMLR, vol.13, 2012.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, NIPS Autodiff Workshop, 2017.

M. Seeger, PAC-Bayesian generalization bounds for Gaussian processes, JMLR, vol.3, 2002.

J. Shawe-Taylor and R. C. Williamson, A PAC analysis of a Bayesian estimator, COLT, 1997.

D. Soudry, I. Hubara, and R. Meir, Expectation backpropagation: Parameter-free training of multilayer neural networks with continuous or discrete weights, NIPS, pp.963-971, 2014.

L. G. Valiant, A theory of the learnable, Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, pp.436-445, 1984.

J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, How transferable are features in deep neural networks?, NIPS, pp.3320-3328, 2014.

W. Zhou, V. Veitch, M. Austern, R. P. Adams, and P. Orbanz, Non-vacuous generalization bounds at the ImageNet scale: a PAC-Bayesian compression approach, ICLR, 2019.

B. Additional results

Figure 3 reproduces the experiment presented in Figure 1 with another toy dataset. Figure 5 studies the effect of the sampling size T on the stochastic gradient descent procedure. See both figures' captions for details.