Draft:Bayesian Neural Network

A Bayesian neural network (BNN) is a neural network that learns a probability distribution over its parameters using Bayesian inference. Once trained, it uses this distribution to predict a probability distribution over the output space for a single input. BNNs are used in applications that require the quantification of uncertainty, or in which multimodal output distributions are expected that point-estimate predictions cannot express.[1] The design of a BNN entails the choice of a functional model (the neural network, with model parameters $\theta$) and a stochastic model, which contains both the prior $p(\theta)$ and the likelihood $p(y \mid x, \theta)$.[2]

Not to be confused with Bayesian network.

One of the first variational treatments of BNNs was published in 1993 by Hinton and van Camp.[3]

Setup

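From a Bayesian point of view, the model parameters $\theta$ of a BNN are treated as latent random variables. Training is the inference of these variables conditional on the observed training data $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$. The distribution of the trained model parameters, the parameter posterior, is given directly by Bayes' theorem:

$$p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}, \qquad p(\mathcal{D}) = \int p(\mathcal{D} \mid \theta)\, p(\theta)\, \mathrm{d}\theta,$$

where, for independent observations, $p(\mathcal{D} \mid \theta) = \prod_{i=1}^{N} p(y_i \mid x_i, \theta)$.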
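For a model with a single parameter, Bayes' theorem can be evaluated by brute force on a grid. This also shows why the marginal likelihood becomes intractable for real networks: the number of grid points grows exponentially with the number of parameters. The sketch below is purely illustrative and rests on several hypothetical choices:

- a one-weight "network" $f(x; w) = \tanh(wx)$;
- Gaussian observation noise with standard deviation 0.1;
- a standard normal prior on $w$;
- five made-up data points.

It is not a practical BNN method.

```python
import math

# Hypothetical toy setup: a "network" with a single weight w, f(x; w) = tanh(w * x),
# Gaussian observation noise, and a standard normal prior on w.
NOISE_STD = 0.1
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [-0.76, -0.46, 0.02, 0.45, 0.77]  # made-up observations, roughly tanh(1.0 * x)

def log_normal(v, mean, std):
    return -0.5 * math.log(2 * math.pi * std**2) - (v - mean) ** 2 / (2 * std**2)

def log_likelihood(w):
    # log p(D | w) = sum_i log p(y_i | x_i, w) for independent observations
    return sum(log_normal(y, math.tanh(w * x), NOISE_STD) for x, y in zip(xs, ys))

def log_prior(w):
    return log_normal(w, 0.0, 1.0)

# Evaluate the unnormalised posterior p(D | w) p(w) on a dense grid.
step = 0.001
grid = [-4.0 + i * step for i in range(8001)]
log_joint = [log_likelihood(w) + log_prior(w) for w in grid]

# Subtract the maximum before exponentiating for numerical stability.
shift = max(log_joint)
unnorm = [math.exp(lj - shift) for lj in log_joint]

# p(D) = integral of p(D | w) p(w) dw, approximated by a Riemann sum (scaled by exp(-shift)).
evidence_scaled = sum(unnorm) * step
posterior = [u / evidence_scaled for u in unnorm]

post_mean = sum(w * p for w, p in zip(grid, posterior)) * step
print(f"log p(D) ~ {math.log(evidence_scaled) + shift:.3f}")
print(f"posterior mean of w ~ {post_mean:.3f}")
```

Even two grid points per weight would require $2^{10^6}$ likelihood evaluations for a network with a million weights. This is why the approximate methods described below are used instead.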

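In practice, the marginal likelihood $p(\mathcal{D})$ is usually intractable, which makes it necessary to adopt strategies that approximate the true posterior. Once the parameter posterior is available, other quantities of interest can be computed by marginalisation. For example, the predictive posterior for a new input $x$ is given by

$$p(y \mid x, \mathcal{D}) = \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, \mathrm{d}\theta,$$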

where $p(y \mid x, \theta)$ is provided by the functional model.
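Given samples $\theta_s$ from the (approximate) parameter posterior, the marginalisation integral is typically estimated by Monte Carlo: $p(y \mid x, \mathcal{D}) \approx \frac{1}{S}\sum_{s=1}^{S} p(y \mid x, \theta_s)$. The sketch below makes two assumptions:

- a hypothetical toy model $y = \tanh(wx) + \varepsilon$ with Gaussian noise of standard deviation 0.1;
- stand-in posterior samples drawn from an assumed Gaussian $\mathcal{N}(1, 0.1^2)$, in place of running real inference.

By the law of total variance, the predictive variance splits into the observation noise plus the spread of the sampled network outputs.

```python
import math
import random

# Hypothetical stand-in for posterior samples: in practice these would come from
# an approximate inference method (VI, MCMC, ...). Here w ~ N(1.0, 0.1^2) is assumed.
rng = random.Random(0)
posterior_samples = [rng.gauss(1.0, 0.1) for _ in range(5000)]
NOISE_STD = 0.1  # assumed observation noise of the functional model

def predictive(x, samples):
    """Monte Carlo estimate of the mean and variance of p(y | x, D)
    for the toy model y = tanh(w x) + N(0, NOISE_STD^2)."""
    means = [math.tanh(w * x) for w in samples]
    mu = sum(means) / len(means)
    # Law of total variance: observation noise + spread of the sampled network outputs.
    epistemic_var = sum((m - mu) ** 2 for m in means) / len(means)
    return mu, NOISE_STD**2 + epistemic_var

def predictive_density(y, x, samples):
    # p(y | x, D) ~ (1/S) * sum_s N(y; tanh(w_s x), NOISE_STD^2), a mixture density.
    dens = [math.exp(-(y - math.tanh(w * x)) ** 2 / (2 * NOISE_STD**2))
            / math.sqrt(2 * math.pi * NOISE_STD**2) for w in samples]
    return sum(dens) / len(dens)

mu, var = predictive(2.0, posterior_samples)
print(f"predictive mean {mu:.3f}, predictive std {math.sqrt(var):.3f}")
print(f"density at y = 0.9: {predictive_density(0.9, 2.0, posterior_samples):.3f}")
```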

Variants

The numerical difficulties of evaluating Bayes' theorem have given rise to a family of BNN approaches. These can roughly be divided into two groups:

- parametric methods: variational inference (VI) and Bayes by Backprop;
- nonparametric methods: free-form VI, Monte Carlo dropout (MCD) and direct sampling.

Variational inference

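A common simplification is to introduce a surrogate posterior $q_\phi(\theta)$ of closed form and to minimize its Kullback-Leibler (KL) divergence to the true parameter posterior with respect to the variational parameters $\phi$. This is possible because the objective can be rewritten such that the intractable log marginal likelihood is separated from the terms that depend on $\phi$:

$$\mathrm{KL}\big(q_\phi(\theta) \,\|\, p(\theta \mid \mathcal{D})\big) = \log p(\mathcal{D}) - \Big(\mathbb{E}_{q_\phi(\theta)}\big[\log p(\mathcal{D} \mid \theta)\big] - \mathrm{KL}\big(q_\phi(\theta) \,\|\, p(\theta)\big)\Big).$$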

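The log marginal likelihood does not depend on the variational parameters, and the KL divergence is non-negative. The quantity to maximize is therefore the remaining term, the evidence lower bound (ELBO), which bounds the log marginal likelihood from below.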
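The decomposition can be checked numerically on a model where every term is available in closed form. The sketch below assumes a conjugate toy model: prior $\theta \sim \mathcal{N}(0, 1)$ and likelihood $y_i \sim \mathcal{N}(\theta, 1)$. The data values and the surrogate $q = \mathcal{N}(0.5, 0.4)$ are arbitrary choices made for illustration.

```python
import math

# Conjugate toy model (an assumption for illustration): theta ~ N(0, 1), y_i ~ N(theta, SIGMA^2).
SIGMA = 1.0
ys = [0.9, 1.4, 0.3, 1.8, 1.1]
n = len(ys)

def log_normal(v, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians (v = variance)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def elbo(m, v):
    # E_q[log p(D | theta)] in closed form for q = N(m, v), minus KL(q || prior)
    exp_loglik = sum(-0.5 * math.log(2 * math.pi * SIGMA**2)
                     - ((y - m) ** 2 + v) / (2 * SIGMA**2) for y in ys)
    return exp_loglik - kl_gauss(m, v, 0.0, 1.0)

# Exact posterior N(post_mean, post_var) from conjugacy.
post_var = 1.0 / (1.0 + n / SIGMA**2)
post_mean = post_var * sum(ys) / SIGMA**2

# log p(D) = log p(D | theta) + log p(theta) - log p(theta | D) holds at any theta; use theta = 0.
log_evidence = (sum(log_normal(y, 0.0, SIGMA**2) for y in ys)
                + log_normal(0.0, 0.0, 1.0) - log_normal(0.0, post_mean, post_var))

# For an arbitrary surrogate q: log p(D) = ELBO(q) + KL(q || posterior).
m, v = 0.5, 0.4
gap = kl_gauss(m, v, post_mean, post_var)
print(log_evidence, elbo(m, v) + gap)  # the two numbers agree up to rounding
```

When $q$ equals the exact posterior, the KL gap is zero and the ELBO equals $\log p(\mathcal{D})$.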

Using the reparameterization trick, ...

This reformulates each Bayesian update as an optimization problem that can be solved with established gradient-based methods. This method is known as variational inference.
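The optimization can be sketched with the reparameterisation trick, a common way to obtain ELBO gradients. Each sample is written as $\theta = m + s\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$, so Monte Carlo estimates of the expected log-likelihood become differentiable in $m$ and $s$.

Several settings are assumptions chosen so that the result can be compared with the exact conjugate posterior:

- prior $\mathcal{N}(0, 1)$ and unit-variance Gaussian likelihood;
- 20 synthetic data points;
- the learning rate, number of Monte Carlo samples and number of iterations.

The gradients are derived by hand rather than with an autodiff library.

```python
import math
import random

# Conjugate toy model (assumed): theta ~ N(0, 1), y_i ~ N(theta, 1), 20 synthetic points.
rng = random.Random(0)
ys = [rng.gauss(1.5, 1.0) for _ in range(20)]
n = len(ys)

def grad_log_lik(theta):
    # d/dtheta of sum_i log N(y_i; theta, 1)
    return sum(y - theta for y in ys)

# Variational parameters of q(theta) = N(m, s^2), with s = exp(rho) to keep s positive.
m, rho = 0.0, 0.0
lr, steps, n_mc = 0.01, 4000, 10
trace = []
for t in range(steps):
    s = math.exp(rho)
    g_m, g_rho = 0.0, 0.0
    for _ in range(n_mc):
        eps = rng.gauss(0.0, 1.0)
        theta = m + s * eps            # reparameterisation: theta is a differentiable function of (m, rho)
        g = grad_log_lik(theta)
        g_m += g / n_mc                # chain rule: dtheta/dm = 1
        g_rho += g * eps * s / n_mc    # chain rule: dtheta/drho = eps * s
    # Closed-form gradient of -KL(N(m, s^2) || N(0, 1)) = -0.5 * (s^2 + m^2 - 1 - 2 * rho).
    g_m -= m
    g_rho -= s**2 - 1.0
    m += lr * g_m                      # gradient ascent on the ELBO
    rho += lr * g_rho
    if t >= steps // 2:
        trace.append((m, math.exp(rho)))

# Average the second half of the iterates to smooth out Monte Carlo gradient noise.
m_hat = sum(a for a, _ in trace) / len(trace)
s_hat = sum(b for _, b in trace) / len(trace)

# Exact posterior for comparison (available here only because the model is conjugate).
post_var = 1.0 / (1.0 + n)
post_mean = post_var * sum(ys)
print(f"VI:    mean {m_hat:.3f}, std {s_hat:.3f}")
print(f"exact: mean {post_mean:.3f}, std {math.sqrt(post_var):.3f}")
```

In this conjugate case the Gaussian family contains the true posterior, so VI can recover it. For a real network, the chosen family generally cannot represent the posterior exactly.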

Common choices for the closed-form surrogate include:

- the mean-field approach, which assumes that $q$ fully factorises over the parameters;
- the more general multivariate normal distribution with a low-rank covariance matrix.[Barber and Bishop (Ensemble Learning in Bayesian Neural Networks)]

This choice is the equivalent of a loss function in point-estimate machine learning.[2]
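The practical difference between the two families can be sketched by how samples are drawn and how many variational parameters each needs:

- mean-field: every parameter receives independent noise, so $2d$ numbers describe $q$ for $d$ parameters;
- low-rank: a shared $r$-dimensional noise vector, mapped through a $d \times r$ factor $U$, introduces correlations between parameters.

The added diagonal term (giving covariance $\mathrm{diag}(\sigma^2) + UU^\top$) is an assumption of this sketch. The factor values are hypothetical.

```python
import random

rng = random.Random(1)

def sample_mean_field(mu, sigma):
    # Mean-field Gaussian: every parameter gets an independent N(mu_i, sigma_i^2) draw.
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def sample_low_rank(mu, diag_std, U):
    """Sample from N(mu, diag(diag_std^2) + U U^T), where U is a d x r factor.
    The diagonal term is an assumption that keeps the covariance full rank."""
    r = len(U[0])
    z = [rng.gauss(0.0, 1.0) for _ in range(r)]  # shared low-dimensional noise couples parameters
    return [m + s * rng.gauss(0.0, 1.0) + sum(u_ij * z_j for u_ij, z_j in zip(row, z))
            for m, s, row in zip(mu, diag_std, U)]

def count_params(d, r):
    # Means + standard deviations, plus the d * r entries of U for the low-rank family.
    return {"mean_field": 2 * d, "low_rank": 2 * d + d * r}

d = 3
mu = [0.0] * d
diag_std = [0.5] * d
U = [[1.0], [1.0], [-1.0]]  # hypothetical rank-1 factor

samples = [sample_low_rank(mu, diag_std, U) for _ in range(100_000)]
cov01 = sum(s[0] * s[1] for s in samples) / len(samples)  # mu = 0, so this estimates Cov[0, 1]
print(f"empirical Cov[theta_0, theta_1] ~ {cov01:.2f} (target U[0][0] * U[1][0] = 1.0)")
print(count_params(d=1_000_000, r=10))
```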

When computing the full log-likelihood becomes infeasible because of the large volume of training data, VI can also be combined with stochastic gradient descent (stochastic variational inference). In each update, only a mini-batch of the data is used, and its log-likelihood is rescaled by the ratio of dataset size to mini-batch size to approximate the full likelihood term.
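Rescaling a mini-batch sum by $N/|B|$ turns it into an unbiased estimate of the full-data log-likelihood. The sketch below makes this visible: averaging the rescaled estimates over all disjoint mini-batches of one shuffled epoch recovers the full sum exactly.

The unit-variance Gaussian likelihood, the fixed parameter value and the synthetic data are all assumptions for illustration.

```python
import math
import random

rng = random.Random(2)
# Toy data and a fixed parameter value (assumed for illustration).
N, B = 1000, 50
data = [rng.gauss(1.0, 1.0) for _ in range(N)]
theta = 0.8

def log_lik(y, theta):
    # log N(y; theta, 1)
    return -0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2

full = sum(log_lik(y, theta) for y in data)

def minibatch_estimate(batch):
    # Rescale the mini-batch sum by N / |B| so that its expectation equals the full sum.
    return (N / len(batch)) * sum(log_lik(y, theta) for y in batch)

# A single random mini-batch gives a noisy estimate of the full log-likelihood.
single = minibatch_estimate(rng.sample(data, B))

# Averaging over the N / B disjoint batches of a shuffled epoch recovers the full sum
# (up to floating-point rounding), illustrating that the estimator is unbiased.
perm = data[:]
rng.shuffle(perm)
batches = [perm[i:i + B] for i in range(0, N, B)]
avg = sum(minibatch_estimate(b) for b in batches) / len(batches)
print(f"full {full:.2f}, single batch {single:.2f}, epoch average {avg:.2f}")
```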

Monte Carlo dropout

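Another approximation technique is Monte Carlo dropout (MCD),[4] in which the dropout layers used during training are kept enabled at inference time. Each forward pass then uses a different random dropout mask. Repeating the forward pass for the same input therefore yields a set of sampled predictions, whose mean and spread approximate the predictive distribution.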
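The inference step of MC dropout can be sketched as repeated stochastic forward passes. The sketch assumes:

- a tiny one-hidden-layer ReLU network whose weights are randomly drawn stand-ins (a real application would use weights trained with dropout);
- inverted dropout with a hypothetical rate of 0.2 on the hidden layer.

With a dropout rate of zero, all passes coincide and the spread vanishes.

```python
import math
import random

rng = random.Random(3)

# Hypothetical stand-in weights of a tiny 1-16-1 MLP (in practice: trained with dropout).
H = 16
W1 = [rng.gauss(0.0, 1.0) for _ in range(H)]
b1 = [rng.gauss(0.0, 0.5) for _ in range(H)]
W2 = [rng.gauss(0.0, 0.5) for _ in range(H)]
b2 = 0.1

def forward(x, p_drop):
    """One stochastic forward pass with inverted dropout on the hidden layer.
    Dropout stays active here, which is the defining step of MC dropout."""
    out = b2
    for w1, bb, w2 in zip(W1, b1, W2):
        h = max(0.0, w1 * x + bb)           # ReLU hidden unit
        if rng.random() >= p_drop:          # fresh Bernoulli keep/drop decision per pass
            out += w2 * h / (1.0 - p_drop)  # rescale kept units (inverted dropout)
    return out

def mc_dropout_predict(x, p_drop=0.2, T=200):
    # T stochastic passes for the same input; their mean and std summarise the prediction.
    samples = [forward(x, p_drop) for _ in range(T)]
    mean = sum(samples) / T
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (T - 1))
    return mean, std

mean, std = mc_dropout_predict(0.7)
print(f"MC dropout prediction: {mean:.3f} +/- {std:.3f}")
```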

Direct sampling

In contrast to VI, sampling methods such as Markov chain Monte Carlo (MCMC) make no assumptions about the form of the posterior. As the number of samples grows, they converge to the true posterior.[5]


Commonly used samplers include Hamiltonian Monte Carlo (HMC) and Langevin Monte Carlo.[5]
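A minimal HMC sampler can be sketched for a one-parameter posterior. Each iteration proceeds as follows:

1. Draw a fresh momentum.
2. Simulate Hamiltonian dynamics with the leapfrog integrator.
3. Accept or reject the endpoint with a Metropolis correction.

The conjugate toy model (prior $\mathcal{N}(0, 1)$, unit-variance Gaussian likelihood, 20 synthetic points), step size, trajectory length and burn-in are assumptions chosen so that the samples can be compared with the exact posterior.

```python
import math
import random

rng = random.Random(4)
# Conjugate toy model (assumed): prior theta ~ N(0, 1), likelihood y_i ~ N(theta, 1).
ys = [rng.gauss(1.5, 1.0) for _ in range(20)]

def log_post(theta):
    # Unnormalised log posterior: log p(D | theta) + log p(theta), up to a constant.
    return -0.5 * sum((y - theta) ** 2 for y in ys) - 0.5 * theta**2

def grad_log_post(theta):
    return sum(y - theta for y in ys) - theta

def hmc_step(theta, step_size=0.1, n_leapfrog=10):
    p0 = rng.gauss(0.0, 1.0)                    # fresh momentum ~ N(0, 1)
    th, p = theta, p0
    p += 0.5 * step_size * grad_log_post(th)    # leapfrog: initial half step for momentum
    for i in range(n_leapfrog):
        th += step_size * p                     # full step for position
        if i < n_leapfrog - 1:
            p += step_size * grad_log_post(th)  # full step for momentum
    p += 0.5 * step_size * grad_log_post(th)    # final half step for momentum
    # Metropolis correction using H(theta, p) = -log_post(theta) + p^2 / 2.
    log_accept = (log_post(th) - 0.5 * p**2) - (log_post(theta) - 0.5 * p0**2)
    if rng.random() < math.exp(min(0.0, log_accept)):
        return th
    return theta

theta, samples = 0.0, []
for it in range(4000):
    theta = hmc_step(theta)
    if it >= 500:                               # discard burn-in
        samples.append(theta)

mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
exact_var = 1.0 / (1.0 + len(ys))
exact_mean = exact_var * sum(ys)
print(f"HMC:   mean {mean:.3f}, std {std:.3f}")
print(f"exact: mean {exact_mean:.3f}, std {math.sqrt(exact_var):.3f}")
```

With `n_leapfrog=1`, this update reduces to the Metropolis-adjusted Langevin algorithm, a Langevin-type sampler. Each gradient evaluation here touches the full dataset, which is the scaling issue discussed under Limitations.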

Connection to Gaussian Processes

Neural network Gaussian process

...

Limitations

BNNs are usually computationally expensive in training and inference, because they need to generate many samples from the posterior distribution. For this reason, VI is often only practical in its mean-field form, which trades expressiveness for lower computational complexity.[5]

VI approaches also often underestimate the variance of their predictions. One reason is that minimizing $\mathrm{KL}(q \,\|\, p)$ heavily penalises $q$ for placing probability mass where the posterior has little, which favours overly narrow approximations.

Direct sampling using Markov chain Monte Carlo (MCMC) scales poorly with the amount of data, because by default every update requires processing the entire training dataset. Stochastic MCMC methods, which use only a subset of the data per update, have been shown to introduce a bias into the posterior.[5]
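The variance underestimation of mean-field VI can be made concrete for a Gaussian target with covariance $\Sigma$ and precision matrix $\Lambda = \Sigma^{-1}$. Minimizing $\mathrm{KL}(q \,\|\, p)$ over factorised Gaussians yields variances $1/\Lambda_{ii}$, which are never larger than the true marginal variances $\Sigma_{ii}$.

The sketch assumes a two-dimensional target with unit variances and correlation $\rho = 0.9$, for which the optimum is $1 - \rho^2 = 0.19$. The mean of $q$ is fixed at the target mean, which is optimal here. A grid search over the common variance of both (symmetric) coordinates confirms the optimum.

```python
import math

# Target "posterior" (assumed): a 2-D Gaussian N(0, Sigma) with unit variances and correlation RHO.
RHO = 0.9
Sigma = [[1.0, RHO], [RHO, 1.0]]
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
Lam = [[Sigma[1][1] / det, -Sigma[0][1] / det],
       [-Sigma[1][0] / det, Sigma[0][0] / det]]  # precision matrix = inverse of Sigma

def reverse_kl(v1, v2):
    """KL(q || p) for mean-field q = N(0, diag(v1, v2)) and p = N(0, Sigma)."""
    trace_term = Lam[0][0] * v1 + Lam[1][1] * v2
    return 0.5 * (trace_term - 2.0 + math.log(det) - math.log(v1 * v2))

# Grid search over a common variance for both coordinates (the target is symmetric).
grid = [i / 1000 for i in range(1, 2001)]
best_v = min(grid, key=lambda v: reverse_kl(v, v))
print(f"mean-field variance {best_v:.3f} vs. true marginal variance {Sigma[0][0]:.3f}")
```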

In VI, the assumptions and restrictions placed on the form of the surrogate posterior may also introduce a bias and cause inaccurate predictions.

References

  1. Gawlikowski, Jakob; Tassi, Cedrique Rovile Njieutcheu; Ali, Mohsin; Lee, Jongseok; Humt, Matthias; Feng, Jianxiang; Kruspe, Anna; Triebel, Rudolph; Jung, Peter; Roscher, Ribana; Shahzad, Muhammad; Yang, Wen; Bamler, Richard; Zhu, Xiao Xiang (2023-10-01). "A survey of uncertainty in deep neural networks". Artificial Intelligence Review. 56 (1): 1513–1589. doi:10.1007/s10462-023-10562-9. ISSN 1573-7462.
  2. Jospin, Laurent Valentin; Laga, Hamid; Boussaid, Farid; Buntine, Wray; Bennamoun, Mohammed (May 2022). "Hands-On Bayesian Neural Networks—A Tutorial for Deep Learning Users". IEEE Computational Intelligence Magazine. 17 (2): 29–48. doi:10.1109/MCI.2022.3155327. ISSN 1556-603X.
  3. Hinton, Geoffrey E.; van Camp, Drew (1993). "Keeping the neural networks simple by minimizing the description length of the weights". ACM Press: 5–13. doi:10.1145/168304.168306. ISBN 978-0-89791-611-0.
  4. Gal, Yarin; Ghahramani, Zoubin (2015). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. arXiv. doi:10.48550/ARXIV.1506.02142. Retrieved 2026-03-17.
  5. Goan, Ethan; Fookes, Clinton (2020). "Bayesian Neural Networks: An Introduction and Survey". In Mengersen, Kerrie L.; Pudlo, Pierre; Robert, Christian P. (eds.). Case Studies in Applied Bayesian Data Science: CIRM Jean-Morlet Chair, Fall 2018. Cham: Springer International Publishing. pp. 45–87. doi:10.1007/978-3-030-42553-1_3. ISBN 978-3-030-42553-1. Retrieved 2026-03-17.


Content Disclaimer

This information is summarised from Wikipedia and presented again for educational purposes. The content is available under the CC BY-SA 3.0 license. We are not responsible for inaccuracies in data sourced from such public contributions.

  1. The information displayed on this website is sourced in part or in whole from Wikipedia and has been adapted for the purpose of restating it. We strive to provide accurate and relevant information, however:
  2. There is no guarantee of absolute accuracy. Wikipedia is an open, collaborative project that can be edited by anyone, so information is subject to change.
  3. It is not intended to constitute professional advice. The content displayed is for informational and educational purposes only. For important decisions (e.g., medical, legal, or financial), please consult a professional.
  4. Content copyright. Wikipedia is licensed under the Creative Commons Attribution-ShareAlike License (CC BY-SA). This means that content may be reused with appropriate attribution and shared under a similar license.
  5. Responsible use. Any risk arising from the use of information from this website is entirely the responsibility of the user.