Draft:Bayesian Neural Network
Last edited by Uppsimba 59 days ago.
A Bayesian neural network (BNN) is a neural network that learns a probability distribution over its parameters using Bayesian inference, rather than a single point estimate. Once trained, it uses this distribution over parameters to predict a probability distribution in the output space for a given input. BNNs are used in applications that require the quantification of uncertainty, or in which multimodal output distributions are expected that point-estimate predictions cannot express.[1] The design of a BNN entails the choice of a functional model (the neural network, with model parameters $\theta$) and a stochastic model, which contains both priors: $p(\theta)$ over the parameters and $p(y \mid x, \theta)$, the prior confidence in the model's predictions, which acts as the likelihood.[2]
Not to be confused with Bayesian Network.
BNNs were first used in 1993 by Hinton and van Camp.[3]
Setup
From a Bayesian point of view, the model parameters $\theta$ of a BNN are treated as latent random variables, and training amounts to inferring them conditional on the (observed) training data $D$. The distribution of the trained model parameters is given directly by Bayes' theorem:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}, \qquad p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta.$$

In practice, the marginal likelihood $p(D)$ is often intractable, which makes it necessary to adopt strategies that approximate the true posterior. Once the parameter posterior is computed, other quantities of interest can be computed via marginalisation. For example, the predictive posterior for a new input $x$ is given by

$$p(y \mid x, D) = \int p(y \mid x, \theta)\, p(\theta \mid D)\, d\theta,$$

where $p(y \mid x, \theta)$ is provided by the functional model.
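For a model with a single parameter, the posterior and the predictive posterior can be approximated by brute force on a grid, which makes the two integrals concrete. The following is a minimal sketch under stated assumptions, not a practical BNN: the one-weight tanh "network", the Gaussian noise level, the standard normal prior, and the four data points are all hypothetical choices made for illustration.

```python
import math

NOISE_STD = 0.1  # assumed observation noise of the Gaussian likelihood
PRIOR_STD = 1.0  # assumed prior: theta ~ N(0, 1)

# Made-up training data D, roughly consistent with theta = 1.5.
DATA = [(-1.0, -0.90), (-0.5, -0.64), (0.5, 0.63), (1.0, 0.91)]


def normal_pdf(value, mean, std):
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def network(x, theta):
    """Hypothetical functional model: a single tanh unit with one weight."""
    return math.tanh(theta * x)


def grid_posterior(data, lo=-5.0, hi=5.0, n=2001):
    """Return grid points, the posterior density on the grid, and the grid step."""
    step = (hi - lo) / (n - 1)
    thetas = [lo + i * step for i in range(n)]
    unnormalised = []
    for theta in thetas:
        likelihood = 1.0
        for x, y in data:
            likelihood *= normal_pdf(y, network(x, theta), NOISE_STD)
        unnormalised.append(likelihood * normal_pdf(theta, 0.0, PRIOR_STD))
    evidence = sum(unnormalised) * step  # Riemann-sum approximation of p(D)
    posterior = [u / evidence for u in unnormalised]
    return thetas, posterior, step


def predictive_density(y, x_new, thetas, posterior, step):
    """Approximate p(y | x_new, D) by marginalising theta over the grid."""
    return step * sum(
        normal_pdf(y, network(x_new, t), NOISE_STD) * p
        for t, p in zip(thetas, posterior)
    )


def predictive_mean(x_new, thetas, posterior, step):
    """Posterior-averaged network output at x_new."""
    return step * sum(network(x_new, t) * p for t, p in zip(thetas, posterior))
```

The grid approach only works here because there is a single parameter. The cost grows exponentially with the number of parameters, which is why real networks require approximate inference.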
Variants
The numerical difficulties of evaluating Bayes' theorem have given rise to a family of BNN approaches, which can roughly be separated into parametric methods (variational inference (VI), Bayes by Backprop) and nonparametric methods (free-form VI, Monte Carlo dropout (MCD), direct sampling).
Variational inference
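A common simplification is to introduce a surrogate posterior $q_\phi(\theta)$ with a closed form and to minimize its Kullback-Leibler (KL) divergence to the true parameter posterior $p(\theta \mid D)$ with respect to the variational parameters $\phi$. This is possible because the objective function can be rewritten such that the intractable log marginal likelihood $\log p(D)$ is separated from the variational term:

$$\mathrm{KL}\big(q_\phi(\theta) \,\|\, p(\theta \mid D)\big) = \log p(D) - \Big( \mathbb{E}_{q_\phi(\theta)}\big[\log p(D \mid \theta)\big] - \mathrm{KL}\big(q_\phi(\theta) \,\|\, p(\theta)\big) \Big).$$

Because $\log p(D)$ does not depend on $\phi$, minimizing this KL divergence is equivalent to maximizing the term in parentheses, the evidence lower bound (ELBO):

$$\mathrm{ELBO}(\phi) = \mathbb{E}_{q_\phi(\theta)}\big[\log p(D \mid \theta)\big] - \mathrm{KL}\big(q_\phi(\theta) \,\|\, p(\theta)\big).$$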
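The relation between the KL divergence to the posterior, the log marginal likelihood and the ELBO can be checked numerically in a model where every term has a closed form. The sketch below assumes a conjugate toy model that is not part of the article: observations $y_i \sim \mathcal{N}(\theta, \sigma^2)$, a prior $\theta \sim \mathcal{N}(0, \tau^2)$ and a Gaussian surrogate $\mathcal{N}(m, s^2)$. All function names are illustrative.

```python
import math


def kl_normal(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2.0 * s2**2) - 0.5


def exact_posterior(ys, noise_std, prior_std):
    """Mean and standard deviation of the conjugate Gaussian posterior."""
    precision = 1.0 / prior_std**2 + len(ys) / noise_std**2
    mean = (sum(ys) / noise_std**2) / precision
    return mean, math.sqrt(1.0 / precision)


def log_evidence(ys, noise_std, prior_std):
    """Closed-form log marginal likelihood log p(D) of the toy model."""
    n = len(ys)
    precision = 1.0 / prior_std**2 + n / noise_std**2
    b = sum(ys) / noise_std**2
    return (
        -0.5 * n * math.log(2.0 * math.pi * noise_std**2)
        - 0.5 * math.log(prior_std**2 * precision)
        - sum(y * y for y in ys) / (2.0 * noise_std**2)
        + b * b / (2.0 * precision)
    )


def elbo(m, s, ys, noise_std, prior_std):
    """ELBO = E_q[log p(D | theta)] - KL(q || prior) for q = N(m, s^2)."""
    n = len(ys)
    expected_loglik = -0.5 * n * math.log(2.0 * math.pi * noise_std**2) - sum(
        (y - m) ** 2 + s**2 for y in ys
    ) / (2.0 * noise_std**2)
    return expected_loglik - kl_normal(m, s, 0.0, prior_std)
```

For any choice of `m` and `s`, `log_evidence(...) - elbo(...)` should agree with `kl_normal(m, s, *exact_posterior(...))` up to floating-point error. The gap closes when the surrogate equals the exact posterior.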
Using the reparameterization trick, ...
This allows each Bayesian update to be reformulated as an optimization problem, which can be solved using established gradient descent methods. This method is known as variational inference.
Common choices for the closed form include the mean-field approach, which assumes a complete factorisation of the surrogate posterior $q$ over the individual parameters, and the more general multivariate normal distribution with a low-rank covariance matrix [Barber and Bishop, "Ensemble Learning in Bayesian Neural Networks"]. This choice is the equivalent of a loss function in point-estimate machine learning.[2]
When computing the complete log-likelihood becomes infeasible because of the volume of training data, VI can also be combined with stochastic gradient descent (stochastic variational inference), where each update uses only a subset of the data to approximate the likelihood term.
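A stochastic variational inference loop can be sketched in a few lines for a one-parameter model. The example below is an illustration under assumptions that do not come from the article: a conjugate toy model (observations $y_i \sim \mathcal{N}(\theta, \sigma^2)$, prior $\theta \sim \mathcal{N}(0, \tau^2)$), a Gaussian surrogate $\mathcal{N}(m, e^{2\rho})$, hand-derived gradients, a decaying step size, and a sample drawn as $\theta = m + e^{\rho}\varepsilon$ with $\varepsilon \sim \mathcal{N}(0, 1)$ so that gradients can flow to $m$ and $\rho$. The exact posterior is available in this model, so the result can be compared against it.

```python
import math
import random

# Synthetic data for illustration only: 20 draws around an assumed true value of 2.0.
_data_rng = random.Random(1)
YS = [_data_rng.gauss(2.0, 1.0) for _ in range(20)]


def exact_posterior(ys, noise_std, prior_std):
    """Closed-form posterior of the conjugate toy model, used as a reference."""
    precision = 1.0 / prior_std**2 + len(ys) / noise_std**2
    mean = (sum(ys) / noise_std**2) / precision
    return mean, math.sqrt(1.0 / precision)


def fit_svi(ys, noise_std=1.0, prior_std=1.0, batch_size=5,
            steps=5000, lr0=0.005, seed=0):
    """Fit q(theta) = N(m, exp(rho)^2) by stochastic gradient descent on -ELBO."""
    rng = random.Random(seed)
    n = len(ys)
    m, rho = 0.0, 0.0
    for t in range(steps):
        lr = lr0 / (1.0 + t / 500.0)          # decaying step size (an assumption)
        s = math.exp(rho)
        eps = rng.gauss(0.0, 1.0)
        theta = m + s * eps                   # reparameterised sample from q
        batch = rng.sample(ys, batch_size)    # mini-batch instead of the full data
        # Gradient of the mini-batch log-likelihood w.r.t. theta,
        # rescaled by n / batch_size to estimate the full-data term.
        dloglik = (n / batch_size) * sum(y - theta for y in batch) / noise_std**2
        # Gradients of the negative ELBO estimate; the KL term to the
        # Gaussian prior is differentiated in closed form.
        grad_m = -dloglik + m / prior_std**2
        grad_rho = -dloglik * eps * s + (s**2 / prior_std**2 - 1.0)
        m -= lr * grad_m
        rho -= lr * grad_rho
    return m, math.exp(rho)
```

Calling `fit_svi(YS)` returns a mean and standard deviation that can be compared with `exact_posterior(YS, 1.0, 1.0)`. Because the gradients are noisy, the two agree only approximately.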
Monte Carlo dropout
Another approximation technique is Monte Carlo dropout (MCD),[4] in which conventional dropout layers are kept enabled at inference time. Repeating the forward pass for the same input then yields a set of sampled predictions.
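The sampling procedure can be illustrated with a very small network. In this sketch the weights are hypothetical fixed numbers standing in for an already trained 1-4-1 network, and the dropout rate and number of samples are arbitrary choices. Only the mechanism is meant to carry over: dropout stays active at prediction time and the spread of repeated forward passes is summarised.

```python
import math
import random

# Hypothetical weights of an already trained 1-4-1 tanh network (illustration only).
W1 = [0.8, -1.2, 0.5, 1.5]
B1 = [0.1, 0.0, -0.3, 0.2]
W2 = [1.0, -0.7, 0.4, 0.9]
B2 = 0.05


def forward(x, drop_prob, rng):
    """One stochastic forward pass with dropout applied to the hidden units."""
    out = B2
    for w1, b1, w2 in zip(W1, B1, W2):
        h = math.tanh(w1 * x + b1)
        if rng.random() < drop_prob:
            h = 0.0                    # unit dropped for this pass
        else:
            h /= 1.0 - drop_prob       # inverted dropout: rescale surviving units
        out += w2 * h
    return out


def mc_dropout_predict(x, drop_prob=0.5, n_samples=200, seed=0):
    """Return the mean and standard deviation of repeated dropout forward passes."""
    rng = random.Random(seed)
    samples = [forward(x, drop_prob, rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    variance = sum((s - mean) ** 2 for s in samples) / n_samples
    return mean, math.sqrt(variance)
```

With `drop_prob=0.0` every pass is identical and the reported spread collapses to zero, which corresponds to an ordinary point-estimate prediction.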
Direct sampling
In contrast to VI, Markov chain Monte Carlo (MCMC) samplers converge to the true posterior without assumptions about its form.[5]
Commonly used samplers are Hamiltonian Monte Carlo (HMC) and Langevin Monte Carlo.[5]
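A basic Hamiltonian Monte Carlo sampler with leapfrog integration and a Metropolis acceptance step can be sketched as follows. The target is an assumed conjugate toy model (observations $y_i \sim \mathcal{N}(\theta, \sigma^2)$, prior $\theta \sim \mathcal{N}(0, \tau^2)$) rather than a neural network, chosen because its exact posterior is known and the samples can be checked against it. Step size, trajectory length and burn-in are illustrative values, not recommendations.

```python
import math
import random

# Synthetic data for illustration only.
_data_rng = random.Random(1)
YS = [_data_rng.gauss(2.0, 1.0) for _ in range(20)]


def exact_posterior(ys, noise_std, prior_std):
    """Closed-form posterior of the toy model, used as a reference."""
    precision = 1.0 / prior_std**2 + len(ys) / noise_std**2
    mean = (sum(ys) / noise_std**2) / precision
    return mean, math.sqrt(1.0 / precision)


def log_post_and_grad(theta, ys, noise_std, prior_std):
    """Unnormalised log posterior and its gradient; uses the full data set."""
    logp = -0.5 * theta**2 / prior_std**2
    grad = -theta / prior_std**2
    for y in ys:
        logp += -0.5 * (y - theta) ** 2 / noise_std**2
        grad += (y - theta) / noise_std**2
    return logp, grad


def hmc(ys, n_samples=3000, burn_in=500, step=0.1, n_leapfrog=10,
        noise_std=1.0, prior_std=1.0, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    logp, grad = log_post_and_grad(theta, ys, noise_std, prior_std)
    samples = []
    for it in range(n_samples + burn_in):
        p0 = rng.gauss(0.0, 1.0)                 # resample the momentum
        th, p, lp, g = theta, p0, logp, grad
        p += 0.5 * step * g                      # initial half step for momentum
        for i in range(n_leapfrog):
            th += step * p                       # full step for position
            lp, g = log_post_and_grad(th, ys, noise_std, prior_std)
            if i < n_leapfrog - 1:
                p += step * g                    # full step for momentum
        p += 0.5 * step * g                      # final half step for momentum
        # Metropolis correction based on the change in total energy.
        log_accept = (lp - 0.5 * p * p) - (logp - 0.5 * p0 * p0)
        if rng.random() < math.exp(min(0.0, log_accept)):
            theta, logp, grad = th, lp, g
        if it >= burn_in:
            samples.append(theta)
    return samples
```

Each leapfrog step evaluates the gradient over all observations, which is the source of the poor scaling with data set size discussed for sampling methods.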
Connection to Gaussian Processes
Neural network Gaussian process
...
Limitations
BNNs are usually computationally expensive in both training and inference, because they need to generate many samples from the posterior distribution. For this reason, VI is often only practical in its mean-field form, which trades expressiveness for lower computational complexity.[5] Furthermore, VI approaches have been found to often underestimate the variance of their predictions, and the assumptions and restrictions placed on the form of the surrogate posterior may introduce bias and cause inaccurate predictions. Direct sampling using Markov chain Monte Carlo (MCMC) scales poorly with the amount of data because, by default, it requires processing the entire training dataset to perform an update. Stochastic MCMC methods, which consume only a subset of the data, have been shown to introduce a bias into the posterior.[5]
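One well-known mechanism behind variance underestimation in mean-field VI can be shown in a toy setting. The target here is assumed to be a zero-mean two-dimensional Gaussian with correlated components, not a BNN posterior. For such a target, the factorised Gaussian that minimises the KL divergence from surrogate to target has variances equal to the inverse diagonal of the precision matrix, which are never larger than the true marginal variances. The function names are illustrative.

```python
import math


def mean_field_variances(cov):
    """Variances of the optimal factorised Gaussian for a 2x2 target covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # The precision matrix has diagonal (d / det, a / det);
    # the optimal mean-field variances are its reciprocals.
    return det / d, det / a


def kl_diag_to_full(v1, v2, cov):
    """KL( N(0, diag(v1, v2)) || N(0, cov) ) for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    prec11, prec22 = d / det, a / det
    return 0.5 * (prec11 * v1 + prec22 * v2 - 2.0 + math.log(det) - math.log(v1 * v2))
```

For unit marginal variances and a correlation of 0.9, `mean_field_variances(((1.0, 0.9), (0.9, 1.0)))` gives variances of about 0.19 in each dimension, compared with the true marginal variance of 1.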
References
- ^ Gawlikowski, Jakob; Tassi, Cedrique Rovile Njieutcheu; Ali, Mohsin; Lee, Jongseok; Humt, Matthias; Feng, Jianxiang; Kruspe, Anna; Triebel, Rudolph; Jung, Peter; Roscher, Ribana; Shahzad, Muhammad; Yang, Wen; Bamler, Richard; Zhu, Xiao Xiang (2023-10-01). "A survey of uncertainty in deep neural networks". Artificial Intelligence Review. 56 (1): 1513–1589. doi:10.1007/s10462-023-10562-9. ISSN 1573-7462.
- ^ a b Jospin, Laurent Valentin; Laga, Hamid; Boussaid, Farid; Buntine, Wray; Bennamoun, Mohammed (May 2022). "Hands-On Bayesian Neural Networks—A Tutorial for Deep Learning Users". IEEE Computational Intelligence Magazine. 17 (2): 29–48. doi:10.1109/MCI.2022.3155327. ISSN 1556-603X.
- ^ Hinton, Geoffrey E.; van Camp, Drew (1993). "Keeping the neural networks simple by minimizing the description length of the weights". ACM Press: 5–13. doi:10.1145/168304.168306. ISBN 978-0-89791-611-0.
- ^ Gal, Yarin; Ghahramani, Zoubin (2015), Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, arXiv, doi:10.48550/ARXIV.1506.02142, retrieved 2026-03-17
- ^ a b c Goan, Ethan; Fookes, Clinton (2020), Mengersen, Kerrie L.; Pudlo, Pierre; Robert, Christian P. (eds.), "Bayesian Neural Networks: An Introduction and Survey", Case Studies in Applied Bayesian Data Science: CIRM Jean-Morlet Chair, Fall 2018, Cham: Springer International Publishing, pp. 45–87, doi:10.1007/978-3-030-42553-1_3, ISBN 978-3-030-42553-1, retrieved 2026-03-17
