# Bayesian Machine Learning Full Graduate Course [videos + slides + homework]


## Lecture Notes and Videos

1. Introduction to Statistical Computing and Probability and Statistics
• Introduction to the course, books and references, objectives, organization; Fundamentals of probability and statistics, laws of probability, independence, covariance, correlation; The sum and product rules, marginal and conditional distributions; Random variables, moments, discrete and continuous distributions; The univariate Gaussian distribution.
[Video-Lecture] [Lecture Notes]

2. Introduction to Probability and Statistics (Continued)
• Binomial, Bernoulli, Multinomial, Multinoulli, Poisson, Student’s t, Laplace, Gamma, Beta, Pareto, multivariate Gaussian and Dirichlet distributions; Joint probability distributions; Transformation of random variables; Central limit theorem and basic Monte Carlo approximations; Probability inequalities; Information theory review, KL divergence, entropy, mutual information, Jensen’s inequality.
[Video-Lecture] [Lecture Notes]
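
The information-theory quantities reviewed here lend themselves to very short implementations. A minimal sketch of the discrete KL divergence (the function name and Python phrasing are mine, not the course's):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as probability lists.
    Terms with p_i = 0 contribute zero; q_i must be positive wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Note that KL(p || q) generally differs from KL(q || p), which is why it is called a divergence rather than a distance.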

3. Information Theory, Multivariate Gaussian, MLE Estimation, Robbins-Monro algorithm
• Information theory, KL divergence, entropy, mutual information, Jensen’s inequality (continued); Central limit theorem examples, Checking the Gaussian nature of a data set; Multivariate Gaussian, Mahalanobis distance, geometric interpretation; Maximum Likelihood Estimation (MLE) for the univariate and multivariate Gaussian, sequential MLE estimation; Robbins-Monro algorithm for sequential MLE estimation.
[Video-Lecture] [Lecture Notes]
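
For the Gaussian mean, the Robbins-Monro step size a_N = 1/N turns sequential MLE into the familiar running average. A toy sketch (pure Python, function name mine):

```python
def sequential_mean(xs):
    """Sequential MLE of a Gaussian mean, Robbins-Monro style:
    mu_N = mu_{N-1} + (1/N) * (x_N - mu_{N-1}),
    which reproduces the running sample mean exactly."""
    mu = 0.0
    for n, x in enumerate(xs, start=1):
        mu += (x - mu) / n
    return mu
```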

4. Robbins-Monro for Sequential MLE, Curse of Dimensionality, Conditional and Marginal Gaussian Distributions
• Sequential MLE for the Gaussian, Robbins-Monro algorithm (continued); Back to the multivariate Gaussian, Mahalanobis distance, geometric interpretation, mean and moments, restricted forms; Curse of dimensionality, challenges in polynomial regression in high dimensions, volume/area of a sphere and hypercube in high dimensions, Gaussian distribution in high dimensions; Conditional and marginal Gaussian distributions, completing the square, Woodbury matrix inversion lemma, examples in interpolating noise-free data and data imputation, information form of the Gaussian.
[Video-Lecture] [Lecture Notes]
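
In the bivariate case, the conditional-Gaussian formulas reduce to scalars and can be checked by hand. A sketch in the usual partitioned-covariance notation (function name mine):

```python
def gaussian_conditional(mu_a, mu_b, s_aa, s_ab, s_bb, x_b):
    """p(x_a | x_b) for a bivariate Gaussian with mean (mu_a, mu_b) and
    covariance [[s_aa, s_ab], [s_ab, s_bb]]:
      mean = mu_a + (s_ab / s_bb) * (x_b - mu_b)
      var  = s_aa - s_ab**2 / s_bb
    """
    mean = mu_a + (s_ab / s_bb) * (x_b - mu_b)
    var = s_aa - s_ab ** 2 / s_bb
    return mean, var
```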

5. Likelihood calculations, MAP estimate and Regularized Least Squares, Linear Gaussian Models
• Information Form of the Gaussian (continued); Bayesian inference and likelihood function calculation, additive and multiplicative errors; MAP estimate and regularized least squares; Estimating the mean of a Gaussian with a Gaussian Prior; Applications to sensor fusion; Smoothness prior and interpolating noisy data.
[Video-Lecture] [Lecture Notes]
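
Estimating a Gaussian mean under a Gaussian prior is a precision-weighted average, which is exactly what the sensor-fusion application exploits. A minimal sketch assuming known noise variance (function name mine):

```python
def posterior_mean(mu0, var0, xs, noise_var):
    """Posterior over a Gaussian mean: prior N(mu0, var0), i.i.d. data xs with
    known noise variance. Precisions add; means combine precision-weighted."""
    prec = 1.0 / var0 + len(xs) / noise_var
    mean = (mu0 / var0 + sum(xs) / noise_var) / prec
    return mean, 1.0 / prec
```

With a single observation this is the textbook two-sensor fusion rule: the posterior mean sits between prior and measurement, weighted by their precisions.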

6. Introduction to Bayesian Statistics, Exponential Family of Distributions
• Parametric modeling, Sufficiency principle, Likelihood principle, Stopping rules, Conditionality principle, p-values and issues with frequentist statistics, MLE and the likelihood and conditionality principles; Inference in a Bayesian setting, posterior and predictive distributions, MAP estimate, Evidence, Sequential nature of Bayesian inference, Examples; Exponential family of distributions, Examples, Computing moments, Sufficiency and Neyman factorization, Sufficient statistics and MLE estimate.
[Video-Lecture] [Lecture Notes]

7. Exponential Family of Distributions and Generalized Linear Models, Bayesian Inference for the Multivariate Gaussian
• Exponential family of distributions, Computing moments, Neyman factorization, Sufficient statistics and MLE estimate (continued); Generalized Linear Models, Canonical Response, Batch and sequential IRLS algorithms; Bayesian inference for mean and variance/precision for the multivariate Gaussian, Wishart and inverse-Wishart distributions, MAP estimates and posterior marginals.
[Video-Lecture] [Lecture Notes]

8. Prior and Hierarchical Models
• Conjugate priors (continued) and limitations, mixture of conjugate priors; Non-informative priors, maximum entropy priors; Translation and scale invariant priors; Improper priors; Jeffreys priors; Hierarchical Bayesian models and empirical Bayes/type II maximum likelihood, Stein estimator.
[Video-Lecture] [Lecture Notes]

9. Introduction to Bayesian Linear Regression, Model Comparison and Selection
• Overfitting and MLE, Point estimates and least squares, posterior and predictive distributions, model evidence; Bayesian information criterion, Bayes factors, Occam’s Razor, Bayesian model comparison and selection.
[Video-Lecture] [Lecture Notes]
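
The Bayesian information criterion is a one-liner once the maximized log-likelihood is in hand; a sketch using the common k·ln n − 2·ln L̂ convention, where lower is better:

```python
import math

def bic(log_likelihood, k, n):
    """BIC = k * ln(n) - 2 * ln(L-hat); k free parameters, n data points.
    The k*ln(n) term penalizes complexity more harshly than AIC once n > e^2."""
    return k * math.log(n) - 2.0 * log_likelihood
```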

10. Bayesian Linear Regression
• Linear basis function models, sequential learning, multiple outputs, data centering, Bayesian inference when σ² is unknown, Zellner’s g-prior, uninformative semi-conjugate priors, introduction to relevance determination for Bayesian regression.
[Video-Lecture] [Lecture Notes]
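
For a single basis function, the posterior over the weight has a closed form that makes the conjugate update easy to verify. A scalar sketch in the α (prior precision), β (noise precision) parameterization (function name mine):

```python
def blr_posterior_1d(phi, t, alpha, beta):
    """Posterior N(mean, var) over a single weight w for the model t ≈ w * phi,
    with prior w ~ N(0, 1/alpha) and noise precision beta:
      1/var = alpha + beta * sum(phi_i^2),  mean = beta * var * sum(phi_i * t_i)
    """
    s_inv = alpha + beta * sum(p * p for p in phi)
    var = 1.0 / s_inv
    mean = beta * var * sum(p * y for p, y in zip(phi, t))
    return mean, var
```

As more data arrive the posterior variance shrinks monotonically, the scalar version of the sequential-learning behavior discussed above.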

11. Bayesian Linear Regression (continued)
• The evidence approximation, Limitations of fixed basis functions, equivalent kernel approach to regression, Gibbs sampling for variable selection, variable and model selection.
[Video-Lecture] [Lecture Notes]

12. Implementation of Bayesian Regression and Variable Selection
• The caterpillar regression problem; Conjugate priors, conditional and marginal posteriors, predictive distribution, influence of the conjugate prior; Zellner’s g-prior, marginal posterior mean and variance, credible intervals; Jeffreys non-informative prior, Zellner’s non-informative g-prior, point null hypothesis and calculation of Bayes factors for the selection of explanatory input variables; Variable selection, model comparison, variable selection prior, sampling search for the most probable model, Gibbs sampling for variable selection; Implementation details.
[Video-Lecture] [Lecture Notes]

13. Introduction to Monte Carlo Methods, Sampling from Discrete and Continuum Distributions
• Review of the Central Limit Theorem, Law of Large Numbers; Calculation of π, Indicator functions and Monte Carlo error estimates; Monte Carlo estimators, properties, coefficient of variation, convergence, MC and the curse of dimensionality; MC Integration in high dimensions, optimal number of MC samples; Sample representation of the MC estimator; Bayes factors estimation with Monte Carlo; Sampling from discrete distributions; Reverse sampling from continuous distributions; Transformation methods, the Box-Muller algorithm, sampling from the multivariate Gaussian.
[Video-Lecture] [Lecture Notes]
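
The π calculation above is the classic indicator-function Monte Carlo estimate: the fraction of uniform points landing inside the quarter disc converges to π/4 at the usual O(n^{-1/2}) rate. A sketch:

```python
import random

def mc_pi(n, seed=0):
    """Estimate pi as 4 * P(U1^2 + U2^2 <= 1) with U1, U2 ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * hits / n
```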

14. Reverse Sampling, Transformation Methods, Composition Methods, Accept-Reject Methods, Stratified/Systematic Sampling
• Sampling from a discrete distribution; Reverse sampling for continuous distributions; Transformation methods, Box-Muller algorithm, sampling from the multivariate Gaussian; Simulation by composition, accept-reject sampling; Conditional Monte Carlo; Stratified sampling and systematic sampling.
[Video-Lecture] [Lecture Notes]
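
The Box-Muller algorithm is one of the transformation methods listed: two independent uniforms map to two independent standard normals. A sketch:

```python
import math
import random

def box_muller(rng):
    """Map (u1, u2) ~ Uniform(0,1)^2 to two independent N(0, 1) draws:
    radius r = sqrt(-2 ln u1), angle = 2 pi u2."""
    u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)
```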

15. Importance Sampling
• Importance sampling methods, sampling from a Gaussian mixture; Optimal importance sampling distribution, normalized importance sampling; Asymptotic variance/Delta method, asymptotic bias; Applications to Bayesian inference; Importance sampling in high dimensions, importance sampling vs rejection sampling; Solving Ax=b with importance sampling, computing integrals with singularities, other examples.
[Video-Lecture] [Lecture Notes]
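
Normalized (self-normalized) importance sampling needs the target only up to a constant, which is the case of interest in Bayesian inference. A toy sketch estimating the first two moments of an unnormalized N(0, 1) target from a broad uniform proposal (function name mine):

```python
import math
import random

def snis_moments(n, seed=0):
    """Self-normalized IS: target ∝ exp(-x^2/2), proposal Uniform(-5, 5).
    Weights are target/proposal up to a constant; dividing by their sum
    cancels both unknown normalizing constants."""
    rng = random.Random(seed)
    xs = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    ws = [math.exp(-x * x / 2.0) for x in xs]
    z = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / z
    var = sum(w * x * x for w, x in zip(ws, xs)) / z - mean ** 2
    return mean, var
```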

16. Gibbs Sampling
• Review of Importance sampling, Solving Ax=b with importance sampling, Sampling Importance Resampling (Continued); Gibbs Sampling, Systematic and Random scans, Block and Metropolized Gibbs, Application to Variable Selection in Bayesian Regression; MCMC, Metropolis-Hastings, Examples.
[Video-Lecture] [Lecture Notes]
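
A systematic-scan Gibbs sampler for a correlated bivariate Gaussian is the standard first example: each full conditional is itself Gaussian, N(ρ·other, 1 − ρ²). A sketch (function name mine):

```python
import math
import random

def gibbs_bvn(rho, n, burn=1000, seed=0):
    """Systematic-scan Gibbs for a zero-mean, unit-variance bivariate
    Gaussian with correlation rho; alternately redraw x | y and y | x."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x = y = 0.0
    out = []
    for i in range(n + burn):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        if i >= burn:
            out.append((x, y))
    return out
```

As ρ → 1 the conditionals become nearly deterministic and the chain mixes slowly, which is what motivates the block and Metropolized variants above.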

17. Markov Chain Monte Carlo and the Metropolis-Hastings Algorithm
• MCMC, Averaging along the chain, Ergodic Markov chains; Metropolis algorithm, Metropolis-Hastings, Examples; Random Walk Metropolis-Hastings, Independent Metropolis-Hastings; Metropolis-adjusted Langevin algorithm; Combinations of Transition Kernels, Simulated Annealing.
[Video-Lecture] [Lecture Notes]
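
Random-walk Metropolis-Hastings needs only the target's log-density up to a constant; with a symmetric proposal the Hastings correction drops out. A sketch (function name mine):

```python
import math
import random

def rw_metropolis(logp, x0, step, n, seed=0):
    """Random-walk Metropolis: Gaussian proposal of scale `step`,
    accept x' with probability min(1, exp(logp(x') - logp(x)))."""
    rng = random.Random(seed)
    x, lp = x0, logp(x0)
    chain = []
    for _ in range(n):
        xp = x + rng.gauss(0.0, step)
        lpp = logp(xp)
        if lpp >= lp or rng.random() < math.exp(lpp - lp):
            x, lp = xp, lpp
        chain.append(x)  # record the current state whether or not we moved
    return chain
```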

18. Introduction to State Space Models and Sequential Importance Sampling
• The state space model; Examples, Tracking problem, Speech enhancement, volatility model; The state space model with observations, examples; Bayesian inference in state space models, forward filtering, forward-backward filtering; Online parameter estimation; Monte Carlo for the state space model, optimal importance distribution, sequential importance sampling.
[Video-Lecture] [Lecture Notes]

19. Sequential Importance Sampling with Resampling
• Sequential importance sampling (Continued); Optimal Importance distribution, locally optimal importance distribution, suboptimal importance distribution; Examples, Robot localization, Tracking, Stochastic volatility; Resampling, Effective sample size, multinomial resampling, sequential importance sampling with resampling, Various examples; Rao-Blackwellised particle filter, mixture of Kalman filters, switching LG-SSMs, Fast Slam; Error estimates, degeneracy, convergence.
[Video-Lecture] [Lecture Notes]
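
Two of the ingredients above, the effective sample size and multinomial resampling, fit in a few lines; a sketch (function names mine):

```python
import random

def effective_sample_size(weights):
    """ESS = (sum w)^2 / sum(w^2): equals n for uniform weights and 1 when a
    single particle carries all the weight; a standard resampling trigger."""
    s = sum(weights)
    return s * s / sum(w * w for w in weights)

def multinomial_resample(particles, weights, rng):
    """Draw len(particles) particles with replacement, with probability
    proportional to weight; afterwards the weights are reset to uniform."""
    return rng.choices(particles, weights=weights, k=len(particles))
```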

20. Sequential Importance Sampling with Resampling (Continued)
• General framework for Sequential Importance Sampling Resampling; Growing a polymer in two dimensions; Sequential Monte Carlo for Static Problems; Online parameter estimation; SMC for Smoothing.
[Video-Lecture] [Lecture Notes]

21. Sequential Monte Carlo (Continued) and Conditional Linear Gaussian Models
• Online parameter estimation; SMC for Smoothing; Kalman filter review for linear Gaussian models; Sequential Monte Carlo for conditional linear Gaussian models, Rao-Blackwellized particle filter, applications; Time Series models; Partially observed linear Gaussian models; Dynamic Tobit and Dynamic Probit models.
[Video-Lecture] [Lecture Notes]

22. Reversible Jump MCMC
• Trans-dimensional MCMC, Motivation with Autoregression and finite mixture of Gaussians models; Designing trans-dimensional moves, Birth/Death moves, Split/Merge moves, mixture of moves; Bayesian RJ-MCMC models for autoregressions, and Gaussian mixtures.
[Video-Lecture] [Lecture Notes]

23. Introduction to Expectation-Maximization (EM)
• Latent variable models; K-Means, image compression; Mixture of Gaussians, posterior responsibilities and latent variable view; Mixture of Bernoulli distributions; Generalization of EM, Variational inference perspective.
[Video-Lecture] [Lecture Notes]
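
EM for a two-component, one-dimensional Gaussian mixture makes the E-step (responsibilities) and M-step (weighted MLE updates) concrete. A toy sketch, initialized from the data extremes (function name and initialization are mine):

```python
import math

def em_gmm_1d(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture (toy sketch)."""
    mu = [min(xs), max(xs)]   # crude initialization from the data extremes
    var = [1.0, 1.0]
    pi = [0.5, 0.5]           # mixture weights
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in xs:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: responsibility-weighted maximum-likelihood updates
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
            pi[k] = nk / len(xs)
    return mu, var, pi
```

The variance floor (1e-6) guards against the degenerate zero-variance solutions that plain MLE for mixtures admits.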

24. Expectation-Maximization (continued)
• Mixture of Gaussians; Mixture of Bernoulli distributions; EM for Bayesian Linear Regression; MAP estimation and EM; Incremental EM; Fitting with missing data using EM; Variational inference perspective.
[Video-Lecture] [Lecture Notes]

25. Principal Component Analysis
• Continuous latent variable models, low-dimensional manifold of a data set, generative point of view, unidentifiability; Principal component analysis (PCA), Maximum variance formulation, minimum error formulation, PCA versus SVD; Canonical correlation analysis; Applications, offline digit images, whitening of the data with PCA, PCA for visualization; PCA for high-dimensional data; Probabilistic PCA, Maximum likelihood solution, EM algorithm, model selection.
[Video-Lecture] [Lecture Notes]
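
In the maximum-variance formulation, the first principal component is the leading eigenvector of the sample covariance, which power iteration finds without any linear-algebra library. A 2-D sketch (function name mine):

```python
def first_pc(data, iters=200):
    """Leading principal component of 2-D data via power iteration on the
    sample covariance matrix (the maximum-variance direction)."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # sample covariance entries
    cxx = sum(x * x for x, _ in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    vx, vy = 1.0, 1.0  # initial direction, renormalized each step
    for _ in range(iters):
        wx = cxx * vx + cxy * vy
        wy = cxy * vx + cyy * vy
        norm = (wx * wx + wy * wy) ** 0.5
        vx, vy = wx / norm, wy / norm
    return vx, vy
```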

26. Continuous Latent Variable Models

27. Kernel Methods and Introduction to Gaussian Processes
• Dual representation for regression, kernel functions; Kernel design, combining kernels, Gaussian kernels, probabilistic kernels, Fisher kernel; Radial basis functions, Nadaraya-Watson model; Gaussian processes, GPs for regression vs. a basis function approach, learning parameters, automatic relevance determination; Gaussian Process classification, Laplace approximation, connection to Bayesian neural nets.
[Video-Lecture] [Lecture Notes]

28. Gaussian Processes for Classification Problems, Course Summary
• Gaussian Process classification, connection of GPs to Bayesian neural nets; Summary of the Course – Probability inequalities, Law of large numbers, MLE estimates and Bias, Bayes’ theorem and posterior exploration, predictive distribution, marginal likelihood, exponential family and conjugate priors, Empirical Bayes and evidence approximation, Sampling methods, Rejection methods, importance sampling, MCMC, Gibbs sampling, Sequential importance sampling and particle methods, reversible jump MCMC, Latent variables and expectation maximization, Model reduction, probabilistic PCA and generative models.
[Video-Lecture] [Lecture Notes]

## Homework

• Sept. 13, Homework 1
• Working with multivariate Gaussians, exponential family distributions, posterior for (μ, σ²) for a Gaussian likelihood with conjugate prior, Bayesian information criterion (BIC), whitening vs standardizing the data.
[Homework] [Solution] [Software]

• Sept. 20, Homework 2
• Computing HPD intervals for posterior distributions, Monte Carlo approximations for Bayes factors, regularized MAP estimation in multivariate Gaussians, Bayesian inference in sensor fusion, Jeffreys priors, hierarchical prior models.
[Homework] [Solution] [Software]

• Oct. 4, Homework 3
• Bayesian linear regression, Variable and model selection, Gibbs sampling for variable selection, Informative Zellner’s g-prior, Jeffreys non-informative prior, Zellner’s non-informative g-prior.
[Homework] [Solution] [Software]

• Oct. 23, Homework 4
• Monte Carlo Methods, Accept-Reject Sampling, Sampling from the Gamma distribution with a Cauchy proposal, Metropolis-Hastings and Gibbs sampling, Hamiltonian MC methods, applications to Bayesian Regression.
[Homework] [Solution] [Software]

• Nov. 6, Homework 5
• Sequential Monte Carlo Methods, SMC for the Volatility Model, SMC for modeling a polymer chain (self-avoiding paths), SMC for solving integral equations, particle filters for linear Gaussian state-space models and comparison with Kalman filtering.
[Homework] [Solution] [Software]

• Nov. 27, Homework 6
• Principal Component Analysis, Bayesian PCA, EM Algorithm for PCA, Expectation Maximization, Gaussian Process Modeling.
[Homework] [Solution] [Software]