How to inefficiently solve any problem
A tutorial on Bayesian numerical methods

Rémi Bardenet
LAL, LRI, Univ. Paris-Sud XI
May 30th, 2012

Summary
1 A very generic problem
2 First sampling methods
3 MCMC algorithms
4 A taste of a monster MCMC sampler for Auger

A very generic problem

Today's talk: sampling for Bayesian inference
When
- a model p(data|θ) of an experiment has been written,
- a prior p(θ) has been set on the parameters,
then by Bayes' theorem:

    π(θ) := p(θ|data) = p(data|θ) p(θ) / ∫_Θ p(data|θ') p(θ') dθ'.

The generative process of an air shower
[Figure: schematic of the generative process of an air shower.]

Looking at Auger tank signals
[Fig. 5.17: FADC traces and jump distributions for the stations of event 1524469 (E = 10.6 EeV, θ = 39.6°), e.g. Station 612 at 1684 m and Station 622 at 2031 m; signals in VEM-peak units vs. t_bin (25 ns).]

Modelling tank signals: the single muon case
[Figure: muonic time response model p_{τ,t_d}.]
Mean number of photo-electrons per bin:

    n̄_i(A_μ, t_μ) = A_μ ∫_{t_{i−1}}^{t_i} p_{τ,t_d}(t − t_μ) dt.

[Figure: simulated photoelectron counts per time bin.]
The counts n_i are Poisson with mean

    n̄_i = Σ_{j=1}^{N_μ} n̄_i(A_{μj}, t_{μj}).

So what is what in the case N_μ = 1?
- Parameters to infer are θ = (t_μ, A_μ).
- The likelihood is fixed by the model:

    p(data|θ) = Π_{i=1}^{N_bins} n̄_i(θ)^{n_i} e^{−n̄_i(θ)} / n_i!.

- Choose e.g. a uniform prior:

    p(θ) = (1/C) χ_[0,C](t_μ) · (1/D) χ_[0,D](A_μ).

- Then the posterior reads

    p(θ|data) ∝ χ_[0,C](t_μ) χ_[0,D](A_μ) Π_{i=1}^{N_bins} n̄_i(θ)^{n_i} e^{−n̄_i(θ)}.

Bayesian inference requires computing integrals
- MEP estimate (aka Bayes'): compute

    θ̂_MEP := ∫_Θ θ p(θ|data) dθ,

- credible intervals: find I such that

    ∫_I p(θ|data) dθ ≥ 1 − α,

- Bayes factors: require evidence computations

    p(data|M) = ∫ p(data|θ, M) p(θ|M) dθ.

The MAP estimate θ̂_MAP := arg max p(θ|data) with "Hessian" credible interval does not require integrals.

Averaging whenever necessary
Most Bayesian inference tasks require integrals with respect to the posterior:

    ∫ h(θ) p(θ|data) dθ.
The Monte Carlo principle

    I := ∫ h(x) π(x) dx ≈ (1/N) Σ_{i=1}^N h(x_i) =: Î_N,   where X_{1:N} ~ π i.i.d.

- Î_N is unbiased: E Î_N = I.
- Error bars shrink at speed 1/√N:

    Var(Î_N) = Var(h(X)) / N.

Pros & Cons
Pros
- Numerical integration (grid methods) is too costly and imprecise when d ≥ 6.
- MC should concentrate the effort on places where π is big.
- Many, many applications!
Cons
- One must be able to sample from π!
→ Need for generic, clever sampling methods!
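As an illustration (not part of the original slides), the Monte Carlo principle fits in a few lines; here π = N(0, 1) and h(x) = x², so the true value of the integral is E[X²] = 1:

```python
import random

def mc_estimate(h, sample_pi, n):
    # Plain Monte Carlo: average h over n i.i.d. draws from pi.
    return sum(h(sample_pi()) for _ in range(n)) / n

random.seed(0)
# pi = N(0, 1) and h(x) = x^2, so the true integral E[X^2] is 1.
est = mc_estimate(lambda x: x * x, lambda: random.gauss(0.0, 1.0), 100_000)
```

Note the cost of the 1/√N rate: a 10× smaller error bar requires 100× more samples.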
Non-Bayesian applications of sampling algorithms: Auger
"A complete Monte Carlo hybrid simulation has been performed to study the trigger efficiency and the detector performance. The simulation sample consists of about 6000 proton and 3000 iron CORSIKA [19] showers."
From "The exposure of the hybrid detector of the P. Auger Observatory", the Auger Collab., Astroparticle Phys., 2010.

Non-Bayesian applications of sampling algorithms: EPOS
"The difficulty with Monte Carlo generation of interaction configurations arises from the fact that the configuration space is huge and rather nontrivial [...] the only way to proceed amounts to employing dynamical MC methods."
From "Parton-based Gribov-Regge theory", Drescher et al., Phys. Rept., 2008.

First sampling methods

The inverse function method
Principle: if U ~ U(0,1) and F is a cdf, then F^{−1}(U) ~ F.
- It assumes we know how to sample from U(0,1)!
- It needs a known and convenient cdf.
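A minimal sketch of the inverse function method (the exponential example is mine, not the slides'): for Exp(λ), the cdf F(x) = 1 − e^{−λx} inverts in closed form.

```python
import math
import random

def exp_inverse_cdf(u, lam):
    # Exp(lam) has cdf F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -log(1 - u) / lam.
    return -math.log(1.0 - u) / lam

random.seed(1)
# Push uniform draws through F^{-1}; the resulting sample has the Exp(2) law.
draws = [exp_inverse_cdf(random.random(), 2.0) for _ in range(200_000)]
mean = sum(draws) / len(draws)  # the Exp(2) mean is 1/2
```

This is exactly why the method needs a "convenient" cdf: for most posteriors no such closed-form inverse exists.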
Rejection sampling 1/2
Assumptions
- The target π is known up to a multiplicative constant,
- an easy-to-sample distribution q s.t. π ≤ Mq is known.

RejectionSampling(π, q, M, N)
1  for i ← 1 to N,
2      sample θ^(i) ~ q and u ~ U(0,1),
3      form the acceptance ratio ρ = π(θ^(i)) / (M q(θ^(i))),
4      if u < ρ, then accept θ^(i),
5      else reject.

Rejection sampling 2/2
Remarks & drawbacks
- One needs to know a good q and M,
- lots of wasted samples.

Importance sampling 1/2
Assumptions & principle
- The target π is fully known,
- an easy-to-sample distribution q is known s.t. Supp(π) ⊂ Supp(q).
Then

    Î_N := (1/N) Σ_{i=1}^N h(θ^(i)) π(θ^(i))/q(θ^(i)) → ∫_Θ h(θ) π(θ) dθ,   θ^(i) ~ q i.i.d.

ImportanceSampling(π, q, N)
1  sample independent θ^(i) ~ q, i = 1..N,
2  form the weights w_i = (1/N) π(θ^(i))/q(θ^(i)),
3  π is approximated by Σ_{i=1}^N w_i δ_{θ^(i)}.

Importance sampling 2/2
- If |h| ≤ M, then

    Var(Î_N) ≤ (M²/N) [ ∫ (π − q)²/q dθ + 1 ],

  thus q has to be close to π and have heavier tails than π.
- Further: the q achieving the smallest variance is

    q = |h| π / ∫ |h| π dθ.
- Better and achievable: for smaller variance, or if π is known only up to a constant, use the self-normalised estimator

    Ĩ_N := [ Σ_{i=1}^N h(θ^(i)) π(θ^(i))/q(θ^(i)) ] / [ Σ_{i=1}^N π(θ^(i))/q(θ^(i)) ]   (biased).

- One needs to know a good, heavy-tailed q.
- Adaptive strategies for tuning q are possible [WKB+09].

Going to high dimensions
Benchmark: let π(θ) = N(0, I_d) and q(θ) = N(0, σ I_d).
- Rejection sampling
  - needs σ ≥ 1,
  - the fraction of accepted proposals goes as σ^{−d}.
- Importance sampling
  - yields infinite variance when σ ≤ 1/√2,
  - the variance of the weights goes as (σ²/(2 − 1/σ²))^{d/2}.

So far, so good?
We have seen that
- sums & integrals are ubiquitous in Statistics,
- numerical integration is limited to small dimensions,
- "basic" sampling is limited to full-information, easy cases,
- RS and IS, well tuned, are efficient and versatile.
Now what do we do when
- it is not possible to sample from π directly, but only to evaluate it pointwise, possibly up to a multiplicative constant:

    π(θ) = p(x|θ) p(θ) / ∫_Θ p(x|θ') p(θ') dθ',

- we don't know a good approximation q of π,
- Θ is high-dimensional?
Well, MCMC brings both answers and new problems!
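To make the RS and IS recipes above concrete, here is a toy sketch (target and names are illustrative, not from the slides): the unnormalised target x(1 − x) on [0, 1] is a Beta(2, 2) with mean 1/2, bounded by M = 1/4 under a uniform proposal.

```python
import random

def rejection_sample(p_tilde, m_bound, n):
    # Accept theta ~ U(0,1) with probability p_tilde(theta) / (M * q(theta)), q = 1.
    out = []
    while len(out) < n:
        theta = random.random()
        if random.random() < p_tilde(theta) / m_bound:
            out.append(theta)
    return out

def snis_mean(p_tilde, n):
    # Self-normalised IS with q = U(0,1): weights w_i proportional to p_tilde(theta_i).
    xs = [random.random() for _ in range(n)]
    ws = [p_tilde(x) for x in xs]        # q(x) = 1 on [0, 1]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

p_tilde = lambda x: x * (1.0 - x)        # unnormalised Beta(2, 2); p_tilde <= 1/4

random.seed(2)
rs_mean = sum(rejection_sample(p_tilde, 0.25, 50_000)) / 50_000
is_mean = snis_mean(p_tilde, 50_000)     # both should be near the true mean 1/2
```

Both estimators work here because the uniform proposal covers the target well; the benchmark above shows how quickly this breaks as d grows.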
MCMC algorithms

Metropolis' algorithm 1/2
The goal is to explore Θ, spending more time in places where π is high.
[Figure: random-walk exploration of a 2-d target, taken from [Bis06].]

Metropolis' algorithm 2/2
- Metropolis' algorithm builds a random walk with symmetric steps q that mimics independent draws from π.
- Symmetry means q(x|y) = q(y|x), e.g. q(y|x) = N(x, σ).

MetropolisSampler(π, q, T, θ^(0))
1  S ← ∅
2  for t ← 1 to T,
3      sample θ* ~ q(·|θ^(t−1)) and u ~ U(0,1),
4      form the acceptance ratio ρ = 1 ∧ π(θ*)/π(θ^(t−1)),
5      if u < ρ, then θ^(t) ← θ* else θ^(t) ← θ^(t−1),
6      S ← S ∪ {θ^(t)}.

The Metropolis-Hastings algorithm
- When no symmetry is assumed, change the acceptance to

    ρ(x, y) = 1 ∧ [π(y) q(x|y)] / [π(x) q(y|x)].

- Note the exploration/exploitation trade-off.

MetropolisHastingsSampler(π, q, T, θ^(0))
1  S ← ∅
2  for t ← 1 to T,
3      sample θ* ~ q(·|θ^(t−1)) and u ~ U(0,1),
4      form the acceptance ratio ρ = 1 ∧ [π(θ*) q(θ^(t−1)|θ*)] / [π(θ^(t−1)) q(θ*|θ^(t−1))],
5      if u < ρ, then θ^(t) ← θ* else θ^(t) ← θ^(t−1),
6      S ← S :: {θ^(t)}.
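A minimal random-walk Metropolis sketch, assuming only pointwise evaluation of an unnormalised log-density (the N(0, 1) target, starting point, and step size are illustrative choices, not from the slides):

```python
import math
import random

def metropolis(log_p, x0, step, n_steps):
    # Random-walk Metropolis: symmetric Gaussian proposal,
    # accept with probability 1 ∧ pi(x*)/pi(x); log_p may be unnormalised.
    x, chain = x0, []
    for _ in range(n_steps):
        x_star = random.gauss(x, step)
        if random.random() < math.exp(min(0.0, log_p(x_star) - log_p(x))):
            x = x_star                  # accept
        chain.append(x)                 # on rejection, the current state is repeated
    return chain

random.seed(3)
chain = metropolis(lambda x: -0.5 * x * x, 5.0, 2.4, 100_000)  # N(0,1) target, bad start
burned = chain[5_000:]                                          # discard a burn-in period
mean = sum(burned) / len(burned)
var = sum((x - mean) ** 2 for x in burned) / len(burned)
```

Working in log space avoids under/overflow when the likelihood is a long product, as in the Poisson model above.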
All the maths in two slides 1/2
- A (stationary) Markov chain (θ^(t)) is defined through a kernel:

    P(θ^(t+1) ∈ dy | θ^(t) = x) = K(x, dy).

- If θ^(0) = x, then

    P(θ^(t) ∈ dy | θ^(0) = x) = ∫ ... ∫ K(x, dx_1) ... K(dx_{t−1}, dy) =: K^t(x, dy).

All the maths in two slides 2/2
- Under technical conditions, the MH kernel K satisfies

    ‖π − K^t(x, ·)‖ → 0, ∀x.

- This justifies a burn-in period after which samples are discarded.
- Further results like a Law of Large Numbers guarantee that

    (1/(T+1)) Σ_{t=0}^T h(θ^(t)) → ∫ h(θ) π(θ) dθ

  for h bounded.
- Central Limit Theorem-type results also exist, see Section 6.7 of [RC04].

Autocorrelation in equilibrium
- For a scalar chain, define the integrated autocorrelation time

    τ_int = 1 + 2 Σ_{k>0} Corr(θ^(t), θ^(t+k)).

- One can show that

    Var( (1/T) Σ_{t=1}^T h(θ^(t)) ) = (τ_int/T) Var(h(θ^(0))).
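The integrated autocorrelation time can be estimated by truncating the sum at a maximum lag; a sketch (mine, not the slides') on an AR(1) surrogate chain, for which τ_int = (1 + ρ)/(1 − ρ) is known exactly:

```python
import random

def tau_int(xs, max_lag):
    # Truncated estimate of tau_int = 1 + 2 * sum_{k>0} Corr(x_t, x_{t+k}).
    n = len(xs)
    mu = sum(xs) / n
    c0 = sum((x - mu) ** 2 for x in xs) / n
    tau = 1.0
    for k in range(1, max_lag + 1):
        ck = sum((xs[t] - mu) * (xs[t + k] - mu) for t in range(n - k)) / (n - k)
        tau += 2.0 * ck / c0
    return tau

random.seed(4)
# An AR(1) chain x_t = rho * x_{t-1} + noise has tau_int = (1 + rho)/(1 - rho),
# i.e. 3 for rho = 0.5.
rho, x, xs = 0.5, 0.0, []
for _ in range(50_000):
    x = rho * x + random.gauss(0.0, (1.0 - rho * rho) ** 0.5)
    xs.append(x)
tau = tau_int(xs, 40)
```

The variance formula above then says the chain is worth only T/τ_int independent draws.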
Rule of thumb for the proposal
Optimizing similar criteria leads to choosing q(·|x) = N(x, σ²) s.t.
- the acceptance rate is ≈ 0.5 for d = 1, 2,
- the acceptance rate is ≈ 0.25 for d ≥ 3.

A practical wrap-up
We have justified the acceptance ratio, the burn-in period, and the optimization of the proposal.

MetropolisHastingsSampler(π, q, T, θ^(0), Σ_0)
1  S ← ∅
2  for t ← 1 to T,
3      sample θ* ~ N(·|θ^(t−1), σΣ_0) and u ~ U(0,1),
4      form the acceptance ratio ρ = 1 ∧ [π(θ*) q(θ^(t−1)|θ*)] / [π(θ^(t−1)) q(θ*|θ^(t−1))],
5      if u < ρ, then θ^(t) ← θ* else θ^(t) ← θ^(t−1),
6      if t ≤ T_b,
7          σ ← σ + t^{−0.6} (acc. rate − 0.25/0.50),
8      else if t > T_b, then S ← S ∪ {θ^(t)}.

No maths on this slide
Feel free to experiment with Laird Breyer's applet on http://www.lbreyer.com/classic.html.

A case study from particle physics
- Consider again our muonic signal reconstruction task, with

    p(data|θ) = Π_{i=1}^{N_bins} n̄_i(θ)^{n_i} e^{−n̄_i(θ)} / n_i!.

- The model (the physics) suggests using specific independent priors for A_μ and t_μ.
[Figure: prior on A_μ for a given zenith angle.]

Case study: simulation results
[Figure: two log(A) chains of 10 000 iterations with the true value overlaid, one mixing badly and one mixing well.]
- One can often detect bad mixing by eye.
- Acceptance for the left-panel chain is only 3%.
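One way to realise the proposal tuning of the wrap-up sampler is a Robbins-Monro update of the scale, σ ← σ + t^{−0.6}(accept − target); the sketch below (constants and names are my illustrative choices) targets the d = 1 rule-of-thumb rate of ≈ 0.5, and is precisely the kind of tuning that avoids a 3%-acceptance chain:

```python
import math
import random

def tuned_metropolis(log_p, x0, n_steps, target=0.5):
    # Robbins-Monro tuning of the proposal scale:
    # sigma <- sigma + t^{-0.6} * (accept - target).
    x, sigma = x0, 0.1
    accepts_late = 0
    for t in range(1, n_steps + 1):
        x_star = random.gauss(x, sigma)
        acc = random.random() < math.exp(min(0.0, log_p(x_star) - log_p(x)))
        if acc:
            x = x_star
            if t > n_steps // 2:
                accepts_late += 1
        sigma = max(1e-3, sigma + t ** -0.6 * ((1.0 if acc else 0.0) - target))
    # Report the acceptance rate over the second half, after adaptation settles.
    return sigma, accepts_late / (n_steps - n_steps // 2)

random.seed(5)
sigma, late_rate = tuned_metropolis(lambda x: -0.5 * x * x, 0.0, 100_000)
```

The decaying step size t^{−0.6} is what lets the chain's ergodicity survive the adaptation ("diminishing adaptation").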
Case study: simulation results
[Figure: marginal histograms of log A for the two chains, with true value and marginal mean.]
- Marginals are simply component histograms!
- Even the marginals look ugly when mixing is bad.

[Figure: marginal histograms of log A and log t for the well-mixing chain.]
- From now on, we only consider the good-mixing case.
- Try to reproduce your marginals with different starting values!
- Producing the chain was the hard part. Now everything is easy: estimation, credible intervals, ...

A taste of a monster MCMC sampler for Auger
- A generative model for the tank signals [BK12]
- Reconstruction/inference

Let's go for N_μ > 1
- Recall

    n̄_i(A_μ, t_μ) = A_μ ∫_{t_{i−1}}^{t_i} p_{τ,t_d}(t − t_μ) dt.

[Figure: photoelectron counts per time bin.]
- Now n_i is Poisson with mean

    n̄_i = Σ_{j=1}^{N_μ} n̄_i(A_{μj}, t_{μj}).

MCMC 101: Metropolis
- Bayesian inference: obtain

    π(A_μ, t_μ) = p(A_μ, t_μ | signal) ∝ p(signal | A_μ, t_μ) p(A_μ, t_μ).

- MCMC methods sample from π.
Metropolis (B) and adaptive Metropolis (B+G) algorithms

M(π(x), X_0, Σ_0, T)
1  S ← ∅, Σ ← Σ_0
2  for t ← 1 to T
3      X̃ ~ N(·|X_{t−1}, Σ)                          ⊳ proposal
4      if π(X̃)/π(X_{t−1}) > U[0,1]
5          then X_t ← X̃                             ⊳ accept
6          else X_t ← X_{t−1}                        ⊳ reject
7      S ← S ∪ {X_t}                                 ⊳ update posterior sample
8  return S

AM(π(x), X_0, Σ_0, T, μ_0, c)
1  S ← ∅
2  for t ← 1 to T
3      Σ ← cΣ_{t−1}                                  ⊳ scaled adaptive covariance
4      X̃ ~ N(·|X_{t−1}, Σ)                          ⊳ proposal
5      if π(X̃)/π(X_{t−1}) > U[0,1]
6          then X_t ← X̃                             ⊳ accept
7          else X_t ← X_{t−1}                        ⊳ reject
8      S ← S ∪ {X_t}                                 ⊳ update posterior sample
9      μ_t ← μ_{t−1} + (1/t)(X_t − μ_{t−1})          ⊳ update running mean and covariance
10     Σ_t ← Σ_{t−1} + (1/t)((X_t − μ_{t−1})(X_t − μ_{t−1})ᵀ − Σ_{t−1})
11     c ← c + t^{−0.6}(acc. rate − 0.25/0.50)
12 return S

Three problems, many "solutions"
- Possibly high dimensions, but also a highly correlated model.
  - Use adaptive proposals.
- The number of muons N_μ is unknown.
  - Use a nonparametric prior [Nea00] or a Reversible Jump sampler [Gre95].
- The likelihood P(n|A_μ, t_μ) is permutation invariant.
  - Indeed, if N_μ = 2, p(n|A_1, A_2, t_1, t_2) = p(n|A_2, A_1, t_2, t_1).
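The running mean and covariance recursions in AM can be checked in the scalar case; this sketch (mine, not the slides') verifies that the mean recursion reproduces the batch mean exactly, while the variance recursion converges to Var(X):

```python
import random

def running_mean_var(xs):
    # Scalar version of the AM recursions:
    #   mu_t    = mu_{t-1} + (x_t - mu_{t-1}) / t
    #   Sigma_t = Sigma_{t-1} + ((x_t - mu_{t-1})^2 - Sigma_{t-1}) / t
    mu, var = xs[0], 0.0
    for t, x in enumerate(xs[1:], start=2):
        d = x - mu
        mu = mu + d / t
        var = var + (d * d - var) / t
    return mu, var

random.seed(6)
xs = [random.gauss(3.0, 2.0) for _ in range(50_000)]  # mean 3, variance 4
mu, var = running_mean_var(xs)
batch_mu = sum(xs) / len(xs)
```

One pass, O(1) memory: this is why the adaptation adds essentially no cost to the sampler.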
  - Marginals are useless, a problem known as label-switching [Ste00].

Label-switching
- If the prior is also permutation invariant, then so is π(A_μ, t_μ).
[Figure: a 30 000-iteration chain whose component labels keep swapping, and the corresponding photoelectron trace.]

AMOR(π(x), X_0, T, μ_0, Σ_0, c)
1  S ← ∅
2  for t ← 1 to T
3      Σ ← cΣ_{t−1}                                  ⊳ scaled adaptive covariance
4      X̃ ~ N(·|X_{t−1}, Σ)                          ⊳ proposal
5      P̃ ← argmin_{P∈𝒫} L_{(μ_{t−1},Σ_{t−1})}(PX̃)  ⊳ pick an optimal permutation
6      X̃ ← P̃X̃                                     ⊳ permute
7      if [π(X̃) Σ_{P∈𝒫} N(PX_{t−1} | X̃, Σ)] / [π(X_{t−1}) Σ_{P∈𝒫} N(PX̃ | X_{t−1}, Σ)] > U[0,1]
8          then X_t ← X̃                             ⊳ accept
9          else X_t ← X_{t−1}                        ⊳ reject
10     S ← S ∪ {X_t}                                 ⊳ update posterior sample
11     μ_t ← μ_{t−1} + (1/t)(X_t − μ_{t−1})          ⊳ update running mean and covariance
12     Σ_t ← Σ_{t−1} + (1/t)((X_t − μ_{t−1})(X_t − μ_{t−1})ᵀ − Σ_{t−1})
13     c ← c + t^{−0.6}(acc. rate − 0.25/0.50)
14 return S

AMOR results on the previous example
[Figure: photoelectron trace with the relabelled posterior components correctly separated.]

The integration methods ladder (adapted from Iain Murray's)
A growing item number means higher complexity and either higher efficiency or wider applicability. Check [RC04] when no further indication is given.
1 Quadrature,
2 Quasi-MC,
3 Simple MC, importance sampling,
4 MCMC (MH, slice sampling, etc.),
5 Adaptive MCMC [AFMP11], Hybrid MC [Nea10], tempering methods [GRS96], SMC [DdFG01], particle MCMC [ADH10],
6 ABC [FP12].

Conclusions
Take-home message
- MC provides generic integration methods.
- Potential applications in Physics are numerous:
  - in forward sampling (aka "simulation"),
  - in Bayesian inference tasks.
- Producing a well-mixing MCMC chain can be difficult.
- Higher efficiency can result from:
  - learning dependencies,
  - exploiting existing/added structure of the problem.
- The broad range of methods allows you to find the level of sophistication required by the difficulty of your problem.
"Introductory" references
- Tutorials: lots on videolectures.net.
  - I. Murray's video lessons: videolectures.net/mlss09uk_murray_mcmc/
  - A. Sokal's tutorial, MCMC from a physicist's point of view + applications to Statistical Physics [Sok96].
- The Monte Carlo bible: C. Robert & G. Casella's "Monte Carlo Methods" [RC04], and references within.
- For more precise information, please bug me.

Glossary
In order of appearance:
MAP    Maximum a posteriori (estimate)
MEP    Mean posterior (estimate)
MC     Monte Carlo
RS     Rejection sampling
IS     Importance sampling
MCMC   Markov chain Monte Carlo
MH     Metropolis-Hastings (algorithm)
AMOR   Adaptive Metropolis with online relabeling
SMC    Sequential Monte Carlo
ABC    Approximate Bayesian computation

References I
[ADH10] Christophe Andrieu, Arnaud Doucet, and Roman Holenstein, Particle Markov chain Monte Carlo methods, Journal of the Royal Statistical Society B (2010).
[AFMP11] Y. Atchadé, G. Fort, E. Moulines, and P. Priouret, Adaptive Markov chain Monte Carlo: theory and methods, in Bayesian Time Series Models, pp. 33-53, Cambridge Univ. Press, 2011.
R. Bardenet, O. Cappé, G. Fort, and B. Kégl, An adaptive Metropolis algorithm with online relabeling, International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR Workshop and Conference Proceedings, vol. 22, April 2012, pp. 91-99.
[Bis06] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

References II
[BK12] R. Bardenet and B. Kégl, An adaptive Monte Carlo Markov chain algorithm for inference from mixture signals, Proceedings of ACAT'11, Journal of Physics: Conference Series, 2012.
[DdFG01] A. Doucet, N. de Freitas, and N.
Gordon, Sequential Monte Carlo Methods in Practice, Springer, 2001.
[FP12] P. Fearnhead and D. Prangle, Constructing summary statistics for approximate Bayesian computation: semi-automatic ABC, Journal of the Royal Statistical Society B (2012).
[Gre95] P. J. Green, Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82 (1995), no. 4, 711-732.
[GRS96] W. R. Gilks, S. Richardson, and D. Spiegelhalter (eds.), Markov Chain Monte Carlo in Practice, Chapman & Hall, 1996.

References III
[Nea00] R. M. Neal, Markov chain sampling methods for Dirichlet process mixture models, Journal of Computational and Graphical Statistics 9 (2000), 249-265.
[Nea10] R. M. Neal, MCMC using Hamiltonian dynamics, in Handbook of Markov Chain Monte Carlo, Chapman & Hall / CRC Press, 2010.
[RC04] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer-Verlag, New York, 2004.
[Sok96] A. D. Sokal, Monte Carlo methods in statistical mechanics: foundations and new algorithms, lecture notes at the Cargèse summer school, 1996.
[Ste00] M. Stephens, Dealing with label switching in mixture models, Journal of the Royal Statistical Society, Series B 62 (2000), 795-809.

References IV
[WKB+09] D. Wraith, M. Kilbinger, K. Benabed, O. Cappé, J.-F. Cardoso, G. Fort, S. Prunet, and C. P. Robert, Estimation of cosmological parameters using adaptive importance sampling, Phys. Rev. D 80 (2009), no. 2.

Thanks for your attention
