PSY 626: Bayesian Statistics for Psychological Science
6/12/2018
Model convergence
Greg Francis
Fall 2016
Purdue University
PSY200 Cognitive Psychology

Search of memory
How is memory searched?
Explore by varying the number of items in the memory set and measuring reaction time
Sternberg (1969)
[Figure: example trial. Memory set 5 3 2 9; probe 8; response NO]

Search of memory
Typical results: parallel curves for “present” and “absent” targets
Implications for how people search short-term memory
Average of 113 participants

Within subjects ANOVA
Sternberg search experiment
113 participants
We ran various types of models with the rethinking library
Consider the set that applied shrinkage across participants

Shrinkage
    SSmodelInteraction <- map2stan(
        alist(
            RT ~ dnorm(mu, sigma),
            mu <- a[Participant] + b[Participant]*MemorySetSize +
                  c[Participant]*Condition +
                  d[Participant]*MemorySetSize*Condition,
            a[Participant] ~ dnorm(grand_mua, grand_sa),
            b[Participant] ~ dnorm(grand_mub, grand_sb),
            c[Participant] ~ dnorm(grand_muc, grand_sc),
            d[Participant] ~ dnorm(grand_mud, grand_sd),
            grand_mua ~ dnorm(1000, 1000),
            grand_sa ~ dunif(0, 2000),
            grand_mub ~ dnorm(0, 100),
            grand_sb ~ dunif(0, 200),
            grand_muc ~ dnorm(0, 100),
            grand_sc ~ dunif(0, 200),
            grand_mud ~ dnorm(0, 100),
            grand_sd ~ dunif(0, 200),
            sigma ~ dunif(0, 1000)
        ), data = SSdataInteraction )

Model comparison
    compare(SSmodelInteractionCondition, SSmodelInteractionMMS,
            SSmodelInteractionOnly, SSmodelInteraction, SSmodelAdditive,
            SSmodelCondition, SSmodelMSS, SSmodelNull)

                                   WAIC pWAIC dWAIC weight     SE   dSE
    SSmodelInteraction          93808.8 233.2   0.0      1 178.63    NA
    SSmodelAdditive             93841.9 224.7  33.0      0 177.89 12.57
    SSmodelInteractionMMS       93864.5 214.7  55.7      0 177.91 15.18
    SSmodelMSS                  93919.9 160.5 111.0      0 176.06 23.33
    SSmodelInteractionCondition 94022.5 192.4 213.6      0 174.29 30.12
    SSmodelInteractionOnly      94341.3 167.0 532.5      0 168.19 47.85
    SSmodelCondition            94493.2 168.2 684.3      0 165.55 55.33
    SSmodelNull                 94554.9 112.7 746.1      0 164.59 58.00

Strongly favors full model (main effects and interaction)

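The weight column in the table can be recomputed from dWAIC alone. A minimal sketch, assuming the usual Akaike-weight formula (each weight is proportional to exp(-dWAIC/2)); the dWAIC and dSE values are copied from the table, and the variable names are only illustrative.

```r
# dWAIC and dSE values copied from the compare() table above
dWAIC <- c(SSmodelInteraction = 0.0, SSmodelAdditive = 33.0,
           SSmodelInteractionMMS = 55.7, SSmodelMSS = 111.0,
           SSmodelInteractionCondition = 213.6, SSmodelInteractionOnly = 532.5,
           SSmodelCondition = 684.3, SSmodelNull = 746.1)
dSE <- c(NA, 12.57, 15.18, 23.33, 30.12, 47.85, 55.33, 58.00)

# Akaike-style weights (assumed formula): relative support for each model
raw <- exp(-0.5 * dWAIC)
weight <- raw / sum(raw)
print(round(weight, 2))

# How large is each WAIC difference relative to its standard error?
ratio <- dWAIC / dSE
print(round(ratio, 2))
```

Because the weights fall off exponentially with dWAIC, even the 33-point gap to the additive model leaves essentially all of the weight on the full model.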
Issues
The models were difficult for Stan to work with
The model comparison by WAIC gives rather different conclusions than a standard ANOVA (no effect of condition)
The model comparison by WAIC gives rather different conclusions than the linear models of the BayesFactor (essentially all models with the interaction term were indistinguishable)

Convergence
    > precis(SSmodelInteraction)
    452 vector or matrix parameters omitted in display. Use depth=2 to show them.
                Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
    grand_mua 755.42  20.76     721.94     786.60  1000 1.00
    grand_sa  210.42  14.74     188.49     235.55  1000 1.00
    grand_mub  34.60   2.87      30.14      38.97   226 1.00
    grand_sb   16.01   2.58      12.27      20.39   137 1.02
    grand_muc -63.21  13.13     -82.86     -41.09   154 1.00
    grand_sc   65.26   9.65      51.22      80.58   307 1.00
    grand_mud  21.03   3.60      14.61      25.96   126 1.01
    grand_sd    9.86   3.44       3.53      14.54    45 1.00
    sigma     240.11   2.13     236.87     243.82  1000 1.00
    Warning message:
    In precis(SSmodelInteraction) :
      There were 2 divergent iterations during sampling.
    Check the chains (trace plots, n_eff, Rhat) carefully to ensure they are valid.

Trace plot
    plot(SSmodelInteraction)
Very broad priors
n_eff: effective number of samples used to estimate the posterior
Around 200 needed to properly estimate the mean of the posterior
Thousands needed to properly estimate the tails of the posterior
n_eff is sometimes much less than the intended number of samples (1000) because of autocorrelation in the samples

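To see how autocorrelation reduces n_eff, here is a rough base-R sketch. It compares an independent chain with a strongly autocorrelated AR(1) chain of the same length. The estimator below (N divided by one plus twice the summed positive-lag autocorrelations, truncated at the first non-positive lag) is a simplified stand-in and is not the exact computation Stan uses; the chains are simulated, not taken from the Sternberg model.

```r
set.seed(1)
n <- 1000

# Two simulated chains of equal length: independent draws vs. AR(1) with phi = 0.9
chain_iid <- rnorm(n)
chain_ar  <- as.numeric(arima.sim(model = list(ar = 0.9), n = n))

# Simplified effective sample size (an assumption, not Stan's estimator)
ess <- function(x) {
  rho <- acf(x, lag.max = length(x) - 1, plot = FALSE)$acf[-1]  # drop lag 0
  cutoff <- which(rho <= 0)[1]            # first non-positive autocorrelation
  if (!is.na(cutoff)) rho <- rho[seq_len(cutoff - 1)]
  length(x) / (1 + 2 * sum(rho))
}

ess_iid <- ess(chain_iid)
ess_ar  <- ess(chain_ar)
print(c(independent = ess_iid, autocorrelated = ess_ar))
```

Both chains hold 1000 samples, but the autocorrelated one carries far less independent information about the posterior.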
Helping convergence
1) Run more iterations
In call to Stan: , iter=15000, warmup=10000)
This will help with n_eff
Might not help with Rhat (I found it did not help very much with this analysis)
2) Use less broad priors
Big regions of flat probability density in parameter space are difficult to move through
A slope to the probability density lets the system move
3) Reframe the model in a way that makes Stan more efficient
Even though it is mathematically equivalent

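Rhat compares the variation between chains with the variation within chains; values near 1 suggest the chains agree. A minimal sketch of the classic Gelman-Rubin version on simulated chains follows. The function name rhat_classic is hypothetical, and Stan reports a refined variant, so its numbers need not match this calculation.

```r
# Classic Gelman-Rubin statistic; chains is an iterations x chains matrix
rhat_classic <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))    # average within-chain variance
  B <- n * var(colMeans(chains))      # between-chain variance
  var_plus <- (n - 1) / n * W + B / n # pooled estimate of posterior variance
  sqrt(var_plus / W)
}

set.seed(2)
good <- matrix(rnorm(4000), ncol = 4) # four chains sampling the same distribution
bad <- good
bad[, 1] <- bad[, 1] + 3              # one chain stuck in a different region

rhat_good <- rhat_classic(good)
rhat_bad  <- rhat_classic(bad)
print(c(good = rhat_good, bad = rhat_bad))
```

Running longer does not fix the second case by itself: if one chain stays in a different region, the between-chain variance remains large however many samples are drawn.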
Less broad priors
We’ve been using a uniform prior on standard deviations
Benefit: avoids negative values
    SSmodelInteraction <- map2stan(
        alist(
            RT ~ dnorm(mu, sigma),
            mu <- a[Participant] + b[Participant]*MemorySetSize +
                  c[Participant]*Condition +
                  d[Participant]*MemorySetSize*Condition,
            a[Participant] ~ dnorm(grand_mua, grand_sa),
            b[Participant] ~ dnorm(grand_mub, grand_sb),
            c[Participant] ~ dnorm(grand_muc, grand_sc),
            d[Participant] ~ dnorm(grand_mud, grand_sd),
            grand_mua ~ dnorm(1000, 1000),
            grand_sa ~ dunif(0, 2000),
            grand_mub ~ dnorm(0, 100),
            grand_sb ~ dunif(0, 200),
            grand_muc ~ dnorm(0, 100),
            grand_sc ~ dunif(0, 200),
            grand_mud ~ dnorm(0, 100),
            grand_sd ~ dunif(0, 200),
            sigma ~ dunif(0, 1000)
        ), data = SSdataInteraction )

Cauchy distribution
Another choice is a half-Cauchy distribution (big tails)
    SSmodelInteraction <- map2stan(
        alist(
            RT ~ dnorm(mu, sigma),
            mu <- grand_mua + a[Participant] +
                  (grand_mub + b[Participant])*MemorySetSize +
                  (grand_muc + c[Participant])*Condition +
                  (grand_mud + d[Participant])*MemorySetSize*Condition,
            c(a, b, c, d)[Participant] ~ dmvnormNC(grand_s, Rho),
            grand_mua ~ dnorm(1000, 500),
            grand_s ~ dcauchy(0, 500),
            grand_mub ~ dnorm(0, 100),
            grand_muc ~ dnorm(0, 100),
            grand_mud ~ dnorm(0, 100),
            Rho ~ dlkjcorr(2),
            sigma ~ dcauchy(0, 300)
        ), data = SSdataInteraction )

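The difference between a flat prior and a sloped one can be checked numerically. A small sketch, using the prior settings from the slides (dunif(0, 200) and a half-Cauchy with scale 500) and evaluating the slope of the log density at a few arbitrary test values; the helper names are hypothetical.

```r
# Arbitrary test values for a standard deviation parameter
x <- c(10, 50, 100, 150)

log_unif    <- function(s) dunif(s, 0, 200, log = TRUE)
# Half-Cauchy on s >= 0: twice the Cauchy density, so add log(2) on the log scale
log_hcauchy <- function(s) log(2) + dcauchy(s, 0, 500, log = TRUE)

# Central-difference approximation to the slope of a log density
slope <- function(f, s, h = 1e-3) (f(s + h) - f(s - h)) / (2 * h)

slope_unif <- slope(log_unif, x)
slope_hc   <- slope(log_hcauchy, x)
print(rbind(x = x, uniform = slope_unif, half_cauchy = slope_hc))
```

The uniform prior contributes no slope anywhere inside its bounds, while the half-Cauchy always contributes a (gentle) pull toward smaller values, yet its heavy tails still allow large values when the data call for them.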
Less broad priors
We also had very broad priors for the intercept
    SSmodelInteraction <- map2stan(
        alist(
            RT ~ dnorm(mu, sigma),
            mu <- a[Participant] + b[Participant]*MemorySetSize +
                  c[Participant]*Condition +
                  d[Participant]*MemorySetSize*Condition,
            a[Participant] ~ dnorm(grand_mua, grand_sa),
            b[Participant] ~ dnorm(grand_mub, grand_sb),
            c[Participant] ~ dnorm(grand_muc, grand_sc),
            d[Participant] ~ dnorm(grand_mud, grand_sd),
            grand_mua ~ dnorm(1000, 1000),
            grand_sa ~ dunif(0, 2000),
            grand_mub ~ dnorm(0, 100),
            grand_sb ~ dunif(0, 200),
            grand_muc ~ dnorm(0, 100),
            grand_sc ~ dunif(0, 200),
            grand_mud ~ dnorm(0, 100),
            grand_sd ~ dunif(0, 200),
            sigma ~ dunif(0, 1000)
        ), data = SSdataInteraction )

Tighter prior
We can cut the standard deviation of the prior in half
    SSmodelInteraction <- map2stan(
        alist(
            RT ~ dnorm(mu, sigma),
            mu <- grand_mua + a[Participant] +
                  (grand_mub + b[Participant])*MemorySetSize +
                  (grand_muc + c[Participant])*Condition +
                  (grand_mud + d[Participant])*MemorySetSize*Condition,
            c(a, b, c, d)[Participant] ~ dmvnormNC(grand_s, Rho),
            grand_mua ~ dnorm(1000, 500),
            grand_s ~ dcauchy(0, 500),
            grand_mub ~ dnorm(0, 100),
            grand_muc ~ dnorm(0, 100),
            grand_mud ~ dnorm(0, 100),
            Rho ~ dlkjcorr(2),
            sigma ~ dcauchy(0, 300)
        ), data = SSdataInteraction )

Reframe the model
We treated each participant’s intercept (slope) as being drawn from a normal distribution with a mean grand_mua
We estimated grand_mua from the data
    SSmodelInteraction <- map2stan(
        alist(
            RT ~ dnorm(mu, sigma),
            mu <- a[Participant] + b[Participant]*MemorySetSize +
                  c[Participant]*Condition +
                  d[Participant]*MemorySetSize*Condition,
            a[Participant] ~ dnorm(grand_mua, grand_sa),
            b[Participant] ~ dnorm(grand_mub, grand_sb),
            c[Participant] ~ dnorm(grand_muc, grand_sc),
            d[Participant] ~ dnorm(grand_mud, grand_sd),
            grand_mua ~ dnorm(1000, 1000),
            grand_sa ~ dunif(0, 2000),
            grand_mub ~ dnorm(0, 100),
            grand_sb ~ dunif(0, 200),
            grand_muc ~ dnorm(0, 100),
            grand_sc ~ dunif(0, 200),
            grand_mud ~ dnorm(0, 100),
            grand_sd ~ dunif(0, 200),
            sigma ~ dunif(0, 1000)
        ), data = SSdataInteraction )

Reframe the model
Alternatively, treat each participant’s intercept (slope) as a deviation from an estimated grand mean
Directly estimate grand_mua
    SSmodelInteraction <- map2stan(
        alist(
            RT ~ dnorm(mu, sigma),
            mu <- grand_mua + a[Participant] +
                  (grand_mub + b[Participant])*MemorySetSize +
                  (grand_muc + c[Participant])*Condition +
                  (grand_mud + d[Participant])*MemorySetSize*Condition,
            c(a, b, c, d)[Participant] ~ dmvnormNC(grand_s, Rho),
            grand_mua ~ dnorm(1000, 500),
            grand_s ~ dcauchy(0, 500),
            grand_mub ~ dnorm(0, 100),
            grand_muc ~ dnorm(0, 100),
            grand_mud ~ dnorm(0, 100),
            Rho ~ dlkjcorr(2),
            sigma ~ dcauchy(0, 300)
        ), data = SSdataInteraction )

Reframe the model
map2stan interprets the dmvnormNC( ) call as an instruction to standardize each intercept (slope) deviation and have Stan do the estimation on the standardized values
Then convert back to unstandardized values
Standardizing puts all the variables on the same “scale” and makes it easier to move through the parameter space
Still get shrinkage of the deviation parameters (toward zero), which becomes shrinkage to the mean value when unstandardized

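The equivalence behind this reframing can be illustrated with plain simulation. A minimal univariate sketch, assuming fixed illustrative values for the grand mean and standard deviation (the posterior means reported for grand_mua and grand_sa in the earlier precis output); the model on the slides uses a multivariate version with a correlation matrix Rho, which this sketch leaves out.

```r
set.seed(626)
n_draws <- 1e5

# Illustrative fixed values (taken from the earlier precis output, not re-estimated here)
grand_mua <- 755.42
grand_sa  <- 210.42

# Original framing: draw each intercept directly on the RT scale
a_centered <- rnorm(n_draws, grand_mua, grand_sa)

# Reframed: draw a standardized value, then convert back to the RT scale
z <- rnorm(n_draws, 0, 1)
a_noncentered <- grand_mua + grand_sa * z

print(c(mean_centered = mean(a_centered), mean_noncentered = mean(a_noncentered)))
print(c(sd_centered = sd(a_centered), sd_noncentered = sd(a_noncentered)))
```

The two sets of draws describe the same distribution, but the sampler only has to explore z, which has unit scale regardless of how large grand_sa is.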
Convergence
Reframed model:
    > precis(SSmodelInteraction)
    940 vector or matrix parameters omitted in display. Use depth=2 to show them.
                Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
    grand_mua 753.96  21.64     722.43     791.56   231    1
    grand_mub  34.82   3.08      30.05      39.95  1000    1
    grand_muc -62.19  13.50     -80.60     -38.24  1000    1
    grand_mud  20.82   3.61      14.89      26.34  1000    1
    sigma     240.18   2.11     236.72     243.40  1000    1
Original model:
    > precis(SSmodelInteraction)
    452 vector or matrix parameters omitted in display. Use depth=2 to show them.
                Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
    grand_mua 755.42  20.76     721.94     786.60  1000 1.00
    grand_sa  210.42  14.74     188.49     235.55  1000 1.00
    grand_mub  34.60   2.87      30.14      38.97   226 1.00
    grand_sb   16.01   2.58      12.27      20.39   137 1.02
    grand_muc -63.21  13.13     -82.86     -41.09   154 1.00
    grand_sc   65.26   9.65      51.22      80.58   307 1.00
    grand_mud  21.03   3.60      14.61      25.96   126 1.01
    grand_sd    9.86   3.44       3.53      14.54    45 1.00
    sigma     240.11   2.13     236.87     243.82  1000 1.00
    Warning message:
    In precis(SSmodelInteraction) :
      There were 2 divergent iterations during sampling.
    Check the chains (trace plots, n_eff, Rhat) carefully to ensure they are valid.

Dummy parameters
113*4=452
    > precis(SSmodelInteraction, depth=2)
                           Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
    z_N_Participant[1,1]  -0.41   0.25      -0.80       0.00  1000 1.00
    z_N_Participant[1,2]   0.59   0.26       0.13       0.96  1000 1.00
    z_N_Participant[1,3]  -1.22   0.27      -1.58      -0.74  1000 1.00
    z_N_Participant[1,4]  -1.10   0.27      -1.50      -0.64  1000 1.00
    z_N_Participant[1,5]   1.77   0.28       1.30       2.17  1000 1.00
    z_N_Participant[1,6]  -0.37   0.26      -0.77       0.06  1000 1.00
    z_N_Participant[1,7]   0.39   0.25       0.02       0.82  1000 1.00
    z_N_Participant[1,8]  -1.23   0.27      -1.61      -0.79  1000 1.00
    z_N_Participant[1,9]  -0.49   0.27      -0.93      -0.07  1000 1.00
    z_N_Participant[1,10] -1.55   0.27      -1.98      -1.12  1000 1.00
    z_N_Participant[1,11]  0.57   0.26       0.17       0.96  1000 1.00
    z_N_Participant[1,12] -1.04   0.27      -1.48      -0.63  1000 1.00
    z_N_Participant[1,13] -0.73   0.28      -1.17      -0.27  1000 1.00
    z_N_Participant[1,14] -0.50   0.25      -0.94      -0.14  1000 1.00
    z_N_Participant[1,15] -0.79   0.25      -1.16      -0.38  1000 1.00
    z_N_Participant[1,16]  0.74   0.27       0.32       1.19  1000 1.00
    z_N_Participant[1,17] -0.63   0.27      -1.12      -0.26  1000 1.00
    z_N_Participant[1,18] -1.02   0.26      -1.43      -0.59  1000 1.00
    z_N_Participant[1,19] -0.62   0.26      -1.03      -0.21  1000 1.00
    z_N_Participant[1,20] -1.23   0.27      -1.64      -0.74  1000 1.00
    z_N_Participant[1,21]  0.10   0.26      -0.28       0.55  1000 1.00
    z_N_Participant[1,22] -0.40   0.27      -0.78       0.06  1000 1.00
    z_N_Participant[1,23] -0.02   0.25      -0.41       0.41  1000 1.00
    z_N_Participant[1,24]  1.36   0.28       0.92       1.78  1000 1.00
    z_N_Participant[1,25] -0.56   0.27      -1.01      -0.14  1000 1.00
    z_N_Participant[1,26]  0.61   0.27       0.23       1.05  1000 1.00

Model comparison
                                   WAIC pWAIC dWAIC weight     SE   dSE
    SSmodelInteraction          93810.6 236.6   0.0      1 178.33    NA
    SSmodelAdditive             93842.2 229.0  31.6      0 177.84 12.56
    SSmodelInteractionMMS       93861.0 223.1  50.4      0 177.60 15.45
    SSmodelMSS                  93920.8 164.3 110.2      0 176.02 23.52
    SSmodelInteractionCondition 94025.4 211.9 214.8      0 174.18 29.45
    SSmodelInteractionOnly      94342.1 166.2 531.5      0 167.96 47.96
    SSmodelCondition            94491.0 168.1 680.4      0 165.43 55.08
    SSmodelNull                 94554.9 112.0 744.3      0 164.38 57.77

Basically, better convergence gives no change in model comparison
Full model (main effects and interaction) convincingly wins

Your turn
I uploaded R source code SternbergSearch11.R, which implements the results discussed in lecture
Run the code with changes to:
Number of iterations
Priors
To get efficient (or inefficient) model estimates
To vary model comparison results

Your turn
Because R often freezes (on my Mac) when running models one after the other, I added checks so that you can turn parts of the code on/off
    if(1==1) # code will run and save model to a file
Change to
    if(1==0) # and code will not run, instead model will be loaded from a file (which you must have generated from a previous run)
I also clear out variables with each model run
    rm(list=ls(all=TRUE)) # clear all variables

    # load the rethinking library
    library(rethinking)

    # load full data file
    SSdata <- read.csv(file="SternbergSearch.csv", header=TRUE, stringsAsFactors=FALSE)
    # recode Condition as a 0/1 indicator (1 = target present)
    SSdata$TargetPresent <- ifelse(SSdata$Condition == "Present", 1, 0)

    # Stan model for additive effects
    SSdataAdditive <- data.frame(RT=SSdata$RT, Condition=SSdata$TargetPresent,
                                 MemorySetSize=SSdata$MemorySetSize,
                                 Participant=SSdata$Participant)

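The run-or-load switch can be sketched in a self-contained way. This is only an illustration of the pattern, not the code in SternbergSearch11.R: it uses a toy lm() fit in place of a map2stan model, a temporary file, and saveRDS/readRDS as an assumed way of saving and loading. The names run_model, model_file, and toy_data are hypothetical.

```r
run_model <- TRUE  # plays the role of if(1==1); set to FALSE to load instead

# Hypothetical file location; a real script would use a permanent path
model_file <- file.path(tempdir(), "toy_model.rds")

# Toy data standing in for the Sternberg data set
toy_data <- data.frame(x = 1:10,
                       y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2))

if (run_model) {
  # Fit the (toy) model and save it to a file
  fit <- lm(y ~ x, data = toy_data)
  saveRDS(fit, model_file)
} else {
  # Skip fitting and load the model saved by a previous run
  fit <- readRDS(model_file)
}

# Confirm that a saved model can be read back in
reloaded <- readRDS(model_file)
print(coef(reloaded))
```

With a slow Stan model, flipping the switch to the load branch lets later parts of a script (such as model comparison) run without refitting.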