PSY 626: Bayesian Statistics for Psychological Science

Model convergence
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016, Purdue University
6/12/2018 PSY200 Cognitive Psychology

Search of memory
How is memory searched?
Explore by varying the number of items in the memory set and measuring reaction time (Sternberg, 1969)

Search of memory
Typical results: parallel curves for “present” and “absent” targets
Implications for how people search short-term memory
Average of 113 participants

Within subjects ANOVA
Sternberg search experiment, 113 participants
We ran various types of models with the rethinking library
Consider the set that applied shrinkage across participants

Shrinkage
SSmodelInteraction <- map2stan(
  alist(
    RT ~ dnorm(mu, sigma),
    mu <- a[Participant] + b[Participant]*MemorySetSize +
          c[Participant]*Condition +
          d[Participant]*MemorySetSize*Condition,
    a[Participant] ~ dnorm(grand_mua, grand_sa),
    b[Participant] ~ dnorm(grand_mub, grand_sb),
    c[Participant] ~ dnorm(grand_muc, grand_sc),
    d[Participant] ~ dnorm(grand_mud, grand_sd),
    grand_mua ~ dnorm(1000, 1000),
    grand_sa ~ dunif(0, 2000),
    grand_mub ~ dnorm(0, 100),
    grand_sb ~ dunif(0, 200),
    grand_muc ~ dnorm(0, 100),
    grand_sc ~ dunif(0, 200),
    grand_mud ~ dnorm(0, 100),
    grand_sd ~ dunif(0, 200),
    sigma ~ dunif(0, 1000)
  ), data = SSdataInteraction )
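To see what "shrinkage" means for participant-level parameters, here is a minimal base R sketch for intercepts only. It assumes the grand mean, the between-participant standard deviation, and the trial-level standard deviation are all known, which the Stan model does not assume (it estimates them). All numbers are made up for illustration and are not the course data.

```r
# Toy illustration of shrinkage (partial pooling) for participant intercepts.
# All values are hypothetical; the real model estimates them from the data.
set.seed(1)
n_participants <- 5
n_trials <- c(2, 5, 10, 50, 200)  # trials per participant (hypothetical)
grand_mu <- 1000                  # population mean intercept in ms (assumed known)
grand_s  <- 100                   # population sd of intercepts (assumed known)
sigma    <- 300                   # trial-level sd (assumed known)

# Simulate true intercepts and each participant's observed mean RT
true_a <- rnorm(n_participants, grand_mu, grand_s)
raw_mean <- sapply(seq_len(n_participants), function(i)
  mean(rnorm(n_trials[i], true_a[i], sigma)))

# Posterior mean: precision-weighted average of own mean and grand mean
w <- (n_trials / sigma^2) / (n_trials / sigma^2 + 1 / grand_s^2)
shrunk <- w * raw_mean + (1 - w) * grand_mu

print(data.frame(n_trials,
                 weight_on_own_data = round(w, 3),
                 raw_mean = round(raw_mean),
                 shrunk = round(shrunk)))
```

Participants with few trials get a small weight on their own data, so their estimates are pulled strongly toward the grand mean; participants with many trials are barely moved.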

Model comparison
compare(SSmodelInteractionCondition, SSmodelInteractionMMS, SSmodelInteractionOnly, SSmodelInteraction, SSmodelAdditive, SSmodelCondition, SSmodelMSS, SSmodelNull)
Output columns: WAIC pWAIC dWAIC weight SE dSE
Rows, in the order listed in the output:
  SSmodelInteraction (dSE: NA)
  SSmodelAdditive
  SSmodelInteractionMMS
  SSmodelMSS
  SSmodelInteractionCondition
  SSmodelInteractionOnly
  SSmodelCondition
  SSmodelNull
Strongly favors full model (main effects and interaction)
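As a reminder of how the dWAIC and weight columns relate to the WAIC column, here is a small base R sketch. The WAIC values and model names are hypothetical placeholders, not the values from the lecture output.

```r
# Hypothetical WAIC values for three models (NOT the lecture's results)
waic <- c(modelA = 10250.4, modelB = 10262.1, modelC = 10301.7)

# dWAIC: difference from the best (smallest) WAIC
dWAIC <- waic - min(waic)

# weight: Akaike-style weights, exp(-dWAIC / 2) normalized to sum to 1
weight <- exp(-0.5 * dWAIC) / sum(exp(-0.5 * dWAIC))

print(round(data.frame(WAIC = waic, dWAIC = dWAIC, weight = weight), 3))
```

A difference of ten or more WAIC units already puts nearly all of the weight on the best model, which is what "strongly favors" looks like in the weight column.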

Issues
The models were difficult for Stan to work with
The model comparison by WAIC gives rather different conclusions than a standard ANOVA (no effect of condition)
The model comparison by WAIC gives rather different conclusions than the linear models of the BayesFactor library (essentially all models with the interaction term were indistinguishable)

Convergence
> precis(SSmodelInteraction)
452 vector or matrix parameters omitted in display. Use depth=2 to show them.
Columns: Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
Rows: grand_mua grand_sa grand_mub grand_sb grand_muc grand_sc grand_mud grand_sd sigma
Warning message:
In precis(SSmodelInteraction) :
  There were 2 divergent iterations during sampling.
  Check the chains (trace plots, n_eff, Rhat) carefully to ensure they are valid.

Trace plot
plot(SSmodelInteraction)
Very broad priors
n_eff: effective number of samples used to estimate the posterior
Around 200 needed to properly estimate the mean of the posterior
Thousands needed to properly estimate the tails of the posterior
Sometimes much less than intended (1000) because of autocorrelation in samples
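To make the link between autocorrelation and n_eff concrete, here is a rough base R sketch. The estimator below (N divided by one plus twice the summed autocorrelations, truncated at the first non-positive lag) is a simplified stand-in and is not the estimator Stan reports; the function name crude_n_eff is made up for this example.

```r
set.seed(42)
n <- 1000  # number of draws, matching the intended 1000 samples

# Two simulated "chains": independent draws and autocorrelated AR(1) draws
indep <- rnorm(n)
ar1   <- as.numeric(arima.sim(model = list(ar = 0.9), n = n))

# Crude effective sample size: N / (1 + 2 * sum of autocorrelations),
# summing lags until the autocorrelation first drops to zero or below
crude_n_eff <- function(x, max_lag = 200) {
  rho <- acf(x, lag.max = max_lag, plot = FALSE)$acf[-1]  # drop lag 0
  cut <- which(rho <= 0)[1]
  if (!is.na(cut)) rho <- rho[seq_len(cut - 1)]
  length(x) / (1 + 2 * sum(rho))
}

cat("independent draws:", round(crude_n_eff(indep)), "\n")
cat("AR(1), phi = 0.9: ", round(crude_n_eff(ar1)), "\n")
```

Both sequences contain 1000 draws, but the autocorrelated one carries far less independent information about the posterior.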

Helping convergence
1) Run more iterations
   In the call to Stan, add: iter=15000, warmup=10000
   This will help with n_eff
   Might not help with Rhat (I found it did not help very much with this analysis)
2) Use less broad priors
   Big regions of flat probability density in parameter space are difficult to move through
   A slope to the probability density lets the system move
3) Reframe the model in a way that makes Stan more efficient
   Even though it is mathematically equivalent
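Rhat compares the variation between chains to the variation within chains, so it needs more than one chain. The base R sketch below implements the classic Gelman-Rubin form on simulated draws; current versions of Stan use a refined variant, so treat this as an illustration of the idea rather than a reproduction of the precis output. The function name rhat and all numbers are made up for this example.

```r
set.seed(7)
n <- 1000  # draws per chain (after warmup)

# Classic Gelman-Rubin Rhat; chains is a matrix with one column per chain
rhat <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))   # average within-chain variance
  B <- n * var(colMeans(chains))     # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

# Four chains sampling the same distribution: Rhat close to 1
good <- matrix(rnorm(4 * n, mean = 1000, sd = 50), ncol = 4)

# Four chains stuck around different values: Rhat well above 1
bad <- sapply(c(900, 1000, 1100, 1200), function(m) rnorm(n, m, 50))

cat("agreeing chains:   ", round(rhat(good), 3), "\n")
cat("disagreeing chains:", round(rhat(bad), 3), "\n")
```

Running each chain longer does not fix the second case if the chains stay in different regions, which fits the observation that more iterations helped n_eff more than Rhat.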

Less broad priors
We’ve been using a uniform prior on standard deviations (same SSmodelInteraction model as on the Shrinkage slide):
  grand_sa ~ dunif(0, 2000),
  grand_sb ~ dunif(0, 200),
  grand_sc ~ dunif(0, 200),
  grand_sd ~ dunif(0, 200),
  sigma ~ dunif(0, 1000)
Benefit: avoids negative values

Cauchy distribution
Another choice is a half-Cauchy distribution (big tails)
SSmodelInteraction <- map2stan(
  alist(
    RT ~ dnorm(mu, sigma),
    mu <- grand_mua + a[Participant] +
          (grand_mub + b[Participant])*MemorySetSize +
          (grand_muc + c[Participant])*Condition +
          (grand_mud + d[Participant])*MemorySetSize*Condition,
    c(a, b, c, d)[Participant] ~ dmvnormNC(grand_s, Rho),
    grand_mua ~ dnorm(1000, 500),
    grand_s ~ dcauchy(0, 500),
    grand_mub ~ dnorm(0, 100),
    grand_muc ~ dnorm(0, 100),
    grand_mud ~ dnorm(0, 100),
    Rho ~ dlkjcorr(2),
    sigma ~ dcauchy(0, 300)
  ), data = SSdataInteraction )
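A quick base R comparison of the two priors on sigma that appear in these models, dunif(0, 1000) and a half-Cauchy with scale 300. The helper functions dhalfcauchy and phalfcauchy are defined here for illustration; they are not part of base R or of the rethinking library.

```r
# Half-Cauchy on x >= 0: the Cauchy density folded at zero
dhalfcauchy <- function(x, scale) ifelse(x >= 0, 2 * dcauchy(x, 0, scale), 0)
phalfcauchy <- function(q, scale) ifelse(q >= 0, 2 * pcauchy(q, 0, scale) - 1, 0)

x <- c(0, 100, 500, 1000, 2000, 5000)
print(data.frame(
  x = x,
  uniform_0_1000 = dunif(x, 0, 1000),    # flat, then exactly zero past 1000
  halfcauchy_300 = dhalfcauchy(x, 300)   # sloped everywhere, never exactly zero
))

# Prior probability that sigma exceeds 1000 under each prior
cat("P(sigma > 1000), uniform(0, 1000):   ", 1 - punif(1000, 0, 1000), "\n")
cat("P(sigma > 1000), half-Cauchy(0, 300):",
    round(1 - phalfcauchy(1000, 300), 3), "\n")
```

Both priors rule out negative standard deviations, but the half-Cauchy has a slope everywhere and keeps some mass on large values instead of cutting off at a hard upper bound.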

Less broad priors
We also had very broad priors for the intercept (same SSmodelInteraction model as on the Shrinkage slide):
  grand_mua ~ dnorm(1000, 1000),
  grand_sa ~ dunif(0, 2000)

Tighter prior
We can cut the standard deviation of the prior in half (same SSmodelInteraction model as on the Cauchy distribution slide):
  grand_mua ~ dnorm(1000, 500)
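One way to judge how broad a prior is: ask how much probability it puts on values that cannot occur. The base R sketch below compares the two intercept priors, dnorm(1000, 1000) and dnorm(1000, 500), by the prior probability of a negative grand mean reaction time and by their central 95% intervals. The variable names are made up for this example.

```r
# Prior mass on impossible (negative) mean reaction times, in ms
p_neg_broad <- pnorm(0, mean = 1000, sd = 1000)  # original prior
p_neg_tight <- pnorm(0, mean = 1000, sd = 500)   # tighter prior

# Central 95% prior interval for the grand intercept under each prior
int_broad <- qnorm(c(0.025, 0.975), 1000, 1000)
int_tight <- qnorm(c(0.025, 0.975), 1000, 500)

cat("P(grand_mua < 0), sd = 1000:", round(p_neg_broad, 3), "\n")
cat("P(grand_mua < 0), sd =  500:", round(p_neg_tight, 3), "\n")
cat("95% interval, sd = 1000:", round(int_broad), "\n")
cat("95% interval, sd =  500:", round(int_tight), "\n")
```

Halving the standard deviation drops the prior mass on negative mean reaction times from about 16% to about 2%.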

Reframe the model
We treated each participant’s intercept (slope) as being drawn from a normal distribution with a mean grand_mua
We estimated grand_mua from the data
Relevant lines (same SSmodelInteraction model as on the Shrinkage slide):
  mu <- a[Participant] + b[Participant]*MemorySetSize +
        c[Participant]*Condition +
        d[Participant]*MemorySetSize*Condition,
  a[Participant] ~ dnorm(grand_mua, grand_sa),
  b[Participant] ~ dnorm(grand_mub, grand_sb),
  c[Participant] ~ dnorm(grand_muc, grand_sc),
  d[Participant] ~ dnorm(grand_mud, grand_sd)

Reframe the model
Alternatively, treat each participant’s intercept (slope) as a deviation from an estimated grand mean, and directly estimate grand_mua
Relevant lines (same SSmodelInteraction model as on the Cauchy distribution slide):
  mu <- grand_mua + a[Participant] +
        (grand_mub + b[Participant])*MemorySetSize +
        (grand_muc + c[Participant])*Condition +
        (grand_mud + d[Participant])*MemorySetSize*Condition,
  c(a, b, c, d)[Participant] ~ dmvnormNC(grand_s, Rho)

Reframe the model
The rethinking library (map2stan) interprets the dmvnormNC( ) call as an indication to standardize each intercept (slope) and do the estimation on the standardized values, then convert back to unstandardized values
Standardizing puts all the variables on the same “scale” and makes it easier to move through the parameter space
We still get shrinkage of the deviation parameters (toward zero), which becomes shrinkage to the mean value when unstandardized
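The "standardize, then convert back" step can be sketched in base R for two parameters (an intercept and a slope). This shows only the algebra of a non-centered parameterization; it is not the Stan code that map2stan generates, and the scales, correlation, and grand means are hypothetical.

```r
set.seed(123)
n <- 20000                           # number of simulated "participants"
grand_s <- c(100, 20)                # sds of the deviations (hypothetical)
Rho <- matrix(c(1,   0.5,
                0.5, 1), nrow = 2)   # correlation matrix (hypothetical)

# Step 1: sample on the standardized scale, independent N(0, 1) values
z <- matrix(rnorm(2 * n), nrow = 2)

# Step 2: convert back: deviations = diag(grand_s) %*% L %*% z,
# where L is the lower-triangular Cholesky factor of Rho
L <- t(chol(Rho))
dev <- t(diag(grand_s) %*% L %*% z)  # n x 2 matrix of deviations from zero

# Step 3: add the grand means to get participant-level parameters
grand_mu <- c(1000, 40)              # hypothetical grand intercept and slope
params <- sweep(dev, 2, grand_mu, "+")

print(round(apply(dev, 2, sd), 1))   # close to grand_s
print(round(cor(dev), 2))            # close to Rho
print(round(colMeans(params), 1))    # close to grand_mu
```

The sampler only ever moves through the z values, which all live on the same unit scale, while the model still sees deviations with the intended standard deviations and correlations.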

Convergence
Reframed model:
> precis(SSmodelInteraction)
940 vector or matrix parameters omitted in display. Use depth=2 to show them.
Columns: Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
Rows: grand_mua grand_mub grand_muc grand_mud sigma
Original model:
> precis(SSmodelInteraction)
452 vector or matrix parameters omitted in display. Use depth=2 to show them.
Columns: Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
Rows: grand_mua grand_sa grand_mub grand_sb grand_muc grand_sc grand_mud grand_sd sigma
Warning message:
In precis(SSmodelInteraction) :
  There were 2 divergent iterations during sampling.
  Check the chains (trace plots, n_eff, Rhat) carefully to ensure they are valid.

Dummy parameters
113*4=452 (113 participants, 4 parameters each)
> precis(SSmodelInteraction, depth=2)
Columns: Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
Rows: z_N_Participant[1,1], z_N_Participant[1,2], ..., z_N_Participant[1,26]

Model comparison
Output columns: WAIC pWAIC dWAIC weight SE dSE
Rows, in the order listed in the output:
  SSmodelInteraction (dSE: NA)
  SSmodelAdditive
  SSmodelInteractionMMS
  SSmodelMSS
  SSmodelInteractionCondition
  SSmodelInteractionOnly
  SSmodelCondition
  SSmodelNull
Basically, better convergence gives no change in model comparison
Full model (main effects and interaction) convincingly wins

Your turn
I uploaded R source code SternbergSearch11.R that implements the results discussed in lecture
Run the code with changes
  Number of iterations
  Priors
To get efficient (or inefficient) model estimates
To vary model comparison results

Your turn
Because R often freezes (on my Mac) when running models one after the other, I added checks so that you can turn on/off parts of the code
  if(1==1) # code will run and save model to a file
Change to
  if(1==0) # and code will not run, instead model will be loaded from a file (which you must have generated from a previous run)
I also clear out variables with each model run
  rm(list=ls(all=TRUE)) # clear all variables
  # load the rethinking library
  library(rethinking)
  # load full data file
  SSdata <- read.csv(file="SternbergSearch.csv", header=TRUE, stringsAsFactors=FALSE)
  SSdata$TargetPresent <- ifelse(SSdata$Condition == "Present", 1, 0)
  # Stan model for additive effects
  SSdataAdditive <- data.frame(RT = SSdata$RT,
                               Condition = SSdata$TargetPresent,
                               MemorySetSize = SSdata$MemorySetSize,
                               Participant = SSdata$Participant)
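The run-or-load switch can be tried out without Stan. The base R sketch below uses lm() on made-up data as a stand-in for a slow map2stan() call, and saveRDS()/readRDS() as one possible way to store the fit; the actual mechanism in SternbergSearch11.R may differ. The file name and variable names are hypothetical.

```r
# Toggle: TRUE fits and saves the model, FALSE loads the saved copy instead
run_model  <- TRUE
model_file <- file.path(tempdir(), "toy_fit.rds")  # hypothetical file name

# Toy data standing in for the Sternberg data set (made-up numbers)
set.seed(1)
toy <- data.frame(MemorySetSize = rep(c(1, 3, 5), each = 10))
toy$RT <- 600 + 40 * toy$MemorySetSize + rnorm(nrow(toy), 0, 50)

if (run_model) {
  fit <- lm(RT ~ MemorySetSize, data = toy)  # stand-in for a slow model fit
  saveRDS(fit, model_file)                   # save for later sessions
} else {
  fit <- readRDS(model_file)                 # requires a previous run
}

print(coef(fit))
```

After one run with run_model set to TRUE, switching it to FALSE skips the fit and reads the saved object, so each model can be revisited without re-running the sampler.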
