vignettes/blog/2025-02-10-correction-dif-plots.Rmd
A post or two ago I proposed a clustered heatmap as an optimal
graphical representation of the item pair DIF statistics in
dexter. While the plots are helpful and easy to read,
they are still not exactly what I wanted. The code within the
pheatmap package first computes a distance matrix of
the inputs and then performs cluster analysis, while I argue that the
differences (called Delta_R in the dexter object and
distances here) are distances in themselves. Working with the
distances between the distances is a bit far-fetched, so, to get what I
originally wanted, we simply replace dist(dist) with as.dist(dist).
With that change and some adjustment to the plot when
what="distances", the code now becomes:
DIFplot = function(d,
                   what=c('distances','statistics','pvalues','significance'),
                   pam=c("none", "holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
                         "fdr"), cluster = TRUE, alpha=0.05, ...) {
  what = match.arg(what)
  pam = match.arg(pam)
  # two-sided test: one-sided p-values are compared against alpha/2
  alpha = alpha/2
  dist = abs(d$Delta_R)
  # cluster on the differences themselves, not on distances between them
  o = hclust(as.dist(dist))$order
  lbl = d$items$item_id
  stat = abs(d$DIF_pair)
  # item labels in cluster order; returned regardless of the cluster argument
  outl = lbl[o]
  if (cluster) {stat=stat[o,o]; lbl=lbl[o]}
  # one-sided p-values for the absolute standardized differences
  pval = 1 - pnorm(stat)
  # adjust only the lower triangle; the upper triangle keeps the raw p-values
  u0 = pval[lower.tri(pval)]
  u1 = p.adjust(u0, method=pam)
  pval[lower.tri(pval)] = u1
  if(what=='distances') {
    if (cluster) { dist = dist[o,o] }
    rownames(dist) = colnames(dist) = lbl
    diag(dist) = NA
    pheatmap::pheatmap(dist, main='PDIF: raw differences', cluster_rows=FALSE, cluster_cols=FALSE)
  }
  if(what=='statistics') {
    rownames(stat) = colnames(stat) = lbl
    diag(stat) = NA
    pheatmap::pheatmap(stat, main='PDIF: standardized differences', cluster_rows=FALSE, cluster_cols=FALSE)
  }
  if(what=='pvalues') {
    rownames(pval) = colnames(pval) = lbl
    diag(pval) = NA
    ttl = 'PDIF: p-values'
    if (pam != 'none') ttl=paste0(ttl, ' (below diagonal adjusted by ',pam,')')
    pheatmap::pheatmap(pval,
                       main = ttl,
                       cluster_rows=FALSE, cluster_cols=FALSE,
                       color = colorRampPalette(RColorBrewer::brewer.pal(n=7, name="RdYlBu"))(100))
  }
  if(what=='significance') {
    ttl = paste0('PDIF: significance at alpha=',2*alpha)
    if (pam != 'none') ttl=paste0(ttl, ' (b/d adjusted by ',pam,')')
    v = (0 + (pval<alpha))
    rownames(v) = colnames(v) = lbl
    diag(v) = NA
    pheatmap::pheatmap(v, main=ttl, cluster_rows=FALSE, cluster_cols=FALSE, legend=FALSE)
  }
  return(outl)
}
Looking at the new clustered plot of the differences (aka distances), we see that it is rearranged but does not show the dendrograms, in line with the other plots. However, it has barely changed otherwise for this particular example. Still, I wanted to have it my way because the change might matter in other examples.
pl = DIFplot(d, what='distances', cluster=TRUE)
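For readers who want to see the difference between dist(dist) and as.dist(dist) in isolation, here is a minimal sketch on a toy matrix. The numbers and item names are made up for illustration and are not dexter output; only base R functions (dist, as.dist, hclust) are used.

```r
# Toy symmetric matrix of pairwise differences (hypothetical values, not from dexter)
m <- matrix(c(0, 1, 4, 5,
              1, 0, 3, 4,
              4, 3, 0, 1,
              5, 4, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("item", 1:4), paste0("item", 1:4)))

# as.dist(): the entries of m are taken as the distances themselves
d_direct <- as.dist(m)

# dist(): Euclidean distances between the rows of m,
# i.e. distances between the distances
d_of_rows <- dist(m)

# The two objects hold different numbers for the same item pairs
print(round(as.matrix(d_direct), 2))
print(round(as.matrix(d_of_rows), 2))

# Item orderings from hierarchical clustering under each choice
o_direct  <- hclust(d_direct)$order
o_of_rows <- hclust(d_of_rows)$order
```

In a small, well-separated example like this the two orderings may well coincide, which matches what I saw with the real data; the point is only that the clustering inputs are not the same, so nothing guarantees agreement in general.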