If I understand the test correctly, the test statistic compares the chi-square discrepancy of y's replicated from the fitted model (y.rep), across many MCMC draws of the parameters theta, with the chi-square discrepancy of the observed y's (y.obs) evaluated at the same theta draws. The discrepancy is the sum, across the y's, of the squared difference between each y and the expected value of y given theta (XB in an OLS model), divided by the variance of y given theta. Each draw of theta thus yields two chi-square values, one for y.rep and one for y.obs. After several thousand draws of theta, if the chi-squares for y.obs prove systematically larger (or, I suppose, smaller) than the chi-squares for y.rep, then the model probably doesn't fit the data.
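In case my description is unclear, here is a minimal Python sketch of what I believe the procedure is (my actual script is in R; the data here are made up, and I stand in for MCMC with a normal approximation to the posterior of an OLS fit, which is a simplifying assumption, not the real sampler):

```python
import numpy as np

rng = np.random.default_rng(0)

def chisq(y, mu, sigma):
    # T(y, theta): sum of squared standardized residuals given theta
    return np.sum((y - mu) ** 2 / sigma ** 2)

# Toy normal linear model (made-up data, just to show the mechanics)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_obs = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Stand-in for MCMC: draw theta from a normal approximation to the
# posterior around the OLS fit (sigma held at its estimate for simplicity)
beta_hat, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
resid = y_obs - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - 2))
cov = sigma_hat ** 2 * np.linalg.inv(X.T @ X)

S = 4000
T_obs = np.empty(S)
T_rep = np.empty(S)
for s in range(S):
    beta = rng.multivariate_normal(beta_hat, cov)      # one theta draw
    mu = X @ beta
    y_rep = mu + rng.normal(scale=sigma_hat, size=n)   # replicate under theta
    T_obs[s] = chisq(y_obs, mu, sigma_hat)
    T_rep[s] = chisq(y_rep, mu, sigma_hat)

# Posterior predictive p-value: should sit near 0.5 when the model fits
p = np.mean(T_rep >= T_obs)
```

Here the model is correctly specified, so p lands in the middle of (0, 1), which is the benchmark I am comparing my misspecified runs against.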

The problem is that I've tried one Monte Carlo model after another in which I deliberately make the data violate the proposed model, yet chi-sq(y.rep) and chi-sq(y.obs) are not systematically larger or smaller than each other (chi-sq(y.rep) proves larger in about 48% of draws, give or take a percentage point). I've tried an OLS model with normal errors against data whose errors are t-distributed with df = 1. I've tried another OLS model after taking a sixth of the y's and setting them to zero (graphically: a cloud of points well above zero and a row of points at zero).
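To make the first experiment concrete, here is a self-contained Python sketch of one run (again, my real script is in R, and the normal approximation to the posterior is my simplification): normal-error OLS fit to data whose errors are actually t with df = 1, i.e. Cauchy.

```python
import numpy as np

rng = np.random.default_rng(42)

# Data generated with Cauchy (t, df = 1) errors, fit with a normal model
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.standard_t(df=1, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - 2))  # blown up by the heavy tails
cov = sigma_hat ** 2 * np.linalg.inv(X.T @ X)

draws = 2000
wins = 0
for _ in range(draws):
    beta = rng.multivariate_normal(beta_hat, cov)    # one theta draw
    mu = X @ beta
    y_rep = mu + rng.normal(scale=sigma_hat, size=n)
    T_obs = np.sum((y - mu) ** 2) / sigma_hat ** 2
    T_rep = np.sum((y_rep - mu) ** 2) / sigma_hat ** 2
    wins += T_rep >= T_obs

p = wins / draws  # hovers near one half despite gross misspecification
```

This reproduces what I keep seeing: because sigma_hat has already absorbed the Cauchy tails, T_obs sits right where T_rep sits, and the p-value stays near 0.5.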

I've been over the R script I wrote with a fine-tooth comb and see no errors. Stepping through, I can even see that, say, the zeroed y values show elevated errors. But OLS inflates the error variance to accommodate the outlying zero values, with the apparent result that the non-zero points contribute correspondingly less error. Once the chi-square sums everything up, there's no difference between y.rep and y.obs. The kinds of models Gelman, Meng, and Stern use to illustrate the test are unlike anything I've seen in my social science work, so maybe the test only has power in odd non-OLS situations?
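In fact, the compensation I'm describing seems to be exact at the point estimates: since the OLS variance estimate is RSS/(n - k), the chi-square discrepancy evaluated at the fitted parameters is always exactly n - k, no matter how badly the data fit. A quick Python sketch with made-up numbers (my zeroed-sixth scenario):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a cloud of points, with a sixth of the y's forced to zero
n = 120
x = rng.uniform(1, 10, size=n)
y = 5 + 2 * x + rng.normal(size=n)
y[: n // 6] = 0.0

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - 2)  # OLS error variance, inflated by the zeros

# Chi-square discrepancy at the fitted parameters:
# sum(resid^2) / (sum(resid^2) / (n - 2)) = n - 2, regardless of the data
T_obs = np.sum(resid ** 2 / sigma2_hat)
```

So the zeros can never push the observed discrepancy up relative to the replicates when the variance is estimated from the same data; any signal would have to come from the spread of theta draws around that point, which is small.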