The other day, JP de Ruiter tweeted in:
He has a point.
And, well, we do not want to use the sleight of stats Keith Laws suggests.
Which, as this post that just precedes this one shows, I have been pondering before, and I’m far from the only one pondering this. (Hey, it is my blog. I get to repeat myself. I think I’m sketching….)
Unlike Animal Farm animals, not all studies with null results are created equal. All of us know the standard reason why null results are not published, passed down through the training generations: There are many reasons why a study doesn’t work out, and a lot of them are scientifically entirely uninteresting. The uninteresting ones range from poorly thought-through methods, badly chosen stimuli, errors in timing, and badly run studies to crappy conceptualization; like those unhappy families, though terribly uninteresting to write tomes about. This is what we remind our students of when they, with feeble hope, pipe up that it is really interesting to know what doesn’t work.
Sure. But the universe of “doesn’t work” is endless. Only things that don’t work in interesting ways are informative. Which, well, raises the question, what is an interesting way?
I know of two papers that published null results prior to the replication flurry. On one, my advisor was a co-author along with June Tangney and others, on certain aspects of Higgins’ self-discrepancy theory. The second was work by Jari Hietanen, where he looked at whether the emotional expression of a centered face with eyes pointing to either direction in an attention paradigm (bear with me) mattered. That is, are we more likely to be lured by the eye direction of a frightened face (as evidenced by faster reaction times when the target is in that direction, and slower when the target is in the opposite direction) than by that of other emotional expressions? He didn’t find that in 5 different experiments, using different depictions of faces. Both involved multiple studies and multiple variants of stimuli and paradigms. Tangney et al.’s also included an alternative prediction. Lots of work. Perfectly reasonable. Rarely seen.
But, there are a lot of other types of null results.
Across the street from where I work, there is a museum called “Skissernas Museum” – the museum of sketches. It is filled with earlier drafts, sketches, and preliminary models of artwork that is officially displayed in museums, or as sculptures in squares, and in some cases well known.
A piece of art is not created from blank thoughts to the finished product in one go. Before it come the sketches, the attempts, the miniature models. Even I, in my feeble amateur painting, spent a bit of time sketching.
This is how I think about my spiders and snakes and attention (insert Oh My here) work, which has yet to see the light of day. We got something in each study, but could not interpret it. So, we kept tweaking them. Changing a thing here or a thing there. Alas, I left for Sweden before we had a tweak that gave us clear results.
A lot of the filedrawer may be just this kind of work. Sketches. Drafts. Preliminary work.
Some are more like our tweaking of a Stapel Ebbinghaus study (as far as I know based on genuine data), where instead of social categories we used emotional expression. The non-results of that one probably linger comfortably in that file-drawer, or landfill as is the case now (as I emptied the drawers out myself). We gave it a good try, didn’t work, oh well, it was a bit of a long shot (although I have seen it done lately. Gasp).
Then there are those that may be informative in a different way. I think the five variants of testing whether emotional state influenced perceptual processing of emotion-congruent faces might have deserved a null-publish. We thought it might work, it didn’t, and we had some ideas why (and, also, as a warning: don’t waste your time doing this).
And then the even more troubling kind – when researchers have attempted to replicate fairly directly some interesting effect that has already been published, and did not get it.
Pre-registration takes care of some of that, but that is for fairly late in the game. Here things are well thought out, and one can set up a full-blown hypothesis test that may or may not work out, and people are willing to bet both time and money on setting it up. But not all of the attempts are of that kind.
These last couple of types are the ones that are missing, and that would be informative for research.
But the rest? The sketches? And all those attempts that find no results for reasons that have nothing to do with what is tested, but everything to do with the performance (and one has to remember that we likely all make these kinds of mistakes on the way, where the problems with stimuli, with collection, with design, and with thinking things through are only evident in hindsight)? What to do with them? Not all are strategic cases where you run a lot of studies and publish what “worked”. They just didn’t.
Publish? As if the literature isn’t crowded enough as it is. Even Skissernas Museum limits itself to fairly late prototypes and sketches.
Paul Meehl suggested that it might be a good idea to have some place summarizing the pilot work that didn’t work out, in order for others to not go down that particular wrong turn. (Some turns are just so attractive that we may go down there multiple times, just to find it is a dead end).
For some areas that may be very interesting to formalize. But keeping it all may be like insisting on plastering every scribble of your kid’s daycare work on the wall.
Perhaps one of the issues also is that the criteria for publishing have been too lenient, or that the methods for determining what is real (aka null-hypothesis testing) are just too weak. Yes, I know, lots of people think that, and have said that for a long time! (I just re-read Meehl’s paper on Sir Karl and Sir Ronald, where he chides hypothesis testing for being much too light a challenge for a hypothesis. Put them to risk!)
*Yeah, I realize I covered this in my earlier post too. But it is my blog so I get to repeat myself if I want to. Perhaps I’m sketching.
I just reviewed a paper that wasn’t stupid, and asked an important question. It is just that it was thin, and a null-result. It used 80 participants in 4 cells and it wasn’t repeated measures. They replicated (weakly) one finding, but found no effect for what most likely was what they really were going for.
I’m getting very sensitive to the file-drawer problem. If we have sensible data, should it languish? Yet, there is a problem with cluttering up the journals with short, underpowered studies.
I left it up to the editor (who is my colleague) to reject it.
What I would have wanted to see was, first, better power. Then, follow-up work on the particular question.
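To put a rough number on “thin”, here is a minimal sketch. It assumes the key comparison is between two of the four cells (so 20 participants in each), a two-sided test at alpha = .05, and a hypothetical medium effect of d = 0.5. None of those values come from the paper itself; they are placeholders I picked for illustration, and the calculation uses a normal approximation rather than the exact t distribution, so treat the output as ballpark only.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Standard normal distribution, used for the normal approximation below.
_STD_NORMAL = NormalDist()


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two independent groups.

    d is the assumed standardized effect size (Cohen's d); it is a guess,
    not something taken from the reviewed paper.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected z under the alternative
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate participants needed per group to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


if __name__ == "__main__":
    # 80 participants over 4 between-subjects cells gives 20 per cell.
    print(f"Approximate power with 20 per cell: {approx_power(0.5, 20):.2f}")
    print(f"Approximate n per cell for 80% power: {approx_n_per_group(0.5)}")
```

Under these assumptions the power lands somewhere around a third, and one would need roughly three times as many people per cell to reach the conventional 80%. A smaller true effect would make the picture worse.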
But this makes me think about publishing policy. I really understand the desire to publish things that “work” (except that the indications of what works are so weak in psychology). It is like you want to unveil the final sculpture, the polished version of the violin concerto, the bug-free version of the software – not all the sketches and wrong steps and other discards on the way. You want to publish a real Finding – even if (as in all research) it is tentative.
But the sketches, and wrong turns, and pilots, and honing carry some kind of information. At least sometimes it is really important to know what doesn’t work. And, as was evident from the special issue on replication, there is work out there that people informally know does not work, but that is not in the public record because the failure to replicate has not been published.
We had a brief discussion about this at last year’s “solid science” meeting. Joe Simmons said that there really are loads of piloted ideas that turned out to be crap and that really don’t need to be cluttering up cyberspace and our ability to navigate information, whereas Jelte Wicherts thought it is really important to have a data record.
I’m very ambivalent. There is so much data collected – I’m thinking of a lot of final theses that are done – where the research is the equivalent of arts and crafts projects that show that you can do this, but don’t really add to the research record.
Or, all those pilots that you do to tweak your instruments and methods. What to do with those? Meehl, in his theory of science videos, suggested that you collect that info in short communications, just for the record.
I’m thinking of two file-drawers I have. One of them really demonstrates that the phenomenon we were testing doesn’t exist. It is a boundary condition. As such, it might have been important to have it out there (5 studies, 90 people in 3 conditions in each, repeated measures). I have another set of 9 studies looking at threat and attention which are more of the “tweak the paradigm” type. Something happened, but it was terribly messy to interpret, and thus we were working on finding an angle where results could be more clear and interpretable. How do you make that distinction?
I have some idea here that it would be nice if one could spend that time with the sketches. Once it works, one needs to replicate, and one only publishes when one feels fairly certain that there is something there (and possibly include links to the sketches). Which, of course, is not how it is done right now, because of the incentive structure.
The first time I met Daniël Lakens, he and Brian Nosek were working on a special issue in Social Psychology, calling for registered replication of well-known, highly cited studies.
It is now out! 15 articles of attempts to replicate with, let us say, mixed results.
I’m linking in the PDF as they posted it on the OSF framework, so you get both the text, and more exposure to the framework for your future collaboration efforts!
Some people, Science reports, don’t like being replicated, at least when the results are different. I’m thinking, once things are out there in the record, work really is up for being replicated or questioned. I thought that was the point! Maybe, once this is done more regularly, people will adapt and won’t go all drama. Exposure therapy, I believe, has evidence on its side.
Chris Chambers, who has long been at the forefront of the call for registered reports (and implemented it at Cortex), has a more uniformly positive view of the practice here.
I have done a first skim-through, and clearly, clearly we need to put a lot more effort into replicating results, march slower, be careful with what we accept.
I thought I had more cool posts to share, but I got so wrapped up in the Baumol disease I got discombobulated.
But, yes, plenty more good posts to share, so I’m sharing them now.
About a week ago, my bud Daniel Lakens reported on this Find on his blog. A paper even older than me! Yes, people have been thinking about these issues for a long time.
Sylvia McLain asks if Spotting Bad Science really is as easy as a nice poster giving instructions on how to do it. And, of course, if it really were, there wouldn’t be as much bad science. But as a handy-dandy tool it can be a useful beginning. The creator of the poster answers in the comments, and there is a good conversation.
Speaking of Bad Science, JP de Ruiter linked in a Brain Pickings article highlighting Carl Sagan’s baloney detection kit (got that?).
Tom Stafford linked in his draft of this very lovely article on rational argument. He brings up both Cialdini, and argument as a means of persuasion rather than correctness. As it is draft 2, it may evolve further, but I thought it was just great.
Last month, Keith Laws and others debated whether CBT for psychosis had been oversold. It was all filmed, so you can check it here (as I watched it at the same time as I was reading about montage and cutting techniques, I found myself wishing for some of those, plus a good sound engineer, but you can’t have everything). A Storify from Alex Langford appears here. I considered it a good example of how a good anecdote trumps good data as far as persuasion goes – which ties in with the Cialdini in Tom Stafford’s piece, but I’m not a clinician. Worth checking out though.
Last, I was very sad to hear that Seth Roberts died. I’ve followed his blog for a few years by now, and I thought him very interesting, innovative and thoughtful (I even posted on his comments once, regarding all this Stapel fraud stuff, as he has been involved in that).
I’ve been working on trying to teach myself how to do a meta-analysis. With no really clear results yet (someone needs to help me, I think, though I get the gist of it, enough to worry about messing things up). This means massive blog neglect.
But, others have kept up blogging and writing articles. I wanted to share a couple of those.
The first I got via Stephen Hsu – it’s a Chronicle article by Nicholas Lemann, called The soul of the research university. I have been thinking about the University – the conflict (and status differential) between research and teaching, and how that historically came about. I think this article answers some of those questions, although I would appreciate any historian piping up and setting me and others straight.
It clearly highlights the perception problem between on the one hand the Research Focus from inside research institutions, and the Education focus, which seems to be the perception from outside the research institutions.
It also lifts up a couple of what I would consider economic questions (economists, feel free to correct me here also): The research that universities engage in is, in many ways, a high-risk endeavor. There is no guarantee of pay-off, and if there are pay-offs, they may very well be far in the future (when did the internet begin? Early 60’s? Yes, I know, defense and things, but also universities, if I recall.)
Also, it brings up what is called Baumol’s disease, which I first heard of from an online teacher friend. Person-centered work – like teaching, research, live playing, certain services – cannot be automated effectively (as much as the MOOCs try). But the people doing it still have to be paid, and the work cannot be made cheaper. (Well, even with the adjunctification.) I’m not sure I would cast that in “disease” terms – I figure much of all the progress we make was originally so that we could live well as humans – but it is a dynamic to consider.
It really is worth a read, and I would like to read more on the area, as I’m very interested in these kinds of policy questions.
The second is from Brent Roberts, on the PIGEE blog. It’s a follow up on his Deathly Hallows post (read that too), and focuses on his scary vision of good science, a vision that involves asking good interesting questions, and damn the direction of the results. As he says, BOO.
I’ve met people who want to have statistical concepts written out as equations, and who swear that graphs and other visual aids don’t help them. Humans vary. I’m very visual, so I like graphs and other visual aids. Like the LA Natural History Museum’s demo of the normal distribution.
Here are some nice visualizations of statistical ideas that I found in my twitter stream. I only remember that the last – on the p-values, came from Chris Said, because that was yesterday.
Science made Cracked, and Not In A Good Way.
The headlines of the six:
*(Prove. They wrote Prove. I cannot do that when you are not doing maths. They show, demonstrate, illustrate, up the confidence, are consistent with, gah. Pass the smelling salts. OK, as you were. I’m sure some smartass will comment that it is just fine saying prove.)
PoPS has a section in their new issue containing responses from the Pro-Priming people. Alas, it is behind a paywall, but at least some here do have access. It is an interesting read, although I don’t agree with some of it. (My position, somewhat vaguely, is that I’m sympathetic to the idea behind behavioral priming – that we are sensitive to our surroundings and respond to them in ways that we are not really aware of – but I suspect that the conceptualization of it is problematic. Don’t ask me to come up with a better one.)
But, I also wanted to link in Daniel Lakens’ blog response to the special issue, which, of course, is open to anybody with access to the net. I thought it was a very nice response.
From Dynamic Ecology, thoughts about how to change the funding schemes to ensure an academy focused on research, not prestige. I found the first answer quite interesting. But, I have never heard of Canada as a model before (poor Canadians).
From What’s the PONT is an intriguing post about the scaling problem. It may not be possible to scale up things that work on a small scale. There is a limit to the economy of scale. At some point in the scaling up, something becomes lost (perhaps it undergoes a kind of bifurcation or critical point). I think this is something to keep in mind when we try to educate more and more with less and less. As the unraveling of the MOOCs shows, it just won’t work. (And people who had looked at this before basically said “I told you so”. Not quite me, I must confess, until someone pointed out that long-distance education is an old gambit, and the problems don’t go away just because we have new fancy tools.) Even Sebastian Thrun has admitted it. A snarkier version from Rebecca Schuman in Slate.
Universities have been hoping to make money on patents from their researchers’ work. This is most definitely the hope at Lund, and I did read about it in Paula Stephan’s book. But it is a poor bet. Most of it won’t pay off.
Samuel Arbesman says, first, to bring back the generalists (Yay, I say, as I can’t make up my mind whether I’m interested in Emotion, Modeling, Evolutionary Psychology, methodology, behavioral economics, chaos theory, philosophy….), but also that innovation and research are no longer in the Academy, but among the startups. Going Changizi, as I like to say.
And, in this post, I link in things related to publishing and open access.
Randy Schekman won the Nobel Prize, and dissed the glam mags (that is, Nature, Science and Cell). Here is his The Conversation piece on how to break free from Glam. Not everyone took kindly to what he said. Here is Opiniomics considering that he may be a hypocrite, given that he has published in the glams. But, perhaps, before they were truly glam. Hypocrite or no, I think it is something that needs to be discussed even more than it is. But I don’t think it is really the glams’ fault. Glams wouldn’t be glams if there weren’t a market clamoring for them. Like those deciding on grants and careers by looking at how many glossy covers you have. Yes, science as Hollywood. Vote for the sexiest research project of the year! Ronin Institute articulated this well.
Related, here is Stephen Curry on the problem with the Glam Magazines. It is a commentary on a debate that he links in (confession, I haven’t watched. 2 hours!), but I think his commentary is worth it, sans watching.
Elsevier, the publisher that is the favorite hate-target, it seems, started telling researchers and everybody else to take down the PDFs of their own (Elsevier-published) research. Which, well, they legally are allowed to do, as we regularly sign away our rights. But it has been sort of a tacit custom that you get to keep your PDFs on your home page. Sort of like being allowed to have multiple copies of your records, I guess. I think it is time to consider better ways of publishing.
Here are some thoughts on that: First, Micah Allen’s call for self-publication instead of via publishers. Then Shauna Gordon-McKeon’s 3-part series Chasing Paper from the OSC blog. Parts 2 and 3 linked here. For full disclosure, I’m affiliated with the OSC blogs.
The PeerJ blog has a nice interview with Dorothy Bishop where they discuss open access, and her experience with PeerJ.
A paper from PLOS ONE compared post-pub peer review, impact factor and number of citations. None is a really good measure of, well, impact, it seems. And here is something from Science critiquing the h-index.
More to come.