Report from the symposium on Moving Beyond Questionable Practices
I’m back from the Moving Beyond Questionable Practices symposium, and I’ll try to collect my thoughts here. It really was good. More needs to be done.
First up was Jelte Wicherts. Go check his page. He lists all the papers he used for this talk (and more). And, as you can see, some of his work is on how researchers do their research.
He started out with the dichotomy of two virtual scientists: Dr. Good and Dr. Evil. Our dichotomous minds so easily want to classify like that. Stapel and Hauser were Dr. Evils, and the rest… well. But, of course, it is never that easy. Jelte cited a former professor (guess who) about how he got really good at making ugly results prettier, like those he saw published. And then, well, we’re slip-slip-sliding along into sheer fabrication. Could it happen to anyone? That depends on the degree of pressure and payoff, I would think, but I figure the propensities vary.
But the issue isn’t really about those who end up being pure frauds. There will always be frauds and gamers. (The trick, possibly, is to set incentives that minimize the frauds.) It is about the rest: what the rest of researchers believe in, what they believe others do, and the forces that shape behavior. Jelte showed work on surveying researchers about good research practices and norms.
This is basically Merton’s norms. I stole this list from Wikipedia (but with proper linking):
• Communalism entails that scientific results are the common property of the entire scientific community.
• Universalism means that all scientists can contribute to science regardless of race, nationality, culture, or gender.
• Disinterestedness means that scientists are supposed to act for the benefit of a common scientific enterprise, rather than for personal gain.
• Originality requires that scientific claims contribute something new, whether a new problem, a new approach, new data, a new theory or a new explanation.
• Skepticism (organized skepticism) means that scientific claims must be exposed to critical scrutiny before being accepted.
You can come up with a series of anti-norms to these too, which he did (not quite the same as those on the Wikipedia site, if I recall, but you get the idea; I linked his page so you can look it up if you are less lazy than me). Both early- and mid-career scientists endorsed the norms by a very wide margin. Really, really, really. Just a tiny sliver of those black anti-normers. They thought others endorsed them too, by a really big margin. But then it got very grey when it came to what they thought others actually DID. Here it was iffier. And here is the problem: you are in a bind. You want to follow the good norms, but you think others may not, and now your career is at stake. What do you do? (Perceptions, of course, may not be true. But it is what you believe is true, and consequently act upon.)
Jelte showed research he did on data-sharing practices. (This is one of the good norms, and many journals require that data be available on request.) This was published in 2005, so before the current storm, but these issues have been discussed for a very long time. The results (kind of qualitatively processed here through me) were interesting. Lots of people did not comply, and Jelte listed some of the excuses. The first two are legitimate: the research is ongoing, and the IRB will not allow it. Then there are the “the dog ate my homework” reasons: I don’t have time; I’m up for tenure; the research assistant who had the data left; the data were stored on a computer that has since crashed. It reads like a sad version of overly honest methods. Of the datasets that were shared, some were incomprehensible, with labels like Var001, Var002, Var003… (I’m flabbergasted! I really spend time labeling my variables, because otherwise I’d be hopelessly confused!) One thing this suggests (according to Jelte) is that researchers don’t routinely share data even with their coauthors. This is a major problem. Stapel didn’t, and we know why. But even if the vast majority of researchers are no Stapel, not having it as a practice to share data with your research group is really problematic. It opens the door to temptation. Also, it is easy to make mistakes, and your research buddy is someone who can help catch them. It is clear that from now on my teaching will emphasize sharing well-labeled data even more than it currently does. I always ask for data, but that is more because I want some data to play with than to instill good research practices.
There is also the reality that some research practices are, well, ambiguous. (Leslie John and Brian Nosek got into this too.) In your training there is the book training, and then the lab training, and although they touch on the same things, doing something practical is very different from learning the rules. There are lots of things that are considered reasonable practice with data: removing outliers; chucking reaction times that are clearly accidental responses (nobody processes a stimulus in 20 ms); removing participants who obviously did not do the task (see Zwaan on the shirking Turkers; follow the link in my earlier post); doing linear transformations. And there are practices you kind of did because they were suggested: you collect data, check the results, the effect isn’t there, but it probably will be with 30 more participants or so, since it is a small effect and you need enough power. All of this I’ve done, and I tend to be conservative. (I analyze with and without. I agonize. Oh… and mostly not yet published.)
He showed a decision-tree slide with the path to publication: run the experiment; if p < .05, publish. If not, run 10 more subjects. If not, do transforms. If not, throw out pesky outliers. If not: failed study… He very strongly said that the notion of a “failed study” meaning no significant results should be banned. Failed studies are when the fMRI blows up, when nobody shows up; you know, totally fubar. Not getting results in a well-designed study is not a failed study! It is a problem that the incentive structures treat it as such. Also, he pointed out, researchers are human. We want to do good science, but we also want to have a job, to pay our mortgage, to feed our kids, and to expand our wardrobes (well, that is me, and strictly I don’t need to). And what do you do if your good science ends up at p > .05 in a world that only considers p < .05 publishable, and only publishing gets you the job?
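That decision tree can actually be simulated. Here is a minimal sketch (my own, not from the talk) of why the path matters: both groups are drawn from the same distribution, so the null is true by construction, and yet walking the “run more subjects, then throw out outliers” branches pushes the false-positive rate well above the nominal .05.

```python
import math
import random

def z_test_p(a, b):
    # Two-sample z-test assuming known unit variance (the data really are
    # N(0,1) draws), so under honest testing p-values are uniform.
    z = (sum(a) / len(a) - sum(b) / len(b)) / math.sqrt(1 / len(a) + 1 / len(b))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def one_study(flexible):
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]
    if z_test_p(a, b) < .05:
        return True          # publish!
    if not flexible:
        return False         # honest stopping: one look, then done
    # "run 10 more subjects" and look again
    a += [random.gauss(0, 1) for _ in range(10)]
    b += [random.gauss(0, 1) for _ in range(10)]
    if z_test_p(a, b) < .05:
        return True
    # "throw out pesky outliers": drop the most extreme value in each group
    a2 = sorted(a, key=abs)[:-1]
    b2 = sorted(b, key=abs)[:-1]
    return z_test_p(a2, b2) < .05

random.seed(1)
n = 10_000
honest = sum(one_study(flexible=False) for _ in range(n)) / n
flexible = sum(one_study(flexible=True) for _ in range(n)) / n
print(f"honest false-positive rate:   {honest:.3f}")    # close to .05
print(f"flexible false-positive rate: {flexible:.3f}")  # noticeably higher
```

The exact numbers depend on the seed and on which branches you allow, but the direction is the whole point: each extra look at the data is another chance for noise to cross the .05 line.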
Leslie John got into this even more. Her work involved recruiting a large number of researchers via e-mail and asking them about research practices; not explicitly labeled as questionable practices, just practices. For each practice, the questions were: do you do it, do your colleagues do it, how common do you think it is? (Again, I’m linking to her work so you can verify. I’m doing this mostly from memory.) She showed that people do engage in these (more in the kind-of-acceptable ones, very much less in the outright wrong ones), but also that researchers have the perception that others are doing it, probably even more so, which puts you in a bind. They use steroids (to use Brian Nosek’s wording), and, well, how can I compete against that? She hinted that this is not just about getting the research practices in order, because the research practices are the way they are because of the incentives. As Neuroskeptic suggested, researchers are good lab rats who figure out the reward structure and act accordingly. It was very interesting, and I had a nice chat with her during the coffee break, where we brought up the reward structure (and my favorite brain blogger).
Gregory Francis* was up next, talking about replication. (Could one do a cool rap on that? Re-re-re-plication…) Much of this was based on his recent paper, where he goes through his method of analyzing multiple replications, their effect sizes and power, and the reasonableness of getting so many positive replications. He shows this really well. (Here’s a link to a post I made about his work earlier.)
He started with two mystery effects. One replicated 9 out of 10 times; the other something like 7 out of 13. Then he revealed the sources: the first was Bem’s infamous JPSP paper on precognition, and the other a meta-analysis of the bystander effect. We all know which one we believe in. The gist is that you cannot blindly use the “replicated” heuristic. There is a publication bias that we all know of. There is a file drawer that we all know of. I even have anecdotal stories of editors telling researchers not to include a couple of experiments because they were null and muddied the story. But with the middling effect sizes we frequently deal with (even middling effect sizes can have big consequences and deserve research) and the incredibly underpowered studies (Bargh, priming, n = 15), what you really should expect is a relatively high frequency of failures to reject the null. Always rejecting the null should be the red flag, because with the research questions this concerns there is a great deal of noise, and you should simply expect that the effect will not show up in a number of replication attempts.
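Greg’s point can be put in numbers with a simple binomial tail. This is a sketch of the reasoning, not his actual calculation: the power of .5 per study is an illustrative assumption I picked, not an estimate from either literature. Given a per-study power, you can ask how surprising a given count of significant results is.

```python
from math import comb

def p_at_least(k, n, power):
    """Probability of k or more significant results in n independent
    studies, each with the given power (simple binomial tail)."""
    return sum(comb(n, i) * power**i * (1 - power)**(n - i)
               for i in range(k, n + 1))

# With an assumed power of .5 per study, 9 successes out of 10 is
# very unlikely: roughly a 1% chance, "too good to be true"...
print(round(p_at_least(9, 10, 0.5), 4))   # 0.0107
# ...while 7 out of 13 is exactly what you would expect half the time.
print(round(p_at_least(7, 13, 0.5), 4))   # 0.5
```

So under these assumed numbers it is the near-perfect replication record that looks suspicious, and the patchy one that looks like honest, noisy science.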
I started thinking about Gigerenzer (pdf) and his use of frequencies to give better reasoning heuristics. I use the “replicated” heuristic too. It seems so reasonable: if you can do it again, and again, like Pooh with Poohsticks, it must be robust. And yes, replicating is important. But you also need a sense of what the reasonable frequency of replicating something wobbly should be. He critiqued the focus on replication going on right now (because of this). But it was tempered: the critique was not of replicating per se, but of using replicability as a heuristic for truth. We need to replicate. Failing to reject the null may be more an issue of effect size and power than of the idea being wrong. The take-home, from Greg, is that we cannot do this mechanically. Procedures and heuristics are good (and are where you start learning), but you cannot mindlessly rely on them. (Klaus Fiedler said a lot about this later.)
He went through Bargh. I will not. I figure he is an honest researcher in a bind. (Just look at how sad he looks in that picture.) Greg also said this: you shouldn’t think about the research and the findings as personal. They are not attached to the researcher. The truth, if you will, is out there. But here I think he is, well, not very psychological. We take it personally. We are researching primates with our tribes, our basking, our enemies, our theories that we fall in love with (and Fiedler thinks we should). It IS personal. And, also, we like to eat (and buy clothes). Some more of his take-home messages: scientists are not breaking the rules; the rules are part of the problem (like the p < .05 magic wand, etc.). We must recognize uncertainty. We don’t have strong answers. We must do meta-analyses. We should focus on effect sizes. A lot of the issues with questionable practices are inherent in the null hypothesis testing methodology, and he claims Bayesian analysis would get around this (I don’t know enough about that to comment, and I know there is disagreement on it). I didn’t take it personally, but one of the attendees said she did. It felt personal. And I kind of think he is missing that so many researchers actually do take it that way. (He does vision, with 3 people doing 30,000 observations, like I did for my dissertation, and, man, do you have power then. I figure he is more like some of the people in the IU cog department, whom I always liked, but who perhaps didn’t have the best social skills. Neither do I, but what do I know.)
Klaus Fiedler talked after lunch. And, as Daniel Lakens pointed out, he is Critical. Well, that is a good thing to be. I don’t think he is much for rules to adhere to; rules and stats are good heuristics, but that is all they are. He brought up that we need to think more theoretically: not just testing (that is needed too), but spending time on theories and developing them. Also, there is a need to consider basics and keep them as basic assumptions (the Lakatosian protective belt). For example, he brought up prospect theory and the negatively accelerated curve, which shows up everywhere in the psychological domain (as if we have a specific point we are tuned to, and as we move away, effects diminish). Also, when we make decisions about theories, we should consider some very fundamental things about how we think the world works, not just psychology. Thus, he said, the Bem work on precognition really should be thought of as work in physics: the button-push of the subject evidently seems to have influenced the random number generator… (Yes, we laughed, but it is a good point.)
He also brought up that research is creative. We have to remember the importance of creativity. He placed a vaguely sinusoidal curve on the PowerPoint, with time on the x-axis and the y-axis anchored by loosening and tightening; this is something George Kelly had proposed. Research goes through cycles. There is the loosening, where you come up with the wild ideas and where, as he said, the researcher has to fall in love with the idea. Fall In Love! So much for the disinterested researcher (and for the research not being personal), and I think he has a point. I think you can’t help it, actually, if you are any good, but then, as in love, you may get your heart broken. Then comes the tightening, the refining. And this goes in cycles. (Just wondering where we are on emotional expressiveness right now.)
Last of the speakers was Brian Nosek, who briefly recapitulated what the others had brought up and then talked about the Open Science project. The issue, right now, is that there is a huge bottleneck, or gate, between finished research and publishing. This is part of the problem with research practices. Publishing space is dear, and it is in no way guaranteed that you can get your research published. This is something we need to be working on. The first suggestion (which he thinks is for now a pipe dream) is to have a place to publish immediately, kind of bypassing the journals, or having the journals as the last instance. An arXiv type of thing. I think the economists have something similar (working papers). He brought up the rapidity with which the “faster than light, oops, loose cable” story played out last fall, which, as he said, would have taken four years for a similar thing in psychology. He doesn’t think this is feasible right now (although I have started seeing things on blogs: Rolf Zwaan recently; Doctor Zen. I think we are sick of the gatekeepers).
But he moved on to what can be done with the Open Science Framework for collaboration. It is really beautiful work, and I now have to go into it more thoroughly and start to use it. It certainly can open up the work much more than it has been. He talked about the Reproducibility Project. And, of course, it is important to reproduce, but it is also important to innovate. There has been too much focus on the “novel” and too little on the robust, but we cannot go completely the other way either. And, I loved this: as researchers we are out there in the dark. We muddle. We think we sail for India and find Jamaica. As much as we like to do confirmatory research, we do a lot of exploratory research also, and we need to. This is the frontier, babe, where no one has gone before. Open up for this!
And, I so love social psychologists! As the good SP he is, Brian listed all of the (well-researched) traps that even researchers fall into. All of the biases (his fave was hindsight: when you look at your data after a few months and recall the original hypothesis as being something in line with the data, so why on earth did it not occur to you then?). He brought up construal theory. Yes, in the long run we want to do good science, but right now we want to get published (so we can pay our mortgage and our clothing bill). The right now is concrete, and in the long run, as Keynes said, we’re all dead. Well, that last part was not what Brian said, because, of course, in the long run before oblivion we would like to do that good science.
It was nice saying hi to Brian in person too, which I did in the break (and then the introvert got the better of me).
The final session was a panel chaired by Denny Borsboom, with members representing fraud investigation (from the Stapel case and from medical fraud), journals, universities, and funding. It was very interesting, but it left me wanting more. We spent so much time talking about what to do on the science side, with the scientists, and here were representatives from the other areas of the system, with much less detail and far fewer possible solutions. Perhaps that is reasonable, considering that the focus of this particular symposium was on the researchers, but this is a systemic problem.
The questionable practices are in part due to the incentive structure. But the incentive structure is also there for reasons, and interests. How do you decide whom to fund? Qualitatively (as one panelist suggested we should do more of), with all its susceptibility to biases (I know my Kahneman)? Quantitatively, with current measures that were developed more to keep from flooding the libraries back when we didn’t have the cloud? Publishing is expensive, so you have to vet submissions just because of that. And there is a lot of crap; I’ve heard some concerns about the danger of opening that floodgate.
There is an issue of training. Not in morality, as Jelte clearly showed. But the rep from the Levelt report said he was flabbergasted when one of the coauthors he interviewed thought nothing of throwing out a lot of participants who did not quite behave, because that was how the person had been trained. There is the sharing of data, labeling your datasets correctly, and keeping track of them right now (Nosek and Spies have protocols for doing that in their open science work: do it now, not then, in the hazy future, when you have forgotten it all; I’m not sure yet if that is also in the Open Science Framework). But this is still only part of it. What of the rest of the system?
I’m reading Clay Christensen right now (Disrupting Class, a chrissy present from my husband), and he talks about how, well, impossible it is to do disruption from within, at least in business. This is because there are a whole lot of entrenched parts that work as a system, with interests that are very hard to change. This is a bit how I’m feeling now. Can we really reform from the inside? Or do we, like Mark Changizi, need to move out of the universities, or follow Ronin-type funding? It is very clear that researchers really are fed up; they have been fed up and grumbling for decades, but now, perhaps, it is possible to change the practices because we have new channels: Twitter, blogs, etc. But it is clear that it is very hard to influence the established structures. Of course, some journals will now accept proposed work and are less interested in seeing the p < .05. What will happen? Not sure, but I don’t think we will go back to business as usual now.
Edit: * Greg stopped by; check his comment. He clarifies what he meant (and what I forgot).