On Trust, and the Process of Science
Some weeks ago, two tweet-streams touched on trust in science.
The first included Akira O’Connor’s successful campaign against a rejection based on a single review in which he was accused of p-hacking. Evidently he is not alone in this experience. Science used to be a high-trust endeavor, where the worst you might be accused of was doing inane and misguided research; now there is suspicion that you are fudging your results (but see Data Colada’s excellent tutorial on how to respond to suggestions of p-hacking).
The second was from Keith Laws, stating that pre-registration is not checking the sloppiness and the HARKing, as journals don’t always hold the researchers to their preregistration.
In short supply.
When I re-read David Hull’s “Science as a Process” this summer I ran across his claim that scientists very rarely falsify results. That is not because scientists are a particularly virtuous group – he states quite strongly that scientists are human, with all the foibles of ambition, self-serving biases and querulousness as well as the standard issue of nice traits, and that this doesn’t matter for science to work. The reason outright fraud was so rare is that it harms knowledge and ALL of the knowledge workers. As a scientist, you need to trust that what came before works because, as important as reproducibility is, very few have the time to spend reproducing earlier results. We must trust results. They can be flawed, but they must be honest.
But, why was this enough? Well, his model of how the scientific process in the long run accumulates more knowledge, despite being done by flawed human beings, is one of replication and selection: An evolutionary process. Each scientist wants their ideas to spread, to replicate, to be selected, and one of the mechanisms for this is credit. I have a good idea, I test it and publish. You build on it, and give me credit for the good idea.
If I put out an idea based on faked results, my ideas will be selected against, rather swiftly, once I am found out. That is, I’m dead. Would any of you cite Stapel? Even his non-indicted papers? How about Marc Hauser? Do we really, really know about Förster? Would you cite him without careful scrutiny?
At the time Hull was writing this (the book was published in 1988), science was, perhaps, smaller. His test-groups were two branches of classification scientists – those who work on finding ways to classify species of animals, plants, protists and the like. The two groups he followed seemed somewhat intimate, and entangled in discussion. The work was published in one journal, where about 60% of the papers sent in were published.* Many of the remaining 40% went unpublished because the authors never resubmitted. There was a great deal of scrutiny. A faker might very well be discovered early on, and would be out of the science pool.
Stealing, he claims, was tolerated (as in plagiarizing and appropriating other people’s ideas), because it only hurt the individual stolen from. Fraud hurts everybody.
That fraud hurts a sizeable proportion of scientists and science is still true, of course (as is the harm done by less than robust science, which perhaps is behind the accusations of p-hacking, but not behind sloppiness with pre-registrations).
So what has happened, if anything?**
As I, and many others before me, have pointed out, science is now a huge enterprise which overproduces scientists. This makes the competition for slots to get to do science that much more fierce – in true evolutionary manner. Evolutionary processes filter for the fittest something, but whether this something also coincides with what humans consider good (in this case, increased true knowledge) is not guaranteed at all. Evolution, at its tritest, is whatever survives survives.
Towards the end of his book, Hull asks a number of questions that his evolutionary model leaves outstanding. One of them is – what happens if competition sharpens? Competition has always been a part of science, but Hull also spends a great deal of time demonstrating how important cooperation is for science to function well, and for science to produce more and more reliable knowledge. Citation is the minimum of cooperation – all of us need to rely on the work of other scientists in order to advance our ideas, and we need to acknowledge their work. But he goes further, demonstrating that you need cooperative allies – Demes. You may not all agree, but usually there is some idea or concept that you agree upon, that you are all working on, and that you have a similar view on. This could be Darwinism or Cladistics (from his book). It could also be Social Priming, Persuasion, Emotion, what have you. There can be skirmishes, where one group – Deme – marshals evidence for their idea against the ideas of another group (Categorical vs. Dimensional concept of emotion; Cladistics vs. Phenetics; Darwinism vs. Idealism – the latter two from Hull). This arguing can be fruitful, and in itself advances science. Having allies is important. Hull demonstrates quite well that ideas that have only a single proponent, or proponents who cannot cooperate, do not survive well.
Hull also mentions, towards the end of the book, that career concerns (rarely mentioned, but of course mattering) tended to align with the more vocal concerns about getting the science as right as possible. Doing good science in a productive deme got you published and cited more, and this could be transformed into better career opportunities and resources to keep driving the idea forward.
Perhaps it is here things have broken down, in the increased competitiveness – I think Shauna Gordon-McKeon’s “When science selects for fraud” lays this out very well. Career concerns are no longer as well aligned with good science. In fact, they can interfere with it, as has been discussed over and over again in various blogs. (Both Jelte Wicherts and Brian Nosek brought that up in the “beyond questionable science” symposium. Worth a second look here).
So, together, the sheer size, the lack of good demes and the competitiveness may have diluted how effectively the processes in science select against fraud and cheats.
Honest signals, and their faking.
If you look at game-theory and evolutionary models of how trust can be maintained, there must be some means for the cooperative individuals to protect against the untrustworthy (inspection), and some means to make it costlier to cheat (e.g. damaged reputation). I mused on a model based on Robert Frank’s emotion model in this blog post, but there is plenty of work looking at how to dis-incentivize cheating.
Concern about reputation (as gossip and reputation are a way to keep cheating in check) is one route towards maintaining trust. In science that would be having a reputation as a good, honest scientist.*** But reputation can be gamed. In my marketing psychology course, based on Cialdini’s “Influence”, we discuss how authority can be coopted through, for example, clothing or titles. When the field is large and impersonal, as most scientific fields are now, the indicators may be very much removed from actual performance – indicators like the number of publications, in which journals, with what number of citations. And here journals are also working on maintaining their reputation, perhaps by being known for flashy discoveries or high rejection rates, neither of which necessarily correlates highly with increasing actual knowledge (as perhaps the high retraction rate from the glam magazines indicates; lots of work has been done on this). Publications, journals, and citations are then not necessarily honest signals of high quality, but sometimes, like the king snake or the cuckoo, mimicry.
Routine inspection (peer-review) is somewhat costly, but should be a way of ferreting out at least some of the cheaters. Yet a surprising number of papers have passed peer-review without their problems being discovered. Perhaps, as Frank suggested, inspection got lax because scientists generally trusted that other scientists were honest. The larger the share of honest cooperators, the less time needs to be devoted to inspection (time which can then be devoted to other, more productive activities).
When the fields are huge, people are not close enough to each other to verify and inspect one another’s work. What rises to the top may not be who does solid work, but who can project well – possibly a kind of narcissism.
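The trade-off between inspection and the cost of cheating can be put in toy numbers. The sketch below is only an illustration under stated assumptions, not Frank’s or Hull’s model: the function names and the payoff values `gain`, `penalty` and `p_inspect` are all hypothetical. It assumes a cheater collects `gain` when undetected, loses `penalty` (say, in reputation) when caught, and is inspected with probability `p_inspect`.

```python
def expected_cheating_payoff(gain, penalty, p_inspect):
    """Expected payoff of cheating when caught with probability p_inspect.

    Hypothetical toy model: the cheater keeps `gain` if not inspected
    and pays `penalty` if inspected.
    """
    return (1 - p_inspect) * gain - p_inspect * penalty


def min_inspection_rate(gain, penalty):
    """Inspection probability at which cheating stops paying.

    Solves (1 - p) * gain - p * penalty = 0 for p.
    """
    return gain / (gain + penalty)


if __name__ == "__main__":
    # A severe reputational penalty versus a mild one (made-up values).
    for penalty in (4.0, 1.0):
        rate = min_inspection_rate(gain=1.0, penalty=penalty)
        print(f"penalty={penalty}: inspect at least {rate:.0%} of papers")
```

In this toy setting, a penalty four times the gain means inspecting one paper in five is enough to make cheating unprofitable; if the penalty shrinks to equal the gain, as it might in a large and impersonal field where reputations are hard to track, half of all papers would need inspecting.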
I don’t know how to restore trust. But, the ease of establishing social connections via twitter and blogs may make it easier for us to share what doesn’t work, so we don’t end up like this poor bug (thanks to Felicia Felisberti who tweeted it in).
Efforts to do post-publication peer-review also allow more public scrutiny of results by scientists both friendly and unfriendly towards those ideas. (If you are against a theory you may be more likely to find its holes than if you love it. Hull points out that friendliness is not a requirement for science to go forward, as much as some of us would like it to be so). It may also help to highlight how incredibly important cooperation and collaboration are. Competition has its points, but when you use that as the only gauge, you get the Lance Armstrong effect. One can argue about the goodness or badness of that in sports, which I tend to think of as trivial. It is not trivial when your ostensible goal is to increase our knowledge about the world.
*(There is a whole chapter analyzing who accepted papers from which group, specifically to investigate whether there were obvious biases against the opposite camp. Conclusion – not really.)
**I’m making the assumption that there is an increase in fraud. There certainly has been an increase in less than robust science. Feel free to contest.
***According to Hull there are a couple of other issues involved here, which have to do with whether one chooses to do solid but not very exciting research or risky research. Plodding puzzle solving is low risk, and a way of maintaining a solid reputation as trustworthy. Taking more risks could result in a very high reputation if the research pans out, but one risks taking a big hit to reputation if it doesn’t, or if it too frequently turns out that the exciting research is not robust. This all assumes that both the plodding and the risky work are done honestly.
****I have adopted Simine Vazire’s footnotes.