On the present problems of publications, and possibly the coming future? Some labyrinthine musings.
“The future is already here. It is just not evenly distributed.”
William Gibson @greatdismal
I’m the deputy director of the cognitive psychology department (and here you thought I was a social psychologist. I am large, I contain multitudes!). As such, I was called into the department head meeting where we decided which of the many doctoral applicants would make it into the final, tiny pool before the position is offered.
Applications for a doctoral position in Sweden are vastly different from the US (which took me years to get my head around). You apply at a much later stage – in essence, to do your dissertation – unlike the US, where you get into a program, get trained, and then come up with your independent project around year 4 or so. This means that the kind of training the first few years of grad school provide in lab work, and in getting early publications (as the n’th author, but still), is not included, and there is no clear formal path to acquire those skills. Yet. I’ve brought this up many times during my various deputized eras, and as I’m involved in quite a bit of the planning and teaching in our new master’s program, I am working on providing opportunities for students to get more research practice, and opportunities for possible publications.
To assess our applicants we use four different criteria: peer-reviewed publications, master’s thesis, research proposal, and applicable experience outside schooling. The last is just a 0 or 1, but the others have more possible points, and are weighted roughly equally. My focus (if I can be said to ever have one) will be on that first criterion, which we are aware has issues.
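For concreteness, the rubric could be sketched in code. This is only an illustration: the point scales and equal weights below are hypothetical stand-ins, since the post only says that the first three criteria carry several points each and are weighted roughly equally, while experience is binary.

```python
# Hypothetical sketch of the admissions rubric described above.
# Point scales (0-5) and equal weights are illustrative assumptions,
# not the department's actual numbers.

def score_applicant(publication: int, thesis: int,
                    proposal: int, experience: int) -> float:
    """Combine the four criteria into a single score.

    publication, thesis, proposal: 0-5 points each (assumed scale)
    experience: 0 or 1 (binary, as described in the post)
    """
    assert experience in (0, 1), "experience is scored 0 or 1"
    # The first three criteria are weighted roughly equally.
    weighted = 1.0 * publication + 1.0 * thesis + 1.0 * proposal
    return weighted + experience

# Under these assumed weights, a strong publication record can outweigh
# stronger thesis and proposal scores combined with no publications.
applicant_a = score_applicant(publication=5, thesis=3, proposal=3, experience=1)
applicant_b = score_applicant(publication=0, thesis=5, proposal=5, experience=1)
```

Even in this toy version, the shape of the problem is visible: the publication criterion is just one roughly equal term, yet in practice (as discussed below) having published at all separates applicants sharply.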
If you have published, you are automatically well ahead of everyone else. I understand why we keep this in mind, and I see why it would be hard not to consider peer-reviewed articles, especially if the applicant is also the first author. The university will be paying someone rather well (shockingly well by US grad-student standards) to do research for 4 years, and you want to make a good bet. Having experienced a, well, not so stringently vetted doctoral student, I agree. No guarantee, of course, but it ups the odds.
But then we started talking about how to evaluate the particular journals. In this instance, one applicant had published in a standard, long-established journal; the other in PLOS ONE. Should we rank the journals differently, given that they seem to have very different criteria for publication? And what does it mean, quality-wise, that PLOS ONE has so greatly increased its number of publications since its inception?
As I seem to be in one of the future spots when it comes to this anyway, I did an impromptu problematization of journals, journal rankings, impact factors, systems gaming, the higher retraction rates of the higher-IF journals, and a short summary of what I think is to come. In an after-chat with one of the people present, I discussed how surprised I was that Psychological Science is so highly ranked (indeed called a top journal; scroll down for it), when I thought the hard-hitting science could be found in journals like Psych Bull, JPSP, Psych Review, and all those JEP variants. Also, some flagship journals, with their peer review including the results, passed Stapel 50 times over! (and counting)
So, I wanted to collect more of my thoughts about the problems, where things are going, and what the snags are.
Scientists (at least the ones I follow) understand the problematic dynamics and forces that guide scientific work and publishing, and the problems are old. (Publish or perish is ancient, and may never disappear. The power, effect-size, and file-drawer problems have been voiced since before I was even eligible for university studies.) The past couple of years have seen increasing efforts to rectify this (e.g., the Open Science Framework, symposia on changing practices, sections in major conferences).
But it is clear that there are systemic issues. Adjusting how you do your science at the individual level, towards what just about all of us think are better, more robust research practices, may simply kick you out of the research gene pool, because the proxy used is publications.
Of course, publication can serve as an honest signal of your quality as a researcher. But it is not immune to faking (e.g., the ex-Dr. Xs and Plastic Fantastic Schöns), and most definitely not immune to gaming. The strategizing surrounding when, how, and where to publish is part of that gaming, although in itself that may not be problematic for science. The problem comes when it increases the prevalence of questionable research practices, and the number of papers of less robust quality. It is problematic when the research that is published is trivial, spun, and half-baked, because getting things out is better than getting things right. Slow, careful science is not something you can afford when your job and your staff’s jobs are on the line. And perhaps also not when your spot as a contender is at stake.
Whether to use some type of quality ranking of journals when assessing prospective doctoral candidates also runs into problems. The most hated measure, as I see it from my Twitter and blog streams, is the impact factor. The impact factor was originally designed as a tool for librarians, so they could wisely manage both their purchasing budgets and stack space. Stack space (although not budgets, of course) is now hardly relevant. But the impact factor has been turned into a proxy for measuring the worth of individual scientists, even though it was never designed as an indicator of research quality. The measure is used to assess scientists and to distribute scarce funding resources, and thus becomes the focus of strategizing and gaming, both by scientists and by the journals themselves. For scientists, it becomes important to get papers into the highest-impact journals. For high-impact journals, there is an incentive to protect that high impact factor. It is interesting to note that two of the highest-impact journals – Nature and Science – also have the highest retraction rates, in some instances due to suspected fraud, but also due to carelessness.
You can read more on the problems with impact factors in this arXiv article by Björn Brembs and Marcus Munafò. This recent blog also discusses the history of the impact factor, and the problems with using it as a proxy for quality. This paper looks at retraction rates and their causes across several journals. Finally, this commentary by Holbrook, Barr & Wayne in Nature suggests that there is a need to also measure negative impact – a counterweight to the current bias. (They blogged it too; some of those factors are, um, interesting.) Also, follow the links in this earlier post of mine, further discussing the problems with impact factors, and this history and commentary on impact factors from Nature.
A concern that came up with papers published in PLOS is that the peer review their articles go through differs from the standard version, and that this could potentially mean the research is not as thoroughly reviewed, and thus not as high quality, as papers in other journals. I think it is clear from the above that a publication in a high-impact (and thus supposedly high-quality) journal does not guarantee that the research in each individual paper is of higher quality than that in a lower-ranked journal.
But, if I understand it right, what PLOS is looking for is papers with sound methodology, the results be damned. A few other journals have begun implementing sections where research is accepted based on the soundness of the method, rather than on the allure of positive results, such as Cortex (guidelines here) and BMC. Frontiers has recently announced a special issue for registered replications in cognitive psychology, and before that Social Psychology called for proposals for a similar special issue on replication. Perspectives on Psychological Science just announced its first call for registered replications of a very well-known result in psychology (Schooler’s verbal overshadowing). PsychFileDrawer, although not a journal, is a repository for both replications and calls for replication of well-known results.
These fledgling efforts are designed to get a handle on publication bias in a manner that neither null-result journals nor replication journals have been able to. As long as the focus is on the allure of positive results, and the glory goes to those who amass the most p < .05 driven studies in the highest-impact journals, that is what scientists will chase, and what journals will select. This may be fine for exclusive clubs, but it is bad for an enterprise that, ostensibly, is searching for that small-t truth. Negative spaces are information too, as anybody who has ever tried to draw or paint knows.
Some weeks back, James Coyne suggested on Twitter that the high-impact journals would make sure they stayed away from the riff-raff of negative results and replications, in order to protect their impact factors. His tweet was both pithier and wittier. And yes, I think those who hope we are on one of those future-is-here bubbles need to consider that. Protecting one’s value is likely one of those organizing principles in evolution, and thus far deeper than the current science problem. It is also not idle speculation (not that I think JC ever would), as journals have very explicitly stated that they are reluctant to implement different kinds of policies because it could affect their impact factor (Find the psych science advances whatever that I saw).
Sanjay Srivastava had a suggestion that could possibly check the behavior of exclusive journals: what he calls the pottery-barn principle (you break it, you buy it – it really predates Pottery Barn). You publish it, you also publish the replication attempts. You now have a responsibility for this bit of research. No reaping the glory without risking the let-down.
There are more concerns with parsing out the measures of published articles, especially for assessing new doctoral candidates. Who gets published, when, in what journal, and as which-number author has an arbitrary lumpiness to it, which makes it a particularly iffy measure for those who are basically pre-career. At this point, at my university, there is no formal way for students to have a chance at a publication prior to applying for a doctorate. Some of them will have worked in labs, where they may have been involved in work that turns out to be publishable, and where the PI feels it is reasonable to share authorship. Thus, learning your research chops in a lab where the research is highly speculative, or where the sharing of authorship is stingier, works to your disadvantage. So far, at our university, this takes place outside of the standard course work. Even though I am working on implementing more sections where students do research work that could potentially be publishable, as part of their training, it is not yet standard. There are some really good labs here that are happy to let students work with them, but even that is lumpily distributed.
But this also opens up a gaming path, which may water down research further. If publication is at a premium for acceptance into a program, the strategic thing to do, for both teachers and students, is to focus on research that is more of a sure publishing bet, in order to give students an edge in applications. This means focusing away from both skills training and riskier projects that may never pay off. The effect may not be strong, but it goes in the same direction as the other incentives. (Check this rather depressing account of perverse incentives from Paula Stephan. I’ve linked it before on my other blog, but it deserves linking again.)
The penchant for spinning and exaggerating the meaning of results (rather than being careful and hedging) is not being well checked, although this “still not significant” post is an amusing list of what those non-significant p-values have been called.
The other day, Razib Khan re-tweeted the following exchange:
Alicia Martin: “Tibshirani: academic competition creates exaggerated claim feedback loop. #bigdatamed”
Richard Harper: “rephrasing? competition has become structured so that exaggerated claim-making is under strong positive selection.”
I suggested (mixing my levels) that we needed to introduce a natural predator.
As of now, there is none, because the costs of the current system are borne not by the originating scientists or the publishing journals, but by others: those spending time failing to replicate, or those misled by reasoning from wobbly results. The result is a knowledge structure that is not robust, but there are no natural costs pushing those generating the results to be more careful, and plenty of incentives to keep doing what everybody has been doing. So we are drowning in a soup of weakly powered research and exaggerated claims. Within the basic research of psychology, where I hang out, these invisible costs are largely borne within the community, but when the research concerns medicine or clinical practice, there is a real risk of iatrogenic effects.
There is also the emphasis on the novelty factor (in some ways ironic, as the incentives really go against taking risks, which is the only way novelty could truly be uncovered – unless you do to the word Novelty what Humpty Dumpty does to words in general. Well, I think he’d do that to Novelty too, given the opportunity).
In psychology – which is messy, with multiple possible factors interacting in most likely non-linear ways (although we always try to linearize), using mostly underpowered studies, because we only learn quick heuristics for assessing power – there really needs to be an emphasis on the robustness of a finding, rather than on chasing new, exciting, and surprising things that can give us a bit of a charge.
In some areas this is done. I remember the careful building of reaction-time distributions in the Townsend lab, and the years of testing on the simplest features. And the other day I watched a presentation by Leo Poom where he presented work on a number of aspects of visual short-term memory: how many items can you remember (one!), and how long before it falls apart (if simple, it doesn’t). Careful plotting out of a very small aspect of processing. So much fun.
If your view of novelty is that you tweak the research to look at a slightly different aspect, triangulating in on the phenomenon without stepping very far, then yes. But there needs to be much more tracing over the territory in psychology. The careful, plodding, plotting replications with small tweaks, new populations, etc. That should be novel enough. This is what I take to be Kuhn’s normal science, and we may need more normal science.
These are systemic problems, and there is no particular party to blame. It is easy and tempting to point fingers and accuse others in the system (those editors, those journals, those funding agencies, those tenure-track committees, those p-hacking scientists), but in every instance the work is done by humans with human foibles, susceptible to the standard pressures, desires, and wants, both noble and base, and it is rather pointless to ask THEM to change, because there is no THEM. It is that feedback loop.
As I’m going through Taleb’s Antifragile, I noticed his claim that it is pointless to rail against human foibles that have always been with us. He talks about greed, being the ex-trader that he is. But that can also fit misplaced good intentions, as well as quality tradeoffs, or doing what everybody has done before. The remedy is not really to tell people to stop being so damned human, but to tweak the system so it is robust against all our foibles, well intended or not.
Systems issues with misaligned incentives are not new. In all systems (well, all? perhaps just many, but I doubt there will ever be a perfect one, except heat death perhaps) the weak points will be found and exploited, and there will be attempts to block the gaming. This is what I gathered from Robert Trivers talking about the evolution of mimicry and deception, whether to predate or to defend against predation. It is clear from Peter Turchin’s work on the rise and fall of empires. Francis Fukuyama discusses it extensively, across state-building after state-building (beginning in ancient China, via India, and on up to the French Revolution): the forces of the state attempting to negate the strong desire to feather the nest for one’s kin, and how, once the state degenerates, the needs of the kin take over again. Systems are never in balance, and people will pursue their personal goals if the incentives point in that direction.
The system in place now may have been quite adequate at a time when researchers were fewer and the speed of accumulation was slower (although there certainly were frauds then too).
By now, there are not enough natural checks against iffy practices, and not enough cost borne by those who perpetuate them. And you are trapped: an individual cannot do much to change it. The impact-factor link above (my post – telling you to follow the links) shows an exchange regarding exactly this issue.
My little part, and god knows it is small, is to at least discuss and problematize these issues. To try to implement better practices in the training of my students. To bring this knowledge to my colleagues. Because I think that, fundamentally, most of us in this business, from the lowly human lab-rat to the editors, reviewers, administrators, and funding agencies, are in it not for the money (hah) but for the science. Searching for that small-t truth.