The PhD Metagame
Don't Try to Reform Science
Did you know there are two sciences?
Don’t try to reform science. Not yet. Not in your PhD.
In grad school, you’ll start complaining about the publishing system. Why do we have to communicate all of our findings exclusively through academic papers? Why can’t we publish negative results, or spend time making things actually work? Why must we—all active academics—continuously emit a stream of papers? And why do we let the notoriously random and inconsistent review process gate our career’s progress?
Several things partially alleviate this: you can arXiv papers regardless of acceptance. People write blogs. There are glimmers of alternative, richer publishing formats. Heck, my field’s most powerful models aren’t even published anymore.[1]
But, inevitably, the hard facts come down: you’re not going to change how science works in the course of your PhD. You’ve got a PhD to do, and trying to reform science without any authority or reputation will just suck away your time. Worse, when you consider abolishing the traditional system of peer review and standardized formats, intricate and mutually-dependent problems emerge. In other words, problems in doing science are easy to find but hard to solve.
You might be thinking that “doing science” doesn’t require any of this mess. In a way, you’re right, but you’re thinking about Science 1, when you’re actually doing Science 2.
There Are Two Sciences
Let’s call them,
- Science 1 ← an idealized concept. Trying to understand how the world works. Seeking and describing truth. The scientific method (maybe?). “Pure science.”
- Science 2 ← how humans attempt to do Science 1. A cultural practice that now spans multiple societies. “Science in practice.”
Important facts about Science 2:
- people require both training and money (think: salaries, grants, the student → professor transition)
- lots of people are trying to collaborate (think: spending huge amounts of time communicating work)
- lots of people are competing for limited resources (think: need ways of determining who is more qualified than another)
Science 2 is a social practice because it must be. While lone wolves can go off and do Science 1 on their own, if you’re reading this, that’s probably not you. Past influential solo researchers I know of were landed gentry (e.g., Boyle, Newton-ish, Darwin, Maxwell) working in immature fields. Even great ideas that lacked the scientific community’s acceptance languished for years (e.g., Mendel’s, Wegener’s, Boltzmann’s). Plus, remember that many brilliant, productive researchers thrived on collaboration (e.g., Watson / Crick, Einstein / Bohr, Erdős / everyone else).[2]
Because Science 2 is a social activity, most of what happens is communication between humans.
Science 2 is like an enormous fleet of boats all sailing off to explore some big ocean. You are in a boat (or, you might just be in your advisor’s boat) trying to chart new territories. You have to shout “come check out where I’m going!” and then people must decide whether to listen to you or completely ignore you. If nobody knows who you are, why would they trust you?
The Swirling Mass Is Knowledge
Since you’ll spend so much time working in Science 2, I think it’s worth peeling back the curtain and telling a Science 2 story.
One way of conceptualizing Science 2 is illustrated by Larry McEnerney in a talk on academic writing. He posits the following mental model: the insiders of your field are having a collective, evolving conversation about what they know and care about.
This is the field’s knowledge. If you try to contribute something, but the swirling conversational mass does not find it interesting (for one of myriad reasons), your idea does not join the pool of knowledge.
The cynical view of this is that the most privileged academics gatekeep knowledge itself by filtering facts with their own agenda.
A more pragmatic view is that “knowledge” only really exists in our minds. And due to specialization of labor (etc.) we have a group of individuals, the academy, who are in charge of deciding what counts as knowledge in each field. This idea triage is genuinely useful. If you’re new to a field, your first many ideas will probably be already-studied or boring.[3]
Larry was mostly talking about history in that lecture. I think hard sciences fare slightly better. We also have the swirling conversation, but that conversation decides on objective measures it agrees to care about. If you win big on a difficult objective measure, it’s a guarantee of recognition. Science 2 is still making the rules, but if you play by them and do extremely well, they can’t ignore you.
The Story of BERT
I saw this firsthand with BERT[4] in my field (NLP).[5] Now becoming a historical footnote in the GPT era, BERT was nothing short of a revolution in the field when it happened. You could cleanly draw a line pre-BERT and post-BERT. After it came out, something absurd like 95% of papers used it. It was so good, nobody could ignore it.
Naively, you’d think such a revolutionary paper would be met with open arms. But when it was given the best paper award (at NAACL 2019), the postdocs I talked to universally grumbled about it. Why? It wasn’t interesting, they bemoaned. “It just scaled some stuff up.”
While this may have been true, nobody had grumbled at a previous novel effort to “just scale up” (but actually also innovate) language representations: ELMo.[6] Why didn’t they like this one? One reason is surely that The Bitter Lesson[7] is never fun for smart researchers when it happens, and the enormity of BERT’s success made it particularly bitter.
But I think there’s also evidence BERT was not properly part of the cultural conversation. While news of ELMo made the rounds among the community beforehand, BERT was developed completely in secret. Plus, it was made at Google, rather than a university. Its conference presentation was bad, violating the norm that best paper talks are usually highly polished as a sign of respect to the community. I think these cultural details, combined with the usual dissatisfaction from The Bitter Lesson, plus perhaps a sprinkle of jealousy, led to a negative feeling about BERT’s academic recognition that nobody admitted publicly.
This illustrates the two-layered view of knowledge in the hard(er) sciences: the swirling mass is still present, sitting one level of abstraction above the results table. That community chooses objectives to care about and promises to abide by them. The metrics then define the law, and we obey what works. But the cultural conversation—about what is interesting—remains.
A caveat to the Science 1-ness of the results table is that revolutionary breakthroughs are rare.[8] So you still must be part of the swirling mass. Developing taste is understanding what the swirling mass likes. Entering the swirling mass is making connections. Participating in the swirling mass is building a line of research with good taste.
It’s no coincidence that successful professors[9] have airline gold status from the number of university talks they give.
Many Science 2s
If you really want to get into the weeds, you can think of Science 2 as several interconnected orbiting conversations.
A single research lab develops a relatively coherent Science 2. Then, if there are multiple department labs in the same field, they’ll form a larger university Science 2. The whole field for sure has a zeitgeist, which you could call the prevailing Science 2. Big enough institutions like Google not only have their own distinct Science 2, they have several of them.
If you do an internship, you can get culture shock from entering a new Science 2. All your base compass readings change: the models people first reach for, the citations they most readily give, the advances they think are important, and the directions they think are promising. An industry Science 2 feels remarkably alien. Not just because of the differences, but because the conversation isn’t being led by your advisor, but by someone six layers of management above you whom you’ll never meet.
Don’t Call for Reform on Your Dinghy
While I do share many goals of scientific reform, you have to put in the legwork for anyone in the scientific community to trust what you think. Plus, putting in the legwork by participating in Science 2 as it exists today ensures you have a thorough understanding of it. Seemingly obvious fixes don’t work. You might not even know people have tried them until you meet 100 people in academia and some tenured person tells you that person X already tried that and here’s what happened.[10]
I write this because PhDs seem to attract a lot of smart, idealistic kids who are interested in doing Science 1 and don’t realize that they’ve signed up to do Science 2. Then in year one they jump off their advisor’s boat and start rowing a wooden plank around, yelling at the whole earth’s science fleet to change Science 2 to align with Science 1 before they’ve published their second paper. Not yet. Nobody can hear you. Not yet.
Footnotes
[1] The whole secret industry research thing actually sort of subverts the publishing issue rather than alleviating it.
[2] I am not a historian, please excuse these brazenly basic examples.
[3] This distinction is painful on both sides. Even as a measly PhD student, after a few years it becomes painful to field research ideas from hobbyists / non-academics / prospective PhD students / industry folks, because what they’ve come up with is cool and interesting to them, and you want to keep that spark alive, but you know the field went down that road fifteen years ago and it was overall meh and how do you even say this to someone gently?
[4] If you’re not familiar, think of BERT as an LLM precursor that had a big impact within its field. It improved performance on basically every task we knew of.
[5] Ontology: Computer science (CS) > machine learning (ML) > natural language processing (NLP).
[6] If you’re an outsider to NLP, I’m sorry, yeah we had a big Sesame Street phase, nobody really knows why.
[7] The Bitter Lesson basically says that AI always works better when you throw more compute (I’d add: and more data) at it, rather than any clever programming or knowledge. In the same spirit: “Every time I fire a linguist, the performance of the speech recognizer goes up” (Jelinek, roughly).
[8] I.e., probably don’t bet on one for your PhD.
[9] Except the very senior ones. It seems you’re eventually allowed to pass on all the invited talks and conferences.
[10] In fact, the very conference you’re at (while talking to this hypothetical senior academic) might have been created because someone wanted to reform how their field works. It happens more than you might think.