The PhD Metagame
Don't Try to Reform Science
Did you know there are two sciences?
Don't try to reform science. Not yet. Not in your PhD.
In grad school, you'll start complaining about the publishing system. Why do we have to communicate all of our findings exclusively through academic papers? Why can't we publish negative results, or spend time making things actually work? Why must we, all active academics, continuously emit a stream of papers? And why do we let the notoriously random and inconsistent review process gate our career's progress?
Several things partially alleviate this: you can arXiv papers regardless of acceptance. People write blogs. There are glimmers of alternative, richer publishing formats. Heck, my field's most powerful models aren't even published anymore.[1]
But, inevitably, the hard facts come down: you're not going to change how science works in the course of your PhD. You've got a PhD to do, and trying to reform science without any authority or reputation will just suck away your time. Worse, when you consider abolishing the traditional system of peer review and standardized formats, intricate and mutually-dependent problems emerge. In other words, problems in doing science are easy to find but hard to solve.
You might be thinking that "doing science" doesn't require any of this mess. In a way, you're right, but you're thinking about Science 1, when you're actually doing Science 2.
There Are Two Sciences
Let's call them:
- Science 1: an idealized concept. Trying to understand how the world works. Seeking and describing truth. The scientific method (maybe?). "Pure science."
- Science 2: how humans attempt to do Science 1. A cultural practice that now spans multiple societies. "Science in practice."
Important facts about Science 2:
- people require both training and money (think: salaries, grants, the student → professor transition)
- lots of people are trying to collaborate (think: spending huge amounts of time communicating work)
- lots of people are competing for limited resources (think: need ways of determining who is more qualified than another)
Science 2 is a social practice because it must be. While lone wolves can go off and do Science 1 on their own, if you're reading this, that's probably not you. Past influential solo researchers I know of were landed gentry (e.g., Boyle, Newton-ish, Darwin, Maxwell) working in immature fields. Even great ideas that lacked the scientific community's acceptance languished for years (e.g., Mendel's, Wegener's, Boltzmann's). Plus, remember that many brilliant, productive researchers thrived on collaboration (e.g., Watson / Crick, Einstein / Bohr, Erdős / everyone else).[2]
Because Science 2 is a social activity, most of what happens is communication between humans.
Science 2 is like an enormous fleet of boats all sailing off to explore some big ocean. You are in a boat (or, you might just be in your advisor's boat) trying to chart new territories. You have to shout "come check out where I'm going!" and then people must decide whether to listen to you or completely ignore you. If nobody knows who you are, why would they trust you?
The Swirling Mass Is Knowledge
Since you'll spend so much time working in Science 2, I think it's worth peeling back the curtain and telling a Science 2 story.
One way of conceptualizing Science 2 is illustrated by Larry McEnerney in a talk on academic writing. He posits the following mental model: the insiders of your field are having a collective, evolving conversation about what they know and care about.
This is the field's knowledge. If you try to contribute something, but the swirling conversational mass does not find it interesting (for one of myriad reasons), your idea does not join the pool of knowledge.
The cynical view of this is that the most privileged academics gatekeep knowledge itself by filtering facts with their own agenda.
A more pragmatic view is that "knowledge" only really exists in our minds. And due to specialization of labor (etc.) we have a group of individuals, the academy, who are in charge of deciding what knowledge is per field. This idea triage is genuinely useful. If you're new to a field, your first many ideas will probably be already-studied or boring.[3]
Larry was mostly talking about history in that lecture. I think hard sciences fare slightly better. We also have the swirling conversation, but that conversation decides on objective measures it agrees to care about. If you win big on a difficult objective measure, it's a guarantee of recognition. Science 2 is still making the rules, but if you play by them and do extremely well, they can't ignore you.
The Story of BERT
I saw this firsthand with BERT[4] in my field (NLP).[5] Now becoming a historical footnote in the GPT era, BERT was nothing short of a revolution in the field when it happened. You could cleanly draw a line pre-BERT and post-BERT. After it came out, something absurd like 95% of papers used it. It was so good, nobody could ignore it.
Naively, you'd think such a revolutionary paper would be met with open arms. But when it was given the best paper award (at NAACL 2019), the postdocs I talked to universally grumbled about it. Why? It wasn't interesting, they bemoaned. "It just scaled some stuff up."
While this may have been true, nobody had grumbled at previous novel efforts to "just scale up" (but actually also innovate) language representations: ELMo.[6] Why didn't they like this one? One reason is surely that The Bitter Lesson[7] is never fun for smart researchers when it happens, and the enormity of BERT's success made it particularly bitter.
But I think there's also evidence BERT was not properly part of the cultural conversation. While news of ELMo made the rounds among the community beforehand, BERT was developed completely in secret. Plus, it was made at Google, rather than a university. Its conference presentation was bad, violating the norm that best paper talks are usually highly polished as a sign of respect to the community. I think these cultural details, combined with the usual dissatisfaction from The Bitter Lesson, plus perhaps a sprinkle of jealousy, led to a negative feeling about BERT's academic recognition that nobody admitted publicly.
This illustrates the two-layered view of knowledge in the hard(er) sciences: the swirling mass is still present, sitting one level of abstraction above the results table. That community chooses objectives to care about and promises to abide by them. The metrics then define the law, and we obey what works. But the cultural conversation about what is interesting remains.
A caveat to the Science 1-ness of the results table is that revolutionary breakthroughs are rare.[8] So you still must be part of the swirling mass. Developing taste is understanding what the swirling mass likes. Entering the swirling mass is making connections. Participating in the swirling mass is building a line of research with good taste.
It's no coincidence that successful professors[9] have airline gold status from the number of university talks they give.
Many Science 2s
If you really want to get into the weeds, you can think of Science 2 as several interconnected orbiting conversations.
A single research lab develops a relatively coherent Science 2. Then, if there are multiple department labs in the same field, they'll form a larger university Science 2. The whole field for sure has a zeitgeist, which you could call the prevailing Science 2. Big enough institutions like Google not only have their own distinct Science 2, they have several of them.
If you do an internship, you can get culture shock from entering a new Science 2. All your base compass readings change: the models people first reach for, the citations they most readily give, the advances they think are important, and the directions they think are promising. An industry Science 2 feels remarkably alien. Not just because of the differences, but because the conversation isn't being led by your advisor, but by someone six layers of management above you whom you'll never meet.
Don't Call for Reform on Your Dinghy
While I do share many goals of scientific reform, you have to put in the legwork for anyone in the scientific community to trust what you think. Plus, putting in the legwork by participating in Science 2 as it exists today ensures you have a thorough understanding of it. Seemingly obvious fixes don't work. You might not even know people have tried them until you meet 100 people in academia and some tenured person tells you that person X already tried that and here's what happened.[10]
I write this because PhDs seem to attract a lot of smart, idealistic kids who are interested in doing Science 1 and don't realize that they've signed up to do Science 2. Then in year one they jump off their advisor's boat and start rowing a wooden plank around, yelling at the whole earth's science fleet to change Science 2 to align with Science 1 before they've published their second paper. Not yet. Nobody can hear you. Not yet.
Footnotes
[1] The whole secret industry research thing actually sort of subverts the publishing issue rather than alleviating it.
[2] I am not a historian, please excuse these brazenly basic examples.
[3] This distinction is painful on both sides. Even as a measly PhD student, after a few years it becomes painful to field research ideas from hobbyists / non-academics / prospective PhD students / industry folks, because what they've come up with is cool and interesting to them, and you want to keep that spark alive, but you know the field has been down that road fifteen years ago and it was overall meh and how do you even say this to someone gently?
[4] If you're not familiar, think of BERT as an LLM precursor that had a big impact within its field. It improved performance on basically every task we knew of.
[5] Ontology: Computer science (CS) > machine learning (ML) > natural language processing (NLP).
[6] If you're an outsider to NLP, I'm sorry, yeah we had a big Sesame Street phase, nobody really knows why.
[7] The Bitter Lesson basically says that AI always works better when you throw more compute (I'd add: and more data) at it, rather than any clever programming or knowledge. In the same spirit: "Every time I fire a linguist, the performance of the speech recognizer goes up" (Jelinek, roughly).
[8] I.e., probably don't bet on one for your PhD.
[9] Except the very senior ones. It seems you're eventually allowed to pass on all the invited talks and conferences.
[10] In fact, the very conference you're at (while talking to this hypothetical senior academic) might have been created because someone wanted to reform how their field works. It happens more than you might think.