This past weekend, I spent a dozen hours playing with OpenAI's ChatGPT language model and thinking about how it affects researchers. I studied statistics and CS as an undergraduate at Harvard, and I now run a research program for talented high school students. So one question kept coming to mind: how will these tools change how students do research?
After only a few hours, it's clear to me that they will change how we do research. But the change is subtle - it may be less about how we do research and more about how we share it.
First of all - what is ChatGPT? Why should I care?
ChatGPT is a large language model trained by OpenAI. It can generate human-like text based on input provided to it. ChatGPT is one example of the many AI models being developed and used for various purposes, including natural language processing and generation.
OpenAI opened ChatGPT to the public as a free research preview this past week, and people have been using it extensively to write plays, poems, and essays.
(By the way, ChatGPT wrote that descriptive paragraph of itself).
My experiment - writing a research paper on high school employment using ChatGPT
To kick off, I decided to use ChatGPT to write a literature review on the impact of high school employment on long-term earnings. One of our Lumiere students wrote a similar research paper on high school employment, combining novel data he collected with a literature review; it was published in the World Financial Review. I wanted to see how close I could get to his work. To be clear, this is a social science research paper, but I think a STEM-based literature review would likely produce similar results. Almost all research papers, lab-based or not, begin with a literature review.
I began with the following prompt: "Please write me a literature review on high school employment and its effects on career earnings." Below is the response I got - at first glance, a plausible literature review, though without citations. I decided to dive a bit deeper into what ChatGPT wrote.
A seemingly reasonable literature review by ChatGPT
Literature reviews are about synthesizing information to provide a shared context for the reader and the researcher. I wanted to look at the actual content of what ChatGPT wrote to see how well it understood the material.
Let's look at the first study in the second paragraph.
Here's what ChatGPT said:
"One study, published in the Journal of Labor Economics, found that high school employment is associated with a 5-10% increase in earnings in young adulthood. This effect is strongest for students who work in jobs that are related to their future careers, such as internships or apprenticeships. These types of jobs provide valuable experience and skills that can be transferred to future jobs, which can increase an individual's earning potential."
My deep dive into its academic reference
Without a direct citation, it's hard to know what paper the model is referring to - so I looked around and found a 1997 paper by Christopher Ruhm that appears to be the model's source. It was the only paper I could find in the Journal of Labor Economics about high school employment, and it is highly cited, which makes me think this is the one.
I read Ruhm's paper to see how ChatGPT did in describing the study. Here's a quick background: in this 1997 study, Christopher Ruhm looked at a historical dataset of students in the 1970s who had part-time jobs. He built an econometric model that controlled for other variables such as family background, school grades, gender, and race, and then analyzed how different amounts of work in high school related to later earnings.
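To make the shape of a study like this concrete, here's a minimal sketch of a "regression with controls" using Python's statsmodels. The data is entirely synthetic and the variable names are hypothetical illustrations - this is not Ruhm's actual data or specification, just the general technique of estimating an employment effect while holding other covariates fixed.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic illustration only - not Ruhm's data or model.
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "gpa": rng.normal(3.0, 0.5, n),              # school grades
    "female": rng.integers(0, 2, n),             # gender indicator
    "family_income": rng.normal(50, 15, n),      # family background
    "hours_worked": rng.integers(0, 30, n),      # weekly hours, senior year
})
# Build outcome with an assumed "true" effect of 0.01 log points
# of later earnings per weekly hour worked, plus covariate effects.
df["log_earnings"] = (
    10 + 0.01 * df["hours_worked"] + 0.2 * df["gpa"]
    + 0.005 * df["family_income"] + rng.normal(0, 0.3, n)
)

# OLS with controls: the coefficient on hours_worked estimates the
# employment effect, holding the other covariates fixed.
model = smf.ols(
    "log_earnings ~ hours_worked + gpa + female + family_income", data=df
).fit()
print(model.params["hours_worked"])  # recovers a value near the planted 0.01
```

The key point of a specification like this is that the raw correlation between high school work and later earnings mixes in family background and academic effects; including the controls in the regression is what lets the coefficient on `hours_worked` be read as an employment effect rather than a confounded one.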
If you read the paper, you'll quickly notice that ChatGPT's summary is directionally right but wrong on the specifics.
ChatGPT says the effect is a 5-10% increase in later earnings, but Ruhm's econometric model shows effects of senior-year employment that vary from 4% to 21%. Ruhm also finds that the effect is driven mainly by students who do not attend college - meaning the story is more about transitioning effectively into the workforce than about human capital development. He also shows that the effect is partially driven by people working more hours, not just earning higher wages. The synthesis in ChatGPT's first sentence is somewhat off at best and misleading at worst.
More concerningly, ChatGPT's second sentence invents a claim about the paper. ChatGPT writes, "This effect is strongest for students who work in jobs that are related to their future careers, such as internships or apprenticeships." But the paper contains no such finding - it is not addressed at any point. In fact, the paper's main point is that working in high school relates to higher compensation, and that this effect is driven by the school-to-work transition: it fades for college-educated students (who have a gap between their high school jobs and their careers) and for employment in sophomore or junior year (which does not lead directly into jobs).
A confident-sounding but factually incorrect claim by AI is often called a "hallucination," a byproduct of the generative nature of these models. OpenAI is aware of this limitation and is working on it, but it reflects a general issue many people have seen with these models. They are creative, interesting, even captivating, but the final 1-2% - producing highly accurate summaries that stand up to deep scrutiny - isn't quite there.
Here is an example of another hallucination discussed on Hacker News.
What will this mean for research paper writing?
I am still bullish on what generative AI will do for researchers and research writing. But, there are short-term and long-term limitations.
Given what I've seen about this technology and my experience seeing thousands of students do research with Lumiere, a few things will likely happen.
My takeaways about ChatGPT and research paper writing
1. Research isn't going anywhere - but novel insights will become more important than literature reviews
These models draw from the corpus of human knowledge and combine it in useful ways. However, they are not yet developing new knowledge (with perhaps notable exceptions like AlphaFold, which predicts protein structures). This means that the process of generating new insights - research - becomes even more important.
What becomes less important is the ability to synthesize information into text. Though the model above is not quite there yet, it will get there. This means researchers will spend less time on cumbersome literature reviews and more on novel data collection and analysis.
2. These generative models are surprisingly good at creating novel combinations but not quite good enough (yet) for factual syntheses - especially for research topics
The topic I chose - high school employment's impact on long-term earnings - is an academic topic, but it's not a particularly complex one. As we get to even more technical topics, I expect the risk of error will be higher as the model tries to synthesize things that require exact language and less training data. I do not trust these models (yet) to synthesize information with high accuracy. They can, however, be helpful in creating structures, outlining, and even elaborating on your initial ideas.
3. It can be hard to detect bull$*#$ with these models because the writing is good
If you just read through the above literature review, you would think it's fine, maybe even good. That's because these models have gotten very good at writing human-like language that flows well and connects logically.
The issue is that we don't know what is true and what isn't (or what is sometimes true, but not always). For researchers, that means you'll now need to get good at double-checking claims and understanding their context before blindly moving ahead.
4. The workflow of writing research papers will change – involving a combination of writing out outlines, using AI to generate content, and editing
The workflow of writing research papers will change. The role of the human will be to create strong outlines and identify useful sources. From there, we can use these models to approximate in minutes what would have taken hours to write. Finally, researchers will go back to edit and fact-check.
5. These generative models will make research writing more accessible, especially for non-native English speakers
Many of the best researchers in the world are not native English speakers. However, they are often penalized because their papers look and feel less polished than those of their native-English-speaking peers. As writing quality slowly becomes less important than the ability to create new findings, this will open up opportunities for non-native speakers to share their great ideas in forms others want to consume.
6. AI will be effective for partially writing the literature review and introductions but less so for other sections of the research paper (methodology, results, discussion)
The methodology, results, and discussion sections will remain, for the near future, in the researchers' domain. They are the real heart of the paper - what did you contribute, and what did the data show? Other sections, particularly those that summarize prior research, will increasingly become the territory of generative models.
7. We will need to rethink plagiarism
If a paper is written with the help of AI, is it plagiarized? I don't think this will be a binary yes/no. Rather, it will be a question of whether the researcher has contributed something beyond what a model has synthesized. Eventually, we will develop strategies to measure this - how much, and what, the researcher contributed - but for now, we'll have to debate what is and isn't plagiarism.
8. AI models will become our research collaborators - suggesting ideas and refining what we write
As another experiment, I took a few of our students' research abstracts and asked ChatGPT to make the writing clearer. The results varied - some abstracts seemed much improved, while others lost some of their original meaning. But I am confident this type of workflow is here to stay. That is a win for science: far too many papers use needlessly complex language, and this technology can make what is complex far simpler for others to consume.
9. We're not quite at the holy grail for researchers - but we're close
When I first saw what ChatGPT produced for my literature review, I thought this model would quickly replace human-written literature reviews. But after digging into the results and talking with people who have worked with these models, I think we are close but not quite there. Knowing how "correct" a large language model's output is remains very difficult, so human judgment and work (like my deep dive into the citation above) will be critical for researchers for a while to come. I would be wary of using this for research projects at this point - but that could change quickly.
10. This is irreversible
I don't believe there is any going back from here. It may take years for schools to catch on - for example, by recognizing that this is a modern tool and teaching students to use it effectively instead of banning it (which I imagine they will do first). But I don't think we will return to a world without AI drafting long-form text for us.
One note - the world is changing as we speak. In the three days I've spent writing this, OpenAI has subtly changed its model to refuse literature review requests. Below, you can see me retrying the request (Tuesday, December 6) and it now being rejected. For now, this might limit the tool's immediate applications in research, but I am confident others will release a similar model that allows it.
Stephen is one of the founders of Lumiere and a Harvard College graduate. He founded Lumiere as a PhD student at Harvard Business School. Lumiere is a selective research program where students work 1-1 with a research mentor to develop an independent research paper.