Five Thoughts on Agentic Research
Agentic research is the future. I just don’t know where I fit inside it yet.
100% written by a human
I am coming to the end of my term as Head of the Department of Economics. I love and have loved the role. But between that, building Immersive Software and Immersive Bio, and working with the Tánaiste, research has slipped away from me almost entirely.
So, since January I have tried (and let’s be real, mostly failed) to block off Thursday mornings for research.
My return to reading papers actively and thinking about new research questions has coincided more or less exactly with the rise of agentic research. Claude is everywhere now in economics. You can find excellent tutorials everywhere; this is a good one. The kinds of questions I wanted to spend months trying to answer can now, broadly, be answered in a few days, and it only takes that long because I am not very good at agentic research yet.
Like absolutely everyone else in the profession, I have been transfixed by Scott Cunningham’s videos and blogs, by the YouTube videos of Claude and Codex and Zerve, and by the ability of novices to do things masters would have strained at just a year or two ago.
Economists have this cool idea called the production possibilities frontier. It’s a concave curve marking the edge of what is possible, and it moves outward, away from the origin of the graph, as things improve. Moving from near the back of the research frontier in economics to near the front felt like a big ask 12 months ago, and now that concave curve is sprinting away from me just as I have started jogging again.
So there’s only one thing for it: I’ve got to get really good at agentic research.
This turns out to be extremely easy to do badly. Fire up Claude/Codex/Zerve, throw your problems at it, and see what happens. It will write papers that are absolute shite, using data it didn’t gather or didn’t gather properly. It will write Python scripts that half do the thing you want. It will confidently predict nonsense that, if you tried to publish it in a journal, should probably see you lose your economist card. I am 100% sure many papers are being written this way, which is really bad for the profession. Given publication lags, we will see waves of agentic shite cresting in the journals in about two or three years.
This is going to be super embarrassing for everyone involved. It will be like CGI capes in old superhero films. I liked Superman Returns more than most people but I can’t watch it today. In three years everyone will be using Mythos 9.0 or whatever and the stuff written with Opus 4.7 will be more noticeable than Brandon Routh's cape. It will be mortifying.
I think most people doing research seriously are trying to do the obvious thing, which is to replicate their existing research process using Claude or Codex or Zerve as a faster, better RA. And the way you get a sense of how good any RA is, obviously, is to ask them to replicate an old paper, extend a dataset (updating it, adding countries, or whatever), or prove some minor result.
For example, I asked it to take a 2016 paper I co-wrote with folks from the Bank of England and port the entire analysis, which we did in MATLAB and R, to Python. I asked it to take Future Forty and produce an implementation for me. I asked it to take my 2012 paper on the economics of austerity and backtest the hypothesis for other countries.
Results? The 2016 Bank of England paper was a stock-flow consistent macro model of the UK. It took us 18 months of work; Codex replicated it in about 3 hours. When I checked the code and data against what we had written 10 years ago, it was pretty close. You can view it here; I’m going to use it to teach macro next semester. Future Forty is a Solow-model-based forecasting exercise, so it was pretty easy to replicate with Claude. The 2012 austerity paper in the Cambridge Journal of Economics turned out to be by far the hardest, because its arguments were historical, institutional, and macroeconomic, not computational or explicitly ‘mathy’.
A mixed bag, by anyone’s reckoning.
Then I decided to get LLMs to help me with a large project on the economics of budget surpluses that I am working on at the moment with my colleague Dr Ciarán Casey. This is going to become 4 or 5 papers.
I started doing what I would always do, which is to formulate a basic, rough question to answer: where do budget surpluses come from, and how are they sustained? Many countries have surprisingly resilient budget surpluses, what Haffert (2019) calls surplus regimes, as opposed to the surplus episodes that countries like Ireland have experienced. This leads you to many interesting and relevant sub-questions, which are the catnip of research and the reason I love doing this job. Having done some thinking about it, I already have a few. Are surpluses generated by the structure of the macroeconomy, by institutions, or by politicians responding to fiscal norms? Have deficit-biased countries ever turned surplus-biased, or vice versa? Can we run historical horse races to see where surpluses come from?
These questions cross over from macroeconomics to political economy to political science, which is exactly what they should do. The framing and methods each field uses vary enormously, and so do the data: you move from GDP indices to text-based parliamentary ‘norm’ type work. Then there is the write-up itself. What a macro journal like the Journal of Monetary Economics might publish is very different from what the European Journal of Political Economy would, and different again from a methodological journal like Political Analysis or the Cambridge Journal.
Back to the agentic experience. I have found it absolutely useless at helping me refine research questions, and its suggestions on writing papers are beyond dire. If you know how to make this part of the experience better, please let me know. This might just be because my writing ‘style’ is a little too defined for the AI, but it might be something else; I’m not sure yet.
Beyond that, I have five initial impressions from using it.
First, it is a management problem. You have all these RAs, but you have to check and verify their output. I can imagine management science scholars getting serious mileage out of thinking about how best to manage these things.
Second, it is a Jevons paradox problem. Now that you have an army of RAs to test out your ideas, the cost of testing an idea has fallen, so you are going to test out way, way more ideas. This means that, rather than saving time, you can spend hours and hours spawning ever more RAs, burning up tokens and energy and water and everything else.
Third, scope. In my budget surplus project, agents have been analysing 1.4 million parliamentary transcripts, IMF reports, OECD data and data on fiscal institutions, and helping design a codebook so we can independently score 1,000 randomly sampled transcripts. Having produced all of this testable output, it has become clear to me that the change really taking place is in scope. Here is a great new paper just out on using agents to create new datasets. Agentic research means one dude with a laptop can do in a few months what would once have taken a team. It dramatically extends the reach of the individual researcher. The question is: reaching to where?
Fourth, it’s pretty clear that the academic paper as the ‘final stage of the project, the ur-object’ is finished. Tyler Cowen’s post captures my thoughts pretty well. That means we are going to have to rethink our model of intellectual production. Will the PhD monograph survive? What does it mean for a young academic’s career? MIT’s President has some thoughts.
Fifth, and finally, we’re all clearly just at the start of all of this.

Interesting. I’m a researcher in renewable energy, and our company is now using Claude. Previously I used Perplexity (the free version). After one week using Claude, I can’t stand it: it doesn’t make work faster, it makes it slower and more confusing. Ask Claude a very specific question with guardrails and it will still throw the kitchen sink and the dog’s dinner into the answer. It comes up with stuff that isn’t true, and it exaggerates wildly. I’ve gone back to Perplexity. It isn’t always totally honest, but it is so much better for straightforward research, because it gives you all the sources and answers questions directly.
The challenge is building new research processes that can harness near-infinite knowledge capture, combining unprecedented breadth, depth, speed, and quantity without collapsing into chaos.
The risk that scale and acceleration begin to displace the foundations of good research is real. We need tools and processes that augment clear questions, purpose, conceptual clarity, critical thinking, critical distance, incubation, reflection, and the productive friction of sustained thought. Serendipity is important too. Not all insight emerges through optimisation.
The emerging skill is not simply accessing information but imposing constraints on abundance: orchestrating infinite flows of data in service of coherent knowledge production. Curation, judgement, analysis, reasoning, synthesis, and methodological discipline become even more important.
The central question is how to use AI to augment human inquiry without eroding coherence, integrity, and intellectual agency.
The answer, as you articulate above, Stephen, is that there is still much to test, negotiate, and learn as these practices evolve.
I would add that the false binary of AI as either universally transformative or universally corrosive is not especially helpful to researchers. The challenge is determining where and how these systems meaningfully contribute to the production of knowledge, and where human judgement, constraints, deliberation, and desirable friction remain essential.