About this transcript: This is a full AI-generated transcript of How Do AI Models Actually Think? [Dr. Laura Ruis] from Machine Learning Street Talk, published June 11, 2026. The transcript contains 13,389 words with timestamps and was generated using Whisper AI.
[00:00:00] Speaker 1: If I understand correctly, you created queries that resemble reasoning and ones which resemble some kind of like, you know, fact retrieval.
[00:00:09] Laura: And after that paper, I was left with this question: to what extent is it the scale that makes these models get better at these tasks? To what extent is that driving the performance, or how is it driving the performance? Is it just that the model is seeing more similar stuff and therefore can memorize more? Or is it really doing something more interesting and learning something sort of qualitatively different from more data or with more parameters?
[00:00:33] Speaker 1: If language models are doing something which is akin to approximate reasoning, what's the difference between that and formal reasoning?
[00:00:40] Laura: So I believe that in very controlled setups, we have already shown that connectionist models can do formal reasoning. So I think empirically and theoretically we have shown that they can do a form of systematicity or symbolic computation, although it's still limited. But the question with my most recent paper was, can it also learn to do something in that direction approximately from data in the wild? And I think it can. And my paper doesn't show that exactly. It just shows that it's doing something generalizable, that it can apply to many different questions. But intuitively, I think it would be possible.
[00:01:13] Speaker 1: I guess you would agree that agency could emerge even if we're not explicitly trying to make it emerge.
[00:01:20] Laura: Yeah, I think that's the interesting case. So there's this definition from Zach Kenton, from DeepMind. They also have a safety interest in agency. And a couple of years ago, they made this definition of agency that an agent is something that changes its policy when its actions affect the environment in a different way. You can kind of trivially make a system of LLMs in an environment, or something where the environment is also an LLM, such that it adheres to this definition. So I think the important thing is, when does something like that emerge from something as simple as next-token prediction? And that's kind of what I'm interested in.
[00:01:59] Speaker 3: So Tufa Labs is a new AI research lab I'm starting in Zurich. In a way, it is a Swiss version of DeepSeek. And first, we want to investigate LLM systems and search methods applied to them, similar to o1. And so we want to investigate, reverse engineer, and explore the techniques ourselves.
[00:02:23] Speaker 1: MLST is sponsored by CentML, which is the compute platform specifically optimized for AI workloads. They support all of the latest open source language models out of the box, like Llama, for example. You can just choose the pricing point, choose the model that you want. It spins up, it's elastic autoscale. You can pay on consumption, essentially, or you can have a model which is always working, or it can be freeze-dried when you're not using it. So what are you waiting for? Go to centml.ai and sign up now. Laura, it's amazing to have you on MLST. Welcome.
[00:02:57] Laura: Thank you. Amazing to be here.
[00:02:59] Speaker 1: Can you tell us about yourself?
[00:03:00] Laura: Yeah, sure. So I'm Laura. I'm a PhD student at University College London, supervised by Tim Rocktäschel and Ed Grefenstette. And I'm also part-time at Cohere. And I'm broadly interested in understanding language and its relation to human cognition, and how we can evaluate that in artificial intelligence. To what extent can, like, pillars of human intelligence also show up in artificial intelligence? Things like reasoning, both mathematical reasoning, social reasoning, that kind of stuff. And also, specifically, trying to understand how state-of-the-art models are doing what they're doing.
[00:03:37] Speaker 1: Very cool. I'm a huge fan of Cohere, Ed, and Tim.
[00:03:40] Laura: Nice.
[00:03:40] Speaker 1: So this is very cool.
[00:03:42] Laura: Me too.
[00:03:43] Speaker 1: Okay. So you've just written a paper. Now, there's this huge controversy. You know, I've been speaking with Subbarao, for example. You know, he calls LLMs approximate retrieval engines and o1 approximate reasoning engines. So he is saying they're doing a little bit of reasoning, whatever that means. But you've written this paper. It has generated loads of interest on socials. "Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models." Give us the elevator pitch.
[00:04:11] Laura: Yeah. So I was doing evaluation in language models, trying to understand how they were doing social reasoning before. And we designed a benchmark and evaluated models on their social reasoning skills. And after that paper, I was left with this question: to what extent is it the scale that makes these models get better at these tasks? To what extent is that driving the performance, or how is it driving the performance? Is it just that the model is seeing more similar stuff and therefore can memorize more and seems to have more capabilities? Or is it really doing something more interesting and learning something sort of qualitatively different from more data or with more parameters? And of course, the way we evaluated machine learning methods in the past is we separated test from train, but that's not possible anymore these days because models are just trained on everything. Test is in train now. So we wanted to understand what happens when language models are producing zero-shot reasoning traces. So, for example, for simple arithmetic, it can produce the steps to reach an answer. Is it kind of relying on having seen those exact steps before in training? Or is it doing something generalizable? Is it sort of taking the steps itself and getting to the answer? And that was like the motivation for this paper.
[00:05:35] Speaker 1: Very cool. So you used influence functions to do this analysis. Can you explain what they are?
[00:05:41] Laura: Yeah. Yeah. So I was very happy when I stumbled across that tool because it's this method from robust statistics that tries to answer a counterfactual question about the model. So the question it tries to approximate is: what if I take this pre-training document out of the dataset and I retrain the entire model? How does the behavior change? How do the model parameters change? And with that, the log likelihood of completions. And that's what influence functions estimate. And that's the tool we use to determine how pre-training data determines reasoning steps by models.
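Editor's note: the counterfactual Laura describes (remove one training document, retrain, see how a query's loss changes) can be illustrated on a toy model. The sketch below is not the estimator used in the paper; it assumes a small ridge regression, where the Hessian can be inverted exactly, and compares the classical influence-function estimate against actual leave-one-out retraining. All names (`fit`, `query_loss`, `x_q`, `lam`) and the synthetic data are illustrative assumptions. At LLM scale neither exact Hessian inversion nor retraining is feasible, which is why the estimate is "very approximate" there.

```python
import numpy as np

# Toy setup (assumption): ridge regression on synthetic data.
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.5 * rng.normal(size=n)

# One held-out "query" whose loss we track.
x_q = rng.normal(size=d)
y_q = x_q @ theta_true + 1.0


def fit(X_used, y_used):
    """Minimise (1/n) * sum of 0.5 * squared errors + (lam/2) * ||theta||^2.

    The 1/n factor is kept fixed even when a row is dropped, so removing
    a point matches down-weighting it by 1/n.
    """
    H = X_used.T @ X_used / n + lam * np.eye(d)
    return np.linalg.solve(H, X_used.T @ y_used / n)


def query_loss(theta):
    return 0.5 * (x_q @ theta - y_q) ** 2


theta_hat = fit(X, y)
H = X.T @ X / n + lam * np.eye(d)            # Hessian of the training objective
grad_q = (x_q @ theta_hat - y_q) * x_q       # gradient of the query loss
grads_train = (X @ theta_hat - y)[:, None] * X  # per-example training gradients

# Influence estimate of removing example i on the query loss:
#   (1/n) * grad_q^T H^{-1} grad_i
estimated = grads_train @ np.linalg.solve(H, grad_q) / n

# Ground truth: actually retrain without each example.
actual = np.array([
    query_loss(fit(np.delete(X, i, axis=0), np.delete(y, i))) - query_loss(theta_hat)
    for i in range(n)
])

print("correlation:", np.corrcoef(estimated, actual)[0, 1])
```

In this toy case the estimate tracks retraining closely because the loss is quadratic and the Hessian is exact; neither holds for a large language model.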
[00:06:21] Speaker 1: Very cool. So if I understand correctly, you created queries that resemble reasoning and ones which resemble some kind of like, you know, fact retrieval.
[00:06:31] Laura: Yeah.
[00:06:32] Speaker 1: And you kind of compared what the influence functions did on those queries.
[00:06:36] Laura: Yeah, exactly. So we used this factual task as a sort of grounding because influence functions are very approximate. We don't actually retrain the model for every data point because that's going to be too expensive. So you want to have some kind of idea that what you're finding actually makes sense intuitively. And factual retrieval is a natural task for that because for those factual questions, the only way to answer them is to retrieve the relevant documents. And we compare this to the influence scores for the reasoning traces. And those tasks are simply like the zero-shot reasoning kind of prompts. And the model generates the reasoning steps itself. So if the model were to be doing retrieval for those types of reasoning, it would really have to retrieve each reasoning step from the pre-training data because it outputs zero-shot itself, the reasoning traces. I don't give it any examples.
[00:07:35] Speaker 1: So if I understand correctly, the intuition behind your work is that when we are doing, you know, fact retrieval, it seems quite focused. So it's just going to a document and it's retrieving the fact. And when it's doing reasoning, it seems very diffuse. It's looking at loads and loads of documents that have reasoning-like processes in them.
[00:07:55] Laura: Yeah. Yeah, that's sort of the abstraction you can take from it. But of course, in reality, even when it's doing factual retrieval, there's much more going on because it needs to adhere to syntax. There's all kinds of like stylistic elements that are going on. And still importantly, like I think the most striking finding from this paper to me was that for factual retrieval, whether a document is influential for a factual question is not predictive of its influence for another factual question. So they rely on very distinct sets of documents, whereas for a reasoning question, if it underlies the same task, if it's both, for example, calculating the slope between numbers, but for completely different numbers, the influence over the documents is very similar. So the same documents can influence these questions in the same way that we didn't see for factual retrieval. And that is really the basis for why we call it this procedural knowledge.
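Editor's note: the comparison Laura describes can be read as a correlation between two queries' influence scores over the same set of documents. The sketch below shows only the shape of that computation. The scores are synthetic numbers constructed to mimic the pattern she reports (they are not results from the paper), and `influence_correlation` is a hypothetical helper name.

```python
import numpy as np


def influence_correlation(scores_a, scores_b):
    """Pearson correlation between two queries' per-document influence scores."""
    return float(np.corrcoef(scores_a, scores_b)[0, 1])


rng = np.random.default_rng(0)
num_docs = 1000

# Synthetic assumption: two slope queries with different numbers share a
# common per-document component (the "procedure"), plus query-specific noise.
shared_procedure = rng.normal(size=num_docs)
slope_query_1 = shared_procedure + 0.5 * rng.normal(size=num_docs)
slope_query_2 = shared_procedure + 0.5 * rng.normal(size=num_docs)

# Synthetic assumption: two factual queries depend on unrelated documents.
fact_query_1 = rng.normal(size=num_docs)
fact_query_2 = rng.normal(size=num_docs)

reasoning_corr = influence_correlation(slope_query_1, slope_query_2)
factual_corr = influence_correlation(fact_query_1, fact_query_2)
print(f"reasoning pair: {reasoning_corr:.2f}, factual pair: {factual_corr:.2f}")
```

A high correlation for the reasoning pair and a near-zero one for the factual pair is the signature she calls procedural knowledge.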
[00:08:53] Speaker 1: Very interesting. So for the folks at home, like an example of a reasoning task, it'd be like two-step arithmetic, calculating slopes, solving linear equations. And factual retrieval might be something like, what is the tallest mountain?
[00:09:05] Laura: Yeah, so what is the tallest mountain? What is the largest ocean? In which year did the Beinecke library open, which is the Yale library? Those are examples of factual questions. And then we have three different reasoning tasks. One is simple two-step arithmetic, so you can imagine seven minus four times eight. That's like two-step arithmetic. You first have to calculate seven minus four and then do three times eight. Calculating the slopes requires, I think, more steps, three steps. You have two different points in a 2D space and you have to calculate the difference between the y points and the x points and divide them by each other to get the slope between two points. And then the linear equations task is you have a linear equation and you have to solve it for x, which also requires three simple arithmetic steps.
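Editor's note: for reference, the three reasoning tasks can be written as small procedures. This is a sketch under assumptions: the `a*x + b = c` form for the linear equation and the specific example numbers other than "seven minus four times eight" are illustrative, and the step counts here need not match how the paper counts steps.

```python
from fractions import Fraction


def two_step_arithmetic(a, b, c):
    """Compute (a - b) * c: first the subtraction, then the multiplication."""
    difference = a - b
    return difference * c


def slope(p1, p2):
    """Slope between two points in 2D: difference in y divided by difference in x."""
    (x1, y1), (x2, y2) = p1, p2
    dy = y2 - y1
    dx = x2 - x1
    return Fraction(dy, dx)


def solve_linear(a, b, c):
    """Solve a*x + b = c for x (assumed equation form)."""
    rhs = c - b
    return Fraction(rhs, a)


print(two_step_arithmetic(7, 4, 8))   # (7 - 4) * 8
print(slope((1, 2), (3, 8)))          # (8 - 2) / (3 - 1)
print(solve_linear(2, 3, 11))         # 2x + 3 = 11
```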
[00:09:56] Speaker 1: So what we are observing then is that when we're doing reasoning tasks, the models are synthesizing knowledge in some kind of abstract way from all of these documents.
[00:10:09] Laura: Yeah.
[00:10:09] Speaker 1: Is that reasoning?
[00:10:11] Laura: I would say yes, first of all. But I'm not so restrictive in what I call reasoning. As in, I don't think only formal step-by-step logical reasoning is reasoning. I think deep neural networks can do that kind of reasoning, but our paper doesn't show that that is what's going on here. But I think the important point is that it is seemingly taking knowledge from many different documents and applying it to the same task. So that's a generalizable strategy, and it's using that to generate step-by-step knowledge that solves some kind of problem, and that is reasoning to me. But that doesn't mean that it has any bearing on other forms of reasoning, like inductive reasoning, for example.
[00:11:06] Speaker 1: Yeah. I mean, I suppose we can go into what knowledge is. Dagar and I had this discussion. It's said that it's a justified true belief, but he said it's a justified useful belief. So you could say in some sense that the templatization of the information in these documents is kind of like creating useful knowledge.
[00:11:25] Laura: Yeah. And what is this distinction between useful and true?
[00:11:30] Speaker 1: Well, I think it's about whether we can know something is true just based on a bunch of data in a corpus... There's always this epistemic gap, isn't there? Like, can we have models that actually give us facts?
[00:11:43] Laura: Yeah. Yeah, true.
[00:11:45] Speaker 1: The really interesting thing from reading your paper is that you found that when doing reasoning, documents like, you know, Stack Overflow and code and stuff like that had a lot of influence on the reasoning process. Yeah. And that's weird, isn't it? Because it's code, it seems different.
[00:12:06] Laura: Yeah.
[00:12:07] Speaker 1: How do you think about that?
[00:12:09] Laura: That's a good question. And I spent a lot of time looking into those results. I really spent days trying to understand what was going on there because I think, importantly, we find a lot of evidence for documents influencing similar reasoning questions, like one document influencing many slopes questions and another document influencing many linear equation questions. But the only documents that seem to be influential both positively and negatively for all types of reasoning are code. And I tried to look into what it is about code that makes it so influential, and I couldn't find any patterns. And importantly, we don't only find that it's good for reasoning, but also bad in certain cases. So it was, of course, conventional wisdom that code helps downstream capabilities. OpenAI knows that, Anthropic knows that, they initialize their models with purely code-trained models, but we don't really know what's going on there. And that's essentially what I'm working on now, trying to understand that better because, yeah, I couldn't clearly find patterns in the data that we found in this paper.
[00:13:19] Speaker 1: It's weird, isn't it? Because code feels like the perfect materialization of human cognitive processes, right? You know, we're, we're solving problems and then we manifest that in code. Does that have implications for how we design data sets for training these models?
[00:13:35] Laura: Yeah. Yeah, I think it does. The trend is adding more and more code into the pre-training corpus, right, for models to be trained on. So I think it definitely has implications for that. I think, importantly, a thing that we find in this paper is that it seems like the model can learn to output these step-by-step reasoning traces from descriptions of procedures in code that are really purely descriptive. So a piece of Python code that calculates the slope between two points is highly influential for actual prompts asking the model to do that in math, in text. And if that is something that generalizes, if you can train a model on procedures and it can, from that, learn to execute those procedures, I think that can be pretty influential for how we should, for example, synthetically generate data. It could be helpful to generate lots of procedures rather than only step-by-step applications of those procedures, or, you know, focus a bit more on both.
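Editor's note: the distinction Laura draws is between a document that describes a procedure and a document that applies it step by step to particular numbers. The sketch below contrasts the two as plain text. Both document formats, and the names `PROCEDURE_DOC` and `application_doc`, are hypothetical illustrations, not the paper's data or a recommended synthetic-data recipe.

```python
# A purely descriptive "procedure" document: code that states how to compute
# a slope without applying it to any particular numbers.
PROCEDURE_DOC = '''def slope(x1, y1, x2, y2):
    return (y2 - y1) / (x2 - x1)
'''


def application_doc(x1, y1, x2, y2):
    """Build a step-by-step "application" document for one concrete question."""
    dy = y2 - y1
    dx = x2 - x1
    return (
        f"What is the slope of the line through ({x1}, {y1}) and ({x2}, {y2})?\n"
        f"Step 1: y2 - y1 = {y2} - {y1} = {dy}\n"
        f"Step 2: x2 - x1 = {x2} - {x1} = {dx}\n"
        f"Step 3: slope = {dy} / {dx} = {dy / dx}"
    )


print(PROCEDURE_DOC)
print(application_doc(1, 2, 3, 8))
```

One procedure document covers every choice of numbers, while each application document covers exactly one, which is one way to read her point about generating procedures as well as applications.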
[00:14:44] Speaker 1: I suppose because of this diffuse nature. So there are many, many examples in code of solving slopes and stuff like that. In a way, that's a form of robustness. Do you see what I mean? So in a way, that gives us many, many ways of doing that type of problem.
[00:15:01] Laura: Yeah. Yeah. I see what you mean. If you don't just see the application, the step-by-step reasoning, but you also see the procedure, that gives you more robustness to different ways of expressing it or.
[00:15:12] Speaker 1: Yeah. Or even just in terms of having more redundancy or having many, many expressions of the same thing. It would make it robust to, you know, potentially different selections of the data set. It would still work. Whereas perhaps, in fact retrieval, if the fact isn't in the data set, it's just not going to work.
[00:15:31] Laura: Yeah. Yeah. And that's definitely true. I mean, in that sense, it's a form of abstraction. And that's, it can generalize better.
[00:15:37] Speaker 1: So on this abstraction thing, I was speaking with the guy who wrote the GSM-Symbolic paper earlier. And Douglas Hofstadter says that an abstraction is a bag of analogies, right? So, you know, we have these concepts in our mind, like the concept of a chair. It's really difficult for us to describe what a chair is because I can give you like a million different descriptions, or even the letter A. There was a book he wrote called Surfaces and Essences, where he was talking about all the different ways that A's could be written. So it might be the case that our brains don't really have these high-level abstractions the way we think they do. Like actually all of these neurocircuitry activation pathways are firing and we sort of know an abstraction via a million different perspectives. And do you think that could actually be analogous in some way to the way a neural network works?
[00:16:30] Laura: Um, yeah, and I think that's also how language works. I mean, I didn't come up with this, Wittgenstein did, but he wrote this whole book where he was just, page after page, trying to show that you can't define a thing. There will always be a situation where it doesn't exactly apply like that, right? It's all fuzzy, and meaning is use, essentially. And it can change based on context and stuff like that. And I think that's the strength of language, this kind of abstraction that is not formal or purely symbolic, but very fuzzy, in that there are no clear boundaries of meaning or concepts or abstractions.
[00:17:18] Speaker 1: Yeah. Yeah.
[00:17:19] Speaker 1: We were speaking about Montague, so he argued that, you know, we should model language like it's a formal language, and of course it's gnarly and it's very, very kind of constructive and whatnot. Yeah. Do you think LLMs are actually, you know, an appropriate tool given that natural language is not a formal language?
[00:17:39] Laura: Yeah, I think that is what we have seen in the past couple of years, because Montague tried to formalize language and that famously, I mean, didn't lead to the most simple formalization, right? It's very hard to formalize language. I think Montague came up with this very strict form of compositionality and that has been very useful, because there is definitely something in language where the meaning is composed from the parts, that's definitely true. But the very strict way in which Montague defined it is probably also not right. Probably, if you want to make this strict form of compositionality work in language, you have to come up with really roundabout functions where the meaning of a word is a function of the whole sentence or something and it goes back into the word itself. Whereas if you take a more lenient form of compositionality or systematicity, like Fodor came up with, that roughly just says there's something predictable about the way we use language. If you teach someone a new word, like flips, and you say, I had really good flips last night, then you can immediately sort of estimate that this word is probably food or something, and it was at night, so maybe it was a dessert, and you can use it in many different sentences. And that is a form of compositionality and systematicity that almost seems like it's formal and predictive. And it seems like you can describe that formally, but actually we've tried and that hasn't really worked. That's probably precisely why language models work better, because they can approximate such systematicity, but they are not pure formal systems.
[00:19:29] Speaker 1: Um, so you've said that language models could develop a causal understanding of the world. And this is really interesting. I suppose it comes back to semantics in general. So, you know, John Searle said that the reason why humans have semantics is basically because we're physically and causally embedded in the world. Right. And lots of linguists like Piantadosi are talking about things like conceptual role semantics. And, you know, there's a whole intellectual school of thought now around how we could build semantics just in language models. What say you?
[00:19:57] Laura: Yeah, I love Piantadosi's work and it has inspired me in many ways. And I agree with him, but I would say he probably also, or I don't know, I shouldn't speak for him, but there is of course a role for reference in the world. When children learn language, they start off by, and this again is something I've heard from Piantadosi, there are phases of language learning where initially a princess is just a nice woman that has nice dresses and is always kind to you. And the child can point to it in the world and it has a clear reference, but as language evolves and as the child becomes an adult language speaker, this reference becomes less and less important and it becomes more and more abstract. And now as an adult language speaker, I can talk to you about the COVID vaccine, but I would not be able to pick it out. If you give me a bunch of substances and you ask me which of these is the COVID vaccine or what is it made of, I have no idea. And there are many examples, of course, of things we discuss that don't have any reference in the world, but the COVID vaccine is just one where it does have a reference. But I don't know how to pick it out. And I still think I understand what a COVID vaccine is and have some sense of its meaning. But my meaning could also be further developed if I knew how to pick it out in the world, right? That means I understand it better and I have a better world model.
[00:21:30] Speaker 1: How much of a sharp boundary do you think there is between, um, these kinds of facts that you're talking about and reasoning?
[00:21:37] Laura: Probably not really, because when I was thinking about factual retrieval and I was building these tasks, I often struggled to come up with pure factual questions. So you can imagine if you ask someone, what is the largest ocean in the world, maybe this person is retrieving all the oceans in the world and retrieving their sizes and comparing them and then saying, oh, it's the Pacific Ocean. And then they did some reasoning. So there's no clear boundary. And I really tried to make these questions very factual. Like, in what year did the Beinecke library open? Again, you could come up with a way in which you could reason about the answer, but you really need to have some atomic knowledge to answer this question. But yeah, it's all fuzzy.
[00:22:25] Speaker 1: So coming back to Searle, you know, there were so many replies to the Chinese room experiment, like the robot reply, the systems reply, you know, this kind of stuff. And I guess at some point, when does mimicry, functional mimicry, just become so good that it's a distinction without a difference?
[00:22:41] Laura: Yeah, that's a good question. And I think that's why it's important that people like François Chollet come up with things like ARC, right? His definition of intelligence is really about acting in novel ways and using your knowledge in novel situations, and a system that's just mimicking could never do that.
[00:23:05] Speaker 1: Could we design a way of measuring the depth of understanding, whatever that would mean?
[00:23:10] Laura: We're trying. And I think evaluation is one of the hardest parts of the field. And we're, we're like, I, I, this week I heard a funny characterization of moving the goalposts. Someone characterized it in a positive way. And I, I totally agree with that. They said, um, people are constantly moving the goalposts and people are saying that as something that's, that's, uh, bad, but actually what we're doing is collectively refining our definitions. So first we're saying like, oh, if a system can do chess, it must be intelligent, right? But then it could do chess. And then we're like, oh, that's not all we meant actually. Wait, let it, let me move the goalposts. And that's, that's not a problem. It helps us refine our definitions and it helps us. We, no one knows what intelligence exactly is, but designing more and more complex benchmarks and keeping, keep on moving the goalposts, um, makes for a clearer view of what it actually is and what we're all talking about.
[00:24:06] Speaker 1: Yeah. I think experience is helping us carve the space up a little bit better in our minds. Like, for example, I think we used to have quite a puritanical view about, you know, understanding and reasoning that, you know, you, you either you're reasoning or you're not reasoning. And I think what we're starting to see with these models is, you know, there's this Swiss cheese problem that sometimes you're in a hole in the Swiss cheese and it goes bananas and sometimes it's retrieval and sometimes it's reasoning. It's almost like there are these different modalities of function and sometimes it's doing more reasoning and sometimes it's doing less.
[00:24:36] Laura: Yeah. Yeah, exactly. I mean, that's also how I think about them. There's this view that if you can show that a model trips up, it necessarily means it cannot reason, but I don't think that's true. I think it's such a complex system, and if you prompt it in a certain way, it might use a completely different sort of function or program, or however you want to conceptualize what it's doing, than if you prompt it in another way. And if you give it tokens that are so foreign to it that it fails to reason over them, that doesn't mean that it cannot do those actual reasoning patterns and the rules that underlie that kind of reasoning, but it's just a limitation of the system. And it is a statistical model.
[00:25:20] Speaker 1: So you focused on specific types of mathematical reasoning. Do you think that they would transfer to other forms of reasoning, like, you know, solving ethical dilemmas or something like that?
[00:25:31] Laura: Yeah, that's a good question. Yeah, I think they do. But of course, there is a lot, there is, reasoning is such a multifaceted concept that mathematical reasoning cannot nearly cover it all. So mathematical reasoning is very formal, it has rules, that's why we chose it, it's, it's the one, this, the type of reasoning we look at is so simple that you can actually find the answers in the pre-training corpus. But there are forms of reasoning for which we, like, inductive reasoning, you can't find the answers if you, if you only see, if you only observe white swans, can you deduce from, or induce from that, that black swans don't exist? I don't know, it's, that's a form of reasoning that actually underlies most of science and that is more difficult to see if a language model can do that. But I think fundamentally probably, probably can. It becomes just in such cases much more important to do some kind of verification of, like, what's going on? How, why is it making this induction? And can we, like, do experiments to verify it?
[00:26:40] Speaker 1: If language models are doing something which is akin to approximate reasoning, what's the difference between that and formal reasoning? And do you believe in principle that connectionism on its own could scale up to formal reasoning?
[00:26:53] Laura: Yeah, I think it can. And I think my recent paper gave, so I believe that in very controlled setups, we have already shown that connectionist models can do formal reasoning, that they literally can learn to apply systematic rules in a way such that they achieve 100% accuracy on novel problems. There is a good paper by Lake and Baroni in Nature that does this. There are other papers that show that, for example, by Andrew Lampinen, "Passive Learning of Active Causal Strategies," that show if you set up the problem in such a way that the model can learn to do the task, as opposed to latching on to sort of unimportant things in the data, it can learn to apply tasks in novel situations. So I think empirically and theoretically we have shown that they can do a form of systematicity or symbolic computation, although it's still limited for sure, like it can't handle completely novel tokens. But the question with my most recent paper was, can it also learn to do something in that direction approximately from data in the wild? Because language models are not trained on data that is so carefully curated that the only way you can make the loss go down is to learn the underlying rules, because that's what these papers all do. And the question is, can it also then learn to do something that's like formal reasoning or symbolic reasoning? And I think it can. And my paper doesn't show that exactly. It just shows that it's doing something generalizable, that it can apply to many different questions. But intuitively, I think it would be possible.
[00:28:37] Speaker 1: So there's always been this notion of a gap that people talk about, especially in respect of creativity, adaptability, dealing with novelty and so on. In fact, many people think that the definition of intelligence is, you know, dealing with novelty. So there's always this thing that we can do combinatorial creativity, right? So we can do reasoning by like recomposing bits we already have. But people say that this inventive creativity, like being able to, you know, train on all the data up to 1945 and then invent some new theorem that came after that. People feel intuitively that the models wouldn't be able to do that. Yeah. What do you think?
[00:29:13] Laura: That's really like the goal, right? Like that kind of stuff, it will be really cool. I don't feel like current language models can do that, but I don't feel it's technically impossible. Even in the current regime, if we were to find so much data that the model can learn the underlying causal data-generating process that is relevant to come up with novel information, then it can do that. But of course, we have used most of the data that were created over the past couple thousand years, or at least we're trying to. And it's probably not feasible to scale up to such intelligence in this way. But yeah, I think it's not theoretically impossible. It kind of gets at whether Einstein came up with some stroke of genius that doesn't compose stuff he has seen before, or whether he actually also stands on the shoulders of other scientists and reasoned about things for a long time and used that to come up with new knowledge. And I think it's probably the latter. And that that's not so special that we cannot. I mean, OK, I don't want to say that Einstein is not so special here, but probably we can in some form recreate that process.
[00:30:33] Speaker 1: So Tim Rocktäschel, he's got some great work on open-endedness and creativity and whatnot. And it's interesting because Ilya Sutskever, he gave a talk at this conference and he said that we are hitting a data wall. And to me, that doesn't pass the sanity test, right? Because if you think about it, there are an infinite number of ways you can make more data. You can transform the data we already have and you can generate lots of data. But this is where it gets into Tim Rocktäschel's domain, which is that it's not just about generating more data. It's about generating interesting data.
[00:31:06] Laura: I find Genie and stuff incredibly interesting. And I agree that the intelligence of the system is purely limited by the complexity of its environment. So I think that's an interesting approach. And I also think that, yeah, so I think scaling up data helps because it makes it less and less possible for a model to latch on to spurious correlations. It's going to be more and more useful to learn, like, the causal world model that generates this data, the more data you get. Because it's likely going to be less semantically similar than what you've seen before. But if you were to somehow be able to select, from all this data that we have, data in a way that's sufficiently diverse for you to learn this causal mechanism quicker without seeing, I don't know, trillions of tokens, I think that should maybe also be possible. And I think that is informed by these controlled studies that show that you can train a model to do something systematic in one task. But how do we train a model to do something systematic in as many tasks as we want language models to do?
[00:32:19] Speaker 1: What's your philosophy on scaling in general? So do you think that if we just scale current approaches up that we will get dramatically better results? Or do you think we're missing something significant?
[00:32:30] Laura: I am not going to bet against scaling, because that seems scary. It has worked pretty well. But, yeah, so I think scaling is cool. I think there are issues with it. And there's probably more data-efficient ways to do it. Just because, theoretically, you can train a model to do many different complex tasks with next token prediction doesn't mean it's the best way to do it. And maybe there's something about intervening on an environment and generating your own data that can help there and that can make models more data efficient. And I could see how that could be important in the future. And I think Ilya also mentioned, not specifically that, but agency or agents. And maybe that's getting at the distinction between sort of passive learning and active interventional learning.
[00:33:25] Speaker 1: We should save the agency discussion for a bit later because we've got lots of things to say on that. But why don't we just talk a little bit about that Fodor and Pylyshyn paper from 1988? Yeah, so this was their famous connectionist critique. And they said that the way that humans think is very formal. You know, we have these rules and we have this compositionality. So we can generalize Mary loves John to Mary loves Jane. And we can also take a sentence and we can kind of invert it. We can decompose it back to all of the constituent parts and we can, you know, figure out what things mean. And neural networks on their face, they don't do that explicitly, but perhaps they might do it implicitly. What are your reflections on that?
[00:34:08] Laura: Yeah, I think Fodor and Pylyshyn's argument has definitely stood the test of time. Although there's been some theoretical work showing that it's not impossible in a connectionist regime to learn symbolic functions, like Smolensky's work in the 90s on tensor product representations. And that was a theoretical work showing that you can do some symbolic computation in the sub-symbolic regime that connectionist networks represent. But the argument nonetheless stood the test of time because this systematicity that we also spoke about earlier is definitely something that's present in language. And that is necessary to explain if you want to understand how humans can produce something that's so varied with so few examples or, you know, memory. And it was a challenge for, I don't know, 30 years or something like that. And it probably still relates to this concept of intelligence as being able to process novel information. But I think there's been a lot of empirical work showing now that actually sub-symbolic models like neural networks can do symbolic computation, albeit not explicitly. I mean, yeah, explicitly in the sense that they can maybe output some symbolic computation in the form of language and explicitly reason over that. That's probably a good idea. But probably they can also do it implicitly.
[00:35:40] Speaker 1: Yeah, I suppose that there was a bit of a theme of just having strong theoretical tools, especially around that time. So, you know, this idea of productivity, being able to generate an infinite number of sentences. I mean, Chomsky said the probability of a sentence is an oxymoron. It just doesn't make sense to say that. And it certainly feels, as you say, that our language is compositional. And again, Chomsky said it's a language of thought. So if our language is compositional, then surely our mind must be compositional. So it might have just been like almost an intuition pump to reason about how our brains work.
[00:36:14] Laura: Yeah, I do think, though, that so this gets at this question of whether language is thought, right? And I think that has been pretty rigorously debunked at this point. And I think maybe language is as useful to us precisely because our thought is not compositional, because we can use it as a compositional tool that is maybe a bit harder for us to do systematically in our brain. And, I mean, there's been work by Ev Fedorenko in 2020, for example, showing that people with aphasia can still be chess grandmasters. So when your language system is completely messed up, you can still reason perfectly fine, which kind of, in my view, debunks the theory that language is thought.
[00:37:03] Speaker 1: You said to me earlier, like, well, what's the big deal? Why do we need to have invertibility? You know, and when I say invertibility, I think I'm kind of saying decomposition. So, you know, they were talking about compositionality, but I think decomposition is really important. That's being able to go back to the constituents. And it's not only about being able to, like, explain what I'm thinking. It's also about parsimony and reuse. So we see in mech interp, for example, in that Scaling Monosemanticity paper, that, like, the representations for the Golden Gate Bridge, it was scattered throughout all of these circuits throughout the neural network. And it feels like, certainly at a psychology level, it feels like our brain doesn't work that way. But maybe that's just a bit of an illusion.
[00:37:47] Laura: Yeah. It's hard for me to say, as, I mean, I don't want to comment on neuroscience because I have no idea about that. But what I could say about this is that it seems to me that it's pretty useful that the model is representing it in this way. And maybe that's also, as in, it's doing a very distributed representation, right? And that's essentially the core reason why people in the 90s believed in connectionist models, this distributed representation that can, where all neurons can essentially light up for all different tasks, as long as there's some shared structure. And that makes them so flexible. And that makes them so good in novel situations, actually.
[00:38:28] Speaker 1: I suppose there's a bit of a broad theme as well. So, certainly 20 years ago, we used to design AI systems with explicit strategies. So, planning was an explicit thing. Reasoning was an explicit thing. Even certain architectures like DreamCoder, Kevin Ellis's DreamCoder, had an explicit wake dream state. So, when you dream, you kind of expand your hypothesis space. And then, like, you know, when you're awake, you kind of select the ones that work. And neural networks do this expansion and collapse all of the time. But what we're seeing, though, with the newer architectures is that they kind of do the same thing, but they do it more and more implicitly. Like, we don't hard-code it in.
[00:39:10] Laura: Exactly. And that's what we've learned the past couple of years, that that's the way to go, probably. Because that's what we've learned from the LSTM to transformers, which is quite funny, actually. One of my first papers is on compositionality. We designed this benchmark together with Brenden Lake and others where we held out systematic experience in the data. And we showed that a human can easily do this, but an LSTM couldn't. And this was all pre, like, transformers, LLMs, and ChatGPT and that kind of stuff. And someone this week told me that actually a transformer gets almost 100% performance on most of the tests we designed in that paper. Not all, but most. And that's just one example of the transformer being a much better fit for compositional tasks than LSTMs. And the lesson maybe we can take from that is that LSTMs have this explicit recurrence, which seems very useful, right? Because there's clearly a recency bias. Clearly what we have just talked about is more relevant than what we, I don't know, you and I talked about last time we saw each other. But if this recency bias is so obvious, why would you build it in? Because the model can easily learn it from language. And this is what we've learned in the past couple of years, that if something can be learned, don't build it in.
[00:40:41] Speaker 1: Or use xLSTMs.
[00:40:43] Laura: Yes. Yes. You spoke to Sepp, so. Spoke to Sepp the other day, yeah.
[00:40:47] Speaker 1: He was saying about their new exponential gating scheme allows them to kind of, like, you know, overwrite their memory.
[00:40:52] Laura: Yeah, very cool.
[00:40:53] Speaker 1: It's kind of weird though, isn't it? Because there is this notion, I was saying to him, like, when are we going to see industry adoption of xLSTMs? Yeah. And I think in industry, the perception is it kind of doesn't matter. Like, it's just about scale.
[00:41:06] Laura: Exactly. But that's the thing also with OpenAI. Like, they don't care about these, like, compositional, like, is it out of distribution? Are we holding out the right things? Has it seen this before? No, they're just like, we're going to make it in distribution and we're going to scale it up. And that's kind of their genius, essentially. No matter what the architecture is, no matter how much it looks like the brain or why it should theoretically work better than something else. If you can, you know, use more FLOPs, it's better.
[00:41:38] Speaker 1: Let's quickly talk about Smolensky. I always mispronounce his name, so I'm going to say it very slowly. So around 1990, he, I guess it was a response to this Fodor and Pylyshyn thing. And he said that with these, you know, connectionist models, you can still implement the essential capacities of symbolic processing, you know, such as representing variable bindings and structured data and compositional operations. What did he propose?
[00:42:03] Laura: So he proposed a mathematical framework for variable-value binding. And that's like this very intuitively symbolic computation, right? No matter what the value is, the variable can take it and you can do processing on it and the results will be reliable and the same. And yeah, Fodor and Pylyshyn said connectionist models can't do that, and that produced a decades-long back and forth between connectionists and symbolicists. And Smolensky gave this answer with the tensor product representation to say, you know, like, look, you can actually represent variable-value binding in a purely sub-symbolic connectionist way. And that's what he showed in tensor product representations, where you represent the variable and the value both in a distributed sub-symbolic way and you can do processing on them and they all become embedded in this continuous space, this distributed space. But then you can still extract the value and the variable after processing. Unbinding is what they call that.
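As a rough illustration of the variable-value binding and unbinding described in this answer, here is a minimal pure-Python sketch. It is a toy under stated assumptions, not Smolensky's formulation: the role vectors, the filler vectors, and the helper names `outer`, `encode`, and `unbind` are all made up for the example, and the roles are chosen to be orthonormal so that unbinding is exact.

```python
# Toy tensor product representation (TPR): bind fillers (values) to roles
# (variables) with outer products, superimpose the results, then unbind.
# All vectors and helper names are illustrative assumptions.

def outer(u, v):
    """Outer product of two vectors as a nested list (len(u) x len(v))."""
    return [[ui * vj for vj in v] for ui in u]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Roles: three sequence positions, as distributed (not one-hot) vectors.
# They are orthonormal, which is what makes exact unbinding possible here.
ROLES = [
    [1 / 3, 2 / 3, 2 / 3],
    [2 / 3, 1 / 3, -2 / 3],
    [2 / 3, -2 / 3, 1 / 3],
]

# Fillers: distributed token vectors.
FILLERS = {
    "Mary": [0.5, 1.0, -1.0, 0.0],
    "loves": [1.0, 0.0, 0.5, 2.0],
    "John": [-1.0, 0.5, 0.0, 1.0],
}

def encode(tokens):
    """Sum of role-filler outer products: one matrix holds the whole sequence."""
    tpr = [[0.0] * 4 for _ in range(3)]
    for position, token in enumerate(tokens):
        tpr = add(tpr, outer(ROLES[position], FILLERS[token]))
    return tpr

def unbind(tpr, role):
    """Recover the filler bound to `role` by contracting the TPR with it."""
    return [
        sum(role[i] * tpr[i][j] for i in range(len(role)))
        for j in range(len(tpr[0]))
    ]

sentence = encode(["Mary", "loves", "John"])
recovered = unbind(sentence, ROLES[2])  # ask: what sits in position 2?
print(recovered)
```

Every entry of `sentence` mixes information from all three tokens, yet contracting with a role vector returns the filler bound to that position, up to floating-point error.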
[00:43:08] Speaker 1: So what were the drawbacks with that approach? And also, I think there's a bit of a leap of faith here that neural networks could in some way approximate what he was talking about.
[00:43:20] Laura: Well, and I didn't read this paper, but Tom McCoy published a paper together with Smolensky, I think, that's titled RNNs implicitly learn tensor product representations. So that seems to indicate that they can, but that's just based on the title. I think that one has been on my reading list since I found out about the tensor product representations. But you're right, like, what are the limitations of this method? It's a purely theoretical argument, right? He's saying to Fodor and Pylyshyn, look, actually, you can do this. That doesn't mean that it's practical. That doesn't mean that it scales. Tensor products the way he proposed in the 90s don't scale at all because it explodes in the number of variables that you're representing. So let's say variables are positions in sequences and values are the tokens. Then the tensor product representation, I think, is squared in the, or no, like, explodes in the number of positions that you're trying to represent and in the number of tokens. So that's not feasible. I think Smolensky is working on this at Microsoft, so I'm sure he's working on making it more scalable. But another thing that I took away from reading that paper is that to actually get this value back from this distributed representation, something needs to be linearly independent, like the rows in the matrix or something like that need to be linearly independent. And that seems like a very hard restriction to me that probably won't naturally arise. Or maybe it would, because people also tell me that if you randomly sample, it's almost always linearly independent in high dimensions. So maybe actually that's not the big limitation, but yeah, the way he proposed it back then wasn't that scalable.
[00:45:10] Speaker 1: Yeah. So these tensor outer products composed of these roles and fillers, and apparently the roles required quite a lot of hand engineering and yeah, we've got this combinatorial explosion problem, but anyway, it's interesting. It's a potential way forward. Yeah. Okay. Laura, where does agency fit into all of this? And just to frame the question a little bit, some people are really worried about agency. I was speaking with Bengio the other day and he said that agency is really bad. You know, it's going to lead to these things controlling their own goals and it could be very dangerous and whatnot, and we should strip away all agents.
[00:45:52] Laura: Yeah. Yeah. No, I totally agree with that. It's like, if you think about an intelligent system that's also an agent, or just like a random human, it can be very dangerous, right? And probably agency is a large part of that. So if you have two systems that are otherwise completely identical in capabilities and one is an agent and the other is a tool, I would prefer the tool. The thing is just that I'm not so sure if it's possible to reach an interesting form of intelligence without the notion of agency. So my interest in this question has just been like, how can we define this concept and how can we detect whether it's present in a system? And that's a pretty difficult question, I think.
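The two points in this answer, the size of the representation and the linear-independence requirement, can be sketched in a few lines of pure Python. This is a hedged toy, not an analysis from the paper: `tpr_size` and `rank` are hypothetical helpers, the example sizes are arbitrary, and the size formula assumes one role vector per position with role dimension at least the number of positions, since n linearly independent vectors need at least n dimensions.

```python
import random

def tpr_size(n_positions, filler_dim):
    """Entries in a role-by-filler matrix, assuming one linearly independent
    role vector per position (so the role dimension is >= n_positions)."""
    return n_positions * filler_dim

def rank(rows, tol=1e-9):
    """Matrix rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rk = 0
    for col in range(len(m[0])):
        if rk == len(m):
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rk, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][col] / m[rk][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

# Size grows with both the number of positions and the filler dimension.
print(tpr_size(n_positions=1000, filler_dim=50000))

# Randomly sampled role vectors in a high-dimensional space: 16 roles, 64 dims.
random.seed(0)
random_roles = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(16)]
print(rank(random_roles) == len(random_roles))
```

A full-rank result means the sampled roles are linearly independent, which matches the remark that random sampling in high dimensions rarely produces dependent vectors.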
[00:46:36] Speaker 1: It certainly is. It certainly is. Do you think LLMs to any meaningful extent have agency?
[00:46:43] Laura: Yeah, that's the question that I've been thinking about. I think there's many definitions of agency, and to me it's just a kind of goal-directed intentionality, and we can get into what that exactly means. And does an LLM have that? I think you could in some way see it as modeling agents, and maybe it also models their goals. So of course they're trying to model the text; they're trying to predict the next word, or the next token, efficiently and decrease the loss there. And this text has been generated by agents, and it's probably useful for decreasing the loss if you also understand what overarching goal this agent has. So if the agent behind the text is trying to persuade something or someone, then maybe it's useful to model that goal, to sort of decrease the number of possible tokens that can show up in that text.
[00:47:52] Speaker 1: When we say the LLM is trying to persuade someone, there's this, there's this weird thing, isn't it? Cause, uh, you know, to a certain extent agency is, is observer relative. It's like, it's a thing that we say that another thing has, so it feels like at the bottom of the spectrum, it could be as if the thing has this goal, because the LLM probably isn't thinking, oh, um, Laura's an agent and Laura's got this goal. And in order for me to control Laura, I need to do this. And it's in service of that, you know, it feels like there's an unwitting form of agency.
[00:48:24] Laura: Yeah, which might be even more dangerous, right? Like if it's accidentally persuading you and it doesn't understand the, the things that can happen when you, when you do that, then that might be even more dangerous. So this kind of gets at the distinction between simulating something and actually coming up with it, with it yourself. And I don't know what's, how you can just, yeah, find the distinction between the two.
[00:48:47] Speaker 1: I guess you would agree that agency could emerge even if we're not explicitly trying to make it emerge.
[00:48:55] Laura: Yeah, I think that's the interesting case. I've been thinking about this a lot recently. And I think the interesting case is when it's emerged. So there's this definition from Zac Kenton from DeepMind. They also have a safety interest in agency. And a couple of years ago, they made this definition of agency that says an agent is something that changes its policy when its actions affect the environment in a different way. And that's a nice definition. And I think that definitely captures something that I also find important about agency. But you can kind of trivially make a system of LLMs in an environment or something where the environment is also an LLM, such that it adheres to this definition. So I think the important thing is like, when does something like that emerge from something as simple as next token prediction? And that's kind of what I'm interested in.
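The definition paraphrased in this answer, that an agent changes its policy when its actions affect the environment differently, can be made concrete with a very loose toy in Python. This is only a sketch of the informal paraphrase, not the formal definition from the DeepMind work; the two systems, the mechanism dictionaries, and the helper `adapts_to_mechanism` are all hypothetical.

```python
def adaptive_system(mechanism):
    """Chooses the action whose outcome it prefers (higher value is better)."""
    return max(mechanism, key=mechanism.get)

def reflex_system(mechanism):
    """Ignores what its actions do and always reacts the same way."""
    return "left"

def adapts_to_mechanism(system, mechanism_a, mechanism_b):
    """True if the chosen action changes once actions affect the world differently."""
    return system(mechanism_a) != system(mechanism_b)

# How each action pays off in the original world, and after an intervention
# that swaps the consequences of the two actions.
world = {"left": 1.0, "right": 0.0}
intervened_world = {"left": 0.0, "right": 1.0}

print(adapts_to_mechanism(adaptive_system, world, intervened_world))
print(adapts_to_mechanism(reflex_system, world, intervened_world))
```

The adaptive system passes the check only because it was written to, which illustrates the caveat raised above: satisfying the definition by construction is easy, and the open question is when such behavior emerges from next token prediction.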
[00:49:49] Speaker 1: How might we measure that?
[00:49:51] Laura: Yeah, that's a good question. And I have no answer to that. But I've been thinking a lot about that. I've been even speaking to some psychologists, Ellen Sue at NYU, actually, who works on intent detection in AI. So there are methods we can learn from in psychology that can help inform us here. But I think the thing I've been thinking about is what makes agency potentially interesting and complex is planning. So if an agent can't plan, it's probably not super useful or dangerous. And planning seems like somehow an important aspect of an agent that's able to achieve complex goals. So I've been thinking more about planning and trying to detect when a model can be doing planning and when a next token predictor can actually be said to do planning.
[00:50:49] Speaker 1: Yeah, it's so interesting that so many people are converging on the same idea. I mean, certainly in active inference, Karl Friston would say that the planning horizon is basically the measure of the degree of agency that a thing has. Even Eliezer Yudkowsky, he basically said that, you know, an intelligent thing is defined by its planning horizon pretty much.
[00:51:09] Laura: Yeah.
[00:51:09] Speaker 1: And Joscha Bach told me that agency is the ability to control the future and the future, of course, implies a planning horizon. But you definitely think of agency, though, fundamentally as about this kind of cybernetic information exchange with the environment. Can you tell me about that?
[00:51:28] Laura: Yeah, so you just said that someone called it the ability to control the future. That maps on to what I think. I think an agent is something that takes actions in order to control its own future inputs, which is essentially the same thing said differently. And I also think, importantly, it is able to do this under uncertainty, in uncertain environments, because you kind of want to get at this distinction between reflexes and maybe deterministic environments where nothing changes and environments where there is uncertainty and the system can still control the future.
[00:52:09] Speaker 1: In the kind of, you know, the biological world, we are decomposed into all of these autonomous cells and agency is just something which emerges through the sheer complexity of interaction. Yeah. Yet we still talk about LLMs as having a type of agency. What's the difference between the two?
[00:52:30] Laura: Oh, that's a difficult question. I think it's just an abstraction that we use to describe a complex behavior. And we can get at that abstraction in a way that it applies both to the, you know, balls of cells that we are and to LLMs, which are composed in a very different way. I think what you can't get at with this view is, you know, the it-feels-like-something-to-be-an-agent kind of thing. Like, there is something that's explained by this abstraction that I was describing earlier that maybe doesn't describe what it is like to be an agent, or whether or not it feels like I'm setting my own goals or whether they're induced by the environment. And I don't know how to make a definition that can distinguish between those two things.
[00:53:29] Speaker 1: I suppose the world model comes into it as well. That in order to do planning into the future, you have to have a very good representation of the world.
[00:53:37] Laura: Yeah, definitely. The more, like, causal your world model is, the better you can plan. And, I mean, you also need other things, like some way to represent the possible futures that you're holding out. But it's definitely, yeah.
[00:53:51] Speaker 1: And even that seems to suggest that causally embedded agents, we have this active sense-making, continual learning. So we're always doing experiments, right? We're kind of learning about the microcausal patterns in the world, which makes us more, you know, it makes our world model higher fidelity. Language models seem to have a very globalized version of that, but that still works quite well.
[00:54:15] Laura: What do you mean by globalized?
[00:54:17] Speaker 1: As in, even though they've learned all of these patterns from many, many data sources that have been mixed together, they can learn powerful representations that can respond well.
[00:54:27] Laura: Oh, yeah, yeah, yeah.
[00:54:27] Speaker 1: But we're, like, in the situation, continually learning active sensing, like, finding out about our environment. So it feels like we understand our world that we're in even better.
[00:54:38] Laura: Yeah. No, definitely. That's true. I think we have, I mean, we have these sort of core knowledge systems that our intelligence is built upon, right? And that are present in all animals in the world to some extent. And that shows that they are just so useful for surviving in the world that they just emerge for everything. Whereas language models are trained on language and they have probably some sense of all these kind of things. But they're not constrained in the same way that we are. Language is inherently able to describe impossibilities and things that are physically not possible, and to imagine and stuff like that. So it's also not so surprising that they show some different behavior and hallucinate and produce impossibilities. But humans are learning in a very, very different environment. And we have also learned to talk about impossible situations through language, and to imagine a future that is possible or not possible and reason about these things. But we're still constrained by, you know, the physical reality.
[00:55:45] Speaker 1: So I often have disagreements with my co-host, Dr. Duggar. So he has a real no-nonsense definition of agency. He thinks it's basically just an automaton, right? So it's, I mean, I can give you the definition. It's a machine that receives input S from an environment E, performs a computation C that depends on a non-empty subset of S, and takes action A that depends on C to modify E. So it's, it's basically like, you know, you have an environment, pretty much. And to be honest, you could use this rough definition even to describe active inference and many other things like that. But the thing I don't like about it is, you know, it's basically describing a kind of state machine. And of course, like for him, the environment could mean like any, any ambient things in the environment. And for him, computation is very important. So he's a big fan of like the Chomsky hierarchy. And he thinks there's something special about Turing machines. So he thinks that we as strong agents, we must be able to do this kind of recursive, nested, iterative form of computing, which is what allows us to do planning and whatnot. But to me, that just seems a little bit like a little bit weird, right? I love this philosophical notion of agency. And I realize this is a bit wishy-washy because I'm using words like emergent self-organization, autonomy, learning, adaptability, intentionality, you know, degrees of agency and all of this kind of stuff. And it kind of feels like how can a computer program that maps from an input to an output, how could that be an agent?
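To make the automaton-style definition quoted here concrete, the following minimal Python sketch maps its symbols (input S, environment E, computation C, action A) onto a thermostat. The thermostat, the numbers, and the names `Thermostat` and `step` are assumptions for illustration; the point is only that a very simple device already fits the definition as stated.

```python
class Thermostat:
    """A machine in the sense of the quoted definition."""

    def __init__(self, target):
        self.target = target

    def compute(self, s):
        # C: depends on a non-empty subset of S (only the temperature).
        return s["temperature"] < self.target

    def act(self, c):
        # A: depends on C.
        return "heat" if c else "idle"

def step(env, machine):
    """One cycle: receive S from E, compute C, take action A that modifies E."""
    s = {"temperature": env["temperature"], "humidity": env["humidity"]}  # S
    c = machine.compute(s)
    a = machine.act(c)
    # A modifies E: heating warms the room, idling lets it cool.
    env["temperature"] += 1.0 if a == "heat" else -0.5
    return a

env = {"temperature": 18.0, "humidity": 0.4}
thermostat = Thermostat(target=20.0)
actions = [step(env, thermostat) for _ in range(5)]
print(actions, env["temperature"])
```

Nothing in this loop involves uncertainty, goals that the system sets itself, or planning, which is why a definition this broad does not separate such devices from richer notions of agency.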
[00:57:17] Laura: Yeah. Yeah, that's really the question, right? And I think that I agree with you that this definition that your co-host gives is, I mean, it's a fair definition. I just think it puts the emphasis on the wrong thing. I think it exactly doesn't explain what I find interesting about agency, which is this like acting under uncertainty kind of idea, right? It doesn't get at that. And there's something very intuitive about agents to us. And it would be useful if we were able to describe that in a way that's more abstracted away from computation than this definition, that somehow gets at the difference between a thermometer, which you could also describe as this kind of system, and an agent, because that's what we're trying to do, right? And maybe there is no distinction, but humans perceive a distinction. Like, agency is actually one of the core knowledge systems. And this is very nicely shown by this Heider and Simmel simulation, this video from the 1940s, where you have a big triangle and a small triangle moving around in a 2D environment. And there's a little box with an opening and the small triangle is trying to escape from the big triangle and it's going into the box and the big triangle is like bumping against the box. And these are just moving shapes, but we immediately assign agency to them. And we say the big triangle is mean and the small triangle is scared. And maybe this is a failure of our application of the agent core knowledge system, because they are not agents, probably someone programmed them to be. We intuitively pick out an agent from a thermometer and that's the distinction I want to get at. And the definition by your co-host doesn't really get at that.
[00:59:04] Speaker 1: To what extent is agency just, just the way we think?
[00:59:09] Laura: Yeah. You mean it's not, it's just not real or something.
[00:59:13] Speaker 1: It could be, it could be, it could be both. It could be because it's real. It's such an important way of dividing the world up that it's become embedded in us as a core cognitive primitive, but it, it seems fundamental to the way we recognize things.
[00:59:26] Laura: Yeah. Yeah. It definitely seems fundamental. I think it's just important in the sense that an agent can be of use to us or in a different way than a non-agent can, or can be dangerous to us in a different way than a non-agent can. And, and whether or not that's, um, just, you know, something we perceive that's not sort of fundamentally there doesn't really matter then, I think.
[00:59:52] Speaker 1: Yeah. One thing I guess that sometimes people say, well, we just philosophize everything. And, you know, certainly when we talk about consciousness, you know, there's people like David Chalmers, who says, you know, we might be philosophical zombies. It might just be a little bit extra. And even with free will, which is almost like a stronger form of agency, which is that in the situation you could have done differently. So we're kind of like imagining how things could have been different. And it's a similar thing with intentionality. Like we think that intentionality is like something on top of what a language model might do or what an automaton might do. Yeah. And these philosophical properties, are they useful?
[01:00:33] Laura: I think they are because again, they get at, so it definitely feels like something to be conscious to me. And people have talked about that a lot. So it must get at something interesting, I'd say, and therefore it's useful. And I think similarly with intentionality, I just view it as a useful abstraction of behavior that can guide us towards understanding better maybe how cognition has emerged or how certain animals are different from other animals, and can also help us evaluate an artificial intelligence and whether or not they are doing something that can be seen as intentional and goal-directed.
[01:01:22] Speaker 1: So I think the other thing I don't like about the automaton view or maybe even reinforcement learning as, as an extension is that, um, it's a form of behaviorism, which is that we only, we only look at what the thing does and we don't have rich cognitive models of like what the mental states are. And if, it feels to me, and maybe this is like an interesting departure for you because like in, in a way, like with the language model discussion, it felt like you were arguing that we don't really, you know, it doesn't. It doesn't, it doesn't matter if we convolve functions together into this big soup, but with agency, it feels like you are saying that we need to have an explicit structure of how an agent thinks.
[01:02:02] Laura: Yeah. So I don't think it doesn't, representations don't matter, right? I think they matter a lot. I think there, there is a distinction in, um, pragmatic representations that are purely goal directed and representations that are somewhat divorced from the current situation you're in or the current goal you have. I think both are important and I think both exist in the real world, in animals and also in language models. And, um, so I don't think it matters that we, whether or not we convolve or how we do it, but I do think it's important to reason about what kind of representations have been learned and whether they are nicely reflective of a causal world model that we want the models have learned. So I think this behaviorist, what you said that this definition is a bit behaviorist, I think that kind of gets at what my problem is with it. I think, yeah, because it's, it's, it's sort of like, sure, yes, this definition applies, but it doesn't explain to me why I care about this system and what is interesting about this system. And, um, yeah, often behavior can, you know, explain a lot and you can say a lot about the behavior, but if you know something about the representations that produce that behavior, you can describe a system in a more useful way.
[01:03:20] Speaker 1: I know you're a big fan of the Simulators article by Janus, which said, I guess, that you can interpret this in an agential way, that there's some kind of agential decomposition of a language model into these role players. Yeah. What do you think about that?
[01:03:36] Laura: Yeah. So first of all, I'm a huge fan of the article, but I became a big fan because of Jacob Andreas's "Language Models as Agent Models" paper, because somehow he describes it in a type of language that I find easier to follow. But the Simulators post has, of course, been hugely influential, also for me. In my conceptualization of language models, this is essentially the reason why I think about them as maybe modeling human intent and the intent of the agents of the text that they have been learning from. And this view of them as a sort of superposition of many different agents is just such a rich conceptualization. It really explains many things, both their successes and their failures. And I think that's what's cool about it.
[01:04:25] Speaker 1: One interesting thing about the article is this notion of coherence. So when an agent, you know, like a role player, is selected, then that role player will stick around for a little while. And certainly it feels like our intuitive notion of agency in the real world is that we maintain ourselves and we also stay kind of coherent over time.
[01:04:44] Laura: Yeah. A bit, maybe.
[01:04:46] Speaker 1: Yeah. A little bit.
[01:04:47] Laura: I think that's definitely changed my views over time. I think that's also the sign of, yeah, and it's important, but no, you're right. I think there's also been a paper here at NeurIPS that shows that language models don't stay in character as long as actual agents or humans do.
[01:05:06] Speaker 1: Tell me more.
[01:05:07] Laura: To be honest, I didn't read it, but I saw it as something that I should look into. But, I mean, definitely they're probably not as coherent and they don't stick to their role as clearly as humans do. I think that's probably the nature of being an approximate sort of agent or a superposition of agents, in that you can't really disentangle one agent from the other agents.
[01:05:31] Speaker 1: And what do you think of non-physical agency? So I'll give you an example of that. We, as a collective, form a kind of agency. You know, a meme is a type of agent, maybe. And, I know Dagar doesn't agree with me about this, even the COVID virus. I mean, I heard that flu rates are dramatically up in the UK and there's a weird kind of symbiotic relationship between flu and COVID. So when flu is up, COVID is down, and it's almost as if they are these virtual agents that are sort of interacting with each other through the hosts.
[01:06:04] Laura: Yeah, no, I think that makes total sense. I think it will probably be really hard to say that a collection of agents is not an agent, but an agent itself is. And I think it can be a useful way to represent something. For example, a company can be seen as a group of agents, right? And how they behave, how it behaves. But at the same time, at the company level, there's something that seems sort of extra, that you can't exactly explain from the parts, which might be some kind of emergence. That's hard to describe, but many people, of course, have thought about it. But I think it makes sense that a collective of agents can also be seen and abstracted as an agent in some sense. But there's probably also something to a sort of single agent that understands, you know, the actions you're taking, that you guide. Whereas in a collection of agents, maybe that becomes different or more difficult or something.
[01:07:10] Speaker 1: When we look at a super agent, like a company or a country or a religion or something like that, do you think the purpose bubbles up or down?
[01:07:21] Laura: Oh, um, let's see. Both? I think both. Yeah. I think the purpose of a company is probably some combination of the people that work there. And then probably the company as a whole forms some values or something that then inform the agents individually again as well. Yeah.
[01:07:48] Speaker 1: On the subject of AI safety, which I was discussing with Bengio, is that something you're concerned about?
[01:07:55] Laura: Yeah.
[01:07:56] Speaker 1: Tell me more.
[01:07:57] Laura: Um, yeah. I think if you think philosophically about a system that's intelligent, that just can be dangerous. As a society, we don't even really know how to control humans, but we have set up a pretty okay system to do it. That fails at different levels, right? At the individual level, at the between-country level, at all kinds of levels, it fails sometimes. That's scary and dangerous. And I think intelligence is not so special that we can never build it. Therefore that can be dangerous, right? But I've struggled talking about my timelines, as in, when will this happen? And I have no idea. I don't see it happening in the next three years. I feel like a lot needs to change. I think society moves slowly as well. I think there are massive issues in adoption. Like, these systems are not reliable. So I think, sort of philosophically, an intelligent agent is dangerous. And a whole separate part of AI safety that I found even more compelling is that if we slowly give over control to dumb agents or dumb AI, that can also be dangerous in a society like ours. And that's also something I worry about. So I think understanding how everything works and how these systems work is important. Because, you know, I'm not purely a pessimist. I think it could bring a lot of great things to the world. There are a lot of things that should probably be automated, or at least it would be great if we can get certain professions some help because, you know, we're all getting older and a lot of people are working in care. And if we don't do something, I'm not saying AI is going to help there, but it would be great if AI could alleviate some of the things that are going to become more difficult in the future. Like healthcare, if they could make doctors more productive there, for example. But it's really non-trivial to think about how it can have a positive impact, I think, and it's good that lots of people are thinking about it.
[01:10:19] Speaker 1: I love agency as a kind of mental model to think about this, because if it is the ability to control the future, then to me, it's analogous to power. Yeah. And certainly talking about power dynamics is the language for talking about how we should govern this. And I can see many arguments. I can see how this kind of technology actually takes away our agency. It can also dramatically give us agency, because all of a sudden people can build chemical weapons and bombs and stuff like that. Yeah. But the other concern, of course, is that it itself will, you know, adopt a form of agency through instrumental goals or whatever.
[01:10:56] Laura: Yeah.
[01:10:57] Speaker 1: So on those three, where do you see the most significant risk?
[01:11:02] Laura: I think all are risky. I think the thing I am most worried about is skewed access. So if AI becomes very useful and makes us more productive, it would be great if we can distribute that in society in a way that helps people. I think technological improvements have maybe not, in the right proportion, helped the right people. And that's, you know, a result of the system we live in and a result of our politics. So I think it's really important in the future to think more about that and to think more about how we can give access to the right people. And I think the way to go about that is policy, and to think about how our system works, how our economy works, and be prepared for massive improvements in AI capabilities.
[01:12:01] Speaker 1: If this starts to go bad, what do you think, because obviously there should be some kind of a harbinger, you know, what would be the early warning signal for you?
[01:12:09] Laura: Sorry, what's harbinger?
[01:12:10] Speaker 1: As in, a harbinger is like a signal that something bad is about to happen.
[01:12:16] Laura: Okay. Yeah. I think probably that's not what's going to happen. I think we are going to slowly build something. And then at some point we're going to be like, oh wait, you know, remember back then when there were elections and Facebook apparently may have influenced them. And we built this tool and we didn't realize how it would affect us. And I think it will probably also work like this with AI. We don't understand what intelligence is and probably we won't recognize it immediately if we see it.
[01:12:44] Speaker 1: That's fascinating. Yeah. I love this notion of kind of undermining our weaknesses. So it's, it's actually a very sort of alien diffused thing that we might not even be fully cognizant that it's happening.
[01:12:57] Laura: Yeah, exactly. I think that's more likely to happen than that we're all of a sudden going to be like, oh, wait a minute, this is dangerous. Though, I mean, there are also examples of that happening, right? Like ChatGPT, at NeurIPS 2022, a couple of years ago. Yeah. So I remember ChatGPT dropping, and that was for me the first time that I was like, oh my God, language models are crazy. OpenAI made much more than incremental progress, but ChatGPT itself, which dropped that day, was maybe just somewhat incremental in the sense that it was just a usable interface to a model that was already pretty powerful. And it helped us understand that GPT-3 was actually really powerful with some instruction tuning on top and an actual chat interface. So that was a sort of slow change of things that immediately made people aware of like, wow, this is pretty crazy. So maybe there can be something similar where an AI does something that we all didn't expect it to do. And that makes us collectively think that we now have to pay attention and, you know, change things.
[01:14:23] Speaker 1: But yeah, I think the locus of AI is something that we're hinting at as well. Because certainly no one at Meta intended for these issues of social media to happen. You know, they built these algorithms, they just kept going one step at a time: let's build an advertising system, let's do collaborative filtering. And all of this stuff is just externalities, so it's like this unwitting agency. But then what is the agent? The whole system, including us, we're the agent. And it's the same thing with AI: in a way we're looking for agency inside the large language model, but we, as a system, are actually a weird form of new collective intelligence that no one really even understands. And that's pretty scary, isn't it?
[01:15:04] Laura: Yeah, if Facebook can be seen as an agent in and of itself, and we have built legal structures around who to blame for what, right. But that doesn't mean that these people that we blame intended that to happen. So that's pretty scary. And that can become even scarier when you build a bunch of intelligent artificial agents that you cannot subject to the same level of societal control that we apply to ourselves.
[01:15:32] Speaker 1: So we last spoke at NeurIPS 2022. Yeah. And I feel that you've moved your position a little bit since then. Yeah. Can you talk me through that?
[01:15:46] Laura: Yeah. At the time, I took a while accepting that ChatGPT is cool, like many others. I was skeptical at first, especially given the amounts of data that it has been shown. And my recent paper has again moved my opinion, right. For a while I also thought they're doing a bit more, like, less generalizable retrieval than the kind of approximate generalizing they're doing now. And over time, I've just changed my view of how promising this approach is. And I can pinpoint it to a specific thing that happened: I put out this paper, "LLMs are not zero-shot communicators." And at the time I thought zero-shot communication is pretty important, right? All of us can do it. We don't need five examples. So I thought, okay, we need to make sure these models can respond zero-shot to these questions. But later I developed this view of them being multi-task learners and general learners, where you do need to find the right way to interact with them. And one very salient memory for me was that Andrew Lampinen described it, I think even on this show: he said zero-shot prompting a language model is like walking down the street and shouting to someone, what is 15 times 32? And they're going to be like, you know, who are you? Far off. And that was sort of his analogy for zero-shot reasoning. And that makes total sense to me. Just because they can't do something with your specific zero-shot prompt doesn't mean that they can't do it at all. And it's important for them to do zero-shot. Definitely. It's a limitation. But if they can't do it, you need to try a few shots and you need to find the right prompt. You also don't need to go overboard. You don't want to do, you know, prompt engineering on the test set, essentially. But there's a middle ground.
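To make the zero-shot versus few-shot distinction in this answer concrete, here is a minimal Python sketch of how the two kinds of prompt differ. The function name `build_prompt`, the `Q:`/`A:` layout, and the worked examples are illustrative assumptions; they are not taken from the paper or from the conversation, and no model is called.

```python
from typing import Sequence, Tuple


def build_prompt(question: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a plain-text prompt.

    With no examples this is a zero-shot prompt (just the question).
    With examples it is a few-shot prompt: solved question/answer pairs
    come first, then the new question with an empty answer slot.
    """
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Zero-shot: the "shouting at a stranger on the street" case.
zero_shot = build_prompt("What is 15 times 32?")

# Few-shot: hypothetical solved examples establish the task first.
few_shot = build_prompt(
    "What is 15 times 32?",
    examples=[
        ("What is 12 times 11?", "132"),
        ("What is 7 times 8?", "56"),
    ],
)

print(zero_shot)
print("---")
print(few_shot)
```

The examples should come from held-out data rather than the evaluation set, which is one way to read the caution about not doing prompt engineering on the test set.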
[01:17:50] Speaker 1: Laura, thank you so much for joining us today. It's been amazing.
[01:17:53] Laura: Thank you. Thank you.
[01:17:55] Speaker ?: Thank you. Thank you.