About this transcript: This is a full AI-generated transcript of Pattern Recognition vs True Intelligence — François Chollet from Machine Learning Street Talk, published June 6, 2026. The transcript contains 25,573 words with timestamps and was generated using Whisper AI.
[00:00:00] Francois Chollet: Intelligence is very specifically your ability to handle novelty, to deal with situations you've not seen before, and come up on the fly with models that make sense in the context of that situation. And this is actually something that you see very little of in LLMs. If you ask them to solve problems that are significantly different from anything they've seen in their training, they will fail. The Abstraction and Reasoning Corpus for Artificial General Intelligence, or ARC-AGI for short, you can think of as a kind of IQ test that can be taken by humans (it's actually very easy for humans) or by AI agents. Every task you see, every task you get, is novel. It's different from any other task in the dataset. It's also different from anything you may find online. ARC-AGI is designed to be resistant to memorization, whereas all the other benchmarks can be hacked by memory alone.
[00:00:57] Speaker 2: When I've spoken to AI researchers, I've gone through ARC challenges together with them, and they try to introspect. So they're saying, I'm looking at this problem, and I know it's got something to do with color, I know it's got something to do with counting, and then they run the program in their mind, and they say, one, two, three, no, that doesn't work, that doesn't work.
[00:01:18] Francois Chollet: I think introspection is very effective when it comes to getting some idea of how your mind handles System 2 thinking. I think it's not very effective for System 1, because System 1 is inherently not something you have direct access to. It happens unconsciously, instantly, in parts of your brain that you're not directly observing via your consciousness. But System 2 is not like that. System 2 is very deliberate, very slow, very low bandwidth. It's very introspectable, but what's not mentioned here is…
[00:01:58] Speaker 2: Francois Chollet, it's an honor to have you on the show. So, honestly, this means so much to me, you're my hero, so thank you so much.
[00:02:05] Francois Chollet: It's my pleasure to be here, and I would say you shouldn't have heroes, right? It's… I shouldn't? No. Why not? It makes for a disappointing experience. Um, not for me. Okay. Yeah, not for me.
[00:02:19] Speaker 3: But hopefully I can live up to the expectations. Oh, definitely, I'm sure you will.
[00:02:25] Speaker 2: Francois, I mean, you've been critical of the idea that scale is all you need in AI. Can you tell me about that?
[00:02:32] Francois Chollet: Sure. So this idea that scale is all you need comes from the observation of scaling laws when training deep neural networks. Scaling laws are the relationship between the performance you see in deep learning models, typically LLMs, and how much data and compute went into training them. It's a sort of logarithmic scaling of LLM performance as a function of training compute; that's typically how it's formulated. And many people extrapolate from that that, well, there's no limit to how much performance we can get out of these models. All we need is to scale up compute by a few orders of magnitude, right? And eventually we get far beyond human-level performance, purely by scaling compute, with no change in architecture and no change in training paradigm. Well, the major flaw here is the way you measure performance. In this case, performance is measured via exam-style benchmarks, which are effectively memorization games. So you're effectively measuring how good the LLM is at memorizing the answers to the questions you're going to test it on. Not necessarily the exact answers, but maybe the program templates you need to apply to arrive at the answers. And if you're measuring something that's fundamentally driven by memory, then it makes sense that as you increase the amount of memory in the system, like the number of parameters or the amount of training data (and compute is really just a proxy for that), you see higher performance. Because of course, if you can memorize more, you're going to do better at your memory game. My take is that this performance increase you're observing is actually orthogonal to intelligence. You are not really measuring intelligence, because your benchmark can be hacked purely by preparing for it, by memorizing things in advance. 
If you want to benchmark intelligence, you need a different kind of game, a game that you cannot prepare for, something like ARC, for instance. And I think if you look at performance on ARC over time or as a function of compute, you don't see this relationship. In fact, the highest-performing models on ARC today did not require tons of compute. And some program search approaches actually did not require any training-time compute, because they were not trained at all. They do require some inference-time compute, but not very large amounts of it.
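To make "program search with no training" concrete, here is a minimal sketch of brute-force enumerative search over a tiny, hypothetical DSL of list operations. `PRIMITIVES`, `run`, and `search` are illustrative names, not any actual ARC solver. Nothing is learned; all of the cost is inference-time enumeration, which grows exponentially with program depth.

```python
from itertools import product

# Hypothetical DSL of list-to-list primitives (illustrative only, not ARC's DSL).
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "drop_first": lambda xs: xs[1:],
}


def run(program, xs):
    """Apply each named primitive in order."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def search(examples, max_depth=3):
    """Enumerate programs shortest-first; return the first one fitting every example."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, list(inp)) == out for inp, out in examples):
                return program
    return None  # no program up to max_depth explains the examples


examples = [([3, 1, 2], [6, 4, 2]), ([5, 4, 9], [18, 10, 8])]
print(search(examples))
```

Shortest-first enumeration is one simple design choice: it returns the simplest program consistent with the examples, a crude stand-in for the priors a real ARC system would need.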
[00:05:39] Speaker 2: So you said that language models are interpolative databases. And I spoke with Subbarao the other day, and he calls them approximate retrieval systems. And many people say to me, Tim, this is ridiculous. Of course they're not databases; they do extrapolation. But I think as an intuition pump around memorization, that is what they do. And you wrote a Substack blog post about this as well. Yes.
[00:06:05] Francois Chollet: Yes. Memorization is what they do. I think the part where people get stuck is that when they hear "memorization", they think LLMs are just memorizing answers to questions, just memorizing content. And of course, they do memorize a lot of content, a lot of knowledge and facts and so on. But that's not primarily what they do. What they're primarily memorizing is functions, programs. And these programs do generalize to some extent; there can be some degree of generalization. And when you query an LLM, you are basically querying a point in program space. You can think of the LLM as a manifold where each point encodes a program. And of course, you can interpolate across this manifold to compose or combine programs, which means that you have an infinite number of possible programs to choose from. What happens with LLMs is that you are training these very rich, very flexible models to predict the next token, right? If you had infinite memory capacity, you could, of course, just learn a lookup table. But in practice, the LLM only has some billions of parameters, so it cannot learn a lookup table for every sequence in its training data. It has to compress. So what it's actually learning is predictive functions, and they take the form of vector functions, of course, because the LLM is a curve. The only thing you can encode with a curve is a bunch of vector functions. So you're learning these vector functions that take as input elements of the input sequence and output elements of what comes after. For instance, let's say the LLM comes across the works of Shakespeare for the first time, but it has already learned a model of the English language. Well, the text it's looking at now is slightly different, but it's still the English language. 
So it is possible to model it by using a lot of functions that came from learning to model English in general. And it becomes much easier to model Shakespeare by just learning a sort of style transfer function that goes from the model you have to this Shakespeare-sounding text. That's how you end up with things like the ability to do textual style transfer with an LLM, right? It turns out to be more compressive to learn style independently from content. And along the same lines, the LLM is going to learn millions of independent predictive functions like this. And it can, of course, combine them via interpolation, because they're all vector functions. They're not discrete programs, like a Python program, for instance. They're actually vector functions.
[00:09:36] Speaker 2: Because when you say program, I think a lot of people think of a program as being something with conditional logic and with an LLM.
[00:09:43] Francois Chollet: That's not what they are.
[00:09:44] Speaker 2: Yeah, it's almost like in an input-sensitive way. You see this kind of traversal through the model and it's like a mapping. So it feels more like...
[00:09:53] Francois Chollet: It's an input-to-output mapping and that mapping is continuous. And it is implemented via a curve.
[00:09:59] Speaker 2: But we can describe that as a program.
[00:10:01] Francois Chollet: Yes, of course. Yes. They are functions.
[00:10:04] Speaker 2: Yes. And you said they were compositional.
[00:10:07] Francois Chollet: Yes. Because these functions are vector functions, you can sum them, for instance. You can interpolate between them to produce new functions.
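A minimal sketch of what summing or interpolating vector functions can mean, assuming two toy linear maps on R² (`f` scales, `g` rotates) stand in for functions a model has learned; real LLM functions are far higher-dimensional and nonlinear. Because both functions output vectors, any weighted blend or sum of them is itself a valid function.

```python
# Two hypothetical "learned" vector functions: simple linear maps on R^2.
def f(v):
    x, y = v
    return (2 * x, 2 * y)  # scale by 2


def g(v):
    x, y = v
    return (-y, x)  # rotate 90 degrees counterclockwise


def interpolate(fa, fb, alpha):
    """Return the function v -> (1 - alpha) * fa(v) + alpha * fb(v)."""
    def h(v):
        a, b = fa(v), fb(v)
        return tuple((1 - alpha) * ai + alpha * bi for ai, bi in zip(a, b))
    return h


def add(fa, fb):
    """Summing two vector functions also yields a vector function."""
    def h(v):
        return tuple(ai + bi for ai, bi in zip(fa(v), fb(v)))
    return h


halfway = interpolate(f, g, 0.5)
print(halfway((1.0, 0.0)))    # (1.0, 0.5)
print(add(f, g)((1.0, 0.0)))  # (2.0, 1.0)
```

There is no analogous "halfway point" between two discrete programs, such as two Python functions, which is the contrast Chollet draws.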
[00:10:19] Speaker 2: I love this kaleidoscope hypothesis. So can you, you know, dramatically introduce the kaleidoscope hypothesis?
[00:10:28] Francois Chollet: Sure. So everyone knows what a kaleidoscope is, right? It's this cardboard tube with a few bits of colored glass in it. And these few bits of original information get mirrored and repeated and transformed, and they create this tremendous richness of complex patterns. You know, it's beautiful. And the kaleidoscope hypothesis is the idea that the world in general, and any domain in particular, follows the same structure: it appears on the surface to be extremely rich and complex and infinitely novel with every passing moment, but in reality it is made from the repetition and composition of just a few atoms of meaning. A big part of intelligence is the process of mining your experience of the world to identify the bits that are repeated and extract these unique atoms of meaning. When we extract them, we call them abstractions. And as we build up inner banks of such abstractions, we can reuse them to make sense of novel situations, situations that appear to be extremely unique and novel on the surface but can actually be interpreted by composing together these reusable abstractions. So that's the fundamental idea behind intelligence: intelligence is a cognitive mechanism that you use to adapt to novelty, to make sense of situations you've never seen before. It works by creating models of the new situation on the fly, by combining existing abstract building blocks that were mined from your past experience. And there are two key tricks here. One is the synthesis trick, whereby you take these building blocks and quickly assemble them to form a program, a model that matches the current task or the current situation that you're facing. That's synthesis. 
And there's abstraction generation, which is the reverse process, in which you look at the information you've got available about the world, like your experience, your perception, and also the models you've created to respond to it, and you distill it into reusable abstractions, which you then store in your memory so that you can use them the next time around. So: synthesis and abstraction generation. Together, they form intelligence in my model, at least in my architecture of AGI.
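The synthesis and abstraction-generation loop can be sketched as follows, under heavy simplifying assumptions: the "atoms of meaning" are two hypothetical integer functions, synthesis is brute-force search over their compositions, and abstraction generation simply stores a found program as a new named building block. This is an illustrative toy, not Chollet's architecture.

```python
from itertools import product


def compose(fns):
    """Chain functions left to right into a single program."""
    def program(x):
        for fn in fns:
            x = fn(x)
        return x
    return program


class Library:
    def __init__(self):
        # Hypothetical starting "atoms of meaning": two integer primitives.
        self.items = {"inc": lambda x: x + 1, "double": lambda x: 2 * x}

    def synthesize(self, examples, max_depth=3):
        """Synthesis: search compositions of library items that fit all examples."""
        for depth in range(1, max_depth + 1):
            for names in product(list(self.items), repeat=depth):
                program = compose([self.items[n] for n in names])
                if all(program(i) == o for i, o in examples):
                    return names
        return None

    def abstract(self, name, names):
        """Abstraction generation: store a found program as a new building block."""
        self.items[name] = compose([self.items[n] for n in names])


lib = Library()
first = lib.synthesize([(1, 4), (3, 8)])                  # x -> 2 * (x + 1)
lib.abstract("inc_then_double", first)
second = lib.synthesize([(0, 6), (1, 10)], max_depth=2)  # x -> 4x + 6
print(first, second)
```

The second task, x → 4x + 6, needs four atoms when built from `inc` and `double` alone, so a depth-2 search over the original library finds nothing. After the first solution is stored as `inc_then_double`, the same depth-2 search succeeds: reusable abstractions shrink the search.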
[00:13:39] Speaker 2: So you've been prominent in the AI space for many, many years now. What experiences or insights led you to develop such a clear perspective of intelligence so early in your career?
[00:13:51] Francois Chollet: Right. So if you read some of my old blog posts or the first edition of my deep learning book, you'll see that I started talking about how deep learning could do System 1 very well but could not do System 2. And I started talking about the need for program synthesis roughly in mid-2016. I started writing a lot in 2017, but in practice I started forming these ideas in 2016. And there are several things that led me to it. I think one of the big catalyst events was working on automated theorem proving using deep learning with Christian Szegedy. Theorem proving is very akin to program synthesis: you're basically doing tree search with operators taken from a DSL. The key idea was to use a deep learning model to guide the search process. I tried to do it for a pretty long time, trying lots of different ideas, and everything I tried basically failed. I mean, it was doing much better than random, but if you analyzed how it was achieving that better-than-random performance, it was just doing shallow pattern recognition. It was not really doing any kind of System 2 reasoning. And it seemed like a huge obstacle that I was just not able to overcome by tweaking the architecture or the training data or anything else. There was this pattern recognition shortcut available, and this shortcut would be taken every single time. You could not learn generalizable discrete programs via deep learning. That came as a big insight to me, because before that point I was, like everybody else in the field, under the assumption that deep learning models were a very general computing substrate: that you could train deep learning models to perform any kind of computation, that they were Turing complete. 
And around the same time, 2015, 2016, there were a lot of similar ideas floating around, like the concept of the neural Turing machine, for instance. People thought, and I thought, that this was a very promising direction, that deep learning could ultimately replace handwritten software. So I subscribed to these ideas very early on. But then, in these experiments trying to get neural networks to do math, I realized that they were actually fundamentally limited: they were a pattern recognition engine, and if you wanted to do System 2 thinking, you needed something else. You needed program synthesis. So that's when I had this realization and started talking about it. But in general, I've been thinking about intelligence and how to create it for quite a long time. My first AGI architecture was something I developed back in 2010, in summer 2010, and the reason I developed it is that I had already been thinking about it for a few years before that. So I've been in the field for quite a while.
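The theorem-proving setup described above (tree search over operators from a DSL, with a model guiding which branch to expand) can be sketched as greedy best-first search. In this toy, the "learned model" is replaced by a hand-written heuristic, absolute distance to the goal, and the operators are hypothetical integer functions standing in for proof tactics; nothing here reproduces the actual system.

```python
import heapq
from itertools import count

# Hypothetical DSL: operators on integer states (stand-ins for proof tactics).
OPS = {
    "inc": lambda x: x + 1,
    "double": lambda x: 2 * x,
    "dec": lambda x: x - 1,
}


def heuristic(state, goal):
    """Stand-in for a learned guidance model: lower score means 'looks closer'."""
    return abs(goal - state)


def guided_search(start, goal, max_expansions=10_000):
    """Greedy best-first search: always expand the best-scoring frontier state."""
    tie = count()  # tie-breaker so heapq never compares paths
    frontier = [(heuristic(start, goal), next(tie), start, [])]
    seen = {start}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        expansions += 1
        for name, op in OPS.items():
            nxt = op(state)
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (heuristic(nxt, goal), next(tie), nxt, path + [name]))
    return None


def apply_path(start, path):
    """Replay a sequence of operator names from a start state."""
    for name in path:
        start = OPS[name](start)
    return start


path = guided_search(3, 22)
print(path, apply_path(3, path))
```

Greedy best-first search is not guaranteed to find the shortest path, and its quality depends entirely on the scoring function. A scorer that only captures shallow surface similarity is one way to picture the "pattern recognition shortcut" Chollet describes.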
[00:17:42] Speaker 2: Quick meditation on the shortcut rule, because I think this gets to the core of it. Yes. Deep learning. I mean, basically, we're projecting into a Euclidean space. And the only semantic metric is the Euclidean distance. And, you know, so these models learn a spectrum of spurious correlations and perhaps more spurious than not spurious. Sure.
[00:18:07] Francois Chollet: So in general, the reason they do this is that spurious correlations are always available to explain something, no matter what you're looking at. There's always some element of noise, which you can wrongly interpret as meaningful. And it's also because deep learning models are curves, meaning that they are continuous, differentiable surfaces in a higher-dimensional space, and we are fitting the parameters of these curves via stochastic gradient descent. You can represent many things with a curve, but it's a very bad substrate for any sort of discrete computation. You can do it, you can embed discrete processing on a curve, but it's just not a very good idea, right? It's not easy to fit generalizable discrete programs in this format. And this is why it's tremendously difficult to get a deep neural network to learn how to sort a list or how to add two sequences of digits, for instance. Even state-of-the-art LLMs have a very hard time doing it. They've been trained on millions of examples of adding digits, but they are still only achieving something like 70% accuracy on new digit sequences. So they've memorized a program to do it, but because this program is a vector function embedded on a curve, it is not a very good program; it is not very accurate. And you see this time and time again with any sort of algorithmic processing.
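For contrast, here is the discrete program itself: a minimal sketch of schoolbook addition on decimal digit strings (the function name `add_digits` is illustrative). Because it is an explicit loop with a carry rather than a continuous mapping, it is exactly correct for inputs of any length, including lengths it was never "shown".

```python
def add_digits(a: str, b: str) -> str:
    """Schoolbook addition on decimal digit strings, right to left with a carry."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0  # missing digits count as zero
        db = int(b[j]) if j >= 0 else 0
        carry, digit = divmod(da + db + carry, 10)
        result.append(str(digit))
        i -= 1
        j -= 1
    return "".join(reversed(result)) or "0"


print(add_digits("987654321987654321", "123456789123456789"))  # 1111111111111111110
```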
[00:19:49] Speaker 2: And just for those of you at home, a piecewise linear function is still a curve. People might get confused by that because they think of a curve as being this smooth thing, but if you look at the Wikipedia definition of a curve, you're absolutely right, it's still a curve. You mentioned the neural Turing machine, which of course isn't actually a Turing machine, but it behaves a little bit like one. What do you see as the gap there, with neural networks not being Turing machines?
[00:20:19] Francois Chollet: Fundamentally, I think fitting parametric curves via gradient descent is a good fit for what I call value-centric abstraction, which is the idea that you compare things via a continuous distance function, which leads to the idea that you're going to embed things. And by things, I mean instances of something: they could be images, discrete concepts, words, right? That leads to the idea that you embed them on a manifold, a space where two things that are similar end up close together and where the different dimensions of variation on your manifold are semantically meaningful. You can do this with curves because they are continuous, which naturally leads you to compare things via continuous distance. But that's a very bad fit for any kind of type 2 abstraction, what I call program-centric abstraction, where you're actually interested in graphs, and you're not interested in comparing graphs via a distance function. You're interested in whether two graphs are exactly identical to each other, or more precisely, whether a graph appears as a subcomponent of a larger graph. So for instance, as a software engineer, if I'm refactoring some code, if I want to compress my code by expressing multiple functions as just one function, I am not interested in how close the functions feel on a perceptual level. I'm interested in whether they implement the exact same program, maybe in different forms. Maybe I need to inject some abstraction in there. And this is a comparison that you have to do in a very explicit, step-by-step way. You cannot just look at two pieces of code and instantly say, without having to think about it, oh yeah, they look similar.
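A rough sketch of program-centric comparison, assuming "same program" is approximated as identical Python syntax trees after renaming the function and its locally bound variables in order of appearance. It uses only the standard `ast` module; `same_program` and the helper names are hypothetical.

```python
import ast


def _bound_names(tree):
    """Names the snippet itself introduces: parameters and assignment targets."""
    bound = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
    return bound


class _Canonicalize(ast.NodeTransformer):
    """Rename the function and bound variables to placeholders, in order of appearance."""

    def __init__(self, bound):
        self.bound = bound
        self.mapping = {}

    def _canon(self, name):
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_FunctionDef(self, node):
        node.name = "fn"
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_Name(self, node):
        if node.id in self.bound:  # leave free names such as len or sum untouched
            node.id = self._canon(node.id)
        return node


def same_program(src_a, src_b):
    """Exact structural equality of two snippets, ignoring naming choices."""
    def canon(src):
        tree = ast.parse(src)
        return ast.dump(_Canonicalize(_bound_names(tree)).visit(tree))
    return canon(src_a) == canon(src_b)


a = "def add(x, y):\n    return x + y\n"
b = "def plus(p, q):\n    return p + q\n"
c = "def plus(p, q):\n    return q + p\n"
print(same_program(a, b), same_program(a, c))  # True False
```

The check is structural, not semantic: `p + q` and `q + p` compute the same value for numbers but are reported as different programs. Deciding that they are equivalent is the kind of explicit, step-by-step reasoning Chollet describes.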
[00:22:31] Speaker 2: And how would you describe that capability? It's like a kind of epistemic risk rather than an aleatoric risk. Or verification might be a better way of describing it.
[00:22:40] Francois Chollet: Yeah, step-by-step verification is a good way of describing it. And, you know, I just said it's definitely not like this sort of perceptual, continuous-distance-style comparison, and that's true. But I think it can also be guided by perception. Doing this step-by-step exact comparison is very costly. It requires all of your attention, expended over some length of time. So you're not going to want to do it in a brute-force way over many different candidate functions. You want to use your intuition to identify just a small number of options, and those options you're going to try to verify exactly. So I do think we have the ability to do approximate distance comparisons between discrete objects. But the key thing to keep in mind is that these fast comparisons are not exact, right? They're approximate, so they might be wrong. And I think you get the same type of output from an LLM if you're trying to use it for programming. It will often give you things that feel right but aren't exactly right. In general, I think that's the thing to keep in mind when you're using deep learning or LLMs: they're very good at giving you things that are directionally accurate but not actually accurate. So if you want to use them well, you need this post-facto verification step.
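The "intuition shortlists, exact check verifies" pattern can be sketched like this, assuming each candidate is a (description, function) pair. A fast, approximate score (here `difflib.SequenceMatcher` text similarity, a stand-in for intuition) picks the top `k`, and only those are verified exactly against test cases. The candidates and helper names are hypothetical.

```python
import difflib


def shortlist(candidates, hint, k=2):
    """Fast, approximate step: rank candidates by description similarity to a hint."""
    def score(item):
        description, _ = item
        return difflib.SequenceMatcher(None, description, hint).ratio()
    return sorted(candidates, key=score, reverse=True)[:k]


def verify(fn, tests):
    """Slow, exact step: the candidate must reproduce every expected output."""
    return all(fn(*args) == expected for args, expected in tests)


def propose_then_verify(candidates, hint, tests, k=2):
    for description, fn in shortlist(candidates, hint, k):
        if verify(fn, tests):
            return description
    return None  # nothing in the shortlist survived verification


candidates = [
    ("sum of a list", lambda xs: sum(xs)),
    ("sum of a list, skipping the first item", lambda xs: sum(xs[1:])),
    ("length of a list", lambda xs: len(xs)),
    ("maximum of a list", lambda xs: max(xs)),
]
tests = [(([1, 2, 3],), 5), (([10, 4],), 4)]
print(propose_then_verify(candidates, "sum of a list without the first item", tests))
```

Both "sum of a list" variants make the shortlist because their descriptions look alike; only the exact test run tells them apart.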
[00:24:34] Speaker 2: So watching your children grow up, how has it influenced your thinking on intelligence and learning?
[00:24:40] Francois Chollet: One thing you notice when you watch children grow up is that constructivism is entirely right. They learn things in a very active manner. They try things out, and from these very deliberate experiences they extract new skills, which they then reinvest in new goals. And in general, you see pretty clearly that learning, learning in general but especially in children, is structured as what I would describe as a series of feedback loops, where the child will notice something interesting, come up with an idea, and set that as a goal. Imagine you're there on the floor, crawling, and you notice something that looks intriguing. So you're like, hey, I'm gonna grab it, right? That's your goal. And now you're entering this sort of feedback loop where you're trying to reach that goal. You're doing something towards it, then you get some feedback and you're evaluating, right? You have this plan, action, feedback, back-to-plan loop. And if you reach the goal, then in the process you will have learned something, and you will be able to reinvest that new skill in your next endeavour. And the way children set goals is always grounded in the things they already know about. You start out not knowing much; when you're born, you're animated by just a few reflexes. But when you start forming goals, they always come from the layer you've already mastered. You're building your own mind, kind of layer by layer. At first, for instance, one of your most important sensorimotor affordances is your mouth, because you have the sucking reflex, which is extremely important. It's something you're born with, not something that's acquired. It's extremely important because it's how you feed, right? You also have things like the palmar grasp reflex for grabbing things, but you cannot really use it yet because you are not in full control of your limbs. 
So you cannot really grasp things. But when you start being more in control of your limbs, you will want to grasp things. And the first thing you try to do after you grasp a thing is bring it to your mouth to suck it. You set this goal because it seemed interesting with respect to the things you already know how to do, the things you already find interesting, right? And once you know how to grab things, you add that to your inner world, and you build the next layer on top of it. So next, you're learning to crawl, for instance. Why do you crawl? Why are you trying to move forward? Because you saw an object that seemed interesting and you want to grab it. So you are learning to crawl to grab something, and you are learning to grab to put it in your mouth. You're not learning to put things in your mouth, because that's already hard-coded. So you're constructing yourself in this layer-wise fashion. Basically everything you know, everything you think about, is built upon lower-level primitives, which are built upon lower-level primitives, and so on. Ultimately it comes back to these extremely basic sensorimotor affordances that newborn children have. I do believe that we, and especially young children, construct our thoughts based on our sensorimotor experiences in the world. You cannot think in a vacuum. You have to construct thoughts out of something, and that something is extracted from your experience, right? And the younger you are, of course, the more grounded your thoughts are. They relate more directly to the things you're experiencing and doing in the world. As you get older, your thoughts get increasingly abstract, increasingly disconnected from physicality. But they are ultimately built upon the physical layer. 
It's just that the tower of layers has gotten so tall that you cannot see the ground anymore, but it's still connected.
[00:29:40] Speaker 2: So children see the kaleidoscope and the kaleidoscope is created from abstractions in the universe. And then children over time derive abstractions from the kaleidoscope and reason over them.
[00:29:53] Francois Chollet: Yeah, they notice bits in their experience or their own actions that appear to be reusable, that appear to be useful to make sense of novel situations. And as you go, you're building up these vast libraries of reusable bits, and having access to them makes you really effective in making sense of new situations.
[00:30:24] Speaker 2: And you said constructivist, which is quite interesting. So do you think children construct different abstractions, or do you think there's a kind of attractor towards representing the abstractions which the universe came up with?
[00:30:38] Francois Chollet: I mean, do different people come up with different models? To some extent, probably yes. But because these models, they're ultimately extracted from the same kind of experiences, and they're extracted via the same kind of process, they will end up being very similar, I would think. I mean, you do definitely see that different children follow slightly different developmental trajectories. But ultimately, they are all somewhat parallel. They are all roughly following the same stages, maybe with different timing, you know.
[00:31:15] Speaker 2: So another interesting thing you've said is, you know, language models have near zero intelligence. And I just wondered, if it's near zero, which part of it is not zero?
[00:31:27] Francois Chollet: Sure. Yeah. And, you know, people think it's a very provocative statement, because they're using LLMs all the time, they find them very useful, they seem to make sense, they seem very human-like. And I'm saying, hey, they have near-zero intelligence, and that sounds kind of shocking. But the key is to understand that intelligence is a separate concept from skill, from behavior: you can always be skilled at something without necessarily being intelligent. And intelligence is very specifically your ability to handle novelty, to deal with situations you've not seen before, and come up on the fly with models that make sense in the context of that situation. And this is actually something that you see very little of in LLMs. If you ask them to solve problems that are significantly different from anything they've seen in their training, they will fail. That said, if you define intelligence in this way, come up with a way to benchmark it, like ARC-AGI for instance, and try all the state-of-the-art LLMs on it, they don't have zero performance, right? So this is where the non-zero part of my statement comes from. That said, it's not entirely clear whether that non-zero performance, that ability to adapt to novel problems, is actual intelligence or a flaw of the benchmark. Maybe the benchmark was not actually producing entirely novel problems. Maybe there was very significant overlap between this or that question and something the LLM has seen in its training data. It's very difficult to control for that, because the LLM has memorized so much. It has seen pretty much the entire internet, plus tons of data annotations that were created specifically for that LLM. And we don't know, fundamentally, what's in the training data. So it's kind of difficult to tell. 
But it does seem to me that LLMs are actually capable of some degree of recombination of what they know to adapt to something that they've genuinely not quite seen before. It's just that the degree of this recombination, their generalization power is very weak, it's very low.
[00:34:01] Speaker 2: This gets to the core of it, because a lot of people argue that this combinatorial creativity, or this cone of extrapolation, does constitute novel model building. And I interpreted what you said as, you know, if we zoom out and think of the training process as well, that obviously is model building.
[00:34:18] Francois Chollet: Obviously. Fitting a curve to a dataset via gradient descent is model building. The major flaw there is that it's very inefficient model building. To get a good model, you need a dense sampling of pretty much everything the model is going to have to deal with at test time. So the model is effectively only displaying weak generalization. It can adapt to things it has not seen before, but only if they remain very, very close to things it has actually seen. Where intelligence comes into play is the ability to adapt to things that are way out of distribution. Because the real world is not a distribution, right? Every day is new, every day is different, but you have to deal with it anyway.
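A minimal sketch of weak generalization, assuming a piecewise-linear interpolator stands in for a fitted curve: it is "trained" on dense samples of x² over [0, 1] and holds its boundary values outside that range. Errors stay small inside the sampled region and grow without bound far outside it. `fit_interpolator` is a hypothetical name.

```python
import bisect


def fit_interpolator(xs, ys):
    """'Train' by storing samples; predict by linear interpolation between neighbours."""
    def predict(x):
        if x <= xs[0]:
            return ys[0]  # no extrapolation: clamp to the nearest sample
        if x >= xs[-1]:
            return ys[-1]
        k = bisect.bisect_right(xs, x)
        x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return predict


target = lambda x: x * x
train_x = [i / 10 for i in range(11)]  # dense samples on [0, 1]
model = fit_interpolator(train_x, [target(x) for x in train_x])

print(abs(model(0.55) - target(0.55)))  # small: inside the sampled range
print(abs(model(3.0) - target(3.0)))    # large: far outside it
```

Clamping at the boundary is one design choice; extrapolating the last segment linearly would also miss badly at x = 3 (roughly 4.8 versus 9), just less dramatically.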
[00:35:13] Speaker 2: Critics will say, and I can empathize, I mean, I use Claude Sonnet all of the time for my coding. I'm paying for about, I don't know, 2,000 requests a month on Cursor, so I'm using it a lot. And it appears clairvoyant in many cases. And they would argue, I'm sure, that because it's trained on so much stuff, the convex hull is enough to capture any novelty we might need. So what's the problem?
[00:35:40] Francois Chollet: Sure, that's something I hear a lot, this idea that maybe novelty is overrated, that you just need to train on everything; that there can exist a dense sampling of everything you might ever want to do, everything you might ever want to know. I disagree with that. Imagine you had trained LLMs 10 years ago and you're trying to use them now. They're not going to know about the programming languages you're using. They're not going to know about all the libraries and so on. They're certainly going to seem much less intelligent, just because of this gap in their knowledge. The world is changing all the time. And you could say, well, what if you just retrain the model on freshly scraped data every single day? Sure, you can do that, and this will address some of these problems. But still, it's likely that at some point you will come up with problems that are actually novel, problems that don't have a solution on the internet. And that's where you need intelligence, right? And I'm actually quite confident that at some point in the future, maybe the near future, we'll be able to create a system that can actually address this issue of novelty, that can take what it knows and recombine it in truly original ways to address completely new problems. Once we have a system like this, we can start developing new science, for instance. One of the things you cannot do with LLMs today is develop new science, right? Because the best they can do is repeat back to you some interpolation of something they've read online. They're not going to set you on the way to some grand discovery.
[00:37:36] Speaker 2: Again, the devil's advocate on that. I agree that the creativity and the reasoning come from the prompter, and because we anthropomorphize the models, we miscredit the role of the human. But still, inside that addressable space in the LLM, with a human supervisor, I'm sure we can creatively explore the convex hull of what is known, though perhaps not create new things.
[00:37:58] Francois Chollet: Sure, you can do that. And that's a process that, as you said, has to be driven by you, the human, because you are going to be the judge of what's interesting versus what's nonsense. And without this sort of external verification, it's difficult to make good use of LLMs. In general, you know, I think the thing you should always keep in mind when using LLMs is that they are very good at making useful suggestions, but you should never blindly trust the suggestions they make, especially if it's something like code, right? You should always use it as a starting point, but verify, like make sure it's actually correct. LLMs are very good at pointing you in the right direction. But they're not very good at giving you exactly correct answers.
[00:38:49] Speaker 2: And that's why if we look at all of the successful implementations of LLMs or applications, they always have a human supervisor in the loop.
[00:38:58] Francois Chollet: With LLMs, yes. Or it could also be an external verifier; sometimes the verification process is something that you can delegate to a symbolic system.
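A minimal sketch of that generate-then-verify pattern, assuming the suggestions are candidate Python functions and the "symbolic" verifier is simply a set of known input/output checks. The candidate functions and test cases are hypothetical stand-ins for LLM output, not anything described in the conversation.

```python
from typing import Callable, Iterable, Optional

# Hypothetical "suggestions", standing in for code an LLM might propose.
# Each one is meant to return the largest element of a non-empty list.
def candidate_a(xs):
    return xs[0]            # plausible-looking but wrong

def candidate_b(xs):
    return sorted(xs)[-1]   # correct

def candidate_c(xs):
    return max(xs)          # correct

def verify(fn: Callable, cases: list[tuple[list[int], int]]) -> bool:
    """Symbolic check: run the candidate on known input/output cases."""
    try:
        return all(fn(inp) == expected for inp, expected in cases)
    except Exception:
        return False  # a crashing candidate counts as a failed check

def first_verified(candidates: Iterable[Callable], cases) -> Optional[Callable]:
    """Return the first suggestion that passes every check, else None."""
    for fn in candidates:
        if verify(fn, cases):
            return fn
    return None

TEST_CASES = [([3, 1, 2], 3), ([5], 5), ([-4, -9, -1], -1)]
chosen = first_verified([candidate_a, candidate_b, candidate_c], TEST_CASES)
print(chosen.__name__)  # candidate_b
```

The verifier, not the generator, decides what gets used: `candidate_a` passes two of the three checks and is still rejected.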
[00:39:09] Speaker 2: So now is a great segue to intelligence. Now, fans of the show will know Yannic and I already made about eight hours of content on your Measure of Intelligence paper back in the day. We pored over it, and it's fascinating. But could you just briefly introduce it now, just to give a refresher?
[00:39:23] Francois Chollet: Sure. So my definition of intelligence is skill acquisition efficiency. So it's this idea that intelligence is separate from skill. So if you have a benchmark that just measures the skill of an AI at something, it is not a benchmark of intelligence. It is always possible to score high without actually displaying any intelligence whatsoever. If you want to actually measure intelligence, you have to look at how efficiently the system acquires new skills given a limited amount of data. So you have to control in particular for the data that the system has access to, which usually takes two forms. It can take the form of priors, like the information that the system already has before it looks at your benchmark, and then experience, which is the amount of information that the system will extract from the task, the benchmark that you're giving to it. And so if you control for priors, you control for experience, and you measure skill, then you have some measure of skill acquisition efficiency, the information efficiency of the acquisition of high performance on a novel task. And that's something that I've tried to turn into a concrete benchmark, and that was the ARC-AGI dataset.
[00:40:43] Speaker 2: Just a quick point on that. Is one of the potential issues with the measure of intelligence that it's non-computable, because we can't represent the domain of all possible tasks?
[00:40:54] Francois Chollet: Sure. So in the paper, I had this formalization of my measure of intelligence, and it is non-computable. Its purpose is not to be used as a practical tool, like you're not going to actually want to run this equation on a system and get a number out of it. It is a formalism that's useful to think about the problem of intelligence precisely, right? It's a cognitive device. It's not a practical device.
[00:41:32] Speaker 2: Very cool. So there's this wonderful figure, which will show up on the screen now, which is you describe the intelligence system as being a thing which produces skill programs while adapting to novelty. But one thing I was wondering, though, is you're talking about it as a kind of meta-learning prior. And do humans come with the meta-learning prior baked in? Or is that something we also learn? And should it be the same for AI systems?
[00:41:57] Francois Chollet: Yeah. So that's a very important question. So intelligence is, it's not skill. It's a kind of meta skill. It is the skill through which you acquire new skills. And is this meta skill also something that is acquired through experience? Or is it something that you're born with, that comes hard-coded in your brain, so by evolution, presumably? I think the answer is that it is both. I think you are born intelligent. So you are born with this skill acquisition mechanism. But this skill acquisition mechanism does not operate in a vacuum. It's actually composed of two bits, right? There's the synthesis engine, which takes a look at a new situation, a new task, and will try to combine existing parts, existing abstractions, into a model for that task, for that domain. And there's the abstraction engine bit, which looks at the models that we have produced so far, looks basically at the information available, and will try to produce reusable abstractions to be added back to the library that's going to be used by the synthesis engine the next time around. And this library, of course, is acquired through experience. And the better your library of abstractions becomes, the more effective you are at synthesis, the more effective you are at acquiring new skills efficiently. So I believe that this sort of macro-level architecture of intelligence is something that you are born with. But as you use it throughout your lifetime, you are getting better at it. You are polishing it. So you're not acquiring intelligence as a skill from scratch, but you are polishing it. Another mechanism through which I think you are polishing it is that the synthesis mechanism is probably incorporating learned components. So synthesis from existing abstractions is itself a skill, and you are getting better at it as you use it. 
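The synthesis/abstraction loop described above can be illustrated with a toy sketch. Everything here is an assumption for illustration only: the domain is integer functions, the starting library holds two hypothetical primitives (`inc`, `double`), "synthesis" is plain shortest-first enumeration, and "abstraction" just stores a solved program under a new name. It is not a claim about how the brain, or any particular system, implements either engine.

```python
from itertools import product

# Hypothetical toy domain: a program is a sequence of named int -> int steps.
library = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
}

def run(program, x):
    for name in program:
        x = library[name](x)
    return x

def synthesize(examples, max_depth):
    """Synthesis engine: combine existing abstractions, shortest programs first."""
    for depth in range(1, max_depth + 1):
        for program in product(list(library), repeat=depth):
            if all(run(program, i) == o for i, o in examples):
                return program
    return None

def abstract(name, program):
    """Abstraction engine: add a solved program back into the library."""
    steps = tuple(program)
    library[name] = lambda x: run(steps, x)

task1 = [(1, 6), (2, 10), (3, 14)]   # hidden rule: 4x + 2
task2 = [(1, 12), (2, 20), (3, 28)]  # hidden rule: 8x + 4

before = synthesize(task2, max_depth=4)
print(before)  # ('double', 'inc', 'double', 'double')

p1 = synthesize(task1, max_depth=3)
print(p1)      # ('double', 'inc', 'double')
abstract("f1", p1)

after = synthesize(task2, max_depth=4)
print(after)   # ('f1', 'double')
```

With the base library, the second task needs a four-step program; once the first solution has been abstracted into `f1`, the same task is expressed in two steps. That is the sense in which a richer library makes later synthesis cheaper.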
So I think, for instance, a 15-year-old is going to get better, is going to be better at skill acquisition than a 10-year-old.
[00:44:26] Speaker 2: This is really interesting because, in a way, you're combining rationalism, nativism with empiricism. Because I think you're saying that there is the creation of de novo skill programs that are not just compositions of the fundamental ones. But the broader question as well is, we do this library learning. So children develop, they finesse, they refine, they build these abstractions. And surely there must be some trade-off with complexification, because you don't want the library to be too big, because then you can't do search with it anymore. So is there some kind of pruning, or does it converge on a certain size? Is that the reason why our cognitive development seems to kind of plateau at a certain point?
[00:45:11] Francois Chollet: That's quite possible. You know, that's actually a very deep question. It's also very practical, I think, to building an AGI. So your AGI is going to have this library of reusable primitives. Do you want to expand the size of this library indefinitely? Or do you want to cap it at some number? Like you want, at most, one million programs in it or something like that. So clearly, our ability to efficiently acquire new skills, our intelligence, does not improve over our lifetime in an unbounded fashion. It seems to peak relatively early on. I think there's actually a trade-off here, which is that your raw brain power, like for instance, the amount of information that you can integrate in your mind at any given point, kind of trends down as you age, inevitably. But the quality of the abstractions that you work with, and also your intuition for how to combine them, so the learned components of the synthesis engine, they do get polished over time. They do get better over time. So you have this factor that makes you smarter and this factor that makes you dumber. You know, empirically, I think intelligence probably peaks in your early 20s. That's when you're the most efficient in acquiring new skills. But then again, you know, it depends. I think higher-level cognition peaks probably in your early 20s. But there are things that you should be learning earlier than that, right? So, you know, I mentioned that cognition is built layer by layer. Each layer is built on top of the previous one. The lower layers in the stack, they crystallize, they're set in stone relatively early, before 15 typically. So if you want to acquire any kind of skill that deals with low-level sensorimotor primitives, like you want to get really good at playing an instrument, you want to get really good at singing, you want to acquire a native accent in some language, you should do it before you're 15, typically.
[00:47:34] Speaker 2: Yes. I mean, on the abstractions, you could argue that it's kind of limited by a computational bound, or you could argue that it's just converging towards universal abstractions. But I wanted to comment on what you just said. Personally, I think knowledge is very important. So I've spent years doing this thing with Keith Duggar, who's one of the smartest people I know in the world. He did his PhD at MIT, and he's taught me how to be smart. Just from the way he thinks about things, I've reprogrammed my brain, and I'd much rather be like this than go back to my early 20s. Better abstractions. Much better abstractions. But then again, I can give counterexamples. I've spoken with, I don't want to mention any names, but sometimes professors who lean too much on their knowledge and not their fluid intelligence can seem quite entrenched. And so too much knowledge and not enough fluid intelligence can be a bad thing as well. There seems to be some kind of optimal balance.
[00:48:32] Francois Chollet: So it depends on whether you believe you already have the answers to the questions, or whether you believe you have templates that you can use to get the answers. Gaining better templates for problem solving, or even for generic learning, makes you more intelligent. That's one of the points of education. Like if you learn math, you learn physics, you learn programming, now you have all these meta-level templates for problem solving that make you more effective at problem solving, that even make you more effective at learning. I think at 20, I was much more effective at language learning, both in the methods I was using and in my approach, than I would have been at 12. Even though at 12, I had more brain plasticity and more memory, it was easier to retain things, but I did not have the right tool set, pretty much. And that tool set is very much required. If you think you already have all the answers, then you're not going to be looking to create anything new, or looking for new information. And maybe that's the pitfall that some intellectuals kind of fall into. They think they've got everything figured out, so they don't need to search any further. But instead, if you're just carefully collecting and curating ways to solve problems, or like interesting ideas, and you're not quite sure how you're going to use them yet, but they sound useful, they sound intriguing. And then, when you're faced with something new, you're going to look into your library, look for the best sort of like thing to connect it to. That's how you get insights. Like, if you keep all these things in mind, and then you come across something new, instead of ignoring it because you already know everything, or you think you know everything, you're going to try to connect it with the things in your mind that are waiting for the click, you know. And that's how you get big Eureka moments, you know.
[00:50:50] Speaker 2: Yes, the templates become activated. But I can give an example, actually, with your Measure of Intelligence paper. I spent weeks studying that paper. I read it so carefully and so deeply. And I remember there were a lot of ideas in it that I struggled with. And now I could read it, I could just flick through it, and I just got it. And actually, it's the same with many other papers, because you learn these abstractions. And on MLST, we've always focused on the abstractions. But maybe there's a cost to that, because I'm just following a cognitive path where my brain is lighting up, and I understand it, but maybe there's something else I'm missing.
[00:51:24] Francois Chollet: Sure. I think, you know, by sort of like abstracting away the details, you're able to focus on the bigger picture, the third or the fourth time that you're reading it. And then you kind of find something new at a higher level. Yeah, you know, you don't get stuck in the details.
[00:51:44] Speaker 2: So at the end of the Measure of Intelligence paper, it's from 2019, right, you introduced the ARC Challenge, the Abstraction and Reasoning Corpus. Can you bring that in?
[00:51:54] Francois Chollet: Sure. So yeah, it's from 2019. The Abstraction and Reasoning Corpus, it's a dataset, a benchmark, that tries to capture the measure of intelligence that I outlined in the paper. So it's basically an IQ test for machines, but it's also intended to be easy for humans. It's a set of tasks that are reasoning tasks. In each task, you get a couple, typically two to four, demonstration examples, each of which is the combination of an input image and an output image. And the input image is basically a grid of colors. They're pretty small grids, like from five by five to 30 by 30. 30 by 30 is the largest. And so you're seeing some patterns in this input grid. And then you're told that it maps to a certain output grid with some other pattern. And so your job is to figure out what is the transformation, what is the program that goes from input to output. And you get a few pairs, input-output pairs like this, to learn this program on the fly. And then you are given a brand new input grid. And you must show that you've understood the program by producing yourself the corresponding output grid. And it's pretty easy for humans. For instance, the dataset is split into different subsets. There's a public training subset, which is generally easier. It's intended to demonstrate the sort of core knowledge priors that the tasks are built upon. So core knowledge is another important concept here. I mentioned the grids feature patterns. Well, these patterns must be referring to something, you know. And in order to build anything, you need building blocks. So these building blocks are core knowledge, which are sort of like these knowledge priors that all humans are expected to have mastered by age roughly four. So there are going to be things like objectness, like what is an object; basic geometry, like, you know, symmetries, rotations, and so on; basic topology, like things being connected; agentness as well, like goal-directedness. 
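To see the task structure concretely: the public ARC repository stores each task as JSON with `train` and `test` lists of input/output grids, where each cell is an integer from 0 to 9 denoting a color. The task below is a tiny made-up example in that layout (real grids go up to 30 by 30), and `mirror_horizontally` is a hypothetical candidate program, checked against the demonstrations before it is applied to the test input.

```python
import json

# A tiny made-up task in the ARC JSON layout: "train" holds demonstration
# pairs, "test" holds the input(s) to solve. Grids are lists of rows.
TASK_JSON = """
{
  "train": [
    {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
    {"input": [[3, 3, 0], [0, 0, 4]], "output": [[0, 3, 3], [4, 0, 0]]}
  ],
  "test": [
    {"input": [[5, 0, 6], [0, 7, 7]]}
  ]
}
"""

def mirror_horizontally(grid):
    """Candidate program: reverse each row (a left-right flip)."""
    return [list(reversed(row)) for row in grid]

def fits_demonstrations(program, task):
    """True if the program reproduces every demonstration output."""
    return all(program(pair["input"]) == pair["output"] for pair in task["train"])

task = json.loads(TASK_JSON)
if fits_demonstrations(mirror_horizontally, task):
    prediction = mirror_horizontally(task["test"][0]["input"])
    print(prediction)  # [[6, 0, 5], [7, 7, 0]]
```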
So just these very simple core knowledge systems. And everything in the ARC-AGI tasks is built upon these atoms of knowledge, right? And so the training subset is just intended to demonstrate what core knowledge looks like in case you want to apply a machine learning approach, where instead of, you know, hard coding core knowledge, you want to learn it from the data. Then there's a public validation subset, which is intended to be as difficult as the private test set. So it's intended for you to test your solutions and see what score you get. And then there's the private test set, which is what we actually use to evaluate the competition on Kaggle. And it's pretty easy for humans because we had the private test set verified by two people. And each one scored 97 to 98 percent. So there are only 100 tasks in the private test set. So it means they actually solved, with no prior exposure, 97 to 98 tasks out of 100. And together they get to 100, right? So the tasks that each did not solve actually had no overlap. So that shows that if you're a smart human, you should be able to do pretty much every task in the data set. And it turns out this data set is tremendously difficult for AI systems. And so I released this in 2019. Today, the state of the art was actually achieved earlier this morning. It's 46 percent, right?
[00:56:21] Speaker 2: Yes. Nice one, Jack and team. Yes, Mohamed, Jack and Michael. Congratulations, guys. Yeah, congrats.
[00:56:29] Francois Chollet: So, oh, by the way, there's actually an approach that's not public, but that has a proof of existence, which should do 49 percent at least. 49 percent is what you get if you merely ensemble every entry that was made in the 2020 iteration of the competition.
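One way to read the "merely ensemble every entry" claim, as a rough sketch: if the ensemble is credited with a task whenever any entry solves it, its score is the size of the union of the solved sets. That is an assumption (it ignores the competition's limit on attempts per task), and the entry names and task IDs below are invented. The same union logic explains the human baseline mentioned earlier: two testers at 97 and 98 reach 100 together only because their misses don't overlap.

```python
# Hypothetical results: which private-test task IDs each entry solved.
entry_results = {
    "entry_1": {1, 2, 3, 5},
    "entry_2": {2, 3, 4},
    "entry_3": {5, 6},
}
TOTAL_TASKS = 10

best_single = max(len(solved) for solved in entry_results.values())
ensemble = set().union(*entry_results.values())  # solved by at least one entry
print(f"best single entry: {best_single / TOTAL_TASKS:.0%}")    # 40%
print(f"ensemble:          {len(ensemble) / TOTAL_TASKS:.0%}")  # 60%

# Same logic for the two human testers on a 100-task set (misses are made up).
all_tasks = set(range(100))
tester_a = all_tasks - {7, 42, 63}  # 97 solved
tester_b = all_tasks - {11, 88}     # 98 solved
print(len(tester_a | tester_b))     # 100, because the misses don't overlap
```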
[00:56:55] Speaker 2: Wow. Why has nobody done that then?
[00:56:58] Francois Chollet: Well, it's not exactly apples to apples, right? Because we are talking about hundreds of submissions. Each submission was using some slightly different tweak on brute force program search, but you have hundreds of them, and each one was consuming some number of hours of compute. So, even if you had all the notebooks for all these submissions, and you put them into one mega notebook, it would actually take too long to run it in the competition, right? So, by ensembling the submissions, you are, in a way, scaling up brute force program search to more compute, and you're getting better results. You know, in the limit, if you had infinite compute, you should be able to solve ARC purely via brute force program search, right? It is definitely possible to produce domain-specific languages that describe ARC transformations in a relatively concise manner, in a manner so concise that you would never need more than, like, 40 different transformations to express a solution program. And you're going to have, like, you know, 200 primitives in your DSL. Well, just finding every possible program that's 40 operations deep out of a DSL of 200, if you had infinite compute, you could definitely do that, right?
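The "infinite compute" caveat is easy to quantify with back-of-the-envelope arithmetic. Assuming straight-line programs only (each step picks one of the 200 primitives, ignoring arguments, branching and equivalent programs) of every length up to 40, the number of candidates is:

```python
import math

NUM_PRIMITIVES = 200  # DSL size mentioned in the conversation
MAX_LENGTH = 40       # longest solution program mentioned

# Straight-line programs of every length 1..MAX_LENGTH: each step picks one primitive.
# Python ints are arbitrary precision, so this count is exact.
total = sum(NUM_PRIMITIVES ** k for k in range(1, MAX_LENGTH + 1))
print(f"about 10^{math.log10(total):.1f} candidate programs")  # about 10^92.0 candidate programs
```

Roughly 10^92 candidates under these simplifying assumptions, which is why brute-force search only works "in the limit", and why the practical question is how to shrink or order that space.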
[00:58:34] Speaker 2: Well, there's an interesting discussion point on that. I think I raised this with Ryan and Jack, which is that even if you did have an infinite amount of computation, there's still a selection problem, because you could select based on complexity, for example.
[00:58:49] Francois Chollet: Selection is comparatively easy, because, let's say you have infinite compute, so for each task, well, technically you get an infinite number of matching programs, right? But let's say, realistically, you get, like, 10. You can simply pick the simplest one, like, the shortest one.
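A minimal sketch of Occam-style selection, assuming the search has already returned several programs that all reproduce the training pairs. The five-operation DSL and the candidate list are hypothetical; the point is that candidates which agree on the demonstrations can disagree on a new input, and "shortest program" is one simple tie-breaker.

```python
# Hypothetical five-operation DSL over integers.
DSL = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "half": lambda x: x // 2,
}

def run(program, x):
    for op in program:
        x = DSL[op](x)
    return x

train = [(1, 2), (2, 4)]

# Hypothetical survivors of a search: each reproduces every training pair.
matches = [
    ("inc", "square", "half"),  # (x + 1)^2 // 2
    ("double", "inc", "dec"),   # 2x with a redundant detour
    ("double",),                # 2x
]
assert all(run(p, i) == o for p in matches for i, o in train)

# Occam-style selection: prefer the program with the fewest DSL operations.
chosen = min(matches, key=len)
print(chosen, run(chosen, 3))          # ('double',) 6
print(matches[0], run(matches[0], 3))  # ('inc', 'square', 'half') 8
```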
[00:59:09] Speaker 3: But is the simplest one a good heuristic? Empirically, yeah, it seems to be.
[00:59:16] Francois Chollet: Occam's razor, it seems to work in practice.
[00:59:19] Speaker 2: Because the other potential weakness is, I mean, you mentioned Elizabeth Spelke, and folks at home, you should read her work; she's a professor of psychology at Harvard, and, you know, she came up with those core knowledge priors. But I think you're coming at this very much from the psychology school of thought, which is that we should understand the psychology of the human mind and build AI around that. Is that fair?
[00:59:42] Francois Chollet: Yeah, so I'm a little bit cautious about the idea that AI should try to emulate human cognition. I think we don't really understand enough about the human mind for that understanding to be a useful guide when it comes to creating AI. So I have my own ideas about how intelligence might work and how to create some software version of it. But it's only partially derived from, you know, introspection and looking at people.
[01:00:17] Speaker 2: Interesting. And the reason I said it might be a potential weakness is, let's say we select the lowest complexity program, we have an infinite amount of computation, we do the program synthesis, and then we assume that because all of the generalization space would be in the kind of compositional closure of the priors that we start with, then it will work. Yes. But that is an assumption.
[01:00:40] Francois Chollet: Sure, but it's a reasonable assumption. You could also train a system to judge whether a given program is likely to generalize or not. It would use program length in the DSL as one of its features, but not the only feature.
[01:00:57] Speaker 2: One of the other really important things about the ARC Challenge is task diversity. And the reason we need task diversity, I think, if I understand correctly, there are about 900 tasks in the original ARC Challenge. Now, you spoke about developer-aware generalization. What is it, and why is it so important?
[01:01:15] Francois Chollet: Right. So, developer-aware generalization is the idea that if generalization is the ability to adapt to things that are different from the things you've experienced before, then it matters which frame of reference you're taking. Are you taking the frame of reference of the agent? Does it matter whether this agent is able to adapt to things that it has not personally experienced before? Or do you take the frame of reference of the developer of the agent? Are you trying to get the agent to adapt to things that the developer of the system could not have anticipated? And I think the correct frame of reference is the frame of the developer. Because otherwise, what you end up with is the developer is going to build into the system, either via hard coding or via pre-training, the right kind of models and data so that the agent is going to be capable of performing very well, but without actually demonstrating any kind of generalization, just by leveraging the prior knowledge that is built into it.
[01:02:31] Speaker 2: The current ARC benchmark, I just wondered if you could comment on its weaknesses. But just to cite a couple of examples, Melanie Mitchell put a piece out saying that it should be a moving benchmark, and Dileep George put an interesting piece out saying that it might be perceptually entangled in a way that we might not want. So what are your reflections on the potential weaknesses of it?
[01:02:53] Francois Chollet: Sure. I mean, ARC-AGI is the first attempt at capturing my measure of intelligence. It's a pretty crude attempt because, of course, you know, I'm technically limited in what I can produce. And it has, of course, pretty strong limitations. So I think the first limitation is that it might be falling short of its goals in terms of how much diversity there is in it and how much novelty. So some tasks in version one of ARC-AGI, because, by the way, there's going to be a version two as well, some tasks are actually very close to each other. There is some redundancy. And some of them might also be very close to things that exist online, which might be actually one of the reasons why you see LLMs able to solve some percentage of ARC. Maybe they're actually doing it because they've seen similar things in their training data. So I think that's the main flaw. And so, yeah, Melanie Mitchell mentioned, you know, a benchmark like this should be a moving benchmark. I actually completely agree. I think, ultimately, to measure intelligence, you're going to want not a static data set. You're going to want a task generation process. And you're going to ask it for a new task. It's going to be capable of giving you something that's very unique, very different, handcrafted just for you. It's going to give it to you. And then it might try, for instance, to measure how data efficient you are in solving the task. So it's first going to give you maybe one or two examples. It's going to challenge you to figure it out. And if you cannot, then maybe it can give you a couple more and then a couple more. And so the reason why something like this would be interesting is that you can start benchmarking approaches that have very low intelligence, like, for instance, curve-fitting via gradient descent. Technically, curve-fitting via gradient descent is a kind of program synthesis. So you should be able to apply it on ARC. 
The main reason why you cannot is because for each task, you only have a couple examples and the space is not interpolative. So it doesn't really work. Curve-fitting doesn't really work. But if for each task you had 1,000 examples, for instance, it could be conceivable that you could fit a curve that will generalize to novel inputs. Well, if you have this dynamic task generation and example generation system, then you can start benchmarking techniques like this. And it will be interesting because then you can start grading on the same scale fitting a transformer via gradient descent versus program search: brute force program search, heuristic program search, deep learning guided program search, and so on. And then you can start seeing very concretely what it means to be more intelligent, what it means to be more data efficient in your ability to produce generalization. And the other thing that you can start doing when you have this sort of dynamic benchmark generation process is grading how much generalization power different systems have. So you can measure how data efficient your model synthesis process is, but also how much generalization power the output model has, because you can challenge the test taker with different inputs that will be more or less difficult. So you start at the lowest level by demonstrating a task with very few examples and, let's say for instance, very simple test inputs. And as you go further, you're going to add more examples to kind of refine the constraints of the problem, but you're also going to send the test taker much more difficult examples of the problem to kind of test how far it can generalize, or how complex the models it can produce can be. I love this idea of a generative ARC. And I can see… Ultimately ARC will be a generative benchmark, yes.
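The adaptive, generative evaluation described above can be sketched on a toy problem. Every detail here is a hypothetical assumption: each generated task is a hidden linear rule `y = a*x + b` with small integer coefficients, demonstrations are revealed one at a time, and generalization is checked on probe inputs far outside the demonstrated range. Two solvers are graded on the same scale: a program search over the generator's hypothesis space, and a pure memorizer.

```python
import random

def make_task(rng):
    """Hypothetical task generator: a hidden linear rule y = a*x + b."""
    a, b = rng.randint(-5, 5), rng.randint(-5, 5)
    return lambda x: a * x + b

def search_solver(examples):
    """Program search over the generator's (assumed known) hypothesis space."""
    for a in range(-5, 6):
        for b in range(-5, 6):
            if all(a * x + b == y for x, y in examples):
                return lambda x, a=a, b=b: a * x + b
    return None

def lookup_solver(examples):
    """Pure memorization: it can only answer inputs it has already seen."""
    table = dict(examples)
    return lambda x: table.get(x)

def examples_needed(solver, rule, rng, budget=10, probes=5):
    """Reveal demonstrations one at a time and return how many the solver
    needed before it answered every probe correctly, or None if the budget
    ran out. Probes come from a range far from the demonstrations, so they
    test generalization rather than recall."""
    examples = []
    for n in range(1, budget + 1):
        examples.append((n, rule(n)))  # demonstration inputs 1, 2, 3, ...
        model = solver(examples)
        tests = [rng.randint(1000, 2000) for _ in range(probes)]
        if model is not None and all(model(t) == rule(t) for t in tests):
            return n
    return None

rng = random.Random(0)
rule = make_task(rng)
print("search:", examples_needed(search_solver, rule, rng))  # 1 or 2, depending on the hidden rule
print("lookup:", examples_needed(lookup_solver, rule, rng))  # None
```

On any rule drawn this way, the search solver needs one or two demonstrations, while the memorizer never answers the distant probes and so scores nothing within the budget. A real generative ARC would need a far richer generator than a family of lines.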
[01:07:23] Speaker 2: And I guess that is similar to the way things work in the world. So there's a generative function of the universe. It produces the kaleidoscope, and we go backwards from the kaleidoscope to the generative function. But knowing… This is the thing, like in this intelligence process, we need to know what the priors are, and the priors must be either fundamental or deducible from the fundamental priors that were there in the first place.
[01:07:47] Francois Chollet: Yes, that's right. And, you know, I think the big pitfall to avoid here is… And that's actually the reason why I did not release ARC 1 as a generative benchmark. This was, by the way, the first direction I investigated when I was trying to come up with the thing that eventually became ARC. I was thinking that I would create a program synthesis benchmark where the test examples would be created by some kind of master program. And I investigated many different directions, things like cellular automata and so on. Like, for instance, you're given the output of a cellular automaton and you need to reverse engineer the rules that produced it, that sort of thing. And ultimately, I did not go with that for several reasons. So one reason is that I wanted the tasks to be easy and intuitive for humans. And that's actually difficult to achieve in this way. I also wanted to avoid formalizing too much of the core knowledge, because any formal formulation of core knowledge might be losing something, might be missing something important that you cannot really put into words, but that is there. And also because, and that's very important, if you just write down one master program and let it generate your data set, then the complexity of the tasks in your data set is fundamentally limited by the complexity of the master program. And so, as someone trying to solve the benchmark, the only thing I have to do is reverse engineer the master program. And then I can use it, for instance, to generate infinitely many tasks that I could fit a curve to, or I could just hard-code a system that already understands how this master generative function behaves and can anticipate it, right? So I can hack the benchmark. And that's why, ultimately, I ended up with this model where every task in ARC 1 is actually handcrafted, by me in this case. 
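To make the "reverse engineer the master program" risk concrete, here is a sketch using elementary cellular automata, the example mentioned above. Assumptions: 1D binary rows with wrap-around edges and Wolfram's 0 to 255 rule numbering. A solver that knows the generator family only has to search 256 rules, and once an observed row contains all eight neighborhoods, a single transition pins the rule down.

```python
def step(cells, rule):
    """One update of an elementary cellular automaton (Wolfram rule numbering),
    with wrap-around edges."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right  # neighborhood as 0..7
        out.append((rule >> index) & 1)
    return out

def consistent_rules(history):
    """Reverse-engineer: every rule number that reproduces each observed step."""
    return [
        rule for rule in range(256)
        if all(step(a, rule) == b for a, b in zip(history, history[1:]))
    ]

# The hidden "master program" is rule 90; the solver only sees its output.
single = [0, 0, 0, 1, 0, 0, 0]
print(len(consistent_rules([single, step(single, 90)])))  # 16

rich = [0, 0, 0, 1, 0, 1, 1, 1]  # contains all 8 neighborhoods when read cyclically
history = [rich]
for _ in range(3):
    history.append(step(history[-1], 90))
print(consistent_rules(history))  # [90]
```

The first observation leaves 16 candidate rules because only four of the eight neighborhoods appear in its input row; the richer row identifies rule 90 uniquely. Once the generator is recovered, it can produce unlimited training data, which is the hack described above.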
And I think, you know, that's touching on something that is subtle but very important, which is that I'm a big believer in the idea that the solution to the problem of intelligence must be co-evolved with the challenge, the benchmark. Like, the benchmark should be a tool that points researchers in the right direction, that is asking the right questions. But asking these questions is, in itself, a complex problem. So I think if you were capable of coming up with a master program that generates a test of intelligence that is rich enough, complex enough, novel enough, interesting enough to be a true test of intelligence, coming up with that program is as hard as coming up with AGI. It is, in fact, the same kind of thing. You basically need AGI to create the challenge that AGI is a solution to, right?
[01:11:22] Speaker 2: How explainable should these programs be? I mean, as an example, you could explain to me the reason why you got a coffee this morning or something like that, and I would understand. But AGI, presumably, would be able to build models for things that we don't understand, like economics or financial markets or something like that. It would be an inscrutable mess. So how could that work?
[01:11:44] Francois Chollet: Well, yeah. So AGI would be capable of approaching a new problem, a new task, a new domain, and very quickly and very efficiently from very little data, coming up with a model of that thing. And that model should be predictive. So it should be able to anticipate the evolution of the system it's looking at in the future. I think it should also be causal. So you should be able to use it to plan towards goals. Like you can imagine, like, I have this model of the economy, for instance. I want to get it towards this state. Here are the interventions I can make that will actually causally lead to the desired state. So it should be a predictive model, a causal model that you can use to sort of, like, simulate the behavior of the system. And I think that actually makes it inherently interpretable. You don't need to explain how the model works. You can just show it in action. So one example is, let's say we are looking at ARC. We're not looking at the economy anymore. We're looking at a task in ARC-AGI. Currently, most of the program synthesis approaches, they are looking for input to output transformation programs. And if you're not reading the contents of the program, then one way you can interpret them is just running them on the test input and seeing what you get. I think the kind of models that an actual AGI would produce in this case would not just be input to output transformations. They would explain the contents of the task. So there would be programs that you could use, for instance, to produce new instances of the task, right? Or even to go from output to input when applicable, instead of just going from input to output. And such a kind of program is extremely interpretable because you can just ask for new examples and then look at them, right?
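A sketch of what a model that "explains the contents of the task" could look like, under loose assumptions: the task is a hypothetical recoloring rule, and the model is a small class (`RecolorTaskModel` is an illustrative name, not from the conversation) that can generate fresh inputs, map input to output, and map output back to input. The inverse is only well-defined here because generated inputs never contain the target color.

```python
import random
from dataclasses import dataclass

Grid = list[list[int]]

@dataclass
class RecolorTaskModel:
    """Hypothetical generative model of one ARC-style task:
    every cell of color `src` is painted color `dst`."""
    src: int = 1
    dst: int = 2

    def generate_input(self, rng: random.Random, height: int = 3, width: int = 4) -> Grid:
        # Inputs use only the background color 0 and the source color.
        return [[rng.choice([0, self.src]) for _ in range(width)] for _ in range(height)]

    def forward(self, grid: Grid) -> Grid:
        return [[self.dst if c == self.src else c for c in row] for row in grid]

    def inverse(self, grid: Grid) -> Grid:
        # Well-defined here only because generated inputs never contain `dst`.
        return [[self.src if c == self.dst else c for c in row] for row in grid]

    def new_example(self, rng: random.Random) -> tuple[Grid, Grid]:
        grid = self.generate_input(rng)
        return grid, self.forward(grid)

model = RecolorTaskModel()
rng = random.Random(7)
for _ in range(2):
    x, y = model.new_example(rng)
    assert model.inverse(y) == x  # output -> input also works
    print(x, "->", y)
```

Interpreting such a model doesn't require reading its code: you can ask it for new instances and check them by eye.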
[01:13:55] Speaker 2: Okay. So I can imagine there might be some kind of mediated interface which does encapsulation, you know, and we understand the interface. But maybe we should think about this the other way. So when I've spoken to AI researchers, I've gone through ARC challenges together with them. And they are trying to use introspection. So they're saying, I'm looking at this problem and I know it's got something to do with color. I know it's got something to do with counting. And then they run the program in their mind and they say one, two, three, no, that doesn't work, that doesn't work. And then they try and formalize that into some kind of an approach. Do you think that the way we introspect is a useful way to build a solution for the ARC challenge?
[01:14:35] Francois Chollet: I think so. I think introspection is very effective when it comes to getting some idea of how your mind handles system two thinking. I think it's not very effective for system one because system one is inherently not something you have direct access to. It happens, like, unconsciously, instantly, in parts of your brain that you're not directly observing via your consciousness. But system two is not like that. System two is very deliberate. It's very slow, very low bandwidth. There are only a few things happening at any given time. It's very introspectable. So I think what you're describing is this idea that you're looking at a new task, you're trying to describe it via a set of properties in your mind, and then you're coming up with a small number of different hypotheses about what could be some programs that match these descriptive constraints, and then you're trying to execute them in your mind to check that your intuition is correct. I mean, that's what's called system two thinking, right? I think that's basically how program synthesis works in the brain. But what's not mentioned here is all the system one parts that are in support of this system two thinking. I'm really a big believer in the fact that no cognitive process in the human mind is pure system one or pure system two. Everything is a mix of both. So even when you're doing things that seem to be extremely reasoning heavy, like solving ARC or doing math or playing chess or something, there's actually a ton of pattern recognition and intuition going on. You're just not noticing it, right? And it takes the form, for instance, of the fact that you're only looking at maybe two to four different possible hypotheses for your ARC task. In reality, the space of potential programs is immense. There's like hundreds of thousands of possible programs you could be looking at. But no, you're only looking at like two or three. 
And what's doing this reduction is your intuition, right? Or pattern recognition. It is system one. And I think the reverse is also true. Even when you're looking at cognitive processes that seem to be extremely system one, like perception, for instance, there are quite a few system two elements. I think perception, for instance, is very, very compositional. It's not pure input to output matching the way a deep learning model would do it. There's actually quite a bit of generalization via composition that happens. And that is actually system two.
[01:17:28] Speaker 2: I really agree that there's some strange entanglement between the two systems. I mean, there was one task where color certainly had something to do with it. You can almost visualize it as a SQL query, you know: group by the colors, select counts, order in descending order, skip one, take three, that kind of thing. And it's similar to abduction, in the sense that there's this perceptual inference leading to this set of hypotheses. And then at some point, I'm doing some post hoc verification, which really does seem like system two, but the whole thing seems to work together in a symphony.
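The SQL-style description above maps onto a few lines of code. This is only an illustrative sketch: the example grid, the choice of color 0 as an excluded background, and the function name `top_colors` are assumptions, not details of the actual task being discussed.

```python
from collections import Counter

Grid = list[list[int]]


def top_colors(grid: Grid, skip: int = 1, take: int = 3, background: int = 0) -> list[int]:
    """Roughly: SELECT color, COUNT(*) FROM cells WHERE color != background
    GROUP BY color ORDER BY COUNT(*) DESC OFFSET skip LIMIT take."""
    counts = Counter(c for row in grid for c in row if c != background)
    # Most frequent first; ties broken by color index so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [color for color, _ in ranked[skip:skip + take]]


grid = [
    [1, 1, 1, 2],
    [2, 3, 0, 4],
    [1, 2, 3, 5],
]
print(top_colors(grid))  # [2, 3, 4]
```

Executing the query is trivial once it is written down; what the conversation points at is the perceptual step of noticing that this is the query to run.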
[01:18:04] Francois Chollet: Yes. And they are so intermingled that maybe saying that we're looking at system one plus system two, or system one versus system two, is the wrong framing. Maybe what we are looking for is actually a different kind of data structure or substrate underlying cognition that is inherently both system one and system two. But yeah, what you're doing in your mind, as you describe, is basically program synthesis. But that program synthesis is very, very heavily guided by perceptual primitives and just by intuition about what you feel might be the correct solution.
[01:18:54] Speaker 2: So when we implement program synthesis in a computer, I mean, we could just do a naive, greedy, brute-force search, and then we have this combinatorial explosion. Tell me about that.
[01:19:06] Francois Chollet: Right. The primary obstacle that you run into if you're doing program synthesis is this. At a very high level, you have a language. Typically, it's domain-specific, because that's a shortcut. So it's not a language like Python; it's a language that's a little bit more specialized than that. And you have a bunch of functions in this language, and you use them to create programs. A program is basically just a composition of these functions. In the case of ARC, it's typically going to be a program that takes an input grid and produces the corresponding output grid. And the way you do program synthesis is that you try a bunch of compositions of these functions. And for each program, you're going to run it in practice: run it on a target input, look at the corresponding output, and check whether that output is the output you expected. And you do that across all the examples that you have available, across all the programs that you can come up with. And then you look at which programs actually match, which actually produce the correct outputs across all the examples, right? And maybe you have one such program that's a match. Maybe you have 10, and then you must make a selection. You must try to guess which one is more likely to generalize, and typically it's going to be the shorter one. But the huge bottleneck that you face is that the size of program space, the number of programs you have to look at, grows combinatorially with the number of building blocks in your DSL, but also with the size of the program. So if you're looking for programs that involve, for instance, 40 different function calls, you're looking at a very, very large space. You could not possibly iterate over every individual element of that space. So that's the combinatorial explosion bottleneck. And humans clearly do not suffer from this problem.
Like you described, with this introspection process when you're looking at an ARC task, you're only executing a very small number of programs, step by step, and you're only really executing them to verify that they're actually correct. You apparently rely on an extremely powerful kind of intuition that is not entirely reliable, which is why you still have to perform this verification step. It does not give you the exact right answer, kind of like an LLM. I believe what LLMs are doing is actually the same kind of cognitive process. It's pattern matching, right? It's intuition. So you still have to verify, but it's directionally correct. It's doing a really, really good job at sifting through this almost infinite space of programs and reducing it to just a few possibilities. And I think that's actually the really hard part in cognition: this reduction process.
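To make the brute-force procedure and its combinatorial explosion concrete, here is a minimal enumerative synthesizer over a hypothetical four-primitive DSL. The primitives `flip_h`, `flip_v`, `transpose`, and `inc_colors` are illustrative assumptions, not an actual ARC DSL. The sketch tries every composition, shortest first, and returns the first one that reproduces all demonstration outputs, so the "prefer the shorter program" heuristic is applied by construction.

```python
from itertools import product


# Hypothetical toy DSL of whole-grid transformations.
def flip_h(g): return [row[::-1] for row in g]
def flip_v(g): return g[::-1]
def transpose(g): return [list(r) for r in zip(*g)]
def inc_colors(g): return [[(c + 1) % 10 for c in row] for row in g]


DSL = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose, "inc_colors": inc_colors}


def run(program, grid):
    # Apply the primitives left to right.
    for name in program:
        grid = DSL[name](grid)
    return grid


def synthesize(examples, max_depth=3):
    """Enumerate all compositions up to max_depth, shortest first, and return the
    first program consistent with every (input, output) example, or None."""
    for depth in range(1, max_depth + 1):
        for program in product(DSL, repeat=depth):
            if all(run(program, i) == o for i, o in examples):
                return program
    return None


def search_space_size(n_primitives, max_depth):
    # n + n^2 + ... + n^d candidate programs.
    return sum(n_primitives ** d for d in range(1, max_depth + 1))


# Demonstrations of a 90-degree clockwise rotation.
examples = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[1, 2, 3], [4, 5, 6]], [[4, 1], [5, 2], [6, 3]]),
]
print(synthesize(examples))     # ('flip_v', 'transpose')
print(search_space_size(4, 3))  # 84
```

With n primitives and programs up to depth d, the space holds n + n^2 + ... + n^d candidates: 84 for this toy setup, but far beyond exhaustive search for a DSL of a few hundred primitives at depth 20.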
[01:22:16] Speaker 2: So there are some interesting approaches to ARC. I spoke to Jack Cole and Ryan Greenblatt, and then there's the DreamCoder-type approach. Maybe we should start with DreamCoder, from Tenenbaum's group at MIT. Kevin Ellis was the author of the DreamCoder paper, and he's actually working with Zenna Tavares, building a lab called Basis. I spoke with them the other day, and they are very much focused on the ARC challenge, and they're implementing a lot of MIT's work on it, which is really cool. But I guess the elephant in the room is that DreamCoder (and please introduce what that is) is a really elegant, beautiful approach to ARC, but unfortunately, it doesn't work very well yet.
[01:22:58] Francois Chollet: Right. So it's been a while since I read the paper, but my recollection of DreamCoder is that it's a program synthesis technique that tries to create a bank of reusable primitives that keeps developing as it gets used to solve new tasks. And I think that's a fundamentally right idea. It's probably the only system in which I've seen this idea in action, this idea of abstraction generation: that you're going to use your problem-solving experience to try to abstract away functions that you're going to put in your DSL for reuse later. I also remember it had this wake-sleep cycle. I think that was for training: the synthesis component that they had leveraged deep learning, and they were training the deep learning model via the wake-sleep setup. Correct me if I'm wrong?
[01:24:11] Speaker 2: Yeah. So they had a neural network generative model for programs, and then they had a sleep phase where they would retrain the generative model, and something called an abstraction sleep, where they would kind of combine together programs that work very well and discard ones that weren't used much, you know, that kind of thing. Yeah.
[01:24:28] Francois Chollet: Yeah, that's what I usually call abstraction generation. Yes. Like, I see intelligence as having two critical components. Synthesis, where you're taking your existing building blocks and assembling them, composing them together to create a program that matches the situation at hand, right? And then there's abstraction generation, where you're looking back on the models you generated, or just the data you got about the world, and you're trying to mine it to extract reusable building blocks that you're sending to your memory, where you can reuse them the next time around. And yeah, DreamCoder was actually trying to implement these two components, which I think is really the right direction. So it's very promising.
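A heavily simplified sketch of abstraction generation: mine the most frequent adjacent pair of primitives across previously solved programs and promote it to a new named primitive that later searches can reuse. This only illustrates the idea; DreamCoder's actual abstraction-sleep phase is considerably more sophisticated, and the program representation (lists of primitive names) and the helper names `mine_abstraction` and `rewrite` are assumptions.

```python
from collections import Counter


def mine_abstraction(programs):
    """Return the most frequent adjacent pair of primitives across solved
    programs, or None if no pair occurs more than once."""
    pairs = Counter()
    for prog in programs:
        pairs.update(zip(prog, prog[1:]))
    if not pairs:
        return None
    pair, count = pairs.most_common(1)[0]
    return pair if count > 1 else None


def rewrite(prog, pair, new_name):
    """Replace non-overlapping occurrences of `pair` (scanning left to right)
    with the single new primitive `new_name`."""
    out, i = [], 0
    while i < len(prog):
        if tuple(prog[i:i + 2]) == pair:
            out.append(new_name)
            i += 2
        else:
            out.append(prog[i])
            i += 1
    return out


solved = [
    ["flip_v", "transpose", "inc_colors"],
    ["flip_v", "transpose"],
    ["inc_colors", "flip_v", "transpose", "flip_h"],
]
pair = mine_abstraction(solved)
print(pair)                               # ('flip_v', 'transpose')
print(rewrite(solved[2], pair, "rot_cw"))  # ['inc_colors', 'rot_cw', 'flip_h']
```

Once `rot_cw` is in the library, a program that needed three steps needs only two, which shrinks the search space for future tasks that reuse the same pattern.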
[01:25:15] Speaker 2: So what about Jack Cole? What do you think of his solution? That's the MindsAI group on the leaderboard.
[01:25:22] Francois Chollet: Right. So what they're doing is basically they're using an LLM. It's an encoder-decoder model; I think it's based on the T5 architecture. They are pre-training it on a large code and math dataset, because apparently it helps, which, you know, on its own is an interesting finding. And then they are further fine-tuning it on millions of generated ARC-like tasks. So they're programmatically producing lots of tasks that look like ARC tasks, and they're fine-tuning the model on them. When I say fine-tuning: for each task, they're tokenizing the task description, reducing it to a sequence of tokens, which is actually pretty easy, feeding that into the LLM, and expecting it to produce the output grid in tokenized form. And then they're decoding that back out. And just the setup I described, on its own, as it turns out, does not perform very well. It does like a few percent. But they added a really powerful twist, which is that they're doing test-time fine-tuning. So they're taking their pre-trained LLM, and at inference time, on each new task, they're producing a fine-tuned version of the LLM. They're doing that by producing variants of the task, basically by applying a bunch of randomized hard-coded transformations. And they're turning that into a sort of mini training dataset. They're fine-tuning the LLM on that training dataset, and then they're applying the fine-tuned model to the test input and producing a test output. And if you think about it, just this test-time fine-tuning trick is getting their model from a very, very low performance, like a small percentage of tasks solved, to, you know, over 40%, which is very impressive. So if you zoom out by a lot, I think what they're doing is not that different from program search.
It's basically at a different point on the spectrum. You can think of program search as a spectrum with two axes. One axis is the richness and complexity of your DSL, of your bank of reusable building blocks. And the other axis is the richness and complexity of the ways that you recombine these building blocks. Discrete program search typically operates over a very, very small DSL, a DSL with maybe 100 to 500 primitive functions in it. But it's going to recombine them in very complex ways to get programs that may have depth 20, for instance. And what Jack Cole is doing is basically turning his LLM into a database of reusable vector functions, and it has millions of them. So it's a very, very broad, very large DSL, in a way. And then test-time fine-tuning is using gradient descent to recombine these primitives into a new program. And by the way, the fact that you have this huge performance jump from not using test-time fine-tuning to using test-time fine-tuning really highlights empirically the fact that recombination, program search, is a critical component of intelligence. If you're just doing static inference, you're not doing any sort of recombination. Or if you're doing it, it must be some form of in-context learning, so basically using a memorized recombination program. If you're only doing static inference, you basically do not display much intelligence at all. If you're doing recombination via test-time fine-tuning, then you are starting to implement the synthesis component of intelligence that I described. And the problem is that gradient descent is a very weak, very data-inefficient way of doing synthesis. It is in fact the wrong paradigm.
And so what you get is that the resulting programs have a very shallow depth of recombination. So on this program search spectrum, the MindsAI solution is at the point where they're really maxing out on the richness-of-the-DSL axis, but they're very, very low on the depth-of-recombination axis. Whereas discrete program search, as it's usually implemented, is on the complete other side of the spectrum, where you have a very, very small, very concise DSL, but very sophisticated recombination. And intuitively, my guess is that what makes human intelligence special is that it's not at either end of the spectrum. It's somewhere in between. You have access to a very large, very rich bank of abstractions, of ideas and patterns of thought. But you're also capable of recombining them on the fly to a very meaningful degree. You're not doing test-time fine-tuning in your brain when you're coming up with novel ideas. You're not doing gradient descent at all. You are doing some form of discrete program search, but you're doing it on top of this very, very rich bank of primitives. And that enables you to solve pretty much any ARC problem within seconds.
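The augmentation step described above (producing variants of one task via randomized hard-coded transformations) can be sketched as follows. The specific transformations (three rotations, one mirror, and a color permutation that keeps 0 fixed as background) are assumptions chosen for illustration, not the MindsAI team's actual recipe, and the fine-tuning step that would consume these variants is not shown.

```python
import random


def rot90(g):
    # Rotate a grid 90 degrees clockwise.
    return [list(r) for r in zip(*g[::-1])]


def flip_h(g):
    # Mirror a grid left to right.
    return [row[::-1] for row in g]


# Geometric transforms: identity (as a copy), three rotations, one mirror.
GEOMETRIC = [
    lambda g: [row[:] for row in g],
    rot90,
    lambda g: rot90(rot90(g)),
    lambda g: rot90(rot90(rot90(g))),
    flip_h,
]


def recolor(g, cmap):
    return [[cmap[c] for c in row] for row in g]


def augment_task(pairs, n_variants, rng):
    """Turn one task's demonstration pairs into n_variants transformed copies.

    The same geometric transform and the same color permutation are applied to
    every input AND its output, so each variant is an internally consistent task.
    """
    variants = []
    for _ in range(n_variants):
        geo = rng.choice(GEOMETRIC)
        perm = list(range(1, 10))
        rng.shuffle(perm)
        cmap = {0: 0, **{c: perm[c - 1] for c in range(1, 10)}}
        variants.append([(recolor(geo(i), cmap), recolor(geo(o), cmap)) for i, o in pairs])
    return variants


# One demonstration pair of a hypothetical "swap colors 1 and 2" task.
pairs = [([[1, 0], [0, 2]], [[2, 0], [0, 1]])]
variants = augment_task(pairs, 3, random.Random(0))
print(len(variants))  # 3
```

Each variant would then serve as a tiny training set for a per-task fine-tuning run; the prediction on a transformed test input would need the inverse transform applied before being compared with the original task.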
[01:31:17] Speaker 2: I remember reading your Deep Learning with Python book many years ago, and you were talking about the perils of fine-tuning. You have to keep the learning rate quite low because you might damage the representations in the base model. And when I spoke with Jack (I'm not sure how much of it I should say publicly), he said he encoded the fine-tuning in a kind of language which would reinforce the existing manifold of the model. So, you know, he was kind of saying, I want to use it as a foundation model by transforming the descriptions in a way that reinforces it. And also the active inference thing. It's not active inference from a Fristonian point of view, but the test-time inference. That is moving away from what you said earlier, which is that it's not a retrieval system. I'm actually now generating new compositions as part of the inference process.
[01:32:07] Francois Chollet: That's correct. It's not just a retrieval system. When you're just doing static inference with an LLM, you're just prompting it and getting back some results. That's pure retrieval. And there's very little recombination happening. Any recombination, if it happens, must go through one of these pre-learned recombination programs. Some people say that in-context learning is leveraging some kind of hard-coded gradient descent algorithm that's latent in the LLM. So maybe that's happening. But whatever is happening, clearly, empirically, we can see that it doesn't work very well. It doesn't adapt to novelty to a very meaningful extent. But if you add test-time fine-tuning, then you are actually starting to do real recombination. You're not just reapplying the programs stored in the LLM. You are trying to modify them, to recombine them into something that's custom to the task at hand. That's the process of intelligence. I think, directionally, this is the right idea. The only issue I have with it is that gradient descent is just a terrible way to do recombination. I mean, it is a program synthesis algorithm, of course, right? It's just the wrong approach.
[01:33:30] Speaker 2: So in which case, I mean, I had this discussion with Jack when I interviewed him. And while I accepted that it's a general method, of course, it's still domain-specific in the sense that you have to come up with a prompting technique in order to fine-tune the language model and so on. But it could, in principle, be applied to, you know, fairly broad domains of problems. But you would agree, though, that it goes against the spirit of your measure of intelligence.
[01:33:53] Francois Chollet: So there are elements of the approach that are not quite in line with the spirit of the competition. I think, in particular, the idea that he's going to pre-train his LLM on millions of generated ARC tasks. This kind of makes me think of an attempt to anticipate what might be in the test dataset, in the private test set: trying to generate as many tasks as possible and hoping for collisions between what you've generated and what's actually going to be in the test set. That, of course, is trying to hack the benchmark via memorization. It is not what we intended. But, you know, ultimately it is up to us, the creators of the benchmark, to make sure that it cannot actually be hacked via memorization, that it is resistant to memorization. If we did a bad job with that, because it's actually possible to anticipate what's in the private test set, then that's on us. In practice, by the way, I think we did a decent job, because if you're not doing test-time fine-tuning, right, you're only getting a very low accuracy on the test set. So it kind of shows that, yes, the test set is actually decently novel, right? I think this is also shown by the LLMs right now: if you're just doing direct prompting, the best one is Claude 3.5, and it's doing 21%, right? So it kind of implies that about 80% of the dataset is decently novel, right? Even if you use as your frame of reference pretty much the entirety of the internet. So that's actually a good sign. But I think, you know, Jack Cole's overall approach is also in the spirit of what I had in mind, because what it's doing is a form of program synthesis.
It's just that it's gathering, via learning, this enormous DSL, right? And then it's doing very, very shallow recombination, and doing it with gradient descent, which I think is not what you should be doing. But it ends up working, right? So why not? I agree with that.
[01:36:10] Speaker 2: So actually, in spirit, it's the right approach, but it's bottlenecked by stochastic gradient descent on a large language model. But this is an interesting segue, though. So again, in your Deep Learning with Python book, I think around chapter four (it's very pedagogical for folks who want to learn about machine learning), you spoke about the leakage problem. The reason why we have a training set, a validation set, and a test set is that we don't want information to leak between the sets. And it can happen inadvertently. For example, every time someone gets a new score on the ARC challenge, it's tested on the private set, and that's information. People then modify their approach, and it's as if they've seen something in the private set when they haven't seen it directly.
[01:36:49] Francois Chollet: That's correct. And what they've seen is that this approach they've tested performs better. So now they've learned something about the contents of the private test set. And yeah, many folks, even folks who are machine learning experts, have this misconception that you can only overfit if you are directly training on something, if you're using it as training data. That's not the case. For instance, some years ago, people were doing neural architecture search to find new convnet architectures that would perform well on ImageNet. They all used ImageNet as their reference. And what they were doing is mining this enormous space of possible architectures and selecting the ones that ended up performing well when trained on ImageNet. And what you ended up with was an architecture that was, at the architecture level, overfit to the ImageNet evaluation set, right? In general, if you have any sort of process that extracts information, even just a few bits of information, from your evaluation dataset and re-injects this information back into your model, even if it's not an automated process, even if it's just you looking at the results and then tweaking the approach by hand, you are gradually starting to overfit to what you're testing on. Ultimately, this would happen with the private test set of ARC-AGI. It's just that, because the only information you get each time you submit something is your total score, you're really not extracting many bits of information, right? But because each participant can make three submissions a day, and there are many participants, eventually you would start overfitting. Which is part of the reason why we're going to release version two of the dataset.
And by the way, with the version two dataset, we're going to do something that is pretty important, and that should probably have been done earlier: we're going to have two private test sets, right? There's going to be the one that we evaluate on when you submit and for which you see the score. That's going to be the public leaderboard score. But then we're also going to have an extra private one, which we're only going to evaluate your solution on at the end of the competition. So you're going to proceed through the competition getting only the feedback signal of how well you perform on the first private test set, right? But at the end, we're going to swap that out for the new one. And then you're going to hope that your model will generalize to it.
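A toy simulation of the leakage mechanism and of the two-private-set remedy. Assumptions: each "submission" is a random pass/fail guesser with no real skill (true accuracy 50%), and only its aggregate score on one evaluation set is revealed. Keeping the best of many submissions by that score alone inflates it, while a second, untouched set scored once at the end shows the true level. This is a statistical illustration, not a model of real ARC entries, and the function name is hypothetical.

```python
import random


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def simulate_selection_overfitting(n_submissions=200, n_public=100,
                                   n_extra_private=10_000, seed=0):
    """Return (best score on the revealed set, that winner's score on an unseen set)."""
    rng = random.Random(seed)
    public_labels = [rng.randint(0, 1) for _ in range(n_public)]
    best_score = -1.0
    for _ in range(n_submissions):
        guesses = [rng.randint(0, 1) for _ in range(n_public)]
        # Only this single number is fed back to the participant.
        best_score = max(best_score, accuracy(guesses, public_labels))
    # A random guesser's answers on unseen tasks are independent of its public
    # score, so the winner's answers on the extra private set can be drawn now.
    extra_labels = [rng.randint(0, 1) for _ in range(n_extra_private)]
    winner_guesses = [rng.randint(0, 1) for _ in range(n_extra_private)]
    return best_score, accuracy(winner_guesses, extra_labels)


public_score, extra_private_score = simulate_selection_overfitting()
print(f"best public score: {public_score:.2f}, extra private score: {extra_private_score:.2f}")
```

The public score climbs well above 50% purely through selection, even though no submission ever saw a single label; the extra private set stays near 50%.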
[01:39:39] Speaker 2: Hope being the operative word.
[01:39:42] Francois Chollet: Yeah.
[01:39:43] Speaker 2: Yeah. I mean, now might be a good time to talk about our friend Ryan Greenblatt from Redwood Research. I interviewed him. He's a very, very smart guy. I enjoyed talking with him. And he did a kind of, you know, let's generate loads and loads of candidate programs with an LLM and then validate them, in what he didn't want to call a neurosymbolic framework, which I thought was curious. But what do you think about his approach?
[01:40:08] Francois Chollet: Yeah, I think that directionally that's the right approach. You know, we kind of described how, when you are solving an ARC task, you are generating a small number of hypotheses, and these are programs. And then you are actually executing them in your mind to verify whether they're correct or not, right? It's the same kind of process, where you're using a big intuition machine to produce candidate programs. And these candidate programs, you're hoping that they're more or less right, but you're not sure, right? So you still have to verify them via a system two type process, which, you know, in this case is going to be a code interpreter. In your case, you're actually literally going to be executing your programs in your head. I think that's basically, again, the same type of program search approach that we are seeing among the folks who are doing brute-force program search, or the MindsAI approach. It's just a different point on the program search spectrum, but it's the same kind of thing, right? And in general, you know, I think the research direction that is the most promising to me is combining deep learning with discrete program search. Maybe not quite what Ryan Greenblatt is doing, but the idea that you're going to use a deep learning model to guide program search so that it has to look at fewer candidate programs or sub-programs. That is absolutely the right idea, right? So I'm not surprised that this is getting good results. And I do expect we're going to keep seeing even better results from variants of this approach. One thing I would change is, instead of generating end-to-end Python programs and then just having a binary check (is it correct or not?), I think it might be more interesting.
It might be a better use of the LLM to generate modifiable graphs built on top of an ARC-specific DSL. And then, instead of just checking whether the program is correct or not, you might want to do local discrete search around your candidate programs: basically use your candidate programs as seed points, starting points for discrete search, to reduce the amount of work that the discrete program search process has to do. I keep repeating this, but you should use LLMs as a way to get you in the right direction, and you should never trust them to land in the exact right spot. You should assume that where you land is probably close to the solution, but it's not exactly the solution. You're still going to have some amount of manual work to do to go from, for instance, the candidate programs that the LLM produced to the actual solution. And that work has to be done by a system two type process.
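A minimal sketch of the "candidates as seed points" idea. The seed programs stand in for LLM proposals (they are hard-coded here; no LLM is called), and a breadth-first local search over single-primitive edits (substitute, delete, or insert one primitive) looks for a nearby program that passes verification on every demonstration pair. The toy DSL, the edit operations, and the names `neighbors`, `verify`, and `refine` are all hypothetical.

```python
# Hypothetical toy DSL of whole-grid transformations.
def flip_h(g): return [row[::-1] for row in g]
def flip_v(g): return g[::-1]
def transpose(g): return [list(r) for r in zip(*g)]
def inc_colors(g): return [[(c + 1) % 10 for c in row] for row in g]


DSL = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose, "inc_colors": inc_colors}


def run(program, grid):
    for name in program:
        grid = DSL[name](grid)
    return grid


def verify(program, examples):
    # The "system two" check: execute the program on every demonstration pair.
    return all(run(program, i) == o for i, o in examples)


def neighbors(program):
    """All programs one edit away: substitute, delete, or insert a single primitive."""
    prog = list(program)
    for i in range(len(prog)):
        for name in DSL:
            if name != prog[i]:
                yield tuple(prog[:i] + [name] + prog[i + 1:])
        yield tuple(prog[:i] + prog[i + 1:])
    for i in range(len(prog) + 1):
        for name in DSL:
            yield tuple(prog[:i] + [name] + prog[i:])


def refine(seeds, examples, max_edits=2):
    """Breadth-first search outward from the seeds; returns (program, edits) or None."""
    frontier = [tuple(s) for s in seeds]
    seen = set(frontier)
    for radius in range(max_edits + 1):
        for prog in frontier:
            if verify(prog, examples):
                return prog, radius
        next_frontier = []
        for prog in frontier:
            for nb in neighbors(prog):
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return None


# Target: 90-degree clockwise rotation. The seed is a plausible near miss
# (a 180-degree rotation) of the kind an LLM might propose.
examples = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[1, 2, 3], [4, 5, 6]], [[4, 1], [5, 2], [6, 3]]),
]
print(refine([("flip_v", "flip_h")], examples))  # (('transpose', 'flip_h'), 1)
```

The search only explores the neighborhood of the proposals, so a good seed keeps the number of verified candidates small, while the verifier still decides what counts as a solution.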
[01:43:20] Speaker 2: Yeah, I discussed this with him, and he is still of the mind that they are doing emergent reasoning, and that given enough scale, the divergence between aleatoric risk and epistemic risk will tend towards zero, which of course we don't agree with. But I agree with you. The system is quite stateless at the moment; wouldn't it be interesting if there was some kind of program library, and maybe retrieval augmented generation into the library? His solution does have some interesting properties, which maybe you might want to comment on. He's using vision. He's doing some interesting prompting. He's using self-reflection. He's got a candidate evaluation methodology. What do you think about the overall thing?
[01:44:01] Francois Chollet: Sure. I think it's promising. And yeah, you know, I think we're going to keep seeing variants of this that are going to perform well. And this is the reason why we introduced the public track in the challenge. You know, we kept hearing from folks saying, hey, I'm sure GPT-4o can do this. We were like, well, maybe let's try it. And of course, you cannot enter the private competition with GPT-4o, because it would involve sending the private task data to OpenAI's servers, so it would no longer be private. So that's not possible. So what we did is that we introduced an alternative test set, right, which we call semi-private. It's private in the sense that we're not publishing it, but it's also not quite private, because it is being sent to OpenAI servers or Anthropic servers and so on. And we did this because we want people like Ryan Greenblatt to show up, come up with some sophisticated chain-of-thought pipeline, and prove us wrong if possible.
[01:45:06] Speaker 2: And just before we leave this bit, are you aware of any other interesting approaches which perhaps aren't in the public domain but you know about?
[01:45:16] Francois Chollet: So I am aware of various people making claims about their solutions to ARC, but I'm not aware of specific details. They tend to be very secretive people. And ultimately, I only trust what I see. We have two tracks. We have the private track on Kaggle, with a lot of money on the line. We have the public track, where you can use any state-of-the-art LLM you want. If you have something, you should submit it to one of the two tracks. If it's self-contained, then just go for the money. If it uses an LLM API, then use the public track. But if it's not on the leaderboard, I'm probably not going to believe you.
[01:45:58] Speaker 2: Are the organizers worried that if someone did reach human-level performance, it would be worth more than a million dollars if they sold it somewhere else?
[01:46:06] Francois Chollet: Sure, maybe. I doubt that's what's going to happen, though, but maybe. Interesting.
[01:46:15] Speaker 2: And also, just on the economics of it, this is quite an open-source approach, but what do you think the incentives are? Because if I already had a really good solution, if I was Jack Cole, I mean, it's worth me spending six months on it, because there's a good chance I might win. Yes. If I have nothing, then maybe I'll just have a quick look and see if there's anything, but I won't invest much time, versus starting up a lab, putting the money into that, and just hiring good people to work on it.
[01:46:42] Francois Chollet: So, of course, there's a big money prize. But, you know, we don't expect that people are going to show up and solve ARC because they want the money specifically. The amount of money is not high enough for that to happen. Instead, the money that we are putting on the line is just a signal to indicate that this challenge matters, that we are serious about it, and that we think it's important. But ultimately, the real value in submitting a solution and winning is, I would say, reputational. You become the first person to crack this open challenge that's been open since 2019, and presumably your solution is a big step forward towards AGI. A lot of people are talking about ARC right now. If you were to solve it, you would definitely make headlines, right? It would be a big deal. So, for instance, you mentioned starting a lab. Well, it would be a great opportunity to start a lab around your solution and then raise a bunch of money, right? And you could do that just on the momentum generated by your winning entry.
[01:47:56] Speaker 2: Could you comment on, you know, I had Subbarao Kambhampati on recently, and he's got this LLM-Modulo architecture, which is really interesting. You know, basically you have this neurosymbolic setup, with LLMs generating ideas and critics checking them. What do you think about that general idea?
[01:48:12] Francois Chollet: Yeah, I think that's generally the right approach. You should not blindly trust the output of an LLM. Instead, you should use it as an intuitive suggestion engine. It will give you good candidates, but you should never just blindly believe that these candidates are exactly the correct solution you're looking for. You should verify. And this is why LLM modulo some external verifier is so powerful. It's because you are cutting through the combinatorial explosion that would come with iteratively trying every possible solution. But you're also not limited by the fact that LLMs are terrible at System 2, right? Because you still have this last-mile verification, and that's going to be done by a true System 2 solution.
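To make the generate-and-verify idea concrete, here is a minimal Python sketch, assuming a task whose candidates can be checked exactly. The `propose` callable stands in for the LLM (the untrusted suggestion engine) and `verify` for the external verifier; `llm_modulo`, `toy_proposer`, and the factoring task are hypothetical illustrations, not Kambhampati's actual LLM-Modulo framework.

```python
import itertools
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def llm_modulo(
    propose: Callable[[], Iterable[T]],
    verify: Callable[[T], bool],
    budget: int = 100,
) -> Optional[T]:
    """Return the first proposed candidate that the external verifier accepts.

    `propose` plays the role of the LLM: its output is never trusted on its own.
    `verify` is the sound external check. At most `budget` candidates are examined.
    """
    for candidate in itertools.islice(propose(), budget):
        if verify(candidate):
            return candidate
    return None


def toy_proposer(n: int) -> Iterator[Tuple[int, int]]:
    """Hypothetical stand-in for an LLM: plausible but unreliable guesses.

    It guesses factor pairs (a, n // a) starting near sqrt(n), so the guesses
    look reasonable but most of them do not multiply back to n.
    """
    root = int(n ** 0.5)
    for a in range(root, 1, -1):
        yield a, n // a


n = 391
result = llm_modulo(lambda: toy_proposer(n), lambda ab: ab[0] * ab[1] == n, budget=50)
print(result)  # (17, 23): the guesses (19, 20) and (18, 21) are rejected first
```

The verifier, not the proposer, decides what counts as a solution; a better proposer only reduces how many candidates get checked before one passes.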
[01:49:03] Speaker 2: The architecture was really interesting because it was bidirectional as well. So the verifiers might give you yes, no, maybe, or some additional information, and then the LLMs could be fine-tuned and so on. My read on it, though, is that it constrains it a little bit, because the verifiers, of course, are very domain-specific. And that seems slightly different from some of the solutions to the Arc challenge.
[01:49:27] Francois Chollet: Yeah, it will tend to be domain-specific. And also, it's not always the case that you're operating in a domain where there can be an external verifier, right? Sometimes there can be. I think this is true in particular with program synthesis from input-output pairs. So this is true for Arc, in fact, because you know what output to expect given certain inputs. And what you're producing are programs, so they can actually be executed; they can be verified. For many other problems, you have no such guarantees, right?
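The point that Arc-style programs can be executed and checked against input-output pairs can be sketched as brute-force enumeration over a tiny, hypothetical DSL of grid transformations. The primitive names, `run`, and `synthesize` are assumptions for illustration; this is not the real Arc DSL or any competition solver.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Grid = List[List[int]]
Program = Tuple[str, ...]

# Hypothetical toy DSL: a few grid primitives (not the real Arc DSL).
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": lambda g: [row[::-1] for row in g],          # mirror left-right
    "flip_v": lambda g: [row[:] for row in g[::-1]],        # mirror top-bottom
    "transpose": lambda g: [list(col) for col in zip(*g)],  # swap rows and columns
}


def run(program: Program, grid: Grid) -> Grid:
    """Execute a program by applying its primitives left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def synthesize(pairs: Sequence[Tuple[Grid, Grid]], max_depth: int = 3) -> Optional[Program]:
    """Brute-force search: return a shortest program consistent with every pair."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            # Verification by execution: the candidate must reproduce each output.
            if all(run(program, x) == y for x, y in pairs):
                return program
    return None


pairs = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),  # rotate 90 degrees clockwise
    ([[1, 2, 3]], [[1], [2], [3]]),
]
program = synthesize(pairs)
print(program)  # ('flip_v', 'transpose')
```

Every composition of primitives is tried in order of length, which is the combinatorial explosion an LLM proposer would be meant to cut through; the verification step (`run(program, x) == y` on every pair) stays the same either way.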
[01:50:03] Speaker 2: So moving on a tiny bit: agency. Yes. Now, I think of agency as a virtual partition of a system that has self-causation and intentionality, allowing for control of the future. And I assume it's a necessary condition for intelligence, and I know you don't, because we spoke about this the other day. But what do you think is the relationship between agency and intelligence?
[01:50:31] Francois Chollet: Right. So, you know, many people treat agency, embodiment, and intelligence as almost interchangeable concepts. I like to separate them out in my own model of the mind. The way I see it, intelligence is a tool that is used by an agent to accomplish goals. It is related to, but separate from, your sensorimotor space, for instance, or your ability to set goals. And I think you can even separate it out from your world model. So, I don't know if you're an RTS player, maybe? Yes. As in Command and Conquer, Warcraft. Right. Yes. Warcraft, Warcraft. Exactly. So all these games are RTS games. In an RTS game, you have units moving around and you can give them commands. And you have a mini-map as well. So imagine that you're selecting a unit and right-clicking somewhere on the mini-map to tell the unit to go there. Well, you can think of the mini-map as a world model. It's a simplified representation of the actual world of the game that captures key elements of structure, typically where things are and where you are. And when you're right-clicking the mini-map, you are specifying a goal. In this metaphor, intelligence is the path-finding algorithm. It takes in this world model and this goal, which are externally provided, and figures out the correct sequence of actions for the agent to reach the goal. Right? Intelligence is about navigating future situation space. It's about path-finding in future situation space.
And in this metaphor, you can see that intelligence is a tool. It is not the agent. The agent is made of many things, including a goal-setting mechanism. In this metaphor, that role is played by you. You are setting the goal. The agent is also made of a world model, which enables it to represent what the goal means and maybe to simulate plans. It's also going to include a sensorimotor space, like an action space, and it can receive sensory feedback as well. But the agent is the combination of all these things, and they're all separate from intelligence. Intelligence is basically just a way to take in information and turn it into an actionable model, something that you can use for planning, right? It's a way to convert information about the world into a model that can navigate possible evolutions of the world.
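The mini-map metaphor can be sketched in code: the world model and the goal are inputs supplied from outside, and "intelligence" is the search that turns them into a sequence of actions. This minimal Python sketch uses breadth-first search on a hypothetical character grid (`#` marks impassable cells); games commonly use weighted variants such as A*, so treat this as an illustration of the metaphor rather than of any game's actual pathfinder.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]

# Action space: the unit can step one cell in four directions.
MOVES: Dict[str, Pos] = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def plan(world_model: List[str], start: Pos, goal: Pos) -> Optional[List[str]]:
    """Breadth-first search from start to goal over a grid world model.

    Returns a shortest list of actions, or None if the goal is unreachable.
    """
    rows, cols = len(world_model), len(world_model[0])
    frontier = deque([start])
    # Maps each discovered cell to (previous cell, action taken), or None for start.
    came_from: Dict[Pos, Optional[Tuple[Pos, str]]] = {start: None}
    while frontier:
        pos = frontier.popleft()
        if pos == goal:
            actions: List[str] = []
            step = came_from[pos]
            while step is not None:  # walk back to the start, collecting actions
                prev, action = step
                actions.append(action)
                step = came_from[prev]
            return actions[::-1]
        for action, (dr, dc) in MOVES.items():
            r, c = pos[0] + dr, pos[1] + dc
            in_bounds = 0 <= r < rows and 0 <= c < cols
            if in_bounds and world_model[r][c] != "#" and (r, c) not in came_from:
                came_from[(r, c)] = (pos, action)
                frontier.append((r, c))
    return None


minimap = [
    "....",
    ".##.",
    "....",
]
path = plan(minimap, start=(0, 0), goal=(2, 3))
print(path)  # five moves that route around the "##" block
```

Swapping in a different map (world model) or a different goal changes the plan without changing `plan` itself, which mirrors the separation drawn above between the tool and the rest of the agent.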
[01:53:52] Speaker 2: I agree with everything you've just said. I think the tension is, after speaking with people like Karl Friston, you know, when we think about the physics of intelligence and this epic particle system we live in, with functional dynamics and behavior and so on, the agency and the intelligence are not explicit. The world model isn't explicit. So there seems to be something else going on, which is why, in many cases, I think of agency and intelligence as virtual properties rather than explicit physical properties. That's not to say that we couldn't build an AI where everything is explicit, because that would be useful. We could build it in computers. But there's always the tension of whether we think of the world as this complex simulation of low-level particles and nested agents (I have cells which are agents, and my heart is an agent, and I'm an agent), or whether it's explicit. All right.
[01:54:39] Francois Chollet: Well, I think in the first AGI that we build, these different components are going to be explicitly separated out in software, because that's simply the easiest way to get there. At least that's my take on it. The architecture is going to be explicit, yes.
[01:54:55] Speaker 2: So you actually spoke about functional dynamics the other day, which was music to my ears, obviously, being a fan of the Fristonian worldview. What's your take on that?
[01:55:06] Francois Chollet: So to be honest with you, this is actually something I've been thinking about, but I do not have very crisp ideas about it yet. But it is my general intuition as to how the human mind performs programming. I think there are two scales, two levels at which the mind changes itself. There's the long-term scale, which has to do with abstraction mining, like abstraction generation and memory formation. It has to do with neuroplasticity as well. You are basically changing connections in your brain to store reusable programs.
[01:55:52] Speaker 2: Your formalism of intelligence focuses a lot on internal representations: this idea that in our minds we have a world model and so on. And when I read some of your blog posts from years ago, you talk a lot about the externalist tradition, which holds that a lot of cognition happens outside of the brain. How do you reconcile those two worldviews?
[01:56:15] Francois Chollet: Right. Well, I'm a big believer that most of our cognition is externalized, as you say. When we are talking to each other, for instance, we are using words that we did not invent. We are using mental images and ideas that we just read about somewhere, and so on. And if we had to develop all these things on our own, we would need extremely long lives to start being intellectually productive. So I don't think there's really any contradiction between the two views. Sure, humans as individuals are intelligent. You possess intelligence, I possess intelligence. We can use it in isolation, on our own. We can extract reusable bits from our environment, from our lived experiences, and use them to make sense of novel situations. That's the process of intelligence, and we possess it as individuals. But we are also able to communicate, right? We are not just individuals; we are also a society. So these ideas, these reusable abstractions, we can extract them from our brains, put them out there in the world, and share them with others. We can write books, for instance. We can type up computer programs that can be executed not just by other brains but even by computers, right? And this process is just the creation of culture. Once culture is out there, you can download it into your brain, and that's education. As you're doing it, you are artificially filling up your bank of reusable abstractions, so you take a shortcut. It's almost like downloading skills in The Matrix.
It's a little bit like that: learning about physics, learning about math. You are downloading these very rich, reusable mental templates, real mental building blocks. And then, in your own brain, you can recombine them. You can reapply them to new problems. It makes you more intelligent, literally more intelligent. It makes you more efficient at skill acquisition, more efficient at problem solving, and so on.
[01:58:40] Speaker 2: Yeah, beautifully articulated. I mean, there are a couple of great books I've read on this: The Language Game, and also Max Bennett's book on intelligence. They basically talk about this, the plasticity of mimetic information sharing, you know, allowing us to stand on the shoulders of giants.
[01:58:58] Francois Chollet: I think there's an interesting angle to the question you asked. I don't know if you were aware of it, but what I've described there is this idea that humans are the source of abstraction. Individual human brains use their lived experience to extract abstractions. And then they externalize them via language, typically (not exclusively, but most of the time). And then other brains can download these abstractions and make them their own, which is a huge shortcut, because you don't have to experience everything on your own to start leveraging them. But in this model, abstraction generation, and abstraction recombination to form new models, always happens inside brains, right? The only part that's externalized is the memory. You're moving the abstractions, the reusable building blocks, out of individual brains, putting them in books and so on, and then downloading them back. But to be useful, they need to be internalized in your brain. A question then is: could abstraction generation or recombination actually happen outside brains as well? Not necessarily in the context of creating an AGI, because that's exactly what an AGI would be: this synthesis and abstraction process encoded in software form. But do we have today external processes that implement this? Well, I think we sort of do. I think science in particular is doing a form of synthesis that is driven by humans but is not happening inside human brains. We have the ability to do recombinative search over spaces that cannot actually fit inside human brains.
And then you see it in a lot of the things that we invent, like when you create a better computer, for instance. You are doing some kind of recombinative search over a space of possible devices, but you are not really able to hold a full model of the device inside your own brain. Instead, the model is distributed across a number of externalized artifacts. And I do believe that human civilization is implementing this highly distributed synthesis part of the process of intelligence. We have implemented it externally, across many different brains manipulating externalized symbols and artifacts. And this is what's underpinning a lot of our civilization, because the systems we've been creating, inventing, are so complex that no one can really understand them in full. So you cannot run this invention process inside brains anymore. Instead, you are using brains to drive a much bigger externalized process. So I think cognition is externalized not just in the sense that we have the power to write down and then read ideas and abstractions and reuse them inside our brains. We're actually running intelligence outside our brains as well.
[02:02:40] Speaker 2: I completely agree. And you've written about this, about how intelligence is collective, situated, and externalized. Yes. But there's always the question of, you know, science, for example, is a kind of collective intelligence which supervenes on us, and language as well. But do things like mimesis happen outside of biology? I mean, certainly it happens with, you know, the selfish gene; that happens with genetics. Yeah. But you could argue that a kind of mimesis actually happens in any open physical system with certain patterns of functional dynamics and so on. So the real question, I think, with this externalized cognition is: where do the abstractions come from? Perhaps our brains are just very efficient at building the map from the territory, and it's just a slightly better way of doing what already happens naturally, externally. Yeah.
[02:03:39] Francois Chollet: I think to a large extent, the way we've externalized cognition is not as efficient as the way cognition is implemented in our brains. Intelligence is a kind of search process, right, over a space of possible combinations of things. I think right now this search process is to a large extent externalized when you look at technology, when you look at science. But it's not externalized in a very smart way. I think we are roughly implementing brute-force search. I see it a lot, especially in deep learning research. The way the deep learning community as a whole finds new things is by trying everything and eventually hitting the thing that works, you know. And I believe individual humans, if they had enough brain power to actually model these things in their own brains, would be much more effective at finding the right solution.
[02:04:44] Speaker 2: Interesting. I mean, Ryan Greenblatt's view was emblematic of some of the x-risk folks, in that he was arguing that he could be in a hermetically sealed chamber, or be a brain in a vat, and as a pure intelligence he would still be able to reason and solve tasks and so on. The counter view is that physicality and embodiment are really important. When I asked Murray Shanahan about this, I said, what's the reason we need physically embodied robots? And he said, well, these robots are interacting with the real world; they're understanding the intricate causal relationships between things, and that helps them build models more efficiently. But perhaps that's in service of just learning about the abstractions which already exist in the physical world.
[02:05:25] Francois Chollet: Yes, to exercise intelligence, it needs to be operating on something. You think out of something, about something. You need to have some concrete environment, goals in that environment that you want to accomplish, and actions that you can take. So it's about something; it cannot be about nothing. But it's also made of something. You are making your plans to reach your goals out of existing components, existing subroutines. If you have nothing at all, not only do you have nothing to be intelligent about, but your intelligence has nothing to recombine. Right? And that's why embodiment is important. In humans, you know, I mentioned this idea that cognition is built layer by layer. Each new layer, which is a little bit more abstract than the one before it, is built in terms of the components that came before. And if you dig deep enough, if you unfold your mind layer by layer, at the very bottom you will find things like the searching reflex, for instance. Everything starts with your mouth. And then you start having things like grabbing objects to put them in your mouth, and then things like crawling on the floor so that you can reach objects, so you can grab them and put them in your mouth, and so on. And at some point you stop putting objects in your mouth, but the new things you're learning are still expressed in terms of this concept and skill hierarchy. Right. And when you end up doing abstract math, well, you are using building blocks that eventually resolve to these extremely primitive sensorimotor subroutines. Right. So yeah, embodiment is important.
But at the same time, I think the kind of body and sensorimotor space that you have is very much plug and play. If you have a true AGI, you could plug any environment, any sensorimotor space, any DSL as well into it, and it would start being intelligent about it, you know. So in that sense, embodiment is important, but what kind of embodiment might not necessarily be important. And another thing that's really important is goal setting, by the way, which is distinct from embodiment and also distinct from intelligence. If you're just a brain in a jar with nothing to think about, well, you're not going to be very intelligent, but also you're not really going to be doing anything, because you have nothing to do. You have no goal to drive your thoughts. And I think it's especially true if you look at children. The way you learn anything is by setting goals and accomplishing them. You cannot really build good mental models, good world models, passively, purely by observing what's going on around you with no goals of your own. That's not how it works. Goal setting is a critical component of any intelligent agent.
[02:09:01] Speaker 2: I completely agree. I think the only unresolved tension in my mind is that there are many manifestations of intelligence, and it is possible for us to build an abstract, explicit version which would run on computers. Essentially, it doesn't necessarily need to mimic the type of intelligence we have in the real world.
[02:09:20] Francois Chollet: Yeah, I think so. And I think, at least in its first few iterations, it will probably have significant architectural similarity with the way intelligence is implemented in people. But ultimately, you know, it might drift away towards entirely new types of intelligence.
[02:09:39] Speaker 2: Now, you've said that language is the operating system of the mind. What do you mean by that? Right.
[02:09:45] Francois Chollet: So what's an operating system, right? It's not the same thing as a computer. It is something that makes your computer more usable and more useful. It empowers some user to best leverage the capabilities of their computer. I think language plays a similar role for the mind. I think language is distinct from the mind. It's a separate thing from intelligence, for instance, or even from a world model. But it is a tool that you, as an agent, are leveraging to make your thinking more useful. Right. So I believe language and thinking are separate things. Language is a tool for thinking. And what do you use it for? Well, one way is that you can use language to make your thoughts introspectable. Your thoughts are there. They're like programs in your brain, which you can execute to get their output. But you cannot really look at them. By writing them down in words (I don't mean literally writing them down, just expressing them as words), suddenly you can start reflecting on them. You can start looking at them. You can start comparing them. And critically, you can start indexing them as well. I believe one of the roles of language is to enable you to do indexing and retrieval over your own ideas and memories. If you did not have language, then to retrieve memories you would have to rely on external stimuli. Right? Like, you know, Proust is eating a madeleine and it's reminding him of a specific time and place.
And if Proust did not have language, then every time he needed to think about that particular time and place, he would have to eat the madeleine. This would be his only access point to that memory, right, this external stimulus. If he has language, then he can use language to query his own world model and retrieve the memories that he wants. So it's a way to express what you want to retrieve inside your own mind. It's also a way to compose more complex thoughts. If you cannot reflect on thoughts, if you cannot materialize them and look at them and modify them in your mind, then I think you're also quite limited in the complexity of the thoughts you can formulate. It's a very simple programming analogy, by the way. If you have a computer, you can use it to write programs. You do not need an operating system, right? You can just write assembly code. Why not? But you are severely limited in terms of the complexity of the software you can produce. If you have an operating system, and you have high-level programming languages and so on, then these are tools that you can use as a programmer to develop much more complex software. And your intelligence as a programmer, your programming ability, has not changed. It's just your tools that have gotten better. And suddenly you are much more capable than you were before, right? So I think intelligence is using language as a similar kind of tool. Yeah.
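The Proust example contrasts two retrieval paths, which a toy data structure can make concrete: memories reachable only through an exact external cue, versus memories indexed by words that can be queried at will. `CueAndWordMemory` and its methods are hypothetical, a loose illustration of the indexing idea as an inverted index, not a model of how brains actually store memories.

```python
from collections import defaultdict
from typing import Dict, List, Set


class CueAndWordMemory:
    """Toy store where each episode is reachable two ways (hypothetical illustration)."""

    def __init__(self) -> None:
        self.by_cue: Dict[str, List[str]] = defaultdict(list)  # stimulus -> episodes
        self.by_word: Dict[str, Set[str]] = defaultdict(set)   # word -> episodes

    def store(self, episode: str, cue: str, words: List[str]) -> None:
        self.by_cue[cue].append(episode)
        for word in words:
            self.by_word[word.lower()].add(episode)

    def recall_by_cue(self, stimulus: str) -> List[str]:
        # "Without language": only the exact external stimulus opens the memory.
        return list(self.by_cue.get(stimulus, []))

    def recall_by_words(self, query: str) -> Set[str]:
        # "With language": episodes tagged with every word in the query.
        postings = [self.by_word.get(w.lower(), set()) for w in query.split()]
        return set.intersection(*postings) if postings else set()


memory = CueAndWordMemory()
episode = "Sunday mornings at Combray"
memory.store(episode, cue="madeleine dipped in tea", words=["Sunday", "morning", "Combray"])

print(memory.recall_by_cue("madeleine dipped in tea"))  # ['Sunday mornings at Combray']
print(memory.recall_by_cue("Combray"))                  # []: no matching stimulus
print(memory.recall_by_words("combray sunday"))         # {'Sunday mornings at Combray'}
```

The episode itself is the same in both paths; what the word index adds is a query interface that does not depend on encountering the original stimulus again.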
[02:13:30] Speaker 2: We have this information architecture of mediated abstractions, almost like concentric circles of complexity. And in The Language Game, they spoke about how scissors are a physical tool and language is the memetic equivalent of scissors. And of course, we can compose these tools together and use them in different circumstances. But moving to consciousness a tiny bit: you suggested that consciousness emerges gradually in children. How does this inform your views of machine consciousness? Right.
[02:14:04] Francois Chollet: So, I mean, to start with, I am not that interested in the idea of machine consciousness. I'm specifically interested in intelligence and related aspects of cognition. I think consciousness is a separate problem. Clearly, it has some relationship with intelligence. You see it, for instance, in the fact that anytime you use System 2 thinking, you are aware of what you're doing. Consciousness is involved. So clearly there is a relationship between consciousness and System 2. The nature of this relationship is not entirely clear to me. And I also do not pretend that I understand consciousness very well. And honestly, I don't believe that anyone does. So I'm always very suspicious when I hear people who have very, very detailed and precise and critical ideas about consciousness. That said, I do believe it's plausible that machine consciousness is possible in principle. I also believe that we don't have anything that resembles machine consciousness today. We're probably pretty far from it. For a system to be conscious, at the very least it would need to be much more sophisticated than the sort of input-to-output mapping that you see in deep learning models or in LLMs. At the very least, you would expect the system to have some kind of permanent state that gets influenced by external stimuli but is not fully set by external stimuli. It has some kind of consistency and continuity through time. It can influence its own future states. It is not purely reactive, right? I think consciousness is in opposition to purely reactive systems like deep learning models, or insects, maybe.
And I don't think we have any system that looks like this today. I also think consciousness requires quite a bit of ability to introspect: this self-consistent state of the system, maintained across time, should have some way to represent and influence itself. It should be self-driving, in a way. And we don't have anything like that today. But in principle, maybe it's possible to build it. And so you mentioned this thing I said on Twitter, this idea that babies are not born conscious, which apparently is extremely controversial. So maybe I can say a little bit more about that. First of all, we have no real way of assessing with 100% certainty whether anyone is conscious at any stage of development, right? It's basically a guess. It seems to me that babies in the womb are very unlikely to be conscious, because they're basically fully asleep all the time. They're in one of two possible sleep states about 95% of the time. There's deep sleep, where they're just inert. And there's active sleep, where they're moving around, and the mother can feel them move. And when they're moving around, they're not actually awake. They're actually asleep. It's just active sleep. And the remaining 5% is not wakefulness. It's just transitions between deep sleep and active sleep. And the reason they are sleeping all the time is that they're being sedated, right? The womb is a very low oxygen pressure environment, and it's sedating them.
And also the placenta and the baby itself are producing anesthetic products. Basically, the placenta is actually producing anesthetics. And so that's keeping the baby in this dreamless sleep, pretty much. Which doesn't mean, by the way, that their brain is not learning. Their brain is not just disconnected and doing nothing. They are actually learning, but in a very passive way. They're just computing statistics about what's going on in the environment, which is what brains do whether you're awake or asleep. But yeah, I believe that babies in the womb are not conscious. And when they're born, they start at consciousness level zero, pretty much. And as they start being awake and experiencing the world, consciousness starts to light up. But it is not an instant switch where they go from being unconscious to being fully conscious. It happens gradually. So you start at zero. And by the way, you have to start at zero even after you wake up, because when you're born, you have nothing to be conscious of. Pretty much everything, not just actions but even perception, is something that you have to learn through experience. When you're born, you cannot even really see, because you have not learned to see. You have not trained your visual cortex, right? So you can see maybe blobs of light. You do not have a model of yourself, of your own sensorimotor affordances.
You have maybe a very crude proto-model that you developed by moving around in the womb and having your brain map what's going on, the correlations in your sensorimotor space. It's not really a model. It's not a sophisticated model of anything. So you have nothing to be conscious of. You have no world model, no model of yourself, no real incoming perceptual stream, because you have not learned to take control of your sensorimotor affordances just yet. So you start at zero, and then as you build up these models, your world model, your model of yourself, and so on, you gradually, bit by bit, become more conscious. And at some point you reach a level where you can be said to be fully conscious, the way maybe a dog might be fully conscious. And I think it happens pretty fast. It happens probably significantly earlier than the first clear external signs of consciousness. I think around one month old-ish, babies are probably conscious at the same level as most mammals, I suppose. But it's still not adult-level consciousness, right? And I think adult-level consciousness is something that children only start experiencing around age two to three. It doesn't mean that they were not conscious the whole time. Again, they're conscious pretty much starting on day one. It's just to a very small degree, right? And so consciousness is something that you have to build up over time; at least, that's my theory. And there are some indications that this is not entirely made up.
One example is attentional blink. If you try to measure it in children, you will see that up until about age three, they have a significantly slower attentional blink than adults. They parse the events around them into fewer events, so they have a more coarse-grained resolution of time in the world. And I think that's actually tied to this idea of level of consciousness. I also have this probably very controversial idea that, well, you reach adult-level consciousness around age two to three, roughly, but then you don't stop there. You actually keep getting more and more conscious over time, and your consciousness level probably peaks around age nine to ten. And then it goes in reverse. You get less and less conscious with every passing year, but not to a very significant extent. The difference in degree of consciousness between, I don't know, a ninety-year-old, a ten-year-old and a three-year-old is actually very, very minor. But it is still there. And I think this plays into things like, for instance, our subjective perception of time. I think the more conscious you are, the higher your level of consciousness, the slower your perception of time, because your perception of time is highly dependent on how many things you can notice in any time span. So one way you could conceptualize your degree of consciousness is to imagine consciousness as a kind of nexus in your world model. It's a focus point from which spans a bunch of connections to other things, connections that encode this focus point and give it meaning. And there can be fewer or more of these connections, and they can be more or less deep.
And the deeper the connections and the more of them you have, the more conscious you are. There's also a temporal component: if you're highly conscious, then even in one second, you might be noticing many things and drawing many connections between these things and things you know. That's a higher level of consciousness. On the other hand, if you have a very coarse-grained perception of the evolving reality, and you're only noticing a few things in any time span, then you have a faster perception of time. Things just pass in a blink. And that's a lower level of consciousness. If you drink a lot of booze, you have reduced consciousness, right? Things will actually seem to move faster, you will notice fewer things, and the connections you establish between things are shallower. I think if you're a one-year-old toddler with a much slower attentional blink, your perception of time is likely very, very fast. We have this idea that children perceive time slower. I think that's true, but it really depends on the age. If you're one, time is super fast, because again, you're at this lower level of consciousness. If you're three, it's basically adult level. But if you're ten, it's actually pretty slow, right? Or if you're seven, it's slow as well. It gets slower and slower until it peaks around age nine or ten. Then it starts getting faster again because you become less and less conscious over time.
[02:25:51] Speaker 2: I remember being very bored when I was a child. I've not felt bored in as long as I can remember. And I interviewed Professor Mark Solms recently. He's got a great book called The Hidden Spring, and his basic idea is that consciousness is prediction errors. You're conscious when you first learn how to drive, so the more things become automated, the less conscious we are, and then maybe time goes faster in many ways as we grow up. But this idea of being more or less conscious is really interesting. Yes. As you say, it's like a dimmer switch. Yes. But on the machine sentience thing, I remember you came on the show to talk about the Chinese room argument, and you said understanding is a virtual property of functional dynamics in the system. And presumably you would also argue that consciousness is a virtual property of functional dynamics in the system.
[02:26:36] Francois Chollet: I think so. I think it is not strongly tied to substrate. So in principle, you should be able to implement consciousness using the right functional dynamics in silicon. Yes. I don't think we have it or that we're close to having it. But in principle, I don't see a problem with that.
[02:26:52] Speaker 3: Yes.
[02:26:53] Speaker 2: And we'll leave the hard problem of consciousness to one side. By the way, Mark Solms was quite dismissive about the hard problem of consciousness, which is that there is something it is like to be conscious. Well, I think there is. Oh, go on.
[02:27:05] Francois Chollet: I think there is. Yeah. Some people dismiss the problem of consciousness by saying, yeah, no, consciousness is what it feels like to be an information processing system, or things like that. It really means nothing. It's just pushing the problem back to where you can better control it with words, but it's not resolving the problem. There is clearly such a thing as qualia, and you are experiencing them right now. So you cannot deny that they exist. And we have no way to explain or even describe what they are. You can describe many things about consciousness, but the subjective experience is not reducible to these explanations. There is something there, and we don't know what that is.
[02:27:50] Speaker 2: And you think we have it. And animals have it. Yes.
[02:27:56] Francois Chollet: Animals have it. I mean, not all animals. And again, I believe in this idea of degrees of consciousness. Animals probably have it to a lesser extent than we do. It might not be a huge difference, by the way, but it's probably less, yeah.
[02:28:14] Speaker 2: Do you think the Earth could be conscious to some degree?
[02:28:17] Francois Chollet: No, I don't think so. I think non-animal systems typically lack the basic prerequisites that I would want to see in a system to even start entertaining the notion that it might be conscious. For instance, the ability to maintain a self-influencing, self-consistent inner state across time, one that's influenced by perception but is also capable of driving itself, pretty much influencing its own future state, and that's capable of representing itself, introspecting and so on. I don't think you see that in non-biological systems today.
[02:29:00] Speaker 2: Do you think the collective of all Americans could be seen as a conscious being? No. Why not?
[02:29:06] Francois Chollet: Again, because it lacks these basic prerequisites.
[02:29:10] Speaker 2: So does it need a physical form of connectedness to its surroundings? Couldn't there be a virtual version distributed over many agents?
[02:29:21] Francois Chollet: No, you could definitely imagine a distributed version. It's just that I'm not seeing the collective of all Americans, for instance, implementing this self-influencing, self-consistent state that's capable of representing itself and the world and so on. And even if you have these things in a software system, for instance, it's not automatically conscious. It just starts being plausible that it might be conscious if you also see pretty clear signs that it might be. So what might such a sign be? Well, it's difficult. And I don't think you're ever going to see a proof of consciousness that works 100% of the time. I think it's always kind of a guess. But typically, I think it's highly likely that the system is conscious if it has all these prerequisites and it is capable of expressing statements about its own inner state that cannot be purely a product of repeating something the system has heard. If you ask an LLM how it feels and so on, it will answer something, but it's really just rehashing something it has read. So what I would want to see is the system making statements about how it feels, with a strong correlation between the behavior of the system and what it is telling me, and what it is telling me being unlike anything the system has seen elsewhere before. Like, I don't know, I'm holding my two-year-old and trying to console them because they're crying. And I'm like, hey, you shouldn't cry, stop crying. And they're like, but I want to cry. That's how I feel.
Well, there's a pretty strong correlation between what the child is doing and what they're saying about themselves, so you can believe them. And they've never heard anyone say, I want to cry. They're really expressing something they could not have picked up from anywhere else. So in this situation, it's just highly plausible. It is not proof of anything, but it is highly plausible that they do, in fact, have some awareness of their own mental states and are expressing something about them. And they are actually conscious. They are experiencing qualia, you know.
[02:32:04] Speaker 2: So Francois, you've been very critical of singularitarianism and doomerism. What do you think is the driving force behind these extreme views?
[02:32:15] Francois Chollet: Well, you know, I think they're good stories, like stories about the end of the world, this idea that we are living in the end times and maybe that we have a role to play in it. These are good stories, which is why you find them a lot in fiction, in science fiction for instance, and you find them a lot in religion as well. And they're not new. They've been around for thousands of years. So I think that's the primary driving force. It's just that they are good memes. They are good stories. People want to believe them, and they're also very easy to retain and propagate. That's really the main thing. You know, everyone is craving meaning, something to organize their lives around, which is why cults are still a problem in our day and age. And this is just an instance of that, I think.
[02:33:19] Speaker 2: Do you think there's a bit of a messiah complex as well?
[02:33:23] Speaker 3: Oh, absolutely.
[02:33:24] Francois Chollet: Yeah. Absolutely. I think you see it a lot in the San Francisco Bay Area. There are people who have latched onto this idea of building AGI and who are using it to picture themselves as messiahs, as you say. Personally, I see creating AGI as a scientific problem, not a religious quest. And this often merges with the idea of eternal life, by the way, which is of course very natural, because the story in most religions is always about this combination of... Anyway. But yeah, it's merging with this idea of eternal life, right? That if you create AGI, it will make you live forever, pretty much. So it's this very religious idea, right? And it has become this religious quest to get there first, and whoever gets there first will become as gods, right? So I'm not really subscribing to any of that. I think building AGI is a scientific problem. And once you build AGI, it's basically just going to be a very useful and valuable tool. It is going to be, as I mentioned, a pathfinding algorithm in future situation space. It's going to be a piece of software that takes in information about a problem and is capable of very efficiently synthesizing a model of that problem, which you can use to make decisions about it. So it's a valuable tool, but it does not turn you into a god. Certainly you can use it in scientific research, and maybe you can use it in longevity research. But it does not automatically make you immortal, because it is not omnipotent.
I think if you start having very powerful ways to turn information into actionable models, your bottleneck quickly becomes the information that you have. For instance, if you have an AGI that can do physics, it can quickly synthesize new physics theories. The thing is, human scientists today are already very, very good at that. They are in fact too good. They are so good that their ability to synthesize plausible new theories far exceeds our ability to collect experimental data to validate them. That's what you see with string theory, for instance. And that's a pretty stark illustration of the fact that if you are too smart, you start running free of information, and that stops being very useful. Applied intelligence is grounded in experimental data, and if you are very intelligent, experimental data becomes the bottleneck. So it's not like you're going to see a runaway intelligence explosion.
[02:36:34] Speaker 2: Is there anything that would make you change your mind? I mean, again, I had this discussion with Greenblatt, and I try to avoid having x-risk discussions when I'm actually debating. And a lot of it hinges on agency. So I said, because I don't think systems are agential or will be, I don't see the problem, because a lot of the mythos around this, the Bostromian ideas around instrumental convergence and orthogonality, is all based on goals and agency. So no agency, no problem. Presumably, you agree. Yes. But maybe if there were agency, would you think there was a problem?
[02:37:06] Francois Chollet: Yeah, no, I think intelligence is separate from agency, and separate from goal setting. If you just have intelligence in isolation, then again, you have a way to turn information into actionable models. But it is not self-directed; it is not able to set its own goals or anything like that. Goal setting has to be an add-on, an external component that you plug into it. Now you could imagine: what if you combine this AGI with an autonomous goal-setting system, with a value system, you turn all of that into an agent, and then you give it access to the nuclear codes, for instance? Is that dangerous? Well, yes, but you've engineered that danger in a very deliberate fashion, right? I think once we have AGI, we'll have plenty of time to anticipate this kind of potential risk. So I do believe AGI will be a powerful technology. This is exactly what makes it valuable and useful. Anything powerful is also potentially risky. But we are very much going to be the ones in control, because AGI on its own cannot set goals until you actually create an autonomous goal-setting mechanism. But why would you do that, you know? So the difficult part, the dangerous part, is not the intelligence bit. It's more the goal-setting and action-space bits. And if you want to create something very dangerous that sets its own goals and takes action in the real world, you do not actually need very high intelligence to do so. You can already do so with very crude techniques, right?
[02:39:02] Speaker 2: So, the thing is, existential risk is a legitimate form of inquiry, especially nuclear risk, for example. And I know many of these folks are not solely focused on AI existential risk; they're looking at other risks as well. But how do you view the incentives? I mean, you could be really cynical and say, oh, effective altruism and Open Philanthropy are throwing lots of money at this, and what they actually want is power and control. How do you think about this?
[02:39:33] Francois Chollet: Well, there's definitely a little bit of that. I also think a lot of the true believers are just buying into it because they want to believe. Again, it's very parallel to religious ideas in many ways. So I don't think it's very rational. That said, once we have AGI, because today we don't, and I don't think we're particularly close to it, then we can start thinking about the risks that are involved. I don't think you're going to see, you know, the day you start trying the program, it becoming self-aware and taking control of your lab and so on. I don't think you're going to see anything like that. Again, AGI is just a piece of software that can turn data into models. It's up to you to use it in a certain way, right?
[02:40:28] Speaker 2: I mean, an abstract way to think about this is framing it as safetyism and governance in general. So if we take away the hyperbolic x-risk and talk about, you know, misinformation and things like that. Sure. What do you think about that? Maybe I should be more specific: deepfakes and misinformation and copyright infringement and so on. Do you think that we should strongly regulate this, or would it harm innovation if we did?
[02:41:01] Francois Chollet: I think there are definitely harms that can be caused by current technology, by current and near-term uses of AI. And yes, I think some form of regulation might be useful to protect the public against these harms. I also think that the regulation proposals I've seen so far are not really satisfactory. They lean more towards harming innovation than protecting the public. I think ultimately they are more likely to end up concentrating power in the AI space than protecting the public. So I think regulating AI is difficult, and just relying on existing non-AI regulation to protect people might be the better course of action, given that introducing new AI-specific regulation is a difficult problem. And based on what I've seen so far, I don't think we're going to do a very good job at it.
[02:42:21] Speaker 3: Francois Chollet, it's been an honor and a pleasure. Thank you so much. It's my pleasure. Thanks so much for having me. Amazing.
[02:42:28] Speaker ?: Thank you.