The Ghost in the Machine — Noam Chomsky — Full Transcript (June 12, 2026)

[00:00:00] Speaker 1: First we should ask the question whether large language models have achieved anything, anything in this domain. Answer: no, they've achieved zero. [00:00:12] Speaker 2: Hello Street Talkers, welcome back to the channel. Today is a special episode for me personally, an emotional episode. We've been through quite a tumultuous journey to create this episode, for reasons which will become abundantly clear pretty soon, but Noam Chomsky is an intellectual heavyweight. He's one of my personal heroes and it truly is a dream come true for us to have this experience of interviewing him. So anyway, I really hope you enjoy the show today. We are coming live from Lake Como in Italy. I think this is possibly one of the most beautiful places on planet Earth and, you know, in a Douglas Hofstadter style, I hope that adds a flavour to the creativity behind this episode. Anyway, enjoy. Thank you so much to our Patreons. I wanted to give a special shout out actually to our VIP Patreons, Alex McNamara and Ebonia Elliot Lewis. Now, psychologically if nothing else, it helps us so much having your support, and we're also interested in finding sponsors for the show, and of course don't forget to join our amazing Discord community. But no, we really appreciate our community and our support, and thank you so much. Welcome back to Street Talk. Today is absolutely incredible. Our dream came true and we got to interview our hero, Professor Noam Chomsky. Waleed couldn't even keep his shit together. Chomsky is an intellectual heavyweight. He's probably the most important intellectual of the 20th century. He's been cited so many times. I mean, we're quite spoilt here on MLST because Friston and Bengio also are in the top 100 of scientists based on their h-index. But it was an incredible honour to speak with Chomsky. I mean, I still have 10 Chomsky books on my bookshelf downstairs. I went through a certain political phase at university and no harm in that. 
I think it was very enriching for me going forwards in my life. But yeah he's been a towering figure I think in so many people's lives throughout the entire 20th century and beyond. This is going to be a long show. It's going to be over three and a half hours long so use the table of contents to skip around. This first chapter is mostly about Yann LeCun and the reason we wanted to talk about him and his recent paper is because as a radical empiricist he's the antithesis of Chomsky and I think it frames some of the subsequent discussion nicely. But if you want to skip ahead and you know just only do Chomsky stuff then it's about 50 minutes to skip. Look at the table of contents. So chapter one. [00:02:48] Speaker 3: Really the revolution that has happened over the last decades is the fact that we've realized that AI has to be intimately linked with learning. The fact that when we observe you know any animal with a brain in nature is capable of learning. And perhaps because I'm lazy or because I don't think I'm very smart I've always thought that it would be very difficult to just design from scratch an intelligent system. An intelligent system has to basically design itself more or less through through learning. [00:03:26] Speaker 2: So LeCun just released a position paper called A Path Towards Autonomous Machine Intelligence. Now I admire LeCun a lot. I want to give him credit personally for explaining things in very simple terms. It's always very insightful reading his papers or listening to his lectures. Note the lack of jargon. And even with the energy based models that the high level abstract pictorial formalism is something which is great for communicating science but not so great if you don't want to get Schmidhuberd but we'll come back to that in a bit. So he really knows everything in his space. He can essentialize complex ideas. So yeah, he's a wonderful scientist. However, let's not forget that. 
I mean, God knows he must be earning about five million dollars a year, probably way more than that, as a vice president of Facebook. I mean, even at an E7 level, he could probably earn into the millions. So anyway, I guess what I'm saying is he's not going to rock the boat, is he? He's going to stay comfortable. They've already gone quite far in introducing the world to energy-based Bayesian models. So, yeah, that's the caveat. Anyway, in his paper, he lamented that our best systems are still far from matching human reliability in real-world tasks, such as driving, even after being fed with ridiculous amounts of supervisory data from human experts, after going through millions of reinforcement learning trials in virtual environments, and even after engineers had hardwired hundreds of behaviors into the model. So LeCun thinks the answer may lie in the ability of humans and many animals to learn world models, which is to say internal predictive models of how the world around us works. I thought the paper contained a lot of common sense when it comes to some of the pitfalls of the current approaches to AGI. And, in my personal opinion, I think he's still a little bit too wedded to neural networks, but as we're going to discuss in this program, when it comes to symbolism and empiricism, there's no middle way: you're either all in or it's like a house of cards and it all comes crashing down. But anyway, he thinks that the main challenges of AI research today are being able to learn passively, being able to learn efficiently, introducing type two models, in a similar vein to our friend François Chollet, and also this ability to learn abstractions and compositional semantics. And I guess we should lump those in together for the time being. I mean, he did actually recognize that compositional semantics, while important, is still beyond our reach. 
He doesn't really know how to achieve that with the current paradigm. Now, on the blank slate: the prevailing wisdom from empiricists like LeCun and Rich Sutton (remember his Bitter Lesson essay) is that all handcrafted knowledge is bad, basically, and that perception-derived knowledge is the only game in town. Now, one thing I found quite ironic in this paper is that it's a huge step away from that wisdom. I mean, of course they'll say that it isn't, but when you look at this architecture diagram, and there was even a comedy version of it made by Christian Szegedy, it does look like a rich handcrafted cognitive architecture. I don't think it's hyperbolic to say it resembles the architectures that we used to build in the 1980s. Of course, LeCun would argue that it's actually a level of abstraction above that. It's like a prototypical architecture with learning, but in a way it's not, because all of those levels of abstraction are actually hard-coded levels of abstraction. Now, LeCun even showed a development graph of human cognitive skills, strongly implied to be a directed acyclic graph, which was derived only from perceptual information, or nearly only. Now, I thought that was a bit of a sleight of hand, because he strongly suggests that the sequential development of skills is evidence of the blank slate in human cognitive development. But then he went on to show that with these levels of abstraction in his hierarchical models, they must be hard-coded. This isn't to say that he thinks that neural networks today cannot learn deducible abstractions from the base abstraction prior or, you know, the inductive prior, which is to say the encoder and the prediction model. It did give me pause for thought, because presumably he does think that current neural networks can learn abstractions. I mean, that's the common wisdom. 
I mean, in my opinion, they can't. Well, they learn a tiny dot of abstractions out of an infinite sea of abstractions. But yeah, it gave me pause for thought. And he also said that he thought the current language models were not artificial general intelligence because they don't have these abstract latent variables to explore multiple interpretations of a percept and indeed search for an optimal course of actions to achieve a goal. So it's not entirely clear to me whether he thinks that the lack of Bayesian-style uncertainty quantification or the lack of a hard-coded abstraction hierarchy is the biggest reason why it's not AGI. But it's interesting to see him point out those two particular issues that he sees with current architectures. For LeCun, we need to be able to dynamically switch between levels in an abstraction hierarchy. The levels should strictly depend on the information derived from the level below only. And the abstractions, while handcrafted, should learn empirically by predicting the representation of another concept in the same level of the hierarchy, but pointing at another point in time or space. This is what LeCun means by passive or self-supervised learning. Now, unfortunately we don't have time in this episode to go into all of the JEPA stuff, but to be honest, I'm glad we didn't waste our time, because Yannic has just made an episode on it. So you can think of this as supplementing Yannic's episode. I do think it's quite fascinating though. I mean, I'll say a couple of things here. LeCun introduces these latent variables of unnormalized energy to represent possible futures. Basically, as far as I'm concerned, it's analogous to Bayesian-style probabilistic graphical models, I mean from a philosophy and mental framework point of view, where you have these latent or unobserved variables. 
So, in the context of LeCun's architecture, we're learning the dependency between what is observed and what is not observed. But now we use this unnormalized energy instead of pure probability distributions, which means that the distributions don't have to sum up to one, because doing so is usually an intractable operation. But they are actually interchangeable. So you can quite easily convert energy distributions into probability distributions. You know, obviously it's not a perfect conversion. It's still possible to use neural networks as prediction modules in these architectures, and LeCun obviously would want to do that. The main change here is that the models become stochastic over the domain of the latent variable, which is to say there are many possible Y's or predictions for a given X or for a given signal. LeCun then regularizes the latent to stop information leaking into the predictions and also to shrink-wrap the representation around volumes of high density, I mean, as LeCun says, mostly to overcome the curse of dimensionality, which you get with so-called contrastive methods. So I always wondered what LeCun meant by contrastive versus non-contrastive. They're both contrastive. They're both the same, but the non-contrastive has some regularization tricks to prevent mode collapse and to prevent problems with the curse of dimensionality. Anyway, we discuss self-supervised learning and contrastive models quite a lot in our interview with Ishan Misra. So why don't you go and check that out if you're interested, and watch Yannic's video as well. Anyway, all of his models, being self-supervised non-contrastive models, learn by filling in the missing gaps in time or space. 
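The conversion from energies to probabilities mentioned above can be illustrated with a minimal sketch. It assumes a small, discrete set of candidate predictions so that the normalizing sum is cheap to compute; in the high-dimensional settings discussed in the episode that sum is usually intractable, which is the stated motivation for keeping energies unnormalized. The function name, the temperature parameter and the example energies are hypothetical and are not taken from LeCun's paper.

```python
import math


def energies_to_probabilities(energies, temperature=1.0):
    """Turn unnormalized energies into a Gibbs (softmax) distribution.

    Lower energy means higher probability: p_i is proportional to
    exp(-E_i / temperature).
    """
    # Subtracting the minimum energy keeps exp() numerically stable and
    # does not change the resulting probabilities.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    partition = sum(weights)  # the normalizing constant
    return [w / partition for w in weights]


# Hypothetical energies for three candidate predictions y given one input x.
candidate_energies = [0.5, 2.0, 4.0]
probabilities = energies_to_probabilities(candidate_energies)
```

The only step that needs every candidate at once is the computation of `partition`; ranking candidates or picking the lowest-energy one never requires it.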
So they work mostly passively, and his models are stacked vertically to capture concept or abstraction hierarchies and then stacked in time to learn these action-space abstractions, which is what he calls, you know, Daniel Kahneman style mode two, or thinking fast and slow. So he thinks that being able to find trajectories in action space is analogous to reasoning. I don't think it's analogous to, well, I mean, I guess it is technically analogous, but it's at the wrong level of description and you need to have an infinite number of traversals to actually reason. So I will make the argument later in the video that what we need is symbolics and compositionality, or what Fodor and Pylyshyn called systematicity. Okay. The paper also has some fascinating discussion of uncertainty quantification. Again, we don't have time to get into it here, but I do recommend you check out the paper. I think it's a wonderful tour de force of energy-based models and joint embedding predictive architectures and Yann LeCun's view on many topics in artificial general intelligence. [00:12:05] Speaker 4: I mean, everything comes from observation, from sensory input, which has been crushed a long time ago. I'm even surprised that the likes of Yann LeCun still think that everything comes from observation. I mean, are you kidding? I could be blind and deaf and be as rational as Noam Chomsky. And God, we have another huge part of cognition that has nothing to do with perception. Okay. [00:12:43] Speaker 2: The most remarkable quality of human cognition, the very core of our cognition, is the ability to take any two objects and select from an infinite set of possible abstractions. Abstractions which are not deducible from percepts. Why does LeCun think that the only abstractions which are needed are directly deducible from perceptual information? 
All of this fails when you consider the fact that perception-derived data cannot deduce most rules about the world, certainly not in limited time and space. I'm talking about world models and abstractions and hierarchies. Surely they cannot just be probabilities. Now, Lecun's architecture can produce a tiny sliver of abstractions, a minimum spanning tree of which directly deducible from the handcrafted priors of the encoders and the prediction models. I agree that these are abstractions, but they are essentially human crafted or at least human seeded. The space of possible abstractions between two objects is infinite. Yes, infinite. Lecun said that objects may spontaneously emerge. Once the notion of object emerges in the representation, concepts like object permanence may become easy to learn. Objects that disappear behind others due to parallax motion will invariably reappear. So it was at this point that it occurred to me what Lecun really meant by abstractions. And indeed, I mean, I can see how three-dimensional objectness might be deducible given the visual priors that we've designed on these models. But this is just a drop in the ocean. Now, Lecun admitted that it's these handcrafted priors which determines what is represented in the models and indeed which abstractions are deducible. He said that the joint embedding prediction architecture finds a trade-off between the completeness and predictability of the representations. What is predictable and what does not get represented is determined implicitly by the architectures of the encoders and the predictors. They determine an inductive bias that defines what information is predictable or not. Now, Lecun gives a concrete example of these different levels of description or abstraction, if you like. He says, let's take a concrete example. 
When driving a car, given a proposed sequence of actions on the steering wheel and pedals over the next several seconds, drivers can accurately predict the trajectory of their car over the same period. The details of the trajectory over longer periods are harder to predict because they may depend on other cars or traffic lights or pedestrians and other external events that are somewhat unpredictable. But the driver can still make accurate predictions at a higher level of abstraction, ignoring the details of trajectories, other cars and traffic signals, etc. Now, the car will probably arrive at its destination within a predictable time frame. The detailed trajectory will be absent from this level of description. But the approximate trajectory, as drawn on a map, is represented. A discrete latent variable may be used to represent multiple alternative routes, end quote. So Lecun goes on to say that a model could in theory work at multiple levels of description or abstraction simultaneously, just like humans do. And he asserted that the ability to represent world states at several levels of abstraction is essential to intelligent behaviour. For Yann Lecun, reasoning is simply finding a good path through a state action space. This is a very low resolution view of a complex topic. It's the equivalent of predicting the weather tomorrow, using the average temperature of this month. Deep learning folks tend to lean into the parlour tricks, and lean away from any mechanistic understanding. Lecun admits there is an exponential blow up traversing state action spaces in his hierarchical joint embedding architecture, and suggests using a discrete approximate dynamic programming algorithm, like Monte Carlo Tree Search, to find good trajectories in tractable time. But a much better way to cut down the search space is with compositionality. Compositionality is that the meaning of a complex expression is fully determined by its structure and the meanings of its constituents. 
Once we fix what the parts mean and how they're put together, we have no more leeway regarding the meaning of the whole. This is the principle of compositionality, a fundamental presupposition of most contemporary work in semantics, or the study of meaning. We can understand a large, perhaps infinitely large, collection of complex expressions the first time we encounter them. And if we understand some complex expressions, we tend to understand others that can be obtained by recombining their constituents. And guess what? This doesn't just apply to expressions, this also applies to planning and reasoning. This is only possible with an algebraic approach to semantics and planning, and is achieved with symbolic manipulation. "For any two physical objects, x and y, if y is contained in x, then, if nothing exceptional happened to y, the location of y must be the location of x." Now, this is a symbolic rule, a function, a procedure. You cannot represent this fact without symbolic logic. This isn't data, but rather a procedure which needs verification. First, you have variables of a specific type, where types come from an ontological structure. And second, you have quantification over these variables. That's what the upside-down A means. It's addressing a potentially infinite set of possible values. Neural networks are extensional. They cannot represent intensional, which is to say infinite, objects. Now, if you want a refresher on this stuff, check out our intro to the Gary Marcus and Luis Lamb show. [00:19:27] Speaker 5: It's good to be in an environment where people take for granted these questions, because I spent a lot of the last 20 years, almost, or even more than 20 years, trying to get people to recognize the importance of abstraction. So, I came to this having worked in psychology on children learning rules, and came into the first wave, or second wave, depending on how you count it, of neural networks. 
And people trying to argue that there was no abstraction, that it was all just basically memorization through multi-layer networks. They were then three-layer networks. And it's been a long, hard slog to get people to realize how important abstraction is. And I think that there's been a real sea change in the last couple of years. [00:20:13] Speaker 2: The important thing to realize is the only way to represent infinite objects in a finite way is using quantification or logic over typed symbolic structures. That's why neural networks cannot do basic arithmetic. You need intensions, that is, symbolic procedures over variables. Let's look at another example, addition. Here's a program: add(0, n) = n; add(m, n) = 1 + add(m - 1, n). Now, it's a finite representation of an infinite object. So, addition of m to n is nothing more than adding m ones to n, or applying m successors to n. And from that you can define multiplication, because multiplying m by n is adding m n's, or adding n m's. That's why it's commutative. [00:21:09] Speaker 4: Yeah, I mean, compositionality, I'm very excited to hear terms like compositionality being discussed as of late, I mean, that's music to my ears. But there were huge results that were mathematically done that show that compositionality actually evolved for survival reasons, even. The example they give is usually language. Like, as you speak, you're interpreting and understanding what I'm saying, practically in nearly real time. Now, let me make you appreciate the complexity here. You're taking a sequence of sounds, in this case, because I'm speaking, though it could be written, a sequence of words, and you're building, in almost real time, a mental picture of what I'm saying, the thought that I'm trying to convey, right? 
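The recursive definition of addition that Speaker 2 reads out earlier in this passage can be written down as a short program. What follows is a minimal sketch in Python, assuming non-negative integers; the function names are illustrative, and the `multiply` definition is one possible reading of "multiplying m by n is adding m n's".

```python
def add(m, n):
    """Add m to n by applying the successor (+1) operation m times."""
    if m == 0:
        return n  # base case: add(0, n) = n
    return 1 + add(m - 1, n)  # recursive case: add(m, n) = 1 + add(m - 1, n)


def multiply(m, n):
    """Multiply m by n by adding n to itself m times."""
    if m == 0:
        return 0  # zero copies of n
    return add(n, multiply(m - 1, n))


# A finite set of rules covers every pair of non-negative integers, limited
# in practice only by Python's recursion depth for large m.
seven = add(3, 4)
twelve = multiply(3, 4)
```

The two rules for `add` are the whole definition: nothing in them enumerates particular sums, which is the sense in which the transcript calls this a finite representation of an infinite object.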
If we didn't have compositionality, you know, it would, every, so if, in the absence of compositionality, you know what you have to do, every time you have to go and grab a sequence, a sub sequence, and make a meaning for it, then make a meaning for the whole thing. If you didn't have rules for sub parts already built in, and you just say, oh, they just said a phrase, they just said a phrase, I know how to build a meaning for that one, and then I do three, four operations in the tree, and I'm done. If every time you have to try all the parts that should fit together, you wouldn't understand me real time, we couldn't have communicated. And here's the genius of Richard Montague, who mathematically showed, I mean, people don't appreciate the work that these guys did, you know, it's all, it's all a large language model now. But look, look at what Richard Montague did. He said, I can have John likes to play guitar. I can have the boy next door likes to play guitar. I can have my uncle's nephew who lives in Australia who likes to play guitar. You get my point. Montague said, how could all these things, John, he, the guy next door, how could they all have the same semantic type in the end, because they fit in the same slot. The genius of Montague was to devise an algebra that no matter what you have here, it will reduce when you do all the typing to E, an entity. That's compositional semantics. [00:24:02] Speaker 3: There's been a lot of people who've been sort of saying there's a limitation to deep learning, let's say, or machine learning more generally, because it's obvious that those things basically do curve fitting. What's our definition of reasoning? What is the process by which we elaborate models? 
And is there a qualitative difference between models that merely perform curve fitting, as we normally know it, and a model that has a, let's say, to adopt a terminology that others have proposed, that models that establish sort of a causal model of the data you're observing, which can be the basis for reasoning and things like that, right? And the answer to this is probably no, that there is a difference, of course, but is it an essential qualitative difference? I'm not entirely sure. And then there is the argument, if there is a qualitative difference, which I'm not sure about, would this qualitative difference be in the form of fundamentally different things from deep learning, you know, things that are, you know, like discrete symbolic reasoning or things of that type. And to that, my answer is [00:25:19] Speaker 2: clearly no, I do not believe that's the case. Okay, we were going to publish the show today. And look what Juergen Schmidhuber has just dropped on his blog. He said that Jan LeCun's 2022 paper on autonomous machine intelligence rehashes, but does not cite essential work of his lab from 1990 to 2015. Now Juergen Schmidhuber has established a bit of a reputation for constantly saying that all of the current ideas that are being published in deep learning today were already done previously in his lab. And he feels resentful that he hasn't had proper attribution. He says he's not without a conflict here. And it might seem self-interested, you know, correcting the record like this. But the truth of the matter is that he says, yes, it is self-interested. Much of the closely related work pointed to below was done in his lab. And he wishes to be acknowledged and recognized. He's basically resentful that he didn't get the Turing Award, along with the other Godfathers. To be honest, I personally do think of Schmidhuber as one of the Godfathers. 
So he quotes LeCun: "Many ideas described in this paper, almost all of them, have been formulated by many authors in various contexts, in various forms." And then Schmidhuber says, yes, in fact, unfortunately, much of the paper reads like déjà vu of the papers from his lab, going all the way back to 1990, without any citations. So I'm scrolling down here: mentions of controllers, world models, planning and rollout. Indeed, this was covered in Schmidhuber's papers. He's famously argued that GANs, that is, adversarial learning, are a specialization of one of his earlier models, which he released from his lab. Then there's LeCun's idea about learning to act by observation. So LeCun has this hierarchy of data streams with increasing amounts of agency. And being a radical empiricist, LeCun thinks that we learn everything we know about the world largely by observation, and mostly by not interacting with the world around us. Schmidhuber points to the recurrent predictive world model, which may be good at predicting some things but uncertain about others. So this is this thing at the top here. So Schmidhuber thinks that he's already been there and done that. This idea of the hierarchical percepts as well: he says that in 1991, the neural sequence chunker, which is to say the neural history compressor, used unsupervised learning and predictive coding in a deep hierarchy of recurrent neural networks. So he thinks he's done that as well. Most interesting for me, he commented on the symbolic component of Yann's paper. So do we need symbols for reasoning? And he said that he had previously argued for the importance of incorporating inductive biases into neural networks that enable them to efficiently learn about symbols. 
Now, based on the Pylyshyn paper, I think that's an oxymoron. He said that many neural networks suffer from a binding problem, which affects their ability to dynamically and flexibly combine, which is to say bind, information that is distributed throughout the neural network, as is required to effectively form, represent and relate symbol-like entities. He said he released a 2020 position paper which offers a conceptual framework for addressing this problem and provides an in-depth analysis of the challenges, requirements and corresponding inductive biases required for symbolic manipulation to emerge naturally in neural networks. I must admit, I've not read that paper. I'm interested to check it out now. So I'm just reading the abstract of that paper now, on the binding problem in artificial neural networks. Its primary author was Klaus Greff from the Google Brain team, also with Schmidhuber, in 2020. So it said that contemporary neural networks fall short of human-level generalization, which extends far beyond direct experience. And they put it down to this binding problem, which they say affects the capacity to acquire a compositional understanding of the world in terms of symbol-like entities, like objects (by the way, this is exactly what we're talking about in this show), which they say is crucial for generalizing in predictable and systematic ways. To address this issue, they propose a unifying framework that revolves around forming meaningful entities from unstructured sensory inputs, maintaining the separation of information at a representational level. So, yeah, I guess we'll get back to you on that, whether it's any good. LeCun said that the centerpiece of the paper is the Joint Embedding Predictive Architecture, JEPA, and the main advantage of JEPA is that it performs predictions in representation space, eschewing the need to predict every detail of y. 
Schmidhuber says that in 1997, a quarter of a century ago, he built a general adversarial reinforcement learning machine that could ignore many or all of these details and ask arbitrary abstract questions with computable answers in representation space. And he also noted that his even earlier, less general approach to artificial curiosity, since 1991, naturally directs the world model towards representing predictable details in the environment. Schmidhuber said that, given his comments above, he doesn't see any significant novelty there. He's not claiming that everything is solved, but he said that in the last 32 years, we've already made substantial progress along the lines proposed by LeCun. In his paper, LeCun said, "Below is an attempt to connect the present proposal with relevant prior work." And Schmidhuber said he cited a few somewhat related things while ignoring most of the directly relevant original work as mentioned above, possibly encouraged by an award that he and his colleagues shared for inventions of other researchers whom they did not cite. He said that the point is that these ideas are not as new as they may be understood to be by reading LeCun's paper. There's a lot of prior work from his lab that is directly along the lines proposed. We've not had a great experience with him. We tried to invite him on the podcast and he did seem initially interested. And then he suddenly demanded a fee for coming on. And then I did actually offer him $5,000 just for 60 minutes of his time. And that wasn't enough, apparently. So the main thing I don't like about this is the tone of it. Frankly, I think it's really easy just to go back in time and say, "Oh, I invented something which is conceptually similar to this," because many things are conceptually similar. We've just made a whole show on the infinitude of abstraction space. I will hand it to Schmidhuber that they are very similar, but the architecture and approach that LeCun is presenting here is different. It's using modern methods. 
It's not using RNNs, for example. Yes, it's using the same abstract ideas. In fact, what I like about Lacoon is that he presents all of his work in an abstract way. The pictorial formalism of energy-based models is very abstract. That's what makes it understandable. The way he describes contrastive and self-supervised learning is very abstract. He's talking about predicting unobserved information from observed information, and he uses a language which is very accessible to lots of people. So I can understand why other researchers would look at it and say, Oh, that's basically the same as what I've done. But that's because he's talking in the abstract. If you look at his physical models, they are different in my opinion. Lacoon has been a huge advocate of passive or so-called self-supervised learning. [00:32:45] Speaker 3: Supervised learning sucks. I mean, it's very limited in the sense that you can train machines to do very specific tasks. And because they're trying to be very specific tasks, they're going to use all the biases that are in the data to do that task. And if you try to get outside of that task, they're not going to perform very well. That's a limitation of supervised learning. It has absolutely nothing to do with deep learning. [00:33:09] Speaker 6: And this has been a complaint of mine for a while about, um, you know, let's say the dominant paradigms of neural networks is from my perspective as a Bayesian. Okay. They've always been essentially maximum likelihood estimators. It's like, okay, I've trained my neural network to take in a whole bunch of inputs and to give me the one true value, you know, which is really just the maximum kind of likelihood computation of, of my inputs. And, you know, as a Bayesian, it's like, I mean, guys, that's just an estimate, you know, like the real truth is it's a distribution. Like there are many possible, you know, output values that this input should have given me. 
It's sort of maybe it's 75% this value and a little bit less percent that, or it's a continuum, you know, if it's a density function or whatever. So of course it is. And like, this is the nature of reality is that there's uncertainty and, you know, when you're trying to build models, if you want them to be generalizable, like they have to encode in some way this type of uncertainty, right? Otherwise you're always just using the MLE or, you know, whatever, some other kind of statistical projection of that distribution, you're throwing out information, you're losing a lot of information. So I totally agree that, um, that, uh, neural networks or that, you know, the paradigm needs to evolve to take much more account of uncertainty. I think where I kind of object to it is why aren't we just using the word probability? Okay. Like, because if we don't embrace, so let's suppose you agree, okay, we need to start allowing for multiple possibilities. Great. Now you need a mathematics to deal with multiple possibilities. We already have a mathematics to deal with multiple possibilities. It's called probability theory or better yet, conditional probability theory. We've got that mathematics. We have hundreds of years of, you know, development and theory behind that. Let's, let's use it because if you don't use it, if you don't just admit that what you're doing is probability theory, then you're rolling your own probability theory. And just like the fuzzy logic people that kind of did the same kind of thing, you're going to wind up with all kinds of inconsistencies and problems or whatever, because for better or worse, we only have one mathematically rigorous and consistent theory of uncertainty. And it is probability theory. That's just what it is. It's conditional probability theory. [00:35:41] Speaker 2: Yeah. But devil's advocate. I mean, LeCun has been challenged on this before. 
I mean, this is with his energy-based models where the, the punchline is that rather than have the normalized probability, which means it sums up to one, uh, you store these, these exponential energies. And LeCun says, well, I don't care about calibration, which means I don't ever want to compare my models with other models. And also I only ever want to make decisions. And he also says that if you ever have a normalized probability distribution in high dimensions, the model is probably wrong anyway. And he gives the example of some density that has a manifold of zero width, where if you sample that density, you'll never, ever get any samples on the manifold anyway. So he said, what you want to do is regularize it as he does with his energy-based models. And that's what you end up with anyway. [00:36:31] Speaker 6: Yeah. I mean, there's a lot of things to say about that. I mean, and as you can imagine, like given the fact that probability theory is centuries old, like, you know, these, these questions have been addressed already, like in the, let's say Bayesian literature or probability theory literature. So all this stuff has already been talked about and addressed for a long time. So first of all, if a probability is meaningless in high dimensions, so is its logarithm, which is the, the energy function, right? I mean, you don't, you don't gain anything by ignoring the problem. Like you can ignore it, but then you wind up with inconsistencies. As far as not needing to normalize the distribution, well, if you ever want to add two probabilities together, you better normalize your distributions. And in fact, he does do that. Like it was like, yeah, I need to do this kind of trajectory sampling. I need to do some normalization here with like Gibbs sampling or whatever to, to actually do the normalization. So I can add up different trajectories and, you know, calculate some means and things like that. 
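[Editor's sketch] The disagreement above can be made concrete with a small example. This is a minimal illustration, not anything taken from LeCun's paper: the three action names and their energy values are invented, and the Boltzmann form exp(-E) is assumed as the link between energies and probabilities. It shows that picking the lowest-energy option needs no normalization, while adding probabilities together requires the normalizing constant Z.

```python
import math

# Hypothetical energies for three candidate actions (lower energy = preferred).
energies = {"left": 0.5, "straight": 1.0, "right": 2.0}

# Making a decision only needs a ranking, so no normalization is required.
best = min(energies, key=energies.get)

# Unnormalized Boltzmann weights exp(-E); these do not sum to one.
weights = {name: math.exp(-e) for name, e in energies.items()}

# Adding probabilities (for example, "left OR straight") only makes sense
# after dividing by the normalizing constant Z.
z = sum(weights.values())
probs = {name: w / z for name, w in weights.items()}
p_not_right = probs["left"] + probs["straight"]

print("decision:", best)
print("Z:", z)
print("P(left or straight):", p_not_right)
```

With three outcomes Z is a trivial sum; the difficulty discussed in the conversation arises when the sum or integral runs over a high-dimensional space.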
And as far as not doing model comparison, well, so here's, here's something that's important to understand is that from a Bayesian perspective, any of this regularization stuff that machine learning people try to do all the time, that's model comparison. You know, if you're, what you're trying to do is say, I'm comparing a model A and a model B where model A has hopefully fewer parameters or it's simpler in some, you know, some way. Okay. Which one should I prefer in light of the evidence? Like how much can I regular, regularize it and still fit the data well? That's model comparison. At least that's what a Bayesian would call, you know, model comparison. And we have a mathematics for how to do that. Like you have to conduct these integrals and I get it. That's hard. Okay. Like totally understand. Believe me, I've been there, done that, tried that. It's really hard. Okay. You wind up with these multidimensional integrals. They're really hard, but just ignoring it and then forgetting about it and going back to maximum likelihood estimation, like, you know, you're not going to make any advances there. Where I see cool advances made is when people embrace it. They say, okay, I got to do this integral here. Can't, it's really intractable, but here's some approximations to it. Okay. But I, but I know what I'm trying to do. I'm trying to approximate this specific integral and I can come up with really nice approximations under these set of criteria. It gives you a whole theoretical foundation, right? To advance the approximations rather than just giving up and going over into energy land and doing arbitrary hacks and approximations for which you have no theory really. Like that's why we end up with all these things in machine learning, like this kind of batch norm, that batch norm. Why does batch norm even work? People arguing about it. Should we do it? Should we not do it? They have no theoretical foundation. It's just hacking. Right? [00:39:24] Speaker 2: Yeah. 
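[Editor's sketch] The claim that model comparison means "conducting these integrals" can be shown on a case where the integral has a closed form. The coin-flip setup below is an assumption made for illustration only: model A is a fair coin with no free parameters, and model B has an unknown bias with a uniform prior. The evidence for B is the integral of p^k (1-p)^(n-k) over p from 0 to 1, which equals k!(n-k)!/(n+1)!.

```python
from fractions import Fraction
from math import factorial

def evidence_fair(n: int) -> Fraction:
    """Probability of one specific sequence of n flips under a fair coin."""
    return Fraction(1, 2) ** n

def evidence_flexible(k: int, n: int) -> Fraction:
    """Marginal likelihood of one specific sequence with k heads in n flips,
    integrating the bias p over a uniform prior on [0, 1]."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

for heads in (6, 10):
    a = evidence_fair(10)
    b = evidence_flexible(heads, 10)
    preferred = "A (fair coin)" if a > b else "B (free bias)"
    print(f"{heads}/10 heads: evidence A = {a}, evidence B = {b}, prefer {preferred}")
```

For mildly unbalanced data the simpler model wins even though the flexible one can fit better, which is the Bayesian reading of regularization described above. In realistic models the integral has no closed form, which is where the approximations mentioned in the conversation come in.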
Well, it's quite ironic because even Yann LeCun himself, you know, he wants to have end-to-end gradient based systems, but he even admits that on his joint embedding prediction architecture, you know, there's a, there's a hierarchical one for stacking the concepts. And there's also what he calls a Mode-2, Daniel Kahneman-style one, which is where you have this discrete action space and you learn abstractions of, of actions over time. And of course he needs to add these probabilities together to, to generate trajectories through this action space. So, um, he, what does he do? He uses the, um, the, the Gibbs, uh, sampling conversion. And it's the same thing all along, you know, because we've got this, this discrete action space, he can't use a gradient based method to, to do the, um, the traversal search. So he has to use Monte Carlo tree search. So, uh, and, and obviously he's also an empiricist and a blank slate guy. And, um, he's, he's creating this very, very complex and, and highly specific cognitive architecture, which reminded me of the systems in the 1980s. So, you know, it's, uh, not quite as [00:40:30] Speaker 6: puritanical as, as, as you would think. Yeah, no. Well, one thing I really respect about LeCun is that he's a pragmatist. Right. And so at the end of the day, like he really cares about things that work and, you know, and I have to think a large part of what he's doing here is, um, trying to get the, the, let's say the Orthodox machine learning community to move in a certain direction, right. To improve these systems. Um, but, but he's got to do so kind of gently, has to guide them, you know, maybe one step at a time and avoiding trigger words. Like, you know, let's not say probability because like maybe that triggers somebody or, or, uh, no Bayesian because that'll trigger somebody. I don't know. I mean, it's, so it's going to, they're moving in the right, right direction. Let me just say that. 
It's just that, uh, it's maybe things are going to be a bit slower than they need to be because we're not embracing a lot of the theoretical results that already exist. And instead of tackling some of the really hard problems, we're sort of like trying to avoid them temporarily by, you know, but it's okay. Like, I mean, eventually the truth will out, right. Like people will, will get this. Um, yeah. So what, what was that fractions analogy, uh, for energy-based models? Well, I was just, I was just thinking, you know, this idea that, Hey, we'll work on the probabilities. We can just ignore the normalization, right. And just work with the energies is almost like somebody coming along and saying, you know, whenever I'm doing arithmetic, like adding fractions, it's such a pain because I have to have the same denominator every time I add the fractions together, you know, and when the denominators get to be these really big integers, I have to do this massive multi-digit multiplication, you know, forget that. Like, I'm just going to add the numerators and just ignore the denominators, right? It's like, okay, you know, you can do that, but you're doing a completely different function. You know, it's like the, the mediant of two fractions, where you add the numerators together and you add the denominators together. Right. And I think it has a name like, uh, you know, freshman addition or something. Cause it's a common mistake to do. Right. I mean, that's, that's what it seems like to me. It's like, let's ignore all the hard work and just, just do kind of a hack and it just doesn't work. Like it introduces problems. What do you think LeCun would say though, to that? Well, I think he, he says, I mean, he, you know, in the paper itself, like there are many scenarios where you can't ignore the normalization and like, he kind of admits that and, and uses it in certain places, but, but that's just Bayesian, Bayesian probability at that point. 
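[Editor's sketch] The fractions analogy can be checked directly. The operation that adds numerators and adds denominators is usually called the mediant; the example below uses 1/2 and 1/3 as arbitrary inputs and assumes both fractions are in lowest terms, since the mediant depends on how a fraction is written. It shows that the shortcut is a well-defined but different function from addition.

```python
from fractions import Fraction

def mediant(a: Fraction, b: Fraction) -> Fraction:
    """'Freshman addition': add numerators and add denominators.
    Fraction stores values in lowest terms, so inputs are taken as reduced."""
    return Fraction(a.numerator + b.numerator, a.denominator + b.denominator)

a, b = Fraction(1, 2), Fraction(1, 3)

true_sum = a + b          # uses a common denominator: 3/6 + 2/6
shortcut = mediant(a, b)  # (1 + 1) / (2 + 3)

print("true sum:", true_sum)
print("mediant: ", shortcut)
print("mediant lies between the inputs:", b < shortcut < a)
```

The true sum exceeds both inputs, while the mediant always falls between them, so skipping the common denominator computes something other than a sum.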
It's like, Bayesians don't have any problem, you know, understanding that there is a normalization constant. Sometimes you can work without it. Sometimes you can't. And so they'll defer normalization until necessary. But the problem is there's so many scenarios where normalization is required. So many useful scenarios. I want to comment on, on, on something interesting you brought up there, which is, um, you know, this, like the, the desire to have differentiable, um, learning methods. I totally get that desire. I wish, I wish we had differentiable methods to find like program search in general. Okay. Like where I, where I run into a problem with it is that, um, he talks a lot about world models, right? And he says that, that, that, that the big problem that's going to be facing AI in the future is how do we build these world models? Like, how do we, how do we represent them? How do we build them? How do we learn them? Well, here's something to think about here, which occurred to me when I was reading that paper is, Hey, a lot of the world that we're operating in right now that we want to model and understand for better or worse consists of symbolic computational systems. Like pretty much every single program in existence today is a symbolic piece of code running in a, you know, von Neumann machine, some kind of like, you know, finite kind of Turing machine kind of thing. It's running these symbolic systems. Like they're all over the place. Okay. And whether we like it or not, people's cognition at a high level is, is pretty much symbolic. It may be implemented at a, at the lowest level by by sub symbolic nano things or whatever, but, but we operate in this kind of symbolic thing. We're surrounded by symbolic machines. Okay. And programs and software. Well, your world model is going to need to be able to model those things. 
And I'm really skeptical that, I mean, if, if we're trying to model symbolic systems, are we really hopeful that we can model them well enough with non symbolic differentiable systems? You know, I'm, I'm pretty skeptical. I think we have to just grab the bull by the horns, as they would say, just embrace the fact that we've got to figure out how to do search over these discrete spaces. I don't know how to, I don't know how to do it. Nobody knows how to do it, but we got to figure that out. You know, whether it's some evolutionary algorithms or whatever the case is, if we can crack the nut of learning how to do searches over the space of all possible programs outside of the space that's differentiably accessible, that's, you know, we're going to make like huge progress there, right? Whether it's DreamCoder or something else, you know, I don't know, NEAT, you know, neuroevolution of augmenting topologies, whatever it is, we need to put much more effort into learning how to search that space that's not differentiably accessible. [00:46:04] Speaker 2: I agree. And I think it's easy just to say that, like there aren't other challenges as well. I mean, LeCun had said in the paper as well, that just something as simple as being able to take a goal and break it down into intermediate sub goals at different levels of description. We just take stuff like that for granted. And it's as simple as, you know, like the problem in cognitive science of these categories, just being able to draw circles around things at different levels of description and traverse between them. We take that for granted. So Professor Yann LeCun recently released an article called What AI Can Tell Us About Intelligence. Can deep learning systems learn to manipulate symbols? The answers might change our understanding of how intelligence works and what makes humans unique. Now, this is pretty much in direct response to some of the hype in the AI community. 
He does have a huge stab actually, not only at Gary Marcus, which is about symbols, but there are many other kind of fronts being fought in the world of AI at the moment. But yeah, I mean, now's the time that there's so much bullshit in the deep learning and AGI scene at the moment. We almost need to create a bullshit bingo card. What do we got on that card? Well, some people think scaling is all you need. Well, Yann LeCun agrees that that's bullshit. Some people think that reward is enough. Well, Yann LeCun also thinks that that's bullshit. Some people think that AI systems today are slightly conscious. Yann LeCun thinks that that's bullshit. Some people think that AI systems understand us. Yann LeCun thinks that's bullshit as well. Some people think that deep learning can do symbolics. Well, Yann LeCun thinks that it can. Gary Marcus thinks that it can't. Some people think that data is all you need. Some people think that emergence is all you need. Anyway, I mean, I was really impressed when I read this article from Yann because in a way I was really happy that he was calling bullshit on so many of these AI hypesters, you know, stuff that I perceive to be bullshit. But I think it's very unfair towards Gary Marcus because unlike OpenAI that's been spouting all of this utter nonsense, Gary Marcus just has a different perspective. There are so many different perspectives in artificial general intelligence. And Gary Marcus has the same perspective as Noam Chomsky, which is that psychology has a lot to say about artificial general intelligence. And yes, maybe he has a philosophical agenda. Maybe he even has a monetary agenda because if the focus changed to his view of artificial general intelligence, he could create a startup company, he could become as successful as Yann LeCun has. But I think it's very unfair to criticize Gary Marcus in this way because Gary Marcus still has his credibility intact. Now onto chapter two. 
This is the emergent abilities of large language models. So there's an interesting paper just out called Emergent Abilities of Large Language Models by Jason Wei et al. Now they say scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper, they say, discusses the unpredictable phenomena which they refer to as the emergent ability of large language models. And they consider an ability to be emergent if it's not present in smaller models, but is present in larger models. Thus emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. Now, I don't really like their definition. I think a better definition would be a transient change in phenomena, right? So it's not the relationship between small models and large models. Because if there was a kind of continuous improvement in perplexity, no one would call that emergent. So yeah, I think my definition of emergence, and we've been thinking a lot about emergence because we just did a show on it, is an unexpected and transient change in macroscopic phenomena. That's the best definition that I can come up with personally. Now, in this paper, they seem to have the, in my view, incorrect impression that emergence is the same thing as extrapolation. Similar to that grokking paper a while ago, all of the long-termists quote this all the time. They say, oh, there's this sudden snap point where the perplexity goes down, the validation accuracy goes up, and suddenly the model can extrapolate. And I don't think that's really true. It doesn't pass the sanity test to me. I mean, it's just a quirk of optimisation that the model suddenly fits the data set better. That's not the same thing as extrapolation, right? Extrapolation means that I can fit functions outside of the training range. 
But the problem is, it's quite easy to measure with things like arithmetic, but it's quite difficult to measure with some of the, you know, language tasks, like for example, on BIG-bench, because many of these things don't exist in the training data, and it's very difficult to measure them. So the whole concept of extrapolation becomes very vague with some of these language models. And assuming it's not extrapolation, why is it so surprising? And why is it so interesting that we have a sudden jump in perplexity? I just don't get it. I don't think it's that big of a deal, honestly. Definitely interesting. Definitely requires further examination. But I think this is the main argument which is used by these folks that talk about scaling laws, these emergentists who think that if only we could train for longer, or if only we could train on more data or on bigger models, then suddenly we'll get this snap, and we'll get this recursive self-improving artificial general intelligence. That just doesn't make any sense to me. I genuinely don't believe that. Chapter three on empiricism. Chomsky argued that the way we actually acquire the faculty of language, and therefore its relationship to experience, and indeed the physical world, are radically different from the empiricist tradition. Chomsky became well known for his famous 1950s critique of behavioral psychology. Now behavioral psychologists tended to believe that humans were an unadulterated blank slate made of putty, which was then molded and shaped by our environment, you know, through a process of empirical stimulus and response. They believed that simple reinforcement learning was how we modeled the world around us. Indeed, how we learn language. Now, Chomsky argued that this could not possibly explain how virtually all human beings, regardless of their intelligence, do something as miraculous as acquiring language. 
And they also do this at such an extraordinarily young age, and in such an extraordinarily short space of time. He argued that for this to happen at all, we must be genetically pre-programmed to do it, and therefore, all human languages must have in common a basic structure which corresponds to this pre-programming. As Bryan Magee pointed out in his 1970s BBC interview of Chomsky, this also has some very negative implications, chief of which is that anything which can't be accommodated to the structure, any piece of the cosmic jigsaw puzzle which can't connect with other pieces, is linguistically inexpressible and unintelligible to us. So the general principles, common to all languages, set vital limits to our capacity to understand the world and communicate with each other. But Chomsky would argue that this is the very basis for our creative capacity to understand an infinite space of abstractions, and also understand anything which has been expressed by another human being. [00:53:37] Speaker 7: Centrally, your whole approach represents a rejection of the empirical tradition in philosophy, doesn't it? Because, I mean, the very fact that you think that the empiricists are wrong about how we learn must mean that they're wrong about knowledge and the nature of knowledge, and the nature of knowledge has been the central problem in the whole empirical tradition of philosophy. Well, the classical empiricist tradition, [00:54:05] Speaker 8: which I think was the tradition that's represented, let's say, perhaps in its highest form by Hume, seems to me to be a tradition of extreme importance. When we investigate it, I think we discover that it's just completely false. That is, that the mechanisms that he discussed are not the mechanisms by which the mind reaches states of knowledge. That the states of knowledge attained are radically different than the kinds that he discussed. 
For example, for Hume, the mind was, in his image, a kind of a theater in which ideas paraded across the stage. And it therefore followed, necessarily, that we could introspect completely into the contents of our mind. If an idea is not on the stage, it's not in the mind. And the ideas may be connected and associated. Well, that's a theory. And in fact, it's a theory that has had an enormous grip on the imagination throughout most of the history of Western thought. For example, that same image dominates the rationalist tradition as well, where it was assumed that one could exhaust the contents of [00:55:10] Speaker 2: the mind by careful attention. Chomsky was inspired by continental rationalism, which refers to a set of views more or less shared by a number of philosophers who are active on the European continent during the latter two thirds of the 17th century and the beginning of the 18th century. Rationalism basically defines your view of knowledge. As a rationalist, you consider the primacy of reason and intuition over sensation and perceptual experience. You would tend to think that most ideas, rules, and knowledge are innate. And just like our friend, Dr. Waleed Sabah, you would eschew any type of uncertainty quantification. You would prefer to deal with absolute black and white certainty. Anything else, as far as you are concerned, wouldn't be knowledge. So on the Discord community, I made the statement that knowledge cannot be entirely derivable from perceptual information and they were having none of it. Actually, most abstract mathematical knowledge is far vaster than human experience. So there's this kind of dualism. It's what I call the mind experience dualism. So now, given the apparatus we have, we can kind of interconvert between perceptual experience and mathematically abstract experience. 
If you take all of our perceptual [00:56:37] Speaker 4: cognitive powers, I think if you want to draw a Venn diagram, it's a dot in a huge, the abstract world is much, much, much. It's vast. All experiential knowledge is almost a tiny dot in that ocean. And you know what they're doing now in AI? They're concentrating on the damn dot. No, no. Learning from instances is brutal, guys. So when you want to defend it, you have to tell me how... Hold on. You have to tell me, look, learning from data, either you adopt it and defend it or you don't. There's no middle, right? If you want to defend learning from observations, a template like this, you have to convince me how a child learns 200 million sentences from few examples. If you don't have an explanation for that, the rest is hand-waving. [00:57:35] Speaker 2: Waleed Sabah thinks that empiricism is a huge house of cards. As soon as you allow any symbolic manipulation, the whole thing just falls down flat on its face. So according to Waleed Sabah, if you have to be an empiricist, you have to be an empiricist all the way. [00:57:53] Speaker 4: And let me make an example out of this. You take a guy that lives in Bangladesh. You take a guy that lives in Amsterdam Hall. Perception-wise, observation-wise, empirical data-wise, they're apart so much that they live practically on different planets. But what they know is the same. Knowledge that is not empirically obtained is the same. And that's all they need to survive. Their observation, their four-year-old knows that if I say I have a Greek statue in every room in my house, they know I'm not talking about a Greek statue. What I mean is in every room in my house, I have a Greek statue. And because they know, this is knowledge, not from observation, that a physical object could not be in more than one location. But that requires symbolic manipulation. 
[00:58:50] Speaker 2: Devil's advocate though, let's imagine I'm playing a computer game and you can have the same statue in two rooms at the same time. So let's say it's a blue statue and I'm observing lots of episodes of this game. And I very quickly learn that you can have this statue simultaneously in two different rooms, but not both in the same room. I can then create a cognitive program and I've just empirically learned that program. [00:59:13] Speaker 4: No, I challenge you to even think it. Look, Tim, even in the world of Star Trek where the, you know, transportation can be, I mean, we can just decompose all our molecules and transport them and reassemble them and all that stuff. Even in that world of sci-fi, there are things that the mind cannot even accept. Right. These are, this is how the universe works. So I challenge you to imagine the two blue statues in two different rooms as one. You can't even think it. [00:59:52] Speaker 6: I think some of this comes down to these, you know, matters of definition, right? And black and white lines. So, I mean, traditionally a dimension on which empiricists and rationalists differ is the degree to which they admit the existence of innate knowledge or innate concepts. And so, obviously on, on one extreme is, is zero. And then that would be like an ultra empiricist who says, there's no such thing as any kind of innate, you know, knowledge. Like literally, uh, let's say, if we're talking, the unit of analysis is the human, an individual human mind that they start as some type of literal zero blank slate and then through just empirical observation alone, um, learn, right? Um, and then on, on kind of a, the other extreme, you know, I don't think rationalists have ever denied that there, that some knowledge is empirical. 
Like I don't, I don't know that they've ever denied that, but perhaps the extreme version there is that they, they say there's a kind of superiority to deduced rationally derived knowledge versus empirical knowledge. So like in a way, you know, like the, the, uh, platonic ideals are in a sense more reliable and, and more superior and, and, and, you know, more ultimate or pure than, than any type of empirical knowledge. I mean, I think any of us probably should know that the answer is somewhere in between here and where the dividing line is between, okay, if you go beyond this, you know, you become a, a, a rationalist instead of an empiricist. I'm not sure. Like, I don't know where they, they draw the kind of battle lines today. But I think if we, if we take, say, LeCun as an empiricist, he, he admits that there is prior knowledge. You know, you need these inductive priors to be quite useful. After all, one of his huge achievements was CNNs, i.e. structurally encoding a certain kind of, of prior, this translation invariance, um, into neural networks. But he thinks it should be as close to zero as possible, that it should be so minimal that it's like just enough to kind of, uh, jumpstart or bootstrap a learning system. And then from then on, it can kind of just learn everything on its own through observation of, of data. And I think people on the other side of the camp, and I include my, myself there pretty much, and I think Waleed too, is that we believe that the, there's just not enough data, not enough computational resources, not enough time, uh, for that to make sense. That sure, there are a lot of concepts, a lot of knowledge that you can learn by, by observation and then, and then, you know, reasoning kind of on top of that. Um, and by the way, empiricists now no longer deny that, that reasoning happens, right? 
But they view reasoning as just generating connections between facts that were learned empirically, not as bringing facts to the table themselves, but really only as mechanisms of connecting and deriving from, from facts. Um, I just think that the, that it's not realistic, it's not pragmatic, you know, in the same way that, sure, a neural network of infinite size, like AIXI, AIXI size, you know, the AIXI folks, can do everything. Well, that's not what I'm interested in. I'm, I'm interested in pragmatic systems. And I think that there's types of knowledge that are derived from, um, um, rational sources, you know, these deductive logical kinds of sources that, uh, you can't hope to get to pragmatically from empiricism alone. And what fascinates me is, okay, what's the mechanism then by which that knowledge entered into, into our brains? Like that to me is, is the fascinating question. Yeah. I mean, I want to get to that in a second, [01:04:15] Speaker 2: because I know you've got some very interesting ideas about, um, Friston's free energy principle, actually, and it relating to our existence, but I don't think LeCun or even Jeff Hawkins, you remember Jeff Hawkins. I mean, he, um, cited Vernon Mountcastle, you know, that famous neuroscientist from the 1980s. Uh, this idea that the, the, the neocortex has all of these, um, little units that are exactly the same. They're just wired differently, but that's still a prior. So he said that we had, um, you know, some of our neurons were fed from our sensory motor circuits, and then there were other neurons that were kind of like concept neurons. But if you think about it, the way the brain is structured is still a prior. So any conception that we have, you know, you were just saying before that empiricists think that there's a stage and all of the ideas we have are derived from things which are on the stage. And if you look at the brain, it's a stage, right? All of that sensory motor signaling, it's a stage. 
So I don't think they would really deny that the structure of that stage defines a lot of our conception. [01:05:21] Speaker 6: Yeah. I, I can't put words in their mouth, but I don't think they deny that. I would, I would hope to think that, that they don't, I mean, because, because it's obvious that there, that there is structure there. And I think, and as I said, I think LeCun has agreed that, you know, sure, there is some initial structure that you, that you need in order to get, you know, bootstrapped, if you will. I think, you know, maybe if, if we talk about in terms of say something simpler, which is just logic gates, like if you just imagine that your brain consisted only of NAND, you know, one of the logic gates that can create any, any logical operation. So whatever, you just have trillions of, you know, NAND gates. And if you connected them up in a potentially fully connected way, so every single NAND gate connected to every other possible NAND gate. And then, and then the goal of learning is to turn off and on some of those connections such that you wind up with something useful. I think the question is just how much do you have to start with, like, you know, maybe if you just start with a random knockout. So I just randomly assign a bunch of, a bunch of connections, you know, can I learn, you know, can I then learn from that through some type of back propagation or some other, you know, weird, cause now we're in like a digital space, you know, some type of EA algorithm, you know, can it, can it learn things or do you need to have like some additional kind of structure on there? Because if you look at the human brain, it's way, way, way far away from that extreme of a fully connected, you know, NAND circuitry. Like it's very sparse. It has, you know, structure in these kind of cortical columns, like Hawkins talks about, you know, in a very certain kind of vertical and also horizontally parallelized way. 
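[Editor's sketch] The remark that NAND "can create any logical operation" can be verified on a small scale. The wiring below is the standard textbook construction, not anything specific to the speakers' proposal, and the helper names (not_, and_, or_, xor_, truth_table) are chosen for this illustration. Each derived gate is built purely from NAND and checked against its truth table.

```python
from itertools import product

def nand(a: bool, b: bool) -> bool:
    return not (a and b)

# Every gate below is wired from NAND alone.
def not_(a: bool) -> bool:
    return nand(a, a)

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))

def xor_(a: bool, b: bool) -> bool:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def truth_table(gate):
    """Outputs for inputs (F,F), (F,T), (T,F), (T,T), in that order."""
    return [gate(a, b) for a, b in product([False, True], repeat=2)]

for name, gate in [("AND", and_), ("OR", or_), ("XOR", xor_)]:
    print(name, truth_table(gate))
```

Here the wiring is fixed by hand, which is exactly the kind of structure the speaker argues a learner would otherwise have to discover by searching over which connections to switch on or off.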
You know, there's tons and tons of structure there. Like it's in the kind of distance of all possible networks. It's a very long way from a fully connected tabula rasa with just random connections, right? So I think that, I think that they have to admit that there's, there's structure there. I think it's just that they want it to be as lightweight and, and as, you know, the constraints to be as loose as possible, because then that allows as much flexibility to that, that learning algorithm to, to learn things that are most optimized and most appropriate for any particular task. [01:07:51] Speaker 2: Yeah. Yeah. I agree with that. And when Waleed was talking about cognitive templates earlier, his, his main rationale is he doesn't want to be surprised by reality. So he says that, you know, what we call someone who, um, makes poor predictions about the world. We call them crazy. That's what Waleed would say, but I don't entirely agree with that because I know that I could, I put this to him. I could have a computer simulation and in this simulation, the physics are different. The reality is different. So I could quickly come up with a cognitive template in my mind that in this computer world, you can have two blue objects simultaneously existing in two different rooms. And I would very quickly learn that cognitive template and I could reason over it. So I have just empirically come up with a new cognitive template. Right. I mean, I agree because you don't even have [01:08:41] Speaker 6: to imagine simulations for, for very strange. And I think this is kind of what Chomsky gets at when he says that we have this innate radical empiricism, right? Is we have this, this false belief that concepts that we learn at the, at the scale of pool balls and, you know, apples falling from trees and, and things like that. Just, we feel strongly that they must also apply to electrons and protons and atoms and, you know, whatever else. And they just don't. 
I mean, that's like, even take a simple concept of, you know, oh, two objects cannot exist at the same place at the same time. Well, that's true if they're fermions, you know, if they're bosons, like a photon, for example, you can have as many photons as you want in the same place at the same time. Right. So, I mean, physics alone and just like the world in general, or take, you know, mixing colors. If you mix pigments together, you know, red and green pigment, for example, it's going to wind up with kind of like a brownish, you know, muck or whatever, but red and green light produce yellow. So, I mean, you get all kinds of almost antithetical or opposite, you know, behaviors of things that happen all throughout the natural world. And I don't see people as having any trouble really learning like a new set of rules, if you will. I mean, I play a lot of video games or used to, back before we started doing the show and I had time to play video games, but, you know, for example, you run into a video game where like it has a time travel element. It doesn't take long to learn how to work in this area where you can move in a dimension that corresponds to time, or you learn kind of different sets of world rules. Like people somehow don't really have trouble necessarily, but there are areas where we're really driven to be these radical empiricists and certain concepts that we [01:10:40] Speaker 2: hold as almost inviolable. You know, like, yeah, I mean, but this is what I wanted to get to as well, that, I mean, in Waleed's refutation of empiricism, he says that anything that can be experienced or observed will be known. But we spoke with Kenneth Stanley and he says, well, you know, we experience consciousness and we don't know it. Uh, if we have a hallucinogenic experience or a dream, do we know it? Well, actually we, um, we reason to the best explanation. 
So we try and hang it on a structure, which is already in our brain. So the really interesting bit is whether we can take percepts and whether we can build an abstract structure. And that does largely depend on the structure we already have. But this to me makes me think, well, whether it's innate already is irrelevant because we know that we can create new structure in response either to thinking and reasoning or in response to [01:11:34] Speaker 8: new perceptual information. Of modifying these capacities. What we might do, however, is gains. I mean, at least it's in theory imaginable that we might discover something about the limits of our science forming abilities. We might discover, for example, that some kinds of questions simply fall beyond the area where we are capable of constructing explanatory theories. And I think we even maybe now have some glimmerings of insight into where this delineation might be between intelligible theories that fall within our comprehension and, uh, areas where no such theory is possible. [01:12:12] Speaker 6: Yeah. So I, I agree, but I do, I do think there are limits and, and I think this is, this is one of these real mysteries. So in some areas people are very flexible and others, you know, we're not. And I think what was interesting talking to, to Chomsky was, you know, when we asked, um, is it possible that there are really just these, these limits to human cognition, you know, this horizon beyond which we may never go. And he brought up the example of, uh, rats and prime number mazes, you know, that if you have a prime number maze, so this is a maze where at every prime intersection, you take a right, for example. And if you do that, then you get to the cheese or you escape the maze or whatever the goal is that no amount of training, no amount of time can you ever, ever train a rat to complete a, uh, a prime number maze because their brain, their cognitive structure there just doesn't have the concept of prime numbers. 
It's just totally lacking. It's like if it was just excised from their capability to understand the concept of prime numbers. And he said, it would be a miracle if we human beings don't have similar limitations. So maybe there are concepts there that are necessary to understand all of physics, to understand the physical universe that just, they just don't exist in any human mind. And it may even be the case that we can't even formulate them in any externalized intelligence form. We may just have this almost this blind spot to this particular concept. And I think that's fascinating. You know, I think there's, in my mind, there's really two possibilities here. One is Chomsky's correct that it would be a miracle. And in fact, there are blind spots that human cognition doesn't have. On the other hand, I think it's also possible that there may be, um, a level of, of cognition at which, like, let's say it's higher order logic. Like once you get to the ability to understand higher order logic, it may be, it may be possible that all facts of the universe can be, can be described and somehow, and, and you know, higher order logic. I don't know the answer to it, but I think it's a fascinating question. Well, let me push back on, on that a little bit, [01:14:28] Speaker 2: because yeah, the fascinating thing is when you reach potentially, when you reach a threshold of intelligence, everything might be accessible because there is this infinity of abstractions out there in abstraction space, most of which aren't particularly useful and most of which will always be unexplored. But with the rat example, yeah, they can't reach the abstraction of a prime number and apparently they were not trainable. But if you think about it, shortcuts exist because abstraction space is a topological space and you could create a breadcrumb trail. 
And in theory, you could train the rats to learn a sequence of mechanical steps, just like a neural network does, to effectively, um, run a prime number maze without understanding. I wonder whether that's possible. [01:15:14] Speaker 6: Yeah. I mean, I think it's always possible that in any finite scenario, so a maze that has at most, you know, n, you know, intersections, I think it's always possible in any finite scenario. And we've talked about this some, kind of in the context of neural networks too. Sure. You can memorize like, so, I mean, I don't know. I haven't personally looked at the rat, you know, maze literature, but I imagine you could probably train a rat to like maybe always for a particular maze take like the second and the third and the fifth turn or something. But I think probably the point is that there's a very small number of turns at which it's limited. Like it can't go beyond that. And it can't generalize to kind of, I say arbitrary n, you know, I mean, obviously, uh, it takes a certain amount of time to escape and it'll die before it can go through one million turns or whatever that number is. But this is the distinction that we always try to make about algorithms and concepts that generalize such that they can operate on almost any arbitrary, you know, time or number of operations, as long as you can extend like the memory space. So the code, the algorithm is completely separate from really the memory space, if you will. And so like, if I was in a prime number maze and you just stuck me in the middle and said, Hey, uh, welcome to the prime number maze. See you when you get out. Well, I could get out, you know, for a really large number of turns, right? Because I can count, you know, three, five, it might take me a little while, but eventually I can get out there. 
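The distinction drawn here, between memorizing a finite maze and holding a rule that generalizes to any number of turns, can be sketched in a few lines. This is a toy illustration under stated assumptions: intersections are numbered from 1, the correct move is "right" at prime-numbered intersections and "straight" otherwise, and all function names are hypothetical. It is not a model of the rat experiments.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for a small illustration."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def rule_policy(intersection: int) -> str:
    """A policy that holds the concept: it works for any intersection number."""
    return "right" if is_prime(intersection) else "straight"


def make_memorized_policy(trained_up_to: int):
    """A policy that only stores the answers seen during training."""
    table = {k: rule_policy(k) for k in range(1, trained_up_to + 1)}

    def policy(intersection: int) -> str:
        # Outside the memorized range there is nothing to look up,
        # so this policy falls back to an assumed default move.
        return table.get(intersection, "straight")

    return policy
```

A policy memorized up to intersection 10 agrees with the rule on every intersection it was trained on and then goes wrong at intersection 11, the first prime outside its table, while the rule keeps working for as long as there is time and memory to count.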
And, you know, if you have a machine that has an expandable memory, it could continue that for an arbitrary number of cycles and escape. It [01:17:06] Speaker 2: wouldn't need to memorize it. Right. That's right. But then it comes down to the machine of our brain or a rat's brain. And I'm interested in the dichotomy between being able to mechanistically perform what is an abstraction. And as you say, if the machine could take enough steps in this abstraction space, then they could perform the prime number calculation without understanding it. And this brings me onto Gödel because, um, Waleed brought this up as well. So Gödel's proof is all about abstraction. It's the ability to prove a theorem finitely. Right. So a proof is an abstraction. And, you know, like Waleed talks about this quantification logic, right? That's the ability to use a finite object to address an infinite set. So, um, Gödel's proof is actually, I mean, Waleed said, it's about a space of things that cannot be known. [01:18:01] Speaker 6: Yeah. I mean, it's always treacherous territory when you get into Gödel's, you know, proof or, you know, whatnot, because, uh, um, everybody thinks everybody else doesn't understand it. And, you know, certain philosophers think that they have the one true kind of understanding. So I don't want to present what I'm trying to say here as, you know, the truth. Like, you know, you guys have to kind of decide for yourself. Right. But the idea is that in any particular finite formal system, so we kind of set down, here's the set of rules that we follow. And there's like a finite list of those rules. And then we start with a set of, you always got to start somewhere. We've been talking about this the whole conversation. You start with a set of, you know, axioms or whatever. 
So it's, and it's a closed system, you know, new axioms don't enter the picture there later on. Um, there will always be statements that you can write down in that formal system that are true. They're true statements, but cannot be proved, you know, within that formal system. So the idea is that you just, you know, you can't prove everything. Either you're incomplete, which is what this is, meaning there are statements that are true, which I can't prove. So I'm not complete. My formal system isn't complete enough to prove every true statement. Okay. Or you're inconsistent. So you can have a set of rules in which it's possible, for example, to prove that a statement is both true and false. Um, but then of course you can prove that any true statement is true. So you got to kind of pick your poison here. You're either incomplete or inconsistent, but it's important to keep in mind that this applies to closed formal systems and technically and mathematically, you know, human beings and human cognition are not closed. Like this is in fact what science does, right? It poses a question. Okay. I don't know if the statement is true or false, I don't know. And I can't prove it, but I can go do an experiment. And then the universe tells me, like, whether or not it's true or false, right? That's the fascinating thing about human beings embedded in this universe is that we have this process called science, which is an open system, you know, I don't want to say open-ended because that gets us into a totally different line of question, but it's an open system that can interact with the universe and new stuff comes in, right? So we're not strictly bound by Gödel's theorem. Although any particular finite system that we write down is, but we're always expanding that by conducting science and doing experiments. 
[01:20:56] Speaker 4: That huge apparatus of reasoning has nothing to do with percept. End of story. We don't sense transitivity. We don't sense, I mean, that is a ridiculous paradigm. And it has been proven, by the way, Skinner has been skinned a long time ago, behaviorism, at least by reputable people. I mean, it's a joke to still think of behaviorism. All I need to know I learned in kindergarten. I don't need to know how to ride a bicycle. That's where learning happened. I learned, you learn the skill. You don't learn knowledge. You acquire knowledge. You just go and grab it, right? I learned how to play guitars, but the universe doesn't give a damn if I play guitars or not. Because the full term for both is learning, right? But really we don't learn that stuff. We acquire it. It's knowledge acquisition. I go and I acquire, I grab it, I steal, right? Learning is different. Regarding empiricism, there's [01:22:01] Speaker 6: this issue of what's your unit of analysis. So are you talking about an individual human mind? Because for an individual human mind, it's abundantly, immediately, experimentally obvious that we're not empiricists. Like we have encoded innate knowledge for an individual human being. But I think like Friston would expand the level of analysis to the species and life in general, which is that life, this process of life has been evolving, you know, and developing this set of dynamics that survived. And so whether or not you call that empiricism, you know, it's kind of a definitional matter. What I'm more concerned about is what's the mechanism by which that knowledge becomes encoded in the human circuitry or the circuitry of life. 
And, and where I, what I think we're missing or not talking about enough is that yes, observation and, you know, interaction with the environment do a little bit better than appear or whatnot definitely plays a role, but just, just the mere fact of survival seems to play a role in imparting that. Um, you know, and so it'll, it'll be interesting to see how over time we learn more and more about how to apply survival, which we do in say EA algorithms, right? Like in EA algorithms, you have a population, some may survive to reproduce, some don't versus, versus, you know, observation and reasoning and that type of dynamic, you know, learning that you do. Um, and then the other, the other point that you brought up there is once you have that, like once you have this knowledge, however it gets there. Okay. So it's from your genetic endowment, it's from the laws of nature or whatnot. The human cognitive system has learned to take that, those seeds, those seeds of knowledge and generate with them all kinds of mathematics, like all kinds of abstract ideas that have no correspondence to anything that we know of in reality. And that's pretty, it's almost like the bootstrap, right? Like, like, like it's, it's kind of weird that there is this platonic world of ideas that are all, you know, can be consistent and interconnected and you can say all kinds of useful and crazy. Well, I say useful. You can say all kinds of, you know, consistent and interesting things about them. And yet they have no, there's nothing in the physical world that corresponds to them, but they're, but they're no less, you know, um, they're no less, um, uh, sensible. Like they're no less, they're no less mathematical. They're no less, you know, they, they just don't have a correspondence to reality. And likewise, by the way, on the other side of the coin, there's clearly a lot about the universe that's happening that we have no sufficient mathematics to describe. 
So there's a vast chunk, maybe, you know, maybe the piece that's even mappable to the physical world is measure zero, like in the vast chunk of mathematics. So there's this vast chunk that doesn't map to anything in reality. And there are things happening in the physical universe for which we have no mapping to mathematics. So they're almost these two separate worlds that slightly overlap here in this little sliver of, um, of math that we would call physics, right? [01:25:28] Speaker 8: Chapter four, cognitive templates. In fact, while it's true that our genetic program rigidly constrains us, I think the more important point is that the existence of that rigid constraint is what provides the basis for our freedom and creativity. You mean, it's only because we're pre-programmed that we can do all the things we can do? Exactly. The point is that if we really were plastic organisms without an extensive pre-programming, then the state that our mind achieves would in fact be a reflection of the environment, which means it would be extraordinarily impoverished. Fortunately for us, we're rigidly pre-programmed with extremely rich systems that are part of our biological endowment. Correspondingly, a small amount of rather degenerate experience, uh, allows a kind of a great leap into a rich cognitive system, essentially uniform in a community and in fact, roughly uniform for the species. [01:26:33] Speaker 7: Which would have developed over countless evolutionary ages through the biological evolutionary process. [01:26:38] Speaker 8: The basic system itself developed over long periods of evolutionary development. We don't know how, really, uh, but for the individual it's present. 
As a result, the individual is capable, with a very small amount of evidence, of constructing an extremely rich system, which allows him to, uh, act in the free and creative fashion, which in fact is normal for humans. We can say anything that we want over an infinite range. Uh, other people will understand us, though they've heard nothing like that before. Uh, we're able to do that precisely because of that rigid programming. [01:27:15] Speaker 7: But short of that we would not be able to at all. What account are you able to give of creativity? If we are pre-programmed in the way you say, then how is creativity a possibility for us? [01:27:25] Speaker 8: Well, here I think one has to be fairly careful. Uh, I think we can say a good deal about the nature of the system that is acquired, the state of knowledge that is attained. We can say a fair amount about the biological basis, the basis in the initial state of the mind for the acquisition of [01:27:45] Speaker 4: this system. We are only respecting the universe we live in. It's no more than, uh, how the planets, uh, orbit each other. And they just, I mean, they're obeying the laws of physics. We're obeying the mental laws of the metaphysics. They call them metaphysics. Actually, the ontology is all about metaphysics. How the world functions, that has not been obtained by going out to the park and dying three times until I discover that thing. Like that's stupid. We come equipped with that. Even animals have that, by the way, a calf after two minutes starts walking and eating on its own. If you look at newborns and like, obviously it has to do with evolution and how those that [01:28:41] Speaker 2: didn't obey the laws of nature didn't survive. Yeah. So, um, quite often people confuse or conflate empiricism with nativism. So, you know, nativists think that we have all of the cognitive apparatus already built into our mind, but actually then they're not synonymous. 
I think what Chomsky says is via some ethereal mechanism, we are endowed with the laws of nature and that's how all of it, you know, via osmosis gets into our brain. So you've got a really interesting view on this, Keith. [01:29:13] Speaker 6: Well, yeah, because I've been thinking about this, you know, Chomsky said, um, look, the evidence is absolutely clear that humans do not start off as a blank slate. We have these endowments of prior knowledge. Um, one of them is obviously genetics. You know, we have a genetic endowment. We have this code in our DNA that results in structures that unfold and grow and sure, they develop in response to the environment, but they're growing from this encoded template in our genes. But he said, there's another possible source of this knowledge outside of experience, which is the laws of nature. And it's kind of a mystery how the laws of nature enter into our knowledge. And I know Chomsky would agree, you know, there's some mechanism there. We just don't know what it is. And I've been thinking about that, you know, like what is the mechanism? And my guess is it really comes down to something very simple, but profound at the same time, which is survival. Okay. Like if you think about, um, at the end of the day, life has evolved to create this circuitry. Okay. And it doesn't matter whatever the circuitry is made out of, you know, neurons, it could be, uh, also in lower, you know, monocellular organisms. It could be in the dynamic pathways that they're, um, you know, the concentrations of electrolytes or whatever's going on inside of them. Right. But there's some circuitry. Okay. One thing that enters into that circuitry is, does it survive to see another day? And so long before there were any organisms that did anything that we would call observe, you know, that sit there and observe nature and think about it. Okay. 
They were developing this circuitry because as they randomly explored the space of possible circuits, some circuits processed information in such a way that gave them an advantage in the real environment, i.e. under the laws of nature and they survived. And ones that didn't have processing that corresponded to the laws of nature simply were destroyed. Right. And so the mere fact of survival of existence or continuing existence, I think encodes knowledge into, into life. So it's almost a form of ontic knowledge. You know, it's knowledge that's there because of your factual existence and it had to correspond to some degree to nature and the laws of nature, or you would have perished. Right. And I think there's this interesting connection to Friston's free energy principle, because it's all about, hey, look, if we take the assumption that things exist, that, that definable things, objects, definable things exist. And for them to continue to exist, if they continue to exist, what must they do? Right. That's the question he asked is what must a definable entity, something that has this boundary, you know, what he frames as a Markov boundary, what must it dynamically continue to do in order to exist to continue existing. And, and you get this free energy principle, which is that, well, to, you know, whether you want to call it thinking reasoning, it doesn't matter. It has a dynamics, it has a dynamic behavior that mathematically corresponds to something like Bayesian inference, right? Like it has to, it has to model predictions of the future and its interaction with the environment in order for it to continue existing, or it would have been destroyed. And so for me, this is a fascinating idea that, that you have this endowment, one comes from genetics, one comes from the evolution of life in general, which is of course encoded genetically in us, but it comes from the evolution of life in general, the fact that it survived. It's an existential form of knowledge. 
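The survival argument above, random exploration of circuits with the environment removing the ones that fit it poorly, resembles selection in a simple evolutionary algorithm. The sketch below is a toy under stated assumptions: a "circuit" is a list of bits, the "laws of nature" are a fixed bit pattern the circuits never observe directly, and the population size, mutation rate and function names are all hypothetical. It is not an implementation of the free energy principle.

```python
import random


def fitness(circuit, law):
    """Fraction of positions where the circuit happens to agree with the law."""
    return sum(c == l for c, l in zip(circuit, law)) / len(law)


def mutate(circuit, rng, rate):
    """Random exploration: flip each bit independently with probability `rate`."""
    return [(1 - bit) if rng.random() < rate else bit for bit in circuit]


def evolve(law, pop_size=40, generations=40, rate=0.03, seed=0):
    """Return the best fitness in the population at each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in law] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Rank circuits by how well they match the law.
        population.sort(key=lambda c: fitness(c, law), reverse=True)
        best_per_generation.append(fitness(population[0], law))
        # Survival: the better-matching half persists unchanged.
        survivors = population[: pop_size // 2]
        # Reproduction: survivors leave mutated copies; the rest are gone.
        offspring = [
            mutate(rng.choice(survivors), rng, rate)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + offspring
    return best_per_generation
```

No circuit here observes or reasons about the law; information about it accumulates in the population only through which circuits persist. Because survivors are carried over unchanged, the best match in the population can never get worse from one generation to the next.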
[01:33:25] Speaker 2: But the interesting thing, I mean, first of all, Friston is an empiricist and it's quite easy to look over the history of our evolution to explain how do these cognitive functions get implanted into our brain. And I think that that's a, that's, that's a great explanation because then using those cognitive functions, we can start to, um, extrapolate into this much larger abstractive, um, space. But what you're saying is interesting because there is a kind of, um, epistemic resonance between the cognitive templates we have and the reality we live in for this precise reason. But I still think though, that it's possible for us to learn new cognitive templates because what Waleed says, you'll never be surprised by reality for that reason, because the cognitive templates that we've been endowed by nature are the templates that describe the universe. [01:34:15] Speaker 6: Yeah, I agree. And, and, and this gets, that's what we're saying. It seems almost immediately obvious that we have, we have the ability to, to learn, you know, to learn or to apply, definitely to learn new cognitive templates. I mean, we definitely do it. You know, humans are entering new, new realms in which we didn't evolve all the time. And it's certainly happening in the virtual space and, you know, and we're certainly able to do that. The question is, are there limits to that? And yeah, I mean, there's obviously some very trivial, you know, kinds of limits to it, but, but I think there may be these deeper limits too. Um, and it's kind of this, this question Chomsky raised of, you know, are we, are we rats in a prime number maze? You know, are we missing certain concepts about the universe that we just may never be able to get? I think it's an open question. 
Um, there's certainly, as he puts it, there's certainly this innate drive to be radically empiricist, to try and take our human level understandings of, you know, balls and apples and, um, sticks and things like that, and project it down to, uh, the level of quantum mechanics or far out into the stars, you know, at the like galactic super cluster level. Um, so we definitely have some failings there. Um, and it'll be interesting, you know, it's going to be a question that we'll be asking probably forever, uh, is what are the limits of human cognition? And I don't know. [01:35:43] Speaker 2: Chapter five, the ghost in the machine. Some of you may have noticed that when we first released this video, we got blocked by the BBC on copyright. We had included a small clip from Richard Feynman from a BBC Horizon interview in 1980. Um, back in those days, the BBC actually made quality content remarkably. So, uh, we also included that clip from Bryan Magee, uh, which was in the 1970s. Luckily, they don't have a copyright block on that. Yeah. Anyway, let's not go there. I'm absolutely seething about it. I believe this information belongs in the public domain. I'm so annoyed with the BBC for blocking that. Anyway, this is what Richard Feynman said. I'm just going to quote it. I have a friend who's an artist and he sometimes has taken a view, which I don't agree with very well. He'll hold up a flower and he'll say, look how beautiful it is. And I'll agree. And he says, you see, I, as an artist can see how beautiful this is, but you as a scientist, you take this all apart and it just becomes this dull thing. And I think he's kind of nutty. I mean, the beauty that he sees is available to other people and to me too, I believe. And although I might not be as refined aesthetically as he is, I can still appreciate the beauty of a flower. At the same time, I see much more about the flower than he sees. 
I could imagine the cells in there, the complicated actions, which would also have a beauty. I mean, it's not just the beauty of the dimension of one centimeter. There's also the beauty at a smaller dimension, the inner structure, also the processes, the fact that the colors in the flower evolved in order to attract insects to pollinate. It is interesting. It means that the insects can see the color. It adds a question. Does this aesthetic sense also exist in lower forms? Why is it aesthetic? All kinds of interesting questions. Science only adds to the excitement, the mystery and the awe of a flower. It only adds. I don't understand how it subtracts. "The ghost in the machine" is British philosopher Gilbert Ryle's derogatory description for René Descartes' mind-body dualism. Descartes, as a man of scientific genius, could not but endorse the claims of mechanics. Yet, as a religious and moral man, he could not accept, like Hobbes did, the discouraging rider to these claims, namely that human nature differs only in the degree of complexity from clockwork. Descartes and subsequent philosophers naturally, but erroneously, believed that they availed themselves of the following escape route. Since mental words are not to be construed as signifying the occurrence of mechanical processes, since the mechanical laws explain movements in space, other laws must explain some of the non-spatial workings of the mind, which is to say, [01:38:51] Speaker 1: "The ghost in the machine." I'll talk some about Isaac Newton and his contributions to a study of mind, that he's not known for that. But I think a case can be made that he did make substantial, indirect, but nevertheless substantial contributions. I'd like to explain why. There is a familiar view that the early scientific revolution, beginning through the 17th century, provided humans with limitless explanatory power. 
Newton's greatest achievement was that while he seemed to draw the veil from some of the mysteries of nature, he showed at the same time the imperfections of the mechanical philosophy and thereby restored nature's ultimate secrets to that obscurity in which they ever did and ever will remain. The mechanical philosophy, of course, was the guiding doctrine of the scientific revolution. It held that the world is a machine, a grander version of the kind of automata that stimulated the imagination of thinkers of the time, much in the way programmed computers do today. They were thinking of the remarkable clocks, the artifacts constructed by skilled artisans. And there is a further task, that's to determine the scope and limits of human understanding. Incidentally, some differently structured organism, some Martians, say, might regard human mysteries as simple problems, and might wonder that we can't find the answers or even ask the right questions. Just as we wonder about the inability of rats to run prime number mazes. It's not because of limits of memory or other superficial constraints, but because of the very design of our cognitive nature and their cognitive nature. So actually, if you think it through, I think it's quite clear that Newton's remarkable achievements led to a significant lowering of the expectations of science, a severe restriction on the role of intelligibility. They furthermore demonstrated that it's an error to ridicule what's called the ghost in the machine. That's what I and others were taught at your age in the best graduate schools, Harvard in my case, but that's just a mistake. Newton did not exorcise the ghost. Rather, he exorcised the machine. He left the ghost completely intact. And by so doing, he inadvertently set the study of mind on quite a new course. [01:41:48] Speaker 2: This is Erik Curiel from Harvard University. 
[01:41:51] Speaker 9: Eric Curiel: Well, the world is a complex place and our mathematical models of its parts are almost childishly, recklessly simple. How can a relation of representation hold between them? This issue of complexity in the world is a very serious problem for the standard view of representation. A second problem is what I call levels of abstraction. In any given theory or framework within which one formulates theories, there are many given levels of abstraction at which one can write down the mathematical formula that one in standard parlance uses to represent physical theories. In Newtonian mechanics, we have F equals ma. That's about as general and abstract as one can possibly get. Does that represent in the same way as the expression for the Newtonian force law, F equals G M m r-hat over r squared? Can F equals ma represent at all? Is there anything in the world that is a pure acceleration, even in a world in which Newtonian mechanics would be true, putting aside the fact that it's not in fact true? Can the latter, the force, the Newtonian force law, the gravitational force law, can that represent in the same way as the equation modeling two perfect homogeneous spheres as a Keplerian binary system without a specified target? As an undergraduate, when you're, when you're learning, you're learning gravitational theory and you write down this Keplerian binary system and solve for it, does it represent something? I have no idea. Can that perfectly idealized Keplerian binary system represent in the same way as a set of equations modeling the earth and sun as a concrete individual, gravitationally coupled system with lunar and Jovian perturbations accounted for? Do different levels of abstraction represent in the same way? How does one decide when the mathematics is concrete enough to represent? I don't know. And nothing in the standard views gives me any clue as to how to answer that question. 
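For reference, the two expressions contrasted in this passage can be written out as follows. The notation is an assumption made here for illustration and is not taken from the talk: m is the accelerated mass, m_1 and m_2 are the two gravitating masses, r is their separation, and the unit vector points from body 1 to body 2, so the minus sign marks the force on body 2 as attractive.

```latex
\begin{align}
  \mathbf{F} &= m\,\mathbf{a}
  && \text{(second law: no particular force specified)} \\
  \mathbf{F}_{12} &= -\,G\,\frac{m_1 m_2}{r^{2}}\,\hat{\mathbf{r}}_{12}
  && \text{(gravitational force on body 2 due to body 1)}
\end{align}
```

The first line constrains any force whatsoever, while the second already commits to one interaction between two bodies, which is the difference in level of abstraction the speaker is pointing at.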
[01:43:45] Speaker 6: Really the, the key point that Chomsky has made a few times, and he made it with us when we were talking to him about what Newton did, you know, how Newton changed physics like forever, right? Is, is to show us that the universe is not intelligible to us. Like, at least in the sense that we can't take our kind of common sense, uh, you know, mechanical intuition at the scale at which people operate, where things have this kind of mechanical properties, right? Machines, gears, things touch each other. In order to, to have a force, they have to have contact, right? Between objects and things like that. That there are scales or regimes of physics where it's just, it's just, it's not the way it is. Like our intuition doesn't apply to that. And that was the problem with the action at a distance, like, you know, gravity, right? You know what I mean? That it's exerting this force that behaves as if it was pointing at the instantaneous location of that object and acting over, you know, vast distances instantaneously, right? And it doesn't help that sure, um, years later, okay, we, we now have, you know, a better theory like GR, right? But it introduces all kinds of things that are not intelligible to, uh, to humans like curved space time. I mean, what's, how's that correspond to anything we perceive in reality? Like at our, at our level of cognition, it doesn't, right? And so it's just not intelligible. It's almost, you have the math and it's almost like that kind of school of quantum mechanics where it's just shut up and calculate, forget about trying to make it intelligible, just, just use the tools and the math and. [01:45:29] Speaker 2: Yeah, exactly. Well, that, that's why. So, um, you know, when Chomsky says that Newton exorcised the machine, but left the ghost intact, he was saying that we no longer seek mechanical explanations. The machine was exorcised from our theory of science, but the ghost is still there, right? 
So after Newton, the problem was largely forgotten. [01:45:51] Speaker 6: Yeah. And I think he might've made some points about why it was forgotten. I mean, I don't think it's fair to say it was forgotten because he points out that Newton struggled with this action at a distance, you know, for the whole, for the whole rest of his life, right? Like, like really just not willing to accept it, that, that, uh, that there was a deep philosophical problem there. And you look at things like Russell, right? With Russell's paradox where, where he just, it just crushed his vision of kind of mathematics even as resting on a solid foundation. But I think it's totally fair to say the ghost was left. Yeah. Like we exorcised the ghost and left the, or exorcised the machine and left the ghost there. I kind of agree with that. You know, there's so much mystery left. [01:46:39] Speaker 2: Yeah. I mean, it's interesting. I mean, um, because now we try to build models of natural phenomena that are intelligible to us, but you know, so the model is intelligible, even if the underlying phenomenon is not intelligible. And I guess Chomsky would say that all modern science is like that, basically, whether it's quantum physics or biology or, or even linguistics. [01:47:02] Speaker 6: And, and physics. So I've recently, I've been watching some interesting, um, videos and lectures of, uh, Eric Curiel from the, uh, the black hole Institute. And he has, there, there, there's a really cool lecture he gave, um, is basically the point of it was that mathematics does not represent. And he has some very solid arguments for how we shouldn't even think of our mathematics as corresponding to what's actually happening in, in physical reality, nor even that they represent what's happening faithfully, even in that abstracted form. You know, the, he has a solid argument to say, look, what it really is, is it's a bridge. 
Mathematics is a bridge between two almost equally mysterious things. One is the actual underlying physical reality, which is, we've been talking about, it's, it's like not intelligible. It's weird. You know, it does all kinds of things that are not describable really, or, or, or within the realm of our cognition. And it's a bridge between that and something equally mysterious, which are these abstractions and concepts that somehow exist up in our head that we can't directly look at. You know, we, we perceive them and we think about them, but it's not like we can really, you know, um, understand where they come from. Like how do we, how does abduction work, right? How do we, how do we come up with these, these concepts and these generalizations and this creative act and what are they really? And there are these things that are always almost outside the boundary of our cognition and mathematics is, is just a bridge between those two things. [01:48:46] Speaker 2: Um, yeah, well, this is, this is really fascinating because it, it links back to that rat in the maze example that, that you gave, because if we shouldn't expect the study of nature to be reducible. Right. To, to models, which are intelligible, um, it does make you wonder whether it's possible at all, right. To understand the world we live in with models. [01:49:10] Speaker 6: Why I, I think on the one hand, an answer is it clearly is it's clearly, we're clearly able to understand the world to an extremely pragmatically useful degree. I mean, because we have all this technology that we've built. We've, we've, you know, come to understand concepts at a scale so small that it's, it's hard to believe anything's happening down there. And so large that it'll forever be out of reach of anywhere that human beings can ever, can ever get to physically. Right. So in a strange way, our mind is able to span and understand this vast reach of stuff happening. 
And yet there's still infinitely more that's mysterious and, and perhaps, you know, forever out of reach of our cognition. And that's just, it's really fascinating to me, at least, and beautiful in a way. [01:50:09] Speaker 2: Yeah. And, and these are the things that Chomsky spoke about a lot in the interview with us. So there's this notion of closedness, which is that science thrives on reductionism. So by separating one phenomenon or one effect from the rest of the world, we gain the ability to model it, to understand it and to reinsert it into the broader picture. So, you know, things like physics experiments to theoretical computer sciences, simplifications, but, you know, this whole thing about mechanical philosophy that originated with Galileo, didn't it? You know, which is this idea that we can view the world as a machine and Galileo insisted that theories are intelligible only if we can duplicate, you know, what they do by means of artificial devices, which I think is fascinating. [01:50:52] Speaker 6: Yeah, and this kind of pokes a bunch of holes in that. And, and in a way, I'm kind of hopeful that it frees us up to be even more creative with mathematics and science. And it kind of, it kind of gets at some of the aspects that, you know, you and, and kind of Stanley talked about pretty often too, right? Which is, you know, we should be free, right? To experiment and have serendipity and creativity, utilize those aspects of, of, of human cognition when understanding even fundamental physics. And, you know, this, this analytic idea, right, of kind of splitting up things and drilling down on the one component that does something actually has a lot of negative effects to say in medicine, for example. Like, you know, we're always looking for the single molecule that will stave off disease or, or cure an illness, right? Whereas what we're trying, what we're finding out now is that sometimes you need cocktails, you need mixtures of, of multiple molecules. 
You can't just distill it down to the one true essence of something. You have to take a more holistic approach to medicine, to health, and maybe even to things like physics and mathematics. So I'm kind of hopeful actually, that if people accept this more, it'll, it'll actually expand what we're able to understand, um, rather than reduce it. [01:52:18] Speaker 2: Exactly. I just wanted to, to close this bit talking about Descartes as well, who is another one of Chomsky's heroes. And, um, he recognized the creative aspect of language and thought, right? Which is this ability unique to humans that can't possibly be duplicated by machines. So he said that, you know, language was, uh, innovative without bounds, appropriate to the circumstances, but not caused by them, and can engender thoughts in others, which they recognize that they could have expressed themselves. So, and this is actually a creative principle of the mind. He called it res cogitans. I remember Chomsky, uh, raised that, which stood alongside res extensa, you know, which is this Cartesian dualism, you know, the two substances. And it's this idea that, um, you know, uh, Descartes actually thought that there was a kind of separation between our body and the, um, the infinite set of expressions, which could be created in the mind. [01:53:14] Speaker 6: Yeah. And I, I think, um, was it Galileo that he said considered, you know, the alphabet the greatest, the greatest invention? Was it because it's with this finite number of symbols, you can express this infinity of, of, uh, of concepts, you know, and, and, and there's so many real mysteries here. Like where, um, the one you brought up with what Chomsky called Descartes' problem, you know, how is it that, that, um, how is it that we can have ideas and, and linguistic expressions that are not directly caused by, by the inputs that we're getting? It's not like a, an input stimulus response, right? 
It's actually something else is going on in there and we're generating a new, a new linguistic expression. That's maybe never even been uttered before. And yet it's somehow maps correctly in a sense to what's happening in the world, to the, to the situation and people are able to understand it, you know, and how is that kind of freedom of the will, if you will, even possible, because the best of our, our science, you know, points certainly at one, one interpretation of things that it's all deterministic. Um, even if it's very nonlinear and chaotic, it's still deterministic or if it's random, it's random in ways that certainly don't, you know, provide any type of this, um, freedom of will. And, you know, he had this, this hilarious kind of, uh, paraphrasing, I guess, of William James, where he said, you know, if, if you believe there's no freedom of the will, why bother presenting an argument? You know, you're being forced to do it. The person you're trying to convince can't be convinced because they don't have free will either. So why, why bother doing it at all? [01:54:59] Speaker 2: I know. And Chomsky did say actually that this very human ability we have to select an action. Um, you know, given the, given the circumstances, using free will is, is one of the biggest mysteries in, in science. But, uh, just to wrap up that, I mean, um, so Chomsky kind of took us on a, on an intellectual journey, which is to say that in the olden days, we used to think of the world as a machine. You know, it had a kind of, uh, mechanism and, um, and now we don't think of it that way anymore, but what we do do is we construct intelligible theories around the world. So, you know, we can't know what the world is, but we can build theories, um, that a machine can compute. So the world is not a machine, our mind is a machine. [01:55:42] Speaker 6: Yeah. 
I mean, he said, he did, he did say, he said that the goal change from, from the world being intelligible to we'll just, we'll be satisfied with theories that are, that are intelligible. Um, I'm not quite sure if, if, if he even believes that the, the theories are, are always intelligible because, because sometimes, you know, they're almost on the, on the boundary of, of not intelligible. [01:56:07] Speaker 2: Um, but I know, well, um, I mean, do you remember Jeff Hawkins said that, um, Einstein's general relativity was actually quite intelligible because it uses, um, a lot of everyday concepts that humans would understand. But the thing is, like Einstein didn't actually explain, um, anything, right? All he did was he came up with this abstraction, which was a model, which was arguably intelligible, but no one really understood it. [01:56:30] Speaker 6: Yeah. And, and, and I think this is something that, um, that I was talking about Eric Curiel and that, and that talk about mathematics doesn't represent. He brings up the fact that even if you just stick with theories, okay. Take something like general relativity, there are multiple radically different formulations of general relativity. Okay. Like they, they have very different sort of, uh, concepts, atomic concepts built into the foundation. And the, you know, I mentioned Eric Curiel earlier, because in, in that talk of his, that mathematics doesn't represent. From my perspective, at least he brings up that even if you focus on the theories, just the theories themselves, not the world, there's questions as to whether those theories are even intelligible themselves. Because if you take something like general relativity, for example, you know, there are radically different formulations of it. Like there's the metric formulation, the tetrad formulation, there's three plus one dimension formulation, there's chiral formulation, there's four dimensional formulations. 
And these things have radically different, you know, elements that, that make them up. So even, even in purely in the theory and in the mathematics, you can have very different structures that, that all map to the same underlying, underlying reality. So how, you know, how can they all be simultaneously intelligible when they have such radically different, you know, structures? [01:58:05] Speaker 2: This is Richard Feynman again, I'm quoting. My father had taught me, looking at a bird, he says, do you know what that bird is? It's a brown-throated thrush. But in Portuguese, it's a... Contrapeiro. In Italian, it's a... Chuta rapidita. And in Chinese, it's a... He says, now they know all the languages, you want to know what the name of the bird is. And when you finish with all that, he says, you'll know absolutely nothing. Whatever about the bird, you'll only know about humans in different places and what they call the bird. Now he says, let's look at the bird and what it is. He told me how to notice things. And one day when I was playing with what we call an express wagon, which is a little wagon, which has a railing around it for children to play with so they can pull it around. It also has a ball in it. And I remember this, it had a ball in it and I pulled the wagon and I noticed something about the way the ball moved. So I went to my father and I said, hey, pop, I noticed when I pull the wagon, the ball rolls to the back of the wagon. It rushes to the back of the wagon. And when I'm pulling along, I suddenly stopped. The ball rolls to the front of the wagon. And I said, why is that? And he says, nobody knows. He said, the general principle is that when things are moving, they try to keep moving. And when things are standing still, they tend to keep standing still unless you push on them really hard. And he says, the tendency is called inertia, but nobody knows why that's true. Now that is a deep understanding. 
He knew the difference between knowing the name of something and knowing something. Feynman said something can only ever be explained by taking something else for granted. And at some point you need to stop this infinite regression and just admit that you need to take something as true on faith or simply admit that you cannot know. Chapter six. This is a discussion of the Fodor and Pylyshyn paper from the 1980s on connectionism, where they put this massive critique forwards, which we think hasn't been answered yet. And their main argument centered around productivity and systematicity. [02:00:22] Speaker 10: So here's a fact. Trains of thought are often like arguments. In particular, they often lead from entertaining true premises to entertaining true conclusions. This fact engenders a problem. The problem is, suppose that the mind is a mechanism. And what, after all, what else could it be since its states have causal powers? Supposing it is a mechanism, it would be nice to know how a mechanism, a piece of physical matter, could have this property that minds have. How the state transitions of a mechanism could be like arguments in the way that the state transitions of minds are like arguments. This is the problem that cognitive science has made some progress on. In my view, it's in fact the problem that defines the field. Turing's idea was that you can explain the analogy between trains of thought and argument, consonant with assuming that the mind is a mechanism, if you will also assume the following two things. First, that mental states are syntactically structured. And second, that the syntactic structure of a mental state determines its causal role in mental processes. I'll sometimes refer to this as the language of thought picture of the mind, because the most obvious example of things that have syntactic structure is sentences. In fact, it wouldn't be very misleading to say that Turing's idea was that thinking is a syntactic operation on mental sentences. 
And what Turing argued, I think persuasively enough so that one wants to follow out the research program that he set forward, what Turing argued was that if you assume that thinking is a syntactic operation on mental sentences, then the nature of the analogy between trains of thought and arguments, the truth-preserving character of trains of thought, can be made compatible with a mechanistic theory of the mind. I mean, at least all the connectionists that I've talked with about this bite the bullet, that is, they say, no, the idealization to an infinite capacity is not allowable. Actually, there's only a finite number of thoughts you could think, even if you lived forever. And even if memory constraints and attention constraints and stuff like that are relaxed, even under those conditions, you could be in the position of having thought all the things you can do and running out of thoughts. Okay, so that's a way of biting the bullet. I think the view that the mind is, is inherently finite is a bizarre view. What I'll call a systematicity argument: systematicity arguments are supposed to do that. They're supposed to be arguments in the Turing model that don't require idealization to infinite capacity. I'm going to try to set out, uh, um, that class of arguments. I'll argue first that thought isn't just productive. It's also, as I say, systematic, and that the best explanation of systematicity presupposes a combinatorial syntax and semantics for mental representation, just like the best explanation of productivity does. So, um, I'll argue that it can't be the case that all mental representations are atomic, so it can't be the case that the mind is a connectionist network. That's the form of the argument. [02:03:28] Speaker 2: Fodor and Pylyshyn wrote a seminal critique of connectionism in the late 1980s. 
They released a paper called "Connectionism and Cognitive Architecture: A Critical Analysis", and in Waleed's opinion, this critique of connectionism has still not been answered. In the paper, they deride the term "sub-symbolic". Uh, it was a term that was invented in the 1980s by connectionists to, um, describe how a representation can be sliced and diced and stored over many nodes in the neural network. But Fodor and Pylyshyn do not think that "sub-symbolic" confers the kind of cognitive architecture which they think is possible with a classicist architecture. There are systematic interrelations among the thoughts a thinker can entertain. For example, if you can entertain the thought that "John loves Mary", then you can also entertain the thought that "Mary loves John". So, systematicity looks like a crucial property of human thought and thus demands a principled explanation. [02:04:28] Speaker 10: What I have to do is to tell you what systematicity is and why it has this implication. And here, my strategy will be to start with natural language. That is, I'll show you, uh, what systematicity is and what the arguments for it are in the case of natural language, and then show you why, having defined the notion for you, mental representation, thought, must be systematic too. Okay. So I'll take natural language, English, as a sort of paradigm case of a system which is systematic. [02:04:57] Speaker 2: Chomsky and Fodor also spoke a lot about productivity, which is basically this idea that you can generate an infinite number of meanings from language. [02:05:07] Speaker 10: In natural languages, and this is just the way of making the productivity point that I was making before, in natural languages, there are always an infinite number of semantically distinct syntactic forms, right? You can say one plus one is two and two plus two is four and John believes it's raining. Mary said, John believes it's raining. Bill thought Mary said, John believes it's raining. 
It's surprising that Bill thought that John said Mary said, okay. So that's, I mean, natural languages are always productive in that way. [02:05:32] Speaker 2: Jerry Fodor and Zenon Pylyshyn in 1988 wrote a paper called Connectionism and Cognitive Architecture: A Critical Analysis. And the funny thing was even back in 1987, these guys, these classical AI guys, they thought that the hype of neural networks and connectionism was ridiculously getting out of control. And they wrote this paper. So, uh, Keith, can you just give us a summary of the paper? [02:06:01] Speaker 6: Oh boy, a summary of the paper. First of all, the, what I'll tell you about that paper is it's, it's just chock full of goodness. So I recommend, um, you know, anyone go and read it. It's very, uh, contains a huge wealth of, of thought and, and argumentation and examples. And, um, even though it was written, you know, back then in 87, 88, whenever it was, um, uh, you'll find that it's, it's as applicable today as, as it was, you know, back then. Um, and, and they come with, you know, really the, the crux of their argument is that, um, that there are fundamental differences between symbolic systems and connectionist kind of architectures. Right. And they try to, they try to really drill down on what those fundamental differences are and the fundamental differences that, that practically matter for like the capabilities, um, of these systems. Okay. And really it kind of, it hinges on, uh, you know, I guess three main, you know, pillars, although there's much more in the paper than just the discussion of this. But, but one is that, that, uh, symbolic systems are productive, productive systems by which they mean you can like take, like say a formula, for example, that has variables and you can substitute in, you know, structures and you can produce, you can generate, you know, more and more structures from this simple set of rules. 
So think about like a, a context-free grammar, for example, um, you know, you, you've got this kind of set of rules and from that you can generate this, this infinite set of, of, uh, sentences, if you will, that still follow a certain structure. And every single sentence that you generate from that set of rules will, uh, will be consistent with that set of rules. So it'll, it'll have a certain structure, you know, defined by that. And in fact, you can even be given a sentence and then parse it back to what was like the tree that generated it. And so they have this productive nature. Um, and this is, by the way, this is like kind of important to understand here. And we've talked about this a lot too, this is the nature of, of computation is that it's unbounded in time. So you can always sit there and kind of iterate. And like, if you think of the Turing machine, you know, it has like this tape that it can go and write down some symbols and it can go back and expand one and expand it more. And it can kind of keep going for an unbounded amount of time, you know, producing, producing a larger and larger, you know, productive result, if you will. Everybody understands that in reality, every machine that we're going to build is, is finite. Okay. We get that. Okay. But there's a big difference between architecturally, there's a big difference between finite state machines and machines that have this potentially infinite unbounded memory that they can kind of operate on. So the idea that they're saying about these systems are productive. Okay. And secondly, they're also compositional, which is almost the, the opposite of this. Okay. It's saying that, you know, you maintain the parts of a whole. So even though you've, you've created a structure, which is say the sentence, you still have kind of the words in there as these separate entities, and you can go in and pull them apart. 
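The productivity and compositionality points made here can be illustrated with a small sketch. The grammar below is a hypothetical toy invented for this example (its rules, the names `GRAMMAR`, `generate`, `parse` and `_split`, and the depth limit are all assumptions, not anything from the Fodor and Pylyshyn paper). It shows a finite rule set producing ever more sentences as the expansion budget grows, and a sentence being parsed back into the tree that generated it.

```python
from itertools import product

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
# and each alternative is a tuple of symbols. Anything not listed is a word.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("John",), ("Mary",)],
    "VP": [("loves", "NP"), ("believes", "S")],
}

def generate(symbol, depth):
    """Yield every word tuple derivable from `symbol` with at most `depth` nested rule expansions."""
    if symbol not in GRAMMAR:              # terminal word
        yield (symbol,)
        return
    if depth == 0:                         # expansion budget exhausted
        return
    for alternative in GRAMMAR[symbol]:
        parts = [list(generate(s, depth - 1)) for s in alternative]
        for combo in product(*parts):
            yield tuple(word for part in combo for word in part)

def parse(symbol, words):
    """Return a parse tree if `words` can be derived from `symbol`, otherwise None."""
    if symbol not in GRAMMAR:
        return symbol if words == (symbol,) else None
    for alternative in GRAMMAR[symbol]:
        for children in _split(alternative, words):
            return (symbol, *children)     # first successful analysis
    return None

def _split(symbols, words):
    """Yield lists of subtrees, one per symbol, that together cover `words` exactly."""
    if not symbols:
        if not words:
            yield []
        return
    head, rest = symbols[0], symbols[1:]
    # Every symbol covers at least one word, so leave room for the remaining symbols.
    for cut in range(1, len(words) - len(rest) + 1):
        tree = parse(head, words[:cut])
        if tree is not None:
            for tail in _split(rest, words[cut:]):
                yield [tree, *tail]

short = list(generate("S", 3))             # 4 sentences of the form "X loves Y"
longer = list(generate("S", 5))            # 12 sentences, now including "X believes Y loves Z"
tree = parse("S", ("Mary", "believes", "John", "loves", "Mary"))
```

Raising the depth argument keeps yielding new sentences from the same three rules, which is the unbounded, productive side. The tuple returned by `parse` still contains the noun phrases and verb phrases as separate nested parts, which is the compositional side.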
You can analytically reach in, grab pieces of them and look at like, say a phrase that's within that sentence. So if you think about like, say, you know, Boolean satisfiability, like logical formulas, you know, here's like a 3-SAT formula, a bunch of terms and they're ORed together and then ANDed together. And you're asking, you know, you're kind of analyzing whether or not you can assign variables to satisfy this structure. Like this is very kind of NAND gate type, type calculation, logic, you know, suppose you have a part of a, of a program that determines for this particular task, this formula is important here. Like this formula, if, if I operate on all the input variables in this way, and I, and I get a value out of that. Okay. It's, it's useful for something like determining whether or not it's a hot dog or a human face or whatnot. And another part of the network that figures out, you know, or program, if you will, that figures out, well, there's a different, another formula that's useful for some other task. Okay. In symbolic world, you've actually got the formulas and you still have their parts and their pieces. And so you can have like a meta analysis that takes a look at those individual formulas and compares them and goes, oh, look, these two things have the following terms in common. So now maybe those terms are important for some purpose. Whereas in a connectionist network, you know, typically what happens, and there's some caveats here that we can talk about, but typically what happens is you've got a, a neuron, a node that's performing this calculation. It collapses it all into a single output. It's like I either fire or I don't fire with a certain value. From that point on, there's no more parts. Okay. Like it doesn't have a part. It just has a signal. You can't go in there and figure out, okay, well, that 0.3 came from these particular terms for this input, right? There's no way to, to get that back out again. It's been convolved. 
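A minimal sketch of the contrast drawn here, under stated assumptions: the formulas, variable names, weights and the single ReLU unit are all hypothetical, chosen only for illustration, and the sketch covers just the single-node case, not the caveats the speaker mentions. On the symbolic side the formulas keep their parts, so a later analysis can find the terms they share. On the connectionist side two different input patterns collapse to the same scalar.

```python
# Symbolic side: formulas are nested tuples (operator, operand, operand, ...),
# so their parts stay available for later inspection.
formula_a = ("and", ("or", "x1", "x2"), ("or", "x3", ("not", "x1")))
formula_b = ("and", ("or", "x1", "x2"), ("or", "x4", "x5"))

def subterms(formula):
    """Return the set of all sub-formulas of `formula`, including the formula itself."""
    found = {formula}
    if isinstance(formula, tuple):
        for child in formula[1:]:          # formula[0] is the operator name
            found |= subterms(child)
    return found

# A "meta analysis" comparing the two formulas after the fact.
shared = subterms(formula_a) & subterms(formula_b)

# Connectionist side: one unit sums its weighted inputs into a single number.
def unit(inputs, weights, bias=0.0):
    """Weighted sum passed through a ReLU; the result is one scalar with no parts."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)

weights = [0.5, 0.25, 0.25]
out_1 = unit([1.0, 0.0, 0.0], weights)     # only the first input is active
out_2 = unit([0.0, 1.0, 1.0], weights)     # only the other two inputs are active
```

Here `shared` recovers the common clause over `x1` and `x2`, while `out_1` and `out_2` are the same number even though they came from disjoint inputs, so nothing downstream of the unit can tell which terms produced it.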
It's been collapsed. It's been added up. And the problem is you can say, well, yeah, but you know, some other nodes in the neural network, for example, in a connectionist network can have those other terms. Right? But the difficulty is then you wind up with this exponential blow up because how do you know in advance, like, which subterms actually matter? Like, okay, then let's just do all of them. Let's just have a neuron for every single possible combination of the, of the, um, input and it'll fire. And then, you know, some subsequent layer can go and decide, well, I care about these, these subsets. That's where you get this exponential blow up problem from is you can't defer that. You don't have parts that you can go back and piece apart and compare and analyze later when you figure out there was something you needed; you have to have baked in, ahead of time, um, all possible kinds of combinations. Okay. [02:12:26] Speaker 2: Um, and then has this got something to do with intension versus extension? So as I understand the classical AI folks, they want to maintain the intension and they don't want to materialize the output immediately so that in the future, let's say I've done some processing, I can now go back in time and decompose and recompose the, um, uh, the computation. [02:12:52] Speaker 6: Yeah, I think so. I mean, and I'm not sure if this is naive of me or not, I don't know, but I generally think of intension, you know, with the S as, uh, formulas, you know, they're, the way I think about them concretely is as a function from all possible worlds to worlds that, that are true or worlds that are possibly true worlds. Right. So an intension is saying, look, you have this, this infinite, you know, space of possible worlds and a, and a function on that, that gives you a subset is an intension. 
And you never actually need to materialize it as an actual, um, you know, as an actual extension that gives you that, that set of true worlds because you have a procedure, you have an algorithm, you have a formula, which can determine for any given, you know, possible world, does it meet that, that criteria? Like that to me is the difference between intension and extension, which is that the intension is a formula that could give you, that you could iterate or apply, you know, to arbitrarily many, you know, worlds to generate, um, to generate the extension, you know, of that intension. So the intension is like the generating function, right. [02:14:11] Speaker 2: For, for a set of extension or for an extension, but, but critically, the building blocks are there. So if you have the intension, like for example, I use the example of the discrete Fourier transform, so you can represent, represent that symbolically. And now you can change all of the components, you can change N, for example, because as soon as you materialize it, you don't understand it anymore and you can't generalize it into slightly different circumstances. [02:14:36] Speaker 6: Right. Yeah, exactly. And this is what, you know, and this is what the natural, like, let's say the classic natural language understanding folks like Waleed would, would talk about all, all day long, which is that suppose you have a grammar. Okay. It's, it's pretty easy to write down a grammar. You know, I'm not saying it's easy to write down a grammar for natural language. I'm just saying it's easy to write down a grammar, like let's say a context-free grammar or something. And there's an infinite extension to that, that grammar. There's infinitely many, you know, countably infinitely many, um, uh, sentences that, that could be generated by that grammar. And it's quite easy to build a machine that you can give it any one of those, any one of those sentences. Okay. 
And it can churn on it for a while for some, you know, finite period of time and come back and tell you whether or not that sentence comes from this grammar, right? It can decide whether or not that's a sentence in this, in this language. Um, but it doesn't need that infinite extension existing somewhere. It doesn't have to be materialized because it just has an algorithm. It just has a simple, you know, grammar that it can use to figure that out. [02:15:49] Speaker 2: Yeah. Okay. Okay. Well, this makes a lot of sense. So based on reading the LeCun paper and this connectionism paper so far, um, LeCun is saying we need to have, um, a probabilistic-ish, uh, interpretation of possible futures. Mm-hmm. The connectionism paper is saying that we need to have composable, recomposable, decomposable abstractions. Right. And, uh, and this is very, it's almost analogous to what François Chollet talks about with his library of modules and, and, and type two traversal, et cetera. He has this distinct dichotomy between type one and type two. So yeah, it's almost as if we're saying, look, neural networks at the moment, there are two clear opportunities for improvement. And what are, what are those opportunities specifically again? So being able to represent possible futures with some uncertainty quantification. Yep. And secondarily being able to, I mean, I'm, I'm bagging it in with this discrete space, but it doesn't necessarily have to be discrete functions. But what we've just been speaking about from the Pylyshyn paper, composability, rich abstractions, maintaining the, the structure or the intension. Right. And even later on in the computation cycle, being able to introspect about how and why I got it. [02:17:02] Speaker 6: Yeah. 
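The grammar point made earlier in this exchange (a finite grammar can decide membership in an infinite language without that language ever being materialized) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not something from the conversation: the toy grammar for a^n b^n, the names `BINARY_RULES`, `TERMINAL_RULES` and `in_language`, and the choice of the CYK algorithm as the recognizer.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form for the language a^n b^n (n >= 1):
#   S -> A B | A X,  X -> S B,  A -> 'a',  B -> 'b'
# The grammar is finite (the "intension"); the set of sentences it generates
# (the "extension") is infinite and is never built.
BINARY_RULES = {
    ("A", "B"): {"S"},
    ("A", "X"): {"S"},
    ("S", "B"): {"X"},
}
TERMINAL_RULES = {"a": {"A"}, "b": {"B"}}


def in_language(sentence: str, start: str = "S") -> bool:
    """CYK recognizer: decides membership in finite time for any input string."""
    n = len(sentence)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals that derive sentence[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(sentence):
        table[i][0] = set(TERMINAL_RULES.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for pair in product(left, right):
                    table[i][length - 1] |= BINARY_RULES.get(pair, set())
    return start in table[0][n - 1]
```

Calling `in_language("aabb")` consults only the three binary rules and two terminal rules above; changing the grammar changes the whole infinite extension without any sentence list being stored.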
So the other important thing that comes up in there in the, the, the Pylyshyn paper, the Fodor and Pylyshyn paper is, is one thing that's fundamentally different about symbolic systems is that the code, that the, the algorithm and the memory are separated. Okay. Okay. And this is, this is critical to understand here, which is that you have an algorithm that can run on memory. And if it ever runs out of memory, you just need to add on another memory stick. Okay. And it can continue processing with the same algorithm. So the algorithm doesn't fundamentally change if you just enlarge, enlarge the memory. And this is a big difference from connectionist systems, right? Because in a, in a neural network, if you want to increase the size of the input, for example, like double the, the vector space or something, it's all got to be retrained. Okay. Because it's all tightly kind of convolved together, the, the memory and the computation. Um, this is a huge difference between these, these, um, you know, computational paradigms. [02:18:01] Speaker 2: So I think it's important to understand that it is, but I guess if I designed the system, it reminds me of object oriented design in programming and you create these abstractions. Unfortunately, they are handcrafted, although maybe it's possible that some of them are so platonic that they would apply in, in many systems. If you could only recompose them, but the, that ability to separate the storage from the, from the computation seems to depend on that abstraction in how I design the program. And LeCun would say, well, if you have to handcraft the abstractions, then learning's gone out the window. [02:18:42] Speaker 6: I agree with you. I agree with both of you. This is an extremely important problem we need to solve. So once you, once you, it almost, once you go to the route of saying, I'm going to take my algorithm and abstract it from the memory, that's when you run into all these training problems, right? 
Because now you're trying to train systems like differentiable neural computers and, and whatever. Essentially, a Turing machine consists of two parts, really: a finite state machine and an expandable memory. Okay, that, and then it gets iterated and it can sit there and keep operating on it. As soon as you do that separation and you say, okay, well, I'm going to have that finite state machine be a neural network that I can train differentiably. The problem is now that it's abstracted from, or separated, if you will, from its memory, and you try to do things like have attention and, you know, be able to operate on that memory, you run into all these kinds of training problems. So it's almost like, as soon as you, if you've got the memory and the computation all, you know, glued together in a single whole that's finite, you can differentiably train it. As soon as you separate it out and make the memory expandable, now you start running into all these, these training problems, but we need to do that. Like we need to figure out how to train algorithms that are abstracted or not abstracted, but separated from their memory and allow them to learn these abstractions that they can then use to operate on the memory. Like somehow we have to figure out how to do that. [02:20:08] Speaker 2: Chapter seven, we're going to discuss how we rescued the broken recording of Chomsky. So interviewing Professor Chomsky was a dream come true for us. I just never thought something like this would actually happen. And the worst happened, the unimaginable happened, the recording messed up. Have a listen to this. This was the before clip. [02:20:27] Speaker 1: There's a curious distinction, which is empirically known for many years, between sentences. One interpreter each seems to have been assigned to the diplomats. And this was the after clip. There's a curious distinction, which has been empirically known for many years, between sentences. 
Like one interpreter each seems to have been assigned to the diplomats. [02:20:56] Speaker 2: Now, it would be remiss of us not to use our combined expertise in computer science and machine learning and whatnot, and to throw technology at the problem, essentially. So the irony wasn't lost on us that Chomsky believes that deep learning isn't particularly valuable. I mean, I'm being a bit unfair there. He did say that it was particularly valuable for things like speech transcription to help him. He's hearing impaired, for goodness sake. But we don't take L's here on MLST. Taking L's, by the way, is a British colloquialism. It means that we do not accept losing so dramatically. That's why we took it upon ourselves to come up with a solution. You know, we were not going to let the rather minor matter of a corrupted recording get in the way of us realizing our dreams. You know, this was the moment of our lives. And we were not going to let it slip away. Now, we recorded the podcast with Riverside.fm, which is the podcasting platform which is supposed to prevent recording problems from wrecking your show. Ironically, in our case, it did the precise opposite. I feel at this point, like, I could work as the engineering VP at Riverside or something like that because I've got so much experience recovering every single possible failure mode on their platform. I think I'd be quite a useful spare pair of hands to be around there. Now, Riverside, of course, blamed it on Chomsky's hardware, but, I mean, Chomsky does so many podcast appearances. I mean, he probably does more podcast appearances than Gary Marcus writes blogs trashing connectionism. When I listened back to the recording after the show, because as we were recording the show, we could hear that it didn't sound right. And we were all saying to ourselves on the side chat, my God, please, please, God, please make the recording sound okay. And much to our horror, it was completely broken. It was so bad. 
I could just feel the entire life force of my being just draining away in that moment. I could not believe it. If our faces looked really concerned during the interview, it was because we were so petrified about what was going to happen when we played back the recording. So how did we fix this thing? Well, it's quite a long story, to be honest. But even though the recording sounded terrible, the interesting thing was that when we ran it through a transcription service, the results were still reasonably good. So we had the word boundaries and what we wanted to do was synthesize Chomsky saying the same thing. So we painstakingly went through the script. I mean, Keith must have spent about 17 hours word by word filling in the missing gaps, inferring which words Chomsky actually said. So we created a transcript, we then created a voice clone model. So we started off with a Tacotron 2 voice clone, which allowed us to get past the voice authentication system on Overdub. And then we recorded an Overdub voice using mostly recent clips of Chomsky, but also some stuff from about five years ago. So it's probably the voice of a slightly younger Chomsky that you're listening to. We synthesized absolutely everything. But then we've got the lip sync problem. The first thing we tried doing was using LipGAN and that didn't work because Chomsky has a beard and it didn't recognize his face. So we de-bearded Chomsky, we made Chomsky narrate the new script. And while it was a success, I kind of felt it was almost taking the piss a bit too much. It looked like a cartoon character and we want to respect Chomsky, of course, as much as possible. [02:24:09] Speaker 1: The transcription, for example, which I'm very happy about because I like to use it. I like bulldozers too. It's a lot easier than cleaning the snow by hand, but it's not a contribution to science. 
[02:24:22] Speaker 2: So we decided to write a time warping algorithm to align frame by frame the original recording to the synthesized version. And to do that, we just transcribed the synthesized and the original version. We used dynamic time warping. So it's very similar to the Needleman-Wunsch algorithm in bioinformatics. So you just kind of build an alignment matrix and then you compute every single cell as a function of the neighboring cells. And then you trace back through the matrix from the bottom right to the top left. Levenshtein distance uses the same algorithm. So you can kind of keep track of matches, insertions and deletions. So, yeah, we created the best cost, you know, the minimum cost alignment between the two tracks. And then between all of the aligned words, we just did a linear frame interpolation. So when he was saying hello in one script and hello in another script, we just did a linear frame interpolation. And as you can imagine, there's all sorts of things that can go wrong when you do that kind of coding because there's numerical precision problems. Because we were dealing with hundreds of thousands of frames. So it was drifting over time. Maybe in the future, we'll make another video about how we did that. But needless to say, it was in linear time complexity just for all of you recruiting managers at Meta. I'm sure you wanted to know that. Yeah, so we did a pretty good job, but we wanted to stress a few things. So first of all, we will make the original recording available to peruse because I want everyone to be completely clear that these are Chomsky's words. You can tell by the lip sync that it is indeed Chomsky's words. The amazing thing is that with the lip sync, you can actually see Chomsky's physical expression as he was saying the word. So even though technically we deep faked Chomsky, it's amazing how when we synchronized his expression to the generated words, it kind of just seems so real. 
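The alignment step described above (build a matrix, fill each cell from its neighbors, trace back from the bottom right, then interpolate frames linearly between aligned words) can be sketched roughly as follows. This is a minimal illustration, not the code used for the episode: it uses a plain Levenshtein-style word alignment with unit costs, and the names `align_words` and `map_frame`, as well as the idea of representing anchors as `(synthesized_frame, original_frame)` pairs, are assumptions.

```python
def align_words(original, synthesized):
    """Minimum-cost word alignment (Levenshtein-style dynamic programming).

    Returns (i, j) index pairs where original[i] matches synthesized[j].
    """
    n, m = len(original), len(synthesized)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # deletions only
    for j in range(1, m + 1):
        cost[0][j] = j  # insertions only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + (original[i - 1] != synthesized[j - 1])
            cost[i][j] = min(substitute, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to the top-left, keeping matches.
    matches = []
    i, j = n, m
    while i > 0 and j > 0:
        if original[i - 1] == synthesized[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cost[i][j] == cost[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1  # substitution
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1  # word only in the original
        else:
            j -= 1  # word only in the synthesized track
    matches.reverse()
    return matches


def map_frame(frame, anchors):
    """Linearly interpolate a synthesized-track frame onto the original track.

    anchors: sorted (synthesized_frame, original_frame) pairs, e.g. the start
    frames of matched words (hypothetical values in the usage note).
    """
    for (s0, o0), (s1, o1) in zip(anchors, anchors[1:]):
        if s0 <= frame <= s1:
            if s1 == s0:
                return o0
            t = (frame - s0) / (s1 - s0)
            return round(o0 + t * (o1 - o0))
    raise ValueError("frame lies outside the anchored range")
```

For example, with hypothetical anchors `[(0, 0), (10, 20), (30, 30)]`, `map_frame(5, anchors)` lands halfway through the first span of the original track. Interpolating from integer anchor frames each time, rather than accumulating a floating-point offset, is one way to limit the kind of drift mentioned above. Note that this table-based version does work proportional to the product of the two transcript lengths.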
It was like some invisible boundary of reality has been transgressed again. And it really was Chomsky saying all of that stuff. So there are a couple of occasions where we've actually inserted in a little bit of the original Chomsky, even though it was corrupted, just to capture him chuckling or, you know, saying some words. I mean, I think there is a point where he said language models have achieved zero, zero. They've done nothing. And obviously we just wanted to capture the original sentiment of Chomsky when he was saying those things. In certain parts, the generated voice is roughly twice the speed of Chomsky. And we felt that was fair, to be honest, because even if we had recorded Chomsky, I might have sped it up a little bit. It's not so much that Chomsky talks slowly, it's that he has gaps in his speech. I've noticed this before when I've played clips of Chomsky, I've kind of typed up the gaps in his speech. So if anything, it'll make it easier to listen to because there's less gaps in his speech. So I wanted to touch on ethics because deep fakes is a huge topic at the moment. I feel that this is a legitimate use of deep fakes. We've basically used engineering technology, which is something that Chomsky talks about in the podcast, to recover a broken interview. And we got Chomsky's full permission. So this is the email that we got back from Chomsky. By the way, this email means so much to me personally. I'm going to frame this email. I should put this email in my CV. Just doing all of this work, having this story to tell and getting this response from Chomsky, to me, is something that I could tell my grandchildren about. It really is that special to me. So as I said before, you can listen to the original version just to kind of satisfy yourself that we didn't put any words in his mouth. Although Chomsky has checked the synthesized version and given us his blessing. And I'll also make it clear as well that we will be deleting the voice clone. 
So we're not going to use it again. We're not going to give anyone else access to it. It was just a temporary expedient and we will now delete that. You could have said to us, well, why didn't we just record it again with Chomsky? And the reason is just our mental states at the time of that interview. The interview meant so much to us. You can just see it in our face. It was like meeting your childhood hero. And I don't think it would have been the same again if we kind of did it again. I felt that Chomsky's reactions were very novel. I felt that he said a lot of things in this interview with us that he hasn't said anywhere else. And he really trusted us. And that means a lot to us. He entered into a confidence with us at the end of the interview. Saying that, you know, it was a really productive conversation. And some of the conversations he's been having recently are quite, almost tiring for him. You know, because he's constantly having to push shit uphill basically. And we're his friends. You know, like we really, we're really genuinely interested in what he had to say. And in a way we were kind of, I think we were really seeing Chomsky at his best. And that is very important. And that's why I think that we wouldn't have been able to replicate that if we did it again. But anyway, if you do want us to go into more technical detail about how we recovered to that recording, or just in general really. I mean, we are, we're quite an intellectual podcast. By the way, we've been accused of being armchair philosophers. There's lots of gatekeeping apparently going on on the ML Reddit. But yeah, there's quite a funny story as well that we had bated breath, right? So we had spent about a month fixing this recording. It was quite stressful for us because we've got a huge backlog. You know, it took quite a while to get the code running. We weren't working on the intro. Just everything was just getting out of control. Keith and I had a massive argument about it. 
And all of this time I think we were stressed that it would be for nothing because we had asked Chomsky for his permission to publish it. And then he'd say no anyway and it'd be like all of that time was wasted. So, you know, it was a really tricky situation for us because we felt cornered. We felt like there was no other option. This was the only thing that we could have done. So, yeah, it was quite an interesting story to tell at the end of it. Chapter 8, Language. [02:29:57] Speaker 8: People like, for example, Wilhelm von Humboldt and Rousseau, they both grasp the idea that languages are basically infinite, that they're expressions of human creativity. In fact, that's a leading Cartesian idea. At some core level, part of human nature, which is reflected on the cognitive side in things like language, is the capacity to produce and understand and articulate and express new thoughts without limit and without control. So the crucial fact about language use is that it's not determined by our situation. It's coming out of us as freely willed action in some sense and continually novel and so on. And to express thoughts and ideas that are new to oneself and other people, but that are intelligible and appropriate and so on. This is a core aspect of human nature. [02:30:49] Speaker 2: Chomsky is a big believer in autonomy, free will, creativity, and novelty. Shouldn't be entirely surprising, given that he is an anarcho-syndicalist. It's really important for him that we are individual actors that have free will, that is not determined by the situation we're in. Which is why he quite often says that language use is appropriate to the situation, but not caused by the situation. But the really interesting thing that he says about language is that it's an expression of human creativity. Language is an infinite space of possible expression, which is what makes it so remarkable. What is language, and is there even such a thing as a pure language? 
[02:31:30] Speaker 8: English is relatively homogeneous. You can go a long way in the United States. You know, I mean, I just came from Boston, and I understand everybody in Portland, and Seattle, and so on. But that's not true of most of the world. You can get very different languages pretty close by. And much of the world is what we would call multilingual. But what does it mean for the language to be pure? Or when people say they want English to be pure, what are they talking about? Was Shakespeare pure? I mean, first of all, there is no such thing as a language. There are just lots of different ways of speaking that different people have, which are more or less similar to one another. Some of them may have prestige associated with them. For example, some of them may be the speech of a conquering group, or a wealthy group, or a priestly caste, or one thing or another. And we may decide, okay, those are the good ones, and some other one is the bad one. But if social and political relations reversed, we'd make the opposite conclusions. [02:32:25] Speaker 2: Chomsky is often referred to as the father of modern linguistics. So for us, having this rare opportunity to discuss minds and machines, linguistics and cognition with Professor Noam Chomsky is literally like having the chance to discuss syllogisms with Socrates, or having the chance to discuss the mind-body dualism with René Descartes. It really is that fascinating, and it really did happen. Professor Chomsky was so humble as to give us some of his precious time to discuss many contemporary issues, especially as they relate to many hot topics in artificial intelligence. Chomsky's goal as a linguist is to find principles that are common to all languages, which allow people to creatively speak freely and understand each other. Chomsky's work is so much more than linguistics. 
He actually thinks that linguistics should be a branch of psychology, and that so much about our language actually determines how we behave as human beings. I think finding the principles common to all languages and understanding what enables us to speak freely, and importantly, creatively, is Noam Chomsky's number one goal in life as a linguist. Chomsky said that when we study human language, we're approaching what some might call the essence of humanity, the human essence, the distinctive qualities of mind that are, so far as we know, unique to humankind. Hey folks, I really hope you enjoy the show with Noam Chomsky today. I mean, as you can tell from the introduction, this has been an emotional rollercoaster for us just over the last couple of months or so. We've done so much stuff to recover the recording, to build the intro. It's a slightly new domain for us as well, getting into cognitive psychology. But anyway, please hit the like and subscribe button, drop us a comment, let us know what you think. We've got six incredible shows coming up. We're building introductions for them as we speak. You know, the likes of Joscha Bach and David Ha and many others. I don't want to spoil the surprise. So, yeah, hit the subscribe button and I really hope you enjoy the show today. Cheers. Chapter nine, the chapter that you've been waiting for. This is our discussion with Chomsky. Enjoy. Professor Chomsky is an American linguist, professor, cognitive scientist, social critic and political activist, sometimes called the father of modern linguistics and is the most cited living academic. He's a laureate professor of linguistics at the University of Arizona and an institute professor emeritus at MIT. Some of the big names which Professor Chomsky has influenced include Steven Pinker, Jerry Fodor, George Lakoff and Barbara Partee. Professor Chomsky, it's an absolute honour to welcome you to MLST. This is a dream come true for us. 
I've still got 10 of your books on my bookshelf and I can't even believe we have this honour of speaking with you today. Very pleased to be with you. Large language models such as GPT-3 are receiving huge investment and are being hyped beyond belief. This is happening despite very strong theoretical arguments for the futility of learning language from data alone. The combinatorial complexity of language is on a scale which would eclipse any earthly data set. There's also this problem of the so-called missing text. That is to say, human cognition extrapolates from common knowledge in order to understand text. We can ascertain background knowledge which is never actually communicated in the text. We believe that research into large language models is what François Chollet recently called make-believe AI and is thus the road to nowhere. Gary Marcus even calls it a parlor trick. Assuming that you do believe that large language models are not the solution for natural language understanding, which paradigm do you think is the most promising? [02:36:31] Speaker 1: Well, first we should ask the question whether large language models have achieved anything, anything in this domain. Answer, no, they've achieved zero. So to talk about the failures, that's beside the point. Let me give you an analogy. Suppose that I submitted an article to a physics journal saying, I've got a fantastic new theory and it accommodates all the laws of nature, the ones that are known, the ones that have yet to be discovered. And it's such an elegant theory that I can say it in two words. Anything goes. Okay. That includes all the laws of nature. The ones we know. The ones we do not know yet. Everything. What's the problem? The problem is they're not going to accept the paper because when you have a theory, there are two kinds of questions you have to ask. Why are things this way? Why are things not that way? If you don't get the second question, you've done nothing. GPT-3 has done nothing. 
With a supercomputer, it can look at 45 terabytes of data and find some superficial regularities, which then it can imitate. And it can do the same with all languages. If I make up a language which violates every principle of language, with 45 terabytes of data, the same supercomputer, it will do the same thing. In fact, it's exactly like a physics paper that says anything goes. So there's no point in looking at its deficiencies because it does nothing. All it does is waste a lot of energy in California. I should be more careful. It has some engineering applications that can be used to improve live transcription, for example, which I'm very happy about because I like to use it. I like bulldozers too. It's a lot easier than cleaning the snow by hand. But it's not a contribution to science. So it's okay? I mean, if you want to use up all the energy in California to improve live transcription. Well, okay. GPT-4 is coming along, which is supposed to have a trillion parameters. It will be exactly the same. It'll use even more energy and achieve exactly nothing for the same reasons. So there's nothing to discuss. It's exciting for the reporters in the New York Times. You probably saw the lead article in the Times Magazine a couple of weeks ago. They're absolutely ecstatic. We now have machines, just like a human. You can fool reporters, but you shouldn't be able to fool computer scientists. [02:39:04] Speaker 4: Yeah, first of all, I can't say how much of an honor this is, Professor. I mean, it goes without saying that we think you're one of the people that know something about language, unlike what we hear these days. So, as you can imagine, there are many questions that I can ask, but I'm going to ask a question that is about the current dominant paradigm. I'd like to know your thoughts on the current rise of connectionism, or the resurgence of connectionism, let's put it that way, and the ostensible success of deep learning. 
And specifically, I'd like to know, do you think the classic Fodor and Pylyshyn critique that was written in the classic paper, Connectionism and Cognitive Architecture: A Critical Analysis? Do you think the critique there has been answered? Or do you think the success of deep learning has been illusory? [02:40:14] Speaker 1: Well, I think there's a good answer to this question in an interesting research paper by a very good cognitive scientist at Northeastern University, Iris Berent. She had a study which essentially shows that in brief, empiricism is innate. She found that with studying children, adults, and so on, they're automatically driven to radical empiricist conclusions. It's just something that comes naturally to us. Okay, that's connectionism. No matter how much it's refuted, it's all going to come back because it's an instinct. Our instinct is to try to find something like that. It's kind of like what happened in the 17th century. The problem in the 17th century was the nature of motion. That was called the hard problem in those days. How can you account for the fact that without contact you can make things move? And there was already a mechanical science developed by Descartes and I believe Galileo, Newton, and everyone. Actually, Newton, in his Principia, showed that it doesn't work. There are no machines. Nothing works like the machine does. It's an invention, but it's not real. It was very hard to deal with. Newton himself regarded that conclusion as a total absurdity and spent the rest of his life trying to refute it. Leibniz, Christiaan Huygens, the great scientists of the day, they just dismissed this as ridiculous and there's a reason. A mechanical science is intuitive. That's what we think about things. Study infants. Put two bars near each other but not touching. And if they move together, the infant will assume there's a connection. That's the way we're built. No? It took a long time for physics to realize the world doesn't work that way. 
The way we intuitively think about things is just not the way the world is. Maybe someday cognitive science will reach the level of physics in the 19th century and recognize that our intuitive concept of the world isn't the way it works. So the Fodor-Pylyshyn critique, to the point, was accurate but it barely touches the surface. The whole approach is radically wrong. Everything we know about learning totally refutes it. And what we call learning is mostly a kind of growth. It's the growth of natural instincts in one or another way under the triggering and slightly shaping effect of experience. The entire framework of these things is wrong. And you can see it very clearly in the case of use of language. It's very easy to show. By now it's even been experimentally demonstrated that from infancy, as early as you can test, less than two years old, children are ignoring 100% of the evidence that they're presented with and relying totally on mental constructions that they never perceive. I mean I can give you examples but it's, we all do it all the time in our use of language. We simply ignore all the data and use mental constructions and infants do it as soon as they're tested. It's not learning. This is just instinctive behavior. That's the way our visual system develops. It's the way you come to walk. The way your immune system develops and language develops and other things. Other aspects of knowledge. In fact there are many things we know about language that we don't even know that we know. We can't introspect into them. You have to do experimental work to show that people know them. That just, I mean that's just like other aspects of the organism. You don't expect to be able to introspect into the functioning of your enteric nervous system. The so-called second brain, the huge nervous system very much like this one which is just down here and runs most of your body. An enormous nervous system that has billions of neurons which you can't introspect into. 
But why should you be able to introspect into what's going on in here? You can't. You have to study it from the outside, the way you study everything else. Well, that's hard. Philosophers won't accept it at all. They totally reject it just like they rejected Newton as obviously absurd. Cognitive scientists occasionally look at it but don't really think about it. The rest just aren't interested. I mean, we're back in the 16th century in these fields and we have to break out of that. It's not easy. It took a long time for physics to break out of it. In fact, Newton's theories couldn't be taught at his own university at Cambridge for about 50 years after his death because they were so obviously absurd. Well, if it's hard for physics it's going to be harder for cognitive scientists. [02:44:33] Speaker 4: As a quick follow-up, connectionism, symbolic, that debate has gone on for a long time. Do you think there's any credibility to what people call hybrid or neurosymbolic and it goes under different labels. Do you think there's anything to that approach at least? Okay, learning everything from data probably is in some people's mind not practical but is there anything that neurosymbolic approaches or hybrid approaches can deliver to the whole debate? [02:45:17] Speaker 1: Yeah, there's a lot of extremely intelligent, exciting work. It's not trivial work. You know, there was a lot of thought and understanding. Mathematical sophistication and so on in this work. It just doesn't happen to be contributing to science. It's contributing to other things like deep learning approaches have been very useful in protein folding, for example. They've really advanced understanding there. It's a good engineering technique that is. I mean, I'm not at all critical of engineering. I spent most of my life at the world's leading engineering institute. MIT, it's terrific. 
You know, I mean, it's useful for things like the Google Translate, live transcription, speech recognition. There are engineering projects that are significantly advanced by these methods and that's all to the good. I think that engineering is not a trivial field. It takes intelligence, invention, creativity, these great achievements. Does it contribute to science? Actually, I think there was an interesting transition at MIT where I was most all my life. In the 1950s when I got there, that was the time when AI was beginning: Marvin Minsky, Herb Simon, other people, Alan Turing. In their view, AI was supposed to be a study of the nature of intelligence. It was a scientific field. By now, that's disappeared. Not anybody's interested. But MIT at that time was an engineering school. There were great people in math and physics, but they were basically teachers and engineers. It changed in about 10 years. By the mid-1960s, 10 years later, MIT was a science university. Engineering was unified. Every student, no matter what engineering discipline they wanted to go into, took the same fundamental courses in science and math. You take basic physics, basic chemistry, biology, math, and then later on you apply it in aeronautical engineering or mechanical engineering, whatever you're interested in. That was a huge transition, totally changed the nature of the institution. They brought humanities in for the first time because the science students were interested in humanities. And what really happened was that for the first time, the basic sciences had something to teach to engineers. It hadn't happened before. If in the 1950s you wanted to build a bridge or construct a lecture theater or something, you just did it via skills that had been developed in the engineering profession. Well, in the 1960s that was no longer the case. Physics and math really had something to tell you. So you had to know something in order to move ahead. Well, that took a long time. 
You know, that's physics after it was a developed field. We're a long way away from that in the cognitive sciences. Unfortunately, that kind of work that people like Herb Simon and Marvin Minsky were interested in has pretty much disappeared from AI. It's become basically an engineering field. Well, it has plenty of achievements. As I say, engineering is a very noble profession. It just doesn't contribute to science. [02:48:37] Speaker 6: Professor Chomsky, other than the issues with large language models we discussed, there is also the, let's say, minor matter that we're not silicon. Our biological wetware implements a kind of hybrid analog and digital computation which might realize aspects that are effectively impossible to replicate in digital circuits alone. Sir Roger Penrose goes as far as to hypothesize that our brains take advantage of quantum properties to access non-computable oracles, making our brains what Turing would have called oracle machines. We'd like to ask where you stand on these points. Perhaps you are a computationalist who believes human cognition can be digitally replicated in silico, or maybe you are open to the possibility that human brains are hypercomputers of some kind. So what do you think? [02:49:32] Speaker 1: Well, first of all, I'm completely incompetent to have any opinion about Roger Penrose's theories about quantum properties. You know, I have no idea. He's a smart guy. Obviously, you have to pay attention, but I frankly don't think it matters. Now, at this stage of understanding, I don't see any reason to question the fact that we are organic creatures like the rest of nature, that whatever's going on here is some property of organic matter, whatever matter is. And then, if it could be duplicated in a silicon system, it essentially wouldn't tell us anything. 
It would tell us there's some general properties of this organic system, which also exist in some other system, maybe. But since we don't know what's going on up here, I don't see a lot of point in speculating about it. But the very basic questions to deal with about the nature of what we know about authentic language, our ability to deal with what you and I are doing here, there are fundamental questions about that, and unless we have some grasp of those, I really don't see a lot of point in speculating about quantum theoretic properties or possible silicon duplicates of what we don't understand here. So, yeah, they're possible questions. They just don't seem to be, at least to me, on the research agenda. Now, Penrose, of course, seems to be thinking about a serious problem, a problem about memory. Memory models that are studied are mostly in the neural net models, and in fact, deep learning is based on those. There is a serious question about whether neural net models are even the right place to look. I think here Randy Gallistel's work is very significant, arguing that if you look at neural net models, they simply don't have the capacity to have the basic elements of a Turing machine. The core of computing is going to be some form of Turing machine. And he's argued, I think, pretty persuasively, you simply can't find those elements in neural net models, no matter how you proliferate them. Penrose has recently picked up the same idea. He's argued that, as Gallistel has, computing is not taking place in neural net models, that there's a lot more reasons to think so. Rather, it's at a much deeper level, you know, maybe even in RNA. If you look deep inside the cell, there's huge computing capacity. It goes way beyond what you can achieve in neural nets. It also isn't troubled. There's a big problem with neural nets that goes back to Helmholtz. They're damn slow. 
Neural transmission is slow, of course, not by our standards, but by the standards of anything you need for computing. And if you go down to, I mean, there already is work showing that Purkinje cells have huge computing capacity internally, just, they're big, very big cells, internally, without any external connections. And maybe that's the source of where compute is really going on in the brain. I think that's the kind of thing that Penrose is doing. He argues that at that level, you do have quantum effects. Well, maybe so, I can't make any judgment about that. But there's a larger problem behind this work: the question of whether the whole framework of neural net models is even appropriate. The way Gallistel puts it, we're like the drunk looking under the wrong lamppost, you know, because that's where the light is. [02:53:18] Speaker 2: You believe that there are limits of human understanding, mysteries of nature, which human intelligence may never grasp and cannot formalize. Professor Kenneth Stanley goes further and claims that the veneer of formalism, in particular the formalism of metrics and objectives, may paradoxically impede scientific progress by blinding us to creativity and serendipity in exploration and learning. He believes that open-ended exploration, something which he calls treasure hunting, is necessary to find valuable stepping stones which might lead to greatness, that is to say, stepping stones which formal objectives would have blocked us from discovering. I just wondered, what do you think of this view? [02:54:05] Speaker 1: Well, first, are there questions that can be formulated that are outside of our cognitive range? I think it would be a miracle if it's not true. Unless we're angels, that's going to be true. If we are organic creatures, part of the organic world, then there'll be scope and limits to our capacities. In fact, the scope and limits are related. 
So, I have the capacity to walk, much faster than a chimpanzee, much better than an eagle, but by the same token, I can't fly or jump around trees. The same intrinsic characteristics that provide me with capacities also impose limits. Well, that's almost a bit of logic. Not exactly, but it's pretty close. So, if we are organic creatures, we're going to be like other organic creatures, in that there are bounds to our cognitive capacities. So, for example, a rat can be trained to run pretty complicated mazes, but it can't be trained to learn a prime number maze. Turn right at every prime number: it just doesn't have the concept, and no matter how much training you do, you're not going to get anywhere. Well, I suspect there's reasons to suppose we're like rats. We have capacities, we have a nature, we have a structure, they yield all sorts of extensive range of things that we can do, but they probably impose limits. And I think we could even make some guess about what these limits are. Actually, one of them was suggested pretty strongly in the 17th century. I mean, we're not any smarter than Newton, Galileo, Leibniz. Nothing relevant has been learned to help alleviate their concerns, and I think we have the same concerns. To me, it's as much of an absurdity as it was to Newton that I can move the moon by raising my hand. Total absurdity. Okay, they regarded it as an absurdity. Leibniz did, Galileo did. They wanted an explanation in terms of an intelligible universe. That was the goal of early modern science. Let's find an intelligible universe, and an intelligible universe meant mechanical science, something that skilled artisans could construct, like incredible clocks, other objects that skilled artisans were constructing in Europe at the time, which almost acted like humans. So that's the way the world is, directed by a super skilled artisan. 
If you were a deist, then it was a retired engineer who set it up, then went home and left it to run by itself. That was the big issue at the time. But the point is the universe ought to be intelligible. Newton showed that the universe is not intelligible. And what happened after Newton took a long time. Science just reduced its aspirations. It doesn't seek to find an intelligible universe. It just seeks to find intelligible theories about the universe. So Leibniz could understand Newton's theories. They were not unintelligible. It was the world that they were describing that was unintelligible. Well, that's a big shift in the nature of science. It wasn't particularly recognized, but it just became tacit. You don't even look for intelligibility anymore. You want a theory that meets the conditions of intelligibility for a theory. For example, we get to what we started with. A theory is no good at all unless it tells you why things are not this way. Otherwise, it's not a theory in the least, like GPT-3, deep learning, and their approaches. They don't enter the domain of theories. You can even look at it. So there's nothing to say about them, like my anything goes theory. That's a condition that theories have to meet, but what the world does, we have nothing to say about it. Whatever crazy things physicists come up with, okay, if that's the way it is, that's the way it is. You know, it's not intelligible, too bad for my cognitive capacities. Well, what are the mysteries in this universe that are beyond our scope? I think we can make some guesses. There are questions that have been asked for thousands of years where we have made zero progress. Not even bad ideas about them. One of them is the 17th century hard problem. Motion. We've given up on that. Motion is what our physicists tell us. If it's gravitons in a quantum system, okay, then that's what it is. It's beyond our capacity to conceive of, except we understand the theory. So I think that's one candidate. 
Another candidate is what you and I are now doing. That's been a problem for thousands of years. How can we be constructing in our minds infinitely many thoughts, picking one out of them? How do we do that? And then communicating it in a way which allows others who have no access to our thoughts to grasp what's internal to our minds. How on earth do we do this? Galileo regarded this as one of the great miracles of the universe and as totally beyond our understanding. Galileo regarded the alphabet as the most spectacular of human inventions because it somehow captured this miracle: with a finite number of symbols, you can not only construct an infinite number of thoughts, which is miraculous enough, but you can also pick one of them out and use it to convey to the others the internal workings of your mind. We have absolutely no idea, not even bad ideas, about how any of this can go on. In fact, we don't even have any idea of how I can decide to lift my little finger. None. It's just a total mystery. Of course, you can make claims about it, but you can't do anything about it. Well, maybe it's just beyond our cognitive capacities, and I think there are examples like that where we just hit a blank wall, we can't do anything, whether there's further things to say about it. I mean, even ordinary normal creativity, the kind that goes on, like in speaking, normal speaking is a highly creative act, a scientific invention is a greater creative act, great art is an even greater act, but all those things are totally beyond our comprehension. From lifting my little finger to writing a Beethoven quartet, we haven't a clue, we're talking, we just have nothing else to say about it. [03:00:35] Speaker 6: Fascinating. So, cognitive horizon. This is regarding the many theories of semantics that have cropped up over the years. For example, truth-conditional semantics, logical semantics, ontological semantics, etc. 
Which, if any, paradigms of semantics do you think are headed in the right direction as far as getting us closer to an actual science of semantics? Or will we ever have a formal science of semantics as Montague thought? [03:01:08] Speaker 1: Well, I think there's very rich, exciting work in what's called semantics. It's been one of the most lively fields of theory in linguistics, philosophy, cognitive science, and yours. You mentioned Barbara Partee earlier, one of the pioneers in this field. Great work. It's not semantics, it's syntactics. It's all study of symbolic manipulations that go on in the mind. Suppose you do model-theoretic semantics, the kind Barbara Partee does. How do you do model-theoretic semantics? What you do is identify certain individuals and certain predicates, and you ask how the predicates are distributed over the individuals under various conditions. What are the individuals? Mental objects, not things in the world. They are mental objects of something. Do they correspond to anything in the world? Very loosely. If you actually look carefully at the meanings of words, there's a very loose connection to anything in the outside world. Take Aristotle's example. He discusses this. His example is house. So what's a house? Well, in his metaphysics, house is a combination of form and matter. The matter of a house is the bricks, the timber, things that a physicist could find. The form of the house is the intention of the designer, the characteristic use, things that are in the mind. In fact, that's what a house is. The thing could look exactly like a house to a physicist and not be a house. It could be a library, could be a stable, could be a paperweight for a giant, you know, it could be anything. Because the meaning of every word is largely a matter of our conceptual structures. And that's true of the simplest words that you find. Actually, the first example that was used in physical philosophy was river. 
Heraclitus, pre-Socratic, asked, how can you cross the same river twice? It's a pretty deep question if you think about it. The second time you cross it, it's a totally different physical object. It isn't the same river when you start looking at that. The form, what we construct in our minds, is what constitutes river. I happen to live in Arizona now. On my way to the university, I cross something called the Rillito River. I have yet to see a drop of water. Old-timers tell me if you could go with them on sand, there's sort of water flowing. Though it's the Rillito River, if it got paved over and started to be used for commuting, it would be the Rillito Highway. It's the same object. And that's true for every word in the language. There is simply no semantics in natural language, at least semantics in the sense of Frege, Peirce, Carnap, Quine, any formal semantics. It just doesn't exist in language. They have mental operations going on that have some loose relation to the outside world, but it's not truth. And it's not reference. Those just don't occur. So, what's the best approach to this? My own view, the most productive approaches are what are called event semantics, neo-Davidsonian, developed by Paul Pietroski, Barry Schein, and a number of others, which essentially started with the question like, if we say, John read the book quickly, why can we infer that John read the book? Okay, that was the original question, and the proposed answer is, there's an event, reading, there's an agent, John, there's a patient, book, and there's an adverb, the modifier of the event, quickly. If you analyze it that way, that's just a concatenation, and you get the inferences. That's been developed extensively by people like Pietroski and Schein, among others. That happens to fit very naturally to what, I think, we're coming to understand as pure syntax. 
It seems that that's the way pure syntax provides structures of that nature, which fits very naturally into event semantics. But notice the event semantics is syntax. When you talk about an event, it's not anything in the world, it's something that we construct in our minds. There was a gentleman named Zeno who taught us something about that. How many events are there when I cross a room? As many as you decide to put there. You know, there's no end, up to the power of the continuum. So event semantics, I think, is productive as a form of syntax. Then comes another question. How do all these things going on in our mind relate to the outside world? That's one of those questions that I don't think we have any answer to. Now we're back to Galileo's problem. How we do these things, we don't know. We do them. We do a lot of things. But we have no understanding. We'll probably never have an understanding of it. [03:06:31] Speaker 4: I'd like to ask you, professor, what do you think is the relationship between what you have called universal grammar or the I-language and Fodor's language of thought, which has been also quite a theory in linguistics and cognitive science. And if we can suppose that both assume an innate system endowed by genetics and or the laws of nature, if that is a similarity or is it, and if not, and if it is, how do they differ? So basically, how are they similar and how are they different? [03:07:19] Speaker 1: That gets to the heart of current advanced inquiry. In my opinion, Stuart, Jerry Fodor was a close personal friend. We talked about these things all the time. But ask yourself, what's the language of thought? As far as I know, it's English. Do you know anything about the language of thought that isn't English? Not in Jerry Fodor's work, just English. Well, of course it's not English, but it should be what is common to human languages. Whatever is common to human languages, that should be the language of thought. 
Well, what's common to human languages? Universal grammar. That's just its definition. That's the name of what is the core of all languages. Whatever it turns out to be, asking what is universal grammar is asking what are the laws of nature. Well, try to find them, you know. Make the best guesses. You know, find out they're wrong, find better ones, and so on. That's universal. But there is debate in the cognitive science literature about whether universal grammar exists. It's another illustration of the pre-scientific character of cognitive science. The question is meaningless. There's something that distinguishes a human infant from a chimpanzee with regard to language, okay? If you don't agree with that, you're a flat earther. If you agree with it, the next question is what is it? Answer: universal grammar, whatever that turns out to be. The question about its existence doesn't even arise. I mean, it's like arguing with somebody who says everything's done by angels. Useless discussion, you know. So the question is what is it that's distinctive about human language, that's inborn. There is a long tradition going back to Aristotle, in fact, right through the centuries into the 20th century, which assumed that what a language is, is a system for generating thought. Language was sometimes defined as audible thought. You know, we know audible is too narrow. Now we know it can be signed, or in other sensorimotor systems. It's irrelevant. It's like a computer program that can be hooked up to any printer. It doesn't care, but that's the internal thing, like the computer program or the I-language. That's a system of thought, and the language of thought will be whatever that system happens to compute, probably identical among humans. As far as we know, there's no distinction among humans in the capacity to acquire a language. Any infant, as far as we know, can acquire any language with equal facility, so it's probably uniform, which would not be very surprising. 
But humans are a very recent species, a couple hundred thousand years. That's a flick of an eye in evolutionary time. And we know from genomic evidence that humans began to separate on the order of 150,000 years ago. That means there's a very narrow window, and they all share the language faculty equally. There's no evidence that it existed at all before modern Homo sapiens. So there's a very narrow window in which it seems to have emerged. Probably hasn't changed since. So we have certain expectations. It should be something very simple, something that just follows from natural law. If you look at the way evolution works, not stories, actual evolution, it basically has three stages. The first stage is you have a system functioning. Some random disruption takes place, really random, a mutation, gene transfer, some bacterium by accident swallows another microorganism, gives you variety, eukaryotic cells, complex life, you know, just random events take place and they change the structure of the system. And Mother Nature comes along at this stage and finds the simplest solution to whatever development, what Einstein once called the miracle creed. Linus' law of least effort. It always seems to work in every branch of science, whenever you understand anything. Turns out it was the simplest solution. So that's the way Mother Nature works. Can't really give an explanation, but it's so overwhelmingly supported that nobody even questions it. And if you don't have the simplest solution, you figure you're wrong. You know, that's the ordinary scientific way. And as I say, Einstein just called it the miracle creed, that's the way it is, that's how nature is. So we'd expect that when some event took place, a random event which provided Homo sapiens with the capacity for recursive enumeration, the fundamental property of a computational system. No other organism has it. It's nowhere. You know, it's uniform in humans. 
Mother Nature came along and said, okay, let's find the simplest way of handling recursive enumeration with some special conditions. Namely, it has to produce thought. So it has to have some kind of, at least primitive, way of having conceptual entities which enter into the thought, like probably event semantics, like maybe we conceive the world in terms of events, agents, patients, modifications, so on. Put that together with recursive enumeration. Find the simplest possible solution. That ought to be universal grammar. The task of researchers in linguistics and cognitive science ought to be: see if you can show that the simplest possible solution to this conundrum yields an explanation for the phenomena of language. That's the task of the field. Almost nobody's interested in it. You can count the number of people on the fingers of one hand, but that's what the field ought to be, and I think they're progressing in that. I think we're maybe entering a new era where, for the first time, first time ever, we can give genuine explanations for fundamental properties of language. In fact, one of them I mentioned, the most striking dramatic feature of language is what's called structure dependence, the fact that from infancy, every human understands unconsciously that all the rules of language, all operations in language, have to ignore the order of words and deal just with structures. So ignore everything you've heard, deal with the abstract structures in your mind. You can demonstrate this directly, overwhelmingly. That's the way it works. We now have an explanation for it. It turns out that that's what follows from the simplest combinatorial operation. The simplest combinatorial operation happens to be binary set formation, what's called Merge in contemporary literature. Well, if language is based on binary set formation, you get this property. No linear order, just structures. 
So we have, for the first time ever, a deep explanation for the most fundamental property of language, which is a very surprising property, which tells you something about learning, cognition and so on. Almost nobody's interested in it. You take a look at the literature in cognitive science. There's an endless number of papers trying to show that by massive statistical analysis of huge amounts of data, you can begin to approximate it, but you can explain nothing. I mean, of course they all failed. It's not interesting. Why try it in the first place? They have a perfect explanation, the best possible, for some fundamental mental property. What's the point of trying to see if a couple of supercomputers and massive amounts of data can approximate it? I mean, it's madness, you know, but that's the field that we're in. It's madness. That's it. And it's very hard to get this across. I mean, it's not of interest to people. The idea of finding an explanation for something, it's just not of interest. I mean, it was for Turing, was for Marv Minsky, who I know pretty well, Herb Simon, the other pioneers, McCarthy, pioneers of AI, you know, that was interesting to them. They tried. Well, it wasn't me at that time, and it was given up. And, um, as you know, in the field by now that's considered old-fashioned nonsense. We don't care about that stuff anymore. It's, uh, in other words, we don't care about anything of any interest. We just want little things that make some money. Okay. That's okay. That's what you are. That's probably where the field will develop. That's where the money is, you know, and the jobs. But it's a sham. I think some people try to hang on to the old ideas since they want to do something of intellectual interest. [03:16:14] Speaker 2: Which profound misunderstandings of language and linguistics persist even at the highest levels of the scientific community? Do you mind that many of your own scientific ideas are widely misunderstood? 
[03:16:30] Speaker 1: Totally misunderstood. It's amazing to me. So there's one paper of mine that I presume you know about. It was in 1956, Three Models for the Description of Language, that did enter the literature. I've been very interested in the fact that for 60 years the paper has been totally misunderstood, and no matter how many times I try to explain it, I can't make any progress. You probably haven't looked at that paper. But you get material drawn from it in an elementary introduction to cognitive science, what's called the Chomsky hierarchy, which is all stuff I said you shouldn't be looking at. That's what the paper said. The paper had three models. One of them was Markovian sources, which everybody was using at the time, and I argued it can't work. The second model was rewriting systems, Post rewriting systems, of which if you put some conditions on you get context-sensitive, context-free grammars, finite automata, so the hierarchy. There was a third model. That's why it's called three models. The third model was the only one I thought made any sense because it began to provide some explanations for things. The other ones were just descriptive models, wrong descriptive models. The only thing that's come out of that paper is the wrong models. There's huge literature on context-free grammar, context-sensitive grammar. I've written about it. It raises some interesting questions about automata theory. You know, context-free grammars happen to be broadly equivalent to push-down storage automata. Kind of a useful result, but basically they tell you nothing about language. These models just don't apply to language. They're the only ones that have been studied. And in all the literature in the last 60 years, nobody's noticed there's a third model. The paper's called three models, and the point of the paper is to show that the other models just don't work. Well, that's only 60 years. Maybe somebody will notice. 
But the more interesting question was the first part. What are the misunderstandings? And I think a useful way to look at it is to think of what have been major problems. They've actually been given names. So let's use the name, the Plato problem. How can we know so much with so little evidence? Well, the problem is badly misunderstood. And we started off by talking about one of the misunderstandings, the idea that if you have enough data, you know, trillions and trillions of it, and you have a battery of supercomputers working, you're going to deal with this problem. No, you're not. You're going to get nowhere. You can show in advance that you're going to get nowhere. Okay, that's one misinterpretation which holds not just for computer science, it holds for philosophy, linguistics, total misunderstanding. I think probably for our experiential reasons, we're instinctively radical and persists. And it's hard to get out of that. It took the 17th and 18th century for physics to break out of it. Another problem is what's sometimes called Darwin's problem. How do we get this language system? It's unique. It's common to humans. No variation as far as we know. No trace of it in any other organism. There's a lot of time wasted trying to train poor chimpanzees to duplicate some of these things, which makes about as much sense as trying to train graduate students to do the waggle dance of the bees. Well, you could train them to somehow mimic it maybe, but it would be idiotic. It's equally idiotic to try to train a chimpanzee to do what we're doing. Well, for the first one, the graduate students, you can't get an NSF grant for that because it's so obviously idiotic. The second one, which is equally idiotic, you can get grants for, a lot of people working on it and so on. That's part of the overwhelming irrationality of the way the human mind is studied. That's Darwin's problem. Now the last problem is what's now called Descartes' problem. 
How can we do what we're doing right now? How can we speak in ways which are appropriate to situations but not caused by them? That's a huge problem, which is basically the question of freedom of choice. How can we do it, every day? We're involved in it all our lives. If you look at the philosophical literature, the scientific literature, it's sort of interesting. Virtually 100% of people who think about it say we're all determined. We're all thermostats. Everything we do is totally determined. You look at the behavior of the same people, 100% of the time they act as if they're not determined. So, it's this way constantly, and even the people who give arguments saying we're determined are tacitly assuming we're not determined. Otherwise, why give the argument, you know, if it's determined, if it's a thermostat, why bother? So, it goes all the way up to Einstein, who gave arguments trying to show that everything's determined, proving by his effort to do it that he was not determined. Okay, so here we have a kind of a paradox. Everybody says we're automatic. Everybody acts all the time as if we're not automatic. Well, does science say anything about this? Nothing. Science says we can't deal with it. It says we can deal with determinism. We can deal with randomness, but we can't deal with things that are not determined or random. So, here's the situation. Some extraterrestrial intelligence is looking at us. These strange beasts down here, 100% of the time it says it's determined. 100% of the time it acts as if it's not determined. It believes science which tells it nothing. Alright, is it a misunderstanding? Sure seems to me like a deep one. So, if you go to Plato's problem, Darwin's problem, Descartes' problem, you see very profound misunderstandings which dominate the fields, all the fields. Philosophy, which is supposed to be sophisticated, is probably the worst of all. Maybe I can add one point about confusion. 
If you look at the philosophical literature today, there's something that everyone's obsessed with. It's called the hard problem. The problem is consciousness. What's it like to see the sun rise? Let's go back to the 17th century, which is an interesting century. It's the birth of modern science. They had a hard problem, as it was called. The hard problem was motion. How was the hard problem dealt with in the 17th century? Well, properties of motion were formulated. They said, here are some properties of motion. You know, Galileo's experiments, and so on. Actually, thought experiments, he never carried them out. If he tried to carry them out, it never would have worked. They were thought experiments. You know, you drop a ball from the top of the mast of a moving sailboat, and it falls to the base, not to the back. Thought experiment. If he tried, he would have got craziness. But there were properties of motion that were established, and then came the hard problem. How can we explain the properties of motion? Well, the answer was we can't, so we give it up. We moved to something else, trying to do theories of the properties of motion without understanding the properties. That's the 17th century. Now let's go to the 20th and 21st century. There's something called the hard problem. What's it like to see the sun rise? There's a step missing, the step that was taken in the 17th century. What are the properties of what it's like to see the sun rise? I can't tell you. I could write a story about it, write a poem about it, but I can't say here are the properties of what it's like to see the sun rise. So what's being posed is an unanswerable question. A question that can't be answered. So if you pose an unanswerable question, you're not going to get an answer. The question can be answered only if you can formulate it, if you can say here's what I'm trying to explain. But you got this huge literature, total obsession in philosophy of mind. 
How can we answer an unanswerable question? I mean, one philosopher, a young philosopher, Galen Strawson, commented that the 20th century must be the silliest century in the history of philosophy. He's exaggerating, but not by too much. We're in a strange period in all of these fields. We're just consumed with elementary misunderstandings. It's very hard to break out of them. It's a very irrational period. It's reminiscent of the period when people argued about the right interpretation of the Eucharist and so on. You know, I mean, it's a strange period, a lot of sophistication but massive misunderstanding. [03:25:30] Speaker 6: I'm curious, on the side of things that we might be able to answer, what do you think are some of the greatest remaining mysteries of language, science, or philosophy which we have yet to solve and yet may be able to answer, and which are some of the areas of research that you find most personally exciting to those ends? [03:25:55] Speaker 1: Well, I think there are questions right at the border of research which I think are very exciting. Like the one I mentioned. I mentioned that I think for the first time ever in thousands of years we can now give deep explanations of some fundamental properties of language and thought, which are probably the same thing. Let's assume that language and thought are two different ways of looking at the same thing. Thought's what is generated by language and language is what generates thought. And one of the deepest mysteries is this property of structure dependence that I mentioned. In fact, it's the deepest property and a very surprising one, and we now have a perfect answer to it. It's accounted for by the assumption that nature acts the way nature always acts, by finding the simplest solution. And the simplest solution for a combinatorial system happens to be binary set formation. Well, you look a little further, there's much more to it. But that's the beginning of it. 
And now I think we can press forward with that. Ask what should be a component of language. Assuming that nature is perfect, you look, you find there are some things which are there only because they fall away from these properties. So there's a subpart of language called control theory. Now I have to get into details, but there's a curious distinction which has been empirically known for many years between sentences like "One interpreter each seems to have been assigned to the diplomats" and "One interpreter each tried to be assigned to the diplomats." First one's okay, second one isn't. Why? Well, the answer is a whole system called control theory, and we now have an explanation for it in terms of the assumption that nature picked the simplest possible answer. That kind of question can now be raised for the first time, and sometimes answered. Well, I think that's an exciting moment in the history of the study of mind. I see it interact with what, to me at least, seems the most promising mode of formal semantics, namely event semantics. It links very closely to that. So if you can work out the ton of remaining details, and it's not elementary, but you can imagine how you'd work them out, you would have a basic answer to the structure and understanding of our system of thought and expression. That would be a pretty exciting development, I think. There's many others. So for example, you know, this goes way beyond my competence, so I can just talk about the problems, not what they mean, but a friend, a quantum physicist, recently sent me a paper in a quantum theory journal, which was a symposium of half a dozen leading figures in quantum physics. They were discussing what is a particle. They have no idea. They don't know what a particle is. It's the big thing you have to talk about in physics. They have a lot of vague ideas about it, but they don't know the answer. Well, that sounds like an interesting question to me. What is a particle? 
And here we go back to the question of consciousness and look at the strange way the topic has been studied. You go back to Newton again, let's say. Newton recognized that we know so little about matter that we cannot say whether all matter has life, maybe even a stone. We know so little about matter, so little about life, we can't say whether all matter has life. Well, back last century, Sir Arthur Eddington, the great astrophysicist, said we know little about matter. We can't say whether all matter is conscious, not because we don't know about consciousness. We know a massive amount about consciousness, more than anything else. But we know nothing about matter, so we can't say whether all matter is conscious. Well, that's a question too. I don't think a stone is conscious, but can I show we know enough about matter to explain that a stone isn't conscious? That's a lack of knowledge about matter. So we, not me, but advanced physics will pursue that question. Can we find out enough about matter to answer questions like this or even tell us what a particle is? Well, those are questions at the border of research. And in every area you're in, you have the questions. What's life, you know, versus non-life? Any area of science you're in, you're overwhelmed by questions like these. Some of them are at the border of research, so you can hope to go forward. Others are so far beyond that you can't even speculate sensibly about them. Some of them may turn out to be true mysteries for humans, that are just beyond our cognitive capacities. You have to study them in independent ways, you know, kind of sideways. Well, among the problems of language is, what's the neural basis? What's the neural basis for language? I mean, something's going on in our brains when we're doing this. What's going on? It's very hard to find out, partly for ethical reasons. You just can't do the experiments that might give you some answers. 
Because they're unethical, you can't stick an electrode into a particular neuron in Broca's area to find out what's going on. We do it with cats and monkeys. We don't do it with humans. Oh, well, you could argue about cats and monkeys. But the fact is we do it, that's the way we've learned about the visual system. But you can't with the language system. There's no organism that has anything like it. So you can't do it with other organisms. We don't allow ourselves to do it with humans. We don't raise humans in controlled environments to see what will happen. In principle, if the Nazis had won the war, maybe we'd be doing it. But we don't do it. So it's just very hard to find answers to questions even when we know how to find the answer. That means you have to be much more ingenious, do much more sophisticated experiments. But these are things at the borderline of inquiry. Actually, yesterday I sat in a dissertation defense of somebody who's actually working on this. There are interesting things, white matter connections between Broca's and Wernicke's area, two areas that seem to be implicated in language, and there's some evidence that these white matter connections become super myelinated in early development in humans, but not apes, connecting the two areas of the brain that seem to be implicated in language. And interestingly, they also have a connection to the auditory system, also lacking in apes. So maybe there's something in white matter that has something special to do with language. That's the kind of topic you can investigate from the outside. In any area of science you know about, you've got many questions like this. So there's no shortage of things to study. It's just a shame that a huge amount of effort, money, scarce energy, are wasted in doing things that make absolutely no sense. 
[03:32:49] Speaker 2: Well, Professor Chomsky, we're going to wrap the conversation here, but can I just thank you so much for coming on our podcast. It means so much to us. Sincerely. [03:32:58] Speaker 1: Thank you very much. Pleasure to talk to you. [03:33:00] Speaker 2: What was the best thing about talking with Chomsky? [03:33:04] Speaker 6: Best thing about talking with Chomsky? Oh, boy. I think, I'll tell you what, and this is kind of personal to me, I think, because you probably know from the show and whatnot, you know, I like plain language. I mean, sometimes I'm forced to use technical terms, but I like plain language and I like to think about concrete things and I like to make jokes about, you know, bulldozers or, you know, very down-to-earth kinds of topics, right? And so when I realized how down-to-earth he is and, and, and somebody that you could just hang out with and joke, joke about and talk about things concretely in the real world, and yet he's, you know, the foremost intellectual of our era, right? I think, um, it was nice. It was nice to see that, like, you can be both at the same time. You know, you can be, you can be an ordinary human being and an extraordinary intellectual all rolled into one. Yeah. [03:34:12] Speaker 2: He, he was taking the piss out of neural networks. He said he liked bulldozers too, but they weren't a contribution to science. [03:34:19] Speaker 6: Yeah, exactly. Well, exactly. So, you know, he's, he's saying they're a great feat of engineering. They just don't, they don't contribute to science. And I just, I just, I found it really enjoyable to talk to him. It was, it was very, um, it's very fun. He's very funny. And, and, uh, yeah, just all around. It was, it was great, great experience. [03:34:41] Speaker 2: Yeah. And officially this has been in terms of production and the amount of hours of faffing around, this has been by far, by far the longest cycle we've had on any episode. Yeah. 
And of course, I've already made some content about how we fixed it, but yeah, it's just, I mean, we, we've fallen out over it. We've made up again now, but yeah, it's been an incredible slog just getting it all fixed. Um, thank God Chomsky let us publish it. [03:35:08] Speaker 6: Yeah. And that, you know, as we've explained here or elsewhere in the show, I mean, it was obviously a lot of it was accidental. Okay. We didn't, you know, technological failure that resulted in this challenge, but I have to say for me personally, uh, it, it, it made the journey all the more epic and satisfying that we, um, that we really pulled off this, um, you know, Chomsky called himself a miracle of engineering, you know, to, to restore the show. It doesn't contribute to science, but it was a, uh, was a miracle of engineering. And I mean, you know, it's, it's, it's one of these life stories that I think, um, in the end, I'm going to remember forever and it couldn't have happened. I mean, in a way, the, the, uh, the person that we were interviewing, this was the perfect person for it to happen to, other than, than the loss of quality, because of just all the ironies and sort of the, the process of doing it. And, and, you know, we, we really wouldn't have put this tremendous amount of effort into it for, for anyone else. Uh, so. [03:36:11] Speaker 2: Unfortunately, we ran out of time. The production cycle has been so long, but there were a few moments in the recording with us where Chomsky was chuckling and it meant so much to us to kind of capture that mental state. So we didn't have time to cut them back into the synthesized version, but here are a few clips of, of Chomsky chuckling in response to us. [03:36:33] Speaker 1: Well, that sounds like an interesting question to me. What is a particle? Maybe somebody will notice, but it seems, seems to me like a deep one. It's never carried them out. The ton of remaining details is not
