A Conversation with Martin Frické on the Epistemology of Machine Learning
Martin Frické is an Emeritus Professor at the University of Arizona. He received his PhD in logic and scientific method from the London School of Economics. He has taught networking, human-computer interaction, logic, and web design, and has spent the latter part of his career studying logic and librarianship, specifically the use of computers and symbolic logic to organize information. As a computer programmer and developer, he has written programs to assist with instruction, many of which are in use the world over. His most recent book is Artificial Intelligence and Librarianship: Notes for Teaching. In this conversation, we talked about the epistemology of machine learning, focusing on Noam Chomsky’s arguments against programs like ChatGPT.
Vatsal: I find it fascinating how certain tendencies and positions manifest again and again in the history of thought. A perennial example is the rough distinction between those who place greater emphasis on our innate structure and those who place greater emphasis on experience or data in their conception of knowledge. We may be familiar with Raphael’s famous painting, the School of Athens, which places Plato and Aristotle at the center—Plato pointing toward the heavens and Aristotle toward the ground—symbolizing their contrasting approaches and beliefs. In his dialogue Meno, Plato presents us with a puzzle: how is any inquiry possible if it is impossible to inquire into what we don’t know—since we couldn’t search for it and wouldn’t recognize it even if we found it—or into what we do know, as we already know it? In response to this problem, Plato suggested that what we consider learning is actually prompted recollection: we possess at birth knowledge in a latent form obtained in an existence prior to this life, which is only brought out by the prompting of experience. Aristotle, by contrast, emphasized the essential role of experience in the formation of knowledge. According to him, experience not only allows us to initiate inquiry but also serves as the standard against which its success is measured.
In the 17th century, John Locke compared the initial condition of the human mind to “white paper, void of all characters, without any ideas”. In response to the question, “Whence has it all the materials of reason and knowledge?”, he answered, “in one word, from experience”. For Locke, all knowledge derives from experience. All complex ideas, including those about things that do not exist in nature, can be reduced to simpler ideas, ultimately tracing back to elementary ideas that are not created by us but are obtained from experience. While Locke denied the existence of innate ideas, Gottfried Leibniz affirmed it. Leibniz wrote a book-length response to Locke, in which he placed Locke’s system closer to that of Aristotle and his own closer to that of Plato. Referring to a scholastic maxim—that nothing is in the intellect which was not first in the senses—he added, “except for: the intellect itself”. Against Locke’s white paper, Leibniz argued that “the soul contains from the beginning the source of several notions and doctrines, which external objects awaken only on certain occasions”. For Leibniz, “senses never give us anything but instances”, but “all the instances confirming a general truth, however numerous they may be, are not sufficient to establish the universal necessity of that same truth, for it does not follow that what has happened before will always happen in the same way”.
It is remarkable how these sides have reappeared in our era of artificial intelligence. Consider the arguments of Noam Chomsky against machine learning. Machine learning, the approach responsible for the recent advances in artificial intelligence, relies heavily on data. Rather than relying on explicit programming, an approach that was popular early in the history of artificial intelligence, machine learning involves identifying patterns in large datasets that enable predictions. For Mr. Chomsky, this is not how the human mind functions. The human mind is not “a lumbering statistical engine for pattern matching”. It does not seek to “infer brute correlations among data points”, but to “create explanations”.
Mr. Chomsky is sympathetic to Plato’s conception of knowledge. He mentions how “Leibniz pointed out that Plato’s theory of reminiscence was basically correct, but it had to be purged of the error of reminiscence — in other words, not an earlier life, but rather something intrinsic to our nature”. According to Mr. Chomsky, Plato’s pre-existence can be reconceptualized, in a way, as the lives of our ancestors, that is, our genetic endowment. It is this genetic endowment that allows a child to acquire language at a specific period during maturation despite limited and often corrupt linguistic data. To use his analogy: although growth would not occur without eating, it is not the food but the child’s inner nature that determines how the growth will occur. Likewise, it is not the linguistic data but the child’s biological endowment that determines how language is acquired. Since machine learning systems operate not based on explicit rules but rather on patterns found in data, they lack the constraints that the human mind possesses. “Humans are limited in the kinds of explanations we can rationally conjecture,” Mr. Chomsky observes, while “machine learning systems can learn both that the earth is flat and that the earth is round.”
You have written about machine learning and its inability to generate explanations. Are your reasons behind this position similar to those of Mr. Chomsky?
Martin Frické: Let me redescribe some of Chomsky in my own words. One central problem that Chomsky addresses is the question of how it is that infants of different ethnicities or cultures can learn their relevant languages extremely rapidly and on the basis of limited data or examples. Chomsky posits innate abilities that are universal. These abilities are not innate ideas or innate knowledge. But, in their most recent version, they might be thought of as principles and parameters (a series of ‘switches’). When a baby hears certain sample words or conversation snippets, switches are ‘flipped’. Thus, a Japanese baby in a Japanese family will speak Japanese; had the same baby instead been brought up in an English family, the infant would speak English. An important part of the explanatory challenge here is the paucity of the data that the baby is exposed to. The infant does not hear lots of Japanese, or lots of English, relative to the considerable linguistic skills that the infant acquires. It is the innate principles that facilitate this.
Chomsky’s suggestion here has the status of being an explanatory scientific hypothesis. It is not a piece of armchair philosophy (or armchair linguistics). When detail is added to it and it is subjected to test, there has been considerable evidence in its favor. There has also been considerable counterevidence. (For example, Ibbotson and Tomasello have a paper, Evidence Rebuts Chomsky's Theory of Language Learning.) Chomsky’s final principles-and-parameters account is only one of the modifications he has made to his theories in the light of evidence.
What we are talking of here is science. Going back to innate ideas and innate knowledge in philosophy, they are not science. We might locate the scientific revolution with Galileo and his contemporaries, say around 1600. Plato was not doing science, nor was Locke (who came after 1600). If we try to apply some philosophical categories, Chomsky’s theories are contingent, synthetic, and a posteriori. They are science, fallible hypothetical science, right or wrong.
Vatsal: I have provided a detailed response to Mr. Chomsky’s arguments in my essay, A Response to Noam Chomsky on Machine Learning and Knowledge. For Mr. Chomsky, the deepest flaw of machine learning is its inability to generate explanations. “The crux of machine learning is description and prediction,” he states, “it does not posit any causal mechanisms or physical laws.” He refers to predictions without explanatory insights as pseudoscience. In my essay, I argued that, building upon David Hume’s analysis of causality and the explanatory model articulated by Carl Hempel and others, any conception of causal mechanisms or physical laws can ultimately be reduced to finding regularities and making predictions. Our biological endowments only aid us in this process, as do the architectures of animals and machine learning systems.
In response to what I consider to be Mr. Chomsky’s strongest argument—that “machine learning systems can learn both that the earth is flat and that the earth is round”, and therefore cannot apply the rules of inference to distinguish between truth and falsity—it can be pointed out that these rules are not innate to us. There is no determinate pattern in how their application manifests in our behavior; we do not always follow them, and we may act according to them without fully recognizing it. We can compare this to moral rules, where not committing theft is not an innate behavior, although some tendencies may predispose us to it, with or without an explicit understanding of the rule.
In my essay, I further argued that the data accessible to a system presents a universal constraint, so that “corrupt data misleading a machine learning system is not dissimilar from a Cartesian deceiver misleading a human”. Relying on predictions helps us avoid the pitfalls of being misled by seemingly satisfactory explanations, I argued, “as demonstrated by the success of modern natural science which offers predictions that pre-modern natural philosophy, despite its elaborate explanations, failed to offer”.
Martin Frické: You accurately present Chomsky as contrasting machine learning large language models (LLMs) with Chomsky’s own theories of language acquisition (Katz 2012; Chomsky, Roberts, and Watumull 2023). Just as background: LLMs work by statistical inference. They correlate and predict tokens in sequences, drawing on vast amounts of input text. Tokens are typically larger than single characters but smaller than single words. (A toy sketch of this idea appears below.) Then you assert:
[Chomsky] contends [that the way LLMs work] is not how the human mind works.
Chomsky is surely right here. Japanese or English babies, or any other human for that matter, are simply not exposed to the vast amounts of text that LLMs require.
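To make that background concrete, here is a minimal sketch of statistical next-token prediction. It is an illustration added here, not code from Frické or the papers cited: a toy bigram model over whole words stands in for a real LLM, which would use a learned subword vocabulary and a neural network rather than raw counts.

```python
# A toy stand-in for an LLM's core move: predict the next token from
# statistics over preceding tokens. Real models use learned subword
# tokens and neural networks, not raw bigram counts over words.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each token follows each token (first-order statistics).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent continuation seen in the data."""
    continuations = follows[token]
    return continuations.most_common(1)[0][0] if continuations else None

print(predict_next("sat"))  # 'on' -- 'sat' is always followed by 'on' here
```

The point of the sketch is only the shape of the procedure: everything the model "knows" comes from frequencies in the text it was fed, which is why the sheer volume of that text matters.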
You then consider something else—what you call the ‘empirical’ view or ‘empirical’ question: might mere correlations in exposure to linguistic data be the origin of human access to some linguistic capabilities, nothing innate required? LLMs seem to show that considerable linguistic skills can be acquired by computers just from data. Might human ‘half-baked’ LLMs be the origin of some human linguistic skills? Chomsky thinks not. Basically, he is a hypothetico-deductivist and is skeptical of the inductivist view that theory-free study of data will lead to discovery. Chomsky here is absolutely in the mainstream of philosophy and history of science. My own views on scientific method and discovery, as expressed in, say, Big Data and Its Epistemology (Frické 2015), more-or-less coincide with Chomsky’s on these points.
There is an embarrassment, though. LLMs can have exceptional language skills, skills apparently acquired solely from data. Prior to 2018, most hypothetico-deductivists would have thought that impossible. This suggests that we should look in more detail at what machine learning approaches and LLMs are doing. Machine learning in general is not theory free. There is always what is called inductive bias (bits of theory that come out of thin air). This, for example, might determine the form of a conjectured regression curve. LLMs themselves have a host of theories in them: that the processes are Markov processes, that certain kinds of gradient descent can be used for optimization, etc.
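As one concrete illustration of inductive bias (a sketch added here, with made-up numbers): before fitting a regression curve, we must choose its form, and that choice is supplied by the modeler, not by the data.

```python
# A minimal sketch of inductive bias in curve fitting. The *degree* of
# the polynomial is chosen before seeing the data -- a bit of theory
# "out of thin air" that the observations themselves do not supply.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 4.2, 8.8, 16.3])  # roughly quadratic toy data

line = np.polyfit(x, y, deg=1)      # bias: assume the relationship is linear
parabola = np.polyfit(x, y, deg=2)  # bias: assume it is quadratic

# Same data, two conjectured regression curves, depending on the bias.
print(np.poly1d(line))
print(np.poly1d(parabola))
```

Which curve we end up with depends as much on the assumed form as on the data, which is the sense in which machine learning is never wholly theory free.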
Your learning and knowledge essay then moves on to other topics which are not of direct concern to us here. You move on to causality, correlations, and explanations. These are huge philosophical topics, but let me distill them down. Most philosophers, scientists, and ordinary people would like to have a distinction between correlation and causation. Then, once you have causation, it can be used in explanation (roughly, in many cases, to explain X is to give the causes of X). Also, with causation, sense can be made of counterfactuals (what might happen under different circumstances) and of decision making (weighing up possible courses of action). An example might help here:
Smoking is correlated with lung cancer. Smoking is also correlated with cirrhosis of the liver. Were a smoker to stop smoking, that would reduce the chance of his or her getting lung cancer. Were a smoker merely to stop smoking, that would not reduce the chance of his or her getting cirrhosis of the liver. Why is this? Well, smoking causes lung cancer; smoking does not cause cirrhosis of the liver. Cirrhosis of the liver is caused by drinking alcohol. (Many folk who drink typically also smoke.)
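A small simulation may make the structure of the example vivid. It is a sketch added here, with invented probabilities rather than real data: drinking is the common cause that links smoking to cirrhosis, so intervening on smoking changes the lung-cancer rate but leaves the cirrhosis rate alone.

```python
# A toy causal model of the smoking example (probabilities invented).
# Drinking causes both smoking and cirrhosis; smoking causes cancer.
import random

random.seed(0)

def person(force_nonsmoker=False):
    drinks = random.random() < 0.3
    # Drinkers tend to smoke, which produces the smoking-cirrhosis correlation.
    smokes = random.random() < (0.8 if drinks else 0.2)
    if force_nonsmoker:  # the intervention: the smoker quits
        smokes = False
    cancer = random.random() < (0.3 if smokes else 0.05)      # smoking causes cancer
    cirrhosis = random.random() < (0.4 if drinks else 0.02)   # drinking causes cirrhosis
    return cancer, cirrhosis

def rates(force):
    people = [person(force) for _ in range(100_000)]
    n = len(people)
    return sum(c for c, _ in people) / n, sum(ci for _, ci in people) / n

print("observed:   cancer %.3f, cirrhosis %.3f" % rates(False))
print("intervened: cancer %.3f, cirrhosis %.3f" % rates(True))
# Forcing everyone to quit lowers the cancer rate, but the cirrhosis
# rate is unchanged: correlation with smoking, causation by drinking.
```

Running it shows exactly the asymmetry in the prose: the correlations look alike, but only the genuinely causal link responds to the intervention.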
Now Hume, and possibly you echoing Hume, might say: ‘oh, there are only correlations here’. They would be right that we do not observe any causation. But invoking laws and causation is one of the moves we have made to make sense of our world.
A main reason Chomsky rejects LLMs is that they permit impossible languages. Discussion in this area is way above my pay grade. But let me see if I can make some kind of offering. (There is a book by Andrea Moro on this topic [Moro 2016].) Some languages are impossible for humans. For example, a language that requires more memory than humans have. However, some LLMs can generate and work with some of the impossible languages. So, human capabilities and those of LLMs cannot be one and the same.
In Chomsky’s work at large, he sometimes uses a competence/performance distinction. Roughly, this contrasts what is going on ‘inside’, in the brain or mind, with what appears to the world outside. Consider a person doing simple arithmetic. Their performance (what they actually do, getting answers right and wrong) is observable. A hand-held calculator would usually have a far superior performance to that of a human. (The relevant inner competencies of the human and the calculator need not interest us.) In the case of human language, an LLM may well exceed a human in performance—that is why we can be fooled by chatbots and ChatGPT. But the competencies behind those performances are different, and, I think, in Chomsky’s view, the language competence of a human cannot be that of an LLM.
In Chomsky’s guest essay, Noam Chomsky: The False Promise of ChatGPT (Chomsky, Roberts, and Watumull 2023), Chomsky basically focuses on philosophy of science. He argues for a fallibilist realist hypothetico-deductivist position, pretty much Popperian falsificationism. Then he says: Chomsky-style linguistics meets the demands of this philosophy, and LLMs do not.
Steven Piantadosi has a paper, Modern Language Models Refute Chomsky’s Approach to Language (Piantadosi 2023). This is carefully argued research linguistics. I do not have the training or skill to assess it in depth. But there is a point I want to make about it (which applies also to similar papers). Piantadosi has as the last sentence of his abstract:
Most notably, large language models have attained remarkable success at discovering grammar without using any of the methods that some in linguistics insisted were necessary for a science of language to progress [emphasis added].
The point is this. Chomsky’s view is that humans learn language (discover grammar) by means of an innate universal grammar. He does not assert that it is necessary that there is innate universal grammar. He asserts that innate universal grammar is the correct scientific hypothesis to explain human capabilities. This matters in the following way. If LLMs provide a different way of discovering grammar (for powerful computers), that affects Chomsky’s account not at all. What is needed for LLMs to have an impact on the Chomsky-versus-LLM debate is for it to be asserted that humans, particularly infants, learn language as LLMs do. And anyone who asserted this would be wrong, because humans are simply not exposed to enough data, to enough text.
To sum up, Chomsky has invoked innate abilities to explain human language acquisition and the language skills we have. He was proposing some science. I think that evidence shows that his views are false (see, for example, the Ibbotson and Tomasello paper). I am no expert on this. It is certainly open to discussion by those more expert than I. LLMs are good, indeed very good, at language performance. Chomsky asserts that humans are not LLMs. I think he is right in this, and the arguments he offers are sound. Chomsky favors a fallibilist realist hypothetico-deductivist approach to the philosophy of science. (I do have a soft spot for this. It was popularized at the London School of Economics from the 1960s onward by Karl Popper and others. My training is from that institution at that time.)
Your conversation piece draws attention to similarities between the debate over innate knowledge in the history of philosophy and the debate between Chomsky and proponents of LLMs over innate abilities. I don’t really see the analogy as extending very far, as one debate is philosophy and the other science. Your essay on Chomsky gets into causality, explanation, and the philosophy of science. It favors instrumentalism, the view that in science only predictions matter (not explanations). Instrumentalism is the received view nowadays, largely owing to modern science, especially quantum physics. So, you are in the right place—among intellectual friends. Nevertheless, there still are some old Popperian fogies in favor of fallibilist realism.