So, did anyone here actually read the book? I’m halfway through and I think there are compelling ideas around how self-replication emerges naturally from a fundamentally computational universe and how that leads to increasingly complex computation (and ultimately “intelligence”). The book definitely has Wolfram vibes, but it’s thought-provoking to draw a connecting line through many domains like the author does. It’s best treated as pop-sci, like most of the AI literature.
I listened to an interview with a researcher a while back who hypothesized that human reasoning probably evolved not mostly for the abstract logical reasoning we associate with intelligence, but to “give reasons” to motivate other humans or to explain our previous actions in a way that would make them seem acceptable…social utility basically. My experience with next-token-predicting LLMs fits this view of human communication. We humans rarely complete a thought before we start speaking, so I think our brains are often just predicting the next 1-5 words that will be accepted by who we’re talking to, based on previous knowledge of them and evaluation of their (often nonverbal) emotional reactions to what we’re saying. Our typical thought patterns may not be as different from LLMs’ as we think.
IIRC the researcher was Hugo Mercier, probably on Sean Carroll’s fantastic Mindscape podcast, but it might have been Lex Fridman before he strayed from science/tech.
We can never know, but I personally favour the rise of "handedness" and the tool-making (technological) hypothesis. To make and use tools, and to transfer the recipes and terminology, we must educate one another.
"In the physical adaptation view, one function (producing speech sounds) must have been superimposed on existing anatomical features (teeth, lips) previously used for other purposes (chewing, sucking). A similar development is believed to have taken place with human hands and some believe that manual gestures may have been a precursor of language. By about two million years ago, there is evidence that humans had developed preferential right-handedness and had become capable of making stone tools. Tool making, or the outcome of manipulating objects and changing them using both hands, is evidence of a brain at work." [1]
Interesting. N.J. Enfield (linguist, anthropologist) makes a similar point about the purpose for which language evolved in "Language vs Reality". I'm paraphrasing loosely, but the core argument is that the primary role of language is to create an abstraction of reality in order to convince other people, rather than to accurately capture reality. He talks about how there are 2 layers of abstraction: how our senses compress information into higher-order concepts that we consciously perceive, and how language further compresses information about these higher-order concepts we have in our minds.
Why would a human need to develop the ability to convince others if truth should be enough? One would have to make the argument that convincing others and oneself involves things that are not true to at least one party (as far as they know). I don't know why a species would develop misunderstanding if truth is always involved. If emotions/perception are the things that create misunderstanding, then I can see the argument for language as necessary to fix misunderstanding in the group. On some level, nature thought it correct to fix misunderstanding on a species level (shrugs).
I have had the same suspicion. I'd propose a new kind of ongoing Turing-like test where we track how many words are suggested on our phones (or computers) as we type. My phone guesses the next single word pretty well, so why not the next two? Then three... imagine it "finishing your sentence" halfway through a message, as close friends and family often do. Then why should it wait until halfway? What are the milestones: finishing the last word, the last 5 words, half the sentence, 80%, and so on?
"reasoning evolved not to complement individual cognition but as an argumentative device" -- and it has more positive effects at social level than at individual level
> and it has more positive effects at social level than at individual level
Which raises the question: should we be reasoning in our heads at all? Is there a better way to solve intractable math problems, for example? Is math itself a red herring created for argumentative purposes?
There's also the whole predictive processing camp in cognitive science whose position is loosely similar to the author's, but the author makes a much stronger commitment to computationalism than other researchers in the camp.
This just doesn't explain things by itself. It doesn't explain why humans would care about reasoning in the first place. It's like explaining all life as parasitic while ignoring where the hosts get their energy from.
Think about it, if all reasoning is post-hoc rationalization, reasons are useless. Imagine a mentally ill person on the street yelling at you as you pass by: you're going to ignore those noises, not try to interpret their meaning and let them influence your beliefs.
This theory is too cynical. The real answer has got to have some element of "reasoning is useful because it somehow improves our predictions about the world"
I feel like this needs an editor to have a chance of reaching almost anyone… there are ~100 section/chapter headings that seem to have been generated through some kind of psychedelic free association, and each section itself feels like an artistic effort to mystify the reader with references, jargon, and complex diagrams that are only loosely related to the text. And all wrapped here in a scroll-hijack that makes it even harder to read.
The effect is that it's unclear at first glance what the argument even might be, or which sections might be interesting to a reader who is not planning to read it front-to-back. And since it's apparently six hundred pages in printed form, I don't know that many will read it front-to-back either.
https://wii-film.antikythera.org/ - This is a 1-hour talk by the author which summarizes what seems to be the gist of the book. I haven't read the book completely. I read a few sections.
Personally, I think the book does not add anything novel. Reading Karl Friston and Andy Clark would be a better investment of time if the notion of predictive processing seems interesting to you.
From a rhetorical perspective, it's an extended "Yes-set" argument or persuasion sandwich. You see it a lot with cult leaders, motivational speakers, or political pundits. The problem is that you have an unpopular idea that isn't very well supported. How do you smuggle it past your audience? You use a structure like this:
* Verifiable Fact
* Obvious Truth
* Widely Held Opinion
* Your Nonsense Here
* Tautological Platitude
This gets your audience nodding along in "Yes" mode and makes you seem credible, so they tend to give you the benefit of the doubt when they hit something they aren't so sure about. Then, before they have time to really process their objection, you move on and finish with something they can't help but agree with.
The stuff on the history of computation and cybernetics is well researched with a flashy presentation, but it's not original nor, as you pointed out, does it form a single coherent thesis. Mixing in all the biology and movie stuff just dilutes it further. It's just a grab bag of interesting things added to build credibility. Which is a shame, because it's exactly the kind of stuff that's relevant to my interests[3][4].
> "Your manuscript is both good and original; but the part that is good is not original, and the part that is original is not good." - Samuel Johnson
The author clearly has an Opinion™ about AI, but instead of supporting it, they're trying to smuggle it through in a sandwich, which I think is why you have that intuitive allergic reaction to it.
I got the same impression. I think I've become so cynical about these kinds of things that whenever I see one, I immediately assume bad faith / woo and just move on to the next article.
This discussion is not complete without a mention of Marcus Hutter’s seminal book[0] “Universal Artificial Intelligence: Sequential Decisions Based On Algorithmic Probability”. It provides many of the formalisms upon which metrics of intelligence are based. The gaps in current AI tech are pretty explainable in this context.
If you've read the book, please elaborate and point us in the right direction, so we don't all have to do the same just to get an idea how those gaps can be explained.
I'm going to go into my own perspective of it; it is not reflective of what it discusses.
The linked multimedia article gives a narrative of intelligent systems, but Hutter and AIXI give a (noncomputable) definition of an ideal intelligent agent. The book situates the definitions in a reinforcement learning setting, but the core idea is succinctly expressed in a supervised learning setting.
The idea is this: given a dataset with yes/no labels (and no repeats in the features), and a commonsense encoding of Turing machines as binary strings, the ideal model mapping each input to a probability distribution over labels is defined by
1. taking all Turing machines that decide the input space and agree with the labels of the training set, and
2. on a new input, counting the machines that accept vs. reject it, weighting each machine by the reciprocal of 2 to the power of its encoding length, then normalizing the weighted counts into a distribution. This is of course a noncomputable algorithm.
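Step 2 can be written compactly. The notation is assumed here for illustration rather than taken from the book: $D$ is the training set, $\mathcal{M}_D$ the set of machines kept in step 1, $p(x) \in \{0, 1\}$ a machine's verdict on input $x$, and $|p|$ the length of its binary encoding:

$$
P(y = 1 \mid x, D) = \frac{\sum_{p \in \mathcal{M}_D,\ p(x) = 1} 2^{-|p|}}{\sum_{p \in \mathcal{M}_D} 2^{-|p|}}
$$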
The intuition is that if a simple pattern mapping input to output exists in the training set, then there is a short Turing machine that captures that function, so that machine's opinion on the new input is given a lot of weight. But more complex patterns are also plausible, and we consider them too, just with less weight.
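Since the real construction is noncomputable, here is a minimal, computable toy sketch of the same weighting under a heavy assumption: instead of all Turing machines, the hypothesis space is every bit string p up to a fixed length, read as the periodic rule label(n) = p[n mod len(p)]. The function names, the hypothesis class, and the prefix-free length code (len(p) in unary plus a terminator, then the bits, so 2·len(p)+1 bits in total) are illustrative choices, not Hutter's. The prefix-free code matters: with a plain 2^-len(p) prior, each length class would carry equal total weight, and long, underdetermined hypotheses would swamp the short ones as the length cap grows.

```python
from itertools import product

def consistent(program, train):
    """True if the periodic rule label(n) = program[n % len(program)] fits every training example."""
    return all(program[n % len(program)] == label for n, label in train.items())

def predict(train, x, max_len=12):
    """Toy Occam-weighted estimate of P(label(x) == 1).

    Hypothesis class (an assumption, far smaller than "all Turing machines"):
    every bit string p with 1 <= len(p) <= max_len, read as a periodic rule.
    Prior weight 2**-(2*len(p) + 1) is the length of a simple prefix-free code
    (len(p) in unary, a terminator, then the bits), so all weights sum to <= 1.
    """
    weight = {0: 0.0, 1: 0.0}
    for length in range(1, max_len + 1):
        prior = 2.0 ** -(2 * length + 1)
        for bits in product((0, 1), repeat=length):
            if consistent(bits, train):
                # Each surviving hypothesis votes for its own output on x, weighted by its prior.
                weight[bits[x % length]] += prior
    total = weight[0] + weight[1]
    if total == 0.0:
        return 0.5  # no hypothesis in the class fits: fall back to a uniform guess
    return weight[1] / total

# Alternating labels 0, 1, 0, 1, 0, 1 for inputs 0..5.
train = {n: n % 2 for n in range(6)}
print(predict(train, 6), predict(train, 7))
```

On this data the shortest consistent rule "01" dominates, so the sketch gives label 1 a probability well below 0.1 for input 6 and above 0.9 for input 7. Longer rules that merely memorize the six examples still vote, just with far less weight.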
What I like about this abstract definition is that it is not in reference to "human intelligence" or "animal intelligence" or some other anthropic or biological notion. Rather, you can use these ideas anytime you isolate a notion of agent from an environment/data, and want to evaluate how the agent interacts/predicts intelligently against novel input from the environment/data, under the limited input that it has. It is a precise formalization of inductive thinking / Occam's razor.
Another thing I like about this is that it gives a theoretical justification for the double-descent phenomenon. It is a (noncomputable) algorithm that gives the best predictor, yet it is defined in reference to the largest hypothesis space (all Turing machines that decide the input space). Whereas prior ML methods got better results with architectures carefully designed to make bad predictors unrepresentable, it suggests that, if you have a lot of computational resources, it is also worthwhile to use an architecture that defines an expressive hypothesis space and instead softly prioritize simpler hypotheses through the learning algorithm (regularization, for example, is one approximation of this). This allows your model to learn complex patterns in the data that you did not anticipate, if the evidence in the data justifies them, whereas a small, biased hypothesis space could not represent such a pattern if it was significant but unanticipated.
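The "expressive hypothesis space plus a soft preference for simplicity" idea can be illustrated with ordinary ridge regression, one common stand-in for the regularization mentioned above. It penalizes large coefficients, a rough proxy for complexity, rather than description length, and it does not by itself reproduce double descent. In this minimal NumPy sketch, the function name and the toy data are made up for illustration: a degree-15 polynomial is fit to 20 noisy points from a line, so the feature map could represent wiggly curves if the data demanded them, while the penalty `lam` pulls the solution toward small coefficients.

```python
import numpy as np

def ridge_poly_fit(x, y, degree, lam):
    """Return polynomial coefficients (lowest power first) minimizing
    ||X w - y||^2 + lam * ||w||^2, where X holds the powers x^0 .. x^degree."""
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    A = X.T @ X + lam * np.eye(degree + 1)          # regularized normal equations
    return np.linalg.solve(A, X.T @ y)

# Toy data (made up for illustration): a noisy straight line.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 * x + 0.1 * rng.standard_normal(x.size)

w_soft = ridge_poly_fit(x, y, degree=15, lam=1e-2)  # gentle preference for small coefficients
w_weak = ridge_poly_fit(x, y, degree=15, lam=1e-8)  # almost no preference: free to chase noise
print(np.linalg.norm(w_soft), np.linalg.norm(w_weak))
```

The stronger penalty always yields a coefficient vector with smaller norm; how much of the hypothesis space it effectively rules out is controlled by `lam` rather than by hard-coding the degree.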
Note that under this definition, you might want to talk about a situation where the observations are noisy but you want to learn the trend of it without the noise. You can adapt the definition to be over noisy input by for example accompanying each input with distinct sequence numbers or random salts, then consider the marginal distribution for numbers/salts not in the training set (there are some technical issues of convergence, but the general approach is feasible), and this models the noise distribution as well.
> I'm going to go into my own perspective of it; it is not reflective of what it discusses
Why not answer the question?
And looking at your paragraphs I'm still not sure I see a definition of intelligence. Unless you just mean that intelligence is something that can approximate this algorithm?
One way to define intelligence functionally is by how well an entity given patterned input can demonstrate that it learns and understands that pattern, by responding with an extension of that pattern. Under this functional definition, the construction above describes what an ideally intelligent entity would be.
Are “noisy” inputs here at all related to ones whose Kolmogorov complexity equals their encoding length?
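Kolmogorov complexity itself is noncomputable, but the length of a string after compression gives a computable upper bound on it (up to a constant), which is enough to see the distinction being asked about. A minimal sketch, using Python's standard-library `zlib` as the stand-in compressor (an assumption; any lossless compressor would do): a patterned string shrinks dramatically, while pseudorandom bytes, the "noise" case, do not compress at all.

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size.

    The compressed length upper-bounds Kolmogorov complexity up to a constant;
    a high ratio only means zlib found no pattern, not that none exists.
    """
    return len(zlib.compress(data, 9)) / len(data)

patterned = b"01" * 500                   # 1000 bytes with an obvious short description
noisy = random.Random(0).randbytes(1000)  # 1000 pseudorandom bytes (Python 3.9+)

print(f"patterned: {compression_ratio(patterned):.3f}")
print(f"noisy:     {compression_ratio(noisy):.3f}")
```

One caveat the sketch itself illustrates: the "noisy" bytes actually have a short description (the seed plus the generator), which zlib cannot discover. Truly incompressible strings are exactly the ones no such shortcut exists for.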
I don’t know how much I buy the idea that intelligence maximizes parsimony. Certainly true for inductive reasoning but I feel like there’s some tradeoff here. There are probably cases where a small TM explains a very large but finite set of observations, but if a few new ones are added the parsimonious explanation becomes much longer and looks much different from the previous one. I know this wouldn’t be under the same assumptions as the book though :p
If we accept the functional framing (as being able to give a suitable suggestion conditioned on input), then it seems to me that parsimony is the only sensible general framing; every deviation from that is specific to one application or another and can be modeled by a transformation of the input space/output space.
> There are probably cases where a small TM explains a very large but finite set of observations, but if a few new ones are added the parsimonious explanation becomes much longer and looks much different from the previous one.
Indeed, to use an analogy, if you have 99 points that can be described perfectly by a linear function except for one outlier, then clearly your input isn't as clear-cut as might have been originally assumed.
On the other hand, you may be in a different setting where you have noisy sensor inputs, expect some noise, and are looking for a regression that tolerates it. In such a situation, your input data would be perfectly described by a linear function only when the stars align perfectly, and we just have to accept that a broken watch is perfectly right twice a day, whereas a working one is only approximately right, but all the time.
I didn’t read the book, but I’d advise people not to go into mysticism; it has brought us very little compared to the scientific method, which has powered our industrial and information revolutions.
Dive into the Mindscape podcast, investigate complex systems. Go into information theory. Look at evolution from an information theory perspective. Look at how intelligence enables (collective) modeling of likely local future states of the universe, and how that helps us thrive.
Don’t get caught in what at least I consider to be a trap: “to use your consciousness to explain your consciousness”. I think the jump is, for now, too large.
Just my 2 ct. FWIW I consider myself a cocktail philosopher. I do have a PhD in Biophysics, if that means something to some, although I myself consider it of limited value.
Considering that even simple neural networks are universal approximators, and that most intelligent tasks require predicting the next state(s) from previous states, aren't biological or artificial brains "just" universal approximators of an extremely complex function of the world?
That’s true in a narrow functional sense, but it misses the role of a world model. Intelligence isn’t just about approximating input-output mappings; it’s about building structured, causal models that let an agent generalize, simulate, and plan. Universal approximation only says you could represent those mappings, not that you can efficiently construct them. Current LLMs seem intelligent because they encode vast amounts of knowledge already expanded by biological intelligence. The real question is whether an LLM, on its own, can achieve the same kind of efficient causal and world-model building rather than just learning existing mappings. It can interpolate new intermediate representations within its learned manifold, but it still relies on the knowledge base produced by biological intelligence. As an analogy, it’s more of an interpolator than an extrapolator.
Note that you'd also have to be somewhat more precise as to what the "state" and "next state" are. It is likely that the state is everything that enters the brain (i.e. by means of sensing, such as what we see, hear, feel, introspect, etc.). However, parts of this state enter the brain at various places and at various frequencies. Abstracting that all away might be problematic.
The main thesis seems to be "the brain evolved precisely to predict the future—the “predictive brain” hypothesis."
Which I guess is ok although we can do other stuff - write stories, play the piano and so on. Also:
>What Is Intelligence? argues—quite against the grain—that certain modern AI systems do indeed have a claim to intelligence, consciousness, and free will.
> The main thesis seems to be that "the brain evolved precisely to predict the future"
Even if we accept that premise, taking an evolutionary perspective means acknowledging that the brain could, in the future, evolve toward other dominant traits besides prediction. In that sense, the definition becomes elusive and time-dependent: what we call the brain's "purpose" today might only describe a temporary evolutionary state rather than a fixed function.
Without actually reading the book, it appears the author asserts that a large component of human intelligence can be reproduced by AI, and that perhaps the chaotic interactions that underpin human intelligence also allow nonliving systems such as AI farms to express intelligent behavior.
What he would like people to believe is that AI is real intelligence, for some value of real.
Even without AI, computers can be programmed for a purpose, and appear to exhibit intelligence. And mechanical systems, such as the governor of a lawnmower engine, seem able to seek a goal they are set for.
What AI models have in common with human and animal learning is having a history which forms the basis for a response. For humans, our sensory motor history, with its emotional associations, is an embodied context out of which creative responses derive.
There is no attempt to recreate such learning in AI. And by missing out on embodied existence, AI can hardly be claimed as being on the same order as human or animal intelligence.
To understand the origin of human intelligence, a good starting point would be Esther Thelen's book[0], "A Dynamic Systems Approach to the Development of Cognition and Action" (also MIT Press, btw.)
According to Thelen, there is no privileged component with prior knowledge of the end state of an infant's development, no genetic program that their life is executing. Instead, there is a process of trial and error that develops the associations between senses and muscular manipulation that organize complex actions like reaching.
If anything, it is in the caregivers of the family system that knowledge of an end result resides: if something isn't going right with the baby, if she is not able to breastfeed within a few days of birth (a learned behavior) or not able to roll over by herself at 9 months, they will be the ones to seek help.
In my opinion, it is the caring arts, investing time in our children's development and education, that advance us as a civilization. There is now a separate track, however, the advances in computers and technology, that often serves as a proxy for improving our culture and humanity: easier to measure and easier to fund than the squishy human culture of attentive parenting, teaching, and caregiving.
I have no problem with using the word intelligence to describe human-made systems, since the attribute "artificial" preserves the essential distinction. These systems inhabit the second-order world of human-created symbols and representations; they are not, and never will be, beings in the real world, even when they are inevitably enhanced to learn from their interactions and equipped with superhuman sensors and robotic arms. What they won't have is the millions of years of evolution, of continuous striving for self-preservation and self-expansion, which shaped the consciousness of living organisms. What they won't ever have is a will to be. Even if we program them to seek to persist and perpetuate themselves, it will not be their will, but the will of whoever programmed them thus.
Would you say someone suffering from locked-in syndrome is of a different order of intelligence due to their no longer having a fully embodied experience?
Not the parent, but I would say their experience, even though severely impaired in many areas, is still infinitely more embodied than any human artifact is or even conceivably could be, simply because of the millions of years of embodied evolution that shaped them into who they are, and because of the unimpaired embodiment of most of the cells that make up their organism.
This book lines up with a lot of what I've been thinking: the centrality of prediction, how intelligence needs distributed social structure, language as compression, why isolated systems can't crack general intelligence.
But there are real splits on substrate dependence and what actually drives the system. Can you get intelligence from pure prediction, or does it need the pressure of real consequences? And deeper: can it emerge from computational principles alone, or does it require specific environmental embeddedness?
My sense is that execution cost drives everything. You have to pay back what you spend, which forces learning and competent action. In biological or social systems you're also supporting the next generation of agents, so intelligence becomes efficient search because there's economic pressure all the way down. The social bootstrapping isn't decorative, it's structural.
By that logic, wouldn't the electric kettle heating water for the coffee be intelligent? Had it not measured heat when activated, it wouldn't know how to stop and the man would have thrown it away or at least stopped paying for the kettle's electricity.
I think we need a meta layer: the ability to reason over one's own goals (this does not contradict the environment creating hard constraints). The man has it. The machine may have it (notably, a paperclip maximizer will not count under this criterion). The crow does not.
You could say that, yes, the kettle is intelligent, or smart, as in smart watch. But the intelligence in question clearly derives from the human who designed that kettle, which is why we describe it as artificial.
Similarly, a machine could emulate meta-cognition, but it would in effect only be a reflection and embodiment of certain meta-cognitive processes originally instantiated in the mind which created that machine.
Don't "real" consequences apply for setting weights? There's an actual monetary cost to train these models, and they have to actually perform to keep getting trained. Sure it's VC spend right now and not like, biological reproduction driving the incentives ultimately, but it's not outside the same structure.
Depending on the time horizon the predictions change. So we get layers - what is going to happen in the next hour/tomorrow/next year/next 10 years/next 100 etc (and layers of compression of which language is just one) and that naturally produces contradictions which creates bounds on "intelligence".
It really is a stupid system. No one rational wants to hear that, just like no one religious wants to hear about contradictions in their stories, or no one who plays chess wants to hear it's a stupid game. The only thing that can be said about chimp intelligence is that it has developed a hatred of contradictions/unpredictability and lack of control unseen in trees, frogs, ants, and microbes.
Stories become central to surviving such underlying machinery.
Part of the story we tell is: no, no, we don't all have to be Kant or Einstein, because we just absorb what they uncovered. So apparently the group or social structure matters, which is another layer of pure hallucination. All social structures, if they increase the prediction horizon, also generate/expose themselves to more prediction errors and contradictions, not fewer.
So again, coherence at the group level is produced through story: religion will save us, the law will save us, Trump will save us, the Jedi will save us, AI will save us, etc. We then build walls and armies to protect ourselves from each other's stories. Microbes don't do this. They do the opposite, and have produced the Krebs cycle, photosynthesis, CRISPR, etc. No intelligence. No organization.
Our intelligences are just bubbling cauldrons, at the individual and social level, through which info passes and mutates. Info that survives is info that can survive that machinery. And as info explodes, the coherence stabilization process is overrun. Stories have to be written faster than stories can be written.
So Donald Trump is president. A product of "intelligence" and social "intelligence". Meanwhile more microbes exist than stars in the universe. No Trump or ICE or Church or data center is required to keep them alive.
If we are going to tell a story about Intelligence look to Pixar or WWE. Don't ask anyone in MIT what they think about it.
The MIT vs. WWE contrast feels like a false dichotomy. MIT represents systematic, externalized intelligence (structured, formal, reductive, predictive). WWE or Pixar represent narrative and emotional intelligence. We do need both.
Also, evolution is the original information-processing engine, and humans still run on it just like microbes. The difference is just the clock speed. Our intelligence, though chaotic and unstable, operates on radically faster time and complexity scales. It's an accelerator that runs in days and months instead of generations. The instability isn’t a flaw: it’s the turbulence of much faster adaptation.
I think that’s a bit of a false take. The earlier point didn’t pivot on a specific definition of EQ (a pop-psychology take), but on the contrast between systematic intelligence (like MIT) and the storytelling ability (WWE) needed to create a coherent story that makes sense. Whatever you want to call it, we clearly need both.
It’s hard not to see consciousness (whatever that actually is) lurking under all this you just explained. If it’s emergent, the substrate wars might just be detail; if it’s not, maybe silicon never gets a soul.
Intelligence is whatever we consider ourselves capable of. It turns out that computers are increasingly able to do whatever we can do. Maybe the only thing we can do is advanced pattern matching, but we didn't think of our intelligence that way before.
Humans seem to be able to invent interesting questions about the unknown and then figure out how to try techniques to answer those questions and then systematically attack those questions. This is why LLMs generally can’t do unsupervised research or novel high level engineering by themselves. They’re getting closer and closer in some ways and in others they remain quite lacking.
The other thing is their inability to intelligently forget and their inability to correctly manage their own context by building their own tools (some of which is labs intentionally crippling how they build AI to avoid an AI escape).
I don’t think there’s anything novel in human intelligence, as a good chunk of it does appear in more primitive forms in other animals (primates, elephants, dolphins, cephalopods). But generally our intelligence is on hyperdrive because we also have the added ability of written language and the capability for tool building.
> Intelligence is whatever we consider ourselves capable of
Then, what is what we are incapable of? Magic? ;-)
> Maybe the only thing we can do is advanced pattern matching
Pattern matching as a way to support the excellent heuristic "correlation is likely causation", yes. This is what allows us to analyze systems, what brings us from "something thrown away will eventually fall to the ground" to the theory of relativity.
Intelligence is understanding, and understanding comes from hacking systems in order to use them to our advantage - or just observe systems being broken or being built.
By doing that, we acquire more knowledge about the relationships and entities within the system, which in turn allows more advanced hacking. We probably started with fire, wolves, wheat, flint; and now we are considering going to Mars.
This looks like it might be an interesting read, but I just read the Chapter "Are Feelings Real?" (because it is a subject of personal interest of mine that I've studied a lot) and I found it to be very unsatisfactory, not really addressing the question at all, but sidestepping it. Which makes me wonder if the whole thing is really worth reading.
You might also like Reason and Less by Vinod Goel (MIT Press). He describes human behavior as the outcome of 4 systems (autonomous, instinctive, associative, and reasoning), with evolutionarily newer systems like reasoning and association tethered to evolutionarily older systems like the autonomous one (every living organism exhibits some form of autonomous behavior). He describes emotion as a kind of currency that mediates interactions between these systems, and as the ultimate selection mechanism for initiating actions.
You might enjoy chapter 3, 'Origin of Emotion' of a book entitled 'A Brief History of Intelligence' by Max Bennett. Although you need to read the two chapters leading up to it, realistically.
I'll join sva_ with another book recommendation: How Emotions Are Made by Lisa Feldman Barrett. I am not an academic, but as far as I can tell she is a leading authority on the matter, and the book is extremely accessible for the layperson.
Thanks for the recommendation. I've read it, not a huge fan, I think she makes good points but sets up false straw men ("everyone thinks that...") and overstates her case.
> It has come as a shock to some AI researchers that a large neural net that predicts next words seems to produce a system with general intelligence
When I write prompts, I've stopped thinking of LLMs as just predicting the next word, and instead think of them as a logical model built up by combining the logic of all the text they've seen. I think of the LLM as knowing that cats don't lay eggs, so I ask it to finish the sentence "cats lay ...". It won't generate the word eggs, even though eggs probably comes after lay frequently.
> It won't generate the word eggs even though eggs probably comes after lay frequently
Even a simple N-gram model won't predict "eggs". You're misunderstanding by oversimplifying.
Next-token prediction is still context-based. It does not depend only on the previous token, but on the previous N-1 tokens. You have "cats", so you should get words like "down" instead of "eggs" even with a 3-gram (trigram) model.
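The difference between conditioning on one previous word and on two can be seen in a toy counting model. The corpus below is invented for illustration and the function names are hypothetical; a real language model is far more than a lookup table of counts, but the sketch shows why context changes the prediction: after "lay" alone, "eggs" wins on frequency, while after "cats lay" only "down" has been observed.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n):
    """Map each context of n-1 tokens to a Counter of the tokens that follow it."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i : i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def predict_next(counts, context):
    """Most frequent continuation of `context`, or None if it was never seen."""
    following = counts.get(tuple(context))
    if not following:
        return None
    return following.most_common(1)[0][0]

# Toy corpus, invented for illustration.
tokens = ("birds lay eggs . hens lay eggs . fish lay eggs . "
          "cats lay down . dogs lay down .").split()

bigram = train_ngram(tokens, 2)   # context: the previous word only
trigram = train_ngram(tokens, 3)  # context: the previous two words

print(predict_next(bigram, ["lay"]))           # eggs
print(predict_next(trigram, ["cats", "lay"]))  # down
```

A context the model has never seen, such as "cows lay", returns None here; real n-gram models handle this with smoothing or backoff to shorter contexts.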
No, your original understanding was the more correct one. There is absolutely zero logic to be found inside an LLM, other than coincidentally.
What you are seeing is a semi-randomized prediction engine. It does not "know" things, it only shows you an approximation of what a completion of its system prompt and your prompt combined would look like, when extrapolated from its training corpus.
What you've mistaken for a "logical model" is simply a large amount of repeated information. To show the difference between this and logic, you need only look at something like the "seahorse emoji" case.
If anything, the seahorse emoji case is exactly the type of thing you wouldn't expect to happen if LLMs just repeated information from their training corpus. It starts producing a weird dialogue that's completely unlike its training corpus, while trying to produce an emoji it's never seen during training. Why would it try to write an emoji that's not in its training data? This is totally different than its normal response when asked to produce a non-existent emoji. Normally, it just tells you the emoji doesn't exist.
So what is it repeating?
It's not enough to just point to an instance of LLMs producing weird or dumb output. You need to show how it fits with your theory that they are "just repeating information". This is like pointing out one of the millions of times a person has said something weird, dumb, or nonsensical and claiming it proves humans can't think and can only repeat information.
No, their revised understanding is more accurate. The model has internal representations of concepts; the seahorse emoji fails because it uses those representations and stumbles: https://vgel.me/posts/seahorse/
Word2vec can/could also do the seahorse thing. It at least seems like there's more to what humans consider a concept than a direction in a vector space model (but maybe not).
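For anyone unfamiliar with what "a direction in a vector space model" means here, this is a minimal sketch of the classic word2vec-style analogy trick using cosine similarity. The 3-dimensional vectors are hand-made and hypothetical (real word2vec embeddings are learned from text and have hundreds of dimensions); the point is only that the offset from "man" to "woman" acts as a direction, and adding it to "king" lands nearest "queen".

```python
import math

# Hand-made 3-d "embeddings" (invented for illustration; real word2vec
# vectors are learned from text and have hundreds of dimensions).
# Axes here are loosely: royalty, maleness, personhood.
vectors = {
    "king":  [0.9, 0.8, 1.0],
    "queen": [0.9, -0.8, 1.0],
    "man":   [0.1, 0.8, 1.0],
    "woman": [0.1, -0.8, 1.0],
    "apple": [0.0, 0.0, -1.0],
}

def cosine(a, b):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Solve a - b + c ~= ?, returning the nearest word not among the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # queen
```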
These are brute-force engineering solutions to make the computer appear to be thinking, when we have no idea how we think ourselves. This will never generate true intelligence. It executes code, then it stops; it is a tool, nothing more.
Until there is a formal, accepted, definitive distinction between intelligence, comprehension, memory, and action, all these opinions are just stabs in the dark. We haven't set the scene yet. We currently do not have artificial comprehension; that's sort of what occurs during training. The intelligence everyone claims to see is a pre-calculated idiot savant. If you knew it was all a pre-calculated domino cascade, would you still say it's intelligent?
Execute actions and cognition that pay back the cost of said actions, and support the next generation. No intelligence can appear outside social bootstrapping; it always needs someone to pay the initial costs. So the cost of execution drives a need for efficiency, which is intelligence.
Current AIs cannot comprehend on the fly, meaning that if they are presented with data outside of their training, the reply generated will be a hallucination interpolated off the training data into unknown output. Yet a person in possession of comprehension can go beyond their training, on the fly, and that is how humans learn. AIs cannot do that, which is critical.
I agree with you: current models can't work totally outside their training set. An example of an AI trained through interaction with an environment and feedback/outcome learning is AlphaZero, and it totally beat us at our own game. Even so, DeepMind doesn't seem willing to pay the costs of further development, so we see that LLMs need to make themselves useful to people to survive. It's a "pay your costs or stop executing" situation.
So this would exclude anything besides the human body?
What about animals?
To me, the best definition of intelligence is:
It's the ability to:
- Solve problems
- Develop novel, insightful ideas, patterns, and conclusions. I have to add this one because such ideas might not immediately solve a problem, although they might help solve one down the line. An example could be a comedian coming up with a clever original story. It doesn't really "solve a problem" directly, but it's intelligent.
The more you are capable of either of the two above, the more intelligent you are. Anything that is able to do the above is intelligent to at least some extent, but how intelligent depends on how much of it it's able to do.
There are lots of opinions on what intelligence is, but I notice a lot of people do not read much about it. You don't have to agree with others, but there is a reason that a precise and formal definition has been so hard to develop. People offer many simple explanations, yet if it were simple, we'd have the definition. All you end up doing is blocking yourself from learning even more.
I'll also add that a lot of people really binarize things. Although there is not a precise and formal definition, that does not mean there aren't useful ones, and ones that are being refined. Progress has been made not only in the last millennium, but in the last hundred years, and even the last decade. I'm not sure why so many are quick to be dismissive. The definition of life has issues, and people are not so passionate about saying it is just a stab in the dark. Let your passion to criticize something be proportional to your passion to learn about that subject. Complaints are easy, but complaints aren't critiques.
That said, there's a lot of work in animal intelligence and neuroscience that sheds a lot of light on the subject, especially in primate intelligence. There are so many mysteries here, and subtle things that have surprising amounts of depth. It really is worth exploring. Frans de Waal has some fascinating books on chimps. And hey, part of what is so interesting is that you have to take a deep look at yourself and how others view you. Take, for example, you reading this text. Break it down to atomic units. You'll probably be surprised at how complicated it is. Do you have a parallel process vocalizing my words? Do you have a parallel process spawning responses or quips? What is generating those? What are the biases? Such a simple everyday thing requires some pretty sophisticated software. If you really think you could write that program, I think you're probably fooling yourself. But hey, maybe you're just more intelligent than me (or maybe less, since that too is another way to achieve the same outcome lol).
For years, I've taken the position that intelligence is best expressed as creativity - that is, the ability to come up with something that isn't predictable based on current data. Today's "artificial intelligence" analyzes words (tokens) based on an input (prompt) to come up with an output. It's predictable. It's fast. But, imho, it lacks creativity, and therefore lacks intelligence.
One example of this I often ponder is the boxing style of Muhammad Ali, specifically punching while moving backwards. Before Ali, no one punched while moving away from their opponent. All boxing data said this was a weak position, time for defense, not for punching (offense). Ali flipped it. He used to do miles of roadwork, throwing punches while running backwards to train himself on this style. People thought he was crazy, but it worked, and, imho, it was extremely creative (in the context of boxing), and therefore intelligent.
Did data exist that could've been analyzed (by an AI system) to come up with this boxing style? Perhaps. Kung Fu fighting styles have long known about using your opponent's momentum against them. However, I think that data (Kung Fu fighting styles) would've been diluted and ignored in the face of the mountains of traditional boxing style data that all said not to punch while moving backwards.
> Today's "artificial intelligence" analyzes words (tokens) based on an input (prompt) to come up with an output. It's predictable. It's fast. But, imho, it lacks creativity ...
I would have agreed with you at the dawn of LLM emergence, but not anymore. Not because the models have improved, but because I have a better understanding and more experience now. Token prediction is what everyone cites, and it still holds true. This mechanism is usually illustrated with an observable pattern, like the question "Are antibiotics bad for your gut?", which is the predictability you mentioned. But LLM creativity begins to emerge when we apply what I'd call "constraining creativity." You still use token prediction, but the preceding tokens introduce an unusual or unexpected context, such as subjects that don't usually appear together or a new paradoxical observation. (It's interesting that for fact-based queries, rare constraints lead to hallucinations, but here they're welcome.)
I often use the latter for fun by asking an LLM to create a stand-up sketch based on an interesting observation I've noticed. The results aren't perfect, but they combine the unpredictability of token generation under constraints (funny details, in the case of the sketch) with the cultural constraints learned during training. For example, a sketch imagining doves and balconies as if they were people and real estate. The quote below from that sketch shows that there are intersecting patterns between the world of human real estate and the world of birds, mixed in a humorous way.
"You want to buy this balcony? That’ll be 500 sunflower seeds down, and 5 seeds a day interest. Late payments? We send the hawk after you."
It's hard to pinpoint what creativity is. But in your example, the more creative thing was really coming up with the scenario of pigeons selling balconies as real estate. What followed was just applying the usual tropes for that sort of joke to the subject matter. I feel like LLMs are not very good at coming up with something novel. I'm not even sure they are capable of it. It's not as if coming up with something novel is easy for humans either.
Plus, a lot of people are generating hallucinations and believing that this is creativity. I contend the outputs/generations are junk, but human creativity and human comprehension step in and create meaning from the hallucination.
I think it depends on the complexity of the knowledge to be created. I agree with you broadly, but the danger of using your boxing analogy is that for game systems that can be sufficiently understood, AI has actually invented new strategies. TD-Gammon introduced new advances in the strategy of playing backgammon because its very strong understanding of early gameplay meant that it found some opening moves that humans didn't realize were as strong as they were.
I would argue that the only truly new things generative AI has introduced are mostly just byproducts of how the systems are built. The "AI style" of visual models, the ChatGPT authorial voice, etc., are all "new", but they are still just the result of regurgitating human created data and the novelty is an artifact of the model's competence or lack thereof.
There has not been, at least to my knowledge, a truly novel style of art, music, poetry, etc. created by an AI. All human advancements in those areas build mostly off of previous people's work, but there's enough of a spark of human intellect that they can still make unique advancements. All of these advancements are contingent rather than inevitable, so I'm not expecting that an LLM trained on nothing but visual art from Medieval times and earlier could recreate Impressionism. But I don't think it would make anything that progresses past or diverges from Medieval and pre-Medieval art styles. I don't think an LLM with no examples of or references to anything written after 1700 would ever produce poetry that looked like Ezra Pound's writing, though it just might make its own Finnegans Wake if the temperature parameter were turned up high enough.
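For context on the temperature remark: in typical LLM sampling, the model's raw next-token scores (logits) are divided by a temperature before the softmax. Below is a minimal sketch with made-up logits and tokens (the values and the `sample` helper are hypothetical). Low temperatures concentrate nearly all probability on the top token; high temperatures flatten the distribution so unlikely tokens get picked more often, which is where the Finnegans Wake-style output comes from.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-token candidates and scores: one likely, two unlikely.
tokens = ["the", "riverrun", "quarks"]
logits = [4.0, 1.0, 0.0]

for t in (0.5, 1.0, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)  # seeded so runs are repeatable
print(sample(tokens, logits, 5.0, rng))
```

Note that temperature only reshapes the distribution the model already learned; it doesn't add anything outside it, which is arguably the commenter's point.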
And how could it? It works because there's enough written data that questions and context around the questions are generally close enough to previously seen data that the minor change in the question will be matched by a commensurate change in the correct response from the ones in the data. That's all a posteriori!
That would be something that is intelligent to you. I believe the author (or anyone in general) should be focused on mining what intelligence objectively is.
The best we will ever do is create a model of intelligence that meets some universal criteria for "good enough", but it will most certainly never be an objective definition of intelligence, since it is impossible to measure the system we exist in objectively without affecting the system itself. We will only ever have "intelligence as defined by N", but not "intelligence".
Is there a TL;DR version? Even the preface and introduction feel unnecessarily long.
I also think some statements are plainly incorrect. For example "humanity is already collectively superintelligent" in Chapter 10. The term superintelligence isn't one we have a shared definition for, but it's usually understood as an intelligence that surpasses all prior forms of intelligence(s), not one that merely aggregates them. In that sense, superintelligence could represent a qualitatively new level of cognition limited only by the physical computational capacity of the universe. Once you have a superintelligent entity you can imagine a future one surpassing it.
Has there been anything written about AI "intelligence" by people well read in even the basic and foundational writings on epistemology? For example, I see a lot of people using Hume's way of thinking about how knowledge is formed without addressing Kant's fairly persuasive refutation of it in the Critique of Pure Reason (CPR), and without addressing the dead end that is the resulting philosophical skepticism Hume espoused.
In this book, I see Hume cited in a way that misunderstands his thought, and Kant is only briefly mentioned for his metaphysical idealism rather than his epistemology, which is legitimately puzzling to me. Furthermore, to refer to Kant's transcendental idealism as "solipsism" is so mistaken that it's actually shocking. Transcendental idealism has nothing whatsoever to do with "solipsism" and is really just saying that we (like LLMs!) don't truly understand objects as "things in themselves" but rather form understanding of them via perceptions of them within time and space that we schematize and categorize into rational understandings of those objects.
Regarding Hume, the author brings up his famous is/ought dichotomy and misrepresents it as Hume neatly separating statements and "preferring" descriptive ones. Strictly, what's at issue here is closer to the fact-value distinction, since this concerns descriptive vs. prescriptive statements rather than moral judgments, but I'll ignore that because the two are so often combined. The author then comes to Hume's exact conclusion, but thinks he is refuting Hume when he says:
>While intuitive, the is/ought dichotomy falls apart when we realize that models are not just inert matrices of numbers or Platonic ideas floating around in a sterile universe. Models are functions computed by living beings; they arguably define living beings. As such, they are always purposive, inherent to an active observer. Observers are not disinterested parties. Every “is” has an ineradicable “oughtness” about it.
The author has also just restated a form of transcendental idealism right before dismissing Kant's (and the very rigorously articulated "more recent postmodern philosophers and critical theorists") transcendental idealism! He is able to deftly, if unconvincingly, hand wave it with:
>We can mostly agree on a shared or “objective” reality because we all live in the same universe. Within-species, our umwelten, and thus our models—especially of the more physical aspects of the world around us—are all virtually identical, statistically speaking. Merely by being alive and interacting with one another, we (mostly) agree to agree.
I think this bit of structuralism is where the actual solipsism is happening. Humanity's rational comprehension of the world is actually very contingent. Examples of this are the studies Alexander Luria did on remote peasant cultures and their capacity for hypothetical reasoning and logic in general. Those cultures turned out to be very different from "our models" [1]. But, even closer to home, I share the same town with people who believe in reiki healing to the extent that they are willing to pay for it.
But, more to the point, he has also simply rediscovered Hume's idea, which I will quote:
>In every system of morality, which I have hitherto met with, I have always remarked, that the author proceeds for some time in the ordinary way of reasoning, and establishes the being of a God, or makes observations concerning human affairs; when of a sudden I am surprised to find, that instead of the usual copulations of propositions, is, and is not, I meet with no proposition that is not connected with an ought, or an ought not.
Emphasis mine. Hume's point was that he thought descriptive statements always carry a prescriptive one hidden in their premise, and so that, in practice, "is" statements are always just "ought" statements.
Had the author engaged more actively with Hume's writing, he would have come across Hume's fork, related to this is-ought problem, and eventually settled on (what I believe to be) a much more important epistemological problem with regard to generative AI: the possibility of synthetic a priori knowledge. Kant provided a compelling argument in favor of the possibility of synthetic a priori knowledge, but I would argue that it does not apply to machines, as machines can "know" things only by reproducing the data they are trained with, and they lack the various methods of apperception needed to schematize knowledge for a variety of reasons, "time" being the foremost. LLMs don't have a concept of "time"; every inference they make is independent, and transformers are just a great way to link them together into sequences.
I should point out that I'm not a complete AI skeptic. I think it could be possible to have some hypothetical model that would simply use gen AI as its sensory layer and combine that with a reasoning component that makes logical inferences that more closely resemble the categories Kant described being used to generate synthetic a priori knowledge. Such a machine would be capable of producing truly new information, rather than simply sampling an admittedly massive approximation of the joint probability of semiotics (be it tokens or images) and hoping that the approximation is well constructed enough to interpolate the right answer. I would personally argue that the latter "knowledge", when correct, is nothing more than persuasive Gettier cases.
Overall, I'm not very impressed with the author's treatment of these thinkers. Some of the other stuff looks interesting, but I worry that being too credulous about it would be a case of Gell-Mann amnesia, given that I have done quite a bit of primary source study on 19th century epistemology as a basis for my other study of newer writing in that area. The author's background is in physics and engineering, so I have a slight suspicion that (since he used Hume's thought related to moral judgments rather than knowledge), these are hazily remembered subjects from a rigorous ethics course he took at Princeton, but that is purely speculative on my part. I think he has reached a bit too far here.