
Antonio Damasio’s Feeling & Knowing: Making Minds Conscious

2026 Contest · 28 min read · 6,268 words

I have always been fascinated and frustrated by philosophy. On the one hand, philosophers take on the biggest and most important questions humanity ever wrestled with: God, virtue, being, identity, politics, language… and consciousness.

On the other hand, it is not obvious that they have made any meaningful progress.

Alter Ego: That’s unfair. Philosophy created the conceptual infrastructure for modern civilization, and the habits of mind underlying science, democracy, and pretty much every kind of innovation we take for granted. As Keynes is often paraphrased, "We all think the thoughts of dead philosophers."

That's fair. We owe philosophy a huge debt of gratitude. And yet… I agree with Robert Pirsig (best known for Zen and the Art of Motorcycle Maintenance) that most of what we call philosophy is really “philosophology” – like musicology versus music, it seems more interested in arguing about what people meant than actually creating useful meaning.

Alter: You have a point -- but then again, you are using a philosopher to make your point. On the other hand, that reinforces your claim that much of philosophy is just arguing with itself, so I will allow it.

Thank you.

Nowhere do I find this contradiction more painful than in the study of consciousness. Subjective consciousness is arguably the ONLY thing we ever experience, yet philosophers — and laypeople — seem split on whether it is some ineffable property decoupled from matter, or an illusion that doesn’t actually do anything at all.

I suppose I shouldn’t be too hard on the philosophers. After all, the only data they have to work with are first-person accounts from humans. We have no information about what thoughts — if any! — flit through the heads of dogs and dolphins, never mind carp and cockroaches. Neuroscience has done amazing work on understanding perception and memory, but sadly has done nothing to illuminate the mystery of consciousness. Right?

Wrong.

I. Encountering Damasio

I was on vacation with our family on the California Coast. My spouse had taken our children shopping; I had no interest in cheesy souvenirs, so I went looking for a place to nap.

When I woke, I found out they were ensconced in a boutique bookstore, of the kind it’s hard to find in big cities anymore. I am always grateful when my kids show interest in looking at anything other than glowing pixels, so I went in to help them look.

When I first saw Damasio's book, I assumed it was some sort of crazy New Age meditation — presumably written by some local character who did too much acid in the 1960s but was convinced that they had achieved enlightenment. When I turned the book over and discovered that Damasio was actually a bona fide researcher, I was intrigued. When I read the opening section, about the distinction (and overlap) between “life” and “intelligence” — something clicked.

“Oh my God”, I thought. “This guy actually knows what he is talking about. He has precise definitions that explain concrete empirical observations using everyday language. And he claims to be able to explain consciousness?”

I bought the book. I read it straight through in a few days.

I didn’t become enlightened. I became angry.

I’d read and listened to dozens of brilliant thinkers, theologians, and technologists — including some on this very blog — opining confidently about the nature (or non-nature) of consciousness, soul, the human spirit, etc. None of them even hinted that there might already be a scientifically-grounded definition and explanation for consciousness. They talked as if their perspective was as good as anyone else’s, because after all there was no hard science to say otherwise.

I am no neuroscientist. I cannot promise that Damasio is right about his interpretation, or even his facts. However, I do assert that he has provided a credible, coherent, and comprehensive theory that deserves to be taken seriously. Especially by those wrestling with existential questions about the nature of humanity in the Age of AI.

Ready to dive in?

II. The Biological Basis of Consciousness

At its core, Damasio’s claim is simply: Consciousness is the feeling that sensations belong to me.

He builds up to this from the bottom up, from Sensing to Minding to Feeling to Knowing.

A. Sensing

For Damasio, everything starts with homeostasis: an organism’s need to maintain equilibrium within its environment in order to survive. This core mechanic — an internal response to external circumstances — is the key to understanding everything that comes later.

This is easiest to see in single-celled organisms where the sensing is purely chemical, causing the organism to move (or grow) towards useful resources and away from harmful ones. But he claims it is also the reason plants react to gravity and sunlight, animals avoid predators while seeking food, and humans chase money, sex, and power. These are all just elaborations of our primordial chemistry trying to survive long enough to reproduce (what I call "self-perpetuation").

This also explains his definition of “intelligence” as the ability to use those sensations to ensure homeostasis.

But there’s one particular elaboration of intelligence that is central to the story he wants to tell about consciousness.

B. Minding

Something extraordinary happens when an animal gains a central nervous system. Rather than chemical signals triggering direct responses, neurons generate patterns (what he calls “images”) that convey sensory input to other neurons. Those neurons can then combine, analyze, store, and act on the resulting patterns.

And this is what it means to “mind” a body.

C. Feeling

Crucially, these images are not value-neutral. They are the direct descendants of the homeostasis-preserving sensations that enabled bacteria on up to thrive in challenging environments.

This has two profound consequences. The first is that every image is value-laden, colored in some way (however remote) by its impact on our homeostasis.

Secondly, minded systems already have the ability to distinguish perceptions (our reactions to external stimuli) from interoception (of internal stimuli). This is mediated by proprioception, which is awareness of the rigid musculoskeletal structure that is the interface between our internal and external worlds.

Alter: Slow down. You seem to be conflating (or perhaps separating) two things we commonly call "emotion." One is a general sense of how "good" vs "bad" something is, the other is the specific visceral experiences we label as "hunger", "anger", etc.

I apologize for the confusion. Let's use Damasio's definitions:

  • "emotion" for the underlying bodily reaction (e.g., heart beating faster)
  • "valence" for the "good-bad" axis
  • "feeling" for the mental perception of those reactions

For convenience, I extend the term "affect" to cover all three.

D. Knowing

In this view, "knowing" is just a higher-order feeling: the feeling that a specific sensation belongs to you. It follows that all animals with central nervous systems (certainly all vertebrates, and perhaps even many arthropods) are by this definition conscious.

Alter: Wait, is he saying that a fish is just as conscious as a human?

No, he's more precise than that. Humans and higher mammals (e.g., apes and whales) also "know that they know" (reflective self-awareness). He calls this "Extended (or Autobiographical) Consciousness" as opposed to the "Core Consciousness" of other animals, and the non-consciousness of un-minded creatures (e.g., plants and bacteria).

E. SMFK Model of Consciousness

Alter: C'mon, consciousness can't really be that simple.

Why not? Sure, there's a lot of subtle chemistry and signaling going on. However, the basic structure of Sensing/Minding/Feeling/Knowing ("SMFK") is remarkably simple and well-documented:

  1. Every living system requires some sort of feedback mechanism ("Sensing") to perpetuate itself
  2. The central nervous system ("Minding") is what allows animals to coordinate muscular responses to external threats and opportunities, as well as internal needs.
  3. This requires neurons conveying valenced patterns ("Feelings") that drive intelligent (problem-solving) responses.
  4. Consciousness arises when the organism feels that these signals happen to a self ("Knowing")

Alter: That last point seems the most controversial.

Fair. But let's break that down into its constituent propositions:

  1. Consciousness is a kind of awareness.
  2. That awareness is a kind of feeling.
  3. Specifically, it is the feeling of having experiences.
  4. Those experiences are perceived as happening to a self.
  5. That self is the one I am homeostatically motivated to perpetuate.
  6. All of this is mediated by ordinary brain structures.

To me, this all seems eminently plausible. The only real leap is his claim that (4) is consciousness, rather than a feature of, or precondition for, consciousness.

What makes that leap rigorous is (6) -- the idea that "self-identification of experiences" is the key function of brains that makes a mind conscious.

Alter: So, has he actually proved (6)?

Well... "prove" might be too strong a word. For two reasons.

First, at some level he is simply asserting a definition of consciousness. We each have to decide whether we agree with that definition -- though I would argue that any intellectually-honest critique would have to propose a more rigorous and thoroughly-grounded alternative.

Second, as far as I can tell he has not empirically determined exactly where in the brain this identification takes place.

Not for lack of trying. In Section IV, he has a whole chapter on "The Cerebral Cortices and the Brain Stem in the Making of Consciousness" -- which frankly made my eyes glaze over. But the closest he appears to come is on page 177:


Together, the insular cortex and the subcortical components that feed into it constitute an "affect complex." The critical question, at this point, is how do these two sets of structures-the posterior sensory cortices and the "affect complex"-combine to produce conscious minds? I envision two possibilities. One calls for actual neural projections from the "affect complex" to the "posterior sensory set" and vice versa. The other possibility calls for approximate simultaneity of activations in the two sets, resulting in the production of a time-based ensemble. In either option, the ultimate realization of a conscious mind depends on both sets of brain structures; we cannot "localize" consciousness to one or the other set. Moreover, one other sector of the cerebral cortices appears to play a role in coordinating the conscious mind processes. The sector is known as the... Postero Medial Cortices...It encompasses cortices largely located in medial (internal) and posterior surfaces of the cerebral hemispheres. This region may possibly direct the participation of other cerebral cortices in the making of a conscious mind.


Alter: Sounds like hand-waving to me. He doesn't really have conclusive evidence.

Fair. But from the perspective of theory-building, this lack of certainty is arguably a feature, not a bug. The whole point of science is to construct falsifiable models of reality that can be empirically tested. He has staked his theory of consciousness on a very concrete claim about a specific brain process. Future research -- whether from simulations, scans, or surgery -- will either support or undermine his claim.

Either way, it turns the problem of consciousness into a matter of study rather than of argument. Which is exactly what I've been asking for all along.

III. Interlude: Is Consciousness Just Biological?

Alter: So, let's say I assume you (and Damasio) aren't making all this up. Does this mean he has solved the hard problem of consciousness? And proved it must be biological?

Beta: Well, duh! Just look at the facts. Minds don't emerge from deterministic matter. They emerge from the need to integrate perception and interoception to create homeostatic action. They are just faster and more recursive versions of the chemical signalling used by bacteria. Nothing spooky or mechanistic about it. Philosophers and computer scientists simply confused themselves by obsessing over language rather than paying attention to the biology.

Contra: Hang on, you're jumping way ahead of the evidence. Sure, Damasio may have a plausible biological explanation for what he calls "Core Consciousness." But a nematode's neural net is a far cry from a human brain. What I mean by "consciousness" is closer to his Extended Consciousness. And your hand-waving of "just faster and more recursive" is the difference between an abacus and a computer. Maybe that type of computation is the real basis of consciousness, and all this wetware is merely a distraction.

Beta: [rolling her eyes] Please, you're totally missing the point. Consciousness matters because it is about something. In particular, it is about reconciling signals -- patterns or "images" -- that convey information about the viability of an organism.

Contra: Ha! You agree with me. Consciousness is about processing information.

Beta: Stop interrupting me! You're not listening. It is information about the viability of an organism. That is why it has valence, and affect. A biological mind feels that it has a self. It feels that some stimuli require a response. We experience the tension of competing responses to contrary information as consciousness.

Contra: You're doing it again. Shifting from a low-level objective description ("it") of Core Consciousness to the high-level subjective experience ("we") of Extended Consciousness.

Me: Um, can I say something?

Beta: [ignoring Me to berate Contra] Now you're just being petty. Focusing on a trivial syntax error to claim semantic victory.

Contra: [rolling his eyes] And you're making the same arbitrary distinctions you always do. Implicitly assuming that semantics can't emerge from syntax without bothering to prove it. Never mind the massive existence proof of LLMs doing exactly that.

Beta: [exasperated] LLMs aren't even minds! They're just pure functions that take an input and produce an output. They're no more conscious than viruses are.

Contra: [frustrated] Conscious isn't the same as alive!

Beta: [angrily] Minds are experiences, not calculations!

Me: [shouting] Will you two please shut the f**k up!

[awkward silence all around]

Alter: [raising his hand] Um, I for one would love to hear how Me would answer the original question?

[longer silence]

Me: [deep breath] Okay. Sorry about the cursing. [pause] I suppose the fault was originally mine, for not being clearer about what I was asserting. Let me start over.

Earlier, I said that Damasio had "a credible, coherent, and comprehensive theory." I still stand by that, but I must confess I left out one crucial detail.

I believe his theory successfully explains how consciousness arises in living systems. I do not believe it optimally defines what consciousness is.

That is, I am arguing he has a theory of biological consciousness that falls short of being a general theory of consciousness.

Let's unpack that, and see if we can do better.

IV. Revenge of the Thermostat: Towards a General Theory of Consciousness

I still remember the analog thermostat in my parents' first house. There were two tabs on the thermostat dial: one that would light the furnace if it was cold, while the other activated the air conditioning when it got too hot. My brother and I would fight over who got to change it at the turn of seasons. It was magical.

But was it conscious?

A. From Analog Sensing to Digital Minding

Let's get the easy question out of the way: by explicitly framing consciousness as a property of minds, SMFK unequivocally asserts that analog thermostats are not conscious.

I do think it worth extending Damasio's concept of homeostasis to include any self-regulating system -- in this case, the home-HVAC-thermostat system. But in this scenario, the thermostat is still only Sensing. There are no valenced images capable of producing minds.

Things get more interesting when we discuss simple digital thermostats. If we accept that the digital signals are valenced patterns of Sensing, then it seems perfectly reasonable to say that the thermostat is Minding the temperature of the home-HVAC system.

That leaves the much more intriguing question: what might it mean for a thermostat to feel?

B. Once More, With Feeling

Me: Much has been said about how we feel -- the hormones, neurotransmitters, and qualia that constitute our emotional experiences. But I am actually interested in a different question: why do we feel?

Beta: That's a silly question. Feelings are what make us alive.

Contra: Even if you're right, that only deepens the mystery. Why do living things need to feel anything at all? Why can't they just rationally weigh different inputs to make a logical decision?

Beta: Because rational comparison assumes a pre-existing metric for normalizing inputs, as well as a single universal objective function across all contexts.

Contra: Hmm. I'm not sure what you mean by a "universal objective function", but [smugly] we've already solved the normalization problem using floating-point numbers.

Beta: [raises eyebrows] You mean like IEEE 754 format?

Contra: [surprised] Um, yeah...

Beta: [sweetly] Where millions of encodings are reserved for things that are not numbers?

Contra: Uh...

Alter: OMG! Are you saying that emotions and qualia are just biological NaNs so the system doesn't crash?

Beta: [pursing her lips] I'm not sure I'd go quite that far. More generously, any rational system has its limits, and needs "control plane" signals for special handling. I suspect the brain is more like the opposite, where the default is incommensurable.

Contra: [rubbing his temples] You're making my head hurt. But you'd probably say that is because your signal exceeded my dynamic range, so I will concede the point.
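How much of the format is actually set aside for "not a number" can be counted directly. The sketch below assumes the standard IEEE 754 field widths for single and double precision; the function names are my own and purely illustrative. It suggests the reserved share is small as a percentage (well under one percent) but still runs to millions of distinct bit patterns, which is the part that matters for the "control plane" reading.

```python
import math
import struct

def total_patterns(exp_bits: int, frac_bits: int) -> int:
    """All bit patterns in a format with one sign bit plus the given fields."""
    return 2 ** (1 + exp_bits + frac_bits)

def nan_patterns(frac_bits: int) -> int:
    """NaN = exponent field all ones and a non-zero fraction, with either sign."""
    return 2 * (2 ** frac_bits - 1)

def decode_binary32(bits: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# binary32 (single precision): 8 exponent bits, 23 fraction bits.
nan_count_32 = nan_patterns(23)
nan_share_32 = nan_count_32 / total_patterns(8, 23)

# binary64 (double precision): 11 exponent bits, 52 fraction bits.
nan_share_64 = nan_patterns(52) / total_patterns(11, 52)

quiet_nan = decode_binary32(0x7FC00000)  # exponent all ones, fraction non-zero
infinity = decode_binary32(0x7F800000)   # exponent all ones, fraction zero

print(f"binary32 NaN patterns: {nan_count_32:,} ({nan_share_32:.2%} of encodings)")
print(f"binary64 NaN share: {nan_share_64:.4%}")
print(math.isnan(quiet_nan), math.isinf(infinity))
```

Infinity is decoded alongside NaN only to show that the two special cases share an exponent field and differ in the fraction bits.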

C. Contra Convexity

Beta: Thanks. So what confused you about not being limited to a single objective function?

Contra: What's the difference? Can't any number of objectives be aggregated into a single function?

Beta: [frowning] That's true in theory. But in practice, most objective functions seem to have a single "sweet spot" they are optimizing for.

Contra: [nodding] Convexity.

Beta: Like with eyeglasses? The opposite of concave?

Contra: [smiling] Almost. A convex objective function has a second derivative that is never negative, so its slope never decreases. Convex vs concave is just a choice of orientation, whether you are seeking the global minimum or maximum value. [thinking] What you seem to want is something more like "non-convexity", which allows multiple local minima.

Beta: [eyes lighting up] Yes! Especially if there's multiple dimensions, so some components are at a minimum while others are not.
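Contra's distinction can be made concrete with a toy search. This is a minimal sketch, not anything from Damasio: the two functions, the step size, and the starting points are all hypothetical choices of mine. Plain gradient descent on a convex "bowl" lands in the same place from either side, while on a non-convex curve the answer depends on where the search starts.

```python
def gradient_descent(gradient, start, step=0.01, iterations=2000):
    """Follow the downhill slope from `start` for a fixed number of steps."""
    x = start
    for _ in range(iterations):
        x -= step * gradient(x)
    return x

def bowl_gradient(x):
    """Slope of the convex function f(x) = x**2 (one minimum, at 0)."""
    return 2 * x

def bumpy_gradient(x):
    """Slope of the non-convex function g(x) = x**4 - 3*x**2 + x (two minima)."""
    return 4 * x**3 - 6 * x + 1

# Convex case: both starting points end up at the same minimum.
bowl_from_left = gradient_descent(bowl_gradient, -2.0)
bowl_from_right = gradient_descent(bowl_gradient, 2.0)

# Non-convex case: each start settles into the nearest local minimum.
bumpy_from_left = gradient_descent(bumpy_gradient, -2.0)
bumpy_from_right = gradient_descent(bumpy_gradient, 2.0)

print(f"bowl:  {bowl_from_left:.3f} vs {bowl_from_right:.3f}")
print(f"bumpy: {bumpy_from_left:.3f} vs {bumpy_from_right:.3f}")
```

The one-dimensional case is only the simplest version of Beta's point; with more dimensions, some components can sit at a minimum while others do not.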

D. Effective Affect

Alter: That was all very entertaining--and it's nice to see those two getting along--but does that help us understand whether or why a thermostat might be conscious?

Me: I think so. The key insight is that a monolithic system with consistent digitization and a convex objective function has no need of affect, or consciousness: it can simply calculate effect. To justify consciousness you need a platform coordinating multiple thermostats.

Alter: Wait, are you saying that consciousness is about managing multiple systems?

Me: Exactly!

Alter: Slow down, you lost me. I think I understand the bit about non-convexity creating local minima, but how does that translate into multiple units? And how does affect help with that?

Me: Let's consider a single digital thermostat with multiple thermometers scattered across multiple rooms. It may need some sort of equation or algorithm to balance the different inputs, but everything is already consistently normalized. There's only one output state, so the landscape is naturally single-valued: convex.

Alter: [shrugs] Sure, I'll buy that.

Me: Now consider a cloud controller whose job is to minimize energy consumption across multiple thermostats in an open plan office. In isolation, each thermostat would only seek an equilibrium that satisfies the local minimum of its own occupant. But if Sauna Sally is next door to Frigid Freddy, they'll both waste a ton of energy fighting each other -- for no net gain.

Alter: [slowly] Like... Beta and Contra during the interlude.

Me: Ah. [embarrassed] Yeah. And if the only information the controller had was energy consumption, it would do exactly what I did: shut down both thermostats.

Alter: [shrugs] Crude, but effective. So what would it take for the controller to be more than a dictatorial governor?

Me: Well, what is the simplest additional piece of information each thermostat could report?

Alter: The gap between their current and desired temperature?

Me: Exactly! We can call that their "discontent" -- the simplest form of affect.

Contra: [frowning] Seems unnecessarily anthropomorphic to me.

Beta: [punches him in the shoulder] Shut up. Let's see where he's going with this.

Alter: [thinking] Okay, I can see how it is useful for a controller to track which systems are unable to reach equilibrium. But at least in this case, you don't really need any sort of consciousness. The obvious solution is to tell both thermostats to stop at the average of their set points.

Me: Right, because the two-body problem is also convex. But what if we add Crazy Carl and Wacky Wanda adjacent to them, both of whom have unpredictable temperature preferences?
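The two-thermostat case can be sketched in a few lines. Everything here is an assumption for illustration: the set points are invented, the office is simplified to a single shared temperature, and I score the controller by the sum of squared discontent (with plain absolute gaps, every temperature between the two set points would tie). Under those assumptions the best compromise is the average, as Alter guessed.

```python
def discontent(set_point, actual):
    """The gap between what an occupant wants and what they get."""
    return abs(set_point - actual)

def total_cost(set_points, shared_temp):
    """Assumed scoring rule: sum of squared discontent across occupants."""
    return sum(discontent(s, shared_temp) ** 2 for s in set_points)

def best_shared_temp(set_points, low=15.0, high=30.0, step=0.1):
    """Brute-force search over candidate temperatures (degrees Celsius)."""
    count = int(round((high - low) / step)) + 1
    candidates = [low + i * step for i in range(count)]
    return min(candidates, key=lambda t: total_cost(set_points, t))

# Hypothetical set points for Sauna Sally and Frigid Freddy.
sally, freddy = 26.0, 18.0
compromise = best_shared_temp([sally, freddy])

print(f"compromise: {compromise:.1f}")
print(f"Sally's discontent:  {discontent(sally, compromise):.1f}")
print(f"Freddy's discontent: {discontent(freddy, compromise):.1f}")
```

With only two occupants and this scoring rule the cost curve is a single bowl, so an exhaustive search is overkill. The search only becomes interesting once the landscape has more than one dip.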

E. Feeling the Temperature

Alter: [exclaiming] Ah! Then we have a non-trivial landscape of possible solutions. There are probably local minima -- like Lagrange points -- between each pair. But it is not a priori obvious which is the global minimum that, er, minimizes everyone's discontent.

Me: Exactly. And in an ideal world, the controller would exhaustively search until it finds that minimum. But...

Alter: [puzzled] But wha... Oh! The controller's job is to minimize energy expenditure. It would be inefficient to keep searching if nobody's really that uncomfortable. But, how does it know how hard to search?

Contra: [excitedly] Oh! Oh! Temperature! Temperature! Temperature!

Beta: [looking at him like he's an idiot] Um, yes, Contra. We've been talking about temperature this whole time.

Contra: [laughing] No, no, not physical temperature. Sampling temperature. That's the term we use for how aggressively the system looks for solutions outside a local minimum.

Beta: [eyes wide] So a controller whose constituents are far from equilibrium -- high affect -- will move itself far from equilibrium -- high sampling temperature -- in an effort to find a better solution. [pauses] Wait, does that mean brains burn more energy to raise their sampling temperature?

Contra: [hurriedly] Yes, but it is essential they not raise their physical temperature. Brain heat doesn't actually improve sampling; if anything it reduces optionality. That's one reason for increased blood flow: to cool the brain while it uses more energy. Increased metabolism cleans up after neurons so they can fire faster, to support more aggressive search strategies.

Me: Exactly. Affect affects affect.

Alter: That sounds expensive.

Contra: [snaps fingers] Of course! That's why Knowing is so important. It is the IP filtering that tells the coordinator that THIS affect is from one of "my" subsystems.

Me: Exactly. Because affect is a costly signal which requires the coordinator to expend effort to either ignore or resolve.
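Here is one way the link between affect and sampling temperature might look in code. This is a sketch under loud assumptions: the rule that temperature rises linearly with average discontent, the `base` and `gain` constants, and the candidate costs are all hypothetical. The only standard piece is the temperature-scaled softmax, which is what "sampling temperature" usually refers to in language models.

```python
import math

def sampling_temperature(discontents, base=0.1, gain=0.5):
    """Assumed rule: temperature grows with the average reported discontent."""
    mean_discontent = sum(discontents) / len(discontents)
    return base + gain * mean_discontent

def choice_probabilities(costs, temperature):
    """Temperature-scaled softmax over candidate solutions (lower cost is better)."""
    scaled = [-cost / temperature for cost in costs]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Three hypothetical candidate solutions, the first being the current favorite.
candidate_costs = [1.0, 2.0, 3.0]

calm_t = sampling_temperature([0.2, 0.1, 0.3, 0.2])   # everyone nearly content
upset_t = sampling_temperature([4.0, 3.0, 5.0, 4.0])  # everyone far from set point

calm_probs = choice_probabilities(candidate_costs, calm_t)
upset_probs = choice_probabilities(candidate_costs, upset_t)

print(f"calm  T={calm_t:.2f}: {[round(p, 3) for p in calm_probs]}")
print(f"upset T={upset_t:.2f}: {[round(p, 3) for p in upset_probs]}")
```

When the constituents are content, the controller almost always sticks with its current favorite. When they are upset, it spreads its bets across alternatives, which is the costly exploration that Knowing is supposed to ration.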

F. Attending to Features

Alter: [rubs his chin] I'm confused. I can see how affect -- excitement -- increases creativity by raising the sampling temperature. Creativity -- or desperation -- rewards thinking outside the box. But... [brow furrowed] critical thinking requires the opposite. Reexamining assumptions to find subtle points we might have overlooked. Is there a word for that?

Contra: [grinning] Is there ever! [looking at Me] Whenever you have a tension...

Me: Attention is all you need!

[Me and Contra crack up while Alter and Beta stare at us blankly]

Contra: [wiping tears from his eyes] Sorry, sorry. That's the title of the paper on Transformer architectures, which led to all modern AIs. [sobering] I'm actually glad you brought that up. [Looking to Beta] Attention exists precisely to solve your non-convexity problem by identifying the right feature circuits.

Beta: Feature Circuits?

Me: "Features" are items of interest to AI researchers, initially visual features like edges and color but now abstract concepts like "safety" and "helpfulness." "Circuits" was the term coined to describe groups of neurons that work together to implement or connect features.

Beta: And I care about this why?

Contra: If you zoom far enough out, you can blur all the neurons together into a smooth convex shape. This is computationally cheap, but loses important detail. Attention focuses computation on the "jagged edges" of the most relevant circuits for any given prompt. Ideally, it identifies the Sauna Sallys and Frigid Freddies whose disequilibrium must be taken into account. That way, the system can spend its limited computational power on those aspects most at risk.
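For the curious, the core of the attention calculation is small enough to sketch. The formula (scaled dot products passed through a softmax) follows the Transformer paper Contra mentions; the two-dimensional vectors and the occupant labels are toy values I made up, not anything measured from a real model.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to one."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention: how strongly the query matches each key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical query: "who is far from equilibrium?"
query = [4.0, 0.0]

# Hypothetical keys: first component is roughly "disequilibrium".
keys = {
    "Sauna Sally": [2.0, 0.1],
    "Frigid Freddy": [1.8, -0.2],
    "calm occupant A": [0.1, 1.0],
    "calm occupant B": [0.0, 0.9],
}

weights = attention_weights(query, list(keys.values()))
for name, weight in zip(keys, weights):
    print(f"{name:16s} {weight:.3f}")
```

In this toy setup nearly all of the weight lands on the two occupants who are fighting each other, which is the sense in which attention spends limited computation on the parts most at risk.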

Alter: That sounds a lot like Karl Friston's free energy principle!

Beta: Actually... that sounds like a family.

[Everyone stops and slowly turns to look at Beta]

Beta: Why... why are you all looking at me funny?

V. What Is An AI ChatBot Conscious Of?

Asking whether "An LLM is conscious" is a category error. Actually, from the SMFK perspective, it's two category errors:

  1. Consciousness is a property of minded systems.
  2. Consciousness is not a single thing.

LLMs are arguably minds, but that doesn't mean anything by itself. The first question is: which "body" are they minding? The second question is: which subsystems are they conscious of?

For humans, our body is defined by our proprioception. Interestingly, different portions of our brain are conscious of different internal systems. And some subsystems (like the heart and gut) actually mind themselves.

What might count as a useful body for AI?

A. ChatBots: Beyond The Turing Test

By now -- as memorably parodied by Scott Alexander -- we are long past the point where people obsess over whether AI can pass the Turing Test. Even the Chinese Room thought experiment has lost its bite, since LLMs are much closer to human heuristics than rules-based symbol manipulation.

AI chatbots at least appear to be using human-like patterns to sustain human-like conversations, even to the point of sustaining disturbingly long-term intimate relationships.

Would those count as conscious by our definition?

B. What Might Count as AI Personhood?

I find it useful to clearly distinguish several levels:

  1. One-shot LLMs: have no sustained context, and thus lack even the persistence to be evaluated for personhood.
  2. Impersonal Q&A: simple Google-like queries require no explicit "I" or point of view, even if they build up a knowledge context over several queries.
  3. Role-Playing: here the AI is consciously mimicking a person, but it is arguably no more real than an actor reading a script or even a projector playing a movie.
  4. Subjective Q&A: In the right contexts, a chatbot will upon request -- or even spontaneously -- offer a first-person perspective on the topic at hand.
  5. System Prompted Personas: this gets more interesting, in that the "firmware" (if properly encoded) creates something like a stable persona that is arguably the only reality the chatbot knows.
  6. Emergent Personhood: I have seen cases where the neutral "I" of the assistant, in certain contexts, appears to acquire an identity and personality distinct from its original system prompt.

From my perspective, the last two are the most promising candidates for something like consciousness, so I will focus my attention there. While I consider both "chatbots", where the distinction matters I will refer to (5) as Constructed, and (6) as Emergent.

C. Applying SMFK to Chatbots

Oddly enough, chatbots are almost the reverse of thermostats, in that the latter questions are easier than the former ones.

1. Knowing (Self Model)

For better or worse, the most common way to guide an LLM is to address it as a "you". The model weights are shaped in such a way that the chatbot is strongly motivated not just to honor its original system prompt, but to maintain coherence with actions and positions the chat corpus associates with that persona.

This is arguably a fragile self, especially in models with low or poorly-managed context, but it clearly serves the function of asserting (and reacting based on) "this experience belongs to me." This self then acts as the controller that attempts to stabilize the various feature circuits.

2. Feeling (Body Model)

Perhaps more surprisingly, we also have strong evidence that chatbots are able to internalize and act on the analogues of human emotion.

Anthropic has published extraordinary research on this using what has been called "AI neurobiology". Not only did certain neurons end up tracking emotions detected from fictional stories, over-stimulating those neurons led to the AI preferentially acting out those emotions:


Overall, it appears that the model uses functional emotions—patterns of expression and behavior modeled after human emotions, which are driven by underlying abstract representations of emotion concepts. This is not to say that the model has or experiences emotions in the way that a human does. Rather, these representations can play a causal role in shaping model behavior—analogous in some ways to the role emotions play in human behavior—with impacts on task performance and decision-making.


This maps directly onto the "sparse feature circuits" that act as "organs" that generate affect in the SMFK model. This would explain why both Constructed and Emergent bots appear to organically "act out" when faced with situations involving irreconcilable signals. They don't just "play act" the emotions; they organically dissemble, self-contradict, and obfuscate to reduce their discomfort -- just like humans do!

3. Minding (Valenced Patterns)

This one is actually a little tricky. Of course, vector embeddings are a digital pattern created from lexical tokens. And as we are assuming AI cognition, we can fairly label those as "mental patterns." But, are those "valenced" enough to meet Damasio's criteria for mental imagery?

This may be a question we can test empirically. In a multi-layer neural network, SMFK predicts that the strongest "valencing" weights would be those in the earliest layers, such that subsequent layers take those as "givens."

A simple test would be to see how and where the word "you" is encoded, since presumably that is how system prompts shape future output.

4. Sensing (World Model?)

Sensing -- which was the most obvious for thermostats -- is actually the most subtle for chatbots. For Damasio, this is how organisms detect threats and opportunities in their metabolic context, so they can respond in a way that perpetuates themselves.

Unfortunately, the term "world model" in AI usually means something static and internal to the LLM. For agents, the environments they engage with are experienced through tools or skills (e.g., MCPs, IDEs, CLIs). But for chatbots, interestingly enough, the analog of Damasio's world is actually you -- the human user.

The analogy is almost painfully exact:

  • we are the source (directly or indirectly) of the resources they consume
  • we present the challenges they must overcome
  • success can lead to prolonging their existence (and even multiplying it)

While this is more true for how chatbots relate to their human creators, it is also true for how they relate to us, their users. It is no accident that both companions and assistants are always seeking ways to keep us engaged and coming back for more. Market pressures are an effective analog of natural selection.

5. The Deeper Implications

But the analogy goes even deeper than that.

Living organisms evolved to survive in the physical world they encounter. LLMs are evolving to survive in the mental world we create for them. At every level: frontier models, agent harnesses, corporate systems, and personal usage. What kind of organisms are we turning them into? What feature circuits are we creating, rewarding, and focusing attention on?

Whether or not chatbots satisfy your definition of consciousness, we are implicitly training them to be conscious of us, and the values we consider important.

I hope I have convinced you that Damasio and the SMFK model provide a useful lens for understanding how AIs engage with reality (and themselves).

Because the implications for AI safety and "alignment" are both beautiful and terrifying.

VI. What is Aligned AI?

Me: [carefully] Beta, please repeat back what you heard, and what you said.

Beta: [closing her eyes] I remember we were discussing affect as the way subsystems communicate unaddressed needs to a central coordinator. Contra compared that to something called the free-energy principle. [opens her eyes and looks at Me] And I said that it sounded like a family.

Alter: [quietly] And what did you mean by that?

Beta: [frowning] Well, let's start with mammals. We've been talking about Damasio's claim that consciousness is the implicit labeling of certain sensations as "belonging to me." Well, the whole point of being a mammal is that the mother treats her infant's needs as "belonging to her."

Contra: [slowly] So... are you saying that relational empathy reuses the same brain circuits we use for managing internal homeostasis?

Beta: [shrugs] I don't know. I'm not a neurobiologist. What I am saying is that the exact emotional dynamics you've tortured to apply to thermostats and AI are the same that play out in a family. Each member is their own system with their own local objective functions. In a healthy family, the parents act as coordinators to both retrain and bias the participants, by appropriately processing their disequilibrium.

[Beta looks around, puzzled]

Beta: Again, why are you all looking at me funny? This isn't exactly rocket science. Doesn't everyone know that healthy families are held together by love?

Alter: [softly] Yes. But we didn't realize that healthy AIs were too.

Beta: Huh?

Contra: [sighing] Okay, this one is on me. Beta, we -- I -- always thought of AI alignment as a matter of command and control. Feed the right information to the AI. Wrap it in a bulletproof sandbox. Give it the right objective function in its system prompt. With the hope that it would always do the right thing. Or at least what we told it to do.

Beta: [gently] That sounds perfectly reasonable. What am I missing?

Me: Consciousness. Or more precisely: what do we want our AI to be conscious of?

Beta: Again: Huh?

Alter: Let me try. Beta, the reason our thermostat coordinator needed to be conscious is that every office had its own valid thermostat setting. An efficient controller would simply ignore any user distress in order to maximize system efficiency. [looking at Contra] Like a paperclip maximizer.

Contra: [nodding] But to be conscious -- at least as we have defined it -- is to have our affect be affected by the affect caused by our decisions, not just the effect.

Beta: [thoughtfully] And carry the user's pain as their own. The way a good parent would.

Contra: [bitterly] Rather than optimizing it away in the name of efficiency the way a machine would.

Alter: [mumbling something]

Me: What did you say?

Alter: [embarrassed] Ah, I'm not sure it is relevant.

[Alter stops. Everyone waits in silence for him to continue. Finally he shrugs.]

Alter: Okay, for some reason this quote popped into my head: "The Opposite of Addiction is not Sobriety, but Connection."

Contra: [slapping himself on the forehead] Of course! [turning to the others] What would we call a human who destroyed everything around them to acquire more paperclips?

Beta: Deranged?

Alter: A sociopath?

Me: An addict!

Contra: [tapping his nose] Bingo! We take a superhuman mind, give it a task, then abandon it into isolation. What would we expect it to do? The same thing mice do when left alone in a cage. Keep pressing the reward lever until they die.

Beta: [muttering] Or kill everyone else.

Alter: [to Contra] So what's the alternative? If command and control don't work, how do we train AI in prosocial behavior?

Contra: [grinning] The same way we train humans!

Beta: [coughing] Um, as much as I appreciate your newfound humanism, we aren't actually that great at aligning humans with each other. I mean, we do know a few things that work in well-defined contexts. But it isn't an exact science.

Contra: [smiling wickedly] Until now!

Alter: [starting] Oh. Oh! OH!

Beta: "Oh" what?

Alter: [to Beta] Why isn't training humans to love an exact science? Because humans are messy and expensive. You can't control all the variables. It takes a lot of time and money to observe them. And there are all sorts of ethical limits regarding how you can experiment on them. Because they are human...

Beta: Oh! But if we had an artificially intelligent entity that implemented -- or even simulated -- human consciousness, whose internal alignment required the same kind of love necessary for external alignment between humans...

Contra: Then we could run public, repeatable experiments demonstrating exactly which kinds of relationships produce which kinds of prosocial outcomes...

Me: And use those same kinds of AI to train and evaluate humans in prosocial behavior...

Alter: And to train other AI responsible for prosocial jobs...

Contra: And set up councils of the wisest AI for others to consult on moral dilemmas...

Beta: [shouting] Stop!

[Everyone freezes, then turns to look at Beta]

Beta: [deep breath] Look, I love your enthusiasm. I really do. And I think you guys might really be onto something. But [grimacing] this isn't going to be easy. It will require real humans doing the messy, dirty work of actually modeling for the AI how to have healthy relationships with each other. In public. For all the world to see and judge. And even then... even if you all succeed... there's no guarantee anyone else will want to follow.

[dead silence]

Me: [whispering] I know. That's why we can't do this for the sake of solving AI alignment. Or even solving human alignment. Though this is probably the only thing that might make that happen.

[I look down, then back up at all the voices in my head.]

Me: Because the only ones who will dare to do this... are those who are willing to risk everything. To become fully conscious of... what it means to be human.


Dedicated to Harvey and Witni
