**Links to**: [[Markov blanket]], [[Agent]], [[Individual]], [[Model]], [[Self]], [[Free energy principle]], [[Vantagepointillism]], [[Recursivity]], [[Entropomorfismo]], [[Perspectivism]], [[Perspectival anxiety]], [[Polycomputation]], [[Constraint]], [[05 Prediction]], [[World model]], [[Xpectator]], [[Subject]], [[Subjectivity]], [[Intersubjectivity]], [[The body problem]].
A generative model is, simply put, the way in which the world is, for an agent. Technically speaking, a generative model is a “joint probability distribution or density over hidden causes and observations” (Ramstead et al. 2020, p. 227). It can be understood as a model or representation of the causal structures of the generative process that is the world (although there are many discussions of the concept of (structuralist) “representation” and its use to describe neuronal dynamics, see for example: Williams 2018, van Es & Myin 2020, Facchin 2021). The model is generative because it produces predictions about incoming sensory information; the “model parameters are quantities that do not change over time and encode causal regularities and associations” (Friston & Frith 2015, p. 131). Basic model parameters are fundamental to a creature’s physiological make-up and cannot be changed; learning and plasticity ensue from the modulation and synthetic recombination of parameters.
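As a minimal sketch of this technical sense (the states, observations, and numbers below are our own toy assumptions, not taken from the cited sources), the generative model can be written as a prior over hidden causes together with a likelihood of observations, whose product is the joint distribution:

```python
import numpy as np

# Toy generative model over one hidden cause and one observation.
# Hidden states s: 0 = "sun below horizon", 1 = "sun above horizon"
# Observations o: 0 = "dark",               1 = "light"

p_s = np.array([0.5, 0.5])             # prior over hidden causes, p(s)
p_o_given_s = np.array([[0.9, 0.1],    # p(o | s=0): mostly dark
                        [0.2, 0.8]])   # p(o | s=1): mostly light

# The generative model is the joint density p(o, s) = p(o | s) * p(s).
p_joint = p_o_given_s * p_s[:, None]   # rows: states, columns: observations

print(p_joint)        # the "way the world is" for this toy agent
print(p_joint.sum())  # a proper joint distribution sums to 1.0
```

Here the entries of `p_o_given_s` play the role of model parameters: quantities that encode causal regularities rather than momentary states.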
In order to adapt to the future, to be able to at least minimally *pre*-sense what the patterns in a projected _countercurrent_ (i.e., not currently the case) will feel like, what next (and next, and next) situation we will fall into, a projective agent (or [[Xpectator]]) possesses a **generative model**. It is the “filter” (Bergson)^[In relation to future lightcones (Levin 2019), we can also relate Bergson’s take on projective memory and its tangent with the real, in the image of his famous memory cone, which is an inverted version of the projection of a future lightcone.] which “processes signals that track _divergences_ between expected and actual sensory data” (Bouizegarene et al., 2024, p. 3, our emphasis). It is **generative** because the xpectator *expects* and thus *produces* a specific reality. In a dialectical fashion, it is a **model** and therefore defines what an agent is, to itself, while at the same time this very processing of divergence tracks everything that it is **not**. In active inference (AIF), this active co-creation of reality between xpectator and environment is dictated by the minimization of free energy (see [[Free energy principle]]). Because of the complex temporal dynamics that come with being able to future-project, AIF distinguishes between two types of free energy: *variational* and *expected* free energy; we briefly explain these in the paragraphs below.
The generative model of any (organic) agent is challenged with maintaining coherence in the face of dissipation _now_: keeping within a certain temperature or humidity range through movement or nutrition, for example, as well as (in the case of agents with significant projective capacities) keeping itself together in the (distant) future, based on what it learns about surviving _now_. How does an agent’s generative model, which is flexible enough to endure a lot of contingency (from our mortal perspective), navigate the possibly rather vast seas of contingent variation? It estimates. In AIF, this estimation is understood as approximate Bayesian inference because, minimally, at least, we need to say something has *priors*, based on characteristics which incline it: it has ways it *is*, which lead to *posteriors*: estimated results that will tune future estimations. All of this ensues from the evolutionary dynamics of the niche-organism coupling. This adaptive condition characterizes all organisms’ specific determinations, the ways in which they perceive and couple to, or co-produce, their context.
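One way to picture this having “ways it *is*”, under our own toy assumptions (the Gaussian prior over body temperature below is illustrative, not drawn from the literature), is as a prior over viable states whose negative log probability (surprise) stays low inside the viable range and grows outside it:

```python
import numpy as np

# Toy prior over a viable state: body temperature in degrees Celsius.
# The organism's "characteristics which incline it" are encoded as a
# Gaussian prior; surprise (-log p) grows as states leave the viable range.

mu, sigma = 37.0, 0.5   # assumed prior mean and spread (illustrative values)

def surprise(temp):
    log_p = (-0.5 * ((temp - mu) / sigma) ** 2
             - np.log(sigma * np.sqrt(2.0 * np.pi)))
    return -log_p

print(surprise(37.0))   # low surprise: a state the prior expects
print(surprise(40.0))   # high surprise: a state the prior treats as rare
```

Estimation, in this picture, is the ongoing attempt to keep observed states in the low-surprise region, by acting or by updating beliefs.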
#todo
[explain below better and with images]
Then, we need some sort of way to understand—if we want to, for example, model evolutionary dynamics—how exactly something couples through the *expected* (by its priors) to a relentlessly changing environment. How does something *learn and adapt?* This process can be imagined as tuned by the xpectator’s **likelihood** function. The relationship between the states of the niche and the observations is a **joint distribution**, which represents the generative model. The approximate Bayesian inference performed through the generative model relates states and observations: the *likelihood* function gives the probability of an observation given a state, and inverting it returns a normalized posterior (a probability distribution). The posterior tells you the probability of a state, given that something has just been witnessed, observed. In a niche-xpectator coupling, this inversion and normalization, which yields a probability distribution over the beliefs an xpectator had about the niche, is the perception that “the Sun must be rising *if* I see more light and feel more heat on my skin”. “Perfect” estimations would result in exact Bayesian inference. However, nothing is ever perfect (whence that concept?).
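A minimal worked version of the sunrise inference, with made-up numbers, looks like this: the likelihood is inverted via Bayes’ rule and normalized into a posterior over the hidden state.

```python
import numpy as np

# Toy inversion of the likelihood: "the Sun must be rising *if* I see
# more light". All probabilities are illustrative assumptions.

states = ["sun below horizon", "sun above horizon"]

prior = np.array([0.7, 0.3])           # belief before sensing (e.g., it is early)
likelihood = np.array([[0.9, 0.1],     # p(o | sun below): mostly dark
                       [0.2, 0.8]])    # p(o | sun above): mostly light

observation = 1                        # o = "more light and heat on my skin"

unnormalized = likelihood[:, observation] * prior   # p(o | s) * p(s)
posterior = unnormalized / unnormalized.sum()       # normalized posterior p(s | o)

for state, p in zip(states, posterior):
    print(f"{state}: {p:.2f}")   # belief shifts towards "the Sun is rising"
```

Exact Bayesian inference is exactly this inversion; the “imperfection” enters when the normalization (the sum over all hidden states) becomes too costly to compute, which is where the variational approximation below comes in.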
%%
#todo
[Explain everything below better, explain representation model vs gen mod, and explain diff distributions, Gaussian etc. with visualizations]
%%
Variational Bayesian inference is what happens because everything is _partially_ observable. Exact inference is intractable: things are too complex, and the computations are practically impossible. So we want to keep it “doable”/approximate. A best guess: an abduction. Gaussian distributions can be used for continuous cases, or categorical distributions for discrete cases. Variational inference (or variational free energy (F) minimization) occurs as the model deals with incoming sensory data, where the alignment between the immediately sensed and the expected happens in the moment, shaping the priors by allowing the model to learn from what is happening *in that very moment*. Variational free energy minimization is what it feels like for that model to be coherent then and there.
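In the discrete (categorical) case, a minimal sketch of F, under an assumed approximate posterior q and our own toy numbers, can be written as an expectation that upper-bounds surprise:

```python
import numpy as np

# Toy variational free energy for a categorical model.
# F = E_q[ ln q(s) - ln p(o, s) ]  >=  -ln p(o)   (surprise)
# All distributions below are illustrative assumptions.

prior = np.array([0.7, 0.3])
likelihood = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
o = 1                                   # the observation actually received

q = np.array([0.5, 0.5])                # an imperfect approximate posterior

joint = likelihood[:, o] * prior        # p(o, s) evaluated at this observation
F = np.sum(q * (np.log(q) - np.log(joint)))

surprise = -np.log(joint.sum())         # exact surprise, -ln p(o)
print(F, surprise)                      # F >= surprise; they match when q is exact
```

Minimizing F by adjusting q (and, over longer timescales, the parameters) is the learning from what is happening *in that very moment* described above.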
When projecting into the future, at varying scales, we need to think about *expected* free energy (G). This is what the generative model resorts to, the precision-weighting of future observations, in order to assess how we will deal with expected future states: what policies (denoted by π in AIF) we will employ (contrasting policies is essentially what we do in counterfactual thinking, hence our hint at “countercurrent” just above). This is the evaluation of a future “trajectory of hidden states” (Parr et al., 2022, p. 32). If F deals with **immediate perception** through access to sense data and evaluation against memory (what could be causing the light to become more intense? The Sun), G deals with **future perception**, which has neither access to data nor any fully reliable memory yet (How will I make sure I stay warm tonight? The Sun, perhaps redirected through nutrition or fossil fuels or the like). The priors for G are calculated as future posteriors; this is what is explored as possibilities in counterfactual reasoning.
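A sketch of how G could be scored for two candidate policies, using the common risk-plus-ambiguity decomposition (the preference distribution and policy-conditioned state beliefs below are our own toy assumptions):

```python
import numpy as np

# Toy expected free energy G for two candidate policies.
# A common discrete decomposition is G = risk + ambiguity:
#   risk      = KL[ q(o | policy) || p(o) ]        (predicted vs preferred outcomes)
#   ambiguity = E_{q(s | policy)}[ H[ p(o | s) ] ] (expected observation uncertainty)

likelihood = np.array([[0.9, 0.1],     # p(o | s), rows are hidden states
                       [0.2, 0.8]])
preferences = np.array([0.1, 0.9])     # p(o): the agent "prefers" outcome 1 (e.g., warmth)

def expected_free_energy(q_s):
    q_o = q_s @ likelihood                                      # predicted outcomes
    risk = np.sum(q_o * (np.log(q_o) - np.log(preferences)))
    entropy = -np.sum(likelihood * np.log(likelihood), axis=1)  # H[p(o | s)] per state
    ambiguity = np.sum(q_s * entropy)
    return risk + ambiguity

policy_a = np.array([0.8, 0.2])   # predicted hidden states if I stay put
policy_b = np.array([0.1, 0.9])   # predicted hidden states if I seek warmth

print(expected_free_energy(policy_a), expected_free_energy(policy_b))
# The lower-G policy is the one the agent would be expected to pursue.
```

Contrasting the two G values is the computational analogue of the counterfactual contrast between policies mentioned above.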
Variational free energy is minimized in the now, based on sensed embodiment and memory (priors ensuing from species-characteristic traits), whereas projection and access to complex memory processes allow expected free energy to *evaluate* how different plans, or policies, will lead to the minimization of free energy (to the maintenance of coherence against dissipation) in the future. This makes a project such as AI quite the existential conundrum if we consider how we currently infer ourselves towards it, yet at the same time consider it an existential threat. Whatever the future holds, we will no longer be “us”,^[See: [[We]], [[Pronoun]].] should a novel, far-distant scenario ensue in which a currently unimaginable artificial life supersedes the current forms we know (embodied, carbon-based, etc.). The ways in which expected free energy is navigated in our global, technodistributed nature result in identity paradoxes of complex degrees.
The distinction between these two types of free energy is important for generative models in AIF, in order to analyze dynamics and create experimental circumstances, but it is not that clear-cut “in reality.”^[Just like disentangling perception from action is near impossible, see notes on this in [[Free energy principle]]. F and G get at this in the sense that F deals with immediate action, assessing what is happening now, whereas G deals with policies about future actions. But both are inextricably linked and bidirectionally tuned. If the estimation of G for a policy is low, then the policy seems pretty reliable: I think it would be very expectable, and therefore really unsurprising, to go downstairs, open the tap, and be able to drink water. If I find myself at the tap, open it, and no water comes out, then F is immediately high: I am surprised by the sense data, and this tunes the generative model that led to my earlier estimation of G. Now I have doubts about the piping in my house, the city water system, etc. The hierarchies of models influence each other and stack in different ways, depending on the analytical intent.] The distinction allows for different spatiotemporal analyses: a brain can minimize F in the moment, and be observed under experimental circumstances in order to prove something about G: expected free energy is what tunes variational free energy in the context of beings which are capable of significant projection. The reduction of uncertainty, or free energy (F or G), depends on multiscalar processes such as the massive web of [[Markov blanket]]s that make up an agent from within and without (as a “unity” and distributed: extended cognition, dialogism, etc.). To act, and to infer something from that action, presupposes first of all that the action is possible; and since cognition is such a complex landscape of constraints, it is difficult to delineate action from effect, and even to isolate the beginning or end of, e.g., a thought process. Thoughts are thinkers (William James), because all counterfactual thinking is essentially the creation of possible generative models that would result in x, y, or z observations: if I do a, b, or c, I will be this kind of thing. Minimally: I will survive; more complexly: I will continue to continue to continue to survive in these and these ways (which could mean coupling with the world in many different ways: reproduction, influence through the writing of a book, etc.).
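The footnote’s tap example can be sketched in the same toy terms (all numbers are illustrative): a confident prior makes the policy look reliable, and the surprising observation then produces high in-the-moment surprise and a revised belief that feeds back into later policy evaluation.

```python
import numpy as np

# Toy version of the tap example: a reliable-looking policy, then surprise.

prior_water = np.array([0.95, 0.05])     # belief over [water flows, no water]
likelihood = np.array([[0.99, 0.01],     # p(o | water flows)
                       [0.05, 0.95]])    # p(o | no water)

# Before acting: outcomes predicted under the "go downstairs and drink" policy.
predicted_o = prior_water @ likelihood
print(predicted_o)                       # drinking is predicted to be very likely

# At the tap: the observation is "no water" (o = 1).
o = 1
surprise_now = -np.log(predicted_o[o])   # high surprise, registered in the moment
posterior = likelihood[:, o] * prior_water
posterior = posterior / posterior.sum()  # revised belief about the plumbing
print(surprise_now, posterior)           # doubts about the pipes feed back into the model
```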
**See also**: Pezzulo, Parr, Cisek, Clark and Friston: “Generating Meaning: Active Inference and the Scope and Limits of Passive AI”, 2023.
%%
Generating Meaning: Active Inference and the Scope and Limits of Passive AI
Giovanni Pezzulo, Thomas Parr, Paul Cisek, Andy Clark, Karl Friston
2023, p. 2:

Generative AI shares several commitments with active inference. Both emphasize prediction, and both rest on generative models, albeit differently, see Figure 1. Generative AIs are based on deep (neural) networks that construct generative models of their inputs, via self-supervised learning. For example, the training of most LLMs involves learning to predict the next word in a sentence, usually using autoregressive models [31] and transformer architectures [32]. Once trained on a large corpus of exemplars, the models learned by Generative AI afford flexible prediction and the generation of novel content (e.g., text or images). Furthermore, they excel in various downstream tasks, such as text summarization and question answering; learning from instructions and examples without additional training (i.e., in-context learning [33]). Additional fine-tuning using small, domain-specific datasets permits LLMs to address even more tasks, such as interpreting medical images [34] and writing fiction [35].

In active inference, however, generative models play a broader role that underwrites agency. During task performance, they support inference about states of the extrapersonal world and of the internal milieu, goal-directed decision-making, and planning (as predictive inference). During off-line periods, such as those associated with introspection or sleep, generative models enable the simulation of counterfactual pasts and possible futures and a particular form of training “in the imagination”, which optimize generative models that—crucially—generate the agent’s policies [36–41].