07.10.2024

Summary Report of "The Active Self" Meeting at the Santa Fe Institute

March 11, 2024 – March 15, 2024. Santa Fe, USA

Organised by:

  • Nihat Ay, Hamburg University of Technology, Leipzig University, Santa Fe Institute
  • Carlotta Langer, Hamburg University of Technology
  • Jesse van Oostrum, Hamburg University of Technology


Invited participants:

  • Larissa Albantakis, University of Wisconsin–Madison
  • Maell Cullen, Santa Fe Institute
  • Valentin Forch, Chemnitz University of Technology
  • Amos Golan, American University and Santa Fe Institute
  • David Krakauer, President of the Santa Fe Institute
  • Thomas Martinetz, University of Lübeck
  • Melanie Mitchell, Santa Fe Institute
  • Thomas Parr, University of Oxford
  • Daniel Polani, University of Hertfordshire


Meeting Info:

https://www.santafe.edu/events/active-self


Meeting Summary:

Recent developments in machine learning have stimulated the discussion around the possibility of strong artificial intelligence. One question we addressed during the working group is whether we are comfortable assigning human qualities such as consciousness or a self to such systems. In contrast to these systems, we as biological agents are grounded through our bodies and actions in a physical environment. Hence, we asked how phenomenal experiences and the sense of self depend on this embodiment. To approach these questions, we focused on information-theoretic theories that aim to explain minimal agents. These theories include Integrated Information Theory, the Active Inference framework, and concepts such as generative models and empowerment.

Following a general introduction to the aims and structure of the German priority program 'The Active Self' (German Research Foundation, project number 2134), which inspired and partly funded the working group, Valentin Forch gave a conceptual overview of the minimal self. Within cognitive science and philosophy, the minimal self refers either to the subject of phenomenal experience or to the object of self-related beliefs. Forch highlighted that metrics and phenomena related to the self do not point towards a singular underlying construct, questioning whether there can be a unified concept of self.

Nihat Ay subsequently discussed the situatedness of an agent in an information-theoretic setting. He assumed a generative model within the agent's brain that generates observations from latent states. Usually, the objective of these models is to produce a distribution over observations that matches the real-world distribution, an objective typically optimized via the evidence lower bound (ELBO). However, these objectives completely ignore the outside world. Therefore, he argued, the primary objective of the generative model should be formulated in terms of world states, and the ELBO and the distance between the generated and true distributions should be regarded as approximations to this primary objective function.
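
For reference, the standard objective mentioned here can be written in generic notation (the symbols below are illustrative, not taken from the talk): for observations $o$ and latent states $z$, the ELBO lower-bounds the log-evidence of the model,

```latex
\log p_\theta(o)
\;\ge\;
\mathbb{E}_{q_\phi(z \mid o)}\bigl[\log p_\theta(o \mid z)\bigr]
\;-\;
D_{\mathrm{KL}}\bigl(q_\phi(z \mid o) \,\|\, p_\theta(z)\bigr)
\;=:\; \mathrm{ELBO}(\theta,\phi;o).
```

Maximizing this bound matches the generated distribution over observations to the data distribution, but the objective itself makes no reference to the underlying world states.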

Carlotta Langer opened the second day with an introduction to Integrated Information Theory (IIT). IIT's axiomatic basis is derived from thought experiments about phenomenal consciousness. These axioms are translated into so-called postulates about the mechanisms underlying consciousness, which are in turn formalized as mathematical expressions. Different versions of the theory differ mainly at the level of the postulates and their mathematical formalization.

Larissa Albantakis continued with a discussion of the self and autonomous agents in light of IIT. By her working definition, agents have stable, self-defined, and self-maintained causal borders. She elucidated ways to identify these borders from an intrinsic, observer-independent perspective. Further, she highlighted the recurrence of information flow as a necessary condition for autonomy and showed examples of evolved artificial agents.

In the afternoon, Langer presented an information-theoretic framework for quantifying information flows among different parts of agents and their environments. This includes a measure of controller complexity (CC), similar to integrated information. In addition, morphological computation (MC) describes the reduction of the computational cost for the controller resulting from the interaction of the agent's body with its environment. Simulations suggest that these measures are inversely related: agents that can exploit the properties of their environment through MC require less complex controllers. However, greater CC may lead to more efficient MC.

Daniel Polani introduced the concepts of relevant information and empowerment, the former defined as the part of the sensory input that is used for action selection, and the latter as the maximal mutual information between potential actions and resulting sensory states (given the current world state). He used these concepts to discuss how agents can be optimized to fit into the informational niche of their environment.
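
As an illustration, for a discrete one-step setting, empowerment is the channel capacity of the action-to-sensor channel $p(s' \mid a)$, which can be computed with the Blahut-Arimoto algorithm. The following is a minimal sketch under that assumption; the function name and the example channel are hypothetical, not taken from the talk:

```python
import numpy as np

def empowerment(p_s_given_a, iters=200):
    """Empowerment as the capacity of the action -> next-sensor channel,
    computed with the Blahut-Arimoto algorithm (result in nats).
    p_s_given_a[a, s] = p(s' = s | action = a)."""
    n_a, n_s = p_s_given_a.shape
    p_a = np.full(n_a, 1.0 / n_a)            # start from a uniform action distribution
    for _ in range(iters):
        # marginal over sensor states under the current action distribution
        p_s = p_a @ p_s_given_a
        # Blahut-Arimoto update: reweight actions by exp(KL(p(s'|a) || p(s')))
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(p_s_given_a > 0,
                                 np.log(p_s_given_a / p_s), 0.0)
        p_a = p_a * np.exp((p_s_given_a * log_ratio).sum(axis=1))
        p_a /= p_a.sum()
    # mutual information I(A; S') at the optimizing action distribution
    p_s = p_a @ p_s_given_a
    mi = 0.0
    for a in range(n_a):
        for s in range(n_s):
            if p_a[a] > 0 and p_s_given_a[a, s] > 0:
                mi += p_a[a] * p_s_given_a[a, s] * np.log(p_s_given_a[a, s] / p_s[s])
    return mi

# A noiseless binary channel: two actions lead deterministically to two distinct
# sensor states, so empowerment is log(2) nats (one bit of influence on the world).
channel = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
print(empowerment(channel))   # ~0.693 = log 2
```

A channel whose rows are identical (actions have no effect on the sensor) yields empowerment zero, matching the intuition that an agent with no influence on its sensory future is unempowered.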

On Wednesday, Ay introduced measures of integrated information that can be interpreted from an information-geometric perspective. These measures try to capture the extent to which the whole system is more than the sum of its parts. One of these measures is based on conditional independence statements and satisfies all of the properties postulated as desirable. Unfortunately, it does not have a graphical representation, which makes it more difficult to analyze. Ay introduced an alternative that explicitly includes an external influence on the system as a latent variable. This measure has a graphical interpretation and satisfies all the required conditions for a feasible measure in this context.
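
A common information-geometric formalization of this idea (a generic sketch, not necessarily the exact measures presented) quantifies the integration of a Markov transition kernel $p(x' \mid x)$ as its divergence from the closest "split" kernel, in which each node depends only on its own past:

```latex
\Phi(p) \;=\; \min_{q \in \mathcal{M}_{\mathrm{split}}}
D_{\mathrm{KL}}\bigl(p(x' \mid x)\,\big\|\,q(x' \mid x)\bigr),
\qquad
\mathcal{M}_{\mathrm{split}} \;=\;
\Bigl\{\, q \;:\; q(x' \mid x) \;=\; \prod_i q_i\bigl(x_i' \mid x_i\bigr) \Bigr\}.
```

The measure vanishes exactly when the system factorizes into non-interacting parts, and grows with the amount of cross-node information flow.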

Jesse van Oostrum discussed how different aspects of the active self can be understood as the consequence of a simple learning rule of a generative model. Assuming a simple static and passive observer, the formation of (self) concepts can be understood as finding appropriate latent variables. For a passive observer in time, the same objective can explain the need for memory and therefore a memory of self. Lastly, the sense of agency can be understood by conditioning the model on the actions the agent might take.

Thomas Parr introduced the concept of a Markov blanket as a system-defining boundary. He gave a broad overview of how the outside world, observations, brain signals, internal states, and actions all play together in the Active Inference framework. Under Active Inference, the self can be understood as a part of the world model inherent to the agent. This includes a model of how the agent influences its environment and affects the observations it receives.
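
In this framework, perception is typically cast as the minimization of a variational free energy; in generic notation (a sketch of the standard formulation, not specific to the talk), with observations $o$, hidden states $s$, generative model $p$, and approximate posterior $q$:

```latex
F[q] \;=\; \mathbb{E}_{q(s)}\bigl[\log q(s) - \log p(o, s)\bigr]
\;=\; D_{\mathrm{KL}}\bigl(q(s) \,\|\, p(s \mid o)\bigr) \;-\; \log p(o)
\;\ge\; -\log p(o).
```

Minimizing $F$ therefore simultaneously improves the evidence of the agent's model and makes $q$ approximate the posterior over hidden states behind the Markov blanket.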

Van Oostrum and Langer then zoomed in on the equations for active inference in discrete time. All facets of the action selection procedure were described and illustrated using an example of a race car avoiding touching the walls of the environment.

On Thursday, Melanie Mitchell and Thomas Martinetz gave presentations concerning the level of “understanding” exhibited by deep neural networks (DNNs). Mitchell questioned whether DNNs demonstrate genuine understanding or merely excel at pattern matching for inputs encountered during training. She highlighted that DNNs are easily deceived or significantly underperform when presented with out-of-distribution inputs. She suggested that achieving greater robustness requires conceptual thought, abstraction, and reasoning, which are currently lacking in DNNs.

Martinetz introduced a specific neural network architecture, Multi Relation Networks (MRNs), and compared them with mainstream transformer architectures. Central to his presentation was the finding that population coding, a simple locality-preserving recoding technique akin to that employed by the brain, significantly increased the performance of MRNs, even on abstract reasoning tasks.

The working group concluded with a synthesis on Friday divided into two segments. First, questions that arose during the week were addressed. These included the foundations and methodology of IIT and related concepts such as prediction and active inference. Furthermore, the problem of distinguishing between “understanding” and “generalization” was raised. Mitchell differentiated between the concepts of generalization in machine learning and in psychology: the former refers to a test set that is in distribution, whereas for the latter it is out of distribution. In this context, Polani highlighted the importance of counterfactual predictions for true understanding, stating that understanding is necessarily interventional while generalization is not. These concepts were then discussed in relation to overfitting, large language models, and the importance of communication.

The second part of the final discussion concerned the self and started with a summary of the different approaches to the self introduced during the week. These included the self as beliefs, as counterfactual predictions, as inverse empowerment, and as analyzed from an information-theoretic perspective. Parr defined the self from an Active Inference perspective as consisting of three components: (a) a system boundary that depends on the generative model of the system itself, (b) the sensory data the system receives, which depends on the specific embodiment, and (c) the action policy of the system. Albantakis highlighted that the sense of self within IIT, on the other hand, can be treated as a content of the experience. This theory predicts that there should be a “stable” complex, a system with highly integrated information, while we are awake.

In general, the feedback we received about the meeting was very positive. Participants appreciated the in-depth discussions and the interactive presentations that this working group format offers. When organizing a future working group, we would try to incorporate the plenary talks taking place at SFI during that week into the program, so that the participants could also join these talks.
Since this was an international working group, many participants arrived late Sunday night from Europe. For this reason, we scheduled no program on Monday morning so that people could recover from the trip. We felt this was a good decision.
Unfortunately, we had a few last-minute cancellations. Given the small number of participants, this had a significant influence on the schedule. Next time, we would try to stay in closer touch with the participants in the time leading up to the working group in order to prevent this.