Do highly over-parameterized neural networks generalize since bad solutions are rare?
Thomas Martinetz, Universität zu Lübeck
Abstract
Traditional machine learning wisdom tells us that one needs more training data points than there are parameters in a neural network to learn a given task. Learning theory based on the VC-dimension or Rademacher complexity provides an extended and deeper framework for this "wisdom". Modern deep neural networks have millions of weights, so one should need extremely large training data sets. That is the common narrative. But is it really true? In practice, these large neural networks are often trained with much less data than one would expect to be necessary. We show in experiments that even a few hundred data points can be sufficient for millions of weights, and we provide a mathematical framework for understanding this surprising phenomenon, challenging the traditional view.
Super-efficiency in complex collective systems
Quanyang Chen, University of Sydney
Abstract
Self-organising collective systems, such as flocks of birds, shoals of fish and the brain, often operate near the "critical regime" between order and disorder. To understand why this occurs, we study four intrinsic utilities: predictive information, empowerment, variational free energy (active inference) and thermodynamic efficiency. These measures evaluate the usefulness of behaviour independent of external rewards. In this talk, I will briefly introduce each measure and compare them using the same example (the Ising Model). We use numerical simulations to identify the optimal parameter values under each intrinsic utility framework, and I will discuss what the results reveal about the utilities of operating at the critical regime.
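The Ising model the talk uses as its common example can be simulated with a minimal Metropolis sketch; lattice size, sweep count, and temperatures below are illustrative assumptions, showing order below the critical temperature and disorder above it.

```python
import numpy as np

def ising_abs_magnetization(T, L=16, sweeps=400, seed=1):
    """Metropolis sampling of the 2-D Ising model (periodic boundaries).
    Returns |magnetization| of the final configuration, starting ordered."""
    rng = np.random.default_rng(seed)
    s = np.ones((L, L), dtype=int)  # start fully ordered
    for _ in range(sweeps * L * L):
        i, j = rng.integers(L), rng.integers(L)
        nb = s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]
        dE = 2 * s[i, j] * nb  # energy cost of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i, j] = -s[i, j]
    return abs(s.mean())

# The exact critical temperature is T_c = 2 / ln(1 + sqrt(2)) ≈ 2.269.
m_low, m_high = ising_abs_magnetization(1.5), ising_abs_magnetization(4.0)
print(f"|m| at T=1.5: {m_low:.2f}   |m| at T=4.0: {m_high:.2f}")
```

Sweeping the temperature through T_c and evaluating an intrinsic utility at each point is, in miniature, the kind of comparison the talk carries out.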
Language Models: A Look into Their History, Scope, and Pitfalls
Swarnadeep Bhar, IRIT, Toulouse Institute for Research in Computer Science
Abstract
With the release of ChatGPT, we have seen a significant surge of interest in language models. While the central theory behind these models has been around for quite some time, large language models show that scaling to large amounts of data, and aligning the models with human responses, can unlock jumps in performance previously unseen. They perform impressively on a range of benchmarks that were previously thought "impossible" for machine learning techniques, but a range of new problems arises: "hallucinated" responses and the tendency of these models to give affirmative answers to any instruction limit their blind deployment in critical scenarios. In this talk, we will explore the core principles behind these models, their impressive capabilities, and the challenges they pose. We will also discuss key considerations to keep in mind when deploying these models for personal or critical use cases.
Towards Symmetry-Based Structure Extraction using Generalized Information Bottlenecks
Hippolyte Charvin, University of Hertfordshire
Abstract
Extraction of structure, in particular of group symmetries, is increasingly crucial to understanding and building intelligent models. In parallel, some information-theoretic models of complexity-constrained learning have been argued to induce invariance extraction. Here, we formalise and extend the study of group symmetries through the information lens, by identifying a certain duality between probabilistic symmetries and information parsimony. Namely, we characterise group symmetries through the full information preservation case of Information Bottleneck-like compressions. More precisely, we require the compression to be optimal under the constraint of preserving the divergence from a given exponential family, yielding a novel generalisation of the Information Bottleneck framework. Through appropriate choices of exponential families, we characterise (in the discrete and full support case) channel invariance, channel equivariance and distribution invariance under permutation. Allowing non-zero distortion then leads to principled definitions of ``soft symmetries'' as the exact symmetries of a compressed representation of data. In simple synthetic experiments, we demonstrate that our method successively recovers, at increasingly compressed ``resolutions'', nested but increasingly perturbed equivariances, where new equivariances emerge at bifurcation points of the distortion parameter. Our framework provides the area of probabilistic symmetry discovery with a theoretical clarification of its link with information parsimony, and with a basis on which to potentially build new computational tools.
Operator Approximation by Neural Networks
Janek Gödeke, University of Bremen
Abstract:
Learning an operator between function spaces with neural networks is desirable, for example, in the field of partial differential equations (PDEs): for instance, learning parameter-to-state maps that send the parameter function of a PDE to the corresponding solution. During the last five years, several deep learning concepts have emerged, such as Deep Operator Networks, (Fourier) Neural Operators, and operator approximation based on Principal Component Analysis.
These approaches, particularly the architectures of the involved neural networks, are inspired by universal approximation theorems, which state that many operators can be approximated arbitrarily well by sufficiently large networks of these types. Although universal approximation theorems cannot fully explain the success of neural networks, their investigation has led to powerful deep learning approaches. Furthermore, they reveal that certain network architectures may add a beneficial inductive bias to operator learning tasks.
In my talk I will give an overview of the state of the art of such operator approximation theorems, and I will discuss general concepts as well as questions that have not yet been answered.
The Role of the Robotic Body in Learning Agile & Accurate Control
Dieter Büchler, Max Planck Institute for Intelligent Systems
Abstract:
Despite decades of robotics research, current robots still struggle to acquire general and flexible dynamic skills on a human level. Tasks, such as table tennis, represent this set of dynamic problems that appear easy to learn for humans but pose a steep challenge for anthropomorphic robots. In this talk, I will argue that the robotic body plays a crucial role in the generation of such skills. In particular, muscular actuation (i) enables robust long-term training, such as is required with reinforcement learning, and (ii) fail-safe execution of explosive motions that allow robots to safely explore dynamic regimes. Stay tuned for table tennis playing, ball smashing, and precisely controlled soft muscular robots.
Pattern formation and critical regimes during social and epidemic dynamics
Mikhail Prokopenko, University of Sydney
Abstract:
We will discuss pattern formation and critical regimes during spatial contagions of four different types: epidemics, opinion polarisation, social myths, and social unrest. The presented model combines Maximum Entropy principle with Lotka-Volterra dynamics, and the results are analysed using methods of percolation theory. The identified critical regimes separate distinct phases, implying that small changes in individual risk perception could lead to abrupt changes in the spatial morphology of the epidemic/social phenomena.
Random walks on networks (toward information geometry)
Giulia Bertagnolli, University of Trento
Location: Blohmstraße 15 (HIP One), 5th Floor, Room 5.002
Time: 16:15
Abstract
Complex physical and social systems find a handy representation in terms of graphs, which, in this context, are called complex networks. Entities in these systems naturally "communicate", or exchange "information": a group of people interacting via email, or sharing links, liking posts, and following each other on social platforms, exchange information as part of their social life. Neurons, connected by synapses and fibre bundles, exchange neurophysiological signals, enabling cognition. In fish schools, aggregations of fish that come together in an interactive, social way, the (possibly passive) communication between individuals allows them to act as a super-system. All complex systems show some emergent behaviour that cannot be ascribed to the actions and behaviour of their individual components. This emergent behaviour is a function of both the interaction patterns, i.e. the links in the graph, and the communication strategy, which can be modelled as a dynamical process on the network. In this talk, we will first see how Markovian random walks on networks model diffusion dynamics in a complex system and why this approach is useful in network science. Then we will see an example of a non-Markovian random walk, which mimics the run-and-tumble motion of bacteria. Eventually, it should become clear how all this led me here, trying to learn information geometry.
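The Markovian random walk the talk starts from fits in a few lines; the five-node graph below is a hypothetical toy, and the sketch shows the standard construction P = D⁻¹A along with the degree-proportional stationary distribution of a walk on an undirected graph.

```python
import numpy as np

# Adjacency matrix of a small undirected graph: a path 0-1-2
# with a triangle 2-3-4 attached (hypothetical toy network).
A = np.array([[0, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
P = A / deg[:, None]  # row-stochastic transition matrix P = D^{-1} A

# Propagate the walker's distribution: pi_{t+1} = pi_t P.
pi = np.full(5, 1 / 5)
for _ in range(200):
    pi = pi @ P

# On an undirected graph the stationary distribution is degree-proportional,
# so the walker concentrates on well-connected nodes.
print(np.round(pi, 3), np.allclose(pi, deg / deg.sum()))
```

Replacing the memoryless update rule with one that depends on the walker's history gives the non-Markovian, run-and-tumble-like walks mentioned later in the abstract.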
Compositional planning by making memories of the future
Jacob J. W. Bakermans (University of Oxford)
Location: Blohmstraße 15 (HIP One), 5th Floor, Room 5.002
Time: 15:00
Abstract:
Hippocampus is critical for memory, imagination, and constructive reasoning. However, recent models have suggested that its neuronal responses can be well explained by state-spaces that model the transitions between experiences. How do we reconcile these two views? I’ll show that if state-spaces are constructed compositionally from existing primitives, hippocampal responses can be interpreted as compositional memories, binding these primitives together. Critically, this enables agents to behave optimally in novel environments with no new learning, inferring behaviour directly from the composition. This provides natural interpretations of generalisation and latent learning. Hippocampal replay can build and consolidate these compositional memories, but importantly, due to their compositional nature, it can construct states it has never experienced – effectively building memories of the future. This enables new predictions of optimal replays for novel environments, or after structural changes.
An information geometric and optimal transport framework for Gaussian processes
Minh Ha Quang, RIKEN Center for Advanced Intelligence Project (AIP)
Location: Blohmstraße 15 (HIP One), 5th Floor, Room 5.002
Time: 15:00
Abstract:
Information geometry (IG) and Optimal transport (OT) have been attracting much research attention in various fields, in particular machine learning and statistics. In this talk, we present results on the generalization of IG and OT distances for finite-dimensional Gaussian measures to the setting of infinite-dimensional Gaussian measures and Gaussian processes. Our focus is on the Entropic Regularization of the 2-Wasserstein distance and the generalization of the Fisher-Rao distance and related quantities. In both settings, regularization leads to many desirable theoretical properties, including in particular dimension-independent convergence and sample complexity. The mathematical formulation involves the interplay of IG and OT with Gaussian processes and the methodology of reproducing kernel Hilbert spaces (RKHS). All of the presented formulations admit closed form expressions that can be efficiently computed and applied practically. The theoretical formulations will be illustrated with numerical experiments on Gaussian processes.
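The closed-form expressions the abstract refers to are easiest to see in one dimension; as a sketch (the example values are assumptions, and this is the unregularized 2-Wasserstein distance rather than its entropic version), the formula can be checked against the empirical optimal coupling, which in 1-D simply matches sorted samples.

```python
import numpy as np

def w2_gauss_1d(m1, s1, m2, s2):
    """Unregularized 2-Wasserstein distance between 1-D Gaussians N(m, s^2):
    sqrt((m1 - m2)^2 + (s1 - s2)^2)."""
    return np.hypot(m1 - m2, s1 - s2)

# Sanity check via optimal transport in 1-D: the optimal coupling matches
# quantiles, i.e. equal-size sorted samples.
rng = np.random.default_rng(0)
n = 200_000
a = np.sort(rng.normal(0.0, 1.0, n))
b = np.sort(rng.normal(3.0, 2.0, n))
empirical = np.sqrt(np.mean((a - b) ** 2))

print(f"closed form: {w2_gauss_1d(0.0, 1.0, 3.0, 2.0):.3f}, empirical: {empirical:.3f}")
```

In higher and infinite dimensions the squared-difference-of-standard-deviations term becomes the Bures distance between covariance operators, which is where the talk's regularized, dimension-independent versions come in.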
Events – Learning Latent Codes for Hierarchical Prediction and Generalization
Christian Gumbsch, Max Planck Institute for Intelligent Systems and University of Tübingen
Location: Blohmstraße 15 (HIP One), 5th Floor, Room 5.002
Time: 15:00
Die Klugheit der Dinge (The Cleverness of Things)
Nihat Ay, Hamburg University of Technology
Location: Blohmstraße 15 (HIP One), 5th Floor, Room 5.002
Time: 15:00
More info: Stud.IP
Uncertainty and Stochasticity of Optimal Policies
Johannes Rauh, MPI for Mathematics in the Sciences, Leipzig and Federal Institute for Quality and Transparency in Healthcare, Berlin
Location: Blohmstraße 15 (HIP One), 5th Floor, Room 5.002
Time: 9:30
Abstract
We are interested in optimal action-selection mechanisms, called policies, that maximize an expected long-term reward. Our main model is the POMDP (Partially Observable Markov Decision Process). While the optimal policy can be stochastic in the general case, we find conditions under which the optimal policy is deterministic, at least for some observations, or under which its stochasticity can be bounded. This talk presents joint work with Guido Montúfar and Nihat Ay.
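That optimal memoryless policies can be genuinely stochastic is easy to see in a hypothetical two-state toy POMDP (not from the talk) in which both states emit the same observation; brute force over policies shows the 50/50 policy strictly beats every deterministic one.

```python
import numpy as np

# Hypothetical toy POMDP: both states emit the SAME observation, so a
# memoryless policy is a single number p = P(action a).
#   state 0: a -> reward +1, move to 1;   b -> reward -1, stay in 0
#   state 1: a -> reward -1, stay in 1;   b -> reward +1, move to 0
def avg_reward(p):
    # Under the policy, P(0 -> 1) = p and P(1 -> 0) = 1 - p, so the
    # stationary distribution of the induced Markov chain is (1 - p, p).
    pi = np.array([1 - p, p])
    r = np.array([p * (+1) + (1 - p) * (-1),   # expected reward in state 0
                  p * (-1) + (1 - p) * (+1)])  # expected reward in state 1
    return pi @ r

grid = np.linspace(0, 1, 101)
best = grid[np.argmax([avg_reward(p) for p in grid])]
print(f"best p = {best}: reward {avg_reward(best):.2f}; "
      f"deterministic policies get {avg_reward(0.0):.2f} and {avg_reward(1.0):.2f}")
```

Either deterministic choice traps the agent in the state where that action is bad; randomizing keeps it cycling through the rewarding transitions, which is exactly the kind of observation-aliasing effect the talk's conditions rule in or out.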
Information Geometry for Deep Learning (Seminar within the Machine Learning in Engineering initiative MLE@TUHH)
Nihat Ay, Hamburg University of Technology
Mathematics of Data Seminar : Representation and Learning in Graph Neural Networks
Stefanie Jegelka, Machine Learning Group at MIT, USA
19.03.2021, 16:00 Uhr
The seminar is cancelled.
Mathematics of Data Seminar : Understanding Gradient Descent for Over-parameterized Deep Neural Networks
Marco Mondelli, IST Austria
04.08.2020, 11:00 Uhr
Mathematics of Data Seminar : Kalman-Wasserstein Gradient Flows
Franca Hoffmann, California Institute of Technology
20.07.2020, 17:00 Uhr
Special Seminar : Extending Integrated Information Theories for Cognitive Systems
Xerxes Arsiwalla, Pompeu Fabra University Barcelona, Spain
04.02.2020, 11:00 Uhr
Chalk Talk – Mathematics of Data Seminar : What’s next for machine learning? Some thoughts toward a unified theory of supervised inference.
Mikhail Belkin, The Ohio State University, USA
10.12.2019, 16:45 Uhr
Mathematics of Data Seminar : The geometry of neural networks
Kathlén Kohn, KTH Royal Institute of Technology, Stockholm
14.11.2019, 11:00 Uhr
Mathematics of Data Seminar : Lower Bounds on Complexity of Shallow Networks
Věra Kůrková, Institute of Computer Science, Czech Academy of Sciences, Czech Republic
23.10.2019, 11:00 Uhr
Mathematics of Data Seminar : Supervised learning and sampling error of integral norms in function classes
Vladimir Temlyakov, University of South Carolina
18.09.2019, 11:00 Uhr
Mathematics of Data Seminar : A Mathematical trip into the Data Science realm
Lamiae Azizi, The University of Sydney
16.07.2019, 11:00 Uhr
This seminar is cancelled.
Mathematics of Data Seminar : The use of geometry to learn from data, and the learning of geometry from data.
Nicolas Garcia Trillos, Department of Statistics, University of Wisconsin-Madison, USA
28.05.2019, 11:15 Uhr
Mathematics of Data Seminar : Computational Optimal Transport for Data Sciences
Gabriel Peyré, CNRS and Ecole Normale Supérieure, Paris, France
10.04.2019, 11:00 Uhr
Mathematics of Data Seminar : Compressed Sensing – From Theory To Practice
Stefania Petra, Universität Heidelberg
07.03.2019, 11:00 Uhr
Mathematics of Data Seminar : Blind deconvolution with randomness – convex geometry and algorithmic approaches
Felix Krahmer, Technische Universität München
14.02.2019, 11:00 Uhr
Mathematics of Data Seminar : A geometric structure underlying stock correlations
Nils Bertschinger, Frankfurt Institute for Advanced Studies (FIAS), Germany
28.01.2019, 11:00 Uhr
Mathematics of Data Seminar : Convergence rates for mean field stochastic gradient descent algorithms
Benjamin Fehrmann, University of Oxford
08.11.2018, 11:00 Uhr
Mathematics of Data Seminar : Topics in Deterministic and Stochastic Dynamical Systems on Wasserstein Space
Max von Renesse, Universität Leipzig
27.09.2018, 11:00 Uhr
Mathematics of Data Seminar : Statistical estimation under group actions: The Sample Complexity of Multi-Reference Alignment
Afonso Bandeira, Courant Institute of Mathematical Sciences, New York
14.08.2018, 16:30 Uhr
Mathematics of Data Seminar : Learning laws of stochastic processes
Harald Oberhauser, University of Oxford
11.07.2018, 15:30 Uhr
Mathematics of Data Seminar : Structured Tensors and the Geometry of Data
Anna Seigal, University of California, Berkeley
18.06.2018, 15:30 Uhr
Seminar on Theory of Embodied Intelligence : Quantifying Morphological Computation
Keyan Ghazi-Zahedi, MPI MIS, Leipzig
14.05.2018, 14:00 Uhr
Mathematics of Data Seminar : Max-linear Bayesian networks
Steffen Lauritzen, University of Copenhagen, Denmark
02.05.2018, 11:00 Uhr
Mathematics of Data Seminar : The statistical foundations of learning to control
Benjamin Recht, University of California, Berkeley
24.04.2018, 15:30 Uhr
Information Geometry Seminar : Quantum Information Geometry and Boltzmann Machines
Dimitri Marinelli, Romanian Institute of Science and Technology (RIST), Romania
22.03.2018, 14:00 Uhr
LikBez Seminar : Causal Inference II
Nihat Ay, MPI MIS, Leipzig
08.01.2018, 14:00 Uhr
Special Seminar : Continuum limits of tree-valued Markov chains and algebraic measure trees
Wolfgang Löhr, TU Chemnitz
04.12.2017, 11:00 Uhr
Seminar on Theory of Embodied Intelligence : Modeling of Networked Embodied Cognitive Processes
Fabio Bonsignorio, Scuola Superiore Sant’Anna, Pisa, Italy
27.11.2017, 14:00 Uhr
Information Geometry Seminar : Statistical Manifold and Entropy-Based Inference
Jun Zhang, University of Michigan-Ann Arbor, USA
10.11.2017, 11:45 Uhr
Information Geometry Seminar : From Natural Gradient to Riemannian Hessian: Second-order Optimization over Statistical Manifolds
Luigi Malagò, Romanian Institute of Science and Technology (RIST), Romania
16.10.2017, 14:00 Uhr
Information Geometry Seminar : Hamilton-Jacobi approach to Potential Functions in Information Geometry
Domenico Felice, University of Camerino, Italy
10.05.2017, 14:00 Uhr
Special Seminar : Polyquantoids and quantoids: quantum counterparts of polymatroids and matroids
František Matúš, Czech Academy of Sciences, Prague, Czech Republic
25.11.2016, 15:30 Uhr
Seminar on Theory of Embodied Intelligence : On Information and the Drivers of Cognition
Daniel Polani, University of Hertfordshire, United Kingdom
28.06.2016, 15:30 Uhr
Seminar on Theory of Embodied Intelligence : Minimum-Information Planning in Partially-Observable Decision Problems
Roy Fox, School of Computer Science and Engineering, Hebrew University, Israel
10.05.2016, 11:00 Uhr
Seminar on Theory of Embodied Intelligence : Musculo-Skeletal Models of Human Movement: Tools to Quantify Embodiment
Daniel Häufle, Stuttgart Research Center for Simulation Technology, University of Stuttgart, Germany
30.03.2016, 11:00 Uhr
Seminar on Theory of Embodied Intelligence : Musculo-Skeletal Models of Human Movement: Tools to Quantify Embodiment
Daniel Häufle, Stuttgart Research Center for Simulation Technology, University of Stuttgart, Germany
21.01.2016, 11:00 Uhr
This talk is cancelled!
Arbeitsgemeinschaft NEURONALE NETZE UND KOGNITIVE SYSTEME : Toward a Quantum Theory of Cognition: History, Development and Perspectives
Tomas Veloz, University of British Columbia, Canada
09.11.2015, 14:00 Uhr
Special Seminar : On asymptotic optimality of ML-type detectors in quantum hypothesis testing
Sajad Saeedinaeeni, Universität Leipzig
08.04.2015, 14:00 Uhr
Special Seminar : Information-Theoretic Cheeger Inequalities
Peter Gmeiner, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
17.03.2015, 14:00 Uhr
Seminar on Theory of Embodied Intelligence : Intelligent motility control of biological swimmers
Benjamin Friedrich, Max-Planck-Institut für Physik komplexer Systeme, Dresden
10.03.2015, 11:00 Uhr
Special Seminar : Quantum information geometry as a foundation for quantum theory beyond quantum mechanics
Ryszard Kostecki, Perimeter Institute for Theoretical Physics, Waterloo, Canada
18.02.2015, 14:00 Uhr
Seminar on Theory of Embodied Intelligence : Towards an Alchemy of Intelligence
Oliver Brock, Technische Universität Berlin, Robotics and Biology Laboratory
19.01.2015, 11:00 Uhr
Special Seminar : Algebraic Problems Related to Entropy Regions
František Matúš, Academy of Sciences of the Czech Republic, Institute of Information Theory and Automation
15.01.2015, 10:30 Uhr