We supervise students' research projects and Bachelor's and Master's theses. If you are a student interested in being supervised by us, feel free to have a look at the proposed topics, or to contact us to suggest your own! We can also create a topic together.
Here is a list of proposed topics for research projects and Master's theses. These projects can be extended or refined to match your own interests.
Inverse reinforcement learning with non-stationary goals
Reinforcement Learning (RL) is an area of machine learning in which an agent performs actions in a given environment in order to maximize a reward. The choice of reward is a key component of RL systems, and an incorrect choice can lead to improper behaviours or a misalignment of the artificial agent. Inverse reinforcement learning (IRL) is the inverse problem of RL: estimating the reward of an RL system from observations of the agent's behaviour. In its standard form, IRL assumes that the agent has a fixed, stationary reward, but this assumption can be unrealistic: in many scenarios, the agent changes objectives in the middle of its process, so that its reward evolves over time. The goal of this project is to incorporate knowledge about this evolution of the reward into IRL. The project will start with the simplest case: an agent switching from one reward to another at a single (unknown) point in time. Depending on the candidate's progress, other cases can be investigated: several changes, recurring rewards, etc.
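To make the simplest case concrete, here is a minimal sketch of switch-point inference, under strong simplifying assumptions: the two rewards themselves are known (in the project they would be estimated jointly), actions are drawn from a Boltzmann-rational policy, and the switch time is recovered by maximum likelihood. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two candidate reward vectors over 3 actions (assumed known for this
# sketch; the full IRL problem would estimate them as well).
r1 = np.array([1.0, 0.0, -1.0])
r2 = np.array([-1.0, 0.0, 1.0])
beta = 2.0  # Boltzmann rationality parameter

def policy(r):
    p = np.exp(beta * r)
    return p / p.sum()

T, t_star = 100, 60  # horizon and true (unknown) switch time
actions = np.concatenate([
    rng.choice(3, size=t_star, p=policy(r1)),
    rng.choice(3, size=T - t_star, p=policy(r2)),
])

# Log-likelihood of the observed actions for each candidate switch time:
# first t steps explained by r1, the remaining steps by r2.
logp1 = np.log(policy(r1))[actions]
logp2 = np.log(policy(r2))[actions]
cum1 = np.concatenate([[0.0], np.cumsum(logp1)])
cum2 = np.concatenate([[0.0], np.cumsum(logp2)])
loglik = np.array([cum1[t] + (cum2[-1] - cum2[t]) for t in range(T + 1)])
t_hat = int(np.argmax(loglik))
print("estimated switch time:", t_hat)
```

The same likelihood decomposition extends naturally to the later cases mentioned above (several change points, recurring rewards), at the cost of a larger search space.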
Building mental models of 3D scenes based on projections
When interacting with our environment, we build a mental model of it, in particular of its spatial configuration. The same happens when we observe, for instance, a photo, which is a 2D projection of a 3D scene. The idea that cognitive agents hold a mental model of the world is developed in particular in Tenenbaum's notion of "intuitive physics" (see the reference paper https://harvardlds.org/wp-content/uploads/2017/11/Ullman-Spelke-Battaglia-Tenenbaum_Mind-Games_2017.pdf). In this project, the goal is to reconstruct a 3D scene from one of its 2D projections (which can be thought of as a photograph). The idea is to use a 3D engine and to reconstruct a given scene from a single projection. The project will start with the task of controlling the engine through Python, followed by the reconstruction task itself.
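The forward direction of this task, projecting a 3D scene to 2D, can be sketched with a simple pinhole camera model; this is not tied to any particular 3D engine, and the scene and focal length below are arbitrary. It also makes the difficulty visible: the projection discards depth, which is exactly what the reconstruction must recover.

```python
import numpy as np

# A toy 3D "scene": the corners of a unit cube placed in front of the camera.
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (4, 5)],
                dtype=float)

# Pinhole camera at the origin looking along +z with focal length f:
# a point (x, y, z) projects to (f*x/z, f*y/z). Many different 3D scenes
# share the same 2D projection, which is what makes inversion hard.
f = 1.0
proj = f * cube[:, :2] / cube[:, 2:3]
print(proj)
```

A reconstruction approach would search over scene parameters in the 3D engine until the rendered projection matches the observed one.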
Impossibility of inferring a user's level in the cognitive hierarchy
When interacting with other agents, we adapt our actions to theirs, anticipating how they will interpret them and how they will react to them. This idea was formalized by level-k and cognitive hierarchy theories. In short, these theories assume various levels of modelling of the other agent: at level 0, the other agent is assumed not to pursue a goal; at level 1, the other agent pursues a goal but does not think we are pursuing one; at level 2, it pursues a goal and perceives us as pursuing a goal, but as unaware that it pursues one; and so on. The cognitive level then has consequences on the actions played. In this project, we aim to answer an important question: is it possible to infer, at the same time, the goal of an agent and its model of the others? Our conjecture is that it is not, and the goal of the project is to show this theoretically and/or empirically. This work follows the line of reasoning developed by Mindermann and Armstrong.
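To illustrate how the cognitive level changes the actions played, here is a minimal level-k sketch on a toy two-player zero-sum game: level 0 plays uniformly at random, and a level-k player best-responds to a level-(k-1) opponent. The payoff matrix is illustrative, not taken from the project description.

```python
import numpy as np

# A toy zero-sum game (illustrative payoffs).
A = np.array([[2.0, -1.0], [-1.0, 1.0]])  # row player's payoffs
B = -A                                     # column player's payoffs

def best_response(payoff, opp_policy, axis):
    """One-hot best response; axis=0 for the row player, axis=1 for the column player."""
    expected = payoff @ opp_policy if axis == 0 else opp_policy @ payoff
    p = np.zeros(2)
    p[np.argmax(expected)] = 1.0
    return p

uniform = np.array([0.5, 0.5])
row, col = [uniform], [uniform]  # level 0: no goal, plays at random
for k in range(1, 4):
    # A level-k player best-responds to a level-(k-1) opponent.
    row.append(best_response(A, col[k - 1], axis=0))
    col.append(best_response(B, row[k - 1], axis=1))

for k in range(4):
    print(f"level {k}: row plays {np.argmax(row[k])}, col plays {np.argmax(col[k])}")
```

Here the chosen action changes as the assumed level increases, which is what makes the joint inference question non-trivial: different (goal, level) pairs can explain the same observed behaviour.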
Planning on large horizons divided into short sessions
Language learning apps like Duolingo mostly rely on heuristics to choose the order in which to present teaching material to their users. An alternative is to develop a "model" of the learner, i.e. a probabilistic model describing what the learner has remembered and how likely they are to make mistakes on each word at each step. These models are particularly effective, since they can predict at each step what the learner knows and has forgotten, but they come with a drawback: they make it difficult for the teaching app to choose an optimal action at each step. Indeed, this requires computing the learner's memorization state for all possible trajectories of actions over time, which is intractable, since the horizon is very long and the teaching material can be very large. In this project, we propose to exploit the fact that learning is divided into sessions of reasonable length. We propose a hierarchical planning scheme in which the algorithm first plans over local learning goals (e.g. "by the end of the session, remember words 1 to 7"), and then over the teaching actions that reach this goal within each session.
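As a rough sketch of the lower level of this hierarchy, the snippet below uses a simple exponential-forgetting memory model (a common simplification; the project may use a richer probabilistic model) and a greedy within-session planner aimed at a session goal of the kind described above. Word count, session length, and the stability gain per review are all hypothetical.

```python
import numpy as np

# Exponential-forgetting model: word w, last reviewed at time last_review[w],
# is recalled after waiting until time t with probability
# exp(-(t - last_review[w]) / stability[w]); each review resets the clock
# and multiplies the word's stability.
stability = np.ones(7)      # per-word memory stability (hypothetical init)
last_review = np.zeros(7)   # time of the last review of each word
GAIN = 2.0                  # stability multiplier per review (assumed)

def recall_prob(t):
    return np.exp(-(t - last_review) / stability)

# Session-level goal: "by the end of the session, remember words 0..6".
# Greedy planner over one short session: at each step, review the word
# whose predicted end-of-session recall is lowest.
session_length, session_end = 10, 20.0
for step in range(session_length):
    t = float(step)
    end_recall = np.exp(-(session_end - last_review) / stability)
    w = int(np.argmin(end_recall))  # weakest word at the session horizon
    last_review[w] = t              # review it now...
    stability[w] *= GAIN            # ...which strengthens its memory

print("end-of-session recall:", np.round(recall_prob(session_end), 3))
```

The upper level of the hierarchy would then plan over such session goals, so that the intractable full-horizon search is replaced by many short, tractable ones.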
A hierarchical model of a population of learners
Language learning apps like Duolingo mostly rely on heuristics to choose the word in which to propose the teaching material to their users. An alternative is to develop a "model" of the learner, i.e. a probabilistic model describing what the learner remembered and how likely they are to make mistakes at each step for each word. One of the difficulties of this method is that it requires to estimate memorization parameters for each word and each user, on a very limited number of interactions. When considering a population of learners, things can be easier: if all learners have difficulties memorizing a word, this provides a strong indication that this word is hard to remember. The goal of this project is to develop a simple probabilistic memorization model at the scale of a population, inspired by hierarchical models like matrix factorization, and to test various inference methods on it. Depending on the progress made during the project, the model could be tested on real data.