Abstract
We are interested in optimal action-selection mechanisms (policies) that maximize an expected long-term reward. Our main model is the POMDP (Partially Observable Markov Decision Process). While the optimal policy can be stochastic in general, we find conditions under which it is deterministic, at least for some observations, or under which its stochasticity can be bounded. This talk presents joint work with Guido Montúfar and Nihat Ay.