Partially observable Markov decision
processes (POMDPs) provide a rich mathematical framework for agent
decision making under uncertainty; however, solving a POMDP exactly is
intractable. In the so-called point-based
techniques, a set of belief points is first sampled from the belief
simplex (by letting the agent interact with the environment), and
planning is then performed on these points only. In the references below we developed Perseus, a
randomized version of point-based value iteration that is highly
competitive with other methods in terms of both computation time and solution
quality. Source code for Perseus is
available.
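As a concrete illustration of the belief-sampling step, here is a minimal numpy sketch (not the released Perseus code). It assumes a discrete POMDP given as arrays T of shape (A, S, S) with T[a][s, s'] = P(s'|s, a), and O of shape (A, S, Z) with O[a][s', o] = P(o|s', a); the function names are illustrative.

    import numpy as np

    def belief_update(T, O, b, a, o):
        """Bayes-rule filter: b'(s') is proportional to O[a][s', o] * sum_s T[a][s, s'] b(s)."""
        b_next = O[a][:, o] * (b @ T[a])
        return b_next / b_next.sum()

    def sample_beliefs(T, O, b0, n_points, seed=0):
        """Collect a belief set B by simulating random actions from the initial belief b0."""
        rng = np.random.default_rng(seed)
        n_actions, n_states, n_obs = T.shape[0], T.shape[1], O.shape[2]
        beliefs, b = [b0], b0.copy()
        s = rng.choice(n_states, p=b0)               # sample a true hidden state
        while len(beliefs) < n_points:
            a = rng.integers(n_actions)              # random exploration action
            s = rng.choice(n_states, p=T[a][s])      # true next state
            o = rng.choice(n_obs, p=O[a][s])         # observation emitted there
            b = belief_update(T, O, b, a, o)
            beliefs.append(b)
        return np.array(beliefs)

Any exploration policy could replace the uniform random action choice here; the point is only that the sampled beliefs are reachable under the POMDP dynamics.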
The key idea is that, in each value iteration step, we can improve the
value of all points in the belief set by updating the value and
its gradient for only a subset of the points, selecting these belief
points in a randomized greedy manner. The figure below
demonstrates Perseus on an example POMDP with two states by showing
three value update stages. The POMDP's belief space is depicted on the
x-axis, and the y-axis represents the value of
each belief. We can quickly improve the value of all belief points by
backing up only a subset of the points. The series of figures shows how
we keep track of which points have not yet been improved, and sample
only from these.
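A sketch of one such randomized value-update stage is below, building on the T and O arrays above plus rewards R of shape (A, S) and a discount gamma. The value function is the usual set of alpha-vectors, stored as rows of V, with V(b) = max over alpha of alpha · b. This is a hedged illustration of the scheme described in the papers, not the authors' implementation.

    def point_backup(T, O, R, gamma, V, b):
        """Point-based backup at belief b: the value vector of the best
        one-step lookahead action, backed up over the current set V."""
        n_actions, n_obs = T.shape[0], O.shape[2]
        best = None
        for a in range(n_actions):
            g_a = R[a].astype(float)
            for o in range(n_obs):
                # G[s, k] = sum_{s'} P(o|s',a) P(s'|s,a) alpha_k(s')
                G = (T[a] * O[a][:, o]) @ V.T
                g_a = g_a + gamma * G[:, np.argmax(b @ G)]
            if best is None or b @ g_a > b @ best:
                best = g_a
        return best

    def perseus_stage(T, O, R, gamma, V, B, seed=0):
        """One Perseus stage: improve the value of ALL beliefs in B while
        backing up only a randomly chosen subset of them."""
        rng = np.random.default_rng(seed)
        V_new = []
        todo = list(range(len(B)))                   # points not yet improved
        while todo:
            b = B[rng.choice(todo)]                  # random unimproved point
            alpha = point_backup(T, O, R, gamma, V, b)
            if b @ alpha < np.max(V @ b):            # backup did not improve b:
                alpha = V[np.argmax(V @ b)]          # keep its old best vector
            V_new.append(alpha)
            Vn = np.array(V_new)
            # drop every point whose value is already matched or improved
            todo = [i for i in todo if np.max(Vn @ B[i]) < np.max(V @ B[i])]
        return np.array(V_new)

The loop terminates because each sampled point is guaranteed to leave the todo list: either its backup improved it, or its old best vector was copied into the new set. In practice many other points drop out with it, which is what makes a stage cheap.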
Matthijs T. J. Spaan and Nikos Vlassis. A
point-based POMDP algorithm for robot
planning. In Proceedings of the IEEE
International Conference on Robotics and
Automation, pp. 2399–2404, New Orleans,
Louisiana, 2004.
Nikos Vlassis and Matthijs T. J. Spaan. A
fast point-based algorithm for POMDPs. In
Benelearn 2004: Proceedings of the Annual Machine
Learning Conference of Belgium and the Netherlands,
pp. 170–176, Brussels, Belgium, January
2004. (Also presented at the NIPS 16 workshop "Planning
for the Real-World", Whistler, Canada, Dec 2003.)