Active Object Recognition by Offline Solving of POMDPs

Susana Brandão1, Manuela Veloso2 and João Paulo Costeira3
1 Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, USA. sbrandao@ece.cmu.edu
2 Department of Computer Science, Carnegie Mellon University, Pittsburgh, USA. veloso@cs.cmu.edu
3 Department of Electrical and Computer Engineering, IST-Universidade Técnica de Lisboa, Lisboa, Portugal. jpc@isr.ist.utl.pt


ABSTRACT

In this paper, we address the problem of recognizing multiple known objects under partial views and occlusion. We consider the situation in which the view of the camera can be controlled, in the sense of an active perception planning problem. One common approach is to formulate such active object recognition in terms of information theory, namely to select actions that maximize the expected value of the observation in terms of the recognition belief. In our work, we instead formulate the active perception planning as a Partially Observable Markov Decision Process (POMDP) with a reward associated solely with the minimization of the recognition time. The returned policy is the same as the one obtained using the value of information. By treating observations as a time-consuming process and imposing constraints on time, we minimize the number of observations and consequently maximize the value of each one for the recognition task. Separating the reward from the belief in the POMDP enables solving the planning problem offline, and the recognition process itself becomes less computationally intensive. In a focused simulation example we illustrate that the policy is optimal in the sense that it performs the minimum number of actions and observations required to achieve recognition.

I. INTRODUCTION

Object recognition is still an open problem. From the choice of features to the actual classification problem, we are still far from a global recipe that would allow for a complete discriminative approach to recognition. The large majority of the object recognition community is focused on offline, database-driven tasks. The state of the art is measured with respect to performance on datasets gathered from web images, such as the Pascal challenge datasets. Two problems arise from the use of such datasets.

The first is the large variability of the images. The second is the impossibility of looking at the scenes from different poses, which would provide different, and probably more discriminative, views of the objects and would help to segment them from the background.

In the context of a robot moving in a constrained environment, the object variability is no longer present. The chairs in an office building are all very similar to each other and will remain the same for long periods of time. For a robot moving in such a building, the model for a chair can be much simpler and more efficient than a model built from web datasets. In this project, we therefore assume that recognition is feasible in such an environment. However, in spite of having highly accurate models of each object in a room, the robot may not be able to completely distinguish between two different types of objects. Both self-occlusion and occlusion caused by other objects may cover the distinctive parts of an object, making the robot unable to distinguish between two object classes: the object classes are ambiguous given the occlusion. This ambiguity appears, for example, between a computer screen and a cardboard box. Although they may look the same when the robot is directly in front of them, if the robot looks at the side of the screen it should be able to correctly differentiate the screen from the box. Since the robot never has access to all views of the object at a given time instant, this type of ambiguity arises even when the robot performs 3D object recognition. The robot only has access to partial information about the object until it decides to move relative to it. We assume that most of the ambiguity in object recognition can be removed by having the robot look at objects from different angles. That is, we assume that, in spite of the ambiguity between object A and object B, there is always a viewpoint of A or B from which the objects can be disambiguated.
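To make this viewpoint-dependent ambiguity concrete, the following sketch (an illustration written for this text, not code from the paper, with all likelihood values assumed) shows a Bayesian belief update over the two ambiguous classes: a frontal observation leaves the belief unchanged, while a single side observation essentially resolves the class.

```python
# Illustrative sketch (not from the paper): Bayes belief update over two
# ambiguous classes, "screen" vs "box". The front view is uninformative,
# the side view is discriminative. All likelihood values are assumed.

def update_belief(belief, likelihood, observation):
    """Bayes rule: p(class | obs) is proportional to p(obs | class) * p(class)."""
    posterior = {c: likelihood[c].get(observation, 0.0) * p for c, p in belief.items()}
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()}

# p(observation | class) for each viewpoint: front views of both objects look
# identical, side views expose the screen's thin profile vs the box's depth.
front_view = {"screen": {"flat_panel": 1.0},
              "box":    {"flat_panel": 1.0}}
side_view  = {"screen": {"thin_profile": 0.9, "deep_profile": 0.1},
              "box":    {"thin_profile": 0.1, "deep_profile": 0.9}}

belief = {"screen": 0.5, "box": 0.5}
belief = update_belief(belief, front_view, "flat_panel")   # stays at 0.5 / 0.5
belief = update_belief(belief, side_view, "thin_profile")  # ~0.9 screen, ~0.1 box
print(belief)
```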

There is a vast literature on active perception, and the reader may find a detailed overview of the field, with special focus on multi-view object recognition, in Chen et al. [1]. In recent years, the main contributions to the field concern the algorithms used to compute a policy. In the early 2000s, approaches (e.g. [2], [3]) relied on information-theoretic arguments to make decisions: the next viewpoint in a task was selected so as to minimize an entropy function, i.e., to minimize the uncertainty in the state. The cost of the whole plan in terms of time and energy was neglected. In a recent work by R. Eidenberger and J. Scharinger [4], an action control cost is added to the value-of-information reward.

In this work, we consider the problem solely as the minimization of the time to recognition. Since time is spent in both image processing and movement actions, by minimizing time we guarantee that the viewpoints selected for image processing are the most informative. To minimize the number of movement actions and the time spent in image processing, we formalize our problem as a Partially Observable Markov Decision Process (POMDP). The partial observability arises from the inability of the robot to see the whole object from a single viewpoint. The formalization of active object recognition as a POMDP is also present in some recent works, in particular [4]. However, in that work, POMDP rewards are linked to the expected reduction of information entropy from the next observation. In practice, the reward of future actions needs to be computed online, since it depends on the entropy of the current state. The dependency of rewards on the current state means that the planning problem cannot be fully solved offline.
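For illustration, the sketch below writes down one possible tabular POMDP for the screen-versus-box scenario in the spirit described above; the state space, time costs, and observation probabilities are assumptions made for this example rather than the model used in the paper. Because every reward is a fixed action cost (plus a terminal reward for a correct declaration) and none depends on the current belief, the tables could be handed as-is to an off-the-shelf offline POMDP solver.

```python
# Minimal sketch (assumed model, not the paper's exact one) of an active object
# recognition POMDP in which every reward is a time cost, so an offline solver
# returns a policy that minimizes the number of movements and observations
# needed before the object can be declared.

from itertools import product

CLASSES    = ["screen", "box"]           # hidden object identity
VIEWPOINTS = ["front", "side"]           # robot pose relative to the object
STATES     = list(product(CLASSES, VIEWPOINTS))

ACTIONS      = ["move", "observe", "declare_screen", "declare_box"]
OBSERVATIONS = ["flat_panel", "thin_profile", "deep_profile", "none"]

MOVE_COST, OBSERVE_COST = -2.0, -1.0     # assumed per-action time costs
CORRECT, WRONG = 10.0, -10.0             # terminal declaration rewards

def transition(state, action):
    """Deterministic motion model: 'move' switches viewpoint, identity is fixed."""
    obj, view = state
    if action == "move":
        return (obj, "side" if view == "front" else "front")
    return state

def reward(state, action):
    """Rewards depend only on action cost and declaration correctness,
    never on the current belief, so the POMDP can be solved offline."""
    obj, _ = state
    if action == "move":
        return MOVE_COST
    if action == "observe":
        return OBSERVE_COST
    return CORRECT if action == f"declare_{obj}" else WRONG

def observation_prob(next_state, action, obs):
    """p(obs | next_state, action): only the side view disambiguates the classes."""
    obj, view = next_state
    if action != "observe":
        return 1.0 if obs == "none" else 0.0
    if view == "front":                  # front views of both objects look alike
        return 1.0 if obs == "flat_panel" else 0.0
    if obj == "screen":
        return {"thin_profile": 0.9, "deep_profile": 0.1}.get(obs, 0.0)
    return {"thin_profile": 0.1, "deep_profile": 0.9}.get(obs, 0.0)
```

Whether the resulting policy prefers another observation or a movement to a new viewpoint depends only on the relative magnitudes of the assumed MOVE_COST and OBSERVE_COST, which is precisely the trade-off a time-based reward is meant to capture.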

