Skill memory in biped locomotion: Using perceptual information to predict task outcome


J. Andre1, C. Santos2, L. Costa3
1 Department of Industrial Electronics, University of Minho, joaocandre@dei.uminho.pt
2 Department of Industrial Electronics, University of Minho, cristina@dei.uminho.pt
3 Department of Production and System, University of Minho, lac@dps.uminho.pt

Abstract Robots must be able to adapt their motor behavior to unexpected situations in order to move safely among humans. A necessary step is the ability to predict failures, which result in abnormal behavior and may cause irrecoverable damage to the robot and its surroundings, including humans. In this paper we build a predictive model of sensor traces that enables early failure detection by means of a skill memory. Specifically, we propose an architecture based on a biped locomotion solution whose robustness is improved by sensory feedback, and we extend the concept of Associative Skill Memories (ASM) to periodic movements by introducing several mechanisms into the training workflow, such as linear interpolation and regression into a Dynamical Motion Primitive (DMP) system, so that the representation becomes time invariant and easily parameterizable. The failure detection mechanism applies statistical tests to determine the optimal operating conditions. Both training and failure testing were conducted on a DARwIn-OP robot in a simulation environment to assess and validate the proposed failure detection system. Results show that the system performance, in terms of the compromise between sensitivity and specificity, is similar with and without the proposed mechanism, while the periodic approach achieves a significant reduction in data size.

Keywords Reinforcement learning · Bio-inspired · Skill Memory

robótica

scientific article

Part 2

condition in equation (16) is satisfied, then the null hypothesis that the trial is successful is not rejected, and the sensor data is assumed to conform to the expected values: there are no signs of failure conditions. If, on the other hand, y_trial(n) falls outside the confidence bounds established in (16), then there is a high probability that a failure will occur, or that the movement objective will not be achieved by the end of the trial. Whether or not the current trial is flagged as a failure depends on the thresholds for failure detection: the minimum number of failing sensors, M, and the minimum number of consecutive failing instants, N. Simply put, if the system detects at least M sensors failing for N consecutive instants, then it predicts, based on the previous experiences stored in the ASM, that task execution will fail.

6.1. Detection Accuracy
Failure detection was treated as a simple two-class classification problem, characterized by its sensitivity, the probability of detecting a failure on unsuccessful trials, and by its specificity, the probability of not flagging a failure on successful trials. Conclusions about the performance of the failure detection algorithm were based on a detection score computed from the sensitivity and specificity values:

(17)

6. FAILURE DETECTION
In order to take proper advantage of the information stored in the ASM, we propose a system that continuously monitors the execution of a motor skill (in this case, biped locomotion) and looks for deviations that could evolve into movement failures. The failure detection protocol we introduce in this work was inspired by Pastor et al. [20], but uses a more refined statistical analysis in order to achieve the best possible results. Once again, we anchor the whole process on the phase value at any given time. At each instant n of the simulation, the mean μ(φ(n)) and standard deviation σ(φ(n)), reconstructed from the trained DMPs, provide the ASM values for the corresponding phase φ(n), upon which a statistical z-test is performed. Thus, a tolerance interval is established for each sensor according to:

which can be interpreted as inversely proportional to the distance to the optimal operating point of maximal (100%) sensitivity and specificity: a higher detection score implies better performance in detecting failure conditions.
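The detection rule and the accuracy measures described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the data layout (one row of sensor readings per instant, with ASM means and standard deviations already evaluated at the phase of that instant) and the reading of "at least M sensors failing for N consecutive instants" as a per-instant count are assumptions made for illustration. Because the exact form of the detection score (17) is not reproduced here, the sketch only computes the distance to the optimal operating point, to which that score is said to be inversely related.

```python
import math

Z_99 = 2.57  # z value quoted in the text for a 99% confidence level


def out_of_bounds(y, mean, std, z=Z_99):
    """True when a reading lies outside the interval mean +/- z * std."""
    return abs(y - mean) > z * std


def first_failure_instant(readings, means, stds, m_sensors, n_instants, z=Z_99):
    """Return the first instant at which a failure is predicted, or None.

    readings[n][s], means[n][s] and stds[n][s] hold, for instant n and
    sensor s, the current reading and the ASM statistics at that phase
    (hypothetical layout). A failure is predicted once at least
    m_sensors readings are out of bounds for n_instants instants in a row.
    """
    run = 0  # length of the current streak of failing instants
    for n, (ys, mus, sds) in enumerate(zip(readings, means, stds)):
        failing = sum(out_of_bounds(y, mu, sd, z) for y, mu, sd in zip(ys, mus, sds))
        run = run + 1 if failing >= m_sensors else 0
        if run >= n_instants:
            return n
    return None


def sensitivity_specificity(predicted, actual):
    """Sensitivity and specificity for boolean per-trial failure labels."""
    positives = sum(1 for a in actual if a)
    negatives = len(actual) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one failed and one successful trial")
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    true_neg = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return true_pos / positives, true_neg / negatives


def distance_to_optimum(sensitivity, specificity):
    """Euclidean distance to the point of 100% sensitivity and specificity."""
    return math.hypot(1.0 - sensitivity, 1.0 - specificity)


# Toy data: two sensors, five instants, ASM mean 0 and standard deviation 1.
means = [[0.0, 0.0]] * 5
stds = [[1.0, 1.0]] * 5
readings = [[0.1, 0.2], [3.0, 0.1], [3.0, 3.0], [3.5, 2.9], [0.0, 0.0]]
print(first_failure_instant(readings, means, stds, m_sensors=2, n_instants=2))  # -> 3
```

Under this reading, raising M or N makes the detector more conservative (fewer false alarms, later or missed detections), which is the trade-off between specificity and sensitivity that the detection score is meant to summarize.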

robótica 100, 3rd Quarter of 2015

Figure 4. ROBOTIS DARwIn-OP humanoid robot in Webots.

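$$\mu(\phi(n)) - z\,\sigma(\phi(n)) \;\le\; y_{trial}(n) \;\le\; \mu(\phi(n)) + z\,\sigma(\phi(n)) \qquad (16)$$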

where y_trial(n) is the sensor data of the current simulation, φ(n) is the phase at the current instant n of this trial, z = 2.57 for a confidence level of 99%, and μ and σ represent the mean and standard deviation values stored in the ASM. If the

6.2. Optimal Parameterization
M and N, the thresholds for the number of sensors and the number of consecutive time steps, respectively, have a significant impact on the performance of the failure detection system. With no a priori knowledge or practical know-how about how the sensor readings vary, it is hard to estimate the optimal values for these