A Review Paper on Real-Time Hand Motion Capture


International Research Journal of Engineering and Technology (IRJET) | Volume: 09 Issue: 05 | May 2022 | www.irjet.net | e-ISSN: 2395-0056 | p-ISSN: 2395-0072

Niji K Raj1, Prof. Parvathi V. S.2

1PG Student, Dept. of Electronics & Communication Engineering, LBSITW, Kerala, India
2Assistant Professor, Dept. of Electronics & Communication Engineering, LBSITW, Kerala, India

Abstract - Hand Image Understanding (HIU) supports a variety of human-computer interaction applications, such as physical or gesture-based controls for virtual reality and augmented reality devices. HIU frameworks extract comprehensive information about the hand from a single RGB image. This paper briefly surveys the design of hand motion capture systems for real-time extraction of hand shape and pose, covering all data modalities, including synthetic and real-image datasets with either 2D or 3D annotations.

Key Words: Hand Image Understanding, Human-Computer Interaction, Virtual Reality (VR), Augmented Reality (AR), Motion Capture, RGB Image.

1. INTRODUCTION

The hand is our most versatile tool for manipulating physical objects and communicating with the outside world. As a result, vision-based capture of 3D hand motion has a wide range of applications, including gaming, biomechanical analysis, robotics, and human-computer interaction such as augmented reality and virtual reality (AR/VR). Despite years of research, it remains an unsolved problem due to the high dimensionality of the hand, pose and shape variations, self-occlusions, and so on.

Previous methods concentrated on sparse 3D hand joint localization from monocular RGB images. More recently, discriminative methods based on convolutional neural networks (CNNs) have been used to estimate dense hand poses, including 3D shapes, from RGB images or depth maps (Fig. 1) [1]. CNN-based discriminative methods have demonstrated very promising performance in estimating 3D hand poses from RGB images or depth maps. However, the predictions are often based on coarse skeletal representations with no explicit kinematic or geometric mesh constraints. Establishing a personalized hand model, on the other hand, necessitates a generative approach that optimizes the hand model to fit 2D images. Aside from their complexity, optimization-based methods are prone to local minima, and personalized hand model calibration conflicts with the ability to generalize across hand shape variations.
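To make the discriminative pipeline concrete, the following is a minimal NumPy sketch, not any published network, of a forward pass that maps an RGB image to the 21 joint coordinates of a commonly used hand skeleton via convolution, ReLU, global average pooling, and a linear head. All layer sizes, the single-filter convolution, and the weights are illustrative assumptions; real systems stack many learned layers.

```python
import numpy as np

NUM_JOINTS = 21  # commonly used hand skeleton: wrist + 4 joints per finger

def conv2d(image, kernel):
    """Valid-mode 2D convolution: the kernel spans all input channels,
    and their contributions are summed into a single feature map."""
    h, w, _ = image.shape
    kh, kw, _ = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return out

def predict_joints(image, kernel, weights, bias):
    """Toy discriminative regressor: conv -> ReLU -> global average
    pooling -> linear head producing a (21, 3) array of 3D joints."""
    feat = np.maximum(conv2d(image, kernel), 0.0)  # ReLU feature map
    pooled = feat.mean()                           # global average pool
    joints = pooled * weights + bias               # linear regression head
    return joints.reshape(NUM_JOINTS, 3)
```

A trained network would learn `kernel`, `weights`, and `bias` from annotated data; here they are free parameters used only to show the shape of the computation.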

© 2022, IRJET | Impact Factor value: 7.529

Fig. 1: Hand pose examples

2. REVIEW ON RELATED WORK

Moeslund et al. [1] proposed a novel approach to recovering and tracking the 3D position, orientation, and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. The work provides a thorough review covering the general problem of 3D tracking of articulated objects, visual human motion capture, and analysis. Although 3D tracking of human hands has several applications, developing an effective solution requires contending with a number of interacting factors, such as high dimensionality and the self-occlusions that occur while the hand is in action. The authors therefore formulate tracking as an optimization problem, searching for the hand model parameters that minimize the discrepancy in appearance and 3D structure between hypothesized instances of a hand model and actual hand observations. This optimization problem is effectively solved using a variant of Particle Swarm Optimization (PSO). The proposed method does not require special markers or a complex image acquisition setup, and because it is model-based, it provides continuous solutions to the problem of tracking hand articulations. Extensive experiments with a prototype GPU-based implementation show that accurate and robust 3D tracking of hand articulations can be achieved in near real time (15 Hz).

Erol et al. [2] proposed a taxonomy that, based on the output, differentiates between partial and full pose estimation, and divides existing approaches into model-based and appearance-based. Model-based approaches provide rich visual information, typically through a multi-camera system, but are computationally expensive. Appearance-based methods involve less hardware complexity and lower computational cost.
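The model-fitting idea above can be sketched with a generic, textbook PSO loop; this is not the specific PSO variant of [1], and the 26-dimensional parameter vector and synthetic target below are placeholder assumptions standing in for real hand model parameters and image observations.

```python
import numpy as np

def pso_minimize(objective, dim, n_particles=30, iters=150,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0), seed=0):
    """Generic PSO: each particle remembers its personal best position,
    and the swarm shares a global best that steers velocity updates."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[np.argmin(pbest_val)].copy()   # global best position
    g_val = pbest_val.min()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # inertia + attraction toward personal and global bests
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < g_val:
            g_val = pbest_val.min()
            g = pbest[np.argmin(pbest_val)].copy()
    return g, g_val

# Placeholder discrepancy: squared distance between a hypothesized 26-D
# pose parameter vector and a synthetic "observed" pose. A real system
# would instead render the hand model and compare it against the image.
target = np.linspace(-0.5, 0.5, 26)
best, best_val = pso_minimize(lambda p: float(np.sum((p - target) ** 2)),
                              dim=26)
```

In the actual tracking setting, the objective evaluates how well a rendered hand hypothesis matches the observed depth and color data, which is why [1] relies on a GPU implementation to evaluate many particles per frame.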

ISO 9001:2008 Certified Journal

