Sign Language Recognition using Facial Gesture and Skeleton Keypoints

International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 09 Issue: 07 | July 2022 | www.irjet.net

Sheela N1, Kiran Raghavendra2, Shashank C2, Sanjana S2, Dhanush N S2

1Assistant Professor, Dept. of Computer Science Engineering, JSS Science and Technology University, Mysuru
2Dept. of Computer Science Engineering, JSS Science and Technology University, Mysuru

Abstract - Sign Language is used by people who are speech impaired or hard of hearing. Sign language recognition (SLR) aims to recognise the signs performed by a signer in input videos. It is an extremely complex task, as signs are performed with intricate hand gestures, body posture and mouth actions. In recent times, skeleton-based models have been preferred for sign language recognition owing to their independence from subject and background. Some sign languages use mouthings/facial gestures in addition to hand gestures when signing specific words, and these facial expressions can assist the convoluted task of sign language recognition. Skeleton-based methods are still under research due to the lack of annotations for hand keypoints. Significant efforts have been made to tackle sign language recognition using skeleton-based multi-modal ensemble methods, but to our knowledge none of them take facial expressions into consideration. To this end, we propose the use of face keypoints to assist skeleton-based sign language recognition methods. With the addition of facial feature information, the skeleton-based ensemble achieves an accuracy of 93.26% on the AUTSL dataset.

Key Words: Sign language recognition, Skeleton-based methods, Facial expression, SL-GCN, SSTCN, Wholepose keypoint estimator, AUTSL dataset

1. INTRODUCTION

Sign Language is a means of communication for people who are hard of hearing or speech impaired. It is a visual language involving hand gestures, body posture and mouth actions. Comprehending sign language requires remarkable effort and training, which is not feasible for the general public. In addition, sign language varies with the language of communication (e.g., English, Chinese, Italian) and the region of usage (e.g., American Sign Language, Indian Sign Language). With advancements in computer vision and machine learning, it is essential to explore sign language recognition (SLR), which translates sign language and helps the deaf/speech-impaired community communicate easily with others in their daily life.

In comparison with action recognition or pose estimation, SLR is an extremely challenging task. Firstly, SLR requires information about global body motion as well as the intricate movements of hands and fingers to express a sign correctly; similar signs can convey different meanings depending on the number of times they are repeated. Secondly, different signers perform signs differently (e.g., speed, body shape and posture, left-handed or right-handed), making SLR challenging.

Inspired by recent developments in SLR using multi-modal methods [4], we propose using pretrained facial keypoint estimators to supply additional facial gesture information to the SL-GCN + SSTCN ensemble framework of [4]: the SL-GCN and SSTCN streams exploit face keypoints generated by a pretrained estimator, assisting the complex task of sign language recognition.
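As a minimal sketch of the keypoint-extraction step, the snippet below pulls per-frame 2D face keypoints from a sign video with the off-the-shelf MediaPipe FaceMesh estimator. MediaPipe is our assumed stand-in for illustration; it is not necessarily the pretrained whole-pose estimator used in [4].

```python
import cv2
import mediapipe as mp

# MediaPipe FaceMesh is used here as an assumed stand-in for the
# pretrained whole-pose/face keypoint estimator referenced in [4].
mp_face_mesh = mp.solutions.face_mesh

def extract_face_keypoints(video_path):
    """Return one list of normalized (x, y) keypoints per frame
    (None for frames where no face was detected)."""
    keypoints = []
    cap = cv2.VideoCapture(video_path)
    with mp_face_mesh.FaceMesh(static_image_mode=False, max_num_faces=1) as fm:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            results = fm.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_face_landmarks:
                lms = results.multi_face_landmarks[0].landmark
                keypoints.append([(lm.x, lm.y) for lm in lms])
            else:
                keypoints.append(None)
    cap.release()
    return keypoints
```

The resulting per-frame face keypoints can then be concatenated with the body and hand keypoints before being fed to the graph network streams.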

2. RELATED WORK

In this section, we review existing publicly available sign language datasets and existing state-of-the-art algorithms for sign language recognition.

2.1 Sign Language Datasets

A Word-Level American Sign Language (WLASL) dataset is proposed in [1], containing over 2000 words performed by over 100 signers.

A Turkish Sign Language dataset is proposed in [6]. The dataset consists of 226 signs performed by 43 different signers, with 38,336 sign video samples in total. The samples cover a variety of backgrounds (both indoor and outdoor environments).

Reference [3] introduces a 3D hand pose dataset based on synthetic hand models.

2.2 Sign Language Recognition Approaches

An appearance-based approach and a 2D human-pose-based approach are proposed in [1], creating baselines that aid method benchmarking. In addition, [1] proposes a pose-based temporal graph convolutional network (Pose-TGCN) that models the spatial and temporal dependencies of the pose sequence.
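Graph-convolutional approaches such as Pose-TGCN treat keypoints as nodes of a skeleton graph and propagate features along its edges. Below is a minimal NumPy sketch of a single spatial graph-convolution step over a keypoint sequence; it is our simplified illustration of the general technique, not the implementation from [1].

```python
import numpy as np

def normalized_adjacency(edges, num_nodes):
    """Build A_hat = D^-1/2 (A + I) D^-1/2 from a skeletal edge list."""
    A = np.eye(num_nodes)            # self-loops keep each joint's own features
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0      # undirected bone connections
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

def spatial_graph_conv(X, A_hat, W):
    """One graph-convolution step.

    X is a (T, V, C) array of features for V keypoints over T frames,
    A_hat is the (V, V) normalized adjacency, W a (C, C_out) weight matrix."""
    return np.einsum("uv,tvc,co->tuo", A_hat, X, W)

# Toy usage: 5 frames, 4 joints with (x, y) coordinates, 8 output channels.
edges = [(0, 1), (1, 2), (2, 3)]
X = np.random.rand(5, 4, 2)
out = spatial_graph_conv(X, normalized_adjacency(edges, 4), np.random.rand(2, 8))
print(out.shape)  # (5, 4, 8)
```

Temporal dependencies are then typically captured by convolving each joint's features across the frame axis, which is the role of the temporal layers in Pose-TGCN and SSTCN.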


A Two-Stream Inflated 3D ConvNet (I3D) is proposed in [7]. It is based on 2D ConvNet inflation: the filters and pooling kernels of very deep image classification ConvNets are expanded into 3D, making it possible to learn spatio-temporal feature extractors from video while reusing successful ImageNet architecture designs.
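As a rough illustration of the inflation idea (our sketch, not the code of [7]): a pretrained 2D kernel is repeated along a new temporal axis and rescaled so that, on a video of identical frames, the inflated 3D filter initially produces the same activations the 2D filter produced on a single image.

```python
import numpy as np

def inflate_2d_kernel(w2d, t):
    """Inflate a 2D conv kernel (out_c, in_c, kh, kw) into a 3D one
    (out_c, in_c, t, kh, kw).

    Repeating the weights t times along the temporal axis and dividing
    by t preserves the 2D filter's response on a frame-repeated video."""
    w3d = np.repeat(w2d[:, :, None, :, :], t, axis=2)
    return w3d / t

w2d = np.random.rand(64, 3, 7, 7)   # e.g. the stem conv of an ImageNet CNN
w3d = inflate_2d_kernel(w2d, t=7)   # a 7x7x7 spatio-temporal filter
print(w3d.shape)  # (64, 3, 7, 7, 7)
```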


