International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 11 Issue: 08 | Aug 2024

p-ISSN: 2395-0072

www.irjet.net

Human Pose Estimation for Yoga Using VGG-19 and COCO Dataset: Development and Implementation of a Mobile Application

Dhadkan Shrestha1, Peshal Nepal2, Pratik Gautam3, Pradeep Oli4

1Texas State University, San Marcos, TX, US 78666

2Georgian College, Barrie, ON L4M 3X9, Canada

3Georgian College, Barrie, ON L4M 3X9, Canada

4Thapathali Engineering Campus, Kathmandu, Nepal

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - Human Pose Estimation (HPE) is a critical technology in computer vision with diverse applications ranging from healthcare to sports analysis. This project presents a method for detecting the 2D stance of multiple persons in an image using a nonparametric representation known as Part Affinity Fields (PAFs). By leveraging the first 10 layers of the VGG-19 convolutional neural network and training on the COCO dataset, our model effectively identifies and associates key points of the human body. The architecture employs a two-branch system that jointly learns part locations and their associations through sequential prediction. This enables the model to maintain real-time performance while achieving high accuracy, regardless of the number of persons in the image. To enhance accessibility, we developed a mobile application using Flutter and TensorFlow Lite, allowing real-time pose estimation via a mobile device’s front camera. The app provides immediate feedback on physical exercises and yoga poses, making it an invaluable tool for fitness enthusiasts and healthcare professionals. Visual outputs such as heatmaps and PAFs confirm the model’s capability to accurately localize and connect key points. Despite potential challenges such as data quality and hyperparameter tuning, the results indicate that our approach is both reliable and practical for real-world deployment. This project not only advances the state-of-the-art in HPE but also opens possibilities for future enhancements, including integrating 3D pose estimation and applying the technology in augmented and virtual reality applications.

Key Words: Human Pose Estimation (HPE), Convolutional Neural Network (CNN), VGG-19, Part Affinity Fields (PAFs), COCO Dataset, Real-Time Pose Detection

1. INTRODUCTION

Human Pose Estimation (HPE) is the process of identifying, tracking, predicting, and classifying the movement and orientation of the human body through input data from images or videos. It captures the coordinates of the joints, including the knees, shoulders, and head. The three primary approaches to modeling a human body are Skeleton-based, Contour-based, and Volume-based models [1]. HPE has been evolving with the advancement of artificial intelligence and has applications in human-computer interaction, augmented reality, virtual reality, training robots, and activity recognition [2]. HPE is critical in various fields such as healthcare, sports, and entertainment. In healthcare, it is used for monitoring and analyzing physical therapy exercises to ensure patients perform movements correctly, reducing the risk of injury. In sports, it aids in performance analysis, helping athletes improve their techniques. In entertainment, HPE enables the creation of more interactive and immersive experiences in video games and virtual reality.

There are several approaches to modeling a human body in pose estimation, which can be broadly categorized into three types:

Skeleton-based Models: These models represent the human body as a collection of joints connected by bones. The coordinates of the joints are tracked over time to understand the movement and posture.

Contour-based Models: These models focus on the outer contour of the body, capturing the silhouette to infer pose and movement.

Volume-based Models: These models create a volumetric representation of the body, capturing the full 3D structure, which is useful for more detailed analysis.

HPE can be divided into two primary techniques:

2D Pose Estimation: This technique involves estimating key points in the joints of the human body in the 2D space for the image or video. It serves as a foundation for more advanced computer vision tasks like 3D human pose estimation, motion prediction, and human parsing.

3D Pose Estimation: This technique involves estimating the actual spatial positioning of the body in the 3D space, introducing the z-dimension. It provides a more comprehensive understanding of the body’s posture and movement [3].

Impact Factor value: 8.226

ISO 9001:2008 Certified Journal

Page 355
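
The skeleton-based representation and the 2D/3D distinction described in the Introduction can be made concrete with a small data structure. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the `Joint` class, the `BONES` list, the helper functions, and all coordinates are hypothetical names and values chosen for illustration, and the three-joint subset is far smaller than the keypoint set a COCO-trained model would produce.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Joint:
    """One body key point. z is None for a 2D estimate, set for a 3D one."""
    x: float
    y: float
    z: Optional[float] = None


# Hypothetical bone list: each pair names two joints joined by one "bone".
BONES: List[Tuple[str, str]] = [
    ("left_shoulder", "left_hip"),
    ("left_hip", "left_knee"),
]


def bone_length(a: Joint, b: Joint) -> float:
    """Euclidean distance between two joints; z is used only if both have it."""
    dz = 0.0 if a.z is None or b.z is None else a.z - b.z
    return sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + dz ** 2)


def bone_lengths(pose: Dict[str, Joint]) -> Dict[Tuple[str, str], float]:
    """Length of every bone whose two end joints were both detected."""
    return {
        (a, b): bone_length(pose[a], pose[b])
        for a, b in BONES
        if a in pose and b in pose
    }


# Made-up coordinates for one frame, first in 2D and then with depth added.
pose_2d = {
    "left_shoulder": Joint(0.0, 0.0),
    "left_hip": Joint(0.0, 4.0),
    "left_knee": Joint(3.0, 8.0),
}
pose_3d = {
    "left_shoulder": Joint(0.0, 0.0, 0.0),
    "left_hip": Joint(0.0, 4.0, 0.0),
    "left_knee": Joint(3.0, 8.0, 12.0),
}

lengths_2d = bone_lengths(pose_2d)
lengths_3d = bone_lengths(pose_3d)
```

Comparing `lengths_2d` with `lengths_3d` illustrates what the z-dimension adds: a bone whose end joints differ in depth appears shorter in the 2D projection than it is in 3D, while a bone lying in the image plane has the same length in both.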