
A COMPREHENSIVE STUDY ON TEXT-TO-IMAGE SYNTHESIS USING GENERATIVE ADVERSARIAL NETWORKS (GANs)


International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 11, Issue: 05 | May 2024 | www.irjet.net

¹Sidral Roja, ²N. Chandana, ³A. Akhila, ⁴Asma Begum
¹,²,³B.E., Department of ADCE, SCETW, OU, Hyderabad, Telangana, India
⁴Assistant Professor, ADCE, SCETW, OU, Hyderabad, Telangana, India

Abstract: This work introduces a novel approach to text-to-image synthesis through automated image generation from textual descriptions. Our approach leverages sophisticated neural networks, such as Generative Adversarial Networks (GANs), to address the challenges of multi-modal learning: it improves visual realism, handles diverse scenes, maintains semantic consistency, and facilitates style transfer. The architecture combines an enhanced text encoder, a flexible generator network, larger datasets, a conditional discriminator, and painstaking detail-oriented design. Rigorous evaluation measures and a variety of applications highlight its potential significance, paving the way for further improvements and signaling a paradigm shift in the capabilities of text-to-image synthesis.

Keywords: Text-to-image synthesis, Generator network, Enhanced text encoder, Conditional discriminator, Generative models.

1. INTRODUCTION

Text-to-image synthesis is an exciting and rapidly evolving field within artificial intelligence. Its primary objective is to develop automated models that can understand and interpret detailed textual descriptions and generate corresponding visual representations. The task is inherently intricate because it must seamlessly merge natural language processing and computer vision, demanding a sophisticated level of creativity and ingenuity. Despite its immense potential, text-to-image synthesis remains relatively underexplored compared to well-established domains of machine learning such as object recognition. The complexity of the field stems from its requirement to integrate and reconcile multi-modal information, effectively combining textual cues with visual inputs.

One of the significant advancements in text-to-image synthesis has been the introduction of conditional variants of GANs, known as conditional GANs (cGANs). These models can incorporate additional inputs, such as class labels or textual descriptions, enabling them to generate images that align more closely with the semantics of the provided text. Despite these advancements, several challenges persist for cGAN-based approaches: maintaining semantic coherence between textual descriptions and generated images, preserving fine details during synthesis, and effectively handling diverse object classes and scenes within a single image. In summary, while significant progress has been made in text-to-image synthesis, there remain numerous opportunities for further research and development aimed at addressing these challenges and unleashing the complete capabilities of this revolutionary technology.

2. LITERATURE SURVEY

Generative Adversarial Networks (GANs) have emerged as a powerful and versatile architecture for text-to-image synthesis. GANs consist of two primary components: a generator network tasked with producing images, and a discriminator network responsible for assessing the authenticity of these generated images.
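The generator–discriminator interplay and the conditional-input idea described above can be sketched in a few lines. The following is a minimal illustrative toy, not the model studied in this paper: the layer sizes, the single-layer "networks", and the stand-in text embeddings are all assumptions made purely for demonstration. It shows how a cGAN concatenates a condition (e.g. a text embedding) to both the generator's noise input and the discriminator's image input, and how the standard adversarial losses are computed.

```python
# Minimal cGAN sketch in NumPy. All dimensions and the one-layer
# "networks" are illustrative assumptions, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)

NOISE_DIM, COND_DIM, DATA_DIM = 8, 4, 16  # toy sizes (assumptions)

# One-layer stand-ins for the generator and discriminator networks.
W_g = rng.normal(scale=0.1, size=(NOISE_DIM + COND_DIM, DATA_DIM))
W_d = rng.normal(scale=0.1, size=(DATA_DIM + COND_DIM, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator(z, cond):
    # cGAN generator: the condition (e.g. a text embedding) is
    # concatenated to the noise vector before mapping to image space.
    return np.tanh(np.concatenate([z, cond], axis=1) @ W_g)

def discriminator(x, cond):
    # Conditional discriminator: judges (sample, condition) pairs, so it
    # can penalize images that look real but mismatch the description.
    return sigmoid(np.concatenate([x, cond], axis=1) @ W_d)

batch = 5
cond = rng.normal(size=(batch, COND_DIM))   # stand-in text embeddings
real = rng.normal(size=(batch, DATA_DIM))   # stand-in real images
fake = generator(rng.normal(size=(batch, NOISE_DIM)), cond)

# Adversarial objectives: D maximizes log D(x|c) + log(1 - D(G(z|c)|c));
# G is trained to make D score its outputs as real.
eps = 1e-12
d_loss = -np.mean(np.log(discriminator(real, cond) + eps)
                  + np.log(1.0 - discriminator(fake, cond) + eps))
g_loss = -np.mean(np.log(discriminator(fake, cond) + eps))

print(f"fake batch shape: {fake.shape}")
print(f"D loss: {d_loss:.3f}, G loss: {g_loss:.3f}")
```

In a real system the two matrices would be deep convolutional networks and the condition would come from a learned text encoder; training alternates gradient steps on `d_loss` and `g_loss`, which is the minimax game the survey entries below build on.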

[1] Scaling Up GANs for Text-to-Image Synthesis (2023) by Minguk Kang et al. presents GigaGAN, a scalable text-to-image synthesis model based on StyleGAN. The results are competitive, though it falls short of the photorealism of some counterparts. Summary: this work focuses on scaling GANs and demonstrates the

© 2024, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1561

