International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 12 Issue: 09 | Sep 2025
p-ISSN: 2395-0072
www.irjet.net
Text To Image Generation Using Deep Learning
Shamanth D¹, Dr. Rashmi C R², Dr. Shantala C P³
¹PG Student, Dept. of Computer Science & Engineering, Channabasaveshwara Institute of Technology, Gubbi, Karnataka, India
²Assistant Professor, Dept. of Computer Science & Engineering, Channabasaveshwara Institute of Technology, Gubbi, Karnataka, India
³Professor & Head, Dept. of Computer Science & Engineering, Channabasaveshwara Institute of Technology, Gubbi, Karnataka, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Text-to-image generation is a new, revolutionary field of Artificial Intelligence that combines recent developments in Natural Language Processing (NLP) and Computer Vision (CV). This paper discusses the design and implementation of a deep learning model that translates natural language descriptions into realistic images using Stable Diffusion and other supporting architectures, including GANs and Attentional GANs (AttnGAN).

The original GAN models and conventional image synthesis approaches (such as template-based retrieval) were characterized by low resolution, semantic mismatches, and low adaptability. To overcome these constraints, this work adopts latent diffusion models (Stable Diffusion v1.5) trained on the massive LAION-5B dataset and loaded via the Hugging Face diffusers library. The system is written in a Jupyter Notebook, giving the user an interactive environment to input text prompts, use positive and negative conditioning, and make real-time parameter changes, including guidance scale, denoising steps, and resolution.

The algorithm uses text preprocessing and embedding through CLIP encoders, latent space denoising, and refinement to generate semantically correct and photorealistic images. The output is evaluated with quantitative scores (Inception Score, Fréchet Inception Distance, SSIM, Precision/Recall) and subjective human judgement, so that realism, diversity, and semantic fidelity are balanced.

Findings indicate that GAN-based models are sharper and more detailed, but diffusion-based models are more semantically consistent, more likely to generalize, and more photorealistic. The system has produced quality images in various fields such as art and design, education, healthcare simulation, gaming, and digital media production.

This paper not only shows how diffusion-based architectures can practically synthesize images from text, but also brings to light issues such as computational complexity, linguistic ambiguity, bias in datasets, and risks to the privacy of users. Finally, this paper demonstrates the use of deep learning to connect human linguistic expression to visual creativity and offers an extensible and usable platform capable of generating cross-modal content.

Key Words: Text-to-Image Generation, Stable Diffusion, Deep Learning, Generative Adversarial Networks, Natural Language Processing, CLIP Embeddings, Image Synthesis.

1. INTRODUCTION
In the modern digital landscape, visual media plays a vital role in conveying information, storytelling, and stimulating creative processes across disciplines. The exponential growth in artificial intelligence technologies, especially in the realm of deep learning, has transformed how computers interpret and generate complex imagery. A key breakthrough in this area is the concept of Text-to-Image Generation, whereby sophisticated algorithms synthesize original, highly realistic images based solely on detailed written descriptions. This interdisciplinary technique bridges linguistic expression and visual representation, opening up new possibilities for innovation in sectors such as advertising, digital art, interactive entertainment, virtual learning, medical simulation, and user interface design.

Text-to-image generation involves training neural networks to grasp the nuances and intricacies of human language, converting abstract or descriptive sentences into visually rich and contextually relevant images. Unlike conventional image processing, where the focus is often on filtering, classification, or retrieving existing visual assets, this method requires the system to understand complex semantics, discern contextual clues, and recreate features in a way that mirrors the
© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 128