
A Survey on Large Language Models: Overview and Applications


International Research Journal of Engineering and Technology (IRJET) | Volume: 11, Issue: 06, June 2024 | www.irjet.net | e-ISSN: 2395-0056 | p-ISSN: 2395-0072

Mangi Nikhil1, Yeshwanth Reddy Vallela2, Gajji Chakrapani3

1 Student, Department of Computer Science and Engineering, Sri Indu College of Engineering and Technology, Telangana, India.
2 Student, Department of Cyber Security, Sri Indu College of Engineering and Technology, Telangana, India.
3 Student, Department of Artificial Intelligence and Machine Learning, Sri Indu College of Engineering and Technology, Telangana, India.

Abstract— Large Language Models (LLMs) are a breakthrough in natural language processing that has revolutionized how computers understand and generate human-like language. This paper provides a comprehensive introduction to LLMs and generative AI. It covers the history and evolution of language models, the transformer architecture that enabled powerful LLMs such as GPT and BERT, and the key applications of LLMs across diverse domains. The paper also discusses the design cycle for building domain-specific LLM models, leveraging open-source options such as Llama 2. With an insightful literature review and technical details, this study serves as a beginner's guide to LLMs' potential, their capabilities, and the process of harnessing their power for specialized use cases. As generative AI reshapes industries, this paper equips readers to embrace the transformative impact of LLMs responsibly and effectively.

Keywords— Large Language Models, Recurrent Neural Networks, ChatGPT, Generative AI

1. INTRODUCTION

Natural language processing (NLP) is a field of computer science in which we build algorithms and models that enable computers to understand human language. Large Language Models (LLMs), also described as transformative or next-generation language models, have brought sweeping changes to NLP: they capture complex patterns and structures in language and identify the semantic relations between the words of a sentence. LLMs are trained on very large datasets and are built on the transformer architecture, a deep learning approach that outperforms earlier NLP techniques. Beyond text, many LLMs can also process images, video, and audio, and they can be used for tasks such as sentiment analysis. This gives them the upper hand in understanding and mimicking human language.

In the early stages of language modeling, statistical methods were adopted to predict upcoming words; n-gram models and Hidden Markov Models are two such examples. Based on the training data, these models observe the previous words of a sentence and suggest the next word (a toy bigram sketch is given after this section). Machine learning approaches were proposed next, with the goal of making the machine understand the context behind the text. This improved language understanding, because those models were trained to capture relations within large text corpora.

Deep learning techniques then revolutionized NLP language modeling through recurrent neural networks (RNNs) and long short-term memory networks (LSTMs). RNNs handle sequential data and carry information from previous inputs by maintaining a hidden state: at each step, an RNN processes the current input, updates its hidden state, and repeats. LSTMs address the vanishing gradient problem that arises when training such networks; they retain information over long sequences, which helps capture dependencies between distant inputs. When RNNs and LSTMs are applied to language modeling, they are typically arranged as a sequence-to-sequence (encoder-decoder) model: the encoder processes the input and encodes it into a fixed-length representation called the context vector, and the decoder takes the context vector and generates the output. The decoder's hidden states are initialized from the context vector produced by the encoder (see the encoder-decoder sketch below).

In 2017, a research study by Vaswani et al., "Attention Is All You Need" [1], proposed the transformer architecture. This changed everything in language modeling and is one of the greatest advancements in the field, laying the foundation for models such as ChatGPT and BERT. These models do not depend on recurrent connections; instead, they use attention mechanisms, which allow parallel processing, efficient use of hardware, and better handling of long-range dependencies (a minimal attention sketch is given below). Transformers, as the name suggests, transform one sequence into another. The meaning of a sentence can vary with the situation, and word order and turns of phrase change what a sequence expresses; transformers handle this through sequence-to-sequence learning, taking a sequence of tokens and processing it to generate an output sequence. The architecture consists of an encoder and a decoder: the encoder processes the input sequence, while the decoder produces the output sequence. The encoder generates encodings that define which parts of the input sequence are relevant to each other and passes them on to the next encoder layer. The decoder uses all of these encodings and the context derived from them to generate the output sequence.
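To make the statistical approach described above concrete, the following is a minimal bigram model in Python. The toy corpus and the greedy "most frequent successor" rule are illustrative assumptions, not drawn from any particular model discussed in this paper.

# A toy bigram model: count which word follows which, then suggest the
# most frequent successor. The corpus below is an illustrative assumption.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1          # e.g. follows["the"] == {"cat": 2, "mat": 1}

def predict_next(word):
    counts = follows[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))           # -> "cat", the most frequent successor

Real n-gram models use longer histories and smoothed probabilities rather than raw counts, but the principle is the same: the next word is suggested purely from co-occurrence statistics in the training data.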

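The encoder-decoder arrangement described above can be sketched in a few lines of PyTorch. The vocabulary size, layer dimensions, use of an LSTM cell, the start-of-sequence token id, and the greedy decoding loop are assumptions chosen for brevity; this is a minimal untrained illustration of how the context vector flows from encoder to decoder, not a working translation model.

import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 32, 64       # hypothetical sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)

    def forward(self, src):
        # The final hidden/cell states act as the fixed-length context vector.
        _, context = self.rnn(self.embed(src))
        return context

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, prev_token, state):
        # The decoder state is initialised from the encoder's context vector
        # and updated one generated token at a time.
        output, state = self.rnn(self.embed(prev_token), state)
        return self.out(output), state

encoder, decoder = Encoder(), Decoder()
src = torch.randint(0, VOCAB, (1, 7))         # one input sequence of 7 token ids
state = encoder(src)                          # context vector -> initial state
token = torch.zeros(1, 1, dtype=torch.long)   # assumed start-of-sequence id 0
for _ in range(5):                            # greedily generate 5 output tokens
    logits, state = decoder(token, state)
    token = logits.argmax(-1)

Because the whole input must be squeezed into a single fixed-length context vector, long inputs lose detail; this bottleneck is one motivation for the attention mechanism discussed next.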

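The attention mechanism at the heart of the transformer can likewise be sketched with plain NumPy. This shows scaled dot-product attention, the core operation of "Attention Is All You Need" [1]; the random projection matrices and the sequence and model sizes are illustrative assumptions.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Every query scores every key at once, so all positions are processed in
    # parallel and long-range dependencies need no step-by-step recurrence.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)        # relevance of each position
    return weights @ V                        # weighted mix of value vectors

seq_len, d_model = 5, 16                      # hypothetical sizes
x = np.random.randn(seq_len, d_model)         # one vector per input token
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)                              # (5, 16): one output per position

In a full transformer, several such attention heads run in parallel inside every encoder and decoder layer, which is what enables the hardware-efficient parallelism and long-range context modeling noted above.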

Models like ChatGPT and BERT emerged after the transformer architecture came to light: OpenAI's GPT (Generative Pre-Trained Transformer) and Google's BERT (Bidirectional Encoder Representations from Transformers) are two prominent examples.


