International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 12 Issue: 11 | Nov 2025
p-ISSN: 2395-0072
www.irjet.net
Research Gaps in Developing Fair and Inclusive LLMs for India’s Multilingual Agricultural Landscape
Anamika Singh1, Archita Agar2, Veena Kulkarni3, Dr. Ranjita Akash Asati4, Dr. Neha Patwari5
1,2,4,5 Assistant Professor, Dept. of IT, Thakur Engineering College, Mumbai, India
3 Assistant Professor, Dept. of COMP, Thakur Engineering College, Mumbai, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Indian farmers speak more than 22 official languages, not counting the many regional dialects. This mix of languages and cultures makes it difficult to build Large Language Models (LLMs) that actually work for the Indian agriculture sector. The difficulty is not only the large number of languages: many of them are low-resource, they use different scripts, speakers often mix several languages within a single sentence, and useful agriculture-specific data is in short supply. This paper focuses on the main roadblocks: there is hardly any annotated data, rural conversations carry their own social idiosyncrasies, and the words people use for crops differ from one place to another. To address these issues, the paper considers a few strategies, such as building agricultural knowledge graphs, borrowing from high-resource languages through transfer learning, and assembling multilingual corpora focused on farming. Indian-language LLMs can support tasks such as monitoring and forecasting crop yields, giving real-time suggestions, detecting diseases and pests, and helping farmers access government programs and policies. LLMs that really understand local languages could help more people get online, give farmers better information for making decisions, and support sustainable agriculture. This research shares ideas and recommendations for building solid LLMs that actually fit India’s unique agricultural and linguistic landscape.
Key Words: Languages Spoken in India, Agriculture, Multilingual Artificial Intelligence, Low-Resource Languages, Digital Inclusion, Crop Advisory, Large Language Models (LLMs), Natural Language Processing (NLP)
1. INTRODUCTION
India is a mix of many languages. There are around 22 official languages and approximately 20,000 dialects spoken across the country, and within a few miles people may speak in completely different ways [1][2]. This matters a great deal, especially for the 150 million farmers working across India. Most of them communicate, receive advice, and interpret weather or disease warnings in their own languages or local dialects. Multilingualism is therefore not just common in Indian agriculture; it is essential [2][3]. Yet most AI and natural language technologies are heavily skewed toward widely used languages such as Hindi and English. That is a serious problem for farmers, especially those in rural areas who speak other languages [4][5].
Large language models like GPT-4, LLaMA, and BERT have performed well in text generation and language understanding [11][12]. However, these models work well mainly when the input is in a major language. The moment a user needs information or help in a less common or code-mixed language (which happens constantly in Indian farming), they fall short [14][18]. Languages like Tamil, Marathi, Punjabi, and Assamese each have their own characteristics: the way people speak, write, and form words can differ. That creates a whole new set of technical problems. General-purpose language models rarely handle the words farmers actually use; they struggle with crop names, soil types, fertilizers, and climate patterns. Because of these issues, researchers are building multilingual language models for the agriculture domain [20][25]. They train these models on sources ranging from farmer helpline calls to extension service handbooks, scientific articles, and even local news. Projects like AI4Bharat, IndicNLP, and Bhashini help by providing open-source datasets and models for Indian languages.
These tools provide services such as voice-based advice, yield prediction, crop disease detection, and smart irrigation in local languages. The goal is to make AI useful for every farmer, no matter what language they speak [28][30].
One of the biggest problems is the lack of well-labeled data. There is also no standard way to organize agricultural knowledge across languages, and code-mixed text adds further noise [32][34]. In addition, every script, from Devanagari to Tamil, Telugu, Bengali, or Malayalam, works differently, which makes it difficult to handle even basics such as splitting words or cleaning up messy text. Proposed remedies include combining images and text, fine-tuning models for specific farm topics, and building smarter ways for models to move between languages [33][37].
But language is not the only challenge. The AI needs to understand local culture, geography, environmental factors, and even the pattern of the farming calendar, not just translate words. Depending on the geographical region or crop which mean totally
© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 553