Algorithm for calculating relevance of documents in information retrieval systems


International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 04 Issue: 03 | Mar-2017

p-ISSN: 2395-0072

www.irjet.net

Roberto Passailaigue Baquerizo1, Paúl Rodríguez Leyva2, Juan Pedro Febles3, Hubert Viltres Sala4, Vivian Estrada Sentí5

1Chancellor, Universidad Tecnológica (ECOTEC), Guayaquil, Ecuador
2Departamento de Soluciones Informáticas para Internet, Universidad de las Ciencias Informáticas, La Habana, Cuba
3Departamento Metodológico de Postgrado, Universidad de las Ciencias Informáticas, La Habana, Cuba
4Departamento de Preparación Profesional, Universidad de las Ciencias Informáticas, La Habana, Cuba
5Departamento Metodológico de Postgrado, Universidad de las Ciencias Informáticas, La Habana, Cuba

Abstract - This research belongs to the field of information retrieval. Its main objective is to lay the foundations of an algorithm that assigns a relevance value to a document with respect to a query entered by a user in an information retrieval system. Relevance is a fundamental concept in the design and development of information retrieval systems: even if these tools search the web thoroughly, structure documents correctly and store them efficiently, a system whose results do not actually meet the user's search needs will be judged poorly by its users. The algorithm is based primarily on the classical mathematical expressions for calculating the similarity between sets, known as the cosine, Jaccard and Dice coefficients. Its distinctive feature is that the similarity is adjusted according to the relationship established between the users' search profiles and the categories of the documents stored in the information retrieval system. To obtain these variables, text mining and web mining techniques are used to process the information generated by logging user queries and the metadata of the stored documents. The main contribution of the research is an algorithm to calculate the relevance of the documents returned in response to user queries.

1. INTRODUCTION

Information Retrieval (IR) is not a new area; it has been developing since the late 1950s. However, it now plays a more important role given the value of information. It can be argued that having or not having the right information in a timely manner can determine the success or failure of an operation. Hence the importance of information retrieval systems (SRI), which can handle these situations effectively and efficiently, within certain limitations [1]. Since the 1950s, many definitions of this field have been proposed. According to Baeza-Yates, one of the most experienced researchers in the field, IR "deals with the representation, storage, organization of, and access to information items". Salton defines it as "a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information" [11]. Croft considers information retrieval to be "the set of tasks by which the user locates and accesses information resources that are relevant to problem resolution." Documentary languages, abstracting techniques, description of the

Key Words: algorithm, similarity, queries, information retrieval systems, relevance
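The abstract names the cosine, Jaccard and Dice coefficients as the base similarity measures of the algorithm. As a minimal sketch, assuming that a query and a document are each reduced to a bag of index terms by a naive lowercase-and-split tokenizer (an assumption; the paper does not specify its preprocessing), the three coefficients can be computed as below. The helper names `terms`, `cosine`, `jaccard` and `dice` are hypothetical, and the adjustment by user search profile and document category described in the abstract is deliberately not modeled here.

```python
import math
from collections import Counter


def terms(text: str) -> list[str]:
    # Hypothetical naive tokenizer (assumption): lowercase, split on whitespace.
    return text.lower().split()


def cosine(query: str, doc: str) -> float:
    # Cosine of the angle between raw term-frequency vectors of query and document.
    q, d = Counter(terms(query)), Counter(terms(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def jaccard(query: str, doc: str) -> float:
    # |Q ∩ D| / |Q ∪ D| over the sets of distinct terms.
    q, d = set(terms(query)), set(terms(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def dice(query: str, doc: str) -> float:
    # 2|Q ∩ D| / (|Q| + |D|) over the sets of distinct terms.
    q, d = set(terms(query)), set(terms(doc))
    total = len(q) + len(d)
    return 2 * len(q & d) / total if total else 0.0


if __name__ == "__main__":
    query = "information retrieval relevance"
    doc = "relevance of documents in information retrieval systems"
    for name, fn in (("cosine", cosine), ("jaccard", jaccard), ("dice", dice)):
        print(f"{name}: {fn(query, doc):.3f}")
    # prints: cosine: 0.655, jaccard: 0.429, dice: 0.600
```

In the example, all three query terms occur in a seven-term document, so Jaccard is 3/7 and Dice is 6/10; Dice always equals 2J/(1 + J) for a Jaccard value J, so the two rank documents identically. Jaccard and Dice ignore term frequency, whereas cosine uses it, which is why cosine is often applied to TF-IDF weights rather than raw counts.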

© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 243

