    Applying Genetic Algorithm In Query Improvement Problem

    This paper presents an adaptive method that uses a genetic algorithm to modify users' queries based on relevance judgments. The algorithm was adapted to three well-known document collections (CISI, NLP and CACM). The method is shown to be applicable to large text collections, where the genetic modification brings more relevant documents forward for users. The results show the effect of applying a GA to improve the effectiveness of queries in IR systems. Further studies are planned to tune the system parameters and improve effectiveness. The goal is to retrieve the most relevant documents, with fewer non-relevant documents, for a user's query in an information retrieval system using a genetic algorithm.
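
    As a purely illustrative sketch of this kind of approach (not the paper's own algorithm), the following genetic algorithm evolves query term weights so that documents judged relevant are ranked early; the vocabulary, fitness function, and one-point crossover below are assumptions chosen for brevity.

```python
import random

# Hypothetical vocabulary; a chromosome is one weight per query term.
VOCAB = ["retrieval", "genetic", "query", "index", "ranking"]

def fitness(weights, docs, judged_relevant):
    """Reward weight vectors that rank the judged-relevant documents early."""
    scores = [(sum(w * d.get(t, 0.0) for w, t in zip(weights, VOCAB)), i)
              for i, d in enumerate(docs)]
    ranked = [i for _, i in sorted(scores, reverse=True)]
    return sum(1.0 / (ranked.index(i) + 1) for i in judged_relevant)

def evolve(docs, judged_relevant, pop_size=20, generations=50):
    pop = [[random.random() for _ in VOCAB] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, docs, judged_relevant), reverse=True)
        survivors = pop[: pop_size // 2]          # truncation selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(len(VOCAB))    # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(len(VOCAB))] = random.random()  # mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda w: fitness(w, docs, judged_relevant))

# Toy collection: documents as term-frequency dicts; docs 0 and 2 are relevant.
docs = [{"retrieval": 2, "query": 1}, {"genetic": 3}, {"index": 2, "ranking": 1}]
print(evolve(docs, judged_relevant={0, 2}))
```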

    INFORMATION RETRIEVAL USING PAGE RELEVANCY

    ABSTRACT Retrieval of relevant documents from a large collection of web content is done through an Information Retrieval System. The relevancy of a page can be calculated using a similarity measure. Similarity measures thus rank pages according to the relevancy of each page, helping the information retrieval system present the more relevant pages to users earlier in response to a query. This paper discusses a general framework for Information Retrieval systems and search engines, and also presents various information retrieval models and types of similarity measures.
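
    A minimal sketch of the idea, assuming a cosine similarity measure over term-frequency vectors and whitespace tokenization (both assumptions; the paper surveys several measures):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query: str, pages: dict) -> list:
    """Sort pages by descending similarity to the query."""
    q = Counter(query.lower().split())
    scored = [(url, cosine(q, Counter(text.lower().split())))
              for url, text in pages.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)

pages = {"p1": "information retrieval ranks relevant pages",
         "p2": "cooking recipes and kitchen tips"}
print(rank("relevant information retrieval", pages))
```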

    A review on the application of evolutionary computation to information retrieval

    In this contribution, different proposals found in the specialized literature for the application of evolutionary computation to the field of information retrieval are reviewed. To do so, the different kinds of IR problems that have been solved by evolutionary algorithms are analyzed. Some of the specific existing approaches are described for each of these problems, and the reported results are critically evaluated in order to give the reader a clear view of the topic. Supported by CICYT under project TIC2002-03276 and by the University of Granada under project "Mejora de Metaheurísticas mediante Hibridación y sus Aplicaciones".

    Measuring retrieval effectiveness based on user preference of documents

    ON THE USE OF THE DEMPSTER SHAFER MODEL IN INFORMATION INDEXING AND RETRIEVAL APPLICATIONS

    The Dempster Shafer theory of evidence concerns the elicitation and manipulation of degrees of belief rendered by multiple sources of evidence to a common set of propositions. Information indexing and retrieval applications use a variety of quantitative means, both probabilistic and quasi-probabilistic, to represent and manipulate relevance numbers and index vectors. Recently, several proposals were made to use the Dempster Shafer model as a relevance calculus in such applications. The paper provides a critical review of these proposals, pointing out several theoretical caveats and suggesting ways to resolve them. The methodology is based on expounding a canonical indexing model whose relevance measures and combination mechanisms are shown to be isomorphic to Shafer's belief functions and to Dempster's rule, respectively. Hence, the paper has two objectives: (i) to describe and resolve some caveats in the way the Dempster Shafer theory is applied to information indexing and retrieval, and (ii) to provide an intuitive interpretation of the Dempster Shafer theory as it unfolds in the simple context of a canonical indexing model. Information Systems Working Papers Series.
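
    Since the paper builds on Dempster's rule, a small self-contained sketch of the rule may help; the two basic probability assignments below (over which terms index a document) are invented for illustration.

```python
from itertools import product

def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule: intersect focal elements, renormalize by (1 - conflict)."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

# Two sources of evidence about which of the terms {t1, t2} index a document.
m1 = {frozenset({"t1"}): 0.6, frozenset({"t1", "t2"}): 0.4}
m2 = {frozenset({"t2"}): 0.3, frozenset({"t1", "t2"}): 0.7}
print(combine(m1, m2))  # conflict 0.18 is redistributed over the intersections
```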

    Analysis of the Impact of Citation Classics and Relevance Rankings with Pennant Diagrams

    Citation indexes are important authority resources for measuring the contribution of scientists and scientific publications to the literature. Many studies in information retrieval aim to develop retrieval algorithms, and such studies tend to receive citations from different fields because of the interdisciplinary nature of information retrieval. It is therefore important to analyze so-called "citation classics" retrospectively to find out their impact on other fields. Yet this is not easy to do using citation indexes, especially for relatively old papers, as traditional citation analysis tends not to reveal the full impact of a work on other studies at its time and in the periods that follow. To see the big picture, it is important to study the contribution of these works to other disciplines as well. In this study, the impact of Maron and Kuhns' citation classic on "probabilistic retrieval", published in 1960, is visualized using pennant diagrams, which were developed on the basis of relevance theory, information retrieval and bibliometrics. We hypothesized that the interdisciplinary relations that are unobservable with traditional citation analysis can be revealed using the pennant diagram method. To test the hypothesis, works that cited Maron and Kuhns' study between 1960 and 2015 were downloaded with their references (a total of 4,176 unique works), and graphics were prepared using macros written in MS Excel. Of the 4,176 works, 90 were selected by convenience sampling to create static and interactive pennant diagrams for further analysis. Another important output of this study is the relevance rankings. As an alternative to the relevance rankings based on reference similarity already used in citation indexes, relevance rankings were created using pennant diagrams that took into account not only items that cited the core (seed) paper but also citations to those items. Relevance rankings based on reference similarity and those based on pennant diagrams were compared. Findings support the hypothesis: pennant diagrams show which papers the core paper on the probabilistic model influenced, and which papers influenced it, directly or indirectly. The relevance ranking based on pennant diagrams revealed the impact of the core paper on the information retrieval field as well as on other disciplines. Furthermore, it identified relations between these somewhat disconnected fields, and between authors, works, and journals, that cannot be readily identified using traditional citation analysis. Relevance rankings using pennant diagrams appear to have been more successful than relevance rankings based on reference similarity. This is the first study in Turkey to use pennant diagrams for relevance rankings. The data used in the graphs and relevance rankings are available through citation indexes (frequencies of total citations and co-citations); thus, alternative relevance rankings based on pennant diagrams can be offered to users. Pennant diagrams can help researchers track the relevant literature more easily and identify how a core work influences other works in its own field or in other fields.
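
    For readers unfamiliar with pennant diagrams, the sketch below computes plausible pennant coordinates under the assumption that they follow White's tf-idf style decomposition (co-citation with the seed as "cognitive effects", an idf-like term as "ease of processing"); the counts are invented, and the study's exact weighting may differ.

```python
import math

def pennant_xy(cocit: int, cit: int, N: int) -> tuple:
    """Assumed pennant coordinates for a work co-cited with the seed paper."""
    x = math.log2(cocit)      # cognitive effects: strength of tie to the seed
    y = math.log2(N / cit)    # ease of processing: idf-like specificity
    return x, y

N = 100_000                                            # citations in the dataset
works = {"work A": (120, 900), "work B": (200, 4000)}  # (cocit, cit), invented
for name, (cocit, cit) in works.items():
    x, y = pennant_xy(cocit, cit, N)
    print(f"{name}: effects={x:.2f}, ease={y:.2f}")
```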

    Document ranking with quantum probabilities

    In this thesis we investigate the use of quantum probability theory for ranking documents. Quantum probability theory is used to estimate the probability of relevance of a document given a user's query. We posit that quantum probability theory can lead to a better estimation of the probability of a document being relevant to a user's query than the common approach, i.e. the Probability Ranking Principle (PRP), which is based upon Kolmogorovian probability theory. Following our hypothesis, we formulate an analogy between the document retrieval scenario and a physical scenario, that of the double slit experiment. Through the analogy, we propose a novel ranking approach, the quantum probability ranking principle (qPRP). Key to our proposal is the presence of quantum interference. Mathematically, this is the statistical deviation between empirical observations and the expected values predicted by the Kolmogorovian rule of additivity of probabilities of disjoint events in configurations such as that of the double slit experiment. We propose an interpretation of quantum interference in the document ranking scenario and examine how quantum interference can be effectively estimated for document retrieval. To validate our proposal and to gain more insight into approaches for document ranking, we (1) analyse PRP, qPRP and other ranking approaches, exposing the assumptions underlying their ranking criteria and formulating the conditions for the optimality of the two ranking principles, (2) empirically compare three ranking principles (PRP, interactive PRP, and qPRP) and two state-of-the-art ranking strategies in two retrieval scenarios, ad-hoc retrieval and diversity retrieval, (3) analytically contrast the ranking criteria of the examined approaches, exposing similarities and differences, and (4) study the ranking behaviours of approaches alternative to PRP in terms of the kinematics they impose on relevant documents, i.e. the extent and direction of the movements of relevant documents across the rankings recorded when comparing PRP against its alternatives. Our findings show that the effectiveness of the examined ranking approaches strongly depends upon the evaluation context. In the traditional evaluation context of ad-hoc retrieval, PRP is empirically shown to be better than or comparable to alternative ranking approaches. However, when we turn to evaluation contexts that account for interdependent document relevance (i.e. when the relevance of a document is assessed also with respect to the other retrieved documents, as is the case in the diversity retrieval scenario), the use of quantum probability theory, and thus of qPRP, is shown to improve retrieval and ranking effectiveness over the traditional PRP and alternative ranking strategies such as Maximal Marginal Relevance, Portfolio theory, and interactive PRP. This work represents a significant step forward regarding the use of quantum theory in information retrieval. It demonstrates that the application of quantum theory to problems within information retrieval can lead to improvements both in modelling power and in retrieval effectiveness, allowing the construction of models that capture the complexity of information retrieval situations. Furthermore, the thesis opens up a number of lines for future research. These include (1) investigating estimations and approximations of quantum interference in qPRP, (2) exploiting complex numbers for the representation of documents and queries, and (3) applying the concepts underlying qPRP to tasks other than document ranking.
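
    As a toy illustration of the qPRP ranking criterion described above (not the thesis's own code), the sketch below greedily selects, at each rank, the document maximizing its relevance probability plus accumulated interference with the documents already ranked; estimating the interference term from inter-document similarity with a negative sign (cos θ = -1), which penalizes redundancy, is one simple assumption that the thesis's estimation study makes precise.

```python
import math

def qprp_rank(p_rel: dict, sim) -> list:
    """Greedy qPRP-style ranking: relevance plus pairwise interference."""
    ranked, remaining = [], set(p_rel)
    while remaining:
        def utility(d):
            interference = sum(-2.0 * math.sqrt(p_rel[d] * p_rel[r]) * sim(d, r)
                               for r in ranked)
            return p_rel[d] + interference
        best = max(remaining, key=utility)
        ranked.append(best)
        remaining.remove(best)
    return ranked

# Toy data: d1 and d2 are near-duplicates, so qPRP demotes d2 below d3.
p = {"d1": 0.90, "d2": 0.85, "d3": 0.60}
pairsim = {frozenset({"d1", "d2"}): 0.95, frozenset({"d1", "d3"}): 0.10,
           frozenset({"d2", "d3"}): 0.10}
print(qprp_rank(p, lambda a, b: pairsim[frozenset({a, b})]))  # d1, d3, d2
```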

    Analysis of document relevance criteria through information queries in the web environment

    Information search cannot be understood without web search engines. Given an information need, web search engines order their results so that the web pages most relevant to the query appear in the first positions. This generates a high degree of competition among web pages to obtain better relevance assignments from the search engines. As a general rule, users tend to consult only the first results a search engine returns, so occupying these positions translates into greater prestige and visibility. The perception of web document relevance by users is therefore intrinsically tied to search engines. This work proposes and develops a methodology to determine web document relevance automatically, which can be interpreted as the automatic prediction of the position a search engine would assign to a web document among the results of a query. The research is completed by identifying the factors considered in web positioning, based on a study of the tools used in the optimization and promotion of web pages. The weight of each of these factors in the search engines' ranking algorithms is also analyzed. Finally, drawing on the acquired ability to emulate search engine behaviour, a web optimization method is proposed that first estimates the profitability of the process, so that no investment is made in a promotion campaign if the forecast improvement in positioning is not judged adequate.
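
    As a purely hypothetical sketch of the prediction idea (the thesis derives factors and weights empirically; the ones below are invented placeholders), one can score candidate pages with a weighted sum of normalized positioning factors and read the predicted position off the resulting order:

```python
# Invented factor weights: placeholders for the empirically derived ones.
FACTOR_WEIGHTS = {"query_in_title": 0.35, "query_in_body": 0.20,
                  "inbound_links": 0.30, "page_speed": 0.15}

def relevance_score(features: dict) -> float:
    """Weighted sum of factor values, each normalized to [0, 1]."""
    return sum(w * features.get(f, 0.0) for f, w in FACTOR_WEIGHTS.items())

def predicted_ranking(pages: dict) -> list:
    """Order candidate pages by descending score: list index = predicted rank."""
    return sorted(pages, key=lambda p: relevance_score(pages[p]), reverse=True)

pages = {
    "a.html": {"query_in_title": 1.0, "inbound_links": 0.4, "page_speed": 0.8},
    "b.html": {"query_in_body": 0.9, "inbound_links": 0.9, "page_speed": 0.5},
}
print(predicted_ranking(pages))
```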

    A graph-based information retrieval model using structural similarities to improve the information retrieval process

    The main objective of IR systems is to select documents relevant to a user's information need from a collection. Traditional approaches to document/query comparison use surface similarity: the comparison engine relies on surface attributes (indexing terms). We propose a new method that uses a special kind of similarity, namely structural similarity (similarity that uses both surface attributes and the relations between attributes); we show that these structural similarities bring a beneficial information gain over direct similarities alone. They were inspired by cognitive studies and by a general similarity measure based on node comparison in a bipartite graph. We propose an adaptation of this general method to the specific context of information retrieval. The adaptation consists in taking the specificities of the domain into account: data types, weighted edges, and the choice of normalization. The core problem is how documents are compared against queries. The idea we develop is that similar documents will share similar terms, and similar terms will appear in similar documents. We developed an algorithm that translates this idea, studied the related convergence and complexity problems, and then ran tests on classical collections, comparing our measure with two reference measures in our domain. The thesis is structured in five chapters. The first chapter deals with the comparison problem and related concepts such as similarity, explaining different points of view and proposing an analogy between a cognitive similarity model and an IR model. The second chapter presents the IR task, the test collections, and the measures used to evaluate a ranked list of documents. The third chapter introduces graph definitions: our model is based on a bipartite graph representation, so we define graphs and the criteria used to evaluate them. The fourth chapter describes, step by step, how we adopted and adapted the general comparison method. The fifth chapter describes how we evaluated the ranking performance of our method on different collections and how we compared it with two other approaches.
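
    The circular definition above (similar documents share similar terms, and similar terms appear in similar documents) can be made concrete with a SimRank-style fixed-point iteration on the document-term bipartite graph; the damping factor, unweighted edges, and toy data below are simplifying assumptions, not the thesis's exact measure.

```python
C = 0.8   # damping factor, an assumption

def iterate_similarity(doc_terms: dict, iters: int = 10) -> dict:
    """Alternate document- and term-similarity updates toward a fixed point."""
    docs = list(doc_terms)
    terms = sorted({t for ts in doc_terms.values() for t in ts})
    term_docs = {t: [d for d in docs if t in doc_terms[d]] for t in terms}
    dsim = {(a, b): float(a == b) for a in docs for b in docs}
    tsim = {(a, b): float(a == b) for a in terms for b in terms}
    for _ in range(iters):
        dsim = {(a, b): 1.0 if a == b else
                C * sum(tsim[u, v] for u in doc_terms[a] for v in doc_terms[b])
                / (len(doc_terms[a]) * len(doc_terms[b]))
                for a in docs for b in docs}
        tsim = {(a, b): 1.0 if a == b else
                C * sum(dsim[u, v] for u in term_docs[a] for v in term_docs[b])
                / (len(term_docs[a]) * len(term_docs[b]))
                for a in terms for b in terms}
    return dsim

# d1 and d2 share only the term "graph" directly, but similarity also
# propagates through the terms' shared document contexts.
doc_terms = {"d1": ["graph", "retrieval"], "d2": ["graph", "ranking"]}
print(iterate_similarity(doc_terms)["d1", "d2"])
```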

    A new evolutionary proposal for document clustering in information retrieval systems

    The explicit knowledge of organizations is gathered in controlled document collections at the disposal of their users. When the number of documents is large, tools are needed to organize and display the contents of the collection, allowing users to explore it, learn its nature, and discover relationships, patterns, trends, and other characteristics so as to "understand" the information. The need to use knowledge in Information Retrieval Systems pushed researchers to analyze intelligent systems that seek to incorporate and use such knowledge in order to optimize the system. This thesis presents an Evolutionary System (SEV) and the results obtained in building a system of this kind. We make a contribution to the area of Information Retrieval (IR) by proposing a new system that uses evolutionary techniques to implement unsupervised learning for clustering the documents of an Information Retrieval System (SRI), where the clusters, and their number, are unknown to the system a priori. The clustering criterion is based on the similarity and distance between documents, thereby forming clusters of related documents. This groups the documents of an IR system in an acceptable way and is presented as a valid alternative to traditional clustering methods, with results that can be contrasted experimentally against some of the classical methods. The most relevant lexemes of each document, obtained by applying IR techniques, enrich the information associated with the documents in the collection and serve as metadata values for the evolutionary algorithm. The system thus operates through a document-processing methodology that selects document lexemes using information retrieval criteria. The results obtained demonstrate the viability of building a large-scale application with these characteristics, to be integrated into a knowledge management system that must handle large controlled document collections.
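
    A hedged sketch of this kind of unsupervised evolutionary clustering: a chromosome assigns each document a cluster label, so the number of clusters emerges from the search instead of being fixed a priori. The similarity matrix and the fitness function (intra-cluster minus inter-cluster similarity) are simplified illustrations, not the thesis's exact design.

```python
import random

def fitness(labels, sim):
    """Mean pairwise similarity within clusters minus across clusters."""
    within, across = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (within if labels[i] == labels[j] else across).append(sim[i][j])
    w = sum(within) / len(within) if within else 0.0
    a = sum(across) / len(across) if across else 0.0
    return w - a

def mutate(chrom, n):
    child = chrom[:]
    child[random.randrange(n)] = random.randrange(n)   # reassign one document
    return child

def evolve(sim, pop_size=30, gens=100):
    n = len(sim)
    pop = [[random.randrange(n) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda c: fitness(c, sim), reverse=True)
        elite = pop[: pop_size // 2]                   # keep the best half
        pop = elite + [mutate(random.choice(elite), n) for _ in elite]
    return pop[0]

# Toy document-document similarity matrix: d0 and d1 belong together.
sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
print(evolve(sim))   # e.g. [2, 2, 0]: labels are arbitrary, grouping matters
```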