508 research outputs found

    Evaluation of MIRACLE approach results for CLEF 2003

    Get PDF
    This paper describes MIRACLE (Multilingual Information RetrievAl for the CLEf campaign) approach and results for the mono, bi and multilingual Cross Language Evaluation Forum tasks. The approach is based on the combination of linguistic and statistic techniques to perform indexing and retrieval tasks

    Mixed Graph of Terms: Beyond the bags of words representation of a text

    Get PDF
    The main purpose of text mining techniques is to identify common patterns through the observation of vectors of features and then to use such patterns to make predictions. Vectors of features are usually made up of weighted words, as well as those used in the text retrieval field, which are obtained thanks to the assumption that considers a document as a "bag of words". However, in this paper we demonstrate that, to obtain more accuracy in the analysis and revelation of common patterns, we could employ (observe) more complex features than simple weighted words. The proposed vector of features considers a hierarchical structure, named a mixed Graph of Terms, composed of a directed and an undirected sub-graph of words, that can be automatically constructed from a small set of documents through the probabilistic Topic Model. The graph has demonstrated its efficiency in a classic "ad-hoc" text retrieval problem. Here we consider expanding the initial query with this new structured vector of features

    Unsupervised Distillation of Syntactic Information from Contextualized Word Representations

    Full text link
    Contextualized word representations, such as ELMo and BERT, were shown to perform well on various semantic and syntactic tasks. In this work, we tackle the task of unsupervised disentanglement between semantics and structure in neural language representations: we aim to learn a transformation of the contextualized vectors, that discards the lexical semantics, but keeps the structural information. To this end, we automatically generate groups of sentences which are structurally similar but semantically different, and use metric-learning approach to learn a transformation that emphasizes the structural component that is encoded in the vectors. We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics. Finally, we demonstrate the utility of our distilled representations by showing that they outperform the original contextualized representations in a few-shot parsing setting.Comment: Accepted in BlackboxNLP@EMNLP202

    Sense resolution properties of logical imaging

    Get PDF
    The evaluation of an implication by Imaging is a logical technique developed in the framework of modal logic. Its interpretation in the context of a “possible worlds” semantics is very appealing for IR. In 1994, Crestani and Van Rijsbergen proposed an interpretation of Imaging in the context of IR based on the assumption that “a term is a possibleworld”. This approach enables the exploitation of term– term relationshipswhich are estimated using an information theoretic measure. Recent analysis of the probability kinematics of Logical Imaging in IR have suggested that this technique has some interesting sense resolution properties. In this paper we will present this new line of research and we will relate it to more classical research into word senses

    Word Embedding Driven Concept Detection in Philosophical Corpora

    Get PDF
    During the course of research, scholars often explore large textual databases for segments of text relevant to their conceptual analyses. This study proposes, develops and evaluates two algorithms for automated concept detection in theoretical corpora: ACS and WMD retrieval. Both novel algorithms are compared to key word retrieval, using a test set from the Digital Ricoeur corpus tagged by scholarly experts. WMD retrieval outperforms key word search on the concept detection task. Thus, WMD retrieval is a promising tool for concept detection and information retrieval systems focused on theoretical corpora

    MIRACLE Approaches to Multilingual Information Retrieval: A Baseline for Future Research

    Get PDF
    This paper describes the first set of experiments defined by the MIRACLE (Multilingual Information RetrievAl for the CLEf campaign) research group for some of the cross language tasks defined by CLEF. These experiments combine different basic techniques, linguistic-oriented and statistic-oriented, to be applied to the indexing and retrieval processes

    Retrieval effectiveness for OCR text using thesauri

    Full text link
    This thesis reports on the effects of an automatic query expansion with a subject specific thesaurus on retrieval effectiveness for document collection consisting of OCR text; The investigation encompasses several experiments with a modern retrieval engine based on the probabilistic model. Each experiment is performed on two document collections. The first version of the collection consists of raw OCR output. The second collection consists of the ground truth (retyped from hard copy) version of the same collection; It is shown that the usage of the thesaurus as a source for query expansion can significantly improve recall for Boolean queries, for both OCR and manually corrected document collections. In the case of weighted queries, the expansion has no effect on the average precision and recall. Nevertheless, some individual queries benefit from query expansion

    Semantic multimedia modelling & interpretation for search & retrieval

    Get PDF
    With the axiomatic revolutionary in the multimedia equip devices, culminated in the proverbial proliferation of the image and video data. Owing to this omnipresence and progression, these data become the part of our daily life. This devastating data production rate accompanies with a predicament of surpassing our potentials for acquiring this data. Perhaps one of the utmost prevailing problems of this digital era is an information plethora. Until now, progressions in image and video retrieval research reached restrained success owed to its interpretation of an image and video in terms of primitive features. Humans generally access multimedia assets in terms of semantic concepts. The retrieval of digital images and videos is impeded by the semantic gap. The semantic gap is the discrepancy between a user’s high-level interpretation of an image and the information that can be extracted from an image’s physical properties. Content- based image and video retrieval systems are explicitly assailable to the semantic gap due to their dependence on low-level visual features for describing image and content. The semantic gap can be narrowed by including high-level features. High-level descriptions of images and videos are more proficient of apprehending the semantic meaning of image and video content. It is generally understood that the problem of image and video retrieval is still far from being solved. This thesis proposes an approach for intelligent multimedia semantic extraction for search and retrieval. This thesis intends to bridge the gap between the visual features and semantics. This thesis proposes a Semantic query Interpreter for the images and the videos. The proposed Semantic Query Interpreter will select the pertinent terms from the user query and analyse it lexically and semantically. The proposed SQI reduces the semantic as well as the vocabulary gap between the users and the machine. This thesis also explored a novel ranking strategy for image search and retrieval. SemRank is the novel system that will incorporate the Semantic Intensity (SI) in exploring the semantic relevancy between the user query and the available data. The novel Semantic Intensity captures the concept dominancy factor of an image. As we are aware of the fact that the image is the combination of various concepts and among the list of concepts some of them are more dominant then the other. The SemRank will rank the retrieved images on the basis of Semantic Intensity. The investigations are made on the LabelMe image and LabelMe video dataset. Experiments show that the proposed approach is successful in bridging the semantic gap. The experiments reveal that our proposed system outperforms the traditional image retrieval systems
    • …
    corecore