15,555 research outputs found

    The relationship between IR and multimedia databases

    Get PDF
    Modern extensible database systems support multimedia data through ADTs. However, because of the problems with multimedia query formulation, this support is not sufficient.\ud \ud Multimedia querying requires an iterative search process involving many different representations of the objects in the database. The support that is needed is very similar to the processes in information retrieval.\ud \ud Based on this observation, we develop the miRRor architecture for multimedia query processing. We design a layered framework based on information retrieval techniques, to provide a usable query interface to the multimedia database.\ud \ud First, we introduce a concept layer to enable reasoning over low-level concepts in the database.\ud \ud Second, we add an evidential reasoning layer as an intermediate between the user and the concept layer.\ud \ud Third, we add the functionality to process the users' relevance feedback.\ud \ud We then adapt the inference network model from text retrieval to an evidential reasoning model for multimedia query processing.\ud \ud We conclude with an outline for implementation of miRRor on top of the Monet extensible database system

    Automatic case acquisition from texts for process-oriented case-based reasoning

    Get PDF
    This paper introduces a method for the automatic acquisition of a rich case representation from free text for process-oriented case-based reasoning. Case engineering is among the most complicated and costly tasks in implementing a case-based reasoning system. This is especially so for process-oriented case-based reasoning, where more expressive case representations are generally used and, in our opinion, actually required for satisfactory case adaptation. In this context, the ability to acquire cases automatically from procedural texts is a major step forward in order to reason on processes. We therefore detail a methodology that makes case acquisition from processes described as free text possible, with special attention given to assembly instruction texts. This methodology extends the techniques we used to extract actions from cooking recipes. We argue that techniques taken from natural language processing are required for this task, and that they give satisfactory results. An evaluation based on our implemented prototype extracting workflows from recipe texts is provided.Comment: Sous presse, publication pr\'evue en 201

    K-Space at TRECVid 2007

    Get PDF
    In this paper we describe K-Space participation in TRECVid 2007. K-Space participated in two tasks, high-level feature extraction and interactive search. We present our approaches for each of these activities and provide a brief analysis of our results. Our high-level feature submission utilized multi-modal low-level features which included visual, audio and temporal elements. Specific concept detectors (such as Face detectors) developed by K-Space partners were also used. We experimented with different machine learning approaches including logistic regression and support vector machines (SVM). Finally we also experimented with both early and late fusion for feature combination. This year we also participated in interactive search, submitting 6 runs. We developed two interfaces which both utilized the same retrieval functionality. Our objective was to measure the effect of context, which was supported to different degrees in each interface, on user performance. The first of the two systems was a ‘shot’ based interface, where the results from a query were presented as a ranked list of shots. The second interface was ‘broadcast’ based, where results were presented as a ranked list of broadcasts. Both systems made use of the outputs of our high-level feature submission as well as low-level visual features

    Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

    Full text link
    Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.Comment: 10 pages, 4 figure

    Crosslingual Document Embedding as Reduced-Rank Ridge Regression

    Get PDF
    There has recently been much interest in extending vector-based word representations to multiple languages, such that words can be compared across languages. In this paper, we shift the focus from words to documents and introduce a method for embedding documents written in any language into a single, language-independent vector space. For training, our approach leverages a multilingual corpus where the same concept is covered in multiple languages (but not necessarily via exact translations), such as Wikipedia. Our method, Cr5 (Crosslingual reduced-rank ridge regression), starts by training a ridge-regression-based classifier that uses language-specific bag-of-word features in order to predict the concept that a given document is about. We show that, when constraining the learned weight matrix to be of low rank, it can be factored to obtain the desired mappings from language-specific bags-of-words to language-independent embeddings. As opposed to most prior methods, which use pretrained monolingual word vectors, postprocess them to make them crosslingual, and finally average word vectors to obtain document vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, since our algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that our method achieves state-of-the-art performance on a crosslingual document retrieval task. Finally, although not trained for embedding sentences and words, it also achieves competitive performance on crosslingual sentence and word retrieval tasks.Comment: In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19

    Adaptive content mapping for internet navigation

    Get PDF
    The Internet as the biggest human library ever assembled keeps on growing. Although all kinds of information carriers (e.g. audio/video/hybrid file formats) are available, text based documents dominate. It is estimated that about 80% of all information worldwide stored electronically exists in (or can be converted into) text form. More and more, all kinds of documents are generated by means of a text processing system and are therefore available electronically. Nowadays, many printed journals are also published online and may even discontinue to appear in print form tomorrow. This development has many convincing advantages: the documents are both available faster (cf. prepress services) and cheaper, they can be searched more easily, the physical storage only needs a fraction of the space previously necessary and the medium will not age. For most people, fast and easy access is the most interesting feature of the new age; computer-aided search for specific documents or Web pages becomes the basic tool for information-oriented work. But this tool has problems. The current keyword based search machines available on the Internet are not really appropriate for such a task; either there are (way) too many documents matching the specified keywords are presented or none at all. The problem lies in the fact that it is often very difficult to choose appropriate terms describing the desired topic in the first place. This contribution discusses the current state-of-the-art techniques in content-based searching (along with common visualization/browsing approaches) and proposes a particular adaptive solution for intuitive Internet document navigation, which not only enables the user to provide full texts instead of manually selected keywords (if available), but also allows him/her to explore the whole database

    OntoMathPROOntoMath^{PRO} Ontology: A Linked Data Hub for Mathematics

    Full text link
    In this paper, we present an ontology of mathematical knowledge concepts that covers a wide range of the fields of mathematics and introduces a balanced representation between comprehensive and sensible models. We demonstrate the applications of this representation in information extraction, semantic search, and education. We argue that the ontology can be a core of future integration of math-aware data sets in the Web of Data and, therefore, provide mappings onto relevant datasets, such as DBpedia and ScienceWISE.Comment: 15 pages, 6 images, 1 table, Knowledge Engineering and the Semantic Web - 5th International Conferenc
    corecore