10,427 research outputs found

    Report on the first Twente Data Management Workshop on XML Databases and Information Retrieval

    Get PDF
    The Database Group of the University of Twente initiated a new series of workshops called Twente Data Management workshops (TDM), starting with one on XML Databases and Information Retrieval which took place on 21 June 2004 at the University of Twente. We have set ourselves two goals for the workshop series: i) To provide a forum to share original ideas as well as research results on data management problems; ii) To bring together researchers from the database community and researchers from related research fields

    PFTijah: text search in an XML database system

    Get PDF
    This paper introduces the PFTijah system, a text search system that is integrated with an XML/XQuery database management system. We present examples of its use, we explain some of the system internals, and discuss plans for future work. PFTijah is part of the open source release of MonetDB/XQuery

    DCU and ISI@INEX 2010: Ad-hoc and data-centric tracks

    Get PDF
    We describe the participation of Dublin City University (DCU)and the Indian Statistical Institute (ISI) in INEX 2010. The main contributions of this paper are: i) a simplified version of Hierarchical Language Model (HLM) which involves scoring XML elements with a combined probability of generating the given query from itself and the top level article node, is shown to outperform the baselines of Language Model (LM) and Vector Space Model (VSM) scoring of XML elements; ii) the Expectation Maximization (EM) feedback in LM is shown to be the most effective on the domain specic collection of IMDB; iii) automated removal of sentences indicating aspects of irrelevance from the narratives of INEX ad-hoc topics is shown to improve retrieval eectiveness

    Use of Wikipedia Categories in Entity Ranking

    Get PDF
    Wikipedia is a useful source of knowledge that has many applications in language processing and knowledge representation. The Wikipedia category graph can be compared with the class hierarchy in an ontology; it has some characteristics in common as well as some differences. In this paper, we present our approach for answering entity ranking queries from the Wikipedia. In particular, we explore how to make use of Wikipedia categories to improve entity ranking effectiveness. Our experiments show that using categories of example entities works significantly better than using loosely defined target categories

    Capturing k-ary Existential Second Order Logic with k-ary Inclusion-Exclusion Logic

    Get PDF
    In this paper we analyze k-ary inclusion-exclusion logic, INEX[k], which is obtained by extending first order logic with k-ary inclusion and exclusion atoms. We show that every formula of INEX[k] can be expressed with a formula of k-ary existential second order logic, ESO[k]. Conversely, every formula of ESO[k] with at most k-ary free relation variables can be expressed with a formula of INEX[k]. From this it follows that, on the level of sentences, INEX[k] captures the expressive power of ESO[k]. We also introduce several useful operators that can be expressed in INEX[k]. In particular, we define inclusion and exclusion quantifiers and so-called term value preserving disjunction which is essential for the proofs of the main results in this paper. Furthermore, we present a novel method of relativization for team semantics and analyze the duality of inclusion and exclusion atoms.Comment: Extended version of a paper published in Annals of Pure and Applied Logic 169 (3), 177-21

    Evaluation campaigns and TRECVid

    Get PDF
    The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper concluding that on balance they have had a very positive impact on research progress

    Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

    Full text link
    In Automatic Text Summarization, preprocessing is an important phase to reduce the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even using normalization on large texts, the curse of dimensionality can disturb the performance of summarizers. This paper describes a new method for normalization of words to further reduce the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserve the content of summaries produced by this representation, but often the performances of the systems can be dramatically improved. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in the performance, regardless of summarizer system used.Comment: 22 pages, 12 figures, 9 table
    corecore