26,816 research outputs found

    Non-Compositional Term Dependence for Information Retrieval

    Full text link
    Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they co-occur, the more semantically dependent they are. This assumption is not always correct: the frequency of co-occurring terms can be separate from the strength of their semantic dependence. E.g. "red tape" might be overall less frequent than "tape measure" in some corpus, but this does not mean that "red"+"tape" are less dependent than "tape"+"measure". This is especially the case for non-compositional phrases, i.e. phrases whose meaning cannot be composed from the individual meanings of their terms (such as the phrase "red tape" meaning bureaucracy). Motivated by this lack of distinction between the frequency and strength of term dependence in IR, we present a principled approach for handling term dependence in queries, using both lexical frequency and semantic evidence. We focus on non-compositional phrases, extending a recent unsupervised model for their detection [21] to IR. Our approach, integrated into ranking using Markov Random Fields [31], yields effectiveness gains over competitive TREC baselines, showing that there is still room for improvement in the very well-studied area of term dependence in IR

    Interactive retrieval of video using pre-computed shot-shot similarities

    Get PDF
    A probabilistic framework for content-based interactive video retrieval is described. The developed indexing of video fragments originates from the probability of the user's positive judgment about key-frames of video shots. Initial estimates of the probabilities are obtained from low-level feature representation. Only statistically significant estimates are picked out, the rest are replaced by an appropriate constant allowing efficient access at search time without loss of search quality and leading to improvement in most experiments. With time, these probability estimates are updated from the relevance judgment of users performing searches, resulting in further substantial increases in mean average precision

    Gower as Data: Exploring the Application of Machine Learning to Gower’s Middle English Corpus

    Get PDF
    Distant reading, a digital humanities method in wide use, involves processing and analyzing a large amount of text through computer programs. In treating texts as data, these methods can highlight trends in diction, themes, and linguistic patterns that individual readers may miss or critical traditions may obscure. Though several scholars have undertaken projects using topic models and text mining on Middle English texts, the nonstandard orthography of Middle English makes this process more challenging than for our counterparts in later literature. This collaborative project uses Gower’s Confessio Amantis as a small, fixed corpus for analysis. We employ natural language processing to reexamine the Confessio’s themes, adding data analysis to the more traditional close reading strategies of Gower scholarship. We use Gower’s work as a case study both to help reduce the potential variants across textual versions and to more deeply investigate the corpus than distant reading normally allows. Here, we share our initial findings as well as our methodologies. We hope to share resources that will allow other scholars to engage in similar types of projects

    Semantic web technology to support learning about the semantic web

    Get PDF
    This paper describes ASPL, an Advanced Semantic Platform for Learning, designed using the Magpie framework with an aim to support students learning about the Semantic Web research area. We describe the evolution of ASPL and illustrate how we used the results from a formal evaluation of the initial system to re-design the user functionalities. The second version of ASPL semantically interprets the results provided by a non-semantic web mining tool and uses them to support various forms of semantics-assisted exploration, based on pedagogical strategies such as performing later reasoning steps and problem space filtering

    Geographical information retrieval with ontologies of place

    Get PDF
    Geographical context is required of many information retrieval tasks in which the target of the search may be documents, images or records which are referenced to geographical space only by means of place names. Often there may be an imprecise match between the query name and the names associated with candidate sources of information. There is a need therefore for geographical information retrieval facilities that can rank the relevance of candidate information with respect to geographical closeness of place as well as semantic closeness with respect to the information of interest. Here we present an ontology of place that combines limited coordinate data with semantic and qualitative spatial relationships between places. This parsimonious model of geographical place supports maintenance of knowledge of place names that relate to extensive regions of the Earth at multiple levels of granularity. The ontology has been implemented with a semantic modelling system linking non-spatial conceptual hierarchies with the place ontology. An hierarchical spatial distance measure is combined with Euclidean distance between place centroids to create a hybrid spatial distance measure. This is integrated with thematic distance, based on classification semantics, to create an integrated semantic closeness measure that can be used for a relevance ranking of retrieved objects

    Derived classes as a basis for views in UML/OCL data models

    Get PDF
    UML is the de facto standard language for analysis and design in object-oriented frameworks. Information systems, and in particular information systems based on databases and their applications, rely heavily on sound principles of analysis and design. Many present-day database applications employ object-oriented principles in the phases of analysis and design due to the advantages of expressiveness and clarity of such languages as UML. Database specifications often involve specifications of constraints, and the Object Constraint Language (OCL) - as part of UML - can aid in the unambiguous modelling of database constraints. One of the central notions in database modelling and in constraint specifications is the notion of a database view. A database view closely corresponds to the notion of derived class in UML. This paper will show how the notion of a derived class in UML can be given a precise semantics in terms of OCL. We will then demonstrate that the notion of a relational database view can be correctly expressed as a derived class in UML/OCL. A central part of our investigation concerns the generality of our manner of representing relational views in OCL. An important problem that we address in this respect is the representation of product spaces and relational joins. Joins are often essential in view definitions, and we shall demonstrate how we can express Cartesian products and joins within the current framework of UML/OCL language by employing the notions of derived class. As a consequence, OCL will be shown to be equipped with the full expressive power of the relational algebra, offering support for the claim that OCL can be useful as a general query language within the framework of the UML/OCL data model.
    • 

    corecore