    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated a robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects.

    Structural fingerprints of transcription factor binding site regions

    Fourier transforms are a powerful tool in the prediction of DNA sequence properties, such as the presence/absence of codons. We have previously compiled a database of the structural properties of all 32,896 unique DNA octamers. In this work we apply Fourier techniques to the analysis of the structural properties of human chromosomes 21 and 22 and also to three sets of transcription factor binding sites within these chromosomes. We find that, for a given structural property, the structural property power spectra of chromosomes 21 and 22 are strikingly similar. We find common peaks in their power spectra for both Sp1 and p53 transcription factor binding sites. We use the power spectra as a structural fingerprint and perform similarity searching in order to find transcription factor binding site regions. This approach provides a new strategy for searching the genome data for information. Although it is difficult to understand the relationship between specific functional properties and the set of structural parameters in our database, our structural fingerprints nevertheless provide a useful tool for searching for function information in sequence data. The power spectrum fingerprints provide a simple, fast method for comparing a set of functional sequences, in this case transcription factor binding site regions, with the sequences of whole chromosomes. On its own, the power spectrum fingerprint does not find all transcription factor binding sites in a chromosome, but the results presented here show that in combination with other approaches, this technique will improve the chances of identifying functional sequences hidden in genomic data.
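
The idea of using a power spectrum as a fingerprint can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the numeric profile is synthetic (it stands in for a structural property sampled along a sequence), the function names are hypothetical, and cosine similarity is only one plausible way to compare two spectra; the abstract does not say which similarity measure the authors used.

```python
import cmath
import math


def power_spectrum(signal):
    """Return the power at frequencies k = 1 .. n//2 of a real-valued signal.

    A plain discrete Fourier transform is used for clarity; a real analysis
    of chromosome-scale data would use an FFT implementation instead.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant (k = 0) component
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length spectra (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Synthetic profile with a period of 4 positions, plus a shifted copy of it.
profile = [math.sin(2 * math.pi * t / 4) for t in range(32)]
shifted = [math.sin(2 * math.pi * (t + 1) / 4) for t in range(32)]

spec = power_spectrum(profile)
# Frequency index (1-based) that carries the most power.
peak_k = max(range(len(spec)), key=spec.__getitem__) + 1
similarity = cosine_similarity(spec, power_spectrum(shifted))
```

Because the power spectrum discards phase, the shifted copy yields essentially the same fingerprint as the original profile. That shift invariance is what makes it possible to compare a set of short regions against a much longer sequence without aligning them first.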

    Optimal search strategies for identifying sound clinical prediction studies in EMBASE

    BACKGROUND: Clinical prediction guides assist clinicians by pointing to specific elements of the patient's clinical presentation that should be considered when forming a diagnosis, prognosis or judgment regarding treatment outcome. The numbers of validated clinical prediction guides are growing in the medical literature, but their retrieval from large biomedical databases remains problematic and this presents a barrier to their uptake in medical practice. We undertook the systematic development of search strategies ("hedges") for retrieval of empirically tested clinical prediction guides from EMBASE. METHODS: An analytic survey was conducted, testing the retrieval performance of search strategies run in EMBASE against the gold standard of hand searching, using a sample of all 27,769 articles identified in 55 journals for the 2000 publishing year. All articles were categorized as original studies, review articles, general papers, or case reports. The original and review articles were then tagged as 'pass' or 'fail' for methodologic rigor in the areas of clinical prediction guides and other clinical topics. Search terms that depicted clinical prediction guides were selected from a pool of index terms and text words gathered in house and through request to clinicians, librarians and professional searchers. A total of 36,232 search strategies composed of single and multiple term phrases were trialed for retrieval of clinical prediction studies. The sensitivity, specificity, precision, and accuracy of search strategies were calculated to identify which were the best. RESULTS: 163 clinical prediction studies were identified, of which 69 (42.3%) passed criteria for scientific merit. A 3-term strategy optimized sensitivity at 91.3% and specificity at 90.2%. Higher sensitivity (97.1%) was reached with a different 3-term strategy, but with a 16% drop in specificity. The best measure of specificity (98.8%) was found in a 2-term strategy, but with a considerable fall in sensitivity to 60.9%. All single term strategies performed less well than 2- and 3-term strategies. CONCLUSION: The retrieval of sound clinical prediction studies from EMBASE is supported by several search strategies.
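
The four measures named in the abstract follow from a 2x2 table that compares what a search strategy retrieves against the hand-searched gold standard. The sketch below uses the standard definitions. The class and function names are hypothetical, and the counts are invented for illustration; they are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalCounts:
    tp: int  # sound studies retrieved by the strategy
    fp: int  # other articles retrieved by the strategy
    fn: int  # sound studies the strategy missed
    tn: int  # other articles correctly not retrieved


def retrieval_metrics(c: RetrievalCounts) -> dict:
    """Standard diagnostic-test measures applied to a search strategy."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of sound studies found
        "specificity": c.tn / (c.tn + c.fp),  # share of other articles excluded
        "precision": c.tp / (c.tp + c.fp),    # share of retrieved articles that are sound
        "accuracy": (c.tp + c.tn) / total,    # share of all articles classified correctly
    }


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
counts = RetrievalCounts(tp=90, fp=50, fn=10, tn=850)
metrics = retrieval_metrics(counts)
```

With these invented counts, sensitivity is 90/100 and specificity is 850/900, while precision is only 90/140. The trade-off reported in the abstract shows up the same way: loosening a strategy to catch more sound studies usually admits more irrelevant articles, which lowers specificity and precision.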

    An Integrated Content and Metadata based Retrieval System for Art

    In this paper we describe aspects of the Artiste project to develop a distributed content and metadata based analysis, retrieval and navigation system for a number of major European Museums. In particular, after a brief overview of the complete system, we describe the design and evaluation of some of the image analysis algorithms developed to meet the specific requirements of the users from the museums. These include a method for retrievals based on sub images, retrievals based on very low quality images and retrieval using craquelure type.

    Data Mining in Electronic Commerce

    Modern business is rushing toward e-commerce. If the transition is done properly, it enables better management, new services, lower transaction costs and better customer relations. Success depends on skilled information technologists, among whom are statisticians. This paper focuses on some of the contributions that statisticians are making to help change the business world, especially through the development and application of data mining methods. This is a very large area, and the topics we cover are chosen to avoid overlap with other papers in this special issue, as well as to respect the limitations of our expertise. Inevitably, electronic commerce has raised and is raising fresh research problems in a very wide range of statistical areas, and we try to emphasize those challenges. Comment: Published at http://dx.doi.org/10.1214/088342306000000204 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Sensitive and Scalable Online Evaluation with Theoretical Guarantees

    Multileaved comparison methods generalize interleaved comparison methods to provide a scalable approach for comparing ranking systems based on regular user interactions. Such methods enable the increasingly rapid research and development of search engines. However, existing multileaved comparison methods that provide reliable outcomes do so by degrading the user experience during evaluation. Conversely, current multileaved comparison methods that maintain the user experience cannot guarantee correctness. Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliable correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM), that performs comparisons based on document-pair preferences, and prove that it is considerate and has fidelity. We show empirically that, compared to previous multileaved comparison methods, PPM is more sensitive to user preferences and scalable with the number of rankers being compared. Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information and Knowledge Management