
    Algorithms and implementation of functional dependency discovery in XML : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Sciences in Information Systems at Massey University

    1.1 Background Following the advent of the web, there has been a great demand for data interchange between applications using internet infrastructure. XML (Extensible Markup Language) provides a structured representation of data empowered by broad adoption and easy deployment. As a subset of SGML (Standard Generalized Markup Language), XML has been standardized by the World Wide Web Consortium (W3C) [Bray et al., 2004]. XML is becoming the prevalent data exchange format on the World Wide Web and is increasingly significant for storing semi-structured data. After its initial release in 1996, it has evolved and been applied extensively in all fields where the exchange of structured documents in electronic form is required. With the growing popularity of XML, the issue of functional dependency in XML has recently received well-deserved attention. The driving force for the study of dependencies in XML is that they are as crucial to XML schema design as to relational database (RDB) design [Abiteboul et al., 1995]

    Partition clustering for GIS map data protection


    Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

    We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as ["Q", said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm's output for computational social science by analyzing the sentiment expressed in our extracted quotations. Comment: Accepted at the 12th International Conference on Web and Social Media (ICWSM), 201
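
The bootstrapping loop described in this abstract (seed patterns yield quotation-speaker pairs, which in turn yield new patterns) can be illustrated with a toy sketch. This is not the authors' implementation: the `$Q`/`$S` placeholder syntax, the function names, the capitalized-words speaker heuristic, and the three-sentence corpus are all assumptions made for illustration, and the sketch ignores the scalability and pattern-filtering concerns a real system would need to address.

```python
import re


def pattern_to_regex(pattern):
    """Turn a template such as '"$Q", said $S.' into a compiled regex.

    $Q and $S are hypothetical placeholders for quotation and speaker.
    """
    escaped = re.escape(pattern)
    # Quotation: any run of characters that are not a double quote.
    escaped = escaped.replace(re.escape("$Q"), r'(?P<Q>[^"]+)')
    # Speaker: one or more capitalized words (a crude assumption).
    escaped = escaped.replace(re.escape("$S"), r"(?P<S>[A-Z]\w*(?: [A-Z]\w*)*)")
    return re.compile(escaped)


def extract_pairs(corpus, patterns):
    """Apply every pattern to every sentence and collect (Q, S) pairs."""
    pairs = set()
    for pattern in patterns:
        regex = pattern_to_regex(pattern)
        for sentence in corpus:
            match = regex.fullmatch(sentence)
            if match:
                pairs.add((match.group("Q"), match.group("S")))
    return pairs


def discover_patterns(corpus, pairs):
    """Find sentences containing a known pair and abstract them into patterns."""
    patterns = set()
    for sentence in corpus:
        for quote, speaker in pairs:
            if f'"{quote}"' in sentence and speaker in sentence:
                patterns.add(sentence.replace(quote, "$Q").replace(speaker, "$S"))
    return patterns


def bootstrap(corpus, seed_patterns, rounds=2):
    """Alternate between extracting pairs and discovering patterns."""
    patterns = set(seed_patterns)
    pairs = set()
    for _ in range(rounds):
        pairs |= extract_pairs(corpus, patterns)
        patterns |= discover_patterns(corpus, pairs)
    return pairs, patterns


# Toy corpus: the first quotation appears twice in different contexts.
CORPUS = [
    '"We will win", said Jane Doe.',
    'Jane Doe told reporters: "We will win".',
    'John Roe told reporters: "Prices are falling".',
]
SEEDS = ['"$Q", said $S.']
```

Calling `bootstrap(CORPUS, SEEDS)` first extracts the Jane Doe pair with the seed pattern, then learns the "told reporters" pattern from the repeated quotation, and in the second round uses that pattern to pick up the John Roe pair that the seed alone would miss.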

    Optimizing momentum resolution with a new fitting method for silicon-strip detectors

    A new fitting method is explored for momentum reconstruction of tracks in a constant magnetic field for a silicon-strip tracker. A substantial increase in momentum resolution with respect to the standard fit is obtained. The key point is the use of a realistic probability distribution for each hit (heteroscedasticity). Two different methods are used for the fits: the first introduces an effective variance for each hit, and the second implements a maximum likelihood search. The tracker model is similar to the PAMELA tracker. Each of the two sides of the double-sided PAMELA detectors is simulated as a momentum reconstruction device. One of the two is similar to silicon micro-strip detectors widely used in running experiments. Two different position reconstructions are used for the standard fits, the $\eta$-algorithm (the best one) and the two-strip center of gravity. The gain obtained in momentum resolution is measured as the virtual magnetic field and the virtual signal-to-noise ratio required by the two standard fits to reach an overlap with the best of the two new methods. For the best side, the virtual magnetic field must be increased by a factor of 1.5 with respect to the real field to reach the overlap, and by a factor of 1.8 for the other. For the high-noise side, the increases must be 1.8 and 2.0. The signal-to-noise ratio has similar increases, but only for the $\eta$-algorithm. The signal-to-noise ratio has no effect on the fits with the center of gravity. Very important results are obtained if the number N of detecting layers is increased: our methods provide a momentum resolution growing linearly with N, much higher than that of standard fits, which grows as $\sqrt{N}$. Comment: This article supersedes arXiv:1606.03051, 22 pages and 10 figure
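
The idea of giving each hit its own variance can be illustrated with a generic weighted least-squares fit. The sketch below is only an assumption-laden illustration: it fits a straight line rather than a track in a magnetic field, the function name and sample values are hypothetical, and the abstract does not say how the effective variances are actually computed.

```python
def weighted_line_fit(xs, ys, variances):
    """Fit y = slope * x + intercept, weighting each hit by 1 / variance.

    Hits with a small variance (well-measured) pull the fit more strongly
    than hits with a large variance. A standard fit corresponds to the
    special case where all variances are equal.
    """
    weights = [1.0 / v for v in variances]
    s = sum(weights)
    sx = sum(w * x for w, x in zip(weights, xs))
    sy = sum(w * y for w, y in zip(weights, ys))
    sxx = sum(w * x * x for w, x in zip(weights, xs))
    sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    delta = s * sxx - sx * sx
    slope = (s * sxy - sx * sy) / delta
    intercept = (sxx * sy - sx * sxy) / delta
    return slope, intercept


# Hypothetical layer positions, hit coordinates, and per-hit variances.
layer_positions = [0.0, 1.0, 2.0, 3.0, 4.0]
hit_coordinates = [1.0, 3.0, 5.0, 7.0, 9.0]
hit_variances = [0.01, 0.04, 0.01, 0.09, 0.02]

slope, intercept = weighted_line_fit(layer_positions, hit_coordinates, hit_variances)
```

Because the sample hits lie exactly on a line, the weights do not change the result here; with noisy hits, the low-variance hits would dominate the fitted parameters.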