
Algorithms and implementation of functional dependency discovery in XML : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Sciences in Information Systems at Massey University

Author: Zhou Zheng
Publication venue: 'Massey University'
Publication date: 01/01/2006

1.1 Background. Following the advent of the web, there has been great demand for data interchange between applications using internet infrastructure. XML (Extensible Markup Language) provides a structured representation of data, empowered by broad adoption and easy deployment. As a subset of SGML (Standard Generalized Markup Language), XML has been standardized by the World Wide Web Consortium (W3C) [Bray et al., 2004]. XML is becoming the prevalent data exchange format on the World Wide Web and is increasingly significant for storing semi-structured data. After its initial release in 1996, it has evolved and been applied extensively in all fields where the exchange of structured documents in electronic form is required. With the growing popularity of XML, the issue of functional dependency in XML has recently received well-deserved attention. The driving force for the study of dependencies in XML is that they are as crucial to XML schema design as to relational database (RDB) design [Abiteboul et al., 1995].

Massey Research Online

Partition clustering for GIS map data protection

Authors: Abubahia Ahmed; Cocea Mihaela
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Portsmouth University Research Portal (Pure)

Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

Authors: Pavllo Dario; Piccardi Tiziano; West Robert
Publication date: 07/04/2018

We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as ["Q", said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm's output for computational social science by analyzing the sentiment expressed in our extracted quotations. Comment: Accepted at the 12th International Conference on Web and Social Media (ICWSM), 201
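The bootstrapping loop described in this abstract (seed pattern, extracted pairs, newly discovered patterns, repeat) can be illustrated with a toy sketch. This is not the authors' implementation: the `{Q}`/`{S}` placeholder syntax, the function names `pattern_to_regex` and `bootstrap`, the speaker regex, and the three-sentence corpus are all assumptions made for illustration, and a real system would also need pattern scoring and filtering.

```python
import re

def pattern_to_regex(pattern):
    """Turn a pattern such as '"{Q}", said {S}.' into a compiled regex.

    {Q} and {S} are hypothetical placeholders for quotation and speaker.
    """
    pieces = []
    for part in re.split(r"(\{Q\}|\{S\})", pattern):
        if part == "{Q}":
            pieces.append(r'(?P<Q>[^"]+)')  # quotation: anything but a quote mark
        elif part == "{S}":
            pieces.append(r"(?P<S>[A-Z]\w+(?: [A-Z]\w+)*)")  # capitalized name
        else:
            pieces.append(re.escape(part))  # literal context
    return re.compile("".join(pieces))

def bootstrap(corpus, seed_patterns, max_rounds=5):
    """Alternate between extracting (Q, S) pairs and discovering new patterns."""
    patterns = set(seed_patterns)
    pairs = set()
    for _ in range(max_rounds):
        # Step 1: apply every known pattern to every sentence.
        new_pairs = set()
        for pattern in patterns:
            regex = pattern_to_regex(pattern)
            for sentence in corpus:
                match = regex.fullmatch(sentence)
                if match:
                    new_pairs.add((match.group("Q"), match.group("S")))
        # Step 2: a sentence containing a known pair in another context
        # yields a new pattern once the pair is replaced by placeholders.
        new_patterns = set()
        for quote, speaker in new_pairs:
            for sentence in corpus:
                if f'"{quote}"' in sentence and speaker in sentence:
                    new_patterns.add(
                        sentence.replace(quote, "{Q}").replace(speaker, "{S}")
                    )
        # Stop when a round discovers nothing new.
        if new_pairs <= pairs and new_patterns <= patterns:
            break
        pairs |= new_pairs
        patterns |= new_patterns
    return pairs, patterns

# Toy corpus (invented): the first quotation appears in two different contexts.
corpus = [
    '"We will win", said Alice Smith.',
    'Alice Smith told reporters: "We will win"',
    'Bob Jones told reporters: "Taxes are too high"',
]
pairs, patterns = bootstrap(corpus, ['"{Q}", said {S}.'])
```

In this toy run the seed pattern matches only the first sentence, but the repeated quotation in the second sentence yields a new pattern, which in turn extracts the third sentence's pair.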

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Optimizing momentum resolution with a new fitting method for silicon-strip detectors

Authors: Landi Giovanni E.; Landi Gregorio
Publication venue: 'MDPI AG'
Publication date: 13/06/2018

A new fitting method is explored for momentum reconstruction of tracks in a constant magnetic field for a silicon-strip tracker. A substantial increase in momentum resolution with respect to the standard fit is obtained. The key point is the use of a realistic probability distribution for each hit (heteroscedasticity). Two different methods are used for the fits: the first introduces an effective variance for each hit, and the second implements a maximum likelihood search. The tracker model is similar to the PAMELA tracker. Each side of the two-sided PAMELA detectors is simulated as a momentum reconstruction device. One of the two is similar to silicon micro-strip detectors widely used in running experiments. Two different position reconstructions are used for the standard fits: the $\eta$-algorithm (the best one) and the two-strip center of gravity. The gain obtained in momentum resolution is measured as the virtual magnetic field and the virtual signal-to-noise ratio required by the two standard fits to reach an overlap with the best of the two new methods. For the best side, the virtual magnetic field must be increased 1.5 times with respect to the real field to reach the overlap, and 1.8 for the other. For the high-noise side, the increases must be 1.8 and 2.0. The signal-to-noise ratio has similar increases, but only for the $\eta$-algorithm. The signal-to-noise ratio has no effect on the fits with the center of gravity. Very important results are obtained if the number N of detecting layers is increased: our methods provide a momentum resolution growing linearly with N, much higher than standard fits, which grow as $\sqrt{N}$. Comment: This article supersedes arXiv:1606.03051, 22 pages and 10 figure
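The idea of giving each hit its own variance can be illustrated with a minimal weighted least-squares sketch. This is an assumption-laden toy, not the authors' fit: it fits a straight line rather than a curved track in a magnetic field, the function name `weighted_line_fit` is hypothetical, and the hit positions and per-hit sigmas are invented. Passing equal sigmas reproduces an ordinary (homoscedastic) fit for comparison.

```python
def weighted_line_fit(xs, ys, sigmas):
    """Fit y = a + b*x by least squares with weights w_i = 1 / sigma_i**2."""
    ws = [1.0 / (s * s) for s in sigmas]
    s_w = sum(ws)
    s_x = sum(w * x for w, x in zip(ws, xs))
    s_y = sum(w * y for w, y in zip(ws, ys))
    s_xx = sum(w * x * x for w, x in zip(ws, xs))
    s_xy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    delta = s_w * s_xx - s_x * s_x
    a = (s_xx * s_y - s_x * s_xy) / delta  # intercept
    b = (s_w * s_xy - s_x * s_y) / delta   # slope
    return a, b

# Invented hits: three precise measurements on the line y = x and one
# poor-quality hit far from it, flagged by a large sigma.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 10.0]
sigmas = [0.1, 0.1, 0.1, 100.0]

# Equal variances for every hit: the bad hit drags the line.
a_std, b_std = weighted_line_fit(xs, ys, [1.0] * len(xs))
# Per-hit variances: the bad hit is strongly down-weighted.
a_w, b_w = weighted_line_fit(xs, ys, sigmas)
```

With equal weights, the poor hit pulls the slope well away from 1, while the per-hit weights keep the fit close to the line defined by the three precise hits.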

arXiv.org e-Print Archive

Directory of Open Access Journals