Algorithms and implementation of functional dependency discovery in XML : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Sciences in Information Systems at Massey University
1.1 Background
Following the advent of the web, there has been great demand for data interchange between applications using internet infrastructure. XML (Extensible Markup Language) provides a structured representation of data, empowered by broad adoption and easy deployment. As a subset of SGML (Standard Generalized Markup Language), XML has been standardized by the World Wide Web Consortium (W3C) [Bray et al., 2004]. XML is becoming the prevalent data exchange format on the World Wide Web and is increasingly significant for storing semi-structured data. Since its initial release in 1996, it has evolved and been applied extensively in all fields where the exchange of structured documents in electronic form is required. With the growing popularity of XML, the issue of functional dependency in XML has recently received well-deserved attention. The driving force for the study of dependencies in XML is that they are as crucial to XML schema design as they are to relational database (RDB) design [Abiteboul et al., 1995].
Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping
We propose Quootstrap, a method for extracting quotations, as well as the
names of the speakers who uttered them, from large news corpora. Whereas prior
work has addressed this problem primarily with supervised machine learning, our
approach follows a fully unsupervised bootstrapping paradigm. It leverages the
redundancy present in large news corpora, more precisely, the fact that the
same quotation often appears across multiple news articles in slightly
different contexts. Starting from a few seed patterns, such as ["Q", said S.],
our method extracts a set of quotation-speaker pairs (Q, S), which are in turn
used for discovering new patterns expressing the same quotations; the process
is then repeated with the larger pattern set. Our algorithm is highly scalable,
which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus.
Validating our results against a crowdsourced ground truth, we obtain 90%
precision at 40% recall using a single seed pattern, with significantly higher
recall values for more frequently reported (and thus likely more interesting)
quotations. Finally, we showcase the usefulness of our algorithm's output for
computational social science by analyzing the sentiment expressed in our
extracted quotations.
Comment: Accepted at the 12th International Conference on Web and Social Media
(ICWSM), 201
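The bootstrapping loop described in the Quootstrap abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `$Q`/`$S` placeholder syntax, the regular expressions for quotations and speaker names, and the example corpus are all assumptions made for illustration, and the filtering of unreliable patterns that a real system would need is omitted.

```python
import re

# Hypothetical placeholder syntax: "$Q" marks the quotation, "$S" the speaker.
QUOTE = r'(?P<Q>[^"]+)'
SPEAKER = r"(?P<S>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"


def pattern_to_regex(pattern):
    """Turn a pattern such as '"$Q", said $S.' into a compiled regex."""
    escaped = re.escape(pattern)
    escaped = escaped.replace(re.escape("$Q"), QUOTE)
    escaped = escaped.replace(re.escape("$S"), SPEAKER)
    return re.compile("^" + escaped + "$")


def extract_pairs(sentences, patterns):
    """Apply every pattern to every sentence and collect (quotation, speaker) pairs."""
    pairs = set()
    for pattern in patterns:
        regex = pattern_to_regex(pattern)
        for sentence in sentences:
            match = regex.match(sentence)
            if match:
                pairs.add((match.group("Q"), match.group("S")))
    return pairs


def discover_patterns(sentences, pairs):
    """Find sentences containing a known pair and abstract them into patterns."""
    found = set()
    for sentence in sentences:
        for quote, speaker in pairs:
            if quote in sentence and speaker in sentence:
                found.add(sentence.replace(quote, "$Q").replace(speaker, "$S"))
    return found


def bootstrap(sentences, seed_patterns, max_iterations=5):
    """Alternate between extracting pairs and discovering patterns."""
    patterns = set(seed_patterns)
    for _ in range(max_iterations):
        pairs = extract_pairs(sentences, patterns)
        new_patterns = discover_patterns(sentences, pairs) - patterns
        if not new_patterns:
            break  # no new patterns were found, so the pattern set is stable
        patterns |= new_patterns
    return extract_pairs(sentences, patterns), patterns


if __name__ == "__main__":
    # Made-up corpus: the same quotation appears in two different contexts.
    corpus = [
        '"We will win", said Alice Smith.',
        'Alice Smith declared: "We will win"',
        'Bob Jones declared: "Taxes are too high"',
    ]
    pairs, patterns = bootstrap(corpus, ['"$Q", said $S.'])
    print(sorted(pairs))
    print(sorted(patterns))
```

In this toy corpus the redundancy is what makes the second pattern discoverable: the seed pattern extracts a pair from the first sentence, the second sentence repeats that pair in a different context, and the pattern learned from it then matches the third sentence.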
Optimizing momentum resolution with a new fitting method for silicon-strip detectors
A new fitting method is explored for momentum reconstruction of tracks in a
constant magnetic field for a silicon-strip tracker. A substantial increase in
momentum resolution with respect to the standard fit is obtained. The key point
is the use of a realistic probability distribution for each hit
(heteroscedasticity). Two different methods are used for the fits: the first
method introduces an effective variance for each hit, and the second method
implements the maximum likelihood search. The tracker model is similar to the
PAMELA tracker. Each of the two sides of the double-sided PAMELA detectors is
simulated as a momentum reconstruction device. One of the two is similar to the
silicon micro-strip detectors widely used in running experiments. Two different
position reconstructions are used for the standard fits: the η-algorithm (the
best one) and the two-strip center of gravity. The gain obtained in momentum
resolution is measured as the virtual magnetic field and the virtual
signal-to-noise ratio required by the two standard fits to reach an overlap
with the best of the two new methods. For the best side, the virtual magnetic
field must be increased 1.5 times with respect to the real field to reach the
overlap, and 1.8 times for the other. For the high-noise side, the increases
must be 1.8 and 2.0. The signal-to-noise ratio requires similar increases, but
only for the η-algorithm. The signal-to-noise ratio has no effect on the fits
with the center of gravity. Very important results are obtained if the number N
of detecting layers is increased: our methods provide a momentum resolution
growing linearly with N, much faster than standard fits, whose resolution grows
as √N.
Comment: This article supersedes arXiv:1606.03051, 22 pages and 10 figures
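The idea of giving each hit its own effective variance can be illustrated with a weighted least-squares fit. The sketch below is not the authors' method: it assumes a parabolic track approximation y = c0 + c1*x + c2*x^2, where the curvature coefficient c2 stands in for the momentum-dependent quantity, and the function name `weighted_polyfit`, the layer positions, and the variances are all hypothetical.

```python
def weighted_polyfit(xs, ys, variances, degree=2):
    """Fit y = c0 + c1*x + ... by minimizing sum((y_i - f(x_i))**2 / variance_i)."""
    n = degree + 1
    # Build the weighted normal equations A c = b with weights 1 / variance.
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for x, y, var in zip(xs, ys, variances):
        w = 1.0 / var
        powers = [x ** k for k in range(n)]
        for i in range(n):
            b[i] += w * powers[i] * y
            for j in range(n):
                A[i][j] += w * powers[i] * powers[j]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back-substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


if __name__ == "__main__":
    # Hypothetical tracker: six layers, one hit measured much worse than the rest.
    layers = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    true_coeffs = (0.1, 0.02, 0.003)
    hits = [true_coeffs[0] + true_coeffs[1] * x + true_coeffs[2] * x * x
            for x in layers]
    hits[2] += 0.5  # a poorly reconstructed hit

    # Standard fit: every hit is assumed to have the same variance.
    standard = weighted_polyfit(layers, hits, [1.0] * len(layers))

    # Heteroscedastic fit: the bad hit carries a much larger variance.
    variances = [1e-4] * len(layers)
    variances[2] = 1.0
    weighted = weighted_polyfit(layers, hits, variances)

    print("curvature error, standard fit:", abs(standard[2] - true_coeffs[2]))
    print("curvature error, weighted fit:", abs(weighted[2] - true_coeffs[2]))
```

In this made-up example the equal-variance fit lets the bad hit distort the curvature, while the weighted fit largely ignores it. The abstract's second method, a maximum likelihood search over realistic hit distributions, is not covered by this sketch.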