Efficient Discovery of Ontology Functional Dependencies
Poor data quality has become a pervasive issue due to the increasing
complexity and size of modern datasets. Constraint based data cleaning
techniques rely on integrity constraints as a benchmark to identify and correct
errors. Data values that do not satisfy the given set of constraints are
flagged as dirty, and data updates are made to re-align the data and the
constraints. However, many errors often require user input to resolve due to
domain expertise defining specific terminology and relationships. For example,
in pharmaceuticals, 'Advil' is-a brand name for 'ibuprofen' that can be
captured in a pharmaceutical ontology. While functional dependencies (FDs) have
traditionally been used in existing data cleaning solutions to model syntactic
equivalence, they are not able to model broader relationships (e.g., is-a)
defined by an ontology. In this paper, we take a first step towards extending
the set of data quality constraints used in data cleaning by defining and
discovering Ontology Functional Dependencies (OFDs). We lay out
theoretical and practical foundations for OFDs, including a set of sound and
complete axioms, and a linear inference procedure. We then develop effective
algorithms for discovering OFDs, and a set of optimizations that efficiently
prune the search space. Our experimental evaluation using real data shows the
scalability and accuracy of our algorithms. Comment: 12 pages
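The contrast the abstract draws between syntactic equivalence and ontology-aware equivalence can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the toy rows, and the `CONCEPTS` mapping (each value mapped to the set of ontology concepts it may denote) are assumptions made up for illustration, and the relaxed check encodes only one plausible reading of the 'Advil'/'ibuprofen' example.

```python
def holds_fd(rows, lhs, rhs):
    """Classic FD check: equal lhs values must have identical rhs strings."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def holds_ofd(rows, lhs, rhs, concepts):
    """Relaxed check (assumed semantics): rows with equal lhs values
    must have rhs values that share at least one ontology concept."""
    shared = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = row[rhs]
        # Values unknown to the ontology only match themselves.
        value_concepts = concepts.get(value, {value})
        shared[key] = shared.get(key, value_concepts) & value_concepts
        if not shared[key]:
            return False
    return True


# Hypothetical toy data, not taken from the paper.
ROWS = [
    {"symptom": "headache", "drug": "Advil"},
    {"symptom": "headache", "drug": "ibuprofen"},
    {"symptom": "nausea", "drug": "ondansetron"},
]
CONCEPTS = {"Advil": {"ibuprofen"}, "ibuprofen": {"ibuprofen"}}

print(holds_fd(ROWS, ["symptom"], "drug"))             # syntactic check
print(holds_ofd(ROWS, ["symptom"], "drug", CONCEPTS))  # ontology-aware check
```

Under these assumptions the syntactic check flags the two 'headache' rows as a violation, while the ontology-aware check accepts them because both drug names resolve to the same concept.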
Mining local staircase patterns in noisy data
Most traditional biclustering algorithms identify biclusters with little or no overlap. In this paper, we introduce the problem of identifying staircases of biclusters. Such staircases may be indicative of causal relationships between columns and cannot easily be identified by existing biclustering algorithms. Our formalization relies on a scoring function based on the Minimum Description Length principle. Furthermore, we propose a first algorithm for identifying staircase biclusters, based on a combination of local search and constraint programming. Experiments show that the approach is promising.
A Review of the Mass Measurement Techniques proposed for the Large Hadron Collider
We review the methods which have been proposed for measuring masses of new
particles at the Large Hadron Collider paying particular attention to the
kinematical techniques suitable for extracting mass information when invisible
particles are expected. Comment: 72 pages - in form to be published in JPhys
FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage
In recent years, fuzz testing has proven itself to be one of the most
effective techniques for finding correctness bugs and security vulnerabilities
in practice. One particular fuzz testing tool, American Fuzzy Lop or AFL, has
become popular thanks to its ease-of-use and bug-finding power. However, AFL
remains limited in the depth of program coverage it achieves, in particular
because it does not consider which parts of program inputs should not be
mutated in order to maintain deep program coverage. We propose an approach,
FairFuzz, that helps alleviate this limitation in two key steps. First,
FairFuzz automatically prioritizes inputs exercising rare parts of the program
under test. Second, it automatically adjusts the mutation of inputs so that the
mutated inputs are more likely to exercise these same rare parts of the
program. We conduct evaluation on real-world programs against state-of-the-art
versions of AFL, thoroughly repeating experiments to get good measures of
variability. We find that on certain benchmarks FairFuzz shows significant
coverage increases after 24 hours compared to state-of-the-art versions of AFL,
while on others it achieves high program coverage at a significantly faster
rate.
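The two steps named in the abstract can be sketched on a toy example. This is a simplified illustration, not FairFuzz's or AFL's actual implementation: `toy_branches` is a hypothetical stand-in for running an instrumented program, the fixed `cutoff` is an assumed rarity threshold, and the mask considers only single-byte overwrites.

```python
from collections import Counter


def toy_branches(data: bytes) -> set:
    """Hypothetical stand-in for instrumented execution: branch ids hit."""
    hit = {"entry"}
    if data[:2] == b"<!":
        hit.add("doctype")
        if b"[" in data:
            hit.add("subset")
    return hit


def rare_branches(queue, run, cutoff):
    """Branches hit by at most `cutoff` inputs in the queue (assumed rule)."""
    counts = Counter(b for data in queue for b in run(data))
    return {b for b, n in counts.items() if n <= cutoff}


def prioritize(queue, run, cutoff):
    """Step 1: keep only inputs that exercise at least one rare branch."""
    rare = rare_branches(queue, run, cutoff)
    return [data for data in queue if run(data) & rare]


def mutation_mask(data, run, target):
    """Step 2: mark byte positions that can be overwritten while the
    mutated input still hits the `target` branch."""
    mask = []
    for i in range(len(data)):
        mutated = data[:i] + bytes([data[i] ^ 0xFF]) + data[i + 1:]
        mask.append(target in run(mutated))
    return mask


QUEUE = [b"hello", b"world", b"<!a[", b"<!abc"]
chosen = prioritize(QUEUE, toy_branches, cutoff=1)
print(chosen)
print(mutation_mask(chosen[0], toy_branches, "subset"))
```

In this toy setup only the input reaching the "subset" branch is prioritized, and the mask shows which of its bytes a mutator could change without losing that branch.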
Discovering the Higgs with Low Mass Muon Pairs
Many models of electroweak symmetry breaking have an additional light
pseudoscalar. If the Higgs boson can decay to a new pseudoscalar, LEP searches
for the Higgs can be significantly altered and the Higgs can be as light as 86
GeV. Discovering the Higgs boson in these models is challenging when the
pseudoscalar is lighter than 10 GeV because it decays dominantly into tau
leptons. In this paper, we discuss discovering the Higgs in a subdominant decay
mode where one of the pseudoscalars decays to a pair of muons. This search
allows for potential discovery of a cascade-decaying Higgs boson with the
complete Tevatron data set or early data at the LHC. Comment: 10 pages, 7 figures
Psychological Climate and Work Attitudes: The Importance of Telling the Right Story
In this field study, the authors explore how choosing one context over another influences both research results and implications. Using both quantitative and qualitative data, the authors examine context from both an organizational and a business-unit perspective by studying relationships between five psychological climate variables and outcomes of job satisfaction, affective commitment, and intent to leave. Results show different contextual influences between the organization and two business units, suggesting that different bundles of psychological climate variables yield similar outcomes depending on the context studied. These results bolster the contention that researchers need to identify the right context in field research.