Qualitative Effects of Knowledge Rules in Probabilistic Data Integration
One of the problems in data integration is data overlap: different data sources have data on the same real-world entities. Much development time in data integration projects is devoted to entity resolution. Advanced similarity measurement techniques are often used to remove semantic duplicates from the integration result or to solve other semantic conflicts, but it proves impossible to get rid of all semantic problems in data integration. An often-used rule of thumb states that about 90% of the development effort is devoted to solving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that stores any remaining semantic uncertainty and conflicts in a probabilistic database, so that the integrated data can already be used meaningfully. The main development effort in our approach is devoted to defining and tuning knowledge rules and thresholds. Rules and thresholds directly impact the size and quality of the integration result. We measure integration quality indirectly, by measuring the quality of answers to queries on the integrated data set in an information retrieval-like way. The main contribution of this report is an experimental investigation of the effects and sensitivity of rule definition and threshold tuning on integration quality. The experiments show that our approach indeed reduces development effort, rather than merely shifting it to rule definition and threshold tuning: setting rough, safe thresholds and defining only a few rules suffices to produce a ‘good enough’ integration that can be meaningfully used.
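Integration quality is measured through the quality of query answers "in an information retrieval-like way". A minimal sketch of one such measure follows. It assumes that each answer returned by the probabilistic database counts with a weight equal to its probability; this weighting and the function name are illustrative assumptions, not necessarily the exact definitions used in the report.

```python
from typing import Dict, Set, Tuple


def probabilistic_precision_recall(
    answers: Dict[str, float], correct: Set[str]
) -> Tuple[float, float]:
    """Probability-weighted precision and recall of one query answer.

    answers maps each returned value to the probability the integrated
    database assigns it; correct is the ground-truth answer set.
    Assumption: every answer counts with a weight equal to its probability.
    """
    total_weight = sum(answers.values())
    correct_weight = sum(p for a, p in answers.items() if a in correct)
    precision = correct_weight / total_weight if total_weight > 0 else 0.0
    recall = correct_weight / len(correct) if correct else 0.0
    return precision, recall


# An uncertain duplicate ("Jaws (1975)") lowers precision but not recall.
answers = {"Jaws": 1.0, "Jaws (1975)": 0.5, "Alien": 0.8}
print(probabilistic_precision_recall(answers, {"Jaws", "Alien"}))
```

Here precision is 1.8 / 2.3 (about 0.78) and recall is 1.8 / 2 = 0.9, so a semantic duplicate left in the integration result costs precision in proportion to its probability.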
Quality Measures in Uncertain Data Management
Many applications deal with uncertain data, for example applications that process sensor information, data integration applications, and healthcare applications. Instead of each application having to handle this uncertainty itself, the DBMS should be responsible for managing all data, including uncertain data. Several research projects address this topic. In this paper, we introduce four measures for assessing and comparing important characteristics of uncertain data and of the systems that manage it.
User Feedback in Probabilistic XML
Data integration is a challenging problem in many application areas. Approaches mostly attempt to resolve semantic uncertainty and conflicts between information sources as part of the data integration process. In some application areas, this is impractical or even prohibitive, for example in an ambient environment where devices have to exchange information autonomously on an ad hoc basis. We have proposed a probabilistic XML approach that allows data integration without user involvement by storing semantic uncertainty and conflicts in the integrated XML data. As a consequence, the integrated information source represents all possible appearances of objects in the real world, the so-called possible worlds.

In this paper, we show how user feedback on query results can resolve semantic uncertainty and conflicts in the integrated data. Hence, user involvement is effectively postponed to query time, when a user is already interacting actively with the system. The technique relates positive and negative statements on query answers to the possible worlds of the information source, thereby reinforcing, penalizing, or eliminating possible worlds. We show that after repeated user feedback, an integrated information source better resembles the real world and may converge towards a non-probabilistic information source.
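The effect of feedback on possible worlds can be pictured on an explicit list of worlds. The sketch below simplifies heavily: worlds are flat sets of facts rather than XML trees, and a hypothetical `confidence` parameter decides whether disagreeing worlds are penalized (0 < confidence < 1) or eliminated (confidence = 1). Renormalizing afterwards reinforces the worlds that agree. This is an illustration of the idea, not the authors' probabilistic XML algorithm.

```python
from typing import Dict, FrozenSet

World = FrozenSet[str]  # a possible world, simplified to a set of facts


def apply_feedback(
    worlds: Dict[World, float], answer: str, positive: bool, confidence: float = 1.0
) -> Dict[World, float]:
    """Reweight possible worlds after user feedback on one query answer.

    Worlds that agree with the feedback keep their probability; worlds that
    disagree are scaled by (1 - confidence), so confidence=1.0 eliminates them.
    Renormalising afterwards reinforces the agreeing worlds.
    """
    updated = {}
    for world, p in worlds.items():
        agrees = (answer in world) == positive
        weight = p if agrees else p * (1.0 - confidence)
        if weight > 0:
            updated[world] = weight
    total = sum(updated.values())
    if total == 0:
        raise ValueError("feedback contradicts every possible world")
    return {w: p / total for w, p in updated.items()}


# Two sources disagree on John's phone number; w3 keeps both (an unresolved duplicate).
w1 = frozenset({"John:1111"})
w2 = frozenset({"John:2222"})
w3 = frozenset({"John:1111", "John:2222"})
worlds = {w1: 0.5, w2: 0.3, w3: 0.2}

worlds = apply_feedback(worlds, "John:1111", positive=True)   # "1111 is correct"
worlds = apply_feedback(worlds, "John:2222", positive=False)  # "2222 is wrong"
print(worlds)  # only w1 remains, with probability 1.0
```

After two pieces of feedback only one world is left, which mirrors the convergence towards a non-probabilistic source described above.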
How Noisy Data Affects Geometric Semantic Genetic Programming
Noise is a consequence of acquiring and pre-processing data from the environment, and it appears as fluctuations arising from different sources, e.g., sensors, signal processing technology, or even human error. As a machine learning technique, Genetic Programming (GP) is not immune to this problem, which the field has frequently addressed. Recently, Geometric Semantic Genetic Programming (GSGP), a semantic-aware branch of GP, has shown robustness and high generalization capability. Researchers believe these characteristics may be associated with a lower sensitivity to noisy data. However, there is no systematic study on this matter. This paper performs a deep analysis of GSGP performance in the presence of noise. Using 15 synthetic datasets where noise can be controlled, we added different ratios of noise to the data and compared the results with those of a canonical GP. The results show that, as the percentage of noisy instances increases, generalization performance degrades more markedly in GSGP than in GP. However, in general, GSGP is more robust to noise than GP in the presence of up to 10% noise, and shows no statistically significant difference from GP for higher noise levels in the test bed.
Comment: 8 pages. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2017), Berlin, Germany.
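Adding "different ratios of noise to the data" can be sketched as a helper that perturbs a chosen percentage of instances. The abstract does not specify the noise model, so the Gaussian perturbation of the target, scaled by the target's standard deviation, and the name `add_target_noise` are assumptions for illustration only.

```python
import random
import statistics
from typing import List


def add_target_noise(
    y: List[float], ratio: float, scale: float = 1.0, seed: int = 0
) -> List[float]:
    """Return a copy of y in which round(ratio * len(y)) randomly chosen targets are
    perturbed with Gaussian noise of standard deviation scale * pstdev(y).

    The noise model is an assumption for illustration, not the paper's protocol.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = random.Random(seed)  # seeded so each noise level is reproducible
    sigma = scale * statistics.pstdev(y)
    noisy = list(y)
    for i in rng.sample(range(len(y)), round(ratio * len(y))):
        noisy[i] += rng.gauss(0.0, sigma)
    return noisy


# Example: corrupt 10% of the targets of a toy regression dataset.
targets = [x * x for x in range(50)]
noisy_targets = add_target_noise(targets, ratio=0.10, seed=1)
print(sum(a != b for a, b in zip(targets, noisy_targets)), "targets perturbed")
```

Sweeping `ratio` over several values and training GP and GSGP on each perturbed copy would give the kind of noise-level comparison the abstract reports.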
Effects of flywheel training on strength-related variables in female populations. A systematic review
This study aimed to evaluate the effect of flywheel training on female populations, report practical recommendations for practitioners based on the currently available evidence, underline the limitations of the current literature, and establish future research directions. Studies were searched in electronic databases (PubMed, SPORTDiscus, and Web of Science) following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement guidelines. The methodological quality of the seven studies included in this review ranged from 10 to 19 points (good to excellent), with an average score of 14 points (good). These studies were carried out between 2004 and 2019 and comprised a total of 100 female participants. Training duration ranged from 5 to 24 weeks, volume from 1 to 4 sets and 7 to 12 repetitions, and frequency from 1 to 3 sessions per week. The current literature suggests that flywheel training is a safe and time-effective strategy to enhance physical outcomes in young and elderly females. With this information, practitioners may be inclined to prescribe flywheel training as an effective countermeasure against injuries or falls and as a potent stimulus for physical enhancement.
Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS
Trio is a new kind of database system that supports data, uncertainty, and lineage in a fully integrated manner. The first Trio prototype, dubbed Trio-One, is built on top of a conventional DBMS using data and query translation techniques together with a small number of stored procedures. This paper describes Trio-One's translation scheme and system architecture, showing how it efficiently and easily supports the Trio data model and query language.
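The idea of layering uncertainty on a conventional DBMS can be illustrated with an encoded table in SQLite. The schema below (an x-tuple id `xid`, an alternative id `aid`, and a confidence column `conf`) is a hypothetical encoding in the spirit of the Trio data model, not Trio-One's actual translation scheme, and lineage is omitted entirely.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory encoding of an uncertain relation Saw(witness, car).

    Each row is one alternative of an x-tuple; alternatives sharing an xid are
    mutually exclusive. Hypothetical schema for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE saw_enc (xid INTEGER, aid INTEGER, witness TEXT, car TEXT, conf REAL)"
    )
    conn.executemany(
        "INSERT INTO saw_enc VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "Amy", "Honda", 0.5),    # Amy saw a Honda ...
            (1, 2, "Amy", "Toyota", 0.5),   # ... or a Toyota, never both
            (2, 3, "Betty", "Mazda", 0.6),  # confidences sum below 1: a "maybe" x-tuple
        ],
    )
    return conn


def cars_seen_by(conn: sqlite3.Connection, witness: str) -> List[Tuple[int, int, str, float]]:
    """Translated selection: keep xid, aid and conf so the answer stays uncertain."""
    return conn.execute(
        "SELECT xid, aid, car, conf FROM saw_enc WHERE witness = ? ORDER BY aid",
        (witness,),
    ).fetchall()


def existence_probabilities(conn: sqlite3.Connection) -> List[Tuple[int, float]]:
    """P(x-tuple exists) = sum of its alternatives' confidences (they are exclusive)."""
    return conn.execute(
        "SELECT xid, SUM(conf) FROM saw_enc GROUP BY xid ORDER BY xid"
    ).fetchall()


conn = build_demo_db()
print(cars_seen_by(conn, "Amy"))
print(existence_probabilities(conn))
```

The point of the design is that a user-level query over `Saw` becomes an ordinary SQL query over `saw_enc` that carries the bookkeeping columns along, so the conventional engine does the work.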
Efficient Equilibria in Polymatrix Coordination Games
We consider polymatrix coordination games with individual preferences, where every player corresponds to a node in a graph and plays with each neighbor a separate bimatrix game with non-negative symmetric payoffs. In this paper, we study $\alpha$-approximate $k$-equilibria of these games, i.e., outcomes where no group of at most $k$ players can deviate such that each member increases his payoff by at least a factor $\alpha$. We prove that for $\alpha \ge 2$ these games have the finite coalitional improvement property (and thus $\alpha$-approximate $k$-equilibria exist), while for $\alpha < 2$ this property does not hold. Further, we derive an almost tight bound of $2\alpha(n-1)/(k-1)$ on the price of anarchy, where $n$ is the number of players; in particular, it scales from unbounded for pure Nash equilibria ($k = 1$) to $2\alpha$ for strong equilibria ($k = n$). We also settle the complexity of several problems related to the verification and existence of these equilibria. Finally, we investigate natural means to reduce the inefficiency of Nash equilibria. Most promisingly, we show that by fixing the strategies of $k$ players the price of anarchy can be reduced to $n/k$ (and this bound is tight).
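To make the equilibrium notion concrete, here is a sketch of a checker for the simplest case, single-player deviations (k = 1), i.e. α-approximate Nash equilibria: a profile qualifies if no single player can switch strategies and obtain more than α times its current payoff. The dictionary-based game encoding and the class name are hypothetical, and treating the improvement condition as strict is an assumption.

```python
from typing import Dict, Hashable, List, Tuple

Player = Hashable
Profile = Dict[Player, str]


class PolymatrixGame:
    """Polymatrix coordination game (hypothetical dict-based encoding).

    strategies[i]         : strategies available to player i
    pref[i][s]            : individual preference of player i for strategy s (>= 0)
    edges[(i, j)][(a, b)] : non-negative payoff that BOTH i and j receive when
                            i plays a and j plays b (missing entries pay 0)
    """

    def __init__(
        self,
        strategies: Dict[Player, List[str]],
        pref: Dict[Player, Dict[str, float]],
        edges: Dict[Tuple[Player, Player], Dict[Tuple[str, str], float]],
    ) -> None:
        self.strategies = strategies
        self.pref = pref
        self.edges = edges

    def payoff(self, i: Player, profile: Profile) -> float:
        """Player i's preference for its strategy plus its payoffs on all incident edges."""
        total = self.pref[i][profile[i]]
        for (u, v), table in self.edges.items():
            if i in (u, v):
                total += table.get((profile[u], profile[v]), 0.0)
        return total

    def is_approx_nash(self, profile: Profile, alpha: float) -> bool:
        """True if no single player can deviate and get more than alpha times its payoff."""
        for i, options in self.strategies.items():
            current = self.payoff(i, profile)
            for s in options:
                if self.payoff(i, {**profile, i: s}) > alpha * current:
                    return False
        return True


game = PolymatrixGame(
    strategies={"A": ["x", "y"], "B": ["x", "y"]},
    pref={"A": {"x": 0.0, "y": 1.5}, "B": {"x": 0.0, "y": 0.0}},
    edges={("A", "B"): {("x", "x"): 1.0, ("y", "y"): 2.0}},
)
print(game.is_approx_nash({"A": "x", "B": "x"}, alpha=1.0))  # False: A gets 1.5 > 1 by switching to y
print(game.is_approx_nash({"A": "x", "B": "x"}, alpha=2.0))  # True
```

In this toy game the profile (x, x) is not an exact Nash equilibrium, but it is a 2-approximate one, because player A's preference gain of 1.5 does not exceed twice its current payoff of 1.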
Nearly optimal solutions for the Chow Parameters Problem and low-weight approximation of halfspaces
The Chow parameters of a Boolean function $f: \{-1,1\}^n \to \{-1,1\}$ are its $n+1$ degree-0 and degree-1 Fourier coefficients. It has been known since 1961 (Chow, Tannenbaum) that the (exact values of the) Chow parameters of any linear threshold function $f$ uniquely specify $f$ within the space of all Boolean functions, but until recently (O'Donnell and Servedio) nothing was known about efficient algorithms for reconstructing $f$ (exactly or approximately) from exact or approximate values of its Chow parameters. We refer to this reconstruction problem as the Chow Parameters Problem. Our main result is a new algorithm for the Chow Parameters Problem which, given (sufficiently accurate approximations to) the Chow parameters of any linear threshold function $f$, runs in time $\tilde{O}(n^2) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$ and with high probability outputs a representation of an LTF $f'$ that is $\epsilon$-close to $f$. The only previous algorithm (O'Donnell and Servedio) had running time $\mathrm{poly}(n) \cdot 2^{2^{\tilde{O}(1/\epsilon^2)}}$. As a byproduct of our approach, we show that for any linear threshold function $f$ over $\{-1,1\}^n$, there is a linear threshold function $f'$ which is $\epsilon$-close to $f$ and has all weights that are integers at most $\sqrt{n} \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$. This significantly improves the best previous result of Diakonikolas and Servedio, which gave a $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ weight bound, and is close to the known lower bound of $\max\{\sqrt{n}, (1/\epsilon)^{\Omega(\log \log (1/\epsilon))}\}$ (Goldberg, Servedio). Our techniques also yield improved algorithms for related problems in learning theory.
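The Chow parameters can be computed exactly for small $n$ by enumerating all $2^n$ inputs: $\hat f(0) = \mathbb{E}[f(x)]$ and $\hat f(i) = \mathbb{E}[f(x)\,x_i]$ for $x$ uniform on $\{-1,1\}^n$. This brute-force sketch only illustrates the definition. It runs in exponential time and is not the paper's reconstruction algorithm; the helper names and the convention sign(0) = +1 are assumptions.

```python
from itertools import product
from typing import Callable, List, Sequence

BoolFn = Callable[[Sequence[int]], int]


def ltf(weights: Sequence[float], theta: float) -> BoolFn:
    """Linear threshold function f(x) = sign(w . x - theta), using sign(0) = +1."""
    def f(x: Sequence[int]) -> int:
        return 1 if sum(w * xi for w, xi in zip(weights, x)) - theta >= 0 else -1
    return f


def chow_parameters(f: BoolFn, n: int) -> List[float]:
    """Exact Chow parameters [f^(0), f^(1), ..., f^(n)] by enumerating {-1, 1}^n.

    f^(0) = E[f(x)] and f^(i) = E[f(x) * x_i] for x uniform on {-1, 1}^n.
    """
    points = list(product((-1, 1), repeat=n))
    sums = [0] * (n + 1)
    for x in points:
        fx = f(x)
        sums[0] += fx
        for i, xi in enumerate(x, start=1):
            sums[i] += fx * xi
    return [s / len(points) for s in sums]


majority3 = ltf([1, 1, 1], 0)
print(chow_parameters(majority3, 3))  # [0.0, 0.5, 0.5, 0.5]
```

The output has $n+1$ entries, one degree-0 and $n$ degree-1 coefficients, matching the definition above; for 3-bit majority the function is balanced, so $\hat f(0) = 0$, and every variable has correlation $1/2$ with the output.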
- …