481 research outputs found

    Qualitative Effects of Knowledge Rules in Probabilistic Data Integration

    Get PDF
    One of the problems in data integration is data overlap: the fact that different data sources have data on the same real world entities. Much development time in data integration projects is devoted to entity resolution. Often advanced similarity measurement techniques are used to remove semantic duplicates from the integration result or solve other semantic conflicts, but it proofs impossible to get rid of all semantic problems in data integration. An often-used rule of thumb states that about 90% of the development effort is devoted to solving the remaining 10% hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that stores any remaining semantic uncertainty and conflicts in a probabilistic database enabling it to already be meaningfully used. The main development effort in our approach is devoted to defining and tuning knowledge rules and thresholds. Rules and thresholds directly impact the size and quality of the integration result. We measure integration quality indirectly by measuring the quality of answers to queries on the integrated data set in an information retrieval-like way. The main contribution of this report is an experimental investigation of the effects and sensitivity of rule definition and threshold tuning on the integration quality. This proves that our approach indeed reduces development effort — and not merely shifts the effort to rule definition and threshold tuning — by showing that setting rough safe thresholds and defining only a few rules suffices to produce a ‘good enough’ integration that can be meaningfully used

    Quality Measures in Uncertain Data Management

    Get PDF
    Many applications deal with data that is uncertain. Some examples are applications dealing with sensor information, data integration applications and healthcare applications. Instead of these applications having to deal with the uncertainty, it should be the responsibility of the DBMS to manage all data including uncertain data. Several projects do research on this topic. In this paper, we introduce four measures to be used to assess and compare important characteristics of data and systems

    User Feedback in Probabilistic XML

    Get PDF
    Data integration is a challenging problem in many application areas. Approaches mostly attempt to resolve semantic uncertainty and conflicts between information sources as part of the data integration process. In some application areas, this is impractical or even prohibitive, for example, in an ambient environment where devices on an ad hoc basis have to exchange information autonomously. We have proposed a probabilistic XML approach that allows data integration without user involvement by storing semantic uncertainty and conflicts in the integrated XML data. As a\ud consequence, the integrated information source represents\ud all possible appearances of objects in the real world, the\ud so-called possible worlds.\ud \ud In this paper, we show how user feedback on query results\ud can resolve semantic uncertainty and conflicts in the\ud integrated data. Hence, user involvement is effectively postponed to query time, when a user is already interacting actively with the system. The technique relates positive and\ud negative statements on query answers to the possible worlds\ud of the information source thereby either reinforcing, penalizing, or eliminating possible worlds. We show that after repeated user feedback, an integrated information source better resembles the real world and may converge towards a non-probabilistic information source

    How Noisy Data Affects Geometric Semantic Genetic Programming

    Full text link
    Noise is a consequence of acquiring and pre-processing data from the environment, and shows fluctuations from different sources---e.g., from sensors, signal processing technology or even human error. As a machine learning technique, Genetic Programming (GP) is not immune to this problem, which the field has frequently addressed. Recently, Geometric Semantic Genetic Programming (GSGP), a semantic-aware branch of GP, has shown robustness and high generalization capability. Researchers believe these characteristics may be associated with a lower sensibility to noisy data. However, there is no systematic study on this matter. This paper performs a deep analysis of the GSGP performance over the presence of noise. Using 15 synthetic datasets where noise can be controlled, we added different ratios of noise to the data and compared the results obtained with those of a canonical GP. The results show that, as we increase the percentage of noisy instances, the generalization performance degradation is more pronounced in GSGP than GP. However, in general, GSGP is more robust to noise than GP in the presence of up to 10% of noise, and presents no statistical difference for values higher than that in the test bed.Comment: 8 pages, In proceedings of Genetic and Evolutionary Computation Conference (GECCO 2017), Berlin, German

    Effects of flywheel training on strength-related variables in female populations. A systematic review

    Get PDF
    This study aimed to evaluate the effect of flywheel training on female populations, report practical recommendations for practitioners based on the currently available evidence, underline the limitations of current literature, and establish future research directions. Studies were searched through the electronic databases (PubMed, SPORTDiscus, and Web of Science) following the preferred reporting items for systematic reviews and meta-analysis statement guidelines. The methodological quality of the seven studies included in this review ranged from 10 to 19 points (good to excellent), with an average score of 14-points (good). These studies were carried out between 2004 and 2019 and comprised a total of 100 female participants. The training duration ranged from 5 weeks to 24 weeks, with volume ranging from 1 to 4 sets and 7 to 12 repetitions, and frequency ranged from 1 to 3 times a week. The contemporary literature suggests that flywheel training is a safe and time-effective strategy to enhance physical outcomes with young and elderly females. With this information, practitioners may be inclined to prescribe flywheel training as an effective countermeasure for injuries or falls and as potent stimulus for physical enhancement

    Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS

    Get PDF
    Trio is a new kind of database system that supports data, uncertainty, and lineage in a fully integrated manner. The first Trio prototype, dubbed Trio-One, is built on top of a conventional DBMS using data and query translation techniques together with a small number of stored procedures. This paper describes Trio-One's translation scheme and system architecture, showing how it efficiently and easily supports the Trio data model and query language

    Efficient Equilibria in Polymatrix Coordination Games

    Get PDF
    We consider polymatrix coordination games with individual preferences where every player corresponds to a node in a graph who plays with each neighbor a separate bimatrix game with non-negative symmetric payoffs. In this paper, we study α\alpha-approximate kk-equilibria of these games, i.e., outcomes where no group of at most kk players can deviate such that each member increases his payoff by at least a factor α\alpha. We prove that for α2\alpha \ge 2 these games have the finite coalitional improvement property (and thus α\alpha-approximate kk-equilibria exist), while for α<2\alpha < 2 this property does not hold. Further, we derive an almost tight bound of 2α(n1)/(k1)2\alpha(n-1)/(k-1) on the price of anarchy, where nn is the number of players; in particular, it scales from unbounded for pure Nash equilibria (k=1)k = 1) to 2α2\alpha for strong equilibria (k=nk = n). We also settle the complexity of several problems related to the verification and existence of these equilibria. Finally, we investigate natural means to reduce the inefficiency of Nash equilibria. Most promisingly, we show that by fixing the strategies of kk players the price of anarchy can be reduced to n/kn/k (and this bound is tight)

    Nearly optimal solutions for the Chow Parameters Problem and low-weight approximation of halfspaces

    Get PDF
    The \emph{Chow parameters} of a Boolean function f:{1,1}n{1,1}f: \{-1,1\}^n \to \{-1,1\} are its n+1n+1 degree-0 and degree-1 Fourier coefficients. It has been known since 1961 (Chow, Tannenbaum) that the (exact values of the) Chow parameters of any linear threshold function ff uniquely specify ff within the space of all Boolean functions, but until recently (O'Donnell and Servedio) nothing was known about efficient algorithms for \emph{reconstructing} ff (exactly or approximately) from exact or approximate values of its Chow parameters. We refer to this reconstruction problem as the \emph{Chow Parameters Problem.} Our main result is a new algorithm for the Chow Parameters Problem which, given (sufficiently accurate approximations to) the Chow parameters of any linear threshold function ff, runs in time \tilde{O}(n^2)\cdot (1/\eps)^{O(\log^2(1/\eps))} and with high probability outputs a representation of an LTF ff' that is \eps-close to ff. The only previous algorithm (O'Donnell and Servedio) had running time \poly(n) \cdot 2^{2^{\tilde{O}(1/\eps^2)}}. As a byproduct of our approach, we show that for any linear threshold function ff over {1,1}n\{-1,1\}^n, there is a linear threshold function ff' which is \eps-close to ff and has all weights that are integers at most \sqrt{n} \cdot (1/\eps)^{O(\log^2(1/\eps))}. This significantly improves the best previous result of Diakonikolas and Servedio which gave a \poly(n) \cdot 2^{\tilde{O}(1/\eps^{2/3})} weight bound, and is close to the known lower bound of max{n,\max\{\sqrt{n}, (1/\eps)^{\Omega(\log \log (1/\eps))}\} (Goldberg, Servedio). Our techniques also yield improved algorithms for related problems in learning theory
    corecore