Integrity Constraints Revisited: From Exact to Approximate Implication
Integrity constraints such as functional dependencies (FDs) and multi-valued
dependencies (MVDs) are fundamental in database schema design. Likewise,
probabilistic conditional independences (CI) are crucial for reasoning about
multivariate probability distributions. The implication problem studies whether
a set of constraints (antecedents) implies another constraint (consequent), and
has been investigated in both the database and the AI literature, under the
assumption that all constraints hold exactly. However, many applications today
consider constraints that hold only approximately. In this paper we define an
approximate implication as a linear inequality between the degree of
satisfaction of the antecedents and consequent, and we study the relaxation
problem: when does an exact implication relax to an approximate implication? We
use information theory to define the degree of satisfaction, and prove several
results. First, we show that any implication from a set of data dependencies
(MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most
quadratic in the number of variables; when the consequent is an FD, the factor
can be reduced to 1. Second, we prove that there exists an implication between
CIs that does not admit any relaxation; however, we prove that every
implication between CIs relaxes "in the limit". Finally, we show that the
implication problem for differential constraints in market basket analysis also
admits a relaxation with a factor equal to 1. Our results recover, and
sometimes extend, several previously known results about the implication
problem: implication of MVDs can be checked by considering only 2-tuple
relations, and the implication of differential constraints for frequent item
sets can be checked by considering only databases containing a single
transaction.
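The information-theoretic degree of satisfaction can be illustrated for the FD case: a functional dependency X → Y holds exactly in a relation iff the conditional entropy H(Y | X) of its empirical distribution is zero, and a small positive value measures approximate satisfaction. A minimal Python sketch (the helper and the toy relation are illustrative, not taken from the paper):

```python
from collections import Counter
from math import log2

def cond_entropy(rows, x_attrs, y_attrs):
    """H(Y | X) over the empirical distribution of a relation.

    An FD X -> Y holds exactly iff H(Y | X) = 0; a positive value
    quantifies how far the relation is from satisfying it."""
    n = len(rows)
    xy = Counter((tuple(r[a] for a in x_attrs), tuple(r[a] for a in y_attrs))
                 for r in rows)
    x = Counter(tuple(r[a] for a in x_attrs) for r in rows)
    # sum over (x,y): p(x,y) * log2( p(x) / p(x,y) )  =  H(Y|X)
    return sum(c / n * log2(x[k[0]] / c) for k, c in xy.items())

# Toy relation: does city -> zip hold?
R = [
    {"city": "Oslo", "zip": "0150"},
    {"city": "Oslo", "zip": "0150"},
    {"city": "Bergen", "zip": "5003"},
]
print(cond_entropy(R, ["city"], ["zip"]))  # 0.0: the FD holds exactly
```

Adding a tuple that violates the dependency makes H(Y | X) strictly positive, which is the kind of quantity the relaxation inequalities bound.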
Incorporating record subtyping into a relational data model
Most of the current proposals for new data models support the construction of heterogeneous sets. One of the major challenges for such data models is to provide strong typing in the presence of heterogeneity, which calls for capturing as much information as possible about the legal structural variants. We argue that the shape of some part of a heterogeneous scheme is often determined by the contents of some other part of the scheme. This relationship can be formalized by a certain type of integrity constraint we have called attribute dependency. Attribute dependencies combine the expressive power of general sums with a notation that fits into relational models. We show that attribute dependencies can be used, besides their application in type and integrity checking, to incorporate record subtyping into a relational model. Moreover, the notion of attribute dependency yields a stronger assertion than the traditional record subtyping rule, as it considers some refinements to be caused by others.
To examine the differences between attribute dependencies and traditional record subtyping, and to predict how attribute dependencies behave under transformations such as query language operations, we develop an axiom system for their derivation and prove it sound and complete. We further investigate the interaction between functional and attribute dependencies and examine an extended axiom system capturing both forms of dependencies.
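The core idea — that the value of one attribute determines which structural variant a record may take — can be made concrete with a toy validity check. All names below are hypothetical; the paper's formal definition of attribute dependencies is more general:

```python
# Hypothetical attribute dependency: the value of a "tag" attribute
# determines which other attributes a record must carry (the dependency
# table and attribute names are illustrative, not from the paper).
AD = {  # tag value -> required attribute set
    "circle": {"tag", "radius"},
    "rect": {"tag", "width", "height"},
}

def satisfies_ad(record, ad):
    """Check one record against the dependency: the set of attributes
    actually present must match the variant mandated by the value of
    the determining attribute."""
    return set(record) == ad.get(record.get("tag"), set())

rows = [
    {"tag": "circle", "radius": 2.0},
    {"tag": "rect", "width": 3, "height": 4},
    {"tag": "circle", "width": 3},   # wrong variant for "circle"
]
print([satisfies_ad(r, AD) for r in rows])  # [True, True, False]
```

This is the type-and-integrity-checking use of the constraint; the record-subtyping use additionally relates the variant tables of a scheme and its subscheme.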
Record Subtyping in Flexible Relations by means of Attribute Dependencies
The model of flexible relations supports heterogeneous
sets of tuples in a strongly typed way. The elegance of the standard relational model is preserved by using a single, generic scheme constructor. In each model supporting structural variants, the shape of some part of a heterogeneous scheme may be determined by the contents of some other part of the scheme. We formalize this relationship by a certain kind of integrity constraint we have called "attribute dependency" (AD). We motivate how ADs can be used, besides their application in type and integrity checking, to incorporate record subtyping into our extended relational model. Moreover, we show that ADs yield a stronger assertion than the traditional record subtyping rule as they consider interdependencies among refinements. We discuss how ADs are related to query processing and how they may help to identify redundant operations.
Nonextensive entropy approach to space plasma fluctuations and turbulence
Spatial intermittency in fully developed turbulence is an established feature
of astrophysical plasma fluctuations and in particular apparent in the
interplanetary medium by in situ observations. In this situation the classical
Boltzmann-Gibbs extensive thermo-statistics, applicable when microscopic
interactions and memory are short ranged, fails. Upon generalization of the
entropy function to nonextensivity, accounting for long-range interactions and
thus for correlations in the system, it is demonstrated that the corresponding
probability distributions (PDFs) are members of a family of specific power-law
distributions. In particular, the resulting theoretical bi-kappa functional
reproduces accurately the observed global leptokurtic, non-Gaussian shape of
the increment PDFs of characteristic solar wind variables on all scales.
Gradual decoupling is obtained by enhancing the spatial separation scale
corresponding to increasing kappa-values in case of slow solar wind conditions
where a Gaussian is approached in the limit of large scales. In contrast, the
scaling properties in the high speed solar wind are predominantly governed by
the mean energy or variance of the distribution. The PDFs of solar wind scalar
field differences are computed from WIND and ACE data for different time-lags
and bulk speeds and analyzed within the nonextensive theory. Consequently,
nonlocality in fluctuations, related to both, turbulence and its large scale
driving, should be related to long-range interactions in the context of
nonextensive entropy generalization, providing fundamentally the physical
background of the observed scale dependence of fluctuations in intermittent
space plasmas.
Comment: 21 pages, 8 figures, accepted for publication, to appear in Advances
in Geosciences 2, chapter 04, 2006 (with minor corrections).
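The family of power-law distributions the abstract refers to can be sketched with the one-dimensional kappa distribution; the convention and symbol names below are a common textbook choice, not taken from the paper, whose bi-kappa functional is more elaborate:

```python
from math import lgamma, exp, pi, sqrt

def kappa_pdf(x, kappa, theta=1.0):
    """1-D kappa (power-law) distribution in one common convention:
    f(x) proportional to [1 + x^2/(kappa*theta^2)]^(-kappa).
    As kappa -> infinity this tends to the Gaussian exp(-x^2/theta^2);
    small kappa gives the heavy, leptokurtic tails described above."""
    # lgamma avoids the overflow gamma() would hit for large kappa
    norm = exp(lgamma(kappa) - lgamma(kappa - 0.5)) / (sqrt(pi * kappa) * theta)
    return norm * (1.0 + x * x / (kappa * theta * theta)) ** (-kappa)

# A small kappa puts far more probability in the tails than the
# near-Gaussian large-kappa limit:
print(kappa_pdf(3.0, kappa=2.0), kappa_pdf(3.0, kappa=500.0))
```

Fitting kappa to increment PDFs at different time lags is then one way to quantify the scale dependence of intermittency the abstract discusses.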
Genetic interactions: the missing links for a better understanding of cancer susceptibility, progression and treatment
It is increasingly clear that complex networks of relationships between genes and/or proteins govern neoplastic processes. Our understanding of these networks is expanded by the use of functional genomic and proteomic approaches in addition to computational modeling. Concurrently, whole-genome association scans and mutational screens of cancer genomes identify novel cancer genes. Together, these analyses have vastly increased our knowledge of cancer, in terms of both "part lists" and their functional associations. However, genetic interactions have hitherto only been studied in depth in model organisms and remain largely unknown for human systems. Here, we discuss the importance and potential benefits of identifying genetic interactions at the human genome level for creating a better understanding of cancer susceptibility and progression and for developing novel, effective anticancer therapies. We examine gene expression profiles in the presence and absence of co-amplification of the 8q24 and 20q13 chromosomal regions in breast tumors to illustrate the molecular consequences and complexity of genetic interactions and their role in tumorigenesis. Finally, we highlight current strategies for targeting tumor dependencies and outline potential matrix screening designs for uncovering molecular vulnerabilities in cancer cells.
A method of classification for multisource data in remote sensing based on interval-valued probabilities
An axiomatic approach to interval-valued (IV) probabilities is presented, where an IV probability is defined by a pair of set-theoretic functions satisfying pre-specified axioms. On the basis of this approach, the representation of statistical evidence and the combination of multiple bodies of evidence are emphasized. Although IV probabilities provide an innovative means for representing and combining evidential information, they make the decision process rather complicated and call for more intelligent decision strategies. The development of decision rules over IV probabilities is discussed from the viewpoint of statistical pattern recognition. The proposed method, the so-called evidential reasoning method, is applied to the ground-cover classification of a multisource data set consisting of Multispectral Scanner (MSS) data, Synthetic Aperture Radar (SAR) data, and digital terrain data such as elevation, slope, and aspect. By treating the data sources separately, the method is able to capture both parametric and nonparametric information and to combine them. The method is then applied to two separate cases of classifying multiband data obtained by a single sensor. In each case a set of multiple sources is obtained by dividing the dimensionally large data into smaller, more manageable pieces based on global statistical correlation information. By this divide-and-combine process, the method is able to utilize more features than the conventional maximum likelihood method.
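Combining multiple bodies of evidence, as the abstract describes for MSS, SAR, and terrain sources, is classically done with Dempster's rule over basic mass assignments. The sketch below is a standard illustration, not necessarily the paper's exact IV-probability combination rule, and the sensor names and mass values are made up:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic mass assignments
    whose focal elements are frozensets of class labels."""
    raw = {}
    conflict = 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            inter = a & b
            if inter:  # agreeing mass goes to the intersection
                raw[inter] = raw.get(inter, 0.0) + pa * pb
            else:      # disjoint focal elements: conflicting mass
                conflict += pa * pb
    k = 1.0 - conflict  # renormalize over the non-conflicting mass
    return {s: v / k for s, v in raw.items()}

# Two sensors weighing in on ground-cover classes {forest, water}:
F, W = frozenset({"forest"}), frozenset({"water"})
FW = F | W                      # total ignorance
m_mss = {F: 0.6, FW: 0.4}       # hypothetical multispectral evidence
m_sar = {F: 0.5, W: 0.3, FW: 0.2}  # hypothetical radar evidence
print(dempster_combine(m_mss, m_sar))
```

Treating each sensor (or each correlated band group) as a separate body of evidence and combining the results is the divide-and-combine pattern the abstract describes.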