Integrity Constraints Revisited: From Exact to Approximate Implication
Integrity constraints such as functional dependencies (FDs) and multi-valued
dependencies (MVDs) are fundamental in database schema design. Likewise,
probabilistic conditional independences (CI) are crucial for reasoning about
multivariate probability distributions. The implication problem studies whether
a set of constraints (antecedents) implies another constraint (consequent), and
has been investigated in both the database and the AI literature, under the
assumption that all constraints hold exactly. However, many applications today
consider constraints that hold only approximately. In this paper we define an
approximate implication as a linear inequality between the degree of
satisfaction of the antecedents and consequent, and we study the relaxation
problem: when does an exact implication relax to an approximate implication? We
use information theory to define the degree of satisfaction, and prove several
results. First, we show that any implication from a set of data dependencies
(MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most
quadratic in the number of variables; when the consequent is an FD, the factor
can be reduced to 1. Second, we prove that there exists an implication between
CIs that does not admit any relaxation; however, we prove that every
implication between CIs relaxes "in the limit". Finally, we show that the
implication problem for differential constraints in market basket analysis also
admits a relaxation with a factor equal to 1. Our results recover, and
sometimes extend, several previously known results about the implication
problem: implication of MVDs can be checked by considering only 2-tuple
relations, and the implication of differential constraints for frequent item
sets can be checked by considering only databases containing a single
transaction.
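The information-theoretic degree of satisfaction can be illustrated for the FD case: a functional dependency X → Y holds exactly in a relation iff the conditional entropy H(Y | X) of its empirical distribution is zero, and a small positive value measures approximate satisfaction. A minimal Python sketch (the helper and the toy relation are illustrative, not taken from the paper):

```python
from collections import Counter
from math import log2

def cond_entropy(rows, x_attrs, y_attrs):
    """H(Y | X) over the empirical distribution of a relation.

    An FD X -> Y holds exactly iff H(Y | X) = 0; a positive value
    quantifies how far the relation is from satisfying it."""
    n = len(rows)
    xy = Counter((tuple(r[a] for a in x_attrs), tuple(r[a] for a in y_attrs))
                 for r in rows)
    x = Counter(tuple(r[a] for a in x_attrs) for r in rows)
    # sum over (x,y): p(x,y) * log2( p(x) / p(x,y) )  =  H(Y|X)
    return sum(c / n * log2(x[k[0]] / c) for k, c in xy.items())

# Toy relation: does city -> zip hold?
R = [
    {"city": "Oslo", "zip": "0150"},
    {"city": "Oslo", "zip": "0150"},
    {"city": "Bergen", "zip": "5003"},
]
print(cond_entropy(R, ["city"], ["zip"]))  # 0.0: the FD holds exactly
```

Adding a tuple that violates the dependency makes H(Y | X) strictly positive, which is the kind of quantity the relaxation inequalities bound.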
Incorporating record subtyping into a relational data model
Most of the current proposals for new data models support the construction of heterogeneous sets. One of the major challenges for such data models is to provide strong typing in the presence of heterogeneity, which calls for capturing as much information as possible about the legal structural variants. We argue that the shape of some part of a heterogeneous scheme is often determined by the contents of some other part of the scheme. This relationship can be formalized by a certain type of integrity constraint we have called attribute dependency. Attribute dependencies combine the expressive power of general sums with a notation that fits into relational models. We show that attribute dependencies can be used, besides their application in type and integrity checking, to incorporate record subtyping into a relational model. Moreover, the notion of attribute dependency yields a stronger assertion than the traditional record subtyping rule, as it considers some refinements to be caused by others.
To examine the differences between attribute dependencies and traditional record subtyping, and to predict how attribute dependencies behave under transformations such as query language operations, we develop an axiom system for their derivation and prove it sound and complete. We further investigate the interaction between functional and attribute dependencies and examine an extended axiom system capturing both forms of dependencies.
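The core idea — that the value of one attribute determines which structural variant a record may take — can be made concrete with a toy validity check. All names below are hypothetical; the paper's formal definition of attribute dependencies is more general:

```python
# Hypothetical attribute dependency: the value of a "tag" attribute
# determines which other attributes a record must carry (the dependency
# table and attribute names are illustrative, not from the paper).
AD = {  # tag value -> required attribute set
    "circle": {"tag", "radius"},
    "rect": {"tag", "width", "height"},
}

def satisfies_ad(record, ad):
    """Check one record against the dependency: the set of attributes
    actually present must match the variant mandated by the value of
    the determining attribute."""
    return set(record) == ad.get(record.get("tag"), set())

rows = [
    {"tag": "circle", "radius": 2.0},
    {"tag": "rect", "width": 3, "height": 4},
    {"tag": "circle", "width": 3},   # wrong variant for "circle"
]
print([satisfies_ad(r, AD) for r in rows])  # [True, True, False]
```

This is the type-and-integrity-checking use of the constraint; the record-subtyping use additionally relates the variant tables of a scheme and its subscheme.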
Record Subtyping in Flexible Relations by means of Attribute Dependencies
The model of flexible relations supports heterogeneous
sets of tuples in a strongly typed way. The elegance of the standard relational model is preserved by using a single, generic scheme constructor. In each model supporting structural variants, the shape of some part of a heterogeneous scheme may be determined by the contents of some other part of the scheme. We formalize this relationship by a certain kind of integrity constraint we have called "attribute dependency" (AD). We motivate how ADs can be used, besides their application in type and integrity checking, to incorporate record subtyping into our extended relational model. Moreover, we show that ADs yield a stronger assertion than the traditional record subtyping rule as they consider interdependencies among refinements. We discuss how ADs are related to query processing and how they may help to identify redundant operations.
Nonextensive entropy approach to space plasma fluctuations and turbulence
Spatial intermittency in fully developed turbulence is an established feature
of astrophysical plasma fluctuations and in particular apparent in the
interplanetary medium by in situ observations. In this situation the classical
Boltzmann-Gibbs extensive thermo-statistics, applicable when microscopic
interactions and memory are short ranged, fails. Upon generalization of the
entropy function to nonextensivity, accounting for long-range interactions and
thus for correlations in the system, it is demonstrated that the corresponding
probability distributions (PDFs) are members of a family of specific power-law
distributions. In particular, the resulting theoretical bi-kappa functional
reproduces accurately the observed global leptokurtic, non-Gaussian shape of
the increment PDFs of characteristic solar wind variables on all scales.
Gradual decoupling is obtained by enhancing the spatial separation scale
corresponding to increasing kappa-values in case of slow solar wind conditions
where a Gaussian is approached in the limit of large scales. In contrast, the
scaling properties in the high speed solar wind are predominantly governed by
the mean energy or variance of the distribution. The PDFs of solar wind scalar
field differences are computed from WIND and ACE data for different time-lags
and bulk speeds and analyzed within the nonextensive theory. Consequently,
nonlocality in fluctuations, related to both, turbulence and its large scale
driving, should be related to long-range interactions in the context of
nonextensive entropy generalization, providing fundamentally the physical
background of the observed scale dependence of fluctuations in intermittent
space plasmas.
Comment: 21 pages, 8 figures, accepted for publication, to appear in Advances
in Geosciences 2, chapter 04, 2006 (with minor corrections).
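The family of power-law distributions the abstract refers to can be sketched with the one-dimensional kappa distribution; the convention and symbol names below are a common textbook choice, not taken from the paper, whose bi-kappa functional is more elaborate:

```python
from math import lgamma, exp, pi, sqrt

def kappa_pdf(x, kappa, theta=1.0):
    """1-D kappa (power-law) distribution in one common convention:
    f(x) proportional to [1 + x^2/(kappa*theta^2)]^(-kappa).
    As kappa -> infinity this tends to the Gaussian exp(-x^2/theta^2);
    small kappa gives the heavy, leptokurtic tails described above."""
    # lgamma avoids the overflow gamma() would hit for large kappa
    norm = exp(lgamma(kappa) - lgamma(kappa - 0.5)) / (sqrt(pi * kappa) * theta)
    return norm * (1.0 + x * x / (kappa * theta * theta)) ** (-kappa)

# A small kappa puts far more probability in the tails than the
# near-Gaussian large-kappa limit:
print(kappa_pdf(3.0, kappa=2.0), kappa_pdf(3.0, kappa=500.0))
```

Fitting kappa to increment PDFs at different time lags is then one way to quantify the scale dependence of intermittency the abstract discusses.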
Genetic interactions: the missing links for a better understanding of cancer susceptibility, progression and treatment
It is increasingly clear that complex networks of relationships between genes and/or proteins govern neoplastic processes. Our understanding of these networks is expanded by the use of functional genomic and proteomic approaches in addition to computational modeling. Concurrently, whole-genome association scans and mutational screens of cancer genomes identify novel cancer genes. Together, these analyses have vastly increased our knowledge of cancer, in terms of both "part lists" and their functional associations. However, genetic interactions have hitherto only been studied in depth in model organisms and remain largely unknown for human systems. Here, we discuss the importance and potential benefits of identifying genetic interactions at the human genome level for creating a better understanding of cancer susceptibility and progression and for developing novel, effective anticancer therapies. We examine gene expression profiles in the presence and absence of co-amplification of the 8q24 and 20q13 chromosomal regions in breast tumors to illustrate the molecular consequences and complexity of genetic interactions and their role in tumorigenesis. Finally, we highlight current strategies for targeting tumor dependencies and outline potential matrix screening designs for uncovering molecular vulnerabilities in cancer cells.
A method of classification for multisource data in remote sensing based on interval-valued probabilities
An axiomatic approach to interval-valued (IV) probabilities is presented, where an IV probability is defined by a pair of set-theoretic functions satisfying pre-specified axioms. On the basis of this approach, the representation of statistical evidence and the combination of multiple bodies of evidence are emphasized. Although IV probabilities provide an innovative means for representing and combining evidential information, they make the decision process rather complicated and call for more intelligent decision strategies. The development of decision rules over IV probabilities is discussed from the viewpoint of statistical pattern recognition. The proposed method, the so-called evidential reasoning method, is applied to the ground-cover classification of a multisource data set consisting of Multispectral Scanner (MSS) data, Synthetic Aperture Radar (SAR) data, and digital terrain data such as elevation, slope, and aspect. By treating the data sources separately, the method is able to capture both parametric and nonparametric information and to combine them. The method is then applied to two separate cases of classifying multiband data obtained by a single sensor. In each case a set of multiple sources is obtained by dividing the dimensionally large data into smaller, more manageable pieces based on global statistical correlation information. By this divide-and-combine process, the method is able to utilize more features than the conventional maximum likelihood method.
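Combining multiple bodies of evidence, as the abstract describes for MSS, SAR, and terrain sources, is classically done with Dempster's rule over basic mass assignments. The sketch below is a standard illustration, not necessarily the paper's exact IV-probability combination rule, and the sensor names and mass values are made up:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic mass assignments
    whose focal elements are frozensets of class labels."""
    raw = {}
    conflict = 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            inter = a & b
            if inter:  # agreeing mass goes to the intersection
                raw[inter] = raw.get(inter, 0.0) + pa * pb
            else:      # disjoint focal elements: conflicting mass
                conflict += pa * pb
    k = 1.0 - conflict  # renormalize over the non-conflicting mass
    return {s: v / k for s, v in raw.items()}

# Two sensors weighing in on ground-cover classes {forest, water}:
F, W = frozenset({"forest"}), frozenset({"water"})
FW = F | W                      # total ignorance
m_mss = {F: 0.6, FW: 0.4}       # hypothetical multispectral evidence
m_sar = {F: 0.5, W: 0.3, FW: 0.2}  # hypothetical radar evidence
print(dempster_combine(m_mss, m_sar))
```

Treating each sensor (or each correlated band group) as a separate body of evidence and combining the results is the divide-and-combine pattern the abstract describes.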