19 research outputs found

    Acta Cybernetica: Volume 9, Number 4.


    Deriving Probabilistic Databases with Inference Ensembles

    Many real-world applications deal with uncertain or missing data, prompting a surge of activity in the area of probabilistic databases. A shortcoming of prior work is the assumption that an appropriate probabilistic model, along with the necessary probability distributions, is given. We address this shortcoming by presenting a framework for learning a set of inference ensembles, termed meta-rule semi-lattices (MRSL), from the complete portion of the data. We use the MRSL to infer probability distributions for missing data, and demonstrate experimentally that high accuracy is achieved when a single attribute value is missing per tuple. We next propose an inference algorithm based on Gibbs sampling that accurately predicts the probability distribution for multiple missing values. We also develop an optimization that greatly improves the performance of multi-attribute inference for collections of tuples while maintaining high accuracy. Finally, we develop an experimental framework to evaluate the efficiency and accuracy of our approach.
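
    As a rough illustration of the Gibbs-sampling idea mentioned in this abstract, the sketch below resamples each missing categorical attribute of a tuple from conditional distributions estimated on the complete portion of the data. The empirical conditional-frequency model stands in for the paper's MRSL, and all function names and the toy data are assumptions made for the example.

```python
# Illustrative sketch only: Gibbs sampling over the missing categorical
# attributes of a tuple. The empirical conditional-frequency model stands in
# for the paper's MRSL; all names and the toy data are assumptions.
import random
from collections import Counter

def conditional_dist(complete_rows, target, given):
    """Estimate P(target | given) from complete tuples by frequency counts."""
    counts = Counter(row[target] for row in complete_rows
                     if all(row[a] == v for a, v in given.items()))
    if not counts:  # back off to the marginal if the context was never observed
        counts = Counter(row[target] for row in complete_rows)
    total = sum(counts.values())
    return {value: c / total for value, c in counts.items()}

def gibbs_impute(complete_rows, partial_row, missing, iters=500, burn_in=100):
    """Return an estimated distribution for each missing attribute of one tuple."""
    state = dict(partial_row)
    for attr in missing:  # initialise missing values from the marginals
        state[attr] = random.choice([row[attr] for row in complete_rows])
    samples = {attr: Counter() for attr in missing}
    for it in range(iters):
        for attr in missing:  # resample each missing attribute given the rest
            given = {a: v for a, v in state.items() if a != attr and v is not None}
            dist = conditional_dist(complete_rows, attr, given)
            values, weights = zip(*dist.items())
            state[attr] = random.choices(values, weights)[0]
        if it >= burn_in:
            for attr in missing:
                samples[attr][state[attr]] += 1
    return {attr: {v: c / sum(cnt.values()) for v, c in cnt.items()}
            for attr, cnt in samples.items()}

# Toy usage: infer the missing country of a tuple whose city is known.
complete = [{"city": "Berlin", "country": "DE"},
            {"city": "Paris",  "country": "FR"},
            {"city": "Berlin", "country": "DE"}]
print(gibbs_impute(complete, {"city": "Berlin", "country": None}, ["country"]))
```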

    Methodological perspectives and research implications

    "#1994"--handwritten on coverOverall statement of responsibility for the multi-volume set reads: Hayward R. Alker, Jr., Lincoln P. Bloomfield and Nazli Choucri"November 1974."Includes bibliographical referencesSupported by the Dept. of State 1722-32008

    Dependence modeling with applications in financial econometrics

    The amount of data available in banking, finance and economics steadily increases due to ongoing technological progress and continuing digitalization. A key element of many econometric models for analyzing these data is a set of methods for assessing dependencies, cross-sectionally as well as intertemporally. For this reason, the thesis is centered around statistical and econometric methods for dependence modeling with applications in financial econometrics.

    The first part of this cumulative dissertation consists of three contributions. The first contribution provides a thorough explanation of the partial copula, a natural generalization of the partial correlation coefficient, and investigates several of its properties. In the second contribution, a different multivariate generalization of the partial correlation, the partial vine copula (PVC), is introduced. The PVC is a specific simplified vine copula (SVC) consisting of bivariate higher-order partial copulas, which are copula-based generalizations of sequentially computed partial correlations. Several properties of the PVC are presented, and it is shown that, if SVCs are considered as approximations of multivariate distributions, the PVC plays a special role as the limit of stepwise estimators. The third contribution introduces statistical tests for the simplifying assumption with a special focus on high-dimensional vine copulas. We propose a computationally feasible test for the simplifying assumption in high dimensions, which is successfully applied to data sets with up to 49 dimensions. The novel test procedure is based on a decision tree used to identify the possibly strongest violation of the simplifying assumption. The asymptotic distribution of the test statistic is derived under consideration of estimation uncertainty in the copula parameters. The finite-sample performance is analyzed in an extensive simulation study, and the results show that the power of the test decreases only slightly in the dimensionality of the test problem.

    In the second part of the dissertation, the assessment of risk measures is studied with a special focus on the financial return data used for estimation. It is shown that the choice of the sampling scheme can greatly affect the results of risk assessment procedures if the assessment frequency and forecasting horizon are fixed. Specifically, we study sequences of variance estimates and show that they exhibit spurious seasonality if the assessment frequency is higher than the sampling frequency of non-overlapping return data. The root cause of spurious seasonality is identified by deriving the theoretical autocorrelation function of sequences of variance estimates under general assumptions. To overcome spurious seasonality, alternative variance estimators based on overlapping return data are suggested.

    The third part of the dissertation is about state space methods for systems with lagged states in the measurement equation. Recently, a low-dimensional modified Kalman filter and smoother for such systems was proposed in the literature. Special attention is paid to the modified Kalman smoother, for which it is shown that the suggested smoother in general does not minimize the mean squared error (MSE). The correct MSE-minimizing modified Kalman smoother is derived, and computationally more efficient smoothing algorithms are discussed. Finally, a comparison of the competing smoothers with regard to the MSE is performed.
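
    To make the sampling-scheme point concrete, here is a minimal sketch, under assumed simulated data, of variance estimates built from non-overlapping versus overlapping aggregated returns. It is not the thesis' estimators; the return series and all names are illustrative assumptions.

```python
# Illustrative sketch (not the thesis' estimators): variance estimates built
# from non-overlapping versus overlapping aggregated returns. The simulated
# return series and all names are assumptions made for the example.
import numpy as np

rng = np.random.default_rng(0)
log_returns = rng.normal(0.0, 0.01, size=2_000)  # simulated daily log returns
horizon = 5                                      # e.g. a weekly forecasting horizon

# Non-overlapping scheme: aggregate returns block by block.
trimmed = log_returns[: len(log_returns) // horizon * horizon]
nonoverlap_returns = trimmed.reshape(-1, horizon).sum(axis=1)

# Overlapping scheme: every day, sum the most recent `horizon` daily returns.
overlap_returns = np.convolve(log_returns, np.ones(horizon), mode="valid")

def rolling_var(x, window):
    """Sequence of sample variances over a moving estimation window."""
    return np.array([x[i - window:i].var(ddof=1) for i in range(window, len(x) + 1)])

# If such estimates are assessed more frequently than the non-overlapping
# returns are sampled, the estimate can only change once per block, which is
# the mechanism behind the spurious seasonality discussed in the abstract.
print(rolling_var(nonoverlap_returns, window=52)[:3])
print(rolling_var(overlap_returns, window=250)[:3])
```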

    Computational Complexity of Strong Admissibility for Abstract Dialectical Frameworks

    Abstract dialectical frameworks (ADFs) have been introduced as a formalism for modeling and evaluating argumentation that allows general logical satisfaction conditions. The different criteria used to settle the acceptance of arguments are called semantics. Semantics of ADFs have so far mainly been defined based on the concept of admissibility. Recently, the notion of strong admissibility has been introduced for ADFs. In the current work we study the computational complexity of the following reasoning tasks under strong admissibility semantics: 1. the credulous/skeptical decision problem; 2. the verification problem; 3. the strong justification problem; and 4. the problem of finding a smallest witness of strong justification of a queried argument.
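
    For readers unfamiliar with ADFs, the following minimal sketch shows one way to encode an ADF in code and to naively check whether an argument's acceptance condition is decisively satisfied under a partial interpretation, a building block of admissibility-style reasoning. It is an illustration only, not the paper's algorithms, and the example framework is made up.

```python
# Minimal illustrative encoding of an ADF (not the paper's algorithms): each
# argument has parents and an acceptance condition over its parents' truth
# values. The naive check below tests whether an acceptance condition is
# decisively satisfied under a partial interpretation by enumerating all
# two-valued completions of the undecided parents.
from itertools import product

# Acceptance conditions map a dict {parent: bool} to bool.
adf = {
    "a": {"parents": [],         "cond": lambda v: True},                    # always acceptable
    "b": {"parents": ["a"],      "cond": lambda v: v["a"]},                  # supported by a
    "c": {"parents": ["a", "b"], "cond": lambda v: v["a"] and not v["b"]},   # attacked by b
}

def decisively_satisfied(adf, arg, partial):
    """True iff the acceptance condition of `arg` holds in every two-valued
    completion of its undecided parents; `partial` maps some arguments to bools."""
    undecided = [p for p in adf[arg]["parents"] if p not in partial]
    for completion in product([True, False], repeat=len(undecided)):
        valuation = dict(partial)
        valuation.update(zip(undecided, completion))
        if not adf[arg]["cond"](valuation):
            return False
    return True

print(decisively_satisfied(adf, "a", {}))           # True: condition is a tautology
print(decisively_satisfied(adf, "b", {}))           # False: depends on a
print(decisively_satisfied(adf, "b", {"a": True}))  # True once a is decided
```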

    On the Discovery of Semantically Meaningful SQL Constraints from Armstrong Samples: Foundations, Implementation, and Evaluation

    A database is said to be C-Armstrong for a finite set Σ of data dependencies in a class C if the database satisfies all data dependencies in Σ and violates all data dependencies in C that are not implied by Σ. Armstrong databases are therefore concise, user-friendly representations of abstract data dependencies that can be used to judge, justify, convey, and test the understanding of database design choices. Indeed, an Armstrong database satisfies exactly those data dependencies that are considered meaningful by the current design choice Σ. Structural and computational properties of Armstrong databases have been deeply investigated in Codd's Turing Award winning relational model of data. Armstrong databases have been incorporated in approaches towards relational database design. They have also been found useful for the elicitation of requirements, the semantic sampling of existing databases, and the specification of schema mappings.

    This research establishes a toolbox of Armstrong databases for SQL data. This is challenging because SQL data can contain null marker occurrences in columns declared NULL and may contain duplicate rows. Thus, the existing theory of Armstrong databases only applies to idealized instances of SQL data, that is, instances without null marker occurrences and without duplicate rows. For the thesis, two popular interpretations of null markers are considered: the no information interpretation used in SQL, and the exists but unknown interpretation by Codd. Furthermore, the study is limited to the popular class C of functional dependencies. However, the presence of duplicate rows means that the class of uniqueness constraints is no longer subsumed by the class of functional dependencies, in contrast to the relational model of data.

    As a first contribution, a provably correct algorithm is developed that computes Armstrong databases for an arbitrarily given finite set of uniqueness constraints and functional dependencies. This contribution is based on axiomatic, algorithmic and logical characterizations of the associated implication problem that are also established in this thesis. While the problem of deciding whether a given database is Armstrong for a given set of such constraints is precisely exponential, our algorithm computes an Armstrong database with a number of rows that is at most quadratic in the number of rows of a minimum-sized Armstrong database.

    As a second contribution, the algorithms are implemented in the form of a design tool. Users of the tool can therefore inspect Armstrong databases to analyze their current design choice Σ. Intuitively, Armstrong databases are useful for the acquisition of semantically meaningful constraints if users can recognize the actual meaningfulness of constraints that they incorrectly perceived as meaningless before inspecting an Armstrong database. As a final contribution, measures are introduced that formalize the term "useful", and detailed experiments show that Armstrong tables, as computed by the tool, are indeed useful.

    In summary, this research establishes a toolbox of Armstrong databases that can be applied by database designers to concisely visualize constraints on SQL data. Such support can lead to database designs that guarantee efficient data management in practice.
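
    As a small illustration of the constraint classes involved, the sketch below checks a uniqueness constraint and a functional dependency over a table with null markers and duplicate rows. The agreement rule used (two rows agree on a column only when both values are non-null and equal) is an assumption for the example, not necessarily the exact semantics developed in the thesis, and all names are illustrative.

```python
# Hypothetical sketch of constraint checking on SQL-like data with null markers
# and duplicate rows. The agreement rule (both values non-null and equal) is one
# possible reading, not necessarily the thesis' definition.
NULL = None

def agree(row1, row2, attrs):
    """Two rows agree on attrs iff every listed value is non-null and equal."""
    return all(row1[a] is not NULL and row1[a] == row2[a] for a in attrs)

def satisfies_fd(table, lhs, rhs):
    """Functional dependency lhs -> rhs: rows agreeing on lhs must agree on rhs."""
    return all(agree(r, s, rhs)
               for i, r in enumerate(table) for s in table[i + 1:]
               if agree(r, s, lhs))

def satisfies_uc(table, key):
    """Uniqueness constraint: no two distinct rows agree on all key columns."""
    return not any(agree(r, s, key)
                   for i, r in enumerate(table) for s in table[i + 1:])

table = [
    {"emp": "Ann", "dept": "R&D", "city": "Wellington"},
    {"emp": "Bob", "dept": "R&D", "city": "Wellington"},
    {"emp": "Ann", "dept": NULL,  "city": "Auckland"},   # repeated emp, null dept
]
print(satisfies_fd(table, ["dept"], ["city"]))  # True under this agreement rule
print(satisfies_uc(table, ["emp"]))             # False: "Ann" appears twice
```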