On Independence Atoms and Keys
Uniqueness and independence are two fundamental properties of data. Enforcing
them in database systems can lead to higher-quality data, faster data service
response times, better data-driven decision making, and knowledge discovery
from data. These applications can be unlocked effectively by providing
efficient solutions to the underlying implication problems of keys and
independence atoms. Indeed, for the sole class of keys and the sole class of
independence atoms, the associated finite and general implication problems
coincide and enjoy simple axiomatizations. However, the situation changes
drastically when keys and independence atoms are combined. We show that the
finite and the general implication problems already differ for keys and
unary independence atoms. Furthermore, we establish a finite axiomatization for
the general implication problem, and show that the finite implication problem
does not enjoy a k-ary axiomatization for any k.
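To make the two constraint classes concrete, the following sketch (illustrative Python, not drawn from the paper) checks a key and an independence atom on a small relation represented as a list of attribute-value dictionaries. It also exhibits a relation in which the independence atom A ⊥ B holds while A is not a key.

```python
from itertools import product

def satisfies_key(rows, key):
    """A key X holds if no two distinct tuples agree on all attributes in X."""
    seen = set()
    for t in rows:
        proj = tuple(t[a] for a in key)
        if proj in seen:
            return False
        seen.add(proj)
    return True

def satisfies_independence_atom(rows, x, y):
    """X ⊥ Y holds if every X-value occurring in the relation appears
    together with every Y-value occurring in the relation."""
    xs = {tuple(t[a] for a in x) for t in rows}
    ys = {tuple(t[a] for a in y) for t in rows}
    pairs = {(tuple(t[a] for a in x), tuple(t[a] for a in y)) for t in rows}
    return all((vx, vy) in pairs for vx, vy in product(xs, ys))

# A relation over attributes A, B in which A ⊥ B holds but A is not a key:
r = [{"A": 0, "B": 0}, {"A": 0, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1}]
print(satisfies_independence_atom(r, ["A"], ["B"]))  # True
print(satisfies_key(r, ["A"]))                       # False
```

The finite relation r makes the tension visible: independence forces value combinations to repeat, while a key forbids repetition on its attributes.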
On Shapley Value in Data Assemblage Under Independent Utility
In many applications, an organization may want to acquire data from many data
owners. Data marketplaces allow data owners to produce the data assemblages
needed by data buyers through coalition. To encourage coalitions to produce
data, it is critical to allocate revenue to data owners in a fair manner
according to their contributions. Although Shapley fairness and its
alternatives have been well explored in the literature to facilitate revenue
allocation in data assemblage, computing the exact Shapley value for many data
owners and large assembled data sets remains challenging due to the
combinatorial nature of the Shapley value. In this paper, we explore the
decomposability of utility in data assemblage by formulating the independent
utility assumption. We argue that independent utility enjoys many
applications. Moreover, we identify interesting properties of independent
utility and develop fast computation techniques for the exact Shapley value
under independent utility. Our experimental results on a series of benchmark
data sets show that our new approach not only guarantees the exactness of the
Shapley value, but also achieves faster computation by orders of magnitude.
Comment: Accepted by VLDB 202
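The combinatorial bottleneck mentioned in the abstract is visible in the very definition of the Shapley value. The sketch below is a brute-force baseline over all orderings of the owners, not the paper's algorithm; the toy utility u is a hypothetical decomposable utility, used only to illustrate that when utility is a sum of independent parts, each owner's Shapley value can be read off part by part.

```python
from itertools import permutations

def shapley(players, utility):
    """Exact Shapley value by averaging each player's marginal contribution
    over all |N|! orderings -- exponential, hence the bottleneck."""
    values = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = utility(frozenset(coalition))
            coalition.add(p)
            values[p] += utility(frozenset(coalition)) - before
    n = len(orders)
    return {p: v / n for p, v in values.items()}

# Hypothetical utility that decomposes into independent parts: owners 1 and 2
# jointly supply one data instance worth 6; owner 3 alone supplies one worth 4.
def u(s):
    return (6 if {1, 2} <= s else 0) + (4 if 3 in s else 0)

print(shapley([1, 2, 3], u))  # {1: 3.0, 2: 3.0, 3: 4.0}
```

Because the Shapley value is additive across independent games, owners 1 and 2 split the value of their joint instance, while owner 3 receives the full value of the instance only they contribute; exploiting such decompositions is what avoids the factorial enumeration.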
Time dimension in the relational model
Call number: LD2668 .T4 CMSC 1987 C52. Master of Science, Computing and Information Science
Object-oriented data modeling
The object-oriented paradigm models local behavior and, to a lesser extent, the structure of a problem. Semantic data models describe structure and semantics. This thesis unifies the behavioral focus of the object-oriented paradigm with the structural and semantic focus of semantic data models. The approach contains expressive abstractions to model static and derived data, semantics, and behavior. The abstractions keep the data model closer to the problem domain, and can be translated into a relational (or other) implementation. The thesis makes six contributions. First, a comprehensive set of data structuring abstractions is described. Second, the abstractions are compared to the entity-relationship and relational models. Third, semantic information inherent in the functional representation of the abstractions is identified. Fourth, a set of behavioral abstractions is described. Fifth, an algorithm that describes the dynamics between mathematically derived attributes of cooperating objects is presented. Sixth, weaknesses of object-oriented programming languages are identified.
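The distinction between static and derived data, and the dependence of one object's derived attributes on those of cooperating objects, can be sketched as follows (the class names and attributes are hypothetical illustrations, not taken from the thesis):

```python
class LineItem:
    """Static data (quantity, unit_price) plus a derived attribute (cost)."""
    def __init__(self, quantity, unit_price):
        self.quantity = quantity        # static, stored
        self.unit_price = unit_price    # static, stored

    @property
    def cost(self):
        # derived: mathematically computed from stored data on demand
        return self.quantity * self.unit_price

class Order:
    """A cooperating object whose derived attribute depends on its parts."""
    def __init__(self, items):
        self.items = items

    @property
    def total(self):
        # derived from the derived attributes of cooperating LineItem objects
        return sum(item.cost for item in self.items)

order = Order([LineItem(2, 10.0), LineItem(1, 5.0)])
print(order.total)  # 25.0
```

Updating any stored attribute automatically changes the derived values on the next read, which is the kind of dynamics between derived attributes that the thesis's algorithm addresses.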
The impact of the organization of electronic documents on the interpretation of organic and recorded information in a decentralized context
Talk given at the colloquium "Le numérique : impact sur le cycle de vie du document", organized at the Université de Montréal by EBSI and ENSSIB, October 13-15, 2004. In a decentralized records-management context, the organization of electronic documents is under the direct control of employees. It is well established that employees organize these electronic documents according to highly personal criteria that are most often incomprehensible to their colleagues, making the documents difficult to locate and interpret. This talk examines in more depth the link between the way electronic documents are organized and their interpretation, before concluding with the resulting research needs and the implications of the findings for managing the document life cycle.
Histogram techniques for cost estimation in query optimization.
Yu Xiaohui. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 98-115). Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 2 --- Related Work --- p.6
Chapter 2.1 --- Query Optimization --- p.6
Chapter 2.2 --- Query Rewriting --- p.8
Chapter 2.2.1 --- Optimizing Multi-Block Queries --- p.8
Chapter 2.2.2 --- Semantic Query Optimization --- p.13
Chapter 2.2.3 --- Query Rewriting in Starburst --- p.15
Chapter 2.3 --- Plan Generation --- p.16
Chapter 2.3.1 --- Dynamic Programming Approach --- p.16
Chapter 2.3.2 --- Join Query Processing --- p.17
Chapter 2.3.3 --- Queries with Aggregates --- p.23
Chapter 2.4 --- Statistics and Cost Estimation --- p.24
Chapter 2.5 --- Histogram Techniques --- p.27
Chapter 2.5.1 --- Definitions --- p.28
Chapter 2.5.2 --- Trivial Histograms --- p.29
Chapter 2.5.3 --- Heuristic-based Histograms --- p.29
Chapter 2.5.4 --- V-Optimal Histograms --- p.32
Chapter 2.5.5 --- Wavelet-based Histograms --- p.35
Chapter 2.5.6 --- Multidimensional Histograms --- p.35
Chapter 2.5.7 --- Global Histograms --- p.37
Chapter 3 --- New Histogram Techniques --- p.39
Chapter 3.1 --- Piecewise Linear Histograms --- p.39
Chapter 3.1.1 --- Construction --- p.41
Chapter 3.1.2 --- Usage --- p.43
Chapter 3.1.3 --- Error Measures --- p.43
Chapter 3.1.4 --- Experiments --- p.45
Chapter 3.1.5 --- Conclusion --- p.51
Chapter 3.2 --- A-Optimal Histograms --- p.54
Chapter 3.2.1 --- A-Optimal(mean) Histograms --- p.56
Chapter 3.2.2 --- A-Optimal(median) Histograms --- p.58
Chapter 3.2.3 --- A-Optimal(median-cf) Histograms --- p.59
Chapter 3.2.4 --- Experiments --- p.60
Chapter 4 --- Global Histograms --- p.64
Chapter 4.1 --- Wavelet-based Global Histograms --- p.65
Chapter 4.1.1 --- Wavelet-based Global Histograms I --- p.66
Chapter 4.1.2 --- Wavelet-based Global Histograms II --- p.68
Chapter 4.2 --- Piecewise Linear Global Histograms --- p.70
Chapter 4.3 --- A-Optimal Global Histograms --- p.72
Chapter 4.3.1 --- Experiments --- p.74
Chapter 5 --- Dynamic Maintenance --- p.81
Chapter 5.1 --- Problem Definition --- p.83
Chapter 5.2 --- Refining Bucket Coefficients --- p.84
Chapter 5.3 --- Restructuring --- p.86
Chapter 5.4 --- Experiments --- p.91
Chapter 6 --- Conclusions --- p.95
Bibliography --- p.9
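As background to the histogram families catalogued in this thesis, a minimal equi-depth histogram and the uniform-within-bucket selectivity estimate it supports can be sketched as follows (illustrative Python over integer data, not the thesis's algorithms):

```python
def equi_depth_histogram(values, buckets):
    """Build an equi-depth histogram: each bucket covers roughly
    n/buckets values. Returns per-bucket (low, high) bounds and counts."""
    data = sorted(values)
    n = len(data)
    bounds, counts = [], []
    for i in range(buckets):
        lo = i * n // buckets
        hi = (i + 1) * n // buckets
        bounds.append((data[lo], data[hi - 1]))
        counts.append(hi - lo)
    return bounds, counts

def estimate_selectivity(bounds, counts, pred_hi, n):
    """Estimate the selectivity of 'value <= pred_hi', assuming a uniform
    distribution of integer values within each bucket (the standard
    histogram assumption behind cost estimation)."""
    sel = 0.0
    for (lo, hi), c in zip(bounds, counts):
        if pred_hi >= hi:
            sel += c                                   # bucket fully covered
        elif pred_hi >= lo:
            sel += c * (pred_hi - lo + 1) / (hi - lo + 1)  # partial bucket
    return sel / n

data = list(range(1, 101))               # 100 distinct values 1..100
bounds, counts = equi_depth_histogram(data, 4)
print(estimate_selectivity(bounds, counts, 50, len(data)))  # 0.5
```

The optimizer multiplies such per-predicate selectivities into cardinality estimates, which is why the choice of bucketing scheme (equi-depth, V-optimal, wavelet-based, and so on) directly affects plan quality.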
Databases and Artificial Intelligence
This chapter presents some noteworthy works which show the links between Databases and Artificial Intelligence. More precisely, after an introduction, Sect. 2 presents the seminal work on "logic and databases", which opened a wide research field at the intersection of databases and artificial intelligence. The main results concern the use of logic for database modeling. Then, in Sect. 3, we present different problems raised by integrity constraints and the way logic contributed to formalizing and solving them. In Sect. 4, we sum up some works related to queries with preferences. Section 5 finally focuses on the problem of database integration.
Using crowdsourced geospatial data to aid in nuclear proliferation monitoring
In 2014, a Defense Science Board Task Force was convened to assess and explore new technologies that would aid in nuclear proliferation monitoring. One of its recommendations was for the Director of National Intelligence to explore ways that crowdsourced geospatial imagery technologies could aid existing governmental efforts. Our research builds directly on this recommendation and provides feedback on some of the most successful examples of crowdsourced geospatial data (CGD). As of 2016, Special Operations Command (SOCOM) has assumed the new role of primary U.S. agency responsible for counter-proliferation. Historically, this institution has always relied upon other organizations for the execution of its myriad mission sets. SOCOM's unique ability to build relationships makes it particularly suited to the task of harnessing CGD technologies and employing them in the capacity that our research recommends. Furthermore, CGD is a low-cost, high-impact tool that is already being employed by commercial companies and non-profit groups around the world. By employing CGD, a wider whole-of-government effort can be created that provides a long-term, cohesive engagement plan for facilitating a multi-faceted nuclear proliferation monitoring process.
http://archive.org/details/usingcrowdsource1094551570
Major, United States Army
Approved for public release; distribution is unlimited
Disjunctively incomplete information in relational databases: modeling and related issues
In this dissertation, the issues related to information incompleteness in relational databases are explored. The dissertation can be divided into two parts. The first part extends the relational natural join operator and the update operations of insertion and deletion to I-tables, an extended relational model representing inclusively indefinite and maybe information, in a semantically correct manner. Rudimentary or naive algorithms for computing natural joins on I-tables require an exponential number of pair-up operations, and a number of block accesses proportional to the size of the I-tables, due to the combinatorial nature of natural joins on I-tables; the problem thus becomes intractable for large I-tables. An algorithm for computing natural joins under the extended model is proposed that reduces the number of pair-up operations to a linear order of complexity in general, and to a polynomial order of complexity in the worst case, with respect to the size of the I-tables. In addition, this algorithm also reduces the number of block accesses to a linear order of complexity with respect to the size of the I-tables.
The second part concerns the modeling aspect of incomplete databases. An extended relational model, called the E-table, is proposed. The E-table is capable of representing exclusively disjunctive information, that is, disjunctions of the form P1 ‖ P2 ‖ ··· ‖ Pn, where ‖ denotes a generalized logical exclusive or indicating that exactly one of the Pi's can be true. The information content of an E-table is precisely defined, and the relational operators of selection, projection, difference, union, intersection, and Cartesian product are extended to E-tables in a semantically correct manner. Conditions under which redundancies could arise due to the presence of exclusively disjunctive information are characterized, and a procedure for resolving such redundancies is presented. Finally, the dissertation concludes with a discussion of directions for further research in the area of incomplete information modeling. In particular, a sketch of a relational model, the IE-table (Inclusive and Exclusive table), for representing both inclusively and exclusively disjunctive information is provided.
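The possible-worlds reading of exclusive disjunction behind the E-table can be sketched as follows (illustrative Python, not the dissertation's formalism): each disjunctive tuple contributes exactly one of its alternatives to a world, and a fact is a certain answer only if it holds in every world.

```python
from itertools import product

def possible_worlds(e_table):
    """An E-table is modeled here as a list of alternative-sets; exactly one
    alternative per set is true, so each world picks one from every set."""
    return [set(choice) for choice in product(*e_table)]

def certain_answers(e_table, predicate):
    """Tuples that satisfy the predicate in EVERY possible world."""
    worlds = possible_worlds(e_table)
    return set.intersection(*({t for t in w if predicate(t)} for w in worlds))

# Alice's office is 101 or 102 (exclusively one); Bob's office is known.
e = [
    {("Alice", 101), ("Alice", 102)},
    {("Bob", 103)},
]
print(sorted(certain_answers(e, lambda t: t[1] >= 100)))  # [('Bob', 103)]
```

Although every world contains some fact about Alice, no single Alice tuple holds in all worlds, so only Bob's tuple is certain; extending the relational operators to E-tables amounts to computing such answers without materializing the exponentially many worlds.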