308,778 research outputs found

Streaming Weighted Sampling over Join Queries

Author: Cormode G
Ma Q
Shanghooshabad AM
Shekelyan M
Triantafillou P
Publication venue: 26th International Conference on Extending Database Technology (EDBT 2023)
Publication date: 02/11/2022

Join queries are a fundamental database tool, capturing a range of tasks that involve linking heterogeneous data sources. However, with massive table sizes, it is often impractical to keep these in memory, and we can only take one or a few streaming passes over them. Moreover, building out the full join result (e.g., linking heterogeneous data sources along quasi-identifiers) can lead to a combinatorial explosion of results due to many-to-many links. Random sampling is a natural tool to boil this oversized result down to a representative subset with well-understood statistical properties, but it turns out to be a challenging task due to the combinatorial nature of the sampling domain. Existing techniques in the literature focus solely on the setting with tabular data residing in main memory, and do not address aspects such as stream operation, weighted sampling and more general join operators that are urgently needed in a modern data processing context. The main contribution of this work is to meet these needs with more lightweight practical approaches. First, a bijection between the sampling problem and a graph problem is introduced to support weighted sampling and common join operators. Second, the sampling techniques are refined to minimise the number of streaming passes. Third, techniques are presented to deal with very large tables under limited memory. Finally, the proposed techniques are compared to existing approaches that rely on database indices, and the results indicate substantial memory savings, reduced runtimes for ad-hoc queries and competitive amortised runtimes.

Queen Mary Research Online

Towards G2G: Systems of Technology Database Systems

Author: Bell David
Maluf David A.

We present an approach and methodology for developing Government-to-Government (G2G) Systems of Technology Database Systems. G2G will deliver technologies for distributed and remote integration of technology data for internal use in analysis and planning as well as for external communications. G2G enables NASA managers, engineers, operational teams and information systems to "compose" technology roadmaps and plans by selecting, combining, extending, specializing and modifying components of technology database systems. G2G will interoperate information and knowledge distributed across the organizational entities involved, which is ideal for NASA's future Exploration Enterprise. Key contributions of the G2G system will include the creation of an integrated approach to sustain effective management of technology investments that supports the ability of various technology database systems to be independently managed. The integration technology will comply with emerging open standards. Applications can thus be customized for local needs while enabling an integrated management of technology approach that serves the global needs of NASA. The G2G capabilities will use NASA's breakthrough in database "composition" and integration technology, will use and advance emerging open standards, and will use commercial information technologies to enable effective System of Technology Database systems.

NASA Technical Reports Server

A Unified Approach for Indexed and Non-Indexed Spatial Joins

Author: Arge Lars
Procopiuc Octavian
Ramaswamy Sridhar
Suel Torsten
Vahrenhold Jan
Vitter Jeffrey Scott
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 21/03/2011

The original publication is available at www.springerlink.com. L. Arge, O. Procopiuc, S. Ramaswamy, T. Suel, J. Vahrenhold, and J. S. Vitter. “A Unified Approach for Indexed and Non-Indexed Spatial Joins,” Proceedings of the 7th International Conference on Extending Database Technology (EDBT ’00), Konstanz, Germany, March 2000, published in Lecture Notes in Computer Science, Springer, 1777, Berlin, Germany, 413–429.

KU ScholarWorks

DEEP CONVOLUTIONAL NEURAL NETWORK USING A NEW DATASET FOR BERBER LANGUAGE

Author: Kemiche Mokrane
Sadou Malika
Publication venue: 'AGHU University of Science and Technology Press'
Publication date: 10/03/2023

Handwritten Character Recognition (HCR) has become an interesting and immensely useful technology. It has been explored with high performance in many languages. However, only a few HCR systems have been proposed for the Amazigh (Berber) language. Furthermore, the validation of any Amazigh handwritten recognition system remains a major challenge due to the lack of a robust Amazigh database. To address this problem, we first created two new datasets for Tifinagh and Amazigh Latin characters by extending the well-known EMNIST database with the Amazigh alphabet. We then proposed a handwritten character recognition system based on a deep convolutional neural network to validate the created datasets. The proposed CNN has been trained and tested on our created datasets, and the experimental tests show that it achieves satisfactory results in terms of accuracy and recognition efficiency.

AGH (Akademia Górniczo-Hutnicza) University of Science and Technology: Journals

Computer Science Journal (AGH University of Science and Technology, Krakow)

Distribution of the Object Oriented Databases. A Viewpoint of the MVDB Model's Methodology and Architecture

Author: Daniel I. HUNYADI
Marian Pompiliu CRISTESCU
Marius POPA
Mircea A. MUSAN

In databases, much work has been done towards extending models with advanced tools such as view technology, schema evolution support, multiple classification, role modeling and viewpoints. Over the past years, most of the research dealing with multiple object representation and evolution has proposed to enrich the monolithic vision of the classical object approach, in which an object belongs to one class hierarchy. In particular, integrating the viewpoint mechanism into the conventional object-oriented data model gives it flexibility and improves the modeling power of objects. The viewpoint paradigm refers to the multiple descriptions, the distribution, and the evolution of objects. It can also be an undeniable contribution to the distributed design of complex databases. The motivation of this paper is to define an object data model integrating viewpoints in databases and to present a federated database architecture integrating multiple viewpoint sources following a local-as-extended-view data integration approach. Keywords: object-oriented data model, OQL language, LAEV data integration approach, MVDB model, federated databases, Local-As-View Strategy.

Research Papers in Economics

A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs

Author: Agarwal Khushbu
Chin George
Choudhury Sutanay
Feo John
Holder Lawrence
Publication date: 03/03/2015

Cyber security is one of the most significant technical challenges of current times. Detecting adversarial activities and preventing theft of intellectual property and customer data are high priorities for corporations and government agencies around the world. Cyber defenders need to analyze massive-scale, high-resolution network flows to identify, categorize, and mitigate attacks involving networks spanning institutional and national boundaries. Many cyber attacks can be described as subgraph patterns, with prominent examples being insider infiltrations (path queries), denial of service (parallel paths) and malicious spreads (tree queries). This motivates us to explore subgraph matching on streaming graphs in a continuous setting. The novelty of our work lies in using the subgraph distributional statistics collected from the streaming graph to determine the query processing strategy. We introduce a "Lazy Search" algorithm where the search strategy is decided on a vertex-to-vertex basis depending on the likelihood of a match in the vertex neighborhood. We also propose a metric named "Relative Selectivity" that is used to select between different query processing strategies. Our experiments performed on real online news, network traffic stream and a synthetic social network benchmark demonstrate 10-100x speedups over selectivity agnostic approaches. Comment: in 18th International Conference on Extending Database Technology (EDBT) (2015)

arXiv.org e-Print Archive

CiteSeerX

Advances in database technology - EDBT 2016: 19th International Conference on Extending Database Technology, Bordeaux, France, March 15-18, 2016 : proceedings

Publication venue: University of Konstanz, University Library
Publication date: 01/01/2016

Digitale Bibliothek Thüringen

Extending, trimming and fusing WordNet for technical documents

Author: Vossen P.
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2001

This paper describes a tool for the automatic extension and trimming of a multilingual WordNet database for cross-lingual retrieval and multilingual ontology building in intranets and domain-specific document collections. Hierarchies, built from automatically extracted terms and combined with the WordNet relations, are trimmed with a disambiguation method based on the document salience of the words in the glosses. The disambiguation is tested in a cross-lingual retrieval task, showing considerable improvement (7%-11%). The condensed hierarchies can be used as browse-interfaces to the documents, complementary to retrieval.

CiteSeerX

VU Research Portal