
    Circuit Complexity Meets Ontology-Based Data Access

    Ontology-based data access is an approach to organizing access to a database augmented with a logical theory. In this approach, query answering proceeds through a reformulation of a given query into a new one that can be answered without any use of the theory; the problem thus reduces to the standard database setting. However, the size of the query may increase substantially during the reformulation. In this survey we review a recently developed framework for proving lower and upper bounds on the size of this reformulation by employing methods and results from Boolean circuit complexity. Comment: To appear in the proceedings of CSR 2015, LNCS 9139, Springer.
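    To make the reformulation step concrete, here is a minimal sketch (not from the survey) of rewriting a single query atom over a toy class hierarchy: the theory is compiled away by expanding the atom into a union over all implied subclasses, which is exactly the kind of rewriting whose size the surveyed bounds measure. The TBox, class names, and the rewrite_atom helper are illustrative assumptions.

```python
# Minimal sketch of query rewriting over a toy class hierarchy; the TBox,
# class names, and helpers are illustrative assumptions, not from the survey.

# Toy TBox: each axiom states "every instance of sub is an instance of sup".
TBOX = [
    ("Professor", "Person"),
    ("Student", "Person"),
    ("PhDStudent", "Student"),
]

def subclasses_of(cls, tbox):
    """All classes whose instances the theory implies to be `cls` (incl. cls)."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in tbox:
            if sup in result and sub not in result:
                result.add(sub)
                changed = True
    return result

def rewrite_atom(cls, var, tbox):
    """Rewrite the atom cls(var) into a union of atoms that can be answered
    directly over the data, without any use of the theory."""
    return " OR ".join(f"{c}({var})" for c in sorted(subclasses_of(cls, tbox)))

# The query Person(x) is reformulated into a (possibly much larger) union:
print(rewrite_atom("Person", "x", TBOX))
# -> Person(x) OR PhDStudent(x) OR Professor(x) OR Student(x)
```

    Already in this toy setting the rewritten query grows with the number of implied subclasses; the survey's circuit-complexity techniques are about bounding how large such rewritings can or must become.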

    Unsupervised String Transformation Learning for Entity Consolidation

    Data integration has been a long-standing challenge in data management, with many applications. A key step in data integration is entity consolidation: it takes a collection of clusters of duplicate records as input and produces a single "golden record" for each cluster, which contains the canonical value for each attribute. Truth discovery and data fusion methods, as well as Master Data Management (MDM) systems, can be used for entity consolidation. However, to achieve better results, the variant values (i.e., values that are logically the same but have different formats) in the clusters need to be consolidated before applying these methods. For this purpose, we propose a data-driven method to standardize the variant values based on two observations: (1) the variant values can usually be transformed to the same representation (e.g., "Mary Lee" and "Lee, Mary"), and (2) the same transformation often appears repeatedly across different clusters (e.g., transposing the first and last names). Our approach first uses an unsupervised method to generate groups of value pairs that can be transformed in the same way (i.e., they share a transformation). The groups are then presented to a human for verification, and the approved ones are used to standardize the data. On a real-world dataset with 17,497 records, our method achieved 75% recall and 99.5% precision in standardizing variant values by asking a human only 100 yes/no questions, clearly outperforming a state-of-the-art data wrangling tool.
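    As an illustration of the grouping idea, the following sketch (hypothetical code, not the paper's algorithm) treats a transformation as a token reordering and groups value pairs by that shared signature, so each group could be confirmed with a single yes/no question.

```python
import re
from collections import defaultdict

def transformation_signature(src, dst):
    """Describe how the tokens of `src` are reordered to form `dst`,
    e.g. ("Mary Lee", "Lee, Mary") -> (1, 0).  Returns None if the two
    values are not token rearrangements of each other.  For simplicity
    this assumes the tokens within a value are distinct."""
    src_tokens = re.findall(r"\w+", src)
    dst_tokens = re.findall(r"\w+", dst)
    if sorted(src_tokens) != sorted(dst_tokens):
        return None
    return tuple(src_tokens.index(tok) for tok in dst_tokens)

def group_by_transformation(value_pairs):
    """Group (variant, canonical) value pairs that share a transformation,
    so each group can be verified with one yes/no question."""
    groups = defaultdict(list)
    for src, dst in value_pairs:
        sig = transformation_signature(src, dst)
        if sig is not None:
            groups[sig].append((src, dst))
    return groups

pairs = [("Mary Lee", "Lee, Mary"),
         ("John Smith", "Smith, John"),
         ("Dough, Jane", "Jane Dough")]
for sig, members in group_by_transformation(pairs).items():
    print(sig, members)
# All three pairs share the signature (1, 0): swap the two tokens.
```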

    Computation of Buffer Capacities for Throughput Constrained and Data Dependent Inter-Task Communication

    Streaming applications are often implemented as task graphs. Techniques currently exist to derive buffer capacities that guarantee satisfaction of a throughput constraint for task graphs in which the inter-task communication is data-independent, i.e., the amount of data produced and consumed is independent of the data values in the processed stream. This paper presents a technique to compute buffer capacities that satisfy a throughput constraint for task graphs with data-dependent inter-task communication, given that the task graph is a chain. We demonstrate the applicability of the approach by computing buffer capacities for an MP3 playback application, in which the MP3 decoder has a variable consumption rate. We are not aware of alternative approaches that compute buffer capacities guaranteeing satisfaction of the throughput constraint for this application.
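    The sketch below illustrates, for a single producer/consumer chain, why buffer capacity must be chosen against a data-dependent consumption pattern: it searches for the smallest capacity that still lets the consumer complete all of its firings within a fixed round budget. This brute-force simulation is only a stand-in for a throughput constraint and is not the paper's analytical dataflow technique; the rates and the consumption trace are hypothetical.

```python
def firings_with_capacity(capacity, prod_tokens, cons_trace, rounds):
    """Simulate a two-task chain in lock-step rounds.  Each round the producer
    fires iff the buffer has room for its tokens, and the consumer fires iff
    enough tokens are available for its next (data-dependent) demand.
    Returns how many consumer firings complete within the round budget."""
    buf, done = 0, 0
    for _ in range(rounds):
        if buf + prod_tokens <= capacity:            # producer fires
            buf += prod_tokens
        if done < len(cons_trace) and buf >= cons_trace[done]:
            buf -= cons_trace[done]                  # consumer fires
            done += 1
    return done

def smallest_sufficient_capacity(prod_tokens, cons_trace, rounds):
    """Smallest buffer capacity for which all consumer firings complete
    within the round budget (a stand-in for a throughput constraint)."""
    capacity = max(max(cons_trace), prod_tokens)     # one firing of each must fit
    while firings_with_capacity(capacity, prod_tokens, cons_trace, rounds) < len(cons_trace):
        capacity += 1
        if capacity > rounds * prod_tokens:          # the producer itself is too slow
            raise ValueError("no buffer capacity meets the throughput budget")
    return capacity

# Hypothetical data-dependent consumption trace (e.g. a variable-rate decoder).
trace = [2, 1, 3, 1, 2, 3, 1, 1]
print(smallest_sufficient_capacity(prod_tokens=2, cons_trace=trace, rounds=len(trace)))
```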

    Omphale: Streamlining the Communication for Jobs in a Multi Processor System on Chip

    Our Multi Processor System on Chip (MPSoC) template provides processing tiles that are connected via a network on chip. A processing tile contains a processing unit and a Scratch Pad Memory (SPM). This paper presents the Omphale tool, which performs the first step in mapping a job, represented by a task graph, to such an MPSoC, given the SPM sizes as constraints. Furthermore, a memory tile is introduced. The result of Omphale is a Cyclo Static DataFlow (CSDF) model and a task graph in which tasks communicate via sliding windows located in circular buffers. The CSDF model is used to determine the size of the buffers and the communication pattern of the data. A buffer must fit in the SPM of the processing unit that reads from it, so that low-latency access is realized with a minimal number of stall cycles. If a task and its buffer exceed the size of the SPM, the task is examined for additional parallelism, or the circular buffer is partly located in a memory tile. This results in an extended task graph that satisfies the SPM size constraints.
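    As a small illustration of sliding-window communication over a circular buffer (not the Omphale implementation; the capacity, window, and step values are hypothetical), the consumer below reads overlapping windows while the producer stalls whenever the buffer is full.

```python
from collections import deque

class CircularWindowBuffer:
    """Illustrative buffer in which a consumer reads fixed-size sliding
    windows that advance by a smaller step, as in windowed inter-task
    communication over a circular buffer.  A deque stands in for the
    ring storage; parameters are hypothetical."""

    def __init__(self, capacity, window, step):
        assert window <= capacity and step <= window
        self.capacity, self.window, self.step = capacity, window, step
        self.data = deque()

    def can_write(self, n=1):
        return len(self.data) + n <= self.capacity

    def write(self, token):
        if not self.can_write():
            raise BufferError("producer must stall: buffer full")
        self.data.append(token)

    def can_read(self):
        return len(self.data) >= self.window

    def read_window(self):
        if not self.can_read():
            raise BufferError("consumer must stall: window not complete")
        win = list(self.data)[: self.window]
        for _ in range(self.step):          # only `step` tokens leave the buffer
            self.data.popleft()
        return win

buf = CircularWindowBuffer(capacity=6, window=4, step=2)
for tok in range(8):
    if buf.can_write():
        buf.write(tok)
    while buf.can_read():
        print(buf.read_window())   # overlapping windows: [0..3], [2..5], [4..7]
```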

    A Novel SAT-Based Approach to the Task Graph Cost-Optimal Scheduling Problem

    The Task Graph Cost-Optimal Scheduling Problem consists in scheduling a certain number of interdependent tasks onto a set of heterogeneous processors (characterized by idle and running rates per time unit) while minimizing the cost of the entire process. This paper provides a novel formulation of this scheduling problem, in which an optimal solution is computed through a sequence of Binate Covering Problems embedded within a Bounded Model Checking paradigm. In this approach, each covering instance, providing a minimum-cost trace for a given schedule depth, can be solved with several strategies, resorting to Minimum-Cost Satisfiability solvers or Pseudo-Boolean Optimization tools. Unfortunately, all direct resolution methods show very low efficiency and scalability. As a consequence, we introduce a specialized method to solve the same sequence of problems, based on a traditional all-solution SAT solver. This approach follows the "circuit cofactoring" strategy, as it exploits a powerful technique to capture a large set of solutions for every new SAT counter-example. The overall method is completed with a branch-and-bound heuristic that evaluates lower and upper bounds on the schedule length to reduce the state space that has to be visited. Our results show that the proposed strategy significantly improves on the blind binate covering scheme and outperforms general-purpose state-of-the-art tools.
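    The cost model behind the problem (idle and running rates per time unit on heterogeneous processors) can be illustrated with a tiny brute-force sketch; this is not the paper's SAT/binate-covering method, and the instance data below are hypothetical.

```python
from itertools import product

# Tiny illustrative instance (hypothetical numbers): task durations on each
# processor, plus each processor's running and idle cost rates per time unit.
tasks      = ["t0", "t1", "t2"]
durations  = {"t0": [2, 3], "t1": [4, 2], "t2": [1, 1]}   # per-processor durations
run_rate   = [3.0, 2.0]      # cost per time unit while a processor is executing
idle_rate  = [1.0, 0.5]      # cost per time unit while a processor is idle
precedence = [("t0", "t2")]  # t0 must finish before t2 starts

def schedule_cost(assignment):
    """List-schedule the tasks (listed in a precedence-respecting order) on
    their assigned processors and return the total cost: running cost for
    busy time plus idle cost up to the global makespan."""
    proc_time = [0.0, 0.0]
    finish = {}
    for t in tasks:
        p = assignment[t]
        ready = max([finish[a] for a, b in precedence if b == t], default=0.0)
        start = max(proc_time[p], ready)
        finish[t] = start + durations[t][p]
        proc_time[p] = finish[t]
    makespan = max(finish.values())
    busy = [sum(durations[t][p] for t in tasks if assignment[t] == p) for p in range(2)]
    return sum(run_rate[p] * busy[p] + idle_rate[p] * (makespan - busy[p])
               for p in range(2))

# Exhaustively try every task-to-processor assignment and keep the cheapest.
best = min((dict(zip(tasks, combo)) for combo in product(range(2), repeat=len(tasks))),
           key=schedule_cost)
print(best, schedule_cost(best))
```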

    A Two-Stage Approach for Routing Multiple Unmanned Aerial Vehicles with Stochastic Fuel Consumption

    The past decade has seen a substantial increase in the use of small unmanned aerial vehicles (UAVs) in both civil and military applications. This article addresses an important aspect of refueling in the context of routing multiple small UAVs to complete a surveillance or data-collection mission. Specifically, it formulates a multiple-UAV routing problem with refueling constraints, with the objective of minimizing the overall fuel consumption of all the vehicles, as a two-stage stochastic optimization problem in which the uncertainty is associated with the fuel consumption of each vehicle. The two-stage model allows for the application of sample average approximation (SAA). Although the SAA solution asymptotically converges to the optimal solution of the two-stage model, the SAA run time can be prohibitive for medium- and large-scale test instances. Hence, we develop a tabu-search-based heuristic that exploits the model structure while accounting for the uncertainty in fuel consumption. Extensive computational experiments corroborate the benefits of the two-stage model over a deterministic model and the effectiveness of the heuristic at obtaining high-quality solutions. Comment: 18 pages.
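    The sample-average-approximation idea can be sketched for a single vehicle as follows (hypothetical instance and cost model, not the paper's formulation or its tabu-search heuristic): draw fuel-consumption scenarios, evaluate each candidate first-stage route on every scenario with a simple refueling recourse, and keep the route with the lowest average cost.

```python
import itertools
import random

random.seed(0)

# Hypothetical instance: one depot (node 0) and three targets; mean_fuel[(i, j)]
# is the expected fuel needed to fly from node i to node j.
nodes = [0, 1, 2, 3]
mean_fuel = {(i, j): abs(i - j) + 1 for i in nodes for j in nodes if i != j}

def sample_scenario():
    """One realization of the uncertain fuel consumption on every edge."""
    return {edge: random.uniform(0.8, 1.4) * mu for edge, mu in mean_fuel.items()}

def route_cost(route, scenario, capacity=6.0, refuel_penalty=3.0):
    """Second-stage cost of flying a fixed route under one scenario: fuel
    burned plus a penalty whenever the UAV must refuel before a leg."""
    cost, tank = 0.0, capacity
    for i, j in zip(route, route[1:]):
        burn = scenario[(i, j)]
        if burn > tank:                 # recourse action: refuel before this leg
            cost += refuel_penalty
            tank = capacity
        tank -= burn
        cost += burn
    return cost

def saa_best_route(num_scenarios=200):
    """Sample average approximation: choose the first-stage route whose
    average cost over the sampled scenarios is smallest."""
    scenarios = [sample_scenario() for _ in range(num_scenarios)]
    candidates = [(0,) + perm + (0,) for perm in itertools.permutations([1, 2, 3])]
    return min(candidates,
               key=lambda r: sum(route_cost(r, s) for s in scenarios) / num_scenarios)

print(saa_best_route())
```

    Increasing the number of sampled scenarios tightens the approximation, which mirrors the trade-off noted in the abstract between SAA solution quality and run time.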