
    Solving Large Scale Instances of the Distribution Design Problem Using Data Mining

    In this paper we address the solution of large instances of the distribution design problem. Traditional approaches do not take into account that instance size can significantly reduce the efficiency of the solution process. We propose a new approach that uses data mining techniques to compress the original instance into a smaller one. The goal of the transformation is to condense the operation access pattern of the original instance so as to reduce the resources needed to solve it, without significantly reducing the quality of its solution. To validate the approach, we tested two instance compression methods on a new model of the replicated version of the distribution design problem that incorporates generalized database objects. The experimental results show that our approach reduces the computational resources needed to solve large instances by at least 65%, without a significant loss of solution quality. Given these encouraging results, we are currently designing and implementing efficient instance compression methods based on other data mining techniques.
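
    As a concrete illustration of condensing an operation access pattern, the sketch below clusters similar query-access vectors into a smaller set of weighted representatives. This is a minimal sketch assuming a binary query-attribute matrix; the variable names and the choice of k-means are illustrative and not taken from the paper.

```python
# Hypothetical sketch: compress a query-access matrix by clustering similar
# queries, so the solver sees k representative queries instead of m originals.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
m, n, k = 1000, 20, 50                            # queries, attributes, compressed size
access_matrix = rng.integers(0, 2, size=(m, n))   # 1 = query touches attribute
frequencies = rng.integers(1, 100, size=m)        # query execution counts

km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(access_matrix)

# Each cluster becomes one representative query: the rounded centroid,
# weighted by the summed frequency of its members.
compressed_patterns = (km.cluster_centers_ > 0.5).astype(int)
compressed_freqs = np.array(
    [frequencies[km.labels_ == c].sum() for c in range(k)]
)
# (compressed_patterns, compressed_freqs) would then be fed to the
# distribution design solver in place of the original instance.
```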

    Development of new data partitioning and allocation algorithms for query optimization of distributed data warehouse systems

    Distributed databases, and in particular distributed data warehousing, are becoming an increasingly important technology for information integration and data analysis. Data Warehouse (DW) systems are used by decision makers for performance measurement and decision support. Although data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, OLAP query response time is strongly affected by the volume of data that needs to be accessed from storage disks. Data partitioning is one of the physical design techniques that may be used to optimize query processing cost in DWs. It is a non-redundant optimization technique because it does not replicate data, in contrast to redundant techniques such as materialized views and indexes. The warehouse partitioning problem is concerned with determining the set of dimension tables to be partitioned and using them to generate the fact table fragments. In this work an enhanced grouping algorithm that avoids the limitations of some existing vertical partitioning algorithms is proposed. Furthermore, a static partitioning algorithm that allows fragmentation at early stages of schema design is presented. The thesis also investigates the performance of the data warehouse after applying a combination of Genetic Algorithm (GA) and Simulated Annealing (SA) techniques to horizontally partition the data warehouse star schema, and then presents the experimentation and implementation results of the proposed algorithm. This research presents different approaches to optimizing the cost of data fragment allocation, using a greedy mathematical model and a combination of simulated annealing and genetic algorithms to determine the site-by-site allocation leading to optimal solutions for fragment distribution. Throughout this thesis, the terms fragmentation and partitioning are used interchangeably.
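
    To make the allocation side concrete, here is a hedged sketch of a greedy fragment-allocation step in the spirit of the greedy model the thesis mentions: each fragment is placed at the site that minimizes its total remote-access cost. The sites, fragments, access frequencies and transfer costs are invented for the example.

```python
# Greedy allocation sketch (illustrative, not the thesis's exact model).
access = {                     # access[site][fragment] -> access frequency
    "s1": {"f1": 90, "f2": 5,  "f3": 40},
    "s2": {"f1": 10, "f2": 80, "f3": 35},
    "s3": {"f1": 15, "f2": 20, "f3": 60},
}
transfer_cost = {              # assumed cost per remote access between sites
    ("s1", "s2"): 2, ("s1", "s3"): 3, ("s2", "s3"): 1,
}

def cost(frag, home):
    """Total remote-access cost if `frag` is stored at site `home`."""
    total = 0
    for site, freqs in access.items():
        if site != home:
            pair = tuple(sorted((site, home)))
            total += freqs[frag] * transfer_cost[pair]
    return total

allocation = {
    frag: min(access, key=lambda s: cost(frag, s))
    for frag in ("f1", "f2", "f3")
}
print(allocation)   # {'f1': 's1', 'f2': 's2', 'f3': 's2'}
```

    A metaheuristic such as the GA/SA combination explored in the thesis would then refine a greedy starting allocation of this kind.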

    A Survey of Approaches to Physical-Level Organization in DBMSs

    In this paper we survey various DBMS physical design options. We consider both vertical and horizontal partitioning, and briefly cover replication. The survey is not limited to local systems but also includes distributed ones, which raise a further interesting question: how to actually distribute data among several processing nodes. Alongside theoretical approaches, we consider practical ones implemented in contemporary DBMSs. We cover these aspects not only from the user's perspective, but also from those of the architect and the programmer.
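
    The two fragmentation styles the survey covers can be illustrated with a toy example; plain Python rows stand in here for storage-level structures.

```python
rows = [
    {"id": 1, "name": "Ann", "region": "EU", "balance": 100},
    {"id": 2, "name": "Bob", "region": "US", "balance": 250},
]

# Horizontal partitioning: split by a predicate; each fragment keeps all columns.
eu_fragment = [r for r in rows if r["region"] == "EU"]
us_fragment = [r for r in rows if r["region"] == "US"]

# Vertical partitioning: split by columns; each fragment keeps the key
# so the original rows can be reconstructed by a join on "id".
ids_names = [{"id": r["id"], "name": r["name"]} for r in rows]
ids_balances = [{"id": r["id"], "balance": r["balance"]} for r in rows]
```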

    Distributed transaction processing in the Escada protocol

    Database replication is an invaluable technique for implementing fault-tolerant databases, and is also frequently used to improve database performance. Unfortunately, when strong consistency among the replicas and the ability to update the database at any replica are required, existing replication protocols do not scale up. The problem is related to the number of interactions among the replicas needed to guarantee consistency, and to the protocols used to ensure that all replicas agree on the result of each transaction. Roughly, the number of aborts, deadlocks and messages exchanged among the replicas grows drastically as the number of replicas increases. Related work has shown that database replication in such a scenario is impractical. Several studies have attempted to overcome these problems. Initially, most of them relaxed the strong-consistency or update-anywhere requirements to achieve feasible solutions. Recently, replication protocols based on group communication have been proposed in which the strong-consistency and update-anywhere requirements are preserved and the problems circumvented. This is the context of the Escada project. Briefly, it aims to study, design and implement transaction replication mechanisms suited to large-scale distributed systems. In particular, the project exploits partial replication techniques to provide strong consistency criteria without introducing significant synchronization and performance overheads. In this thesis, we augment Escada with a distributed query processing model and mechanism, an inevitable requirement in a partially replicated environment. Moreover, exploiting characteristics of its protocols, we propose a semantic cache to reduce the overhead generated while accessing remote replicas. We also improve the certification process, attempting to reduce aborts by using the semantic information available in the transactions. Finally, to evaluate the Escada protocols, the semantic cache and the certification process, we use a simulation model that combines simulated and real code, which allows us to evaluate our proposals under distinct scenarios and configurations. Furthermore, instead of using unrealistic workloads, we test our proposals with workloads based on the TPC-W and TPC-C benchmarks. Funding: Fundação para a Ciência e a Tecnologia - POSI/CHS/41285/2001.
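
    The certification step the thesis improves builds on certification-based commit, the general technique used by group-communication replication protocols. Below is a minimal sketch of that generic check, not of Escada's semantic variant; all names are illustrative. A transaction aborts if some transaction that committed after it started wrote an item it read.

```python
committed_writes = []   # list of (commit_version, set_of_written_items)

def certify(start_version, read_set, write_set, current_version):
    """Generic certification: detect read-write conflicts with
    transactions that committed after this one started."""
    for version, items in committed_writes:
        if version > start_version and items & read_set:
            return False            # conflict: abort
    committed_writes.append((current_version + 1, set(write_set)))
    return True                     # no conflict: commit

ok = certify(0, read_set={"x"}, write_set={"y"}, current_version=0)
conflict = certify(0, read_set={"y"}, write_set={"z"}, current_version=1)
print(ok, conflict)   # True False: the second txn read "y", written after its start
```

    Semantic certification, as pursued in the thesis, would additionally inspect transaction semantics to rule out aborts that this purely syntactic check cannot avoid.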

    Enabling Technology in Optical Fiber Communications: From Device, System to Networking

    This book explores enabling technology in optical fiber communications. It focuses on state-of-the-art advances from fundamental theories, devices, and subsystems to networking applications, as well as future perspectives of optical fiber communications. The topics covered include integrated photonics, fiber optics, fiber and free-space optical communications, and optical networking.

    Advances in Computational Intelligence Applications in the Mining Industry

    This book captures advancements in the applications of computational intelligence (artificial intelligence, machine learning, etc.) to problems in the mineral and mining industries. The papers present the state of the art in four broad categories: mine operations, mine planning, mine safety, and advances in the sciences, primarily in image processing applications. Authors in the book include both researchers and industry practitioners.

    Sixth Goddard Conference on Mass Storage Systems and Technologies Held in Cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems

    This document contains copies of those technical papers received in time for publication prior to the Sixth Goddard Conference on Mass Storage Systems and Technologies, held in cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems at the University of Maryland-University College Inn and Conference Center, March 23-26, 1998. As one of an ongoing series, this conference continues to provide a forum for discussion of issues relevant to the management of large volumes of data. The conference encourages all interested organizations to discuss long-term mass storage requirements and experiences in fielding solutions. Emphasis is on current and future practical solutions addressing issues in data management, storage systems and media, data acquisition, long-term retention of data, and data distribution. This year's discussion topics include architecture, tape optimization, new technology, performance, standards, site reports, and vendor solutions. Tutorials will be available on shared file systems, file system backups, data mining, and the dynamics of obsolescence.

    Using Workload Prediction and Federation to Increase Cloud Utilization

    The widespread adoption of cloud computing has changed how large-scale computing infrastructure is built and managed. Infrastructure-as-a-Service (IaaS) clouds consolidate separate workloads onto a shared platform and provide a consistent quality of service by overprovisioning capacity. This additional capacity, however, remains idle for extended periods of time and represents a drag on system efficiency. The smaller scale of private IaaS clouds compared to public clouds exacerbates overprovisioning inefficiencies, as opportunities for workload consolidation in private clouds are limited. Federation and cycle-harvesting capabilities from computational grids help to improve efficiency, but to date have seen only limited adoption in the cloud due to a fundamental mismatch between the usage models of grids and clouds. Computational grids provide high throughput of queued batch jobs on a best-effort basis and enforce user priorities through dynamic job preemption, while IaaS clouds provide immediate feedback to user requests and make ahead-of-time guarantees about resource availability. We present a novel method to enable workload federation across IaaS clouds that overcomes this mismatch between grid and cloud usage models and improves system efficiency while also offering availability guarantees. We develop a new method for faster-than-realtime simulation of IaaS clouds to make predictions about system utilization, and leverage this method to estimate the future availability of preemptible resources in the cloud. We then use these estimates to perform careful admission control and provide ahead-of-time bounds on the preemption probability of federated jobs executing on preemptible resources. Finally, we build an end-to-end prototype that addresses practical issues of workload federation and evaluate the prototype's efficacy using real-world traces from big data and compute-intensive production workloads.
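
    A hedged sketch of the admission-control idea follows: a stand-in Monte-Carlo predictor (not the dissertation's faster-than-realtime simulator) estimates the probability that native demand reclaims borrowed capacity within a federated job's lifetime, and the job is admitted only if that estimate stays under the promised bound. The thresholds and the load model are assumptions for illustration.

```python
import random

def estimate_preemption_probability(horizon_steps, trials=500):
    """Monte-Carlo stand-in: fraction of trials in which the cloud's own
    load spills over and would preempt borrowed capacity."""
    preempted = 0
    for _ in range(trials):
        load = 0.6                              # assumed current utilization
        for _ in range(horizon_steps):
            load = min(1.0, max(0.0, load + random.gauss(0.0, 0.05)))
            if load > 0.9:                      # native demand reclaims capacity
                preempted += 1
                break
    return preempted / trials

def admit(job_duration_steps, preemption_bound=0.05):
    """Admit a federated job only if its predicted preemption
    probability respects the ahead-of-time bound."""
    return estimate_preemption_probability(job_duration_steps) <= preemption_bound

print(admit(10), admit(500))   # short jobs are far more likely to be admitted
```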

    On three use cases of multi-connectivity paradigm in emerging wireless networks

    As envisioned by global network operators, the increasing trend of data traffic demand is expected to continue with exponential growth in the coming years. To cope with this rapid increase, significant efforts from the research community, industry and even regulators have been focused on improving two main aspects of the wireless spectrum: (i) spectrum capacity and (ii) spectral efficiency. Concerning spectrum capacity enhancement, the multi-connectivity paradigm is seen as fundamentally important for solving the capacity problem in next-generation networks. Multi-connectivity is a feature that allows wireless devices to establish and maintain multiple simultaneous connections across homogeneous or heterogeneous technologies. In this thesis, we identify the core issues in applying the multi-connectivity paradigm to different use cases and propose novel solutions to address them. Specifically, this thesis studies three use cases of the multi-connectivity paradigm. First, we study the uplink/downlink decoupling problem in 4G networks. More specifically, we focus on the user association problem in the decoupling context, which is challenging due to the conflicting objectives of different entities (e.g., mobile users and base stations) in the system. We use a combination of matching theory and stochastic geometry to reconcile competing objectives between users in the uplink/downlink directions and also from the perspective of base stations. Second, we tackle the spectrum aggregation problem for wireless backhauling links in unlicensed, opportunistically shared spectrum bands, specifically TV White Space (TVWS) spectrum. In relation to this, we present a DIY mobile network deployment model to accelerate the roll-out of high-end mobile services in rural and developing regions. As part of this model, we highlight the importance of low-cost, high-capacity backhaul infrastructure for which TVWS spectrum can be exploited. Building on that, we conduct a thorough analytical study to identify the characteristics of TVWS in rural areas. Our study sheds light on the nature of TVWS spectrum fragmentation for the backhauling use case, which in turn poses requirements for the design of spectrum aggregation systems for TVWS backhaul. Motivated by these findings, we design and implement WhiteHaul, a flexible platform for spectrum aggregation in TVWS. Three challenges are tackled in this work. The first is that TVWS spectrum is fragmented: it is available in a non-contiguous manner, so multiple radios must work simultaneously to fully utilize it while sharing a single antenna, and the system architecture must support different aggregation configurations while avoiding interference. The second is that the heterogeneous nature of the available spectrum (i.e., in terms of bandwidth and link characteristics) requires the design of an efficient traffic distribution algorithm that takes these factors into account, as sketched below. The third is that TVWS is unlicensed, opportunistically shared spectrum, so a coordination mechanism between the two nodes of a backhauling link is essential to enable seamless channel switching. Third, we study the integration of multiple radio access technologies (RATs) in the context of 4G/5G networks. More specifically, we study the potential gain of enabling multi-RAT integration at the Packet Data Convergence Protocol (PDCP) layer compared with doing so at the transport layer. Here we consider ultra-reliable low-latency communication (URLLC) as one of the motivating services, and tackle the challenges that arise from enabling multi-RAT integration at the PDCP layer, including packet reordering and traffic scheduling.
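
    As an illustration of the traffic-distribution challenge in the second use case, the sketch below splits a packet stream across heterogeneous channels in proportion to their estimated capacities, in the spirit of deficit-style scheduling. The channel names and rates are invented, and this is not WhiteHaul's actual algorithm.

```python
channels = {"ch1": 20e6, "ch2": 5e6, "ch3": 10e6}   # estimated capacity, bit/s

def split_traffic(packets, channels):
    """Assign packets to channels in proportion to channel capacity."""
    total = sum(channels.values())
    assignment = {ch: [] for ch in channels}
    credits = {ch: 0.0 for ch in channels}
    for pkt in packets:
        for ch, rate in channels.items():
            credits[ch] += rate / total     # accrue capacity-weighted credit
        best = max(credits, key=credits.get)
        assignment[best].append(pkt)        # send on the most-credited channel
        credits[best] -= 1.0                # one packet consumes one credit
    return assignment

out = split_traffic(list(range(35)), channels)
print({ch: len(p) for ch, p in out.items()})   # roughly a 20:5:10 split
```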

    Optimal use of computing equipment in an automated industrial inspection context

    This thesis deals with automatic defect detection. The objective was to develop the techniques required by a small manufacturing business to make cost-efficient use of inspection technology. In our work on inspection techniques we discuss image acquisition and the choice between custom and general-purpose processing hardware. We examine the classes of general-purpose computer available and study popular operating systems in detail. We highlight the advantages of a hybrid system interconnected via a local area network and develop a sophisticated suite of image-processing software based on it. We quantitatively study the performance of elements of the TCP/IP networking protocol suite and comment on appropriate protocol selection for parallel distributed applications. We implement our own distributed application based on these findings. In our work on inspection algorithms we investigate the potential uses of iterated function series and Fourier transform operators when preprocessing images of defects in aluminium plate acquired using a linescan camera. We employ a multi-layer perceptron neural network trained by backpropagation as a classifier. We examine the effect of the number of nodes in the hidden layer on the training process, and the ability of the network to identify faults in images of aluminium plate. We investigate techniques for introducing positional independence into the network's behaviour. We analyse the pattern of weights induced in the network after training in order to gain insight into the logic of its internal representation. We conclude that the backpropagation training process is so computationally intensive that it presents a real barrier to further development of practical neural network techniques, and we seek ways to achieve a speed-up. We consider the training process as a search problem and arrive at a process involving multiple, parallel search "vectors" and aspects of genetic algorithms. We implement the system as the distributed application mentioned above and comment on its performance.
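
    The idea of multiple parallel search vectors combined with aspects of genetic algorithms can be sketched as follows; the tiny network and synthetic task are placeholders, not the thesis's aluminium-plate classifier or its exact search procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 8))                 # stand-in training inputs
y = (X.sum(axis=1) > 0).astype(float)        # stand-in binary labels

def loss(w):
    """Mean squared error of a minimal one-hidden-unit network."""
    logits = np.tanh(X @ w[:8]) * w[8] + w[9]
    pred = 1.0 / (1.0 + np.exp(-logits))
    return float(np.mean((pred - y) ** 2))

pop = rng.normal(size=(16, 10))              # 16 parallel "search vectors"
for _ in range(200):
    scores = np.array([loss(w) for w in pop])
    elite = pop[np.argsort(scores)[:4]]      # keep the 4 best candidates
    # Refill the population by mutating elites: the genetic-algorithm aspect.
    children = elite[rng.integers(0, 4, 12)] + 0.1 * rng.normal(size=(12, 10))
    pop = np.vstack([elite, children])

print(min(loss(w) for w in pop))             # best loss found by the search
```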