205 research outputs found

    A Memory Contention Responsive Hash Join Algorithm Design and Implementation on Apache AsterixDB

    Efficient data management is crucial in complex computer systems, and Database Management Systems (DBMSs) are indispensable for handling and processing large datasets. In DBMSs that execute multiple queries concurrently, adapting to varying workloads is desirable. Yet predicting the fluctuating number and size of queries in such environments is challenging. Over-allocating resources to a single query can impede the execution of future queries, while under-allocating resources to a query whose workload is expected to grow can lead to significant processing delays. Moreover, join operations place substantial demands on memory, a resource whose availability fluctuates as queries enter and exit the DBMS. Developing join operators that adapt dynamically to memory fluctuations is a complex undertaking, and few recent authors have proposed memory-adaptive algorithms. This scarcity of proposals suggests the inherent difficulty of designing, implementing, and analyzing such algorithms. This thesis proposes a new memory-adaptive hash-based join algorithm that extends designs presented by prior authors. The algorithm is implemented and evaluated in a real DBMS environment to measure its responsiveness to memory fluctuation. A mathematical model of the additional I/O it causes is proposed and compared with actual results. The measured impact of memory variation and of the frequency of memory updates shows the relevance of this thesis to the further development of memory-adaptive algorithms.
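
To make the idea of a join that reacts to a shrinking memory grant concrete, here is a minimal sketch. It is not the thesis's algorithm: the class name `AdaptiveHashJoin`, the row-count budget, the "spill the largest partition" policy, and the in-process list standing in for disk are all assumptions made for illustration.

```python
from collections import defaultdict


class AdaptiveHashJoin:
    """Toy hybrid hash join whose memory budget (in rows) may change mid-build.

    Hypothetical illustration only: "disk" is simulated with Python lists.
    """

    def __init__(self, num_partitions, budget_rows):
        self.num_partitions = num_partitions
        self.budget_rows = budget_rows
        # Per-partition in-memory hash tables: key -> list of build rows.
        self.in_memory = {p: defaultdict(list) for p in range(num_partitions)}
        self.mem_rows = {p: 0 for p in range(num_partitions)}
        self.spilled = {}        # partition id -> list of (key, row) "on disk"
        self.spill_writes = 0    # rows written to the simulated disk

    def _partition(self, key):
        return hash(key) % self.num_partitions

    def set_budget(self, budget_rows):
        """Called when the DBMS grows or shrinks this operator's memory grant."""
        self.budget_rows = budget_rows
        self._enforce_budget()

    def _enforce_budget(self):
        # Assumed policy: spill the largest resident partition until we fit.
        while sum(self.mem_rows.values()) > self.budget_rows:
            victim = max(self.mem_rows, key=self.mem_rows.get)
            if self.mem_rows[victim] == 0:
                break
            rows = [(k, r) for k, rs in self.in_memory[victim].items() for r in rs]
            self.spilled.setdefault(victim, []).extend(rows)
            self.spill_writes += len(rows)
            self.in_memory[victim] = defaultdict(list)
            self.mem_rows[victim] = 0

    def build(self, key, row):
        p = self._partition(key)
        if p in self.spilled:
            # Partition already lives on disk: append there directly.
            self.spilled[p].append((key, row))
            self.spill_writes += 1
        else:
            self.in_memory[p][key].append(row)
            self.mem_rows[p] += 1
            self._enforce_budget()

    def probe(self, probe_rows):
        results, deferred = [], defaultdict(list)
        for key, row in probe_rows:
            p = self._partition(key)
            if p in self.spilled:
                deferred[p].append((key, row))   # join later, per partition
            else:
                for b in self.in_memory[p].get(key, []):
                    results.append((key, b, row))
        for p, rows in deferred.items():
            table = defaultdict(list)            # reload one spilled partition
            for k, b in self.spilled[p]:
                table[k].append(b)
            for key, row in rows:
                for b in table.get(key, []):
                    results.append((key, b, row))
        return results
```

In this sketch, calling `set_budget` with a smaller value between `build` calls forces partitions out of memory, and `spill_writes` gives a rough count of the extra write I/O that the shrink caused.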

    Data Storage and Dissemination in Pervasive Edge Computing Environments

    Nowadays, smart mobile devices generate huge amounts of data in all sorts of gatherings. Much of that data is of localized and ephemeral interest, but can be of great use if shared among co-located devices. However, mobile devices often experience poor connectivity, leading to availability issues if application storage and logic are fully delegated to a remote cloud infrastructure. In turn, the edge computing paradigm pushes computation and storage beyond the data center, closer to the end-user devices where data is generated and consumed. This enables certain components of edge-enabled systems to run directly and cooperatively on edge devices. This thesis focuses on the design and evaluation of resilient and efficient data storage and dissemination solutions for pervasive edge computing environments, operating with or without access to the network infrastructure. In line with this dichotomy, our goal can be divided into two specific scenarios. The first concerns the absence of network infrastructure and the provision of a transient data storage and dissemination system for networks of co-located mobile devices. The second concerns the existence of network infrastructure access and the corresponding edge computing capabilities. First, the thesis presents time-aware reactive storage (TARS), a reactive data storage and dissemination model with intrinsic time-awareness that exploits synergies between the storage substrate and the publish/subscribe paradigm, and allows queries within a specific time scope.
Next, it describes in more detail: i) Thyme, a data storage and dissemination system for wireless edge environments, implementing TARS; ii) Parsley, a flexible and resilient group-based distributed hash table with preemptive peer relocation and a dynamic data sharding mechanism; and iii) Thyme GardenBed, a framework for data storage and dissemination across multi-region edge networks that makes use of both device-to-device and edge interactions. The developed solutions present low overheads while providing adequate response times for interactive usage and low energy consumption, proving to be practical in a variety of situations. They also display good load balancing and fault tolerance properties.
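
One way to picture a storage layer combined with publish/subscribe and a time scope is the sketch below. It is only an interpretation of the abstract, not the TARS model or the Thyme API: `TimeAwareStore`, `Subscription`, tag-based matching, and the closed interval `[start, end]` are hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """A subscription to one tag, limited to a time scope [start, end]."""
    tag: str
    start: float
    end: float
    delivered: list = field(default_factory=list)

    def matches(self, tag, ts):
        return tag == self.tag and self.start <= ts <= self.end


class TimeAwareStore:
    """Toy store where a subscription covers both past and future items."""

    def __init__(self):
        self.items = []   # stored (tag, timestamp, payload) tuples
        self.subs = []    # active subscriptions

    def publish(self, tag, ts, payload):
        # Storing an item also notifies subscriptions whose scope covers it.
        self.items.append((tag, ts, payload))
        for sub in self.subs:
            if sub.matches(tag, ts):
                sub.delivered.append(payload)

    def subscribe(self, tag, start, end):
        # Subscribing first acts as a query over already stored items...
        sub = Subscription(tag, start, end)
        for item_tag, ts, payload in self.items:
            if sub.matches(item_tag, ts):
                sub.delivered.append(payload)
        # ...and then stays active for items published later.
        self.subs.append(sub)
        return sub
```

Under these assumptions, a subscription with a scope entirely in the past behaves like a plain query, one entirely in the future behaves like a classic publish/subscribe subscription, and one that spans the present does both.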

    Staring into the abyss: An evaluation of concurrency control with one thousand cores

    Computer architectures are moving towards an era dominated by many-core machines with dozens or even hundreds of cores on a single chip. This unprecedented level of on-chip parallelism introduces a new dimension to scalability that current database management systems (DBMSs) were not designed for. In particular, as the number of cores increases, the problem of concurrency control becomes extremely challenging. With hundreds of threads running in parallel, the complexity of coordinating competing accesses to data will likely diminish the gains from increased core counts. To better understand just how unprepared current DBMSs are for future CPU architectures, we performed an evaluation of concurrency control for on-line transaction processing (OLTP) workloads on many-core chips. We implemented seven concurrency control algorithms on a main-memory DBMS and, using computer simulations, scaled our system to 1024 cores. Our analysis shows that all algorithms fail to scale to this magnitude, but for different reasons. In each case, we identify fundamental bottlenecks that are independent of the particular database implementation and argue that even state-of-the-art DBMSs suffer from these limitations. We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from the ground up and is tightly coupled with the hardware. Sponsor: Intel Corporation (Science and Technology Center for Big Data).

    Envisioning the FTC as a Facilitator of Blockchain Technology Adoption in the Direct-to-Consumer Genetic Testing Industry

    Seemingly overnight, the kingpins of the direct-to-consumer genetic testing (DTC-GT) industry shifted their focus from exploring their customers’ DNA to commodifying it. Companies like Ancestry or 23andMe that were once exclusively known as mere sources of “infotainment” now regularly sell consenting customers’ genetic data to pharmaceutical researchers or use it to develop drugs of their own. To gain these customers’ consent, both firms employ a series of long, complex clickwrap contracts that largely fail to apprise their readers of the potential risks of sharing their genetic data. Nor do these agreements provide any form of compensation to those consumers whose data ultimately facilitates the development of a new, profitable drug. Understandably, the relative autonomy major DTC-GT firms wield over their customers’ genetic information—and the manner in which that autonomy is gained—raises serious privacy and bioethical concerns. More directly, it reflects a stark lack of federal oversight of the data management and storage practices of the DTC-GT industry as a whole. The emerging patchwork of state consumer privacy laws—while certainly more robust than any existing federal legislation—likewise falls short in fully protecting the privacy and dignitary interests of the DTC-GT consumers whose genetic data is shared and mined for profit. This is not to say that DTC-GT consumers should be uniformly prohibited from contributing their genetic data to medicinal research. Such behavior should be encouraged to the extent this information can be transferred and stored securely. Nevertheless, the current exploitation of consumer data by major DTC-GT firms may, over the long term, inhibit medicinal progress by undermining demand for genetic testing and, thus, the pool of genetic data available for research. Accordingly, consumers and researchers alike would benefit from a more secure and equitable method of exchanging genetic information. 
This Note argues that the recent advent of “blockchain genomics,” a form of exchange that allows consumers to securely loan out their genetic information for research purposes in return for compensation, fits that bill. With mainstream DTC-GT firms unlikely to adopt such a system and no legislative solution on the horizon, this Note further suggests a role for the FTC, the country’s de facto privacy regulator, to nudge major DTC-GT firms in that direction by exercising various tools of its soft regulatory authority.

    SMART-KG: Hybrid Shipping for SPARQL Querying on the Web

    While Linked Data (LD) provides standards for publishing (RDF) and querying (SPARQL) Knowledge Graphs (KGs) on the Web, serving, accessing, and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the problem of availability by pushing query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper, we present smart-KG, a novel approach that shares the load between servers and clients while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that smart-KG outperforms state-of-the-art client-side solutions and increases server-side availability, moving towards more cost-effective and balanced hosting of open and decentralized KGs.

    Low Latency Geo-distributed Data Analytics

    Low latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single datacenter significantly hurts the timeliness of analytics. At the same time, running queries over geo-distributed inputs using the current intra-DC analytics frameworks also leads to high query response times because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low latency geo-distributed analytics. Iridium achieves low query response times by optimizing placement of both data and tasks of the queries. The joint data and task placement optimization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites prior to queries' arrivals, and places the tasks to reduce network bottlenecks during the query's execution. Finally, it also contains a knob to budget WAN usage. Evaluation across eight worldwide EC2 regions using production queries shows that Iridium speeds up queries by 3× to 19× and lowers WAN usage by 15% to 64% compared to existing baselines.
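
The idea of placing tasks to reduce network bottlenecks can be illustrated with a small sketch. The cost model here is an assumption, not taken from the abstract: each site holds some intermediate data, a site that runs a fraction `r` of the reduce tasks uploads the remaining `1 - r` of its own data and downloads a fraction `r` of everyone else's, and the slowest link determines the transfer time. The function names and the brute-force grid search are hypothetical and stand in for whatever optimization a real system would use.

```python
from itertools import product


def bottleneck_time(fractions, data, up, down):
    """Transfer time of the slowest link under the assumed cost model.

    fractions[i]: share of reduce tasks placed at site i (sums to 1)
    data[i]:      intermediate data held at site i
    up[i]/down[i]: uplink and downlink bandwidth of site i
    """
    total = sum(data)
    times = []
    for r, s, u, d in zip(fractions, data, up, down):
        upload = s * (1 - r) / u        # own data sent to tasks elsewhere
        download = r * (total - s) / d  # other sites' data pulled in
        times.append(max(upload, download))
    return max(times)


def best_placement(data, up, down, steps=20):
    """Grid search over task fractions in increments of 1/steps."""
    n = len(data)
    best = None
    for combo in product(range(steps + 1), repeat=n - 1):
        if sum(combo) > steps:
            continue
        # The last site receives whatever share is left over.
        fractions = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        t = bottleneck_time(fractions, data, up, down)
        if best is None or t < best[0]:
            best = (t, fractions)
    return best
```

For two sites with equal bandwidth but 300 and 100 units of data, the search under this model puts most of the tasks where most of the data already is, which halves the bottleneck time relative to an even split.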

    Scheduling High-Criticality Real-Time Systems (Escalonar sistemas de tempo-real de alta críticalidade)

    Cyclic executives are used to schedule safety-critical real-time systems because of their determinism, simplicity, and efficiency. One major challenge of the cyclic executive model is producing the cyclic scheduling timetable. This problem is related to the bin-packing problem [34] and is NP-hard in the strong sense. Unnecessary context switches within the scheduling table can introduce significant overhead; in IMA (Integrated Modular Avionics), cache-related overheads can increase task execution times by up to 33% [18]. Developed in the context of the Software Engineering Master’s Degree at ISEP, the Polytechnic Institute of Engineering in Porto, Portugal, this thesis makes two contributions to the scheduling literature. The first is an exact approach to computing the slack of a job set that is independent of the scheduling policy. The method introduces several operations to update and maintain the slack at runtime, ensuring that the slack of all jobs is valid and coherent. The second contribution is a preemptive scheduling algorithm for real-time safety-critical applications that focuses on minimizing the number of system preemptions within a reasonable amount of time. Both contributions have been implemented and extensively tested in Scala. Experimental results suggest that our scheduling algorithm has a non-preemptive schedulability ratio similar to that of Chain Window RM [69], yet a lower ratio at high utilizations than Chain Window EDF [69] and BB-Moore [68]. For task sets that could not be scheduled non-preemptively, 98-99% of all jobs are scheduled without preemptions. Given that our scheduler is preemptive, being able to compete with non-preemptive schedulers is a strong result. In terms of execution time, our proposal is multiple orders of magnitude faster than the aforementioned algorithms. Both contributions of this work are planned to be presented at future conferences such as RTSS@Work and RTAS.
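
The preemption metric mentioned above can be made concrete with a small sketch. This is not the thesis's scheduler or its slack computation; it only counts preemptions in a given timetable, assuming the table is a list of hypothetical `(start, end, job)` slots and that a job counts as preempted once for every gap between its consecutive slots.

```python
from collections import defaultdict


def count_preemptions(timetable):
    """Return {job: number of preemptions} for a cyclic scheduling table.

    timetable: iterable of (start, end, job) execution slots.
    Assumption: back-to-back slots of the same job are one uninterrupted run.
    """
    segments = defaultdict(list)
    for start, end, job in timetable:
        segments[job].append((start, end))

    preemptions = {}
    for job, segs in segments.items():
        segs.sort()
        count = 0
        # Compare each slot with the next one belonging to the same job.
        for (_, prev_end), (next_start, _) in zip(segs, segs[1:]):
            if next_start > prev_end:
                count += 1
        preemptions[job] = count
    return preemptions


def fraction_unpreempted(timetable):
    """Share of jobs that run to completion without being preempted."""
    counts = count_preemptions(timetable)
    return sum(1 for c in counts.values() if c == 0) / len(counts)
```

Applied to a generated timetable, `fraction_unpreempted` yields the kind of figure reported in the abstract, the share of jobs scheduled without preemptions.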