
    Survey of Vector Database Management Systems

    There are now over 20 commercial vector database management systems (VDBMSs), all produced within the past five years. But embedding-based retrieval has been studied for more than ten years, and similarity search for over half a century. Driving this shift from algorithms to systems are new data-intensive applications, notably large language models, that demand vast stores of unstructured data coupled with reliable, secure, fast, and scalable query processing capability. A variety of new data management techniques now exist for addressing these needs; however, there is no comprehensive survey to thoroughly review these techniques and systems. We start by identifying five main obstacles to vector data management, namely the vagueness of semantic similarity, the large size of vectors, the high cost of similarity comparison, the lack of natural partitioning that can be used for indexing, and the difficulty of efficiently answering hybrid queries that require both attributes and vectors. Overcoming these obstacles has led to new approaches to query processing, storage and indexing, and query optimization and execution. For query processing, a variety of similarity scores and query types are now well understood; for storage and indexing, techniques include vector compression, namely quantization, and partitioning based on randomization, learned partitioning, and navigable partitioning; for query optimization and execution, we describe new operators for hybrid queries, as well as techniques for plan enumeration, plan selection, and hardware-accelerated execution. These techniques lead to a variety of VDBMSs across a spectrum of design and runtime characteristics, including native systems specialized for vectors and extended systems that incorporate vector capabilities into existing systems. We then discuss benchmarks, and finally we outline research challenges and point the direction for future work.
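
    To ground the query-processing terms above, the sketch below shows a minimal brute-force top-k search under cosine similarity. It is an illustrative Python/NumPy example only (the function name, data shapes, and sizes are made up), and it is exactly the kind of linear scan that the quantization and partitioning techniques surveyed here are designed to avoid.

        import numpy as np

        def topk_cosine(query, vectors, k=5):
            """Return indices of the k stored vectors most similar to `query` by cosine score."""
            q = query / np.linalg.norm(query)
            v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
            scores = v @ q                    # cosine similarity against every stored vector
            return np.argsort(-scores)[:k]    # highest scores first

        # Example: 10,000 stored 128-dimensional embeddings, one query vector
        store = np.random.rand(10000, 128).astype(np.float32)
        print(topk_cosine(np.random.rand(128).astype(np.float32), store))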

    Efficient and Reliable Task Scheduling, Network Reprogramming, and Data Storage for Wireless Sensor Networks

    Wireless sensor networks (WSNs) typically consist of a large number of resource-constrained nodes. The limited computational resources afforded by these nodes present unique development challenges. In this dissertation, we consider three such challenges. The first challenge focuses on minimizing energy usage in WSNs through intelligent duty cycling. Limited energy resources dictate the design of many embedded applications, causing such systems to be composed of small, modular tasks scheduled periodically. In this model, each embedded device wakes, executes a task-set, and returns to sleep. These systems spend most of their time in a state of deep sleep to minimize power consumption. We refer to them as almost-always-sleeping (AAS) systems. We describe a series of task schedulers for AAS systems designed to maximize sleep time. We consider four scheduler designs, model their performance, and present detailed performance analysis results under varying load conditions. The second challenge focuses on a fast and reliable network reprogramming solution for WSNs based on incremental code updates. We first present VSPIN, a framework for developing incremental code update mechanisms to support efficient reprogramming of WSNs. VSPIN provides a modular testing platform on the host system to plug in and evaluate various incremental code update algorithms. The framework supports Avrdude, among the most popular Linux-based programming tools for AVR microcontrollers. Using VSPIN, we next present an incremental code update strategy to efficiently reprogram wireless sensor nodes. We adapt a linear-space, quadratic-time algorithm (Hirschberg's algorithm) for computing maximal common subsequences to build an edit map specifying the edit sequence required to transform the code running in a sensor network into a new code image. We then present a heuristic-based optimization strategy for efficient edit script encoding to reduce the edit map size. Finally, we present experimental results exploring the reduction in data size that it enables. The approach achieves reductions of 99.987% for simple changes, and between 86.95% and 94.58% for more complex changes, compared to full image transmissions, leading to significantly lower energy costs for wireless sensor network reprogramming. The third challenge focuses on enabling fast and reliable data storage in wireless sensor systems. A file storage system that is fast, lightweight, and reliable across device failures is important to safeguard the data that these devices record. A fast and efficient file system enables sensed data to be sampled and stored quickly and batched for later transmission. A reliable file system allows seamless operation without disruptions due to hardware, software, or other unforeseen failures. While flash technology provides persistent storage by itself, it has limitations that prevent it from being used in mission-critical deployment scenarios. Hybrid memory models that utilize newer non-volatile memory technologies, such as ferroelectric RAM (FRAM), can mitigate the physical disadvantages of flash. In this vein, we present the design and implementation of LoggerFS, a fast, lightweight, and reliable file system for wireless sensor networks, which uses a hybrid memory design consisting of RAM, FRAM, and flash. LoggerFS is engineered to provide fast data storage, have a small memory footprint, and provide data reliability across system failures. LoggerFS adapts a log-structured file system approach, augmented with data persistence and reliability guarantees. A caching mechanism allows for flash wear-leveling and fast data buffering. We present a performance evaluation of LoggerFS using a prototypical in-situ sensing platform and demonstrate improvements of between 50% and 800% for various workloads when using the FRAM write-back cache, compared to the implementation without the cache.
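
    The incremental-update idea can be illustrated with a small Python sketch: compare the old firmware image with the new one and emit only COPY/ADD instructions, so unchanged regions are never retransmitted. The sketch below uses Python's standard difflib rather than Hirschberg's algorithm, and is not VSPIN's actual encoding; the image contents are placeholders.

        from difflib import SequenceMatcher

        def build_edit_script(old_image: bytes, new_image: bytes):
            """Describe new_image as COPY ranges from old_image plus literal ADD data."""
            script = []
            matcher = SequenceMatcher(None, old_image, new_image)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag == "equal":
                    script.append(("COPY", i1, i2 - i1))      # bytes already on the node
                elif tag in ("replace", "insert"):
                    script.append(("ADD", new_image[j1:j2]))  # only the new bytes travel
                # 'delete' regions are simply dropped from the reconstructed image
            return script

        old = bytes(range(256)) * 4
        new = old[:500] + b"patched" + old[510:]
        print(build_edit_script(old, new))

    A node replays the COPY ranges from its current image and appends the ADD payloads to reconstruct the new image, which is why an edit-map approach can transmit far less data than a full image.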

    Ancient and historical systems


    Technical, Economic and Societal Effects of Manufacturing 4.0

    This open access book is among the first cross-disciplinary works on Manufacturing 4.0. It includes chapters on the technical, the economic, and the social aspects of this important phenomenon. Together, the material presented allows the reader to develop a holistic picture of where the manufacturing industry, and the parts of society that depend on it, may be going in the future. Manufacturing 4.0 is not only a technical change, nor is it a purely technically driven change; it is a societal change that has the potential to disrupt the way societies are constructed, both positively and negatively. This book will be of interest to scholars researching manufacturing, technological innovation, innovation management, and Industry 4.0.

    Parallel Algorithms for Time and Frequency Domain Circuit Simulation

    As one of the most critical forms of pre-silicon verification, transistor-level circuit simulation is an indispensable step before committing to an expensive manufacturing process. However, circuit simulation can be computationally expensive, especially for ever-larger transistor circuits with more complex device models, so it is becoming increasingly desirable to accelerate it. At the same time, the emergence of multi-core machines, alongside the established use of distributed-memory computing clusters, provides abundant hardware computing resources and a promising platform for faster simulation. This research addresses the limitations of traditional serial circuit simulation and proposes new techniques for both time-domain and frequency-domain parallel circuit simulation. For time-domain simulation, this dissertation presents a parallel transient simulation methodology. This new approach, called WavePipe, exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way resembling hardware pipelining. There are two embodiments of WavePipe: backward and forward pipelining schemes. While the former creates independent computing tasks that contribute to a larger future time step, the latter performs predictive computing along the forward direction. Unlike existing relaxation methods, WavePipe facilitates parallel circuit simulation without jeopardizing convergence or accuracy. As a coarse-grained parallel approach, it requires little parallel programming effort; furthermore, it creates new avenues for fully utilizing increasingly parallel hardware by going beyond conventional finer-grained parallel device model evaluation and matrix solution. This dissertation also exploits the recently developed explicit telescopic projective integration method for efficient parallel transient circuit simulation by addressing the stability limitation of explicit numerical integration. The new method allows the effective time step to be controlled by the accuracy requirement instead of the stability limit. Therefore, it not only leads to noticeable efficiency improvements but also lends itself to straightforward parallelization due to its explicit nature. For frequency-domain simulation, this dissertation presents a parallel harmonic balance approach, applicable to the steady-state and envelope-following analyses of both driven and autonomous circuits. The new approach is centered on a naturally parallelizable preconditioning technique that speeds up the core computation in harmonic-balance-based analysis. The proposed method facilitates parallel computing via the use of domain knowledge and simplifies parallel programming compared with fine-grained strategies. As a result, favorable runtime speedups are achieved.
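
    To illustrate the projective-integration idea mentioned above, here is a generic one-level sketch in Python, not the dissertation's telescopic implementation; the test problem and step-size parameters are invented for demonstration. A few small explicit Euler steps damp the fast transients, and one large extrapolated step then advances the slow dynamics, so the effective step size is governed by accuracy rather than by the explicit stability limit.

        import numpy as np

        def projective_forward_euler(f, y0, t0, t_end, h, k=4, M=10):
            """One-level projective integration: k+1 small explicit Euler steps,
            then one large extrapolated ('projective') step of size M*h.
            End-point alignment is ignored for brevity."""
            t, y = t0, np.asarray(y0, dtype=float)
            while t < t_end:
                for _ in range(k):                 # inner steps damp the stiff transients
                    y = y + h * f(t, y)
                    t += h
                y_prev = y
                y = y + h * f(t, y)                # one more inner step
                t += h
                slope = (y - y_prev) / h           # chord slope of the remaining slow dynamics
                y = y + M * h * slope              # big projective (outer) step
                t += M * h
            return t, y

        # Stiff linear test problem dy/dt = -50*(y - cos(t))
        f = lambda t, y: -50.0 * (y - np.cos(t))
        print(projective_forward_euler(f, y0=[0.0], t0=0.0, t_end=1.0, h=0.002))

    Because every step is explicit, such schemes lend themselves to straightforward parallelization, which is the property the dissertation exploits.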

    Exploiting Heterogeneity in Chip-Multiprocessor Design

    Over the past decade, semiconductor manufacturers have persisted in building faster and smaller transistors in order to boost processor performance as projected by Moore’s Law. Recently, as we enter the deep submicron regime, continuing the same pace of processor development has become increasingly difficult due to constraints on power, temperature, and the scalability of transistors. To overcome these challenges, researchers have proposed several innovations at both the architecture and device levels that partially solve these problems. These diversities in processor architecture and manufacturing materials offer ways to continue Moore’s Law by effectively exploiting heterogeneity; however, they also introduce a set of unprecedented challenges that have rarely been addressed in prior work. In this dissertation, we present a series of in-depth studies to comprehensively investigate the design and optimization of future multi-core and many-core platforms through exploiting heterogeneities. First, we explore a large design space of heterogeneous chip multiprocessors by exploiting architectural- and device-level heterogeneities, aiming to identify the optimal design patterns leading to attractive energy and cost efficiencies at the pre-silicon stage. After this high-level study, we pay specific attention to architectural asymmetry, aiming to develop a heterogeneity-aware task scheduler that optimizes energy efficiency on a given single-ISA heterogeneous multiprocessor. An advanced statistical tool is employed to facilitate the algorithm development. In the third study, we shift our focus to device-level heterogeneity and propose to effectively leverage the advantages provided by different materials to solve the increasingly important reliability issue for future processors.
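
    As a toy illustration of heterogeneity-aware scheduling on a single-ISA asymmetric multiprocessor (this is not the dissertation's statistical approach; the core parameters, task sizes, and scoring weight are hypothetical), the Python sketch below greedily places each task on the core with the lowest combined score, where the weight alpha trades per-task energy against queueing delay on each core.

        from dataclasses import dataclass

        @dataclass
        class Core:
            name: str
            freq_ghz: float       # clock frequency
            power_w: float        # average active power draw
            busy_s: float = 0.0   # work already queued on this core, in seconds

        def schedule(task_gcycles, cores, alpha=1.0):
            """Greedy energy-aware mapping of tasks (sized in giga-cycles) onto cores."""
            plan = []
            for cycles in sorted(task_gcycles, reverse=True):
                def score(c):
                    runtime = cycles / c.freq_ghz                  # seconds this task would take
                    energy = c.power_w * runtime                   # joules this task would cost
                    return energy + alpha * (c.busy_s + runtime)   # penalize crowded cores
                best = min(cores, key=score)
                best.busy_s += cycles / best.freq_ghz
                plan.append((cycles, best.name))
            return plan

        big = Core("big", freq_ghz=2.4, power_w=2.0)
        little = Core("little", freq_ghz=1.2, power_w=0.4)
        print(schedule([3.0, 1.2, 0.6, 0.3], [big, little]))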

    Amplicon-Based Analysis of the Bacterial Community of Forest Soils (Amplikoni põhine metsamuldade bakterikoosluse analüüs)

    The electronic version of this dissertation does not include the publications. Soil, as a central agent in many ecological processes, has received research attention from many different angles, but investigation of its rich microbiome has long been hindered by the fact that most soil microbes cannot be cultivated. This gap can be filled by metagenomics, the study of genetic material acquired directly from environmental samples. The analysis of 16S rDNA data usually begins with the construction of operational taxonomic units (OTUs): clusters of reads that differ by less than a fixed sequence dissimilarity threshold. The resulting sample-by-OTU abundance table then serves as the basis for further statistical and exploratory analysis, with OTUs standing in for conventional taxonomic units. Alongside the rapid development of sequencing technologies, the last decade has produced a plethora of OTU clustering tools based on different principles and with different computational requirements. This work examines how the choice of OTU clustering method affects downstream analyses and conclusions, and compares the methods themselves. We used the dataset published in “Bacterial community structure and its relationship to soil physico-chemical characteristics in alder stands with different management histories” and analysed it with several software packages: Mothur, UCLUST, CROP, and Swarm. The results of the analyses were on the whole similar and comparable, and the differences in OTU numbers and diversity indices between clustering methods were not statistically significant. The CROP and UCLUST methods stood out for their quality and usability. The work also showed the practicality of robust statistical methods when working with OTU data.
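
    To make the OTU construction step concrete, here is a minimal greedy-clustering sketch in Python in the spirit of UCLUST-style centroid clustering; the identity function, threshold, and reads are simplified placeholders, and real pipelines align reads and handle quality filtering and chimeras.

        def identity(a, b):
            """Naive per-position identity for equal-length reads (real tools align first)."""
            return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

        def greedy_otu_cluster(reads, threshold=0.97):
            """Each read joins the first centroid it matches at >= threshold identity,
            otherwise it seeds a new OTU; returns (centroid, abundance) pairs."""
            centroids, counts = [], []
            for read in reads:
                for i, c in enumerate(centroids):
                    if identity(read, c) >= threshold:
                        counts[i] += 1
                        break
                else:
                    centroids.append(read)
                    counts.append(1)
            return list(zip(centroids, counts))

        reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGTACGGAC", "ACGTACGTAC"]
        print(greedy_otu_cluster(reads, threshold=0.9))

    The resulting abundance pairs are exactly the kind of sample-by-OTU counts that feed the diversity-index and community-composition analyses compared in this work.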