1,294 research outputs found
SST: Integrated Fluorocarbon Microsensor System Using Catalytic Modification
Selective, sensitive, and reliable sensors are urgently needed to detect air-borne halogenated volatile organic compounds (VOCs). This broad class of compounds includes chlorine, fluorine, bromine, and iodine containing hydrocarbons used as solvents, refrigerants, herbicides, and more recently as chemical warfare agents (CWAs). It is important to be able to detect very low concentrations of halocarbon solvents and insecticides because of their acute health effects even in very low concentrations. For instance, the nerve agent sarin (isopropyl methylphosphonofluoridate), first developed as an insecticide by German chemists in 1938, is so toxic that a ten minute exposure at an airborne concentration of only 65 parts per billion (ppb) can be fatal. Sarin became a household term when religious cult members on Tokyo subway trains poisoned over 5,500 people, killing 12. Sarin and other CWAs remain a significant threat to the health and safety of the general public. The goal of this project is to design a sensor system to detect and identify the composition and concentration of fluorinated VOCs. The system should be small, robust, compatible with metal oxide semiconductor (MOS) technology, cheap, if produced in large scale, and has the potential to be versatile in terms of low power consumption, detection of other gases, and integration in a portable system. The proposed VOC sensor system has three major elements that will be integrated into a microreactor flow cell: a temperature-programmable microhotplate array/reactor system which serves as the basic sensor platform; an innovative acoustic wave sensor, which detects material removal (instead of deposition) to verify and quantify the presence of fluorine; and an intelligent method, support vector machines, that will analyze the complex and high dimensional data furnished by the sensor system. The superior and complementary aspects of the three elements will be carefully integrated to create a system which is more sensitive and selective than other CWA detection systems that are commercially available or described in the research literature. While our sensor system will be developed to detect fluorinated VOCs, it can be adapted for other applications in which a target analyte can be catalytically converted for selective detection. Therefore, this investigation will examine the relationships between individual sensor element performance and joint sensor platform performance, integrated with state-of-the-art data analysis techniques. During development of the sensor system, the investigators will consider traditional reactor design concepts such as mass transfer and residence time effects, and will apply them to the emerging field of microsystems. The proposed research will provide the fundamental basis and understanding for examining multifunctional sensor platforms designed to provide extreme selectivity to targeted molecules. The project will involve interdisciplinary researchers and students and will connect to K-12 and RET programs for underrepresented students from rural areas
LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves
The recently proposed learned indexes have attracted much attention as they
can adapt to the actual data and query distributions to attain better search
efficiency. Based on this technique, several existing works build up indexes
for multi-dimensional data and achieve improved query performance. A common
paradigm of these works is to (i) map multi-dimensional data points to a
one-dimensional space using a fixed space-filling curve (SFC) or its variant
and (ii) then apply the learned indexing techniques. We notice that the first
step typically uses a fixed SFC method, such as row-major order and z-order. It
definitely limits the potential of learned multi-dimensional indexes to adapt
variable data distributions via different query workloads. In this paper, we
propose a novel idea of learning a space-filling curve that is carefully
designed and actively optimized for efficient query processing. We also
identify innovative offline and online optimization opportunities common to
SFC-based learned indexes and offer optimal and/or heuristic solutions.
Experimental results demonstrate that our proposed method, LMSFC, outperforms
state-of-the-art non-learned or learned methods across three commonly used
real-world datasets and diverse experimental settings.Comment: Extended Version. Accepted by VLDB 202
Efficient bulk-loading methods for temporal and multidimensional index structures
Nahezu alle naturwissenschaftlichen Bereiche profitieren von neuesten Analyse- und Verarbeitungsmethoden fĂŒr groĂe Datenmengen. Diese Verfahren setzten eine effiziente Verarbeitung von geo- und zeitbezogenen Daten voraus, da die Zeit und die Position wichtige Attribute vieler Daten
sind. Die effiziente Anfrageverarbeitung wird insbesondere durch den Einsatz von Indexstrukturen
ermöglicht. Im Fokus dieser Arbeit liegen zwei Indexstrukturen: Multiversion B-Baum
(MVBT) und R-Baum. Die erste Struktur wird fĂŒr die Verwaltung von zeitbehafteten Daten,
die zweite fĂŒr die Indexierung von mehrdimensionalen Rechteckdaten eingesetzt.
StĂ€ndig- und schnellwachsendes Datenvolumen stellt eine groĂe Herausforderung an die Informatik
dar. Der Aufbau und das Aktualisieren von Indexen mit herkömmlichen Methoden (Datensatz
fĂŒr Datensatz) ist nicht mehr effizient. Um zeitnahe und kosteneffiziente Datenverarbeitung
zu ermöglichen, werden Verfahren zum schnellen Laden von Indexstrukturen dringend benötigt.
Im ersten Teil der Arbeit widmen wir uns der Frage, ob es ein Verfahren fĂŒr das Laden von MVBT
existiert, das die gleiche I/O-KomplexitÀt wie das externe Sortieren besitz. Bis jetzt blieb diese
Frage unbeantwortet. In dieser Arbeit haben wir eine neue Kostruktionsmethode entwickelt und
haben gezeigt, dass diese gleiche ZeitkomplexitÀt wie das externe Sortieren besitzt. Dabei haben
wir zwei algorithmische Techniken eingesetzt: Gewichts-Balancierung und Puffer-BĂ€ume. Unsere
Experimenten zeigen, dass das Resultat nicht nur theoretischer Bedeutung ist.
Im zweiten Teil der Arbeit beschÀftigen wir uns mit der Frage, ob und wie statistische Informationen
ĂŒber Geo-Anfragen ausgenutzt werden können, um die Anfrageperformanz von R-BĂ€umen zu
verbessern. Unsere neue Methode verwendet Informationen wie SeitenverhÀltnis und SeitenlÀngen
eines reprĂ€sentativen Anfragerechtecks, um einen guten R-Baum bezĂŒglich eines hĂ€ufig eingesetzten
Kostenmodells aufzubauen. Falls diese Informationen nicht verfĂŒgbar sind, optimieren
wir R-BĂ€ume bezĂŒglich der Summe der Volumina von minimal umgebenden Rechtecken der Blattknoten.
Da das Problem des Aufbaus von optimalen R-BĂ€umen bezĂŒglich dieses KostenmaĂes
NP-hart ist, fĂŒhren wir zunĂ€chst das Problem auf ein eindimensionales Partitionierungsproblem
zurĂŒck, indem wir die Daten bezĂŒglich optimierte raumfĂŒllende Kurven sortieren. Dann lösen
wir dieses Problem durch Einsatz vom dynamischen Programmieren. Die I/O-KomplexitÀt des
Verfahrens ist gleich der von externem Sortieren, da die I/O-Laufzeit der Methode durch die
Laufzeit des Sortierens dominiert wird.
Im letzten Teil der Arbeit haben wir die entwickelten Partitionierungsvefahren fĂŒr den Aufbau
von Geo-Histogrammen eingesetzt, da diese Àhnlich zu R-BÀumen eine disjunkte Partitionierung
des Raums erzeugen. Ergebnisse von intensiven Experimenten zeigen, dass sich unter Verwendung
von neuen Partitionierungstechniken sowohl R-BĂ€ume mit besserer Anfrageperformanz als
auch Geo-Histogrammen mit besserer SchÀtzqualitÀt im Vergleich zu Konkurrenzverfahren generieren
lassen
Scalable data retrieval in a mobile environment
Retrieving multidimensional data out of distributed systems becomes increasingly important. But applications of these systems are often not only interested in data vectors that match certain queries. Instead, many applications demand for retrieval of data with high quality. In this thesis, we design a distributed system that can be used by applications to retrieve data of high quality for arbitrary multidimensional queries. Major challenges for the quality-based data retrieval are to 1.) find an appropriate formalization of data quality, 2.) design routing algorithms for queries, that are robust in the presence of high dynamics with respect to the participants of the system and the data on the participants and 3.) handle heterogeneous and high-dimensional data in the system. In order to retrieve data quality, we propose 1.) the measure of confidence for a query that is based on clusters of data. When a participant of the system finds, that its confidence for a query is high, it will assume to possess data of high quality for that query. 2.) Further, we design and implement routing strategies in order to route queries to nodes that can answer them with high confidence. Maintaining exact routing tables for each possible query would be infeasible, so nodes have to model the data that can be reached via neighbours in routing models. Such modelling of data is based on structural properties of the data such as how good the data can be clustered. 3.) In the high-dimensional space, we have to overcome the curse of dimensionality: the structure of data can become invisible in higher dimensions. We address this problem with a method for dimensionality reduction that reduces the dimensions with highest data variance. The evaluation of our approaches shows a high accuracy of query routing, even if our approaches do not make use of scalability bottlenecks like flooding of the query or flooding of routing information. Further, we show that the use of dimensionality reduction in routing has positive influence on the routing accuracy. We think that the methods in our approach can be useful instruments, whenever the task of retrieving data of high quality has to be outsourced to a distributed system
6 Access Methods and Query Processing Techniques
The performance of a database management system (DBMS) is fundamentally dependent on the access methods and query processing techniques available to the system. Traditionally, relational DBMSs have relied on well-known access methods, such as the ubiquitous B +-tree, hashing with chaining, and, in som
Design and Implementation of a Middleware for Uniform, Federated and Dynamic Event Processing
In recent years, real-time processing of massive event streams has become an important topic in the area of data analytics. It will become even more important in the future due to cheap sensors, a growing amount of devices and their ubiquitous inter-connection also known as the Internet of Things (IoT). Academia, industry and the open source community have developed several event processing (EP) systems that allow users to define, manage and execute continuous queries over event streams. They achieve a significantly better performance than the traditional store-then-process'' approach in which events are first stored and indexed in a database. Because EP systems have different roots and because of the lack of standardization, the system landscape became highly heterogenous. Today's EP systems differ in APIs, execution behaviors and query languages. This thesis presents the design and implementation of a novel middleware that abstracts from different EP systems and provides a uniform API, execution behavior and query language to users and developers. As a consequence, the presented middleware overcomes the problem of vendor lock-in and different EP systems are enabled to cooperate with each other. In practice, event streams differ dramatically in volume and velocity. We show therefore how the middleware can connect to not only different EP systems, but also database systems and a native implementation. Emerging applications such as the IoT raise novel challenges and require EP to be more dynamic. We present extensions to the middleware that enable self-adaptivity which is needed in context-sensitive applications and those that deal with constantly varying sets of event producers and consumers. Lastly, we extend the middleware to fully support the processing of events containing spatial data and to be able to run distributed in the form of a federation of heterogenous EP systems
- âŠ