Efficient Aggregated Deliveries with Strong Guarantees in an Event-based Distributed System
A popular approach to designing large-scale distributed systems is to follow an event-based approach, in which a set of software components interact by producing and consuming events. The event-based model decouples software components, allowing distributed systems to scale to a large number of components. Event correlation enables higher-order reasoning over events by constructing complex events from single, consumable events. In many cases, event correlation applications rely on centralized setups or broker overlay networks. Centralized setups offer stronger guarantees for complex event delivery, but they create performance bottlenecks and single points of failure. With broker overlays, performance and fault tolerance are improved, but at the cost of weaker guarantees
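As background, the idea of constructing a complex event from single, consumable events can be sketched as follows. This is a minimal, hypothetical correlator; the class and its API are invented for illustration and do not reflect the system described in the abstract:

```python
class Correlator:
    """Minimal event correlator (hypothetical API, for illustration):
    emits a complex event once every simple event type in the pattern
    has been observed."""

    def __init__(self, pattern, complex_name):
        self.pattern = set(pattern)   # simple event types to match
        self.complex_name = complex_name
        self.seen = {}

    def consume(self, event_type, payload):
        """Consume a simple event; return the complex event when the
        full pattern has been matched, else None."""
        if event_type in self.pattern:
            self.seen[event_type] = payload
        if set(self.seen) == self.pattern:
            complex_event = (self.complex_name, dict(self.seen))
            self.seen.clear()         # simple events are consumed exactly once
            return complex_event
        return None

# A 'shipment' complex event built from two simple events.
c = Correlator({"ordered", "paid"}, "shipment")
first = c.consume("ordered", {"order_id": 1})   # pattern incomplete: None
shipment = c.consume("paid", {"order_id": 1})   # complex event emitted
```

Clearing the matched events models the "consumable" semantics: each simple event contributes to at most one complex event.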
Anatomy of a Native XML Base Management System
Several alternatives for managing large XML document collections exist, ranging from file systems through relational or other database systems to specifically tailored XML repositories. In this paper we give a tour of Natix, a database management system designed from scratch for storing and processing XML data. Contrary to the common belief that management of XML data is just another application for traditional databases such as relational systems, we illustrate how almost every component in a database system is affected in terms of adequacy and performance. We show how to design and optimize areas such as storage, transaction management (comprising recovery and multi-user synchronisation), and query processing for XML
Research, development and evaluation of a practical model for sentiment analysis
Sentiment Analysis is the task of extracting subjective information from input sources
produced by a speaker or writer. Usually it refers to identifying whether a text holds a
positive or negative polarity. The main approaches to Sentiment Analysis are
lexicon- or dictionary-based methods and machine learning schemes. Lexicon-based models
make use of a predefined set of words, where each word in the set has an
associated polarity. Document polarity then depends on the feature selection method and how
the individual scores are combined. Machine-learning approaches usually rely on supervised classifiers.
Although classifiers offer adaptability to specific contexts, they need to be trained with huge
amounts of labelled data, which may not be available, especially for upcoming topics.
This project, contrary to most scientific research in this field, aims to go further than
polarity detection and focuses on identifying the actual emotions expressed in documents,
instead of only their positive or negative connotation. The set of
emotions used in this approach is taken from Plutchik's wheel of emotions,
which defines eight basic bipolar emotions and another eight advanced emotions, each composed
of two basic ones. Moreover, in this project we have created a new scheme for Sentiment Analysis that combines
a lexicon-based model for obtaining term emotions with a statistical approach for identifying the
most relevant topics in the document, which are the targets of the sentiments. By taking this
approach we have tried to overcome the disadvantages of simple bag-of-words models, which
make no distinction between parts of speech (POS) and weight all words with
the tf-idf scheme, thereby overweighting the most frequently used words. Furthermore,
this project presents a heuristic learning method that
refines the initial knowledge so that it converges towards human-like sensitivity.
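A minimal sketch of the lexicon-based half of such a scheme, assuming a toy lexicon mapping words to Plutchik's basic emotions. The words and function names are invented for illustration; the project's actual lexicon, POS handling, and topic model are not reproduced here:

```python
from collections import Counter

# Toy emotion lexicon over Plutchik's eight basic emotions
# (entries are invented for illustration).
LEXICON = {
    "happy": "joy", "grateful": "joy",
    "afraid": "fear", "worried": "fear",
    "angry": "anger", "furious": "anger",
    "sad": "sadness", "lonely": "sadness",
}

def emotion_profile(text):
    """Count lexicon hits per basic emotion in a whitespace-tokenised text."""
    tokens = text.lower().split()
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)

def dominant_emotion(text):
    """Return the most frequent emotion in the text, or None if no
    lexicon word occurs."""
    profile = emotion_profile(text)
    return profile.most_common(1)[0][0] if profile else None
```

A real system would additionally weight terms by the document's relevant topics rather than counting all hits equally.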
To test the proposed scheme's performance, an Android application for mobile devices
has been developed. The app allows users to take photos and enter descriptions, which
are processed and classified with emotions. The classification can be corrected by the user
so that system performance statistics can be extracted
On the Limits and Practice of Automatically Designing Self-Stabilization
A protocol is said to be self-stabilizing when the distributed system executing it is guaranteed to recover from any fault that does not cause permanent damage. Designing such protocols is hard since they must recover from all possible states; we therefore investigate how feasible it is to synthesize them automatically. We show that synthesizing stabilization on a fixed topology is NP-complete in the number of system states. When a solution is found, we further show that verifying its correctness on a general topology (with any number of processes) is undecidable, even for very simple unidirectional rings. Despite these negative results, we develop an algorithm to synthesize a self-stabilizing protocol given its desired topology, legitimate states, and behavior. By analogy to shadow puppetry, where a puppeteer may design a complex puppet to cast a desired shadow, a protocol may need to be designed in a complex way that does not even resemble its specification. Our shadow/puppet synthesis algorithm addresses this concern and, using a complete backtracking search, has automatically designed four new self-stabilizing protocols with minimal process space requirements: 2-state maximal matching on bidirectional rings, 5-state token passing on unidirectional rings, 3-state token passing on bidirectional chains, and 4-state orientation on daisy chains
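The synthesized protocols themselves are not given in the abstract; as background, Dijkstra's classic K-state protocol illustrates what self-stabilizing token passing on a unidirectional ring looks like. This is a simulation sketch under a central daemon, assuming k at least the number of processes:

```python
def stabilize(states, k):
    """Simulate Dijkstra's K-state self-stabilizing token ring
    (unidirectional, n processes, k >= n states) until exactly one
    process is privileged, i.e. holds the token. Mutates `states`
    in place and returns the number of steps taken."""
    n = len(states)
    steps = 0
    while True:
        # Process 0 is privileged when it equals its predecessor (the
        # last process); every other process is privileged when it
        # differs from its predecessor.
        privileged = [i for i in range(n)
                      if (i == 0 and states[0] == states[-1])
                      or (i > 0 and states[i] != states[i - 1])]
        if len(privileged) == 1:
            return steps          # legitimate state: a single token circulates
        i = privileged[0]         # central daemon fires one enabled process
        if i == 0:
            states[0] = (states[-1] + 1) % k
        else:
            states[i] = states[i - 1]
        steps += 1

# From an arbitrary (faulty) configuration, the ring converges.
ring = [3, 1, 4, 1]
steps = stabilize(ring, 5)
```

Whatever state the ring is corrupted into, only legitimate states with a single circulating token remain once it converges, which is the recovery property the abstract's synthesis algorithm targets.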
Rule-Based Dynamic Modification of Workflows in a Medical Domain
A major limitation of current workflow systems is their lack of support for dynamic workflow modifications. However, this functionality is a major requirement for next-generation systems in order to provide sufficient flexibility to cope with unexpected situations and failures. For example, our experience with data-intensive medical domains such as cancer therapy shows that the large number of medical exceptions is hard to manage for domain experts. We have therefore developed a rule-based approach for partially automated management of semantic exceptions during workflow instance execution. When an exception occurs, we automatically determine which running workflow instances, and which workflow regions within them, are affected, and adjust the control flow. Rules are used to detect semantic exceptions and to decide which activities have to be dropped or added. For dynamic modification of an affected workflow instance, we provide two algorithms (the drcd- and p-algorithms) which locate appropriate deletion or insertion points and carry out the dynamic change of control flow
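The rule-based adjustment described above might be sketched as follows. The rule table, activity names, and `apply_exception` helper are hypothetical and heavily simplified; in particular, the placement logic of the drcd- and p-algorithms is not modeled:

```python
# Hypothetical rule table: a semantic exception maps to activities
# that must be dropped from and inserted into the running instance.
RULES = {
    "allergy_detected": {"drop": ["administer_drug_A"],
                         "insert": ["administer_drug_B"]},
}

def apply_exception(workflow, exception):
    """Adjust the remaining control flow of a workflow instance
    (modeled as an ordered list of activities) for an exception."""
    rule = RULES.get(exception)
    if rule is None:
        return workflow               # no rule: leave it to the domain expert
    adjusted = [a for a in workflow if a not in rule["drop"]]
    # Naive insertion point: append at the end. The drcd- and
    # p-algorithms locate proper insertion points instead.
    return adjusted + rule["insert"]
```

The point of the sketch is the separation of concerns: rules decide *what* to drop or add, while a placement algorithm decides *where* in the control flow the change happens.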
Garbage Collection for General Graphs
Garbage collection is moving from being a utility to a requirement of every modern programming language. With multi-core and distributed systems, most programs written recently are heavily multi-threaded and distributed. Distributed and multi-threaded programs are called concurrent programs. Manual memory management is cumbersome and difficult in concurrent programs. Concurrent programming is characterized by multiple independent processes/threads, communication between processes/threads, and uncertainty in the order of concurrent operations. The uncertainty in the order of operations makes manual memory management of concurrent programs difficult. A popular alternative to garbage collection in concurrent programs is to use smart pointers. Smart pointers can collect all garbage only if the developer identifies the cycles being created in the reference graph. Smart pointer usage does not guarantee protection from memory leaks unless cycles can be detected as processes/threads create them. General garbage collectors, on the other hand, can avoid memory leaks, dangling pointers, and double-deletion problems in any programming environment without help from the programmer. Concurrent programming is used in shared-memory and distributed-memory systems. State-of-the-art shared-memory systems use a single concurrent garbage collector thread that processes the reference graph. Distributed-memory systems have very few complete garbage collection algorithms, and those that exist use global barriers, are centralized, and do not scale well. This thesis focuses on designing garbage collection algorithms for shared-memory and distributed-memory systems that satisfy the following properties: concurrent, parallel, scalable, localized (decentralized), low pause time, high promptness, no global synchronization, safe, complete, and operating in linear time
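The cycle problem mentioned above can be illustrated with a toy tracing collector: two objects referencing each other keep each other's reference count nonzero forever, but a mark-and-sweep pass reclaims them. Class and function names here are invented for illustration:

```python
class Obj:
    """A heap object with a name and outgoing references."""
    def __init__(self, name):
        self.name = name
        self.refs = []

def mark_and_sweep(roots, heap):
    """Return the objects a tracing collector would reclaim:
    everything in `heap` not reachable from `roots`, including
    unreachable cycles."""
    marked = set()
    stack = list(roots)
    while stack:                      # mark phase: trace from the roots
        o = stack.pop()
        if id(o) not in marked:
            marked.add(id(o))
            stack.extend(o.refs)
    # sweep phase: everything unmarked is garbage
    return [o for o in heap if id(o) not in marked]

# Two objects forming a cycle, unreachable from any root: reference
# counting (smart pointers) never frees them, tracing reclaims both.
a, b = Obj("a"), Obj("b")
a.refs.append(b)
b.refs.append(a)
garbage = mark_and_sweep(roots=[], heap=[a, b])
```

This is the single-threaded textbook version; the thesis's concern is doing the equivalent tracing concurrently and without global synchronization.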
From software failure to explanation
“Why does my program crash?”—This ever recurring question drives the developer both when trying to reconstruct a failure that happened in the field and during the analysis and debugging of the test case that captures the failure.
This is the question this thesis attempts to answer. For that I will present two approaches which, when combined, start off with only a dump of the memory at the moment of the crash (a core dump) and eventually give a full explanation of the failure in terms of the important runtime features of the program such as critical branches, state predicates or any other execution aspect that is deemed helpful for understanding the underlying problem.
The first approach (called RECORE) takes a core dump of a crash and, by means of search-based test case generation, comes up with a small, self-contained, and easy-to-understand unit test that is similar to a test attached to a bug report and reproduces the failure. This test case can serve as a starting point for analysis and manual debugging. Our evaluation shows that in five out of seven real cases, the resulting test captures the essence of the failure.
But this failing test case can also serve as the starting point for the second approach (called BUGEX). BUGEX is a universal debugging framework that applies the scientific method and can be implemented for arbitrary runtime features (called facts). First it observes those facts during the execution of the failing test case. Using state-of-the-art statistical debugging, these facts are then correlated to the failure, forming a hypothesis. Then it performs experiments: it generates additional executions to challenge these facts and refines the hypothesis from these additional observations. The result is a correlation of critical execution aspects to the failure with unprecedented accuracy, instantaneously pointing the developer to the problem. This general debugging framework can be implemented for any runtime aspect; for evaluation purposes I implemented it for branches and state predicates. The evaluation shows that in six out of seven real cases, the resulting facts pinpoint the failure.
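The statistical-debugging step of correlating observed facts with failing runs can be sketched as follows. This scores each fact by the failure probability of the runs that exhibit it, which is a simplification, not the exact metric BUGEX uses; the function name is invented:

```python
def failure_correlation(runs):
    """Rank facts (e.g. branches taken, state predicates observed) by
    how strongly they correlate with failure, in the spirit of
    statistical debugging. `runs` is a list of (facts, failed) pairs,
    where `facts` is the set of facts observed in that run."""
    facts = {f for observed, _ in runs for f in observed}
    scores = {}
    for f in facts:
        outcomes = [failed for observed, failed in runs if f in observed]
        scores[f] = sum(outcomes) / len(outcomes)   # P(failure | fact)
    return sorted(scores, key=scores.get, reverse=True)

# One original failing run plus generated executions that challenge
# the facts; fact "p" only ever appears in failing runs.
runs = [({"p", "q"}, True), ({"p"}, True), ({"q"}, False)]
ranking = failure_correlation(runs)
```

Generating additional executions is what sharpens the ranking: facts that also occur in passing runs lose score, so the surviving top facts pinpoint the failure.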
Both approaches are independent from one another, and each automates a tedious and error-prone task. When combined, they automate a large part of the debugging process; the remaining manual task of fixing the defect can never be fully automated
Complete Model-Based Testing Applied to the Railway Domain
Testing is the most important verification technique for asserting the correctness of an embedded system. Model-based testing (MBT) is a popular approach that generates test cases from models automatically. For the verification of safety-critical systems, complete MBT strategies are most promising. Complete testing strategies can guarantee that all errors of a certain kind are revealed by the generated test suite, given that the system under test fulfils several hypotheses. This work presents a complete testing strategy based on equivalence class abstraction. Using this approach, reactive systems with a potentially infinite input domain but finitely many internal states can be abstracted to finite-state machines. This allows for the generation of finite test suites providing completeness. However, for a system under test, it is hard to prove the validity of the hypotheses which justify the completeness of the applied testing strategy. Therefore, we experimentally evaluate the fault-detection capabilities of our equivalence class testing strategy in this work. We use a novel mutation-analysis strategy which introduces artificial errors into a SystemC model to mimic typical HW/SW integration errors. We provide experimental results that show the adequacy of our approach on case studies from the railway domain (a speed-monitoring function and an interlocking-system controller) and from the automotive domain (an airbag controller). Furthermore, we present extensions to the equivalence class testing strategy. We show that a combination with randomisation and boundary-value selection significantly increases the probability of detecting HW/SW integration errors
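The combination of equivalence class abstraction with boundary-value selection can be sketched for a single numeric input. The intervals and names are invented, and the real strategy partitions the input space of a reactive model rather than one variable, but the selection principle is the same:

```python
def representatives(classes, boundary=True):
    """Pick test inputs from equivalence classes of a numeric input
    domain: one interior value per class and, optionally, the class
    boundaries (boundary-value selection). `classes` is a list of
    (lo, hi) interval pairs."""
    inputs = []
    for lo, hi in classes:
        inputs.append((lo + hi) / 2)          # interior representative
        if boundary:
            inputs.extend([lo, hi])           # boundary values
    return inputs

# Speed-monitoring style example: hypothetical classes for 'below
# limit', 'warning band', and 'braking band' (intervals invented).
classes = [(0, 160), (160, 165), (165, 200)]
tests = representatives(classes)
```

Within one equivalence class the abstracted system behaves uniformly, so finitely many representatives suffice; adding the boundaries targets exactly the off-by-one integration errors the evaluation found them effective against.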