1,894 research outputs found
Query Stability in Monotonic Data-Aware Business Processes [Extended Version]
Organizations continuously accumulate data, often according to some business
processes. If one poses a query over such data for decision support, it is
important to know whether the query is stable, that is, whether the answers
will stay the same or may change in the future because business processes may
add further data. We investigate query stability for conjunctive queries. To
this end, we define a formalism that combines an explicit representation of the
control flow of a process with a specification of how data is read and inserted
into the database. We consider different restrictions of the process model and
the state of the system, such as negation in conditions, cyclic executions,
read access to written data, presence of pending process instances, and the
possibility to start fresh process instances. We identify for which facet
combinations stability of conjunctive queries is decidable and provide
encodings into variants of Datalog that are optimal with respect to the
worst-case complexity of the problem.Comment: This report is the extended version of a paper accepted at the 19th
International Conference on Database Theory (ICDT 2016), March 15-18, 2016 -
Bordeaux, Franc
Semantic Service Description Framework for Efficient Service Discovery and Composition
Web services have been widely adopted as a new distributed system technology by industries in the areas of, enterprise application integration, business process management, and virtual organisation. However, lack of semantics in current Web services standards has been a major barrier in the further improvement of service discovery and composition. For the last decade, Semantic Web Services have become an important research topic to enrich the semantics of Web services. The key objective of Semantic Web Services is to achieve automatic/semi-automatic Web service discovery, invocation, and composition. There are several existing semantic Web service description frameworks, such as, OWL-S, WSDL-S, and WSMF. However, existing frameworks have several issues, such as insufficient service usage context information, precisely specified requirements needed to locate services, lacking information about inter-service relationships, and insufficient/incomplete information handling, make the process of service discovery and composition not as efficient as it should be.
To address these problems, a context-based semantic service description framework is proposed in this thesis. This framework focuses on not only capabilities of Web services, but also the usage context information of Web services, which we consider as an important factor in efficient service discovery and composition. Based on this framework, an enhanced service discovery mechanism is proposed. It gives service users more flexibility to search for services in more natural ways rather than only by technical specifications of required services. The service discovery mechanism also demonstrates how the features provided by the framework can facilitate the service discovery and composition processes. Together with the framework, a transformation method is provided to transform exiting service descriptions into the new framework based descriptions.
The framework is evaluated through a scenario based analysis in comparison with OWL-S and a prototype based performance evaluation in terms of query response time, the precision and recall ratio, and system scalability
State-of-the-art on evolution and reactivity
This report starts by, in Chapter 1, outlining aspects of querying and updating resources on
the Web and on the Semantic Web, including the development of query and update languages
to be carried out within the Rewerse project.
From this outline, it becomes clear that several existing research areas and topics are of
interest for this work in Rewerse. In the remainder of this report we further present state of
the art surveys in a selection of such areas and topics. More precisely: in Chapter 2 we give
an overview of logics for reasoning about state change and updates; Chapter 3 is devoted to briefly describing existing update languages for the Web, and also for updating logic programs;
in Chapter 4 event-condition-action rules, both in the context of active database systems and
in the context of semistructured data, are surveyed; in Chapter 5 we give an overview of some relevant rule-based agents frameworks
Early aspects: aspect-oriented requirements engineering and architecture design
This paper reports on the third Early Aspects: Aspect-Oriented Requirements Engineering and Architecture Design Workshop, which has been held in Lancaster, UK, on March 21, 2004. The workshop included a presentation session and working sessions in which the particular topics on early aspects were discussed. The primary goal of the workshop was to focus on challenges to defining methodical software development processes for aspects from early on in the software life cycle and explore the potential of proposed methods and techniques to scale up to industrial applications
Recommended from our members
Learning from Sequential User Data: Models and Sample-efficient Algorithms
Recent advances in deep learning have made learning representation from ever-growing datasets possible in the domain of vision, natural language processing (NLP), and robotics, among others. However, deep networks are notoriously data-hungry; for example, training language models with attention mechanisms sometimes requires trillions of parameters and tokens. In contrast, we can often access a limited number of samples in many tasks. It is crucial to learn models from these `limited\u27 datasets. Learning with limited datasets can take several forms. In this thesis, we study how to select data samples sequentially such that downstream task performance is maximized. Moreover, we study how to introduce prior knowledge in the deep networks to maximize prediction performance. We focus on four sequential tasks: computerized adaptive testing in psychometrics, sketching in recommender systems, knowledge tracing in computer-assisted education, and career path modeling in the labor market.
In the first two tasks, we devise novel sample-efficient algorithms to query a minimal number of sequential samples to improve future predictions. We propose a Bilevel Optimization-Based framework for computerized adaptive testing to learn a data-driven question selection algorithm that improves existing data selection policies. We also tackle the sketching problem in the recommender system, with the task of recommending the next item using a stored subset of prior data samples. In this setting, we develop a data-driven sequential selection algorithm that tackles evolving downstream task distribution. In the last two tasks, we devise novel neural models to introduce prior knowledge exploiting limited data samples. For knowledge tracing, we propose a novel neural architecture, inspired by cognitive and psychometric models, to improve the prediction of students\u27 future performance and utilize the labeled data samples efficiently. For career path modeling, we propose a novel and interpretable monotonic nonlinear state-space model to analyze online user professional profiles and provide actionable feedback and recommendations to users on how they can reach their career goals.
The data-driven differentiable data selection algorithms for the first two tasks open up future directions to query (a non-differentiable operation) a minimal number of samples optimally to maximize prediction performance. The structures, introduced in the neural architecture for the models in the last two tasks using prior knowledge, open up future directions to learn deep models augmented with prior knowledge using limited data samples
Cost-Based Optimization of Integration Flows
Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many different application areas such as real-time ETL and data synchronization between operational systems. For the reasons of an increasing amount of data, highly distributed IT infrastructures, and high requirements for data consistency and up-to-dateness of query results, many instances of integration flows are executed over time. Due to this high load and blocking synchronous source systems, the performance of the central integration platform is crucial for an IT infrastructure. To tackle these high performance requirements, we introduce the concept of cost-based optimization of imperative integration flows that relies on incremental statistics maintenance and inter-instance plan re-optimization. As a foundation, we introduce the concept of periodical re-optimization including novel cost-based optimization techniques that are tailor-made for integration flows. Furthermore, we refine the periodical re-optimization to on-demand re-optimization in order to overcome the problems of many unnecessary re-optimization steps and adaptation delays, where we miss optimization opportunities. This approach ensures low optimization overhead and fast workload adaptation
Flu Gone Viral: Syndromic Surveillance of Flu on Twitter Using Temporal Topic Models
Abstract—Surveillance of epidemic outbreaks and spread from social media is an important tool for governments and public health authorities. Machine learning techniques for nowcasting the flu have made significant inroads into correlating social media trends to case counts and prevalence of epidemics in a population. There is a disconnect between data-driven methods for forecasting flu incidence and epidemiological models that adopt a state based understanding of transitions, that can lead to sub-optimal predictions. Furthermore, models for epidemiological activity and social activity like on Twitter predict different shapes and have important differences. We propose a temporal topic model to capture hidden states of a user from his tweets and aggregate states in a geographical region for better estimation of trends. We show that our approach helps fill the gap between phenomenolog-ical methods for disease surveillance and epidemiological models. We validate this approach by modeling the flu using Twitter in multiple countries of South America. We demonstrate that our model can consistently outperform plain vocabulary assessment in flu case-count predictions, and at the same time get better flu-peak predictions than competitors. We also show that our fine-grained modeling can reconcile some contrasting behaviors between epidemiological and social models. I
State Management for Efficient Event Pattern Detection
Event Stream Processing (ESP) Systeme überwachen kontinuierliche Datenströme, um benutzerdefinierte Queries auszuwerten. Die Herausforderung besteht darin, dass die Queryverarbeitung zustandsbehaftet ist und die Anzahl von Teilübereinstimmungen mit der Größe der verarbeiteten Events exponentiell anwächst.
Die Dynamik von Streams und die Notwendigkeit, entfernte Daten zu integrieren, erschweren die Zustandsverwaltung. Erstens liefern heterogene Eventquellen Streams mit unvorhersehbaren Eingaberaten und Queryselektivitäten. Während Spitzenzeiten ist eine erschöpfende Verarbeitung unmöglich, und die Systeme müssen auf eine Best-Effort-Verarbeitung zurückgreifen. Zweitens erfordern Queries möglicherweise externe Daten, um ein bestimmtes Event für eine Query auszuwählen. Solche Abhängigkeiten sind problematisch: Das Abrufen der Daten unterbricht die Stream-Verarbeitung. Ohne eine Eventauswahl auf Grundlage externer Daten wird das Wachstum von Teilübereinstimmungen verstärkt.
In dieser Dissertation stelle ich Strategien für optimiertes Zustandsmanagement von ESP Systemen vor. Zuerst ermögliche ich eine Best-Effort-Verarbeitung mittels Load Shedding. Dabei werden sowohl Eingabeeevents als auch Teilübereinstimmungen systematisch verworfen, um eine Latenzschwelle mit minimalem Qualitätsverlust zu garantieren. Zweitens integriere ich externe Daten, indem ich das Abrufen dieser von der Verwendung in der Queryverarbeitung entkoppele. Mit einem effizienten Caching-Mechanismus vermeide ich Unterbrechungen durch Übertragungslatenzen. Dazu werden externe Daten basierend auf ihrer erwarteten Verwendung vorab abgerufen und mittels Lazy Evaluation bei der Eventauswahl berücksichtigt. Dabei wird ein Kostenmodell verwendet, um zu bestimmen, wann welche externen Daten abgerufen und wie lange sie im Cache aufbewahrt werden sollen. Ich habe die Effektivität und Effizienz der vorgeschlagenen Strategien anhand von synthetischen und realen Daten ausgewertet und unter Beweis gestellt.Event stream processing systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. However, the challenge is that query processing is stateful and it maintains partial matches that grow exponentially in the size of processed events.
State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is unreasonable, and systems shall resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: Fetching the remote data interrupts the stream processing. Yet, without event selection based on remote data, the growth of partial matches is amplified.
In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the shedding elements to satisfy a latency bound while striving for a minimal loss in result quality. Second, to efficiently integrate remote data, I decouple the fetching of remote data from its use in query evaluation by a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation that postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache.
I evaluated the above techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency
- …