
    Scalable and Declarative Information Extraction in a Parallel Data Analytics System

    Information extraction (IE) on very large data sets requires highly complex, scalable, and adaptive systems. Although numerous IE algorithms exist, their seamless and extensible combination in a scalable system is still a major challenge. This work presents a query-based IE system for a parallel data analysis platform, which is configurable for specific application domains and scales to terabyte-sized text collections. First, configurable operators are defined for basic IE and Web analytics tasks, which can be used to express complex IE tasks in the form of declarative queries. All operators are characterized in terms of their properties to highlight the potential and importance of optimizing non-relational, user-defined operators (UDFs) in dataflows. Subsequently, we survey the state of the art in optimizing non-relational dataflows and show that a comprehensive optimization of UDFs is still a challenge. Based on this observation, an extensible, logical optimizer (SOFA) is introduced, which incorporates the semantics of UDFs into the optimization process. SOFA analyzes a compact set of operator properties and combines automated analysis with manual UDF annotations to enable a comprehensive optimization of dataflows. SOFA is able to logically optimize arbitrary dataflows from different application areas, resulting in significant runtime improvements compared to other techniques. Finally, the applicability of the presented system to terabyte-sized corpora is investigated, and we systematically evaluate the scalability and robustness of the employed methods and tools in order to pinpoint the most critical challenges in building an IE system for very large data sets
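    To illustrate the kind of optimization SOFA performs, here is a minimal Scala sketch. All names and the reordering rule are hypothetical, not SOFA's actual API: two UDF operators are annotated with the attributes they read and write, and a cheap, selective filter is pushed ahead of an expensive annotator whenever the two operators commute.

```scala
// Minimal sketch of semantics-aware UDF reordering (hypothetical names,
// not SOFA's actual API).
object UdfReorderingSketch extends App {

  // Each operator declares which record attributes it reads and writes.
  final case class Props(reads: Set[String], writes: Set[String])

  sealed trait Op { def props: Props }
  // Expensive UDF: adds an "entities" attribute to every record.
  case object AnnotateEntities extends Op {
    val props = Props(reads = Set("text"), writes = Set("entities"))
  }
  // Cheap, selective UDF: keeps records whose "lang" attribute is "en".
  case object FilterEnglish extends Op {
    val props = Props(reads = Set("lang"), writes = Set.empty)
  }

  // Two consecutive operators commute if neither writes what the other touches.
  def commute(a: Op, b: Op): Boolean = {
    val conflict =
      a.props.writes.intersect(b.props.reads ++ b.props.writes).nonEmpty ||
      b.props.writes.intersect(a.props.reads ++ a.props.writes).nonEmpty
    !conflict
  }

  // Toy rewrite: move the filter before its upstream neighbor when legal.
  def pushDownFilters(plan: List[Op]): List[Op] = plan match {
    case a :: (f @ FilterEnglish) :: rest if commute(a, f) =>
      f :: pushDownFilters(a :: rest)
    case a :: rest => a :: pushDownFilters(rest)
    case Nil       => Nil
  }

  println(pushDownFilters(List(AnnotateEntities, FilterEnglish)))
  // List(FilterEnglish, AnnotateEntities): the selective filter now runs first.
}
```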

    Processamento de eventos complexos como serviço em ambientes multi-nuvem [Complex Event Processing as a Service in Multi-Cloud Environments]

    Advisors: Luiz Fernando Bittencourt, Miriam Akemi Manabe Capretz. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação. The rise of mobile technologies and the Internet of Things, combined with advances in Web technologies, have created a new Big Data world in which the volume and velocity of data generation have achieved an unprecedented scale. As a technology created to process continuous streams of data, Complex Event Processing (CEP) has often been linked to Big Data and used as a tool to obtain real-time insights. However, despite this recent surge of interest, the CEP market is still dominated by solutions that are costly and inflexible or too low-level and hard to operate. To address these problems, this research proposes the creation of a CEP system that can be offered as a service and used over the Internet. Such a CEP as a Service (CEPaaS) system would give its users CEP functionalities together with the advantages of the services model, such as no up-front investment and low maintenance cost. Nevertheless, creating such a service involves challenges that are not addressed by current CEP systems. This research proposes solutions for three open problems that exist in this context. First, to address the problem of understanding and reusing existing CEP management procedures, this research introduces the Attributed Graph Rewriting for Complex Event Processing Management (AGeCEP) formalism as a technology- and language-agnostic representation of queries and their reconfigurations. Second, to address the problem of evaluating CEP query management and processing strategies, this research introduces CEPSim, a simulator of cloud-based CEP systems. Finally, this research also introduces a CEPaaS system based on a multi-cloud architecture, container management systems, and an AGeCEP-based multi-tenant design. To demonstrate its feasibility, AGeCEP was used to design an autonomic manager and a selected set of self-management policies. Moreover, CEPSim was thoroughly evaluated by experiments that showed it can simulate existing systems with accuracy and low execution overhead. Finally, additional experiments validated the CEPaaS system and demonstrated that it achieves the goal of offering CEP functionalities as a scalable and fault-tolerant service. Taken together, these results confirm that this research significantly advances the CEP state of the art and provides novel tools and methodologies that can be applied to CEP research. Doctorate in Computer Science (grant 140920/2012-9, CNPq)

    Complex Event Processing as a Service in Multi-Cloud Environments

    The rise of mobile technologies and the Internet of Things, combined with advances in Web technologies, have created a new Big Data world in which the volume and velocity of data generation have achieved an unprecedented scale. As a technology created to process continuous streams of data, Complex Event Processing (CEP) has often been linked to Big Data and used as a tool to obtain real-time insights. However, despite this recent surge of interest, the CEP market is still dominated by solutions that are costly and inflexible or too low-level and hard to operate. To address these problems, this research proposes the creation of a CEP system that can be offered as a service and used over the Internet. Such a CEP as a Service (CEPaaS) system would give its users CEP functionalities together with the advantages of the services model, such as no up-front investment and low maintenance cost. Nevertheless, creating such a service involves challenges that are not addressed by current CEP systems. This research proposes solutions for three open problems that exist in this context. First, to address the problem of understanding and reusing existing CEP management procedures, this research introduces the Attributed Graph Rewriting for Complex Event Processing Management (AGeCEP) formalism as a technology- and language-agnostic representation of queries and their reconfigurations. Second, to address the problem of evaluating CEP query management and processing strategies, this research introduces CEPSim, a simulator of cloud-based CEP systems. Finally, this research also introduces a CEPaaS system based on a multi-cloud architecture, container management systems, and an AGeCEP-based multi-tenant design. To demonstrate its feasibility, AGeCEP was used to design an autonomic manager and a selected set of self-management policies. Moreover, CEPSim was thoroughly evaluated by experiments that showed it can simulate existing systems with accuracy and low execution overhead. Finally, additional experiments validated the CEPaaS system and demonstrated that it achieves the goal of offering CEP functionalities as a scalable and fault-tolerant service. Taken together, these results confirm that this research significantly advances the CEP state of the art and provides novel tools and methodologies that can be applied to CEP research
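    The core idea of AGeCEP, representing a CEP query as an attributed operator graph and expressing reconfigurations as graph rewritings, can be sketched as follows. This is a hypothetical Scala illustration (the vertex attributes and the fusion rule are invented for the example, not AGeCEP's notation): two directly connected, stateless filter vertices are fused into one, a typical query reconfiguration.

```scala
// Hypothetical sketch of a CEP query as an attributed graph with one
// rewrite rule (fusing adjacent filters); not AGeCEP's actual notation.
object AgecepSketch extends App {

  final case class Vertex(id: String, opType: String, attrs: Map[String, String])
  final case class QueryGraph(vertices: Map[String, Vertex], edges: Set[(String, String)])

  // Rewrite rule: two filters connected by an edge become a single filter
  // whose predicate is the conjunction of both.
  def fuseFilters(g: QueryGraph): QueryGraph =
    g.edges.collectFirst {
      case (a, b) if g.vertices(a).opType == "filter" && g.vertices(b).opType == "filter" =>
        val p1    = g.vertices(a).attrs("predicate")
        val p2    = g.vertices(b).attrs("predicate")
        val fused = Vertex(a + "+" + b, "filter", Map("predicate" -> s"($p1) && ($p2)"))
        // Redirect edges to the fused vertex; drop the internal filter-filter edge.
        val edges = g.edges.collect {
          case (x, `a`)                                           => (x, fused.id)
          case (`b`, y)                                           => (fused.id, y)
          case e @ (x, y) if x != a && x != b && y != a && y != b => e
        }
        QueryGraph(g.vertices - a - b + (fused.id -> fused), edges)
    }.getOrElse(g)

  val g = QueryGraph(
    vertices = Map(
      "src" -> Vertex("src", "source", Map.empty),
      "f1"  -> Vertex("f1", "filter", Map("predicate" -> "temp > 30")),
      "f2"  -> Vertex("f2", "filter", Map("predicate" -> "humidity < 40")),
      "out" -> Vertex("out", "sink", Map.empty)),
    edges = Set("src" -> "f1", "f1" -> "f2", "f2" -> "out"))

  println(fuseFilters(g).vertices("f1+f2").attrs("predicate"))
  // (temp > 30) && (humidity < 40)
}
```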

    A Model Driven Approach to Model Transformations

    The OMG's Model Driven Architecture (MDA) initiative has been the focus of much attention in both academia and industry, due to its promise of more rapid and consistent software development through the increased use of models. In order for MDA to reach its full potential, the ability to manipulate and transform models, most obviously from the Platform Independent Model (PIM) to the Platform Specific Models (PSM), is vital. Recognizing this need, the OMG issued a Request For Proposals (RFP) largely concerned with finding a suitable mechanism for transforming models. This paper outlines the relevant background material, summarizes the approach taken by the QVT-Partners (to whom the authors belong), presents a non-trivial example using the QVT-Partners approach, and finally sketches out what the future holds for model transformations
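    The PIM-to-PSM transformation mentioned above can be made concrete with a small sketch. The following hypothetical Scala example (the metamodels are reduced to bare class and table structures; this is not the QVT-Partners notation) maps a platform-independent class model to a relational platform-specific model.

```scala
// Hypothetical PIM-to-PSM transformation sketch (not the QVT-Partners
// notation): a class model is mapped to a relational schema.
object Pim2PsmSketch extends App {

  // Platform Independent Model: classes with typed attributes.
  final case class PimAttribute(name: String, tpe: String)
  final case class PimClass(name: String, attributes: List[PimAttribute])

  // Platform Specific Model: relational tables with columns.
  final case class PsmColumn(name: String, sqlType: String, primaryKey: Boolean)
  final case class PsmTable(name: String, columns: List[PsmColumn])

  val typeMap = Map("String" -> "VARCHAR(255)", "Int" -> "INTEGER", "Boolean" -> "BOOLEAN")

  // The transformation: one table per class, one column per attribute,
  // plus a synthesized primary-key column.
  def transform(pim: PimClass): PsmTable =
    PsmTable(
      name = pim.name.toUpperCase,
      columns = PsmColumn("ID", "BIGINT", primaryKey = true) ::
        pim.attributes.map(a =>
          PsmColumn(a.name.toUpperCase, typeMap.getOrElse(a.tpe, "VARCHAR(255)"),
            primaryKey = false)))

  val customer = PimClass("Customer",
    List(PimAttribute("name", "String"), PimAttribute("age", "Int")))
  println(transform(customer))
  // PsmTable(CUSTOMER, List(PsmColumn(ID,BIGINT,true), PsmColumn(NAME,...), ...))
}
```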

    Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021

    The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing

    Advancing Operating Systems via Aspect-Oriented Programming

    Operating system kernels are among the most complex pieces of software in existence today. Maintaining the kernel code and developing new functionality is increasingly complicated, since the amount of required features has risen significantly, leading to side effects that can be introduced inadvertently by changing a piece of code that belongs to a completely different context. Software developers try to modularize their code base into separate functional units. Some of the functionality or “concerns” required in a kernel, however, does not fit into the given modularization structure; this code may then be spread over the code base and its implementation tangled with code implementing different concerns. These so-called “crosscutting concerns” are especially difficult to handle since a change in a crosscutting concern implies that all relevant locations spread throughout the code base have to be modified. Aspect-Oriented Software Development (AOSD) is an approach to handling crosscutting concerns by factoring them out into separate modules. The “advice” code contained in these modules is woven into the original code base according to a pointcut description, a set of interaction points (joinpoints) with the code base. To be used in operating systems, AOSD requires tool support for the prevalent procedural programming style as well as support for weaving aspects. Many interactions in kernel code are dynamic, so in order to implement non-static behavior and improve performance, a dynamic weaver that deploys and undeploys aspects at system runtime is required. This thesis presents an extension of the “C” programming language to support AOSD. Based on this, two dynamic weaving toolkits, TOSKANA and TOSKANA-VM, are presented to permit dynamic aspect weaving in the monolithic NetBSD kernel as well as in a virtual-machine and microkernel-based Linux kernel running on top of L4. Based on TOSKANA, applications for this dynamic aspect technology are discussed and evaluated. The thesis closes with a view on an aspect-oriented kernel structure that maintains coherency and handles crosscutting concerns using dynamic aspects while enhancing development methods through the use of domain-specific programming languages
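    The dynamic weaving mechanism, deploying and undeploying advice at named joinpoints while the system runs, can be sketched as follows. This hypothetical Scala illustration mirrors the concept only; TOSKANA itself weaves native code into a running NetBSD kernel.

```scala
// Hypothetical sketch of dynamic aspect deployment: advice registered at
// runtime wraps calls at named joinpoints. Not TOSKANA's in-kernel weaver.
object DynamicWeavingSketch extends App {
  import scala.collection.mutable

  type Advice = (() => Unit) => Unit // receives `proceed`, decides to call it

  private val deployed = mutable.Map.empty[String, List[Advice]]

  def deploy(joinpoint: String, advice: Advice): Unit =
    deployed(joinpoint) = advice :: deployed.getOrElse(joinpoint, Nil)

  def undeploy(joinpoint: String): Unit = deployed.remove(joinpoint)

  // Every instrumented call site runs through the currently deployed advice.
  def weave(joinpoint: String)(body: => Unit): Unit = {
    val chain = deployed.getOrElse(joinpoint, Nil)
      .foldLeft(() => body) { (proceed, advice) => () => advice(proceed) }
    chain()
  }

  def openFile(path: String): Unit =
    weave("kernel.open") { println(s"open($path)") }

  openFile("/etc/passwd") // no advice deployed yet
  deploy("kernel.open", proceed => { println("audit: open called"); proceed() })
  openFile("/etc/passwd") // audit advice now woven in
  undeploy("kernel.open")
  openFile("/etc/passwd") // back to plain behavior
}
```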

    A Pure Embedding of Roles: Exploring 4-dimensional Dispatch for Roles in Structured Contexts

    Present-day software systems have to fulfill an increasing number of requirements, which makes them more and more complex. Many systems need to anticipate changing contexts or need to adapt to changing business rules or requirements. The challenge of 21st-century software development will be to cope with these aspects. We believe that the role concept offers a simple way to adapt an object-oriented program to its changing context. In a role-based application, an object plays multiple roles during its lifetime. If the contexts are represented as first-class entities, they provide dynamic views of the object-oriented program; if a context changes, the dynamic views can be switched easily and the software system adapts automatically. However, the concepts of roles and dynamic contexts have been discussed for a long time in many areas of computer science. So far, their employment in an existing object-oriented language has required a specific runtime environment. Also, classical object-oriented languages and their runtime systems are not able to cope with essential role-specific features, such as true delegation or dynamic binding of roles. In addition, contexts and views are important in software development: the traditional code-oriented approach to software engineering is becoming less and less satisfactory, and support for multiple views of a software system scales much better to the needs of today's systems. However, it relies on programming languages to provide roles for the construction of views. As a solution, this thesis presents an implementation pattern for role-playing objects that does not require a specific runtime system, the SCala ROles Language (SCROLL). Via this library approach, roles are embedded in a statically typed base language as dynamically evolving objects. The approach is pure in the sense that there is no need for an additional compiler or tooling. The implementation pattern is demonstrated on the basis of the Scala language. As technical support from Scala, the pattern requires dynamic mixins, compiler-translated function calls, and implicit conversions. The details of how roles are implemented are hidden in a Scala library and therefore transparent to SCROLL programmers. The SCROLL library supports roles embedded in structured contexts. Additionally, a four-dimensional, context-aware dispatch at runtime is presented; it overcomes the subtle ambiguities introduced by the rich semantics of role-playing objects. SCROLL is written in Scala, which blends a modern object-oriented language with a functional programming language. The size of the library is below 1,400 lines of code, so it can be considered minimalistic in design and easy to maintain. Our approach solves several practical problems arising in the area of dynamic extensibility and adaptation
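    The library-based role-playing pattern can be sketched in plain Scala. The following is a hypothetical illustration of the underlying idea (a compound object dispatching calls through its bound roles via runtime reflection), not SCROLL's actual API.

```scala
// Hypothetical sketch of the role-playing pattern (dynamic dispatch through
// bound roles); illustrative only, not SCROLL's actual API.
import scala.language.dynamics

object RolePlayingSketch extends App {

  class Person(val name: String)

  // A compound of a core object and its currently bound roles. Method calls
  // are resolved at runtime: most recently bound role first, core last.
  class Compound(core: AnyRef, roles: List[AnyRef]) extends Dynamic {
    def play(role: AnyRef): Compound = new Compound(core, role :: roles)

    def applyDynamic(method: String)(args: Any*): Any = {
      val boxed   = args.map(_.asInstanceOf[AnyRef])
      val targets = roles :+ core
      targets
        .flatMap(t => t.getClass.getMethods
          .find(m => m.getName == method && m.getParameterCount == boxed.length)
          .map(m => (t, m)))
        .headOption
        .map { case (t, m) => m.invoke(t, boxed: _*) }
        .getOrElse(throw new NoSuchMethodException(method))
    }
  }

  implicit class Playable(core: AnyRef) {
    def play(role: AnyRef): Compound = new Compound(core, List(role))
  }

  // A context-specific role: behavior the core object gains temporarily.
  class Customer { def discount(total: Double): Double = total * 0.9 }

  val alice = new Person("Alice") play new Customer
  println(alice.discount(100.0)) // 90.0, dispatched to the Customer role
}
```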

    Planning and Optimization During the Life-Cycle of Service Level Agreements for Cloud Computing

    A Service Level Agreement (SLA) is an electronic contract between the customer and the provider of a service. The partners involved clarify their expectations and obligations with respect to the service and its quality. SLAs are already being used to describe cloud computing services. The service provider ensures that the agreed service quality is met and matches the customer's requirements until the end of the agreed term. Managing SLAs requires considerable effort to achieve autonomy, economic viability, and efficiency. The current state of the art in SLA management faces challenges such as SLA representation for cloud services, business-driven SLA optimization, service outsourcing, and resource management; these areas constitute central and timely research topics. Managing SLAs across the different phases of their lifetime requires a methodology designed for this purpose, which simplifies the realization of cloud SLA management. I present a broad model of SLA lifecycle management that addresses the aforementioned challenges. This approach enables automatic service modelling as well as negotiation, provisioning, and monitoring of SLAs. For the creation phase, I outline how the modelling structures can be improved and simplified. A further goal of my approach is to minimize implementation and outsourcing costs in favour of competitiveness. For the SLA monitoring phase, I develop strategies for the selection and allocation of virtual cloud resources during migration phases; I then use monitoring to check, for a larger collection of SLAs, whether the agreed fault tolerances are respected. This work contributes to a design for the GWDG and its scientific communities. The research leading to this thesis was carried out as part of the SLA@SOI EU/FP7 integrated project (contract no. 216556)
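    To make the monitoring phase concrete, the following hypothetical Scala sketch (the SLA fields and the violation rule are invented for illustration) checks an SLA against a series of monitoring windows and flags a breach once the agreed fault tolerance is exceeded.

```scala
// Hypothetical sketch of SLA monitoring: each agreement carries an agreed
// tolerance, and a monitor flags violations over observed measurements.
object SlaMonitoringSketch extends App {

  // An SLA term, e.g. "availability of at least 99.9 percent".
  final case class SlaTerm(metric: String, minValue: Double)
  final case class Sla(id: String, terms: List[SlaTerm], toleratedViolations: Int)

  // One monitoring window: observed value per metric.
  type Measurements = Map[String, Double]

  // Count the term violations across all windows; the SLA is breached once
  // violations exceed its agreed fault tolerance. A missing metric defaults
  // to MaxValue, i.e. it is not counted as a violation.
  def breached(sla: Sla, windows: List[Measurements]): Boolean = {
    val violations = for {
      w    <- windows
      term <- sla.terms
      if w.getOrElse(term.metric, Double.MaxValue) < term.minValue
    } yield ()
    violations.size > sla.toleratedViolations
  }

  val sla = Sla("vm-042", List(SlaTerm("availability", 99.9)), toleratedViolations = 1)
  val windows = List(
    Map("availability" -> 99.95),
    Map("availability" -> 99.50), // violation 1
    Map("availability" -> 99.80)) // violation 2 -> breach
  println(breached(sla, windows)) // true
}
```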

    Towards Dynamic Composition of Question Answering Pipelines

    Question answering (QA) over knowledge graphs has gained significant momentum over the past five years due to the increasing availability of large knowledge graphs and the rising importance of question answering for user interaction. DBpedia has been the most prominently used knowledge graph in this setting. QA systems implement a pipeline connecting a sequence of QA components for translating an input question into its corresponding formal query (e.g. SPARQL); this query is then executed over a knowledge graph in order to produce the answer to the question. Recent empirical studies have revealed that, albeit overall effective, the performance of QA systems and QA components depends heavily on the features of input questions, and that not even the combination of the best-performing QA systems or individual QA components retrieves complete and correct answers. Furthermore, these QA systems cannot be easily reused or extended, and their results cannot be easily reproduced, since the systems are mostly implemented in a monolithic fashion, lack standardised interfaces, and are often not open source or available as Web services. All these drawbacks of the state of the art prevent many of these approaches from being employed in real-world applications. In this thesis, we tackle the problem of QA over knowledge graphs and propose a generic approach to promote reusability and build question answering systems in a collaborative effort. First, we define the qa vocabulary and the Qanary methodology to develop an abstraction level over existing QA systems and components. Qanary relies on the qa vocabulary to establish guidelines for semantically describing the knowledge exchange between the components of a QA system. We implement a component-based modular framework called "Qanary Ecosystem" that utilises the Qanary methodology to integrate several heterogeneous QA components in a single platform. We further present the Qaestro framework, which provides an approach to semantically describing question answering components and effectively enumerates QA pipelines based on a QA developer's requirements; Qaestro provides all valid combinations of available QA components respecting the input-output requirements of each component to build QA pipelines. Finally, we address the scalability of QA components within a framework and propose a novel approach that chooses the best component per task to automatically build a QA pipeline for each input question. We implement this model within FRANKENSTEIN, a framework able to select QA components and compose pipelines. FRANKENSTEIN extends the Qanary ecosystem and utilises the qa vocabulary for data exchange. It has 29 independent QA components implementing five QA tasks, resulting in 360 unique QA pipelines. Each approach proposed in this thesis (the Qanary methodology, Qaestro, and FRANKENSTEIN) is supported by extensive evaluation to demonstrate its effectiveness. Our contributions target the broader research agenda of offering the QA community an efficient way of applying their research to a field which is driven by many different disciplines and consequently requires a collaborative approach to achieve significant progress in the domain of question answering
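    The pipeline-enumeration and component-selection ideas can be sketched compactly. In this hypothetical Scala example (tasks, component names, and scores are invented; this is not Qaestro's or FRANKENSTEIN's API), components declare the QA task they implement, all valid pipelines combine one component per task in a fixed task order, and the best pipeline for a question is chosen from per-component performance predictions.

```scala
// Hypothetical sketch of QA pipeline enumeration and per-question component
// selection; the tasks, components, and scores are invented for illustration.
object QaPipelineSketch extends App {

  final case class Component(name: String, task: String)

  // A fixed task order, as in a NER -> entity linking -> query building pipeline.
  val taskOrder = List("NER", "EntityLinking", "QueryBuilding")

  val components = List(
    Component("SpotterA", "NER"), Component("SpotterB", "NER"),
    Component("LinkerA", "EntityLinking"),
    Component("BuilderA", "QueryBuilding"), Component("BuilderB", "QueryBuilding"))

  // All valid pipelines: one component per task, respecting the task order.
  def pipelines: List[List[Component]] =
    taskOrder.foldLeft(List(List.empty[Component])) { (acc, task) =>
      for {
        prefix <- acc
        c      <- components.filter(_.task == task)
      } yield prefix :+ c
    }

  // Predicted per-question performance of a component (a stub lookup here,
  // standing in for a learned, question-dependent prediction).
  def predictedScore(c: Component, question: String): Double =
    Map("SpotterA" -> 0.8, "SpotterB" -> 0.6, "LinkerA" -> 0.7,
        "BuilderA" -> 0.5, "BuilderB" -> 0.9).getOrElse(c.name, 0.0)

  def bestPipeline(question: String): List[Component] =
    pipelines.maxBy(_.map(predictedScore(_, question)).product)

  println(pipelines.size)                               // 2 * 1 * 2 = 4 pipelines
  println(bestPipeline("Who wrote Faust?").map(_.name)) // List(SpotterA, LinkerA, BuilderB)
}
```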
