10 research outputs found

    A Survey on the Evolution of Stream Processing Systems

    Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between early ('00-'10) and modern ('11-'18) streaming systems, and discuss recent trends and open problems. Comment: 34 pages, 15 figures, 5 tables

    State Management for Efficient Event Pattern Detection

    Event stream processing (ESP) systems continuously evaluate queries over event streams to detect user-specified patterns with low latency. The challenge is that query processing is stateful: it maintains partial matches whose number grows exponentially in the size of the processed events. State management is complicated by the dynamicity of streams and the need to integrate remote data. First, heterogeneous event sources yield dynamic streams with unpredictable input rates, data distributions, and query selectivities. During peak times, exhaustive processing is infeasible, and systems must resort to best-effort processing. Second, queries may require remote data to select a specific event for a pattern. Such dependencies are problematic: fetching the remote data interrupts the stream processing, yet without event selection based on remote data, the growth of partial matches is amplified. In this dissertation, I present strategies for optimised state management in event pattern detection. First, I enable best-effort processing with load shedding that discards both input events and partial matches. I carefully select the elements to shed so as to satisfy a latency bound while striving for a minimal loss in result quality. Second, to integrate remote data efficiently, I decouple the fetching of remote data from its use in query evaluation by means of a caching mechanism. To this end, I hide the transmission latency by prefetching remote data based on anticipated use and by lazy evaluation, which postpones the event selection based on remote data to avoid interruptions. A cost model is used to determine when to fetch which remote data items and how long to keep them in the cache. I evaluated these techniques with queries over synthetic and real-world data. I show that the load shedding technique significantly improves the recall of pattern detection over baseline approaches, while the technique for remote data integration significantly reduces the pattern detection latency.

    QoS-aware Resource-utilisation Self-adaptive (QRS) Framework for Distributed Data Stream Management Systems

    The last decade witnessed a vast number of Big Data applications in science and industry alike. Such applications generate large amounts of streaming data and real-time event-based information. Such data must be analysed under specific quality-of-service (QoS) constraints and within extremely low latencies. Many distributed data stream processing approaches are based on the best-effort QoS principle and lack the capability to adapt dynamically to fluctuations in data input rates. Most of the proposed solutions tend to either drop some of the input data (load shedding) or degrade the level of QoS provided by the system. Another approach is to limit the data ingestion rate using techniques such as backpressure heartbeats, which can affect the worker nodes and cause output delays. Such approaches are not suitable for certain types of mission-critical applications such as critical infrastructure surveillance, monitoring and signalling, vital health care monitoring, and military command and control streaming applications. This research presents a novel QoS-aware, Resource-utilisation Self-adaptive (QRS) Framework for managing data stream processing systems. The framework proposes a comprehensive usage model that encompasses proactive operations followed by simultaneous prompt actions. The simultaneous prompt actions instantly collect and analyse the performance and QoS metrics along with the running data streams, ensuring that data does not lose its current values, whereas the proactive operations construct a prediction model that anticipates QoS violations and performance degradation in the system. The model triggers the decision process for dynamically tuning resources or adopting a new scheduling strategy. A proof-of-concept model was built that accurately represents the working conditions of the distributed data stream management ecosystem. The proposed framework is validated and verified.
The framework's components were fully implemented on top of Apache Storm, a prevalent distributed data stream processing system. The framework predicts the system's capacity to handle data load and input rate with up to 81% accuracy. The accuracy reaches up to 100% when anomaly detection techniques are incorporated. Moreover, the framework performs well compared with the default round-robin and resource-aware schedulers within Storm. It provides a better ability to handle high data rates by re-balancing the topology and re-scheduling resources based on the prediction models well ahead of any congestion or QoS degradation.

    Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World

    This report documents the program and the outcomes of GI-Dagstuhl Seminar 16394 "Software Performance Engineering in the DevOps World". The seminar addressed the problem of performance-aware DevOps. Both DevOps and performance engineering have been growing trends over the past one to two years, in no small part due to the rise in importance of identifying performance anomalies in the operations (Ops) of cloud and big data systems and feeding these back to development (Dev). So far, however, the research community has treated software engineering, performance engineering, and cloud computing mostly as individual research areas. We aimed to identify opportunities for cross-community collaboration and to set the path for long-lasting collaborations towards performance-aware DevOps. The main goal of the seminar was to bring together young researchers (PhD students in a later stage of their PhD, as well as PostDocs or Junior Professors) in the areas of (i) software engineering, (ii) performance engineering, and (iii) cloud computing and big data to present their current research projects, to exchange experience and expertise, to discuss research challenges, and to develop ideas for future collaborations.

    Task Scheduling in Data Stream Processing Systems

    In the era of big data, with streaming applications such as social media, surveillance monitoring and real-time search generating large volumes of data, efficient Data Stream Processing Systems (DSPSs) have become essential. When designing an efficient DSPS, a number of challenges need to be considered including task allocation, scalability, fault tolerance, QoS, parallelism degree, and state management, among others. In our research, we focus on task allocation as it has a significant impact on performance metrics such as data processing latency and system throughput. An application processed by DSPSs is represented as a Directed Acyclic Graph (DAG), where each vertex represents a task and the edges show the dataflow between the tasks. Task allocation can be defined as the assignment of the vertices in the DAG to the physical compute nodes such that the data movement between the nodes is minimised. Finding an optimal task placement for stream processing systems is NP-hard. Thus, approximate scheduling approaches are required to improve the performance of DSPSs. In this thesis, we present our three proposed schedulers, each having a different heuristic partitioning approach to minimise inter-node communication for either homogeneous or heterogeneous clusters. We demonstrate how each scheduler can efficiently assign groups of highly communicating tasks to compute nodes. Our schedulers are able to outperform two state-of-the-art schedulers for three micro-benchmarks and two real-world applications, increasing throughput and reducing data processing latency as a result of a better task placement
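
The task-allocation objective described in this abstract (place DAG vertices on compute nodes so that traffic crossing node boundaries is small) can be illustrated with a minimal sketch. This is not one of the thesis's three schedulers: the function names, the per-node task `capacity`, and the edge weights are all hypothetical, and the greedy rule shown is just one simple heuristic, assuming the cluster has enough total capacity for all tasks.

```python
from collections import defaultdict

def inter_node_traffic(edges, placement):
    """Sum the traffic on DAG edges whose endpoints sit on different nodes."""
    return sum(w for (u, v), w in edges.items() if placement[u] != placement[v])

def greedy_placement(tasks, edges, nodes, capacity):
    """Hypothetical heuristic: co-locate heavily communicating tasks.

    Assumes len(tasks) <= capacity * len(nodes).
    """
    placement = {}
    load = defaultdict(int)  # number of tasks currently on each node

    def place(task, preferred=None):
        # Use the peer's node if it still has room, else the least-loaded node.
        if preferred is not None and load[preferred] < capacity:
            target = preferred
        else:
            target = min(nodes, key=lambda n: load[n])
        placement[task] = target
        load[target] += 1

    # Visit edges from heaviest to lightest traffic.
    for (u, v), _ in sorted(edges.items(), key=lambda kv: -kv[1]):
        for task, peer in ((u, v), (v, u)):
            if task not in placement:
                place(task, placement.get(peer))

    # Tasks without any edges go to the least-loaded node.
    for task in tasks:
        if task not in placement:
            place(task)
    return placement

# Illustrative four-task pipeline with made-up traffic weights.
tasks = ["src", "parse", "count", "sink"]
edges = {("src", "parse"): 100, ("parse", "count"): 10, ("count", "sink"): 80}
placement = greedy_placement(tasks, edges, nodes=["n1", "n2"], capacity=2)
print(placement, inter_node_traffic(edges, placement))
```

In this toy pipeline the two heaviest edges end up inside a node, so only the lightest edge crosses the node boundary. A real scheduler would also have to account for CPU and memory demands and heterogeneous nodes, which this sketch ignores.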

    Optimizing Resource Management in Cloud Analytics Services

    The fundamental challenge in the cloud today is how to build and optimize machine learning and data analytical services. Machine learning and data analytical platforms are changing computing infrastructure from expensive private data centers to easily accessible online services. These services pack user requests as jobs and run them on thousands of machines in parallel in geo-distributed clusters. The scale and the complexity of emerging jobs lead to increasing challenges for the clusters at all levels, from power infrastructure to system architecture and corresponding software framework design. These challenges come in many forms. Today's clusters are built on commodity hardware and hardware failures are unavoidable. Resource competition, network congestion, and mixed generations of hardware make the hardware environment complex and hard to model and predict. Such heterogeneity becomes a crucial roadblock for efficient parallelization on both the task level and job level. Another challenge comes from the increasing complexity of the applications. For example, machine learning services run jobs made up of multiple tasks with complex dependency structures. This complexity leads to difficulties in framework designs. The scale, especially when services span geo-distributed clusters, leads to another important hurdle for cluster design. Challenges also come from the power infrastructure. Power infrastructure is very expensive and accounts for more than 20% of the total costs to build a cluster. Power sharing optimization to maximize the facility utilization and smooth peak hour usages is another roadblock for cluster design. In this thesis, we focus on solutions for these challenges at the task level, on the job level, with respect to the geo-distributed data cloud design and for power management in colocation data centers. 
At the task level, a crucial hurdle to achieving predictable performance is stragglers, i.e., tasks that take significantly longer than expected to run. At this point, speculative execution has been widely adopted to mitigate the impact of stragglers in simple workloads. We apply straggler mitigation for approximation jobs for the first time. We present GRASS, which carefully uses speculation to mitigate the impact of stragglers in approximation jobs. GRASS's design is based on the analysis of a model we develop to capture the optimal speculation levels for approximation jobs. Evaluations with production workloads from Facebook and Microsoft Bing in an EC2 cluster of 200 nodes show that GRASS increases accuracy of deadline-bound jobs by 47% and speeds up error-bound jobs by 38%. Moving from task level to job level, task level speculation mechanisms are designed and operated independently of job scheduling when, in fact, scheduling a speculative copy of a task has a direct impact on the resources available for other jobs. Thus, we present Hopper, a job-level speculation-aware scheduler that integrates the tradeoffs associated with speculation into job scheduling decisions based on a model generalized from the task-level speculation model. We implement both centralized and decentralized prototypes of the Hopper scheduler and show that 50% (66%) improvements over state-of-the-art centralized (decentralized) schedulers and speculation strategies can be achieved through the coordination of scheduling and speculation. As computing resources move from local clusters to geo-distributed cloud services, we are expecting the same transformation for data storage. We study two crucial pieces of a geo-distributed data cloud system: data acquisition and data placement. Starting from developing the optimal algorithm for the case of a data cloud made up of a single data center, we propose a near-optimal, polynomial-time algorithm for a geo-distributed data cloud in general. 
We show, via a case study, that the resulting design, Datum, is near-optimal (within 1.6%) in practical settings. Efficient power management is a fundamental challenge for data centers when providing reliable services. Power oversubscription in data centers is very common and may occasionally trigger an emergency when the aggregate power demand exceeds the capacity. We study power capping solutions for handling such emergencies in a colocation data center, where the operator supplies power to multiple tenants. We propose a novel market mechanism based on supply function bidding, called COOP, to financially incentivize and coordinate tenants' power reduction for minimizing total performance loss while satisfying multiple power capping constraints. We demonstrate that COOP is "win-win", increasing the operator's profit (through oversubscription) and reducing tenants' costs (through financial compensation for their power reduction during emergencies).

    Adaptive Asynchronous Control and Consistency in Distributed Data Exploration Systems

    Advances in machine learning and streaming systems provide a backbone to transform vast arrays of raw data into valuable information. Leveraging distributed execution, analysis engines can process this information effectively within an iterative data exploration workflow to solve problems at unprecedented rates. However, with increased input dimensionality, a desire to simultaneously share and isolate information, and overlapping and dependent tasks, this process is becoming increasingly difficult to maintain. User interaction derails exploratory progress because of manual oversight of lower-level tasks such as tuning parameters, adjusting filters, and monitoring queries. We identify human-in-the-loop management of data generation and distributed analysis as an inhibiting problem that precludes efficient online, iterative data exploration and causes delays in knowledge discovery and decision making. The flexible and scalable systems implementing the exploration workflow require semi-autonomous methods integrated as architectural support to reduce human involvement. We thus argue that an abstraction layer providing adaptive asynchronous control and consistency management over a series of individual tasks coordinated to achieve a global objective can significantly improve data exploration effectiveness and efficiency. This thesis introduces methodologies that autonomously coordinate distributed execution at a lower level in order to synchronize multiple efforts as part of a common goal. We demonstrate the impact on data exploration through serverless simulation ensemble management and multi-model machine learning by showing improved performance and reduced resource utilization, enabling a more productive semi-autonomous exploration workflow. We focus on the specific domains of molecular dynamics and personalized healthcare; however, the contributions are applicable to a wide variety of domains.

    Safety and Reliability - Safe Societies in a Changing World

    The contributions cover a wide range of methodologies and application areas for safety and reliability that contribute to safe societies in a changing world. These methodologies and applications include: foundations of risk and reliability assessment and management; mathematical methods in reliability and safety; risk assessment; risk management; system reliability; uncertainty analysis; digitalization and big data; prognostics and system health management; occupational safety; accident and incident modeling; maintenance modeling and applications; simulation for safety and reliability analysis; dynamic risk and barrier management; organizational factors and safety culture; human factors and human reliability; resilience engineering; structural reliability; natural hazards; security; economic analysis in risk management.

    The Music Sound

    A guide to music: compositions, events, forms, genres, groups, history, industry, instruments, language, live music, musicians, songs, musicology, techniques, terminology, theory, music video. Music is a human activity that involves structured and audible sounds and is used for artistic or aesthetic, entertainment, or ceremonial purposes. The traditional or classical European aspects of music often listed are those elements given primacy in European-influenced classical music: melody, harmony, rhythm, tone color/timbre, and form. A more comprehensive list is given by stating the aspects of sound: pitch, timbre, loudness, and duration. Common terms used to discuss particular pieces include melody, which is a succession of notes heard as some sort of unit; chord, which is a simultaneity of notes heard as some sort of unit; chord progression, which is a succession of chords (simultaneity succession); harmony, which is the relationship between two or more pitches; counterpoint, which is the simultaneity and organization of different melodies; and rhythm, which is the organization of the durational aspects of music.

    Nuyoricans

    The thesis contributes to the debate about the revision of the American literary canon. Its first focus is on the history and mythology of Puerto Rico, the second on the social situation and image of the Puerto Ricans in the U.S., and the third on the prose literature by authors of Puerto Rican descent published in English. (1) The Spanish colony of Puerto Rico became a possession of the U.S. in 1898 and experienced a rapid economic rise in the second half of the 20th century, at the expense of growing dependence on the American welfare state. In free referendums the people of Puerto Rico have so far condoned the intermediate status of their island, which is still neither a state of the union nor an independent nation. But in fact the Commonwealth of Puerto Rico remains a colony of the U.S., as the real power to decide about its status lies with Congress. Puerto Rico's mythology has from the start been dominated by stereotypes, which have resulted in a negative cliché of its national character. The United States' political, economic and cultural hegemony has reinforced the alleged schizophrenic state of Puerto Rico. The only chance of healing this Puerto Rican syndrome is the island's national independence. (2) Stereotypes have also determined the image in the American public of the Puerto Ricans who have migrated to the mainland. The bad reputation of this hardly assimilated group is constantly being confirmed by the mass media, although serious studies prove that earlier immigrants had similar problems. Most U.S. experts still advocate the migrants' gradual assimilation according to the ideal of cultural pluralism. Since the rise of multiculturalism, the trend within the U.S. Puerto Rican community has been towards a hybrid, bicultural, Nuyorican identity.
The dissertation assesses the literature about Puerto Ricans in the U.S. published in English under 15 key aspects: culture, religion, education, language, politics, work, welfare, housing, ethnicity, race, class, gender, family, law and order, and migration. A critical look at books, movies and musicals by non-Puerto Ricans shows that in these genres migrants from Puerto Rico have by and large been portrayed benevolently. This is true even for the musical West Side Story, which has often been wrongly blamed for stigmatizing Puerto Rican youngsters as gangsters. (3) Puerto Rican prose literature in English has so far played little role in the American canon and in criticism. This thesis offers the first analysis of the many new volumes from the eighties and nineties on a socio-historical basis. The overall trends of recent Nuyorican literature are its diversification and feminization. There is a greater diversity of settings, text types and themes. Gender is a central issue not only with the women, but also with a number of male authors. Of those writers who take New York as the setting, Abraham Rodriguez, Jr. is the most important. Rodriguez's books are about teenagers in the South Bronx whose Puerto Rican ethnicity is no longer a crucial factor. Thus he is the most Americanised author among the Nuyoricans. The most convincing interpretation of the migratory process is Esmeralda Santiago's. Her main thrust is against the Puerto Rican tradition of sexism. The works of Rosario Ferré, some told in magic realism and all set in Puerto Rico, also have a feminist tendency. The best books by Rodriguez, Santiago, Ferré and a few more Puerto Ricans deserve to be recognised as a valuable and visionary contribution to American literature.