3,276 research outputs found

    Towards Message Brokers for Generative AI: Survey, Challenges, and Opportunities

    Full text link
    In today's digital world, Generative Artificial Intelligence (GenAI) such as Large Language Models (LLMs) is becoming increasingly prevalent, extending its reach across diverse applications. This surge in adoption has sparked a significant increase in demand for data-centric GenAI models, highlighting the necessity for robust data communication infrastructures. Central to this need are message brokers, which serve as essential channels for data transfer within various system components. This survey aims to delve into a comprehensive analysis of traditional and modern message brokers, offering a comparative study of prevalent platforms. Our study considers numerous criteria including, but not limited to, open-source availability, integrated monitoring tools, message prioritization mechanisms, capabilities for parallel processing, reliability, distribution and clustering functionalities, authentication processes, data persistence strategies, fault tolerance, and scalability. Furthermore, we explore the intrinsic constraints that the design and operation of each message broker might impose, recognizing that these limitations are crucial in understanding their real-world applicability. Finally, this study examines the enhancement of message broker mechanisms specifically for GenAI contexts, emphasizing the criticality of developing a versatile message broker framework. Such a framework would be poised for quick adaptation, catering to the dynamic and growing demands of GenAI in the foreseeable future. Through this dual-pronged approach, we intend to contribute a foundational compendium that can guide future innovations and infrastructural advancements in the realm of GenAI data communication.Comment: 20 pages, 181 references, 7 figures, 5 table

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author

    When Things Matter: A Data-Centric View of the Internet of Things

    Full text link
    With the recent advances in radio-frequency identification (RFID), low-cost wireless sensor devices, and Web technologies, the Internet of Things (IoT) approach has gained momentum in connecting everyday objects to the Internet and facilitating machine-to-human and machine-to-machine communication with the physical world. While IoT offers the capability to connect and integrate both digital and physical entities, enabling a whole new class of applications and services, several significant challenges need to be addressed before these applications and services can be fully realized. A fundamental challenge centers around managing IoT data, typically produced in dynamic and volatile environments, which is not only extremely large in scale and volume, but also noisy, and continuous. This article surveys the main techniques and state-of-the-art research efforts in IoT from data-centric perspectives, including data stream processing, data storage models, complex event processing, and searching in IoT. Open research issues for IoT data management are also discussed

    Plant-Wide Diagnosis: Cause-and-Effect Analysis Using Process Connectivity and Directionality Information

    Get PDF
    Production plants used in modern process industry must produce products that meet stringent environmental, quality and profitability constraints. In such integrated plants, non-linearity and strong process dynamic interactions among process units complicate root-cause diagnosis of plant-wide disturbances because disturbances may propagate to units at some distance away from the primary source of the upset. Similarly, implemented advanced process control strategies, backup and recovery systems, use of recycle streams and heat integration may hamper detection and diagnostic efforts. It is important to track down the root-cause of a plant-wide disturbance because once corrective action is taken at the source, secondary propagated effects can be quickly eliminated with minimum effort and reduced down time with the resultant positive impact on process efficiency, productivity and profitability. In order to diagnose the root-cause of disturbances that manifest plant-wide, it is crucial to incorporate and utilize knowledge about the overall process topology or interrelated physical structure of the plant, such as is contained in Piping and Instrumentation Diagrams (P&IDs). Traditionally, process control engineers have intuitively referred to the physical structure of the plant by visual inspection and manual tracing of fault propagation paths within the process structures, such as the process drawings on printed P&IDs, in order to make logical conclusions based on the results from data-driven analysis. This manual approach, however, is prone to various sources of errors and can quickly become complicated in real processes. The aim of this thesis, therefore, is to establish innovative techniques for the electronic capture and manipulation of process schematic information from large plants such as refineries in order to provide an automated means of diagnosing plant-wide performance problems. This report also describes the design and implementation of a computer application program that integrates: (i) process connectivity and directionality information from intelligent P&IDs (ii) results from data-driven cause-and-effect analysis of process measurements and (iii) process know-how to aid process control engineers and plant operators gain process insight. This work explored process intelligent P&IDs, created with AVEVA® P&ID, a Computer Aided Design (CAD) tool, and exported as an ISO 15926 compliant platform and vendor independent text-based XML description of the plant. The XML output was processed by a software tool developed in Microsoft® .NET environment in this research project to computationally generate connectivity matrix that shows plant items and their connections. The connectivity matrix produced can be exported to Excel® spreadsheet application as a basis for other application and has served as precursor to other research work. The final version of the developed software tool links statistical results of cause-and-effect analysis of process data with the connectivity matrix to simplify and gain insights into the cause and effect analysis using the connectivity information. Process knowhow and understanding is incorporated to generate logical conclusions. The thesis presents a case study in an atmospheric crude heating unit as an illustrative example to drive home key concepts and also describes an industrial case study involving refinery operations. In the industrial case study, in addition to confirming the root-cause candidate, the developed software tool was set the task to determine the physical sequence of fault propagation path within the plant. This was then compared with the hypothesis about disturbance propagation sequence generated by pure data-driven method. The results show a high degree of overlap which helps to validate statistical data-driven technique and easily identify any spurious results from the data-driven multivariable analysis. This significantly increase control engineers confidence in data-driven method being used for root-cause diagnosis. The thesis concludes with a discussion of the approach and presents ideas for further development of the methods
    corecore