Towards Message Brokers for Generative AI: Survey, Challenges, and Opportunities
In today's digital world, Generative Artificial Intelligence (GenAI) such as
Large Language Models (LLMs) is becoming increasingly prevalent, extending its
reach across diverse applications. This surge in adoption has sparked a
significant increase in demand for data-centric GenAI models, highlighting the
necessity for robust data communication infrastructures. Central to this need
are message brokers, which serve as essential channels for data transfer within
various system components. This survey presents a comprehensive analysis of
traditional and modern message brokers, offering a comparative study of
prevalent platforms. Our study considers numerous criteria, including but not
limited to open-source availability, integrated monitoring tools,
message prioritization mechanisms, capabilities for parallel processing,
reliability, distribution and clustering functionalities, authentication
processes, data persistence strategies, fault tolerance, and scalability.
Furthermore, we explore the intrinsic constraints that the design and operation
of each message broker might impose, recognizing that these limitations are
crucial in understanding their real-world applicability. Finally, this study
examines the enhancement of message broker mechanisms specifically for GenAI
contexts, emphasizing the criticality of developing a versatile message broker
framework. Such a framework would be poised for quick adaptation, catering to
the dynamic and growing demands of GenAI in the foreseeable future. Through
this dual-pronged approach, we intend to contribute a foundational compendium
that can guide future innovations and infrastructural advancements in the realm
of GenAI data communication.
Comment: 20 pages, 181 references, 7 figures, 5 tables
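The publish/subscribe pattern at the heart of the message brokers surveyed above can be illustrated with a minimal in-process sketch. This is purely didactic: the class and topic names are invented, and real brokers such as those compared in the survey add persistence, clustering, authentication, and fault tolerance on top of this routing core.

```python
# Minimal in-process publish/subscribe broker sketch (illustrative only).
from collections import defaultdict
from queue import Queue

class MiniBroker:
    """Routes published messages to per-subscriber queues, keyed by topic."""
    def __init__(self):
        self._topics = defaultdict(list)  # topic -> list of subscriber queues

    def subscribe(self, topic):
        q = Queue()
        self._topics[topic].append(q)
        return q

    def publish(self, topic, message):
        # Fan out: every subscriber of the topic receives its own copy.
        for q in self._topics[topic]:
            q.put(message)

broker = MiniBroker()
inbox = broker.subscribe("genai.prompts")   # topic name is illustrative
broker.publish("genai.prompts", {"id": 1, "text": "summarize this document"})
msg = inbox.get()
print(msg)  # -> {'id': 1, 'text': 'summarize this document'}
```

Decoupling producers from consumers through such queues is what makes criteria like message prioritization, parallel consumption, and delivery guarantees meaningful points of comparison between platforms.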
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling, and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have since been tackled by many
follow-up research efforts. This article
provides a comprehensive survey of a family of approaches and mechanisms for
large-scale data processing that have been implemented based on the original
idea of the MapReduce framework and are currently gaining momentum in both the
research and industrial communities. We also cover a set of systems that have
been implemented to provide declarative programming interfaces on top of the
MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.
Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
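The programming model described above can be sketched on a single machine: the user supplies only a map and a reduce function, while the framework performs the grouping ("shuffle") step, which in a real cluster also hides data distribution, scheduling, and fault tolerance. The word-count example is the canonical illustration.

```python
# Single-machine sketch of the MapReduce programming model (word count).
from collections import defaultdict

def map_fn(document):
    # Map phase: emit (key, value) pairs for each input record.
    for word in document.split():
        yield word.lower(), 1

def reduce_fn(word, values):
    # Reduce phase: combine all values that share a key.
    return word, sum(values)

def map_reduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)          # shuffle: group values by key
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = map_reduce(docs, map_fn, reduce_fn)
print(counts)
# -> {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because map and reduce are pure, side-effect-free functions over key-value pairs, the framework is free to re-execute failed tasks and to partition work across commodity machines, which is the isolation property the abstract highlights.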
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume but also
noisy and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed.
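Two of the surveyed techniques, data stream processing and complex event processing, can be combined in a minimal sketch: a fixed-size sliding window smooths the noisy, continuous sensor readings, and a simple rule raises an event when the smoothed value crosses a threshold. The sensor values and the threshold here are invented for illustration.

```python
# Sliding-window smoothing plus a threshold rule over a noisy IoT stream.
from collections import deque

def smoothed_alerts(readings, window=3, threshold=30.0):
    """Yield (index, windowed mean) whenever the mean exceeds the threshold."""
    buf = deque(maxlen=window)          # bounded buffer: O(window) memory
    for i, value in enumerate(readings):
        buf.append(value)
        if len(buf) == window:          # window is full: emit a smoothed value
            mean = sum(buf) / window
            if mean > threshold:
                yield i, round(mean, 2)

stream = [21.0, 22.5, 24.0, 35.0, 36.5, 29.0, 25.0]  # e.g. temperature readings
alerts = list(smoothed_alerts(stream))
print(alerts)  # -> [(4, 31.83), (5, 33.5), (6, 30.17)]
```

Smoothing before matching is a common way to keep a single spurious spike in a volatile environment from triggering an event, while the bounded buffer keeps memory constant regardless of stream length.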
Plant-Wide Diagnosis: Cause-and-Effect Analysis Using Process Connectivity and Directionality Information
Production plants used in modern process industry must produce products that meet stringent
environmental, quality and profitability constraints. In such integrated plants, non-linearity and
strong process dynamic interactions among process units complicate root-cause diagnosis of
plant-wide disturbances because disturbances may propagate to units at some distance away
from the primary source of the upset. Similarly, implemented advanced process
control strategies, backup and recovery systems, the use of recycle streams,
and heat integration may hamper detection and diagnostic efforts.
It is important to track down the root cause of a plant-wide disturbance
because, once corrective action is taken at the source, secondary propagated
effects can be quickly eliminated with minimal effort and reduced downtime,
with a resultant positive impact on process efficiency, productivity and
profitability.
In order to diagnose the root-cause of disturbances that manifest plant-wide, it is crucial to
incorporate and utilize knowledge about the overall process topology or interrelated physical
structure of the plant, such as is contained in Piping and Instrumentation Diagrams (P&IDs).
Traditionally, process control engineers have intuitively referred to the physical structure of
the plant by visual inspection and manual tracing of fault propagation paths within the process
structures, such as the process drawings on printed P&IDs, in order to make logical
conclusions based on the results from data-driven analysis. This manual approach, however, is
prone to various sources of errors and can quickly become complicated in real processes.
The aim of this thesis, therefore, is to establish innovative techniques for the electronic
capture and manipulation of process schematic information from large plants such as
refineries in order to provide an automated means of diagnosing plant-wide performance
problems. This report also describes the design and implementation of a computer application
program that integrates: (i) process connectivity and directionality
information from intelligent P&IDs; (ii) results from data-driven
cause-and-effect analysis of process measurements; and (iii) process know-how,
to help process control engineers and plant operators gain process insight.
This work explored intelligent P&IDs, created with AVEVA® P&ID, a
Computer-Aided Design (CAD) tool, and exported as an ISO 15926-compliant,
platform- and vendor-independent, text-based XML description of the plant. The
XML output was processed by a software tool developed in the Microsoft® .NET
environment in this research project to computationally generate a
connectivity matrix that shows plant items and their connections. The
connectivity matrix produced can be exported to the Excel® spreadsheet
application as a basis for other applications and has served as a precursor to
other research work. The final version of
the developed software tool links statistical results of cause-and-effect analysis of process data
with the connectivity matrix to simplify and gain insights into the cause and effect analysis
using the connectivity information. Process know-how and understanding are incorporated to
generate logical conclusions.
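The step of turning an XML plant description into a connectivity matrix can be sketched as follows. The XML layout, tag names, and unit names below are invented for the example; they are not the ISO 15926 schema or the AVEVA® export format used in the thesis.

```python
# Sketch: build a connectivity (adjacency) matrix from a plant-topology XML.
import xml.etree.ElementTree as ET

plant_xml = """
<plant>
  <connection from="FeedPump" to="Preheater"/>
  <connection from="Preheater" to="Furnace"/>
  <connection from="Furnace" to="CrudeColumn"/>
</plant>
"""

def connectivity_matrix(xml_text):
    root = ET.fromstring(xml_text)
    edges = [(c.get("from"), c.get("to")) for c in root.iter("connection")]
    items = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(items)}
    matrix = [[0] * len(items) for _ in items]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1  # 1 = flow from row item to column item
    return items, matrix

items, matrix = connectivity_matrix(plant_xml)
print(items)
print(matrix)
```

The matrix is directional (row-to-column), which is what lets later analysis distinguish upstream root-cause candidates from downstream units that merely inherit a disturbance.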
The thesis presents a case study in an atmospheric crude heating unit as an illustrative example
to drive home key concepts and also describes an industrial case study involving refinery
operations. In the industrial case study, in addition to confirming the
root-cause candidate, the developed software tool was set the task of
determining the physical sequence of the fault propagation path within the
plant.
This was then compared with the hypothesised disturbance propagation sequence
generated by a purely data-driven method. The results show a high degree of
overlap, which helps to validate the statistical data-driven technique and to
easily identify any spurious results from the data-driven multivariable
analysis. This significantly increases control engineers' confidence in the
data-driven methods used for root-cause diagnosis.
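The path-tracing task can be sketched as a breadth-first search over the directed plant graph, returning the shortest physical sequence of units from a root-cause candidate to the unit where the disturbance was observed. The unit names below are invented for illustration, and a real plant graph would come from the connectivity matrix rather than a hard-coded edge list.

```python
# Sketch: shortest fault propagation path via breadth-first search.
from collections import deque

def propagation_path(edges, source, target):
    graph = {}
    for src, dst in edges:                  # directed edges: src -> dst
        graph.setdefault(src, []).append(dst)
    queue = deque([[source]])               # each queue entry is a partial path
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path                     # first hit is the shortest path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None                             # target not reachable from source

edges = [("FeedPump", "Preheater"), ("Preheater", "Furnace"),
         ("Furnace", "CrudeColumn"), ("CrudeColumn", "FeedPump")]
path = propagation_path(edges, "FeedPump", "CrudeColumn")
print(path)  # -> ['FeedPump', 'Preheater', 'Furnace', 'CrudeColumn']
```

A physically plausible path produced this way can then be compared unit by unit against the sequence hypothesised from the data-driven analysis, which is the cross-validation step the case study describes.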
The thesis concludes with a discussion of the approach and presents ideas for further
development of the methods.