Towards a methodology for the development of integrated IT infrastructures
In this paper, the authors propose and validate a methodology for the development of integrated Information Technology (IT) infrastructures. The motivation for putting forward a new methodology is grounded in the limitations of the traditional software engineering methodologies that exist today. Whereas traditional methodologies result in the development of Information Systems (IS) from scratch, Enterprise Application Integration (EAI) builds integrated IT infrastructures from existing applications. This significant difference raises many issues that need to be understood and addressed, such as: (a) the changes that such an infrastructure brings to organisations, (b) the resistance to change and (c) the extension of the IS lifecycle. The proposed methodology consists of eight stages and aims at supporting software engineers, organisations and researchers in building integrated IT infrastructures. As a result, the methodology seeks to contribute to the body of knowledge.
Towards a Reference Architecture with Modular Design for Large-scale Genotyping and Phenotyping Data Analysis: A Case Study with Image Data
With the rapid advancement of computing technologies, various scientific research communities have been extensively using cloud-based software tools or applications. Cloud-based applications allow users to access software applications from web browsers while relieving them from installing any software in their desktop environment. For example, Galaxy, GenAP, and iPlant Collaborative are popular cloud-based systems for scientific workflow analysis in the domain of plant Genotyping and Phenotyping. These systems are being used for conducting research, devising new techniques, and sharing computer-assisted analysis results among collaborators. Researchers need to integrate their new workflows/pipelines, tools, or techniques with the base system over time. Moreover, large-scale data need to be processed within the timeline for more effective analysis. Recently, Big Data technologies have been emerging to facilitate large-scale data processing with commodity hardware. Among the above-mentioned systems, GenAP utilizes the Big Data technologies for specific cases only. The structure of such a cloud-based system is highly variable and complex in nature. Software architects and developers need to consider entirely different properties and challenges during the development and maintenance phases compared to traditional business/service-oriented systems. Recent studies report that software engineers and data engineers confront challenges in developing analytic tools that support large-scale and heterogeneous data analysis. Unfortunately, software researchers have given little focus to devising a well-defined methodology and frameworks for the flexible design of a cloud system for the Genotyping and Phenotyping domain. To that end, more effective design methodologies and frameworks are urgently needed for the development of cloud-based Genotyping and Phenotyping analysis systems that also support large-scale data processing.
In our thesis, we conduct a few studies in order to devise a stable reference architecture and modularity model for software developers and data engineers in the domain of Genotyping and Phenotyping. In the first study, we analyze the architectural changes of existing candidate systems to identify stability issues. Then, we extract architectural patterns of the candidate systems and propose a conceptual reference architectural model. Finally, we present a case study on the modularity of computation-intensive tasks as an extension of data-centric development. We show that the data-centric modularity model is at the core of the flexible development of a Genotyping and Phenotyping analysis system. Our proposed model and case study with thousands of images provide a useful knowledge base for software researchers, developers, and data engineers working on cloud-based Genotyping and Phenotyping analysis system development.
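The data-centric modularity idea summarized above can be sketched, very loosely, as computation-intensive tasks expressed as pluggable modules that communicate only through a shared data store. This is an illustrative toy under invented assumptions: the class names, the `register`/`run` API, and the image-analysis tasks are made up here and are not taken from the thesis.

```python
# Illustrative sketch (hypothetical, not the thesis's model): modules read
# from and write to a shared store keyed by dataset name, so tasks can be
# added or swapped without changing each other.
from typing import Callable, Dict, List, Tuple

class DataStore:
    """Central store; modules communicate only through named datasets."""
    def __init__(self) -> None:
        self._data: Dict[str, list] = {}
    def put(self, name: str, records: list) -> None:
        self._data[name] = records
    def get(self, name: str) -> list:
        return self._data[name]

class Pipeline:
    """Runs modules in order; each declares its input and output datasets."""
    def __init__(self, store: DataStore) -> None:
        self.store = store
        self.modules: List[Tuple[str, str, Callable[[list], list]]] = []
    def register(self, inp: str, out: str, fn: Callable[[list], list]) -> None:
        self.modules.append((inp, out, fn))
    def run(self) -> None:
        for inp, out, fn in self.modules:
            self.store.put(out, fn(self.store.get(inp)))

# Hypothetical image-analysis tasks expressed as modules.
store = DataStore()
store.put("raw_images", [{"id": 1, "pixels": [0, 255, 128]}])
pipe = Pipeline(store)
pipe.register("raw_images", "normalized",
              lambda imgs: [{**im, "pixels": [p / 255 for p in im["pixels"]]}
                            for im in imgs])
pipe.register("normalized", "features",
              lambda imgs: [{"id": im["id"],
                             "mean": sum(im["pixels"]) / len(im["pixels"])}
                            for im in imgs])
pipe.run()
print(store.get("features"))
```

Because modules only agree on dataset names, a new computation-intensive task can be registered without modifying existing ones, which is one plausible reading of "data-centric modularity".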
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed.
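As a toy illustration of the data stream processing the survey covers, a fixed-size sliding window can smooth a noisy, continuous sensor stream in a single pass. The readings, window size, and function name below are invented for illustration and are not from the article.

```python
# Illustrative sketch (hypothetical values): smoothing a noisy IoT sensor
# stream with a sliding-window moving average, processed incrementally as
# readings arrive rather than stored in full.
from collections import deque

def moving_average(stream, window=3):
    """Yield the mean of the most recent `window` readings per new reading."""
    buf = deque(maxlen=window)  # old readings fall out automatically
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

readings = [21.0, 21.5, 35.0, 21.2, 21.3]  # 35.0 is a noisy spike
smoothed = list(moving_average(readings))
print(smoothed)
```

The generator form matters for the IoT setting described above: it emits one smoothed value per incoming reading, so memory use stays bounded by the window size no matter how long the stream runs.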
Dealing with Data for RE: Mitigating Challenges while using NLP and Generative AI
Across the dynamic business landscape today, enterprises face an
ever-increasing range of challenges. These include the constantly evolving
regulatory environment, the growing demand for personalization within software
applications, and the heightened emphasis on governance. In response to these
multifaceted demands, large enterprises have been adopting automation that
spans from the optimization of core business processes to the enhancement of
customer experiences. Indeed, Artificial Intelligence (AI) has emerged as a
pivotal element of modern software systems. In this context, data plays an
indispensable role. AI-centric software systems based on supervised learning
and operating at an industrial scale require large volumes of training data to
perform effectively. Moreover, the incorporation of generative AI has led to a
growing demand for adequate evaluation benchmarks. Our experience in this field
has revealed that the requirement for large datasets for training and
evaluation introduces a host of intricate challenges. This book chapter
explores the evolving landscape of Software Engineering (SE) in general, and
Requirements Engineering (RE) in particular, in this era marked by AI
integration. We discuss challenges that arise while integrating Natural
Language Processing (NLP) and generative AI into enterprise-critical software
systems. The chapter provides practical insights, solutions, and examples to
equip readers with the knowledge and tools necessary for effectively building
solutions with NLP at their cores. We also reflect on how these text
data-centric tasks sit together with the traditional RE process, and highlight new RE tasks that may be necessary for handling the increasingly important text data-centricity involved in developing software systems.

Comment: 24 pages, 2 figures, to be published in the NLP for Requirements Engineering book
DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems
While there have been a number of remarkable breakthroughs in machine
learning (ML), much of the focus has been placed on model development. However,
to truly realize the potential of machine learning in real-world settings,
additional aspects must be considered across the ML pipeline. Data-centric AI
is emerging as a unifying paradigm that could enable such reliable end-to-end
pipelines. However, this remains a nascent area with no standardized framework
to guide practitioners to the necessary data-centric considerations or to
communicate the design of data-centric ML systems. To address this gap,
we propose DC-Check, an actionable checklist-style framework to elicit
data-centric considerations at different stages of the ML pipeline: Data,
Training, Testing, and Deployment. This data-centric lens on development aims
to promote thoughtfulness and transparency prior to system development.
Additionally, we highlight specific data-centric AI challenges and research
opportunities. DC-Check is aimed at both practitioners and researchers to guide
day-to-day development. As such, to easily engage with and use DC-Check and
associated resources, we provide a DC-Check companion website
(https://www.vanderschaar-lab.com/dc-check/). The website will also serve as an
updated resource as methods and tooling evolve over time.

Comment: Main paper: 11 pages, supplementary & case studies follow
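The four pipeline stages the abstract names (Data, Training, Testing, Deployment) suggest how a checklist-style framework might be represented in code. The sketch below is hypothetical: the individual checklist items are invented placeholders, not DC-Check's actual considerations, for which the paper and companion website are the authoritative source.

```python
# Illustrative sketch (items are invented, not DC-Check's real checklist):
# a per-stage checklist that reports unaddressed considerations in
# pipeline order.
STAGES = ["Data", "Training", "Testing", "Deployment"]

checklist = {
    "Data": {"Data curation performed": True, "Provenance documented": False},
    "Training": {"Data-informed model selection": True},
    "Testing": {"Subgroup performance evaluated": False},
    "Deployment": {"Data drift monitoring in place": True},
}

def open_items(checklist):
    """Return (stage, item) pairs still unaddressed, in stage order."""
    return [(stage, item)
            for stage in STAGES
            for item, done in checklist[stage].items()
            if not done]

for stage, item in open_items(checklist):
    print(f"[{stage}] TODO: {item}")
```

Keeping the stages as an ordered list makes the report read in pipeline order, mirroring how the checklist is meant to be walked through during development.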