1,937 research outputs found

High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

Author: Gebbie Tim
Hendricks Dieter
Wilcox Diane
Publication venue: 'Academy of Science of South Africa'
Publication date: 02/08/2015

We implement a master-slave parallel genetic algorithm (PGA) with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement a PGA and visualise the results using disjoint minimal spanning trees (MSTs). We demonstrate that our GPU PGA, implemented on a commercially available general purpose GPU, is able to recover stock clusters in sub-second speed, based on a subset of stocks in the South African market. This represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable due to compiler differences. Combined with fast online intraday correlation matrix estimation from high frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners. Comment: 10 pages, 5 figures, 4 tables, More thorough discussion of implementatio

arXiv.org e-Print Archive

Crossref

Academy of Science of South Africa (ASSAf): Open Journal Systems

Directory of Open Access Journals

Semantically-aware data discovery and placement in collaborative computing environments

Author: Wang Xinqi
Publication venue: LSU Digital Commons
Publication date: 01/01/2012

As the size of scientific datasets and the demand for interdisciplinary collaboration grow in modern science, it becomes imperative to develop better ways of discovering and placing datasets generated across multiple disciplines in order to facilitate interdisciplinary scientific research. For discovering relevant data in large-scale interdisciplinary datasets, the development and integration of cross-domain metadata is critical, as metadata serves as the key guideline for organizing data. To develop and integrate cross-domain metadata management systems in an interdisciplinary collaborative computing environment, three key issues need to be addressed: the development of a cross-domain metadata schema; the implementation of a metadata management system based on this schema; and the integration of the metadata system into existing distributed computing infrastructure. Current research on metadata management in distributed computing environments largely focuses on relatively simple schemas that lack the descriptive power to adequately address the semantic heterogeneity often found in interdisciplinary science, and it does not adequately consider scalability in large-scale data management. Another key issue in data management is data placement: due to the increasing size of scientific datasets, the overhead incurred by transferring data among different nodes has also grown into a significant inhibiting factor affecting overall performance. Currently, few data placement strategies take into consideration semantic information concerning data content. In this dissertation, we propose a cross-domain metadata system in a collaborative distributed computing environment, and we identify and evaluate the key factors and processes involved in a successful cross-domain metadata system, with the goal of facilitating data discovery in collaborative environments. This will allow researchers and users to conduct interdisciplinary science on large-scale datasets by making interdisciplinary datasets easier to access, reducing barriers to collaboration, and reducing the cost of developing similar systems in the future. We also investigate data placement strategies that use semantic information about the hardware and network environment, as well as domain information in the form of semantic metadata, so that semantic locality can be exploited in data placement, potentially reducing the overhead of accessing large-scale interdisciplinary datasets.

Louisiana State University

Generalize Synchronization Mechanism: Specification, Properties, Limits

Author: Chen Chi-Yeh
Chien Chih-Wei
Publication date: 20/02/2024

Shared-resource synchronization is a well-studied problem in both shared-memory and distributed-memory environments. Many synchronization mechanisms have been proposed, each with its own way of reaching a certain consistency level. This thesis further finds that there is no perfect synchronization mechanism: each achieves its properties at different levels. For example, to enforce strong consistency, writers may lose writing freedom, or coordination may take more time. This thesis proposes a framework that generalizes all synchronization mechanisms in a formal way for better reasoning about their properties, from the perspective of multi-writer to single-writer convergence. Limitations prevent a synchronization mechanism from achieving every property at its optimal level. CAP and ROLL were proposed in previous works to explain such limitations. The CAP theorem states that a system can achieve only two of the Consistency, Availability, and Partition tolerance properties. The ROLL theorem uses a framework to model leaderless SMR protocols and states that quorum size and fault tolerance trade off against each other. The thesis covers five properties in a more understandable way to analyze trade-offs and explore new mechanisms. Comment: 17 pages, 1 figure. To be submitted to conferences in 202

arXiv.org e-Print Archive

Data Trading and Monetization: Challenges and Open Research Directions

Author: AlShaikh Muath
Boukhers Zeyd
Jürjens Jan
Lange Christoph
Ramadan Qusai
Publication date: 17/01/2024

Traditional data monetization approaches face challenges related to data protection and logistics. In response, digital data marketplaces have emerged as intermediaries simplifying data transactions. Despite the growing establishment and acceptance of digital data marketplaces, significant challenges hinder efficient data trading. As a result, few companies can derive tangible value from their data, leading to missed opportunities in understanding customers, pricing decisions, and fraud prevention. In this paper, we explore both technical and organizational challenges affecting data monetization. Moreover, we identify areas in need of further research, aiming to expand the boundaries of current knowledge by emphasizing where research is currently limited or lacking. Comment: Paper accepted by the International Conference on Future Networks and Distributed Systems (ICFNDS 2023

arXiv.org e-Print Archive

Challenges of Big Data Analysis

Author: Fan Jianqing
Han Fang
Liu Han
Publication venue: 'Oxford University Press (OUP)'
Publication date: 05/02/2014

Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and how these features drive paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

Crossref

PubMed Central

Distributed workflows with Jupyter

Author: Aldinucci M.
Cantalupo B.
Cavazzoni C.
Colonnelli I.
Di Carlo R.
Magini N.
Morelli R.
Padovani L.
Rabellino S.
Spampinato C.
Publication date: 01/01/2022

The designers of a new coordination interface enacting complex workflows have to tackle a dichotomy: choosing a language-independent or language-dependent approach. Language-independent approaches decouple workflow models from the host code's business logic and advocate portability. Language-dependent approaches foster flexibility and performance by adopting the same host language for business and coordination code. Jupyter Notebooks, with their capability to describe both imperative and declarative code in a unique format, allow taking the best of the two approaches, maintaining a clear separation between application and coordination layers but still providing a unified interface to both aspects. We advocate the Jupyter Notebooks’ potential to express complex distributed workflows, identifying the general requirements for a Jupyter-based Workflow Management System (WMS) and introducing a proof-of-concept portable implementation working on hybrid Cloud-HPC infrastructures. As a byproduct, we extended the vanilla IPython kernel with workflow-based parallel and distributed execution capabilities. The proposed Jupyter-workflow (Jw) system is evaluated on common scenarios for High Performance Computing (HPC) and Cloud, showing its potential in lowering the barriers between prototypical Notebooks and production-ready implementations

Archivio istituzionale della ricerca - Università di Camerino

Towards a big data reference architecture

Author: Maier M.
Publication date: 01/01/2013

Repository TU/e

Pure OAI Repository

A survey and classification of software-defined storage systems

Author: Alysson Bessani
Angel Sebastian
Anwar Ali
Belaramani Nalini M.
Belay Adam
Carl
Cully Brendan
Frank
Ghodsi Ali
Gracia-Tinedo Raúl
Gulati Ajay
Hat Red
Hsu Chin-Jung
Hunt Patrick
José Pereira
João Paulo
Kim Hyeong-Jun
Klimovic Ana
Koponen Teemu
Li Ning
Lumb Christopher R.
Mace Jonathan
Mesnier Michael
Murugan Muthukumar
Ongaro Diego
Peter Simon
Qian Yingjin
Raghavan Ajaykrishna
Ricardo Macedo
Riedel Erik
Schroeder Bianca
Schwan Philip
Seshadri Sudharsan
Sevilla Michael A.
Shan Yizhou
Shue David
Soheil
Song Huaiming
Stefanovici Ioan
Weil Sage A.
Wires Jake
Yang Bin
Yang Suli
Zhang Xuechen
Zhu Timothy
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

The exponential growth of digital information is imposing increasing scale and efficiency demands on modern storage infrastructures. As infrastructure complexity increases, so does the difficulty in ensuring quality of service, maintainability, and resource fairness, raising unprecedented performance, scalability, and programmability challenges. Software-Defined Storage (SDS) addresses these challenges by cleanly disentangling control and data flows, easing management, and improving control functionality of conventional storage systems. Despite its momentum in the research community, many aspects of the paradigm are still unclear, undefined, and unexplored, leading to misunderstandings that hamper the research and development of novel SDS technologies. In this article, we present an in-depth study of SDS systems, providing a thorough description and categorization of each plane of functionality. Further, we propose a taxonomy and classification of existing SDS solutions according to different criteria. Finally, we provide key insights about the paradigm and discuss potential future research directions for the field. This work was financed by the Portuguese funding agency FCT-Fundacao para a Ciencia e a Tecnologia through national funds, the PhD grant SFRH/BD/146059/2019, the project ThreatAdapt (FCT-FNR/0002/2018), the LASIGE Research Unit (UIDB/00408/2020), and cofunded by the FEDER, where applicable.

Universidade do Minho: RepositoriUM

Crossref

Sensor Search Techniques for Sensing as a Service Architecture for The Internet of Things

Author: Christen Peter
Compton Michael
Georgakopoulos Dimitrios
Liu Chi Harold
Perera Charith
Zaslavsky Arkady
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/09/2013

The Internet of Things (IoT) is part of the Internet of the future and will comprise billions of intelligent communicating "things" or Internet Connected Objects (ICO) which will have sensing, actuating, and data processing capabilities. Each ICO will have one or more embedded sensors that will capture potentially enormous amounts of data. The sensors and related data streams can be clustered physically or virtually, which raises the challenge of searching and selecting the right sensors for a query in an efficient and effective way. This paper proposes a context-aware sensor search, selection and ranking model, called CASSARAM, to address the challenge of efficiently selecting a subset of relevant sensors out of a large set of sensors with similar functionality and capabilities. CASSARAM takes into account user preferences and considers a broad range of sensor characteristics, such as reliability, accuracy, location, battery life, and many more. The paper highlights the importance of sensor search, selection and ranking for the IoT, identifies important characteristics of both sensors and data capture processes, and discusses how semantic and quantitative reasoning can be combined together. This work also addresses challenges such as efficient distributed sensor search and relational-expression based filtering. CASSARAM testing and performance evaluation results are presented and discussed. Comment: IEEE sensors Journal, 2013. arXiv admin note: text overlap with arXiv:1303.244

arXiv.org e-Print Archive

Deakin Research Online

Crossref

Online Research @ Cardiff

RMIT Research Repository

The Australian National University