702 research outputs found

    Semantic Modeling of Analytic-based Relationships with Direct Qualification

    Full text link
    Successfully modeling state- and analytics-based semantic relationships of documents enhances the representation, importance, relevancy, provenance, and priority of the document. These attributes are the core elements that form the machine-based knowledge representation for documents. However, modeling document relationships that can change over time can be inelegant, limited, complex, or overly burdensome for semantic technologies. In this paper, we present Direct Qualification (DQ), an approach for modeling any semantically referenced document, concept, or named graph with results from associated applied analytics. The proposed approach supplements traditional subject-object relationships by providing a third leg to the relationship: the qualification of how and why the relationship exists. To illustrate, we show a prototype of an event-based system with a realistic use case for applying DQ to the relevancy analytics of PageRank and Hyperlink-Induced Topic Search (HITS).
    Comment: Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015)
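
    To make the "third leg" concrete, here is a minimal Python sketch of a qualified relationship: instead of a bare subject-predicate-object triple, the edge carries a qualification recording which analytic produced it, its score, and when. The class and field names are illustrative assumptions, not the paper's actual ontology.

    ```python
    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class Qualification:
        """The third leg: how and why the relationship holds."""
        analytic: str          # e.g. "PageRank" or "HITS"
        score: float           # result of the applied analytic
        computed_at: datetime  # when the analytic was run

    @dataclass
    class QualifiedTriple:
        subject: str
        predicate: str
        obj: str
        qualification: Qualification

    # A document's relevancy edge, qualified by the analytic that produced it.
    edge = QualifiedTriple(
        subject="doc:42",
        predicate="rel:relevantTo",
        obj="topic:semantic-computing",
        qualification=Qualification(
            analytic="PageRank",
            score=0.0173,
            computed_at=datetime.now(timezone.utc),
        ),
    )
    ```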

    Blazes: Coordination Analysis for Distributed Programs

    Full text link
    Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they degrade performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language.
    Comment: Updated to include additional materials from the original technical report: derivation rules, output stream label
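
    The core analysis idea can be sketched as: label each operator as order-insensitive (confluent) or order-sensitive, and require coordination only where an order-sensitive operator consumes nondeterministically ordered input. The toy Python sketch below is a deliberate simplification; the labels and rule are assumptions for illustration, not Blazes' actual annotation language.

    ```python
    # Toy sketch of the coordination-analysis idea (not the Blazes
    # implementation): a pipeline needs coordination as soon as an
    # order-sensitive operator sees nondeterministically ordered input.

    CONFLUENT = {"filter", "map", "union"}             # order-insensitive
    NON_CONFLUENT = {"aggregate", "join-with-delete"}  # order-sensitive

    def needs_coordination(pipeline: list[str], ordered_input: bool) -> bool:
        """Return True if a consistent execution requires coordination."""
        if ordered_input:
            return False  # a deterministic input order masks order sensitivity
        return any(op in NON_CONFLUENT for op in pipeline)

    assert not needs_coordination(["map", "filter"], ordered_input=False)
    assert needs_coordination(["map", "aggregate"], ordered_input=False)
    ```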

    Parallel and Distributed Stream Processing: Systems Classification and Specific Issues

    Get PDF
    Deploying an infrastructure to execute queries over distributed data stream sources requires identifying a scalable and robust solution able to provide results whose quality can be qualified. Over the last decade, different Data Stream Management Systems have been designed, exploiting new paradigms and technologies to improve processing performance in the face of the specific features of data streams and their growing number. However, tradeoffs are often made between processing performance, resource consumption, and quality of results. This survey gives an overview of existing distributed and parallel systems, classified according to criteria that allow readers to efficiently identify the existing Distributed Stream Management Systems relevant to their needs and resources.

    Benchmarking Distributed Stream Data Processing Systems

    Full text link
    The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives have tried to compare the systems on simple workloads, there is a clear lack of detailed analyses of the systems' performance characteristics. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and latency of windowed operations, which are the basic type of operations in stream analytics. For this benchmark, we design workloads based on real-life, industrial use-cases inspired by the online gaming industry. The contribution of our work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we carefully separate the system under test from the driver, in order to correctly represent the open-world model of typical stream processing deployments and, therefore, measure system performance under realistic conditions. Third, we build the first benchmarking framework to define and test the sustainable performance of streaming systems. Our detailed evaluation highlights the individual characteristics and use-cases of each system.
    Comment: Published at ICDE 2018
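
    One plausible reading of a latency definition for stateful (windowed) operators, sketched in Python for illustration only: the latency of a window's result is the gap between the event time of the window's last event and the time the result is emitted, as observed by an external driver. This is not the paper's benchmark code, and the window size is an assumption.

    ```python
    # Driver-side latency measurement for a tumbling-window operator (sketch).

    WINDOW_MS = 1_000  # tumbling window size; value is illustrative

    last_event_time: dict[int, int] = {}  # window id -> latest event time seen

    def window_of(event_time_ms: int) -> int:
        return event_time_ms // WINDOW_MS

    def on_event(event_time_ms: int) -> None:
        w = window_of(event_time_ms)
        last_event_time[w] = max(last_event_time.get(w, 0), event_time_ms)

    def on_result(window_id: int, emitted_at_ms: int) -> int:
        """Window latency = result emission time - last contributing event time."""
        return emitted_at_ms - last_event_time[window_id]

    for t in (100, 400, 950):                  # three events in window 0
        on_event(t)
    print(on_result(0, emitted_at_ms=1_200))   # -> 250 (ms)
    ```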

    Efficient Data Streaming Analytic Designs for Parallel and Distributed Processing

    Get PDF
    Today, ubiquitous sensing technologies enable the inter-connection of physical objects, as part of the Internet of Things (IoT), and provide massive amounts of data streams. In such scenarios, the demand for timely analysis has resulted in a shift of data processing paradigms towards continuous, parallel, and multi-tier computing. However, these paradigms are followed by several challenges, especially regarding analysis speed, precision, costs, and deterministic execution. This thesis studies a number of such challenges to enable efficient continuous processing of streams of data in a decentralized and timely manner.
    In the first part of the thesis, we investigate techniques aiming at speeding up the processing without a loss in precision. The focus is on continuous machine learning/data mining types of problems, appearing commonly in IoT applications, in particular continuous clustering and monitoring, for which we present novel algorithms: (i) Lisco, a sequential algorithm to cluster data points collected by LiDAR (a distance sensor that creates a 3D mapping of the environment); (ii) p-Lisco, the parallel version of Lisco, which enhances its pipeline- and data-parallelism; (iii) pi-Lisco, the parallel and incremental version, which reuses information to prevent redundant computations; (iv) g-Lisco, a generalized version of Lisco to cluster any data with spatio-temporal locality by leveraging the implicit ordering of the data; and (v) Amble, a continuous monitoring solution for an industrial process.
    In the second part, we investigate techniques to reduce the analysis costs, in addition to speeding up the processing, while also supporting deterministic execution. The focus is on problems associated with the availability and utilization of computing resources, namely reducing the volumes of data, involving concurrent computing elements, and adjusting the level of concurrency. For that, we propose three frameworks: (i) DRIVEN, a framework to continuously compress the data and enable efficient transmission of the compact data in the processing pipeline; (ii) STRATUM, a framework to continuously pre-process the data before transferring it to upper tiers for further processing; and (iii) STRETCH, a framework to enable instantaneous elastic reconfigurations that adjust intra-node resources at runtime while ensuring determinism.
    The algorithms and frameworks presented in this thesis contribute to efficient, online processing of data streams while utilizing available resources. Using extensive evaluations, we show the efficiency and achievements of the proposed techniques for representative IoT applications spanning a wide spectrum of platforms, and illustrate that the performance of our work exceeds that of state-of-the-art techniques.
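
    As a flavor of how the implicit ordering of LiDAR data enables single-pass clustering (Lisco's starting point), here is a minimal, illustrative Python sketch; the threshold rule and input format are assumptions, not the thesis's actual algorithm.

    ```python
    # Single-pass clustering over an angularly ordered LiDAR scan: because
    # consecutive readings are neighbors in angle, one linear scan suffices,
    # with no pairwise distance matrix.

    def cluster_scan(ranges: list[float], eps: float) -> list[list[float]]:
        """Group consecutive range readings whose gap is below eps."""
        clusters: list[list[float]] = []
        current: list[float] = []
        for r in ranges:
            if current and abs(r - current[-1]) > eps:
                clusters.append(current)  # gap too large: close the cluster
                current = []
            current.append(r)
        if current:
            clusters.append(current)
        return clusters

    scan = [2.0, 2.1, 2.05, 5.9, 6.0, 6.1]   # two surfaces, at ~2 m and ~6 m
    print(cluster_scan(scan, eps=0.5))        # -> [[2.0, 2.1, 2.05], [5.9, 6.0, 6.1]]
    ```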

    The Knowledge Life Cycle for e-learning

    No full text
    In this paper, we examine the semantic aspects of e-learning from both pedagogical and technological points of view. We suggest that if semantics are to fulfil their potential in the learning domain, then a paradigm shift in perspective is necessary, from information-based content delivery to knowledge-based collaborative learning services. We propose a semantics-driven Knowledge Life Cycle that characterises the key phases in managing semantics and knowledge, show how this can be applied to the learning domain, and demonstrate the value of semantics via an example of knowledge reuse in learning assessment management.

    HoPP: Robust and Resilient Publish-Subscribe for an Information-Centric Internet of Things

    Full text link
    This paper revisits NDN deployment in the IoT with a special focus on the interaction of sensors and actuators. Such scenarios require high responsiveness and limited control state at the constrained nodes. We argue that the NDN request-response pattern, which prevents data push, is vital for IoT networks. We contribute HoP-and-Pull (HoPP), a robust publish-subscribe scheme for typical IoT scenarios, targeting IoT networks consisting of hundreds of resource-constrained devices with intermittent connectivity. Our approach keeps FIB tables to a minimum and naturally supports mobility, temporary network partitioning, data aggregation, and near real-time reactivity. We experimentally evaluate the protocol in a real-world deployment using the IoT-Lab testbed with varying numbers of constrained devices, each wirelessly interconnected via IEEE 802.15.4 LoWPANs. Implementations are built on CCN-lite with RIOT and support experiments using various single- and multi-hop scenarios.
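
    To illustrate the pull-only pattern the paper argues for, here is a toy Python sketch: a publisher never pushes content, it only advertises a name, and each interested hop fetches the data through a regular Interest/Data exchange. Names, message shapes, and the in-process "network" below are invented for exposition and are not the HoPP wire format.

    ```python
    # Pull-based publish flow: only the name travels upstream; data moves
    # exclusively in response to an Interest, preserving request-response.

    store: dict[str, bytes] = {}  # content store at the publisher

    def publish(name: str, payload: bytes) -> str:
        store[name] = payload
        return name  # only the name is advertised, never the payload

    def send_interest(name: str) -> bytes:
        return store[name]  # stand-in for a real NDN Interest/Data round trip

    def on_advertisement(name: str) -> bytes:
        """Next hop reacts to the advertisement by pulling the named data."""
        return send_interest(name)

    adv = publish("/sensors/room1/temp/42", b"21.5C")
    data = on_advertisement(adv)
    assert data == b"21.5C"
    ```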

    Improving the robustness and privacy of HTTP cookie-based tracking systems within an affiliate marketing context : a thesis presented in fulfilment of the requirements for the degree of Doctor of Philosophy at Massey University, Albany, New Zealand

    Get PDF
    E-commerce activities provide a global reach for enterprises large and small. Third parties generate visitor traffic for a fee through affiliate marketing, search engine marketing, keyword bidding, and organic search, amongst others. Therefore, improving the robustness of the underlying tracking and state management techniques is a vital requirement for the growth and stability of e-commerce. In an inherently stateless ecosystem such as the Internet, HTTP cookies have been the de facto tracking vector for decades. In a previous study, the thesis author exposed circumstances under which cookie-based tracking systems can fail, some due to technical glitches, others due to manipulations made for monetary gain by fraudulent actors. Following a design science research paradigm, this research explores alternative tracking vectors discussed in previous research studies within a cross-domain tracking environment. It evaluates their efficacy in the current context and demonstrates how to use them to improve the robustness of existing tracking techniques. Research outputs include methods, instantiations, and a privacy model artefact based on the information-seeking behaviour of different categories of tracking software and their resulting privacy-intrusion levels. This privacy model provides clarity and is useful for practitioners and regulators seeking to create regulatory frameworks that do not hinder technological advancement but rather curtail privacy-intrusive tracking practices on the Internet. The method artefacts are instantiated as functional prototypes, available publicly on the Internet, to demonstrate the efficacy and utility of the methods through live tests. The research contributes to the theoretical knowledge base through generalisation of empirical findings, and to industry through problem-solving design artefacts.
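
    As an invented illustration of the privacy-model idea, one could map a tracker's information-seeking behaviours to an intrusion level; the categories, weights, and thresholds below are made up for exposition and are not the thesis's actual model.

    ```python
    # Hypothetical mapping from observed tracking behaviours to an
    # intrusion level; all categories and weights are illustrative.

    INTRUSION_WEIGHTS = {
        "first_party_cookie": 1,
        "third_party_cookie": 2,
        "cross_domain_fingerprint": 4,
    }

    def intrusion_level(behaviours: set[str]) -> str:
        score = sum(INTRUSION_WEIGHTS.get(b, 0) for b in behaviours)
        if score >= 4:
            return "high"
        return "medium" if score >= 2 else "low"

    print(intrusion_level({"third_party_cookie", "cross_domain_fingerprint"}))  # high
    ```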