Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm
Twitter is a popular social network platform where users interact by posting texts of up to 280 characters called tweets. Hashtags, hyperlinked words in tweets, have become increasingly crucial for tweet retrieval and search. Using hashtags for tweet topic classification is challenging because of context dependence among words, slang, abbreviations, and emoticons in a short tweet, along with the evolving use of hashtags. Since Twitter generates millions of tweets daily, tweet analytics is a fundamental Big Data streaming problem that often requires real-time distributed processing. This paper proposes a distributed online approach to tweet topic classification with hashtags. Implemented on Apache Storm, a distributed real-time framework, our approach incrementally identifies and updates a set of strong predictors in a Naïve Bayes model for classifying each incoming tweet instance. Preliminary experiments show promising results, with up to 97% accuracy and a 37% increase in throughput on eight processors.
Comment: IEEE International Conference on Big Data 201
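The incremental classification core described above can be sketched as a multinomial Naïve Bayes model with per-instance updates. This is a minimal illustrative sketch only: the class name `IncrementalNaiveBayes` and its methods are hypothetical, and the paper's Storm topology and strong-predictor selection are not reproduced here.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Multinomial Naive Bayes with online, per-tweet updates.

    A simplified sketch: each labelled tweet updates the counts in
    place, so the model evolves with the stream rather than being
    retrained in batch.
    """

    def __init__(self):
        self.class_counts = defaultdict(int)   # tweets seen per class
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def update(self, tokens, label):
        # Incorporate one labelled tweet (tokens include hashtags).
        self.class_counts[label] += 1
        for t in tokens:
            self.word_counts[label][t] += 1
            self.vocab.add(t)

    def predict(self, tokens):
        # Return the class with the highest log-posterior, using
        # Laplace smoothing so unseen tokens do not zero the score.
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / total)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best
```

In a Storm deployment, `update` and `predict` would run inside bolts consuming the tweet stream; here they are ordinary method calls.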
Boosting computational power through spatial multiplexing in quantum reservoir computing
Quantum reservoir computing provides a framework for exploiting the natural dynamics of quantum systems as a computational resource. It can implement real-time signal processing and solve temporal machine learning problems in general, which requires memory and nonlinear mapping of the recent input stream using quantum dynamics in the computational-supremacy region, where classical simulation of the system is intractable. A nuclear magnetic resonance spin-ensemble system, currently available in laboratories, is one of the realistic candidates for such physical implementations. In this paper, considering these realistic experimental constraints, we introduce a scheme, which we call a spatial multiplexing technique, to effectively boost the computational power of the platform. This technique exploits the disjoint dynamics of multiple different quantum systems driven in parallel by a common input stream. Accordingly, instead of designing a single large quantum system to increase the number of qubits available as computational nodes, a large number of qubits can be prepared from multiple small quantum systems, which are operationally easy to handle in laboratory experiments. We numerically demonstrate the effectiveness of the technique on several benchmark tasks and quantitatively investigate its specifications, range of validity, and limitations in detail.
Comment: 15 pages
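The spatial multiplexing idea can be illustrated with a classical echo-state-network analogue: several small, independent reservoirs driven by the same input stream, whose states are concatenated before a single linear readout. This is only a classical sketch of the structural idea under assumed parameters (reservoir sizes, spectral radius, memory task), not the paper's quantum model.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n, spectral_radius=0.9):
    # One small reservoir: random recurrent weights rescaled to the
    # desired spectral radius, plus random input weights.
    W = rng.standard_normal((n, n))
    W = W * (spectral_radius / max(abs(np.linalg.eigvals(W))))
    w_in = rng.standard_normal(n)
    return W, w_in

def run(W, w_in, inputs):
    # Drive one reservoir with the common input stream and collect states.
    x = np.zeros(W.shape[0])
    states = []
    for u in inputs:
        x = np.tanh(W @ x + w_in * u)
        states.append(x.copy())
    return np.array(states)

# Four disjoint reservoirs share one input stream ("spatial multiplexing");
# their state trajectories are concatenated into one feature matrix.
inputs = rng.uniform(-1, 1, 200)
target = np.roll(inputs, 1)        # short-term memory task: recall u(t-1)
reservoirs = [make_reservoir(10) for _ in range(4)]
X = np.hstack([run(W, w_in, inputs) for W, w_in in reservoirs])

# Single linear readout trained by least squares over all reservoirs.
w_out, *_ = np.linalg.lstsq(X[1:], target[1:], rcond=None)
pred = X[1:] @ w_out
```

The key structural point survives the classical simplification: the readout sees 4 × 10 nodes without any single 40-node system having to be built or coupled.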
An occam Style Communications System for UNIX Networks
This document describes the design of a communications system which provides occam-style communications primitives under a Unix environment, using TCP/IP protocols and any number of other protocols deemed suitable as underlying transport layers. The system will integrate with a low-overhead scheduler/kernel without incurring significant costs to the execution of processes within the run-time environment. A survey of relevant occam and occam3 features and related research is followed by a look at the Unix and TCP/IP facilities which determine our working constraints, and a description of the T9000 transputer's Virtual Channel Processor, which was instrumental in our formulation. Drawing on the information presented here, a design for the communications system is then proposed. Finally, a preliminary investigation is made of methods for lightweight access control to shared resources in an environment which provides no support for critical sections, semaphores, or busy waiting; this is presented with reference to the mutual exclusion problems which arise within the proposed design. Future directions for the evolution of this project are discussed in conclusion.
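The defining property of an occam channel is synchronous, point-to-point rendezvous: `ch ! value` blocks the sender until the matching `ch ? x` has taken the value. A minimal local sketch of that semantics, assuming one sender and one receiver per channel (the document's actual design layers such channels over TCP/IP, which is not attempted here):

```python
import queue
import threading

class Channel:
    """A point-to-point synchronised channel in the style of occam's CHAN.

    send() does not return until recv() has taken the value, giving the
    rendezvous semantics of occam's  ch ! value  /  ch ? x  primitives.
    Assumes exactly one sender and one receiver.
    """

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)
        self._taken = threading.Event()

    def send(self, value):            # occam:  ch ! value
        self._taken.clear()
        self._slot.put(value)
        self._taken.wait()            # block until the receiver has the value

    def recv(self):                   # occam:  ch ? x
        value = self._slot.get()
        self._taken.set()             # release the waiting sender
        return value
```

A real implementation over sockets would replace the in-process queue with a TCP connection per virtual channel, in the spirit of the T9000 Virtual Channel Processor described above.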
A Case Study in Coordination Programming: Performance Evaluation of S-Net vs Intel's Concurrent Collections
We present a programming methodology and runtime performance case study
comparing the declarative data flow coordination language S-Net with Intel's
Concurrent Collections (CnC). As a coordination language S-Net achieves a
near-complete separation of concerns between sequential software components
implemented in a separate algorithmic language and their parallel orchestration
in an asynchronous data flow streaming network. We investigate the merits of
S-Net and CnC with the help of a relevant and non-trivial linear algebra
problem: tiled Cholesky decomposition. We describe two alternative S-Net
implementations of tiled Cholesky factorization and compare them with two CnC
implementations, one with explicit performance tuning and one without, that
have previously been used to illustrate Intel CnC. Our experiments on a 48-core
machine demonstrate that S-Net manages to outperform CnC on this problem.
Comment: 9 pages, 8 figures, 1 table, accepted for PLC 2014 workshop
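The benchmark kernel both systems implement is tiled Cholesky factorization: the matrix is split into b×b tiles, and each step factors a diagonal tile, solves the tiles below it, and updates the trailing submatrix. A plain sequential NumPy sketch of those tile operations follows (the S-Net and CnC versions run them as concurrent data-flow tasks; `tiled_cholesky` and the tile size are illustrative assumptions):

```python
import numpy as np

def tiled_cholesky(A, b):
    """Right-looking tiled Cholesky: returns lower-triangular L with
    L @ L.T == A. Assumes A is symmetric positive definite and that
    the tile size b divides A's dimension.
    """
    n = A.shape[0]
    L = A.copy()
    nt = n // b
    tile = lambda i, j: L[i * b:(i + 1) * b, j * b:(j + 1) * b]
    for k in range(nt):
        # POTRF: factor the diagonal tile.
        tile(k, k)[:] = np.linalg.cholesky(tile(k, k))
        for i in range(k + 1, nt):
            # TRSM: L_ik = A_ik @ L_kk^{-T}
            tile(i, k)[:] = np.linalg.solve(tile(k, k), tile(i, k).T).T
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                # SYRK/GEMM: trailing-submatrix update.
                tile(i, j)[:] -= tile(i, k) @ tile(j, k).T
    return np.tril(L)
```

The parallelism the paper exploits comes from the data-flow dependencies between these tile operations: all TRSMs of one step are independent, as are the trailing updates, which is what S-Net's streaming network and CnC's step collections each express in their own way.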
Data Workflow - A Workflow Model for Continuous Data Processing
Online and streaming data are becoming increasingly important for enterprise information systems, e.g. through the integration of sensor data and workflows. The continuous flow of data provided, for example, by sensors requires new workflow models that address the data perspective of these applications, since continuous data is potentially infinite while business process instances are always finite.
In this paper a formal workflow model with data-driven coordination is proposed, making explicit the properties of continuous data processing. These properties can be used to optimize data workflows, i.e., to reduce the computational power needed to process the workflows in an engine by reusing intermediate processing results across several workflows.
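The optimisation mentioned above, reusing intermediate results across workflows, can be illustrated with a toy operator graph in which a shared preprocessing operator caches its output per stream item, so two downstream workflows trigger it only once. The `Op` class and its API are entirely hypothetical, not the paper's formal model:

```python
class Op:
    """One operator in a continuous data workflow (toy sketch).

    The result for each stream item (identified by a sequence number)
    is cached, so workflows that share this operator reuse the
    intermediate result instead of recomputing it.
    """

    def __init__(self, fn, upstream=None):
        self.fn, self.upstream = fn, upstream
        self.calls = 0          # how often fn actually ran
        self._cache = {}

    def process(self, seq_no, item):
        if seq_no not in self._cache:
            value = item if self.upstream is None \
                else self.upstream.process(seq_no, item)
            self.calls += 1
            self._cache[seq_no] = self.fn(value)
        return self._cache[seq_no]

# Two workflows share the same (notionally expensive) preprocessing step.
source = Op(lambda x: x)
clean = Op(lambda x: x * 2, source)     # shared intermediate operator
wf_a = Op(lambda x: x + 1, clean)
wf_b = Op(lambda x: x - 1, clean)

for i, reading in enumerate([3, 5, 7]):  # finite stand-in for an infinite sensor stream
    wf_a.process(i, reading)
    wf_b.process(i, reading)
```

After the loop, `clean` has run only once per item despite feeding both workflows, which is the cost reduction the workflow model's explicit data properties are meant to enable.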