Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce
The kernel k-means is an effective method for data clustering which extends
the commonly used k-means algorithm to work on a similarity matrix over
complex data structures. The kernel k-means algorithm is, however,
computationally very expensive, as it requires the complete kernel matrix to
be computed and stored. Further, the kernelized nature of the kernel k-means
algorithm hinders the parallelization of its computations on modern
infrastructures for distributed computing. In this paper, we define a
family of kernel-based low-dimensional embeddings that allows for scaling
kernel k-means on MapReduce via an efficient and unified parallelization
strategy. We then propose two methods for low-dimensional embedding that
adhere to our definition of the embedding family. Exploiting the proposed
parallelization strategy, we present two scalable MapReduce algorithms for
kernel k-means. We demonstrate the effectiveness and efficiency of the
proposed algorithms through an empirical evaluation on benchmark data sets.
Comment: Appears in Proceedings of the SIAM International Conference on Data Mining (SDM), 201
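As a reference point for the algorithm being scaled, a minimal single-machine kernel k-means sketch over a precomputed kernel matrix might look as follows. This is not the paper's embedding-based method; the function name, the optional initialization parameter, and all defaults are illustrative assumptions:

```python
import numpy as np

def kernel_kmeans(K, k, init_labels=None, n_iter=50, seed=0):
    """Lloyd-style kernel k-means on a precomputed n x n kernel matrix K.

    Feature-space distances are computed from K alone:
    ||phi(x_i) - mu_c||^2 = K_ii - (2/|C|) * sum_{j in C} K_ij
                                 + (1/|C|^2) * sum_{j,l in C} K_jl
    """
    n = K.shape[0]
    if init_labels is None:
        labels = np.random.default_rng(seed).integers(0, k, size=n)
    else:
        labels = np.asarray(init_labels).copy()
    diag = np.diag(K)
    for _ in range(n_iter):
        # Distance from every point to every cluster centroid in feature space.
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster: leave its distances at infinity
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged
        labels = new_labels
    return labels
```

Because every assignment step touches rows of the full n x n kernel matrix K, both memory and work grow quadratically in n, which is exactly the bottleneck the paper's low-dimensional embeddings are designed to remove.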
Big Data and the Stock Market: Distilling Data to Improve Stock Market Returns
In our modern competitive market, businesses are seeking efficient and innovative platforms to remain profitable and prepared, especially in the uncertain world of the financial stock market. One possible avenue for improving stock market returns that companies can turn to is harnessing a substantial volume of information, known as big data. However, because of the nature of big data, distilling and analyzing the vast amount of information can require complex analytical methods. Using a keyword selection process based on word frequency, we were able to filter the meaningful data out from the noise and derive a sector-specific keyword list. This list, used in combination with a previously created trading method along with the implementation of a thresholding technique, allowed us to develop a more specific trading strategy focused on different market sectors. Our results show that the use of thresholding techniques in addition to the Google Trends strategy may improve returns in the stock market.
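The thresholded trend-following rule described above can be sketched roughly as follows. This is a hypothetical reconstruction: the abstract does not give the exact rule, window, or threshold, and the trade direction here follows the common Google-Trends-strategy convention of selling after a rise in search volume:

```python
def trend_signal(volumes, window=3, threshold=0.0):
    """Trends-style trading signal with a deviation threshold.

    All parameters are illustrative assumptions, not the paper's values.
    For each week t, compare volumes[t] to the mean of the previous
    `window` weeks. A rise beyond `threshold` -> -1 (sell), a fall
    beyond `threshold` -> +1 (buy), otherwise 0 (no trade).
    """
    signals = []
    for t in range(window, len(volumes)):
        baseline = sum(volumes[t - window:t]) / window
        delta = volumes[t] - baseline
        if delta > threshold:
            signals.append(-1)   # search interest spiked: sell
        elif delta < -threshold:
            signals.append(1)    # search interest dropped: buy
        else:
            signals.append(0)    # inside the threshold band: stay out
    return signals
```

The threshold is the key addition over the plain strategy: small fluctuations in search volume produce no trade, so only pronounced moves in a sector-specific keyword's volume trigger a position.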
The Alt-Right and Global Information Warfare
The Alt-Right is a neo-fascist white supremacist movement that is involved in
violent extremism and shows signs of engagement in extensive disinformation
campaigns. Using social media data mining, this study develops a deeper
understanding of such targeted disinformation campaigns and the ways they
spread. It also adds to the available literature on the endogenous and
exogenous influences within the US far right, as well as motivating factors
that drive disinformation campaigns, such as geopolitical strategy. This study
is to be taken as a preliminary analysis to indicate future methods and
follow-on research that will help develop an integrated approach to
understanding the strategies and associations of the modern fascist movement.
Comment: Presented and published through the IEEE 2019 Big Data Conference
A Game Theoretical Analysis of Localization Security in Wireless Sensor Networks with Adversaries
Wireless Sensor Networks (WSNs) support data collection and distributed data
processing by means of very small sensing devices that are easy to tamper
with and clone; therefore, classical security solutions based on access
control and strong authentication are difficult to deploy. In this paper we
look at the problem of assessing the security of node localization. In
particular, we analyze the scenario in which Verifiable Multilateration (VM)
is used to localize nodes and a malicious node (i.e., the adversary) tries
to masquerade as non-malicious. We resort to non-cooperative game theory and
model this scenario as a two-player game. We analyze the players' optimal
strategies and show that VM is indeed a proper mechanism to reduce the
impact of fake positions.
Comment: International Congress on Ultra Modern Telecommunications and Control Systems 2010 (ICUMT'10)
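The two-player formulation can be illustrated with a small matrix game. The enumeration below and the prisoner's-dilemma payoffs in the usage example are purely hypothetical; the paper derives its own payoff structure for the adversary and the VM-based verifier:

```python
import itertools

def pure_nash(payoff_a, payoff_b):
    """Enumerate pure-strategy Nash equilibria of a two-player matrix game.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to player A (row)
    and player B (column) when A plays strategy i and B plays strategy j.
    A cell is an equilibrium if neither player can gain by deviating
    unilaterally.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in itertools.product(range(n_rows), range(n_cols)):
        # Row player cannot improve by switching rows against column j.
        a_best = all(payoff_a[i][j] >= payoff_a[r][j] for r in range(n_rows))
        # Column player cannot improve by switching columns against row i.
        b_best = all(payoff_b[i][j] >= payoff_b[i][c] for c in range(n_cols))
        if a_best and b_best:
            equilibria.append((i, j))
    return equilibria

# Hypothetical example (a prisoner's dilemma): the unique pure equilibrium
# is the mutual-defection cell (1, 1).
pa = [[-1, -3], [0, -2]]
pb = [[-1, 0], [-3, -2]]
print(pure_nash(pa, pb))
```

The paper performs the analogous equilibrium analysis for the adversary-versus-VM scenario and concludes that, at equilibrium, faking a position is not a profitable strategy.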
Timely-Throughput Optimal Coded Computing over Cloud Networks
In modern distributed computing systems, unpredictable and unreliable
infrastructures result in high variability of computing resources. Meanwhile,
there is significantly increasing demand for timely and event-driven services
with deadline constraints. Motivated by measurements over Amazon EC2 clusters,
we consider a two-state Markov model for variability of computing speed in
cloud networks. In this model, each worker can be either in a good state or a
bad state in terms of the computation speed, and the transition between these
states is modeled as a Markov chain which is unknown to the scheduler. We then
consider a Coded Computing framework, in which the data is possibly encoded and
stored at the worker nodes in order to provide robustness against nodes that
may be in a bad state. With timely computation requests submitted to the system
with computation deadlines, our goal is to design the optimal computation-load
allocation scheme and the optimal data encoding scheme that maximize the timely
computation throughput (i.e., the average number of computation tasks that are
accomplished before their deadline). Our main result is the development of a
dynamic computation strategy called the Lagrange Estimate-and-Allocate (LEA)
strategy, which achieves the optimal timely computation throughput. It is
shown that compared to the static allocation strategy, LEA increases the
timely computation throughput by 1.4X - 17.5X in various scenarios via
simulations and by 1.27X - 6.5X in experiments over Amazon EC2 clusters.
Comment: to appear in MobiHoc 201
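The two-state speed model can be simulated directly. The transition probabilities below are hypothetical stand-ins: the paper fits the actual chain to Amazon EC2 measurements, and the scheduler is assumed not to observe it:

```python
import random

def simulate_worker(p_gb, p_bg, t, seed=0):
    """Simulate one worker's speed state over t rounds as a two-state
    Markov chain with states 'good' and 'bad'.

    p_gb = P(good -> bad), p_bg = P(bad -> good); both are illustrative
    parameters, not values from the paper. Returns the state sequence.
    """
    rng = random.Random(seed)
    state = "good"
    states = []
    for _ in range(t):
        states.append(state)
        if state == "good":
            state = "bad" if rng.random() < p_gb else "good"
        else:
            state = "good" if rng.random() < p_bg else "bad"
    return states
```

In the long run such a chain spends a fraction p_bg / (p_gb + p_bg) of the time in the good state; since the chain is unknown to the scheduler, this is the kind of quantity an adaptive allocation strategy has to learn online while meeting deadlines.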