Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce
The kernel k-means is an effective method for data clustering which extends
the commonly used k-means algorithm to work on a similarity matrix over
complex data structures. The kernel k-means algorithm is, however,
computationally very expensive, as it requires the complete kernel matrix to
be computed and stored. Further, the kernelized nature of the kernel k-means
algorithm hinders the parallelization of its computations on modern
infrastructures for distributed computing. In this paper, we define a
family of kernel-based low-dimensional embeddings that allows for scaling
kernel k-means on MapReduce via an efficient and unified parallelization
strategy. We then propose two methods for low-dimensional embedding that
adhere to our definition of the embedding family. Exploiting the proposed
parallelization strategy, we present two scalable MapReduce algorithms for
kernel k-means. We demonstrate the effectiveness and efficiency of the
proposed algorithms through an empirical evaluation on benchmark data sets.
Comment: Appears in Proceedings of the SIAM International Conference on Data Mining (SDM), 201
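As a reference point for the algorithm being scaled, a minimal single-machine kernel k-means sketch over a precomputed kernel matrix might look as follows. This is not the paper's embedding-based method; the function name, the optional initialization parameter, and all defaults are illustrative assumptions:

```python
import numpy as np

def kernel_kmeans(K, k, init_labels=None, n_iter=50, seed=0):
    """Lloyd-style kernel k-means on a precomputed n x n kernel matrix K.

    Feature-space distances are computed from K alone:
    ||phi(x_i) - mu_c||^2 = K_ii - (2/|C|) * sum_{j in C} K_ij
                                 + (1/|C|^2) * sum_{j,l in C} K_jl
    """
    n = K.shape[0]
    if init_labels is None:
        labels = np.random.default_rng(seed).integers(0, k, size=n)
    else:
        labels = np.asarray(init_labels).copy()
    diag = np.diag(K)
    for _ in range(n_iter):
        # Distance from every point to every cluster centroid in feature space.
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # empty cluster: leave its distances at infinity
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged
        labels = new_labels
    return labels
```

Because every assignment step touches rows of the full n x n kernel matrix K, both memory and work grow quadratically in n, which is exactly the bottleneck the paper's low-dimensional embeddings are designed to remove.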
Big Data and the Stock Market: Distilling Data to Improve Stock Market Returns
In our modern competitive market, businesses are seeking efficient and innovative platforms to remain profitable and prepared, especially in the uncertain world of the financial stock market. One possible avenue for improving stock market returns that companies can turn to is harnessing a substantial volume of information, known as big data. However, because of the nature of big data, distilling and analyzing the vast amount of information can require complex analytical methods. Using a keyword selection process based on word frequency, we were able to filter the meaningful data out from the noise and derive a sector-specific keyword list. This list, used in combination with a previously created trading method along with the implementation of a thresholding technique, allowed us to develop a more specific trading strategy focused on different market sectors. Our results show that the use of thresholding techniques in addition to the Google Trends strategy may improve returns in the stock market.
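The thresholded trend-following rule described above can be sketched roughly as follows. This is a hypothetical reconstruction: the abstract does not give the exact rule, window, or threshold, and the trade direction here follows the common Google-Trends-strategy convention of selling after a rise in search volume:

```python
def trend_signal(volumes, window=3, threshold=0.0):
    """Trends-style trading signal with a deviation threshold.

    All parameters are illustrative assumptions, not the paper's values.
    For each week t, compare volumes[t] to the mean of the previous
    `window` weeks. A rise beyond `threshold` -> -1 (sell), a fall
    beyond `threshold` -> +1 (buy), otherwise 0 (no trade).
    """
    signals = []
    for t in range(window, len(volumes)):
        baseline = sum(volumes[t - window:t]) / window
        delta = volumes[t] - baseline
        if delta > threshold:
            signals.append(-1)   # search interest spiked: sell
        elif delta < -threshold:
            signals.append(1)    # search interest dropped: buy
        else:
            signals.append(0)    # inside the threshold band: stay out
    return signals
```

The threshold is the key addition over the plain strategy: small fluctuations in search volume produce no trade, so only pronounced moves in a sector-specific keyword's volume trigger a position.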
The Alt-Right and Global Information Warfare
The Alt-Right is a neo-fascist white supremacist movement that is involved in
violent extremism and shows signs of engagement in extensive disinformation
campaigns. Using social media data mining, this study develops a deeper
understanding of such targeted disinformation campaigns and the ways they
spread. It also adds to the available literature on the endogenous and
exogenous influences within the US far right, as well as motivating factors
that drive disinformation campaigns, such as geopolitical strategy. This study
is to be taken as a preliminary analysis to indicate future methods and
follow-on research that will help develop an integrated approach to
understanding the strategies and associations of the modern fascist movement.
Comment: Presented and published through the IEEE 2019 Big Data Conference
A Game Theoretical Analysis of Localization Security in Wireless Sensor Networks with Adversaries
Wireless Sensor Networks (WSNs) support data collection and distributed data
processing by means of very small sensing devices that are easy to tamper
with and clone; therefore, classical security solutions based on access
control and strong authentication are difficult to deploy. In this paper we
look at the problem of assessing the security of node localization. In
particular, we analyze the scenario in which Verifiable Multilateration (VM)
is used to localize nodes and a malicious node (i.e., the adversary) tries
to masquerade as non-malicious. We resort to non-cooperative game theory and
model this scenario as a two-player game. We analyze the players' optimal
strategies and show that VM is indeed a proper mechanism to reduce the
impact of fake positions.
Comment: International Congress on Ultra Modern Telecommunications and Control Systems 2010 (ICUMT'10)
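The two-player formulation can be illustrated with a small matrix game. The enumeration below and the prisoner's-dilemma payoffs in the usage example are purely hypothetical; the paper derives its own payoff structure for the adversary and the VM-based verifier:

```python
import itertools

def pure_nash(payoff_a, payoff_b):
    """Enumerate pure-strategy Nash equilibria of a two-player matrix game.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to player A (row)
    and player B (column) when A plays strategy i and B plays strategy j.
    A cell is an equilibrium if neither player can gain by deviating
    unilaterally.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in itertools.product(range(n_rows), range(n_cols)):
        # Row player cannot improve by switching rows against column j.
        a_best = all(payoff_a[i][j] >= payoff_a[r][j] for r in range(n_rows))
        # Column player cannot improve by switching columns against row i.
        b_best = all(payoff_b[i][j] >= payoff_b[i][c] for c in range(n_cols))
        if a_best and b_best:
            equilibria.append((i, j))
    return equilibria

# Hypothetical example (a prisoner's dilemma): the unique pure equilibrium
# is the mutual-defection cell (1, 1).
pa = [[-1, -3], [0, -2]]
pb = [[-1, 0], [-3, -2]]
print(pure_nash(pa, pb))
```

The paper performs the analogous equilibrium analysis for the adversary-versus-VM scenario and concludes that, at equilibrium, faking a position is not a profitable strategy.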
Timely-Throughput Optimal Coded Computing over Cloud Networks
In modern distributed computing systems, unpredictable and unreliable
infrastructures result in high variability of computing resources. Meanwhile,
there is significantly increasing demand for timely and event-driven services
with deadline constraints. Motivated by measurements over Amazon EC2 clusters,
we consider a two-state Markov model for variability of computing speed in
cloud networks. In this model, each worker can be either in a good state or a
bad state in terms of the computation speed, and the transition between these
states is modeled as a Markov chain which is unknown to the scheduler. We then
consider a Coded Computing framework, in which the data is possibly encoded and
stored at the worker nodes in order to provide robustness against nodes that
may be in a bad state. With timely computation requests submitted to the system
with computation deadlines, our goal is to design the optimal computation-load
allocation scheme and the optimal data encoding scheme that maximize the timely
computation throughput (i.e., the average number of computation tasks that are
accomplished before their deadline). Our main result is the development of a
dynamic computation strategy called the Lagrange Estimate-and-Allocate (LEA)
strategy, which achieves the optimal timely computation throughput. It is
shown that compared to the static allocation strategy, LEA increases the
timely computation throughput by 1.4X - 17.5X in various scenarios via
simulations and by 1.27X - 6.5X in experiments over Amazon EC2 clusters.
Comment: to appear in MobiHoc 201
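The two-state speed model can be simulated directly. The transition probabilities below are hypothetical stand-ins: the paper fits the actual chain to Amazon EC2 measurements, and the scheduler is assumed not to observe it:

```python
import random

def simulate_worker(p_gb, p_bg, t, seed=0):
    """Simulate one worker's speed state over t rounds as a two-state
    Markov chain with states 'good' and 'bad'.

    p_gb = P(good -> bad), p_bg = P(bad -> good); both are illustrative
    parameters, not values from the paper. Returns the state sequence.
    """
    rng = random.Random(seed)
    state = "good"
    states = []
    for _ in range(t):
        states.append(state)
        if state == "good":
            state = "bad" if rng.random() < p_gb else "good"
        else:
            state = "good" if rng.random() < p_bg else "bad"
    return states
```

In the long run such a chain spends a fraction p_bg / (p_gb + p_bg) of the time in the good state; since the chain is unknown to the scheduler, this is the kind of quantity an adaptive allocation strategy has to learn online while meeting deadlines.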