
    Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce

    Kernel k-means is an effective method for data clustering that extends the commonly used k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally expensive, as it requires the complete kernel matrix to be calculated and stored. Further, the kernelized nature of the algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. In this paper, we define a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. We then propose two methods for low-dimensional embedding that adhere to our definition of the embedding family. Exploiting the proposed parallelization strategy, we present two scalable MapReduce algorithms for kernel k-means. We demonstrate the effectiveness and efficiency of the proposed algorithms through an empirical evaluation on benchmark data sets. Comment: Appears in Proceedings of the SIAM International Conference on Data Mining (SDM), 201
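
The general idea of clustering in a kernel-based embedding, rather than on the full kernel matrix, can be illustrated with a minimal sketch. It is not one of the two embedding methods proposed in the paper: it assumes a simple landmark embedding (each point is mapped to its kernel values against a few chosen points), an RBF kernel, and hand-picked landmarks and initial centroids. All function names and the toy data are hypothetical.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def embed(points, landmarks, gamma=1.0):
    """Map each point to its vector of kernel values against the landmarks.

    Each row depends only on one point and the shared landmarks, so rows
    can be computed independently of one another.
    """
    return [[rbf_kernel(p, l, gamma) for l in landmarks] for p in points]

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, init_centroids, iters=20):
    """Plain Lloyd iterations on the embedded vectors; returns cluster labels."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid in the embedded space.
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Update step: mean of the members of each cluster.
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data (assumed): two well-separated blobs in the plane.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
landmarks = [points[0], points[3]]  # hypothetical choice: one landmark per blob
embedded = embed(points, landmarks)
labels = kmeans(embedded, init_centroids=[embedded[0], embedded[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this sketch only an n-by-m table of kernel values is formed (m landmarks), never the full n-by-n kernel matrix, and each row of that table can be produced without looking at any other data point.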

    Big Data and the Stock Market: Distilling Data to Improve Stock Market Returns

    In today's competitive market, businesses seek efficient and innovative platforms to remain profitable and prepared, especially in the uncertain world of the stock market. One possible avenue for improving stock market returns is harnessing a substantial volume of information, known as big data. However, because of the nature of big data, distilling and analyzing the vast amount of information can require complex analytical methods. Using a keyword selection process based on word frequency, we were able to separate the relevant data from the noise and derive a sector-specific keyword list. This list, used in combination with a previously created trading method and a thresholding technique, allowed us to develop a more specific trading strategy focused on different market sectors. Our results show that the use of thresholding techniques in addition to the Google Trends strategy may improve returns in the stock market.
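
The abstract does not spell out the trading rule or the threshold, so the following is only a hedged sketch of what a thresholded search-volume signal could look like. It assumes that a keyword's weekly search volume is compared with the average of the preceding weeks, that a rise is read as a sell signal and a fall as a buy signal, and that no trade is placed when the relative change is within the threshold. The function name, parameters, and numbers are hypothetical.

```python
def threshold_signals(volumes, window=3, threshold=0.1):
    """Return one signal per week after the warm-up window.

    -1 = sell, +1 = buy, 0 = no trade (relative change within the threshold).
    The sell-on-rise / buy-on-fall direction is an assumption of this sketch.
    """
    signals = []
    for t in range(window, len(volumes)):
        baseline = sum(volumes[t - window:t]) / window
        if baseline == 0:
            signals.append(0)  # no meaningful relative change
            continue
        change = (volumes[t] - baseline) / baseline
        if change > threshold:
            signals.append(-1)
        elif change < -threshold:
            signals.append(1)
        else:
            signals.append(0)
    return signals

# Made-up weekly search volumes for a single keyword.
weekly_volumes = [10, 10, 10, 12, 10, 8]
print(threshold_signals(weekly_volumes))  # [-1, 0, 1]
```

Setting `threshold=0` removes the no-trade band, so every nonzero change in volume produces a trade.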

    The Alt-Right and Global Information Warfare

    The Alt-Right is a neo-fascist white supremacist movement that is involved in violent extremism and shows signs of engagement in extensive disinformation campaigns. Using social media data mining, this study develops a deeper understanding of such targeted disinformation campaigns and the ways they spread. It also adds to the available literature on the endogenous and exogenous influences within the US far right, as well as motivating factors that drive disinformation campaigns, such as geopolitical strategy. This study is to be taken as a preliminary analysis to indicate future methods and follow-on research that will help develop an integrated approach to understanding the strategies and associations of the modern fascist movement. Comment: Presented and published through IEEE 2019 Big Data Conferenc

    A Game Theoretical Analysis of Localization Security in Wireless Sensor Networks with Adversaries

    Wireless Sensor Networks (WSNs) support data collection and distributed data processing by means of very small sensing devices that are easy to tamper with and to clone; classical security solutions based on access control and strong authentication are therefore difficult to deploy. In this paper we look at the problem of assessing the security of node localization. In particular, we analyze the scenario in which Verifiable Multilateration (VM) is used to localize nodes and a malicious node (i.e., the adversary) tries to masquerade as non-malicious. We resort to non-cooperative game theory and model this scenario as a two-player game. We analyze the players' optimal strategies and show that VM is indeed a proper mechanism to reduce fake positions. Comment: International Congress on Ultra Modern Telecommunications and Control Systems 2010. (ICUMT'10
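
As a generic illustration of analyzing a two-player non-cooperative game, the sketch below enumerates pure-strategy Nash equilibria of a small bimatrix game by checking best responses. The abstract gives neither the strategy sets nor the payoffs of the VM game, so the verifier/adversary actions and all payoff values here are invented for illustration only.

```python
def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, col) cells where neither player gains by deviating alone.

    payoff_a[i][j] is the row player's payoff, payoff_b[i][j] the column
    player's payoff, when row i and column j are played.
    """
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            row_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(rows))
            col_best = all(payoff_b[i][j] >= payoff_b[i][l] for l in range(cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria

# Hypothetical payoffs. Rows: verifier (0 = verify, 1 = skip).
# Columns: adversary (0 = fake position, 1 = behave honestly).
verifier = [[2, -1],
            [-3, 0]]
adversary = [[-2, 0],
             [3, 0]]
print(pure_nash_equilibria(verifier, adversary))  # []
```

With these invented payoffs no pure-strategy equilibrium exists, which is the usual situation in inspection-style games and the reason such analyses often turn to mixed strategies.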

    Timely-Throughput Optimal Coded Computing over Cloud Networks

    In modern distributed computing systems, unpredictable and unreliable infrastructures result in high variability of computing resources. Meanwhile, there is significantly increasing demand for timely and event-driven services with deadline constraints. Motivated by measurements over Amazon EC2 clusters, we consider a two-state Markov model for variability of computing speed in cloud networks. In this model, each worker can be either in a good state or a bad state in terms of computation speed, and the transition between these states is modeled as a Markov chain that is unknown to the scheduler. We then consider a Coded Computing framework, in which the data is possibly encoded and stored at the worker nodes in order to provide robustness against nodes that may be in a bad state. With timely computation requests submitted to the system with computation deadlines, our goal is to design the optimal computation-load allocation scheme and the optimal data encoding scheme that maximize the timely computation throughput (i.e., the average number of computation tasks that are accomplished before their deadline). Our main result is the development of a dynamic computation strategy called the Lagrange Estimate-and-Allocate (LEA) strategy, which achieves the optimal timely computation throughput. It is shown that, compared to the static allocation strategy, LEA increases the timely computation throughput by 1.4X - 17.5X in various scenarios via simulations and by 1.27X - 6.5X in experiments over Amazon EC2 clusters. Comment: to appear in MobiHoc 201
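
The two-state worker model can be made concrete with a short sketch. It does not implement LEA or any coding scheme; it only shows, for transition probabilities chosen here as assumptions, how the long-run fraction of time a worker spends in the good state follows from the chain. The names `p_gb` (good to bad) and `p_bg` (bad to good) and their values are hypothetical.

```python
def stationary_good_probability(p_gb, p_bg):
    """Long-run probability of the good state for a two-state Markov chain.

    p_gb: probability of moving good -> bad in one step.
    p_bg: probability of moving bad -> good in one step.
    """
    return p_bg / (p_gb + p_bg)

def step(dist, p_gb, p_bg):
    """Advance a (good, bad) probability distribution by one time step."""
    good, bad = dist
    return (good * (1 - p_gb) + bad * p_bg,
            good * p_gb + bad * (1 - p_bg))

# Assumed transition probabilities for one worker.
p_gb, p_bg = 0.2, 0.6

# Iterate the chain from "certainly good" and compare with the closed form.
dist = (1.0, 0.0)
for _ in range(200):
    dist = step(dist, p_gb, p_bg)

closed_form = stationary_good_probability(p_gb, p_bg)
print(round(dist[0], 6), round(closed_form, 6))  # 0.75 0.75
```

In the setting described in the abstract the scheduler does not know these probabilities, so a dynamic strategy has to estimate the workers' behavior from observations rather than read it off as done here.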