19,923 research outputs found

    Communication Theoretic Data Analytics

    Full text link
    Widespread use of the Internet and social networks invokes the generation of big data, which is proving to be useful in a number of applications. To deal with explosively growing amounts of data, data analytics has emerged as a critical technology related to computing, signal processing, and information networking. In this paper, a formalism is considered in which data is modeled as a generalized social network and communication theory and information theory are thereby extended to data analytics. First, the creation of an equalizer to optimize information transfer between two data variables is considered, and financial data is used to demonstrate the advantages. Then, an information coupling approach based on information geometry is applied for dimensionality reduction, with a pattern recognition example to illustrate the effectiveness. These initial trials suggest the potential of communication theoretic data analytics for a wide range of applications.Comment: Published in IEEE Journal on Selected Areas in Communications, Jan. 201

    DeepWalk: Online Learning of Social Representations

    Full text link
    We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1F_1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection.Comment: 10 pages, 5 figures, 4 table

    Occam's hammer: a link between randomized learning and multiple testing FDR control

    Full text link
    We establish a generic theoretical tool to construct probabilistic bounds for algorithms where the output is a subset of objects from an initial pool of candidates (or more generally, a probability distribution on said pool). This general device, dubbed "Occam's hammer'', acts as a meta layer when a probabilistic bound is already known on the objects of the pool taken individually, and aims at controlling the proportion of the objects in the set output not satisfying their individual bound. In this regard, it can be seen as a non-trivial generalization of the "union bound with a prior'' ("Occam's razor''), a familiar tool in learning theory. We give applications of this principle to randomized classifiers (providing an interesting alternative approach to PAC-Bayes bounds) and multiple testing (where it allows to retrieve exactly and extend the so-called Benjamini-Yekutieli testing procedure).Comment: 13 pages -- conference communication type forma

    Curriculum Guidelines for Undergraduate Programs in Data Science

    Get PDF
    The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in Data Science. The group consisted of 25 undergraduate faculty from a variety of institutions in the U.S., primarily from the disciplines of mathematics, statistics and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in Data Science

    Mathematics at the eve of a historic transition in biology

    Full text link
    A century ago physicists and mathematicians worked in tandem and established quantum mechanism. Indeed, algebras, partial differential equations, group theory, and functional analysis underpin the foundation of quantum mechanism. Currently, biology is undergoing a historic transition from qualitative, phenomenological and descriptive to quantitative, analytical and predictive. Mathematics, again, becomes a driving force behind this new transition in biology.Comment: 5 pages, 2 figure
    • …
    corecore