Communication Theoretic Data Analytics
Widespread use of the Internet and social networks drives the generation of
big data, which is proving to be useful in a number of applications. To deal
with explosively growing amounts of data, data analytics has emerged as a
critical technology related to computing, signal processing, and information
networking. In this paper, a formalism is considered in which data is modeled
as a generalized social network and communication theory and information theory
are thereby extended to data analytics. First, the creation of an equalizer to
optimize information transfer between two data variables is considered, and
financial data is used to demonstrate the advantages. Then, an information
coupling approach based on information geometry is applied for dimensionality
reduction, with a pattern recognition example to illustrate the effectiveness.
These initial trials suggest the potential of communication theoretic data
analytics for a wide range of applications.
Comment: Published in IEEE Journal on Selected Areas in Communications, Jan. 201
DeepWalk: Online Learning of Social Representations
We present DeepWalk, a novel approach for learning latent representations of
vertices in a network. These latent representations encode social relations in
a continuous vector space, which is easily exploited by statistical models.
DeepWalk generalizes recent advancements in language modeling and unsupervised
feature learning (or deep learning) from sequences of words to graphs. DeepWalk
uses local information obtained from truncated random walks to learn latent
representations by treating walks as the equivalent of sentences. We
demonstrate DeepWalk's latent representations on several multi-label network
classification tasks for social networks such as BlogCatalog, Flickr, and
YouTube. Our results show that DeepWalk outperforms challenging baselines which
are allowed a global view of the network, especially in the presence of missing
information. DeepWalk's representations can provide scores up to 10%
higher than competing methods when labeled data is sparse. In some experiments,
DeepWalk's representations are able to outperform all baseline methods while
using 60% less training data. DeepWalk is also scalable. It is an online
learning algorithm which builds useful incremental results, and is trivially
parallelizable. These qualities make it suitable for a broad class of real
world applications such as network classification and anomaly detection.
Comment: 10 pages, 5 figures, 4 tables
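The walk-generation step described in the abstract above can be illustrated with a minimal sketch. It only produces truncated random walks over an adjacency list; the function name `truncated_random_walks` and the parameters `walk_length` and `walks_per_vertex` are hypothetical names chosen for illustration, and the toy graph is made up. The embedding step that consumes the walks as "sentences" is left out.

```python
import random


def truncated_random_walks(adjacency, walk_length, walks_per_vertex, seed=0):
    """Generate fixed-length random walks starting from every vertex.

    adjacency: dict mapping each vertex to a list of its neighbors.
    Each walk is a list of vertices, analogous to a sentence of words.
    """
    rng = random.Random(seed)
    vertices = list(adjacency)
    walks = []
    for _ in range(walks_per_vertex):
        # Visit the start vertices in a fresh random order on each pass.
        rng.shuffle(vertices)
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph (illustrative data only).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = truncated_random_walks(graph, walk_length=5, walks_per_vertex=2)
```

Each resulting walk could then be passed, as a sequence of tokens, to a word-embedding model to obtain one vector per vertex.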
Occam's hammer: a link between randomized learning and multiple testing FDR control
We establish a generic theoretical tool to construct probabilistic bounds for
algorithms where the output is a subset of objects from an initial pool of
candidates (or more generally, a probability distribution on said pool). This
general device, dubbed "Occam's hammer", acts as a meta layer when a
probabilistic bound is already known on the objects of the pool taken
individually, and aims at controlling the proportion of the objects in the set
output not satisfying their individual bound. In this regard, it can be seen as
a non-trivial generalization of the "union bound with a prior" ("Occam's
razor"), a familiar tool in learning theory. We give applications of this
principle to randomized classifiers (providing an interesting alternative
approach to PAC-Bayes bounds) and multiple testing (where it allows one to
retrieve exactly and extend the so-called Benjamini-Yekutieli testing procedure).
Comment: 13 pages -- conference communication type format
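For readers unfamiliar with the Benjamini-Yekutieli procedure mentioned above, the following is a minimal sketch of its standard step-up form, not of the paper's Occam's hammer derivation. The function name `benjamini_yekutieli` and the example p-values are assumptions made for illustration.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard step-up rule: with sorted p-values p_(1) <= ... <= p_(m),
    find the largest rank k such that p_(k) <= k * alpha / (m * c(m)),
    where c(m) = 1 + 1/2 + ... + 1/m, and reject the k smallest p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / j for j in range(1, m + 1))
    # Indices of the hypotheses ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Illustrative p-values (made up for this example).
decisions = benjamini_yekutieli([0.001, 0.2, 0.01, 0.8], alpha=0.05)
```

The harmonic factor c(m) is what distinguishes this rule from the plain Benjamini-Hochberg threshold; it makes the procedure more conservative in exchange for validity under arbitrary dependence between the tests.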
Curriculum Guidelines for Undergraduate Programs in Data Science
The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program
met for the purpose of composing guidelines for undergraduate programs in Data
Science. The group consisted of 25 undergraduate faculty from a variety of
institutions in the U.S., primarily from the disciplines of mathematics,
statistics and computer science. These guidelines are meant to provide some
structure for institutions planning for or revising a major in Data Science.
Mathematics at the eve of a historic transition in biology
A century ago physicists and mathematicians worked in tandem and established
quantum mechanics. Indeed, algebras, partial differential equations, group
theory, and functional analysis underpin the foundation of quantum mechanics.
Currently, biology is undergoing a historic transition from qualitative,
phenomenological and descriptive to quantitative, analytical and predictive.
Mathematics, again, becomes a driving force behind this new transition in
biology.
Comment: 5 pages, 2 figures