137 research outputs found
SCOPE: Scalable Composite Optimization for Learning on Spark
Many machine learning models, such as logistic regression~(LR) and support
vector machine~(SVM), can be formulated as composite optimization problems.
Recently, many distributed stochastic optimization~(DSO) methods have been
proposed to solve the large-scale composite optimization problems, which have
shown better performance than traditional batch methods. However, most of these
DSO methods are not scalable enough. In this paper, we propose a novel DSO
method, called \underline{s}calable \underline{c}omposite
\underline{op}timization for l\underline{e}arning~({SCOPE}), and implement it
on the fault-tolerant distributed platform \mbox{Spark}. SCOPE is both
computation-efficient and communication-efficient. Theoretical analysis shows
that SCOPE is convergent with linear convergence rate when the objective
function is convex. Furthermore, empirical results on real datasets show that
SCOPE can outperform other state-of-the-art distributed learning methods on
Spark, including both batch learning methods and DSO methods
Game Theoretic Approaches to Massive Data Processing in Wireless Networks
Wireless communication networks are becoming highly virtualized with
two-layer hierarchies, in which controllers at the upper layer with tasks to
achieve can ask a large number of agents at the lower layer to help realize
computation, storage, and transmission functions. Through offloading data
processing to the agents, the controllers can accomplish otherwise prohibitive
big data processing. Incentive mechanisms are needed for the agents to perform
the controllers' tasks in order to satisfy the corresponding objectives of
controllers and agents. In this article, a hierarchical game framework with
fast convergence and scalability is proposed to meet the demand for real-time
processing for such situations. Possible future research directions in this
emerging area are also discussed
L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework
Despite the importance of sparsity in many large-scale applications, there
are few methods for distributed optimization of sparsity-inducing objectives.
In this paper, we present a communication-efficient framework for
L1-regularized optimization in the distributed environment. By viewing
classical objectives in a more general primal-dual setting, we develop a new
class of methods that can be efficiently distributed and applied to common
sparsity-inducing models, such as Lasso, sparse logistic regression, and
elastic net-regularized problems. We provide theoretical convergence guarantees
for our framework, and demonstrate its efficiency and flexibility with a
thorough experimental comparison on Amazon EC2. Our proposed framework yields
speedups of up to 50x as compared to current state-of-the-art methods for
distributed L1-regularized optimization
Tensor Learning for Recovering Missing Information: Algorithms and Applications on Social Media
Real-time social systems like Facebook, Twitter, and Snapchat have been growing
rapidly, producing exabytes of data in different views or aspects. Coupled with more
and more GPS-enabled sharing of videos, images, blogs, and tweets that provide valuable
information regarding “who”, “where”, “when” and “what”, these real-time human
sensor data promise new research opportunities to uncover models of user behavior, mobility,
and information sharing. These real-time dynamics in social systems usually come
in multiple aspects, which are able to help better understand the social interactions of the
underlying network. However, these multi-aspect datasets are often raw and incomplete
owing to various unpredictable or unavoidable reasons; for instance, API limitations and
data sampling policies can lead to an incomplete (and often biased) perspective on these
multi-aspect datasets. This missing data could raise serious concerns such as biased estimations
on structural properties of the network and properties of information cascades in
social networks. In order to recover missing values or information in social systems, we
identify “4S” challenges: extreme sparsity of the observed multi-aspect datasets, adoption
of rich side information that is able to describe the similarities of entities, generation of
robust models rather than limiting them on specific applications, and scalability of models
to handle real large-scale datasets (billions of observed entries). With these challenges
in mind, this dissertation aims to develop scalable and interpretable tensor-based frameworks,
algorithms and methods for recovering missing information on social media. In
particular, this dissertation research makes four unique contributions:
_ The first research contribution of this dissertation research is to propose a scalable
framework based on low-rank tensor learning in the presence of incomplete information.
Concretely, we formally define the problem of recovering the spatio-temporal dynamics of online memes and tackle this problem by proposing a novel tensor-based
factorization approach based on the alternative direction method of multipliers
(ADMM) with the integration of the latent relationships derived from contextual
information among locations, memes, and times.
_ The second research contribution of this dissertation research is to evaluate the generalization
of the proposed tensor learning framework and extend it to the recommendation
problem. In particular, we develop a novel tensor-based approach to
solve the personalized expert recommendation by integrating both the latent relationships
between homogeneous entities (e.g., users and users, experts and experts)
and the relationships between heterogeneous entities (e.g., users and experts, topics
and experts) from the geo-spatial, topical, and social contexts.
_ The third research contribution of this dissertation research is to extend the proposed
tensor learning framework to the user topical profiling problem. Specifically,
we propose a tensor-based contextual regularization model embedded into a matrix
factorization framework, which leverages the social, textual, and behavioral contexts
across users, in order to overcome identified challenges.
_ The fourth research contribution of this dissertation research is to scale up the proposed
tensor learning framework to be capable of handling real large-scale datasets
that are too big to fit in the main memory of a single machine. Particularly, we
propose a novel distributed tensor completion algorithm with the trace-based regularization
of the auxiliary information based on ADMM under the proposed tensor
learning framework, which is designed to scale up to real large-scale tensors (e.g.,
billions of entries) by efficiently computing auxiliary variables, minimizing intermediate
data, and reducing the workload of updating new tensors
- …