An Integrated Semantic Web Service Discovery and Composition Framework
In this paper we present a theoretical analysis of graph-based service
composition in terms of its dependency with service discovery. Driven by this
analysis we define a composition framework by means of integration with
fine-grained I/O service discovery that enables the generation of a graph-based
composition which contains the set of services that are semantically relevant
for an input-output request. The proposed framework also includes an optimal
composition search algorithm to extract the best composition from the graph
minimising the length and the number of services, and different graph
optimisations to improve the scalability of the system. A practical
implementation used for the empirical analysis is also provided. This analysis
proves the scalability and flexibility of our proposal and provides insights on
how integrated composition systems can be designed in order to achieve good
performance in real scenarios for the Web.
Comment: Accepted to appear in IEEE Transactions on Services Computing 201
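The idea of growing a composition graph from an input-output request can be sketched as a layer-by-layer forward search. This is only an illustration under simplifying assumptions, not the framework's algorithm: matching here is exact string equality rather than semantic matching, and the function name `build_composition_layers` and the sample services are hypothetical.

```python
def build_composition_layers(services, provided, wanted):
    """Group services into layers that become invocable from the request inputs.

    services: dict mapping a service name to (set of inputs, set of outputs).
    provided: concepts supplied by the request.
    wanted:   concepts the request asks for.
    Returns a list of layers (lists of service names), or None if the wanted
    outputs cannot be produced. Matching is exact (an assumption).
    """
    known = set(provided)
    unused = dict(services)
    layers = []
    while not set(wanted) <= known:
        # A service is invocable once all of its inputs are already known.
        layer = sorted(name for name, (ins, _) in unused.items() if ins <= known)
        if not layer:
            return None  # no progress possible: the request is unsatisfiable
        for name in layer:
            known |= unused.pop(name)[1]
        layers.append(layer)
    return layers


# Hypothetical services, for illustration only.
services = {
    "geocode": ({"address"}, {"lat", "lon"}),
    "weather": ({"lat", "lon"}, {"forecast"}),
    "translate": ({"text"}, {"translated"}),
}
layers = build_composition_layers(services, {"address"}, {"forecast"})
```

The sketch stops as soon as the wanted outputs are covered and does not prune services that turn out to be unnecessary; selecting a composition that minimises length and service count would be a separate search over the resulting layers.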
Reducing Electricity Demand Charge for Data Centers with Partial Execution
Data centers consume a large amount of energy and incur substantial
electricity cost. In this paper, we study the familiar problem of reducing data
center energy cost with two new perspectives. First, we find, through an
empirical study of contracts from electric utilities powering Google data
centers, that demand charge per kW for the maximum power used is a major
component of the total cost. Second, many services such as Web search tolerate
partial execution of the requests because the response quality is a concave
function of processing time. Data from Microsoft Bing search engine confirms
this observation.
We propose a simple idea of using partial execution to reduce the peak power
demand and energy cost of data centers. We systematically study the problem of
scheduling partial execution with stringent SLAs on response quality. For a
single data center, we derive an optimal algorithm to solve the workload
scheduling problem. In the case of multiple geo-distributed data centers, the
demand of each data center is controlled by the request routing algorithm,
which makes the problem much more involved. We decouple the two aspects, and
develop a distributed optimization algorithm to solve the large-scale request
routing problem. Trace-driven simulations show that partial execution reduces
cost by for one data center, and by for geo-distributed
data centers together with request routing.
Comment: 12 pages
Product-based Neural Networks for User Response Prediction
Predicting user responses, such as clicks and conversions, is of great
importance and has found its usage in many Web applications including
recommender systems, web search and online advertising. The data in those
applications is mostly categorical and contains multiple fields; a typical
representation is to transform it into a high-dimensional sparse binary feature
representation via one-hot encoding. Faced with this extreme sparsity,
traditional models can be limited to mining shallow patterns from the
data, i.e. low-order feature combinations. Deep models such as deep neural
networks, on the other hand, cannot be applied directly to the
high-dimensional input because of the huge feature space. In this paper, we
propose Product-based Neural Networks (PNNs) with an embedding layer to learn
a distributed representation of the categorical data, a product layer to
capture interactive patterns between inter-field categories, and further fully
connected layers to explore high-order feature interactions. Our experimental
results on two large-scale real-world ad click datasets demonstrate that PNNs
consistently outperform the state-of-the-art models on various metrics.
Comment: 6 pages, 5 figures, ICDM201
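A minimal sketch of the embedding and product steps described above, assuming the product operation is a pairwise inner product between field embeddings (the abstract does not say which product is used). The embedding tables, field names, and function names are hypothetical, and the fully connected layers are omitted.

```python
def embed_fields(sample, tables):
    """Look up one embedding vector per field.

    Selecting a row by category is equivalent to multiplying the field's
    one-hot vector by its embedding matrix.
    """
    return [tables[field][category] for field, category in sample]


def inner_product_layer(field_embeddings):
    """Return the inner product of every pair of distinct field embeddings."""
    products = []
    n = len(field_embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            pair = zip(field_embeddings[i], field_embeddings[j])
            products.append(sum(a * b for a, b in pair))
    return products


# Hypothetical 2-dimensional embedding tables for three categorical fields.
tables = {
    "city": {"london": [1.0, 2.0], "paris": [0.5, 0.5]},
    "weekday": {"mon": [3.0, 4.0], "tue": [1.0, 0.0]},
    "device": {"mobile": [0.0, 1.0], "desktop": [1.0, 1.0]},
}
sample = [("city", "london"), ("weekday", "mon"), ("device", "mobile")]
interactions = inner_product_layer(embed_fields(sample, tables))
```

With three fields the layer yields three pairwise interaction values; in a full model these would be fed, together with the embeddings, into the fully connected layers.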
The STRESS Method for Boundary-point Performance Analysis of End-to-end Multicast Timer-Suppression Mechanisms
Evaluation of Internet protocols usually uses random scenarios or scenarios
based on designers' intuition. Such an approach may be useful for average-case
analysis but does not cover boundary-point (worst or best-case) scenarios. To
synthesize boundary-point scenarios a more systematic approach is needed. In
this paper, we present a method for automatic synthesis of worst and best case
scenarios for protocol boundary-point evaluation.
Our method uses a fault-oriented test generation (FOTG) algorithm for
searching the protocol and system state space to synthesize these scenarios.
The algorithm is based on a global finite state machine (FSM) model. We extend
the algorithm with timing semantics to handle end-to-end delays and address
performance criteria. We introduce the notion of a virtual LAN to represent
delays of the underlying multicast distribution tree. The algorithms used in
our method utilize implicit backward search using branch and bound techniques
and start from given target events. This aims to reduce the search complexity
drastically. As a case study, we use our method to evaluate variants of the
timer suppression mechanism, used in various multicast protocols, with respect
to two performance criteria: overhead of response messages and response time.
Simulation results for reliable multicast protocols show that our method
provides a scalable way for synthesizing worst-case scenarios automatically.
Results obtained using stress scenarios differ dramatically from those obtained
through average-case analyses. We hope our method will serve as a model for
applying systematic scenario generation to other multicast protocols.
Comment: 24 pages, 10 figures, IEEE/ACM Transactions on Networking (ToN) [To
appear]
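To make the response-overhead criterion concrete, the following sketch counts how many receivers respond under timer suppression. It assumes a single uniform delay between all receivers (a simplification in the spirit of the virtual LAN abstraction) and is not the FOTG algorithm; the function name `count_responses` is hypothetical.

```python
def count_responses(timers, delay):
    """Count receivers that respond before hearing someone else's response.

    timers: expiry time of each receiver's suppression timer.
    delay:  uniform one-way delay between any two receivers (an assumption).
    The earliest response reaches everyone at min(timers) + delay, so every
    receiver whose timer expires before that instant also responds.
    """
    first = min(timers)
    return sum(1 for t in timers if t == first or t - first < delay)


# Worst case for overhead: all timers expire within one delay of the first.
worst = count_responses([1.0, 1.2, 1.4], 0.5)
# Best case for overhead: timers are spread out by more than the delay.
best = count_responses([1.0, 2.0, 3.0], 0.5)
```

The two calls show why boundary-point scenarios matter: the same mechanism produces either one response or one per receiver depending on how the timers fall relative to the delay.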
Enumerating Maximal Bicliques from a Large Graph using MapReduce
We consider the enumeration of maximal bipartite cliques (bicliques) from a
large graph, a task central to many practical data mining problems in social
network analysis and bioinformatics. We present novel parallel algorithms for
the MapReduce platform, and an experimental evaluation using Hadoop MapReduce.
Our algorithm is based on clustering the input graph into smaller sized
subgraphs, followed by processing different subgraphs in parallel. Our
algorithm uses two ideas that enable it to scale to large graphs: (1) the
redundancy in work between different subgraph explorations is minimized through
a careful pruning of the search space, and (2) the load on different reducers
is balanced through the use of an appropriate total order among the vertices.
Our evaluation shows that the algorithm scales to large graphs with millions of
edges and tens of millions of maximal bicliques. To our knowledge, this is
the first work on maximal biclique enumeration for graphs of this scale.
Comment: A preliminary version of the paper was accepted at the Proceedings of
the 3rd IEEE International Congress on Big Data 201
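To pin down what is being enumerated, here is a brute-force sequential sketch on a tiny graph. It is not the MapReduce algorithm and it runs in exponential time; it assumes an undirected graph without self-loops, and that a biclique only requires every vertex on one side to be adjacent to every vertex on the other. The name `maximal_bicliques` is hypothetical.

```python
from itertools import combinations


def maximal_bicliques(adj):
    """Enumerate maximal bicliques of a small undirected graph by brute force.

    adj: dict mapping each vertex to the set of its neighbours (no self-loops).
    Returns a set of unordered pairs {left, right} of frozensets.
    """
    vertices = sorted(adj)
    found = set()
    for size in range(1, len(vertices) + 1):
        for left in combinations(vertices, size):
            # Right side: vertices adjacent to every vertex on the left.
            right = set.intersection(*(adj[v] for v in left))
            if not right:
                continue
            # Maximal only if the left side is exactly the set of vertices
            # adjacent to every vertex on the right.
            closure = set.intersection(*(adj[v] for v in right))
            if closure == set(left):
                found.add(frozenset((frozenset(left), frozenset(right))))
    return found


# Path a - b - c: the only maximal biclique is ({b}, {a, c}).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
result = maximal_bicliques(path)
```

Storing each biclique as an unordered pair avoids reporting the same biclique twice with its sides swapped.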