Approximation Algorithms for Stochastic Boolean Function Evaluation and Stochastic Submodular Set Cover
Stochastic Boolean Function Evaluation is the problem of determining the
value of a given Boolean function f on an unknown input x, when each bit x_i
of x can only be determined by paying an associated cost c_i. The assumption is
that x is drawn from a given product distribution, and the goal is to minimize
the expected cost. This problem has been studied in Operations Research, where
it is known as "sequential testing" of Boolean functions. It has also been
studied in learning theory in the context of learning with attribute costs. We
consider the general problem of developing approximation algorithms for
Stochastic Boolean Function Evaluation. We give a 3-approximation algorithm for
evaluating Boolean linear threshold formulas. We also present an approximation
algorithm for evaluating CDNF formulas (and decision trees) achieving a factor
of O(log kd), where k is the number of terms in the DNF formula, and d is the
number of clauses in the CNF formula. In addition, we present approximation
algorithms for simultaneous evaluation of linear threshold functions, and for
ranking of linear functions.
Our function evaluation algorithms are based on reductions to the Stochastic
Submodular Set Cover (SSSC) problem. This problem was introduced by Golovin and
Krause. They presented an approximation algorithm for the problem, called
Adaptive Greedy. Our main technical contribution is a new approximation
algorithm for the SSSC problem, which we call Adaptive Dual Greedy. It is an
extension of the Dual Greedy algorithm for Submodular Set Cover due to Fujito,
which is a generalization of Hochbaum's algorithm for the classical Set Cover
Problem. We also give a new bound on the approximation achieved by the Adaptive
Greedy algorithm of Golovin and Krause.
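To make the cost model concrete, the following minimal sketch computes the expected cost of one simple strategy by brute-force enumeration. It is an illustration only, not one of the algorithms from the paper: it assumes a fixed (non-adaptive) testing order, and the names `is_determined` and `expected_cost` are hypothetical. Testing stops as soon as the value of f is the same for every completion of the bits seen so far.

```python
from itertools import product


def is_determined(f, n, known):
    """True if f takes a single value on every completion of the known bits."""
    free = [i for i in range(n) if i not in known]
    values = set()
    for bits in product([0, 1], repeat=len(free)):
        x = [0] * n
        for i, b in known.items():
            x[i] = b
        for i, b in zip(free, bits):
            x[i] = b
        values.add(f(x))
    return len(values) == 1


def expected_cost(f, probs, costs, order):
    """Expected cost of testing bits in `order` until f(x) is determined.

    probs[i] is P(x_i = 1) under the product distribution; costs[i] is c_i.
    Exponential in the number of bits, so only suitable for tiny examples.
    """
    n = len(probs)
    total = 0.0
    for x in product([0, 1], repeat=n):
        # Probability of this input under the product distribution.
        weight = 1.0
        for i, b in enumerate(x):
            weight *= probs[i] if b else 1.0 - probs[i]
        known, paid = {}, 0.0
        for i in order:
            if is_determined(f, n, known):
                break
            known[i] = x[i]
            paid += costs[i]
        total += weight * paid
    return total


# Assumed toy instance: f is the OR of two bits, each equal to 1 with
# probability 0.5, with costs c_0 = 1 and c_1 = 10.
f_or = lambda x: int(any(x))
cheap_first = expected_cost(f_or, [0.5, 0.5], [1, 10], [0, 1])
costly_first = expected_cost(f_or, [0.5, 0.5], [1, 10], [1, 0])
```

In this toy instance, testing the cheap bit first gives a lower expected cost than testing the expensive bit first, which is the kind of trade-off the evaluation problem asks an algorithm to optimize.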
Standing Processes in Service-Oriented Environments
Current realization techniques for service-oriented architectures (SOA) and business process management (BPM) cannot be applied efficiently to every kind of application scenario. For example, an important requirement in the finance sector is the continuous evaluation of stock prices to automatically trigger business processes (e.g., the buying or selling of stocks) according to several strategies. In this paper, we address the continuous evaluation of message streams within BPM to establish a common environment for stream-based message processing and traditional business processes. In detail, we propose the notion of standing processes as (i) a process-centric concept for the interpretation of message streams, and (ii) a trigger element for subsequent business processes. The demonstration system focuses on the execution of standing processes and their smooth interaction with the traditional business process environment.
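The idea of a process that continuously interprets a message stream and triggers a follow-up process can be sketched in a few lines. This is a hypothetical illustration, not the demonstration system described above: the function `standing_process`, the threshold rule, and the callback are all assumptions made for the example.

```python
def standing_process(prices, threshold, trigger):
    """Evaluate a stream of prices and call `trigger` on each downward crossing.

    `prices` is any iterable of numbers (for example a generator fed by a
    message queue); `trigger` stands in for starting a business process.
    """
    previous = None
    for price in prices:
        # Fire only when the price moves from at-or-above the threshold
        # to strictly below it, not on every message below the threshold.
        if previous is not None and previous >= threshold > price:
            trigger(price)
        previous = price


# Assumed example stream; the "business process" just records the price.
triggered = []
standing_process([105, 101, 99, 98, 102, 97], 100, triggered.append)
```

Because the function consumes an iterable lazily, the same sketch works for an unbounded stream; the list here only keeps the example finite.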
A Novel Approach to Data Extraction on Hyperlinked Webpages
The World Wide Web has an enormous amount of useful data presented as HTML tables. These tables are often linked to other web pages that provide further detailed information about certain attribute values. Extracting the schema of such relational tables is a challenge due to the lack of a standard format and of published algorithms. We downloaded 15,000 web pages from various web sites using our in-house web crawler. Tables were extracted from the HTML code, and table rows were labeled with appropriate class labels. Conditional random fields (CRF) were used for the classification of table rows, and a nondeterministic finite automaton (NFA) algorithm was designed to identify simple, complex, hyperlinked, or non-linked tables. A simple schema was extracted for non-linked tables, and for linked tables a relational schema in the form of primary and foreign keys (PK and FK) was developed. Child tables were concatenated with the parent table's attribute value (PK), which serves as the foreign key (FK). As a result, these tables can support better and stronger queries using the join operation. A manual check of the linked web table results showed 99% precision and 68% recall. Our downloadable corpus of 15,000 pages and a novel algorithm will provide the basis for further research in this field.
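The PK/FK step can be illustrated with a small sketch. It assumes that tables have already been extracted as lists of dictionaries; the function names, column names, and data are hypothetical and are not taken from the paper's corpus or algorithm.

```python
def attach_foreign_key(child_rows, parent_key, fk_name):
    """Prefix every row of a linked (child) table with the parent's key value."""
    return [{fk_name: parent_key, **row} for row in child_rows]


def join(parent_rows, child_rows, pk_name, fk_name):
    """Inner join of child rows to parent rows where fk_name equals pk_name."""
    parents = {row[pk_name]: row for row in parent_rows}
    return [
        {**parents[child[fk_name]], **child}
        for child in child_rows
        if child[fk_name] in parents
    ]


# Hypothetical parent table whose "team" cells link to per-team pages.
teams = [{"team": "Lions", "city": "Oslo"}, {"team": "Bears", "city": "Bergen"}]
# Hypothetical child table extracted from the page linked from "Lions".
lions_players = [{"player": "Ada", "goals": 3}, {"player": "Bo", "goals": 1}]

players = attach_foreign_key(lions_players, "Lions", "team_fk")
joined = join(teams, players, "team", "team_fk")
```

Each joined row now carries both the parent attributes (such as the city) and the child attributes (such as the player), which is what makes join queries over linked web tables possible.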
Optimization of multi-domain queries on the Web
Where can I attend an interesting database workshop close
to a sunny beach? Who are the strongest experts on service
computing based upon their recent publication record and
accepted European projects? Can I spend an April weekend
in a city served by a low-cost direct flight from Milano
offering a Mahler's symphony? We regard the above queries
as multi-domain queries, i.e., queries that can be answered
by combining knowledge from two or more domains (such
as: seaside locations, flights, publications, accepted projects,
conference offerings, and so on). This information is available
on the Web, but no general-purpose software system
can accept the above queries nor compute the answer. At
the most, dedicated systems support specific multi-domain
compositions (e.g., Google-local locates information such as
restaurants and hotels upon geographic maps).
This paper presents an overall framework for multi-domain
queries on the Web. We address the following problems: (a)
expressing multi-domain queries with an abstract formalism;
(b) separating the treatment of "search" services within the
model, by highlighting their differences from "exact" Web
services; (c) explaining how the same query can be mapped
to multiple "query plans", i.e., a well-defined scheduling of
service invocations, possibly in parallel, which complies with
their access limitations and preserves the ranking order in
which search services return results; (d) introducing cross-domain
joins as a first-class operation within plans; (e) evaluating
the query plans against several cost metrics so as to
choose the most promising one for execution. This framework
adapts to a variety of application contexts, ranging
from end-user-oriented mash-up scenarios up to complex
application integration scenarios.
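A cross-domain join over ranked search results might be sketched as follows. This is only one simple possibility and not the framework's actual join operator: it assumes each service returns a list of dictionaries in rank order, and it orders joined pairs by the sum of their rank positions, a choice made here purely for illustration. The function name and the sample data are hypothetical.

```python
def ranked_join(left, right, key):
    """Join two ranked result lists on `key`.

    Pairs are ordered by the sum of their 0-based rank positions, with ties
    broken by the rank in `left`, so the output order is monotone in both
    input rankings.
    """
    pairs = []
    for i, l_row in enumerate(left):
        for j, r_row in enumerate(right):
            if l_row[key] == r_row[key]:
                pairs.append((i + j, i, {**l_row, **r_row}))
    pairs.sort(key=lambda p: (p[0], p[1]))
    return [row for _, _, row in pairs]


# Hypothetical ranked results from two search services.
flights = [
    {"city": "Vienna", "flight": "F1"},
    {"city": "Berlin", "flight": "F2"},
    {"city": "Paris", "flight": "F3"},
]
concerts = [
    {"city": "Berlin", "concert": "C1"},
    {"city": "Vienna", "concert": "C2"},
]

combined = ranked_join(flights, concerts, "city")
```

Paris is dropped because no concert matches it, and the two remaining cities tie on combined rank, so the order of the first service decides between them.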