
    Approximation Algorithms for Stochastic Boolean Function Evaluation and Stochastic Submodular Set Cover

    Stochastic Boolean Function Evaluation is the problem of determining the value of a given Boolean function f on an unknown input x, when each bit x_i of x can only be determined by paying an associated cost c_i. The assumption is that x is drawn from a given product distribution, and the goal is to minimize the expected cost. This problem has been studied in Operations Research, where it is known as "sequential testing" of Boolean functions. It has also been studied in learning theory in the context of learning with attribute costs. We consider the general problem of developing approximation algorithms for Stochastic Boolean Function Evaluation. We give a 3-approximation algorithm for evaluating Boolean linear threshold formulas. We also present an approximation algorithm for evaluating CDNF formulas (and decision trees) achieving a factor of O(log kd), where k is the number of terms in the DNF formula and d is the number of clauses in the CNF formula. In addition, we present approximation algorithms for simultaneous evaluation of linear threshold functions and for ranking of linear functions. Our function evaluation algorithms are based on reductions to the Stochastic Submodular Set Cover (SSSC) problem, which was introduced by Golovin and Krause, who also gave an approximation algorithm for it called Adaptive Greedy. Our main technical contribution is a new approximation algorithm for the SSSC problem, which we call Adaptive Dual Greedy. It extends the Dual Greedy algorithm for Submodular Set Cover due to Fujito, which in turn generalizes Hochbaum's algorithm for the classical Set Cover problem. We also give a new bound on the approximation factor achieved by the Adaptive Greedy algorithm of Golovin and Krause.
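
    To make the setting concrete, here is a minimal sketch for the special case of an OR function, where reading bits in nondecreasing order of c_i/p_i is a classical optimal strategy from the sequential-testing literature. It is only a toy instance with assumed inputs, not the paper's 3-approximation for linear threshold formulas or its SSSC reduction.

    import random

    def evaluate_or(costs, probs, read_bit):
        """Sequentially evaluate an OR of n independent bits.

        costs[i] -- cost c_i of reading bit i
        probs[i] -- probability that bit i equals 1 (product distribution)
        read_bit -- callable revealing the true value of bit i
        """
        # Classical rule for OR: read bits in nondecreasing c_i / p_i order.
        order = sorted(range(len(costs)), key=lambda i: costs[i] / probs[i])
        spent = 0.0
        for i in order:
            spent += costs[i]
            if read_bit(i) == 1:
                return 1, spent      # a single 1 already certifies OR = 1
        return 0, spent              # every bit read and 0, so OR = 0

    # Toy run: draw x from the product distribution, read bits on demand.
    costs = [1.0, 2.0, 0.5]
    probs = [0.3, 0.9, 0.1]
    x = [1 if random.random() < p else 0 for p in probs]
    value, paid = evaluate_or(costs, probs, lambda i: x[i])
    print(f"f(x) = {value}, cost paid = {paid}")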

    Standing Processes in Service-Oriented Environments

    Current realization techniques for service-oriented architectures (SOA) and business process management (BPM) cannot be applied efficiently to every kind of application scenario. For example, an important requirement in the finance sector is the continuous evaluation of stock prices to automatically trigger business processes (e.g., the buying or selling of stocks) according to several strategies. In this paper, we address the continuous evaluation of message streams within BPM to establish a common environment for stream-based message processing and traditional business processes. In detail, we propose the notion of standing processes as (i) a process-centric concept for the interpretation of message streams, and (ii) a trigger element for subsequent business processes. The demonstration system focuses on the execution of standing processes and their smooth interaction with the traditional business process environment.
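
    As a rough, hypothetical illustration of the concept (the names below are invented, and the paper's demonstration system runs inside a full BPM environment), a standing process can be pictured as a loop that continuously evaluates a quote stream against a strategy and triggers a subsequent business process whenever the condition fires:

    from dataclasses import dataclass
    from typing import Callable, Iterable

    @dataclass
    class Quote:
        symbol: str
        price: float

    def standing_process(stream: Iterable[Quote],
                         strategy: Callable[[Quote], bool],
                         trigger: Callable[[Quote], None]) -> None:
        # The process "stands" on the stream: it does not terminate on its
        # own, but hands matching messages off to the traditional BPM side.
        for quote in stream:
            if strategy(quote):
                trigger(quote)

    # Toy run: start a sell process once ACME trades above 100.
    quotes = [Quote("ACME", p) for p in (97.0, 99.5, 101.2, 98.0)]
    standing_process(
        quotes,
        strategy=lambda q: q.symbol == "ACME" and q.price > 100.0,
        trigger=lambda q: print(f"trigger sell process: {q.symbol} @ {q.price}"),
    )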

    A Novel Approach to Data Extraction on Hyperlinked Webpages

    The World Wide Web contains an enormous amount of useful data presented as HTML tables. These tables are often linked to other web pages that provide further detail on certain attribute values. Extracting the schema of such relational tables is a challenge due to the non-existence of a standard format and a lack of published algorithms. Using our in-house web crawler, we downloaded 15,000 web pages from various web sites. Tables were extracted from the HTML code, and table rows were labeled with appropriate class labels. Conditional random fields (CRFs) were used for the classification of table rows, and a nondeterministic finite automaton (NFA) algorithm was designed to identify simple, complex, hyperlinked, or non-linked tables. A simple schema was extracted for non-linked tables; for linked tables, a relational schema in the form of primary and foreign keys (PKs and FKs) was developed. Child tables were concatenated with the parent table’s attribute value (the PK), which serves as the foreign key (FK). As a result, these tables can support better and stronger queries using the join operation. A manual check of the linked-table results showed 99% precision and 68% recall. Our 15,000-page downloadable corpus and a novel algorithm will provide the basis for further research in this field.
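
    The PK/FK construction can be pictured with a small sketch (function and field names below are illustrative, not from the paper): each row of a linked child table is tagged with the parent row's primary-key value, so a later join recovers the parent-child relationship.

    def link_child_to_parent(parent_row, pk_attr, child_rows):
        """Attach the parent's primary-key value to every child-table row,
        where it serves as a foreign key enabling later joins.

        parent_row -- dict for one row of the parent (hyperlinked) table
        pk_attr    -- the attribute whose cell carried the hyperlink
        child_rows -- rows extracted from the table on the linked page
        """
        fk_value = parent_row[pk_attr]
        return [{**row, f"{pk_attr}_fk": fk_value} for row in child_rows]

    # Toy example: a country table whose "capital" cell links to a detail
    # page listing districts; all figures are made up.
    parent = {"country": "Norway", "capital": "Oslo"}
    children = [{"district": "Frogner", "population": 59000},
                {"district": "Grunerlokka", "population": 62000}]
    print(link_child_to_parent(parent, "capital", children))
    # Each child row now carries capital_fk="Oslo", so a join on that
    # column reconstructs the parent-child relationship.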

    A Query Rewriting Approach for Web Service Composition


    Optimization of multi-domain queries on the Web

    Where can I attend an interesting database workshop close to a sunny beach? Who are the strongest experts on service computing based upon their recent publication record and accepted European projects? Can I spend an April weekend in a city served by a low-cost direct flight from Milano offering a Mahler symphony? We regard the above queries as multi-domain queries, i.e., queries that can be answered by combining knowledge from two or more domains (such as seaside locations, flights, publications, accepted projects, conference offerings, and so on). This information is available on the Web, but no general-purpose software system can accept the above queries or compute their answers. At most, dedicated systems support specific multi-domain compositions (e.g., Google-local locates information such as restaurants and hotels upon geographic maps). This paper presents an overall framework for multi-domain queries on the Web. We address the following problems: (a) expressing multi-domain queries with an abstract formalism; (b) separating the treatment of "search" services within the model, highlighting their differences from "exact" Web services; (c) explaining how the same query can be mapped to multiple "query plans", i.e., well-defined schedules of service invocations, possibly in parallel, which comply with the services' access limitations and preserve the ranking order in which search services return results; (d) introducing cross-domain joins as a first-class operation within plans; and (e) evaluating the query plans against several cost metrics so as to choose the most promising one for execution. This framework adapts to a variety of application contexts, ranging from end-user-oriented mash-up scenarios up to complex application integration scenarios.
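
    As a toy illustration of a cross-domain join over hypothetical services and made-up data (not the paper's formalism), the sketch below joins rank-ordered results from a "search" service with tuples fetched from an "exact" service, preserving the order in which the search service returned its results:

    def cross_domain_join(search_results, exact_service, match):
        # Iterate over the search service's results in their given rank
        # order, invoking the exact service once per item (respecting its
        # access pattern) and emitting matching pairs in rank order.
        for ranked_item in search_results:
            for fact in exact_service(ranked_item):
                if match(ranked_item, fact):
                    yield ranked_item, fact

    # Toy plan for "Mahler concerts in cities with a direct flight from
    # Milano"; all data is invented.
    concerts = [{"city": "Vienna", "work": "Mahler 5"},   # rank-ordered
                {"city": "Berlin", "work": "Mahler 2"}]
    flights = {"Milano": [{"to": "Vienna", "fare": 39},
                          {"to": "Berlin", "fare": 59}]}

    def direct_flights_from_milano(_concert):
        return flights["Milano"]

    for concert, flight in cross_domain_join(
            concerts, direct_flights_from_milano,
            match=lambda c, f: c["city"] == f["to"]):
        print(concert["work"], "in", concert["city"], "fare", flight["fare"])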