
    Optimal Information Retrieval with Complex Utility Functions

    Existing retrieval models all attempt to optimize a single utility function, which is often based on the topical relevance of a document with respect to a query. In real applications, retrieval involves more complex utility functions that may encode preferences along several different dimensions. In this paper, we present a general optimization framework for retrieval with complex utility functions. A query language is designed according to this framework to enable users to submit complex queries. We propose an efficient algorithm for retrieval with complex utility functions based on the Apriori algorithm. As a case study, we apply our algorithm to a complex utility retrieval problem in distributed IR. Experimental results show that our algorithm allows for a flexible tradeoff among multiple retrieval criteria. Finally, we study the efficiency of our algorithm on simulated data.
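
    To make the setting concrete, here is a minimal Python sketch of ranking under a multi-dimensional utility. The dimensions, weights, and documents are hypothetical illustrations, not the paper's query language or its Apriori-based algorithm.

    # Hypothetical example: rank documents by a weighted multi-dimensional
    # utility instead of topical relevance alone.
    def utility(doc_scores, weights):
        """Combine per-dimension scores into one utility via a weighted sum."""
        return sum(weights[d] * doc_scores[d] for d in weights)

    docs = {
        "d1": {"relevance": 0.9, "recency": 0.2, "authority": 0.5},
        "d2": {"relevance": 0.6, "recency": 0.8, "authority": 0.7},
        "d3": {"relevance": 0.4, "recency": 0.9, "authority": 0.9},
    }
    weights = {"relevance": 0.7, "recency": 0.2, "authority": 0.1}

    ranking = sorted(docs, key=lambda d: utility(docs[d], weights), reverse=True)
    print(ranking)  # ['d1', 'd2', 'd3'] under these illustrative weights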

    Lazier Than Lazy Greedy

    Is it possible to maximize a monotone submodular function faster than the widely used lazy greedy algorithm (also known as accelerated greedy), both in theory and in practice? In this paper, we develop the first linear-time algorithm for maximizing a general monotone submodular function subject to a cardinality constraint. We show that our randomized algorithm, STOCHASTIC-GREEDY, can achieve a (1 − 1/e − ε) approximation guarantee, in expectation, to the optimum solution in time linear in the size of the data and independent of the cardinality constraint. We empirically demonstrate the effectiveness of our algorithm on submodular functions arising in data summarization, including training large-scale kernel methods, exemplar-based clustering, and sensor placement. We observe that STOCHASTIC-GREEDY practically achieves the same utility value as lazy greedy but runs much faster. More surprisingly, we observe that in many practical scenarios STOCHASTIC-GREEDY does not even evaluate every data point once and still achieves results indistinguishable from lazy greedy.
    Comment: In Proc. Conference on Artificial Intelligence (AAAI), 2015
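
    The algorithm's structure is simple enough to sketch: each round draws a random sample of roughly (n/k)·ln(1/ε) elements and adds the sampled element with the largest marginal gain. The Python below follows that pseudocode; the toy coverage objective is an assumption for demonstration, not one of the paper's benchmarks.

    import math
    import random

    def stochastic_greedy(ground_set, f, k, eps=0.1):
        """STOCHASTIC-GREEDY: in each of k rounds, evaluate marginal gains on
        a random sample of about (n/k)*ln(1/eps) elements instead of the
        whole ground set; f must be monotone submodular."""
        n = len(ground_set)
        sample_size = min(n, max(1, math.ceil((n / k) * math.log(1 / eps))))
        selected = set()
        for _ in range(k):
            remaining = [v for v in ground_set if v not in selected]
            batch = random.sample(remaining, min(sample_size, len(remaining)))
            base = f(selected)
            selected.add(max(batch, key=lambda v: f(selected | {v}) - base))
        return selected

    # Toy monotone submodular objective: coverage of a small universe.
    coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}, 4: {"d", "e"}}
    f = lambda S: len(set().union(*[coverage[v] for v in S])) if S else 0
    print(stochastic_greedy(list(coverage), f, k=2))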

    Energy efficiency parametric design tool in the framework of holistic ship design optimization

    Recent International Maritime Organization (IMO) decisions with respect to measures to reduce greenhouse gas (GHG) emissions from shipping suggest that the collaboration of all major stakeholders of shipbuilding and ship operations is required to address this complex techno-economic and highly political problem efficiently. This eventually calls for the development of proper design, operational knowledge, and assessment tools for the energy-efficient design and operation of ships, as suggested by the Second IMO GHG Study (2009). This type of coordination of the efforts of many maritime stakeholders, who often have conflicting professional interests but ultimately share the common aim of optimal ship design and operation solutions, has been addressed within a methodology developed in the EU-funded Logistics-Based (LOGBASED) Design Project (2004–2007). Based on the knowledge base developed within this project, a new parametric design software tool (PDT) has been developed by the National Technical University of Athens, Ship Design Laboratory (NTUA-SDL), for implementing an energy-efficiency design and management procedure. The PDT is an integral part of a holistic ship design optimization approach developed earlier by NTUA-SDL that addresses the multi-objective ship design optimization problem. It provides Pareto-optimum solutions and a complete mapping of the design space in a comprehensive way for the final assessment and decision by all the involved stakeholders. The application of the tool to the design of a large oil tanker and, alternatively, of container ships is elaborated in this paper.
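
    As a rough illustration of the Pareto-optimum mapping such a tool produces, the Python sketch below filters the non-dominated designs out of an evaluated design space. The two minimized objectives are hypothetical stand-ins for the PDT's actual energy-efficiency and economic criteria.

    def pareto_front(designs):
        """Return the designs not dominated by any other design; design y
        dominates x if y is no worse in every (minimized) objective and
        differs from x, hence strictly better in at least one."""
        return [
            x for x in designs
            if not any(all(yi <= xi for yi, xi in zip(y, x)) and y != x
                       for y in designs)
        ]

    # Hypothetical (energy-efficiency proxy, freight-rate proxy) pairs.
    designs = [(0.80, 12.0), (0.70, 13.5), (0.90, 11.0), (0.85, 12.5)]
    print(pareto_front(designs))  # (0.85, 12.5) is dominated by (0.80, 12.0)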

    Stochastic Query Covering for Fast Approximate Document Retrieval

    We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that, even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving performance close to optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios.
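
    One way to picture the selection step is as weighted coverage of the query distribution: repeatedly pick the document that answers the largest probability mass of still-uncovered queries. The Python sketch below uses assumed toy data and a plain greedy heuristic, not the authors' exact stochastic algorithm.

    def greedy_query_cover(answers, query_prob, budget):
        """answers[d]: queries document d can answer;
        query_prob[q]: probability a user issues query q.
        Greedily pick documents maximizing newly covered query mass."""
        chosen, covered = [], set()
        for _ in range(budget):
            gain = lambda d: sum(query_prob[q] for q in answers[d] - covered)
            best = max((d for d in answers if d not in chosen), key=gain)
            if gain(best) == 0:
                break
            chosen.append(best)
            covered |= answers[best]
        return chosen

    answers = {"d1": {"q1", "q2"}, "d2": {"q2", "q3"}, "d3": {"q4"}}
    query_prob = {"q1": 0.5, "q2": 0.2, "q3": 0.2, "q4": 0.1}
    print(greedy_query_cover(answers, query_prob, budget=2))  # ['d1', 'd2']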

    Discovering Valuable Items from Massive Data

    Suppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important problems such as multi-armed bandits, active search, and the knapsack problem. We present an algorithm, GP-Select, which utilizes prior knowledge about similarity between items, expressed as a kernel function. GP-Select uses Gaussian process prediction to balance exploration (estimating the unknown value of items) and exploitation (selecting items of high value). We extend GP-Select to be able to discover sets that simultaneously have high utility and are diverse. Our preference for diversity can be specified as an arbitrary monotone submodular function that quantifies the diminishing returns obtained when selecting similar items. Furthermore, we exploit the structure of the model updates to achieve an order-of-magnitude (up to 40X) speedup in our experiments without resorting to approximations. We provide strong guarantees on the performance of GP-Select and apply it to three real-world case studies of industrial relevance: (1) refreshing a repository of prices in a Global Distribution System for the travel industry, (2) identifying diverse, binding-affine peptides in a vaccine design task, and (3) maximizing clicks in a web-scale recommender system by recommending items to users.
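
    The exploration/exploitation trade-off can be pictured as an upper-confidence-bound loop over a Gaussian process posterior. The Python sketch below simplifies GP-Select to unit item costs (a cardinality budget); the RBF kernel, the confidence weight beta, and the toy utility oracle are all illustrative assumptions, not the paper's configuration.

    import numpy as np

    def rbf(A, B, ls=1.0):
        """Squared-exponential kernel over item feature rows."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls ** 2)

    def gp_select(X, observe, budget, beta=2.0, noise=1e-3):
        """Pick `budget` items: score unselected items by GP posterior
        mean + beta * std, commit to the best, observe its utility, update."""
        picked, ys = [], []
        for _ in range(budget):
            if picked:
                Kss = rbf(X[picked], X[picked]) + noise * np.eye(len(picked))
                Ks = rbf(X, X[picked])
                mu = Ks @ np.linalg.solve(Kss, np.array(ys))
                var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(Kss, Ks.T))
            else:
                mu, var = np.zeros(len(X)), np.ones(len(X))  # GP prior
            ucb = mu + beta * np.sqrt(np.maximum(var, 0.0))
            ucb[picked] = -np.inf  # never re-select a committed item
            picked.append(int(np.argmax(ucb)))
            ys.append(observe(picked[-1]))  # utility revealed only on commit
        return picked

    X = np.random.rand(50, 3)   # item features
    hidden = X.sum(axis=1)      # toy utility, unknown to the selector
    print(gp_select(X, lambda i: hidden[i], budget=5))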