57,077 research outputs found

    Diversifying Top-K Results

    Top-k query processing finds a list of k results that have the largest scores w.r.t. the user-given query, under the assumption that all k results are independent of each other. In practice, some of the returned top-k results can be very similar to each other and are therefore redundant. In the literature, diversified top-k search has been studied to return k results that take both score and diversity into consideration. Most existing solutions for diversified top-k search assume that the scores of all search results are given, and some works solve the diversity problem for a specific setting and can hardly be extended to general cases. In this paper, we study the diversified top-k search problem. We define a general diversified top-k search problem that only considers the similarity of the search results themselves. We propose a framework in which most existing solutions for top-k query processing can be extended easily to handle diversified top-k search by simply applying three new functions: a sufficient stop condition sufficient(), a necessary stop condition necessary(), and an algorithm for diversified top-k search on the current set of generated results, div-search-current(). We propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve the div-search-current() problem. div-astar is an A*-based algorithm; div-dp decomposes the results into components that are searched independently using div-astar and then combined using dynamic programming; div-cut further decomposes the current set of generated results using cut points and combines the results using sophisticated operations. We conducted extensive performance studies using two real datasets, enwiki and reuters. Our div-cut algorithm finds the optimal solution for the diversified top-k search problem in seconds even for k as large as 2,000. Comment: VLDB201
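
    To make the problem statement in the abstract above concrete, here is a minimal sketch under stated assumptions. The abstract does not give a formula, so the sketch assumes that "similar" is a given set of result pairs and that the goal is the highest total score among sets of at most k results with no two similar. The function name `diversified_top_k` and the toy data are hypothetical, and the exhaustive search is only a reference baseline, not div-astar, div-dp, or div-cut.

```python
from itertools import combinations


def diversified_top_k(scores, similar_pairs, k):
    """Exhaustive baseline (assumed formulation, not the paper's algorithms).

    Returns the set of at most k results with the largest total score such
    that no two chosen results are marked as similar. Scores are assumed
    to be positive.
    """
    similar = {frozenset(pair) for pair in similar_pairs}
    best_set, best_score = (), 0
    for size in range(1, k + 1):
        for subset in combinations(scores, size):
            # Skip any candidate set that contains a similar pair.
            if any(frozenset(p) in similar for p in combinations(subset, 2)):
                continue
            total = sum(scores[r] for r in subset)
            if total > best_score:
                best_set, best_score = subset, total
    return set(best_set), best_score


# Toy data (hypothetical): "a" is similar to both "b" and "c".
scores = {"a": 10, "b": 9, "c": 8, "d": 1}
similar_pairs = [("a", "b"), ("a", "c")]
result, total = diversified_top_k(scores, similar_pairs, k=2)
print(result, total)
```

    In this toy instance the plain top-2 by score is {a, b}, which is a similar pair, and greedily keeping "a" first leaves only {a, d} with total 11. The exhaustive search returns {b, c} with total 17, which shows why an exact method can beat a greedy one. The search is exponential in k, so it is only usable on very small inputs.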

    Diversification and fairness in top-k ranking algorithms

    Given a user query, typical user interfaces, such as search engines and recommender systems, allow only a small number of results to be returned to the user. Hence, determining the top-k results is an important task in information retrieval, as it helps to ensure that the most relevant results are presented to the user. There is an extensive body of research on how to score records and return the top-k to the user. Researchers have also identified an extensive set of criteria for presenting the user with top-k results, and result diversification is one of them. Diversifying the top-k result ensures that the returned result set is relevant as well as representative of the entire set of answers to the user query, and it is highly relevant in the context of search, recommendation, and data exploration. The goal of this dissertation is two-fold. The first goal is to adapt existing popular diversification algorithms and study how to expedite them without losing the accuracy of the answers. This work studies the scalability challenges of existing diversification algorithms by designing a generic framework that produces the same results as the original algorithms yet runs significantly faster. The proposed approach also handles scenarios where data change over time and studies how to adapt the framework to accommodate such changes. The second goal is to study how existing top-k algorithms can lead to inequitable exposure of records that are qualitatively equivalent. This scenario is highly important for long-tail data, where a long tail of records have similar utility but the existing top-k algorithm shows only one of them and the rest are never returned to the user. Both of these problems are studied analytically, and their hardness is studied.
The contributions of this dissertation lie in (a) formalizing principal problems and studying them analytically, (b) designing scalable algorithms with theoretical guarantees, and (c) evaluating the efficacy and scalability of the designed solutions by comparing them with the state-of-the-art solutions over large-scale datasets.
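
The exposure issue described in this abstract can be illustrated with a small sketch. It is not the dissertation's algorithm, which the abstract does not specify. It only assumes that records whose scores tie at the top-k cut-off are "equivalent" and breaks those ties uniformly at random, so that repeated queries spread exposure across them. The name `fair_top_k` and the sample records are hypothetical.

```python
import random
from collections import Counter


def fair_top_k(scores, k, rng):
    """Top-k with uniform random tie-breaking at the cut-off score.

    Illustrative assumption: records with exactly equal scores are treated
    as equivalent. A deterministic sort would always return the same tied
    records; sampling gives each tied record the same chance.
    """
    if k <= 0:
        return []
    ranked = sorted(scores, key=scores.get, reverse=True)
    if k >= len(ranked):
        return ranked
    cutoff = scores[ranked[k - 1]]
    above = [r for r in ranked if scores[r] > cutoff]   # always included
    tied = [r for r in ranked if scores[r] == cutoff]   # compete for the rest
    return above + rng.sample(tied, k - len(above))


# Hypothetical long-tail data: one clear winner and three equivalent records.
scores = {"head": 0.9, "tail1": 0.5, "tail2": 0.5, "tail3": 0.5}
rng = random.Random(0)
exposure = Counter()
for _ in range(300):
    exposure.update(fair_top_k(scores, 2, rng))
print(exposure)
```

Over the 300 simulated queries, "head" is returned every time, while the single remaining slot rotates among the three tail records instead of always going to the same one.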

    Diversification in the international construction business

    Economic globalization has created an interdependent market that allows companies to transcend traditional national boundaries to conduct business overseas. In the international construction market, companies often adopt diversification as a strategy for growth, for risk management, or for both. However, the diversification patterns of international construction companies (ICCs) as a group remain unclear. The primary aim of this research is to fill this knowledge gap by mapping ICCs’ diversification patterns in both business sectors and geographical dispersal. It starts from a literature review of diversification theories. Based on the review, a series of hypotheses relating to ICCs’ diversification are proposed. Data are gleaned from Engineering News-Record, i.e. Bloomberg and Capital IQ, ranging from 2001 to 2015. By testing the hypotheses, it is found that larger ICCs are more inclined to diversify than their smaller counterparts. Most ICCs tend to diversify into geographical markets with similar cultural or institutional environments. Market demands drive ICCs to diversify into different geographical markets, while they are more prudent in venturing into new business sectors. The research provides not only valuable insights into the diversification patterns of ICCs, but also a solid point of departure for future theoretical and empirical studies.

    Efficient Diversification of Web Search Results

    In this paper we analyze the efficiency of various search results diversification methods. While the efficacy of diversification approaches has been deeply investigated in the past, response time and scalability issues have rarely been addressed. A unified framework for studying the performance and feasibility of result diversification solutions is thus proposed. First, we define a new methodology for detecting when, and how, query results need to be diversified. To this end, we rely on the concept of "query refinement" to estimate the probability that a query is ambiguous. Then, relying on this novel ambiguity detection method, we deploy and compare, on a standard test set, three different diversification methods: IASelect, xQuAD, and OptSelect. While the first two are recent state-of-the-art proposals, the last is an original algorithm introduced in this paper. We evaluate both the efficiency and the effectiveness of our approach against its competitors using the standard TREC Web diversification track testbed. Results show that OptSelect is able to run two orders of magnitude faster than the two other state-of-the-art approaches and to obtain comparable figures in diversification effectiveness. Comment: VLDB201

    Investor Protection and the Value Effects of Bank Merger Announcements in Europe and the US

    Investor protection regimes have been shown to partly explain why the same type of corporate event may attract different investor reactions across countries. We compare the value effects of large bank merger announcements in Europe and the US and find an inverse relationship between the level of investor protection prevalent in the target country and the abnormal returns that bidders realize during the announcement period. Accordingly, bidding banks realize higher returns when targeting low-protection economies (most European economies) than bidders targeting institutions that operate under a high investor protection regime (the US). We argue that bidding bank shareholders need to be compensated for the increased risk of expropriation by insiders that they face in a low-protection environment, where takeover markets are illiquid and private benefits of control are high.

    Genomic and structural investigation on dolphin morbillivirus (DMV) in Mediterranean fin whales (Balaenoptera physalus).

    Dolphin morbillivirus (DMV) has been deemed one of the most relevant threats to fin whales (Balaenoptera physalus), being responsible for a mortality outbreak in the Mediterranean Sea in recent years. Knowledge of the complete viral genome is essential to understand any structural changes that could modify virus pathogenesis and viral tissue tropism. We report the complete DMV sequence of the N, P/V/C, M, F and H genes identified from a fin whale, and a comparison of the primary to quaternary structure of proteins between this fin whale strain and some of those isolated during the 1990-'92 and the 2006-'08 epidemics. Some relevant substitutions were detected, particularly Asn52Ser located on the F protein and Ile21Thr on the N protein. Comparing the mutations found in the fin whale DMV with those occurring in viral strains of other cetacean species, some of them were shown to be the result of diversifying selection, which allows speculation on their role in host adaptation and on the way they could affect viral attachment to and fusion with the target host cells.