57,077 research outputs found
Diversifying Top-K Results
Top-k query processing finds a list of k results that have the largest scores
w.r.t. the user-given query, with the assumption that all the k results are
independent of each other. In practice, some of the top-k results returned can
be very similar to each other. As a result, some of the top-k results returned
are redundant. In the literature, diversified top-k search has been studied to
return k results that take both score and diversity into consideration. Most
existing solutions on diversified top-k search assume that scores of all the
search results are given, and some works solve the diversity problem on a
specific problem and can hardly be extended to general cases. In this paper, we
study the diversified top-k search problem. We define a general diversified
top-k search problem that only considers the similarity of the search results
themselves. We propose a framework, such that most existing solutions for top-k
query processing can be extended easily to handle diversified top-k search, by
simply applying three new functions, a sufficient stop condition sufficient(),
a necessary stop condition necessary(), and an algorithm for diversified top-k
search on the current set of generated results, div-search-current(). We
propose three new algorithms, namely, div-astar, div-dp, and div-cut to solve
the div-search-current() problem. div-astar is an A*-based algorithm; div-dp is
an algorithm that decomposes the results into components which are searched
using div-astar independently and combined using dynamic programming. div-cut
further decomposes the current set of generated results using cut points and
combines the results using sophisticated operations. We conducted extensive
performance studies using two real datasets, enwiki and reuters. Our div-cut
algorithm finds the optimal solution for the diversified top-k search problem in
seconds even for k as large as 2,000. Comment: VLDB201
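To make the problem statement in the abstract above concrete, here is a minimal sketch under stated assumptions. It assumes the objective is to pick at most k results, no two of which are similar, so that the total score is as large as possible; the abstract itself only says that score and diversity are both considered. The code is an exhaustive search over small inputs, not div-astar, div-dp, or div-cut, and the names `diversified_top_k`, `scores`, and `similar` are hypothetical.

```python
from itertools import combinations


def diversified_top_k(scores, similar, k):
    """Exhaustive search for the best set of at most k pairwise dissimilar results.

    scores  -- dict mapping result id to a positive score (assumed input shape)
    similar -- function (a, b) -> bool, True if the two results are similar
    k       -- maximum number of results to return
    """
    items = list(scores)
    best_set, best_total = (), 0
    for size in range(1, k + 1):
        for subset in combinations(items, size):
            # Skip any candidate set that contains a similar pair.
            if any(similar(a, b) for a, b in combinations(subset, 2)):
                continue
            total = sum(scores[r] for r in subset)
            if total > best_total:
                best_set, best_total = subset, total
    return set(best_set), best_total


# Toy data (illustrative only): "a" and "b" are similar, as are "b" and "c".
scores = {"a": 10, "b": 9, "c": 8, "d": 3}
similar_pairs = {frozenset(("a", "b")), frozenset(("b", "c"))}


def similar(x, y):
    return frozenset((x, y)) in similar_pairs


result, total = diversified_top_k(scores, similar, k=2)
print(sorted(result), total)
```

On this toy input a plain top-2 by score would return "a" and "b", which are similar to each other; the diversified search returns "a" and "c" instead. The exhaustive search is exponential in k and is only meant to illustrate the objective.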
Diversification and fairness in top-k ranking algorithms
Given a user query, typical user interfaces, such as search engines and recommender systems, only allow a small number of results to be returned to the user. Hence, determining the top-k results is an important task in information retrieval, as it helps to ensure that the most relevant results are presented to the user. There exists an extensive body of research that studies how to score the records and return the top-k to the user. Moreover, researchers have identified an extensive set of criteria for presenting the user with top-k results, and result diversification is one of them. Diversifying the top-k result ensures that the returned result set is relevant as well as representative of the entire set of answers to the user query, and it is highly relevant in the context of search, recommendation, and data exploration. The goal of this dissertation is two-fold. The first goal is to adapt existing popular diversification algorithms and study how to expedite them without losing the accuracy of the answers. This work studies the scalability challenges of existing diversification algorithms by designing a generic framework that produces the same results as the original algorithms, yet is significantly faster in running time. The proposed approach also handles scenarios where data change over a period of time and studies how to adapt the framework to accommodate data changes. The second goal is to study how existing top-k algorithms could lead to inequitable exposure of records that are qualitatively equivalent. This scenario is highly important for long-tail data, where there exists a long tail of records that have similar utility, but the existing top-k algorithm only shows one top-k set, and the rest are never returned to the user. Both of these problems are studied analytically, and their hardness is studied.
The contributions of this dissertation lie in (a) formalizing principal problems and studying them analytically, (b) designing scalable algorithms with theoretical guarantees, and (c) evaluating the efficacy and scalability of the designed solutions by comparing them with the state-of-the-art solutions over large-scale datasets
Diversification in the international construction business
Economic globalization has created an interdependent market that allows companies to transcend traditional national boundaries to conduct business overseas. In the international construction market, companies often adopt diversification as a strategy for growth, for risk management or for both. However, the diversification patterns of international construction companies (ICCs) as a group are far from clear. The primary aim of this research is to fill this knowledge void by mapping ICCs' diversification patterns in both business sectors and geographical dispersal. It starts from a literature review of diversification theories. Based on the review, a series of hypotheses relating to ICCs' diversification are proposed. Data are gleaned from Engineering News-Record, i.e. Bloomberg and Capital IQ, ranging from 2001 to 2015. By testing the hypotheses, it is found that larger ICCs are more inclined to diversify than their smaller counterparts. Most of the ICCs tend to diversify to geographical markets with similar cultural or institutional environments. Market demands drive ICCs to diversify to different geographical markets, while they are more prudent in venturing into new business sectors. The research provides not only valuable insights into diversification patterns of ICCs, but also a solid point of departure for future theoretical and empirical studies
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results
diversification methods. While efficacy of diversification approaches has been
deeply investigated in the past, response time and scalability issues have been
rarely addressed. A unified framework for studying performance and feasibility
of result diversification solutions is thus proposed. First we define a new
methodology for detecting when, and how, query results need to be diversified.
To this purpose, we rely on the concept of "query refinement" to estimate the
probability of a query to be ambiguous. Then, relying on this novel ambiguity
detection method, we deploy and compare, on a standard test set, three different
diversification methods: IASelect, xQuAD, and OptSelect. While the first two
are recent state-of-the-art proposals, the latter is an original algorithm
introduced in this paper. We evaluate both the efficiency and the effectiveness
of our approach against its competitors by using the standard TREC Web
diversification track testbed. Results show that OptSelect is able to run two
orders of magnitude faster than the two other state-of-the-art approaches and
to obtain comparable figures in diversification effectiveness. Comment: VLDB201
Investor Protection and the Value Effects of Bank Merger Announcements in Europe and the US
Investor protection regimes have been shown to partly explain why the same type of corporate event may attract different investor reactions across countries. We compare the value effects of large bank merger announcements in Europe and the US and find an inverse relationship between the level of investor protection prevalent in the target country and abnormal returns that bidders realize during the announcement period. Accordingly, bidding banks realize higher returns when targeting low protection economies (most European economies) than bidders targeting institutions which operate under a high investor protection regime (the US). We argue that bidding bank shareholders need to be compensated for an increased risk of expropriation by insiders which they face in a low protection environment where takeover markets are illiquid and there are high private benefits of control
Genomic and structural investigation on dolphin morbillivirus (DMV) in Mediterranean fin whales (Balaenoptera physalus).
Dolphin morbillivirus (DMV) has been deemed one of the most relevant threats for fin whales (Balaenoptera physalus), being responsible for a mortality outbreak in the Mediterranean Sea in recent years. Knowledge of the complete viral genome is essential to understand any structural changes that could modify virus pathogenesis and viral tissue tropism. We report the complete DMV sequence of N, P/V/C, M, F and H genes identified from a fin whale and the comparison of primary to quaternary structure of proteins between this fin whale strain and some of those isolated during the 1990-'92 and the 2006-'08 epidemics. Some relevant substitutions were detected, particularly Asn52Ser located on F protein and Ile21Thr on N protein. When the mutations found in the fin whale DMV were compared with those occurring in viral strains of other cetacean species, some of them were shown to be the result of diversifying selection, which allows speculation on their role in host adaptation and on the way they could affect the interaction between the viral attachment and fusion with the target host cells