
    Are Algorithms Directly Optimizing IR Measures Really Direct?

    Abstract: In information retrieval (IR), the objective of the ranking problem is to construct and return a ranked list of documents that satisfies the user's information need, with respect to the user's query, as well as possible. To evaluate the quality of the returned ranking, performance measures such as Normalized Discounted Cumulative Gain (NDCG) and Mean Average Precision (MAP) are adopted. Many learning to rank algorithms, which automatically learn a ranking function by optimizing specially designed objective functions, have been proposed to solve the ranking problem. Intuitively, the IR performance measures themselves are the ideal objective functions to optimize when learning a ranking function. However, measures such as NDCG and MAP are non-smooth and non-differentiable with respect to the ranking function's parameters. Most existing learning to rank algorithms are therefore designed to optimize objective functions that are only loosely related to the IR performance measures. As a result, such algorithms may achieve only sub-optimal values of the IR performance measures even when they optimize their adopted objective functions very well. It is therefore highly desirable for learning to rank algorithms to directly, or approximately directly, optimize IR performance measures. To tackle this challenge, several approaches, such as SoftRank [1] and SVM-MAP [2], have been proposed. Although these algorithms achieve good empirical performance, some questions remain unanswered: a) can a ranking function learned by direct optimization of IR performance measures still perform well on unseen queries with respect to the optimized measures? b) how directly are IR performance measures optimized by the proposed approaches? In this report, we attempt to answer these questions. We first point out that, under some conditions, a ranking function learned by direct optimization of IR performance measures can also perform well on unseen queries with respect to the optimized measures. Then, to study how directly IR performance measures are optimized by previous approaches, we propose a directness evaluation metric. Based on this metric, SoftRank is analyzed and the corresponding results are presented.
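
    The non-differentiability at issue here is easy to see: measures like NDCG depend on the model's scores only through the induced sort order, so they are piecewise constant in the scores. A minimal NumPy sketch (using the common 2^rel − 1 gain and log2 discount, one of several NDCG variants) illustrates this:

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain over the top-k positions."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(rank + 1), ranks start at 1
    return np.sum((2.0 ** rel - 1.0) / discounts)

def ndcg_at_k(scores, relevances, k=10):
    """NDCG@k of the ranking induced by `scores`.

    Scores enter only through argsort, so NDCG is piecewise constant
    (hence non-differentiable) in the ranking function's output.
    """
    order = np.argsort(scores)[::-1]          # sort documents by descending score
    ranked_rel = np.asarray(relevances)[order]
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(ranked_rel, k) / ideal if ideal > 0 else 0.0

# Small perturbations of the scores leave NDCG unchanged until two
# documents swap positions, at which point it jumps discontinuously.
rel = [3, 2, 0, 1]
print(ndcg_at_k([0.9, 0.8, 0.3, 0.20], rel))
print(ndcg_at_k([0.9, 0.8, 0.3, 0.21], rel))  # same ranking, same NDCG
```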

    The most representative composite rank ordering of multi-attribute objects by the particle swarm optimization

    Rank-ordering of individuals or objects on multiple criteria has many important practical applications. A reasonably representative composite rank ordering of multi-attribute objects/individuals or multi-dimensional points is often obtained by Principal Component Analysis, although much inferior but computationally convenient methods are also frequently used. However, such a rank ordering, even one based on Principal Component Analysis, may not be optimal, as several numerical examples demonstrate. To address this problem, Ordinal Principal Component Analysis was suggested some time back. However, that approach cannot deal with various alternative schemes of rank ordering, mainly due to its dependence on solution by constrained integer programming. In this paper we propose an alternative method of solution, namely Particle Swarm Optimization. A FORTRAN computer program to solve the problem is also provided. The suggested method is notably versatile and can accommodate various schemes of rank ordering, norms, and types or measures of correlation. Its versatility and its capability to obtain the most representative composite rank ordering of multi-attribute objects or multi-dimensional points are demonstrated by several numerical examples. We also find that rank ordering based on maximization of the sum of absolute values of the correlation coefficients of the composite rank scores with their constituent variables is robust, but it may have multiple optimal solutions; in solving one problem it thus gives rise to another. The overall ranking of objects by the maximin correlation principle performs better if the composite rank scores are obtained by direct optimization with respect to the individual ranking scores.

    Keywords: rank ordering; standard; modified; competition; fractional; dense; ordinal; principal component; integer programming; repulsive particle swarm; maximin; absolute; correlation; FORTRAN; program
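
    As a rough illustration of the idea (not the authors' FORTRAN implementation, and using canonical global-best PSO rather than their repulsive variant), the following Python sketch searches for weights whose composite score maximizes the sum of absolute rank correlations with the constituent variables:

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

rng = np.random.default_rng(0)

def fitness(w, X):
    """Sum of |rank correlations| between composite scores X @ w and each attribute."""
    composite = rankdata(X @ w)
    return sum(abs(spearmanr(composite, X[:, j])[0]) for j in range(X.shape[1]))

def pso_maximize(X, n_particles=30, iters=200, inertia=0.7, c1=1.5, c2=1.5):
    """Canonical global-best particle swarm over the weight vector."""
    dim = X.shape[1]
    pos = rng.uniform(-1, 1, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([fitness(p, X) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos += vel
        vals = np.array([fitness(p, X) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest, pbest_val.max()

X = rng.random((50, 5))            # 50 objects scored on 5 attributes
w, score = pso_maximize(X)
print("weights:", w, "fitness:", score)
```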

    Netter: re-ranking gene network inference predictions using structural network properties

    Background: Many algorithms have been developed to infer the topology of gene regulatory networks from gene expression data. These methods typically produce a ranking of links between genes with associated confidence scores, after which a threshold is chosen to produce the inferred topology. However, the structural properties of the predicted network often do not resemble those typical of a gene regulatory network, as most algorithms only take into account connections found in the data and do not include known graph properties in their inference process. This lowers the prediction accuracy of these methods and limits their usability in practice. Results: We propose Netter, a post-processing algorithm applicable to any confidence ranking of regulatory interactions obtained from a network inference method, which can use, inter alia, graphlets and several graph-invariant properties to re-rank the links into a more accurate prediction. To demonstrate the potential of our approach, we re-rank the predictions of six different state-of-the-art algorithms using three simple network properties as optimization criteria, and show that Netter improves the predictions made on both artificially generated data and the DREAM4 and DREAM5 benchmarks. Additionally, the DREAM5 E. coli community prediction inferred from real expression data is further improved. Furthermore, Netter compares favorably to other post-processing algorithms and is not restricted to correlation-like predictions. Lastly, we demonstrate that the performance increase is robust over a wide range of parameter settings. Netter is available at http://bioinformatics.intec.ugent.be. Conclusions: Network inference from high-throughput data is a long-standing challenge. In this work, we present Netter, which can further refine network predictions based on a set of user-defined graph properties. Netter is a flexible system that can be applied in unison with any method producing a ranking from omics data. It can be tailored to specific prior knowledge by expert users but can also be applied in general use cases. In conclusion, we believe that Netter is a valuable second step in the network inference process to further increase the quality of predictions.
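
    Netter's actual optimization is more elaborate, but the flavor of such re-ranking can be sketched as simulated annealing over the link ordering, trading a structural criterion of the top-k network against drift from the original ranking. The structural cost below (out-degree concentration in a few hubs) is an illustrative stand-in, not one of Netter's criteria:

```python
import math
import random
import networkx as nx

def structural_cost(order, k=100):
    """Toy structural criterion: reward top-k networks whose out-degree mass
    is concentrated in a few hubs, as expected for regulatory networks."""
    g = nx.DiGraph()
    g.add_edges_from(order[:k])
    degrees = sorted((d for _, d in g.out_degree()), reverse=True)
    total = sum(degrees)
    return -(sum(degrees[:5]) / total) if total else 0.0

def total_cost(order, original_pos, lam=1e-6):
    """Structural cost plus a penalty for drifting from the original ranking."""
    drift = sum(abs(i - original_pos[link]) for i, link in enumerate(order))
    return structural_cost(order) + lam * drift

def rerank(links, iters=5000, temp0=0.1):
    """Simulated annealing over the link ordering."""
    original_pos = {link: i for i, link in enumerate(links)}
    order = list(links)
    cost = total_cost(order, original_pos)
    for step in range(iters):
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]          # propose a swap
        new_cost = total_cost(order, original_pos)
        t = max(temp0 * (1 - step / iters), 1e-9)        # cooling schedule
        if new_cost < cost or random.random() < math.exp(-(new_cost - cost) / t):
            cost = new_cost                              # accept
        else:
            order[i], order[j] = order[j], order[i]      # revert the swap
    return order

genes = [f"g{i}" for i in range(15)]
candidates = [(a, b) for a in genes for b in genes if a != b]
links = random.sample(candidates, 200)   # a confidence-ordered link ranking
reranked = rerank(links)
```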

    Opinion-Based Centrality in Multiplex Networks: A Convex Optimization Approach

    Most people simultaneously belong to several distinct social networks, in which their relations can differ. They hold opinions about certain topics, which they share and spread across these networks, and they are in turn influenced by the opinions of others. In this paper, we build on this observation to propose a new nodal centrality measure for multiplex networks. Our measure, called opinion centrality, is based on a stochastic model representing opinion propagation dynamics in such a network. We formulate an optimization problem consisting in maximizing the opinion of the whole network when controlling an external influence able to affect each node individually. We derive a closed-form solution of this problem and use it to define our centrality measure: the more a node is worth investing external influence in, the more central it is. We perform an empirical study of the proposed centrality on a toy network as well as a collection of real-world networks. Our measure is generally negatively correlated with existing multiplex centrality measures and, in line with its definition, highlights different types of nodes.
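
    The paper's multiplex model and closed form are not reproduced here, but the single-layer intuition can be sketched with a DeGroot-style propagation model (an assumption made purely for illustration): a node's centrality is the marginal gain in total steady-state opinion per unit of external influence invested in it.

```python
import numpy as np

def opinion_centrality(W, alpha=0.85):
    """Illustrative, single-layer stand-in for the paper's measure.

    Assumed opinion dynamics (DeGroot-style, not the authors' exact model):
        x* = alpha * W x* + (1 - alpha) * u
    so  x* = (1 - alpha) (I - alpha W)^{-1} u,
    and the gradient of total opinion sum(x*) w.r.t. the external input u
    is the vector of column sums of (1 - alpha)(I - alpha W)^{-1},
    a Katz-like score."""
    n = W.shape[0]
    M = (1 - alpha) * np.linalg.inv(np.eye(n) - alpha * W)
    return M.sum(axis=0)   # d(total opinion) / d(u_i) for each node i

# Row-stochastic influence matrix of a 4-node toy network.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
print(opinion_centrality(W))
```

    For a multiplex network one would aggregate the layer-specific influence matrices before (or while) solving; how that aggregation is done is exactly where the paper's model departs from this sketch.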

    End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss

    Cross-modality retrieval encompasses retrieval tasks where the fetched items are of a different type than the search query, e.g., retrieving pictures relevant to a given text query. The state-of-the-art approach to cross-modality retrieval relies on learning a joint embedding space of the two modalities, in which items from either modality are retrieved using nearest-neighbor search. In this work, we introduce a neural network layer based on Canonical Correlation Analysis (CCA) that learns better embedding spaces by analytically computing projections that maximize correlation. In contrast to previous approaches, the CCA Layer (CCAL) allows us to combine existing objectives for embedding space learning, such as pairwise ranking losses, with the optimal projections of CCA. We show the effectiveness of our approach for cross-modality retrieval in three different scenarios (text-to-image, audio-to-sheet-music, and zero-shot retrieval), surpassing both Deep CCA and a multi-view network using freely learned projections optimized by a pairwise ranking loss, especially when little training data is available. The code for all three methods is released at https://github.com/CPJKU/cca_layer.

    Comment: Preliminary version of a paper published in the International Journal of Multimedia Information Retrieval.
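
    For concreteness, a pairwise ranking loss of the kind that can be combined with the CCA projections can be sketched as a cosine-similarity hinge over matching and mismatching cross-modal pairs (a generic formulation of this loss family, not necessarily the paper's exact loss):

```python
import numpy as np

def pairwise_ranking_loss(queries, targets, margin=0.7):
    """Hinge-based pairwise (triplet) ranking loss over a batch of paired
    cross-modal embeddings: matching pairs should out-score mismatches
    by at least `margin` in cosine similarity."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sims = q @ t.T                      # sims[i, j] = cos(query_i, target_j)
    pos = np.diag(sims)                 # matching pairs sit on the diagonal
    # Hinge over all mismatched pairs, in both retrieval directions.
    loss_qt = np.maximum(0, margin - pos[:, None] + sims)   # query -> target
    loss_tq = np.maximum(0, margin - pos[None, :] + sims)   # target -> query
    np.fill_diagonal(loss_qt, 0)
    np.fill_diagonal(loss_tq, 0)
    return (loss_qt.sum() + loss_tq.sum()) / len(q)

batch_q = np.random.randn(8, 32)   # e.g., text embeddings
batch_t = np.random.randn(8, 32)   # e.g., image embeddings (paired by index)
print(pairwise_ranking_loss(batch_q, batch_t))
```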

    A recommender system for process discovery

    Over the last decade, several algorithms for process discovery and process conformance have been proposed. Still, it is well accepted that no algorithm dominates in either of these two disciplines, and it is therefore often difficult to apply them successfully. Most of these algorithms require close-to-expert knowledge to be applied satisfactorily. In this paper, we present a recommender system that uses portfolio-based algorithm selection strategies to address two problems: finding the best discovery algorithm for the data at hand, and bridging the gap between general users and process mining algorithms. Experiments performed with the developed tool demonstrate the usefulness of the approach on a variety of instances.
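
    Portfolio-based algorithm selection of this kind typically reduces to supervised learning over features of the event log. The sketch below, with hypothetical features and labels, shows the shape of such a recommender; it is not the paper's actual tool or feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical log features: number of distinct activities, mean trace
# length, trace-variant ratio (the real feature set is a design choice).
X_train = np.array([[12, 30.5, 0.4],
                    [45, 12.1, 0.9],
                    [ 8, 55.0, 0.2]])
# Label = the discovery algorithm that scored best on each training log.
y_train = ["inductive_miner", "heuristics_miner", "alpha_miner"]

recommender = RandomForestClassifier(n_estimators=100, random_state=0)
recommender.fit(X_train, y_train)

new_log_features = [[20, 25.0, 0.5]]
print(recommender.predict(new_log_features))  # recommended discovery algorithm
```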

    Identification of functionally related enzymes by learning-to-rank methods

    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures, or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of this kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes, in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We use the Enzyme Commission (EC) classification hierarchy to obtain annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.
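
    One standard way to cast such a retrieval problem as learning to rank is the RankSVM-style pairwise reduction sketched below. For brevity it uses a linear model and random stand-in features, whereas the paper works with kernel methods over active-cleft similarities:

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X, y):
    """RankSVM-style reduction: turn 'enzyme i should rank above enzyme j'
    constraints into binary classification on difference vectors."""
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                diffs.append(X[i] - X[j]); labels.append(1)
                diffs.append(X[j] - X[i]); labels.append(-1)
    return np.array(diffs), np.array(labels)

rng = np.random.default_rng(0)
X = rng.random((40, 16))            # similarity-based features per database enzyme
y = rng.integers(0, 3, 40)          # graded relevance, e.g. depth of shared EC prefix
Xp, yp = pairwise_transform(X, y)

ranker = LinearSVC(C=1.0, max_iter=10000).fit(Xp, yp)
scores = X @ ranker.coef_.ravel()   # rank database enzymes by this score
print(np.argsort(scores)[::-1][:5]) # top-5 retrieved enzymes
```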

    Ranking Alternatives on the Basis of a Dominance Intensity Measure

    The additive multi-attribute utility model is widely used within Multi-Attribute Utility Theory (MAUT), but it demands all the information describing the decision-making situation. These information requirements can be far too strict in many practical situations, so incomplete information about input parameters has been incorporated into the decision-making process. We propose an approach based on a dominance intensity measure to deal with such situations. The approach builds on the dominance values between pairs of alternatives, which can be computed by linear programming. These dominance values are transformed into dominance intensities, from which a dominance intensity measure is derived. The measure is used to analyze the robustness of a ranking of technologies for the disposition of surplus weapons-grade plutonium by the US Department of Energy, and is compared with other dominance measuring methods.
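
    Computing a dominance value by linear programming can be made concrete. In this literature the dominance of alternative a over b is commonly taken as the minimum of u(a) − u(b) over the feasible weight set of the additive model; a non-negative optimum means a dominates b. A small SciPy sketch with hypothetical weight ranges:

```python
import numpy as np
from scipy.optimize import linprog

def dominance_value(u_a, u_b, weight_bounds):
    """Minimize the additive-utility difference u(a) - u(b) = w @ (u_a - u_b)
    over weight vectors w within the given bounds with sum(w) == 1."""
    c = np.asarray(u_a) - np.asarray(u_b)        # linprog minimizes c @ w
    res = linprog(c,
                  A_eq=[np.ones(len(c))], b_eq=[1.0],
                  bounds=weight_bounds, method="highs")
    return res.fun

# Utilities of two alternatives on three attributes, with imprecise weights.
u_a, u_b = [0.8, 0.4, 0.6], [0.5, 0.7, 0.5]
weight_bounds = [(0.2, 0.5), (0.2, 0.5), (0.1, 0.4)]   # hypothetical ranges
print(dominance_value(u_a, u_b, weight_bounds))        # < 0: a does not dominate b
```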

    Hashing as Tie-Aware Learning to Rank

    Hashing, or learning binary embeddings of data, is frequently used in nearest neighbor retrieval. In this paper, we develop learning to rank formulations for hashing, aimed at directly optimizing ranking-based evaluation metrics such as Average Precision (AP) and Normalized Discounted Cumulative Gain (NDCG). We first observe that the integer-valued Hamming distance often leads to tied rankings, and propose to use tie-aware versions of AP and NDCG to evaluate hashing for retrieval. Then, to optimize tie-aware ranking metrics, we derive their continuous relaxations and perform gradient-based optimization with deep neural networks. Our results establish a new state of the art for image retrieval by Hamming ranking on common benchmarks.

    Comment: 15 pages, 3 figures. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
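
    The tie problem is easy to reproduce: many items share the same integer Hamming distance to a query, so AP depends on how those ties are broken. The paper derives tie-aware metrics in closed form; the Monte Carlo sketch below merely illustrates the quantity as the expectation of AP over random tie-breaking:

```python
import numpy as np

def average_precision(ranking, relevant):
    """Standard AP of a ranked list of item ids."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / max(1, len(relevant))

def tie_aware_ap(hamming_dists, relevant, n_samples=1000, seed=0):
    """Estimate the expected AP over random tie-breaking by shuffling the
    items before a stable sort on distance (Monte Carlo stand-in for the
    paper's closed-form tie-aware AP)."""
    rng = np.random.default_rng(seed)
    items = np.arange(len(hamming_dists))
    aps = []
    for _ in range(n_samples):
        perm = rng.permutation(items)                       # random tie order
        order = perm[np.argsort(hamming_dists[perm], kind="stable")]
        aps.append(average_precision(order.tolist(), relevant))
    return float(np.mean(aps))

dists = np.array([1, 1, 2, 2, 2, 3])     # integer Hamming distances to a query
print(tie_aware_ap(dists, relevant={0, 3}))
```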