13 research outputs found

    Fast Kemeny consensus based on a search over standard matrices with minimal distance to the averaged expert ranking

    Background. The problem of ranking a finite set of objects is considered. Objective. The goal is to develop an algorithm that speeds up the search for the Kemeny consensus, and to substantiate a metric for comparing rankings. Methods. An approach for aggregating experts' rankings is suggested and substantiated, together with a metric for comparing rankings. Results. The developed algorithm finds a set of Kemeny rankings much faster than the classical straightforward search. This set also often contains a single Kemeny consensus, which the straightforward search fails to achieve. In addition, a single Kemeny consensus is determined immediately if the averaged expert ranking turns out to be acyclic. Thus the problem of selecting a single Kemeny consensus is solved. Conclusions. For 10 or more objects, where most known approaches become intractable, the algorithm remains tractable because it searches over only those standard matrices whose distance to the first ranking differs minimally from the distance between this ranking and the averaged expert ranking.

    Next Generation Cluster Editing

    This work aims to improve the quality of structural variant prediction from the mapped reads of a sequenced genome. We suggest a new model based on cluster editing in weighted graphs and introduce a new heuristic algorithm that solves this problem quickly and with a good approximation on the huge graphs that arise from biological datasets.

    Next generation cluster editing


    Improved Parameterized Algorithms for the Kemeny Aggregation Problem

    We give improvements over fixed-parameter tractable (FPT) algorithms to solve the Kemeny aggregation problem, where the task is to summarize a multi-set of preference lists, called votes, over a set of alternatives, called candidates, into a single preference list that has the minimum total τ-distance from the votes. The τ-distance between two preference lists is the number of pairs of candidates that are ordered differently in the two lists. We study the problem for preference lists that are total orders. We develop algorithms with running times O*(1.403^{k_t}), O*(5.823^{k_t/m}) ≤ O*(5.823^{k_avg}) and O*(4.829^{k_max}) for the problem, where the O* notation ignores polynomial factors, k_t is the optimum total τ-distance, m is the number of votes, and k_avg (resp. k_max) is the average (resp. maximum) over pairwise τ-distances of votes. Our algorithms improve the best previously known running times of O*(1.53^{k_t}) and O*(16^{k_avg}) ≤ O*(16^{k_max}) [4, 5], which also implies an O*(16^{4k_t/m}) running time. We also show how to enumerate all optimal solutions in O*(36^{k_t/m}) ≤ O*(36^{k_avg}) time.
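
    To make the definitions in this abstract concrete, the following is a minimal sketch of the τ-distance and of an exhaustive search for all optimal Kemeny lists. It is not one of the parameterized algorithms the abstract describes: it tries every permutation and is only usable for a handful of candidates. The function names and the sample votes are illustrative assumptions.

```python
from itertools import combinations, permutations


def tau_distance(a, b):
    """Count the candidate pairs that lists a and b order differently."""
    pos_b = {c: i for i, c in enumerate(b)}
    # combinations(a, 2) yields pairs (x, y) with x ranked before y in a.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def kemeny_brute_force(votes):
    """Return (optimum total distance, all optimal lists) by exhaustive search.

    Assumption: every vote is a total order over the same candidates.
    """
    candidates = sorted(votes[0])
    best_score, best = None, []
    for perm in permutations(candidates):
        score = sum(tau_distance(perm, v) for v in votes)
        if best_score is None or score < best_score:
            best_score, best = score, [perm]
        elif score == best_score:
            best.append(perm)
    return best_score, best


# Hypothetical votes over three candidates.
votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
k_t, optimal = kemeny_brute_force(votes)
print(k_t, optimal)
```

    Here `k_t` plays the role of the optimum total τ-distance used as a parameter in the abstract, and `optimal` lists every preference list that attains it.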

    Going weighted: Parameterized algorithms for cluster editing

    The goal of the Cluster Editing problem is to make the fewest changes to the edge set of an input graph such that the resulting graph is a disjoint union of cliques. This problem is NP-complete but recently, several parameterized algorithms have been proposed. In this paper, we present a number of surprisingly simple search tree algorithms for Weighted Cluster Editing assuming that edge insertion and deletion costs are positive integers. We show that the smallest search tree has size O(1.82^k) for edit cost k, resulting in the currently fastest parameterized algorithm, both for this problem and its unweighted counterpart. We have implemented and compared our algorithms, and achieved promising results. This is an extended version of two articles published in: Proc. of the 6th Asia Pacific Bioinformatics Conference, APBC 2008, in: Series on Advances in Bioinformatics and Computational Biology, vol. 5, Imperial College Press, pp. 211–220; and in: Proc. of the 2nd Conference on Combinatorial Optimization and Applications, COCOA 2008, in: LNCS, vol. 5038, Springer, pp. 289–302.
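
    As an illustration of the search tree idea, here is a minimal sketch of the classic three-way branching for unweighted Cluster Editing, which yields a search tree of size O(3^k). It is not the refined weighted algorithm of this paper. It relies on the fact that a graph is a disjoint union of cliques exactly when it has no conflict triple, that is, vertices u, v, w with edges uv and vw but no edge uw. All names and the sample graph are illustrative assumptions.

```python
from itertools import combinations


def find_conflict_triple(vertices, edges):
    """Return (u, v, w) with edges uv and vw present and uw missing, or None."""
    for v in vertices:
        nbrs = [u for u in vertices if u != v and frozenset((u, v)) in edges]
        for u, w in combinations(nbrs, 2):
            if frozenset((u, w)) not in edges:
                return u, v, w
    return None


def cluster_editing(vertices, edges, k):
    """Return an edge set forming disjoint cliques using at most k edits, or None."""
    triple = find_conflict_triple(vertices, edges)
    if triple is None:
        return edges  # already a disjoint union of cliques
    if k == 0:
        return None  # a conflict remains but the budget is spent
    u, v, w = triple
    # Branch: delete uv, delete vw, or insert uw (each is a toggle of one pair).
    for pair in (frozenset((u, v)), frozenset((v, w)), frozenset((u, w))):
        result = cluster_editing(vertices, edges ^ {pair}, k - 1)
        if result is not None:
            return result
    return None


# Hypothetical input: the path a - b - c - d, with unit edit costs.
V = ["a", "b", "c", "d"]
E = {frozenset("ab"), frozenset("bc"), frozenset("cd")}
print(cluster_editing(V, E, 0))  # no solution with zero edits
print(cluster_editing(V, E, 1))  # one deletion splits the path into two cliques
```

    Calling the function with k = 0, 1, 2, ... until it succeeds gives the minimum edit cost. A weighted variant would subtract the cost of the chosen edit from k instead of 1.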

    Clustering and Validation of Microarray Data Using Consensus Clustering

    Clustering is a popular method to glean useful information from microarray data. Unfortunately, the results obtained from the common clustering algorithms are not consistent, and even with multiple runs of different algorithms a further validation step is required. Owing to the absence of well-defined class labels and the unknown number of clusters, the unsupervised learning problem of finding an optimal clustering is hard. Obtaining a consensus of judiciously obtained clusterings not only provides stable results but also lends a high level of confidence in the quality of results. Several base algorithm runs are used to generate clusterings, and a co-association matrix of pairs of points is obtained using a configurable majority criterion. Using this consensus as a similarity measure, we generate a clustering using four algorithms. Synthetic as well as real-world datasets are used in experiments, and the results obtained are compared using various internal and external validity measures. Results on real-world datasets showed a marked improvement over those obtained by other researchers with the same datasets.
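
    The co-association step can be sketched as follows. Each entry of the matrix is the fraction of base clusterings that put two points in the same cluster. The abstract does not specify the majority criterion or the four final algorithms, so the threshold of 0.5 and the connected-components grouping below are assumptions made purely for illustration, as are the function names and sample labelings.

```python
def co_association(labelings):
    """Fraction of base clusterings that place points i and j together."""
    runs = len(labelings)
    n = len(labelings[0])
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / runs for j in range(n)]
        for i in range(n)
    ]


def consensus_clusters(labelings, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the runs and
    return connected components as consensus labels (an assumed final step)."""
    co = co_association(labelings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical cluster labels for five points from three base runs.
runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
co = co_association(runs)
consensus = consensus_clusters(runs)
print(consensus)
```

    Label values are arbitrary within each run, which is why the matrix compares labels only inside a single run and never across runs.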

    Fixed-Parameter Algorithms for Computing Kemeny Scores - Theory and Practice

    The central problem in this work is to compute a ranking of a set of elements which is "closest to" a given set of input rankings of the elements. We define "closest to" in an established way as having the minimum sum of Kendall-Tau distances to each input ranking. Unfortunately, the resulting problem, Kemeny consensus, is NP-hard for instances with n input rankings, n being an even integer greater than three. Nevertheless, this problem plays a central role in many rank aggregation problems. It was shown that one can compute the corresponding Kemeny consensus list in f(k) + poly(n) time, where f(k) is a computable function in one of the parameters "score of the consensus", "maximum distance between two input rankings", "number of candidates" and "average pairwise Kendall-Tau distance", and poly(n) is a polynomial in the input size. This work demonstrates the practical usefulness of the corresponding algorithms by applying them to randomly generated and several real-world data sets. Thus, we show that these fixed-parameter algorithms are not only of theoretical interest. In a more theoretical part of this work, we develop an improved fixed-parameter algorithm for the parameter "score of the consensus" with a better upper bound on the running time than previous algorithms. Comment: Studienarbeit
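
    The parameters named in this abstract are cheap to compute for a given instance, which is one way to judge whether a fixed-parameter algorithm is likely to be practical. The following minimal sketch computes the score of a candidate ranking together with the maximum and average pairwise Kendall-Tau distance of the input rankings. Averaging over unordered pairs of rankings is an assumption about the exact definition, and the names and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Number of element pairs that rankings a and b order differently."""
    pos_b = {x: i for i, x in enumerate(b)}
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def kemeny_score(ranking, rankings):
    """Sum of Kendall-Tau distances from `ranking` to every input ranking."""
    return sum(kendall_tau(ranking, r) for r in rankings)


def pairwise_parameters(rankings):
    """Return (maximum, average) Kendall-Tau distance over pairs of inputs."""
    dists = [kendall_tau(a, b) for a, b in combinations(rankings, 2)]
    return max(dists), sum(dists) / len(dists)


# Hypothetical input rankings over four elements.
rankings = [
    ("a", "b", "c", "d"),
    ("b", "a", "c", "d"),
    ("a", "b", "d", "c"),
]
d_max, d_avg = pairwise_parameters(rankings)
score = kemeny_score(rankings[0], rankings)
print(d_max, d_avg, score)
```

    Small values of these parameters relative to the number of elements indicate input rankings that largely agree, which is the regime where the parameterized running times are attractive.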

    Preference relations based unsupervised rank aggregation for metasearch

    Rank aggregation mechanisms have been used to solve problems in various domains such as bioinformatics, natural language processing, and information retrieval. Metasearch is one such application, where a user gives a query to the metasearch engine, and the metasearch engine forwards the query to multiple individual search engines. Results or rankings returned by these individual search engines are combined using rank aggregation algorithms to produce the final result to be displayed to the user. We identify a few aspects that should be kept in mind when designing any rank aggregation algorithm for metasearch. For example, equal importance is generally given to the input rankings while performing the aggregation. However, depending on the indexed set of web pages, the features considered for ranking, the ranking functions used, etc. by the individual search engines, the individual rankings may be of different quality. So, the aggregation algorithm should give more weight to the better rankings and less weight to the others. Also, since the aggregation is performed while the user is waiting for a response, the operations performed in the algorithm need to be lightweight. Moreover, getting supervised data for the rank aggregation problem is often difficult. In this paper, we present an unsupervised rank aggregation algorithm that is suitable for metasearch and addresses the aspects mentioned above. We also perform a detailed experimental evaluation of the proposed algorithm on four different benchmark datasets having ground truth information. Apart from the unsupervised Kendall-Tau distance measure, several supervised evaluation measures are used for performance comparison. Experimental results demonstrate the efficacy of the proposed algorithm over baseline methods in terms of supervised evaluation metrics. Through these experiments we also show that the Kendall-Tau distance metric may not be suitable for evaluating rank aggregation algorithms for metasearch.