10 research outputs found

    Aggregating Inconsistent Information: Ranking and Clustering

    ABSTRACT We address optimization problems in which we are given contradictory pieces of input information and the goal is to find a globally consistent solution that minimizes the number of disagreements with the respective inputs. Specifically, the problems we address are rank aggregation, the feedback arc set problem on tournaments, and correlation and consensus clustering. We show that for all these problems (and various weighted versions of them), we can obtain improved approximation factors using essentially the same remarkably simple algorithm. Additionally, we almost settle a long-standing conjecture of Bang-Jensen and Thomassen and show that, unless NP ⊆ BPP, there is no polynomial-time algorithm for the problem of minimum feedback arc set in tournaments.
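    The "remarkably simple algorithm" the abstract alludes to is pivot-based. A minimal sketch for the correlation-clustering case (function names are illustrative, and the similarity predicate is supplied by the caller) might look like this:

    ```python
    import random

    def pivot_clustering(items, similar, seed=0):
        """Pivot-style (KwikCluster-like) clustering sketch.

        Repeatedly pick a random pivot, group it with every remaining
        item it agrees with, and recurse on the rest.
        """
        rng = random.Random(seed)
        remaining = list(items)
        clusters = []
        while remaining:
            pivot = remaining.pop(rng.randrange(len(remaining)))
            # Items that agree with the pivot join its cluster.
            cluster = [pivot] + [v for v in remaining if similar(pivot, v)]
            remaining = [v for v in remaining if not similar(pivot, v)]
            clusters.append(cluster)
        return clusters
    ```

    The same pivot idea adapts to rank aggregation by ordering items relative to the pivot instead of grouping them.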

    On Geometric Prototype and Applications

    In this paper, we propose to study a new geometric optimization problem called the "geometric prototype" problem in Euclidean space. Given a set of patterns, where each pattern is represented by a (weighted or unweighted) point set, the geometric prototype can be viewed as the "average pattern" minimizing the total matching cost to them. As a general model, the problem finds many applications in the real world, such as Wasserstein barycenter computation and ensemble clustering. The dimensionality can be either constant or high, depending on the application. To the best of our knowledge, the general geometric prototype problem has yet to be seriously considered by the theory community. To bridge the gap between theory and practice, we first show that a small core-set can be obtained to substantially reduce the data size; consequently, any existing heuristic or algorithm can run on the core-set to achieve a great improvement in efficiency. As a new application of core-sets, this construction must tackle a couple of challenges, particularly on the theory side. Finally, we test our method on both image and high-dimensional clustering datasets; the experimental results remain stable even when we run the algorithms on core-sets much smaller than the original datasets, while the running times are reduced significantly.
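    As a hypothetical illustration of how a core-set shrinks the data (this is a generic importance-sampling sketch, not the paper's specific construction), one can sample points proportionally to their weights and reweight the samples so that weighted totals are preserved in expectation:

    ```python
    import random

    def sample_coreset(points, weights, m, seed=0):
        """Generic importance-sampling core-set sketch.

        Draws m points with probability proportional to their weights,
        and gives each sampled point the weight total/m so that the
        total weight of the sample equals the original total weight.
        """
        rng = random.Random(seed)
        total = sum(weights)
        idx = rng.choices(range(len(points)), weights=weights, k=m)
        # weight_i / (m * p_i) = total / m when p_i is proportional to weight_i
        return [(points[i], total / m) for i in idx]
    ```

    Downstream heuristics then run on the (point, weight) pairs instead of the full dataset.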

    Multi criteria decision making using correlation coefficient under rough neutrosophic environment

    In this paper, we define a correlation coefficient measure between any two rough neutrosophic sets and prove some of its basic properties. We then develop a new multiple-attribute group decision-making method based on the proposed correlation coefficient measure. An illustrative example of medical diagnosis is solved to demonstrate the applicability and effectiveness of the proposed method.
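    For orientation only: the classical correlation coefficient for single-valued neutrosophic sets (a related but different measure; the rough-neutrosophic variant is defined in the paper itself) compares truth/indeterminacy/falsity triples over a common universe:

    ```python
    def neutrosophic_correlation(A, B):
        """Correlation coefficient between two single-valued
        neutrosophic sets, each given as a list of (T, I, F) triples
        over the same universe. Sketch of the classical SVNS form,
        not the rough-neutrosophic measure from the paper.
        """
        inner = sum(t1 * t2 + i1 * i2 + f1 * f2
                    for (t1, i1, f1), (t2, i2, f2) in zip(A, B))
        norm_a = sum(t * t + i * i + f * f for t, i, f in A)
        norm_b = sum(t * t + i * i + f * f for t, i, f in B)
        # Normalizing by the larger norm keeps the value in [0, 1],
        # with 1 attained when the two sets coincide.
        return inner / max(norm_a, norm_b)
    ```

    In a decision-making setting, alternatives are ranked by their correlation with an ideal alternative.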

    Doctor of Philosophy

    With the tremendous growth of data produced in recent years, it is impossible to identify patterns or test hypotheses without reducing data size. Data mining is an area of science that extracts useful information from data by discovering the patterns and structures present in it. In this dissertation, we largely focus on clustering, which is often the first step in any exploratory data mining task: items that are similar to each other are grouped together, making downstream data analysis robust. Different clustering techniques have different strengths, and the resulting groupings provide different perspectives on the data. Due to the unsupervised nature of the task, i.e., the lack of domain experts who can label the data, validating the results is very difficult. While there are measures that compute "goodness" scores for clustering solutions as a whole, there are few methods that validate the assignment of individual data items to their clusters. To address these challenges, we focus on developing a framework that can generate, compare, combine, and evaluate different solutions to make more robust and significant statements about the data. In the first part of this dissertation, we present fast and efficient techniques to generate and combine different clustering solutions. We build on recent ideas on efficient representations of clusters of partitions to develop a well-founded, spatially aware metric for comparing clusterings. With the ability to compare clusterings, we describe a heuristic to combine different solutions to produce a single high-quality clustering. We also introduce a Markov chain Monte Carlo approach to sample different clusterings from the entire landscape to provide users with a variety of choices. In the second part of this dissertation, we build certificates for individual data items and study their influence on effective data reduction.
    We present a geometric approach by defining regions of influence for data items and clusters, and use this to develop adaptive sampling techniques to speed up machine learning algorithms. This dissertation is therefore a systematic approach to studying the landscape of clusterings in an attempt to provide a better understanding of the data.
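    The dissertation's metric is spatially aware; as a simpler, purely label-based baseline for comparing two clusterings of the same items, the classical variation of information can be computed as follows (a generic sketch, not the dissertation's metric):

    ```python
    from collections import Counter
    from math import log

    def variation_of_information(labels_a, labels_b):
        """VI(A, B) = H(A) + H(B) - 2 I(A; B).

        Zero exactly when the two partitions of the same n items agree
        up to relabeling; larger values mean more disagreement.
        """
        n = len(labels_a)
        ca, cb = Counter(labels_a), Counter(labels_b)
        joint = Counter(zip(labels_a, labels_b))
        h_a = -sum(c / n * log(c / n) for c in ca.values())
        h_b = -sum(c / n * log(c / n) for c in cb.values())
        mi = sum(c / n * log((c / n) / ((ca[x] / n) * (cb[y] / n)))
                 for (x, y), c in joint.items())
        return h_a + h_b - 2 * mi
    ```

    Being a metric on partitions, it supports comparing, ranking, and sampling clusterings from a landscape of candidate solutions.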

    Full Issue


    Source identification in image forensics

    Source identification is one of the most important tasks in digital image forensics. In fact, the ability to reliably associate an image with its acquisition device may be crucial both during investigations and before a court of law. For example, one may be interested in proving that a certain photo was taken by his/her camera, in order to claim intellectual property. Conversely, law enforcement agencies may be interested in tracing back the origin of some images, because the images themselves violate the law (e.g. they do not respect privacy laws), or because they point to subjects involved in unlawful and dangerous activities (such as terrorism or child pornography). More generally, proving beyond reasonable doubt that a photo was taken by a given camera may be an important element for decisions in court. The key assumption of forensic source identification is that acquisition devices leave traces in the acquired content, and that instances of these traces are specific to the respective (class of) device(s). These traces make up the so-called device fingerprint, a name that stems from the forensic value of human fingerprints. Motivated by the importance of source identification in the digital image forensics community and the need for reliable techniques based on device fingerprints, the work developed in this Ph.D. thesis concerns different source identification levels, using both feature-based and PRNU-based approaches for model and device identification. In addition, it is also shown that counter-forensics methods can easily attack machine learning techniques for image forgery detection. For model identification, an analysis of hand-crafted local features and deep learning features is carried out for the basic two-class classification problem, together with a comparison between the limited-knowledge and blind scenarios. Finally, an application of camera model identification to various iris sensor models is conducted.
    A blind technique that faces the problem of device source identification using the PRNU-based approach is also proposed. Using the correlation between single-image sensor noise residuals, a blind two-step source clustering is proposed: in the first step, correlation clustering together with an ensemble method is used to obtain an initial partition, which is then refined in the second step by means of a Bayesian approach. Experimental results show that this proposal outperforms state-of-the-art techniques and still gives acceptable performance when considering images downloaded from Facebook.
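    The first clustering step can be loosely illustrated with pairwise normalized correlation plus greedy grouping. This is a simplified stand-in: the ensemble method and the Bayesian refinement from the thesis are not shown, and the threshold value is illustrative.

    ```python
    def normalized_correlation(x, y):
        """Normalized correlation between two flattened noise residuals."""
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x)
               * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den if den else 0.0

    def greedy_device_clusters(residuals, threshold=0.5):
        """Group sensor-noise residuals whose correlation with a
        cluster's first member exceeds the threshold; otherwise open
        a new cluster (one cluster per presumed source device)."""
        clusters = []
        for r in residuals:
            for cluster in clusters:
                if normalized_correlation(r, cluster[0]) > threshold:
                    cluster.append(r)
                    break
            else:
                clusters.append([r])
        return clusters
    ```

    In practice, PRNU residuals come from a denoising filter applied to each image, and the correlation is computed between those residuals rather than raw pixel values.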