
    Association Discovery in Two-View Data

    Two-view datasets are datasets whose attributes are naturally split into two sets, each providing a different view on the same set of objects. We introduce the task of finding small and non-redundant sets of associations that describe how the two views are related. To achieve this, we propose a novel approach in which sets of rules are used to translate one view to the other and vice versa. Our models, dubbed translation tables, contain both unidirectional and bidirectional rules that span both views and provide lossless translation from either of the views to the opposite view. To be able to evaluate different translation tables and perform model selection, we present a score based on the Minimum Description Length (MDL) principle. Next, we introduce three TRANSLATOR algorithms to find good models according to this score. The first algorithm is parameter-free and iteratively adds the rule that improves compression most. The other two algorithms use heuristics to achieve better trade-offs between runtime and compression. The empirical evaluation on real-world data demonstrates that only modest numbers of associations are needed to characterize the two-view structure present in the data, while the obtained translation rules are easily interpretable and provide insight into the data.
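
The first algorithm in the abstract above (iteratively adding the rule that improves compression most) follows a greedy forward-selection pattern. The sketch below illustrates only that pattern and is not the authors' TRANSLATOR code. The names `greedy_select` and `toy_cost`, the candidate rules, and the cost function (a crude stand-in for an MDL-style score: a penalty per uncovered item plus a penalty per rule) are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

# Hypothetical rule type: a name plus the set of items the rule explains.
Rule = Tuple[str, FrozenSet[int]]


def greedy_select(
    candidates: Iterable[Rule],
    cost: Callable[[Set[Rule]], float],
) -> Tuple[Set[Rule], float]:
    """Repeatedly add the candidate that lowers the cost most.

    Stops as soon as no remaining candidate strictly lowers the cost,
    so no iteration count or threshold has to be supplied.
    """
    model: Set[Rule] = set()
    remaining = list(candidates)
    best_cost = cost(model)
    while remaining:
        # Evaluate the cost of the model extended by each remaining rule.
        scored = [(cost(model | {rule}), rule) for rule in remaining]
        new_cost, best_rule = min(scored, key=lambda pair: pair[0])
        if new_cost >= best_cost:
            break  # no candidate improves the score any further
        model.add(best_rule)
        remaining.remove(best_rule)
        best_cost = new_cost
    return model, best_cost


# Toy data (assumed): six items and four candidate rules.
ITEMS = frozenset(range(1, 7))
RULES = [
    ("a", frozenset({1, 2, 3, 4})),
    ("b", frozenset({3, 4})),
    ("c", frozenset({5})),
    ("d", frozenset({4, 5, 6})),
]


def toy_cost(model: Set[Rule]) -> float:
    """Stand-in score: 2 per uncovered item plus 3 per rule in the model."""
    covered = set()
    for _, items in model:
        covered |= items
    return 2 * len(ITEMS - covered) + 3 * len(model)


model, final_cost = greedy_select(RULES, toy_cost)
print(sorted(name for name, _ in model), final_cost)
```

On this toy input the loop first adds rule "a", then rule "d", and then stops because adding either remaining rule would raise the cost. Any score with the same "lower is better" contract could be passed in place of `toy_cost`.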

    Sentimental classification analysis of polarity multi-view textual data using data mining techniques

    The data and information available in most community environments are complex in nature. Sentimental data resources may consist of textual data collected from multiple information sources with different representations, usually handled by different analytical models. Data resources with these characteristics form multi-view polarity textual data. However, creating knowledge from this type of sentimental textual data requires considerable analytical effort and capability. Data mining practices in particular can provide exceptional results in handling textual data formats. Moreover, when the textual data exist in multi-view or unstructured formats, hybrid and integrated analysis with text data mining algorithms is vital for obtaining useful results. The objective of this research is to enhance knowledge discovery from sentimental multi-view textual data, which can be considered an unstructured data format, by classifying polarity documents into two categories of useful information. This paper discusses a proposed framework that integrates data mining algorithms, applying the X-means algorithm for clustering and the HotSpot algorithm for association rules. The analysis results show improved accuracy in classifying the sentimental multi-view textual data into two categories when the proposed framework is applied to an online polarity user-review dataset on given topics.

    Toward accurate high-throughput SNP genotyping in the presence of inherited copy number variation

    Background: The recent discovery of widespread copy number variation in humans has forced a shift away from the assumption of two copies per locus per cell throughout the autosomal genome. In particular, a SNP site can no longer always be accurately assigned one of three genotypes in an individual. In the presence of copy number variability, the individual may theoretically harbor any number of copies of each of the two SNP alleles. Results: To address this issue, we have developed a method to infer a "generalized genotype" from raw SNP microarray data. Here we apply our approach to data from 48 individuals and uncover thousands of aberrant SNPs, most in regions that were previously unreported as copy number variants. We show that our allele-specific copy numbers follow Mendelian inheritance patterns that would be obscured in the absence of SNP allele information. The interplay between duplication and point mutation in our data sheds light on the relative frequencies of these events in human history, showing that at least some of the duplication events were recurrent. Conclusion: This new multi-allelic view of SNPs has a complicated role in disease association studies, and further work will be necessary in order to accurately assess its importance. Software to perform generalized genotyping from SNP array data is freely available online [1].

    Self-healing topology discovery protocol for software defined networks

    “© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://ieeexplore.ieee.org/document/8319433/” This letter presents the design of a self-healing protocol for automatic discovery and maintenance of the network topology in Software Defined Networks (SDN). The proposed protocol integrates two enhanced features (i.e., layer 2 topology discovery and autonomic fault recovery) in a unified mechanism. This novel approach is validated through simulation experiments using OMNET++. Obtained results show that our protocol discovers and recovers the control topology efficiently in terms of time and message load over a wide range of generated networks. Peer reviewed. Postprint (author's final draft).

    Web Usage Mining with Evolutionary Extraction of Temporal Fuzzy Association Rules

    In Web usage mining, fuzzy association rules that have a temporal property can provide useful knowledge about when associations occur. However, there is a problem with traditional temporal fuzzy association rule mining algorithms. Some rules occur at the intersection of fuzzy sets' boundaries where there is less support (lower membership), so the rules are lost. A genetic algorithm (GA)-based solution is described that uses the flexible nature of the 2-tuple linguistic representation to discover rules that occur at the intersection of fuzzy set boundaries. The GA-based approach is enhanced from previous work by including a graph representation and an improved fitness function. In a comparison with a traditional approach on real-world Web log data, the GA-based approach discovered rules that were lost with the traditional approach. The GA-based approach is recommended as complementary to existing algorithms, because it discovers extra rules. © 2013 Elsevier B.V. All rights reserved.

    Assessment of the genetic basis of rosacea by genome-wide association study.

    Rosacea is a common, chronic skin disease that is currently incurable. Although environmental factors influence rosacea, the genetic basis of rosacea is not established. In this genome-wide association study, a discovery group of 22,952 individuals (2,618 rosacea cases and 20,334 controls) was analyzed, leading to identification of two significant single-nucleotide polymorphisms (SNPs) associated with rosacea, one of which replicated in a new group of 29,481 individuals (3,205 rosacea cases and 26,262 controls). The confirmed SNP, rs763035 (P=8.0 × 10^-11 discovery group; P=0.00031 replication group), is intergenic between HLA-DRA and BTNL2. Exploratory immunohistochemical analysis of HLA-DRA and BTNL2 expression in papulopustular rosacea lesions from six individuals, including one with the rs763035 variant, revealed staining in the perifollicular inflammatory infiltrate of rosacea for both proteins. In addition, three HLA alleles, all MHC class II proteins, were significantly associated with rosacea in the discovery group and confirmed in the replication group: HLA-DRB1*03:01 (P=1.0 × 10^-8 discovery group; P=4.4 × 10^-6 replication group), HLA-DQB1*02:01 (P=1.3 × 10^-8 discovery group; P=7.2 × 10^-6 replication group), and HLA-DQA1*05:01 (P=1.4 × 10^-8 discovery group; P=7.6 × 10^-6 replication group). Collectively, the gene variants identified in this study support the concept of a genetic component for rosacea, and provide candidate targets for future studies to better understand and treat rosacea.