    Evolutionary clustering for categorical data using parametric links among multinomial mixture models

    In this paper, we propose a novel evolutionary clustering method for temporal categorical data based on parametric links among multinomial mixture models. Besides clustering, our main goal is to interpret the evolution of clusters over time. To this end, we first formulate a generalized model that establishes parametric links between two multinomial mixtures. We then define different parametric sub-models to capture typical evolutions of the clustering structure. Model selection criteria make it possible to select the best sub-models and thus to infer the clustering evolution. For the experiments, we first evaluate the proposed method on synthetic temporal data. Next, we apply it to annotated social media data. Results show that the proposed method outperforms the state of the art on common evaluation metrics. Additionally, it provides an interpretation of the temporal evolution of the clusters.

    Hierarchical Dirichlet Process-Based Models For Discovery of Cross-species Mammalian Gene Expression

    An important research problem in computational biology is the identification of expression programs, sets of co-activated genes orchestrating physiological processes, and the characterization of the functional breadth of these programs. The use of mammalian expression data compendia for discovery of such programs presents several challenges, including: 1) cellular inhomogeneity within samples, 2) genetic and environmental variation across samples, and 3) uncertainty in the numbers of programs and sample populations. We developed GeneProgram, a new unsupervised computational framework that uses expression data to simultaneously organize genes into overlapping programs and tissues into groups to produce maps of inter-species expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Our method addresses each of the above challenges by using a probabilistic model that: 1) allocates mRNA to different expression programs that may be shared across tissues, 2) is hierarchical, treating each tissue as a sample from a population of related tissues, and 3) uses Dirichlet Processes, a non-parametric Bayesian method that provides prior distributions over numbers of sets while penalizing model complexity. Using real gene expression data, we show that GeneProgram outperforms several popular expression analysis methods in recovering biologically interpretable gene sets. From a large compendium of mouse and human expression data, GeneProgram discovers 19 tissue groups and 100 expression programs active in mammalian tissues. Our method automatically constructs a comprehensive, body-wide map of expression programs and characterizes their functional generality. This map can be used for guiding future biological experiments, such as discovery of genes for new drug targets that exhibit minimal "cross-talk" with unintended organs, or genes that maintain general physiological responses that go awry in disease states. Further, our method is general, and can be applied readily to novel compendia of biological data.
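
The abstract above says that Dirichlet Processes provide a prior over the number of sets while penalizing model complexity. The sketch below is not GeneProgram; it is only a minimal illustration of that one idea, using the Chinese restaurant process, a standard sampling view of a Dirichlet Process. The function name `sample_crp_table_sizes` and the parameters `alpha` and `seed` are assumptions introduced here for illustration.

```python
import random
from typing import List


def sample_crp_table_sizes(n_items: int, alpha: float, seed: int = 0) -> List[int]:
    """Seat n_items customers by the Chinese restaurant process.

    Returns the size of each table (group). The number of groups is not
    fixed in advance; it is an outcome of the sampling. `alpha` is the
    concentration parameter (hypothetical name chosen for this sketch).
    """
    rng = random.Random(seed)
    sizes: List[int] = []
    for i in range(n_items):
        # Customer i joins existing table k with probability sizes[k] / (i + alpha)
        # and opens a new table with probability alpha / (i + alpha).
        r = rng.uniform(0.0, i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                break
        else:
            # No existing table was chosen, so a new group is created.
            sizes.append(1)
    return sizes


if __name__ == "__main__":
    for alpha in (0.5, 5.0):
        sizes = sample_crp_table_sizes(n_items=500, alpha=alpha, seed=42)
        print(f"alpha={alpha}: {len(sizes)} groups")
```

The chance of opening a new group, `alpha / (i + alpha)`, shrinks as more items are seated, which is one way to read the "penalizing model complexity" remark: new groups remain possible but become progressively less likely.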

    A Comprehensive Survey of Data Mining-based Fraud Detection Research

    This survey paper categorises, compares, and summarises almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers many more technical articles and is the only one, to the best of our knowledge, which proposes alternative data and solutions from related domains. Comment: 14 pages

    Event identification and analysis on Twitter


    SEMANTIC SOCIAL NETWORK ANALYSIS FOR THE ENTERPRISE

    Business processes are generally fixed and strictly enforced, as reflected by the static nature of the underlying software systems and datasets. However, internal and external situations, organizational changes, and various other factors trigger dynamism, which surfaces as issues, complaints, Q&A, opinions, reviews, etc., over a plethora of communication channels, such as email, chat, discussion forums, and internal social networks. Careful and timely analysis and processing of such channels may lead to early detection of emerging trends, critical issues, opportunities, topics of interest, contributors, experts, etc. Social network analytics have been successfully applied in general-purpose online social network platforms, like Facebook and Twitter. However, for such techniques to be useful in a business context, it is mandatory to integrate them with the underlying business systems, processes, and practices. This integration problem is increasingly recognized as a Big Data problem. We argue that Semantic Web technology applied with social network analytics can solve enterprise knowledge management while achieving integration.