    Classification and Target Group Selection Based Upon Frequent Patterns

    In this technical report, two new algorithms based upon frequent patterns are proposed. One algorithm is a classification method. The other is an algorithm for target group selection. In both algorithms, the collection of frequent patterns in the training set is constructed first. Choosing an appropriate data structure allows us to keep the full collection of frequent patterns in memory. The classification method uses this collection directly. Target group selection is a known problem in direct marketing. Our selection algorithm is based upon the collection of frequent patterns. Keywords: classification; association rules; frequent item sets; target group selection
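As an illustration of the first step described in this abstract (constructing the collection of frequent patterns in a training set), here is a minimal level-wise sketch in Python. It is not the report's algorithm or data structure: the function name `frequent_itemsets`, the `min_support` parameter, and the plain dictionary used to hold the collection are assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset contained in at
    least `min_support` transactions (hypothetical helper, Apriori-style)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Build size-k candidates from pairs of frequent (k-1)-itemsets and
        # keep only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count the support of each candidate against the training set.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


# Toy training set (made-up data, not from the report).
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
patterns = frequent_itemsets(toy, min_support=3)
```

A dictionary keyed by `frozenset` keeps the whole collection in memory for lookup; the report refers to "an appropriate data structure" without naming it here, so this choice is only a stand-in.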

    Mining Brain Networks using Multiple Side Views for Neurological Disorder Identification

    Mining discriminative subgraph patterns from graph data has attracted great interest in recent years. It has a wide variety of applications in disease diagnosis, neuroimaging, etc. Most research on subgraph mining focuses on the graph representation alone. However, in many real-world applications, side information is available along with the graph data. For example, for neurological disorder identification, in addition to the brain networks derived from neuroimaging data, hundreds of clinical, immunologic, serologic and cognitive measures may also be documented for each subject. These measures compose multiple side views encoding a tremendous amount of supplemental information for diagnostic purposes, yet are often ignored. In this paper, we study the problem of discriminative subgraph selection using multiple side views and propose a novel solution to find an optimal set of subgraph features for graph classification by exploring a plurality of side views. We derive a feature evaluation criterion, named gSide, to estimate the usefulness of subgraph patterns based upon side views. Then we develop a branch-and-bound algorithm, called gMSV, to efficiently search for optimal subgraph features by integrating the subgraph mining process and the procedure of discriminative feature selection. Empirical studies on graph classification tasks for neurological disorders using brain networks demonstrate that subgraph patterns selected by the multi-side-view guided subgraph selection approach can effectively boost graph classification performance and are relevant to disease diagnosis. Comment: in Proceedings of IEEE International Conference on Data Mining (ICDM) 201

    Discovery of topological constraints on spatial object classes using a refined topological model

    In a typical data collection process, a surveyed spatial object is annotated upon creation, and is classified based on its attributes. This annotation can also be guided by textual definitions of objects. However, interpretations of such definitions may differ among people, and thus result in subjective and inconsistent classification of objects. This problem becomes even more pronounced when cultural and linguistic differences are considered. As a solution, this paper investigates the role of topology as the defining characteristic of a class of spatial objects. We propose a data mining approach based on frequent itemset mining to learn patterns in topological relations between objects of a given class and other spatial objects. In order to capture topological relations between more than two (linear) objects, this paper further proposes a refinement of the 9-intersection model for topological relations of line geometries. The discovered topological relations form topological constraints of an object class that can be used for spatial object classification. A case study has been carried out on bridges in the OpenStreetMap dataset for the state of Victoria, Australia. The results show that the proposed approach can successfully learn topological constraints for the class bridge, and that the proposed refined topological model for line geometries outperforms the 9-intersection model in this task.

    Prognostic indicators of survival for patients with oral cavity squamous cell carcinoma in Norway. Outcomes in a retrospective, multicenter cohort, with special focus on oral tongue squamous cell carcinoma, 2005-2009

    Oral cavity cancer (OCC) is the most frequent of head and neck cancers, most being squamous cell carcinomas (SCC). Within the oral cavity, the oral tongue is the most common site of cancer. These cancers are very aggressive, with poor survival. The treatment decision is based upon classifications of tumor (T), lymph node (N), and metastasis (M), although tumors with the same classification may differ in aggressiveness. Treatment of these patients would be primary site surgery, with additional neck dissection for many. Postsurgical radiotherapy may be added, rarely chemotherapy. The tumor growth pattern may predict the aggressiveness of the tumor, and thereby supplement the TNM classification in the treatment decision. There is no established tumor growth pattern in use today that differentiates between those who need an additional neck dissection and radiotherapy or chemotherapy; this may lead to overtreatment or undertreatment. This study retrospectively investigated a cohort of OCC from 2005-2009, which was called the Norwegian oral cancer (NOROC) study. We used data from the national Cause of Death Registry to calculate survival outcome. Spearman bivariate calculation was used to investigate correlations between the variables. Log-rank univariate analyses were used to give Kaplan-Meier survival curves. Cox regression was used in multivariate analyses to determine which variables best predicted survival outcome. We found 535 primary treatment-naïve oral cavity SCC. Median age at diagnosis was 67 years; five-year disease-specific survival was 52%. Our data show that high-risk Human Papilloma Virus was not detected in oral tongue (OT) SCC. Growth patterns of tumor depth of invasion (DOI), tumor budding, WHO differentiation, and lymphocytic infiltrate, combined in a histo-score, can predict the aggressiveness. Tumor DOI is already implemented in the new TNM classification. We suggest that other risk patterns we found can be added to the TNM classification for individualized treatment.
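The abstract mentions Kaplan-Meier survival curves. As a rough sketch of how such a curve is computed, the Python function below implements the standard product-limit estimate. The function name `kaplan_meier` and the toy follow-up data are assumptions for illustration only; they are not the study's data or code.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if the event (death) was observed, 0 if the patient was censored.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Made-up follow-up times (years) and event flags, purely illustrative.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored patients reduce the number at risk without lowering the curve, which is why the estimate only steps down at times where a death is observed.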

    Prediction of peptides binding to MHC class I alleles by partial periodic pattern mining

    MHC (Major Histocompatibility Complex) is a key player in the immune response of an organism. It is important to be able to predict which antigenic peptides will bind to a specific MHC allele and which will not, creating possibilities for controlling immune response and for the applications of immunotherapy. However, a problem encountered in computational binding prediction methods for MHC class I is the presence of bulges and loops in the peptides, which change the total length. Most machine learning methods in use today require the sequences to be of the same length to successfully mine the binding motifs. We propose the use of time-based data mining methods in motif mining to be able to mine motifs position-independently. Also, information from both binding and non-binding peptides is used, in contrast to other methods, which rely only on binding peptides. The prediction results are between 70% and 80% for the tested alleles.

    Data mining based cyber-attack detection


    "How May I Help You?": Modeling Twitter Customer Service Conversations Using Fine-Grained Dialogue Acts

    Given the increasing popularity of customer service dialogue on Twitter, analysis of conversation data is essential to understand trends in customer and agent behavior for the purpose of automating customer service interactions. In this work, we develop a novel taxonomy of fine-grained "dialogue acts" frequently observed in customer service, showcasing acts that are more suited to the domain than the more generic existing taxonomies. Using a sequential SVM-HMM model, we model conversation flow, predicting the dialogue act of a given turn in real time. We characterize differences between customer and agent behavior in Twitter customer service conversations, and investigate the effect of testing our system on different customer service industries. Finally, we use a data-driven approach to predict important conversation outcomes: customer satisfaction, customer frustration, and overall problem resolution. We show that the type and location of certain dialogue acts in a conversation have a significant effect on the probability of desirable and undesirable outcomes, and present actionable rules based on our findings. The patterns and rules we derive can be used as guidelines for outcome-driven automated customer service platforms. Comment: 13 pages, 6 figures, IUI 201