56 research outputs found

    Single-Class Learning for Spam Filtering: An Ensemble Approach

    Spam, also known as Unsolicited Commercial Email (UCE), has been an increasingly annoying problem for individuals and organizations. Most prior research formulated spam filtering as a classical text categorization task, in which training examples must include both spam emails (positive examples) and legitimate emails (negative examples). However, in many spam filtering scenarios, obtaining legitimate emails for training purposes is more difficult than collecting spam and unclassified emails. Hence, it would be more appropriate to construct a classification model for spam filtering from positive (i.e., spam emails) and unlabeled instances only; i.e., training a spam filter without any legitimate emails as negative training examples. Several single-class learning techniques, including PNB and PEBL, have been proposed in the literature. However, they incur fundamental limitations when applied to spam filtering. In this study, we propose and develop an ensemble approach, referred to as E2, to address the limitations of PNB and PEBL. Specifically, we follow the two-stage framework of PEBL and extend each stage with an ensemble strategy. Our empirical evaluation results on two spam-filtering corpora suggest that the proposed E2 technique exhibits more stable and reliable performance than its benchmark techniques (i.e., PNB and PEBL).

    Improving Recommendation Performance with User Interest Evolution Patterns

    Effective recommendation is indispensable to customized or personalized services. The collaborative filtering approach is a salient technique for supporting automated recommendations; it relies on customer profiles to make recommendations to a target customer based on neighbors with similar preferences. However, traditional collaborative recommendation techniques use only static information about customers’ preferences and ignore the evolution of their purchasing behaviours, which contains valuable information for making recommendations. Thus, this study proposes an approach to increase the effectiveness of personalized recommendations by mining sequence patterns from the evolving preferences of a target customer over time. The experimental results show that the proposed technique improves recommendation precision in comparison with a collaborative filtering method based on Top-k recommendation.

    Cost-Sensitive Learning for Recurrence Prediction of Breast Cancer

    Breast cancer is one of the leading causes of cancer death and specifically accounts for 10.4% of all cancer incidences among women. The prediction of breast cancer recurrence has been a challenging research problem for many researchers. Data mining techniques have recently received considerable attention, especially when used for the construction of prognosis models from survival data. However, existing data mining techniques may not handle censored data effectively. Censored instances are often discarded when applying classification techniques to prognosis. In this paper, we propose a cost-sensitive learning approach that involves the censored data in prognostic assessment, with better recurrence prediction capability. The proposed approach employs an outcome inference mechanism to infer the possible probabilistic outcome of each censored instance and adopts cost-proportionate rejection sampling and a committee machine strategy to take these instances with probabilistic outcomes into account during the classification model learning process. We empirically evaluate the effectiveness of our proposed approach for breast cancer recurrence prediction and include a censored-data-discarding method (i.e., building the recurrence prediction model by using only uncensored data) and the Kaplan-Meier method (a common prognosis method) as performance benchmarks. Overall, our evaluation results suggest that the proposed approach outperforms its benchmark techniques, as measured by precision, recall, and F1 score.
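
    The Kaplan-Meier benchmark named in this abstract is the standard product-limit estimator, which keeps censored patients in the at-risk count until they drop out instead of discarding them. The following is a minimal sketch of that textbook estimator, not the authors' implementation; the function name `kaplan_meier` and the toy data are assumptions made for illustration.

```python
def kaplan_meier(durations, observed):
    """Product-limit survival estimate.

    durations: follow-up time for each patient.
    observed:  True if the event (e.g. recurrence) was seen at that time,
               False if the patient was censored at that time.
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            # S(t) = S(previous) * (1 - d_t / n_t)
            survival *= 1 - events / n_at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without changing S(t).
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: the patient at time 2 is censored.
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

    In this toy run the censored patient still counts as at risk at time 1, which is the information a censored-data-discarding method would lose.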

    Supporting Acute Appendicitis Diagnosis: A Pre-Clustering-Based Classification Technique

    Service quality and cost containment represent two critical challenges in healthcare management. Toward that end, acute appendicitis, a common surgical condition, is important and requires timely, accurate diagnosis. The diverse and atypical symptoms make such diagnoses difficult, resulting in increased morbidity and negative appendectomy. While prior research has recognized the use of classification analysis to support acute appendicitis diagnosis, the skewed distribution of positive and negative acute appendicitis cases has significantly constrained the effectiveness of existing classification techniques. In this study, we develop a pre-clustering-based classification (PCC) technique to address the skewed distribution problem common to acute appendicitis diagnosis. We empirically evaluate the proposed PCC technique with 574 clinical cases of positive and negative acute appendicitis obtained from a tertiary medical center in Taiwan. Our evaluation includes a traditional support vector machine, a prevalent resampling classification technique, the Alvarado scoring system, and a multi-classifier committee as performance benchmarks. Our results show that the PCC technique is more effective and less biased than the benchmark techniques, without favoring the positive or negative class.

    Positive Example Learning for Content-Based Recommendations: A Cost-Sensitive Learning-Based Approach

    Existing supervised learning techniques can support product recommendations but are ineffective in scenarios characterized by single-class learning; i.e., training samples consist of some positive examples and a much greater number of unlabeled examples. To address the limitations inherent in existing single-class learning techniques, we develop COst-sensitive Learning-based Positive Example Learning (COLPEL), which constructs an automated classifier from a single-class training sample. Our method employs cost-proportionate rejection sampling to derive, from unlabeled examples, a subset likely to feature negative examples, according to the respective misclassification costs. COLPEL follows a committee machine strategy, thereby constructing a set of automated classifiers used together to reduce probable biases common to a single classifier. We use customers’ book ratings from the Amazon.com Web site to evaluate COLPEL, with PNB and PEBL as benchmarks. Our results show that COLPEL outperforms both PNB and PEBL, as measured by its accuracy, positive F1 score, and negative F1 score.
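
    Cost-proportionate rejection sampling, as it is generally described in the cost-sensitive learning literature, keeps each example with probability equal to its cost divided by an upper bound on all costs; repeating the draw gives one training subset per committee member. The sketch below illustrates only that generic idea under stated assumptions. The function names, the cost values, and the committee size are hypothetical, and how COLPEL actually assigns costs to unlabeled examples is not given in the abstract.

```python
import random


def rejection_sample(examples, costs, rng):
    """Keep each example with probability cost / max(costs)."""
    z = max(costs)
    if z <= 0:
        raise ValueError("at least one cost must be positive")
    # rng.random() is in [0, 1), so a max-cost example is always kept
    # and a zero-cost example is never kept.
    return [x for x, c in zip(examples, costs) if rng.random() < c / z]


def committee_samples(examples, costs, n_members=5, seed=0):
    """Draw one independent subset per committee member (assumed design)."""
    rng = random.Random(seed)
    return [rejection_sample(examples, costs, rng) for _ in range(n_members)]


# Hypothetical unlabeled items with assumed misclassification costs.
unlabeled = ["item_a", "item_b", "item_c"]
costs = [1.0, 0.0, 1.0]
subsets = committee_samples(unlabeled, costs, n_members=3)
```

    Each subset would then train one classifier, and the committee's votes would be combined; the combination rule is left out here because the abstract does not specify it.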

    A Temporal Frequent Itemset-Based Clustering Approach For Discovering Event Episodes From News Sequence

    When performing environmental scanning, organizations typically deal with numerous events and topics about their core business, relevant technique standards, competitors, and market, where each event or topic to monitor or track is generally associated with many news documents. To reduce information overload and information fatigue when monitoring or tracking such events, it is essential to develop an effective event episode discovery mechanism for organizing all news documents pertaining to an event of interest. In this study, we propose the time-adjoining frequent itemset-based event-episode discovery (TAFIED) technique. Based on the frequent itemset-based hierarchical clustering (FIHC) approach, our proposed TAFIED further considers the temporal characteristics of news articles, including the burst, novelty, and temporal proximity of features in an event episode, when discovering event episodes from the sequence of news articles pertaining to a specific event. Using the traditional feature-based HAC, HAC with a time-decaying function (HAC+TD), and FIHC techniques as performance benchmarks, our empirical evaluation results suggest that the proposed TAFIED technique outperforms all evaluation benchmarks in cluster recall and cluster precision.

    Microbiologic Characteristics, Serologic Responses, and Clinical Manifestations in Severe Acute Respiratory Syndrome, Taiwan

    The genome of one Taiwanese severe acute respiratory syndrome-associated coronavirus (SARS-CoV) strain (TW1) was 29,729 nt in length. Viral RNA may persist for some time in patients who seroconvert, and some patients may lack an antibody response (immunoglobulin G) to SARS-CoV >21 days after illness onset. An upsurge of antibody response was associated with the aggravation of respiratory failure

    MicroRNA profiling in ischemic injury of the gracilis muscle in rats

    Background: To profile the expression of microRNAs (miRNAs) and their potential target genes in the gracilis muscles following ischemic injury in rats by monitoring miRNA and mRNA expression on a genome-wide basis. Methods: Following 4 h of ischemia and subsequent reperfusion for 4 h of the gracilis muscles, the specimens were analyzed with an Agilent rat miRNA array to detect the expressed miRNAs in the experimental muscles compared to those from the sham-operated controls. Their expressions were subsequently quantified by real-time reverse transcription polymerase chain reaction (real-time RT-PCR) to determine their expression pattern after different durations of ischemia and reperfusion. In addition, the expression of the mRNA in the muscle specimens after 4 h of ischemia and reperfusion for 1, 3, 7, and 14 d were detected with the Agilent Whole Rat Genome 4 × 44 k oligo microarray. A combined approach using a computational prediction algorithm that included miRanda, PicTar, TargetScanS, MirTarget2, RNAhybrid, and the whole genome microarray experiment was performed by monitoring the mRNA:miRNA association to identify potential target genes. Results: Three miRNAs (miR-21, miR-200c, and miR-205) of 350 tested rat miRNAs were found to have an increased expression in the miRNA array. Real-time RT-PCR demonstrated that, with 2-fold increase after 4 h of ischemia, a maximum 24-fold increase at 7 d, and a 7.5-fold increase at 14 d after reperfusion, only the miR-21, but not the miR-200c or miR-205 was upregulated throughout the experimental time. In monitoring the target genes of miR-21 in the expression array at 1, 3, 7, 14 d after reperfusion, with persistent expression throughout the experiment, we detected the same 4 persistently downregulated target genes (Nqo1, Pdpn, CXCL3, and Rad23b) with the prediction algorithms miRanda and RNAhybrid, but no target gene was revealed with PicTar, TargetScanS, and MirTarget2. Conclusions: This study revealed 3 upregulated miRNAs in the gracilis muscle following ischemic injury and identified 4 potential target genes of miR-21 by examining miRNAs and mRNAs expression patterns in a time-course fashion using a combined approach with prediction algorithms and a whole genome expression array experiment.