144 research outputs found

    On the class distribution labelling step sensitivity of co-training

    Co-training can learn from datasets that have a small number of labelled examples and a large number of unlabelled ones. It is an iterative algorithm in which examples labelled in previous iterations are used to improve the classification of examples from the unlabelled set. However, because the number of initial labelled examples is often small, we lack reliable estimates of the underlying population that generated the data. In this work we claim that the proportion in which examples are labelled is a key parameter of co-training. We also ran a series of experiments to investigate how the proportion in which examples are labelled at each step influences co-training performance. Results show that co-training should be used with care in challenging domains.
    IFIP International Conference on Artificial Intelligence in Theory and Practice - Knowledge Acquisition and Data Mining. Red de Universidades con Carreras en Informática (RedUNCI)
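    The labelling step that this abstract singles out can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_to_label`, the `per_class` quota dictionary and the toy confidence values are all assumptions made for illustration. It only shows how changing the per-class quota changes which unlabelled examples would be added to the labelled set in one iteration.

```python
from collections import defaultdict


def select_to_label(predictions, per_class):
    """Pick the most confident predictions for each class.

    predictions: list of (example_id, predicted_label, confidence)
    per_class:   dict mapping label -> number of examples to label this step
                 (hypothetical parameter standing in for the class proportion)
    """
    by_label = defaultdict(list)
    for example_id, label, confidence in predictions:
        by_label[label].append((confidence, example_id))

    chosen = {}
    for label, quota in per_class.items():
        ranked = sorted(by_label[label], reverse=True)  # most confident first
        chosen[label] = [example_id for _, example_id in ranked[:quota]]
    return chosen


# Toy predictions over an unlabelled pool (made-up values).
predictions = [
    ("u1", "pos", 0.95), ("u2", "neg", 0.90), ("u3", "pos", 0.60),
    ("u4", "neg", 0.85), ("u5", "neg", 0.55), ("u6", "pos", 0.80),
]

# Same pool, two different labelling proportions.
one_to_one = select_to_label(predictions, {"pos": 1, "neg": 1})
one_to_three = select_to_label(predictions, {"pos": 1, "neg": 3})
```

    With a 1:3 quota the low-confidence example `u5` is also labelled, which hints at why a proportion that does not match the true class distribution could feed noisier labels into later iterations.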

    A hybrid wrapper/filter approach for feature subset selection

    This work presents a hybrid wrapper/filter algorithm for feature subset selection that can combine several quality criteria to rank the features of a dataset. The ranked features are used to prune the search space of possible feature subsets, so that the number of times the wrapper executes the learning algorithm for a dataset with M features is reduced to O(M) runs. Experimental results on 14 datasets show that, for most of the datasets, the AUC obtained with the reduced feature set is comparable to the AUC of the model built using all the features. Furthermore, the algorithm achieved a good reduction in the number of features.
    Sociedad Argentina de Informática e Investigación Operativa
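    One way a ranking can cut the wrapper's work to O(M) runs is sketched below. The abstract does not spell out the pruning rule, so two assumptions are made here: the criteria are combined by summing rank positions, and the wrapper only evaluates the M nested "top-k" prefixes of the ranking. The criterion names, scores and the `evaluate` function are hypothetical stand-ins for real filter measures and a real learning algorithm.

```python
def combine_rankings(criteria_scores):
    """Merge several per-criterion scores into one feature ranking.

    criteria_scores: dict criterion -> dict feature -> score (higher is better).
    Features are ordered by the sum of their rank positions (assumed rule).
    """
    features = list(next(iter(criteria_scores.values())))
    total_rank = {f: 0 for f in features}
    for scores in criteria_scores.values():
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for position, f in enumerate(ordered):
            total_rank[f] += position
    return sorted(features, key=lambda f: total_rank[f])


def wrapper_over_prefixes(ranking, evaluate):
    """Run the wrapper once per prefix of the ranking: M runs for M features."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(ranking) + 1):
        subset = ranking[:k]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Made-up filter scores for four features under two illustrative criteria.
criteria_scores = {
    "info_gain":   {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.2},
    "correlation": {"a": 0.7, "b": 0.2, "c": 0.8, "d": 0.1},
}

calls = []  # records every wrapper evaluation


def evaluate(subset):
    """Toy stand-in for training a model and measuring its quality."""
    calls.append(list(subset))
    useful = {"a", "c"}
    return len(useful & set(subset)) - 0.1 * len(subset)


ranking = combine_rankings(criteria_scores)
best_subset, best_score = wrapper_over_prefixes(ranking, evaluate)
```

    In this toy run the wrapper is called four times for four features, instead of once for each of the 15 non-empty subsets an exhaustive search would visit.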

    A Method for Refining Knowledge Rules Using Exceptions

    The search for patterns in data sets is a fundamental task in Data Mining, where Machine Learning algorithms are generally used. However, Machine Learning algorithms have biases that favour the classification task and do not take exceptions into consideration. Exceptions contradict common sense rules. They are generally unknown, unexpected and contradictory to the user's beliefs. For this reason, exceptions may be interesting. In this work we propose a method to find exceptions to common sense rules. We also apply the proposed method to a real-world data set to discover rules and exceptions in the HIV protein cleavage process.
    Sociedad Argentina de Informática e Investigación Operativa

    The central region of the msp gene of Treponema denticola has sequence heterogeneity among clinical samples, obtained from patients with periodontitis

    Background: Treponema denticola is an oral spirochete involved in the pathogenesis and progression of periodontal disease. Of its virulence factors, the major surface protein (MSP) plays a role in the interaction between the treponeme and the host. To understand the possible evolution of this protein, we analyzed the sequence of the msp gene in 17 T. denticola-positive clinical samples.
    Methods: The nucleotide and amino acid sequences of MSP were determined by PCR amplification and sequencing in seventeen T. denticola clinical specimens to evaluate the genetic variability and the phylogenetic relationship of the T. denticola msp gene among the amplified sequences from positive samples. In silico antigenic analysis was performed on each MSP sequence to determine possible antigenic variation.
    Results: The msp sequences showed two highly conserved 5' and 3' ends and a central region that varies substantially. Phylogenetic analysis categorized the 17 specimens into 2 principal groups, suggesting a low rate of evolutionary variability and an elevated degree of conservation of msp in clinically derived genetic material. Analysis of the predicted antigenic variability between isolates demonstrated that the major differences lay between amino acids 200 and 300.
    Conclusion: These findings show, for the first time, the nucleotide and amino acid variation of the msp gene in infecting T. denticola in vivo. These data suggest that the antigenic variability found in the MSP molecule may be an important factor in immune evasion by T. denticola.