On the class distribution labelling step sensitivity of co-training
Co-training can learn from datasets having a small number of labelled examples and a large number of unlabelled ones. It is an iterative algorithm where examples labelled in previous iterations are used to improve the classification of examples from the unlabelled set.
However, because the number of initially labelled examples is often small, we do not have reliable estimates of the underlying population that generated the data. In this work, we claim that the proportion in which examples are labelled is a key parameter of co-training.
We also conducted a series of experiments to investigate how the proportion in which examples are labelled at each step influences co-training performance. The results show that co-training should be used with care in challenging domains.
IFIP International Conference on Artificial Intelligence in Theory and Practice - Knowledge Acquisition and Data Mining
Red de Universidades con Carreras en Informática (RedUNCI)
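The abstract does not give the authors' algorithm, but the parameter it studies can be made concrete. The following is a minimal sketch, assuming two feature views per example and a toy nearest-centroid learner standing in for real base classifiers, of a co-training loop in which each classifier labels a fixed number of its most confident unlabelled examples per class at every step. The dictionary `per_step` plays the role of the class proportion of the labelling step. All names (`co_train`, `NearestCentroid`, `per_step`) are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]


class NearestCentroid:
    """Toy base learner (an assumption): scores each class by -squared distance to its centroid."""

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def scores(self, x: Vector) -> Dict[int, float]:
        return {c: -sum((a - b) ** 2 for a, b in zip(x, m))
                for c, m in self.centroids.items()}


def co_train(view1: Sequence[Vector], view2: Sequence[Vector],
             labels: Dict[int, int], unlabelled: Iterable[int],
             per_step: Dict[int, int], iterations: int) -> Dict[int, int]:
    """Co-training where, at each iteration, each view's classifier labels
    per_step[c] of its most confident unlabelled examples predicted as class c."""
    labelled = dict(labels)
    pool = set(unlabelled)
    for _ in range(iterations):
        if not pool:
            break
        # Both classifiers train on the labelled set as it stood at the start of the iteration.
        ids = sorted(labelled)
        y = [labelled[i] for i in ids]
        for view in (view1, view2):
            clf = NearestCentroid().fit([view[i] for i in ids], y)
            for cls, k in per_step.items():
                candidates = []
                for i in pool:
                    s = clf.scores(view[i])
                    if max(s, key=s.get) != cls:
                        continue
                    # Confidence = margin over the best competing class.
                    others = [v for c, v in s.items() if c != cls]
                    margin = s[cls] - max(others) if others else 0.0
                    candidates.append((margin, i))
                candidates.sort(reverse=True)
                for _, i in candidates[:k]:
                    labelled[i] = cls
                    pool.discard(i)
    return labelled
```

For example, `per_step={0: 1, 1: 3}` makes each classifier add one class-0 and three class-1 examples per iteration. If that ratio differs from the true class distribution, the growing labelled set drifts away from it, which is one way the labelling proportion can affect co-training.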
A hybrid wrapper/filter approach for feature subset selection
This work presents a hybrid wrapper/filter algorithm for feature subset selection that can combine several quality measures to rank the features of a dataset. The ranked features are then used to prune the search space of candidate feature subsets, so that for a dataset with M features the wrapper runs the learning algorithm only O(M) times. Experimental results on 14 datasets show that, for most of them, the AUC of the model built on the reduced feature set is comparable to the AUC of the model built on all the features. Furthermore, the algorithm achieved a good reduction in the number of features.
Sociedad Argentina de Informática e Investigación Operativa
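The abstract does not say how the quality measures are combined or how the ranking prunes the subset search. The sketch below shows one way to obtain O(M) wrapper runs under two stated assumptions: features are ranked by their mean position across the filter criteria, and the wrapper makes a single forward pass over that ranking, keeping a feature only if it improves the score. `combined_ranking`, `ranked_wrapper_selection`, and the `evaluate` callback (which would train the learner and return, e.g., its AUC) are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def combined_ranking(criteria: Sequence[Dict[str, float]]) -> List[str]:
    """Rank features by their mean position across several filter criteria.

    Each dict maps feature name -> criterion value (higher = more relevant).
    Mean-rank aggregation is an assumption, not taken from the paper.
    """
    features = list(criteria[0])
    mean_rank = {f: 0.0 for f in features}
    for scores in criteria:
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for position, f in enumerate(ordered):
            mean_rank[f] += position / len(criteria)
    # sorted() is stable, so ties keep the key order of the first criterion.
    return sorted(features, key=lambda f: mean_rank[f])


def ranked_wrapper_selection(
    ranked: Sequence[str], evaluate: Callable[[List[str]], float]
) -> Tuple[List[str], float]:
    """Single forward pass over the ranking: keep a feature only if adding it
    improves the wrapper score. `evaluate` is called exactly len(ranked) times."""
    selected: List[str] = []
    best = float("-inf")
    for f in ranked:
        score = evaluate(selected + [f])
        if score > best:
            best, selected = score, selected + [f]
    return selected, best
```

Because each feature is tried once, in ranked order, the learner is trained M times instead of the exponential number of runs an exhaustive subset search would need; the trade-off is that a feature rejected early is never reconsidered.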
A Method for Refining Knowledge Rules Using Exceptions
The search for patterns in datasets is a fundamental task in Data Mining, where Machine Learning algorithms are generally used. However, Machine Learning algorithms have biases geared toward the classification task that do not take exceptions into consideration. Exceptions contradict common sense rules: they are generally unknown, unexpected, and contrary to the user's beliefs, which is why they may be interesting. In this work we propose a method to find exceptions to common sense rules. We also apply the proposed method to a real-world dataset to discover rules and exceptions in the HIV protein cleavage process.
Sociedad Argentina de Informática e Investigación Operativa
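The abstract does not describe how exceptions are found. One common formulation, used here purely as an illustration and not necessarily the authors' method, treats a common sense rule as `A -> c` and an exception as an extra condition `B` such that records matching both `A` and `B` mostly belong to a class other than `c`. The sketch below searches single attribute-value conditions for `B`; `find_exceptions`, its thresholds, and the toy records in the usage note are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

Record = Mapping[str, str]


def matches(record: Record, conditions: Mapping[str, str]) -> bool:
    """True if the record satisfies every attribute=value condition."""
    return all(record.get(k) == v for k, v in conditions.items())


def find_exceptions(records: Sequence[Record], antecedent: Mapping[str, str],
                    consequent: str, target: str = "class",
                    min_support: int = 2, min_confidence: float = 0.8
                    ) -> List[Tuple[str, str, str, float]]:
    """Return (attribute, value, exception_class, confidence) for each single
    condition B such that records matching antecedent AND B mostly belong to a
    class other than `consequent` (the common sense rule antecedent -> consequent)."""
    covered = [r for r in records if matches(r, antecedent)]
    candidates = sorted({(k, v) for r in covered for k, v in r.items()
                         if k != target and k not in antecedent})
    exceptions = []
    for attr, value in candidates:
        subset = [r for r in covered if r.get(attr) == value]
        if len(subset) < min_support:
            continue  # too few records to trust the exception
        counts: Dict[str, int] = {}
        for r in subset:
            counts[r[target]] = counts.get(r[target], 0) + 1
        # Majority class of the subset (ties broken by class name).
        cls, n = max(sorted(counts.items()), key=lambda kv: kv[1])
        confidence = n / len(subset)
        if cls != consequent and confidence >= min_confidence:
            exceptions.append((attr, value, cls, confidence))
    return exceptions
```

On toy records where most round fruit is labelled "apple" but the two orange-coloured ones are labelled "orange", the rule `shape=round -> apple` yields the exception `colour=orange -> orange` with confidence 1.0, while the single green record falls below `min_support`.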
Recommended from our members
The Relationship Between Stigma and Health-Related Quality of Life in People Living with HIV Who Have Full Access to Antiretroviral Treatment: An Assessment of Earnshaw and Chaudoir's HIV Stigma Framework Using Empirical Data.
The aim was to empirically test the tenets of Earnshaw and Chaudoir's HIV stigma framework, and its potential covariates, for persons living with HIV in Sweden. Partial least squares structural equation modelling was applied to survey data from 173 persons living with HIV in Sweden. Experiences of stigma were reported to a greater extent by younger persons and by women who had migrated to Sweden. As expected, anticipated stigma was related to lower Physical functioning, and internalized stigma to lower Emotional wellbeing. Contrary to the hypotheses of the HIV stigma framework, enacted stigma was not related to Physical functioning, and no relationships were found between HIV-related stigma and antiretroviral adherence. These results indicate that the HIV stigma framework may need to be revised for contexts where a very high proportion of persons living with HIV are diagnosed and under effective treatment.
The central region of the msp gene of Treponema denticola has sequence heterogeneity among clinical samples, obtained from patients with periodontitis
Background: Treponema denticola is an oral spirochete involved in the pathogenesis and progression of periodontal disease. Among its virulence factors, the major surface protein (MSP) plays a role in the interaction between the treponeme and the host. To understand the possible evolution of this protein, we analyzed the sequence of the msp gene in 17 T. denticola-positive clinical samples.
Methods: The nucleotide and amino acid sequences of MSP were determined by PCR amplification and sequencing in seventeen T. denticola clinical specimens, to evaluate the genetic variability and phylogenetic relationships of the T. denticola msp gene among the amplified sequences from positive samples. In silico antigenic analysis was performed on each MSP sequence to determine possible antigenic variation.
Results: The msp sequences showed two highly conserved 5' and 3' ends and a central region that varies substantially. Phylogenetic analysis grouped the 17 specimens into 2 principal groups, suggesting a low rate of evolutionary variability and a high degree of conservation of msp in clinically derived genetic material. Analysis of the predicted antigenic variability between isolates demonstrated that the major differences lay between amino acids 200 and 300.
Conclusion: These findings show, for the first time, the nucleotide and amino acid variation of the msp gene in infecting T. denticola in vivo. These data suggest that the antigenic variability found in the MSP molecule may be an important factor in immune evasion by T. denticola.
- …