19 research outputs found

    Decision Support System Design for Informatics Student Final Projects Using C4.5 Algorithm

    Academic consultation between students and academic supervisors is necessary to help students carry out their academic activities. Grade transcripts show that many students choose a final project/thesis specialization field that does not match their academic abilities, which produces many inconsistencies between course grades and final project specialization fields. The purpose of this research is to reduce student subjectivity in choosing final project supervisors and to minimize the inconsistencies between course grades and final project specialization fields. The method used is classification data mining with the C4.5 decision tree algorithm, with courses, course grades, and specialization courses as the attributes. The C4.5 algorithm transforms tabular data into a tree model and then converts the tree model into rules. The algorithm was successfully implemented in the specialization field decision support system, with an accuracy rate of 70% on the total calculation data. The data used in this research is a sample from several senior students in the Informatics program at Ubhara-Jaya. The results of the decision support system can serve as a recommendation for the Informatics program and for senior students in directing their final project research. Further research is expected to use more sample data to improve the accuracy rate and to implement the system in website or mobile-based applications.
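
The split criterion that C4.5 uses to build its tree, the gain ratio, can be sketched as follows. This is a minimal illustration for categorical attributes only, not the study's implementation; the attribute names (`db_grade`, `ai_grade`, `field`) and the four records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` on a categorical `attribute`."""
    total = len(rows)
    # Group the target labels by the value of the candidate attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Information gain = entropy before the split - weighted entropy after it.
    base = entropy([row[target] for row in rows])
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = base - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(p) / total * math.log2(len(p) / total)
                      for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (not data from the study).
students = [
    {"db_grade": "A", "ai_grade": "B", "field": "data"},
    {"db_grade": "A", "ai_grade": "A", "field": "data"},
    {"db_grade": "C", "ai_grade": "A", "field": "ai"},
    {"db_grade": "C", "ai_grade": "B", "field": "ai"},
]

for attr in ("db_grade", "ai_grade"):
    print(attr, round(gain_ratio(students, attr, "field"), 3))
```

In this toy data `db_grade` separates the two fields perfectly, so it would be chosen as the root split; each root-to-leaf path of the finished tree then reads as one rule.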

    Prediction of Metabolic Pathways Involvement in Prokaryotic UniProtKB Data by Association Rule Mining

    The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage while minimizing erroneous functional assignments. This trade-off poses a great challenge in designing intelligent systems for automatic protein annotation. In this work, we present a system that uses rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge takes the form of predictive models that assign pathway involvement to UniProtKB entries. We evaluated the performance of our system using cross-validation and found that it achieved very promising results in pathway identification, with an F1-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, of which 436,510 lacked any previous pathway annotation.
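
Association rules of the kind described above are usually scored by support and confidence. The sketch below shows that scoring for a single rule; the entry layout, the feature names, and the pathway label are hypothetical, and the abstract does not say which measures or thresholds the authors used.

```python
def rule_metrics(entries, antecedent, consequent):
    """Return (support, confidence) for the rule `antecedent -> consequent`.

    entries:    list of dicts with a "features" set and a "pathways" set
    antecedent: iterable of features that must all be present
    consequent: pathway label predicted by the rule
    """
    antecedent = set(antecedent)
    # Entries that satisfy the left-hand side of the rule.
    matches = [e for e in entries if antecedent <= e["features"]]
    # Of those, the entries that also carry the predicted pathway.
    hits = [e for e in matches if consequent in e["pathways"]]
    support = len(hits) / len(entries) if entries else 0.0
    confidence = len(hits) / len(matches) if matches else 0.0
    return support, confidence


# Hypothetical annotated entries (not real UniProtKB records).
entries = [
    {"features": {"domain_A", "taxon_X"}, "pathways": {"pathway_P"}},
    {"features": {"domain_A", "taxon_X"}, "pathways": {"pathway_P"}},
    {"features": {"domain_A", "taxon_X"}, "pathways": set()},
    {"features": {"domain_B"}, "pathways": {"pathway_Q"}},
]

support, confidence = rule_metrics(entries, {"domain_A", "taxon_X"}, "pathway_P")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Raising the confidence threshold for accepted rules trades annotation coverage for fewer erroneous assignments, which is the trade-off the abstract describes.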

    Comparative analysis of machine learning algorithms used in the diagnosis of Cervical cancers

    Cervical cancer is regarded as one of the cancer types with the highest mortality rates. It is the second most common cancer among women, after breast cancer. Today, the analysis of biomedical datasets with machine learning methods has become widespread. Prediction systems play an important role in the early diagnosis of malignant diseases such as cancer, and predictions based on the identified risk factors for cervical cancer can be consistent. This study compares the performance of machine learning methods used in the diagnosis of cervical cancer. The 23 machine learning algorithms used in the study were tested on a data set with 838 samples, 32 features, and 4 target variables. The analysis consisted of three stages: data preprocessing, feature selection, and classification. Classification performance was compared using classification accuracy, precision, sensitivity, and F-measure. The analysis determined that the RepTree algorithm was the most successful model.
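
The four metrics named above can be computed from a confusion matrix as in the following sketch. It assumes a single binary target for brevity, whereas the study reports four target variables, and the label vectors are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall), and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and sensitivity.
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f_measure": f_measure}


# Made-up labels: 1 = positive diagnosis, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

On imbalanced medical data, accuracy alone can look high while sensitivity is poor, which is one reason to report all four values side by side.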

    Deep Sequencing of the Vaginal Microbiota of Women with HIV

    BACKGROUND: Women living with HIV and co-infected with bacterial vaginosis (BV) are at higher risk for transmitting HIV to a partner or newborn. It is poorly understood which bacterial communities constitute BV or the normal vaginal microbiota among this population and how the microbiota associated with BV responds to antibiotic treatment. METHODS AND FINDINGS: The vaginal microbiota of 132 HIV-positive Tanzanian women, including 39 who received metronidazole treatment for BV, were profiled using Illumina to sequence the V6 region of the 16S rRNA gene. Of note, Gardnerella vaginalis and Lactobacillus iners were detected in each sample, constituting core members of the vaginal microbiota. Eight major clusters were detected with relatively uniform microbiota compositions. Two clusters dominated by L. iners or L. crispatus were strongly associated with a normal microbiota. The L. crispatus-dominated microbiota were associated with low pH, but when L. crispatus was not present, a large fraction of L. iners was required to predict a low pH. Four clusters were strongly associated with BV, and were dominated by Prevotella bivia, Lachnospiraceae, or a mixture of different species. Metronidazole treatment reduced the microbial diversity and perturbed the BV-associated microbiota, but rarely resulted in the establishment of a lactobacilli-dominated microbiota. CONCLUSIONS: Illumina-based microbial profiling enabled high-throughput analyses of microbial samples at a high phylogenetic resolution. The vaginal microbiota among women living with HIV in Sub-Saharan Africa constitutes several profiles associated with a normal microbiota or BV. Recurrence of BV frequently constitutes a different BV-associated profile than before antibiotic treatment.

    A Computational Strategy for Protein Function Assignment Which Addresses the Multidomain Problem

    A method is presented for assigning functions to unknown sequences by finding correlations between short signals and functional annotations in a protein database. The approach is based on the keyword (KW) and feature (FT) information stored in the SWISS-PROT database. The former refers to particular protein characteristics and the latter locates these characteristics at a specific sequence position. A keyword is therefore assigned to a sequence only if sequence similarity is found in the position described by the FT field. Exhaustive tests performed over sequences with homologues (cluster set) and without homologues (singleton set) in the database show that assigning functions is much "cleaner" when information about domains (FT field) is used than when only the keywords are used.
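
The position check described above, where a keyword is transferred only if the similarity region falls on the annotated feature, can be sketched as an interval test. The function name, the tuple layout, and the `min_coverage` threshold are assumptions made for illustration; the abstract does not state how much of the FT region has to be covered.

```python
def transfer_keywords(hit_start, hit_end, features, min_coverage=1.0):
    """Return keywords whose feature region is covered by the aligned region.

    hit_start, hit_end: aligned region on the annotated protein (1-based, inclusive)
    features:           list of (keyword, ft_start, ft_end) tuples
    min_coverage:       fraction of the feature that must lie inside the alignment
    """
    assigned = []
    for keyword, ft_start, ft_end in features:
        # Length of the intersection of the two closed intervals.
        overlap = min(hit_end, ft_end) - max(hit_start, ft_start) + 1
        length = ft_end - ft_start + 1
        if overlap > 0 and overlap / length >= min_coverage:
            assigned.append(keyword)
    return assigned


# Hypothetical multidomain protein with two annotated features.
features = [("Kinase", 10, 100), ("Zinc-finger", 300, 330)]

# The alignment covers only the first domain, so only its keyword is transferred.
print(transfer_keywords(5, 120, features))
```

A keyword-only approach would transfer both annotations from this protein to any similar sequence, which is the multidomain problem named in the title.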

    GrAPFI: predicting enzymatic function of proteins from domain similarity graphs

    This work is dedicated to the memory of David W. Ritchie, who recently passed away. Background: Thanks to recent developments in genomic sequencing technologies, the number of protein sequences in public databases is growing enormously. To enrich and exploit this immensely valuable data, it is essential to annotate these sequences with functional properties such as Enzyme Commission (EC) numbers. The January 2019 release of the Uniprot Knowledge base (UniprotKB) contains around 140 million protein sequences. However, only about half a million of these (UniprotKB/SwissProt) have been reviewed and functionally annotated by expert curators using data extracted from the literature and computational analyses. To reduce the gap between the annotated and unannotated protein sequences, it is essential to develop accurate automatic protein function annotation techniques. Results: In this work, we present GrAPFI (Graph-based Automatic Protein Function Inference) for automatically annotating proteins with EC number functional descriptors from a protein domain similarity graph. We validated the performance of GrAPFI using six reference proteomes in UniprotKB/SwissProt, namely Human, Mouse, Rat, Yeast, E. coli, and Arabidopsis thaliana. We also compared GrAPFI with existing EC prediction approaches such as ECPred, DEEPre, and SVMProt. The comparison shows that GrAPFI achieves better accuracy and comparable or better coverage with respect to these earlier approaches. Conclusions: GrAPFI is a novel protein function annotation tool that performs automatic inference on a network of proteins that are related according to their domain composition. Our evaluation of GrAPFI shows that it gives better performance than other state-of-the-art methods. GrAPFI is available at https://gitlab.inria.fr/bsarker/bmc_grapfi.git as a stand-alone tool written in Python.
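
One way to picture inference over a domain similarity graph is neighbour voting: link proteins by the similarity of their domain sets and let annotated neighbours vote for an EC number. The sketch below assumes Jaccard similarity and a fixed edge threshold, both chosen for illustration; it is not the GrAPFI code, and the protein and domain identifiers are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of domain identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_ec(query_domains, annotated, min_similarity=0.3):
    """Predict an EC number for a query protein from its annotated neighbours.

    annotated:      dict mapping protein id -> (set of domains, EC number)
    min_similarity: edges weaker than this are ignored (assumed threshold)
    """
    scores = defaultdict(float)
    for domains, ec in annotated.values():
        similarity = jaccard(query_domains, domains)
        if similarity >= min_similarity:
            # Each neighbour votes for its EC number, weighted by edge strength.
            scores[ec] += similarity
    return max(scores, key=scores.get) if scores else None


# Hypothetical reference proteins; the EC labels are placeholders for the demo.
annotated = {
    "P1": ({"d1", "d2"}, "1.1.1.1"),
    "P2": ({"d1", "d2", "d3"}, "1.1.1.1"),
    "P3": ({"d4"}, "2.7.11.1"),
}

print(predict_ec({"d1", "d2"}, annotated))
print(predict_ec({"d9"}, annotated))
```

A query with no sufficiently similar neighbour gets no prediction, so the threshold controls the balance between accuracy and coverage that the abstract reports.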