
    Mining SOM expression portraits: Feature selection and integrating concepts of molecular function

    Background: 
Self-organizing maps (SOMs) enable the straightforward portrayal of high-dimensional data from large sample collections in terms of sample-specific images. Analysis of their texture yields so-called spot-clusters of co-expressed genes, which require subsequent significance filtering and functional interpretation. We address feature selection in terms of the gene ranking problem, and the interpretation of the resulting spot-related lists using concepts of molecular function.
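As a rough illustration of how a SOM turns gene expression profiles into sample-specific portraits, here is a minimal online SOM in plain Python. It is a sketch under stated assumptions, not the authors' implementation: the grid size, the learning-rate and neighborhood schedules, and the helper names (`train_som`, `best_matching_unit`, `portrait`) are hypothetical. Each grid node holds a metagene vector with one value per sample, and a sample's portrait is that sample's metagene values laid out on the grid.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid node whose metagene vector is closest (Euclidean) to profile x."""
    return min(weights, key=lambda n: sum((a - b) ** 2 for a, b in zip(weights[n], x)))

def train_som(profiles, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Minimal online SOM with hypothetical default parameters.

    profiles: one expression profile per gene, each with one value per sample.
    Returns a dict mapping grid node (row, col) -> metagene vector.
    """
    rng = random.Random(seed)
    dim = len(profiles[0])
    # Initialize every node with a copy of a randomly chosen gene profile.
    weights = {(r, c): list(rng.choice(profiles)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        order = list(range(len(profiles)))
        rng.shuffle(order)
        for i in order:
            x = profiles[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking Gaussian neighborhood
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    # Move the metagene a fraction lr*h of the way toward the gene profile.
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights

def portrait(weights, sample_index, rows, cols):
    """Expression portrait of one sample: its metagene value at every grid node."""
    return [[weights[(r, c)][sample_index] for c in range(cols)] for r in range(rows)]
```

Genes can then be grouped into candidate spot-clusters by their best-matching unit, for example `best_matching_unit(weights, profile)`, before any significance filtering.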

Results: 
Different expression scores, based either on simple fold-change measures or on regularized Student's t-statistics, are applied to spot-related gene lists and compared, with special emphasis on the error characteristics of microarray expression data. The spot-clusters are analyzed using different methods of gene set enrichment analysis, with the focus on overexpression and/or overrepresentation of predefined sets of genes. Metagene-related overrepresentation of selected gene sets was mapped into the SOM images to assign gene function to different regions. Alternatively, we estimated set-related overexpression profiles over all samples studied using a gene set enrichment score. This score was also applied to the spot-clusters to generate lists of enriched gene sets. We used the tissue body index data set, a collection of expression data from human tissues, as an illustrative example. We found that tissue-related spots typically contain enriched populations of gene sets that correspond well to molecular processes in the respective tissues. In addition, we display special sets of housekeeping genes and of consistently weakly and highly expressed genes using SOM data filtering.
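The contrast between the two kinds of gene score can be sketched as follows. The abstract does not state which regularization is used, so this assumes a SAM-style fudge factor `s0` added to the standard error of a Welch-type t-statistic; the function names and the toy data are hypothetical. On log-scale values the fold change is a difference of group means, while the regularized t divides it by a noise estimate, so a large but noisy change can rank below a small but consistent one.

```python
import math
from statistics import mean, stdev

def fold_change(sample_logs, reference_logs):
    """Log fold change: difference of mean log-scale expression values."""
    return mean(sample_logs) - mean(reference_logs)

def regularized_t(sample_logs, reference_logs, s0=0.1):
    """Welch-type t-statistic with a SAM-style fudge factor s0 added to the
    standard error, so genes with tiny variance do not get huge scores.
    s0 = 0.1 is a hypothetical choice."""
    se = math.sqrt(stdev(sample_logs) ** 2 / len(sample_logs)
                   + stdev(reference_logs) ** 2 / len(reference_logs))
    return fold_change(sample_logs, reference_logs) / (se + s0)

def rank_genes(expr, group, reference, score=regularized_t):
    """Return gene IDs sorted by decreasing score of `group` versus `reference` columns."""
    scores = {
        gene: score([values[i] for i in group], [values[i] for i in reference])
        for gene, values in expr.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy log-expression matrix: columns 0-2 are the sample group, 3-5 the reference.
expr = {
    "geneA": [5.0, 5.1, 5.0, 4.0, 4.1, 4.0],  # small but consistent increase
    "geneB": [9.0, 3.0, 6.0, 3.0, 4.0, 2.0],  # large but noisy increase
}
print(rank_genes(expr, [0, 1, 2], [3, 4, 5], score=fold_change))  # ['geneB', 'geneA']
print(rank_genes(expr, [0, 1, 2], [3, 4, 5]))                     # ['geneA', 'geneB']
```

The choice of `s0` controls how strongly low-variance genes are penalized; with `s0 = 0` this reduces to the plain Welch statistic.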

Conclusions:
The presented methods allow comprehensive downstream analysis of SOM-transformed expression data in terms of cluster-related gene lists and enriched gene sets for functional interpretation. SOM clustering makes it possible either to define new gene sets from selected SOM spots or to verify and/or amend existing ones.
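Checking whether a spot-cluster is enriched for a predefined gene set is commonly done with an overrepresentation test. This sketch assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the paper compares several enrichment methods, and this is only one common choice. The function and argument names are hypothetical.

```python
from math import comb

def overrepresentation_p(spot, gene_set, universe):
    """One-sided hypergeometric p-value: probability that a random draw of
    len(spot) genes from `universe` contains at least as many members of
    `gene_set` as the spot actually does."""
    spot, gene_set, universe = set(spot) & set(universe), set(gene_set) & set(universe), set(universe)
    N, K, n = len(universe), len(gene_set), len(spot)
    k = len(spot & gene_set)
    # Sum P(X = i) for i = k .. min(K, n); comb() returns 0 for impossible draws.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

universe = [f"g{i}" for i in range(10)]
print(overrepresentation_p(["g0", "g1", "g2"], ["g0", "g1", "g2"], universe))  # 1/120
```

In practice the p-values for many gene sets would also need multiple-testing correction.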

    Applying Genetic Algorithm to Generation of High-Dimensional Item Response Data

    Item response data are nm-dimensional data consisting of the responses made by m examinees to a questionnaire of n items. They are used to estimate examinee abilities and item parameters in educational evaluation. For such estimates to be valid, the simulation input data must reflect reality. This paper presents an effective combination of the genetic algorithm (GA) and Monte Carlo methods for generating item response data as simulation input that resembles real data. To this end, we generated four types of item response data using Monte Carlo and the GA, and evaluated how closely the generated data reproduce the real item response data in terms of item parameters (item difficulty and discrimination). We adopt two measures, root mean square error and Kullback-Leibler divergence, to compare the item parameters of the real data with those of the four types of generated data. The results show that applying the GA to an initial population generated by Monte Carlo is the most effective way to generate item response data similar to real item response data. This study is meaningful in showing that the GA contributes to the generation of more realistic simulation input data.
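The comparison step can be sketched as below. The abstract does not say how difficulty and discrimination were estimated or how the KL divergence was computed, so this assumes classical test theory difficulty (proportion of correct answers per item), plain RMSE between parameter vectors, and KL divergence between smoothed histograms of the parameter values. All names, the bin count, and the smoothing constant `eps` are hypothetical.

```python
import math

def item_difficulty(responses):
    """Proportion of examinees answering each item correctly.
    responses: one row per examinee, each a list of 0/1 scores for n items."""
    m, n = len(responses), len(responses[0])
    return [sum(row[j] for row in responses) / m for j in range(n)]

def rmse(a, b):
    """Root mean square error between two equally long parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def histogram(values, bins=5, lo=0.0, hi=1.0, eps=1e-6):
    """Smoothed probability histogram; eps keeps every bin non-zero for KL."""
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[k] += 1
    total = len(values) + eps * bins
    return [(c + eps) / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; note it is not symmetric."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

real = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
generated = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0]]
d_real, d_gen = item_difficulty(real), item_difficulty(generated)
print(d_real, d_gen)  # [0.75, 0.25, 0.5] [1.0, 0.5, 0.5]
print(rmse(d_real, d_gen), kl_divergence(histogram(d_real), histogram(d_gen)))
```

Lower values of both measures indicate generated data whose item parameters are closer to the real ones.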

    Prediction of Antimicrobial Peptides Based on Sequence Alignment and Secondary Structure Sequence and Segment Sequence

    Antimicrobial peptides (AMPs) are natural peptides that are important for the immune system. Researchers are interested in designing alternative drugs based on AMPs because more and more bacteria are becoming resistant to the available antibiotics. However, experiments to extract AMPs from protein sequences are time-consuming and costly. An effective and accurate computational tool for predicting novel AMPs is therefore in high demand, to provide more candidates and useful insights for drug design. In this study, a new algorithm is proposed as a computational tool that integrates the sequence alignment method with the secondary structure sequence (SSS) and segment sequence (SS). Sequence alignment classifies test sequences based on the maximum high-scoring segment pair (HSP) score predicted by the Basic Local Alignment Search Tool for proteins (BLASTP). The sequence alignment phase achieved 91.02% on the normal dataset, 80.88% on the training set with <0.7 sequence similarity, and 96.02% on the CAMP dataset. Because the sequence alignment method cannot predict all sequences, the unpredicted sequences are then predicted using SSS and SS features. Feature extraction and feature selection are performed to obtain these features, which are used to train an SVM model that classifies each sequence as AMP or non-AMP. The overall results of the independent test are 83.27% for the normal dataset, 71.83% for the dataset with <0.7 sequence similarity, and 91.49% for the CAMP dataset. Compared with the second phase of past research, which was combined with the sequence alignment method, this study yields relatively low results (<27%) when predicting with SSS and SS features alone. This indicates that the proposed algorithm is not suitable for use as an AMP predictor.
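The two-phase decision logic described above can be sketched as follows. This is not the authors' code: it assumes BLASTP output has already been parsed into `hits` (training-sequence ID to best HSP score), introduces a hypothetical score cutoff `min_score` for deciding that alignment "cannot predict" a sequence, uses a hypothetical three-state (H/E/C) composition as a stand-in for the SSS features, and represents the trained SVM by a `fallback` callable (in practice, for example, the `predict` method of a trained scikit-learn `SVC`).

```python
from typing import Callable, Dict, List, Optional

def alignment_label(hits: Dict[str, float], train_labels: Dict[str, int],
                    min_score: float) -> Optional[int]:
    """Phase 1: copy the label (1 = AMP, 0 = non-AMP) of the training sequence
    with the maximum HSP score; return None if there is no usable hit."""
    if not hits:
        return None
    best_id = max(hits, key=hits.get)
    if hits[best_id] < min_score:
        return None
    return train_labels[best_id]

def sss_composition(sss: str) -> List[float]:
    """Hypothetical SSS features: fractions of helix (H), strand (E), coil (C)."""
    n = len(sss)
    return [sss.count(symbol) / n for symbol in "HEC"]

def predict(hits: Dict[str, float], train_labels: Dict[str, int], sss: str,
            fallback: Callable[[List[float]], int], min_score: float = 50.0) -> int:
    """Use alignment when it gives an answer; otherwise fall back to the
    feature-based classifier (phase 2)."""
    label = alignment_label(hits, train_labels, min_score)
    if label is not None:  # label 0 is a valid answer, so test for None explicitly
        return label
    return fallback(sss_composition(sss))
```

Keeping the classifier as a plain callable separates the alignment logic from whichever model is trained on the SSS and SS features.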

    An artificial intelligence tool for heterogeneous team formation in the classroom

    Nowadays, there is increasing interest in developing teamwork skills in the educational context. This growing interest is motivated by the pedagogical effectiveness of teamwork and by the fact that, in labour contexts, enterprises organize their employees in teams to carry out complex projects. Despite its crucial importance in the classroom and in industry, the team formation process lacks support. Not only do many factors influence team performance, but the problem also becomes exponentially costly if teams are to be optimized. In this article, we propose a tool that aims to fill this gap. It combines artificial intelligence techniques such as coalition structure generation, Bayesian learning, and Belbin's role theory to facilitate the generation of working groups in an educational context. The tool improves on current state-of-the-art proposals in three ways: i) it takes into account feedback from teammates, instead of self-perception questionnaires, to establish each student's predominant role; ii) it handles uncertainty regarding each student's predominant team role; iii) it is iterative, since it considers information from several interactions to improve the estimation of role assignments. We tested the performance of the proposed tool in an experiment involving students who took part in three different team activities. The experiments suggest that the proposed tool can improve different aspects of teamwork, such as team dynamics and student satisfaction.
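One way to picture the Bayesian handling of teammate feedback is a Dirichlet-categorical model over Belbin's nine team roles. The abstract does not specify the model, so this is an assumption: each peer vote for a role adds one pseudo-count, the posterior mean gives a probability per role (capturing the uncertainty), and calling `update` again after each team activity makes the estimate iterative. The class name and the uniform prior are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

BELBIN_ROLES = [
    "Plant", "Resource Investigator", "Co-ordinator", "Shaper",
    "Monitor Evaluator", "Teamworker", "Implementer",
    "Completer Finisher", "Specialist",
]

class RoleBelief:
    """Hypothetical Dirichlet belief about one student's predominant role."""

    def __init__(self, prior: float = 1.0) -> None:
        # Uniform Dirichlet prior: the same pseudo-count for every role.
        self.alpha: Dict[str, float] = {role: prior for role in BELBIN_ROLES}

    def update(self, peer_votes: Iterable[str]) -> None:
        """Add one pseudo-count per teammate vote (raises KeyError for unknown roles)."""
        for role, n in Counter(peer_votes).items():
            self.alpha[role] += n

    def posterior(self) -> Dict[str, float]:
        """Posterior mean probability of each role."""
        total = sum(self.alpha.values())
        return {role: a / total for role, a in self.alpha.items()}

    def predominant(self) -> str:
        """Role with the highest posterior probability."""
        post = self.posterior()
        return max(post, key=post.get)

belief = RoleBelief()
belief.update(["Shaper", "Shaper", "Plant"])  # feedback after a first activity
print(belief.predominant())                   # Shaper
belief.update(["Plant", "Plant"])             # feedback after a second activity
print(belief.predominant())                   # Plant
```

A team formation step could then draw on these per-student role probabilities when searching for heterogeneous groups.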

    2014 Annual Research Symposium Abstract Book

    2014 annual volume of abstracts for science research projects conducted by students at Trinity College.

    AI driven B-cell Immunotherapy Design

    Antibodies, a prominent class of approved biologics, play a crucial role in detecting foreign antigens. The effectiveness of antigen neutralisation and elimination hinges upon the strength, sensitivity, and specificity of the paratope-epitope interaction, which demands resource-intensive experimental techniques for characterisation. In recent years, artificial intelligence and machine learning methods have made significant strides, revolutionising the prediction of protein structures and their complexes. The past decade has also witnessed the evolution of computational approaches aiming to support immunotherapy design. This review focuses on the progress of machine learning-based tools and their frameworks in the domain of B-cell immunotherapy design, encompassing linear and conformational epitope prediction, paratope prediction, and antibody design. We mapped the most commonly used data sources, evaluation metrics, and method availability, and thoroughly assessed their significance and limitations, discussing the main challenges ahead.