Mining SOM expression portraits: Feature selection and integrating concepts of molecular function
Background: 
Self-organizing maps (SOMs) enable the straightforward portrayal of high-dimensional data from large sample collections as sample-specific images. Analyzing the texture of these images yields so-called spot-clusters of co-expressed genes, which require subsequent significance filtering and functional interpretation. We address feature selection in terms of the gene ranking problem and interpret the resulting spot-related lists using concepts of molecular function.

Results: 
Different expression scores, based either on simple fold-change measures or on regularized Student's t-statistics, are applied to spot-related gene lists and compared, with special emphasis on the error characteristics of microarray expression data. The spot-clusters are analyzed using different methods of gene set enrichment analysis, focusing on overexpression and/or overrepresentation of predefined sets of genes. Metagene-related overrepresentation of selected gene sets was mapped into the SOM images to assign gene function to different regions. Alternatively, we estimated set-related overexpression profiles over all samples studied using a gene set enrichment score. This score was also applied to the spot-clusters to generate lists of enriched gene sets. We used the tissue body index data set, a collection of expression data of human tissues, as an illustrative example. We found that tissue-related spots typically contain enriched populations of gene sets that correspond well to molecular processes in the respective tissues. In addition, we display special sets of housekeeping genes and of consistently weakly and highly expressed genes using SOM data filtering.

Conclusions:
The presented methods allow the comprehensive downstream analysis of SOM-transformed expression data in terms of cluster-related gene lists and enriched gene sets for functional interpretation. SOM clustering makes it possible either to define new gene sets using selected SOM spots or to verify and/or amend existing ones.
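The abstract above refers to overrepresentation of predefined gene sets within a spot-cluster but does not say which test is used. One common choice is a one-sided hypergeometric test; the following is a minimal sketch under that assumption. The function name, parameter names, and the counts in the demo are hypothetical and not taken from the paper, and the sketch assumes every gene in the set belongs to the gene universe.

```python
from math import comb


def overrepresentation_pvalue(n_universe, n_set, n_spot, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes on the map (hypothetical name)
    n_set:      number of genes in the predefined gene set
    n_spot:     number of genes in the spot-cluster
    n_overlap:  number of genes shared by the set and the spot
    """
    total = comb(n_universe, n_spot)
    upper = min(n_set, n_spot)
    # Sum the probabilities of drawing k set members for k = n_overlap..upper.
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_spot - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(overrepresentation_pvalue(20000, 150, 300, 12))
```

A small p-value suggests the gene set is found in the spot more often than chance would predict; in practice the values would still need correction for testing many gene sets.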
Applying Genetic Algorithm to Generation of High-Dimensional Item Response Data
Item response data are nm-dimensional data based on the responses of m examinees to a questionnaire consisting of n items. They are used to estimate examinee ability and item parameters in educational evaluation. For the estimates to be valid, the simulation input data must reflect reality. This paper presents an effective combination of the genetic algorithm (GA) and Monte Carlo methods for generating item response data that serve as simulation input and resemble real data. To this end, we generated four types of item response data using Monte Carlo and the GA and evaluated how closely the generated data resemble the real data in terms of the item parameters (item difficulty and discrimination). We use two measures, root mean square error and Kullback-Leibler divergence, to compare item parameters between the real data and the four types of generated data. The results show that applying the GA to an initial population generated by Monte Carlo is the most effective way to generate item response data similar to real item response data. This study is meaningful in that it shows that the GA contributes to the generation of more realistic simulation input data.
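The Monte Carlo step and the two comparison measures named in the abstract above can be sketched as follows. The abstract does not name the response model; this sketch assumes the two-parameter logistic (2PL) model, which uses exactly a discrimination and a difficulty parameter per item. All function names are hypothetical, the GA step is not shown, and how the authors turned item parameters into distributions for the Kullback-Leibler comparison is not stated.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model):
    theta = examinee ability, a = discrimination, b = difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(abilities, items, rng):
    """Monte Carlo draw of an m x n matrix of 0/1 responses.

    abilities: list of m ability values
    items:     list of n (a, b) tuples
    rng:       a random.Random instance, passed in for reproducibility
    """
    return [
        [1 if rng.random() < p_correct(theta, a, b) else 0 for (a, b) in items]
        for theta in abilities
    ]


def rmse(xs, ys):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q).

    Assumes p and q are probability vectors and q > 0 wherever p > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    rng = random.Random(0)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(5)]   # m = 5 examinees
    items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.7)]          # n = 3 items
    for row in simulate_responses(abilities, items, rng):
        print(row)
```

In a setup like the one the abstract describes, item parameters re-estimated from the generated matrix would be compared with those estimated from the real data using `rmse` and `kl_divergence`.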
Prediction Of Antimicrobial Peptides Based On Sequence Alignment And Secondary Structure Sequence And Segment Sequence
Antimicrobial peptides (AMPs) are natural peptides that are important for the immune system. Researchers are interested in designing alternative drugs based on AMPs because more bacteria are becoming resistant to the available antibiotics. However, experiments to extract AMPs from protein sequences are time-consuming and costly. A computational tool that predicts novel AMPs more effectively and accurately is therefore in high demand to provide more candidates and useful insights for drug design. In this study, a new algorithm is proposed as a computational tool that integrates the sequence alignment method with secondary structure sequence (SSS) and segment sequence (SS) features. Sequence alignment is accomplished by classifying test sequences based on the maximum high-scoring segment pair (HSP) score predicted by the Basic Local Alignment Search Tool for proteins (BLASTP). The results of the sequence alignment phase are 91.02% for the normal dataset, 80.88% for the training set with <0.7 sequence similarity, and 96.02% for the CAMP dataset. The sequence alignment method cannot predict all sequences, and the unpredicted sequences are then predicted using the SSS and SS features. Feature extraction and feature selection are performed to obtain the features. These features are used to train an SVM model, which is then used to classify each sequence as AMP or non-AMP. The overall results of the independent test are 83.27% for the normal dataset, 71.83% for the dataset of sequences with <0.7 similarity, and 91.49% for the CAMP dataset. Compared with the second phase of past research that also combines with the sequence alignment method, this research has a relatively low yield (<27%) from the prediction that uses SSS and SS features only. This indicates that the proposed algorithm is not suitable for use as an AMP predictor.
Optical biopsy identification and grading of gliomas using label-free visible resonance Raman spectroscopy.
Glioma is one of the most refractory types of brain tumor. Accurate tumor boundary identification and complete resection of the tumor are essential for glioma removal during brain surgery. We present a method based on visible resonance Raman (VRR) spectroscopy to identify glioma margins and grades. A set of diagnostic spectral biomarker features is presented based on tissue composition changes revealed by VRR. The Raman spectra include molecular vibrational fingerprints of carotenoids, tryptophan, amide I/II/III, proteins, and lipids. These basic in situ spectral biomarkers are used to identify the tissue at the interface between brain cancer and normal tissue and to evaluate glioma grades. The VRR spectra are also analyzed using principal component analysis for dimension reduction and feature detection and a support vector machine for classification. The cross-validated sensitivity, specificity, and accuracy for distinguishing glioma tissues from normal brain tissues are found to be 100%, 96.3%, and 99.6%, respectively. The area under the receiver operating characteristic curve for the classification is about 1.0. The accuracies for distinguishing normal, low grade (grades I and II), and high grade (grades III and IV) gliomas are found to be 96.3%, 53.7%, and 84.1% for the three groups, respectively, along with a total accuracy of 75.1%. A set of criteria for differentiating normal human brain tissues from normal control tissues is proposed and used to identify brain cancer margins, yielding a diagnostic sensitivity of 100% and specificity of 71%. Our study demonstrates the potential of VRR as a label-free optical molecular histopathology method for in situ boundary line judgment at the margins during brain surgery.
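For readers less familiar with the three figures reported in the abstract above, the sketch below shows how sensitivity, specificity, and accuracy follow from the counts of a binary confusion matrix. The function name and the labels in the demo are hypothetical; they are not the study's data and do not reproduce its numbers.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels.

    sensitivity = TP / (TP + FN)
    specificity = TN / (TN + FP)
    accuracy    = (TP + TN) / all samples
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Made-up labels: 1 = tumor, 0 = normal tissue.
    truth = [1, 1, 1, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 1]
    print(binary_metrics(truth, predicted))
```

Under cross-validation, these counts would be accumulated over the held-out folds before the ratios are taken.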
An artificial intelligence tool for heterogeneous team formation in the classroom
Nowadays, there is increasing interest in the development of teamwork skills
in the educational context. This growing interest is motivated by its
pedagogical effectiveness and the fact that, in labour contexts, enterprises
organize their employees in teams to carry out complex projects. Despite its
crucial importance in the classroom and industry, there is a lack of support
for the team formation process. Not only do many factors influence team
performance, but the problem becomes exponentially costly if teams are to be
optimized. In this article, we propose a tool that aims to fill this
gap. It combines artificial intelligence techniques such as coalition structure
generation, Bayesian learning, and Belbin's role theory to facilitate the
generation of working groups in an educational context. This tool improves on
current state-of-the-art proposals in three ways: i) it takes into account the
feedback of other teammates in order to establish the most predominant role of
a student instead of self-perception questionnaires; ii) it handles uncertainty
with regard to each student's predominant team role; iii) it is iterative since
it considers information from several interactions in order to improve the
estimation of role assignments. We tested the performance of the proposed tool
in an experiment involving students that took part in three different team
activities. The experiments suggest that the proposed tool is able to improve
different teamwork aspects such as team dynamics and student satisfaction.
2014 Annual Research Symposium Abstract Book
2014 annual volume of abstracts for science research projects conducted by students at Trinity College
AI driven B-cell Immunotherapy Design
Antibodies, a prominent class of approved biologics, play a crucial role in
detecting foreign antigens. The effectiveness of antigen neutralisation and
elimination hinges upon the strength, sensitivity, and specificity of the
paratope-epitope interaction, which demands resource-intensive experimental
techniques for characterisation. In recent years, artificial intelligence and
machine learning methods have made significant strides, revolutionising the
prediction of protein structures and their complexes. The past decade has also
witnessed the evolution of computational approaches aiming to support
immunotherapy design. This review focuses on the progress of machine
learning-based tools and their frameworks in the domain of B-cell immunotherapy
design, encompassing linear and conformational epitope prediction, paratope
prediction, and antibody design. We mapped the most commonly used data sources,
evaluation metrics, and method availability and thoroughly assessed their
significance and limitations, discussing the main challenges ahead.