2,389 research outputs found

    One Decade of Development and Evolution of MicroRNA Target Prediction Algorithms

    Nearly two decades have passed since the first study reporting the discovery of microRNAs (miRNAs) was published. The key role of miRNAs in post-transcriptional gene regulation has prompted a growing number of studies on the origins, mechanisms of action and functionality of miRNAs. To associate each miRNA with a specific function, it is essential to unveil the rules that govern miRNA action. Although our understanding of the structural characteristics of the miRNA-mRNA interaction has improved significantly, the entire physical mechanism is not yet fully understood. In this respect, the development of computational algorithms for miRNA target prediction becomes increasingly important. This manuscript summarizes the research done on miRNA target prediction. It describes the experimental data currently available and used in the field and presents three lines of computational approaches for target prediction. Finally, the authors put forward a number of considerations regarding current challenges and future directions.

    Post-transcriptional knowledge in pathway analysis increases the accuracy of phenotypes classification

    Motivation: Prediction of phenotypes from high-dimensional data is a crucial task in precision biology and medicine. Many technologies employ genomic biomarkers to characterize phenotypes. However, such elements are not sufficient to explain the underlying biology. To improve this, pathway analysis techniques have been proposed. Nevertheless, such methods have shown a lack of accuracy in phenotype classification. Results: Here we propose a novel methodology called MITHrIL (Mirna enrIched paTHway Impact anaLysis) for the analysis of signaling pathways, which builds on the work of Tarca et al., 2009. MITHrIL extends pathways by adding missing regulatory elements, such as microRNAs, and their interactions with genes. The method takes as input the expression values of genes and/or microRNAs and returns a list of pathways sorted according to their degree of deregulation, together with the corresponding statistical significance (p-values). Our analysis shows that MITHrIL outperforms its competitors even in the worst case. In addition, our method is able to correctly classify sets of tumor samples drawn from TCGA. Availability: MITHrIL is freely available at the following URL: http://alpha.dmi.unict.it/mithril

    Representation Learning for Biological Sequence Data

    Dissertation (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2021. Sungroh Yoon.
    We are living in the era of big data, and the biomedical domain is no exception. With the advent of technologies such as next-generation sequencing, developing methods to capitalize on the explosion of biomedical data is one of the major challenges in bioinformatics. Representation learning, in particular deep learning, has made significant advancements in diverse fields where the artificial intelligence community has struggled for many years. However, although representation learning has also shown great promise in bioinformatics, it is not a silver bullet. Off-the-shelf applications of representation learning cannot always provide successful results for biological sequence data, and many challenges and opportunities remain to be explored. This dissertation presents a set of representation learning methods to address three issues in biological sequence data analysis. First, we propose a two-stage training strategy to address throughput and information trade-offs within wet-lab CRISPR-Cpf1 activity experiments. Second, we propose an encoding scheme to model interaction between two sequences for functional microRNA target prediction. Third, we propose a self-supervised pre-training method to bridge the exponentially growing gap between the numbers of unlabeled and labeled protein sequences. In summary, this dissertation proposes a set of representation learning methods that can derive invaluable information from biological sequence data.
    Contents: 1 Introduction (1.1 Motivation; 1.2 Contents of Dissertation). 2 Background (2.1 Representation Learning; 2.2 Deep Neural Networks: Multi-layer Perceptrons, Convolutional Neural Networks, Recurrent Neural Networks, Transformers; 2.3 Training of Deep Neural Networks; 2.4 Representation Learning in Bioinformatics; 2.5 Biological Sequence Data Analyses; 2.6 Evaluation Metrics). 3 CRISPR-Cpf1 Activity Prediction (3.1 Methods: Model Architecture, Training of Seq-deepCpf1 and DeepCpf1; 3.2 Experiment Results: Datasets, Baselines, Evaluation of Seq-deepCpf1, Evaluation of DeepCpf1; 3.3 Summary). 4 Functional microRNA Target Prediction (4.1 Methods: Candidate Target Site Selection, Input Encoding, Residual Network, Post-processing; 4.2 Experiment Results: Datasets, Classification of Functional and Non-functional Targets, Distinguishing High-functional Targets, Ablation Studies; 4.3 Summary). 5 Self-supervised Learning of Protein Representations (5.1 Methods: Pre-training Procedure, Fine-tuning Procedure, Model Architecture; 5.2 Experiment Results: Experiment Setup, Pre-training Results, Fine-tuning Results, Comparison with Larger Protein Language Models, Ablation Studies, Qualitative Interpretation Analyses; 5.3 Summary). 6 Discussion (6.1 Challenges and Opportunities). 7 Conclusion. Bibliography. Abstract in Korean.

    Role of Artificial Intelligence in High Throughput Diagnostics for Colorectal Cancer Current Updates

    Cancer has been described as a centuries-old challenge for the entire human race, causing a large number of deaths around the globe every year. According to WHO data, nearly 10 million deaths were reported worldwide in 2021. Colorectal cancer, which affects the colon and rectum, is considered a major threat, with a recorded annual incidence of 41 per 100,000. To overcome this challenge, our medical system requires more advanced, accurate and efficient high-throughput techniques for the prognosis and effective treatment of this disease. Artificial intelligence's role in healthcare has been a matter of discussion among experts over the past few years, but more recently the spotlight has focused specifically on the role that this technology can play in improving patient outcomes and the effectiveness of diagnosis and treatment processes. Artificial intelligence refers to a broad category of technologies, including machine learning, natural language processing and deep learning. AI-based exploration of molecular pathways whose characteristics help in subtyping colorectal cancer (CRC), and thereby in predicting treatment response or prognosis, has so far shown promising results for effective treatment, classification and early detection. Such technologies may be used to create prediction models that distinguish between polyps, metastases and normal cells, in addition to supporting early detection and effective cancer therapy. Many scientists are now working to design such models, combining natural language processing and deep learning, that can differentiate between non-adenomatous and adenomatous polyps and identify hyper-mutated tumours, genetic mutations and molecular pathways, using a strategy known as IDaRS (iterative draw-and-rank sampling).
The review primarily focuses on the significance of emerging AI-based approaches for the diagnosis, detection, and prognosis of colorectal cancer in light of existing obstacles.

    Survival-Related Clustering of Cancer Patients by Integrating Clinical and Biological Datasets

    Subtype-based treatments and drug therapies are essential aspects to be considered in cancer patients' clinical trials to provide appropriate personalized therapies. With the advancement of next-generation sequencing technology, several computational models that integrate genomic and transcriptomic datasets (i.e., multi-omics) to predict subtype-based classification in cancer patients have emerged. However, integration of prognostic features from clinical data, related to survival risks, with multi-omics datasets in the prediction of different subtypes is limited and remains an important research area to be explored. In this study, we proposed a data integration pipeline that combines prognostic features from clinical data with multi-omics datasets to predict survival-risk-based subtypes in Kidney Renal Clear Cell Carcinoma (KIRC) patients from The Cancer Genome Atlas (TCGA) database. First, we applied an unsupervised clustering algorithm to KIRC patients and clustered them into two survival-risk-based subgroups, i.e., subtypes. Then, using the clustering-based subtype labels as class labels for cancer patients, we trained a supervised classification model to determine the class label of unlabeled patients. In our clustering step, we applied a multivariate Cox Proportional Hazard (Cox-PH) model to select the survival-related, prognostically significant features (p-value < 0.05) from the patients' multivariate clinical data. Then, we used the Silhouette Coefficient to determine the optimal number (k) of clusters. In our classification step, we integrated high-dimensional multi-omics datasets with three different data modalities (gene expression, microRNA expression, and DNA methylation). We utilized a dimension-reduction approach, followed by a univariate Cox-PH model for each reduced data modality with patients' survival status. Then, we selected the survival-related reduced-omics features for our classification model.
In this step, we applied a supervised classification method with 10-fold cross-validation to check our survival-based subtype prediction accuracy. We tested multiple machine learning and deep learning algorithms in different steps of the pipeline for clustering (K-means, K-modes, and Gaussian mixture model), dimension reduction (Denoising Autoencoder and Principal Component Analysis) and classification (Support Vector Machine and Random Forest). We proposed an optimized model with the highest survival-specific subtype classification accuracy as the final model.
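
The abstract above uses the Silhouette Coefficient to choose the number of clusters k. A minimal sketch of that selection step follows, in plain Python. It assumes Euclidean distance and cluster labels that were already computed for each candidate k; the helper names `silhouette_score` and `best_k`, and the toy points, are hypothetical and not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: a singleton cluster scores 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def best_k(points, labelings):
    """labelings maps k -> cluster labels; return the k with the highest score."""
    return max(labelings, key=lambda k: silhouette_score(points, labelings[k]))


# Toy data (assumption): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labelings = {
    2: [0, 0, 0, 1, 1, 1],  # labels for k = 2
    3: [0, 0, 1, 2, 2, 2],  # labels for k = 3 (splits the first group)
}
print(best_k(points, labelings))
```

On this toy data the two-cluster labeling scores higher than the three-cluster one, so k = 2 is selected. In practice the candidate labelings would come from whichever clustering algorithm is being evaluated.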

    Transcriptome Analysis of Non-Coding RNAs in Livestock Species: Elucidating the Ambiguity

    The recent remarkable development of transcriptomics technologies, especially next-generation sequencing technologies, allows deeper exploration of the hidden landscapes of complex traits and creates great opportunities to improve livestock productivity and welfare. Non-coding RNAs (ncRNAs), RNA molecules that are not translated into proteins, are key transcriptional regulators of health and production traits. Transcriptomics analyses of ncRNAs are therefore important for a better understanding of the regulatory architecture of livestock phenotypes. In this chapter, we present an overview of common frameworks for generating and processing RNA sequence data to obtain ncRNA transcripts. Then, we review common approaches for analyzing ncRNA transcriptome data and present current state-of-the-art methods for identification of ncRNAs and functional inference of identified ncRNAs, with emphasis on tools for livestock species. We also discuss future challenges and perspectives for ncRNA transcriptome data analysis in livestock species.

    Advancing Biomedicine with Graph Representation Learning: Recent Progress, Challenges, and Future Directions

    Graph representation learning (GRL) has emerged as a pivotal field that has contributed significantly to breakthroughs in various fields, including biomedicine. The objective of this survey is to review the latest advancements in GRL methods and their applications in the biomedical field. We also highlight key challenges currently faced by GRL and outline potential directions for future research. Comment: Accepted by 2023 IMIA Yearbook of Medical Informatics

    MACHINE LEARNING APPROACHES FOR BIOMARKER IDENTIFICATION AND SUBGROUP DISCOVERY FOR POST-TRAUMATIC STRESS DISORDER

    Post-traumatic stress disorder (PTSD) is a psychiatric disorder caused by environmental and genetic factors resulting from alterations in genetic variation, epigenetic changes and neuroimaging characteristics. There is a pressing need to identify reliable molecular and physiological biomarkers for accurate diagnosis, prognosis, and treatment, as well as to deepen the understanding of PTSD pathophysiology. Machine learning methods are widely used to infer patterns from biological data, identify biomarkers, and make predictions. The objective of this research is to apply machine learning methods for the accurate classification of human diseases from genome-scale datasets, focusing primarily on PTSD. The DoD-funded Systems Biology of PTSD Consortium has recruited combat veterans with and without PTSD for measurement of molecular and physiological data from blood or urine samples, with the goal of identifying accurate and specific PTSD biomarkers. As a member of the Consortium with access to these PTSD multi-omics datasets, we first completed a project titled Clinical Subgroup-Specific PTSD Classification and Biomarker Discovery. We applied machine learning approaches to these data to build classification models consisting of molecular and clinical features to predict PTSD status. We also identified candidate biomarkers for diagnosis, which improves our understanding of PTSD pathogenesis. In a second project, entitled Multi-Omic PTSD Subgroup Identification and Clinical Characterization, we applied methods for integrating multiple omics datasets to investigate the complex, multivariate nature of the biological systems underlying PTSD. Using two different machine learning approaches on 82 PTSD-positive samples, we identified an optimal number of two PTSD subgroups, and we found that the subgroups exhibited different remitting behavior as inferred from subjects recalled at a later time point.
The results from our association, differential expression, and classification analyses demonstrated the distinct clinical and molecular features characterizing these subgroups. Taken together, our work has advanced our understanding of PTSD biomarkers and subgroups through the use of machine learning approaches. Results from our work should strongly contribute to the precise diagnosis and eventual treatment of PTSD, as well as other diseases. Future work will involve continuing to leverage these results to enable precision medicine for PTSD.

    A voting-based machine learning approach for classifying biological and clinical datasets.

    BACKGROUND: Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicality of these approaches, various software packages have also been designed and developed. However, the existing methods suffer from several limitations, such as overfitting on a specific dataset, ignoring feature selection in the preprocessing step, and losing performance on large datasets. To tackle these limitations, in this study we introduced a machine learning framework consisting of two main steps. First, our previously suggested optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify the biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with the prior methods. RESULTS: The results demonstrated that the Trader algorithm could select a near-optimal subset of features at a significance level of p-value < 0.01 relative to the compared algorithms. Additionally, on the large datasets, the proposed machine learning framework improved on prior studies by ~10% in terms of the mean values associated with fivefold cross-validation of accuracy, precision, recall, specificity, and F-measure. CONCLUSION: Based on the obtained results, it can be concluded that a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers in designing practical diagnosis health care systems and offering effective treatment plans.
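
The abstract above does not specify how its voting-based framework combines classifiers. As an illustration only, the sketch below assumes plain hard majority voting over binary labels, and computes the five reported metrics (accuracy, precision, recall, specificity, F-measure) from the voted predictions. The function names, the tie-breaking rule and the toy predictions are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (all the same length) by hard voting."""
    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Assumed tie-break: the earliest classifier whose label has the top count.
        voted.append(next(v for v in sample_votes if counts[v] == top))
    return voted


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, specificity and F-measure for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


# Toy predictions from three hypothetical classifiers on five samples.
clf_a = [1, 0, 1, 1, 0]
clf_b = [1, 1, 0, 1, 0]
clf_c = [0, 0, 1, 1, 1]
y_true = [1, 0, 1, 0, 0]

voted = majority_vote([clf_a, clf_b, clf_c])
print(voted)
print(binary_metrics(y_true, voted))
```

In a fivefold cross-validation setting such as the one the abstract reports, these metrics would be computed once per held-out fold and then averaged.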