Distance Measures in Bioinformatics
Many bioinformatics applications rely on the computation of similarities between objects. Distance and similarity measures applied to vectors of characteristics are essential to problems such as classification, clustering and information retrieval. This study explores the usefulness of distance and similarity measures in several bioinformatics applications, which fall into two categories. (1) Estimation of the adverse-reaction severity of unknown pharmaceutical treatments, based on the severity of known treatments, in order to guide the testing of the unknown treatments in clinical trials. (2) Classification of cancer tissue types and estimation of cancer stages, based on high-dimensional microarray data, in order to support clinical decision making. To address the first category, we studied several clustering and classification approaches for binary severity estimation of Cytokine Release Syndrome (CRS). We developed a Severity Estimation using Distance Metric Learning (SE-DML) approach to obtain graded severity estimates. With binary estimation we were able to identify the treatments that caused the most severe responses and then built prediction models for CRS. Using the SE-DML approach, we evaluated four known data sets and showed that SE-DML outperformed other widely used methods on these data sets. For the second category, we presented Kernelized Information-Theoretic Metric Learning (KITML) algorithms that optimize distance metrics and effectively handle high-dimensional data. The metric learned by KITML is used to improve the performance of k-nearest neighbor classification for cancer tissue microarray data. We evaluated our approach on fourteen cancer microarray data sets and compared our results with other state-of-the-art approaches, achieving the best overall performance for the classification task.
In addition, we tested the KITML algorithm in estimating the severity stages of cancer samples, with accurate results.
Ph.D., Electrical Engineering -- Drexel University, 201
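The metric-learning-plus-k-NN pipeline described above can be sketched in a few lines. This is a minimal illustration, not the KITML algorithm itself: the positive semi-definite matrix M that a metric learner would produce is simply supplied by the caller (here the identity, i.e. plain Euclidean distance), and the data are toy values.

```python
import numpy as np

def mahalanobis_knn_predict(X_train, y_train, x, M, k=3):
    """Classify x by majority vote among its k nearest training points
    under the Mahalanobis metric d(a, b) = sqrt((a-b)^T M (a-b)),
    where M is a learned positive semi-definite matrix."""
    diffs = X_train - x                               # (n, d) differences
    d2 = np.einsum('ij,jk,ik->i', diffs, M, diffs)    # squared distances
    nearest = np.argsort(d2)[:k]                      # indices of k nearest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority vote

# Toy data: two well-separated classes in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
M = np.eye(2)  # identity = Euclidean; a learned metric would replace this
print(mahalanobis_knn_predict(X, y, np.array([0.2, 0.1]), M))  # -> 0
```

In the thesis's setting, M would come from the KITML optimization rather than being fixed; the classification step itself is unchanged.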
Probabilistic analysis of the human transcriptome with side information
Understanding functional organization of genetic information is a major
challenge in modern biology. Following the initial publication of the human
genome sequence in 2001, advances in high-throughput measurement technologies
and efficient sharing of research material through community databases have
opened up new views to the study of living organisms and the structure of life.
In this thesis, novel computational strategies have been developed to
investigate a key functional layer of genetic information, the human
transcriptome, which regulates the function of living cells through protein
synthesis. The key contributions of the thesis are general exploratory tools
for high-throughput data analysis that have provided new insights into
cell-biological networks, cancer mechanisms and other aspects of genome
function.
A central challenge in functional genomics is that high-dimensional genomic
observations are associated with high levels of complex and largely unknown
sources of variation. By combining statistical evidence across multiple
measurement sources and the wealth of background information in genomic data
repositories it has been possible to resolve some of the uncertainties associated
with individual observations and to identify functional mechanisms that could
not be detected based on individual measurement sources. Statistical learning
and probabilistic models provide a natural framework for such modeling tasks.
Open source implementations of the key methodological contributions have been
released to facilitate further adoption of the developed methods by the
research community.
Comment: Doctoral thesis, 103 pages, 11 figures.
Cell Type-specific Analysis of Human Interactome and Transcriptome
Cells are the fundamental building blocks of complex tissues in higher-order organisms. These cells take different forms and shapes to perform a broad range of functions. What makes a cell uniquely eligible to perform a task, however, is not well understood; neither is the defining characteristic that groups similar cells together to constitute a cell type. Even for known cell types, the underlying pathways that mediate cell type-specific functionality are not readily available. These functions, in turn, contribute to cell type-specific susceptibility in various disorders.
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a
long period of stagnation in drug approvals. Due to the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform, LigNFam enables users to interactively explore
similarity relationships between ‘drug like’ molecules and the proteins they bind. Secondly,
two deep-learning-based binding site comparison tools were developed, competing with
the state-of-the-art over benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships and has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together, and with existing tools, the contributions made will aid in the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster.
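Similarity relationships between drug-like molecules, as explored in tools like LigNFam, are commonly computed as Tanimoto coefficients over binary fingerprints. The sketch below is a generic illustration under that assumption, not code from the thesis; the fingerprints and molecule names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two binary fingerprints,
    represented here as sets of 'on' bit positions:
    |A intersect B| / |A union B|."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Hypothetical fingerprints for three molecules (illustrative bit sets).
mol_a = {1, 4, 9, 16, 25}
mol_b = {1, 4, 9, 36, 49}
mol_c = {2, 3, 5, 7, 11}
print(tanimoto(mol_a, mol_b))  # 3 shared bits / 7 total = 3/7
print(tanimoto(mol_a, mol_c))  # no shared bits -> 0.0
```

Real pipelines would derive the fingerprints from molecular structure (e.g. circular fingerprints) rather than hand-coding them, but the similarity computation is the same.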
Scalable Feature Selection Applications for Genome-Wide Association Studies of Complex Diseases
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining these with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants.
Machine learning-based feature selection algorithms have been shown to effectively create predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated not only to be effective at predicting the disease phenotypes, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some of it was extended so that it could be implemented on parallel computers, helping to ensure that the methods will also scale to NGS data sets.
Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm. It was shown that there is no universally optimal algorithm for variant selection in GWAS; rather, methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, models can yield overly optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype-phenotype relationships and biological insights from genetic data sets.
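The validation caveat above is worth making concrete. The sketch below is a self-contained, hypothetical illustration (synthetic data, a simple correlation filter, and a nearest-centroid classifier, none of which are the thesis's actual methods): the key point is that feature selection is redone inside every training fold, so the outer accuracy estimate is not inflated by information leakage.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_top_features(X, y, k):
    """Filter method: rank features by absolute correlation with the
    label and keep the k strongest. Crucially, this is only ever
    called on a training fold, never on the full data set."""
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

def nearest_centroid_predict(X_tr, y_tr, X_te):
    """Deliberately simple classifier: assign each test sample to the
    class whose training centroid is closest."""
    centroids = {c: X_tr[y_tr == c].mean(axis=0) for c in np.unique(y_tr)}
    return np.array([min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))
                     for x in X_te])

# Synthetic data: 2 informative features out of 50, 60 samples.
n, p = 60, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[:, 0] += 2 * y        # informative feature
X[:, 1] -= 2 * y        # informative feature

# Outer CV loop estimates generalization; selection happens per fold.
folds = np.array_split(rng.permutation(n), 5)
accs = []
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    feats = select_top_features(X[train_idx], y[train_idx], k=2)
    preds = nearest_centroid_predict(X[train_idx][:, feats], y[train_idx],
                                     X[test_idx][:, feats])
    accs.append((preds == y[test_idx]).mean())
print(round(float(np.mean(accs)), 2))
```

Selecting features on the full data set before cross-validating, by contrast, leaks test-set information into the selection step and is exactly the source of the overly optimistic accuracies the thesis warns about.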
Ranking to Learn and Learning to Rank: On the Role of Ranking in Pattern Recognition Applications
The last decade has seen a revolution in the theory and application of
machine learning and pattern recognition. Through these advancements, variable
ranking has emerged as an active and growing research area and it is now
beginning to be applied to many new problems. The rationale behind this fact is
that many pattern recognition problems are by nature ranking problems. The main
objective of a ranking algorithm is to sort objects according to some criterion,
so that the most relevant items appear early in the produced result list.
Ranking methods can be analyzed from two different methodological perspectives:
ranking to learn and learning to rank. The former aims at studying methods and
techniques to sort objects for improving the accuracy of a machine learning
model. Enhancing a model's performance can be challenging. For example,
in pattern classification tasks, different data representations can complicate
and hide the different explanatory factors of variation behind the data. In
particular, hand-crafted features contain many cues that are either redundant
or irrelevant, which turn out to reduce the overall accuracy of the classifier.
In such cases feature selection is used: by producing ranked lists of
features, it helps to filter out the unwanted information. Moreover, in real-time
systems (e.g., visual trackers) ranking approaches are used as optimization
procedures which improve the robustness of the system that deals with the high
variability of image streams that change over time. Conversely,
learning to rank is necessary in the construction of ranking models for
information retrieval, biometric authentication, re-identification, and
recommender systems. In this context, the ranking model's purpose is to sort
objects according to their degrees of relevance, importance, or preference as
defined in the specific application.
Comment: European PhD Thesis. arXiv admin note: text overlap with arXiv:1601.06615, arXiv:1505.06821, arXiv:1704.02665 by other authors.
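The core contract of a ranking model, sorting objects so the most relevant appear first, can be shown in miniature. Below is a minimal pointwise sketch, assuming relevance scores already produced by some upstream model; the document names and score values are hypothetical.

```python
def rank_items(items, relevance):
    """Given a relevance score per item, return the items sorted so
    that the most relevant appear first in the result list."""
    return [item for item, _ in sorted(zip(items, relevance),
                                       key=lambda pair: pair[1],
                                       reverse=True)]

docs = ["doc_a", "doc_b", "doc_c"]
scores = [0.2, 0.9, 0.5]   # hypothetical model outputs
print(rank_items(docs, scores))  # -> ['doc_b', 'doc_c', 'doc_a']
```

Learning to rank concerns how the scores themselves are learned (pointwise, pairwise, or listwise objectives); the final sort-by-relevance step is common to all of them.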