118 research outputs found
Computational Methods for the Analysis of Genomic Data and Biological Processes
In recent decades, new technologies have made remarkable progress in helping to understand biological systems. Rapid advances in genomic profiling techniques such as microarrays or high-performance sequencing have brought new opportunities and challenges in the fields of computational biology and bioinformatics. Such genetic sequencing techniques produce large amounts of data, whose analysis and cross-integration could provide a complete view of organisms. As a result, it is necessary to develop new techniques and algorithms that analyze these data reliably and efficiently. This Special Issue collected the latest advances in the field of computational methods for the analysis of gene expression data and, in particular, the modeling of biological processes. Here we present eleven works selected for publication in this Special Issue on the basis of their interest, quality, and originality.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Data fusion techniques for biomedical informatics and clinical decision support
Data fusion can be used to combine multiple data sources or modalities to facilitate enhanced visualization, analysis, detection, estimation, or classification. It can be applied at the raw-data, feature-based, and decision-based levels. Data fusion applications have been developed in areas such as statistics, computer vision, and other branches of machine learning, and have been employed in a variety of realistic scenarios such as medical diagnosis, clinical decision support, and structural health monitoring. This dissertation includes the investigation and development of methods to perform data fusion for cervical cancer intraepithelial neoplasia (CIN) and a clinical decision support system. The general framework for these applications includes image processing followed by feature development and classification of the detected region of interest (ROI). Image processing methods such as k-means clustering based on color information, dilation, erosion, and centroid locating were used for ROI detection. The features extracted include texture, color, nuclei-based, and triangle features. Analysis and classification were performed using feature- and decision-level data fusion techniques, including support vector machines, statistical methods such as logistic regression and linear discriminant analysis, and voting algorithms.
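As an illustration of the decision-level fusion by voting mentioned in this abstract, the following is a minimal sketch in Python. It is not the dissertation's implementation: the function names, the tie-breaking rule, and the example labels (`CIN1`, `CIN2`, `CIN3`, `normal`) are assumptions made only for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse the labels that several classifiers gave to ONE sample.

    Ties are broken in favour of the label that appears first in
    `predictions` (an arbitrary choice made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def fuse_decisions(per_classifier_labels):
    """Fuse equal-length label lists, one list per classifier."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_labels)]


# Hypothetical outputs of three classifiers on four regions of interest.
svm_labels = ["CIN1", "CIN2", "normal", "CIN3"]
logreg_labels = ["CIN1", "CIN3", "normal", "CIN3"]
lda_labels = ["CIN2", "CIN2", "normal", "CIN1"]

fused = fuse_decisions([svm_labels, logreg_labels, lda_labels])
print(fused)
```

Voting only needs each classifier's final label, so classifiers trained on different feature sets (texture, color, nuclei-based) can be combined without sharing a feature space; feature-level fusion would instead concatenate the features before training a single classifier.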
Finding new genes and pathways involved in cancer development by analysing insertional mutagenesis data
Master's dissertation in Bioinformatics.
Cancer emerges from the uncontrolled division of an organism's cells, creating a tumour. These tumours can arise in any part of the body. The increase in cellular division and growth can be caused by mutations in the genome. Several methodologies are used in research to find new cancer genes. Insertional mutagenesis (IM) has been one of the most widely used: the mouse is infected with a retrovirus or a transposon, which increases gene expression in the vicinity of the insertion.
The data used in this work are a collection of independent IM studies in mice. After processing, the data comprise 3,414 samples with information on 7,751 genes. Each sample corresponds to a type of cancer (colorectal, hematopoietic, hepatocellular carcinoma, lymphoma, malignant peripheral nerve sheath, medulloblastoma and pancreatic).
The main goal of this project is to determine whether there are genes specific to a particular type of cancer and, if so, which 15 genes are most involved in the development of that type of cancer.
Machine learning (ML) aims to gain knowledge from given experimental data, allowing predictions and accurate decisions to be made. To address this goal, the data must be transformed into a dissimilarity relation between samples. Four approaches were used: two are known from the literature (Hamming distance and Jaccard distance) and two new metrics were developed (the Gene Dependent Method (GDM) and the Gene Independent Method (GIM)). With these transformations, unsupervised learning methods (Principal Component Analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE)) and a supervised learning approach, testing different classifiers by cross-validation, were applied.
The main results show that some genes may be specific to a particular type of cancer. It is therefore possible to rank genes according to their importance for a type of cancer. A total of 105 genes are presented (15 genes for each type of cancer), of which 18 have not yet been annotated and 19 have already been mentioned in the literature as being involved in the development of cancer in the selected tissue. Proper in vitro and in vivo validation should follow.
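The two literature distances named in this abstract (Hamming and Jaccard) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each sample is assumed to be a binary vector marking which genes carry an insertion, the Hamming distance is normalised by vector length, and the function names and toy data are hypothetical. The thesis's own GDM and GIM metrics are not described in the abstract and are not reproduced here.

```python
def hamming_distance(a, b):
    """Fraction of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard_distance(a, b):
    """1 - |intersection| / |union| of the positions set to 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors are treated as identical (a convention for this sketch).
    return 0.0 if either == 0 else 1.0 - both / either


def pairwise(samples, metric):
    """Symmetric dissimilarity matrix between all samples."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = metric(samples[i], samples[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Toy data: three samples over five genes (1 = insertion near the gene).
samples = [
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
]

hamming_matrix = pairwise(samples, hamming_distance)
jaccard_matrix = pairwise(samples, jaccard_distance)
```

Unlike Hamming distance, Jaccard distance ignores genes that are absent from both samples, which may matter when most of the 7,751 genes have no insertion in a given sample. A matrix like these is the kind of input that embedding methods such as t-SNE can take as precomputed dissimilarities.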
Non-communicable Diseases, Big Data and Artificial Intelligence
This reprint includes 15 articles in the field of non-communicable diseases, big data, and artificial intelligence, overviewing the most recent advances in AI and their application potential in 3P medicine.
Mass spectral imaging of clinical samples using deep learning
A better interpretation of tumour heterogeneity and variability is vital for the improvement of novel diagnostic techniques and personalized cancer treatments. Tumour tissue heterogeneity is characterized by biochemical heterogeneity, which can be investigated by unsupervised metabolomics.
Mass Spectrometry Imaging (MSI) combined with machine learning techniques has generated increasing interest as an analytical and diagnostic tool for the analysis of spatial molecular patterns in tissue samples. Considering the high complexity of data produced by MSI, which can consist of many thousands of spectral peaks, statistical analysis, and in particular machine learning and deep learning, have been investigated as novel approaches to deduce the relationships between the measured molecular patterns and the local structural and biological properties of the tissues.
Machine learning has historically been divided into two main categories: supervised and unsupervised learning. In MSI, supervised learning methods may be used to segment tissues into histologically relevant areas, e.g. the classification of tissue regions in H&E (Haematoxylin and Eosin) stained samples. Initial classification by an expert histopathologist through visual inspection enables the development of univariate or multivariate models based on tissue regions that have significantly up- or down-regulated ions. However, complex data may result in underdetermined models, and alternative methods that can cope with high-dimensional and noisy data are required.
Here, we describe, apply, and test a novel diagnostic procedure built using a combination of MSI and deep learning with the objective of delineating and identifying biochemical differences between cancerous and non-cancerous tissue in metastatic liver cancer and epithelial ovarian cancer. The workflow investigates the robustness of single (1D) to multidimensional (3D) tumour analyses and also highlights possible biomarkers which are not accessible from classical visual analysis of the H&E images. The identification of key molecular markers may provide a deeper understanding of tumour heterogeneity and potential targets for intervention.