8 research outputs found
Network topology discovery
PhD in Mathematics
Monitoring and evaluating the performance of a network is essential
to detect and resolve network failures. Such monitoring requires
knowledge of the network topology, which is often unknown. Many of
the techniques used to discover the topology require the cooperation
of all network devices, which is almost impossible to obtain because
of security issues and policies. It is therefore necessary to use
techniques that collect, passively and without the cooperation of
intermediate devices, the information needed to infer the network
topology. This can be done with tomography techniques, which use
end-to-end measurements such as packet delays.
In this thesis, we use integer linear programming methods to solve
the problem of inferring a network topology using only end-to-end
measurements. We present two compact mixed integer linear programming
(MILP) formulations of the problem. Computational results show that,
as the number of end devices grows, the time needed by the two compact
MILP formulations to solve the problem also grows rapidly. We
therefore developed two heuristics based on the Feasibility Pump and
Local Branching methods. Since the packet delay measurements are
subject to error, we developed two robust approaches, one to control
the maximum number of deviations and the other to reduce the risk of
high cost. We also created a system that measures the packet delays
between computers on a network and displays the topology of that
network.
Phylogenetic trees and the minimum evolution problem
Master's in Mathematics and Applications
Phylogenetic trees make it possible to understand the evolutionary
history of species and can assist in the development of vaccines and
the study of biodiversity. There are several criteria for selecting a
phylogenetic tree from among the many possible ones, one of which is
minimum evolution.
This dissertation studies several methods for constructing
phylogenetic trees and several formulations for solving the minimum
evolution problem. It also presents an alternative formulation, which
was implemented in XPRESS.
Merging microarray studies to identify a common gene expression signature to several structural heart diseases
Background: Heart disease is the leading cause of death worldwide. Knowing a gene expression signature in heart disease can lead to the development of more efficient diagnosis and treatments that may prevent premature deaths. A large amount of microarray data is available in public repositories and can be used to identify differentially expressed genes. However, most microarray datasets contain only a small number of samples, so several datasets have to be merged to obtain more reliable results, which is a challenging task. The identification of differentially expressed genes is commonly done using statistical methods. Nonetheless, these methods rely on an arbitrary threshold to select the differentially expressed genes, and there is no consensus on the values that should be used. Results: Nine publicly available microarray datasets from studies of different heart diseases were merged to form a dataset composed of 689 samples and 8354 features. Subsequently, the adjusted p-value and fold change were determined and, by combining a set of adjusted p-value cutoffs with a list of different fold change thresholds, 12 sets of differentially expressed genes were obtained. The random forest algorithm was then used to select the set of differentially expressed genes that best classifies samples from patients with heart diseases and samples from patients with no heart condition. A set of 62 differentially expressed genes having a classification accuracy of approximately 95% was identified. Conclusions: We identified a gene expression signature common to different cardiac diseases and supported our findings by showing their involvement in the pathophysiology of the heart. The approach used in this study is suitable for the identification of gene expression signatures and can be extended to different diseases.
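The threshold-combination step described in the Results can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the cutoff values, so the three adjusted p-value cutoffs and four fold change thresholds below are hypothetical (chosen so that 3 x 4 = 12 candidate sets), as are the gene names, the toy statistics, and the function name `candidate_gene_sets`. The subsequent random forest selection step is not shown.

```python
from itertools import product


def candidate_gene_sets(stats, p_cutoffs, fc_thresholds):
    """Build one candidate gene set per (p cutoff, fold change threshold) pair.

    stats maps a gene name to (adjusted p-value, log2 fold change).
    A gene enters a set when its adjusted p-value is below the cutoff and
    the absolute value of its log2 fold change reaches the threshold.
    """
    sets = {}
    for p_cut, fc_min in product(p_cutoffs, fc_thresholds):
        sets[(p_cut, fc_min)] = {
            gene
            for gene, (adj_p, log_fc) in stats.items()
            if adj_p < p_cut and abs(log_fc) >= fc_min
        }
    return sets


# Toy, made-up statistics for four hypothetical genes.
stats = {
    "GENE_A": (0.001, 1.8),
    "GENE_B": (0.020, 0.7),
    "GENE_C": (0.300, 2.5),
    "GENE_D": (0.004, -1.2),
}

# Hypothetical grids: 3 p-value cutoffs x 4 fold change thresholds = 12 sets.
p_cutoffs = [0.05, 0.01, 0.001]
fc_thresholds = [0.5, 1.0, 1.5, 2.0]

sets = candidate_gene_sets(stats, p_cutoffs, fc_thresholds)
for key in sorted(sets):
    print(key, sorted(sets[key]))
```

Each of the resulting sets would then be scored by a classifier, and the set with the best classification accuracy kept as the signature.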
NAPRT expression regulation mechanisms: novel functions predicted by a bioinformatics approach
The nicotinate phosphoribosyltransferase (NAPRT) gene has gained relevance in the research of cancer therapeutic strategies due to its main role as a NAD biosynthetic enzyme. NAD metabolism is an attractive target for the development of anti-cancer therapies, given the high energy requirements of proliferating cancer cells and NAD-dependent signaling. A few studies have shown that NAPRT expression varies in different cancer types, making it imperative to assess NAPRT expression and functionality status prior to the application of therapeutic strategies targeting NAD. In addition, the recent finding of an extracellular form of NAPRT (eNAPRT) suggested the involvement of NAPRT in inflammation and signaling. However, the mechanisms regulating NAPRT gene expression have never been thoroughly addressed. In this study, we searched for NAPRT gene expression regulatory mechanisms in databases of transcription factors (TFs), RNA binding proteins (RBPs) and microRNAs (miRNAs). We identified several potential regulators of NAPRT transcription activation, downregulation and alternative splicing and performed GO and expression analyses. The results of the functional analysis of TFs, RBPs and miRNAs suggest new, unexpected functions for the NAPRT gene in cell differentiation, development and neuronal biology.
Methodology to identify a gene expression signature by merging microarray datasets
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets contain only a small number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. The methodology was validated on two distinct research applications: one using heart failure microarray datasets and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. The methodology was implemented in R and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
GTO: A toolkit to unify pipelines in genomic and proteomic research
Next-generation sequencing triggered the production of a massive volume of publicly available data and the development of new specialised tools. These tools are dispersed over different frameworks, making the management and analysis of the data a challenging task. Additionally, new targeted tools are needed, given the dynamics and specificities of the field. We present GTO, a comprehensive toolkit designed to unify pipelines in genomic and proteomic research, which combines specialised tools for analysis, simulation, compression, development, visualisation, and transformation of the data. The toolkit combines novel tools with a modular architecture, making it an excellent platform for experimental scientists as well as a useful resource for teaching bioinformatics enquiry to students in the life sciences. GTO is implemented in C and is available, under the MIT license, at https://bioinformatics.ua.pt/gto. (C) 2020 The Authors. Published by Elsevier B.V. Peer reviewed.
Portuguese twitter dataset on COVID-19
Over the last two years, the COVID-19 pandemic has affected hundreds of millions of people around the world. As in many crises, people turn to social media platforms, like Twitter, to communicate and share information. Twitter datasets have been used over the years in many research studies to extract valuable information, and several large COVID-19 Twitter datasets have therefore been released over the last two years. However, none of these datasets contains only Portuguese Tweets, despite Portuguese being reported as one of the top five languages used on Twitter. In this paper, we present the first large-scale Portuguese COVID-19 Twitter dataset. The dataset contains over 19 million Tweets spanning 2020 and 2021, allowing the entire pandemic to be analyzed. We also conducted a sentiment analysis on the dataset and correlated the various spikes in Tweet count and sentiment scores with news articles and government announcements in Portugal and Brazil. The dataset is available at: https://github.com/bioinformaticsua/Portuguese-Covid19-Dataset
This work was supported by FCT – Fundação para a Ciência e Tecnologia within project DSAIPA/AI/0088/2020.
MIP model-based heuristics for the minimum weighted tree reconstruction problem
We consider the Minimum Weighted Tree Reconstruction (MWTR) problem and two matheuristic methods to obtain optimal or near-optimal solutions: the Feasibility Pump heuristic and the Local Branching heuristic.
These matheuristics are based on a Mixed Integer Programming (MIP) model used to find feasible solutions.
We discuss the applicability and effectiveness of the matheuristics to obtain solutions to the MWTR problem.
The purpose of the MWTR problem is to find a minimum weighted tree connecting a set of leaves in such a way that the length of the path between each pair of leaves is greater than or equal to a given distance between the considered pair of leaves.
The Feasibility Pump matheuristic starts with the Linear Programming solution, iteratively fixes the values of some variables and solves the corresponding problem until a feasible solution is achieved.
The Local Branching matheuristic, in turn, improves a feasible solution by using a local search.
Computational results using two different sets of instances, one from the phylogenetic area and another from the telecommunications area, show that these matheuristics are quite effective in finding feasible solutions and present small gap values.
Each matheuristic can be used independently; however, the best results are obtained when used together.
For instances of the problem having up to 17 leaves, the feasible solution obtained by the Feasibility Pump heuristic is improved by the Local Branching heuristic. Notably, compared with existing model-based approaches, which solve instances having up to 15 leaves, the matheuristics increase the size of the instances that can be solved.
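The MWTR feasibility condition stated above (every leaf-to-leaf path must be at least as long as the given distance for that pair) can be checked for a candidate tree with a short routine. The following is a minimal sketch under assumptions, not the MIP model or the matheuristics from the paper: the function names `path_lengths_from` and `evaluate_tree`, the edge-list representation, and the toy instance are all hypothetical.

```python
from collections import defaultdict


def path_lengths_from(source, adjacency):
    """Length of the unique tree path from source to every reachable node."""
    lengths = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbour, weight in adjacency[node]:
            if neighbour not in lengths:
                lengths[neighbour] = lengths[node] + weight
                stack.append(neighbour)
    return lengths


def evaluate_tree(edges, distances, tol=1e-9):
    """Return (total weight, feasible) for a candidate weighted tree.

    edges: list of (u, v, weight) tuples describing the tree.
    distances: dict mapping a leaf pair (i, j) to the given distance.
    The tree is feasible when each leaf-to-leaf path length is greater
    than or equal to the given distance for that pair.
    """
    adjacency = defaultdict(list)
    for u, v, w in edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))

    total = sum(w for _, _, w in edges)
    cache = {}
    feasible = True
    for (i, j), required in distances.items():
        if i not in cache:
            cache[i] = path_lengths_from(i, adjacency)
        if cache[i][j] < required - tol:
            feasible = False
    return total, feasible


# Toy instance: three leaves a, b, c joined through one internal node x.
edges = [("a", "x", 1.0), ("b", "x", 2.0), ("c", "x", 3.0)]
distances = {("a", "b"): 3.0, ("a", "c"): 4.0, ("b", "c"): 5.0}

print(evaluate_tree(edges, distances))
```

The objective of the problem is to minimise the first returned value over all trees for which the second value is true; a heuristic such as those discussed above would search over tree structures and edge weights, which this sketch does not attempt.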