8 research outputs found

    Network topology discovery

    PhD in Mathematics. Monitoring and evaluating the performance of a network is essential to detect and resolve network failures. To carry out this monitoring, it is essential to know the network topology, which is often unknown. Many of the techniques used to discover the topology require the cooperation of all network devices, which is almost impossible due to security issues and policies. It is therefore necessary to use techniques that passively collect, without the cooperation of intermediate devices, the information needed to infer the network topology. This can be done using tomography techniques, which rely on end-to-end measurements such as packet delays. In this thesis, we use integer linear programming theory and methods to solve the problem of inferring a network topology using only end-to-end measurements. We present two compact mixed integer linear programming (MILP) formulations for the problem. Computational results showed that as the number of end devices grows, the time needed by the two compact MILP formulations to solve the problem also grows rapidly. We therefore developed two heuristics based on the Feasibility Pump and Local Branching methods. Since packet delay measurements are subject to errors, we developed two robust approaches: one to control the maximum number of deviations and the other to reduce the risk of high cost. We also created a system that measures packet delays between computers on a network and displays the topology of that network.
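
For reference, Local Branching heuristics are usually built around the standard local branching constraint on binary variables. The block below states it in its textbook form, assuming a vector x of 0/1 decision variables indexed by a set B and an incumbent feasible solution x̄; the neighbourhood radius k is a hypothetical parameter, and the variant used in the thesis may differ.

```latex
\Delta(x,\bar{x}) \;=\; \sum_{j \in B:\ \bar{x}_j = 1} (1 - x_j) \;+\; \sum_{j \in B:\ \bar{x}_j = 0} x_j \;\le\; k
```

Δ counts the binary variables whose value differs from the incumbent, so adding the constraint to the MILP restricts the search to solutions within Hamming distance k of x̄, while the complementary branch Δ(x, x̄) ≥ k + 1 excludes that neighbourhood. In the standard Feasibility Pump, the same distance, measured to a rounded point rather than to an incumbent, is the objective minimised over the LP relaxation in each projection step.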

    Phylogenetic trees and the minimum evolution problem

    Master's in Mathematics and Applications. Phylogenetic trees make it possible to understand the evolutionary history of species and can assist in the development of vaccines and in the study of biodiversity. There are several criteria for selecting a phylogenetic tree among the many possible ones, one of them being minimum evolution. In this dissertation, we study various methods for constructing phylogenetic trees and various formulations for solving the minimum evolution problem. We also present an alternative formulation, which was implemented in XPRESS.
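
One widely used variant of the minimum evolution criterion is balanced minimum evolution (BME), whose tree length can be computed directly from the distance matrix with Pauplin's formula: L(T) = sum over leaf pairs i < j of 2^(1 - tau_ij) * d_ij, where tau_ij is the number of edges on the path between leaves i and j in the binary tree T. The abstract does not say which variant the dissertation uses, so the sketch below assumes BME; the helper names `edge_counts` and `bme_length` and the dict-based tree representation are hypothetical.

```python
from collections import deque


def edge_counts(tree, source):
    """Number of edges on the path from `source` to every node (BFS on an unweighted tree)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in tree[node]:
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


def bme_length(tree, leaves, dist):
    """Balanced minimum evolution length via Pauplin's formula.

    tree   -- dict: node -> list of adjacent nodes (binary tree, leaves have degree 1)
    leaves -- list of leaf labels
    dist   -- dict: frozenset({i, j}) -> distance between leaves i and j
    """
    total = 0.0
    for idx, i in enumerate(leaves):
        tau = edge_counts(tree, i)
        for j in leaves[idx + 1:]:
            # Each pair is weighted by 2^(1 - number of edges between i and j).
            total += 2.0 ** (1 - tau[j]) * dist[frozenset((i, j))]
    return total


if __name__ == "__main__":
    # Hypothetical quartet topologies over leaves a, b, c, d.
    true_tree = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
                 "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
    alt_tree = {"a": ["u"], "c": ["u"], "b": ["v"], "d": ["v"],
                "u": ["a", "c", "v"], "v": ["b", "d", "u"]}
    # Additive distances from branch lengths a=1, b=2, c=3, d=4, internal edge=5.
    d = {frozenset(p): x for p, x in [(("a", "b"), 3), (("a", "c"), 9), (("a", "d"), 10),
                                       (("b", "c"), 10), (("b", "d"), 11), (("c", "d"), 7)]}
    leaves = ["a", "b", "c", "d"]
    print(bme_length(true_tree, leaves, d))  # 15.0
    print(bme_length(alt_tree, leaves, d))   # 17.5
```

On the quartet example, the topology that generated the distances scores 15.0, which equals the sum of its branch lengths (1 + 2 + 3 + 4 + 5), while the alternative pairing scores 17.5, so the criterion prefers the generating tree. A minimum evolution formulation searches over topologies for the one that minimises this length.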

    Merging microarray studies to identify a common gene expression signature to several structural heart diseases

    Background: Heart disease is the leading cause of death worldwide. Knowing a gene expression signature in heart disease can lead to more efficient diagnoses and treatments that may prevent premature deaths. A large amount of microarray data is available in public repositories and can be used to identify differentially expressed genes. However, most microarray datasets contain only a small number of samples, so to obtain more reliable results several datasets have to be merged, which is a challenging task. Differentially expressed genes are commonly identified using statistical methods. Nonetheless, these methods rely on an arbitrary threshold to select the differentially expressed genes, and there is no consensus on the values that should be used. Results: Nine publicly available microarray datasets from studies of different heart diseases were merged to form a dataset of 689 samples and 8354 features. The adjusted p-value and fold change were then determined, and by combining a set of adjusted p-value cutoffs with a list of fold change thresholds, 12 sets of differentially expressed genes were obtained. The random forest algorithm was used to select the set of differentially expressed genes with the best accuracy in classifying samples from patients with heart diseases versus samples from patients with no heart condition. A set of 62 differentially expressed genes with a classification accuracy of approximately 95% was identified. Conclusions: We identified a gene expression signature common to different cardiac diseases and supported our findings by showing the involvement of these genes in the pathophysiology of the heart. The approach used in this study is suitable for identifying gene expression signatures and can be extended to other diseases.
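
The abstract does not state how the p-values were adjusted for multiple testing; a common choice in microarray studies is the Benjamini-Hochberg false discovery rate procedure. The sketch below assumes that procedure and is not necessarily the one used in the study; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order (rank 1 = smallest).
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
    # Roughly [0.02, 0.04, 0.04, 0.02], up to floating-point rounding.
    print(benjamini_hochberg(raw))
```

Each adjusted value is p * m / rank, replaced by the smallest such value among larger p-values, so a gene passes an adjusted cutoff q exactly when the step-up procedure at FDR level q would reject it.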

    NAPRT expression regulation mechanisms: novel functions predicted by a bioinformatics approach

    The nicotinate phosphoribosyltransferase (NAPRT) gene has gained relevance in research on cancer therapeutic strategies due to its main role as a NAD biosynthetic enzyme. NAD metabolism is an attractive target for the development of anti-cancer therapies, given the high energy requirements of proliferating cancer cells and NAD-dependent signaling. A few studies have shown that NAPRT expression varies across cancer types, making it imperative to assess NAPRT expression and functional status before applying therapeutic strategies that target NAD. In addition, the recent discovery of an extracellular form of NAPRT (eNAPRT) suggested the involvement of NAPRT in inflammation and signaling. However, the mechanisms regulating NAPRT gene expression have never been thoroughly addressed. In this study, we searched transcription factor (TF), RNA-binding protein (RBP) and microRNA (miRNA) databases for mechanisms regulating NAPRT gene expression. We identified several potential regulators of NAPRT transcriptional activation, downregulation and alternative splicing, and performed GO and expression analyses. The results of the functional analysis of TFs, RBPs and miRNAs suggest new, unexpected functions for the NAPRT gene in cell differentiation, development and neuronal biology.

    Methodology to identify a gene expression signature by merging microarray datasets

    A vast number of microarray datasets have been produced to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in understanding the therapeutic response to drugs. However, most of the available datasets contain only a small number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is to merge several microarray datasets into a single dataset, which is typically a challenging task. Gene expression signatures are usually determined with statistical methods or supervised machine learning algorithms. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets such as microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and supervised machine learning algorithms to select the gene expression signature. It was validated in two distinct research applications: one using heart failure microarray datasets and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature of 117 genes, with a classification accuracy of approximately 98%. For the second, we obtained a gene expression signature of 79 genes, with a classification accuracy of approximately 82%. The methodology was implemented in R and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
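
The two-stage idea of building several candidate gene sets from statistical thresholds and letting a supervised classifier pick one can be sketched as a grid over cutoffs followed by a scored selection. This is a minimal illustration, not the MicroGES implementation: the function names, the example cutoff values (3 adjusted p-value cutoffs by 4 absolute log2 fold change cutoffs, giving 12 sets) and the `score` callable, which stands in for the classification accuracy of a model such as a random forest, are all hypothetical.

```python
from itertools import product


def deg_sets(genes, p_cutoffs, fc_cutoffs):
    """Build one candidate DEG set per (adjusted p-value cutoff, |log2 fold change| cutoff).

    genes -- dict: gene id -> (adjusted p-value, log2 fold change)
    """
    sets = {}
    for p_cut, fc_cut in product(p_cutoffs, fc_cutoffs):
        sets[(p_cut, fc_cut)] = frozenset(
            gene for gene, (adj_p, lfc) in genes.items()
            if adj_p < p_cut and abs(lfc) >= fc_cut
        )
    return sets


def select_signature(sets, score):
    """Return the (cutoffs, gene set) pair with the highest score.

    score -- callable taking a gene set and returning, e.g., a classifier's
             cross-validated accuracy; ties keep the first pair in iteration order.
    """
    best = max(sets, key=lambda key: score(sets[key]))
    return best, sets[best]


if __name__ == "__main__":
    genes = {"G1": (0.0001, 2.5), "G2": (0.004, -1.2),
             "G3": (0.03, 0.8), "G4": (0.2, 3.0)}  # hypothetical values
    candidates = deg_sets(genes, (0.05, 0.01, 0.001), (0.5, 1.0, 1.5, 2.0))
    print(len(candidates))  # 12
    # Hypothetical stand-in score; in practice this would train and evaluate a classifier.
    toy_score = lambda s: -abs(len(s) - 2)
    print(select_signature(candidates, toy_score))
```

Keeping the scoring function as a parameter separates the threshold grid from the choice of learner, so the same selection loop works with any classifier and evaluation scheme.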

    GTO : A toolkit to unify pipelines in genomic and proteomic research

    Next-generation sequencing triggered the production of a massive volume of publicly available data and the development of new specialised tools. These tools are dispersed across different frameworks, making the management and analysis of the data a challenging task. Additionally, new targeted tools are needed, given the dynamics and specificities of the field. We present GTO, a comprehensive toolkit designed to unify pipelines in genomic and proteomic research, which combines specialised tools for analysis, simulation, compression, development, visualisation, and transformation of the data. This toolkit combines novel tools with a modular architecture, making it an excellent platform for experimental scientists, as well as a useful resource for teaching bioinformatics enquiry to students in life sciences. GTO is implemented in C and is available, under the MIT license, at https://bioinformatics.ua.pt/gto. (C) 2020 The Authors. Published by Elsevier B.V.

    Portuguese twitter dataset on COVID-19

    Over the last two years, the COVID-19 pandemic has affected hundreds of millions of people around the world. As in many crises, people turn to social media platforms such as Twitter to communicate and share information. Twitter datasets have been used in many research studies to extract valuable information, and several large COVID-19 Twitter datasets have been released over the last two years. However, none of these datasets contains only Portuguese Tweets, even though Portuguese is reported to be one of the top five languages used on Twitter. In this paper, we present the first large-scale Portuguese COVID-19 Twitter dataset. The dataset contains over 19 million Tweets spanning 2020 and 2021, allowing the entire pandemic to be analyzed. We also conducted a sentiment analysis on the dataset and correlated the spikes in Tweet count and sentiment scores with news articles and government announcements in Portugal and Brazil. The dataset is available at https://github.com/bioinformaticsua/Portuguese-Covid19-Dataset. This work was supported by FCT – Fundação para a Ciência e Tecnologia within project DSAIPA/AI/0088/2020.

    MIP model-based heuristics for the minimum weighted tree reconstruction problem

    We consider the Minimum Weighted Tree Reconstruction (MWTR) problem and two matheuristic methods to obtain optimal or near-optimal solutions: the Feasibility Pump heuristic and the Local Branching heuristic. These matheuristics are based on a Mixed Integer Programming (MIP) model used to find feasible solutions. We discuss the applicability and effectiveness of the matheuristics for obtaining solutions to the MWTR problem. The purpose of the MWTR problem is to find a minimum weighted tree connecting a set of leaves such that the length of the path between each pair of leaves is greater than or equal to a given distance between that pair of leaves. The Feasibility Pump matheuristic starts from the Linear Programming solution, iteratively fixes the values of some variables and solves the corresponding problem until a feasible solution is found. The Local Branching matheuristic, in turn, improves a feasible solution by means of a local search. Computational results on two different sets of instances, one from the phylogenetic area and another from the telecommunications area, show that these matheuristics are quite effective in finding feasible solutions and yield small gap values. Each matheuristic can be used independently; however, the best results are obtained when they are used together. For instances with up to 17 leaves, the feasible solution obtained by the Feasibility Pump heuristic is improved by the Local Branching heuristic. Notably, compared with existing model-based approaches, which solve instances with up to 15 leaves, the matheuristics increase the size of the instances that can be solved.
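
The MWTR definition above can be made concrete with a small checker that evaluates a candidate tree: its objective is the total edge weight, and it is feasible when every leaf-to-leaf path is at least as long as the required distance. This is a sketch for illustration only; it evaluates a given tree rather than searching for one, and the function names, the adjacency-list representation and the tolerance parameter `tol` are hypothetical.

```python
def from_edges(edges):
    """Adjacency lists from (u, v, weight) triples of an undirected tree."""
    tree = {}
    for u, v, w in edges:
        tree.setdefault(u, []).append((v, w))
        tree.setdefault(v, []).append((u, w))
    return tree


def path_lengths(tree, source):
    """Weighted path length from `source` to every node; paths in a tree are unique."""
    lengths = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbour, weight in tree[node]:
            if neighbour not in lengths:
                lengths[neighbour] = lengths[node] + weight
                stack.append(neighbour)
    return lengths


def tree_weight(tree):
    """Total edge weight; each undirected edge appears twice in the adjacency lists."""
    return sum(w for edges in tree.values() for _, w in edges) / 2


def violated_pairs(tree, leaves, d, tol=1e-9):
    """Leaf pairs whose tree path is shorter than the required distance d[{i, j}]."""
    violations = []
    for idx, i in enumerate(leaves):
        from_i = path_lengths(tree, i)
        for j in leaves[idx + 1:]:
            if from_i[j] + tol < d[frozenset((i, j))]:
                violations.append((i, j))
    return violations


if __name__ == "__main__":
    # Hypothetical candidate tree: leaves a, b on Steiner node u; c, d on v.
    tree = from_edges([("a", "u", 1), ("b", "u", 2), ("c", "v", 3),
                       ("d", "v", 4), ("u", "v", 5)])
    d = {frozenset(p): x for p, x in [(("a", "b"), 3), (("a", "c"), 9), (("a", "d"), 10),
                                       (("b", "c"), 10), (("b", "d"), 11), (("c", "d"), 7)]}
    print(tree_weight(tree))                                 # 15.0
    print(violated_pairs(tree, ["a", "b", "c", "d"], d))     # []
```

A heuristic such as those described above would only accept a candidate for which `violated_pairs` returns an empty list, and among such candidates it seeks the one with the smallest `tree_weight`.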