8 research outputs found
Incorporating Prior Knowledge in Deep Learning Models via Pathway Activity Autoencoders
Motivation: Despite advances in the computational analysis of high-throughput
molecular profiling assays (e.g. transcriptomics), a dichotomy exists between
methods that are simple and interpretable, and ones that are complex but with
a lower degree of interpretability. Furthermore, very few methods attempt to
translate interpretability into biologically relevant terms, such as
known pathway cascades. Biological pathways reflect signalling events or
metabolic conversions. Determining which pathways are implicated in
disease and incorporating such pathway data as prior knowledge may enhance
predictive modelling and personalised strategies for diagnosis, treatment and
prevention of disease.
Results: We propose a novel prior-knowledge-based deep auto-encoding
framework, PAAE, together with its accompanying generative variant, PAVAE, for
RNA-seq data in cancer. Through comprehensive comparisons among various
learning models, we show that, despite having access to a smaller set of
features, our PAAE and PAVAE models achieve better out-of-set reconstruction
results compared to common methodologies. Furthermore, we compare our models
with equivalent baselines on a classification task and show that they achieve
better results than models which have access to the full input gene set.
We also find that using vanilla variational frameworks might negatively
impact both reconstruction outputs and classification performance.
Finally, our work contributes comprehensive interpretability analyses of our
models, in addition to improving prognostication for translational medicine.
Learning centrality measures with graph neural networks (Aprendendo medidas de centralidade com redes grafo-neurais)
Centrality measures are important metrics used in Social Network Analysis. Such measures allow one to infer which entity in a network is more central (informally, more important) than another. Analyses based on centrality measures may help detect possible social influencers, security weak spots, etc. This dissertation investigates methods for learning how to predict these centrality measures using only the graph’s structure. More specifically, different ways of ranking the vertices according to their centrality measures are shown, as well as a brief analysis of how to approximate the centrality measures themselves. This is achieved by building on previous work that used neural networks to estimate centrality measures given other centrality measures. In this dissertation, we use the concept of a Graph Neural Network, a Deep Learning model that builds its computation graph according to the topology of a given input graph. The performance of these models is evaluated with different centrality measures and briefly compared with other machine learning models in the literature. Both the approximation and the ranking of the centrality measures are evaluated, and we show that the ranking is easier to compute. The transfer between the tasks of predicting these different centralities is analysed, and the advantages of each model are highlighted. The models are tested on graphs drawn from random distributions different from the ones they were trained on, on graphs larger than the ones they saw during training, and on real-world instances that are much larger than the largest training graphs. The internal embeddings of the vertices produced by the model are analysed through lower-dimensional projections, and conjectures are made about the behaviour seen in the experiments.
Finally, we identify possible future work highlighted by the experimental results presented here.
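The distinction the abstract draws between approximating centrality values and ranking vertices by them can be illustrated with a small sketch. It uses plain degree centrality on a toy graph rather than the dissertation's graph neural network models; the function names, the toy edge list and the "approximate" scores are all assumptions made up for this illustration. It only shows the intuition that a ranking can be exactly right even when the predicted values are not.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalised degree centrality for an undirected simple graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {v: 0.0 for v in neighbours}
    # Divide by the maximum possible degree, n - 1.
    return {v: len(nbrs) / (n - 1) for v, nbrs in neighbours.items()}


def rank_vertices(centrality):
    """Order vertices from most to least central; ties broken by name."""
    return sorted(centrality, key=lambda v: (-centrality[v], v))


# Hypothetical toy graph, not taken from the dissertation.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
scores = degree_centrality(edges)
ranking = rank_vertices(scores)

# Hypothetical imperfect predictions: the values are off, the order is not.
approx_scores = {"a": 0.9, "b": 0.5, "c": 0.4, "d": 0.1}
approx_ranking = rank_vertices(approx_scores)

print(ranking)
print(approx_ranking == ranking)
```

In this toy case the approximate scores differ from the exact ones for every vertex, yet both produce the same ordering, which is one informal reason a ranking task can be more forgiving than value regression.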
A Bayesian predictive analytics model for improving long range epidemic forecasting during an infection wave
Following the outbreak of the coronavirus epidemic in early 2020, municipalities, regional governments and policymakers worldwide had to plan their Non-Pharmaceutical Interventions (NPIs) amidst a scenario of great uncertainty. At this early stage of an epidemic, where no vaccine or medical treatment is in sight, algorithmic prediction can become a powerful tool to inform local policymaking. However, when we replicated one prominent epidemiological model to inform health authorities in a region in the south of Brazil, we found that this model relied too heavily on manually predetermined covariates and was too reactive to changes in data trends. Our four proposed models use data on both daily reported deaths and infections and take missing data (e.g., the under-reporting of cases) into account more explicitly, with two of the proposed versions also attempting to model the delay in test reporting. We simulated weekly forecasting of deaths for the period from 31/05/2020 until 31/01/2021, with the first week's data being used as a cold start for the algorithm, after which we use a lighter variant of the model for faster forecasting. Because our models are significantly more proactive in identifying trend changes, forecasting improved, especially in long-range predictions and after the peak of an infection wave, as the models were quicker to adapt to scenarios after these peaks in reported deaths. Assuming that cases were under-reported greatly benefited the model's stability, whereas modelling retroactively added data (due to the “hot” nature of the data used) had a negligible impact on performance.
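Clinical characteristics of Peripheral Arterial Occlusive Disease (PAOD): a systematic study (original title: Características clínicas da Doença Arterial Obstrutiva Periférica (DAOP): um estudo sistemático)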
Introduction: Peripheral arterial occlusive disease (PAOD) is a growing public
health problem because of its high prevalence and major impact on quality of life.
Associated with cardiovascular risk factors, PAOD originates from a chronic
inflammatory response that, together with an imbalance in the production of
vasodilator and vasoconstrictor substances, leads to progressive obstruction of
the arteries. The most common clinical manifestations include intermittent
claudication and pain at rest. Without adequate diagnosis and treatment, PAOD
can lead to serious complications, such as non-healing ulcers and amputation.
Recognising the clinical signs is therefore essential for early diagnosis and
appropriate management of the disease. Materials and methods: In June and July
2023, articles were selected from the following databases: SciELO, PubMed and
Google Scholar, using the descriptors: peripheral arterial disease, peripheral
arterial occlusive disease, clinical picture, signs and symptoms, clinical
characteristics. Results: PAOD is characterised by a range of clinical
manifestations. Intermittent claudication involves leg pain during exertion that
is relieved by rest. Rest pain, which intensifies at night, indicates severe
arterial obstruction and may signal the need for intervention. Changes in the
skin and muscles of the lower limbs reflect reduced blood flow. Absence of a
pulse in the extremities is a key indicator of the location of the arterial
obstruction. Slow-healing wounds or ulcers arise from reduced blood flow, while
gangrene (dead tissue) points to an advanced stage of ischaemia. Impotence may
be related to the extent of the vascular disease. Sensations of cold in the
extremities are linked to reduced blood flow and are a sign of a faster decline
in quality of life. Finally, early diagnosis in asymptomatic patients is
essential, combining monitoring, examinations and counselling on lifestyle
habits. Conclusion: PAOD is a disease of great public health relevance, both for
its high prevalence and for its impact on quality of life. Despite beneficial
technological advances in diagnosing the disease, it is extremely important to
understand the signs and symptoms in order to make an early diagnosis. In
addition, awareness of PAOD is essential, both among the general public and
among health professionals.
NEOTROPICAL CARNIVORES: a data set on carnivore distribution in the Neotropics
Mammalian carnivores are considered a key group in maintaining ecological health and can indicate potential ecological integrity in landscapes where they occur. Carnivores also hold high conservation value and their habitat requirements can guide management and conservation plans. The order Carnivora has 84 species from 8 families in the Neotropical region: Canidae; Felidae; Mephitidae; Mustelidae; Otariidae; Phocidae; Procyonidae; and Ursidae. Herein, we include published and unpublished data on native terrestrial Neotropical carnivores (Canidae; Felidae; Mephitidae; Mustelidae; Procyonidae; and Ursidae). NEOTROPICAL CARNIVORES is a publicly available data set that includes 99,605 data entries from 35,511 unique georeferenced coordinates. Detection/non-detection and quantitative data were obtained from 1818 to 2018 by researchers, governmental agencies, non-governmental organizations, and private consultants. Data were collected using several methods including camera trapping, museum collections, roadkill, line transect, and opportunistic records. Literature (peer-reviewed and grey literature) in Portuguese, Spanish and English was incorporated in this compilation. Most of the data set consists of detection data entries (n = 79,343; 79.7%) but it also includes non-detection data (n = 20,262; 20.3%). Of all entries, 43.3% also include count data (n = 43,151). The information available in NEOTROPICAL CARNIVORES will contribute to macroecological, ecological, and conservation questions in multiple spatio-temporal perspectives. As carnivores play key roles in trophic interactions, a better understanding of their distribution and habitat requirements is essential to establish conservation management plans and safeguard the future ecological health of Neotropical ecosystems. Our data paper, combined with other large-scale data sets, has great potential to clarify species distribution and related ecological processes within the Neotropics.
There are no copyright restrictions and no restrictions on using data from this data paper, as long as the data paper is cited as the source of the information used. We also request that users inform us of how they intend to use the data.
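The entry counts and percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures; the variable and function names are illustrative and do not come from the data paper.

```python
# Figures quoted in the abstract.
detections = 79_343
non_detections = 20_262
with_counts = 43_151
reported_total = 99_605


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Detection and non-detection entries should add up to the reported total.
total = detections + non_detections

detection_pct = pct(detections, total)
non_detection_pct = pct(non_detections, total)
count_pct = pct(with_counts, total)

print(total == reported_total)
print(detection_pct, non_detection_pct, count_pct)
```

The 43.3% share of entries with count data is reproduced when it is taken relative to all 99,605 entries.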
NEOTROPICAL ALIEN MAMMALS: a data set of occurrence and abundance of alien mammals in the Neotropics
Biological invasion is one of the main threats to native biodiversity. For a species to become invasive, it must be voluntarily or involuntarily introduced by humans into a nonnative habitat. Mammals were among the first taxa to be introduced worldwide for game, meat, and labor, yet the number of species introduced in the Neotropics remains unknown. In this data set, we make available occurrence and abundance data on mammal species that (1) transposed a geographical barrier and (2) were voluntarily or involuntarily introduced by humans into the Neotropics. Our data set is composed of 73,738 historical and current georeferenced records on alien mammal species, of which around 96% correspond to occurrence data on 77 species belonging to eight orders and 26 families. Data cover 26 continental countries in the Neotropics, ranging from Mexico and its frontier regions (southern Florida and coastal-central Florida in the southeast United States) to Argentina, Paraguay, Chile, and Uruguay, and the 13 countries of the Caribbean islands. Our data set also includes neotropical species (e.g., Callithrix sp., Myocastor coypus, Nasua nasua) considered alien in particular areas of the Neotropics. The most numerous species in terms of records are Bos sp. (n = 37,782), Sus scrofa (n = 6,730), and Canis familiaris (n = 10,084); 17 species were represented by only one record (e.g., Syncerus caffer, Cervus timorensis, Cervus unicolor, Canis latrans). Primates have the highest number of species in the data set (n = 20 species), partly because of uncertainties regarding taxonomic identification of the genus Callithrix, which includes the species Callithrix aurita, Callithrix flaviceps, Callithrix geoffroyi, Callithrix jacchus, Callithrix kuhlii, Callithrix penicillata, and their hybrids. This unique data set will be a valuable source of information for invasion risk assessments, biodiversity redistribution and conservation-related research. There are no copyright restrictions.
Please cite this data paper when using the data in publications. We also request that researchers and teachers inform us of how they are using the data.