Search CORE

4 research outputs found

Portal Web para enriquecimento de informação Genómica e Proteómica (Web Portal for Enriching Genomic and Proteomic Information)

Author: David Vanhuysse
Publication date: 09/02/2017

Today the amount of information in the field of molecular biology is enormous, and Genomics and Proteomics are no exception. Countless databases accessible on the Web hold information that is important to research in these areas. This abundance is good, because there are many websites to search and there is almost always information on whatever we are looking for. It is also bad, because the information is not simple to access, much of it may be redundant, and the identifiers of information items (genes or proteins, for example) sometimes differ for the same item across websites.
Several websites provide information relevant to genomics and proteomics research. Each has its own format, and the way data are accessed varies from site to site. Many of these sites offer APIs (Application Programming Interfaces), which let software applications access the information, while others store all their information in databases and display the content as HTML pages. These varied ways of consulting genomic and proteomic information make access by software applications difficult, which in turn hampers the work of biologists.
Moreover, researchers often have to study a large number of genes (resulting, for example, from sequencing a genome) or of proteins (for example, to predict the positions in the linear amino-acid sequence where helices, a kind of secondary structure, will form). In cases involving large numbers of genes or proteins, automatic methods are extremely valuable, not only because they produce results quickly but because they can draw on many kinds of information collected from the Web to help the expert group a large number of genes or explain where secondary structures appear in proteins.
To this end, a Web portal will be developed that eases this information-gathering task while grouping genes and proteins consistently, without multiple identifiers for the same item. The site will let researchers add new repositories, whether or not those repositories provide an API.

Repositório Aberto da Universidade do Porto
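
The portal's goal of grouping genes and proteins without multiple identifiers for the same item amounts to merging the identifiers that different sites use for one entity. The abstract does not say how the portal does this; the following is a minimal Python sketch, assuming cross-references between sites are already available as identifier pairs. It uses a union-find (disjoint-set) structure; the class name `IdentifierResolver` and all site and identifier strings are hypothetical.

```python
from collections import defaultdict

# An identifier is qualified by the (hypothetical) site that issued it, e.g. ("siteA", "gene_001").
Key = tuple[str, str]


class IdentifierResolver:
    """Union-find (disjoint-set) over site-qualified identifiers that name the same item."""

    def __init__(self) -> None:
        self._parent: dict[Key, Key] = {}

    def _find(self, key: Key) -> Key:
        """Return the representative of key's group, registering key if it is new."""
        self._parent.setdefault(key, key)
        while self._parent[key] != key:
            # Path halving: point key at its grandparent to keep trees shallow.
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def link(self, a: Key, b: Key) -> None:
        """Record a cross-reference stating that a and b denote the same gene or protein."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def groups(self) -> list[list[Key]]:
        """Return one sorted list of identifiers per distinct item."""
        buckets: dict[Key, list[Key]] = defaultdict(list)
        for key in list(self._parent):
            buckets[self._find(key)].append(key)
        return sorted(sorted(members) for members in buckets.values())


if __name__ == "__main__":
    resolver = IdentifierResolver()
    resolver.link(("siteA", "gene_001"), ("siteB", "G-17"))
    resolver.link(("siteB", "G-17"), ("siteC", "X9"))  # never linked to siteA directly
    resolver.link(("siteA", "gene_002"), ("siteC", "X12"))
    for group in resolver.groups():
        print(group)
```

Union-find makes the merge transitive: if site A's identifier matches site B's and site B's matches site C's, all three end up in one group even though A and C were never linked directly.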

Explainable methods for knowledge graph refinement and exploration via symbolic reasoning

Author: Gad-Elrab Mohamed Hassan Mohamed
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2021

Knowledge Graphs (KGs) have applications in many domains, such as finance, manufacturing, and healthcare. While recent efforts have created large KGs, their content is far from complete and sometimes includes invalid statements. It is therefore crucial to refine constructed KGs, improving their coverage and accuracy through KG completion and KG validation. It is also vital to provide human-comprehensible explanations for such refinements, so that humans can trust the quality of the KG. Enabling KG exploration through search and browsing is likewise essential for users to understand a KG's value and limitations for downstream applications. However, the large size of KGs makes exploration very challenging. While the type taxonomy of a KG is a useful asset here, it remains insufficient for deep exploration. In this dissertation we tackle these challenges of KG refinement and KG exploration by combining logical reasoning over the KG with other techniques, such as KG embedding models and text mining. Through this combination, we introduce methods that produce human-understandable output. Concretely, we introduce methods that tackle KG incompleteness by learning exception-aware rules over the existing KG; the learned rules are then used to infer missing links in the KG accurately. Furthermore, we propose a framework for constructing human-comprehensible explanations for candidate facts from both the KG and text; the extracted explanations are used to ensure the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations.
In particular, the dissertation makes the following contributions:
• For KG completion, ExRuL, a method for revising Horn rules by adding exception conditions to the rule bodies. The revised rules can infer new facts and thus close gaps in the KG. Experiments on large KGs show that the method substantially reduces errors in the inferred facts and provides user-friendly explanations.
• RuLES, a rule-learning method based on probabilistic representations of missing facts. It iteratively extends the rules induced from a KG by combining neural KG embeddings with information from text corpora, and it uses new rule-quality metrics during rule generation. Experiments show that RuLES substantially improves the quality of the learned rules and of their predictions.
• For KG validation, ExFaKT, a framework for constructing explanations for candidate facts. Using rules, it transforms candidates into a set of statements that are easier to find and to validate or refute. Its output is a set of semantic evidence for candidate facts, extracted from text corpora and the KG. Experiments show that these transformations markedly improve the yield and quality of the discovered explanations. The generated explanations support both manual KG validation by curators and automatic validation.
• For KG exploration, ExCut, a method for computing informative entity clusters with explanations using KG embeddings and automatically induced rules. A cluster explanation is a combination of relations between entities that identifies the cluster. ExCut improves cluster quality and cluster explainability together by iteratively interleaving embedding learning and rule learning. Experiments show that ExCut computes high-quality clusters and that the cluster explanations are informative for users.

Universaar
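
ExRuL's idea of adding an exception condition to the body of a Horn rule can be illustrated on a toy graph. The Python sketch below only applies an already-revised rule of the form head(X, Z) <- body1(X, Y), body2(Y, Z), not exception(X), treating the negated atom as negation as failure (the exception blocks X only if it is stated in the graph). It does not learn rules or select exceptions, which is what ExRuL itself does; the `ChainRule` and `infer` names, the predicates, and the entities are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class ChainRule:
    """Hypothetical rule shape: head(X, Z) <- body1(X, Y), body2(Y, Z), not exc_pred(X, exc_obj)."""
    head: str
    body1: str
    body2: str
    exception: Optional[tuple[str, str]] = None  # (predicate, object) that blocks X


def infer(kg: set[Triple], rule: ChainRule) -> set[Triple]:
    """Return facts predicted by the rule that are not already in the KG."""
    # Index body2 facts by subject so the join on Y is a dictionary lookup.
    by_subject: dict[str, list[str]] = {}
    for s, p, o in kg:
        if p == rule.body2:
            by_subject.setdefault(s, []).append(o)

    new_facts: set[Triple] = set()
    for x, p, y in kg:
        if p != rule.body1:
            continue
        # Negation as failure: X is blocked only if the exception fact is present.
        if rule.exception and (x, *rule.exception) in kg:
            continue
        for z in by_subject.get(y, []):
            fact = (x, rule.head, z)
            if fact not in kg:
                new_facts.add(fact)
    return new_facts


if __name__ == "__main__":
    toy_kg = {
        ("alice", "isMarriedTo", "bob"),
        ("bob", "livesIn", "berlin"),
        ("carol", "isMarriedTo", "dave"),
        ("dave", "livesIn", "paris"),
        ("carol", "isA", "researcher"),
    }
    rule = ChainRule("livesIn", "isMarriedTo", "livesIn", exception=("isA", "researcher"))
    print(infer(toy_kg, rule))
```

With the exception, only `alice` receives a new `livesIn` fact; dropping the exception would also predict `("carol", "livesIn", "paris")`, which is the kind of inference an exception is meant to suppress when it is known to be unreliable.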

Análise de imagens médicas com recurso a metodologias de deep learning (Medical Image Analysis Using Deep Learning Methodologies)

Author: Castro Simão Pedro Pereira
Publication date: 01/01/2021

Master's degree in Electronic and Computer Engineering (Mestrado em Engenharia Eletrónica e Informática). Public examination held on 26 July 2021.
Medical imaging encompasses a set of processes and techniques for creating visual representations of the interior of the body. Evaluating a medical image requires careful analysis and an understanding of the image's properties and details, including the acquisition and experimental conditions and the characteristics of the biological system. Medical imaging enables the investigation and early diagnosis of many pathologies, so a knowledge-based approach to analyzing and interpreting such images is imperative. Innovation in image-based diagnosis keeps increasing, and the scientific community has been investigating both technical advances that produce higher-resolution images and analysis methods that extract new information from them. One prominent research area is the application of artificial intelligence to medical imaging, which emulates the physician's diagnostic reasoning and opens new opportunities for using medical images as a starting point for diagnosis.
This work investigates and implements machine learning methods, a branch of artificial intelligence, to classify and segment medical images. To that end, an extensive literature review of indexed specialty journals was carried out. To test different approaches, two datasets were selected for classification, MedMNIST and MedNIST, comprising 454,591 and 58,954 medical images respectively, and two for segmentation: BBBC038, with 735 medical images, and ICPR2012, with 50 H&E images. The work therefore has two main parts. The first focuses on classifying medical images: several architectures were implemented and their performance compared using appropriate metrics, which first required pre-processing the data (the medical images). The second investigates image segmentation for identifying cell nuclei; again, several architectures were built and their performance compared using the most relevant metrics. Additionally, segmentation and detection were investigated for the specific task of identifying nuclei undergoing mitosis. For both tasks, the results were more promising than those previously reported for the studied datasets. Finally, a web application was developed for testing the models and visualizing the results.
In brief, the results of this study demonstrate the potential of machine learning methods as an important tool for automating tasks in medical imaging, with contributions that improve the classification of certain pathologies.

Repositório das Universidades Lusíada de Lisboa (RUL)
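
The abstract says segmentation architectures were compared using the most relevant metrics without naming them. Overlap metrics such as the Dice coefficient, 2|A ∩ B| / (|A| + |B|), and intersection over union (IoU), |A ∩ B| / |A ∪ B|, are standard for binary masks such as nucleus masks. The following NumPy sketch computes both; it is not taken from the thesis, and the function names and the `eps` smoothing term are assumptions.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A & B| / (|A| + |B|) for binary masks A (prediction) and B (ground truth)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps keeps the ratio defined when both masks are empty (the result is then 1.0).
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over union = |A & B| / |A | B| for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))


if __name__ == "__main__":
    predicted = np.array([[1, 1, 0], [0, 1, 0]])
    ground_truth = np.array([[1, 0, 0], [0, 1, 1]])
    print(f"Dice: {dice_coefficient(predicted, ground_truth):.3f}")  # 2*2 / (3+3)
    print(f"IoU:  {iou(predicted, ground_truth):.3f}")               # 2 / (3+3-2)
```

Dice weights the overlap twice relative to IoU, so for the same pair of masks Dice is always at least as large as IoU; reporting both is common when comparing nucleus-segmentation models.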