104 research outputs found
Entity Linking for the Biomedical Domain
Entity linking is the process of detecting mentions of different concepts in text documents and linking them to canonical entities in a target lexicon.
However, one of the biggest issues in entity linking is the ambiguity of entity names. Many text mining tools have yet to address this ambiguity, since different names can refer to the same entity and the same name can refer to different entities. For instance, search engines that rely on heuristic string matching frequently return irrelevant results because they are unable to resolve ambiguity satisfactorily.
Thus, resolving named entity ambiguity is a crucial step in entity linking. To solve the problem of ambiguity, this work proposes a heuristic method for entity recognition and entity linking over the biomedical knowledge graph, based on the semantic similarity of entities in the knowledge graph. Named entity recognition (NER), relation extraction (RE), and relationship linking (RL) make up a conventional entity linking (EL) system pipeline. We use accuracy as the evaluation metric in this thesis.
Therefore, for each identified relation or entity, the solution comprises identifying the correct one and matching it to its corresponding unique CUI in the knowledge base. Because KBs contain a substantial number of relations and entities, each with only one natural language label, the second phase is directly dependent on the accuracy of the first. The framework developed in this thesis enables the extraction of relations and entities from text and their mapping to the associated CUI in the UMLS knowledge base. This approach derives a new representation of the knowledge base that lends itself to easy comparison. Our idea for selecting the best candidates is to build a graph of relations and use shortest-path distance within a ranking approach.
We test our proposed approach on two well-known benchmarks in the biomedical field and show that our method outperforms the search engine's top result, improving accuracy by around 4%. In general, when it comes to fine-tuning, we notice that entity linking has subjective characteristics, and modifications may be required depending on the task at hand. The performance of the framework is evaluated based on a Python implementation.
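The candidate-selection idea in this abstract (a graph of relations plus shortest-path distance used for ranking) can be illustrated with a minimal sketch. This is not the thesis implementation: the graph, the identifiers such as `CUI_COMMON_COLD` (hypothetical placeholders, not real UMLS CUIs), the function names, and the penalty for unreachable nodes are all assumptions made for illustration.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Return the hop count between two nodes via breadth-first search, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # target is not reachable from source


def rank_candidates(graph, context_entities, candidates):
    """Order candidates by total hop distance to the already-linked context entities."""
    def score(candidate):
        total = 0
        for ctx in context_entities:
            d = shortest_path_length(graph, candidate, ctx)
            # Assumption: unreachable pairs get a penalty equal to the graph size.
            total += d if d is not None else len(graph)
        return total
    return sorted(candidates, key=score)


# Toy undirected graph as an adjacency list; all identifiers are hypothetical.
graph = {
    "CUI_COMMON_COLD": ["CUI_VIRAL_INFECTION"],
    "CUI_VIRAL_INFECTION": ["CUI_COMMON_COLD", "CUI_FEVER"],
    "CUI_FEVER": ["CUI_VIRAL_INFECTION"],
    "CUI_COLD_TEMPERATURE": ["CUI_WEATHER"],
    "CUI_WEATHER": ["CUI_COLD_TEMPERATURE"],
}

# The mention "cold" has two candidates; "fever" is already linked in the context.
ranking = rank_candidates(
    graph, ["CUI_FEVER"], ["CUI_COLD_TEMPERATURE", "CUI_COMMON_COLD"]
)
print(ranking)
```

In this toy case the disease sense ranks first because it lies two hops from the context entity, while the temperature sense is unreachable and receives the penalty.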
Medical image retrieval for augmenting diagnostic radiology
Even though medical imaging is ubiquitous in clinical diagnosis, interpreting the images remains challenging for radiologists. Many factors make this interpretation task difficult, one of which is that medical images sometimes present clues that are subtle yet crucial for diagnosis. Even worse, similar clues can indicate multiple diseases, making it challenging to reach a definitive diagnosis. To help radiologists interpret medical images quickly and accurately, there is a need for a tool that can augment their diagnostic procedures and increase efficiency in their daily workflow. A general-purpose medical image retrieval system can be such a tool, as it allows them to search for and retrieve similar, already diagnosed cases for comparative analyses that complement their diagnostic decisions. In this thesis, we contribute to developing such a system by proposing approaches to be integrated as modules of a single system, enabling it to handle the various information needs of radiologists and thus augment their diagnostic processes during the interpretation of medical images.
We have mainly studied the following retrieval approaches to handle radiologists’ different information needs: i) Retrieval Based on Contents, ii) Retrieval Based on Contents, Patients’ Demographics, and Disease Predictions, and iii) Retrieval Based on Contents and Radiologists’ Text Descriptions. For the first study, we aimed to find an effective feature representation method to distinguish medical images by their semantics and modalities. To do that, we experimented with different representation techniques based on handcrafted methods (mainly texture features) and deep learning (deep features). Based on the experimental results, we propose an effective feature representation approach and deep learning architectures for learning and extracting medical image contents. For the second study, we present a multi-faceted method that complements image contents with patients’ demographics and deep learning-based disease predictions, so that it can accurately identify similar cases in the clinical context the radiologist is interested in.
For the last study, we propose a guided search method that integrates an image with a radiologist’s text description to guide the retrieval process. This method guarantees that the retrieved images are suitable for the comparative analysis used to confirm or rule out initial diagnoses (the differential diagnosis procedure). Furthermore, our method is based on a deep metric learning technique and performs better than traditional content-based approaches that rely only on image features and thus sometimes retrieve insignificant, seemingly random images.
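As a rough illustration of how retrieval typically works once a metric-learning model has produced embeddings, stored cases can be ranked by their distance to the query in the embedding space. The sketch below assumes embeddings are already computed; the case identifiers, the two-dimensional vectors, and the choice of Euclidean distance are hypothetical and are not taken from the thesis.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_embedding, case_embeddings, k=2):
    """Return the ids of the k stored cases closest to the query embedding."""
    ranked = sorted(
        case_embeddings.items(),
        key=lambda item: euclidean(query_embedding, item[1]),
    )
    return [case_id for case_id, _ in ranked[:k]]


# Hypothetical precomputed embeddings for previously diagnosed cases.
case_embeddings = {
    "case_a": [0.9, 0.1],
    "case_b": [0.1, 0.9],
    "case_c": [0.8, 0.3],
}

# Hypothetical embedding of the query (for example, an image and its description).
top_cases = retrieve([1.0, 0.0], case_embeddings, k=2)
print(top_cases)
```

A real system would use learned, high-dimensional embeddings and an approximate nearest-neighbor index; the ranking principle is the same.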
The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives
The Archive Query Log (AQL) is a previously unused, comprehensive query log
collected at the Internet Archive over the last 25 years. Its first version
includes 356 million queries, 166 million search result pages, and 1.7 billion
search results across 550 search providers. Although many query logs have been
studied in the literature, the search providers that own them generally do not
publish their logs to protect user privacy and vital business data. Of the few
query logs publicly available, none combines size, scope, and diversity. The
AQL is the first to do so, enabling research on new retrieval models and
(diachronic) search engine analyses. Provided in a privacy-preserving manner,
it promotes open research as well as more transparency and accountability in
the search industry.
Comment: SIGIR 2023 resource paper, 13 pages
Bibliographic Control in the Digital Ecosystem
With the contributions of international experts, the book aims to explore the new boundaries of universal bibliographic control. Bibliographic control is radically changing because the bibliographic universe is radically changing: resources, agents, technologies, standards and practices. Among the main topics addressed: library cooperation networks; legal deposit; national bibliographies; new tools and standards (IFLA LRM, RDA, BIBFRAME); authority control and new alliances (Wikidata, Wikibase, Identifiers); new ways of indexing resources (artificial intelligence); institutional repositories; new book supply chain; “discoverability” in the IIIF digital ecosystem; role of thesauri and ontologies in the digital ecosystem; bibliographic control and search engines.
Advances in Information Security and Privacy
With the recent pandemic emergency, many people are spending their days working remotely and have increased their use of digital resources for both work and entertainment. As a result, the amount of digital information handled online has increased dramatically, and we can observe a significant increase in the number of attacks, breaches, and hacks. This Special Issue aims to establish the state of the art in protecting information by mitigating information risks. This objective is reached by presenting both surveys on specific topics and original approaches and solutions to specific problems. In total, 16 papers have been published in this Special Issue.
Citizen Science and Geospatial Capacity Building
This book is a collection of the articles published in the Special Issue of ISPRS International Journal of Geo-Information on “Citizen Science and Geospatial Capacity Building”. The articles cover a wide range of topics regarding the applications of citizen science from a geospatial technology perspective. Several applications covered in the book show the importance of Citizen Science (CitSci) and volunteered geographic information (VGI) in various stages of geodata collection, processing, analysis and visualization, and demonstrate their capabilities. Particular emphasis is given to various problems encountered in CitSci and VGI projects with a geospatial aspect, such as platform, tool and interface design, ontology development, spatial analysis and data quality assessment. The book also points out the needs and future research directions in these subjects, such as: (a) data quality issues, especially in the light of big data; (b) ontology studies for geospatial data suited for diverse user backgrounds, data integration, and sharing; (c) development of machine learning and artificial intelligence based online tools for pattern recognition and object identification using existing repositories of CitSci and VGI projects; and (d) open science and open data practices for increasing efficiency, decreasing redundancy, and acknowledging all stakeholders.
Bench-Ranking: A Prescriptive Analysis Method for Queries over Large Knowledge Graphs
Leveraging relational Big Data (BD) processing frameworks to process large knowledge graphs has generated great interest in optimizing query performance. Modern BD systems are, however, complicated data systems whose configurations notably affect performance. Benchmarking different frameworks and configurations provides the community with best practices for better performance. However, most of these benchmarking efforts can be classified only as descriptive and diagnostic analytics. Moreover, there is no standard for comparing these benchmarks based on quantitative ranking techniques. Furthermore, designing mature pipelines for processing big graphs entails additional design decisions that emerge with the non-native (relational) graph processing paradigm. These design decisions cannot be made automatically, e.g., the choice of the relational schema, partitioning technique, and storage formats. In this thesis, we discuss how our work fills this timely research gap. In particular, we first show the impact of the trade-offs among these design decisions on the replicability of BD systems' performance when querying large knowledge graphs. We also show the limitations of descriptive and diagnostic analyses of BD frameworks' performance for querying large graphs. We then investigate how to enable prescriptive analytics via ranking functions and multi-dimensional optimization techniques (called "Bench-Ranking"). This approach abstracts away the complexity of descriptive performance analysis, guiding the practitioner directly to actionable, informed decisions.
https://www.ester.ee/record=b553332
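One common way to combine several ranking dimensions is to keep only the configurations that no other configuration beats in every dimension (a Pareto front). The sketch below illustrates that general idea under stated assumptions; it is not claimed to be the Bench-Ranking method itself, and the configuration names, the two dimensions, and the rank values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and strictly better in one.

    Lower values are better (for example, rank positions or runtimes).
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(configs):
    """Return the names of configurations not dominated by any other configuration."""
    front = []
    for name, scores in configs.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in configs.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical rank positions per dimension, e.g. (schema rank, partitioning rank).
configs = {
    "config_1": (1, 3),
    "config_2": (2, 2),
    "config_3": (3, 3),
    "config_4": (3, 1),
}

best = pareto_front(configs)
print(best)
```

Here `config_3` is dropped because `config_2` is at least as good in both dimensions and strictly better in each; the remaining three represent different trade-offs that a practitioner would still need to choose among.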