
    Entity Linking for the Biomedical Domain

    Entity linking is the process of detecting mentions of concepts in text documents and linking them to canonical entities in a target lexicon. One of the biggest issues in entity linking is ambiguity in entity names: different names can represent the same thing, and each mention could indicate a different thing, an issue that many text mining tools have yet to address. For instance, search engines that rely on heuristic string matching frequently return irrelevant results because they cannot satisfactorily resolve ambiguity. Resolving named entity ambiguity is therefore a crucial step in entity linking. To address it, this work proposes a heuristic method for entity recognition and entity linking over a biomedical knowledge graph that takes into account the semantic similarity of entities in the graph. A conventional entity linking (EL) pipeline consists of named entity recognition (NER), relation extraction (RE), and relationship linking (RL). We use accuracy as the evaluation metric in this thesis. For each identified relation or entity, the solution comprises identifying the correct one and matching it to its corresponding unique CUI in the knowledge base. Because KBs contain a substantial number of relations and entities, each with only one natural language label, the second phase depends directly on the accuracy of the first. The framework developed in this thesis enables the extraction of relations and entities from text and their mapping to the associated CUI in the UMLS knowledge base. The approach derives a new representation of the knowledge base that lends itself to easy comparison. To select the best candidates, we build a graph of relations and rank candidates by shortest-path distance. 
We test the proposed approach on two well-known biomedical benchmarks and show that it outperforms the search engine's top result, improving accuracy by around 4%. In general, when it comes to fine-tuning, we observe that entity linking has subjective characteristics, and modifications may be required depending on the task at hand. The performance of the framework is evaluated using a Python implementation.
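
The candidate selection idea in this abstract (a graph of relations plus shortest-path ranking) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy graph, the identifiers (`ID_...`), the function names, and the penalty for unreachable pairs are all hypothetical, and plain breadth-first search stands in for whatever distance computation the authors use.

```python
from collections import deque


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def rank_candidates(graph, candidates, context_entities):
    """Order candidates by total graph distance to already-linked context entities."""
    penalty = len(graph) + 1  # assumed cost for an unreachable pair
    scored = []
    for cand in candidates:
        total = 0
        for ctx in context_entities:
            d = shortest_path_length(graph, cand, ctx)
            total += penalty if d is None else d
        scored.append((total, cand))
    # Smaller total distance ranks first; ties fall back to identifier order.
    return [cand for _, cand in sorted(scored)]


# Toy undirected relation graph with hypothetical identifiers (not real CUIs).
toy_graph = {
    "ID_cold_disease": ["ID_virus", "ID_fever"],
    "ID_cold_temperature": ["ID_weather"],
    "ID_virus": ["ID_cold_disease", "ID_fever"],
    "ID_fever": ["ID_cold_disease", "ID_virus"],
    "ID_weather": ["ID_cold_temperature"],
}

# The ambiguous mention "cold" has two candidates; the context mentions fever and virus.
ranking = rank_candidates(
    toy_graph,
    ["ID_cold_temperature", "ID_cold_disease"],
    ["ID_fever", "ID_virus"],
)
```

In this toy setting the disease sense is closer to the context entities than the temperature sense, so it is ranked first.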

    Towards an Unsupervised Bayesian Network Pipeline for Explainable Prediction, Decision Making and Discovery

    An unsupervised learning pipeline for discrete Bayesian networks is proposed to facilitate prediction, decision making, discovery of patterns, and transparency in challenging real-world AI applications, and to contend with data limitations. We explore methods for discretizing data, and notably apply the pipeline to prediction and prevention of preterm birth.
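
The abstract does not say which discretization methods are explored. As a generic illustration only, the sketch below shows equal-width binning, one common way to turn a continuous variable into the discrete states a discrete Bayesian network needs. The function name and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Arbitrary illustrative measurements, not data from the paper.
states = equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0], 4)
```

Equal-width bins are simple but sensitive to outliers; equal-frequency (quantile) bins are a frequent alternative when the data are skewed.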

    Design of an E-learning system using semantic information and cloud computing technologies

    Humanity is currently suffering from many difficult problems that threaten the life and survival of the human race. It is very easy for all mankind to be affected, directly or indirectly, by these problems. Education is a key solution for most of them. In our thesis we tried to make use of current technologies to enhance and ease the learning process. We have designed an e-learning system based on semantic information and cloud computing, in addition to many other technologies that contribute to improving the educational process and raising the level of students. The design was built after much research on useful technology, its types, and examples of actual systems previously discussed by other researchers. In addition to the proposed design, an algorithm was implemented to identify topics found in large textual educational resources. It was tested and proved to be efficient compared with other methods. The algorithm can extract the main topics from textual learning resources, link related resources, and generate interactive dynamic knowledge graphs. It accomplishes these tasks accurately and efficiently, even for larger books. We used Wikipedia Miner, TextRank, and Gensim within our algorithm. Our algorithm's accuracy was evaluated against Gensim and was found to be substantially higher. 
Augmenting the system design with the implemented algorithm will produce many useful services for improving the learning process, such as: automatically identifying the main topics of large textual learning resources and connecting them to other well-defined concepts from Wikipedia, enriching current learning resources with semantic information from external sources, providing students with browsable, dynamic, interactive knowledge graphs, and making use of learning groups to encourage students to share their learning experiences and feedback with other learners.
Doctoral Programme in Telematic Engineering, Universidad Carlos III de Madrid. President: Luis Sánchez Fernández. Secretary: Luis de la Fuente Valentín. Member: Norberto Fernández Garcí

    Northeastern Illinois University, Academic Catalog 2023-2024

    https://neiudc.neiu.edu/catalogs/1064/thumbnail.jp

    Longest Path and Cycle Transversal and Gallai Families

    A longest path transversal in a graph G is a set of vertices S of G such that every longest path in G has a vertex in S. The longest path transversal number of a graph G is the size of a smallest longest path transversal in G and is denoted lpt(G). Similarly, a longest cycle transversal is a set of vertices S in a graph G such that every longest cycle in G has a vertex in S. The longest cycle transversal number of a graph G is the size of a smallest longest cycle transversal in G and is denoted lct(G). A Gallai family is a family of graphs whose connected members have longest path transversal number 1. In this paper we find several Gallai families and give upper bounds on lpt(G) and lct(G) for general graphs and chordal graphs in terms of |V(G)|.
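
To make the definition of lpt(G) concrete, the following is a brute-force sketch that follows the definition literally: enumerate all longest simple paths, then search for a smallest vertex set meeting every one of them. It is exponential in the number of vertices and is meant only for tiny examples; the function names and the adjacency-dictionary representation are assumptions, not taken from the paper.

```python
from itertools import combinations


def longest_paths(adj):
    """Return the vertex sets of all longest simple paths, found by exhaustive DFS."""
    best_len = 0
    best = set()

    def extend(path, visited):
        nonlocal best_len, best
        if len(path) > best_len:
            best_len = len(path)
            best = {frozenset(path)}
        elif len(path) == best_len:
            best.add(frozenset(path))
        for nxt in adj[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in adj:
        extend([start], {start})
    return best


def lpt(adj):
    """Size of a smallest vertex set that intersects every longest path."""
    paths = longest_paths(adj)
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            chosen = set(subset)
            if all(chosen & p for p in paths):
                return k
    return 0  # empty graph


# A path on four vertices: its only longest path uses every vertex.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Two disjoint edges: a disconnected graph with two vertex-disjoint longest paths.
two_edges = {0: [1], 1: [0], 2: [3], 3: [2]}
```

Only whether a set meets a path matters, so paths are stored as vertex sets. The disconnected example shows why the Gallai family definition restricts attention to connected members.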

    Discrimination in Insurance Pricing

    Discrimination is an ongoing problem in the insurance industry that persists, regardless of intent, when the insurer blinds the pricing process from socially controversial or legally prohibited input. In this thesis, we contextualize the problem in property and casualty insurance, considering the prevailing legislation in the United States and the European Union. In Chapter 1 we introduce the problem of discrimination in insurance, and present contemporary legal cases in the United States, along with recent pricing evidence that supports the hypothesis of discrimination in insurance pricing. We contrast the strengths and weaknesses of some anti-discrimination methodologies for a continuous response variable, from theoretical and practical viewpoints. This introduction opens the door to four research questions, which we contribute an answer to throughout this thesis. To ensure that the numerical results of our study are realistic, in Chapter 2 we analyze the largest publicly available database of police-reported motor vehicle traffic accidents in the United States. We describe a methodology for extracting a representative sample during the period 2001-2020, and present some results from an analysis of the data. A nationally representative sample of 1,583,520 people involved in 20 years of fatal and non-fatal accidents is analyzed to examine the effects on the injury severity of motor vehicle occupants. We examine the impact of traditional personal automobile insurance rating factors such as gender, age and previous traffic infractions on serious and fatal injuries. An estimated cost of the accidents is used to highlight the rating factors which have the highest influence in prediction accuracy. These results aid in the calibration of a microsimulation model, presented in Chapter 4. In Chapter 3 we examine the discrimination-free premium in Lindholm et al. 
(2022a) within a theoretical causal inference framework, and we consider its societal context, to assess when the pricing formula should be used. We consider the insurance pricing problem through the use of directed acyclic graphs. This particular tool allows us to rigorously define an insurance risk factor in a causal framework. We then use this definition in assessing the appropriate application of the discrimination-free premium through three simplified pricing examples, including a health insurance policy and two personal automobile insurance policies with different coverages. From our findings, we suggest criteria for the application of the discrimination-free premium that depend on the risk factors and the social context. In Chapter 4 we describe a microsimulation model which can generate a simulated population of the United States. It is designed to match in aggregate selected characteristics of the target population. We focus on a 2020 pseudo-population from Wisconsin, which we use to explore personal automobile insurance premium ratings. We contrast four pricing models, in terms of prediction accuracy, and in terms of their discriminatory impact over race, using four different definitions of discrimination proposed in the actuarial and machine learning literature. By adapting definitions for disparate impact and proxy discrimination to a statistical test, we show that the traditional assumption of independence between frequency and severity can not only result in reduced prediction performance, but can also be detrimental to racial minorities. In Chapter 5 we conclude and present some directions for future research.
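
The abstract does not state the formula for the discrimination-free premium. A minimal sketch follows, assuming the commonly cited form in which the best-estimate price mu(x, d) is averaged over the marginal distribution of the protected attribute d rather than conditioned on it. The function names, the toy pricing function, and the group labels and weights are all hypothetical.

```python
def discrimination_free_premium(best_estimate, x, protected_marginal):
    """Average best_estimate(x, d) over the marginal distribution of d (assumed form)."""
    total_prob = sum(protected_marginal.values())
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("protected_marginal must sum to 1")
    return sum(prob * best_estimate(x, d) for d, prob in protected_marginal.items())


def toy_best_estimate(x, d):
    """Hypothetical best-estimate price that depends on the protected attribute d."""
    base = 100.0 * x["risk_score"]
    loading = 50.0 if d == "group_a" else 0.0
    return base + loading


# Hypothetical marginal distribution of the protected attribute.
toy_marginal = {"group_a": 0.5, "group_b": 0.5}

premium = discrimination_free_premium(
    toy_best_estimate, {"risk_score": 1.0}, toy_marginal
)
```

Under this assumed form, two policyholders with the same non-protected characteristics receive the same premium regardless of group membership, because the protected attribute is integrated out with fixed weights.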

    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    A Magnetic Framelet-Based Convolutional Neural Network for Directed Graphs

    Recent years have witnessed a surge of studies on directed graphs (digraphs) and digraph neural networks. With their unique capability of encoding directional relationships, digraphs have shown their superiority in modelling many real-life applications, such as citation analysis and website hyperlinks. Spectral Graph Convolutional Neural Networks (spectral GCNNs), a powerful tool for processing and analyzing undirected graph data, have recently been introduced to digraphs. Although spectral GCNNs typically apply frequency filtering via the Fourier transform to obtain representations with selective information, research shows that model performance can be enhanced by framelet transform-based filtering. However, the vast majority of such research only considers spectral GCNNs for undirected graphs. In this thesis, we introduce Framelet-MagNet, a magnetic framelet-based spectral GCNN for digraphs. The model adopts a magnetic framelet transform, which decomposes the input digraph data into low-pass and high-pass frequency components in the spectral domain, forming a more sophisticated digraph representation for filtering. Digraph framelets are constructed with the complex-valued magnetic Laplacian, which leads to signal processing in both the real and complex domains simultaneously. To the best of our knowledge, this approach is the first attempt to conduct framelet-based convolution on digraph data in both real and complex domains. We empirically validate the predictive power of Framelet-MagNet on various tasks, including node classification, link prediction, and denoising. We also show experimentally that Framelet-MagNet can outperform state-of-the-art approaches across several benchmark datasets.
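
The complex-valued magnetic Laplacian mentioned above can be sketched in a few lines. The version below assumes the standard unnormalized definition used in the digraph literature: symmetrize the adjacency matrix, encode edge direction as a complex phase controlled by a charge parameter `q`, and subtract the result from the degree matrix. The abstract does not give the thesis's exact construction or normalization, so the function name, the default `q`, and the list-of-lists representation are assumptions.

```python
import cmath
import math


def magnetic_laplacian(adj, q=0.25):
    """Unnormalized magnetic Laplacian L = D_s - A_s * exp(i * 2*pi*q * (A - A^T))."""
    n = len(adj)
    # Symmetrized adjacency A_s = (A + A^T) / 2 and its degree vector.
    sym = [[(adj[u][v] + adj[v][u]) / 2 for v in range(n)] for u in range(n)]
    deg = [sum(row) for row in sym]
    lap = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            # Direction is stored in the phase: u->v and v->u get opposite signs.
            phase = 2 * math.pi * q * (adj[u][v] - adj[v][u])
            hermitian_adj = sym[u][v] * cmath.exp(1j * phase)
            lap[u][v] = (deg[u] if u == v else 0) - hermitian_adj
    return lap


# A single directed edge 0 -> 1.
one_edge = [[0, 1], [0, 0]]
lap = magnetic_laplacian(one_edge, q=0.25)
```

The resulting matrix is Hermitian, so it has real eigenvalues and can support spectral filtering, while the imaginary parts retain edge direction. With `q = 0` the phase vanishes and the matrix reduces to the ordinary Laplacian of the symmetrized graph.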