823 research outputs found

    Knowledge-based Biomedical Data Science 2019

    Full text link
    Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey the progress in the last year in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Chinese Traditional Medicine and biodiversity.Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 table

    Entity-Oriented Search

    Get PDF
    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms

    Analyzing Authentic Texts for Language Learning: Web-based Technology for Input Enrichment and Question Generation

    Get PDF
    Acquisition of a language largely depends on the learner's exposure to and interaction with it. Our research goal is to explore and implement automatic techniques that help create a richer grammatical intake from a given text input and engage learners in making form-meaning connections during reading. A starting point for addressing this issue is the automatic input enrichment method, which aims to ensure that a target structure is richly represented in a given text. We demonstrate the high performance of our rule-based algorithm, which is able to detect 87 linguistic forms contained in an official curriculum for the English language. Showcasing the algorithm's capability to differentiate between the various functions of the same linguistic form, we establish the task of tense sense disambiguation, which we approach by leveraging machine learning and rule-based methods. Using the aforementioned technology, we develop an online information retrieval system FLAIR that prioritizes texts with a rich representation of selected linguistic forms. It is implemented as a web search engine for language teachers and learners and provides effective input enrichment in a real-life teaching setting. It can also serve as a foundation for empirical research on input enrichment and input enhancement. The input enrichment component of the FLAIR system is evaluated in a web-based study that demonstrates that English teachers prefer automatic input enrichment to standard web search when selecting reading material for class. We then explore automatic question generation for facilitating and testing reading comprehension as well as linguistic knowledge. We give an overview of the types of questions that are usually asked and can be automatically generated from text in the language learning context. We argue that questions can facilitate the acquisition of different linguistic forms by providing functionally driven input enhancement, i.e., by ensuring that the learner notices and processes the form. The generation of well-established and novel types of questions is discussed and examples are provided; moreover, the results from a crowdsourcing study show that automatically generated questions are comparable to human-written ones

    Automated Deductive Content Analysis of Text: A Deep Contrastive and Active Learning Based Approach

    Get PDF
    Content analysis traditionally involves human coders manually combing through text documents to search for relevant concepts and categories. However, this approach is time-intensive and not scalable, particularly for secondary data like social media content, news articles, or corporate reports. To address this problem, the paper presents an automated framework called Automated Deductive Content Analysis of Text (ADCAT) that uses deep learning-based semantic techniques, ontology of validated construct measures, large language model, human-in-the-loop disambiguation, and a novel augmentation-based weighted contrastive learning approach for improved language representations, to build a scalable approach for deductive content analysis. We demonstrate the effectiveness of the proposed approach to identify firm innovation strategies from their 10-K reports to obtain inferences reasonably close to human coding
    • …
    corecore