295 research outputs found

    Automated Deductive Content Analysis of Text: A Deep Contrastive and Active Learning Based Approach

    Content analysis traditionally involves human coders manually combing through text documents in search of relevant concepts and categories. However, this approach is time-intensive and does not scale, particularly for secondary data such as social media content, news articles, or corporate reports. To address this problem, the paper presents an automated framework called Automated Deductive Content Analysis of Text (ADCAT) that uses deep learning-based semantic techniques, an ontology of validated construct measures, a large language model, human-in-the-loop disambiguation, and a novel augmentation-based weighted contrastive learning approach for improved language representations to build a scalable approach to deductive content analysis. We demonstrate the effectiveness of the proposed approach by identifying firm innovation strategies from their 10-K reports, obtaining inferences reasonably close to human coding.
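
    The paper's own pipeline is not reproduced here; as a rough illustration of the deductive-coding step ADCAT automates, the sketch below matches document sentences against embedding centroids of hypothetical construct phrases and routes borderline cases to a human coder. The encoder name, construct phrases, and thresholds are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of deductive coding by semantic similarity (illustrative only).
# Construct phrases, model name, and thresholds are assumptions, not from the paper.
from sentence_transformers import SentenceTransformer
import numpy as np

# Hypothetical construct ontology: category -> validated measurement phrases.
CONSTRUCTS = {
    "exploratory_innovation": ["we invest in new technologies", "entering new markets"],
    "exploitative_innovation": ["improving existing products", "reducing production costs"],
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder would do

def code_sentences(sentences, high=0.6, low=0.4):
    """Assign each sentence to a construct, or flag it for human review."""
    cat_names = list(CONSTRUCTS)
    cat_vecs = [model.encode(phrases).mean(axis=0) for phrases in CONSTRUCTS.values()]
    sent_vecs = model.encode(sentences)
    results = []
    for sent, v in zip(sentences, sent_vecs):
        sims = [float(np.dot(v, c) / (np.linalg.norm(v) * np.linalg.norm(c))) for c in cat_vecs]
        best = int(np.argmax(sims))
        if sims[best] >= high:
            results.append((sent, cat_names[best], sims[best]))        # confident match
        elif sims[best] >= low:
            results.append((sent, "needs_human_review", sims[best]))   # human-in-the-loop
        else:
            results.append((sent, "no_construct", sims[best]))
    return results

print(code_sentences(["We plan to expand into adjacent markets with novel AI products."]))
```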

    Hybrid intelligent framework for automated medical learning

    This paper investigates automated medical learning and proposes a hybrid intelligent framework called Hybrid Automated Medical Learning (HAML). The goal is to combine several intelligent components efficiently so that medical data can be learned from automatically. A multi-agent system is proposed that uses distributed deep learning together with a knowledge graph: distributed deep learning enables efficient learning across the different agents in the system, while the knowledge graph handles heterogeneous medical data. To demonstrate the usefulness and accuracy of the HAML framework, intensive simulations on medical data were conducted, and a wide range of experiments verified the efficiency of the proposed system. Three case studies are discussed. The first concerns process mining, and more precisely the ability of HAML to detect relevant patterns in medical event data. The second concerns smart buildings and the ability of HAML to recognize the different activities of patients. The third concerns medical image retrieval and the ability of HAML to find the medical images most relevant to an image query. The results show that HAML achieves good performance compared with the most up-to-date medical learning models regarding both computational cost and the quality of the returned solutions.
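
    HAML's internal architecture is not detailed in the abstract; the following minimal sketch shows one generic way "distributed learning across agents" can be realised, with each agent fitting a model on its own data shard and a coordinator averaging the parameters. The toy data, the logistic-regression model, and the averaging scheme are assumptions for illustration only, not the HAML design.

```python
# Minimal sketch of distributed learning agents: each agent fits a local model on
# its own shard and a coordinator averages the parameters. Toy data and model are
# illustrative assumptions, not the HAML architecture.
import numpy as np

rng = np.random.default_rng(0)

def local_fit(X, y, steps=200, lr=0.1):
    """One agent: logistic regression by gradient descent on its local shard."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Three agents, each holding a different (heterogeneous) shard of patient records.
shards = []
for _ in range(3):
    X = rng.normal(size=(100, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    shards.append((X, y))

local_weights = [local_fit(X, y) for X, y in shards]
global_w = np.mean(local_weights, axis=0)  # coordinator merges the agents' models
print("merged weights:", np.round(global_w, 2))
```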

    Query-Time Data Integration

    Today, data is collected at ever-increasing scale and variety, opening up enormous potential for new insights and data-centric products. However, in many cases the volume and heterogeneity of new data sources preclude up-front integration using traditional ETL processes and data warehouses. In some cases, it is even unclear if and in what context the collected data will be utilized. Therefore, there is a need for agile methods that defer the effort of integration until the usage context is established. This thesis introduces Query-Time Data Integration as an alternative concept to traditional up-front integration. It aims at enabling users to issue ad-hoc queries on their own data as if all potential other data sources were already integrated, without declaring specific sources and mappings to use. Automated data search and integration methods are then coupled directly with query processing on the available data. The ambiguity and uncertainty introduced through fully automated retrieval and mapping methods is compensated by answering those queries with ranked lists of alternative results. Each result is then based on different data sources or query interpretations, allowing users to pick the result most suitable to their information need. To this end, this thesis makes three main contributions. Firstly, we introduce a novel method for Top-k Entity Augmentation, which is able to construct a top-k list of consistent integration results from a large corpus of heterogeneous data sources. It improves on the state of the art by producing a set of individually consistent but mutually diverse alternative solutions, while minimizing the number of data sources used. Secondly, based on this novel augmentation method, we introduce the DrillBeyond system, which is able to process Open World SQL queries, i.e., queries referencing arbitrary attributes not defined in the queried database. The original database is then augmented at query time with Web data sources providing those attributes. Its hybrid augmentation/relational query processing enables the use of ad-hoc data search and integration in data analysis queries, and improves both performance and quality when compared to using separate systems for the two tasks. Finally, we study the management of large-scale dataset corpora such as data lakes or Open Data platforms, which are used as data sources for our augmentation methods. We introduce Publish-time Data Integration as a new technique for data curation systems managing such corpora, which aims at improving the individual reusability of datasets without requiring up-front global integration. This is achieved by automatically generating metadata and format recommendations, allowing publishers to enhance their datasets with minimal effort. Collectively, these three contributions are the foundation of a Query-time Data Integration architecture that enables ad-hoc data search and integration queries over large heterogeneous dataset collections.
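
    As a hedged illustration of the top-k entity augmentation idea described above, the sketch below greedily assembles a few alternative "solutions", each covering the query entities with values for a requested attribute from few sources while penalising sources already used by earlier solutions. The example sources, values, and scoring rule are assumptions, not the thesis' actual algorithm.

```python
# Toy sketch of top-k entity augmentation: build k alternative, mutually diverse
# coverings of the query entities from candidate sources. Data and scoring are
# illustrative assumptions only.

entities = ["Germany", "France", "Italy"]

# Hypothetical data sources: name -> {entity: value for the requested attribute}.
sources = {
    "web_table_A": {"Germany": 83.2, "France": 67.8},
    "web_table_B": {"Italy": 58.9, "France": 67.9},
    "open_data_C": {"Germany": 83.1, "Italy": 59.0, "France": 68.0},
}

def build_solution(penalised):
    """Greedy cover of the entity list, avoiding sources used by earlier solutions."""
    covered, solution = {}, []
    while set(covered) != set(entities):
        best = max(
            sources,
            key=lambda s: len(set(sources[s]) - set(covered)) - (1 if s in penalised else 0),
        )
        gain = set(sources[best]) - set(covered)
        if not gain:
            break  # no source can extend coverage further
        solution.append(best)
        covered.update({e: sources[best][e] for e in gain})
    return solution, covered

used = set()
for rank in range(2):  # top-2 alternative augmentations
    sol, values = build_solution(used)
    used.update(sol)
    print(f"solution {rank + 1}: sources={sol} values={values}")
```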

    An End-to-end Neural Natural Language Interface for Databases

    The ability to extract insights from new data sets is critical for decision making. Visual interactive tools play an important role in data exploration since they provide non-technical users with an effective way to visually compose queries and comprehend the results. Natural language has recently gained traction as an alternative query interface to databases, with the potential to enable non-expert users to formulate complex questions and information needs efficiently and effectively. However, understanding natural language questions and translating them accurately to SQL is a challenging task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet made their way into practical tools and commercial products. In this paper, we present DBPal, a novel data exploration tool with a natural language interface. DBPal leverages recent advances in deep models to make query understanding more robust in the following ways: First, DBPal uses a deep model to translate natural language statements to SQL, making the translation process more robust to paraphrasing and other linguistic variations. Second, to support users in phrasing questions without knowing the database schema and the query features, DBPal provides a learned auto-completion model that suggests partial query extensions during query formulation and thus helps users write complex queries.
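
    DBPal's translation step is a learned deep model; purely to illustrate the natural-language-to-SQL interface, the toy sketch below substitutes a keyword/slot heuristic over a hypothetical schema. The schema, patterns, and supported question forms are assumptions, not DBPal's implementation.

```python
# Toy stand-in for a learned NL-to-SQL translation step: a keyword/slot heuristic
# over a hypothetical schema, used only to illustrate the interface.
import re

SCHEMA = {"patients": ["name", "age", "diagnosis"]}  # hypothetical database schema

def nl_to_sql(question: str) -> str:
    """Translate a narrow class of questions into SQL over the toy schema."""
    q = question.lower()
    table = next(t for t in SCHEMA if t.rstrip("s") in q or t in q)
    # pick the output columns: any schema column mentioned in the question, else *
    cols = [c for c in SCHEMA[table] if c in q] or ["*"]
    sql = f"SELECT {', '.join(cols)} FROM {table}"
    m = re.search(r"older than (\d+)", q)
    if m:
        sql += f" WHERE age > {m.group(1)}"
    return sql

print(nl_to_sql("Show the name and diagnosis of patients older than 60"))
# -> SELECT name, diagnosis FROM patients WHERE age > 60
```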

    Knowledge-augmented Graph Machine Learning for Drug Discovery: A Survey from Precision to Interpretability

    The integration of Artificial Intelligence (AI) into the field of drug discovery has been a growing area of interdisciplinary scientific research. However, conventional AI models are heavily limited in handling complex biomedical structures (such as 2D or 3D protein and molecule structures) and providing interpretations for outputs, which hinders their practical application. Recently, Graph Machine Learning (GML) has gained considerable attention for its exceptional ability to model graph-structured biomedical data and investigate their properties and functional relationships. Despite extensive efforts, GML methods still suffer from several deficiencies, such as the limited ability to handle supervision sparsity and provide interpretability in learning and inference processes, and their ineffectiveness in utilising relevant domain knowledge. In response, recent studies have proposed integrating external biomedical knowledge into the GML pipeline to realise more precise and interpretable drug discovery with limited training instances. However, a systematic definition for this burgeoning research direction is yet to be established. This survey presents a comprehensive overview of long-standing drug discovery principles, provides the foundational concepts and cutting-edge techniques for graph-structured data and knowledge databases, and formally summarises Knowledge-augmented Graph Machine Learning (KaGML) for drug discovery. A thorough review of related KaGML works, collected following a carefully designed search methodology, is organised into four categories following a newly defined taxonomy. To facilitate research in this rapidly emerging field, we also share collected practical resources that are valuable for intelligent drug discovery and provide an in-depth discussion of the potential avenues for future advancements.
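
    As one hedged example of how external knowledge can enter a GML pipeline, the sketch below concatenates hypothetical knowledge-graph embeddings onto molecular node features before a single GCN-style propagation step. The graphs, embeddings, and aggregation rule are illustrative assumptions rather than any specific KaGML method from the survey.

```python
# Minimal sketch: knowledge-augmented node features for a molecular graph, followed
# by one GCN-style propagation step. All data and the mapping from atoms to
# knowledge-graph entities are illustrative assumptions.
import numpy as np

# Toy molecular graph: 4 atoms, adjacency matrix and per-atom chemical features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
X_chem = np.random.default_rng(0).normal(size=(4, 8))

# Hypothetical knowledge-graph embeddings for the entity each atom maps to.
kg_embedding = {i: np.random.default_rng(i).normal(size=4) for i in range(4)}
X_kg = np.stack([kg_embedding[i] for i in range(4)])

X = np.concatenate([X_chem, X_kg], axis=1)        # knowledge-augmented node features

# One GCN-style propagation step: symmetric-normalised neighbourhood averaging.
A_hat = A + np.eye(4)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
W = np.random.default_rng(42).normal(size=(X.shape[1], 16)) * 0.1
H = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)  # ReLU
print("node representations:", H.shape)  # (4, 16)
```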

    Active Learning for Reducing Labeling Effort in Text Classification Tasks

    Labeling data can be an expensive task, as it is usually performed manually by domain experts. This is cumbersome for deep learning, which depends on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by using only the data which the model deems most informative. Little research has been done on AL in a text classification setting, and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT-base as the classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL, namely that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. While the proposed heuristics were not found to improve the performance of AL, our results show that using uncertainty-based AL with BERT-base outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger. Comment: Accepted as a conference paper at the joint 33rd Benelux Conference on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021). This camera-ready version submitted to BNAIC/BENELEARN adds several improvements, including a more thorough discussion of related work and an extended discussion section. 28 pages including references and appendices.
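
    The uncertainty-based selection step evaluated in the study can be sketched as follows: score each unlabeled example by the model's confidence in its top class and query the least confident ones. A scikit-learn classifier on synthetic data stands in for the fine-tuned BERT-base model; the seed-set size and query-pool size below are arbitrary assumptions.

```python
# Minimal sketch of least-confidence active learning. A logistic regression on
# synthetic data stands in for BERT-base; pool sizes are arbitrary assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
labeled = np.arange(20)                    # small seed set of labeled examples
unlabeled = np.arange(20, 500)

for round_ in range(3):                    # a few active-learning rounds
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[unlabeled])
    uncertainty = 1.0 - probs.max(axis=1)  # least-confidence score
    query = unlabeled[np.argsort(-uncertainty)[:10]]   # query-pool size = 10
    labeled = np.concatenate([labeled, query])         # oracle labels the queries
    unlabeled = np.setdiff1d(unlabeled, query)
    # rough progress indicator: current model's accuracy on the remaining pool
    print(f"round {round_}: labeled={len(labeled)}, "
          f"pool acc={clf.score(X[unlabeled], y[unlabeled]):.2f}")
```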

    Dwelling on ontology - semantic reasoning over topographic maps

    The thesis builds upon the hypothesis that the spatial arrangement of topographic features, such as buildings, roads and other land cover parcels, indicates how land is used. The aim is to make this kind of high-level semantic information explicit within topographic data. There is an increasing need to share and use data for a wider range of purposes, and to make data more definitive, intelligent and accessible. Unfortunately, we still encounter a gap between low-level data representations and high-level concepts that typify human qualitative spatial reasoning. The thesis adopts an ontological approach to bridge this gap and to derive functional information by using standard reasoning mechanisms offered by logic-based knowledge representation formalisms. It formulates a framework for the processes involved in interpreting land use information from topographic maps. Land use is a high-level abstract concept, but it is also an observable fact intimately tied to geography. By decomposing this relationship, the thesis establishes a one-to-one mapping between high-level conceptualisations drawn from human knowledge and real-world entities represented in the data. Based on a middle-out approach, it develops a conceptual model that incrementally links different levels of detail, and thereby derives coarser, more meaningful descriptions from more detailed ones. The thesis verifies its proposed ideas by implementing an ontology describing the land use ‘residential area’ in the ontology editor Protégé. By asserting knowledge about high-level concepts such as types of dwellings, urban blocks and residential districts, as well as individuals that link directly to topographic features stored in the database, the reasoner successfully infers instances of the defined classes. Despite current technological limitations, ontologies are a promising way forward in the manner we handle and integrate geographic data, especially with respect to how humans conceptualise geographic space.
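
    The thesis performs this inference with OWL classes and a description-logic reasoner in Protégé; the snippet below is only a simplified, rule-based stand-in showing the middle-out idea of deriving coarser concepts (residential block, residential area) from detailed topographic features. The feature types and thresholds are assumptions for illustration.

```python
# Simplified rule-based stand-in for the OWL/Protégé reasoning described above:
# classify an urban block, then an area, from its topographic features.
# Feature types and thresholds are illustrative assumptions.

# Hypothetical topographic features inside one urban block.
block_features = [
    {"type": "building", "use": "dwelling"},
    {"type": "building", "use": "dwelling"},
    {"type": "building", "use": "shop"},
    {"type": "land_parcel", "cover": "garden"},
    {"type": "road", "class": "local street"},
]

def is_residential_block(features, min_dwelling_share=0.5):
    """A block counts as residential if most of its buildings are dwellings."""
    buildings = [f for f in features if f["type"] == "building"]
    if not buildings:
        return False
    dwellings = [b for b in buildings if b.get("use") == "dwelling"]
    return len(dwellings) / len(buildings) >= min_dwelling_share

def is_residential_area(blocks, min_blocks=1):
    """An area is residential if it contains enough residential blocks."""
    return sum(map(is_residential_block, blocks)) >= min_blocks

print(is_residential_area([block_features]))  # -> True
```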

    Form-ing institutional order: the scaffolding of lists and identifiers

    This paper examines the central place of the list and the associated concept of an identifier within the scaffolding of contemporary institutional order. These terms are deliberately chosen to make strange and help unpack the constitutive capacity of information systems and information technology within and between contemporary organisations. We draw upon the substantial body of work by John Searle to help understand the place of lists and identifiers in the constitution of institutional order. To ground our discussion of the potentiality and problematic associated with lists, we describe a number of significant instances of list-making, situated particularly around the use of identifiers to refer to people, places and products. The theorisation developed allows us to better explain not only the significance imbued within lists and identifiers but also the key part they play in form-ing the institutional order. We also hint at the role such symbolic artefacts play within breakdowns in institutional order.