853 research outputs found

    Active Learning for Reducing Labeling Effort in Text Classification Tasks

    Labeling data can be expensive, as it is usually performed manually by domain experts. This is a burden for deep learning, which depends on large labeled datasets. Active learning (AL) is a paradigm that aims to reduce labeling effort by labeling only the data that the model deems most informative. Little research has been done on AL in a text classification setting, and next to none has involved the more recent, state-of-the-art Natural Language Processing (NLP) models. Here, we present an empirical study that compares different uncertainty-based algorithms with BERT$_{base}$ as the classifier. We evaluate the algorithms on two NLP classification datasets: Stanford Sentiment Treebank and KvK-Frontpages. Additionally, we explore heuristics that aim to solve presupposed problems of uncertainty-based AL; namely, that it is unscalable and that it is prone to selecting outliers. Furthermore, we explore the influence of the query-pool size on the performance of AL. Although the proposed heuristics did not improve the performance of AL, our results show that uncertainty-based AL with BERT$_{base}$ outperforms random sampling of data. This difference in performance can decrease as the query-pool size gets larger. Comment: Accepted as a conference paper at the joint 33rd Benelux Conference on Artificial Intelligence and the 30th Belgian Dutch Conference on Machine Learning (BNAIC/BENELEARN 2021). This camera-ready version, submitted to BNAIC/BENELEARN, adds several improvements, including a more thorough discussion of related work and an extended discussion section. 28 pages including references and appendices
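
To make the idea of uncertainty-based selection concrete, the following is a minimal sketch, not the paper's implementation. It assumes the classifier exposes a class-probability vector for every unlabeled example and ranks the pool by prediction entropy, one common uncertainty measure. The names `entropy`, `select_queries`, `pool_probs`, and `query_size` are hypothetical, and the probabilities are made-up values for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, query_size):
    """Return the indices of the `query_size` most uncertain pool examples.

    `pool_probs` holds one probability vector per unlabeled example.
    Higher entropy means the model is less certain about that example.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:query_size]


# Hypothetical softmax outputs for four unlabeled sentences (binary task).
pool_probs = [
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # uncertain
    [0.70, 0.30],  # moderately uncertain
    [0.50, 0.50],  # maximally uncertain
]

# Ask the annotator to label only the two most uncertain examples.
chosen = select_queries(pool_probs, query_size=2)
print(chosen)
```

In a full active-learning loop, the chosen examples would be labeled, added to the training set, and the classifier retrained before the next round; `query_size` plays the role of the query-pool size discussed in the abstract.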

    Étude de l'évolution du modèle de l'utilisateur des systèmes de construction collaborative d'ontologies [Study of the evolution of the user model of collaborative ontology-building systems]

    This article reports on an ongoing study of how the user model of collaborative ontology-building systems has evolved. By user model (or contributor model), we mean the representation that designers form of the users of their systems and, more generally, of the contributors to ontology construction. We describe: 1) the method we use to study the evolution of the user model; 2) the evolution of this model (in terms of user types, characterizations of the user, and characterizations of the user's environment); 3) the parallel evolution of: a) the design methods for collaborative systems; b) the systems themselves; and c) the methods for collaborative ontology construction. We mention some prospects for further evolution envisaged by the designers themselves. This study aims to highlight the importance of gaining better knowledge of potential contributors to collaborative ontology construction in order to obtain collaborative tools better suited to these contributors

    Knowledge Extraction from Textual Resources through Semantic Web Tools and Advanced Machine Learning Algorithms for Applications in Various Domains

    Nowadays there is a tremendous amount of unstructured data, often in the form of text, which is created and stored in a variety of forms in many domains, such as patients' health records, social network comments, and scientific publications. This volume of data represents an invaluable source of knowledge, but it is challenging for machines to mine. At the same time, novel tools and advanced methodologies have been introduced in several domains, improving the efficacy and efficiency of data-based services. Following this trend, this thesis shows how to parse data from text with Semantic Web based tools, feed the data into Machine Learning methodologies, and produce services or resources that facilitate the execution of some tasks. More precisely, the use of Semantic Web technologies powered by Machine Learning algorithms has been investigated in the Healthcare and E-Learning domains through previously untested methodologies. Furthermore, this thesis investigates the use of some state-of-the-art tools to move data from texts to graphs in order to represent the knowledge contained in scientific literature. Finally, the use of a Semantic Web ontology and novel heuristics to detect insights from biological data in graph form is presented. The thesis contributes to the scientific literature in terms of results and resources. Most of the material presented in this thesis derives from research papers published in international journals or conference proceedings

    Analyze Municipal Annexations: Case Studies in Frederick and Caroline Counties of Maryland, 1990-2010

    Municipal annexations play an important role in converting undeveloped land to development, influencing landscape change. However, the existing literature does not explore the links between annexation and development. An additional inadequacy is the failure to consider the environmental/landscape aspect of annexation. Therefore, this dissertation proposes a new theoretical framework, drawing on political ecology and structuration theory, to examine annexation processes, specifically environmental/landscape sensitivity and its causal social structures. Frederick and Caroline counties in Maryland from 1990 to 2010 were the two case-study areas because both counties experienced increased annexation activity and are representative of suburban and exurban settings along the rural-urban continuum of the United States. The data used in this qualitative research were collected from multiple sources, including key-person interviews, a review of Maryland's annexation log, annexation applications and meeting minutes, and observations at public meetings. Triangulating content analysis, discourse analysis, and social network analysis, this research finds that environmental/landscape concerns are not widely considered in annexation practices. Although environmental mitigation measures are considered at the site level if a property has site environmental elements, the overall environmental/landscape sensitivity is low. It is also found that the economic-centered space remains dynamic in the annexation processes, determining annexation approvals and low-density zoning. In addition, the triangulated analyses reveal that current social structures are not conducive to environmentally conscious landscape planning, because environmentally oriented non-profit organizations and residents are brought in at a late stage of the annexation process and are not fully considered in the evaluation process. Power asymmetry in current annexation structures is due to a lack of environmental voice in annexation processes. The voice of such groups needs to be institutionalized to facilitate more tenable annexation practices

    Linked democracy : foundations, tools, and applications

    Chapter 1: Introduction to Linked Data. Abstract: This chapter presents Linked Data, a new form of distributed data on the web which is especially suitable to be manipulated by machines and to share knowledge. By adopting the linked data publication paradigm, anybody can publish data on the web, relate it to data resources published by others and run artificial intelligence algorithms in a smooth manner. Open linked data resources may democratize the future access to knowledge by the mass of internet users, either directly or mediated through algorithms. Governments have enthusiastically adopted these ideas, which is in harmony with the broader open data movement

    An Intelligent Multi-Agent System Approach to Automating Safety Features for On-Line Real Time Communications: Agent Mediated Information Exchange

    Child safety online is a growing problem. Governmental attempts to highlight and combat this issue have not been as successful as was hoped, and there are still highly publicised cases of children, young people and vulnerable adults coming to harm as a result of unsafe online practices. This thesis presents the research, design and development of a prototype system called SafeChat, which will provide a safer environment for children interacting online. In order to combat such a complex problem, it is necessary to integrate various artificial intelligence technologies and autonomous systems. The SafeChat prototype system discussed within this research has been implemented in Java Agent Development Environment (JADE) and utilises Protégé ontology development, reasoning and natural language processing techniques. To evaluate system performance, comprehensive testing that measures its effectiveness in detecting potential risk to the user (e.g. a child) is in constant development. Initial results of system testing are encouraging and demonstrate its effectiveness in identifying different levels of threat during online conversation. The potential impact of this work is immense when it is used as a plug-in to popular communications software, such as Facebook Messenger, Skype and WhatsApp. SafeChat provides a safer environment for children to communicate, identifying potential and actual threats whilst maintaining the privacy of their discourse. The SafeChat system could be easily adapted to provide autonomous solutions in other areas of online threat, such as cyberbullying and radicalisation

    Linked Democracy

    This open access book shows the factors linking information flow, social intelligence, rights management and modelling with epistemic democracy, offering licensed linked data along with information about the rights involved. This model of democracy for the web of data brings new challenges for the social organisation of knowledge, collective innovation, and the coordination of actions. Licensed linked data, licensed linguistic linked data, rights expression languages, semantic web regulatory models, electronic institutions, and artificial socio-cognitive systems are examples of regulatory and institutional design (regulations by design). The web has been massively populated with both data and services, and semantically structured data, the linked data cloud, facilitates and fosters human-machine interaction. Linked data aims to create ecosystems that make it possible to browse, discover, exploit and reuse data sets for applications. Rights Expression Languages semi-automatically regulate the use and reuse of content

    Open dialogues for business model innovation

    This thesis was previously held under moratorium from 20th June 2018 until 30th July 2021. A growing body of research is highlighting how open innovative business models support the growth and economic success of new ideas and technologies. In this Ph.D., building on an action research study in SMEs, I develop the Open Business Model Innovation Framework that accounts for the interactions between value creation and active participation in the development of unmet needs to new business formations. I begin to unpack the process of open business model innovation development supporting the ability of SMEs to build and re-build their businesses