    A deep learning classifier for sentence classification in biomedical and computer science abstracts

    The automatic classification of abstract sentences into their main elements (background, objectives, methods, results, conclusions) is a key tool to support scientific database querying, to summarize relevant literature and to assist in the writing of new abstracts. In this paper, we propose a novel deep learning approach based on a convolutional layer and a bidirectional gated recurrent unit to classify sentences of abstracts. First, the proposed neural network was tested on a publicly available repository containing 20 thousand abstracts from the biomedical domain. Competitive results were achieved, with weighted-average Precision, Recall and F1-score values around 91% and an area under the ROC curve (AUC) of 99%, which are higher than those of a state-of-the-art neural network. Then, a crowdsourcing approach using gamification was adopted to create a new comprehensive set of 4111 classified sentences from the computer science domain, focused on social media abstracts. When the same deep learning modeling technique was trained with 3287 (80%) of the available sentences, the results were below those obtained for the larger biomedical dataset, with weighted-average Precision, Recall and F1-score values between 73 and 76% and an AUC of 91%. Considering dataset size as a likely important factor in this performance decrease, a data augmentation approach was further applied. This involved using text mining to translate sentences of the computer science abstract corpus while retaining the same meaning. This approach resulted in slight improvements (around 2 percentage points) in the weighted-average Recall and F1-score values. This work was supported by Fundação para a Ciência e Tecnologia (FCT) within the Project Scope: UID/CEC/00319/2019.
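    The weighted-average Precision, Recall and F1-score reported above are usually computed per class and then averaged with each class weighted by its support (its number of true sentences), as in scikit-learn's `average="weighted"`. The paper's own evaluation code is not shown; the sketch below assumes that standard definition, and `weighted_prf` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_prf(y_true: list[str], y_pred: list[str]) -> tuple[float, float, float]:
    """Support-weighted average of per-class precision, recall and F1.

    Each class's score is weighted by its share of the true labels
    (the same idea as scikit-learn's average="weighted").
    """
    support = Counter(y_true)      # true sentences per class
    predicted = Counter(y_pred)    # predicted sentences per class
    total = len(y_true)
    p_avg = r_avg = f_avg = 0.0
    for label, n_true in support.items():
        # True positives: sentences whose true and predicted label are both `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n_true
        f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
        weight = n_true / total
        p_avg += weight * precision
        r_avg += weight * recall
        f_avg += weight * f1
    return p_avg, r_avg, f_avg


if __name__ == "__main__":
    # Made-up labels drawn from the abstract elements named above.
    y_true = ["background", "methods", "methods", "results"]
    y_pred = ["background", "methods", "results", "results"]
    print(weighted_prf(y_true, y_pred))
```

    One consequence of this weighting: the weighted-average Recall always equals plain accuracy, since summing (support / total) × (tp / support) over classes gives total true positives divided by total sentences.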

    Intention to Use Abstract Sentence Classification Technology

    This paper introduces research in progress on the intention of researchers to use academic abstract sentence classification technology when undertaking literature acquisition activities. We introduce an enhanced prototype academic abstract sentence classification system capable of performing on-demand sentence classification of metadata results from several academic literature indices. We also outline a preliminary theoretical information systems model developed to explore researchers' intention to use the system when searching for literature via digital means. Additionally, we provide the survey instrument for review. The overarching body of work this paper introduces will benefit the research community, as it is the first primary research to examine the utility of this technology in helping researchers interact more efficiently with the large body of digitally available literature.

    Automated Knowledge Extraction from IS Research Articles Combining Sentence Classification and Ontological Annotation

    Manually analyzing large collections of research articles is a time- and resource-intensive activity, making it difficult to stay on top of the latest research findings. Automated solutions are limited by their restricted domain knowledge and by their inability to attribute extracted key terms to the focal article, related work, or background information. We aim to address this challenge by (1) developing a framework for classifying sentences in scientific publications, (2) performing several experiments comparing state-of-the-art sentence transformer algorithms with a novel few-shot learning technique, and (3) automatically analyzing a corpus of articles and evaluating automated knowledge extraction capabilities. We tested our approach for combining sentence classification with ontological annotations on a manually created dataset of 1,000 sentences from Information Systems (IS) articles. The results indicate a high degree of accuracy, underlining the potential of novel approaches for analyzing scientific publications.

    Pro-Russian propaganda recognition and analytics system based on text classification model and statistical data processing methods

    In this paper, a neural network model for classifying the political polarity of text has been developed, along with a database for training the neural network and an analytics system for pro-Russian propaganda. This makes it possible to classify the political polarity of a message source based on its identifier, and to construct and display networks that reveal useful insights about popular Twitter hashtags or Telegram channels related to the Russo-Ukrainian War. A user interface has also been developed that allows users to interact with the system. The developed system will help people navigate the information space and avoid pro-Russian propaganda.

    Factors influencing charter flight departure delay

    This study aims to identify the main factors leading to charter flight departure delay through data mining. The data sample analysed consists of 5,484 flights operated by a European airline between 2014 and 2017. The tuned dataset of 33 features was used for modelling departure delay (i.e., whether the flight was delayed by more than 15 minutes). The results proved the value of the proposed approach, with an area under the receiver operating characteristic curve of 0.831, and supported knowledge extraction through data-based sensitivity analysis. The features related to previous flight delay information were found to be the most influential in determining whether the current flight is delayed, which is consistent with the propagating effect of flight delays. However, neither the reason for the previous delay nor its duration accounted for the most relevance. Instead, a computed feature indicating whether two or more delay reasons were registered accounted for 33% of relevance. The contributions also include a broader data mining approach, supported by an extensive data understanding and preparation stage that uses both proprietary and open access data sources to build a comprehensive dataset.
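    To make the modelling target and the evaluation metric concrete, here is a minimal sketch, assuming a binary label of 1 when departure delay exceeds 15 minutes and the standard rank-based reading of ROC AUC (the probability that a randomly chosen delayed flight receives a higher score than a randomly chosen on-time one). The study's model, features and code are not given; `is_delayed`, `has_multiple_reasons`, `roc_auc` and all numbers below are hypothetical illustrations.

```python
# Hypothetical helpers; the study's actual feature engineering and model are not given.


def is_delayed(delay_minutes: float, threshold: float = 15.0) -> int:
    """Binary target: 1 if departure was delayed by more than `threshold` minutes."""
    return 1 if delay_minutes > threshold else 0


def has_multiple_reasons(previous_delay_reasons: list[str]) -> int:
    """Hypothetical computed feature: 1 if two or more delay reasons were registered."""
    return 1 if len(previous_delay_reasons) >= 2 else 0


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as half a win (Mann-Whitney formulation). O(P*N), fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    delays = [0, 20, 5, 45, 16, 10]            # made-up departure delays (minutes)
    labels = [is_delayed(d) for d in delays]   # [0, 1, 0, 1, 1, 0]
    scores = [0.1, 0.8, 0.4, 0.9, 0.3, 0.35]   # made-up model probabilities
    print(roc_auc(labels, scores))             # 7 of 9 positive/negative pairs ranked correctly
```

    The pairwise loop is chosen for readability; a production evaluation would normally use a sorted, rank-based computation or a library routine instead.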