1,647 research outputs found

    Joint RNN Model for Argument Component Boundary Detection

    Argument Component Boundary Detection (ACBD) is an important sub-task in argumentation mining; it aims at identifying the word sequences that constitute argument components, and is usually considered the first sub-task in the argumentation mining pipeline. Existing ACBD methods depend heavily on task-specific knowledge and require considerable human effort for feature engineering. To tackle these problems, in this work we formulate ACBD as a sequence labeling problem and propose a variety of Recurrent Neural Network (RNN) based methods, which do not use domain-specific or handcrafted features beyond the relative position of the sentence in the document. In particular, we propose a novel joint RNN model that can predict whether sentences are argumentative or not, and use the predicted results to detect the argument component boundaries more precisely. We evaluate our techniques on two corpora from two different genres; results suggest that our joint RNN model obtains state-of-the-art performance on both datasets. Comment: 6 pages, 3 figures, submitted to IEEE SMC 201
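The sequence labeling formulation mentioned in this abstract can be illustrated with a small sketch. It assumes a BIO tagging scheme (B = first token of a component, I = inside a component, O = outside), which is a common choice for boundary detection but is not confirmed by the abstract. The function names, the example sentence, and the span positions are hypothetical, and no RNN is involved: the code only shows how component spans map to per-token labels and back.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Convert half-open token spans [start, end) into one BIO label per token."""
    labels = ["O"] * num_tokens
    for start, end in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"invalid span: {(start, end)}")
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Recover half-open component spans from a BIO label sequence."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:
                spans.append((start, i))  # close the previous component
            start = i
        elif label == "O":
            if start is not None:
                spans.append((start, i))
                start = None
        elif label == "I" and start is None:
            start = i  # tolerate an I that is not preceded by B
    if start is not None:
        spans.append((start, len(labels)))
    return spans


# Hypothetical example: one claim (tokens 4-7) and one premise (tokens 9-11).
tokens = "In my view , schools should ban phones because they distract students .".split()
gold_spans = [(4, 8), (9, 12)]
labels = spans_to_bio(len(tokens), gold_spans)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Under this framing, a sequence model would predict one label per token, and the predicted boundaries would be read back with `bio_to_spans`.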

    ChineseEEG: A Chinese Linguistic Corpora EEG Dataset for Semantic Alignment and Neural Decoding

    An electroencephalography (EEG) dataset utilizing rich text stimuli can advance the understanding of how the brain encodes semantic information and contribute to semantic decoding in brain-computer interfaces (BCIs). Addressing the scarcity of EEG datasets featuring Chinese linguistic stimuli, we present the ChineseEEG dataset, a high-density EEG dataset complemented by simultaneous eye-tracking recordings. This dataset was compiled while 10 participants silently read approximately 13 hours of Chinese text from two well-known novels. It provides long-duration EEG recordings, along with pre-processed EEG sensor-level data and semantic embeddings of the reading materials extracted by a pre-trained natural language processing (NLP) model. As a pilot EEG dataset derived from natural Chinese linguistic stimuli, ChineseEEG can significantly support research across neuroscience, NLP, and linguistics. It establishes a benchmark dataset for Chinese semantic decoding, aids in the development of BCIs, and facilitates the exploration of alignment between large language models and human cognitive processes. It can also aid research into the brain's mechanisms of language processing within the context of the Chinese natural language.

    Handling non-compositionality in multilingual CNLs

    In this paper, we describe methods for handling multilingual non-compositional constructions in the framework of GF. We specifically look at methods to detect and extract non-compositional phrases from parallel texts and propose methods to handle such constructions in GF grammars. We expect that these methods will enrich CNLs by providing more flexibility in the design of controlled languages. We look at two specific use cases of non-compositional constructions: a general-purpose method to detect and extract multilingual multiword expressions and a procedure to identify nominal compounds in German. We evaluate our procedure for multiword expressions by performing a qualitative analysis of the results. For the experiments on nominal compounds, we incorporate the detected compounds in a full SMT pipeline and evaluate the impact of our method on the machine translation process. Comment: CNL workshop in COLING 201

    The Knowledge Graph Construction in the Educational Domain: Take an Australian School Science Course as an Example

    The evolution of Internet technology and artificial intelligence has changed the ways we gain knowledge, which has expanded to every aspect of our lives. In recent years, knowledge graph technology, as one of the artificial intelligence techniques, has been widely used in the educational domain. However, few studies are dedicated to the construction of knowledge graphs for K-10 education in Australia; most existing studies focus only on the theory level, and little research shows practical pipeline steps for completing the complex flow of constructing an educational knowledge graph. Apart from that, most studies focused on concept entities and their relations but ignored the features of concept entities and the relations between learning knowledge points and required learning outcomes. To overcome these shortcomings and provide the data foundation for the development of downstream research and applications in this educational domain, the construction processes of building a knowledge graph for Australian K-10 education were analyzed at the theory level and implemented in a practical way in this research. We took the Year 9 science course as a typical data source and fed it to the proposed method, called K10EDU-RCF-KG, to construct this educational knowledge graph and to enrich the features of entities in the knowledge graph. In the construction pipeline, a variety of techniques were employed to complete the building process. Firstly, the POI and OCR techniques were applied to convert Word and PDF format files into text, followed by developing an educational resources management platform where the machine-readable text could be stored in a relational database management system. Secondly, we designed an architecture framework to guide the construction pipeline. According to this architecture, the educational ontology was initially designed, and a backend microservice was developed to perform entity extraction and relation extraction using NLP-NER and probabilistic association rule mining algorithms, respectively. We also adopted the NLP-POS technique to find the neighboring adjectives related to entities in order to enrich the features of these concept entities. In addition, a subject dictionary was introduced during the refinement process of the knowledge graph, which reduced the data noise rate of the knowledge graph entities. Furthermore, learning outcome entities and topic knowledge point entities were directly connected, which provides a clear and efficient way to identify which learning objectives are related to a learning unit. Finally, a set of REST APIs for querying this educational knowledge graph was developed.
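The relation extraction step in this abstract relies on association rule mining. As a rough illustration only, the sketch below computes support and confidence for pairs of entities that co-occur in the same learning unit. It is plain pairwise association rule mining, not the authors' probabilistic variant, and the function name, thresholds, and sample entity sets are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def mine_pair_rules(
    transactions: List[Set[str]],
    min_support: float,
    min_confidence: float,
) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return rules (x, y) -> (support, confidence) for co-occurring item pairs.

    support(x, y)   = fraction of transactions containing both x and y
    confidence(x→y) = count(x and y) / count(x)
    """
    n = len(transactions)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for items in transactions:
        item_counts.update(items)
        # Sort so that each unordered pair is always counted under one key.
        pair_counts.update(combinations(sorted(items), 2))

    rules: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Check the rule in both directions, since confidence is asymmetric.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules[(x, y)] = (support, confidence)
    return rules


# Hypothetical entity sets, one per learning unit.
units = [
    {"atom", "electron", "proton"},
    {"atom", "electron"},
    {"atom", "molecule"},
    {"cell", "nucleus"},
]
rules = mine_pair_rules(units, min_support=0.5, min_confidence=0.6)
for (x, y), (support, confidence) in sorted(rules.items()):
    print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```

In a pipeline like the one described, pairs that pass both thresholds would be candidates for relation edges between entities; how the authors score or filter such candidates is not stated in the abstract.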

    A word-building method based on neural network for text classification

    Text classification is a foundational task in many natural language processing applications. All traditional text classifiers take words as the basic units and conduct a pre-training process (like word2vec) to directly generate word vectors as the first step. However, none of them have considered the information contained in word structure, which has been shown to be helpful for text classification. In this paper, we propose a word-building method based on a neural network model that can decompose a Chinese word into a sequence of radicals and learn structure information from these radical-level features, which is a key difference from existing models. Then, a convolutional neural network is applied to extract the structure information of words from the radical sequence to generate a word vector, and a long short-term memory network is applied to generate the sentence vector for prediction. The experimental results show that our model outperforms other existing models on a Chinese dataset. Our model is also applicable to English, where an English word can be decomposed to the character level, which demonstrates the excellent generalisation ability of our model. The experimental results show that our model also outperforms others on an English dataset.