
    Topological properties and organizing principles of semantic networks

    Interpreting natural language is an increasingly important task in computer algorithms due to the growing availability of unstructured textual data. Natural Language Processing (NLP) applications rely on semantic networks for structured knowledge representation. The fundamental properties of semantic networks must be taken into account when designing NLP algorithms, yet their structure has not been systematically investigated. We study the properties of semantic networks from ConceptNet, defined by 7 semantic relations from 11 different languages. We find that semantic networks have universal basic properties: they are sparse, highly clustered, and many exhibit power-law degree distributions. Our findings show that the majority of the considered networks are scale-free. Some networks exhibit language-specific properties determined by grammatical rules; for example, networks from highly inflected languages, such as Latin, German, French and Spanish, show peaks in the degree distribution that deviate from a power law. We find that, depending on the semantic relation type and the language, link formation in semantic networks is guided by different principles. In some networks the connections are similarity-based, while in others they are more complementarity-based. Finally, we demonstrate how knowledge of similarity and complementarity in semantic networks can improve NLP algorithms for missing-link inference.
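    Two of the basic properties named in this abstract, sparsity and high clustering, can be sketched on a toy graph with the Python standard library. The node names below are invented stand-ins for ConceptNet concepts, not real data:

```python
from itertools import combinations

# Invented toy "semantic network" (undirected edges between concepts).
edges = {
    ("dog", "animal"), ("cat", "animal"), ("dog", "pet"), ("cat", "pet"),
    ("pet", "animal"), ("car", "vehicle"), ("bus", "vehicle"),
}

nodes = sorted({n for e in edges for n in e})
adj = {n: set() for n in nodes}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Sparsity: fraction of possible edges actually present.
n = len(nodes)
density = 2 * len(edges) / (n * (n - 1))

# Local clustering coefficient: how many of a node's neighbours
# are themselves connected.
def clustering(v):
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

avg_clustering = sum(clustering(v) for v in nodes) / n
print(f"density={density:.2f}  avg_clustering={avg_clustering:.2f}")
```

On real ConceptNet-scale networks one would additionally fit the degree distribution (e.g. with a maximum-likelihood power-law fit) to test the scale-free claim.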

    Patent Thickets Identification

    Patent thickets have been identified by various citation-based techniques, such as those of Graevenitz et al. (2011) and Clarkson (2005). An alternative direct measurement is based on expert opinion. We use natural language processing techniques to measure the pairwise semantic similarity of patents identified as thicket members by experts, creating a semantic network. We compare the semantic similarity scores for patents in different expert-identified thickets: those within the same thicket, those in different thickets, and those not in thickets. We show that patents within the same thicket are significantly more semantically similar than other pairs of patents. We then present a statistical model to assess the probability that a newly added patent belongs to a thicket, based on semantic networks as well as other measures from the existing thicket literature (the triples of Graevenitz and Clarkson's density ratio). We conclude that combining information from semantic distance with other sources can help isolate the patents that are likely to be members of thickets.
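    A minimal sketch of the pairwise-similarity step, using a bag-of-words cosine as a stand-in for whichever NLP similarity model is actually used; the patent snippets and the threshold are invented for illustration:

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words term-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

patents = {  # invented abstracts
    "P1": "method for wireless data transmission using antenna arrays",
    "P2": "wireless transmission method with adaptive antenna arrays",
    "P3": "pharmaceutical composition for treating inflammation",
}

# Build a semantic network: link patent pairs above a similarity threshold.
threshold = 0.3
edges = [(a, b) for a in patents for b in patents
         if a < b and cosine_sim(patents[a], patents[b]) >= threshold]
print(edges)
```

Patents in the same thicket would then show up as densely connected components of this network.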

    Disorganization of Semantic Brain Networks in Schizophrenia Revealed by fMRI

    OBJECTIVES: Schizophrenia is a mental illness that presents with thought disorders including delusions and disorganized speech. Thought disorders have been regarded as a consequence of the loosening of associations between semantic concepts since the term "schizophrenia" was first coined by Bleuler. However, a mechanistic account of this cardinal disturbance in terms of functional dysconnection has been lacking. To evaluate how aberrant semantic connections are expressed through brain activity, we characterized large-scale network structures of concept representations using functional magnetic resonance imaging (fMRI). STUDY DESIGN: We quantified various concept representations in patients' brains from fMRI activity evoked by movie scenes using encoding modeling. We then constructed semantic brain networks by evaluating the similarity of these semantic representations and conducted graph theory-based network analyses. STUDY RESULTS: Neurotypical networks had small-world properties similar to those of natural languages, suggesting small-worldness as a universal property of semantic knowledge networks. Conversely, small-worldness was significantly reduced in the networks of schizophrenia patients and was correlated with psychological measures of delusions. Patients' semantic networks were partitioned into more distinct categories and had more random within-category structures than those of controls. CONCLUSIONS: The differences in conceptual representations manifest altered semantic clustering and associative intrusions that underlie thought disorders. This is the first study to provide pathophysiological evidence for the loosening of associations, as reflected in the randomization of semantic networks in schizophrenia. Our method provides a promising approach for understanding the neural basis of altered or creative inner experiences of individuals with mental illness or exceptional abilities, respectively.
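    The small-worldness measure used in this kind of graph-theoretic analysis is commonly the coefficient sigma = (C/C_rand)/(L/L_rand). A toy sketch on an invented Watts-Strogatz-style graph, not the authors' fMRI pipeline, using the common random-graph approximations C_rand ~ k/n and L_rand ~ ln(n)/ln(k):

```python
import math
from collections import deque
from itertools import combinations

# Toy graph: a 16-node ring, each node linked to its two nearest
# neighbours on each side, plus two shortcut edges.
n = 16
adj = {i: set() for i in range(n)}
for i in range(n):
    for d in (1, 2):
        adj[i].add((i + d) % n)
        adj[(i + d) % n].add(i)
for a, b in ((0, 8), (4, 12)):  # shortcuts
    adj[a].add(b)
    adj[b].add(a)

def clustering(v):
    nbrs, k = adj[v], len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def bfs_dists(src):
    """Shortest-path lengths from src by breadth-first search."""
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

C = sum(clustering(v) for v in adj) / n                  # avg clustering
L = sum(d for v in adj for d in bfs_dists(v).values()) / (n * (n - 1))
k = sum(len(adj[v]) for v in adj) / n                    # mean degree
# sigma > 1 indicates small-worldness (high clustering, short paths).
sigma = (C / (k / n)) / (L / (math.log(n) / math.log(k)))
print(f"C={C:.3f} L={L:.3f} sigma={sigma:.2f}")
```

The abstract's finding corresponds to sigma dropping toward 1 (random-like structure) in patients' semantic networks.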

    Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts

    There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between the various metrics of the complex networks is analyzed in three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts, and authorship recognition. We show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Nevertheless, the Katz similarity, which combines semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measure, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, the topological features were again relevant in some contexts, though for the books and authors analyzed good results were obtained with semantic features as well. Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may enhance text classification considerably, as it combines well-established strategies.
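    The Katz index underlying the Katz similarity scores node pairs by summing over all connecting paths with a decay factor beta: S = sum over path lengths l of beta^l * A^l. A truncated-series sketch on an invented toy graph (not the text networks analyzed in the paper):

```python
def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def katz(A, beta=0.1, max_len=10):
    """Katz index via the truncated series sum_{l=1..max_len} beta^l A^l.

    beta must be below 1/lambda_max(A) for the full series to converge.
    """
    n = len(A)
    S = [[0.0] * n for _ in range(n)]
    P = [[float(A[i][j]) for j in range(n)] for i in range(n)]  # A^1
    w = beta
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                S[i][j] += w * P[i][j]
        P = mat_mul(P, A)
        w *= beta
    return S

# Toy adjacency matrix; nodes could be words in a co-occurrence network.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
S = katz(A)
# Directly linked, well-connected pairs score higher than distant ones.
print(round(S[0][1], 4), round(S[0][3], 4))
```

In practice the closed form S = (I - beta*A)^(-1) - I is used instead of the truncated series.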

    Transformer-based Joint Source Channel Coding for Textual Semantic Communication

    The Space-Air-Ground-Sea integrated network calls for more robust and secure transmission techniques against jamming. In this paper, we propose a textual semantic transmission framework for robust transmission, which utilizes advanced natural language processing techniques to model and encode sentences. Specifically, the textual sentences are first split into tokens using the WordPiece algorithm and embedded into token vectors for semantic extraction by a Transformer-based encoder. The encoded data are quantized to a fixed-length binary sequence for transmission, where binary erasure, symmetric, and deletion channels are considered. The received binary sequences are then decoded by the Transformer decoders into tokens used for sentence reconstruction. Our proposed approach leverages the power of neural networks and the attention mechanism to provide reliable and efficient communication of textual data in challenging wireless environments, and simulation results on semantic similarity and bilingual evaluation understudy prove the superiority of the proposed model in semantic transmission.
    Comment: 6 pages, 5 figures. Accepted by IEEE/CIC ICCC 202
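    Of the pipeline described, the channel step is easy to sketch in isolation. Below, a random bit sequence stands in for the quantized Transformer-encoder output, and only the binary erasure channel (each bit independently erased with probability p) is simulated; the semantic encoder/decoder are out of scope:

```python
import random

def erasure_channel(bits, p, rng):
    """Binary erasure channel: each bit is replaced by '?' with prob p."""
    return ["?" if rng.random() < p else b for b in bits]

rng = random.Random(0)  # fixed seed for reproducibility
bits = [rng.randint(0, 1) for _ in range(32)]  # stand-in for encoder output
received = erasure_channel(bits, p=0.2, rng=rng)
erased = received.count("?")
print(f"{erased} of {len(bits)} bits erased")
```

The symmetric channel would instead flip bits, and the deletion channel would drop them (shortening the sequence), which is what makes the decoder's reconstruction task non-trivial.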

    Matching Natural Language Sentences with Hierarchical Sentence Factorization

    Semantic matching of natural language sentences, or identifying the relationship between two sentences, is a core research problem underlying many natural language tasks. Depending on whether training data is available, prior research has proposed both unsupervised distance-based schemes and supervised deep learning schemes for sentence matching. However, previous approaches either omit or fail to fully utilize the ordered, hierarchical, and flexible structures of language objects, as well as the interactions between them. In this paper, we propose Hierarchical Sentence Factorization: a technique to factorize a sentence into a hierarchical representation, with the components at each scale reordered into a "predicate-argument" form. The proposed sentence factorization technique leads to: 1) a new unsupervised distance metric that calculates the semantic distance between a pair of text snippets by solving a penalized optimal transport problem while preserving the logical relationship of words in the reordered sentences, and 2) new multi-scale deep learning models for supervised semantic training, based on factorized sentence hierarchies. We apply our techniques to text-pair similarity estimation and text-pair relationship classification tasks, based on multiple datasets such as STSbenchmark, the Microsoft Research paraphrase identification (MSRP) dataset, the SICK dataset, etc. Extensive experiments show that the proposed hierarchical sentence factorization can significantly improve the performance of existing unsupervised distance-based metrics as well as multiple supervised deep learning models based on the convolutional neural network (CNN) and long short-term memory (LSTM).
    Comment: Accepted by WWW 2018, 10 page
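    To give a feel for the optimal-transport idea behind the unsupervised metric: with uniform word weights and equal-length snippets, the transport problem reduces to a minimum-cost word-to-word assignment, solved here by brute force over permutations. The 2-d "embeddings" are invented, and the paper's penalty terms and predicate-argument reordering are omitted:

```python
import math
from itertools import permutations

emb = {  # hypothetical 2-d word vectors, chosen so related words are close
    "dog": (1.0, 0.0), "puppy": (0.9, 0.1), "barks": (0.0, 1.0),
    "cat": (0.8, -0.2), "kitten": (0.85, 0.0), "meows": (-0.1, 1.0),
}

def dist(u, v):
    return math.dist(emb[u], emb[v])

def ot_distance(sent_a, sent_b):
    """Minimum average matching cost over all word-to-word assignments."""
    return min(sum(dist(a, b) for a, b in zip(sent_a, perm)) / len(sent_a)
               for perm in permutations(sent_b))

d_close = ot_distance(["dog", "puppy", "barks"], ["cat", "kitten", "meows"])
d_far = ot_distance(["dog", "puppy", "barks"], ["meows", "meows", "meows"])
print(round(d_close, 3), round(d_far, 3))
```

Real implementations use a proper optimal-transport solver with non-uniform weights rather than brute-force enumeration, which is factorial in sentence length.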