Topological properties and organizing principles of semantic networks
Interpreting natural language is an increasingly important task in computer
algorithms due to the growing availability of unstructured textual data.
Natural Language Processing (NLP) applications rely on semantic networks for
structured knowledge representation. The fundamental properties of semantic
networks must be taken into account when designing NLP algorithms, yet they
remain to be structurally investigated. We study the properties of semantic
networks from ConceptNet, defined by 7 semantic relations from 11 different
languages. We find that semantic networks have universal basic properties: they
are sparse, highly clustered, and many exhibit power-law degree distributions.
Our findings show that the majority of the considered networks are scale-free.
Some networks exhibit language-specific properties determined by grammatical
rules: for example, networks from highly inflected languages such as
Latin, German, French, and Spanish show peaks in the degree distribution that
deviate from a power law. We find that depending on the semantic relation type
and the language, the link formation in semantic networks is guided by
different principles. In some networks the connections are similarity-based,
while in others the connections are more complementarity-based. Finally, we
demonstrate how knowledge of similarity and complementarity in semantic
networks can improve NLP algorithms for missing link inference.
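The basic properties reported above (sparsity, degree distribution, clustering) can all be computed directly from an edge list. A minimal sketch in pure Python, using hypothetical toy edges rather than real ConceptNet data:

```python
from collections import defaultdict

# Hypothetical toy edges standing in for ConceptNet relations such as
# RelatedTo or IsA; real networks are far larger and much sparser.
edges = [
    ("dog", "animal"), ("cat", "animal"), ("dog", "pet"),
    ("cat", "pet"), ("animal", "pet"), ("animal", "organism"),
]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def clustering(node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

degrees = {n: len(adj[n]) for n in adj}
avg_clustering = sum(clustering(n) for n in adj) / len(adj)
```

Plotting the histogram of `degrees` on log-log axes is the usual first check for the power-law shape discussed in the abstract.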
Patent Thickets Identification
Patent thickets have been identified by various citation-based techniques, such as those of Graevenitz et al. (2011) and Clarkson (2005). An alternative direct measurement is based on expert opinion. We use natural language processing techniques to measure pairwise semantic similarity of patents identified as thicket members by experts to create a semantic network. We compare the semantic similarity scores for patents in different expert-identified thickets: those within the same thicket, those in different thickets, and those not in thickets. We show that patents within the same thicket are significantly more semantically similar than other pairs of patents. We then present a statistical model to assess the probability of a newly added patent belonging to a thicket based on semantic networks as well as other measures from the existing thicket literature (the triples of Graevenitz and Clarkson's density ratio). We conclude that combining information from semantic distance with other sources can help isolate the patents that are likely to be members of thickets.
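The construction above, pairwise similarity scores thresholded into network edges, can be sketched with a simple bag-of-words cosine. The paper's actual similarity measure is not specified here, so this is a minimal stand-in; `semantic_edges` and the `threshold` value are illustrative, not from the paper:

```python
import math
from collections import Counter

def cosine_sim(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity between two patent texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_edges(patents, threshold=0.3):
    """Link patent pairs whose pairwise similarity clears a threshold,
    yielding the edge list of the semantic network."""
    ids = list(patents)
    return [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]
            if cosine_sim(patents[a], patents[b]) >= threshold]
```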
Disorganization of Semantic Brain Networks in Schizophrenia Revealed by fMRI
OBJECTIVES: Schizophrenia is a mental illness that presents with thought disorders including delusions and disorganized speech. Thought disorders have been regarded as a consequence of the loosening of associations between semantic concepts since the term "schizophrenia" was first coined by Bleuler. However, a mechanistic account of this cardinal disturbance in terms of functional dysconnection has been lacking. To evaluate how aberrant semantic connections are expressed through brain activity, we characterized large-scale network structures of concept representations using functional magnetic resonance imaging (fMRI). STUDY DESIGN: We quantified various concept representations in patients' brains from fMRI activity evoked by movie scenes using encoding modeling. We then constructed semantic brain networks by evaluating the similarity of these semantic representations and conducted graph theory-based network analyses. STUDY RESULTS: Neurotypical networks had small-world properties similar to those of natural languages, suggesting small-worldness as a universal property of semantic knowledge networks. Conversely, small-worldness was significantly reduced in networks of schizophrenia patients and was correlated with psychological measures of delusions. Patients' semantic networks were partitioned into more distinct categories and had more random within-category structures than those of controls. CONCLUSIONS: The differences in conceptual representations manifest altered semantic clustering and associative intrusions that underlie thought disorders. This is the first study to provide pathophysiological evidence for the loosening of associations as reflected in randomization of semantic networks in schizophrenia. Our method provides a promising approach for understanding the neural basis of altered or creative inner experiences of individuals with mental illness or exceptional abilities, respectively.
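Small-worldness in such graph-theoretic analyses is commonly summarized by the sigma coefficient: high clustering combined with short path lengths relative to a random baseline. A rough sketch under Erdos-Renyi closed-form approximations (the study's exact estimator may differ, e.g. using rewired null networks):

```python
import math
from collections import deque

def avg_path_length(adj):
    """Mean shortest-path length over connected node pairs, via BFS."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def small_world_sigma(adj, avg_clustering):
    """sigma = (C / C_rand) / (L / L_rand), with the Erdos-Renyi
    expectations C_rand ~ k/n and L_rand ~ ln(n)/ln(k) as baseline;
    sigma > 1 indicates small-world structure."""
    n = len(adj)
    k = sum(len(nbrs) for nbrs in adj.values()) / n   # mean degree
    c_rand, l_rand = k / n, math.log(n) / math.log(k)
    return (avg_clustering / c_rand) / (avg_path_length(adj) / l_rand)
```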
Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts
There are different ways to define similarity for grouping similar texts into
clusters, as the concept of similarity may depend on the purpose of the task.
For instance, in topic extraction similar texts are those within the same
semantic field, whereas in authorship recognition stylistic features should be
considered. In this study, we introduce ways to classify texts employing
concepts of complex networks, which may be able to capture syntactic, semantic
and even pragmatic features. The interplay between the various metrics of the
complex networks is analyzed with three applications, namely identification of
machine translation (MT) systems, evaluation of quality of machine translated
texts and authorship recognition. We shall show that topological features of
the networks representing texts can enhance the ability to identify MT systems
in particular cases. For evaluating the quality of MT texts, on the other hand,
high correlation was obtained with methods capable of capturing the semantics.
This was expected because the gold standards used are themselves based on
word co-occurrence. Notwithstanding, the Katz similarity, which combines
semantics and structure in the comparison of texts, achieved the highest
correlation with the NIST measurement, indicating that in some cases the
combination of both approaches can improve the ability to quantify quality in
MT. In authorship recognition, again the topological features were relevant in
some contexts, though for the books and authors analyzed good results were
obtained with semantic features as well. Because hybrid approaches encompassing
semantic and topological features have not been extensively used, we believe
that the methodology proposed here may be useful to enhance text classification
considerably, as it combines well-established strategies
Transformer-based Joint Source Channel Coding for Textual Semantic Communication
The Space-Air-Ground-Sea integrated network calls for more robust and secure
transmission techniques against jamming. In this paper, we propose a textual
semantic transmission framework for robust transmission, which utilizes the
advanced natural language processing techniques to model and encode sentences.
Specifically, the textual sentences are first split into tokens using the
WordPiece algorithm and embedded as token vectors for semantic extraction
by a Transformer-based encoder. The encoded data are quantized to a fixed-length
binary sequence for transmission, where binary erasure, symmetric, and deletion
channels are considered. The received binary sequences are then decoded by the
Transformer decoder into tokens used for sentence
reconstruction. Our proposed approach leverages the power of neural networks
and attention mechanism to provide reliable and efficient communication of
textual data in challenging wireless environments, and simulation results on
semantic similarity and bilingual evaluation understudy (BLEU) demonstrate the
superiority of the proposed model in semantic transmission.
Comment: 6 pages, 5 figures. Accepted by IEEE/CIC ICCC 202
Matching Natural Language Sentences with Hierarchical Sentence Factorization
Semantic matching of natural language sentences or identifying the
relationship between two sentences is a core research problem underlying many
natural language tasks. Depending on whether training data is available, prior
research has proposed both unsupervised distance-based schemes and supervised
deep learning schemes for sentence matching. However, previous approaches
either omit or fail to fully utilize the ordered, hierarchical, and flexible
structures of language objects, as well as the interactions between them. In
this paper, we propose Hierarchical Sentence Factorization---a technique to
factorize a sentence into a hierarchical representation, with the components at
each different scale reordered into a "predicate-argument" form. The proposed
sentence factorization technique leads to the invention of 1) a new
unsupervised distance metric which calculates the semantic distance between a
pair of text snippets by solving a penalized optimal transport problem while
preserving the logical relationship of words in the reordered sentences, and 2)
new multi-scale deep learning models for supervised semantic training, based on
factorized sentence hierarchies. We apply our techniques to text-pair
similarity estimation and text-pair relationship classification tasks, based on
multiple datasets such as STSbenchmark, the Microsoft Research paraphrase
identification (MSRP) dataset, the SICK dataset, etc. Extensive experiments
show that the proposed hierarchical sentence factorization can be used to
significantly improve the performance of existing unsupervised distance-based
metrics as well as multiple supervised deep learning models based on the
convolutional neural network (CNN) and long short-term memory (LSTM).
Comment: Accepted by WWW 2018, 10 page
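The unsupervised metric above computes an optimal-transport distance between two sentences' word distributions. The paper's version uses a penalized OT problem with an embedding-based ground cost over reordered sentence parts; the sketch below simplifies the ground cost to 0/1 (words match or not), for which the OT value reduces exactly to total variation distance, so it stays self-contained:

```python
from collections import Counter

def ot_distance(sent_a: str, sent_b: str) -> float:
    """Optimal-transport distance between the two bag-of-words
    distributions under a 0/1 ground cost (0 if words match, 1
    otherwise); for this cost OT equals total variation distance."""
    pa, pb = Counter(sent_a.lower().split()), Counter(sent_b.lower().split())
    na, nb = sum(pa.values()), sum(pb.values())
    return 0.5 * sum(abs(pa[w] / na - pb[w] / nb)
                     for w in set(pa) | set(pb))
```

Identical sentences score 0.0 and sentences with disjoint vocabularies score 1.0; the embedding-based ground cost in the paper additionally gives partial credit to related but non-identical words.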