
    Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness

    We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated with textual units of a large background knowledge corpus. We present an efficient algorithm for learning such semantic models from a training sample of relatedness preferences. Our method is corpus independent and can essentially rely on any sufficiently large (unstructured) collection of coherent texts. Moreover, the approach facilitates the fitting of semantic models for specific users or groups of users. We present the results of an extensive range of experiments, from small to large scale, indicating that the proposed method is effective and competitive with the state of the art. Comment: 37 pages, 8 figures. A short version of this paper was published at ECML/PKDD 201
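    As a rough illustration of the kind of model the abstract describes (parameterized co-occurrence statistics fitted to relatedness preferences), the sketch below learns per-unit weights from preference triples. The toy corpus, the weighted-cosine scoring, and the perceptron-style update are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy co-occurrence counts: rows = words, columns = textual units of a
# background corpus (documents, passages, ...).
vocab = ["cat", "dog", "car", "engine"]
C = rng.poisson(2.0, size=(len(vocab), 50)).astype(float)

w = np.ones(C.shape[1])  # per-unit parameters to be learned (assumed form)

def relatedness(i, j, w):
    # Weighted cosine over co-occurrence profiles (illustrative scoring).
    a, b = C[i] * w, C[j] * w
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Training preferences: (anchor, more_related, less_related) index triples,
# e.g. "cat" should come out closer to "dog" than to "car".
prefs = [(0, 1, 2), (3, 2, 0)]

lr, margin = 0.01, 0.05
for _ in range(100):
    for a, p, n in prefs:
        if relatedness(a, p, w) < relatedness(a, n, w) + margin:
            # Perceptron-style update: upweight units the preferred pair
            # shares, downweight units the dispreferred pair shares.
            w = np.clip(w + lr * (C[a] * C[p] - C[a] * C[n]), 0.0, None)

print(relatedness(0, 1, w), relatedness(0, 2, w))
```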

    Towards a Universal Wordnet by Learning from Combined Evidence

    Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.
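    The abstract's graph-based integration of heterogeneous evidence can be pictured with a minimal sketch like the one below, which combines per-source confidence values for candidate word-to-synset links with a noisy-OR rule and a cutoff. The evidence values, the combination rule, and the 0.5 threshold are invented for illustration; they are not the paper's scoring functions.

```python
from collections import defaultdict

# Candidate links: (foreign word, WordNet synset) -> list of evidence weights
# from bilingual dictionaries, existing wordnets, parallel corpora, ...
evidence = {
    ("Hund_de", "dog.n.01"): [0.9, 0.7],      # dictionary + parallel corpus
    ("Hund_de", "hotdog.n.01"): [0.2],        # spurious translation
    ("chien_fr", "dog.n.01"): [0.8],
}

def score(weights):
    # Noisy-OR combination: independent pieces of evidence reinforce each
    # other without any single source being required.
    p = 1.0
    for w in weights:
        p *= (1.0 - w)
    return 1.0 - p

accepted = defaultdict(list)
for (word, synset), ws in evidence.items():
    if score(ws) >= 0.5:                      # illustrative threshold
        accepted[synset].append(word)

print(dict(accepted))   # {'dog.n.01': ['Hund_de', 'chien_fr']}
```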

    Disorganization of Semantic Brain Networks in Schizophrenia Revealed by fMRI

    OBJECTIVES: Schizophrenia is a mental illness that presents with thought disorders including delusions and disorganized speech. Thought disorders have been regarded as a consequence of the loosening of associations between semantic concepts since the term "schizophrenia" was first coined by Bleuler. However, a mechanistic account of this cardinal disturbance in terms of functional dysconnection has been lacking. To evaluate how aberrant semantic connections are expressed through brain activity, we characterized large-scale network structures of concept representations using functional magnetic resonance imaging (fMRI). STUDY DESIGN: We quantified various concept representations in patients' brains from fMRI activity evoked by movie scenes using encoding modeling. We then constructed semantic brain networks by evaluating the similarity of these semantic representations and conducted graph theory-based network analyses. STUDY RESULTS: Neurotypical networks had small-world properties similar to those of natural languages, suggesting small-worldness as a universal property of semantic knowledge networks. Conversely, small-worldness was significantly reduced in networks of schizophrenia patients and was correlated with psychological measures of delusions. Patients' semantic networks were partitioned into more distinct categories and had more random within-category structures than those of controls. CONCLUSIONS: The differences in conceptual representations manifest altered semantic clustering and associative intrusions that underlie thought disorders. This is the first study to provide pathophysiological evidence for the loosening of associations as reflected in randomization of semantic networks in schizophrenia. Our method provides a promising approach for understanding the neural basis of altered or creative inner experiences of individuals with mental illness or exceptional abilities, respectively.
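    The graph-theoretic part of the analysis (building a semantic network from concept representations and testing for small-world structure) can be sketched as below. The random representations, the similarity threshold, and the simple sigma coefficient are illustrative assumptions rather than the study's actual pipeline, which derives representations from fMRI encoding models.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
reps = rng.normal(size=(60, 20))          # 60 concepts x 20 toy features

# Connect concepts whose representations are sufficiently similar.
norms = np.linalg.norm(reps, axis=1)
sims = (reps @ reps.T) / (norms[:, None] * norms[None, :])
G = nx.Graph()
G.add_nodes_from(range(60))
G.add_edges_from((i, j) for i in range(60) for j in range(i + 1, 60)
                 if sims[i, j] > 0.25)

if nx.is_connected(G):
    C = nx.average_clustering(G)
    L = nx.average_shortest_path_length(G)
    # Random baseline with the same number of nodes and edges.
    R = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(), seed=1)
    if nx.is_connected(R):
        C_r = nx.average_clustering(R)
        L_r = nx.average_shortest_path_length(R)
        sigma = (C / C_r) / (L / L_r)     # sigma >> 1 suggests small-world
        print(f"C={C:.3f}  L={L:.3f}  sigma={sigma:.2f}")
```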

    A unified formal framework for factorial and probabilistic topic modelling

    Topic modelling has become a highly popular technique for extracting knowledge from texts. It encompasses various method families, including Factorial methods, Probabilistic methods, and Natural Language Processing methods. This paper introduces a unified conceptual framework for Factorial and Probabilistic methods by identifying shared elements and representing them using a homogeneous notation. The paper presents 12 different methods within this framework, enabling a comparative analysis of the flexibility of each approach and of how realistic its assumptions are. This establishes the initial stage of a broader analysis aimed at relating all method families to this common framework, comprehensively understanding their strengths and weaknesses, and establishing general application guidelines. An experimental setup further reinforces the convenience of having a harmonized notational schema. The paper concludes with a discussion on the presented methods and outlines future research directions.
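    The parallel the framework draws between factorial and probabilistic methods can be seen in miniature below: NMF and LDA both factor the same document-term matrix into document-topic and topic-term components. The toy corpus and hyperparameters are arbitrary choices for illustration; the paper itself covers 12 methods, not just these two.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF, LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "investors sold shares and stocks",
]
X = CountVectorizer(stop_words="english").fit_transform(docs)

nmf = NMF(n_components=2, random_state=0).fit(X)        # factorial view
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# In both cases: document-topic weights and topic-term weights.
print(nmf.transform(X).shape, nmf.components_.shape)
print(lda.transform(X).shape, lda.components_.shape)
```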

    Common Sense Reasoning for Detection, Prevention, and Mitigation of Cyberbullying

    Cyberbullying (harassment on social networks) is widely recognized as a serious social problem, especially for adolescents. It is as much a threat to the viability of online social networks for youth today as spam once was to email in the early days of the Internet. Current work to tackle this problem has involved social and psychological studies on its prevalence as well as its negative effects on adolescents. While true solutions rest on teaching youth to have healthy personal relationships, few have considered innovative design of social network software as a tool for mitigating this problem. Mitigating cyberbullying involves two key components: robust techniques for effective detection and reflective user interfaces that encourage users to reflect upon their behavior and their choices. Spam filters have been successful by applying statistical approaches like Bayesian networks and hidden Markov models; services like Google’s GMail can aggregate human spam judgments because spam is sent nearly identically to many people. Bullying is more personalized, varied, and contextual. In this work, we present an approach for bullying detection based on state-of-the-art natural language processing and a common sense knowledge base, which permits recognition over a broad spectrum of topics in everyday life. We analyze a narrower range of particular subject matter associated with bullying (e.g. appearance, intelligence, racial and ethnic slurs, social acceptance, and rejection), and construct BullySpace, a common sense knowledge base that encodes particular knowledge about bullying situations. We then perform joint reasoning with common sense knowledge about a wide range of everyday life topics. We analyze messages using our novel AnalogySpace common sense reasoning technique, and we also take into account social network analysis and other factors. We evaluate the model on real-world instances that have been reported by users on Formspring, a social networking website that is popular with teenagers. On the intervention side, we explore a set of reflective user-interaction paradigms with the goal of promoting empathy among social network participants. We propose an “air traffic control”-like dashboard, which alerts moderators to large-scale outbreaks that appear to be escalating or spreading and helps them prioritize the current deluge of user complaints. For potential victims, we provide educational material that informs them about how to cope with the situation and connects them with emotional support from others. A user evaluation shows that in-context, targeted, and dynamic help during cyberbullying situations fosters end-user reflection that promotes better coping strategies.
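    AnalogySpace-style reasoning, as mentioned in the abstract, rests on factoring a concept-feature matrix with a truncated SVD so that concepts sharing common-sense features land near one another. The sketch below uses a tiny hand-made matrix as a stand-in; it is not BullySpace or ConceptNet data, and the comparison against bullying-related concepts is only indicated in a comment.

```python
import numpy as np

concepts = ["makeup", "dress", "football", "lipstick"]
features = ["worn by girls", "used on face", "played outdoors", "is clothing"]
M = np.array([
    [1, 1, 0, 0],   # makeup
    [1, 0, 0, 1],   # dress
    [0, 0, 1, 0],   # football
    [1, 1, 0, 0],   # lipstick
], dtype=float)

U, S, Vt = np.linalg.svd(M, full_matrices=False)
E = U[:, :2] * S[:2]           # 2-D "AnalogySpace"-like concept embeddings

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Concepts mentioned in a message can then be compared, in this reduced
# space, against concepts a knowledge base associates with bullying topics.
print(cos(E[0], E[3]))   # makeup ~ lipstick: high
print(cos(E[0], E[2]))   # makeup ~ football: low
```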

    Analysis of category co-occurrence in Wikipedia networks

    Wikipedia has seen a huge expansion of content since its inception. Pages within this online encyclopedia are organised by assigning them to one or more categories, and Wikipedia maintains a manually constructed taxonomy graph that encodes the semantic relationships between these categories. An alternative, called the category co-occurrence graph, can be produced automatically by linking together categories that have pages in common. The properties of the latter graph and its relationship to the former are the concern of this thesis. An analytic framework, called the t-component, is introduced to formalise the graphs and discover category clusters connecting relevant categories together. The m-core, a cohesive-subgroup concept used as a clustering model, constructs a subgraph in which the number of pages shared between connected categories exceeds a given threshold t. The significance of the m-core clustering results is validated using a permutation test, and the m-core is compared to the k-core, another clustering model. The Wikipedia category co-occurrence graphs are scale-free, with a few category hubs, and the majority of clusters are of size 2. All observed properties of the distribution of the largest clusters of the category graphs obey power laws, with decay exponents averaging around 1. As the threshold t on the number of shared pages is increased, a critical threshold is eventually reached at which the largest cluster shrinks significantly in size. This phenomenon is exhibited only by the m-core, not the k-core. Lastly, the clustering in the category graph is shown to be consistent with the distance between categories in the taxonomy graph.
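    A minimal sketch of the two clustering models the thesis compares, applied to an invented page-to-category assignment: an m-core style subgraph keeps only category pairs that share at least t pages, while networkx's k_core filters nodes by degree. The data and the threshold t=2 are toy values chosen for illustration.

```python
from itertools import combinations
import networkx as nx

# page -> categories it is assigned to (toy Wikipedia-like data)
pages = {
    "p1": ["Physics", "History of science"],
    "p2": ["Physics", "History of science"],
    "p3": ["Physics", "Mathematics"],
    "p4": ["Mathematics", "Logic"],
}

# Category co-occurrence graph: edge weight = number of shared pages.
G = nx.Graph()
for cats in pages.values():
    for a, b in combinations(sorted(set(cats)), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

def m_core(G, t):
    # Keep only edges whose number of shared pages reaches the threshold t.
    H = nx.Graph()
    H.add_edges_from((u, v, d) for u, v, d in G.edges(data=True)
                     if d["weight"] >= t)
    return H

# With t=2 only the Physics / History of science pair survives, whereas the
# k-core filters by node degree and keeps all four categories at k=1.
print(list(m_core(G, 2).edges()))
print(sorted(nx.k_core(G, 1).nodes()))
```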

    Encoding Sequential Information in Semantic Space Models: Comparing Holographic Reduced Representation and Random Permutation

    Circular convolution and random permutation have each been proposed as neurally plausible binding operators capable of encoding sequential information in semantic memory. We perform several controlled comparisons of circular convolution and random permutation as means of encoding paired associates as well as encoding sequential information. Random permutations outperformed convolution with respect to the number of paired associates that can be reliably stored in a single memory trace. Performance was equal on semantic tasks when using a small corpus, but random permutations were ultimately capable of achieving superior performance due to their higher scalability to large corpora. Finally, “noisy” permutations in which units are mapped to other units arbitrarily (no one-to-one mapping) perform nearly as well as true permutations. These findings increase the neurological plausibility of random permutations and highlight their utility in vector space models of semantics.
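    The two binding operators compared in the abstract can be sketched directly: circular convolution (computed via the FFT, as in holographic reduced representations) and a fixed random permutation, each used to store two paired associates in a single sum trace and then probed for a partner item. The dimensionality, the particular pair-encoding scheme (cue plus permuted partner), and the decoding steps are illustrative choices, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024

def rand_vec():
    # Random vectors with expected unit length (standard HRR convention).
    return rng.normal(0.0, 1.0 / np.sqrt(d), d)

def cconv(a, b):
    # Circular convolution binding, computed in the Fourier domain.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def ccorr(a, t):
    # Circular correlation: approximate inverse used for decoding.
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(t)))

def cos(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

perm = rng.permutation(d)          # fixed random permutation
inv_perm = np.argsort(perm)        # its inverse

# Two paired associates (a, b) and (c, e) stored in one trace per scheme.
a, b, c, e = (rand_vec() for _ in range(4))
trace_conv = cconv(a, b) + cconv(c, e)             # HRR-style trace
trace_perm = (a + b[perm]) + (c + e[perm])         # permute the second item

# Decoding with convolution: correlate the cue with the trace.
decoded_conv = ccorr(a, trace_conv)
print(cos(decoded_conv, b), cos(decoded_conv, rand_vec()))  # partner vs. foil

# Decoding with permutation: un-permute the trace and compare candidates.
decoded_perm = trace_perm[inv_perm]
print(cos(decoded_perm, b), cos(decoded_perm, rand_vec()))  # partner vs. foil
```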