653 research outputs found

    Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation

    Full text link
    This paper demonstrates that word sense disambiguation (WSD) can improve neural machine translation (NMT) by widening the source context considered when modeling the senses of potentially ambiguous words. We first introduce three adaptive clustering algorithms for WSD, based on k-means, Chinese restaurant processes, and random walks, which are then applied to large word contexts represented in a low-rank space and evaluated on SemEval shared-task data. We then learn word vectors jointly with sense vectors defined by our best WSD method, within a state-of-the-art NMT system. We show that the concatenation of these vectors, and the use of a sense selection mechanism based on the weighted average of sense vectors, outperforms several baselines including sense-aware ones. This is demonstrated by translation on five language pairs. The improvements are above one BLEU point over strong NMT baselines, +4% accuracy over all ambiguous nouns and verbs, or +20% when scored manually over several challenging words.Comment: To appear in TAC

    Word Sense Disambiguation: Algorithms and applications

    Get PDF
    This book describes the state of the art in Word Sense Disambiguation. Current algorithms and applications are presente

    From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

    Get PDF
    Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality.Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Researc

    Menetelmiä mielenkiintoisten solmujen löytämiseen verkostoista

    Get PDF
    With the increasing amount of graph-structured data available, finding interesting objects, i.e., nodes in graphs, becomes more and more important. In this thesis we focus on finding interesting nodes and sets of nodes in graphs or networks. We propose several definitions of node interestingness as well as different methods to find such nodes. Specifically, we propose to consider nodes as interesting based on their relevance and non-redundancy or representativeness w.r.t. the graph topology, as well as based on their characterisation for a class, such as a given node attribute value. Identifying nodes that are relevant, but non-redundant to each other is motivated by the need to get an overview of different pieces of information related to a set of given nodes. Finding representative nodes is of interest, e.g. when the user needs or wants to select a few nodes that abstract the large set of nodes. Discovering nodes characteristic for a class helps to understand the causes behind that class. Next, four methods are proposed to find a representative set of interesting nodes. The first one incrementally picks one interesting node after another. The second iteratively changes the set of nodes to improve its overall interestingness. The third method clusters nodes and picks a medoid node as a representative for each cluster. Finally, the fourth method contrasts diverse sets of nodes in order to select nodes characteristic for their class, even if the classes are not identical across the selected nodes. The first three methods are relatively simple and are based on the graph topology and a similarity or distance function for nodes. For the second and third, the user needs to specify one parameter, either an initial set of k nodes or k, the size of the set. The fourth method assumes attributes and class attributes for each node, a class-related interesting measure, and possible sets of nodes which the user wants to contrast, such as sets of nodes that represent different time points. All four methods are flexible and generic. They can, in principle, be applied on any weighted graph or network regardless of what nodes, edges, weights, or attributes represent. Application areas for the methods developed in this thesis include word co-occurrence networks, biological networks, social networks, data traffic networks, and the World Wide Web. As an illustrating example, consider a word co-occurrence network. There, finding terms (nodes in the graph) that are relevant to some given nodes, e.g. branch and root, may help to identify different, shared contexts such as botanics, mathematics, and linguistics. A real life application lies in biology where finding nodes (biological entities, e.g. biological processes or pathways) that are relevant to other, given nodes (e.g. some genes or proteins) may help in identifying biological mechanisms that are possibly shared by both the genes and proteins.Väitöskirja käsittelee verkostojen louhinnan menetelmiä. Sen tavoitteena on löytää mielenkiintoisia tietoja painotetuista verkoista. Painotettuna verkkona voi tarkastella esim. tekstiainestoja, biologisia ainestoja, ihmisten välisiä yhteyksiä tai internettiä. Tällaisissa verkoissa solmut edustavat käsitteitä (esim. sanoja, geenejä, ihmisiä tai internetsivuja) ja kaaret niiden välisiä suhteita (esim. kaksi sanaa esiintyy samassa lauseessa, geeni koodaa proteiinia, ihmisten ystävyyksiä tai internetsivu viittaa toiseen internetsivuun). Kaarten painot voivat vastata esimerkiksi yhteyden voimakuutta tai luotettavuutta. Väitöskirjassa esitetään erilaisia verkon rakenteeseen tai solmujen attribuutteihin perustuvia määritelmiä solmujen mielenkiintoisuudelle sekä useita menetelmiä mielenkiintoisten solmujen löytämiseksi. Mielenkiintoisuuden voi määritellä esim. merkityksellisyytenä suhteessa joihinkin annettuihin solmuihin ja toisaalta mielenkiintoisten solmujen keskinäisenä erilaisuutena. Esimerkiksi ns. ahneella menetelmällä voidaan löytää keskenään erilaisia solmuja yksi kerrallaan. Väitöskirjan tuloksia voidaan soveltaa esimerkiksi tekstiaineistoa käsittelemällä saatuun sanojen väliseen verkostoon, jossa kahden sanan välillä on sitä voimakkaampi yhteys mitä useammin ne tapaavat esiintyä keskenään samoissa lauseissa. Sanojen erilaisia käyttöyhteyksiä ja jopa merkityksiä voidaan nyt löytää automaattisesti. Jos kohdesanaksi otetaan vaikkapa "juuri", niin siihen liittyviä mutta keskenään toisiinsa liittymättömiä sanoja ovat "puu" (biologinen merkitys: kasvin juuri), "yhtälö" (matemaattinen merkitys: yhtälön ratkaisu eli juuri) sekä "indoeurooppalainen" (kielitieteellinen merkitys: sanan vartalo eli juuri). Tällaisia menetelmiä voidaan soveltaa esimerkiksi hakukoneessa: sanalla "juuri" tehtyihin hakutuloksiin sisällytetään tuloksia mahdollisimman erilaisista käyttöyhteyksistä, jotta käyttäjän tarkoittama merkitys tulisi todennäköisemmin katetuksi hakutuloksissa. Merkittävä sovelluskohde väitöskirjan menetelmille ovat biologiset verkot, joissa solmut edustavat biologisia käsitteitä (esim. geenejä, proteiineja tai sairauksia) ja kaaret niiden välisiä suhteita (esim. geeni koodaa proteiinia tai proteiini on aktiivinen tietyssä sairauksessa). Menetelmillä voidaan etsiä esimerkiksi sairauksiin vaikuttavia biologisia mekanismeja paikantamalla edustava joukko sairauteen ja siihen mahdollisesti liittyviin geeneihin verkostossa kytkeytyviä muita solmuja. Nämä voivat auttaa biologeja ymmärtämään geenien ja sairauden mahdollisia kytköksiä ja siten kohdentamaan jatkotutkimustaan lupaavimpiin geeneihin, proteiineihin tms. Väitöskirjassa esitetyt solmujen mielenkiintoisuuden määritelmät sekä niiden löytämiseen ehdotetut menetelmät ovat yleispäteviä ja niitä voi soveltaa periaatteessa mihin tahansa verkkoon riippumatta siitä, mitä solmut, kaaret tai painot edustavat. Kokeet erilaisilla verkoilla osoittavat että ne löytävät mielenkiintoisia solmuja

    Tag disambiguation based on social network information

    No full text
    Within 20 years the Web has grown from a tool for scientists at CERN into a global information space. While returning to its roots as a read/write tool, its entering a more social and participatory phase. Hence a new, improved version called the Social Web where users are responsible for generating and sharing content on the global information space, they are also accountable for replicating the information. This collaborative activity can be observed in two of the most widely practised Social Web services such as social network sites and social tagging systems. Users annotate their interests and inclinations with free form keywords while they share them with their social connections. Although these keywords (tag) assist information organization and retrieval, theysuffer from polysemy.In this study we employ the effectiveness of social network sites to address the issue of ambiguity in social tagging. Moreover, we also propose that homophily in social network sites can be a useful aspect is disambiguating tags. We have extracted the ‘Likes’ of 20 Facebook users and employ them in disambiguation tags on Flickr. Classifiers are generated on the retrieved clusters from Flickr using K-Nearest-Neighbour algorithm and then their degree of similarity is calculated with user keywords. As tag disambiguation techniques lack gold standards for evaluation, we asked the users to indicate the contexts and used them as ground truth while examining the results. We analyse the performance of our approach by quantitative methods and report successful results. Our proposed method is able classify images with an accuracy of 6 out of 10 (on average). Qualitative analysis reveal some factors that affect the findings, and if addressed can produce more precise results
    corecore