2,650 research outputs found
Multiplex Communities and the Emergence of International Conflict
Advances in community detection reveal new insights into multiplex and
multilayer networks. Less work, however, investigates the relationship between
these communities and outcomes in social systems. We leverage these advances to
shed light on the relationship between the cooperative mesostructure of the
international system and the onset of interstate conflict. We detect
communities based upon weaker signals of affinity expressed in United Nations
votes and speeches, as well as stronger signals observed across multiple layers
of bilateral cooperation. Communities of diplomatic affinity display an
expected negative relationship with conflict onset. Ties in communities based
upon observed cooperation, however, display no effect under a standard model
specification and a positive relationship with conflict under an alternative
specification. These results align with some extant hypotheses but also point
to a paucity in our understanding of the relationship between community
structure and behavioral outcomes in networks. Comment: arXiv admin note: text overlap with arXiv:1802.0039
ParaPhraser: Russian paraphrase corpus and shared task
The paper describes the results of the First Russian Paraphrase Detection Shared Task, held in St. Petersburg, Russia, in October 2016. Research on paraphrase extraction, detection, and generation has been developing successfully for a long time, but interest in the problem has surged only recently in the Russian computational linguistics community. We try to close this gap by introducing the ParaPhraser.ru project, dedicated to collecting a Russian paraphrase corpus and organizing a Paraphrase Detection Shared Task that uses the corpus as training data. The participants applied a wide variety of techniques to paraphrase detection, from rule-based approaches to deep learning. The results reflect the following tendencies: the best scores are obtained by traditional classifiers combined with fine-grained linguistic features; however, complex neural networks, shallow methods, and purely technical methods also demonstrate competitive results. Peer reviewed
Multimedia information technology and the annotation of video
The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, overload of data will cause lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we make an attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.
A Survey on Domain-Specific Entity Linking with Heterogeneous Information Networks
Entity linking is the task of linking entity mentions in a collection of text to the corresponding entries in a knowledge base, thereby assigning a unique identity to entities such as locations, individuals, and companies. A knowledge base (KB) is used to optimize the collection, organization, and retrieval of information. Heterogeneous information networks (HINs) comprise interlinked objects of multiple types connected by various types of relationships; increasingly popular examples include bibliographic networks, social media networks, and typical relational database data. In a HIN, data objects are interconnected through various relations. Entity linking identifies, within an existing HIN, the entities that correspond to mentions in unstructured web text. The task is important and challenging because of ambiguity and limited existing knowledge. Some HINs can be considered domain-specific KBs. Current entity linking (EL) systems target corpora containing heterogeneous web information and perform sub-optimally on domain-specific corpora. EL systems link against one or more general or domain-specific knowledge bases such as DBpedia, Wikipedia, Freebase, IMDB, YAGO, WordNet, and MKB. This paper presents a survey of domain-specific entity linking with HINs, covering datasets, types, and examples along with related concepts.
Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search
Despite substantial interest in applications of neural networks to
information retrieval, neural ranking models have only been applied to standard
ad hoc retrieval tasks over web pages and newswire documents. This paper
proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network),
a novel neural ranking model specifically designed for ranking short social
media posts. We identify document length, informal language, and heterogeneous
relevance signals as features that distinguish documents in our domain, and
present a model specifically designed with these characteristics in mind. Our
model uses hierarchical convolutional layers to learn latent semantic
soft-match relevance signals at the character, word, and phrase levels. A
pooling-based similarity measurement layer integrates evidence from multiple
types of matches between the query, the social media post, as well as URLs
contained in the post. Extensive experiments using Twitter data from the TREC
Microblog Tracks 2011--2014 show that our model significantly outperforms prior
feature-based as well as existing neural ranking models. To the best of our
knowledge, this paper presents the first substantial work tackling search over
social media posts using neural ranking models. Comment: AAAI 2019, 10 pages
Paraphrastic Representations at Scale
We present a system that allows users to train their own state-of-the-art
paraphrastic sentence representations in a variety of languages. We also
release trained models for English, Arabic, German, French, Spanish, Russian,
Turkish, and Chinese. We train these models on large amounts of data, achieving
significantly improved performance from the original papers proposing the
methods on a suite of monolingual semantic similarity, cross-lingual semantic
similarity, and bitext mining tasks. Moreover, the resulting models surpass all
prior work on unsupervised semantic textual similarity, significantly
outperforming even BERT-based models like Sentence-BERT (Reimers and Gurevych,
2019). Additionally, our models are orders of magnitude faster than prior work
and can be used on CPU with little difference in inference speed (even improved
speed over GPU when using more CPU cores), making these models an attractive
choice for users without access to GPUs or for use on embedded devices.
Finally, we add significantly increased functionality to the code bases for
training paraphrastic sentence models, easing their use for both inference and
for training them for any desired language with parallel data. We also include
code to automatically download and preprocess training data. Comment: Published as a demo paper at EMNLP 202
Cross-lingual semantic specialization via lexical relation induction
Semantic specialization integrates structured linguistic knowledge from external resources (such as lexical relations in WordNet) into pretrained distributional vectors in the form of constraints. However, this technique cannot be leveraged in many languages, because their structured external resources are typically incomplete or non-existent. To bridge this gap, we propose a novel method that transfers specialization from a resource-rich source language (English) to virtually any target language. Our specialization transfer comprises two crucial steps: 1) inducing noisy constraints in the target language through automatic word translation; and 2) filtering the noisy constraints via a state-of-the-art relation prediction model trained on the source language constraints. This allows us to specialize any set of distributional vectors in the target language with the refined constraints. We prove the effectiveness of our method through intrinsic word similarity evaluation in 8 languages, and with 3 downstream tasks in 5 languages: lexical simplification, dialog state tracking, and semantic textual similarity. The gains over the previous state-of-the-art specialization methods are substantial and consistent across languages. Our results also suggest that the transfer method is effective even for lexically distant source-target language pairs. Finally, as a by-product, our method produces lists of WordNet-style lexical relations in resource-poor languages.