Effective Approaches to Attention-based Neural Machine Translation
An attentional mechanism has lately been used to improve neural machine
translation (NMT) by selectively focusing on parts of the source sentence
during translation. However, there has been little work exploring useful
architectures for attention-based NMT. This paper examines two simple and
effective classes of attentional mechanism: a global approach which always
attends to all source words and a local one that only looks at a subset of
source words at a time. We demonstrate the effectiveness of both approaches
over the WMT translation tasks between English and German in both directions.
With local attention, we achieve a significant gain of 5.0 BLEU points over
non-attentional systems which already incorporate known techniques such as
dropout. Our ensemble model using different attention architectures has
established a new state-of-the-art result in the WMT'15 English to German
translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over
the existing best system backed by NMT and an n-gram reranker.
Comment: 11 pages, 7 figures, EMNLP 2015 camera-ready version, more training
detail
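The difference between the two attention classes described above can be illustrated with a minimal sketch. It assumes dot-product scoring and a fixed window around a given center position; both are simplifying assumptions made here for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def global_attention(target_state, source_states):
    """Attend to every source position; returns (weights, context vector)."""
    # Assumption: plain dot-product scores between target and source states.
    weights = softmax([dot(target_state, h) for h in source_states])
    dim = len(source_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, source_states))
        for i in range(dim)
    ]
    return weights, context


def local_attention(target_state, source_states, center, half_width):
    """Attend only to source positions in a window around `center`."""
    lo = max(0, center - half_width)
    hi = min(len(source_states), center + half_width + 1)
    window_weights, context = global_attention(target_state, source_states[lo:hi])
    # Positions outside the window receive zero weight.
    weights = [0.0] * len(source_states)
    weights[lo:hi] = window_weights
    return weights, context
```

In this sketch the local variant reuses the global computation on a slice of the source states, so its cost depends on the window size rather than on the sentence length.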
Self-organizing Structured RDF in MonetDB
The semantic web uses RDF as its data model, providing ultimate flexibility
for users to represent and evolve data without need of a schema.
Yet, this flexibility poses challenges in implementing efficient RDF
stores: it leads to plans with very many self-joins on a triple table,
to difficulties in optimizing these plans, and to a lack of data locality, since without
a notion of multi-attribute data structure, clustered indexing opportunities are lost.
Apart from performance issues, users of huge RDF graphs often have problems
formulating queries as they lack any system-supported notion of the structure in the data.
In this research, we exploit the observation that real RDF data, while not as regularly
structured as relational data, still has the great majority of triples conforming to regular patterns.
We conjecture that a system that would recognize this structure automatically
would both allow RDF stores to become more efficient and also easier to use.
Concretely, we propose to derive self-organizing RDF that stores data
in PSO format in such a way that the regular parts of the data physically
correspond to relational columnar storage; and propose RDFscan/RDFjoin algorithms
that compute star-patterns over these without wasting effort in self-joins.
These regular parts, i.e. tables, are identified on ingestion by a schema discovery
algorithm -- as such users will gain an SQL view of the regular part of the RDF data.
This research aims to produce a state-of-the-art SPARQL frontend for MonetDB
as a by-product, and we already present some preliminary results on this platform.
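The idea of identifying the regular, table-like parts of RDF data on ingestion can be sketched as follows. This is only an illustration under loud assumptions: subjects are grouped by their exact property set, each (subject, property) pair has a single value, and the `min_subjects` threshold and function name are hypothetical. The abstract does not specify the actual schema discovery algorithm.

```python
from collections import defaultdict


def discover_tables(triples, min_subjects=2):
    """Split (subject, property, object) triples into tables and leftovers.

    Assumption: subjects sharing exactly the same property set form a table
    once at least `min_subjects` of them exist.
    """
    props_by_subject = defaultdict(dict)
    for s, p, o in triples:
        # Assumption: one value per (subject, property); later values win.
        props_by_subject[s][p] = o

    subjects_by_propset = defaultdict(list)
    for s, props in props_by_subject.items():
        subjects_by_propset[frozenset(props)].append(s)

    tables, irregular = {}, []
    for propset, subjects in subjects_by_propset.items():
        if len(subjects) >= min_subjects:
            columns = sorted(propset)
            # One row per subject: the subject followed by its column values.
            tables[tuple(columns)] = [
                [s] + [props_by_subject[s][c] for c in columns]
                for s in sorted(subjects)
            ]
        else:
            # Rare property sets stay as plain triples.
            irregular.extend(
                (s, p, o) for s in subjects for p, o in props_by_subject[s].items()
            )
    return tables, irregular
```

For example, two subjects that both carry `name` and `age` properties would end up as two rows of one table with columns `("age", "name")`, while a lone subject with only a `color` property would remain in the irregular triple set.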
Exploiting emergent schemas to make RDF systems more efficient
We build on our earlier finding that more than 95 % of the
triples in actual RDF triple graphs have a remarkably tabular structure,
whose schema does not necessarily follow from explicit metadata such as
ontologies, but which an RDF store can derive automatically by looking
at the data using so-called “emergent schema” detection techniques.
In this paper we investigate how computers and in particular RDF stores
can take advantage of this emergent schema to more compactly store
RDF data and more efficiently optimize and execute SPARQL queries.
To this end, we contribute techniques for efficient emergent-schema-aware
RDF storage and new query operator algorithms for emergent-schema-aware
scans and joins. In all, these techniques allow RDF schema processors
to fully catch up with relational database techniques in terms of rich
physical database design options and efficiency, without requiring a rigid
upfront schema structure definition.
S3G2: a Scalable Structure-correlated Social Graph Generator
Benchmarking graph-oriented database workloads and graph-oriented database systems is becoming increasingly relevant in analytical Big Data tasks, such as social network analysis. In graph data, structure is found not mainly inside the nodes but especially in the way nodes happen to be connected, i.e. in structural correlations. Because such structural correlations determine the join fan-outs experienced by graph analysis algorithms and graph query executors, they are an essential, yet typically neglected, ingredient of synthetic graph generators. To address this, we present S3G2: a Scalable Structure-correlated Social Graph Generator. This graph generator creates a synthetic social graph containing non-uniform value distributions and structural correlations, and is intended as a testbed for scalable graph analysis algorithms and graph database systems. We generalize the problem by decomposing correlated graph generation into multiple passes that each focus on one so-called "correlation dimension", each of which can be mapped to a MapReduce task. We show that S3G2 can generate social graphs that (i) share well-known graph connectivity characteristics typically found in real social graphs, (ii) contain certain plausible structural correlations that influence the performance of graph analysis algorithms and queries, and (iii) can be quickly generated at huge sizes on common cluster hardware.
Homophily-based social group formation in a spin-glass self-assembly framework
Homophily, the tendency of humans to attract each other when sharing similar
features, traits, or opinions has been identified as one of the main driving
forces behind the formation of structured societies. Here we ask to what extent
homophily can explain the formation of social groups, particularly their size
distribution. We propose a spin-glass-inspired framework of self-assembly,
where opinions are represented as multidimensional spins that dynamically
self-assemble into groups; individuals within a group tend to share similar
opinions (intra-group homophily), and opinions between individuals belonging to
different groups tend to be different (inter-group heterophily). We compute the
associated non-trivial phase diagram by solving a self-consistency equation for
'magnetization' (combined average opinion). Below a critical temperature, there
exist two stable phases: one ordered with non-zero magnetization and large
clusters, the other disordered with zero magnetization and no clusters. The
system exhibits a first-order transition to the disordered phase. We
analytically derive the group-size distribution that successfully matches
empirical group-size distributions from online communities.
Comment: 6 pages, 5 pages of SI, to appear in Phys. Rev. Let
Network Sensitivity of Systemic Risk
A growing body of studies on systemic risk in financial markets has
emphasized the key importance of taking into consideration the complex
interconnections among financial institutions. Much effort has been put into
modeling the contagion dynamics of financial shocks and into assessing the
resilience of specific financial markets - either using real network data,
reconstruction techniques or simple toy networks. Here we address the more
general problem of how shock propagation dynamics depends on the topological
details of the underlying network. To this end we consider different realistic
network topologies, all consistent with balance sheet information obtained
from real data on financial institutions. In particular, we consider networks
of varying density and with different block structures, and we also vary
the details of the shock propagation dynamics. We confirm that the systemic
risk properties of a financial network are extremely sensitive to its network
features. Our results can aid in the design of regulatory policies to improve
the robustness of financial markets.
sFuzz: An efficient adaptive fuzzer for solidity smart contracts
Ministry of Education, Singapore under its Academic Research Funding Tier
Adjustment of Vietnamese labour market in time of economic fluctuations and structural changes
In this article, we examine how the labour market adjusts to economic fluctuations, taking into account ongoing structural transformations as well as short-term changes. To do so, we use data from population censuses or published in the statistical yearbooks of the General Statistics Office for the long-term series, and the labour force surveys conducted between 2007 and 2012 for the short-term data. The article highlights the profound transformation of the labour market over recent decades. The labour force has doubled in 25 years and the share of agriculture has fallen below the 50% threshold. Absorbing the labour supply has therefore been one of the main challenges for the Vietnamese economy over this period. The sector of agricultural and non-agricultural household businesses has been the main provider of jobs during these years. The labour market has adjusted to the recent economic slowdown through several channels. Unemployment has remained stable, but the number of inactive people has increased. The quantity of work has also been affected by a significant reduction in the number of hours worked. While the non-agricultural sector has generated more jobs for skilled workers, a flow of unskilled workers towards agriculture has been observed. Owing to demographic factors, absorbing the labour supply and creating new jobs are no longer the main problem. Instead, recent labour market developments call for structural policies aimed at improving working conditions, the period being particularly favourable for such policies since Vietnam is currently benefiting from the demographic dividend.