
11,529 research outputs found

Keyword Search on RDF Graphs - A Query Graph Assembly Approach

Author: Han Shuo
Yu Jeffrey Xu
Zhao Dongyan
Zou Lei
Publication date: 25/08/2017

Keyword search provides ordinary users with an easy-to-use interface for querying RDF data. Given the input keywords, in this paper we study how to assemble a query graph that represents the user's query intention accurately and efficiently. Based on the input keywords, we first obtain the elementary query graph building blocks, such as entity/class vertices and predicate edges. Then, we formally define the query graph assembly (QGA) problem. Unfortunately, we prove theoretically that QGA is an NP-complete problem. To solve it, we design some heuristic lower bounds and propose a bipartite graph matching-based best-first search algorithm. The algorithm's time complexity is $O(k^{2l} \cdot l^{3l})$, where $l$ is the number of keywords and $k$ is a tunable parameter, i.e., the maximum number of candidate entity/class vertices and predicate edges allowed to match each keyword. Although QGA is intractable, both $l$ and $k$ are small in practice. Furthermore, the algorithm's time complexity does not depend on the RDF graph size, which guarantees the good scalability of our system on large RDF graphs. Experiments on DBpedia and Freebase confirm the superiority of our system in both effectiveness and efficiency.
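To give a feel for the quoted bound, the sketch below simply evaluates k^(2l) * l^(3l) for small values of k and l. The function name `qga_search_bound` is hypothetical, constant factors hidden by the O-notation are ignored, and nothing here reproduces the paper's search algorithm; it only illustrates that the expression involves the keyword count and the candidate limit, not the size of the RDF graph.

```python
def qga_search_bound(k: int, l: int) -> int:
    """Evaluate k^(2l) * l^(3l), the bound quoted in the abstract.

    k: maximum number of candidates allowed per keyword (tunable).
    l: number of keywords in the query.
    Constant factors hidden by the O-notation are ignored (assumption).
    """
    if k < 1 or l < 1:
        raise ValueError("k and l must be positive integers")
    return k ** (2 * l) * l ** (3 * l)


def bound_table(ks, ls):
    """Tabulate the bound for every (k, l) pair in the given ranges."""
    return {(k, l): qga_search_bound(k, l) for k in ks for l in ls}


if __name__ == "__main__":
    for (k, l), value in sorted(bound_table([2, 3], [2, 3]).items()):
        print(f"k={k} l={l} bound={value}")
```

The bound grows quickly with l, which is consistent with the abstract's remark that the approach relies on l and k being small in practice.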

arXiv.org e-Print Archive

Crossref

OPUS - University of Technology Sydney

Measuring Syntactic Complexity in Spoken and Written Learner Language: Comparing the Incomparable?

Author: Mari Mäkilä
Pekka Lintunen
Publication venue: Walter de Gruyter GmbH
Publication date: 01/12/2014

Spoken and written language are two modes of language. When learners aim at higher skill levels, the expected outcome of successful second language learning is usually to become a fluent speaker and writer who can produce accurate and complex language in the target language. There is an axiomatic difference between speech and writing, but together they form the essential parts of learners' L2 skills. The two modes have their own characteristics, and there are differences between native and nonnative language use. For instance, hesitations and pauses are not visible in the end result of the writing process, but they are characteristic of nonnative spoken language use. The present study is based on the analysis of L2 English spoken and written productions of 18 L1 Finnish learners, with a focus on syntactic complexity. As earlier spoken language segmentation units mostly come from fluency studies, we conducted an experiment with a new unit, the U-unit, and examined how using this unit as the basis of spoken language segmentation affects the results. According to the analysis, written language was more complex than spoken language. However, the difference in the level of complexity was greatest when the traditional units, T-units and AS-units, were used in segmenting the data. Using the U-unit revealed that spoken language may, in fact, be closer to written language in its syntactic complexity than earlier studies had suggested. Therefore, further research is needed to discover whether the differences in spoken and written learner language are primarily due to the nature of these modes or, rather, to the units and measures used in the analysis.

Crossref

Biblioteka Nauki - repozytorium artykułów

Repozytorium Uniwersytetu Łódzkiego (University of Lodz Repository)

Modelling Discourse-related terminology in OntoLingAnnot’s ontologies

Author: Aguado de Cea G.
Pareja-Lora A.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2010

Recently, computational linguists have shown great interest in discourse annotation in an attempt to capture the internal relations in texts. With this aim, we have formalized the linguistic knowledge associated with discourse into different linguistic ontologies. In this paper, we present the most prominent discourse-related terms and concepts included in the ontologies of the OntoLingAnnot annotation model. They show the different units, values, attributes, relations, layers and strata included in the discourse annotation level of the OntoLingAnnot model, within which these ontologies are included, used and evaluated.

Archivo Digital UPM

Memory-Based Shallow Parsing

Author: Sang Erik F. Tjong Kim
Publication date: 01/01/2002

We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods to improve the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. This reveals that our approach works well for base phrase identification, while its application to recognizing embedded structures leaves some room for improvement.
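Memory-based learning is commonly understood as storing training instances and classifying new ones by their nearest stored neighbours. Assuming that reading, the sketch below tags a token as inside or outside a base noun phrase from a three-tag part-of-speech window. The class `MemoryBasedClassifier`, the overlap distance, the IOB labels and the toy training windows are all illustrative assumptions; the abstract does not describe the features, metric or data actually used.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """Store every training instance; classify by majority vote of the k nearest."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []

    def fit(self, instances, labels):
        # "Learning" is just memorising the labelled instances.
        self.memory = list(zip(instances, labels))
        return self

    def predict(self, instance):
        if not self.memory:
            raise ValueError("classifier has no stored instances")
        # sorted() is stable, so ties keep the order in which instances were stored.
        ranked = sorted(self.memory, key=lambda item: overlap_distance(item[0], instance))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical toy data: (previous POS, focus POS, next POS) -> IOB chunk tag.
TRAIN = [
    (("<s>", "DT", "NN"), "B-NP"),
    (("DT", "NN", "VBZ"), "I-NP"),
    (("NN", "VBZ", "DT"), "O"),
    (("VBZ", "DT", "JJ"), "B-NP"),
    (("DT", "JJ", "NN"), "I-NP"),
    (("JJ", "NN", "</s>"), "I-NP"),
]

if __name__ == "__main__":
    windows, tags = zip(*TRAIN)
    clf = MemoryBasedClassifier(k=1).fit(windows, tags)
    print(clf.predict(("<s>", "DT", "JJ")))
```

Because nothing is abstracted away at training time, an unseen window is labelled by analogy with the closest stored windows, which is the property that makes the approach sensitive to feature selection.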

arXiv.org e-Print Archive

CiteSeerX

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Multi-Modal Mean-Fields via Cardinality-Based Clamping

Author: Baqué Pierre
Fleuret François
Fua Pascal
Publication date: 23/11/2016

Mean Field inference is central to statistical physics. It has attracted much interest in the Computer Vision community as a way to efficiently solve problems expressible in terms of large Conditional Random Fields. However, since it models the posterior probability distribution as a product of marginal probabilities, it may fail to properly account for important dependencies between variables. We therefore replace the fully factorized distribution of Mean Field by a weighted mixture of such distributions that similarly minimizes the KL-Divergence to the true posterior. By introducing two new ideas, namely, conditioning on groups of variables instead of single ones, and selecting such groups using a parameter of the conditional random field potentials that we identify with the temperature in the sense of statistical physics, we can perform this minimization efficiently. Our extension of the clamping method proposed in previous works allows us to both produce a more descriptive approximation of the true posterior and, inspired by the diverse MAP paradigms, fit a mixture of Mean Field approximations. We demonstrate that this positively impacts real-world algorithms that initially relied on mean fields.

Comment: Submitted for review to CVPR 201
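The limitation named in the abstract, a fully factorized distribution missing dependencies between variables, can be seen on a two-variable toy model. The sketch below runs plain naive mean-field updates for a binary pairwise model with spins in {-1, +1}, using the standard fixed-point update m_i = tanh(h_i + sum_j J_ij m_j). The function name `mean_field`, the coupling value and the initial values are assumptions chosen for illustration; this is not the clamping or mixture method the paper proposes.

```python
import math


def mean_field(h, J, init, sweeps=100):
    """Naive mean-field updates for p(x) proportional to
    exp(sum_i h_i x_i + sum_{i<j} J_ij x_i x_j), with x_i in {-1, +1}.

    h: list of unary fields, J: symmetric coupling matrix (list of lists),
    init: starting magnetisations m_i in [-1, 1]. Returns the final m.
    """
    m = list(init)
    n = len(m)
    for _ in range(sweeps):
        for i in range(n):
            # Effective field on variable i given the current means of the others.
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            m[i] = math.tanh(field)
    return m


if __name__ == "__main__":
    # Two strongly coupled variables and no unary preference: the exact
    # distribution puts equal mass on (+1, +1) and (-1, -1).
    h = [0.0, 0.0]
    J = [[0.0, 2.0], [2.0, 0.0]]
    print(mean_field(h, J, init=[0.1, 0.1]))    # settles near the (+1, +1) mode
    print(mean_field(h, J, init=[-0.1, -0.1]))  # settles near the (-1, -1) mode
```

Each run commits to a single mode depending on its starting point, so one factorized solution cannot represent both; this is the kind of situation that motivates fitting a mixture of mean-field approximations.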

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Implicit learning of recursive context-free grammars

Author: Johan J. Bolhuis
Martin Rohrmeier
Qiufang Fu
Zoltan Dienes
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

Context-free grammars are fundamental for the description of linguistic syntax. However, most artificial grammar learning experiments have explored learning of simpler finite-state grammars, while studies exploring context-free grammars have not assessed awareness and implicitness. This paper explores the implicit learning of context-free grammars employing features of hierarchical organization, recursive embedding and long-distance dependencies. The grammars also featured the distinction between left- and right-branching structures, as well as between centre- and tail-embedding, both distinctions found in natural languages. People acquired unconscious knowledge of relations between grammatical classes even for dependencies over long distances, in ways that went beyond learning simpler relations (e.g. n-grams) between individual words. The structural distinctions drawn from linguistics also proved important, as performance was greater for tail-embedding than for centre-embedding structures. The results suggest the plausibility of implicit learning of complex context-free structures, which model some features of natural languages. They support the relevance of artificial grammar learning for probing mechanisms of language learning and challenge existing theories and computational models of implicit learning.
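The contrast between centre- and tail-embedding can be made concrete with two toy recursive rules. The sketch below uses hypothetical rules S -> a_i S b_i | a_i b_i (centre-embedding) and S -> a_i b_i S | a_i b_i (tail-embedding), where each a_i must be matched by its b_i; these rules and the token names are illustrative assumptions, not the grammars used in the study.

```python
def centre_embedded(depth, level=1):
    """Expand S -> a_i S b_i | a_i b_i: each new pair is nested inside the previous one."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if level == depth:
        return [f"a{level}", f"b{level}"]
    return [f"a{level}"] + centre_embedded(depth, level + 1) + [f"b{level}"]


def tail_embedded(depth, level=1):
    """Expand S -> a_i b_i S | a_i b_i: each new pair is appended after the previous one."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if level == depth:
        return [f"a{level}", f"b{level}"]
    return [f"a{level}", f"b{level}"] + tail_embedded(depth, level + 1)


def dependency_distances(tokens):
    """Distance in tokens between each a_i and its matching b_i."""
    position = {token: index for index, token in enumerate(tokens)}
    return [position["b" + t[1:]] - position[t] for t in tokens if t.startswith("a")]


if __name__ == "__main__":
    for build in (centre_embedded, tail_embedded):
        sequence = build(3)
        print(build.__name__, " ".join(sequence), dependency_distances(sequence))
```

In the centre-embedded sequence the outermost pair is separated by the whole nested material, so dependency distance grows with depth, whereas in the tail-embedded sequence every pair stays adjacent.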

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

Institute of Psychology, Chinese Academy of Sciences

PubMed Central

Sussex Research Online

FigShare