Keyword Search on RDF Graphs - A Query Graph Assembly Approach
Keyword search provides ordinary users with an easy-to-use interface for querying
RDF data. In this paper, given the input keywords, we study how to assemble a
query graph that represents the user's query intention accurately and
efficiently. Based on the input keywords, we first obtain the elementary query
graph building blocks, such as entity/class vertices and predicate edges. Then,
we formally define the query graph assembly (QGA) problem. Unfortunately, we
prove theoretically that QGA is an NP-complete problem. To solve it,
we design some heuristic lower bounds and propose a bipartite graph
matching-based best-first search algorithm. The algorithm's time complexity is
bounded in terms of the number of keywords and a tunable parameter, i.e., the
maximum number of candidate entity/class vertices and predicate edges allowed
to match each keyword. Although QGA is intractable, both quantities are small
in practice. Furthermore, the algorithm's time
complexity does not depend on the RDF graph size, which guarantees the good
scalability of our system on large RDF graphs. Experiments on DBpedia and
Freebase confirm the superiority of our system in both effectiveness and
efficiency.
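The best-first search with lower bounds mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the bipartite-matching step and the paper's actual bounds are omitted, and the names `best_first_assign`, `candidates`, and `pair_cost` are hypothetical. Each keyword gets a short list of scored candidates, and partial assignments are expanded in order of cost so far plus an optimistic estimate for the remaining keywords.

```python
import heapq
from itertools import count


def best_first_assign(candidates, pair_cost):
    """Pick one candidate per keyword with minimum total cost.

    candidates: one list per keyword of (label, unary_cost) pairs (hypothetical input).
    pair_cost(a, b): non-negative penalty for choosing labels a and b together.
    """
    n = len(candidates)
    # Lower bound for the unassigned suffix: cheapest unary cost of each keyword.
    # It never overestimates because pair penalties are non-negative.
    suffix_min = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + min(c for _, c in candidates[i])

    tie = count()  # tie-breaker so the heap never has to compare label tuples
    heap = [(suffix_min[0], next(tie), 0.0, ())]
    while heap:
        _bound, _, cost, chosen = heapq.heappop(heap)
        i = len(chosen)
        if i == n:
            # The first complete assignment popped is optimal under this bound.
            return list(chosen), cost
        for label, unary in candidates[i]:
            new_cost = cost + unary + sum(pair_cost(prev, label) for prev in chosen)
            heapq.heappush(
                heap,
                (new_cost + suffix_min[i + 1], next(tie), new_cost, chosen + (label,)),
            )
    return None, float("inf")


# Toy data (assumed): two keywords, two candidates each, one incompatible pair.
toy_candidates = [[("A1", 1.0), ("A2", 2.0)], [("B1", 1.0), ("B2", 1.5)]]
incompatible = {("A1", "B1")}
result = best_first_assign(
    toy_candidates, lambda a, b: 3.0 if (a, b) in incompatible else 0.0
)
```

In this sketch the search space depends only on the number of keywords and the number of candidates per keyword, not on the size of any underlying graph, which mirrors the scalability argument made in the abstract.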
Measuring Syntactic Complexity in Spoken and Written Learner Language: Comparing the Incomparable?
Spoken and written language are two modes of language. When learners aim at higher skill levels, the expected outcome of successful second language learning is usually to become a fluent speaker and writer who can produce accurate and complex language in the target language. There is an axiomatic difference between speech and writing, but together they form the essential parts of learners’ L2 skills. The two modes have their own characteristics, and there are differences between native and nonnative language use. For instance, hesitations and pauses are not visible in the end result of the writing process, but they are characteristic of nonnative spoken language use. The present study is based on the analysis of L2 English spoken and written productions of 18 L1 Finnish learners, with a focus on syntactic complexity. As earlier spoken language segmentation units mostly come from fluency studies, we conducted an experiment with a new unit, the U-unit, and examined how using this unit as the basis of spoken language segmentation affects the results. According to the analysis, written language was more complex than spoken language. However, the difference in the level of complexity was greatest when the traditional units, T-units and AS-units, were used in segmenting the data. Using the U-unit revealed that spoken language may, in fact, be closer to written language in its syntactic complexity than earlier studies had suggested. Therefore, further research is needed to discover whether the differences in spoken and written learner language are primarily due to the nature of these modes or, rather, to the units and measures used in the analysis.
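To see why the choice of segmentation unit can change a complexity score, consider the following minimal sketch. The two measures (words per unit and clauses per unit), the function name `complexity_measures`, and the toy utterance are assumptions made for illustration; they are not the study's data, and the two segmentations are not meant to reproduce the definitions of T-units, AS-units, or U-units.

```python
def complexity_measures(units):
    """Compute two simple ratios over a segmented text.

    units: list of units; each unit is a list of clauses;
    each clause is a list of words.
    """
    n_units = len(units)
    n_clauses = sum(len(unit) for unit in units)
    n_words = sum(len(clause) for unit in units for clause in unit)
    return {
        "words_per_unit": n_words / n_units,
        "clauses_per_unit": n_clauses / n_units,
    }


# Hypothetical utterance, already split into clauses.
clauses = [
    "I went to the shop".split(),
    "because I needed milk".split(),
    "and then I came home".split(),
]

# Segmentation A cuts before the coordinated clause; segmentation B does not.
segmentation_a = [[clauses[0], clauses[1]], [clauses[2]]]
segmentation_b = [[clauses[0], clauses[1], clauses[2]]]

scores_a = complexity_measures(segmentation_a)
scores_b = complexity_measures(segmentation_b)
```

The same words yield different scores under the two segmentations, which is the kind of unit effect the abstract asks further research to separate from genuine differences between speech and writing.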
Modelling Discourse-related terminology in OntoLingAnnot’s ontologies
Recently, computational linguists have shown great interest in discourse annotation in an attempt to capture the internal relations in texts. With this aim, we have formalized the linguistic knowledge associated with discourse into different linguistic ontologies. In this paper, we present the most prominent discourse-related terms and concepts included in the ontologies of the OntoLingAnnot annotation model. They show the different units, values, attributes, relations, layers and strata included in the discourse annotation level of the OntoLingAnnot model, within which these ontologies are included, used and evaluated.
Memory-Based Shallow Parsing
We present memory-based learning approaches to shallow parsing and apply
these to five tasks: base noun phrase identification, arbitrary base phrase
recognition, clause detection, noun phrase parsing and full parsing. We use
feature selection techniques and system combination methods for improving the
performance of the memory-based learner. Our approach is evaluated on standard
data sets and the results are compared with those of other systems. This reveals
that our approach works well for base phrase identification while its
application towards recognizing embedded structures leaves some room for
improvement.
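Memory-based learning stores training instances and classifies new ones by their nearest stored neighbours. The sketch below applies that idea to base noun phrase tagging with an overlap distance over a window of part-of-speech tags. The class `MemoryBasedTagger`, the window width, the B/I/O labels, and the one-sentence training set are assumptions for illustration; the feature selection and system combination described in the abstract are not shown.

```python
from collections import Counter


def window_features(pos_tags, i, width=1):
    """Return the POS tags in a window around position i, padded with '_'."""
    padded = ["_"] * width + list(pos_tags) + ["_"] * width
    return tuple(padded[i:i + 2 * width + 1])


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedTagger:
    def __init__(self, k=1):
        self.k = k
        self.memory = []  # stored (features, label) instances

    def fit(self, sentences):
        # Training only stores instances; no model parameters are estimated.
        for pos_tags, chunk_tags in sentences:
            for i, label in enumerate(chunk_tags):
                self.memory.append((window_features(pos_tags, i), label))

    def predict(self, pos_tags):
        labels = []
        for i in range(len(pos_tags)):
            feats = window_features(pos_tags, i)
            nearest = sorted(
                self.memory, key=lambda inst: overlap_distance(feats, inst[0])
            )[: self.k]
            votes = Counter(label for _, label in nearest)
            labels.append(votes.most_common(1)[0][0])
        return labels


# Toy training sentence (assumed): POS tags with B/I/O base-NP labels.
tagger = MemoryBasedTagger(k=1)
tagger.fit([(["DT", "NN", "VBZ", "DT", "JJ", "NN"], ["B", "I", "O", "B", "I", "I"])])
predicted = tagger.predict(["DT", "NN", "VBZ"])
```

A flat window like this sees only local context, which is one plausible reason why such a learner would find embedded structures harder than base phrases.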
Multi-Modal Mean-Fields via Cardinality-Based Clamping
Mean Field inference is central to statistical physics. It has attracted much
interest in the Computer Vision community to efficiently solve problems
expressible in terms of large Conditional Random Fields. However, since it
models the posterior probability distribution as a product of marginal
probabilities, it may fail to properly account for important dependencies
between variables. We therefore replace the fully factorized distribution of
Mean Field by a weighted mixture of such distributions that similarly
minimizes the KL-Divergence to the true posterior. By introducing two new
ideas, namely, conditioning on groups of variables instead of single ones and
using a parameter of the conditional random field potentials, which we identify
with the temperature in the sense of statistical physics, to select such groups,
we can perform this minimization efficiently. Our extension of the clamping
method proposed in previous works allows us to both produce a more descriptive
approximation of the true posterior and, inspired by the diverse MAP paradigms,
fit a mixture of Mean Field approximations. We demonstrate that this positively
impacts real-world algorithms that initially relied on mean fields.
Comment: Submitted for review to CVPR 201
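For orientation, the fully factorized baseline that the abstract sets out to extend can be sketched on a small Ising-style model, where each variable keeps an independent mean that is updated from its neighbours' means. This shows only standard naive mean field, not the clamping-based mixture proposed in the paper; the function name `mean_field_ising`, the couplings `J`, the fields `h`, and the inverse temperature `beta` are illustrative assumptions.

```python
import math


def mean_field_ising(J, h, beta=1.0, iters=200, tol=1e-10):
    """Naive mean-field fixed point for spins in {-1, +1}.

    J: symmetric coupling matrix as a list of lists (diagonal ignored).
    h: list of external fields. beta: inverse temperature.
    Returns the approximate mean of each spin.
    """
    n = len(h)
    m = [0.0] * n
    for _ in range(iters):
        delta = 0.0
        for i in range(n):
            # Each spin sees only the current means of the other spins.
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            new_value = math.tanh(beta * field)
            delta = max(delta, abs(new_value - m[i]))
            m[i] = new_value
        if delta < tol:
            break
    return m


# Two coupled spins; only the first has an external field (toy values).
J = [[0.0, 1.0], [1.0, 0.0]]
h = [0.5, 0.0]
m_cold = mean_field_ising(J, h, beta=1.0)
m_hot = mean_field_ising(J, h, beta=0.1)
```

Because the approximation is a single product of marginals, it settles on one mode; at a higher temperature (smaller `beta`) the means shrink towards zero. The abstract's use of a temperature-like parameter to choose which groups of variables to clamp builds on this kind of behaviour.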
Implicit learning of recursive context-free grammars
Context-free grammars are fundamental for the description of linguistic syntax. However, most artificial grammar learning
experiments have explored learning of simpler finite-state grammars, while studies exploring context-free grammars have
not assessed awareness and implicitness. This paper explores the implicit learning of context-free grammars employing
features of hierarchical organization, recursive embedding and long-distance dependencies. The grammars also featured
the distinction between left- and right-branching structures, as well as between centre- and tail-embedding, both
distinctions found in natural languages. People acquired unconscious knowledge of relations between grammatical classes
even for dependencies over long distances, in ways that went beyond learning simpler relations (e.g. n-grams) between
individual words. The structural distinctions drawn from linguistics also proved important as performance was greater for
tail-embedding than centre-embedding structures. The results suggest the plausibility of implicit learning of complex
context-free structures, which model some features of natural languages. They support the relevance of artificial grammar
learning for probing mechanisms of language learning and challenge existing theories and computational models of
implicit learning.
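The contrast between centre-embedding and tail-embedding can be made concrete with two small recursive generators. This is an illustrative sketch, not the stimulus grammar used in the study: the paired symbols and the function names `centre_embedded`, `tail_embedded`, and `dependency_distances` are assumptions.

```python
def centre_embedded(pairs):
    """Nest each dependent pair inside the previous one: a1 a2 ... b2 b1."""
    if not pairs:
        return []
    (opener, closer), rest = pairs[0], pairs[1:]
    return [opener] + centre_embedded(rest) + [closer]


def tail_embedded(pairs):
    """Complete each dependent pair before recursing: a1 b1 a2 b2 ..."""
    if not pairs:
        return []
    (opener, closer), rest = pairs[0], pairs[1:]
    return [opener, closer] + tail_embedded(rest)


def dependency_distances(sequence, pairs):
    """Distance in positions between the two members of each pair."""
    return [sequence.index(closer) - sequence.index(opener) for opener, closer in pairs]


pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
centre = centre_embedded(pairs)
tail = tail_embedded(pairs)
centre_dist = dependency_distances(centre, pairs)
tail_dist = dependency_distances(tail, pairs)
```

In the centre-embedded string the outermost pair is separated by every inner pair, so dependency length grows with depth, whereas in the tail-embedded string every pair is adjacent. This is consistent with the reported advantage for tail-embedding structures.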