Holistic corpus-based dialectology
This paper is concerned with sketching future directions for corpus-based dialectology. We advocate a holistic approach to the study of geographically conditioned linguistic variability, and we present a suitable methodology, 'corpus-based dialectometry', in exactly this spirit. Specifically, we argue that in order to live up to the potential of the corpus-based method, practitioners need to (i) abandon their exclusive focus on individual linguistic features in favor of the study of feature aggregates, (ii) draw on computationally advanced multivariate analysis techniques (such as multidimensional scaling, cluster analysis, and principal component analysis), and (iii) aid interpretation of empirical results by marshalling state-of-the-art data visualization techniques. To exemplify this line of analysis, we present a case study which explores joint frequency variability of 57 morphosyntax features in 34 dialects across Great Britain.
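The aggregate-then-project workflow the abstract advocates can be sketched in a few lines. The snippet below uses random toy data (a small site-by-feature frequency matrix standing in for the 57-feature, 34-dialect British data) and classical multidimensional scaling implemented from scratch; none of it reproduces the authors' actual pipeline or data.

```python
import numpy as np

# Hypothetical aggregate: rows = dialect sites, columns = normalized
# frequencies of morphosyntactic features (toy stand-in only).
rng = np.random.default_rng(0)
freq = rng.random((6, 10))  # 6 toy dialects, 10 toy features

# Pairwise Euclidean distances between dialects in feature space.
diff = freq[:, None, :] - freq[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(d, k=2):
    """Project a distance matrix into k dimensions (Torgerson scaling)."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    b = -0.5 * j @ (d ** 2) @ j           # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]    # keep the largest eigenvalues
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

coords = classical_mds(dist)
print(coords.shape)  # each dialect gets a 2-D "dialectometric" coordinate
```

In a real dialectometric study the 2-D coordinates would then be plotted on a map or fed to a cluster analysis, per points (ii) and (iii) of the abstract.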
Joint perceptual decision-making: a case study in explanatory pluralism.
Traditionally, different approaches to the study of cognition have been viewed as competing explanatory frameworks. An alternative view, explanatory pluralism, regards different approaches to the study of cognition as complementary ways of studying the same phenomenon, at specific temporal and spatial scales, using appropriate methodological tools. Explanatory pluralism has often been described abstractly, but has rarely been applied to concrete cases. We present a case study of explanatory pluralism. We discuss three separate ways of studying the same phenomenon: a perceptual decision-making task (Bahrami et al., 2010), where pairs of subjects share information to jointly individuate an oddball stimulus among a set of distractors. Each approach analyzed the same corpus but targeted different units of analysis at different levels of description: decision-making at the behavioral level, confidence sharing at the linguistic level, and acoustic energy at the physical level. We discuss the utility of explanatory pluralism for describing this complex, multiscale phenomenon, show ways in which this case study sheds new light on the concept of pluralism, and highlight good practices to critically assess and complement approaches.
QuesNet: A Unified Representation for Heterogeneous Test Questions
Understanding learning materials (e.g. test questions) is a crucial issue in
online learning systems, which can promote many applications in education
domain. Unfortunately, many supervised approaches suffer from the problem of
scarce human labeled data, whereas abundant unlabeled resources are highly
underutilized. To alleviate this problem, an effective solution is to use
pre-trained representations for question understanding. However, existing
pre-training methods in NLP area are infeasible to learn test question
representations due to several domain-specific characteristics in education.
First, questions usually comprise heterogeneous data including content text,
images, and side information. Second, questions carry both basic linguistic
information and domain-specific logic and knowledge. To address these
challenges, in this paper,
we propose a novel pre-training method, namely QuesNet, for comprehensively
learning question representations. Specifically, we first design a unified
framework to aggregate question information with its heterogeneous inputs into
a comprehensive vector. Then we propose a two-level hierarchical pre-training
algorithm to learn better understanding of test questions in an unsupervised
way. Here, a novel holed language model objective is developed to extract
low-level linguistic features, and a domain-oriented objective is proposed to
learn high-level logic and knowledge. Moreover, we show that QuesNet can be
effectively fine-tuned for many question-based tasks. We conduct extensive
experiments on large-scale real-world question data, and the results clearly
demonstrate the effectiveness of QuesNet for question understanding as well
as its superior applicability.
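The abstract does not spell out the holed language model in detail, but the general idea it names, predicting each token from both its left and right context, can be illustrated by how the training pairs are constructed. The function and token names below are illustrative assumptions, not QuesNet's actual implementation.

```python
def holed_lm_pairs(tokens, hole="<hole>"):
    """For each position, hide the token and keep the full bidirectional
    context as model input; the hidden token is the prediction target."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[:i] + [hole] + tokens[i + 1:]
        pairs.append((context, target))
    return pairs

question = ["what", "is", "2", "+", "2"]
pairs = holed_lm_pairs(question)
print(pairs[2])  # (['what', 'is', '<hole>', '+', '2'], '2')
```

Unlike a left-to-right language model, every target here is conditioned on context from both sides, which is what lets such an objective extract low-level linguistic features from unlabeled questions.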
Using Fuzzy Linguistic Representations to Provide Explanatory Semantics for Data Warehouses
A data warehouse integrates large amounts of extracted and summarized data from multiple sources for direct querying and analysis. While it provides decision makers with easy access to such historical and aggregate data, the real meaning of the data has been ignored. For example, "whether a total sales amount of 1,000 items indicates a good or bad sales performance" is still unclear. From the decision makers' point of view, the semantics that conveys the meaning of the data, rather than the raw numbers, is what matters. In this paper, we explore the use of fuzzy technology to provide this semantics for the summarizations and aggregates developed in data warehousing systems. A three-layered data warehouse semantic model, consisting of quantitative (numerical) summarization, qualitative (categorical) summarization, and quantifier summarization, is proposed for capturing and explicating the semantics of warehoused data. Based on the model, several algebraic operators are defined. We also extend the SQL language to allow for flexible queries against such enhanced data warehouses.
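The qualitative and quantifier layers the abstract describes can be sketched with standard fuzzy-set machinery: a membership function turns a raw aggregate into a degree of "good", and a relative quantifier turns a set of such degrees into the truth of a summary like "most monthly sales are good". All thresholds and labels below are hypothetical, not taken from the paper's model.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises over [a, b], flat over [b, c],
    falls over [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical linguistic label: "good" monthly sales volume.
def good_sales(amount):
    return trapezoid(amount, 500, 900, 2000, 2500)

# Relative quantifier "most", applied to the mean membership degree.
def truth_of_most(memberships):
    proportion = sum(memberships) / len(memberships)
    return trapezoid(proportion, 0.3, 0.75, 1.0, 1.01)

monthly_sales = [1100, 950, 1300, 700, 1800]
memberships = [good_sales(s) for s in monthly_sales]
print(truth_of_most(memberships))  # → 1.0
```

A quantifier-summarization query over the warehouse would then return the linguistic statement together with this truth degree instead of the bare total.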
On the universal structure of human lexical semantics
How universal is human conceptual structure? The way concepts are organized
in the human brain may reflect distinct features of cultural, historical, and
environmental background in addition to properties universal to human
cognition. Semantics, or meaning expressed through language, provides direct
access to the underlying conceptual structure, but meaning is notoriously
difficult to measure, let alone parameterize. Here we provide an empirical
measure of semantic proximity between concepts using cross-linguistic
dictionaries. Across languages carefully selected from a phylogenetically and
geographically stratified sample of genera, translations of words reveal cases
where a particular language uses a single polysemous word to express concepts
represented by distinct words in another. We use the frequency of polysemies
linking two concepts as a measure of their semantic proximity, and represent
the pattern of such linkages by a weighted network. This network is highly
uneven and fragmented: certain concepts are far more prone to polysemy than
others, and there emerge naturally interpretable clusters loosely connected to
each other. Statistical analysis shows such structural properties are
consistent across different language groups, largely independent of geography,
environment, and literacy. It is therefore possible to conclude that the conceptual
structure connecting the basic vocabulary studied is primarily due to universal
features of human cognition and language use.
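The weighted network the abstract describes is straightforward to construct: for every language, each polysemous word contributes one count to the edge between every pair of concepts it covers. The toy lexicons below are invented stand-ins for the study's cross-linguistic dictionary data.

```python
from collections import defaultdict

# Toy cross-linguistic data: each language maps a word form to the set
# of basic-vocabulary concepts it expresses (all forms hypothetical).
lexicons = {
    "lang_a": {"mar": {"SEA", "LAKE"}, "sol": {"SUN"}},
    "lang_b": {"wai": {"SEA", "LAKE", "WATER"}, "ra": {"SUN", "DAY"}},
    "lang_c": {"umi": {"SEA"}, "hi": {"SUN", "DAY"}},
}

# Edge weight = number of polysemous words linking the two concepts.
weights = defaultdict(int)
for words in lexicons.values():
    for concepts in words.values():
        for a in concepts:
            for b in concepts:
                if a < b:  # count each unordered pair once
                    weights[(a, b)] += 1

print(weights[("LAKE", "SEA")])  # linked in lang_a and lang_b → 2
print(weights[("DAY", "SUN")])   # linked in lang_b and lang_c → 2
```

Concepts that accumulate many heavy edges are the polysemy-prone hubs of the network; clustering this weighted graph is what surfaces the loosely connected conceptual clusters the abstract reports.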
Painting and Language: A Pictorial Syntax of Shapes
In previous articles, the author proposed that paintings can have syntactic rules. In this article he develops his proposal further and shows that shapes act as syntactic elements in the languages of painting styles. He meets Nelson Goodman's objections to his proposal by showing that shapes meet the criterion of syntactic discreteness proposed by the latter to separate linguistic from other symbolic systems. His approach is to specify style as the domain of a language of painting, to show that style is syntactical, and to argue that shapes are the primitive syntactic elements of style. His essay relates current research on the development of syntax for picture-reading machines to the question of syntax for paintings.