Using distributional similarity to organise biomedical terminology
We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.
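The idea of distributional similarity can be illustrated with a minimal sketch. The abstract does not say which measures were used, so cosine similarity is chosen here only as one common option, and the (term, syntactic context) pairs, the context labels, and the function names are all hypothetical stand-ins rather than GENIA data or Pro3Gres output.

```python
from collections import Counter
from math import sqrt


def build_profiles(observations):
    """Group (term, context) pairs into one context-count vector per term."""
    profiles = {}
    for term, context in observations:
        profiles.setdefault(term, Counter())[context] += 1
    return profiles


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical observations: each term paired with a syntactic context it occurs in.
observations = [
    ("IL-2", "obj_of:express"),
    ("IL-2", "subj_of:activate"),
    ("IL-4", "obj_of:express"),
    ("IL-4", "subj_of:activate"),
    ("T cell", "subj_of:proliferate"),
    ("T cell", "obj_of:activate"),
]

profiles = build_profiles(observations)
similar = cosine_similarity(profiles["IL-2"], profiles["IL-4"])
dissimilar = cosine_similarity(profiles["IL-2"], profiles["T cell"])
```

In this toy data the two interleukin terms share all of their contexts and so score higher than the pair that shares none; a real evaluation would compare such scores against the semantic types in a gold standard.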
Fat-tailed fluctuations in the size of organizations: the role of social influence
Organizational growth processes have consistently been shown to exhibit a
fatter-than-Gaussian growth-rate distribution in a variety of settings. Long
periods of relatively small changes are interrupted by sudden changes at all
size scales. Extreme events of this kind can have important consequences for
the development of biological and socio-economic systems. Existing models do
not derive this aggregated pattern from agent actions at the micro level. We
develop an agent-based simulation model on a social network, taking as our
starting point a model by Schwarzkopf et al. on a scale-free network. We
reproduce the fat-tailed pattern out of internal dynamics alone, and also find
that it is robust with respect to network topology. Thus, the social network
and the local interactions are a prerequisite for generating the pattern, but
not the network topology itself. We further extend the model with a parameter
that weights the relative fraction of an individual's neighbours
belonging to a given organization, representing a contextual aspect of social
influence. In the lower limit of this parameter, the fraction is irrelevant and
choice of organization is random. In the upper limit of the parameter, the
largest fraction quickly dominates, leading to a winner-takes-all situation. We
recover the real pattern as an intermediate case between these two extremes.
Comment: 15 pages, 4 figures
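The two limits of the social-influence parameter described above can be made concrete with a small sketch. The abstract does not give the functional form, so the power-law weighting below (choice probability proportional to the neighbour fraction raised to an exponent `beta`) is purely an assumption that happens to reproduce both limits; the function name and the parameter name are hypothetical.

```python
def choice_probabilities(fractions, beta):
    """Probability of choosing each organization, ASSUMING weight = fraction ** beta.

    fractions: share of an individual's neighbours in each organization.
    beta: assumed social-influence exponent (0 = fractions ignored).
    """
    weights = [fraction ** beta for fraction in fractions]
    total = sum(weights)
    if total == 0:
        # No neighbour belongs to any organization: fall back to a uniform choice.
        return [1.0 / len(fractions)] * len(fractions)
    return [weight / total for weight in weights]


neighbour_fractions = [0.5, 0.3, 0.2]

uniform = choice_probabilities(neighbour_fractions, beta=0)        # fractions irrelevant
proportional = choice_probabilities(neighbour_fractions, beta=1)   # intermediate case
dominated = choice_probabilities(neighbour_fractions, beta=50)     # largest fraction wins
```

With `beta=0` every organization is equally likely, while a large `beta` concentrates almost all probability on the organization with the largest neighbour fraction, which mirrors the random and winner-takes-all regimes the abstract describes.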
Effects of classification context on categorization in natural categories
The patterns of classification of borderline instances of eight common taxonomic categories were examined under three different instructional conditions to test two predictions: first, that lack of a specified context contributes to vagueness in categorization, and second, that altering the purpose of classification can lead to greater or lesser dependence on similarity in classification. The instructional conditions contrasted purely pragmatic with more technical/quasi-legal contexts as purposes for classification, and these were compared with a no-context control. The measures of category vagueness were between-subjects disagreement and within-subjects consistency, and the measures of similarity-based categorization were category breadth and the correlation of instance categorization probability with mean rated typicality, independently measured in a neutral context. Contrary to predictions, none of the measures of vagueness, reliability, category breadth, or correlation with typicality were generally affected by the instructional setting as a function of pragmatic versus technical purposes. Only one subcondition, in which a situational context was implied in addition to a purposive context, produced a significant change in categorization. Further experiments demonstrated that the effect of context was not increased when participants talked their way through the task, and that a technical context did not elicit more all-or-none categorization than did a pragmatic context. These findings place an important boundary condition on the effects of instructional context on conceptual categorization.
Taxonomy for Humans or Computers? Cognitive Pragmatics for Big Data
Criticism of big data has focused on showing that more is not necessarily better, in the sense that data may lose their value when taken out of context and aggregated together. The next step is to incorporate an awareness of pitfalls for aggregation into the design of data infrastructure and institutions. A common strategy minimizes aggregation errors by increasing the precision of our conventions for identifying and classifying data. As a counterpoint, we argue that there are pragmatic trade-offs between precision and ambiguity that are key to designing effective solutions for generating big data about biodiversity. We focus on the importance of theory-dependence as a source of ambiguity in taxonomic nomenclature and hence a persistent challenge for implementing a single, long-term solution to storing and accessing meaningful sets of biological specimens. We argue that ambiguity does have a positive role to play in scientific progress as a tool for efficiently symbolizing multiple aspects of taxa and mediating between conflicting hypotheses about their nature. Pursuing a deeper understanding of the trade-offs and synthesis of precision and ambiguity as virtues of scientific language and communication systems then offers a productive next step for realizing sound, big biodiversity data services
Abstraction and context in concept representation
This paper develops the notion of abstraction in the context of the psychology of concepts, and discusses its relation to context dependence in knowledge representation. Three general approaches to modelling conceptual knowledge from the domain of cognitive psychology are discussed, which serve to illustrate a theoretical dimension of increasing levels of abstraction.
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
is becoming a major application domain for IE.
Mean-field methods in evolutionary duplication-innovation-loss models for the genome-level repertoire of protein domains
We present a combined mean-field and simulation approach to different models
describing the dynamics of classes formed by elements that can appear,
disappear or copy themselves. These models, related to a paradigmatic
duplication-innovation model known as the Chinese Restaurant Process, are devised
to reproduce the scaling behavior observed in the genome-wide repertoire of
protein domains of all known species. In view of these data, we discuss the
qualitative and quantitative differences of the alternative model formulations,
focusing in particular on the roles of element loss and of the specificity of
empirical domain classes.
Comment: 10 figures, 2 tables
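For orientation, the standard one-parameter Chinese Restaurant Process mentioned above can be sketched as follows. This is only the textbook process, not the authors' models: it has no element loss and no class specificity, and the function name, the parameter names, and the chosen values are assumptions made for illustration.

```python
import random


def simulate_crp(n_elements, alpha, seed=0):
    """Simulate a standard Chinese Restaurant Process; alpha must be positive.

    Each new element founds a new class (innovation) with probability
    alpha / (n + alpha), where n is the number of elements already present,
    and otherwise joins an existing class with probability proportional to
    that class's size (duplication).
    """
    rng = random.Random(seed)
    class_sizes = []
    for n in range(n_elements):
        if rng.random() < alpha / (n + alpha):
            class_sizes.append(1)  # innovation: a new class with one element
        else:
            # duplication: pick an existing class, weighted by its current size
            index = rng.choices(range(len(class_sizes)), weights=class_sizes)[0]
            class_sizes[index] += 1
    return class_sizes


sizes = simulate_crp(n_elements=1000, alpha=5.0, seed=42)
number_of_classes = len(sizes)
largest_class = max(sizes)
```

Because larger classes attract new members more often, repeated runs tend to produce a few large classes and many small ones; how the number of classes scales with the total number of elements is the kind of behaviour the abstract compares against protein domain data.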