A Formal Framework for Linguistic Annotation
"Linguistic annotation" covers any descriptive or analytic notations applied
to raw language data. The basic data may be in the form of time functions
(audio, video and/or physiological recordings), or it may be textual. The added
notations may include transcriptions of all sorts (from phonetic features to
discourse structures), part-of-speech and sense tagging, syntactic analysis,
"named entity" identification, co-reference annotation, and so on. While there
are several ongoing efforts to provide formats and tools for such annotations
and to publish annotated linguistic databases, the lack of widely accepted
standards is becoming a critical problem. Proposed standards, to the extent
they exist, have focussed on file formats. This paper focuses instead on the
logical structure of linguistic annotations. We survey a wide variety of
existing annotation formats and demonstrate a common conceptual core, the
annotation graph. This provides a formal framework for constructing,
maintaining and searching linguistic annotations, while remaining consistent
with many alternative data structures and file formats. Comment: 49 pages
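As a rough illustration of the core idea, the sketch below models an annotation graph as nodes that may carry a time offset, plus labeled arcs between nodes. The abstract does not give the formal definition, so the class names (`Node`, `Arc`, `AnnotationGraph`), the `layer`/`label` fields, and the time-overlap query are hypothetical choices for illustration, not the paper's specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Node:
    id: str
    offset: Optional[float] = None  # time anchor in seconds; None means unanchored


@dataclass(frozen=True)
class Arc:
    src: str
    dst: str
    layer: str  # kind of annotation, e.g. "word" or "phone" (hypothetical labels)
    label: str  # the annotation content itself


@dataclass
class AnnotationGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, offset: Optional[float] = None) -> None:
        self.nodes[node_id] = Node(node_id, offset)

    def add_arc(self, src: str, dst: str, layer: str, label: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must already be nodes")
        start, end = self.nodes[src].offset, self.nodes[dst].offset
        # Arcs must not run backwards in time when both ends are anchored.
        if start is not None and end is not None and start > end:
            raise ValueError("arc would run backwards in time")
        self.arcs.append(Arc(src, dst, layer, label))

    def overlapping(self, start: float, end: float, layer: Optional[str] = None) -> List[Arc]:
        """Return arcs whose anchored time span intersects [start, end]."""
        hits = []
        for arc in self.arcs:
            a, b = self.nodes[arc.src].offset, self.nodes[arc.dst].offset
            if a is None or b is None:
                continue  # unanchored arcs have no time span to compare
            if a <= end and b >= start and (layer is None or arc.layer == layer):
                hits.append(arc)
        return hits
```

For instance, with nodes at 0.0, 0.35 and 0.80 s, "word" arcs labeled "hello" and "world" between consecutive nodes, and an "utterance" arc spanning all three, `overlapping(0.5, 0.6, "word")` returns only the "world" arc. Because different layers share the same nodes, tiers such as words and utterances stay aligned without any file-format-specific machinery, which is the kind of format independence the abstract describes.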
Extensions of SNOMED taxonomy abstraction networks supporting auditing and complexity analysis
The Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) has been widely used as a standard terminology in various biomedical domains. Improving the quality of SNOMED in turn improves the medical systems that rely on it.
In previous work, the Structural Analysis of Biomedical Ontologies Center (SABOC) team defined the partial-area taxonomy, a hierarchical abstraction network consisting of units called partial-areas. Each partial-area comprises a set of SNOMED concepts that exhibit a particular relationship structure and are distinguished by a unique root concept. This dissertation considers several extensions and applications of the taxonomy framework. Some concepts appear in multiple partial-areas; these have been designated as complex because they constitute a tangled portion of a hierarchy and can be obstacles to users trying to understand the hierarchy’s content. A methodology is presented for partitioning the entire collection of these so-called overlapping complex concepts into singly-rooted groups, and a novel auditing methodology based on an enhanced abstraction network is described.
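The first step of isolating overlapping concepts can be pictured as follows: invert the partial-area memberships and keep only the concepts that fall under more than one root, grouped by the exact combination of partial-areas they share. This is a hypothetical simplification; the function name and input format are assumptions, and the further division of each group into singly-rooted groups is not shown.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def overlapping_groups(partial_areas: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Group overlapping concepts by the set of partial-areas they belong to.

    partial_areas maps each partial-area root to the concepts it contains
    (hypothetical input format). Concepts in only one partial-area are dropped.
    """
    membership: Dict[str, Set[str]] = defaultdict(set)
    for root, concepts in partial_areas.items():
        for concept in concepts:
            membership[concept].add(root)

    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for concept, roots in membership.items():
        if len(roots) > 1:  # appears under more than one root: an overlapping concept
            groups[frozenset(roots)].add(concept)
    return dict(groups)
```

Grouping by the full combination of roots, rather than by any single root, keeps each group semantically uniform: every concept in a group sits in the same tangle of the hierarchy.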
In addition, the existing abstraction network relies heavily on the structure of the concepts' outgoing relationships. However, some SNOMED hierarchies (or subhierarchies) serve only as targets of relationships, with few or no outgoing relationships of their own, which limits the applicability of the abstraction network. To address this problem, a variation of the abstraction network, called the converse abstraction network (CAN), is defined and derived automatically from a given SNOMED hierarchy. An auditing methodology based on the CAN is formulated.
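A converse network starts from incoming rather than outgoing relationships. Assuming that, as in the original areas, concepts are grouped by a shared set of relationship types, a minimal sketch groups the concepts of a target hierarchy by the types of relationships that point into them. The triple format, the function name, and the relationship names in the test are hypothetical, and the actual CAN definition may differ in detail.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def converse_areas(
    hierarchy_concepts: Set[str],
    relationships: Iterable[Tuple[str, str, str]],
) -> Dict[FrozenSet[str], Set[str]]:
    """Group target-hierarchy concepts by the set of relationship types pointing INTO them.

    relationships are (source, relationship_type, target) triples; sources may
    lie outside the hierarchy, since only the target side matters here.
    """
    incoming: Dict[str, Set[str]] = {c: set() for c in hierarchy_concepts}
    for _source, rel_type, target in relationships:
        if target in incoming:
            incoming[target].add(rel_type)

    areas: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for concept, rel_types in incoming.items():
        areas[frozenset(rel_types)].add(concept)
    return dict(areas)
```

Concepts that no relationship targets end up together under the empty set, which mirrors how a purely outgoing view would lump them all into one undifferentiated group.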
Furthermore, a preliminary study of the complementary use of the abstraction network in description logic (DL) for quality assurance purposes pertaining to SNOMED is presented.
Two complexity measures, a structural complexity measure and a hierarchical complexity measure, both based on the abstraction network, are introduced to quantify the complexity of a SNOMED hierarchy. An extension of the two measures is also used to track the complexity of successive versions of SNOMED hierarchies before and after a sequence of auditing processes.
Developmental Bootstrapping of AIs
Although some current AIs surpass human abilities in closed artificial worlds
such as board games, their abilities in the real world are limited. They make
strange mistakes and do not notice them. They cannot be instructed easily, fail
to use common sense, and lack curiosity. They do not make good collaborators.
Mainstream approaches for creating AIs are the traditional manually-constructed
symbolic AI approach and generative and deep learning AI approaches including
large language models (LLMs). These systems are not well suited for creating
robust and trustworthy AIs. Although it is outside of the mainstream, the
developmental bootstrapping approach has more potential. In developmental
bootstrapping, AIs develop competences like human children do. They start with
innate competences. They interact with the environment and learn from their
interactions. They incrementally extend their innate competences with
self-developed competences. They interact and learn from people and establish
perceptual, cognitive, and common grounding. They acquire the competences they
need through bootstrapping. However, developmental robotics has not yet
produced AIs with robust adult-level competences. Projects have typically
stopped at the Toddler Barrier, corresponding to human infant development at
about two years of age, before speech becomes fluent. They also do not cross
the Reading Barrier, which requires drawing skillfully and skeptically on the
socially developed information resources that power current LLMs. The next competences
in human cognitive development involve intrinsic motivation, imitation
learning, imagination, coordination, and communication. This position paper
lays out the logic, prospects, gaps, and challenges for extending the practice
of developmental bootstrapping to acquire further competences and create
robust, resilient, and human-compatible AIs. Comment: 102 pages, 29 figures
From Simple Associations to Systemic Reasoning: A Connectionist Representation of Rules, Variables and Dynamic Bindings
Human agents draw a variety of inferences effortlessly, spontaneously, and with remarkable efficiency, as though these inferences were a reflex response of their cognitive apparatus. The work presented in this paper is a step toward a computational account of this reasoning ability. We describe how a connectionist system made up of simple and slow neuron-like elements can encode millions of facts and rules involving n-ary predicates and variables, and yet perform a variety of inferences within hundreds of milliseconds. We observe that an efficient reasoning system must dynamically represent and propagate a large number of variable bindings. The proposed system does so by propagating rhythmic patterns of activity in which dynamic bindings are represented as the in-phase, i.e., synchronous, firing of appropriate nodes. The mechanisms for representing and propagating dynamic bindings are biologically plausible. Neurophysiological evidence suggests that similar mechanisms may in fact be used by the brain to represent and process sensorimotor information.
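To make "bindings as synchronous firing" concrete, here is a deliberately discrete toy: each active entity owns one phase slot in an oscillation cycle, a role is bound to an entity by firing in that entity's slot, and a rule propagates bindings simply by copying phases along role-to-role links. This is not the paper's network, which uses continuous rhythmic activity of neuron-like nodes; the class, the fixed number of slots, and the "give implies own" rule used in the test are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class PhaseNetwork:
    num_phases: int  # distinct phase slots per oscillation cycle (toy assumption)
    entity_phase: Dict[str, int] = field(default_factory=dict, init=False)
    role_phase: Dict[Tuple[str, str], int] = field(default_factory=dict, init=False)

    def activate_entity(self, entity: str) -> int:
        """Give an entity its own phase slot; its node fires once per cycle in that slot."""
        if entity not in self.entity_phase:
            if len(self.entity_phase) >= self.num_phases:
                raise RuntimeError("no free phase slot for another entity")
            self.entity_phase[entity] = len(self.entity_phase)
        return self.entity_phase[entity]

    def assert_fact(self, predicate: str, bindings: Dict[str, str]) -> None:
        # A role is bound to an entity by making the role node fire in the entity's phase.
        for role, entity in bindings.items():
            self.role_phase[(predicate, role)] = self.activate_entity(entity)

    def apply_rule(self, antecedent: str, consequent: str, role_map: Dict[str, str]) -> None:
        # role_map: consequent role -> antecedent role. Copying phases along these
        # links propagates the bindings without any explicit variable matching.
        for c_role, a_role in role_map.items():
            phase = self.role_phase.get((antecedent, a_role))
            if phase is not None:
                self.role_phase[(consequent, c_role)] = phase

    def binding(self, predicate: str, role: str) -> Optional[str]:
        """Read a binding back by finding the entity that fires in the same phase."""
        phase = self.role_phase.get((predicate, role))
        for entity, p in self.entity_phase.items():
            if p == phase:
                return entity
        return None
```

After asserting give(giver=John, recipient=Mary, object=Book1) and applying a rule that maps own.owner to give.recipient and own.thing to give.object, `binding("own", "owner")` reads back "Mary". The cost of propagation is one copy per link, independent of how many facts are stored, which is the intuition behind inference in a bounded number of cycles.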
Structural analysis and auditing of SNOMED hierarchies using abstraction networks
SNOMED is one of the leading healthcare terminologies being used worldwide. Due to its sheer volume and continuing expansion, it is inevitable that errors will make their way into SNOMED. Thus, quality assurance is an important part of its maintenance cycle.
A structural approach is presented in this dissertation, aiming to develop automated techniques that help auditors discover terminology errors more effectively and efficiently. Large SNOMED hierarchies are partitioned, based primarily on their relationship patterns, into concept groups of more manageable size. Three related abstraction networks of a SNOMED hierarchy, namely the area taxonomy, partial-area taxonomy, and disjoint partial-area taxonomy, are derived programmatically from the partitions. Together they provide high-level abstract views of the underlying hierarchy, each at a different granularity. The area taxonomy gives a global structural view of a SNOMED hierarchy, while the partial-area taxonomy focuses more on the semantic uniformity and hierarchical proximity of concepts. The disjoint partial-area taxonomy is devised as an enhancement of the partial-area taxonomy and is based on partitioning the entire collection of so-called overlapping concepts into singly-rooted groups.
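The partitioning can be sketched under two assumptions that the abstract does not spell out: an area is the set of concepts sharing exactly the same set of outgoing relationship types, and a partial-area is a root (a concept with no IS-A parent in its own area) together with its descendants inside that area. The data layout and function name are hypothetical. The walk stays inside the area, which presumes relationship types are inherited so that descendants never drop back into an ancestor's area.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set


def derive_partial_area_taxonomy(
    parents: Dict[str, Set[str]],
    rel_types: Dict[str, Set[str]],
) -> Dict[FrozenSet[str], Dict[str, Set[str]]]:
    """Return {area signature: {partial-area root: concepts in that partial-area}}.

    parents maps every concept to its IS-A parents (empty set for the top);
    rel_types maps a concept to its outgoing relationship types.
    """
    # Areas: concepts with identical sets of outgoing relationship types.
    areas: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for concept in parents:
        areas[frozenset(rel_types.get(concept, ()))].add(concept)

    children: Dict[str, Set[str]] = defaultdict(set)
    for concept, ps in parents.items():
        for p in ps:
            children[p].add(concept)

    taxonomy: Dict[FrozenSet[str], Dict[str, Set[str]]] = {}
    for signature, members in areas.items():
        # Roots: members none of whose parents lie in the same area.
        roots = [c for c in members if not (parents[c] & members)]
        partial_areas: Dict[str, Set[str]] = {}
        for root in roots:
            seen, stack = {root}, [root]
            while stack:  # depth-first walk over descendants inside this area
                for child in children[stack.pop()]:
                    if child in members and child not in seen:
                        seen.add(child)
                        stack.append(child)
            partial_areas[root] = seen
        taxonomy[signature] = partial_areas
    return taxonomy
```

A concept reached from two roots lands in two partial-areas; those are exactly the overlapping concepts that the disjoint partial-area taxonomy then separates out.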
The taxonomies are exploited as the basis for a number of systematic auditing regimens, with a theme that complex concepts are more error-prone and require special attention in auditing activities. In general, group-based auditing is promoted to achieve a more efficient review within semantically uniform groups. Certain concept groups in the different taxonomies are deemed “complex” according to various criteria and thus deserve focused auditing. Examples of these include strict inheritance regions in the partial-area taxonomy and overlapping partial-areas in the disjoint partial-area taxonomy.
Multiple hypotheses are formulated to characterize the error distributions and ratios with respect to the different concept groups presented by the taxonomies, and thus to further establish the taxonomies' efficacy as vehicles for auditing. The methodologies are demonstrated using SNOMED’s Specimen hierarchy as the test bed. Auditing results are reported and analyzed to assess the hypotheses. Tests using the double bootstrap and Fisher’s exact test (two-tailed) confirm the aforementioned hypotheses. Auditing various complex concept groups based on the taxonomies is shown to yield a statistically significantly higher proportion of errors.
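For readers unfamiliar with the two-tailed Fisher's exact test, a self-contained version is short. In an auditing comparison, the 2×2 table could hold erroneous versus correct concepts for a complex group versus the remaining concepts; that table layout is an assumption, since the abstract does not describe it, and the double bootstrap is not shown. In practice `scipy.stats.fisher_exact` computes the same two-sided p-value.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance keeps floating-point ties from being dropped.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

As a check, the classic tea-tasting table [[3, 1], [1, 3]] gives 34/70 ≈ 0.486. With integer binomial coefficients the probabilities are exact up to the final divisions, so the test stays accurate even for the small cell counts typical of error tallies.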
Processes on Paper: Writing Procedures as Non-Material Research Devices
The paper focuses on the instrumentality of writing in the context of scientific research. It is suggested that the tool-character of writing is related to specific writing procedures, such as the list. These procedures can vary in their degree of complexity and often follow rules that are not codified. In any case, writing procedures can be characterized as non-material devices of “concretion.” Two examples from the notebooks of the physicist and philosopher of science Ernst Mach (1838-1916) will help to develop the notion of writing procedures. Typical of Mach's use of his notebooks, they highlight the effects of writing in the context of reasoning and reflecting.