Infectious Disease Ontology
Technological developments have resulted in tremendous increases in the volume and diversity of the data and information that must be processed in the course of biomedical and clinical research and practice. Researchers are at the same time under ever greater pressure to share data and to take steps to ensure that data resources are interoperable. The use of ontologies to annotate data has proven successful in supporting these goals and in providing new possibilities for the automated processing of data and information. In this chapter, we describe different types of vocabulary resources and emphasize those features of formal ontologies that make them most useful for computational applications. We describe current uses of ontologies and discuss future goals for ontology-based computing, focusing on its use in the field of infectious diseases. We review the largest and most widely used vocabulary resources relevant to the study of infectious diseases and conclude with a description of the Infectious Disease Ontology (IDO) suite of interoperable ontology modules that together cover the entire infectious disease domain.
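In practice, annotating data with ontology terms means attaching stable term identifiers to free-text field values so that independently produced datasets can be queried uniformly. A minimal sketch in Python (the CURIE-style identifiers below are hypothetical placeholders, not actual IDO terms):

```python
# Minimal sketch of ontology-based annotation. The "PREFIX:ID" CURIEs
# below follow the OBO pattern but are hypothetical, not real IDO entries.
ANNOTATIONS = {
    # free-text field value -> ontology term CURIE
    "fever": "SYMPTOM:0000001",
    "influenza": "DISEASE:0000002",
}

def annotate(record):
    """Attach ontology term IDs to a record's free-text field values."""
    return {
        field: {"text": value, "term": ANNOTATIONS.get(value)}
        for field, value in record.items()
    }

rec = annotate({"symptom": "fever", "diagnosis": "influenza"})
```

Because every dataset annotated this way refers to the same term IDs rather than to local spellings, records from different sources can be joined or filtered on the `term` field directly.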
The algebra of lexical semantics
Abstract. The current generative theory of the lexicon relies primarily on tools from formal language theory and mathematical logic. Here we describe how a different formal apparatus, taken from algebra and automata theory, resolves many of the known problems with the generative lexicon. We develop a finite state theory of word meaning based on machines in the sense of Eilenberg [11], a formalism capable of describing discrepancies between syntactic type (lexical category) and semantic type (number of arguments). This mechanism is compared both to the standard linguistic approaches and to the formalisms developed in AI/KR. 1 Problem Statement. In developing a formal theory of lexicography our starting point will be the informal practice of lexicography, rather than the more immediately related formal theories of Artificial Intelligence (AI) and Knowledge Representation (KR). Lexicography is a relatively mature field, with centuries of work experience
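The abstract's central distinction, that a word's syntactic category and its semantic arity can diverge, can be made concrete with a toy finite-state sketch (this is an illustration of the idea, not the paper's Eilenberg-machine formalism; all names below are assumptions):

```python
# Toy sketch: a lexical entry pairs a syntactic category with a separate
# semantic arity, and a finite-state acceptor runs over category strings.
LEXICON = {
    # word: (syntactic category, number of semantic arguments)
    "rain":  ("V", 0),   # "it rains": needs a syntactic subject but
                         # takes zero semantic arguments
    "sleep": ("V", 1),
    "give":  ("V", 3),
    "dog":   ("N", 0),
}

def category(word):
    return LEXICON[word][0]

def arity(word):
    return LEXICON[word][1]

# A finite-state acceptor over category sequences: q0 --N--> q1 --V--> q2
# accepts a minimal intransitive clause.
TRANSITIONS = {("q0", "N"): "q1", ("q1", "V"): "q2"}
ACCEPTING = {"q2"}

def accepts(words):
    """Accept a word sequence iff its category string reaches a final state."""
    state = "q0"
    for w in words:
        state = TRANSITIONS.get((state, category(w)))
        if state is None:
            return False
    return state in ACCEPTING
```

The point of the example is that `category` and `arity` are independent columns of the lexical entry: "rain" is a verb syntactically while being a zero-place predicate semantically, exactly the kind of mismatch the abstract says the machine formalism is designed to describe.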
Natural language and the genetic code: from the semiotic analogy to biolinguistics
[Abstract] With the discovery of the DNA structure (Watson and Crick, 1953), the idea of DNA as a linguistic code arose (Monod, 1970). Many researchers have considered DNA a language, pointing out the semiotic parallelism between the genetic code and natural language. This idea has been discussed, almost dismissed, and somehow accepted. This paper does not claim that the genetic code is a linguistic structure, but it highlights several important semiotic analogies between DNA and verbal language. The genetic code and natural language share a number of units, structures and operations. The syntactic and semantic parallelisms between these codes should lead to a methodological exchange between biology, linguistics and semiotics. During the 20th century, biology became a pilot science, so that many disciplines have formulated their theories under models taken from biology. Computer science has become an almost bio-inspired field thanks to the great development of natural computing and DNA computing. Biology and semiotics are two different sciences challenged by the same common goal of deciphering the codes of nature. Linguistics could become another «bio-inspired» science by taking advantage of the structural and «semantic» similarities between the genetic code and natural language. Bio-inspired methods coming from computer science can be very useful in the field of linguistics, since they provide flexible and intuitive tools for describing natural languages. In this way, we obtain a theoretical framework where biology, linguistics and computer science exchange methods and interact, thanks to the semiotic parallelism between the genetic code and natural language. The influence of the semiotics of the genetic code on linguistics parallels the need to achieve an implementable formal description of natural language.
In this paper we present an overview of different bio-inspired methods from theoretical computer science that in recent years have been successfully applied to several linguistic issues, from syntax to pragmatics.
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus. Comment: 29 pages, 5 figures, research proposal
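One standard building block from the statistical language learning literature the abstract invokes is scoring co-occurring word pairs by pointwise mutual information (PMI), with high-PMI pairs serving as candidate dependency links. A minimal sketch under that assumption (this is a generic illustration, not the authors' actual pipeline):

```python
# Score co-occurring word pairs by pointwise mutual information (PMI):
# PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ).
import math
from collections import Counter
from itertools import combinations

def pmi_pairs(sentences):
    """Return a PMI score for each unordered word pair that co-occurs
    within a sentence."""
    word_counts = Counter()
    pair_counts = Counter()
    n_pairs = 0
    for sent in sentences:
        word_counts.update(sent)
        for a, b in combinations(sent, 2):
            if a == b:
                continue  # skip degenerate pairs of a word with itself
            pair_counts[frozenset((a, b))] += 1
            n_pairs += 1
    n_words = sum(word_counts.values())
    scores = {}
    for pair, c in pair_counts.items():
        a, b = tuple(pair)
        p_pair = c / n_pairs
        p_a = word_counts[a] / n_words
        p_b = word_counts[b] / n_words
        scores[pair] = math.log(p_pair / (p_a * p_b))
    return scores

# Informative pairs ("dog"/"barks") outscore frequent-but-uninformative
# ones ("the"/"dog"), which is what makes PMI useful for link induction.
scores = pmi_pairs([["the", "dog", "barks"], ["the", "cat", "meows"]])
```

In a full grammar-induction pipeline such scores would be computed over a large corpus and then fed to a linking or clustering step; the sketch shows only the counting-and-scoring core.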
Language and scientific explanation: Where does semantics fit in?
This book discusses the two main construals of the explanatory goals of semantic theories. The first, externalist conception, understands semantic theories in terms of a hermeneutic and interpretive explanatory project. The second, internalist conception, understands semantic theories in terms of the psychological mechanisms in virtue of which meanings are generated. It is argued that a fruitful scientific explanation is one that aims to uncover the underlying mechanisms in virtue of which the observable phenomena are made possible, and that a scientific semantics should be doing just that. If this is the case, then a scientific semantics is unlikely to be externalist, for reasons having to do with the subject matter and form of externalist theories. It is argued that semantics construed hermeneutically is nevertheless a valuable explanatory project.
ImmunoLingo: Linguistics-based formalization of the antibody language
Apparent parallels between natural language and biological sequences have led to a recent surge in the application of deep language models (LMs) to the analysis of antibody and other biological sequences. However, the lack of a rigorous linguistic formalization of biological sequence languages, which would define basic components such as the lexicon (i.e., the discrete units of the language) and the grammar (i.e., the rules that link sequence well-formedness, structure, and meaning), has led to largely domain-unspecific applications of LMs that do not take into account the underlying structure of the biological sequences studied. A linguistic formalization, on the other hand, establishes linguistically informed and thus domain-adapted components for LM applications. It would facilitate a better understanding of how differences and similarities between natural language and biological sequences influence the quality of LMs, which is crucial for the design of interpretable models with extractable sequence-function relationship rules, such as the ones underlying the antibody specificity prediction problem. Deciphering the rules of antibody specificity is crucial to accelerating rational and in silico biotherapeutic drug design. Here, we formalize the properties of the antibody language and thereby establish a foundation not only for the application of linguistic tools in adaptive immune receptor analysis but also for systematic immunolinguistic studies of immune receptor specificity in general. Comment: 19 pages, 3 figures
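As a toy illustration of the abstract's lexicon/grammar distinction for protein sequences (my framing, not the paper's formalization): the 20 standard one-letter amino-acid codes can play the role of the lexicon, and the most trivial possible "grammar" rule is lexical well-formedness, i.e. every residue belongs to that lexicon.

```python
# Lexicon: the 20 standard one-letter amino-acid codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def is_well_formed(seq):
    """Trivial 'grammar' rule: a non-empty sequence is lexically
    well-formed iff every residue is in the amino-acid lexicon."""
    return len(seq) > 0 and all(residue in AMINO_ACIDS for residue in seq.upper())
```

The formalization the paper argues for would go far beyond this, linking well-formedness to structure and binding specificity; the sketch only fixes the vocabulary the richer rules would range over.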