17,952 research outputs found
Automating Metadata Extraction: Genre Classification
A problem that frequently arises in the management and integration of scientific data is the lack of context and semantics that would link data encoded in disparate ways. To bridge the discrepancy, it often helps to mine scientific texts to aid the understanding of the database. Mining relevant text can be significantly aided by the availability of descriptive and semantic metadata. The Digital Curation Centre (DCC) has undertaken research to automate the extraction of metadata from documents in PDF([22]). Documents may include scientific journal papers, lab notes or even emails. We suggest genre classification as a first step toward automating metadata extraction. The classification method will be built on looking at the documents from five directions; as an object of specific visual format, a layout of strings with characteristic grammar, an object with stylo-metric signatures, an object with meaning and purpose, and an object linked to previously classified objects and external sources. Some results of experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.
Enlightened Romanticism: Mary Gartside’s colour theory in the age of Moses Harris, Goethe and George Field
The aim of this paper is to evaluate the work of Mary Gartside, a British female colour theorist, active in London between 1781 and 1808. She published three books between 1805 and 1808. In chronological and intellectual terms Gartside can cautiously be regarded an exemplary link between Moses Harris, who published a short but important theory of colour in the second half of the eighteenth century, and J.W. von Goethe’s highly influential Zur Farbenlehre, published in Germany in 1810. Gartside’s colour theory was published privately under the disguise of a traditional water colouring manual, illustrated with stunning abstract colour blots (see example above). Until well into the twentieth century, she remained the only woman known to have published a theory of colour. In contrast to Goethe and other colour theorists in the late 18th and early 19th century Gartside was less inclined to follow the anti-Newtonian attitudes of the Romantic movement
The Mirror DBMS at TREC-8
The database group at University of Twente participates in TREC8 using the Mirror DBMS, a prototype database system especially designed for multimedia and web retrieval. From a database perspective, the purpose has been to check whether we can get sufficient performance, and to prepare for the very large corpus track in which we plan to participate next year. From an IR perspective, the experiments have been designed to learn more about the effect of the global statistics on the ranking
Bilingual episodic memory: an introduction
Our current models of bilingual memory are essentially accounts of semantic memory whose goal is to explain bilingual lexical access to underlying imagistic and conceptual referents. While this research has included episodic memory, it has focused largely on recall for words, phrases, and sentences in the service of understanding the structure of semantic memory. Building on the four papers in this special issue, this article focuses on larger units of episodic memory(from quotidian events with simple narrative form to complex autobiographical memories) in service of developing a model of bilingual episodic memory. This requires integrating theory and research on how culture-specific narrative traditions inform encoding and retrieval with theory and research on the relation between(monolingual) semantic and episodic memory(Schank, 1982; Schank & Abelson, 1995; Tulving, 2002). Then, taking a cue from memory-based text processing studies in psycholinguistics(McKoon & Ratcliff, 1998), we suggest that as language forms surface in the progressive retrieval of features of an event, they trigger further forms within the same language serving to guide a within-language/ within-culture retrieval
Speech rhythm: a metaphor?
Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep ‘prominence gradient’, i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a ‘stress-timed’ language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow ‘syntagmatic contrast’ between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence alternation. We conclude that speech is not incontestibly rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and that it is this analogical process which allows speech to be matched to external rhythms
Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies
An automatic word classification system has been designed which processes
word unigram and bigram frequency statistics extracted from a corpus of natural
language utterances. The system implements a binary top-down form of word
clustering which employs an average class mutual information metric. Resulting
classifications are hierarchical, allowing variable class granularity. Words
are represented as structural tags --- unique -bit numbers the most
significant bit-patterns of which incorporate class information. Access to a
structural tag immediately provides access to all classification levels for the
corresponding word. The classification system has successfully revealed some of
the structure of English, from the phonemic to the semantic level. The system
has been compared --- directly and indirectly --- with other recent word
classification systems. Class based interpolated language models have been
constructed to exploit the extra information supplied by the classifications
and some experiments have shown that the new models improve model performance.Comment: 17 Page Paper. Self-extracting PostScript Fil
Questioning short-term memory and its measurement: why digit span measures long-term associative learning
Traditional accounts of verbal short-term memory explain differences in performance for different types of verbal material by reference to inherent characteristics of the verbal items making up memory sequences. The role of previous experience with sequences of different types is ostensibly controlled for either by deliberate exclusion or by presenting multiple trials constructed from different random permutations
A network model of interpersonal alignment in dialog
In dyadic communication, both interlocutors adapt to each other linguistically, that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors’ dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors’ lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance-wise. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutor’s dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that have not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows for classifying dialogs according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations. Keywords: alignment in communication; structural coupling; linguistic networks; graph distance measures; mutual information of graphs; quantitative network analysi
Designing Women: Essentializing Femininity in AI Linguistics
Since the eighties, feminists have considered technology a force capable of subverting sexism because of technology’s ability to produce unbiased logic. Most famously, Donna Haraway’s “A Cyborg Manifesto” posits that the cyborg has the inherent capability to transcend gender because of its removal from social construct and lack of loyalty to the natural world. But while humanoids and artificial intelligence have been imagined as inherently subversive to gender, current artificial intelligence perpetuates gender divides in labor and language as their programmers imbue them with traits considered “feminine.” A majority of 21st century AI and humanoids are programmed to fit female stereotypes as they fulfill emotional labor and perform pink-collar tasks, whether through roles as therapists, query-fillers, or companions. This paper examines four specific chat-based AI --ELIZA, XiaoIce, Sophia, and Erica-- and examines how their feminine linguistic patterns are used to maintain the illusion of emotional understanding in regards to the tasks that they perform. Overall, chat-based AI fails to subvert gender roles, as feminine AI are relegated to the realm of emotional intelligence and labor
- …