17,952 research outputs found

    Automating Metadata Extraction: Genre Classification

    Get PDF
    A problem that frequently arises in the management and integration of scientific data is the lack of context and semantics that would link data encoded in disparate ways. To bridge the discrepancy, it often helps to mine scientific texts to aid the understanding of the database. Mining relevant text can be significantly aided by the availability of descriptive and semantic metadata. The Digital Curation Centre (DCC) has undertaken research to automate the extraction of metadata from documents in PDF([22]). Documents may include scientific journal papers, lab notes or even emails. We suggest genre classification as a first step toward automating metadata extraction. The classification method will be built on looking at the documents from five directions; as an object of specific visual format, a layout of strings with characteristic grammar, an object with stylo-metric signatures, an object with meaning and purpose, and an object linked to previously classified objects and external sources. Some results of experiments in relation to the first two directions are described here; they are meant to be indicative of the promise underlying this multi-faceted approach.

    Enlightened Romanticism: Mary Gartside’s colour theory in the age of Moses Harris, Goethe and George Field

    Get PDF
    The aim of this paper is to evaluate the work of Mary Gartside, a British female colour theorist, active in London between 1781 and 1808. She published three books between 1805 and 1808. In chronological and intellectual terms Gartside can cautiously be regarded an exemplary link between Moses Harris, who published a short but important theory of colour in the second half of the eighteenth century, and J.W. von Goethe’s highly influential Zur Farbenlehre, published in Germany in 1810. Gartside’s colour theory was published privately under the disguise of a traditional water colouring manual, illustrated with stunning abstract colour blots (see example above). Until well into the twentieth century, she remained the only woman known to have published a theory of colour. In contrast to Goethe and other colour theorists in the late 18th and early 19th century Gartside was less inclined to follow the anti-Newtonian attitudes of the Romantic movement

    The Mirror DBMS at TREC-8

    Get PDF
    The database group at University of Twente participates in TREC8 using the Mirror DBMS, a prototype database system especially designed for multimedia and web retrieval. From a database perspective, the purpose has been to check whether we can get sufficient performance, and to prepare for the very large corpus track in which we plan to participate next year. From an IR perspective, the experiments have been designed to learn more about the effect of the global statistics on the ranking

    Bilingual episodic memory: an introduction

    Get PDF
    Our current models of bilingual memory are essentially accounts of semantic memory whose goal is to explain bilingual lexical access to underlying imagistic and conceptual referents. While this research has included episodic memory, it has focused largely on recall for words, phrases, and sentences in the service of understanding the structure of semantic memory. Building on the four papers in this special issue, this article focuses on larger units of episodic memory(from quotidian events with simple narrative form to complex autobiographical memories) in service of developing a model of bilingual episodic memory. This requires integrating theory and research on how culture-specific narrative traditions inform encoding and retrieval with theory and research on the relation between(monolingual) semantic and episodic memory(Schank, 1982; Schank & Abelson, 1995; Tulving, 2002). Then, taking a cue from memory-based text processing studies in psycholinguistics(McKoon & Ratcliff, 1998), we suggest that as language forms surface in the progressive retrieval of features of an event, they trigger further forms within the same language serving to guide a within-language/ within-culture retrieval

    Speech rhythm: a metaphor?

    Get PDF
    Is speech rhythmic? In the absence of evidence for a traditional view that languages strive to coordinate either syllables or stress-feet with regular time intervals, we consider the alternative that languages exhibit contrastive rhythm subsisting merely in the alternation of stronger and weaker elements. This is initially plausible, particularly for languages with a steep ‘prominence gradient’, i.e. a large disparity between stronger and weaker elements; but we point out that alternation is poorly achieved even by a ‘stress-timed’ language such as English, and, historically, languages have conspicuously failed to adopt simple phonological remedies that would ensure alternation. Languages seem more concerned to allow ‘syntagmatic contrast’ between successive units and to use durational effects to support linguistic functions than to facilitate rhythm. Furthermore, some languages (e.g. Tamil, Korean) lack the lexical prominence which would most straightforwardly underpin prominence alternation. We conclude that speech is not incontestibly rhythmic, and may even be antirhythmic. However, its linguistic structure and patterning allow the metaphorical extension of rhythm in varying degrees and in different ways depending on the language, and that it is this analogical process which allows speech to be matched to external rhythms

    Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies

    Full text link
    An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a binary top-down form of word clustering which employs an average class mutual information metric. Resulting classifications are hierarchical, allowing variable class granularity. Words are represented as structural tags --- unique nn-bit numbers the most significant bit-patterns of which incorporate class information. Access to a structural tag immediately provides access to all classification levels for the corresponding word. The classification system has successfully revealed some of the structure of English, from the phonemic to the semantic level. The system has been compared --- directly and indirectly --- with other recent word classification systems. Class based interpolated language models have been constructed to exploit the extra information supplied by the classifications and some experiments have shown that the new models improve model performance.Comment: 17 Page Paper. Self-extracting PostScript Fil

    A network model of interpersonal alignment in dialog

    Get PDF
    In dyadic communication, both interlocutors adapt to each other linguistically, that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors’ dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors’ lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance-wise. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutor’s dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that have not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows for classifying dialogs according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations. Keywords: alignment in communication; structural coupling; linguistic networks; graph distance measures; mutual information of graphs; quantitative network analysi

    Designing Women: Essentializing Femininity in AI Linguistics

    Get PDF
    Since the eighties, feminists have considered technology a force capable of subverting sexism because of technology’s ability to produce unbiased logic. Most famously, Donna Haraway’s “A Cyborg Manifesto” posits that the cyborg has the inherent capability to transcend gender because of its removal from social construct and lack of loyalty to the natural world. But while humanoids and artificial intelligence have been imagined as inherently subversive to gender, current artificial intelligence perpetuates gender divides in labor and language as their programmers imbue them with traits considered “feminine.” A majority of 21st century AI and humanoids are programmed to fit female stereotypes as they fulfill emotional labor and perform pink-collar tasks, whether through roles as therapists, query-fillers, or companions. This paper examines four specific chat-based AI --ELIZA, XiaoIce, Sophia, and Erica-- and examines how their feminine linguistic patterns are used to maintain the illusion of emotional understanding in regards to the tasks that they perform. Overall, chat-based AI fails to subvert gender roles, as feminine AI are relegated to the realm of emotional intelligence and labor
    corecore