147 research outputs found

    Description and Necessity: Towards a cognitive science of work meaning

    An investigation of the role of context in retrieval of information from semantic memory

    The research reported here examines how a person's knowledge of the world is used in language recognition and production. Essentially it is concerned with the importance of a word's meaning as a factor in its recognition by a listener or reader and in its production by a speaker or writer. This area of research overlaps with a great many areas in psychology, drawing upon research in attention, pattern recognition, memory, psycholinguistics and thought. It is necessary to give some working definitions of the terms used. The definition of semantic memory used here is that supplied by Tulving (1972, p 386): "Semantic memory is the memory necessary for the use of a language. It is a mental thesaurus, the organized knowledge a person possesses about words and other verbal symbols, their meanings and referents, about relations among them and about rules, formulas and algorithms for the manipulation of these symbols, concepts and relations." The contents of semantic memory are typically what a person would say that he "knows" rather than what he "remembers". For example, a person might say "I know canaries are yellow" whereas "I remember canaries are yellow" would not "sound right" to most native English speakers. This also illustrates an important property of semantic memory. The knowledge it contains is to a large extent common to members of a given culture. There will of course be individual differences but a sufficient body of knowledge will be shared in order to allow communication between persons. Retrieval from semantic memory is used here to refer to any process that involves making use of such stored knowledge. This may range from simply deciding that a particular sound pattern has occurred in speech before to verifying complex propositions. Context is restricted here to linguistic context. The question asked is how information provided by previous linguistic input affects processing of later input or output of language.
The view of language comprehension taken here is similar to Goodman's (1967) approach to reading. This approach is described as follows: " ... Reading is a psycholinguistic guessing game. It involves an interaction between thought and language. Efficient reading does not result from precise perceptions and identification of all elements but from skill in selecting the fewest, most productive cues necessary to produce guesses which are right first time. The ability to anticipate that which has not been seen, of course, is vital in reading, just as the ability to anticipate what has not yet been heard is vital in listening." (p 260) It is assumed here that a person's ability to anticipate is dependent upon the knowledge stored in semantic memory. The way this knowledge is used will in turn depend upon how it is organized. Since the Ancient Greeks the importance of organization in memory has been recognized but it is only relatively recently that psychologists have attempted to determine the principles underlying this organization. Since Quillian (1966) a number of models of how semantic memory is organized have been proposed. These will be discussed in the following sections. Many of the experiments reported here are concerned with what might be called "micro-context", that is, how individual words, phrases and sentences affect recognition of incoming stimuli. Of course, the use of context goes far beyond the immediately preceding input but as yet there are no satisfactory theories, linguistic or psychological, that can deal with these wider aspects of language use. In fact there is still considerable disagreement over the processes involved in the recognition of single words (see, for example, Rubenstein, Lewis and Rubenstein, 1971; Baron, 1973). The approach taken to word recognition here is similar to that of Norman (1968) and Morton (1969).
The notion central to both these authors and Goodman (see above) is that no process can be analysed in isolation. The language system cannot decode the incoming sensory information without reference to stored knowledge. As Norman (1969, p 3) describes the role of memory, "it provides the information about the past necessary for proper understanding of the present". Thus context indicates to the memory system what knowledge is relevant to the analysis of the current input. To summarize this approach, the information provided by context (immediate past) is referred to semantic memory (past), which in turn helps to produce the best guess as to the nature of the current sensory input (present) or even the nature of input which has not yet arrived (future). The problem examined in this research is how the organizational structure of knowledge in semantic memory influences this guessing process. Whether such guessing is an active process as suggested by some investigators (e.g. Liberman, Stevens and Halle) or a passive process as suggested by others (e.g. Morton, Treisman) will be discussed in a later section.

    Neural Representations of Concepts and Texts for Biomedical Information Retrieval

    Information retrieval (IR) methods are an indispensable tool in the current landscape of exponentially increasing textual data, especially on the Web. A typical IR task involves fetching and ranking a set of documents (from a large corpus) in terms of relevance to a user's query, which is often expressed as a short phrase. IR methods are the backbone of modern search engines where additional system-level aspects including fault tolerance, scale, user interfaces, and session maintenance are also addressed. In addition to fetching documents, modern search systems may also identify snippets within the documents that are potentially most relevant to the input query. Furthermore, current systems may also maintain preprocessed structured knowledge derived from textual data as so-called knowledge graphs, so certain types of queries that are posed as questions can be parsed as such; a response can be an output of one or more named entities instead of a ranked list of documents (e.g., what diseases are associated with EGFR mutations?). This refined setup is often termed question answering (QA) in the IR and natural language processing (NLP) communities. In biomedicine and healthcare, specialized corpora are often at play including research articles by scientists, clinical notes generated by healthcare professionals, consumer forums for specific conditions (e.g., cancer survivors network), and clinical trial protocols (e.g., www.clinicaltrials.gov). Biomedical IR is specialized given that the types of queries and the variations in the texts are different from those of general Web documents. For example, scientific articles are more formal with longer sentences but clinical notes tend to have less grammatical conformity and are rife with abbreviations. There is also a mismatch between the vocabulary of consumers and the lingo of domain experts and professionals.
Queries are also different and can range from simple phrases (e.g., COVID-19 symptoms) to more complex implicitly fielded queries (e.g., chemotherapy regimens for stage IV lung cancer patients with ALK mutations). Hence, developing methods for different configurations (corpus, query type, user type) needs more deliberate attention in biomedical IR. Representations of documents and queries are at the core of IR methods and retrieval methodology involves coming up with these representations and matching queries with documents based on them. Traditional IR systems follow the approach of keyword-based indexing of documents (the so-called inverted index) and matching query phrases against the document index. It is not difficult to see that this keyword-based matching ignores the semantics of texts (synonymy at the lexeme level and entailment at phrase/clause/sentence levels) and this has led to dimensionality reduction methods such as latent semantic indexing that generally have scale-related concerns; such methods also do not address similarity at the sentence level. Since the resurgence of neural network methods in NLP, the IR field has also moved to incorporate advances in neural networks into current IR methods. This dissertation presents four specific methodological efforts toward improving biomedical IR. Neural methods always begin with dense embeddings for words and concepts to overcome the limitations of one-hot encoding in traditional NLP/IR. In the first effort, we present a new neural pre-training approach to jointly learn word and concept embeddings for downstream use in applications. In the second study, we present a joint neural model for two essential subtasks of information extraction (IE): named entity recognition (NER) and entity normalization (EN). Our method detects biomedical concept phrases in texts and links them to the corresponding semantic types and entity codes.
These first two studies provide essential tools to model textual representations as compositions of both surface forms (lexical units) and high level concepts with potential downstream use in QA. In the third effort, we present a document reranking model that can help surface documents that are likely to contain answers (e.g., factoids, lists) to a question in a QA task. The model is essentially a sentence matching neural network that learns the relevance of a candidate answer sentence to the given question parametrized with a bilinear map. In the fourth effort, we present another document reranking approach that is tailored for precision medicine use-cases. It combines neural query-document matching and faceted text summarization. The main distinction of this effort from previous efforts is to pivot from a query manipulation setup to transforming candidate documents into pseudo-queries via neural text summarization. Overall, our contributions constitute nontrivial advances in biomedical IR using neural representations of concepts and texts.
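The bilinear relevance score mentioned for the third effort can be illustrated with a minimal sketch. It assumes the question and each candidate sentence have already been encoded as fixed-length vectors, and it scores a pair as q^T W a. The function names, the toy vectors and the fixed matrix W are hypothetical; the abstract does not give the dissertation's actual architecture, encoders or training procedure, and in a real model W would be learned.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def bilinear_score(question: Vector, weights: Matrix, sentence: Vector) -> float:
    """Return q^T W a for a question vector q, a bilinear map W and a sentence vector a."""
    # First compute W a, one row of W at a time.
    projected = [sum(w * a for w, a in zip(row, sentence)) for row in weights]
    # Then take the dot product with q.
    return sum(q * p for q, p in zip(question, projected))


def rerank(question: Vector, weights: Matrix, candidates: Dict[str, Vector]) -> List[str]:
    """Return candidate ids sorted from highest to lowest bilinear score."""
    return sorted(
        candidates,
        key=lambda cid: bilinear_score(question, weights, candidates[cid]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy 2-dimensional vectors; purely illustrative values.
    q = [1.0, 0.0]
    W = [[2.0, 0.0], [0.0, 1.0]]
    cands = {"s1": [0.0, 1.0], "s2": [1.0, 0.0]}
    print(rerank(q, W, cands))
```

Because W is a full matrix rather than the identity, the score can reward a question dimension matching a different sentence dimension, which a plain dot product cannot do.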

    Linguistic analysis of modality - with special reference to English and German

    Explanations in the study of child language development

    Cognitive Underpinnings of Focus on Form

    The purpose of this paper is to examine focus on form in cognitive processing terms by postulating plausible, psychologically real, cognitive correlates for a range of L2 learning processes that have become prevalent in the instructed second language acquisition (SLA) literature. Progress in adult SLA is thought often to depend crucially upon cognitive processes such as paying attention to features of target input, noticing interlocutor reactions to interlanguage output, and making insightful comparisons involving differences between input and output utterance details. To be effective, these cognitive comparisons must be carried out under certain conditions of processing meaning, forms, and function, i.e., conditions which promote processing for language learning. Whereas pedagogically oriented discussions of issues such as noticing the gap and L2 processing abound, psycholinguistically motivated rationales for pedagogical recommendations are still rare.

    Estate Tamil: a morphosyntactic study

    A Dual-Route Approach to Orthographic Processing

    In the present theoretical note we examine how different learning constraints, thought to be involved in optimizing the mapping of print to meaning during reading acquisition, might shape the nature of the orthographic code involved in skilled reading. On the one hand, optimization is hypothesized to involve selecting combinations of letters that are the most informative with respect to word identity (diagnosticity constraint), and on the other hand to involve the detection of letter combinations that correspond to pre-existing sublexical phonological and morphological representations (chunking constraint). These two constraints give rise to two different kinds of prelexical orthographic code, a coarse-grained and a fine-grained code, associated with the two routes of a dual-route architecture. Processing along the coarse-grained route optimizes fast access to semantics by using minimal subsets of letters that maximize information with respect to word identity, while coding for approximate within-word letter position independently of letter contiguity. Processing along the fine-grained route, on the other hand, is sensitive to the precise ordering of letters, as well as to position with respect to word beginnings and endings. This enables the chunking of frequently co-occurring contiguous letter combinations that form relevant units for morpho-orthographic processing (prefixes and suffixes) and for the sublexical translation of print to sound (multi-letter graphemes).
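The contrast between contiguity-independent and contiguity-sensitive letter coding can be made concrete with a small sketch. It assumes, purely for illustration, that the coarse-grained code is approximated by ordered letter pairs that may skip a few intervening letters ("open bigrams") and the fine-grained code by strictly adjacent letter pairs. The abstract does not specify either coding scheme, so the function names, the `max_gap` parameter and the Jaccard overlap measure are all hypothetical choices.

```python
from itertools import combinations
from typing import Set


def contiguous_bigrams(word: str) -> Set[str]:
    """Adjacent letter pairs: sensitive to precise letter order and contiguity."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def open_bigrams(word: str, max_gap: int = 2) -> Set[str]:
    """Ordered letter pairs with at most max_gap letters in between (assumed scheme)."""
    return {
        word[i] + word[j]
        for i, j in combinations(range(len(word)), 2)
        if j - i - 1 <= max_gap
    }


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two sets of letter pairs."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Compare a word with a letter-transposed version of itself.
    target, transposed = "judge", "jugde"
    coarse = overlap(open_bigrams(target), open_bigrams(transposed))
    fine = overlap(contiguous_bigrams(target), contiguous_bigrams(transposed))
    print(f"coarse-grained overlap: {coarse:.2f}, fine-grained overlap: {fine:.2f}")
```

Under these assumptions a transposed-letter string stays much closer to its base word in the open-bigram code than in the contiguous code, which is the kind of tolerance to approximate letter position that the coarse-grained route is described as having.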

    Syntactic and Semantic Patterns of Domain-specific Multiword Units in Marine Accident Investigation Reports

    The present study is a systematic corpus-based investigation of the domain-specific multiword units (henceforth MWUs) in marine accident investigation reports (henceforth MAIR), with a view to characterizing their most prominent syntactic, semantic and functional features. To achieve these objectives, the target MWUs were first identified by applying a new approach, which incorporates the notion of ‘meaning’ into statistically based measures. This method ensures, to the greatest extent possible, that the extracted MWUs are domain-specific and provides valid data for the subsequent analysis. Using a three-dimensional analytical framework, the study obtained the following findings. First, the domain-specific MWUs are largely composed of two-word sequences, while 4- and 5-word MWUs are relatively rare. Among all the target MWUs, only 1.10% of the expressions occur very commonly within the genre (˚1,000 times). By contrast, the majority of the expressions (70.97%) occur fewer than 100 times. The skewed distribution indicates that the MAIR genre tends to employ a wide variety of domain-specific MWUs rather than repeating a small number of common expressions. Second, in terms of the syntactic features of the domain-specific MWUs, the NP structure is the most commonly employed grammatical type. The abundant use of this structure implies that the domain-specific meaning of the MAIR genre is largely carried in the nominal group. Apart from the NP structure, there is also a marked prevalence of VP structures among the domain-specific MWUs in the MAIR genre, and these MWUs show structural variation. Of all the VP-based patterns, the ‘verb phrase with active verb’ pattern stands out since it incorporates a large number of action verbs, which are used to describe the actions done by people.
The wide use of these phrases implies that the MAIR genre tends to highlight people’s roles during the accidents, with particular attention to information about what or who caused or performed the activity. Similarly, PP structures were also frequently adopted by the domain-specific MWUs, especially the pattern beginning with the preposition of. This pattern was mostly used to specify possession. It can thus be inferred that the information provided in the MAIR genre tends to be concrete and specific. Third, a functional analysis of the target MWUs found that the primary function of the domain-specific MWUs is to express referential meanings and contribute to thematic development. Furthermore, owing to their multifunctional nature, some referential MWUs also perform stance and discourse-organizing functions. When expressing stance, most MWUs express impersonal epistemic stance, with the purpose of minimizing the imposition of the reporters’ opinions. Other word sequences appear to be deontic in nature, as they are mainly realized by MWUs incorporating require or modal verbs. The primary function of these MWUs is to set out obligations and issue suggestions for the agents according to certain norms and regulations. When functioning as discourse organizers, the domain-specific MWUs usually adopt the pattern of ‘that-clause controlled by main verbs in active voice’ to introduce topics. By contrast, when used to elaborate on topics, they tend to clarify logical relationships, especially the causative-resultative relation, rather than provide additional information. Fourth, the distinctive semantic features of the domain-specific MWUs are best reflected when these MWUs perform the functions of activity identification and specification. For instance, most domain-specific MWUs used for describing activities are of a general nature, but they convey specialized meaning in the MAIR genre.
Similarly, when domain-specific MWUs are used to provide tangible or intangible frames for specifying certain attributes, their use in the MAIR genre deviates significantly from their use in the general English register. In all, by providing insights into the salient features of the domain-specific MWUs in the MAIR genre, the present study may contribute to the following areas: the construction of an extraction method for domain-specific MWUs; the compilation of a maritime-specific MWU list; the teaching and learning of maritime English, especially maritime-specific MWUs; and the provision of a reference for writing MAIR for experts from non-native English-speaking countries.
    Contents: Abstract; List of Tables; List of Figures
    Chapter 1 Introduction: 1.1. Background of this study; 1.2. Objectives of this study; 1.3. Significance of this study; 1.4. Terminological issues; 1.5. Organization of this dissertation
    Chapter 2 Theoretical background: 2.1. Understanding the notions of phraseology (2.1.1. An overview of influential notions of phraseology; 2.1.2. Parameters of defining MWUs; 2.1.3. Operational definition of MWUs; 2.1.4. An overview of influential taxonomy of phraseology); 2.2. Theoretical discussion of MWUs (2.2.1. Theoretical framework of this study; 2.2.2. Nature of multiword units; 2.2.3. Previous studies of phraseology)
    Chapter 3 Analytical framework and research design: 3.1. Analytical framework (3.1.1. Analytical framework for syntactic features of domain-specific MWUs; 3.1.2. Analytical framework for semantic features of domain-specific MWUs; 3.1.3. Analytical framework for functional features of domain-specific MWUs); 3.2. Research questions; 3.3. Corpora used in this study (3.3.1. Corpus of Marine Accident Investigation Reports (COMAIR); 3.3.2. British National Corpus Baby (BNC Baby)); 3.4. Tools and procedures for data analysis (3.4.1. Tools for data processing; 3.4.2. Procedures for data analysis; 3.4.3. Inter-rater reliability); 3.5. Summary
    Chapter 4 Identification of domain-specific MWUs in the COMAIR: 4.1. Current approaches to MWU extraction; 4.2. My proposed approach to domain-specific MWU extraction; 4.3. The detailed process of domain-specific MWU extraction (4.3.1. Step 1: N-gram retrieval; 4.3.2. Step 2: Keyword-gram extraction; 4.3.3. Step 3: Measuring the association strength of keyword-grams; 4.3.4. Step 4: Filtering out process; 4.3.5. Step 5: Domain-specific MWU identification)
    Chapter 5 Frequency distributions and syntactic features of domain-specific MWUs: 5.1. Frequency distributions of domain-specific MWUs (5.1.1. Frequency distributions of domain-specific MWUs in various lengths; 5.1.2. Overall frequency distribution across different frequency bands); 5.2. Syntactic features of domain-specific MWUs
    Chapter 6 Functional and semantic features of domain-specific MWUs: 6.1. Distributions across primary discourse functions; 6.2. Multiple functioning; 6.3. Stance MWUs (6.3.1. Notion of stance MWUs; 6.3.2. Stance MWUs in COMAIR); 6.4. Discourse organizing MWUs (6.4.1. Notion of discourse organizing MWUs; 6.4.2. Discourse organizing MWUs in COMAIR); 6.5. Referential MWUs (6.5.1. Notion of referential MWUs; 6.5.2. Referential MWUs in COMAIR); 6.6. Summary
    Chapter 7 Conclusions and implications: 7.1. Summary of the major findings; 7.2. Implications of this study; 7.3. Limitations of this study
    References; Appendix
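The first extraction step named in the outline, n-gram retrieval, together with the grouping of expressions into frequency bands, can be sketched roughly as follows. This is only an illustration under stated assumptions: whitespace tokenization, n-gram lengths of 2 to 5 words, and band cut-offs of 100 and 1,000 occurrences passed in as hypothetical parameters (`low`, `high`). The later steps of the dissertation's method (keyword-gram extraction, association strength, filtering) are not reproduced here, and the function names are invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def ngrams(tokens: List[str], n_min: int = 2, n_max: int = 5) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous word sequence of length n_min..n_max."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def count_ngrams(sentences: Iterable[str], n_min: int = 2, n_max: int = 5) -> Counter:
    """Count n-grams over sentences, using naive lowercase whitespace tokenization."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n_min, n_max))
    return counts


def band_shares(counts: Counter, low: int = 100, high: int = 1000) -> Dict[str, float]:
    """Share of distinct expressions per frequency band (cut-offs are assumptions)."""
    bands: Counter = Counter()
    for freq in counts.values():
        if freq < low:
            bands["low"] += 1
        elif freq < high:
            bands["mid"] += 1
        else:
            bands["high"] += 1
    total = sum(bands.values())
    return {band: n / total for band, n in bands.items()} if total else {}


if __name__ == "__main__":
    # Tiny invented sample; a real run would read the report corpus instead.
    sample = ["the vessel ran aground", "the vessel sank"]
    print(count_ngrams(sample, n_min=2, n_max=2).most_common(1))
```

On a real corpus, `band_shares` would give the proportion of distinct expressions falling into each band, which is the kind of skewed distribution the abstract reports.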