
    From corpus-based collocation frequencies to readability measure

    This paper provides a broad overview of three separate but related areas of research. Firstly, corpus linguistics is a growing discipline that applies analytical results from large language corpora to a wide variety of problems in linguistics and related disciplines. Secondly, readability research, as the name suggests, seeks to understand what makes texts more or less comprehensible to readers, and aims to apply this understanding to issues such as text rating and matching of texts to readers. Thirdly, collocation is a language feature that occurs when particular words are used together frequently for other than purely grammatical reasons. The intersection of these three areas provides the basis for ongoing research within the Department of Computer and Information Sciences at the University of Strathclyde and is the motivation for this overview. Specifically, we aim, through analysis of collocation frequencies in major corpora, to gain valuable insight into the content of texts, which we believe will, in turn, provide a novel basis for estimating text readability.
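
    The abstract does not specify how collocation frequencies would be computed; as a hedged illustration of the general idea (not the authors' method), a minimal count of adjacent word pairs over a tokenised corpus might look like the sketch below, where the tokenisation and window size are assumptions. Real collocation measures would normally weight such raw counts (e.g. by mutual information) against corpus-wide word frequencies.

```python
from collections import Counter

def collocation_frequencies(tokens, window=2):
    """Count word pairs that co-occur within a fixed window of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for neighbour in tokens[i + 1:i + window]:
            pairs[(word, neighbour)] += 1
    return pairs

# Toy corpus: 'strong tea' recurs, so it surfaces as the most frequent pair.
corpus = "strong tea strong coffee powerful computer strong tea".split()
print(collocation_frequencies(corpus).most_common(3))
```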

    Comparison of Wechsler Memory Scale–Fourth Edition (WMS–IV) and Third Edition (WMS–III) dimensional structures: Improved ability to evaluate auditory and visual constructs

    Dimensional structures underlying the Wechsler Memory Scale–Fourth Edition (WMS–IV) and Wechsler Memory Scale–Third Edition (WMS–III) were compared to determine whether the revised measure has a more coherent and clinically relevant factor structure. Principal component analyses were conducted in normative samples reported in the respective technical manuals. Empirically supported procedures guided retention of dimensions. An invariant two-dimensional WMS–IV structure reflecting constructs of auditory learning/memory and visual attention/memory (C1 = .97; C2 = .96) is more theoretically coherent than the replicable, heterogeneous WMS–III dimension (C1 = .97). This research suggests that the WMS–IV may have greater utility in identifying lateralized memory dysfunction.
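
    The C values quoted above appear to index component invariance across samples; if they are Tucker's congruence coefficients (the standard measure for comparing factor loadings between solutions), the computation is a simple normalised dot product. A minimal sketch with hypothetical loading vectors, not the study's data:

```python
import numpy as np

def congruence_coefficient(x, y):
    """Tucker's congruence coefficient between two factor-loading vectors."""
    return np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y))

# Hypothetical loadings for the same component in two normative samples.
loadings_a = np.array([0.72, 0.68, 0.55, 0.10, 0.08])
loadings_b = np.array([0.70, 0.66, 0.58, 0.12, 0.05])
print(round(congruence_coefficient(loadings_a, loadings_b), 3))  # 0.999
```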

    Information extraction from template-generated hidden web documents

    A large amount of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (such as Google and Yahoo). These databases dynamically generate lists of documents in response to user queries and are referred to as Hidden Web databases. Such documents are typically presented to users as template-generated Web pages. This paper presents a new approach that identifies Web page templates in order to extract query-related information from documents. We propose two forms of representation to analyse the content of a document: Text with Immediate Adjacent Tag Segments (TIATS) and Text with Neighbouring Adjacent Tag Segments (TNATS). Our techniques exploit the tag structures that surround the textual contents of documents in order to detect Web page templates and thereby extract query-related information. Experimental results demonstrate that TNATS detects Web page templates most effectively and extracts information with high recall and precision.
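
    The abstract leaves the TIATS and TNATS representations undefined, so the following is only a rough sketch of the underlying idea rather than the paper's algorithm: pair each text node with its adjacent tag context, treat pairs that recur across several result pages as template material, and keep the remainder as query-related content. The parsing details and the "appears on every page" matching rule are assumptions.

```python
from collections import Counter
from html.parser import HTMLParser

class TagContextExtractor(HTMLParser):
    """Collect (preceding tag, text) pairs -- a crude stand-in for
    text-with-adjacent-tag-segment representations."""
    def __init__(self):
        super().__init__()
        self.last_tag = None
        self.segments = []

    def handle_starttag(self, tag, attrs):
        self.last_tag = tag

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((self.last_tag, text))

def template_segments(pages):
    """Tag/text pairs repeated on every page are treated as template;
    everything else is kept as query-related content."""
    counts = Counter()
    for html in pages:
        parser = TagContextExtractor()
        parser.feed(html)
        counts.update(set(parser.segments))
    return {seg for seg, n in counts.items() if n == len(pages)}

pages = [
    "<html><h1>Results</h1><p>Document about corpus linguistics</p></html>",
    "<html><h1>Results</h1><p>Document about readability</p></html>",
]
print(template_segments(pages))  # {('h1', 'Results')}
```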

    Extraction of Keyphrases from Text: Evaluation of Four Algorithms

    This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each of the algorithms was evaluated by the degree to which the algorithm's keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in Microsoft's Word 97, (2) an algorithm based on Eric Brill's part-of-speech tagger, (3) the Summarize feature in Verity's Search 97, and (4) NRC's Extractor algorithm. For all five document collections, NRC's Extractor yields the best match with the manually generated keyphrases.
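
    The matching criterion between algorithm output and the hand-generated keyphrases is not stated in the abstract; a hedged sketch of the kind of exact-match precision/recall scoring typically used in such evaluations, with hypothetical inputs, is:

```python
def keyphrase_match_scores(extracted, target):
    """Exact-match precision and recall of extracted keyphrases against a
    manually generated target set (case-insensitive)."""
    extracted = {k.lower() for k in extracted}
    target = {k.lower() for k in target}
    hits = extracted & target
    precision = len(hits) / len(extracted) if extracted else 0.0
    recall = len(hits) / len(target) if target else 0.0
    return precision, recall

# Hypothetical extractor output versus the human keyphrases for one document.
print(keyphrase_match_scores(
    ["machine learning", "keyphrase extraction", "corpora"],
    ["keyphrase extraction", "evaluation", "machine learning"],
))  # (0.666..., 0.666...)
```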