24,223 research outputs found

Exploratory Analysis of Highly Heterogeneous Document Collections

Author: Blei D. M.
Bun K. K.
Maiya A. S.
Manning C. D.
Mihalcea R.
Pecina P.
Ranganathan S. R.
Wagstaff K.
Publication date: 01/01/2013

We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is shown to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information. Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

arXiv.org e-Print Archive

CiteSeerX

Methodologies for the Automatic Location of Academic and Educational Texts on the Internet

Author: Oxnard L.
Evans A.
Publication venue: School of Geography
Publication date: 01/01/2003

Traditionally, online databases of web resources have been compiled by a human editor, or through the submissions of authors or interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of increasing material quantity and quality, and much of what is in databases is of an ephemeral nature. These pressures dictate that many databases stagnate after an initial period of enthusiastic data entry. The solution to this problem would seem to be the automatic harvesting of resources; however, this process necessitates the automatic classification of resources as ‘appropriate’ to a given database, a problem only solved by complex text content analysis. This paper outlines the component methodologies necessary to construct such an automated harvesting system, including a number of novel approaches. In particular, this paper looks at the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data is presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, appropriate software is reviewed where it exists, and future directions are outlined.

GP perspectives on hospital discharge letters : an interview and focus group study

Author: Dale Jeremy
Schnurr Stephanie
Scott Emma
Spencer Rachel
Weetman Katharine
Publication venue: 'Royal College of General Practitioners'
Publication date: 23/06/2020

Background: Written discharge communication following inpatient or outpatient clinic discharge is essential for communicating information to the GP, but GPs’ opinions on discharge communication are seldom sought. Patients are sometimes copied into this communication, but the reasons for this variation, and the resultant effects, remain unclear. Aim: To explore GP perspectives on how discharge letters can be improved in order to enhance patient outcomes. Design & setting: The study used narrative interviews with 26 GPs from 13 GP practices within the West Midlands, England. Method: Interviews were transcribed and data were analysed using corpus linguistics (CL) techniques. Results: Elements pivotal to a successful letter were: diagnosis, appropriate follow-up plan, medication changes and reasons, clinical summary, investigations and/or procedures and outcomes, and what information has been given to the patient. GPs supported patients receiving discharge letters and expounded a number of benefits of this practice; for example, increased patient autonomy. Nevertheless, GPs felt that if patients are to receive direct discharge letter copies, modifications such as use of lay language and avoidance of acronyms may be required to increase patient understanding. Conclusion: GPs reported that discharge letters frequently lacked content items they assessed to be important; GPs highlighted that this can have subsequent ramifications on resources and patient experiences. Templates should be devised that put discharge letter elements assessed to be important by GPs to the forefront. Future research needs to consider other perspectives on letter content, particularly those of patients.

Warwick Research Archives Portal Repository

Climate Services for Resilient Development (CSRD) Partnership’s work in Latin America

Author: International Center for Tropical Agriculture
Publication venue: International Center for Tropical Agriculture
Publication date: 27/03/2020

The Climate Services for Resilient Development (CSRD) Partnership is a private-public collaboration led by USAID, which aims to increase resilience to climate change in developing countries through the development and dissemination of climate services. The partnership began with initial projects in three countries: Colombia, Ethiopia, and Bangladesh. The International Center for Tropical Agriculture (CIAT) was the lead organization for the Colombian CSRD efforts, which then expanded to encompass work in the whole Latin American region.

XML content warehousing: Improving sociological studies of mailing lists and web data

Author: Colazzo Dario
Dudouet François-Xavier
Manolescu Ioana
Nguyen Benjamin
Senellart Pierre
Vion Antoine
Publication date: 01/01/2011

In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows both exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. Finally, we present the possibility of exporting the data stored in the warehouse to commonly used advanced software devoted to sociological analysis.

arXiv.org e-Print Archive

HAL-CentraleSupelec

HAL AMU

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Evaluative Language and Its Solidarity-Building Role on TED.com: An Appraisal and Corpus Analysis

Author: Drasovean Anda
Tagg Caroline
Publication date: 01/01/2015

Language is a key resource in the formation of online communities, which are in turn central to an understanding of contemporary social relations. This study looks at TED.com, an educational video-hosting platform with few in-built community-building functionalities, to explore the potential for users to affiliate through their language choices. Grounded in Systemic Functional Linguistics, the study uses the Appraisal framework, extended using corpus linguistic methods, in order to analyse users’ reactions to TED videos. The study shows that online participants use evaluative language to align with certain ideas and, based on these affinities, form affiliations characterized by sociability and solidarity. These affiliations raise important questions about the conception of ‘community’ in twenty-first-century society.