24 research outputs found

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) the evaluation of NLP systems.

    Mixed-Language Arabic-English Information Retrieval

    This thesis addresses the problem of mixed querying in cross-language information retrieval (CLIR). It proposes mixed-language (language-aware) approaches in which mixed queries are used to retrieve the most relevant documents regardless of their languages. To achieve this goal, however, it is first essential to suppress the impact of most of the problems caused by the mixed-language character of both queries and documents, which would otherwise bias the final ranked list. A cross-lingual re-weighting model was therefore developed, in which the term frequency, document frequency and document length components of mixed queries are estimated and adjusted regardless of language, while the model also accounts for uniquely mixed-language features of queries and documents, such as terms co-occurring in two different languages. Furthermore, in mixed queries the non-technical terms (mostly those in the non-English language) are likely to be overweighted and to skew the impact of the technical terms (mostly those in English), because the latter have high document frequencies (and thus low weights) in their corresponding collection (mostly the English one); this phenomenon is caused by the dominance of English in scientific domains. Accordingly, the thesis also proposes a re-weighted Inverse Document Frequency (IDF) that moderates the effect of overweighted terms in mixed queries.
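
    The abstract above does not give the exact re-weighting formula. As a purely illustrative sketch, assuming a BM25-style smoothed IDF and a simple pooling of per-language document frequencies (both are assumptions, not the thesis's actual model), a cross-lingual IDF could be computed against the combined collections, so that English technical terms are not penalised by the size of the English collection alone:

        import math

        def cross_lingual_idf(term_df: dict[str, int], coll_size: dict[str, int]) -> float:
            """Illustrative pooled IDF for mixed-language queries.
            term_df:   the term's document frequency in each language collection
            coll_size: number of documents in each language collection
            (Hypothetical formulation, not the thesis's re-weighting model.)"""
            n = sum(coll_size.values())   # total documents across all languages
            df = sum(term_df.values())    # pooled document frequency
            return math.log((n - df + 0.5) / (df + 0.5) + 1)  # BM25-style smoothing

        # An English technical term frequent within the English collection is
        # weighted against the pooled collection rather than English alone.
        print(cross_lingual_idf({"en": 900, "ar": 5}, {"en": 100_000, "ar": 40_000}))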

    Doctor of Philosophy

    The explosion of structured Web data (e.g., online databases, Wikipedia infoboxes) creates many opportunities for integrating and querying these data that go far beyond the simple search capabilities provided by search engines. Although much work has been devoted to data integration in the database community, the Web brings new challenges: Web scale (the large and growing volume of data) and the heterogeneity of Web data. Because there is so much data, scalable techniques are needed that require little or no manual intervention and that are robust to noise. In this dissertation, we propose a new and effective approach for matching Web-form interfaces and for matching multilingual Wikipedia infoboxes. As a further step toward these problems, we propose a general, prudent schema-matching framework that matches a large number of schemas effectively. Our comprehensive experiments on Web-form interfaces and Wikipedia infoboxes show that it enables on-the-fly, automatic integration of large collections of structured Web data. Another problem we address in this dissertation is schema discovery. While existing integration approaches assume that the relevant data sources and their schemas have been identified in advance, schemas are not always available for structured Web data. Approaches exist that exploit information in Wikipedia to discover entity types and their associated schemas. However, due to the inconsistencies, sparseness, and noise inherent in community contributions, these approaches are error-prone and require substantial human intervention. Given the schema heterogeneity of Wikipedia infoboxes, we developed a new approach that uses the structured information available in infoboxes to cluster similar infoboxes and infer the schemata for entity types. Our approach is unsupervised and resilient to the unpredictable skew in the entity class distribution. Our experiments, using over one hundred thousand infoboxes extracted from Wikipedia, indicate that our approach is effective and produces accurate schemata for Wikipedia entities.
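
    The abstract does not detail the clustering step. As an illustrative sketch only, assuming Jaccard similarity over attribute sets, a fixed threshold, and greedy assignment (none of which are claimed to be the dissertation's actual algorithm), infoboxes could be grouped by attribute overlap, with each cluster's majority attributes taken as the inferred schema:

        from collections import Counter

        def jaccard(a: set[str], b: set[str]) -> float:
            """Overlap between two attribute sets."""
            return len(a & b) / len(a | b) if a | b else 0.0

        def cluster_infoboxes(infoboxes: list[set[str]], threshold: float = 0.5):
            """Greedy clustering of infoboxes by attribute-set similarity
            (hypothetical parameters, shown only to make the idea concrete)."""
            clusters: list[list[set[str]]] = []
            for box in infoboxes:
                best = max(clusters, default=None,
                           key=lambda c: jaccard(box, set().union(*c)))
                if best is not None and jaccard(box, set().union(*best)) >= threshold:
                    best.append(box)
                else:
                    clusters.append([box])
            # Infer one schema per cluster: attributes in a majority of members.
            schemata = []
            for c in clusters:
                counts = Counter(attr for box in c for attr in box)
                schemata.append({a for a, n in counts.items() if n > len(c) / 2})
            return clusters, schemata

        boxes = [{"name", "birth_date", "occupation"},
                 {"name", "birth_date", "spouse"},
                 {"name", "population", "area_km2"}]
        print(cluster_infoboxes(boxes)[1])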

    Interim research assessment 2003-2005 - Computer Science

    This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. It also provides information for others interested in our research activities.

    Collaborative Knowledge Visualisation for Cross-Community Knowledge Exchange

    The notion of communities as informal social networks based on shared interests or common practices has increasingly been used as a unit of analysis when considering the processes of cooperative creation and sharing of knowledge. While knowledge exchange within communities has been extensively researched, several studies have observed the importance of cross-community knowledge exchange for the creation of new knowledge and innovation in knowledge-intensive organizations. In knowledge management especially, the need to support cooperation and knowledge exchange between communities with highly specialized expertise and activities has become a critical problem. Although several studies discuss the importance and difficulties of knowledge sharing across community boundaries, the development of technological support incorporating these findings has received little attention. This work presents an approach to supporting cross-community knowledge exchange that uses knowledge visualisation to facilitate information access in unfamiliar community domains. The theoretical grounding and practical relevance of the proposed approach are ensured by defining a requirements model that integrates theoretical frameworks for cross-community knowledge exchange with the practical needs of typical knowledge management processes and of sensemaking tasks during information access in unfamiliar domains. This synthesis suggests that visualising the knowledge structures of communities, and supporting the discovery of relationships between them during access to community spaces, could provide valuable support for cross-community discovery and sharing of knowledge. This is the main hypothesis investigated in this thesis. Accordingly, a novel method is developed for eliciting and visualising the implicit knowledge structures of individuals and communities in the form of dynamic knowledge maps that make the elicited knowledge usable for semantic exploration and navigation of community spaces. The method allows unobtrusive construction of personal and community knowledge maps based on user interaction with information, and their use for dynamic classification of information from a specific point of view. The visualisation model combines Document Maps, which present main topics, document clusters and relationships between knowledge reflected in community spaces, with Concept Maps, which visualise personal and shared conceptual structures of community members. The technical realization integrates Kohonen's self-organizing maps with the extraction of word categories from texts, collaborative indexing and personalised classification based on user-induced templates. This is accompanied by intuitive visualisation of, and interaction with, complex information spaces based on multi-view navigation of document landscapes and concept networks. The developed method is prototypically implemented as an application framework, a concrete system and a visual information interface for multi-perspective access to community information spaces, the Knowledge Explorer. The application framework implements services for generating and using personal and community knowledge maps to support explicit and implicit knowledge exchange between members of different communities. The Knowledge Explorer allows simultaneous visualisation of different personal and community knowledge structures and enables their use for structuring, exploring and navigating community information spaces from different points of view.
    The empirical evaluation in a comparative laboratory study confirms the adequacy of the developed solutions with respect to the specific requirements of the cross-community problem and demonstrates substantially better quality of knowledge access than a standard information-seeking reference system. The developed evaluation framework and operational measures for the quality of knowledge access in cross-community contexts also provide a theoretically grounded and practically feasible method for further developing and evaluating new solutions to this important but little-investigated problem.
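
    The abstract names Kohonen's self-organizing maps as the basis of the Document Maps. As a minimal illustrative sketch, with the toy vectors, grid size and training schedule all assumed rather than taken from the thesis, a SOM places similar document vectors in nearby grid cells, which is the property a map-based document visualisation relies on:

        import numpy as np

        def train_som(docs: np.ndarray, grid: int = 4, epochs: int = 50, seed: int = 0):
            """Tiny self-organizing map: returns a (grid*grid, dim) matrix whose
            rows are map-cell prototypes (hypothetical parameters, illustration only)."""
            rng = np.random.default_rng(seed)
            w = rng.random((grid * grid, docs.shape[1]))
            coords = np.array([(i, j) for i in range(grid) for j in range(grid)])
            for t in range(epochs):
                lr = 0.5 * (1 - t / epochs)                   # decaying learning rate
                radius = max(grid / 2 * (1 - t / epochs), 1)  # shrinking neighbourhood
                for x in docs:
                    bmu = np.argmin(((w - x) ** 2).sum(axis=1))     # best-matching unit
                    d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)  # squared grid distance
                    h = np.exp(-d2 / (2 * radius ** 2))             # neighbourhood kernel
                    w += lr * h[:, None] * (x - w)                  # pull cells toward x
            return w

        # Documents sharing a topic end up in nearby (often identical) map cells.
        docs = np.array([[1.0, 0, 0], [0.9, 0.1, 0], [0, 0, 1.0], [0, 0.1, 0.9]])
        w = train_som(docs)
        print([int(np.argmin(((w - d) ** 2).sum(axis=1))) for d in docs])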

    Corpus Linguistics software: understanding their usage and delivering two new tools

    The increasing availability of computers to ordinary users over the last few decades has led to an exponential increase in the use of Corpus Linguistics (CL) methodologies. The people exploring this data come from a variety of backgrounds and, in many cases, are not proficient corpus linguists. Despite the ongoing development of new tools, there is still an immense gap between what CL can offer and what researchers currently do. This study has two outcomes. It (a) identifies the gap between potential and actual uses of CL methods and tools, and (b) enhances the usability of CL software, complementing its statistical applications with data visualization and user-friendly interfaces. The first outcome is achieved through (i) an investigation of how CL methods are reported in academic publications; (ii) systematic observation of users of CL software as they engage in routine tasks; and (iii) a review of four well-established pieces of software used for corpus exploration. Based on the findings, two new, highly usable statistical tools for CL studies were developed and implemented on an existing system, CQPweb. The Advanced Dispersion tool allows users to explore graphically how queries are distributed in a corpus, making it easier to understand the concept of dispersion; it also provides accurate dispersion measures. The Parlink Tool was designed with beginners interested in translation studies and second language education as its primary audience. Its main function is to let users see possible translations for corpus queries in parallel concordances, without the need for external resources such as translation memories.
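
    The abstract does not specify which dispersion measures the Advanced Dispersion tool implements. As an illustrative sketch of the underlying concept only, Gries's DP (deviation of proportions), a widely used dispersion measure in corpus linguistics, compares how a word's occurrences are spread over the parts of a corpus with how large those parts are (its use here is an assumption, not a claim about the tool):

        def gries_dp(freqs: list[int], part_sizes: list[int]) -> float:
            """Gries's DP (deviation of proportions): 0 means perfectly even
            dispersion across corpus parts; values near 1 mean highly clumped.
            freqs:      occurrences of the word in each corpus part
            part_sizes: token count of each corpus part"""
            total_f, total_s = sum(freqs), sum(part_sizes)
            return 0.5 * sum(abs(f / total_f - s / total_s)
                             for f, s in zip(freqs, part_sizes))

        # 100 hits concentrated in one of four equal-sized parts: poorly dispersed.
        print(gries_dp([100, 0, 0, 0], [25_000] * 4))    # 0.75
        print(gries_dp([25, 25, 25, 25], [25_000] * 4))  # 0.0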