Text Extraction and Web Searching in a Non-Latin Language
Recent studies of queries submitted to Internet search engines have shown that non-English queries and unclassifiable queries have nearly tripled during the last decade. Most search engines were originally engineered for English. They do not take full account of inflectional semantics, nor of features such as diacritics or capitalization, which are common in languages other than English. The literature concludes that searching with non-English and non-Latin-based queries yields lower success and requires additional user effort to achieve acceptable precision.
The primary aim of this research study is to develop an evaluation methodology for identifying the shortcomings and measuring the effectiveness of search engines with non-English queries. It also proposes a number of solutions to the problems identified. A Greek query log is analyzed with respect to the morphological features of the Greek language. A text extraction experiment also revealed problems related to encoding and to the morphological and grammatical differences among semantically equivalent Greek terms. A first stopword list for Greek based on a domain-independent collection has been produced, and its application in Web searching has been studied. The effect of lemmatizing query terms and the factors influencing text-based image retrieval in Greek are also studied. Finally, an instructional strategy is presented for teaching non-English-speaking students how to use search engines effectively.
The evaluation of the capabilities of the search engines showed that international and nationwide search engines ignore most of the linguistic idiosyncrasies of Greek and other complex European languages. There is a lack of freely available non-English resources to work with (test corpora, linguistic resources, etc.). The research showed that applying standard IR techniques, such as stopword removal, stemming, lemmatization and query expansion, to Greek Web searching increases precision.
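The abstract notes that search engines ignore Greek diacritics and capitalization, and that stopword removal improves precision. A minimal sketch of both steps, assuming Unicode NFD decomposition to strip accents and a whitespace tokenizer, might look like the following. The stopword set is a tiny hypothetical sample, not the list produced in the study, and lemmatization is left out because it needs a Greek lexicon.

```python
import unicodedata
from typing import List

def normalize_greek(text: str) -> str:
    """Case-fold, strip diacritics (tonos, dialytika) and unify final sigma."""
    # NFD splits accented letters into base letter + combining marks (category "Mn").
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Map final sigma to medial sigma so word-final and word-internal forms match.
    return unicodedata.normalize("NFC", stripped).replace("ς", "σ")

# Hypothetical sample of Greek function words; NOT the study's stopword list.
# Entries are normalized the same way as queries so that lookups line up.
SAMPLE_STOPWORDS = {normalize_greek(w) for w in
                    ["και", "το", "η", "ο", "της", "του", "σε", "για", "με", "να"]}

def query_terms(query: str) -> List[str]:
    """Normalize a query and drop stopwords before matching it against an index."""
    return [t for t in normalize_greek(query).split() if t not in SAMPLE_STOPWORDS]

# "The Acropolis Museum": the article and genitive article are dropped.
print(query_terms("Το μουσείο της Ακρόπολης"))  # ['μουσειο', 'ακροπολησ']
```

The index must be normalized with the same function, so forms such as "ακροπολησ" are internal keys rather than display text. Folding final sigma is a design choice that trades a little precision for recall.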
B!SON: A Tool for Open Access Journal Recommendation
Finding a suitable open access journal to publish scientific work is a complex task: researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders' conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on the title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project.
The Information-seeking Strategies of Humanities Scholars Using Resources in Languages Other Than English
by
Carol Sabbar
The University of Wisconsin-Milwaukee, 2016
Under the Supervision of Dr. Iris Xie
This dissertation explores the information-seeking strategies used by scholars in the humanities who rely on resources in languages other than English. It investigates not only the strategies they choose but also the shifts that they make among strategies and the role that language, culture, and geography play in the information-seeking context. The study used purposive sampling to engage 40 human subjects, all of whom are postdoctoral humanities scholars based in the United States who conduct research in a variety of languages. Data were collected through semi-structured interviews and research diaries in order to answer three research questions: What information-seeking strategies are used by scholars conducting research in languages other than English? What shifts do scholars make among strategies in routine, disruptive, and/or problematic situations? And in what ways do language, culture, and geography play a role in the information-seeking context, especially in problematic situations? The data were then analyzed using grounded theory and the constant comparative method. A new conceptual model, the information triangle, is presented in this dissertation and was used to categorize and visually map the strategies and shifts. Based on the data collected, thirty distinct strategies were identified and divided into four categories: formal system, informal resource, interactive human, and hybrid strategies. Three types of shifts were considered: planned, opportunistic, and alternative. Finally, factors related to language, culture, and geography were identified and analyzed according to their roles in the information-seeking context. This study is the first of its kind to combine the study of information-seeking behaviors with the factors of language, culture, and geography, and as such, it presents numerous methodological and practical implications along with many opportunities for future research.
Semantic discovery and reuse of business process patterns
Patterns currently play an important role in modern information systems (IS) development, but their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable means of promoting the reuse of recurrent generalized models in the very early stages of development. As a statement of research in progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.
The development of a model of information seeking behaviour of students in higher education when using internet search engines
This thesis develops a model of the Web information seeking behaviour of postgraduate students, with a specific focus on the use of Web search engines. It extends Marchionini's eight-stage model of information seeking, geared towards electronic environments, to holistically encompass the physical, cognitive, affective and social dimensions of Web users' behaviour. The study recognises the uniqueness of the Web environment as a vehicle for information dissemination and retrieval, draws on the distinction between information searching and information seeking, and emphasises the importance of following user-centred holistic approaches to studying information seeking behaviour. It reviews the research in the field and demonstrates that there is no comprehensive model that explains the behaviour of Web users when employing search engines for information retrieval. The methods followed to develop the study are explained with a detailed analysis of the four dimensions of information seeking (physical, cognitive, affective, social). Emphasis is placed on the significance of combined methods (qualitative and quantitative) and the ways in which they can enrich the examination of human behaviour. This part concludes with a discussion of methodological issues. The study is supported by an empirical investigation, which examines the relationship between interactive information retrieval using Web search engines and human information seeking processes. It investigates the influence of cognitive elements (such as learning and problem style, and creative ability) and affective characteristics (e.g. confidence, loyalty, familiarity, ease of use), as well as the role that system experience, domain knowledge and demographics play in information seeking behaviour and in users' overall satisfaction with the retrieval results. The influence of these factors is analysed by identifying users' patterns of behaviour and the tactics they adopt to solve specific problems.
The findings of the empirical study are incorporated into an enriched information-seeking model, encompassing the use of search engines, which reveals a complex interplay between physical, cognitive, affective and social elements and shows that none of these characteristics can be seen in isolation when attempting to explain the complex phenomenon of information seeking behaviour. Although the model is presented in a linear fashion, the dynamic, reiterative and circular character of the information seeking process is explained through an emphasis on transition patterns between the different stages. The research concludes with a discussion of the problems encountered by Web information seekers, including a detailed analysis of why users express satisfaction or dissatisfaction with the results of Web searching, and identifies areas in which Web search engines can be improved as well as issues related to the need for students to be given additional training and support. These include planning and organising information, recognising different dimensions of information intents and needs, emphasising the importance of variety in Web information seeking, promoting effective query formulation and ranking, reducing information overload, and assisting the effective selection of Web sites and the critical examination of results.
Machine Learning
Machine learning can be defined in various ways; broadly, it is a scientific domain concerned with the design and development of theoretical and implementation tools for building systems that exhibit some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience.
Visual information and knowledge representation in organisations
The construction industry's environment is continually changing. Employees are now more geographically widespread and more diverse, both culturally and educationally, than ever before. A great deal of research has been carried out on knowledge acquisition and storage, but there is still a distinct lack of research into knowledge presentation and communication. Information and knowledge presentation play a significant role in daily decision-making processes, and inaccurate or poorly communicated information can lead to inappropriate decisions. A simplified, filtered, coherent presentation of explicit knowledge can be instrumental to a successful, profitable and safety-conscious business.
Wates Construction is a major construction company that directly employs around 1,300 people, as well as various subcontractors on different projects. Its current turnover is around £1 billion; it is based in the UK and has branches in Ireland and Abu Dhabi. Wates realised that its existing information system was conveying information to its employees inefficiently and that a simplified system was needed to assist staff's decision-making processes. Earlier attempts by IT professionals to make the system more usable had made no significant difference to its performance.
A user-centred approach to information retrieval
A user model is a fundamental component in user-centred information retrieval systems. It enables personalization of a user's search experience. The development of such a model involves three phases: collecting information about each user, representing such information, and integrating the model into a retrieval application. Progress in this area is typically met with privacy and scalability challenges that hinder the ability to synthesize collective knowledge from each user's search behaviour. In this thesis, I propose a framework that addresses each of these three phases. The proposed framework is based on social role theory from the social science literature and at the centre of this theory is the concept of a social position. A social position is a label for a group of users with similar behavioural patterns. Examples of such positions are traveller, patient, movie fan, and computer scientist. In this thesis, a social position acts as a label for users who are expected to have similar interests. The proposed framework does not require real users' data; rather it uses the web as a resource to model users.
The proposed framework offers a data-driven and modular design for each of the three phases of building a user model. First, I present an approach to identify social positions from natural language sentences. I formulate this as a binary classification problem and develop a method to enumerate candidate social positions. The proposed classifier achieves an accuracy of 85.8%, which indicates that social positions can be identified with good accuracy. Through an inter-annotator agreement study, I further show a reasonable level of agreement between users when identifying social positions.
Second, I introduce a novel topic modelling-based approach to represent each social position as a multinomial distribution over words. This approach estimates a topic from a document collection for each position. To construct such a collection for a particular position, I propose a seeding algorithm that extracts a set of terms relevant to the social position. Coherence-based evaluation shows that the proposed approach learns significantly more coherent representations when compared with a relevance modelling baseline.
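To make "a multinomial distribution over words" concrete, the sketch below estimates one by maximum likelihood from a small collection of documents for a single position. This is a deliberately simpler stand-in, assuming whitespace tokenization, and is not the thesis's topic-modelling approach or its seeding algorithm; the function name `position_distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, List

def position_distribution(docs: List[str]) -> Dict[str, float]:
    """Maximum-likelihood multinomial P(w | position) from a document collection.

    Hypothetical helper: a plain unigram estimate, not a topic model.
    """
    counts = Counter(word for doc in docs for word in doc.lower().split())
    total = sum(counts.values())
    # Each word's probability is its share of all tokens; the values sum to 1.
    return {word: count / total for word, count in counts.items()}

# Toy collection for a hypothetical "traveller" position.
print(position_distribution(["flight hotel flight", "hotel visa"]))
```

For the toy collection, "flight" and "hotel" each receive probability 0.4 and "visa" 0.2. A topic model would additionally separate position-specific words from background vocabulary, which a raw unigram estimate cannot do.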
Third, I present a diversification approach based on the proposed framework. Diversification algorithms aim to return a result list for a search query that would potentially satisfy users with diverse information needs. I propose to identify social positions that are relevant to a search query. These positions act as an implicit representation of the many possible interpretations of the search query. Then, relevant positions are provided to a diversification technique that proportionally diversifies results based on each social position's importance. I evaluate my approach using four test collections provided by the diversity task of the Text REtrieval Conference (TREC) web tracks for 2009, 2010, 2011, and 2012. Results demonstrate that my proposed diversification approach is effective and provides statistically significant improvements over various implicit diversification approaches.
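Proportional diversification of this kind can be sketched in the style of PM-2, a Sainte-Laguë-based method in which each aspect (here, a social position) receives result slots in proportion to its weight. Whether the thesis uses exactly this technique is an assumption. The inputs are also assumed: per-document relevance scores for each position and a weight per position. The function name and toy data are hypothetical.

```python
from typing import Dict, List

def pm2_diversify(
    scores: Dict[str, Dict[str, float]],  # doc -> {position: relevance to that position}
    importance: Dict[str, float],         # position -> importance weight
    k: int,
    lam: float = 0.5,                     # trade-off between target and other positions
) -> List[str]:
    """PM-2-style proportional diversification (illustrative sketch)."""
    seats = {p: 0.0 for p in importance}  # fractional slots already given to each position
    remaining = set(scores)
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        # Sainte-Laguë quotient: positions with fewer seats relative to weight go first.
        quotient = {p: importance[p] / (2 * seats[p] + 1) for p in importance}
        target = max(quotient, key=quotient.get)

        def gain(doc: str) -> float:
            s = scores[doc]
            others = sum(quotient[p] * s.get(p, 0.0) for p in importance if p != target)
            return lam * quotient[target] * s.get(target, 0.0) + (1 - lam) * others

        best = max(sorted(remaining), key=gain)  # sorted() makes tie-breaking deterministic
        ranking.append(best)
        remaining.remove(best)
        # Distribute one seat among positions in proportion to the document's relevance.
        total = sum(scores[best].get(p, 0.0) for p in importance)
        if total > 0:
            for p in importance:
                seats[p] += scores[best].get(p, 0.0) / total
    return ranking

# Hypothetical query where "traveller" matters more than "patient".
importance = {"traveller": 0.7, "patient": 0.3}
scores = {"d1": {"traveller": 0.9}, "d2": {"traveller": 0.8},
          "d3": {"patient": 0.9}, "d4": {"traveller": 0.7}}
print(pm2_diversify(scores, importance, k=3))  # ['d1', 'd3', 'd2']
```

After the first traveller document fills a seat, the patient position's quotient (0.3) exceeds the traveller's (0.7 / 3), so d3 is promoted above d2 even though d2 has the higher traveller score.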
Fourth, I introduce a session-based search system under the framework of learning to rank. Such a system aims to improve the retrieval performance for a search query using previous user interactions during the search session. I present a method to match a search session to its most relevant social positions based on the session's interaction data. I then suggest identifying related sessions from query logs that are likely to have been issued by users with similar information needs. Novel learning features are then estimated from the session's social positions, related sessions, and interaction data. I evaluate the proposed system using four test collections from the TREC session track. This approach achieves state-of-the-art results compared with effective session-based search systems. I demonstrate that this strong performance is mainly attributable to features derived from social positions' data.