Escaping the Trap of too Precise Topic Queries
At the very center of digital mathematics libraries lie controlled
vocabularies which qualify the {\it topic} of the documents. These topics are
used when submitting a document to a digital mathematics library and when
searching one. Searches are refined by the use of these topics, as they allow
a precise classification of the mathematics area a document addresses.
However, there is a major risk that users employ too precise topics in their
queries: they may choose a topic that is only "close-by" and thus fail to
match the right resource. We call this the {\it topic trap}. Indeed, since
2009 this issue has appeared frequently on the i2geo.net platform, and other
mathematics portals experience the same phenomenon. One approach to this
issue is to introduce tolerance in the way queries are interpreted, for
example by including fuzzy matches; but this introduces noise which may
prevent the user from understanding how the search engine works.
In this paper, we propose a way to escape the topic trap by employing
navigation between related topics together with the count of search results
for each topic. This supports the user in that a search for close-by topics
is a click away from a previous search. The approach was realized within the
i2geo search engine and is described in detail; the {\it related} relation is
computed by textual analysis of the concept definitions fetched from the
Wikipedia encyclopedia.
Comment: 12 pages, Conference on Intelligent Computer Mathematics 2013, Bath, UK
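The {\it related} relation described above — textual analysis of concept definitions — can be sketched as TF-IDF vectors over the definition texts compared by cosine similarity. This is a minimal stand-in, not the paper's actual implementation; the function names and example definitions below are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(definitions):
    """Build TF-IDF vectors for each concept definition (name -> text)."""
    docs = {name: Counter(text.lower().split()) for name, text in definitions.items()}
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for tf in docs.values():
        df.update(tf.keys())
    idf = {t: math.log(n / df[t]) for t in df}
    return {name: {t: c * idf[t] for t, c in tf.items()} for name, tf in docs.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def related_topics(topic, definitions, k=3):
    """Rank the other topics by similarity of their definitions to `topic`."""
    vecs = tfidf_vectors(definitions)
    scores = [(other, cosine(vecs[topic], vecs[other]))
              for other in definitions if other != topic]
    return sorted(scores, key=lambda p: p[1], reverse=True)[:k]
```

A user who searched for a topic could then be offered the top-ranked related topics as one-click follow-up searches, each annotated with its own result count.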
Optimising metadata to make high-value content more accessible to Google users
Purpose: This paper shows how information in digital collections that have been catalogued using high-quality metadata can be retrieved more easily by users of search engines such as Google.
Methodology/approach: The research and proposals described arose from an investigation into the observed phenomenon that pages from the Glasgow Digital Library (gdl.cdlr.strath.ac.uk) were regularly appearing near the top of Google search results shortly after publication, without any deliberate effort to achieve this. The reasons for this phenomenon are now well understood and are described in the second part of the paper. The first part provides context with a review of the impact of Google and a summary of recent initiatives by commercial publishers to make their content more visible to search engines.
Findings/practical implications: The literature research provides firm evidence of a trend amongst publishers to ensure that their online content is indexed by Google, in recognition of its popularity with Internet users. The practical research demonstrates how search engine accessibility can be compatible with use of established collection management principles and high-quality metadata.
Originality/value: The concept of data shoogling is introduced, involving some simple techniques for metadata optimisation. Details of its practical application are given, to illustrate how those working in academic, cultural and public-sector organisations could make their digital collections more easily accessible via search engines, without compromising any existing standards and practices.
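One practical side of metadata optimisation of this kind is surfacing catalogue metadata in the HTML elements that general-purpose search engines actually index. The sketch below is an illustration under that assumption, not the paper's method; the record fields and function name are hypothetical, though the `DC.*` meta-tag names follow the common Dublin Core embedding convention.

```python
from html import escape

def render_head(record):
    """Render a catalogue record as HTML head elements that search engines
    index: <title>, a meta description, and Dublin Core meta tags."""
    lines = [
        f"<title>{escape(record['title'])}</title>",
        f'<meta name="description" content="{escape(record["description"])}">',
    ]
    for field in ("creator", "subject", "date"):
        if field in record:
            lines.append(f'<meta name="DC.{field}" content="{escape(record[field])}">')
    return "\n".join(lines)
```

The point is that no new metadata is created: fields already present in the collection catalogue are simply exposed where a crawler will see them.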
Challenges in distributed information search in a semantic digital library
Nowadays an enormous quantity of heterogeneous and distributed information is stored in digital
libraries. Access to these collections poses a serious challenge, however, because present search techniques,
based on manually annotated metadata and linear replay of material selected by the user, do not scale
effectively or efficiently to large collections. Artificial intelligence and the Semantic Web provide a common
framework that allows knowledge to be shared and reused. In this paper we propose a comprehensive
approach for discovering information objects in large digital collections based on analysis of the
semantic metadata recorded in those objects and the application of expert system technologies. We suggest a
conceptual architecture for a semantic and intelligent search engine. OntoFAMA is a collaborative effort
that proposes a new form of interaction between people and the Digital Library, where the latter is adapted to
individuals and their surroundings. We have used the Case-Based Reasoning methodology to develop a
prototype supporting efficient knowledge retrieval from the digital library of the University of Seville.
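The core step of Case-Based Reasoning is retrieval: finding the stored cases most similar to the current query. A minimal sketch, assuming cases are attribute dictionaries (the attribute names and scoring rule below are illustrative, not the prototype's actual design):

```python
def retrieve(query, case_base, k=1):
    """Nearest-neighbour retrieval over a case base: score each stored case
    by the fraction of query attributes whose values it matches exactly."""
    def score(case):
        return sum(query[a] == case.get(a) for a in query) / len(query)
    return sorted(case_base, key=score, reverse=True)[:k]
```

A full CBR cycle would then reuse the retrieved case's solution, revise it for the new situation, and retain the result as a new case.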
Intelligent information processing in a digital library using semantic web
With the explosive growth of information, it is
becoming increasingly difficult to retrieve relevant
documents with current search engines alone. The
information is treated as an ordinary database that
manages contents and positions. For the individual
user, there is a great deal of useless information in
addition to the substantial amount of useful information.
This poses new challenges to the research community
and motivates researchers to look for intelligent
information retrieval approaches and ontologies that
search and/or filter information automatically based on
some higher level of understanding. We study how to
improve the efficiency of search methods and how to
classify search patterns into several models based on
ontology-based agent profiles.
We have proposed a method to efficiently search for
target information on a Digital Library network with
multiple independent information sources. This paper
outlines the development of a prototype expert system
based on an ontology for information retrieval in the
Digital Library of the University of Seville. The results of this
study demonstrate that by improving representation,
incorporating more metadata from within the
information and the ontology into the retrieval process,
the effectiveness of information retrieval is enhanced.
We used Protégé and jCOLIBRI for developing the
ontology and the expert system, respectively.
An Integrated Framework for Discovering Digital Library Collections
Information seekers are generally on their own to discover and use a research library's growing array of digital collections, and coordination of these collections' development and maintenance is often not optimal. The frequent lack of a conscious design for how collections fit together is of equal concern because it means that research libraries are not making the most of the substantial investments they are making in digital initiatives. This paper proposes a framework for a research library's digital collections that offers integrated discovery and a set of best practices to underpin collection building, federated access, and sustainability. The framework's purpose is to give information seekers a powerful and easy way to search across existing and future collections and to retrieve integrated sets of results. The paper and its recommendations are based upon research undertaken by the author and a team of librarians and technologists at Cornell University Library. The team conducted structured interviews of forty-five library staff members involved in digital collection building at Cornell, studied an inventory of the library's more than fifty digital collections, and evaluated seven existing OAI and federated search production or prototype systems. Discusses the author's team's research and the rationale for their recommendations to: present a cohesive view of the library's digital collections for both browsing and searching at the object level; take a programmatic (rather than project-based) approach to digital collection building; require that all new digital collections conform to library-developed and agreed-upon OAI best practices for data providers; and implement organizational structures to sustain the library's digital collections over the long term
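The OAI best practices the framework requires rest on OAI-PMH, the standard protocol for exposing collection metadata to harvesters. The sketch below parses a `ListRecords` response carrying Dublin Core (`oai_dc`) metadata; the protocol elements and namespaces are real, while the sample repository identifier and record are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def parse_list_records(xml_text):
    """Extract (identifier, title) pairs from an OAI-PMH ListRecords
    response whose records carry Dublin Core (oai_dc) metadata."""
    root = ET.fromstring(xml_text)
    out = []
    for rec in root.iter(OAI + "record"):
        ident = rec.findtext(f"{OAI}header/{OAI}identifier")
        title = rec.findtext(f".//{DC}title")
        out.append((ident, title))
    return out

# A minimal (invented) ListRecords response, as a harvester might receive it:
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:library:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Maps of Old Cornell</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

Harvesting every collection through one such interface is what lets a library present integrated result sets across otherwise independent collections.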
Classifying document types to enhance search and recommendations in digital libraries
In this paper, we address the problem of classifying documents available from
the global network of (open access) repositories according to their type. We
show that the metadata enabling us to distinguish research papers, theses and
slides are missing from repositories in over 60% of cases. While these
metadata describing document types are useful in a variety of scenarios
ranging from research analytics to improving search and recommender (SR)
systems, this problem has not yet been sufficiently addressed in the context of
the repositories infrastructure. We have developed a new approach for
classifying document types using supervised machine learning based exclusively
on text-specific features. We achieve a 0.96 F1-score using the random forest and
AdaBoost classifiers, which are the best-performing models on our data. By
analysing the SR system logs of the CORE [1] digital library aggregator, we
show that users are an order of magnitude more likely to click on research
papers and theses than on slides. This suggests that using document types as a
feature for ranking/filtering SR results in digital libraries has the potential
to improve user experience.
Comment: 12 pages, 21st International Conference on Theory and Practice of
Digital Libraries (TPDL), 2017, Thessaloniki, Greece
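To make the idea of classifying by text-specific features concrete, here is a deliberately toy stand-in for the paper's supervised models: a cue-word scorer rather than a trained random forest, with cue lists that are pure assumptions.

```python
# Hypothetical cue words per document type; a real system would learn
# such features from labelled training data instead.
CUES = {
    "paper":  {"abstract", "introduction", "references", "doi"},
    "thesis": {"thesis", "supervisor", "chapter", "dissertation"},
    "slides": {"slide", "agenda", "outline"},
}

def classify_document(text):
    """Score each document type by how many of its cue words occur in the
    text, and return the highest-scoring type."""
    tokens = set(text.lower().split())
    scores = {label: len(tokens & cues) for label, cues in CUES.items()}
    return max(scores, key=scores.get)
```

The predicted type could then be used exactly as the abstract suggests: as a ranking or filtering feature in a search and recommender system.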
Usability aspects of the inside-in approach for ancillary search tasks on the web
Given the huge amount of data available over the Web nowadays, search engines have become essential tools helping users to find the information they are looking for. Nonetheless, search engines often return large sets of results which must be filtered by the users to find the suitable information items. However, in many cases, filtering is not enough, as the results returned by the engine require users to perform a secondary search to complement the current information, thus featuring ancillary search tasks. Such ancillary search tasks create a nested context for user tasks that increases the articulatory distance between the users and their ultimate goal. In this paper, we analyze the interplay between such ancillary searches and other primary search tasks on the Web. Moreover, we describe the inside-in approach, which aims at reducing the articulatory distance between interleaved tasks by allowing users to perform ancillary search tasks without losing the context. The inside-in approach is illustrated by means of a case study based on ancillary searches of coauthors in a digital library, using an information visualization technique.
Searching for old news: User interests and behavior within a national collection
Modeling user interests helps to improve system support or refine recommendations in Interactive Information Retrieval. The aim of this study is to identify user interests in different parts of an online collection and investigate the related search behavior. To do this, we propose to use the metadata of selected facets and clicked documents as features for clustering sessions identified in user logs. We evaluate the session clusters by measuring their stability over a six-month period.
We apply our approach to data from the National Library of the Netherlands, a typical digital library with a richly annotated historical newspaper collection and a faceted search interface. Our results show that users interested in specific parts of the collection use different search techniques. We demonstrate that a metadata-based clustering helps to reveal and understand user interests in terms of the collection, and how search behavior is related to specific parts within the collection.
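The session-clustering step described above can be sketched as follows, assuming each session is reduced to a set of metadata features (selected facets plus metadata of clicked documents). The greedy Jaccard grouping and the example facet values are illustrative assumptions, not the study's actual clustering algorithm.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_sessions(sessions, threshold=0.5):
    """Greedily group sessions (id -> feature set): a session joins the
    first cluster whose accumulated features are similar enough,
    otherwise it starts a new cluster."""
    clusters = []  # list of (feature union, member session ids)
    for sid, feats in sessions.items():
        for centroid, members in clusters:
            if jaccard(feats, centroid) >= threshold:
                members.append(sid)
                centroid |= feats   # grow the cluster's feature set
                break
        else:
            clusters.append((set(feats), [sid]))
    return [members for _, members in clusters]
```

Stability over time could then be assessed by re-clustering sessions from a later period and comparing the resulting groupings, in the spirit of the six-month evaluation mentioned above.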
An examination of automatic video retrieval technology on access to the contents of an historical video archive
Purpose – This paper aims to provide an initial understanding of the constraints that historical video collections pose to video retrieval technology and the potential that online access offers to both archive and users.
Design/methodology/approach – A small and unique collection of videos on customs and folklore was used as a case study. Multiple methods were employed to investigate the effectiveness of technology and the modality of user access. Automatic keyframe extraction was tested on the visual content while the audio stream was used for automatic classification of speech and music clips. The user access (search vs browse) was assessed in a controlled user evaluation. A focus group and a survey provided insight on the actual use of the analogue archive. The results of these multiple studies were then compared and integrated (triangulation).
Findings – The amateur material challenged automatic techniques for video and audio indexing, thus suggesting that the technology must be tested against the material before deciding on a digitisation strategy. Two user interaction modalities, browsing vs searching, were tested in a user evaluation. Results show users preferred searching, but browsing becomes essential when the search engine fails in matching query and indexed words. Browsing was also valued for serendipitous discovery; however the organisation of the archive was judged cryptic and therefore of limited use. This indicates that the categorisation of an online archive should be thought of in terms of users who might not understand the current classification. The focus group and the survey showed clearly the advantage of online access even when the quality of the video surrogate is poor. The evidence gathered suggests that the creation of a digital version of a video archive requires a rethinking of the collection in terms of the new medium: a new archive should be specially designed to exploit the potential that the digital medium offers. Similarly, users' needs have to be considered before designing the digital library interface, as needs are likely to be different from those imagined.
Originality/value – This paper is a first attempt to understand the advantages offered and the limitations imposed by video retrieval technology for small video archives like those often found in special collections.
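Automatic keyframe extraction of the kind tested above is often based on shot-change detection. A minimal sketch under that assumption (the thresholding scheme and frame representation are illustrative, not the study's actual pipeline):

```python
def extract_keyframes(frames, threshold=0.3):
    """Keep a frame as a keyframe when its mean absolute pixel difference
    from the last kept keyframe exceeds a threshold. Frames are flat
    lists of grey-level intensities in [0, 1]; returns kept indices."""
    if not frames:
        return []
    keep = [0]
    last = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        diff = sum(abs(a - b) for a, b in zip(last, frame)) / len(frame)
        if diff > threshold:
            keep.append(i)
            last = frame
    return keep
```

On degraded amateur footage such a detector misfires easily (camera shake and exposure changes look like shot cuts), which is one concrete reason the paper recommends testing the technology against the material before committing to a digitisation strategy.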
Identifying semantically similar arabic words using a large vocabulary speech recognition system
Users search digital libraries for book references using one or more attributes such as keywords, subject and author name. Some book titles might contain the keyword that the user specified, and thus these titles will directly qualify as candidate results. On the other hand, there are other titles that are relevant but do not contain the same exact search keyword. A user expects to retrieve all titles that are relevant to a specified keyword. Similarly, when searching for an author name, the system should be able to retrieve the different forms of the name. The library science community developed a mechanism called authority control that allows the user to do a comprehensive search and retrieve all the records that are relevant to the query keyword. In this paper we propose an approach that allows the user to query an Arabic audio library using voice. We use a combination of class-based language models and robust interpretation to recognize and identify the spoken keywords. The mechanism uses a Large Vocabulary Continuous Speech Recognition (LVCSR) system to implement the functionality of the authority control system. A series of experiments was performed to assess the accuracy and robustness of the proposed approach: restricted-grammar recognition with semantic interpretation, class-based statistical language models (CB-SLM) with robust interpretation, and generalized CB-SLM. The results have shown that the combination of CB-SLM and robust interpretation provides better accuracy and robustness than traditional grammar-based parsing.
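A class-based statistical language model like the CB-SLM above factors word probabilities through word classes, so that sparse data for individual words (e.g. author names) is pooled at the class level. A minimal sketch, with invented class names and training data:

```python
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Train a class-based bigram model: transitions are estimated between
    word classes and each class emits its member words, so
    P(w | prev) ~= P(class(w) | class(prev)) * P(w | class(w))."""
    transitions, histories = Counter(), Counter()
    emissions, class_counts = Counter(), Counter()
    for sent in sentences:
        classes = [word2class[w] for w in sent]
        for w, c in zip(sent, classes):
            emissions[(c, w)] += 1
            class_counts[c] += 1
        for a, b in zip(classes, classes[1:]):
            transitions[(a, b)] += 1
            histories[a] += 1
    def prob(prev, word):
        c_prev, c = word2class[prev], word2class[word]
        if histories[c_prev] == 0 or class_counts[c] == 0:
            return 0.0
        return (transitions[(c_prev, c)] / histories[c_prev]
                * emissions[(c, word)] / class_counts[c])
    return prob
```

Because every author name shares one AUTHOR class, an unseen name form inherits the class transition statistics of the names the model was trained on, which is what makes the approach attractive for authority control over spoken queries.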