
    Full Text and Figure Display Improves Bioscience Literature Search

    When reading bioscience journal articles, many researchers focus on the figures and their captions. This observation led to the development of the BioText literature search engine [1], a freely available Web-based application that lets biologists search the contents of Open Access journals and see figures from the articles displayed directly in the search results. This article presents a qualitative assessment of the system in the form of a usability study in which 20 biologists used and commented on it. Nineteen of the 20 participants expressed a desire to use a bioscience literature search engine that displays articles' figures alongside full-text search results. Fifteen of the 20 said they would use a caption search and figure display interface frequently or sometimes, while 4 said rarely and 1 was undecided. Ten of the 20 said they would use a tool for searching the text of tables and their captions frequently or sometimes, while 7 said they would use it rarely if at all, 2 said they would never use it, and 1 was undecided. Supporting the results of an earlier study, this study found evidence that bioscience literature search systems such as PubMed should show figures from articles alongside search results. It also found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users and information needs, explicit search within figure and table captions is useful, although it is not yet clear how to integrate it cleanly into a more general literature search interface. Such a facility supports Open Access publishing efforts, since it requires access to the full text of documents and the lifting of restrictions on showing figures in the search interface.
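To make the idea of searching captions together with the title, abstract, and full text concrete, here is a minimal Python sketch. The `Article` and `Figure` classes and the `search` function are hypothetical illustrations, not BioText's implementation; the matching is a naive "all query terms appear somewhere in the article" rule, and each hit is paired with the figures whose captions mention a query term, as a results page might display them.

```python
from dataclasses import dataclass, field


@dataclass
class Figure:
    figure_id: str
    caption: str


@dataclass
class Article:
    title: str
    abstract: str
    full_text: str
    figures: list[Figure] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lower-cased whitespace tokens with surrounding punctuation stripped."""
    return {tok.strip(".,;:()[]").lower() for tok in text.split()} - {""}


def search(articles: list[Article], query: str) -> list[tuple[Article, list[Figure]]]:
    """Return articles whose title, abstract, full text, or captions contain every
    query term, each paired with the figures whose captions match at least one term."""
    terms = tokenize(query)
    results = []
    for art in articles:
        # Index captions alongside the usual text fields.
        doc_tokens = tokenize(" ".join([art.title, art.abstract, art.full_text]))
        for fig in art.figures:
            doc_tokens |= tokenize(fig.caption)
        if terms <= doc_tokens:
            matching = [f for f in art.figures if terms & tokenize(f.caption)]
            results.append((art, matching))
    return results
```

A real engine would add an inverted index and relevance ranking; the point here is only that captions are searchable fields and that matching figures travel with each hit.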

    Effects of Summary Length and Line Spacing on Fixations, Decision Time, Correctness, and Preference of Search Engine Results on a Phablet

    Previous studies have suggested a positive relationship between the screen size of a mobile device and the preferred summary length of a search result: the bigger the screen, the longer the summary users prefer for judging a result's relevance. While prior research has focused on three types of devices (cell phones, PDAs, laptops), this study concentrates on a newer class of smartphone, the phablet, that could eventually replace all three. We investigated how two factors in the design of search result pages, summary length and line spacing, affect performance, behavioral, and subjective measures on an information-seeking task carried out on a phablet. We examined the effects of summary length (1, 3, 7, and 10 lines) and line spacing (single, one and a half, and double) on fixations, decision time, correctness, and preference. Summary length was directly related to fixations and decision time: as summary length increased, fixations and decision time also increased. No relationship between summary length and decision correctness was found. The optimal summary length for judging the relevance of a search result, in the sense of requiring the fewest fixations and the shortest decision time, is one line. Because participants did not prefer one-line summaries, however, it is best to show three lines, which represent a minimal tradeoff between performance and preference.
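One way to apply the three-line recommendation is to cap each snippet at a fixed number of wrapped lines. The sketch below uses Python's standard `textwrap.wrap`; the function name `truncate_snippet` and the width of 40 characters per line are hypothetical assumptions, since the real line capacity depends on the device, font, and line spacing.

```python
import textwrap


def truncate_snippet(text: str, max_lines: int = 3, width: int = 40) -> str:
    """Wrap a summary to `width` characters per line (an assumed phablet line
    capacity) and keep at most `max_lines` lines, marking truncation with '...'."""
    lines = textwrap.wrap(text, width=width)
    if len(lines) <= max_lines:
        return "\n".join(lines)
    kept = lines[:max_lines]
    last = kept[-1]
    # Make room for the ellipsis so the last visible line still fits the width.
    if len(last) + 3 > width:
        last = last[: width - 3].rstrip()
    kept[-1] = last + "..."
    return "\n".join(kept)
```

For example, `truncate_snippet(long_summary)` returns at most three lines, each no wider than 40 characters; raising `max_lines` to 7 or 10 reproduces the longer conditions compared in the study.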

    Enhanced web-based summary generation for search

    After a user types a query into a major search engine, they are presented with a number of search results. Each result consists of a title, a brief text summary, and a URL, and it is then the user's job to select documents for further review. Our research aims to improve the accuracy with which users select relevant documents by improving the way these web pages are summarized. More accurate selection will in turn save time and improve the user experience. We propose ReClose, a system for generating web document summaries. ReClose generates summary content by combining query-biased and query-independent summarization techniques. Query-biased summaries show query terms in context, whereas query-independent summaries summarize the document as a whole. Combining these techniques led to a 10% improvement in user decision making over Google-generated summaries. Color-coded ReClose summaries show keyword usage depth at a glance and also alert users to topic departures; color-coding further enhanced ReClose and led to a 20% improvement in user decision making over Google-generated summaries. Many online documents include structure and multimedia of various forms, such as tables, lists, forms, and images, and we propose including this structure in web page summaries. With summaries that included structure, the expert user's decision making slowed only insignificantly, while most average users made decisions more quickly, with no decrease in decision accuracy. We also extended ReClose to summarize large numbers of tweets for tracking flu outbreaks in social media. The resulting summaries have variable length and effectively summarize flu-related trends; users of the system achieved an accuracy of 0.86 when labeling multi-tweet summaries.
This shows that the ReClose approach is effective beyond web documents and that variable-length summaries can be more effective than fixed-length ones. Overall, ReClose produces unique summaries that contain more informative content than current search engines provide, highlight the results in a more meaningful way, and add structure where it is meaningful. The applications of ReClose extend beyond search, as demonstrated by summarizing pools of tweets.
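The abstract does not give ReClose's scoring, so the following is only an illustrative Python sketch of the general idea of mixing the two families: each sentence gets a query-biased score (the fraction of query terms it contains) and a query-independent score (how frequent its words are in the whole document), and a weight `alpha` blends them. The function names, the scores, and `alpha` are all hypothetical choices, not the ReClose method.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def hybrid_summary(doc: str, query: str, k: int = 2, alpha: float = 0.5) -> list[str]:
    """Pick k sentences by alpha * query-biased score + (1 - alpha) * query-independent score."""
    sentences = split_sentences(doc)
    doc_tf = Counter(words(doc))
    max_tf = max(doc_tf.values(), default=1)
    q_terms = set(words(query))

    def query_biased(s: str) -> float:
        # Fraction of distinct query terms that appear in the sentence.
        return len(set(words(s)) & q_terms) / len(q_terms) if q_terms else 0.0

    def query_independent(s: str) -> float:
        # Mean document frequency of the sentence's words, normalised to [0, 1].
        toks = words(s)
        return sum(doc_tf[t] for t in toks) / (len(toks) * max_tf) if toks else 0.0

    scored = [(alpha * query_biased(s) + (1 - alpha) * query_independent(s), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda x: -x[0])[:k]
    # Restore document order so the summary reads naturally.
    return [s for _, i, s in sorted(top, key=lambda x: x[1])]
```

Setting `alpha=1.0` gives a purely query-biased snippet and `alpha=0.0` a purely query-independent one; intermediate values trade query terms in context against coverage of the document as a whole.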

    Contextual Understanding of Sequential Data Across Multiple Modalities

    In recent years, progress in computing and networking has made it possible to collect large volumes of data for many applications in data mining and data analytics using machine learning methods. Data may come from different sources and in different forms, depending on their inherent nature and the acquisition process. This dissertation focuses specifically on sequential data, which has grown exponentially in recent years on platforms such as YouTube, social media, and news agency sites. An important characteristic of sequential data is its inherent causal structure, with latent patterns that can be discovered and learned from samples of the dataset. With this in mind, we target problems in two domains, Computer Vision and Natural Language Processing, that deal with sequential data and share its common characteristics. The first is action recognition from video, a fundamental problem in computer vision that aims to find generalized patterns in videos in order to recognize or predict human actions. A video carries two important kinds of information: appearance and motion. These are complementary, so accurate recognition or prediction of actions in video depends significantly on our ability to extract both. Extracting them effectively is non-trivial, however, because of challenges such as viewpoint changes, camera motion, and scale variation. It is thus crucial to design effective, generalized representations of video data that learn these variations or are invariant to them. We propose several deep network models that learn and extract spatio-temporal correlations from video frames to overcome these challenges.
The second problem we study in the context of sequential data analysis is multi-document text summarization. Sentences consist of sequences of words that convey context, and the summarization task requires learning and understanding the contextual information in each sentence in order to determine which subset of sentences best represents a given article. With the progress of deep learning, better word representations have been achieved, leading in turn to better contextual representations of sentences. We propose summarization methods that combine mathematical optimization, Determinantal Point Processes (DPPs), and deep learning models, and that outperform the state of the art in multi-document text summarization.
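The abstract does not say how the DPP is combined with the other components. A common building block in DPP-based extractive summarization is greedy MAP inference over a kernel that multiplies per-sentence quality by pairwise similarity, so that selected sentences are both good and non-redundant. The NumPy sketch below shows that generic technique under stated assumptions: `quality` scores and the `similarity` matrix are taken as given (for example, from a learned scorer and sentence embeddings), and `greedy_dpp_select` is a hypothetical name, not the dissertation's code.

```python
import numpy as np


def greedy_dpp_select(quality: np.ndarray, similarity: np.ndarray, k: int) -> list[int]:
    """Greedy MAP inference for a DPP with L-ensemble kernel L = diag(q) S diag(q).

    Each step adds the sentence that maximises log det(L_Y) of the selected set Y,
    trading off sentence quality against redundancy with sentences already chosen."""
    L = quality[:, None] * similarity * quality[None, :]
    selected: list[int] = []
    for _ in range(min(k, len(quality))):
        best, best_val = None, -np.inf
        for i in range(len(quality)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_val:
                best, best_val = i, logdet
        if best is None:  # no remaining sentence keeps the submatrix positive definite
            break
        selected.append(best)
    return selected
```

With two near-duplicate high-quality sentences and one distinct, slightly weaker sentence, the greedy step picks the best sentence first and then the distinct one rather than its duplicate. Recomputing a determinant at every step is fine for a handful of candidates; larger inputs would call for an incremental (for example Cholesky-based) update.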

    Understanding search behaviour on mobile devices

    Web search on hand-held devices has become enormously common and popular. Although a number of studies have revealed how users interact with search engine result pages (SERPs) on desktop monitors, there are still only a few studies of user interaction in mobile web search, and search results are shown in much the same way on a mobile phone as on a desktop. It is therefore still difficult to know what happens between users and SERPs when searching on small screens, which means the current presentation of SERPs on mobile devices may not be the best. Findings from previous studies, including our earlier work, confirm that search behaviour on touch-enabled mobile devices differs from behaviour on desktop screens, so a different SERP presentation design is needed for mobile devices. In this thesis, we explore several user interactions during search with the aim of improving the search experience on smartphones. First, a notable trend in mobile devices over the last few years has been the enlargement of screen sizes. This led us to look for differences in search behaviour across different sizes of small screen and, if there are any, to suggest better presentation of search results for each screen size. In the first study, we investigated search performance, behaviour, and user satisfaction on three small screens (3.6 inches for early smartphones, 4.7 inches for recent smartphones, and 5.5 inches for phablets). We found no significant differences in the efficiency of carrying out tasks. However, participants exhibited different search behaviours on the small, medium, and large screens: a higher chance of scrolling, with the worst user satisfaction, on the smallest screen; fast information extraction with some hesitation before selecting a link on the medium screen; and fewer eye movements on top links on the largest screen.
These results suggest that the presentation of web search results for each screen size needs to take differences in search behaviour into account. Second, although people are familiar with turning pages horizontally when reading books, vertical scrolling is the standard option available when searching on mobile devices. Following a suggestion from the first study, the second study explored the effect of horizontal and vertical viewport control types (pagination versus scrolling) with the correct answer placed at various positions in mobile web search. Our findings suggest that although users are more familiar with scrolling, participants took less time to find the correct answer with pagination, especially when the relevant result was located beyond the page fold. In addition, participants using scrolling showed less interest in lower-ranked results even when those documents were relevant. Overall, the results indicate that it is worthwhile to provide different viewport controls for better search experiences in mobile web search. Third, snippets occupy the most space in each search result. Results from a previous study suggested that snippet length affects search performance on a desktop monitor, and because of the smaller screen the effect seems to be much larger on smartphones. Building on a SERP presentation design idea from the first study, the third study investigated appropriate snippet lengths on mobile devices. We compared search behaviour with three snippet lengths on mobile SERPs: one line, two to three lines, and six or more lines. With long snippets, participants needed more search time for a particular task type, and the extra time brought no improvement in search accuracy. Our findings suggest that this search performance is related to viewport movements and user attention.
We expect that our approaches provide ways to understand mobile web search behaviour, and that the findings can be applied to a wide range of research areas, such as human-computer interaction, information retrieval, and even social science, to achieve better SERP presentation designs on mobile devices.

    Enhancing knowledge acquisition systems with user generated and crowdsourced resources

    This thesis is about enhancing knowledge acquisition systems with collaborative data and crowdsourced work from the Internet. We propose two strategies and apply them to build effective entity linking and question answering (QA) systems. The first strategy integrates an information extraction system with online collaborative knowledge bases, such as Wikipedia and Freebase. We construct a Cross-Lingual Entity Linking (CLEL) system to connect Chinese entities, such as people and locations, with the corresponding English pages in Wikipedia. The main focus is to break the language barrier between Chinese entities and the English knowledge base (KB) and to resolve the synonymy and polysemy of Chinese entities. To address these problems, we create a cross-lingual taxonomy and a Chinese KB. We investigate two methods of connecting the query representation with the KB representation. Building on the CLEL system we entered in the TAC KBP 2011 evaluation, we then propose a simple and effective generative model that achieved much better performance. The second strategy creates annotations for QA systems with the help of crowdsourcing, that is, distributing a task via the Internet and recruiting many people to complete it simultaneously. Various annotated data are required to train the data-driven statistical machine learning algorithms underlying the components of our QA system. This thesis demonstrates how to convert the annotation task into crowdsourcing micro-tasks, investigates different statistical methods for improving the quality of crowdsourced annotation, and finally uses the improved annotation to train learning-to-rank models for passage ranking in QA.
The subject of this thesis is harnessing both knowledge acquisition systems and collaboratively created data and work from the Internet. Two strategies are proposed and used to build effective entity linking (disambiguation of entity names) and question answering systems. The first strategy is to integrate an information extraction system with collaboratively created online databases. We develop a Cross-Lingual Entity Linking (CLEL) system to link Chinese entities, such as people and places, to the corresponding Wikipedia pages. The main focus is to break the language barrier between Chinese entities and the English database and to resolve the synonymy and polysemy of Chinese entities. To address these problems, we create a cross-lingual taxonomy and a Chinese database. We investigate two methods of connecting the representation of the query with the representation of the database. Finally, we present a simple and effective generative model, based on our system for the TAC KBP 2011 evaluation, that achieved considerably better performance. The second strategy is to create annotations for question answering systems with the help of crowdsourcing. Crowdsourcing means distributing a task via the Internet to a large number of recruited people, who complete it simultaneously. Various annotated data are needed to train the data-driven statistical learning algorithms underlying our question answering system. We show how the annotation task can be converted into crowdsourcing micro-tasks, investigate various statistical methods for improving the quality of crowdsourced annotation, and finally use the improved annotation to train learning-to-rank models for text passages.
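The abstract does not name the statistical methods used to improve crowdsourced annotation quality. As a hedged illustration, here are two standard baselines in Python: plain majority voting, and an iterative accuracy-weighted vote in which each worker's weight is their smoothed agreement rate with the current consensus (a simplified heuristic in the spirit of EM-style aggregation, not full Dawid-Skene). The data layout `labels[item][worker] -> label` and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(labels: dict[str, dict[str, str]]) -> dict[str, str]:
    """labels[item][worker] -> label. Ties go to the label seen first."""
    return {item: Counter(votes.values()).most_common(1)[0][0]
            for item, votes in labels.items()}


def weighted_vote(labels: dict[str, dict[str, str]], n_iter: int = 5) -> dict[str, str]:
    """Re-vote with each worker weighted by their agreement with the current consensus."""
    consensus = majority_vote(labels)
    for _ in range(n_iter):
        agree, total = defaultdict(int), defaultdict(int)
        for item, votes in labels.items():
            for worker, label in votes.items():
                total[worker] += 1
                agree[worker] += label == consensus[item]
        # Laplace smoothing keeps every weight strictly between 0 and 1.
        weight = {w: (agree[w] + 1) / (total[w] + 2) for w in total}
        new = {}
        for item, votes in labels.items():
            score: dict[str, float] = defaultdict(float)
            for worker, label in votes.items():
                score[label] += weight[worker]
            new[item] = max(score, key=score.get)
        if new == consensus:  # converged
            break
        consensus = new
    return consensus
```

The weighted variant helps when workers who are usually wrong happen to form the majority on an item: their low estimated accuracy lets a reliable minority win that item.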

    Supporting Voice-Based Natural Language Interactions for Information Seeking Tasks of Various Complexity

    Natural language interfaces have grown steadily in popularity over the past decade, leading to the ubiquity of digital assistants. These include voice-activated assistants, such as Amazon's Alexa, as well as text-based chatbots that can substitute for a human assistant in business settings (e.g., call centers, retail and banking websites) and at home. The main advantages of such systems are their ease of use and, for voice-activated systems, hands-free interaction. Most tasks that users undertake with commercially available voice-based digital assistants are simple, and the agent's responses are often determined by a rules-based approach. However, such systems have the potential to support users in more complex and involved tasks. In this dissertation, I describe experiments investigating user behaviour when interacting with natural language systems and how improvements in the design of such systems can benefit the user experience. Currently available commercial systems tend to be designed to mimic superficial characteristics of human-to-human conversation. However, interaction with a digital assistant differs significantly from interaction between two people, partly because of limitations of the underlying technology, such as automatic speech recognition and natural language understanding. As computing technology evolves, interactions with digital assistants may come to resemble those between humans. The first part of this thesis explores how users would perceive systems capable of human-level interaction, how they would behave while communicating with such systems, and what new opportunities that behaviour may open up. Even without technology that allows digital assistants to perform at a human level, the digital assistants widely adopted around the world have been found beneficial for a number of use cases.
The second part of this thesis describes user studies aimed at enhancing the functionality of digital assistants with the existing level of technology. In particular, chapter 6 focuses on expanding the amount of information a digital assistant can deliver over a voice-only channel, and chapter 7 explores how expanded capabilities of voice-based digital assistants would benefit people with visual impairments. The experiments presented throughout this dissertation produce a set of design guidelines for existing as well as future digital assistants. The experiments described in chapters 4, 6, and 7 focus on supporting the task of finding information online, while chapter 5 considers guiding a user through a cooking recipe. The design recommendations of this thesis can be grouped into four categories: how naturally users can communicate their thoughts to the system, how understandable the system's responses are to the user, how flexible the system's parameters are, and how diverse the information delivered by the system is.

    Spoken conversational search: audio-only interactive information retrieval

    Speech-based web search, where no keyboard or screen is available to present search engine results, is becoming ubiquitous, mainly through mobile devices and intelligent assistants such as Apple's HomePod, Google Home, or Amazon Alexa. Currently, these intelligent assistants do not maintain a lengthy information exchange: they do not track context, do not present information in a form suited to an audio-only channel, and do not interact with the user in a multi-turn conversation. How users would interact with such an audio-only system in multi-turn information-seeking dialogues, and what they expect from these new systems, remain unexplored in search settings. In particular, knowing how to present search results over an audio-only channel, and which interactions take place in this new search paradigm, is crucial for producing usable systems. Insight into the conversational structure of information-seeking processes therefore gives researchers and developers opportunities to build better systems, while creating a research agenda and directions for future advancements in Spoken Conversational Search (SCS). Such insight has been identified as crucial in the growing SCS area. At present, SCS is understood only to a limited extent: for example, how the components interact, how information should be presented, or how task complexity affects interactivity or discourse behaviour. We aim to address these knowledge gaps. This thesis outlines the breadth of SCS and forms a manifesto advancing this highly interactive search paradigm, with new research directions including prescriptive notions for addressing the identified challenges.
We investigate SCS through quantitative and qualitative designs: (i) log and crowdsourcing experiments investigating different interaction and result presentation styles, and (ii) the creation and analysis of the first SCS dataset and annotation schema, through designing and conducting an observational study of information-seeking dialogues. We propose new research directions and design recommendations based on the triangulation of three datasets and methods: the log analysis, to identify practical challenges and limitations of existing systems and inform our observational study; the crowdsourcing experiment, to validate a new experimental setup for future investigations of search engine results presentation; and the observational study, to establish the SCS dataset (SCSdata), form the first Spoken Conversational Search Annotation Schema (SCoSAS), and study interaction behaviours for different task complexities. Our principal contributions are based on the observational study, for which we developed a novel methodology using a qualitative design. We show that existing information-seeking models may be insufficient for the new SCS search paradigm because they inadequately capture meta-discourse functions and the system's role as an active agent. The results thus indicate that SCS systems have to support the user through discourse functions and be actively involved in the user's search process. This suggests that interactivity between the user and the system is necessary to overcome the increased complexity imposed on both by the constraints of the audio-only communication channel. We then present the first schematic model for SCS, derived from the SCoSAS through qualitative analysis of the SCSdata. In addition, we demonstrate the applicability of our dataset by investigating the effect of task complexity on interaction and discourse behaviour.
Lastly, we present SCS design recommendations and outline new research directions for SCS. The implications of our work are practical, conceptual, and methodological. The practical implications include the development of the SCSdata, the SCoSAS, and the SCS design recommendations. The conceptual implications include the development of a schematic SCS model that identifies the need for increased interactivity and pro-activity to overcome the audio-imposed complexity in SCS. The methodological implications include the development of the crowdsourcing framework and of techniques for developing and analysing SCS datasets. In summary, we believe our findings can guide researchers and developers in improving existing interactive systems that are less constrained, such as mobile search, as well as more constrained systems such as SCS systems.

    Information-seeking processes among primary school children in Australia and Malaysia

    Interest in information behaviour and information seeking has extended to the school context. As the Internet has become one of the most important sources of information supporting primary school children’s learning, children’s information behaviour and information seeking have become key issues that require more in-depth research. The present study was carried out in Malaysia and Australia. It shows how children seek information within a school context, particularly the processes they follow, and identifies the challenges school children face in seeking information from the Internet. To address the research objectives, the research adapted Kuhlthau’s (1993) model of the six stages of the information search process: initiation, selection, exploration, formulation, collection, and presentation. Data were collected in three phases. The first phase was a broad survey (quantitative data) of a primary school population, designed to allow generalisation of results, that identified the usage, knowledge, and challenges of using the Internet in the school setting. The second phase was observation: because the research focused on the information-seeking processes undertaken by primary school children, observation was the best way to carry out the investigation. Children were observed seeking information from the Internet as they performed information-seeking tasks in 20-minute sessions, using three sets of tasks. The observations were aided by a checklist and note taking. The checklist was based on Kuhlthau’s (1993) model of information seeking, and the notes served as a memory aid containing extensive detail from the observations.
The final phase of the research involved interviews (qualitative data) with the teachers, aimed at determining the primary school children’s information behaviour and how they undertook information seeking in the school setting. The research provides an understanding of information-seeking processes among primary school children and makes recommendations for information technology specialists on the design elements of information retrieval systems for primary school children, based on the behavioural and information-seeking approaches used by the children and their teachers. A modified model of information-seeking processes is proposed; the modifications incorporate the use of the Internet in seeking information in the school environment.
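The observation checklist described above can be pictured as a small record per session keyed by Kuhlthau's six stages. The Python sketch below is a hypothetical data model for such a checklist, not the study's instrument: `ObservationChecklist`, its fields, and the note format are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# The six stages of Kuhlthau's (1993) information search process, in order.
KUHLTHAU_STAGES = ("initiation", "selection", "exploration",
                   "formulation", "collection", "presentation")


@dataclass
class ObservationChecklist:
    """One child's 20-minute session: stages observed, plus free-text notes."""
    child_id: str
    task_set: int
    observed: dict[str, bool] = field(
        default_factory=lambda: {stage: False for stage in KUHLTHAU_STAGES})
    notes: list[str] = field(default_factory=list)

    def mark(self, stage: str, note: str = "") -> None:
        """Tick a stage as observed, optionally recording a note about it."""
        if stage not in self.observed:
            raise ValueError(f"unknown stage: {stage!r}")
        self.observed[stage] = True
        if note:
            self.notes.append(f"{stage}: {note}")

    def missing_stages(self) -> list[str]:
        """Stages not observed in this session, in model order."""
        return [s for s in KUHLTHAU_STAGES if not self.observed[s]]
```

Aggregating `missing_stages()` across sessions would show which stages children tend to skip, which is the kind of evidence a modified model of the process could draw on.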