
    Hierarchical Classification and its Application in University Search

    Web search engines have been adopted by most universities for searching webpages in their own domains. Typically, a user sends keywords to the search engine and the search engine returns a flat ranked list of webpages. However, in university search, user queries are usually related to topics, and simple keyword queries are often insufficient to express topics. On the other hand, most e-commerce sites allow users to browse and search products in various hierarchies. It would be ideal if hierarchical browsing and keyword search could be seamlessly combined for university search engines. The main difficulty is to automatically classify and rank a massive number of webpages into the topic hierarchies for universities. In this thesis, we use machine learning and data mining techniques to build a novel hybrid search engine with integrated hierarchies for universities, called SEEU (Search Engine with hiErarchy for Universities). Firstly, we study the problem of effective hierarchical webpage classification. We develop a parallel webpage classification system based on Support Vector Machines. With extensive experiments on the well-known ODP (Open Directory Project) dataset, we empirically demonstrate that our hierarchical classification system is very effective and significantly outperforms the traditional flat classification approaches. Secondly, we study the problem of integrating hierarchical classification into the ranking system of keyword-based search engines. We propose a novel ranking framework, called ERIC (Enhanced Ranking by hIerarchical Classification), for search engines with hierarchies. Experimental results on four large-scale TREC (Text REtrieval Conference) web search datasets show that our ranking system with hierarchical classification significantly outperforms the traditional flat keyword-based search methods.
Thirdly, we propose a novel active learning framework to improve the performance of hierarchical classification, which is important for ranking webpages in hierarchies. From our experiments on the benchmark text datasets, we find that our active learning framework can achieve good classification performance while saving a considerable amount of labeling effort compared with the state-of-the-art active learning methods for hierarchical text classification. Fourthly, based on the proposed classification and ranking methods, we present a novel hierarchical classification framework for mining academic topics from university webpages. We build an academic topic hierarchy based on the commonly accepted Wikipedia academic disciplines. Based on this hierarchy, we train a hierarchical classifier and apply it to mine academic topics. According to our comprehensive analysis, the academic topics mined by our method are reasonable and consistent with the real-world topic distribution in universities. Finally, we combine all the proposed techniques and implement the SEEU search engine. According to two usability studies conducted in the ECE and the CS departments at our university, SEEU is favored by the majority of participants. To conclude, the main contribution of this thesis is a novel search engine, called SEEU, for universities. We discuss the challenges in building SEEU and propose effective machine learning and data mining methods to tackle them. With extensive experiments on well-known benchmark datasets and real-world university webpage datasets, we demonstrate that our system is very effective. In addition, two usability studies of SEEU in our university show that SEEU shows great promise for university search.

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges.

    A Usability Approach to Improving the User Experience in Web Directories

    Submitted for the degree of Doctor of Philosophy, Queen Mary, University of Londo

    A usability approach to improving the user experience in web directories

    Web directories are hierarchically organised website collections that offer users subject-based access to the Web. They played a significant part in navigating the Web in the past, but their role has been weakened in recent years due to their cumbersome expanding collections. This thesis presents a unified framework combining the advantages of personalisation and redefined directory search for improving the usability of Web directories. The thesis begins with an examination of classification schemes that identifies the rigidity of hierarchical classifications and their suitability for Web directories in contrast to faceted classifications. This leads on to an Ontological Sketch Modelling (OSM) case study which identifies the misfits affecting user navigation in Web directories from known rigidity issues. The thesis continues with a review of personalisation techniques and a discussion of the user search model of Web directories following the suggested directions of improvement from the case study. A proposed user-centred framework to improve the usability of Web directories, which consists of an individual content-based personalisation model and a redefined search model, is then implemented as D-Persona and D-Search respectively. The remainder of the thesis is concerned with a usability test of D-Persona and D-Search aimed at discovering the efficiency, effectiveness and user satisfaction of the solution. This involves an experimental design, test results and discussions for the comparative user study. First, this thesis extracts a formal definition of the rigidity of hierarchies from their characteristics and justifies why hierarchies are still better suited than facets in organising Web directories. Second, it identifies misfits causing poor usability in Web directories based on the discovered rigidity of hierarchies. Third, it proposes a solution to tackle the misfits and improve the usability of Web directories, which has been experimentally shown to be successful.

    Comparative Approaches to Interdisciplinary KOSs: Use Cases of Converting UDC to BCC

    We take a small sample of works and compare how these are classified within both the Universal Decimal Classification (UDC) and the Basic Concepts Classification (BCC). We examine notational length, expressivity, network effects, and the number of subject strings. One key finding is that BCC typically synthesizes many more terms than UDC in classifying a particular document, but the length of classificatory notations is roughly equivalent for the two KOSs. BCC captures documents with fewer subject strings (generally one), but these are more complex.

    Design and Evaluation of User Interfaces for Mobile Web Search

    Mobile information retrieval is a continuously growing and diversifying part of everyday information seeking. According to earlier research, however, better user interface solutions are needed to support Web search on mobile devices. In this dissertation research, two new search interfaces were designed and implemented, and they were evaluated in user studies. The first interface is based on grouping search results into categories according to the keywords that occur in them. The user study results show that categorization can support mobile users' exploratory search. The second interface provides, alongside the search results, an overview of where the query phrase occurs in the result documents. Although using the method requires learning, user evaluations show that it can help users skip poor results, especially when the other information describing a result is unclear. In addition, the dissertation studied the information needs of active mobile Internet users in order to understand how Web search is used. According to the results, searching and browsing the Web are among these users' most important ways of acquiring information. They are used to address information needs as soon as they arise, whether the user is at home, on the move, or in a social situation. Mobile information seeking is strongly tied to the context of use, which should be taken into account in the design of search interfaces. Future search interfaces could, for example, support information seeking by making use of information about the user's location and activities. The growing role of informal and exploratory information needs also poses new challenges for interaction design.
Mobile Web search is a rapidly growing information seeking activity employed across different locations, situations, and activities. Current mobile search interfaces are based on the ranked result list, dominant in desktop interfaces.
Research suggests that new paradigms are needed for better support of mobile searchers. For this dissertation, two such novel search interface techniques were designed, implemented, and evaluated. The first method, a clustering search interface that presents a category-based overview of the results, was studied both in a task-based experiment in a laboratory setting and in a longitudinal field study wherein it was used to address real information needs. The results indicate that clustering can support exploratory search needs when the searcher has trouble defining the information need, requires an overview of the search topic, or is interested in multiple results related to the same topic. The findings informed design guidelines for category-based search interfaces. How and when categorization is presented in the search interface needs to be carefully considered. Categorization methods should be improved, for better response to diverse information needs. Hybrid approaches employing contextually informed clustering, classification, and faceted browsing may offer the best match for user needs. The second presentation method, a visualization of the occurrences of the user's query phrase in a result document, can be incorporated into the ranked result list as an additional, unobtrusive result descriptor. It allows the searcher to see how often the query phrase appears in the result document, enabling the use of various evaluation strategies to assess the relevance of the results. Several iterations of the visualization were studied with users to form an understanding of the potential of this approach. The results suggest that a novel visualization can be useful in ruling out non-relevant results and can assist when the other result descriptors do not provide for a conclusive relevance assessment.
However, users' familiarity with well-established result descriptors means that users have to learn how to integrate the visualization into their search strategies and reconcile situations in which the visualization is in conflict with other metadata. In addition, the contextual triggers and information behaviors of mobile Internet users were studied, for understanding of the role of Web search as a mobile information seeking activity. The results from this study show that mobile Web search and browsing are important information seeking activities. They are engaged in to resolve emerging information needs as they appear, whether at home, on the go, or in social situations.

    A social media and crowd-sourcing data mining system for crime prevention during and post-crisis situations

    A number of large crisis situations, such as natural disasters, have affected the planet over the last decade. The outcomes of such disasters are catastrophic for the infrastructures of modern societies. Furthermore, after large disasters, societies come face-to-face with important issues, such as the loss of human lives, missing persons and an increase in the crime rate. On many occasions, they seem unprepared to face such issues. This paper aims to present an automated system for the synchronization of the police and Law Enforcement Agencies (LEAs) for the prevention of criminal activities during and after a large crisis situation. The paper presents a review of the literature focusing on the necessity of using data mining in combination with advanced web technologies, such as social media and crowd-sourcing, for the resolution of the problems related to criminal activities arising during and after crisis situations. The paper provides an introduction to examples of different techniques and algorithms used for social media and crowd-sourcing scanning, such as sentiment analysis and link analysis. The main focus of the paper is the ATHENA Crisis Management system. The function of the ATHENA system is based on the use of social media and crowd-sourcing for collecting crisis-related information. The system uses a number of data mining techniques to collect and analyze data from social media for the purpose of crime prevention. A number of conclusions are drawn on the significance of social media and crowd-sourcing data mining techniques for the resolution of problems related to large crisis situations, with emphasis on the ATHENA system.

    Investigating the Usability of a Mobile App for Finding and Exploring Places and Events

    In our two-step field study, we developed and evaluated mobEx, a mobile app for faceted exploration of social media data on Android phones. mobEx unifies the data sources of related commercial apps on the market by retrieving information from various providers. The goal of our study was to find out whether the subjects understood the metaphor of a time-wheel as a novel user interface feature for finding and exploring places and events, and how they used it. In addition, mobEx offers a grid-based navigation menu and a list-based navigation menu for exploring the data. Here, we were interested in gaining some qualitative insights about which type of navigation approach the users prefer when they can choose between them. In this paper, we present the design and a preliminary analysis of the results of our study.