6 research outputs found

    The Físchlár-News-Stories system: personalised access to an archive of TV news

    The “Físchlár” systems are a family of tools for the capture, analysis, indexing, browsing, searching and summarisation of digital video information. Físchlár-News-Stories, described in this paper, is one of those systems, and provides access to a growing archive of broadcast TV news. Físchlár-News-Stories has several notable features: it automatically records TV news and segments a broadcast news program into stories, eliminating advertisements and the credits at the start and end of the broadcast. It supports access to individual stories via calendar lookup, text search through closed captions, automatically generated links between related stories, and personalised access through a recommender system based on collaborative filtering. Individual news stories can be accessed either by browsing keyframes with synchronised closed captions or by playback of the recorded video. One strength of the Físchlár-News-Stories system is that it is actually used, daily and in practice, to access news. Several aspects of the Físchlár systems have been published before, but in this paper we give a summary of the Físchlár-News-Stories system in operation, following a scenario in which it is used and outlining how the underlying system realises the functions it offers.
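    The abstract notes that personalised access is driven by collaborative filtering. As a rough illustration of that general idea only (not the Físchlár implementation; the function names, the Jaccard weighting, and the toy data below are all assumptions), a user-based scheme can rank stories a user has not seen by how strongly similar users viewed them:

```python
# Minimal user-based collaborative filtering over story-view histories.
# Illustrative sketch only; not the Fischlar-News-Stories implementation.
from collections import defaultdict

def recommend(views: dict[str, set[str]], user: str, k: int = 5) -> list[str]:
    """Rank unseen stories for `user` by viewing behaviour of similar users."""
    seen = views[user]
    scores: dict[str, float] = defaultdict(float)
    for other, other_seen in views.items():
        if other == user:
            continue
        # Jaccard similarity between the two users' viewing histories.
        overlap = len(seen & other_seen)
        union = len(seen | other_seen)
        if union == 0 or overlap == 0:
            continue
        sim = overlap / union
        # Stories the similar user saw but this user has not, weighted by sim.
        for story in other_seen - seen:
            scores[story] += sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

views = {
    "alice": {"s1", "s2", "s3"},
    "bob":   {"s2", "s3", "s4"},
    "carol": {"s1", "s4", "s5"},
}
print(recommend(views, "alice"))  # ['s4', 's5']
```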

    Feature Reduction for Product Recommendation in Internet Shopping Malls

    One of the widely used methods for product recommendation in Internet shopping malls is matching product features against customers’ profiles. In this method, choosing a suitable set of features is very important for recommendation efficiency and performance, yet this choice has not been rigorously researched so far. In this paper, we build a data set collected from a virtual Internet shopping experiment, and we adapt and apply feature reduction techniques from the pattern matching and information retrieval fields to the data to analyze recommendation performance. The analysis shows that, among the applied methods, SVD (Singular Value Decomposition) performs best for recommendation.
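    To make the SVD-based reduction concrete, here is a minimal truncated-SVD sketch in Python/NumPy; the matrix shape, the toy data, and the choice of k are assumptions for illustration, not the paper's experimental setup:

```python
# Sketch of SVD-based feature reduction for a customer x product-feature
# matrix; illustrative only, not the paper's exact configuration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 10))          # 20 customers x 10 product features (toy data)

# Truncated SVD: keep only the k strongest singular directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
X_reduced = U[:, :k] * s[:k]      # customers embedded in a k-dim latent space

# New profiles (or products) can be projected into the same space via
# Vt[:k].T and matched against customers by cosine similarity.
print(X_reduced.shape)            # (20, 3)
```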

    Supporting Learning by Tracing Personal Knowledge Formation

    Internet-based and mobile technologies enable new ways of learning. They offer new possibilities to access an enormous amount of knowledge at any time and everywhere. Alongside their many advantages, these technologies require a rethinking of our previous learning behaviour patterns and processes. The challenge for students is no longer to get access to information and knowledge, but to select the right information and to cope with information and knowledge overflow. The aim of this research is to define, design and validate an advanced concept to support contemporary learning processes. To this end, the requirements for a new approach have been assessed, the available solutions from the related area of (personal) Knowledge Management have been investigated, and their weaknesses in the context of learning identified. The identified issues have been substantiated by university students via a quantitative survey. Besides several smaller aspects, knowledge fragmentation and the lack of insight into the knowledge formation process were classified as the most critical ones. To overcome these problems, a methodological concept has been developed and a corresponding technological design created. The chosen approach is an intelligent, independent intermediate layer which traces the different steps our knowledge entities go through. Based on personal and individual configurations, the system provides a comprehensive observation of nearly all of our knowledge work activities. It supports building and accessing the knowledge formation paths of every important knowledge unit, later path combination, and access to automatically generated versions of our work. Moreover, it helps users not only to remember what they did, but also gives them strong indications of why they did it. This is achieved by combining different knowledge actions and looking at the influences they have on each other. The suggested concept has been critically evaluated and confirmed via a qualitative expert analysis and backed up by a quantitative survey among university students.
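    One way to picture the tracing layer the abstract describes is as an append-only log of knowledge actions from which a formation path per knowledge unit can be replayed. The sketch below is purely hypothetical; every class, field, and action name is an illustrative assumption, not the design from the thesis:

```python
# Hypothetical sketch of a tracing layer: an append-only log of knowledge
# actions, replayable as a formation path per knowledge entity.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class KnowledgeAction:
    entity_id: str        # which knowledge unit was touched
    action: str           # e.g. "created", "annotated", "merged", "cited"
    source: str | None    # what triggered it (a search, another entity, ...)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class TraceLayer:
    def __init__(self) -> None:
        self._log: list[KnowledgeAction] = []

    def record(self, action: KnowledgeAction) -> None:
        self._log.append(action)

    def formation_path(self, entity_id: str) -> list[KnowledgeAction]:
        """Chronological path of everything that shaped one knowledge unit."""
        return [a for a in self._log if a.entity_id == entity_id]

layer = TraceLayer()
layer.record(KnowledgeAction("note-1", "created", source="web-search"))
layer.record(KnowledgeAction("note-1", "annotated", source="lecture-slides"))
for step in layer.formation_path("note-1"):
    print(step.action, "<-", step.source)
```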

    Intelligent methods for information filtering of research resources

    This thesis presents several content-based methods to address the task of filtering research resources. The explosive growth of the Web in recent decades has led to an important increase in available scientific information. This has contributed to the need for tools which help researchers deal with huge amounts of data. Examples of such tools are digital libraries, dedicated search engines, and personalized information filters. The latter, also known as recommenders, have proved useful for non-academic purposes and in recent years have started to be considered for the recommendation of scholarly resources. This thesis explores new developments in this context. In particular, we focus on two different tasks. First, we explore how to make maximal use of the semi-structured information typically available for research papers, such as keywords, authors, or journal, to assess research paper similarity. This is important since in many cases the full text of the articles is not available and the information used for tasks such as article recommendation is often limited to the abstracts. To exploit all the available information, we propose several methods based on both the vector space model and language modeling. In the first case, we study how the popular combination of tf-idf and cosine similarity can be used not only with the abstract, but also with the keywords and the authors. We also combine the abstract and these extra features by using Explicit Semantic Analysis. In the second case, we estimate separate language models based on each of the features and subsequently interpolate them. Moreover, we employ Latent Dirichlet Allocation (LDA) to discover latent topics which can enrich the models, and we explore how to use the keywords and the authors to improve the performance of the standard LDA algorithm. Next, we study the information available in calls for papers (CFPs) of conferences and exploit it in content-based methods to match users with CFPs. Specifically, we distinguish between textual content, such as the introductory text and the topics in the scope of the conference, and the names of the program committee. This second type of information can be used to retrieve the research papers written by these people, which provides the system with new data about the conference. Moreover, the research papers written by the users are employed to represent their interests. Again, we explore methods based on both the vector space model and language modeling to combine the different types of information. The experimental results indicate that the use of these extra features can lead to significant improvements. In particular, our methods based on interpolation of language models perform well for the task of assessing the similarity between research papers. In contrast, when addressing the problem of filtering CFPs, the methods based on the vector space model are shown to be more robust.
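    As a concrete illustration of the interpolation idea the thesis reports to work well for paper similarity, the sketch below mixes per-field unigram language models with fixed weights; the field names, weights, and epsilon smoothing are illustrative assumptions, not the thesis's exact models:

```python
# Minimal sketch of scoring a query against a paper by interpolating
# per-field unigram language models (abstract, keywords, authors).
# Illustrative assumptions throughout; not the thesis's exact setup.
import math
from collections import Counter

def unigram_lm(text: str) -> tuple[Counter, int]:
    """Term counts and total length of one field's text."""
    tokens = text.lower().split()
    return Counter(tokens), len(tokens)

def interpolated_prob(word: str, fields: dict, weights: dict,
                      eps: float = 1e-6) -> float:
    """P(word) = sum over fields f of lambda_f * P(word | f); eps avoids log(0)."""
    p = 0.0
    for name, (counts, total) in fields.items():
        if total:
            p += weights[name] * counts[word] / total
    return max(p, eps)

def query_log_likelihood(query: str, fields: dict, weights: dict) -> float:
    return sum(math.log(interpolated_prob(w, fields, weights))
               for w in query.lower().split())

paper = {
    "abstract": unigram_lm("recommender systems for research paper filtering"),
    "keywords": unigram_lm("recommendation information filtering"),
    "authors":  unigram_lm("j smith a jones"),
}
weights = {"abstract": 0.6, "keywords": 0.3, "authors": 0.1}
print(query_log_likelihood("paper filtering", paper, weights))
```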