
    Processing and Linking Audio Events in Large Multimedia Archives: The EU inEvent Project

    In the inEvent EU project [1], we aim at structuring, retrieving, and sharing large archives of networked, dynamically changing multimedia recordings, mainly consisting of meetings, videoconferences, and lectures. More specifically, we are developing an integrated system that performs audiovisual processing of multimedia recordings and labels them in terms of interconnected “hyper-events” (a notion inspired by hypertext). Each hyper-event is composed of simpler facets, including audio-video recordings and metadata, which are then easier to search, retrieve and share. In the present paper, we mainly cover the audio processing aspects of the system, including speech recognition, speaker diarization and linking (across recordings), the use of these features for hyper-event indexing and recommendation, and the search portal. We present initial results for feature extraction from lecture recordings using the TED talks. Index Terms: Networked multimedia events; audio processing: speech recognition; speaker diarization and linking; multimedia indexing and searching; hyper-events.

    A survey of data mining techniques for social media analysis

    Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, which makes computational means of analysis pertinent. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades, going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.

    Web information search and sharing :

    Degree system: New; Report number: Kō No. 2735; Degree type: Doctor (Human Sciences); Date conferred: 2009/3/15; Waseda University diploma number: New 493

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather the large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain in other domains. Comment: Knowledge-based System

    Timeout Reached, Session Ends?

    The identification of sessions as a means of understanding user behaviour is a common research area of web usage mining. Different definitions and concepts have been discussed for over 20 years. Research shows that session identification should not be an arbitrary task. There is a questionable tendency towards simplistic mechanical sessions instead of more complex logical segmentations. This dissertation aims to prove how differing session-identification approaches lead to diverging results and interpretations. The overarching research question asks: will different session-identification approaches impact analysis and machine-learning tasks? A comprehensive methodological framework for implementing, comparing and evaluating sessions is given. The dissertation provides implementation guidelines for 135 session-identification approaches, utilizing a complete year (2018) of traffic data from a German price-comparison e-commerce platform. The implementation includes mechanical concepts, logical constructs and the combination of multiple methods. It shows how logical sessions are constructed from user sequences by employing embedding algorithms on interaction logs, taking a novel approach to logical session identification that uses the topical proximity of interactions instead of search queries alone. All approaches are compared and quantitatively described. Their application in three machine-learning tasks (such as recommendation) is intended to show that using different sessions as input data has a marked impact on the outcome. The main contribution of this dissertation is a comprehensive comparison of session-identification algorithms. The research provides a methodology to implement, analyse and compare a wide variety of mechanics, making it possible to better understand user behaviour and the effects of session modelling. The main results show that differently structured input data may drastically change the results of algorithms or analyses.

    A study on the personalization methods of the web

    Search engine personalization is one of several deep personalization methods. Personalization systems that help users find the information they need require contextual and semantic information analysis techniques from the field of data recovery, such as web personalization and the optimization of methods for reaching web pages in a way that is consistent with the needs of each user. What helps address the current problems of search engines and accelerates their performance is a proper framework for finding the correct pattern, taking into account the important items in users' histories. This approach improves the advising process of search engines as well. The aim of this paper is to introduce some methods for improving the process of finding correct patterns and to analyze them. We discuss the basic concepts of web personalization, consider three approaches to web personalization, and evaluate the methods belonging to each of them. Keywords: personalization, search engine, user preferences, data mining method

    Modelling Web Usage in a Changing Environment

    Eiben, A.E. [Promotor]; Kowalczyk, W. [Copromotor]

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as coordinators of national initiatives. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.