26 research outputs found

    A neighborhood-based approach for clustering of linked document collections

    This technical report addresses the problem of automatically structuring linked document collections by using clustering. In contrast to traditional clustering, we study the clustering problem in the light of available link structure information for the data set (e.g., hyperlinks among web documents or co-authorship among bibliographic data entries). Our approach is based on iterative relaxation of cluster assignments, and can be built on top of any clustering algorithm (e.g., k-means or DBSCAN). These techniques yield higher cluster purity and better overall accuracy, and make self-organization more robust. Our comprehensive experiments on three different real-world corpora demonstrate the benefits of our approach.
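
    The abstract gives no algorithmic detail, so the following is only a minimal sketch of what iterative relaxation over link structure might look like, not the authors' method. It assumes a simple neighbor-majority update rule; the function name `relax_assignments`, the `neighbors` adjacency mapping, and the `max_iters` parameter are all hypothetical. The initial labels would come from any base clustering algorithm.

```python
from collections import Counter
from typing import Dict, Hashable, List


def relax_assignments(
    initial: Dict[Hashable, int],
    neighbors: Dict[Hashable, List[Hashable]],
    max_iters: int = 10,
) -> Dict[Hashable, int]:
    """Repeatedly move each document to the cluster most common among
    its linked neighbors (assumed rule; the report may use another)."""
    labels = dict(initial)
    for _ in range(max_iters):
        updated = {}
        for doc, label in labels.items():
            # Count the current cluster labels of linked documents.
            votes = Counter(labels[n] for n in neighbors.get(doc, []) if n in labels)
            # The document's own current label counts as one vote.
            votes[label] += 1
            best = max(votes.values())
            if votes[label] == best:
                # Ties are resolved in favor of the current assignment.
                updated[doc] = label
            else:
                # Deterministic choice among the top-voted clusters.
                updated[doc] = max(votes, key=lambda c: (votes[c], str(c)))
        if updated == labels:
            break  # assignments are stable
        labels = updated
    return labels


# Three mutually linked documents; "c" starts in a different cluster.
links = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
result = relax_assignments({"a": 0, "b": 0, "c": 1}, links)
```

    In this toy graph the outlier "c" is pulled into the cluster of its two neighbors. Updates are applied synchronously, so `max_iters` bounds the loop in case assignments oscillate.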

    Visualising the South Yorkshire floods of ‘07

    This paper describes initial work on developing an information system to gather, process and visualise various multimedia data sources related to the South Yorkshire (UK) floods of 2007. The work is part of the Memoir project, which aims to investigate how technology can help people create and manage long-term personal memories. We are using maps to aggregate multimedia data and to stimulate remembering past events. The paper describes an initial prototype, the challenges faced so far, and planned future work.

    Finding co-solvers on Twitter, with a little help from Linked Data

    In this paper we propose a method for suggesting potential collaborators for solving innovation challenges online, based on their competence, similarity of interests and social proximity with the user. We rely on Linked Data to derive a measure of semantic relatedness that we use to enrich both user profiles and innovation problems with additional relevant topics, thereby improving the performance of co-solver recommendation. We evaluate this approach against state-of-the-art methods for query enrichment based on the distribution of topics in user profiles, and demonstrate its usefulness in recommending collaborators that are both complementary in competence and compatible with the user. Our experiments are grounded in data from the social networking service Twitter.com.

    Construction of Feature Spaces and Meta Methods for the Classification of Web Documents (Konstruktion von Featureräumen und Metaverfahren zur Klassifikation von Webdokumenten)

    This paper deals with the automatic classification of web documents into a given taxonomy. We consider vector-based machine learning methods, using SVMs (Support Vector Machines) as an example. We describe ways of generating feature vectors for such methods that take the particular characteristics of web documents into account. We also investigate the computation of meta results from the partial classification results.

    Using Restrictive Classification and Meta Classification for Junk Elimination

    This paper addresses the problem of performing supervised classification on document collections that also contain junk documents. By junk documents we mean documents that do not belong to the topic categories (classes) we are interested in. This type of document typically cannot be covered by the training set; nevertheless, in many real-world applications (e.g. classification of web or intranet content, focused crawling etc.) such documents occur quite often and a classifier has to make a decision about them. We tackle this problem by using restrictive methods and ensemble-based meta methods that may decide to leave out some documents rather than assigning them to inappropriate classes with low confidence. Our experiments with four different data sets show that the proposed techniques can eliminate a relatively large fraction of junk documents while dismissing only a significantly smaller fraction of potentially interesting documents.
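
    The idea of leaving a document out rather than forcing a low-confidence label can be illustrated with a short sketch. This is an illustration under stated assumptions, not the paper's implementation: the confidence scores are assumed to come from some base classifier, and the names `restrictive_label`, `unanimous_meta_label` and the `threshold` parameter are hypothetical. The unanimity rule is one possible ensemble-based meta decision; the paper may use others.

```python
from typing import Dict, List, Optional


def restrictive_label(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the top-scoring class, or None (abstain) when its
    confidence is below the threshold."""
    if not scores:
        return None
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    return label if confidence >= threshold else None


def unanimous_meta_label(votes: List[Optional[str]]) -> Optional[str]:
    """Accept a class only if every base classifier voted for it;
    any abstention or disagreement leaves the document out."""
    if not votes or any(v is None for v in votes):
        return None
    return votes[0] if len(set(votes)) == 1 else None


# A confident document is labeled; an ambiguous one is left out.
confident = restrictive_label({"sports": 0.9, "politics": 0.1}, threshold=0.7)
ambiguous = restrictive_label({"sports": 0.55, "politics": 0.45}, threshold=0.7)
```

    Raising `threshold` discards more junk at the cost of also dismissing more potentially interesting documents, which is the trade-off the abstract describes.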

    A Neighborhood-Based Approach for Clustering of Linked Document Collections

    This paper addresses the problem of automatically structuring linked document collections by using clustering. In contrast to traditional clustering, we study the clustering problem in the light of available link structure information for the data set (e.g., hyperlinks among web documents or co-authorship among bibliographic data entries). Our approach is based on iterative relaxation of cluster assignments, and can be built on top of any clustering algorithm. This technique results in higher cluster purity and better overall accuracy, and makes self-organization more robust.

    Know the Right People? Recommender Systems for Web 2.0.

    Web 2.0 applications like Flickr, YouTube, or Del.icio.us are increasingly popular online communities for creating, editing and sharing content. However, the rapid increase in size of online communities and the availability of large amounts of shared data make discovering relevant content and finding related users a difficult task. Web 2.0 applications provide a rich set of structures and annotations that can be mined for a variety of purposes. In this paper we propose a formal model to characterize users, items, and annotations in Web 2.0 environments. Based on this model we propose recommendation mechanisms using methods from social network analysis, collaborative filtering, and machine learning. Our objective is to construct collaborative recommender systems that predict the utility of items, users or groups based on the multi-dimensional social environment of a given user

    Content redundancy in YouTube and its application to video tagging

    The emergence of large-scale social Web communities has enabled users to share vast amounts of multimedia content online. An analysis of YouTube reveals a high amount of redundancy, in the form of videos with overlapping or duplicated content. We use robust content-based video analysis techniques to detect overlapping sequences between videos. Based on the output of these techniques, we present an in-depth study of duplication and content overlap in YouTube, and analyze various dependencies between content overlap and metadata such as video titles, views, video ratings, and tags. As an application, we show that content-based links provide useful information for generating new tag assignments. We propose different tag propagation methods for automatically obtaining richer video annotations. Experiments on video clustering and classification as well as a user evaluation demonstrate the viability of our approach.

    Combining Text and Linguistic Document Representations for Authorship Attribution

    In this paper, we provide several alternatives to the classical Bag-Of-Words model for automatic authorship attribution. To this end, we consider linguistic and writing style information such as grammatical structures to construct different document representations. Furthermore, we describe two techniques to combine the obtained representations: combination vectors and ensemble-based meta classification. Our experiments show the viability of our approach.

    Verdi, the Virgin, and the censor: the politics of the cult of Mary in I Lombardi alla prima crociata and Giovanna d'Arco

    Christian elements and themes developed rapidly in mid-nineteenth-century Italian culture. This essay concentrates on Giuseppe Verdi's I Lombardi alla prima crociata (1843) and Giovanna d'Arco (1845)—the first nineteenth-century Italian operas to include explicit references to the Virgin Mary. Whereas Giselda's prayer to the Virgin in I Lombardi was allowed by the censors with only one minimal emendation, evidence in the autograph score of Giovanna d'Arco reveals that numerous relevant Marian elements in this opera were modified or suppressed. The different attitude of the Milanese censors toward the two operas, both premiered at La Scala, may at first seem contradictory. Examined in the context of contemporary cultural ramifications of the cult of Mary, however, these works and their censorship acquire new meanings, shedding light on the intersections between religion and politics in the phase of the Risorgimento leading to the revolutions of 1848–49. On the one hand, Giselda's character and her prayer embody the feminine mildness, faith, and passivity characteristic of the Catholic-liberal movement, hardly posing a threat to the political and religious status quo. On the other, the Marian elements in Giovanna d'Arco suggest an appropriation of this religious icon for the purposes of an overtly revolutionary agenda, and thus prompted the intervention of the censors. What may at first seem a straightforward instance of religious censorship bears profound political implications, suggests that the censors operated according to contextual or semantic (rather than merely textual or lexical) criteria, and invites a more nuanced perception of the political meanings of Verdi's works prior to 1848