421 research outputs found

    User modeling for exploratory search on the Social Web. Exploiting social bookmarking systems for user model extraction, evaluation and integration

    Exploratory search is an information seeking strategy that extends beyond the query-and-response paradigm of traditional Information Retrieval models. Users browse through information to discover novel content and to learn more about the newly discovered things. Social bookmarking systems integrate well with exploratory search, because they allow one to search, browse, and filter social bookmarks. Our contribution is an exploratory tag search engine that merges social bookmarking with exploratory search. For this purpose, we have applied collaborative filtering to recommend tags to users. User models are an important prerequisite for recommender systems. We have produced a method to algorithmically extract user models from folksonomies, and an evaluation method to measure the viability of these user models for exploratory search. According to our evaluation, web-scale user modeling, which integrates user models from various services across the Social Web, can improve exploratory search. Within this thesis we also provide a method for user model integration. Our exploratory tag search engine implements the findings of our user model extraction, evaluation, and integration methods. It facilitates exploratory search on social bookmarks from Delicious and Connotea and publishes extracted user models as Linked Data.
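As a rough illustration of what recommending tags by collaborative filtering can look like, here is a minimal user-based sketch. It is not the thesis's method: the toy profiles, the cosine similarity, and the names `user_tags`, `cosine`, and `recommend_tags` are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: each user maps to the tags they used and how often.
user_tags = {
    "alice": Counter({"python": 3, "search": 2, "semantic-web": 1}),
    "bob": Counter({"python": 2, "search": 1, "linked-data": 4}),
    "carol": Counter({"cooking": 5, "travel": 2}),
}


def cosine(a, b):
    """Cosine similarity between two sparse tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend_tags(user, profiles, top_n=3):
    """Rank tags the user has not used by similarity-weighted neighbour usage."""
    own = profiles[user]
    scores = Counter()
    for other, tags in profiles.items():
        if other == user:
            continue
        sim = cosine(own, tags)
        if sim == 0.0:
            continue  # users with no tag overlap contribute nothing
        for tag, count in tags.items():
            if tag not in own:
                scores[tag] += sim * count
    return [tag for tag, _ in scores.most_common(top_n)]


recommendations = recommend_tags("alice", user_tags)
```

In this toy data only "bob" shares tags with "alice", so the only candidate tag comes from his profile.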

    Mining User Behavior in Social Environments

    The growth of Web 2.0 has led to widespread use of social media systems and an increasing number of active users. As a result, each user interacts with too many other users and is overwhelmed by a huge amount of content, leading to the well-known “social interaction overload” problem. To address this problem, several research communities study Social Recommender Systems, information filtering systems that operate in the social media domain and aim to suggest items that users are likely to find interesting. Social Recommender Systems usually filter content by exploiting the social graph or by mining user content. Since the social domain is characterized by continuous, rapid growth in the amount of content and the number of users, both approaches struggle to produce accurate and up-to-date recommendations. This PhD thesis proposes social recommendation approaches based on mining user behavior, i.e., on exploiting the activity of users in social environments, in order to produce accurate and up-to-date recommendations.

    Evolution of the Field of Social Media Research through Science Maps (2008-2017)

    The objectives of this work were to discover the main points of interest in the field of research on Social Media, within the scientific area of Communication, and to analyse how it has evolved. A methodology based on the analysis of co-words and visualisation techniques was applied. The data was obtained from scientific publications indexed in the Web of Science (WoS) database, during the periods 2008-2012 and 2013-2017. The resulting maps showed that, during the period 2008-2012, the main areas of interest were web 2.0 and the internet in terms of social networking sites. However, during the period 2013-2017, there was a strong upward trend in the impact of social networks and platforms, especially Twitter and Facebook, in many areas (such as social movements, public relations and publicity, distribution of content, crisis communication, participatory journalism, political communication, or the configuration of public identities through social platforms, with special emphasis on youth). Finally, new scientific challenges were found in automatic analysis of content and management of big data. In conclusion, it was possible to transform a complex, underlying, dynamic and multidimensional reality into visible representations that could help experts in the field to better understand the evolution of research on Social Media
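The co-word analysis mentioned in this abstract rests on counting how often keywords appear together in the same publication. The sketch below shows only that counting step for two periods. The records, keywords, and the function name `coword_counts` are hypothetical and do not come from the study's Web of Science data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: (publication year, author keywords).
records = [
    (2010, ["web 2.0", "internet", "social networking sites"]),
    (2011, ["web 2.0", "internet"]),
    (2015, ["twitter", "political communication"]),
    (2016, ["twitter", "facebook", "political communication"]),
]


def coword_counts(records, start, end):
    """Count keyword pairs that co-occur in publications from start..end (inclusive)."""
    counts = Counter()
    for year, keywords in records:
        if start <= year <= end:
            # Sort so that each unordered pair has exactly one key.
            for pair in combinations(sorted(set(keywords)), 2):
                counts[pair] += 1
    return counts


early = coword_counts(records, 2008, 2012)
late = coword_counts(records, 2013, 2017)
```

A science map would then be drawn from such pair counts, for example by treating keywords as nodes and co-occurrence counts as edge weights.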

    Community Detection in Hypergraphs

    Many datasets can be interpreted as graphs, i.e. as elements (nodes) and binary relations between them (edges). Under the label of complex network analysis, a vast array of graph-based methods allows the exploration of datasets purely based on such structural properties. Community detection, as a subfield of network analysis, aims to identify well-connected subparts of graphs. While the grouping of related elements is useful in itself, these groups can furthermore be collapsed into single nodes, creating a new graph of reduced complexity which may better reveal the original graph's macrostructure. Therefore, advances in community detection improve the understanding of complex networks in general. However, not every dataset can be modelled properly with binary relations - higher-order relations give rise to so-called hypergraphs. This thesis explores the generalization of community detection approaches to hypergraphs. The focus is on social bookmarking datasets, created by users of online bookmarking services who assign freely chosen keywords, so-called "tags", to documents. This "tagging" creates, for each tag assignment, a ternary connection between the user, the document, and the tag, inducing particular structures called 3-partite, 3-uniform hypergraphs (henceforth called 3,3- or more generally k,k-hypergraphs). The question pursued here is how to decompose these structures in a formally adequate manner, and how this improves the understanding of these rich datasets. First, a generalization of connected components to k,k-hypergraphs is proposed. The standard definition of connected components here rather uninformatively assigns almost all elements to a single giant component. The generalized so-called hyperincident connected components, however, show a characteristic size distribution on the social bookmarking datasets that is disrupted by, e.g., spamming activity - demonstrating a link between behavioural patterns and structural features that is further explored in the following. Next, the general topic of community detection in k,k-hypergraphs is introduced. Three challenges are posited that are not met by the naive application of standard techniques, and three families of synthetic hypergraphs are introduced containing increasingly complex community setups that a successful detection approach must be able to identify. The main methodological contribution of this thesis is the development of a multi-partite (i.e. suitable for k,k-hypergraphs) community detection algorithm. It is based on modularity optimization, a well-established technique to detect communities in non-partite, i.e. "normal" graphs. Starting from the simplest approach possible, the method is successively refined to meet the previously defined as well as empirically encountered challenges, culminating in the definition of the "balanced multi-partite modularity". Finally, an interactive tool for exploring the obtained community assignments is introduced. Using this tool, the benefits of balanced multi-partite modularity can be shown: Intricate patterns can be observed that are missed by the simpler approaches. These findings are confirmed by a more quantitative examination: Unsupervised quality measures considering, e.g., compression document the advantages of this approach on a larger number of samples. To conclude, the contributions of this thesis are twofold. It provides practical tools for the analysis of social bookmarking data, complemented with theoretical contributions, the generalization of connected components and modularity from graphs to k,k-hypergraphs.
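The tagging structure described in this abstract can be sketched as a set of (user, document, tag) triples, each acting as one hyperedge of a 3-partite, 3-uniform hypergraph. The example below computes only the standard connected components, the baseline the abstract calls uninformative. It does not attempt the hyperincident variant, whose definition is not given here. The sample triples and the function name `connected_components` are hypothetical.

```python
# Hypothetical tag assignments: each (user, document, tag) triple is one hyperedge.
assignments = [
    ("u1", "d1", "t1"),
    ("u1", "d2", "t2"),
    ("u2", "d2", "t3"),
    ("u3", "d3", "t4"),
]


def connected_components(hyperedges):
    """Standard connected components: nodes sharing a hyperedge are connected."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for user, doc, tag in hyperedges:
        # Prefix each node with its partition so identifiers cannot collide.
        u, d, t = ("user", user), ("doc", doc), ("tag", tag)
        union(u, d)
        union(d, t)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)


components = connected_components(assignments)
sizes = [len(c) for c in components]
```

Even in this tiny example a single shared user or document is enough to merge hyperedges, which hints at why real datasets collapse into one giant component under the standard definition.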

    User Profiling for Personalized Search and Partnership Selection

    Doctoral thesis, Seoul National University Graduate School, Department of Dental Science, February 2014. 김홍기.

    "The secret of change is to focus all of your energy not on fighting the old, but on building the new." - Socrates

    The automatic identification of user intention is an important but highly challenging research problem whose solution can greatly benefit information systems. In this thesis, I look at the problem of identifying sources of user interests, extracting latent semantics from them, and modelling them as a user profile. I present algorithms that automatically infer user interests and extract hidden semantics from them, specifically aimed at improving personalized search. I also present a methodology to model a user profile as a buyer profile or a seller profile, where the attributes of the profile are populated from a controlled vocabulary. The buyer profiles and seller profiles are used in partnership match. In the domain of personalized search, first, a novel method to construct a profile of user interests is proposed, based on mining anchor text. Second, two methods are proposed to build a user profile that gathers terms from a folksonomy system, where a matrix factorization technique is explored to discover hidden relationships between them. The objective of the methods is to discover latent relationships between terms such that contextually, semantically, and syntactically related terms can be grouped together, thus disambiguating the context of term usage. The profile of user interests is also analysed to judge its clustering tendency and clustering accuracy. Extensive evaluation indicates that a profile of user interests that can correctly or precisely disambiguate the context of a user query has a significant impact on personalized search quality. In the domain of partnership match, an ontology termed the partnership ontology is proposed. The attributes or concepts in the partnership ontology are features representing the context of work. It is used by users to lay down their requirements as buyer profiles or seller profiles. A semantic similarity measure is defined to compute a ranked list of matching seller profiles for a given buyer profile.

    Contents:
    1 Introduction
      1.1 User Profiling for Personalized Search
        1.1.1 Motivation
        1.1.2 Research Problems
      1.2 User Profiling for Partnership Match
        1.2.1 Motivation
        1.2.2 Research Problems
      1.3 Contributions
      1.4 System Architecture - Personalized Search
      1.5 System Architecture - Partnership Match
      1.6 Organization of this Dissertation
    2 Background
      2.1 Introduction to Social Web
      2.2 Matrix Decomposition Methods
      2.3 User Interest Profile For Personalized Web Search Non Folksonomy based
      2.4 User Interest Profile for Personalized Web Search Folksonomy based
      2.5 Personalized Search
      2.6 Partnership Match
    3 Mining anchor text for building User Interest Profile: A non-folksonomy based personalized search
      3.1 Exclusively Yours'
        3.1.1 Infer User Interests
        3.1.2 Weight Computation
        3.1.3 Query Expansion
      3.2 Exclusively Yours' Algorithm
      3.3 Experiments
        3.3.1 DataSet
        3.3.2 Evaluation Metrics
        3.3.3 User Profile Efficacy
        3.3.4 Personalized vs. Non-Personalized Results
      3.4 Conclusions
    4 Matrix factorization for building Clustered User Interest Profile: A folksonomy based personalized search
      4.1 Aggregating tags from user search history
      4.2 Latent Semantics in UIP
        4.2.1 Computing the tag-tag Similarity matrix
        4.2.2 Tag Clustering to generate svdCUIP and modSvdCUIP
      4.3 Personalized Search
      4.4 Experimental Evaluation
        4.4.1 Data Set and Experiment Methodology
          4.4.1.1 Custom Data Set and Evaluation Metrics
          4.4.1.2 AOL Query Data Set and Evaluation Metrics
          4.4.1.3 Experiment set up to estimate the value of k and d
          4.4.1.4 Experiment set up to compare the proposed approaches with other approaches
        4.4.2 Experiment Results
          4.4.2.1 Clustering Tendency
          4.4.2.2 Determining the value for dimension parameter, k, for the Custom Data Set
          4.4.2.3 Determining the value of distinctness parameter, d, for the Custom data set
          4.4.2.4 CUIP visualization
          4.4.2.5 Determining the value of the dimension reduction parameter k for the AOL data set
          4.4.2.6 Determining the value of distinctness parameter, d, for the AOL data set
          4.4.2.7 Time to generate svdCUIP and modSvdCUIP
          4.4.2.8 Comparison of the svdCUIP, modSvdCUIP, and tfIdfCUIP for different classes of queries
          4.4.2.9 Comparing all five methods - Improvement
        4.4.3 Discussion
    5 User Profiling for Partnership Match
      5.1 Supplier Selection
      5.2 Criteria for Partnership Establishment
      5.3 Partnership Ontology
      5.4 Case Study
        5.4.1 Buyer Profile and Seller Profile
        5.4.2 Semantic Similarity Measure
      5.5 Discussion
      5.6 Conclusions
    6 Conclusion
      6.1 Future Work
        6.1.1 Degree of Personalization
        6.1.2 Filter Bubble
        6.1.3 IPR issues in Partnership Match
    Bibliography
    Appendices
      .1 Pairs of Query and target URL
      .2 Examples of Expanded Queries
      .3 An example of svdCUIP, modSvdCUIP, tfIdfCUIP

    Network analysis of shared interests represented by social bookmarking behaviors

    Social bookmarking is a new phenomenon characterized by a number of features including active user participation, open and collective discovery of resources, and user-generated metadata. Among these, this study pays particular attention to its position at the intersection of personal information space and social information space. While users of a social bookmarking site create and maintain their own bookmark collections, the users' personal information spaces, in aggregate, build up the information space of the site as a whole. The overall goal of this study is to understand how social information space may emerge when personal information spaces of users intersect and overlap with shared interests. The main purpose of the study is two-fold: first, to see whether and how we can identify shared interest space(s) within the general information space of a social bookmarking site; and second, to evaluate the applicability of social network analysis to this end. Delicious.com, one of the most successful instances of social bookmarking, was chosen as the case. The study was carried out in three phases asking separate yet interrelated questions concerning the overall level of interest overlap, the structural patterns in the network of users connected by shared interests, and the communities of interest within the network. The results indicate that, while individual users of delicious.com have a broad range of diverse interests, there is a considerable level of overlap and commonality, providing a ground for creating implicit networks of users with shared interests. The networks constructed based on common bookmarks revealed intriguing structural patterns commonly found in well-established social systems, including a core-periphery structure with a high level of connectivity, which forms a basis for efficient information sharing and knowledge transfer. Furthermore, an exploratory analysis of the network communities showed that each community has a distinct theme defining the shared interests of its members, at a high level of coherence. Overall, the results suggest that networks of people with shared interests can be induced from their social bookmarking behaviors and such networks can provide a venue for investigating social mechanisms of information sharing in this new information environment. Future research can build upon the methods and findings of this study to further explore the implications of the emergent and implicit network of shared interests.
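A network of users connected by common bookmarks, as described in this abstract, can be sketched by linking two users whenever their bookmark sets overlap. This is an illustrative assumption about the construction, not the study's actual procedure. The sample data, the `min_shared` threshold, and the function name `shared_interest_network` are hypothetical.

```python
from itertools import combinations

# Hypothetical bookmark collections: user -> set of bookmarked URLs.
bookmarks = {
    "ann": {"url1", "url2", "url3"},
    "ben": {"url2", "url3"},
    "cat": {"url3", "url4"},
    "dan": {"url5"},
}


def shared_interest_network(bookmarks, min_shared=1):
    """Return weighted edges {(user_a, user_b): number of common bookmarks}."""
    edges = {}
    for a, b in combinations(sorted(bookmarks), 2):
        shared = len(bookmarks[a] & bookmarks[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


edges = shared_interest_network(bookmarks)
# Degree: how many other users each user shares at least one bookmark with.
degree = {user: sum(1 for pair in edges if user in pair) for user in bookmarks}
```

Raising `min_shared` keeps only stronger ties, which is one simple way to trade network density against the strength of the implied shared interest.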