
60 research outputs found

Automating the Discipline Analysis with Latent Dirichlet Allocation: A Case Study on 30 Core Journals of Library and Information Science Published in 2015

Author: Ylikruuvi Kaisa
Publication date: 12/06/2023

Discipline analysis is an interesting and important research area, especially in interdisciplinary and multidisciplinary fields of science such as library and information science (LIS). Discipline analysis helps to identify current trends and the evolution of research topics, as well as the main methodologies employed within a field of study. In this thesis, discipline analysis is conducted by building a topic model on library and information science articles. The latent Dirichlet allocation (LDA) algorithm is applied to a set of LIS articles that had previously been classified intellectually by LIS researchers. The thesis aims to compare the LDA model to the result of the intellectual content analysis, to previous LDA models of LIS, and to the co-citation analysis model of the same data set. The data consist of 1 440 articles and conference papers published in 30 core journals of LIS in 2015. The selection of journals, and the decision to use only titles, abstracts, and keywords in the analysis, are the same as in the intellectual content analysis. Most of the data were fetched via the Scopus API, and the rest were downloaded from ProQuest or collected manually from the journals’ homepages. The preprocessing phase included correcting errors caused by optical character recognition and XML encoding; removing platform-specific metadata, numbers, stopwords, and extra whitespace; and lemmatization. The data were analysed in R with the package topicmodels to perform latent Dirichlet allocation. The quality assessment values of perplexity and topic coherence were calculated with functions from the packages topicmodels and topicdoc, respectively.
The final LDA model consists of 14 topics: Impact Indicators, Education in LIS Studies and Education as LIS Service, Academic Libraries, Information Retrieval, Computation-Assisted Analysis (analysis method), Scientific Collaboration, Public Libraries, Interactive Information Retrieval, Knowledge and Patent Management, Bibliometrics (analysis method), Open Access, Information History, Social Media, and User Behaviour in Digital Environment. The LDA model is of good quality and succeeds in describing the different aspects of LIS well. The model compares well to the content analysis conducted on the same data set and to previous topic models of LIS. The LDA model outperforms the result of the co-citation analysis performed on the same data set, which selects labels for its clusters automatically from the titles in the data. LDA topic modelling is a suitable method for pursuing discipline analysis. Further development is still recommended to automate the process further, by developing a comprehensive preprocessing framework and, especially, by implementing high-quality automatic topic labelling for various platforms.

Trepo - Institutional Repository of Tampere University
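
The LDA thesis in this entry fitted its model in R with topicmodels. As a language-neutral illustration of what such a fit computes, here is a minimal collapsed Gibbs sampler in Python, together with the perplexity measure used for quality assessment. This is a sketch, not the topicmodels implementation (whose LDA function supports variational EM, the default, and Gibbs sampling); the function names, the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions chosen for illustration.

```python
import math
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents by collapsed Gibbs sampling.

    Returns (theta, phi, vocab): per-document topic proportions,
    per-topic word distributions, and the vocabulary order used by phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    # Count tables: document-topic, topic-word, and tokens per topic.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_words for _ in range(n_topics)]
    nk = [0] * n_topics
    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + n_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Point estimates from the final sample.
    theta = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(nkw[k][v] + beta) / (nk[k] + n_words * beta) for v in range(n_words)]
        for k in topics
    ]
    return theta, phi, vocab


def perplexity(docs, theta, phi, vocab):
    """exp(-log-likelihood per token) of docs under the fitted model; lower is better."""
    w_id = {w: i for i, w in enumerate(vocab)}
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][w_id[w]] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Hypothetical toy corpus standing in for preprocessed titles and abstracts.
docs = [
    "library user service library catalog".split(),
    "citation impact indicator citation journal".split(),
    "library catalog service user".split(),
    "impact indicator journal citation".split(),
]
theta, phi, vocab = lda_gibbs(docs, n_topics=2)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print(f"topic {k}:", [vocab[v] for v in top])
print("perplexity:", round(perplexity(docs, theta, phi, vocab), 2))
```

In practice perplexity is usually computed on held-out documents rather than the training corpus, and the thesis additionally assessed topic coherence with topicdoc, which this sketch does not reproduce.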

Predictive Modeling for Navigating Social Media

Author: HU Meiqun
Publication venue: Singapore Management University
Publication date: 01/01/2012

Social media changes the way people use the Web. It has transformed ordinary Web users from information consumers into content contributors. One popular form of content contribution is social tagging, in which users assign tags to Web resources. Through the collective efforts of the social tagging community, a new information space has been created for information navigation. Navigation allows serendipitous discovery of information by examining the information objects linked to one another in the social tagging space. In this dissertation, we study prediction tasks that facilitate navigation in social tagging systems. For social tagging systems to meet the complex navigation needs of users, two issues are fundamental: link sparseness and object selection. Link sparseness is observed for many resources that are untagged or inadequately tagged, which hinders navigation to those resources. Object selection becomes a concern when a large number of information objects are linked to the current object, so that the more interesting or relevant ones must be selected to guide navigation effectively. This dissertation focuses on three dimensions, namely the semantic, social, and temporal dimensions, to address link sparseness and object selection. To address link sparseness, we study the task of tag prediction. This task aims to enrich the tags of untagged or inadequately tagged resources, so that the predicted tags can serve as navigable links to these resources. For this task, we take a topic modeling approach to exploit the latent semantic relationships between resource content and tags. To address object selection, we study two tasks: personalized tag recommendation and trend discovery using social annotations. Personalized tag recommendation leverages the collective wisdom of the social tagging community to recommend tags that are semantically relevant to the target resource while being tailored to the tagging preferences of individual users.
For this task, we propose a probabilistic framework that leverages the implicit social links between like-minded users, i.e., users who show similar tagging preferences, to recommend suitable tags. Social tags capture the interest of users in the annotated resources at different times. These social annotations allow us to construct temporal profiles for the annotated resources. By analyzing these temporal profiles, we unveil non-trivial temporal trends of the annotated resources, which provide novel metrics for selecting relevant and interesting resources to guide navigation. For trend discovery using social annotations, we propose a trend discovery process that enables us to analyze trends for a multitude of semantics encapsulated in the temporal profiles of the annotated resources.

Institutional Knowledge at Singapore Management University

ProQuest OAI Repository
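
The personalized tag recommendation idea in this entry, blending tags that fit the resource with tags that like-minded users tend to apply, can be illustrated with a simple neighborhood-based scorer. This is a sketch under assumptions, not the dissertation's probabilistic framework: user similarity here is plain cosine similarity over tag-usage counts, and the `recommend_tags` helper, the mixing weight `lam`, and the toy annotations are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(c * b[t] for t, c in a.items() if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_tags(annotations, user, resource, k=3, lam=0.5):
    """Rank tags for (user, resource) from (user, resource, tag) triples.

    score = lam * relevance to the resource + (1 - lam) * personal preference,
    where preference pools every user's tag usage weighted by cosine
    similarity to the target user (the target user has similarity 1).
    """
    profiles = defaultdict(Counter)   # tag-usage counts per user
    on_resource = Counter()           # tags already assigned to the resource
    for u, r, t in annotations:
        profiles[u][t] += 1
        if r == resource:
            on_resource[t] += 1
    total = sum(on_resource.values()) or 1
    relevance = {t: c / total for t, c in on_resource.items()}

    me = profiles.get(user, Counter())  # .get avoids inserting a new key
    preference = Counter()
    for prof in profiles.values():
        sim = cosine(me, prof)
        n = sum(prof.values())
        for t, c in prof.items():
            preference[t] += sim * c / n
    pref_total = sum(preference.values()) or 1

    candidates = set(relevance) | set(preference)
    score = {
        t: lam * relevance.get(t, 0.0) + (1 - lam) * preference[t] / pref_total
        for t in candidates
    }
    # Highest score first; ties broken alphabetically.
    return sorted(candidates, key=lambda t: (-score[t], t))[:k]


# Hypothetical annotations: alice and bob tag alike, carol does not.
annotations = [
    ("alice", "r1", "python"), ("alice", "r1", "ml"), ("alice", "r2", "python"),
    ("bob", "r1", "python"), ("bob", "r1", "ml"),
    ("bob", "r3", "ml"), ("bob", "r3", "tutorial"),
    ("carol", "r4", "cooking"), ("carol", "r3", "video"),
]
print(recommend_tags(annotations, "alice", "r3"))
```

Setting `lam=1.0` reduces the scorer to the tags already on the resource, while smaller values let the target user's and similar users' habits reorder them.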

Development of a Course Recommender System for Students

Author: Yadav Harish
Publication venue: University of North Carolina at Chapel Hill
Publication date: 01/01/2019

Students at the university have an information need: finding courses that interest them. The current university registration portals do not fully meet this need. We propose the development of a recommender system that can take a course name and, based on the description of that course, recommend other courses to students. The recommended course list could save students time and effort when registering for courses. The proposed system was trained with sample data collected from the course catalog of the University of North Carolina at Chapel Hill. We tested the recommender system with different courses as input and evaluated the resulting recommended courses.

Master of Science in Information Science

Carolina Digital Repository
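
The course recommender in this entry ranks courses by their descriptions, but the abstract does not say which text-similarity method it uses. A common content-based choice is TF-IDF weighting with cosine similarity, and the sketch below assumes that technique; the catalog entries, the stopword list, and the `recommend_courses` helper are hypothetical and not taken from the UNC catalog.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase word tokens with a few stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """TF-IDF vectors (dicts) using the smoothed idf log((1 + n) / (1 + df)) + 1."""
    tokens = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokens for t in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(toks).items()}
        for toks in tokens
    ]


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_courses(course, catalog, k=3):
    """Return the k (name, similarity) pairs most similar to `course`."""
    names = list(catalog)
    vecs = dict(zip(names, tfidf([catalog[name] for name in names])))
    scored = [(name, cosine(vecs[course], vecs[name])) for name in names if name != course]
    return sorted(scored, key=lambda pair: -pair[1])[:k]


# Hypothetical catalog: course name -> description.
catalog = {
    "Information Retrieval": "Search engines, indexing, ranking models and evaluation of retrieval systems.",
    "Text Mining": "Mining text with classification, clustering and ranking models for search.",
    "Database Systems": "Relational databases, SQL queries, indexing and transactions.",
    "Archival Appraisal": "Appraisal of archival records and the selection of collections.",
}
for name, sim in recommend_courses("Information Retrieval", catalog):
    print(f"{name}: {sim:.3f}")
```

Because IDF down-weights words that appear in many descriptions, shared rare terms (here "search", "ranking", "models") count for more than shared common ones.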

Modeling and Understanding Communities in Online Social Media using Probabilistic Methods

Author: Negoescu Radu Andrei
Publication venue: Lausanne, EPFL
Publication date: 27/04/2011

The amount of multimedia content is constantly increasing, and people interact with each other and with content daily through social media systems. The goal of this thesis was to model and understand emerging online communities that revolve around multimedia content, more specifically photos, by using large-scale data and probabilistic models in a quantitative approach. The dissertation makes four contributions. First, using data from two online photo management systems, this thesis examined different aspects of the behavior of users of these systems pertaining to uploading and sharing photos with other users and online groups. Second, probabilistic topic models were used to model online entities, such as users and groups of users, and the newly proposed representations were shown to be useful for further understanding such entities, as well as to have practical applications in search and recommendation scenarios. Third, by jointly modeling users from two different social photo systems, it was shown that differences exist at the level of vocabulary and that different sharing behaviors can be observed. Finally, by modeling online user groups as entities in a topic-based model, hyper-communities were discovered automatically based on various topic-based representations. These hyper-communities were shown, through both an objective evaluation and a subjective evaluation with a number of users, to be generally homogeneous, and therefore likely to constitute a viable technique for exploring online communities.

Infoscience - École polytechnique fédérale de Lausanne

Influencing collaboration to enhance knowledge work through serendipity: user-study and design considerations

Author: Arevalo Arboleda Stephanie Gabriela
Publication date: 16/08/2017

We have all been strangers to someone at some point, and that is the starting point for analyzing unexpected encounters. The busy pace of life has alienated people from each other, which has created an opportunity for technology to support social experiences. Meeting new people whom one would not normally encounter nearby or in one’s regular social sphere would expand the opportunities for establishing connections: connections that go beyond friendship bonds to include finding collaborators for developing projects. This thesis was developed to understand the concept of serendipity in the context of computational systems and how it can be used to facilitate encounters among knowledge workers. The analysis is conceived within the borders of Human-Technology Interaction, using psychological and social approaches from a technological perspective that allows a better understanding of people’s needs when developing tools to support social interactions. The theoretical chapters begin by analyzing the phenomenon of serendipity from different perspectives, along with concepts of knowledge work and matchmaking. To understand the phenomenon of serendipity, the term is defined from perspectives ranging from the social to the psychological. The purpose is to set the basic premises of the study and introduce how serendipity is approached in computational systems and knowledge work. The thesis then analyzes matchmaking and grouping by presenting knowledge networks, social matchmaking for professional purposes, and context awareness. The user study consists of a set of interviews with participants in Demola (an ecosystem that connects students with projects from companies), followed by a comparison of existing tools that support matchmaking. The purpose of the user study was to analyze manual matchmaking among strangers.
It analyzes participants’ experiences when working with strangers to carry out different innovation projects. It also aims to determine participants’ expectations when forming a group. In addition, the head of Demola Tampere was interviewed to understand how participants are matched manually. The final chapter presents a set of considerations for designing for serendipity to enhance knowledge work. The conceptualization of serendipity and the user study form the basis for a set of design guidelines, which aim to enhance matchmaking among knowledge workers by analyzing weak ties as a path to serendipity. This study emphasizes the goals and expectations of users when finding a professional partner. Based on the user study, a model is presented that shows a possible structure for matchmaking.

Trepo - Institutional Repository of Tampere University

Content Recommendation Through Linked Data

Author: Vagliano Iacopo
Publication venue: Politecnico di Torino
Publication date: 01/01/2017

Nowadays, people can easily obtain a huge amount of information from the Web, but they often have no criteria for discerning what is useful. This issue is known as information overload. Recommender systems are software tools that suggest interesting items to users and can help them deal with a vast amount of information. Linked Data is a set of best practices for publishing data on the Web, and it is the basis of the Web of Data, an interconnected global dataspace. This thesis discusses how to discover information useful to the user from the vast amount of structured data, notably Linked Data, available on the Web. The work addresses this issue through three research questions: how to exploit existing relationships between resources published on the Web to provide recommendations to users; how to represent the user and their context to generate better recommendations for the current situation; and how to effectively visualize the recommended resources and their relationships. To address the first question, the thesis proposes a new algorithm based on Linked Data that exploits existing relationships between resources to recommend related resources. The algorithm was integrated into a framework for deploying and evaluating Linked Data based recommendation algorithms, since a related problem is how to compare such algorithms and evaluate their performance on a given dataset. The user evaluation showed that our algorithm improves the rate of new recommendations while maintaining satisfactory prediction accuracy. To represent the user and their context, this thesis presents the Recommender System Context ontology, which is exploited in a new context-aware approach that can be used with existing recommendation algorithms. The evaluation showed that this method can significantly improve prediction accuracy.
Regarding the problem of effectively visualizing the recommended resources and their relationships, this thesis proposes a visualization framework for DBpedia (the Linked Data version of Wikipedia) and mobile devices, which is designed to be extended to other datasets. In summary, this thesis shows how structured data available on the Web can be exploited to recommend useful resources to users. Linked Data were successfully exploited in recommender systems. Various proposed approaches were implemented and applied to Telecom Italia use cases.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino
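
The first research question in this entry, exploiting existing relationships between resources, can be illustrated with a deliberately simple relatedness score over RDF-style triples: two resources are related when they share (predicate, object) pairs or link to each other directly. This is not the algorithm proposed in the thesis; the scoring rule, the `related_resources` helper, and the DBpedia-like toy triples are assumptions for illustration.

```python
from collections import defaultdict


def related_resources(triples, target, k=3):
    """Rank resources described in `triples` by relatedness to `target`.

    score = number of (predicate, object) pairs shared with the target
            + number of direct links between the resource and the target.
    """
    features = defaultdict(set)   # subject -> {(predicate, object)}
    direct = defaultdict(int)     # resource -> links to/from the target
    for s, p, o in triples:
        features[s].add((p, o))
        if s == target:
            direct[o] += 1
        if o == target:
            direct[s] += 1
    target_features = features.get(target, set())
    scores = {}
    for resource, feats in features.items():
        if resource == target:
            continue
        score = len(target_features & feats) + direct.get(resource, 0)
        if score > 0:
            scores[resource] = score
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical DBpedia-like triples (prefixes omitted for brevity).
triples = [
    ("Pulp_Fiction", "director", "Quentin_Tarantino"),
    ("Pulp_Fiction", "genre", "Crime"),
    ("Pulp_Fiction", "starring", "Uma_Thurman"),
    ("Kill_Bill", "director", "Quentin_Tarantino"),
    ("Kill_Bill", "starring", "Uma_Thurman"),
    ("Reservoir_Dogs", "director", "Quentin_Tarantino"),
    ("Reservoir_Dogs", "genre", "Crime"),
    ("Goodfellas", "genre", "Crime"),
    ("Jackie_Brown", "director", "Quentin_Tarantino"),
    ("Quentin_Tarantino", "notableWork", "Pulp_Fiction"),
]
print(related_resources(triples, "Pulp_Fiction", k=5))
```

Only resources that appear as subjects are ranked, so literals and classes reached only as objects are never recommended; weighting predicates by how informative they are would be a natural refinement.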

Analyzing Granger causality in climate data with time series classification methods

Author: Decubber Stijn
Demuzere Matthias
Miralles Diego
Papagiannopoulou Christina
Verhoest Niko
Waegeman Willem
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Attribution studies in climate science aim to ascertain scientifically the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, using traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.

Ghent University Academic Bibliography
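
The traditional procedure that this article takes as its starting point fits autoregressive models with and without the lagged candidate cause and compares them with an F-test. The sketch below shows that bivariate baseline, not the time series classification methods the article evaluates; the lag order, the synthetic series, and the helper names `ols_rss` and `granger_f` are assumptions, and the resulting statistic would still have to be compared with an F(lag, n - k) critical value to reach a decision.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of an ordinary least squares fit (normal equations)."""
    p = len(X[0])
    # Build X^T X and X^T y.
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for j in range(c, p):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    # Back substitution for the coefficients.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2 for row, yi in zip(X, y))


def granger_f(x, y, lag=1):
    """F statistic for 'x Granger-causes y' using `lag` lags of each series."""
    rows = range(lag, len(y))
    target = [y[t] for t in rows]
    # Restricted model: intercept + past y. Unrestricted: also past x.
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in rows]
    unrestricted = [r + [x[t - i] for i in range(1, lag + 1)] for r, t in zip(restricted, rows)]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    n, k_u = len(target), len(unrestricted[0])
    return ((rss_r - rss_u) / lag) / (rss_u / (n - k_u))


# Synthetic example: y depends on the previous value of x, not vice versa.
rng = random.Random(42)
n = 300
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.2 * rng.gauss(0.0, 1.0))
f_xy = granger_f(x, y)  # does x Granger-cause y?
f_yx = granger_f(y, x)  # does y Granger-cause x?
print(f"F(x -> y) = {f_xy:.1f}, F(y -> x) = {f_yx:.2f}")
```

A large F(x -> y) alongside a small F(y -> x) is the asymmetric pattern this test looks for; the article's point is that such linear autoregressive comparisons can be improved on by specialized classifiers.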

Understanding people through the aggregation of their digital footprints

Author: Zinman Aaron Robert
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 160-172).

Every day, millions of people encounter strangers online. We read their medical advice, buy their products, and ask them out on dates. Yet our views of them are very limited; we see individual communication acts rather than the person(s) as a whole. This thesis contends that socially-focused machine learning and visualization of archived digital footprints can improve the capacity of social media to help form impressions of online strangers. Four original designs are presented, each examining the social fabric of a different existing online world. The designs address distinct perspectives on the problems of, and opportunities offered by, online impression formation. The first work, Is Britney Spears Spam?, examines a way of prototyping strangers on first contact by modeling their past behaviors across a social network. Landscape of Words identifies cultural and topical trends in large online publics. Personas is a data portrait that characterizes individuals by collating heterogeneous textual artifacts. The final design, Defuse, navigates and visualizes virtual crowds using metrics grounded in sociology. A reflection on these experimental endeavors is also presented, including a formalization of the problem and considerations for future research. A meta-critique by a panel of domain experts completes the discussion.

DSpace@MIT


Human-Centered Technologies for Inclusive Collection and Analysis of Public-Generated Data

Author: Jasim Mahmood
Publication venue: ScholarWorks@UMass Amherst
Publication date: 15/11/2023

Public engagement platforms such as social media, customer review websites, and public input solicitation efforts, whose popularity has risen meteorically, strive to establish an inclusive environment for the public to share their thoughts, ideas, opinions, and experiences. Many decisions made at a personal, local, or national scale are fueled by data generated by the public. As such, inclusive collection, analysis, sensemaking, and utilization of public-generated data are crucial to support successful decision-making processes. However, people often struggle to engage, participate, and share their opinions due to inaccessibility, the rigidity of traditional public engagement methods, and the lack of options to provide opinions while avoiding potential confrontations. Concurrently, data analysts and decision-makers grapple with the challenges of analyzing, making sense of, and making informed decisions based on public-generated data, challenges that include high dimensionality, the ambiguity of human language, and a lack of tools and techniques catered to their needs. Novel technological interventions are therefore necessary to enable the public to share their input without barriers and to allow decision-makers to capture, forage, peruse, and sublimate public-generated data into concrete and actionable insights. The goal of this dissertation is to demonstrate how human-centered approaches that involve stakeholders in the design, development, and evaluation of tools and techniques can lead to inclusive, effective, and efficient approaches to public-generated data collection and analysis that support informed decision-making. To that end, in this dissertation, I first addressed the challenges of empowering the public to share their opinions by exploring two major opinion-sharing avenues: social media and public consultation.
To learn more about people’s social media experiences and challenges, I built two technology probes and conducted a qualitative exploratory study with 16 participants. This study was followed by an exploration of the challenges of inclusive participation during public consultations such as town halls. Based on a formative study with 66 participants and 20 organizers, I designed and developed CommunityClick to enable reticent attendees to share their opinions silently and anonymously during town halls. Equipped with the knowledge and experience from these works, I designed, developed, and evaluated technologies and methods to facilitate and accelerate informed, data-driven decision-making based on the growing volume of public-generated data. Based on interviews with 14 analysts and decision-makers in the civic domain, I built CommunityPulse, a visual analytics system that facilitates public input analysis by surfacing hidden insights, people’s reflections, and priorities. Leveraging the lessons learned during this work, I created a visual text analytics system that supports serendipitous discovery and balanced analysis of textual data to help make informed decisions. In this work, I contribute an understanding of how people collect and analyze public-generated data to fuel their decisions when they have increased exposure to alternative avenues for opinion-sharing. Through a series of human-centered studies, I highlight the challenges that inhibit inclusivity in opinion sharing and the shortcomings of existing methods that prevent decision-makers from accounting for comprehensive public input that includes marginalized or unpopular opinions. To address these challenges, I designed, developed, and evaluated a collection of interactive systems, including CommunityClick, CommunityPulse, and Serendyze.
Through a rigorous set of evaluation strategies that include creativity sessions, controlled lab studies, in-the-wild deployment, and field experiments, I involved stakeholders in assessing the effectiveness and utility of the built systems. Through the empirical evidence from these studies, I demonstrate how alternative designs for social media could enhance people’s social media experiences and enable them to make new connections with others to share opinions. In addition, I show how CommunityClick can be used to enable reticent attendees at public consultations to share their opinions while avoiding unwanted confrontation, and to allow organizers to capture and account for silent feedback. I highlight how CommunityPulse allowed analysts and decision-makers to examine public input from multiple angles for an accelerated analysis and more informed decision-making. Furthermore, I demonstrate how supporting serendipitous discovery and balanced analysis using Serendyze can lead to more informed data-driven decision-making. I conclude the dissertation with a discussion of future avenues for expanding this research, including the facilitation of multi-user collaborative analysis, the integration of multi-modal signals in the analysis of public-generated data, and potential adoption strategies for decision-support systems designed for inclusive collection and analysis of public-generated data.

ScholarWorks@UMass Amherst