60 research outputs found

    Automating the Discipline Analysis with Latent Dirichlet Allocation: A Case Study on 30 Core Journals of Library and Information Science Published in 2015

    Discipline analysis is an interesting and important research area, especially in the interdisciplinary and multidisciplinary fields of science, such as library and information science (LIS). Discipline analysis helps to identify the current trends and evolution of the research topics and the main methodologies employed within a field of study. In this thesis, discipline analysis is conducted by building a topic model on library and information science articles. The latent Dirichlet allocation (LDA) algorithm is employed in the set of LIS articles, which has been previously classified intellectually by LIS researchers. The thesis aims to compare the LDA model to the result of the intellectual content analysis, previous LDA models of LIS, and the co-citation analysis model of the same data set. The data consists of 1 440 articles and conference papers published in 30 core journals of LIS in 2015. The selection of journals, and the decision to use only titles, abstracts, and keywords in the analysis, are the same as in the intellectual content analysis. Most of the data could be fetched via Scopus API and the rest were downloaded from ProQuest or collected manually from the journals’ homepages. The data preprocessing phase included the correction of errors caused by optical character recognition and XML encoding, the removal of platform-specific metadata, numbers, stopwords, and extra whitespaces, and lemmatization. The data were analysed in R with package topicmodels to perform latent Dirichlet allocation. The quality assessment values of perplexity and topic coherence were calculated with functions from packages topicmodels and topicdoc, respectively. 
The final LDA model consists of 14 topics: Impact Indicators, Education in LIS Studies and Education as LIS Service, Academic Libraries, Information Retrieval, Computation-Assisted Analysis (analysis method), Scientific Collaboration, Public Libraries, Interactive Information Retrieval, Knowledge and Patent Management, Bibliometrics (analysis method), Open Access, Information History, Social Media, and User Behaviour in Digital Environment. The LDA model is of good quality and succeeds in describing the different aspects of LIS well. The model compares well to the content analysis, which was conducted using the same data set, and to previous topic models of LIS. The LDA model outperforms the result of co-citation analysis, which was performed on the same data set, and which selects labels automatically for its clusters from the titles in the data. LDA topic modelling is a suitable method for pursuing discipline analysis. Further development is still recommended to automate the process further, by developing a comprehensive preprocessing framework and especially by implementing high-quality automatic topic labelling for various platforms.
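The preprocessing steps named in the abstract (removal of numbers, stopwords, and extra whitespace) can be sketched as follows. This is a minimal illustration in Python, not the thesis's R pipeline: the stopword list is a tiny assumed sample, the function name `preprocess` is hypothetical, and lemmatization and OCR/XML error correction are left out.

```python
import re

# Tiny illustrative stopword list (an assumption; the thesis does not list its stopwords).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "are", "for", "with"}


def preprocess(text: str) -> list[str]:
    """Lowercase, drop numbers and punctuation, collapse whitespace, remove stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)        # remove numbers
    text = re.sub(r"[^a-z\s-]", " ", text)  # remove punctuation and other symbols
    tokens = (t.strip("-") for t in text.split())  # split() also collapses whitespace
    return [t for t in tokens if t and t not in STOPWORDS]
```

The resulting token lists would then be turned into a document-term matrix before fitting a topic model; that step depends on the modelling library and is not shown here.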

    Predictive Modeling for Navigating Social Media

    Social media changes the way people use the Web. It has transformed ordinary Web users from information consumers to content contributors. One popular form of content contribution is social tagging, in which users assign tags to Web resources. Through the collective efforts of the social tagging community, a new information space has been created for information navigation. Navigation allows serendipitous discovery of information by examining the information objects linked to one another in the social tagging space. In this dissertation, we study prediction tasks that facilitate navigation in social tagging systems. For social tagging systems to meet the complex navigation needs of users, two issues are fundamental, namely link sparseness and object selection. Link sparseness is observed for many resources that are untagged or inadequately tagged, which hinders navigation to those resources. Object selection becomes a concern when a large number of information objects are linked to the current object, so that the more interesting or relevant ones must be selected to guide navigation effectively. This dissertation focuses on three dimensions, namely the semantic, social and temporal dimensions, to address link sparseness and object selection. To address link sparseness, we study the task of tag prediction. This task aims to enrich tags for the untagged or inadequately tagged resources, such that the predicted tags can serve as navigable links to these resources. For this task, we take a topic modeling approach to exploit the latent semantic relationships between resource content and tags. To address object selection, we study the tasks of personalized tag recommendation and trend discovery using social annotations. Personalized tag recommendation leverages the collective wisdom of the social tagging community to recommend tags that are semantically relevant to the target resource, while being tailored to the tagging preferences of individual users. 
For this task, we propose a probabilistic framework which leverages the implicit social links between like-minded users, i.e. users who show similar tagging preferences, to recommend suitable tags. Social tags capture the interest of the users in the annotated resources at different times. These social annotations allow us to construct temporal profiles for the annotated resources. By analyzing these temporal profiles, we unveil the non-trivial temporal trends of the annotated resources, which provide novel metrics for selecting relevant and interesting resources for guiding navigation. For trend discovery using social annotations, we propose a trend discovery process which enables us to analyze trends for a multitude of semantics encapsulated in the temporal profiles of the annotated resources.

    Development of a Course Recommender System for Students

    University students need to find courses that match their interests. Current university registration portals do not fully meet this information need. We propose the development of a recommender system which takes a course name and, based on the description of that course, recommends other courses to students. The recommended course list could help save time and effort for students registering for courses. The proposed system was trained with sample data collected from the course catalog of the University of North Carolina at Chapel Hill. We tested the recommender system with different courses as input and evaluated the resulting recommended courses. Master of Science in Information Science
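A description-based course recommender of the kind outlined above could be sketched with TF-IDF weighting and cosine similarity. The abstract does not say which technique the author used, so this is an assumed, commonly used approach; the function names and the four catalog entries are invented for illustration and are not taken from the UNC course catalog.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a description into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each course name to a {term: tf-idf weight} dictionary."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        vectors[name] = {
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(course, docs, top_n=3):
    """Return the top_n other courses ranked by description similarity."""
    vectors = tfidf_vectors(docs)
    target = vectors[course]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != course]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical catalog entries, invented for illustration only.
CATALOG = {
    "Information Retrieval": "search engines indexing ranking retrieval evaluation",
    "Text Mining": "text classification clustering ranking retrieval",
    "Database Systems": "relational databases sql transactions indexing",
    "Human Computer Interaction": "usability interface design user studies",
}
```

Calling `recommend("Information Retrieval", CATALOG)` returns the other courses as `(name, score)` pairs, ordered from most to least similar description.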

    Modeling and Understanding Communities in Online Social Media using Probabilistic Methods

    The amount of multimedia content is constantly increasing, and people interact with each other and with content on a daily basis through social media systems. The goal of this thesis was to model and understand emerging online communities that revolve around multimedia content, more specifically photos, by using large-scale data and probabilistic models in a quantitative approach. The dissertation makes four contributions. First, using data from two online photo management systems, this thesis examined different aspects of the behavior of users of these systems pertaining to the uploading and sharing of photos with other users and online groups. Second, probabilistic topic models were used to model online entities, such as users and groups of users, and the newly proposed representations were shown to be useful for further understanding such entities, as well as to have practical applications in search and recommendation scenarios. Third, by jointly modeling users from two different social photo systems, it was shown that differences at the level of vocabulary exist, and different sharing behaviors can be observed. Finally, by modeling online user groups as entities in a topic-based model, hyper-communities were discovered in an automatic fashion based on various topic-based representations. These hyper-communities were shown, both through an objective and a subjective evaluation with a number of users, to be generally homogeneous, and therefore likely to constitute a viable exploration technique for online communities.

    Influencing collaboration to enhance knowledge work through serendipity: user-study and design considerations

    We were all strangers to someone at some point, and that is the starting point for analyzing unexpected encounters. The busy pace of life has alienated people from each other, which has created an opportunity for technology to support social experiences. Meeting new people that one would not normally encounter in the vicinity or in the regular social sphere would expand the opportunities for establishing connections: connections that go beyond friendship bonds and extend to finding collaborators for the development of projects. This thesis was developed in order to understand the concept of serendipity in the context of computational systems and how it can be used to facilitate encounters among knowledge workers. The analysis is conceived within the borders of Human-Technology Interaction, using psychological and social approaches from a technological perspective that allows a better understanding of people’s needs when developing tools to support social interactions. The theoretical chapters start by analyzing the phenomenon of serendipity from different perspectives, along with concepts about knowledge work and matchmaking. In order to understand the phenomenon of serendipity, the term is defined from perspectives ranging from the social to the psychological. The purpose of this is to set the basic premises of the study and introduce how serendipity is approached in terms of computational systems and knowledge work. The thesis then analyzes matchmaking and grouping by presenting knowledge networks, social matchmaking for professional purposes, and context awareness. The user study consists of a set of interviews with participants in Demola (an ecosystem that joins students with projects from companies), followed by a comparison of existing tools that support matchmaking. The purpose of the user study was to analyze manual matchmaking among strangers. 
It analyzes participants’ experiences when working with strangers to carry out different innovation projects. It also intends to determine their expectations when forming a group. In addition, the head of Demola Tampere was interviewed to understand the process of manually matching participants. The final chapter presents a set of considerations when designing for serendipity to enhance knowledge work. The conceptualization of serendipity and the user study are the basis for establishing a set of design guidelines, which are intended to enhance matchmaking among knowledge workers by analyzing weak ties as a source of serendipity. This study emphasizes the goals and expectations of users when finding a professional partner. Based on the user study, a model is presented which shows a possible structure for matchmaking.

    Content Recommendation Through Linked Data

    Nowadays, people can easily obtain a huge amount of information from the Web, but often they have no criteria for filtering it. This issue is known as information overload. Recommender systems are software tools that suggest interesting items to users and can help them deal with a vast amount of information. Linked Data is a set of best practices for publishing data on the Web, and it is the basis of the Web of Data, an interconnected global dataspace. This thesis discusses how to discover information useful for the user from the vast amount of structured data, and notably Linked Data, available on the Web. The work addresses this issue by considering three research questions: how to exploit existing relationships between resources published on the Web to provide recommendations to users; how to represent the user and their context to generate better recommendations for the current situation; and how to effectively visualize the recommended resources and their relationships. To address the first question, the thesis proposes a new algorithm based on Linked Data which exploits existing relationships between resources to recommend related resources. The algorithm was integrated into a framework to deploy and evaluate Linked Data based recommendation algorithms, since a related problem is how to compare such algorithms and how to evaluate their performance when applied to a given dataset. The user evaluation showed that our algorithm improves the rate of new recommendations, while maintaining a satisfying prediction accuracy. To represent the user and their context, this thesis presents the Recommender System Context ontology, which is exploited in a new context-aware approach that can be used with existing recommendation algorithms. The evaluation showed that this method can significantly improve the prediction accuracy. 
As regards the problem of effectively visualizing the recommended resources and their relationships, this thesis proposes a visualization framework for DBpedia (the Linked Data version of Wikipedia) and mobile devices, which is designed to be extended to other datasets. In summary, this thesis shows how it is possible to exploit structured data available on the Web to recommend useful resources to users. Linked Data were successfully exploited in recommender systems. Various proposed approaches were implemented and applied to use cases of Telecom Italia.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to ascertain scientifically the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
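For orientation, the traditional autoregressive form of a Granger causality test mentioned in the abstract can be sketched as below. This shows the classical baseline, not the time series classification methods the article evaluates. The test compares a restricted model (y predicted from its own lags) with a full model (y predicted from its own lags plus lags of x) through an F statistic. All function names and the synthetic data generator are assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(x, y, lag=1):
    """F statistic for the hypothesis 'x Granger-causes y' using `lag` lags."""
    times = range(lag, len(y))
    targets = [y[t] for t in times]
    # Restricted model: intercept plus lagged values of y only.
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in times]
    # Full model: the restricted regressors plus lagged values of x.
    full = [row + [x[t - i] for i in range(1, lag + 1)]
            for row, t in zip(restricted, times)]
    rss_r, rss_f = rss(restricted, targets), rss(full, targets)
    n, k_full = len(targets), 2 * lag + 1
    return ((rss_r - rss_f) / lag) / (rss_f / (n - k_full))


def simulate(n=200, seed=0):
    """Synthetic series in which x drives y with a one-step delay (illustration only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, n)]
    return x, y
```

On the synthetic data from `simulate()`, `granger_f(x, y)` should be far larger than `granger_f(y, x)`, reflecting that past values of x help predict y but not the reverse. In practice the statistic would be compared against an F distribution with `lag` and `n - 2*lag - 1` degrees of freedom.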

    Understanding people through the aggregation of their digital footprints

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 160-172). Every day, millions of people encounter strangers online. We read their medical advice, buy their products, and ask them out on dates. Yet our views of them are very limited; we see individual communication acts rather than the person(s) as a whole. This thesis contends that socially-focused machine learning and visualization of archived digital footprints can improve the capacity of social media to help form impressions of online strangers. Four original designs are presented that each examine the social fabric of a different existing online world. The designs address unique perspectives on the problem of and opportunities offered by online impression formation. The first work, Is Britney Spears Spam?, examines a way of prototyping strangers on first contact by modeling their past behaviors across a social network. Landscape of Words identifies cultural and topical trends in large online publics. Personas is a data portrait that characterizes individuals by collating heterogeneous textual artifacts. The final design, Defuse, navigates and visualizes virtual crowds using metrics grounded in sociology. A reflection on these experimental endeavors is also presented, including a formalization of the problem and considerations for future research. A meta-critique by a panel of domain experts completes the discussion. By Aaron Robert Zinman. Ph.D.