338 research outputs found

    An Ontology-Based Recommender System with an Application to the Star Trek Television Franchise

    Full text link
    Collaborative filtering based recommender systems have proven to be extremely successful in settings where user preference data on items is abundant. However, collaborative filtering algorithms are hindered by their weakness against the item cold-start problem and general lack of interpretability. Ontology-based recommender systems exploit hierarchical organizations of users and items to enhance browsing, recommendation, and profile construction. While ontology-based approaches address the shortcomings of their collaborative filtering counterparts, ontological organizations of items can be difficult to obtain for items that mostly belong to the same category (e.g., television series episodes). In this paper, we present an ontology-based recommender system that integrates the knowledge represented in a large ontology of literary themes to produce fiction content recommendations. The main novelty of this work is an ontology-based method for computing similarities between items and its integration with the classical Item-KNN (K-nearest neighbors) algorithm. As a study case, we evaluated the proposed method against other approaches by performing the classical rating prediction task on a collection of Star Trek television series episodes in an item cold-start scenario. This transverse evaluation provides insights into the utility of different information resources and methods for the initial stages of recommender system development. We found our proposed method to be a convenient alternative to collaborative filtering approaches for collections of mostly similar items, particularly when other content-based approaches are not applicable or otherwise unavailable. Aside from the new methods, this paper contributes a testbed for future research and an online framework to collaboratively extend the ontology of literary themes to cover other narrative content.Comment: 25 pages, 6 figures, 5 tables, minor revision

    An ontology matching approach for semantic modeling: A case study in smart cities

    Get PDF
    This paper investigates the semantic modeling of smart cities and proposes two ontology matching frameworks, called Clustering for Ontology Matching-based Instances (COMI) and Pattern mining for Ontology Matching-based Instances (POMI). The goal is to discover the relevant knowledge by investigating the correlations among smart city data based on clustering and pattern mining approaches. The COMI method first groups the highly correlated ontologies of smart-city data into similar clusters using the generic k-means algorithm. The key idea of this method is that it clusters the instances of each ontology and then matches two ontologies by matching their clusters and the corresponding instances within the clusters. The POMI method studies the correlations among the data properties and selects the most relevant properties for the ontology matching process. To demonstrate the usefulness and accuracy of the COMI and POMI frameworks, several experiments on the DBpedia, Ontology Alignment Evaluation Initiative, and NOAA ontology databases were conducted. The results show that COMI and POMI outperform the state-of-the-art ontology matching models regarding computational cost without losing the quality during the matching process. Furthermore, these results confirm the ability of COMI and POMI to deal with heterogeneous large-scale data in smart-city environments.publishedVersio

    Opportunity Identification for New Product Planning: Ontological Semantic Patent Classification

    Get PDF
    Intelligence tools have been developed and applied widely in many different areas in engineering, business and management. Many commercialized tools for business intelligence are available in the market. However, no practically useful tools for technology intelligence are available at this time, and very little academic research in technology intelligence methods has been conducted to date. Patent databases are the most important data source for technology intelligence tools, but patents inherently contain unstructured data. Consequently, extracting text data from patent databases, converting that data to meaningful information and generating useful knowledge from this information become complex tasks. These tasks are currently being performed very ineffectively, inefficiently and unreliably by human experts. This deficiency is particularly vexing in product planning, where awareness of market needs and technological capabilities is critical for identifying opportunities for new products and services. Total nescience of the text of patents, as well as inadequate, unreliable and untimely knowledge derived from these patents, may consequently result in missed opportunities that could lead to severe competitive disadvantage and potentially catastrophic loss of revenue. The research performed in this dissertation tries to correct the abovementioned deficiency with an approach called patent mining. The research is conducted at Finex, an iron casting company that produces traditional kitchen skillets. To \u27mine\u27 pertinent patents, experts in new product development at Finex modeled one ontology for the required product features and another for the attributes of requisite metallurgical enabling technologies from which new product opportunities for skillets are identified by applying natural language processing, information retrieval, and machine learning (classification) to the text of patents in the USPTO database. Three main scenarios are examined in my research. Regular classification (RC) relies on keywords that are extracted directly from a group of USPTO patents. Ontological classification (OC) relies on keywords that result from an ontology developed by Finex experts, which is evaluated and improved by a panel of external experts. Ontological semantic classification (OSC) uses these ontological keywords and their synonyms, which are extracted from the WordNet database. For each scenario, I evaluate the performance of three classifiers: k-Nearest Neighbor (k-NN), random forest, and Support Vector Machine (SVM). My research shows that OSC is the best scenario and SVM is the best classifier for identifying product planning opportunities, because this combination yields the highest score in metrics that are generally used to measure classification performance in machine learning (e.g., ROC-AUC and F-score). My method also significantly outperforms current practice, because I demonstrate in an experiment that neither the experts at Finex nor the panel of external experts are able to search for and judge relevant patents with any degree of effectiveness, efficiency or reliability. This dissertation provides the rudiments of a theoretical foundation for patent mining, which has yielded a machine learning method that is deployed successfully in a new product planning setting (Finex). Further development of this method could make a significant contribution to management practice by identifying opportunities for new product development that have been missed by the approaches that have been deployed to date

    Theory and Applications for Advanced Text Mining

    Get PDF
    Due to the growth of computer technologies and web technologies, we can easily collect and store large amounts of text data. We can believe that the data include useful knowledge. Text mining techniques have been studied aggressively in order to extract the knowledge from the data since late 1990s. Even if many important techniques have been developed, the text mining research field continues to expand for the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques. They are various techniques from relation extraction to under or less resourced language. I believe that this book will give new knowledge in the text mining field and help many readers open their new research fields

    Incorporating functional inter-relationships into protein function prediction algorithms

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein function prediction. While successful function prediction algorithms have been developed, few previous efforts have utilized more than the protein-to-functional class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of functional classes, but it also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. These inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as a harder to learn distinction between similar GO terms, for standard classification-based approaches.</p> <p>Results</p> <p>We propose a method to enhance the performance of classification-based protein function prediction algorithms by addressing the issue of using these interrelationships between functional classes constituting functional classification schemes. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify and incorporate these inter-relationships into the <it>k</it>-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and prediction of over hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate predictions for a large number of the functional classes considered, and also that the classes benefitted most by this approach are those containing the fewest members. In addition, we show how our proposed framework can be used for integrating information from the entire GO hierarchy for improving the accuracy of predictions made over a set of base classes. Finally, we provide qualitative and quantitative evidence that this incorporation of functional inter-relationships enables the discovery of interesting biology in the form of novel functional annotations for several yeast proteins, such as Sna4, Rtn1 and Lin1.</p> <p>Conclusion</p> <p>We implemented and evaluated a methodology for incorporating interrelationships between functional classes into a standard classification-based protein function prediction algorithm. Our results show that this incorporation can help improve the accuracy of such algorithms, and help uncover novel biology in the form of previously unknown functional annotations. The complete source code, a sample data set and the additional files for this paper are available free of charge for non-commercial use at <url>http://www.cs.umn.edu/vk/gaurav/functionalsimilarity/</url>.</p

    Combining Content with User Preferences for Non-Fiction Multimedia Recommendation: A Study on TED Lectures

    Get PDF
    This paper introduces a new dataset and compares several methods for the recommendation of non-fiction audio visual material, namely lectures from the TED website. The TED dataset contains 1,149 talks and 69,023 profiles of users, who have made more than 100,000 ratings and 200,000 comments. The corresponding metadata, which we make available, can be used for training and testing generic or personalized recommender systems. We define content-based, collaborative, and methods (LSI, LDA, RP, and ESA). We compare these methods on a personalized recommendation task in two settings, a cold-start and a non-cold-start one. In the cold-start setting, semantic vector spaces perform better than keywords. In the non-cold-start setting, where collaborative information can be exploited, content-based methods are outperformed by collaborative filtering ones, but the proposed combined method shows acceptable performances, and can be used in both settings. For the generic recommendation task, LSI and RP again outperform TF-IDF

    Using Data Mining for Facilitating User Contributions in the Social Semantic Web

    Get PDF
    This thesis utilizes recommender systems to aid the user in contributing to the Social Semantic Web. In this work, we propose a framework that maps domain properties to recommendation technologies. Next, we develop novel recommendation algorithms for improving personalized tag recommendation and for recommendation of semantic relations. Finally, we introduce a framework to analyze different types of potential attacks against social tagging systems and evaluate their impact on those systems

    Revisiting Urban Dynamics through Social Urban Data:

    Get PDF
    The study of dynamic spatial and social phenomena in cities has evolved rapidly in the recent years, yielding new insights into urban dynamics. This evolution is strongly related to the emergence of new sources of data for cities (e.g. sensors, mobile phones, online social media etc.), which have potential to capture dimensions of social and geographic systems that are difficult to detect in traditional urban data (e.g. census data). However, as the available sources increase in number, the produced datasets increase in diversity. Besides heterogeneity, emerging social urban data are also characterized by multidimensionality. The latter means that the information they contain may simultaneously address spatial, social, temporal, and topical attributes of people and places. Therefore, integration and geospatial (statistical) analysis of multidimensional data remain a challenge. The question which, then, arises is how to integrate heterogeneous and multidimensional social urban data into the analysis of human activity dynamics in cities? To address the above challenge, this thesis proposes the design of a framework of novel methods and tools for the integration, visualization, and exploratory analysis of large-scale and heterogeneous social urban data to facilitate the understanding of urban dynamics. The research focuses particularly on the spatiotemporal dynamics of human activity in cities, as inferred from different sources of social urban data. The main objective is to provide new means to enable the incorporation of heterogeneous social urban data into city analytics, and to explore the influence of emerging data sources on the understanding of cities and their dynamics.&nbsp; In mitigating the various heterogeneities, a methodology for the transformation of heterogeneous data for cities into multidimensional linked urban data is, therefore, designed. The methodology follows an ontology-based data integration approach and accommodates a variety of semantic (web) and linked data technologies. A use case of data interlinkage is used as a demonstrator of the proposed methodology. The use case employs nine real-world large-scale spatiotemporal data sets from three public transportation organizations, covering the entire public transport network of the city of Athens, Greece. &nbsp;To further encourage the consumption of linked urban data by planners and policy-makers, a set of webbased tools for the visual representation of ontologies and linked data is designed and developed. The tools – comprising the OSMoSys framework – provide graphical user interfaces for the visual representation, browsing, and interactive exploration of both ontologies and linked urban data. &nbsp; After introducing methods and tools for data integration, visual exploration of linked urban data, and derivation of various attributes of people and places from different social urban data, it is examined how they can all be combined into a single platform. To achieve this, a novel web-based system (coined SocialGlass) for the visualization and exploratory analysis of human activity dynamics is designed. The system combines data from various geo-enabled social media (i.e. Twitter, Instagram, Sina Weibo) and LBSNs (i.e. Foursquare), sensor networks (i.e. GPS trackers, Wi-Fi cameras), and conventional socioeconomic urban records, but also has the potential to employ custom datasets from other sources. A real-world case study is used as a demonstrator of the capacities of the proposed web-based system in the study of urban dynamics. The case study explores the potential impact of a city-scale event (i.e. the Amsterdam Light festival 2015) on the activity and movement patterns of different social categories (i.e. residents, non-residents, foreign tourists), as compared to their daily and hourly routines in the periods&nbsp; before and after the event. The aim of the case study is twofold. First, to assess the potential and limitations of the proposed system and, second, to investigate how different sources of social urban data could influence the understanding of urban dynamics. The contribution of this doctoral thesis is the design and development of a framework of novel methods and tools that enables the fusion of heterogeneous multidimensional data for cities. The framework could foster planners, researchers, and policy makers to capitalize on the new possibilities given by emerging social urban data. Having a deep understanding of the spatiotemporal dynamics of cities and, especially of the activity and movement behavior of people, is expected to play a crucial role in addressing the challenges of rapid urbanization. Overall, the framework proposed by this research has potential to open avenues of quantitative explorations of urban dynamics, contributing to the development of a new science of cities

    Revisiting Urban Dynamics through Social Urban Data

    Get PDF
    The study of dynamic spatial and social phenomena in cities has evolved rapidly in the recent years, yielding new insights into urban dynamics. This evolution is strongly related to the emergence of new sources of data for cities (e.g. sensors, mobile phones, online social media etc.), which have potential to capture dimensions of social and geographic systems that are difficult to detect in traditional urban data (e.g. census data). However, as the available sources increase in number, the produced datasets increase in diversity. Besides heterogeneity, emerging social urban data are also characterized by multidimensionality. The latter means that the information they contain may simultaneously address spatial, social, temporal, and topical attributes of people and places. Therefore, integration and geospatial (statistical) analysis of multidimensional data remain a challenge. The question which, then, arises is how to integrate heterogeneous and multidimensional social urban data into the analysis of human activity dynamics in cities? &nbsp;To address the above challenge, this thesis proposes the design of a framework of novel methods and tools for the integration, visualization, and exploratory analysis of large-scale and heterogeneous social urban data to facilitate the understanding of urban dynamics. The research focuses particularly on the spatiotemporal dynamics of human activity in cities, as inferred from different sources of social urban data. The main objective is to provide new means to enable the incorporation of heterogeneous social urban data into city analytics, and to explore the influence of emerging data sources on the understanding of cities and their dynamics. &nbsp;In mitigating the various heterogeneities, a methodology for the transformation of heterogeneous data for cities into multidimensional linked urban data is, therefore, designed. The methodology follows an ontology-based data integration approach and accommodates a variety of semantic (web) and linked data technologies. A use case of data interlinkage is used as a demonstrator of the proposed methodology. The use case employs nine real-world large-scale spatiotemporal data sets from three public transportation organizations, covering the entire public transport network of the city of Athens, Greece. &nbsp;To further encourage the consumption of linked urban data by planners and policy-makers, a set of webbased tools for the visual representation of ontologies and linked data is designed and developed. The tools – comprising the OSMoSys framework – provide graphical user interfaces for the visual representation, browsing, and interactive exploration of both ontologies and linked urban data. &nbsp;After introducing methods and tools for data integration, visual exploration of linked urban data, and derivation of various attributes of people and places from different social urban data, it is examined how they can all be combined into a single platform. To achieve this, a novel web-based system (coined SocialGlass) for the visualization and exploratory analysis of human activity dynamics is designed. The system combines data from various geo-enabled social media (i.e. Twitter, Instagram, Sina Weibo) and LBSNs (i.e. Foursquare), sensor networks (i.e. GPS trackers, Wi-Fi cameras), and conventional socioeconomic urban records, but also has the potential to employ custom datasets from other sources. &nbsp;A real-world case study is used as a demonstrator of the capacities of the proposed web-based system in the study of urban dynamics. The case study explores the potential impact of a city-scale event (i.e. the Amsterdam Light festival 2015) on the activity and movement patterns of different social categories (i.e. residents, non-residents, foreign tourists), as compared to their daily and hourly routines in the periods&nbsp; before and after the event. The aim of the case study is twofold. First, to assess the potential and limitations of the proposed system and, second, to investigate how different sources of social urban data could influence the understanding of urban dynamics. &nbsp;The contribution of this doctoral thesis is the design and development of a framework of novel methods and tools that enables the fusion of heterogeneous multidimensional data for cities. The framework could foster planners, researchers, and policy makers to capitalize on the new possibilities given by emerging social urban data. Having a deep understanding of the spatiotemporal dynamics of cities and, especially of the activity and movement behavior of people, is expected to play a crucial role in addressing the challenges of rapid urbanization. Overall, the framework proposed by this research has potential to open avenues of quantitative explorations of urban dynamics, contributing to the development of a new science of cities
    • …
    corecore