
    Using Taxonomy Tree to Generalize a Fuzzy Thematic Cluster

    D.F. and B.M. acknowledge continuing support by the Academic Fund Program at the National Research University Higher School of Economics (grant 19-04-019 in 2018-2019) and by the International Decision Choice and Analysis Laboratory (DECAN) NRU HSE, in the framework of a subsidy granted to the HSE by the Government of the Russian Federation for the implementation of the Russian Academic Excellence Project “5-100”. S.N. acknowledges the support by FCT/MCTES, NOVA LINCS (UID/CEC/04516/2019). This paper presents an algorithm, ParGenFS, for generalizing, or 'lifting', a fuzzy set of topics to higher ranks of a hierarchical taxonomy of a research domain. ParGenFS finds a globally optimal generalization of the topic set by minimizing a penalty function that balances the number of introduced 'head subjects' against the related errors, the 'gaps' and 'offshoots', each differently weighted. This leads to a generalization of the topic set in the taxonomy. The usefulness of the method is illustrated on a set of 17685 abstracts of research papers on Data Science published in Springer journals over the past 20 years. We extracted a taxonomy of Data Science from the international Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection, lift them in the taxonomy, and interpret the head subjects found to comment on the tendencies of current research.

    The impact of knowledge management processes on organizational resilience: data mining as an instrument of measurement.

    The aim of the research conducted for this thesis is to test the feasibility of using data mining (DM) to assess the relationship between and the impact of knowledge management (KM) on organizational resilience (OR). The emphasis currently placed on the value of intangible assets by private sector organizations and the recent increase in the use of data mining technologies are the key drivers in this evaluation of the use of data mining tools as an alternative to classical statistics when measuring intangibles. Data was collected using a questionnaire that was sent to the senior executives of a number of mid-sized companies located in the mid-west of the USA. Using Microsoft's SQL Server's Analytical Services (MSSAS) and the data provided by the respondents, five predictive models are built to test the suitability of the MSSAS' DM tool for assessing the relationships between and the impact of KM on OR. Of the five models constructed as part of this research, four classification models (two Naïve Bayes models, one neural network model, and one decision tree model) and one clustering model were found to be suitable tools for capturing the intricate relationships that exist between KM and OR. These models made it possible to evaluate the strengths of the relationships between KM and OR and to identify which KM processes contribute, and to what extent, to OR. In addition, the models enabled the collation of predicted OR scores, based on the responses given in the questionnaire. Finally, this research identifies some of the key challenges associated with using DM as a measurement instrument for assessing the relationship between and the impact of KM on OR. This research makes a number of significant contributions to the existing body of knowledge. It contributes to the understanding of the impact of KM on OR, to the understanding of the methods used to measure such impact and to the processes involved in measuring such impact using DM. 
From a practitioner perspective, this research contributes to the understanding of OR and provides a framework for achieving OR within an organizational context.

    Mining Social Media and Structured Data in Urban Environmental Management to Develop Smart Cities

    This research presents the deployment of data mining on social media and structured data in urban studies. As early work, we analyzed urban relocation, air quality and traffic parameters on multicity data. We applied the data mining techniques of association rules, clustering and classification to urban legislative history. Results showed that data mining could produce meaningful knowledge to support urban management. We treated ordinances (local laws) and the tweets about them as indicators to assess urban policy and public opinion. Hence, we conducted ordinance and tweet mining, including sentiment analysis of tweets. This part of the study focused on NYC, with the goal of assessing how well it is progressing towards becoming a smart city. We built domain-specific knowledge bases according to widely accepted smart city characteristics, incorporating commonsense knowledge sources for ordinance-tweet mapping. We developed decision support tools on multiple platforms, using the knowledge discovered to guide urban management. Our research is a concrete step in harnessing the power of data mining in urban studies to enhance smart city development.

    An association rule dynamics and classification approach to event detection and tracking in Twitter.

    Twitter is a microblogging application used for sending and retrieving instant on-line messages of not more than 140 characters. There has been a surge in Twitter activity since its launch in 2006, as well as a steady increase in event detection research on Twitter data (tweets) in recent years. With 284 million monthly active users, Twitter has continued to grow in both size and activity. The network is rapidly changing the way global audiences source information and is influencing the process of journalism [Newman, 2009]. Twitter is now perceived as an information network in addition to being a social network. This explains why traditional news media follow activities on Twitter to enhance their news reports and news updates. Knowing the significance of the network as an information dissemination platform, news media subscribe to Twitter accounts where they post their news headlines and include a link to their on-line news site, where the full story may be found. Twitter users, in some cases, post breaking news on the network before it is published by traditional news media. This can be ascribed to Twitter subscribers' nearness to the location of events. The use of Twitter as a network for information dissemination as well as for opinion expression by different entities is now common. This has also brought with it the computational challenge of extracting newsworthy content from noisy Twitter data. Considering the enormous volume of data Twitter generates, users append the hashtag (#) symbol as a prefix to keywords in tweets. Hashtag labels describe the content of tweets. The use of hashtags also makes it easy to search for and read tweets of interest. The volume of Twitter streaming data makes it imperative to devise Topic Detection and Tracking methods to extract newsworthy topics from tweets. 
Since hashtags describe and enhance the readability of tweets, this research shows how the appropriate use of hashtag keywords in tweets can reveal the temporal evolution of related real-life topics and consequently enhance Topic Detection and Tracking on the Twitter network. We chose to apply our method to Twitter because of the restricted number of characters per message and because it is a network that allows data to be shared publicly. More importantly, our choice was based on the fact that hashtags are an inherent component of Twitter. To this end, the aim of this research is to develop, implement and validate a new approach that extracts newsworthy topics from the hashtags of tweets on real-life topics over a specified period using Association Rule Mining. We termed our novel methodology Transaction-based Rule Change Mining (TRCM). TRCM is a system built on top of the Apriori method of Association Rule Mining to extract patterns of Association Rule changes in tweets' hashtag keywords at different periods of time and to map the extracted keywords to a related real-life topic or scenario. To the best of our knowledge, the dynamics of Association Rules of hashtag co-occurrences have not been explored as a Topic Detection and Tracking method on Twitter. The application of Apriori to hashtags present in tweets at two consecutive periods, t and t + 1, produces two association rulesets, which represent rule evolution in the context of this research. A change in rules is discovered by matching every rule in the ruleset at time t with those in the ruleset at time t + 1. The changes are grouped under four identified rule types, namely 'New' rules, 'Unexpected Consequent' and 'Unexpected Conditional' rules, 'Emerging' rules and 'Dead' rules. The four rule types represent different levels of real-life topic evolution. 
For example, an emerging rule represents a very important occurrence such as breaking news, while unexpected rules represent an unexpected twist of events in an on-going topic. A new rule represents dissimilarity between rules in the rulesets at time t and t + 1. Finally, a dead rule represents a topic that is no longer present on the Twitter network. TRCM reveals the dynamics of Association Rules present in tweets and demonstrates the linkage between the different types of rule dynamics and targeted real-life topics/events. In this research, we conducted experimental studies on tweets from different domains, such as sports and politics, to test the effectiveness of our method. We validated our method, TRCM, with carefully chosen ground truth. The outcomes of our research experiments include: identification of four rule dynamics in tweets' hashtags using Association Rule Mining, namely New rules, Emerging rules, Unexpected rules and 'Dead' rules, which signify how news and events evolved in real-life scenarios; identification of rule evolution on the Twitter network using Rule Trend Analysis and Rule Trace; detection and tracking of topic evolution on Twitter using Transaction-based Rule Change Mining (TRCM); and identification of how the peculiar features of each TRCM rule type affect its effectiveness on real datasets.
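
The rule-matching step described above can be illustrated with a short sketch. The abstract does not state how two rules are matched, so everything here is an assumption made for illustration: rules are taken as already mined (no Apriori step is shown), each side of a rule is compared with Jaccard similarity against a hypothetical threshold `th`, and the names `classify_changes`, `jaccard` and `rule`, the label precedence and the hashtags are all invented. It is not the authors' TRCM implementation.

```python
def jaccard(a, b):
    """Jaccard similarity of two hashtag sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rule(lhs, rhs):
    """Build a rule as (conditional part, consequent part) of frozensets."""
    return (frozenset(lhs), frozenset(rhs))


def classify_changes(rules_t, rules_t1, th=0.5):
    """Label each rule at t+1 against the ruleset at t, and list dead rules.

    Assumed precedence when several old rules match:
    emerging > unexpected consequent > unexpected conditional > new.
    """
    labels = {}
    for lhs1, rhs1 in rules_t1:
        label = "new"  # default: no side matches any rule at time t
        for lhs0, rhs0 in rules_t:
            lhs_ok = jaccard(lhs0, lhs1) >= th
            rhs_ok = jaccard(rhs0, rhs1) >= th
            if lhs_ok and rhs_ok:
                label = "emerging"  # both sides persist
                break
            if lhs_ok:
                label = "unexpected consequent"  # same condition, new outcome
            elif rhs_ok and label == "new":
                label = "unexpected conditional"  # same outcome, new condition
        labels[(lhs1, rhs1)] = label

    # A rule at time t is treated as dead if neither side matches at t+1.
    dead = [
        (lhs0, rhs0)
        for lhs0, rhs0 in rules_t
        if not any(
            jaccard(lhs0, lhs1) >= th or jaccard(rhs0, rhs1) >= th
            for lhs1, rhs1 in rules_t1
        )
    ]
    return labels, dead


# Made-up rulesets for two consecutive periods.
rules_t = [
    rule({"#final"}, {"#goal"}),
    rule({"#rain"}, {"#delay"}),
    rule({"#transfer"}, {"#rumour"}),
]
rules_t1 = [
    rule({"#final"}, {"#goal"}),
    rule({"#final"}, {"#penalty"}),
    rule({"#injury"}, {"#delay"}),
    rule({"#trophy"}, {"#parade"}),
]

labels, dead = classify_changes(rules_t, rules_t1)
for (lhs, rhs), label in labels.items():
    print(sorted(lhs), "=>", sorted(rhs), ":", label)
print("dead:", [(sorted(l), sorted(r)) for l, r in dead])
```

With exact-match hashtags as in this toy data, the threshold only separates identical sides from disjoint ones; a lower `th` would let partially overlapping hashtag sets count as a match.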

    Learning a Pose Lexicon for Semantic Action Recognition

    This paper presents a novel method for learning a pose lexicon comprising semantic poses defined by textual instructions and their associated visual poses defined by visual features. The proposed method simultaneously takes two input streams, semantic poses and visual pose candidates, and statistically learns a mapping between them to construct the lexicon. With the learned lexicon, action recognition can be cast as the problem of finding the maximum translation probability of a sequence of semantic poses given a stream of visual pose candidates. Experiments evaluating pre-trained and zero-shot action recognition conducted on MSRC-12 gesture and WorkoutSu-10 exercise datasets were used to verify the efficacy of the proposed method. Comment: Accepted by the 2016 IEEE International Conference on Multimedia and Expo (ICME 2016). 6 pages paper and 4 pages supplementary material.

    Computational Generalization in Taxonomies Applied to: (1) Analyze Tendencies of Research and (2) Extend User Audiences

    D.F. and B.M. acknowledge continuing support by the Academic Fund Program at the NRU HSE (grant 19-04-019 in 2018–2019) and by the DECAN Lab NRU HSE, in the framework of a subsidy granted to the HSE by the Government of the Russian Federation for the implementation of the Russian Academic Excellence Project “5-100”. S.N. acknowledges the support by FCT/MCTES, NOVA LINCS (UID/CEC/04516/2019). We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors referred to as “gaps” and “offshoots”. Our method, ParGenFS, globally minimizes a penalty function combining the numbers of head subjects, gaps and offshoots, differently weighted. Two applications are considered: (1) analysis of tendencies of research in Data Science; (2) audience extension for programmatic targeted advertising online. The former involves a taxonomy of Data Science derived from the celebrated ACM Computing Classification System 2012. Based on a collection of research papers published by Springer in 1998–2017, and applying in-house methods for text analysis and fuzzy clustering, we derive fuzzy clusters of leaf topics in learning, retrieval and clustering. The head subjects of these clusters inform us of some general tendencies of the research. The latter involves the publicly available IAB Tech Lab Content Taxonomy. Each of about 25 million users is assigned a fuzzy profile within this taxonomy, which is generalized offline using ParGenFS. Our experiments show that these head subjects at least double the size of targeted audiences without losing quality.
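
The penalty-minimizing lift described above can be illustrated with a small bottom-up recursion over a taxonomy tree. This is a minimal sketch under stated assumptions, not the published ParGenFS procedure: memberships are only thresholded (a leaf counts as a topic when `u > 0`), each head subject costs 1, each gap (a maximal topic-free subtree under a head subject) costs `lam`, and each offshoot (a topic leaf left uncovered by any head subject) costs `gamma`. The names `Node`, `Result` and `lift`, the weights and the toy taxonomy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    u: float = 0.0  # fuzzy membership; only read on leaves in this sketch


@dataclass
class Result:
    penalty: float
    heads: list
    gaps: list
    offshoots: list


def lift(node, lam, gamma):
    """Return (best, covered_gaps, empty) for the subtree rooted at `node`.

    best         -- cheapest Result when no head subject lies above `node`
    covered_gaps -- gaps this subtree contributes if an ancestor is a head
    empty        -- True when the subtree contains no topic at all
    """
    if not node.children:
        if node.u > 0:
            # Uncovered topic leaf: counted as an offshoot.
            return Result(gamma, [], [], [node.name]), [], False
        return Result(0.0, [], [], []), [node.name], True

    parts = [lift(child, lam, gamma) for child in node.children]
    if all(empty for _, _, empty in parts):
        # Topic-free subtree: a single maximal gap if covered from above.
        return Result(0.0, [], [], []), [node.name], True

    covered_gaps = [g for _, gaps, _ in parts for g in gaps]
    # Option 1: make this node a head subject and pay for the gaps below it.
    as_head = Result(1.0 + lam * len(covered_gaps), [node.name], covered_gaps, [])
    # Option 2: no head here; combine the children's best solutions.
    split = Result(
        sum(r.penalty for r, _, _ in parts),
        [h for r, _, _ in parts for h in r.heads],
        [g for r, _, _ in parts for g in r.gaps],
        [o for r, _, _ in parts for o in r.offshoots],
    )
    best = as_head if as_head.penalty < split.penalty else split
    return best, covered_gaps, False


# Toy taxonomy with made-up memberships on the leaves.
root = Node("root", [
    Node("A", [Node("a1", u=0.8), Node("a2", u=0.5), Node("a3")]),
    Node("B", [Node("b1"), Node("b2"), Node("b3")]),
    Node("C", [Node("c1", u=0.3), Node("c2"), Node("c3")]),
])

best, _, _ = lift(root, lam=0.4, gamma=0.9)
print("heads:", best.heads)
print("gaps:", best.gaps)
print("offshoots:", best.offshoots)
print("penalty:", round(best.penalty, 3))
```

In this toy run the two topics under `A` are lifted to `A` at the cost of one gap, while the lone topic under `C` stays an offshoot because covering it from `C` or from the root would introduce more gap penalty than it saves. Changing `lam` and `gamma` shifts that balance.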