Search CORE

228 research outputs found

Twitter data analysis by means of Strong Flipping Generalized Itemsets

Author: Aggarwal
Agrawal
Baralis
Barsky
Benevenuto
Bird
Brin
Cagliero
Cheong
DBDMG
Dean
Gharib
Glance
Guo
Han
Heymann
Hilderman
Kasneci
Kimball
Kumar Pal
Kunkle
Li
Lin
Luca Cagliero
Luigi Grimaudo
Mathioudakis
Pagano
Paolo Garza
Pasquier
Savasere
Srikant
Sriphaew
T.A.H. Project
T.A.M. Project
Tan
Tania Cerquitelli
Tian
Wu
Yin
Publication venue: Elsevier
Publication date: 01/01/2014

Twitter data has recently been used to perform a wide variety of advanced analyses. Analysis of Twitter data poses new challenges because the data distribution is intrinsically sparse, owing to the large number of messages posted every day using a wide vocabulary. To address this issue, generalized itemsets (sets of items at different abstraction levels) can be effectively mined and used to discover interesting multiple-level correlations among data supplied with taxonomies. Each generalized itemset is characterized by a correlation type (positive, negative, or null) according to the strength of the correlation among its items. This paper presents a novel data mining approach to supporting different and interesting targeted analyses, such as topic trend analysis and context-aware service profiling, by analyzing Twitter posts. We aim at discovering contrasting situations by means of generalized itemsets. Specifically, we focus on comparing itemsets discovered at different abstraction levels, and we select large subsets of specific (descendant) itemsets that show correlation type changes with respect to their common ancestor. To this aim, a novel kind of pattern, namely the Strong Flipping Generalized Itemset (SFGI), is extracted from Twitter messages and contextual information supplied with taxonomy hierarchies. Each SFGI consists of a frequent generalized itemset X and the set of its descendants showing a correlation type change with respect to X. Experiments performed on both real and synthetic datasets demonstrate the effectiveness of the proposed approach in discovering interesting and hidden knowledge from Twitter data.
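To make the SFGI idea concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a hypothetical one-level taxonomy, a handful of invented tweets reduced to item sets, the Kulczynski measure as the correlation measure with hypothetical thresholds (0.6 for positive, 0.4 for negative), and a hypothetical `min_flip_ratio` parameter standing in for "large subsets" of descendants. The paper's own measure and thresholds may differ.

```python
from itertools import product

# Hypothetical one-level taxonomy: leaf item -> generalized (parent) item.
TAXONOMY = {"iPhone": "Smartphone", "Galaxy": "Smartphone",
            "Rome": "Italy", "Milan": "Italy"}

# Invented toy data: each tweet reduced to the set of leaf items it mentions.
TWEETS = [
    {"iPhone", "Rome"}, {"iPhone", "Rome"}, {"iPhone", "Rome"},
    {"Galaxy", "Rome"}, {"iPhone", "Milan"}, {"Galaxy", "Milan"},
    {"Galaxy"}, {"Milan"},
]


def covers(tweet, item):
    """True if the tweet contains the item or one of its taxonomy children."""
    return item in tweet or any(TAXONOMY.get(leaf) == item for leaf in tweet)


def support(itemset):
    """Absolute support: number of tweets covering every item in the itemset."""
    return sum(all(covers(t, i) for i in itemset) for t in TWEETS)


def kulczynski(itemset):
    """Kulczynski measure (an assumption here): mean of sup(X) / sup(x) over items x."""
    singles = [support([i]) for i in itemset]
    if 0 in singles:
        return 0.0
    return sum(support(itemset) / s for s in singles) / len(itemset)


def correlation_type(itemset, pos=0.6, neg=0.4):
    """Map the measure to a correlation type; both thresholds are hypothetical."""
    k = kulczynski(itemset)
    if k >= pos:
        return "positive"
    if k <= neg:
        return "negative"
    return "null"


def children(item):
    """Direct descendants of a generalized item, in a stable order."""
    return sorted(leaf for leaf, parent in TAXONOMY.items() if parent == item)


def strong_flipping(gen_itemset, min_sup=0.5, min_flip_ratio=0.5):
    """Return (X, type of X, flipping descendants), or None if X does not qualify."""
    if support(gen_itemset) / len(TWEETS) < min_sup:
        return None  # X must be frequent
    x_type = correlation_type(gen_itemset)
    descendants = list(product(*(children(g) for g in gen_itemset)))
    flipping = [d for d in descendants
                if support(d) > 0 and correlation_type(d) != x_type]
    # Hypothetical "strong" criterion: a large enough share of descendants must flip.
    if not descendants or len(flipping) / len(descendants) < min_flip_ratio:
        return None
    return gen_itemset, x_type, flipping


result = strong_flipping(("Smartphone", "Italy"))
print(result)
# (('Smartphone', 'Italy'), 'positive', [('Galaxy', 'Milan'), ('Galaxy', 'Rome'), ('iPhone', 'Milan')])
```

In this toy data the generalized itemset {Smartphone, Italy} is positively correlated, while three of its four descendants are negatively correlated, so it is reported together with those flipping descendants.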

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)


Data mining by means of generalized patterns

Author: Cagliero Luca
Publication date: 01/01/2012

The thesis is mainly focused on the study and application of pattern discovery algorithms that aggregate database knowledge to discover and exploit valuable correlations, hidden in the analyzed data, at different abstraction levels. The aim of the research effort described in this work is twofold: the discovery of associations, in the form of generalized patterns, from large data collections, and the inference of semantic models, i.e., taxonomies and ontologies, suitable for driving the mining process.
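One classic way to obtain generalized patterns from data equipped with a taxonomy, due to Srikant and Agrawal, is to extend each transaction with all ancestors of its items and then count itemsets as usual. The sketch below is an illustration of that idea, not the thesis's algorithm, and it counts pairs only; the taxonomy, the transactions, and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical multi-level taxonomy: child -> parent.
PARENT = {"espresso": "coffee", "latte": "coffee", "coffee": "beverage",
          "green tea": "tea", "tea": "beverage"}


def ancestors(item):
    """All ancestors of an item, from the nearest one up to the root."""
    chain = []
    while item in PARENT:
        item = PARENT[item]
        chain.append(item)
    return chain


def extend(transaction):
    """Return a copy of the transaction with every ancestor of every item added."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended


def frequent_generalized_pairs(transactions, min_count):
    """Count 2-itemsets over extended transactions, keeping the frequent ones."""
    counts = Counter()
    for t in transactions:
        for a, b in combinations(sorted(extend(t)), 2):
            # Skip an item paired with its own ancestor: such pairs are trivial.
            if a in ancestors(b) or b in ancestors(a):
                continue
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Invented toy transactions (e.g., items bought in one visit).
TRANSACTIONS = [{"espresso", "croissant"}, {"latte", "croissant"},
                {"green tea", "muffin"}, {"espresso", "muffin"}]
result = frequent_generalized_pairs(TRANSACTIONS, min_count=2)
print(result)
# {('beverage', 'croissant'): 2, ('coffee', 'croissant'): 2, ('beverage', 'muffin'): 2}
```

No pair of leaf items reaches the threshold, yet three generalized pairs do: the kind of multiple-level correlation that only becomes visible at higher abstraction levels.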

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)


Discovering Generalized Association Rules from Twitter

Author: Cagliero L.
Fiori A.
Publication venue: IOS Press
Publication date: 01/01/2013

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)


Event detection in high throughput social media

Author: Weiler Michael
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 20/12/2016

Feature Extraction and Duplicate Detection for Text Mining: A Survey

Author: Ramya R S
Venugopal K R
Publication venue: Global Journals Inc. (US)
Publication date: 22/04/2016

Text mining, also known as Intelligent Text Analysis, is an important research area. Because of the high dimensionality of the data, it is very difficult to focus on the most relevant information. Feature extraction is one of the important data reduction techniques for discovering the most important features. Processing massive amounts of data stored in unstructured form is a challenging task, and several pre-processing methods and algorithms are needed to extract useful features from such large amounts of data. The survey covers different text summarization, classification, and clustering methods for discovering useful features, as well as methods for discovering query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time the user spends. When dealing with collections of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence, we also review the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques to extract relevant features, detect duplicates, and replace duplicate data in order to provide fine-grained knowledge to the user.
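As a concrete instance of duplicate detection, one common technique (chosen here for illustration, not one the survey prescribes) represents each document as a set of word shingles and flags pairs whose Jaccard similarity exceeds a threshold. The shingle length `k=3`, the `threshold=0.5`, and the sample documents are arbitrary assumptions.

```python
def shingles(text, k=3):
    """Set of word k-grams (shingles) of a lowercased, whitespace-split text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(docs, threshold=0.5, k=3):
    """Index pairs of documents whose shingle sets are at least `threshold` similar."""
    sets = [shingles(d, k) for d in docs]
    return [(i, j)
            for i in range(len(docs))
            for j in range(i + 1, len(docs))
            if jaccard(sets[i], sets[j]) >= threshold]


DOCS = [
    "The quick brown fox jumps over the lazy dog",
    "The quick brown fox jumps over the lazy cat",
    "Text mining extracts useful features from documents",
]
print(near_duplicates(DOCS))  # [(0, 1)]
```

The first two documents share 6 of their 8 distinct shingles (similarity 0.75). The pairwise loop is quadratic; large collections typically approximate Jaccard similarity with MinHash and locality-sensitive hashing instead.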

Global Journal of Computer Science and Technology (GJCST)

Exploring Data Hierarchies to Discover Knowledge in Different Domains

Author: Ricupero Giuseppe
Publication venue: Politecnico di Torino

The abstract is in the attachment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Mining XML documents with association rule algorithms

Author: Gürel Görkem
Publication venue: Izmir Institute of Technology
Publication date: 01/01/2008

Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 2008. Includes bibliographical references (leaves: 59-63). Text in English; Abstract: Turkish and English. x, 63 leaves.

Following the increasing use of XML technology for data storage and data exchange between applications, mining XML documents has become a more researchable and important topic. In this study, we considered the problem of mining association rules between items in XML documents. The principal purpose of this study is to apply association rule algorithms directly to XML documents using XQuery, a functional expression language that can be used to query or process XML data. We used three different algorithms: Apriori, AprioriTid, and High Efficient AprioriTid. We compare the mining times of these three Apriori-like algorithms on XML documents using different support levels, different datasets, and different dataset sizes.
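For reference, the level-wise search shared by the Apriori family (join, prune, count) can be sketched in a few lines of Python. This is a compact in-memory version of the textbook algorithm working on plain item sets, not the thesis's XQuery implementation; the toy transactions and `min_count` are invented.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets that yield exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count the surviving candidates with one pass over the data.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result


TRANSACTIONS = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(TRANSACTIONS, min_count=3)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum count of 3, the three single items and the three pairs are frequent, while {a, b, c} appears in only two transactions and is discarded at level 3.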

Research on Personalized Recommender System for Tourism Information Service

Author: Dan Yao
Jing Luo
Mu Zhang
Yu Huang
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 28/05/2013

Since their development in the 1990s, recommender systems have been widely applied in various fields. The conflict between the expansion of tourism information and the difficulty tourists have in obtaining it gives tourism information recommender systems practical significance. Building on existing online tourism information services and mature recommendation algorithms, a personalized recommender system can be used to address the current problems of the key recommendation algorithms. First, this research presents an overview of domestic and international research on this issue and analyzes the applications of mainstream recommendation algorithms. Second, a comparative study of domestic and international tourism information service websites is conducted: drawbacks in their applications are identified, and their advantages are adopted in the design of the recommender system. Finally, this research provides a framework for the recommender system that combines the design and testing of algorithms with existing tourism information recommendation websites. This system allows customers to broaden their experience of tourism information services and to make tourism decisions more accurately and rapidly.
Keywords: Tourism information service, Personalized recommendation, Intelligence recommendation module, Apriori algorithm

International Institute for Science, Technology and Education (IISTE): E-Journals

Closing the gap: Sequence mining at scale

Author: Beedkar Kaustubh
Berberich Klaus
Gemulla Rainer
Miliaraki Iris
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2015

Frequent sequence mining is one of the fundamental building blocks in data mining. While the problem has been extensively studied, few of the available techniques are sufficiently scalable to handle datasets with billions of sequences; such large-scale datasets arise, for instance, in text mining and session analysis. In this article, we propose MG-FSM, a scalable algorithm for frequent sequence mining on MapReduce. MG-FSM can handle so-called “gap constraints”, which can be used to limit the output to a controlled set of frequent sequences. Both positional and temporal gap constraints, as well as appropriate maximality and closedness constraints, are supported. At its heart, MG-FSM partitions the input database in a way that allows us to mine each partition independently using any existing frequent sequence mining algorithm. We introduce the notion of ω-equivalency, which is a generalization of the notion of a “projected database” used by many frequent pattern mining algorithms. We also present a number of optimization techniques that minimize partition size, and therefore computational and communication costs, while still maintaining correctness. Our experimental study in the contexts of text mining and session analysis suggests that MG-FSM is significantly more efficient and scalable than alternative approaches.
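To make positional gap constraints concrete, here is a brute-force Python sketch of the mining problem itself, not of MG-FSM (which partitions the database so that partitions can be mined independently on MapReduce). The parameter names `sigma` (minimum number of supporting sequences), `gamma` (maximum number of skipped items between consecutive matched items), and `lam` (maximum length) are assumptions chosen for this illustration, as are the toy sequences.

```python
from collections import Counter


def gapped_subsequences(seq, gamma, lam):
    """Distinct subsequences of length 2..lam in which consecutive matched items
    are separated by at most `gamma` skipped positions."""
    found = set()

    def grow(prefix, last_pos):
        if len(prefix) >= 2:
            found.add(tuple(prefix))
        if len(prefix) == lam:
            return
        # The next match may skip 0..gamma positions after last_pos.
        for nxt in range(last_pos + 1, min(last_pos + gamma + 2, len(seq))):
            grow(prefix + [seq[nxt]], nxt)

    for start in range(len(seq)):
        grow([seq[start]], start)
    return found


def frequent_sequences(database, sigma, gamma, lam):
    """Subsequences supported by at least `sigma` input sequences."""
    counts = Counter()
    for seq in database:
        # Each subsequence counts at most once per input sequence.
        counts.update(gapped_subsequences(seq, gamma, lam))
    return {s: c for s, c in counts.items() if c >= sigma}


DATABASE = [list("abcd"), list("acbd"), list("abd")]
print(frequent_sequences(DATABASE, sigma=3, gamma=1, lam=2))
```

With a gap of at most one item, ("a", "b") and ("b", "d") occur in all three sequences; with `gamma=0` (contiguous matches only) neither does, which shows how the gap constraint controls the size of the output.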

MAnnheim DOCument Server

MPG.PuRe

Digging deep into weighted patient data through multiple-level patterns

Author: BARALIS ELENA MARIA
CAGLIERO LUCA
CERQUITELLI TANIA
CHIUSANO SILVIA ANNA
GARZA PAOLO
Publication venue: Elsevier BV
Publication date: 01/01/2015

Large data volumes have been collected by healthcare organizations at an unprecedented rate. Today, both physicians and healthcare system managers are very interested in extracting value from such data. Nevertheless, the increasing data complexity and heterogeneity prompt the need for new efficient and effective data mining approaches to analyzing large patient datasets. Generalized association rule mining algorithms can be exploited to automatically extract hidden multiple-level associations among patient data items (e.g., examinations, drugs) from large datasets equipped with taxonomies. However, current approaches assume that all data items are equally relevant within each transaction, even though this assumption rarely holds. This paper presents a new data mining environment targeted to patient data analysis. It tackles the issue of extracting generalized rules from weighted patient data, where items may have different weights according to their importance within each transaction. To this aim, it proposes a novel type of association rule, namely the Weighted Generalized Association Rule (W-GAR). The usefulness of the proposed pattern has been evaluated on real patient datasets equipped with a taxonomy built over examinations and drugs. The achieved results demonstrate the effectiveness of the proposed approach in mining interesting and actionable knowledge in a real medical care scenario.
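The difference between plain and weighted support for generalized itemsets can be sketched as follows. This is a hypothetical illustration, not the W-GAR definition from the paper: it assumes that a generalized item's weight in a visit is the maximum weight among its descendants present, and that an itemset's weight in a visit is the minimum of its items' weights. The taxonomy, weights, and aggregation choices are all invented.

```python
# Hypothetical taxonomy: examination or drug -> category.
PARENT = {"blood test": "lab exam", "urine test": "lab exam",
          "aspirin": "analgesic", "paracetamol": "analgesic"}

# Invented weighted transactions: item -> weight (e.g., relevance) per patient visit.
VISITS = [
    {"blood test": 0.9, "aspirin": 0.4},
    {"urine test": 0.3, "paracetamol": 0.8},
    {"blood test": 0.5},
]


def item_weight(visit, item, agg=max):
    """Weight of a (possibly generalized) item in one visit; 0.0 if absent.
    The aggregation over descendants (max by default) is an assumption."""
    values = [w for leaf, w in visit.items()
              if leaf == item or PARENT.get(leaf) == item]
    return agg(values) if values else 0.0


def weighted_support(itemset, visits, agg=max):
    """Mean over visits of the minimum weight among the itemset's items (assumed)."""
    total = sum(min(item_weight(v, i, agg) for i in itemset) for v in visits)
    return total / len(visits)


print(round(weighted_support(("lab exam", "analgesic"), VISITS), 3))  # 0.233
print(round(weighted_support(("blood test", "aspirin"), VISITS), 3))  # 0.133
```

The generalized pair {lab exam, analgesic} occurs in two of three visits (plain support about 0.67), but its weighted support is lower because the co-occurring items carry modest weights; swapping `agg` or the per-itemset `min` changes the semantics, which is why such choices have to be fixed by the pattern definition.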

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)
