
20,537 research outputs found

Data mining as a tool for environmental scientists

Author: Athanasiadis Ioannis
Comas Joaquim
Frank Eibe
Gibert Karina
Letcher Rebecca
Spate Jessica
Sànchez-Marrè Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2006

Over recent years, a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques, such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks, have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.

Automatic domain ontology extraction for context-sensitive opinion mining

Author: Lai Chapmann C.L.
Lau Raymond Y.K.
Li Yuefeng
Ma Jian
Publication date: 01/01/2009

Automated analysis of the sentiments expressed in online consumer feedback can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel context-sensitive opinion mining system based on a fuzzy domain ontology. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis process. Evaluated on a benchmark dataset and on real consumer reviews collected from Amazon.com, our system shows a remarkable performance improvement over the context-free baseline.
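The abstract does not specify which variant of Kullback-Leibler divergence the authors use. As a rough illustration only, the sketch below computes the standard divergence D(P || Q) between Laplace-smoothed term distributions and ranks terms by their pointwise contribution, so that words much more frequent in domain reviews than in background text score highest. The smoothing, the ranking heuristic, the function names and the toy data are all assumptions, not the authors' mechanism.

```python
import math
from collections import Counter


def term_distribution(tokens, vocab, alpha=1.0):
    """Laplace-smoothed unigram distribution over a fixed vocabulary (assumed smoothing)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """Standard KL divergence D(P || Q) in nats; assumes q[w] > 0 wherever p[w] > 0."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def top_domain_terms(domain_tokens, background_tokens, k=3):
    """Hypothetical ranking: score each term by p(w) * log(p(w) / q(w)),
    its pointwise contribution to D(P || Q)."""
    vocab = set(domain_tokens) | set(background_tokens)
    p = term_distribution(domain_tokens, vocab)
    q = term_distribution(background_tokens, vocab)
    scores = {w: p[w] * math.log(p[w] / q[w]) for w in vocab}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: camera reviews versus generic review text.
domain = ["battery", "battery", "screen", "good"]
background = ["good", "good", "good", "screen"]
print(top_domain_terms(domain, background, k=1))  # ['battery']
```

Smoothing keeps every q(w) positive, so the logarithm is always defined even for terms that never occur in the background text.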

AIS Electronic Library (AISeL)

SENATUS: An Approach to Joint Traffic Anomaly Detection and Root Cause Analysis

Author: Abdelkefi Atef
Jiang Yuming
Sharma Sachin
Publication date: 24/11/2017

In this paper, we propose a novel approach, called SENATUS, for joint traffic anomaly detection and root-cause analysis. Inspired by the concept of a senate, the approach is divided into three stages: election, voting and decision. At the election stage, a small number of senator flows are chosen to approximately represent the total (usually huge) set of traffic flows. In the voting stage, anomaly detection is applied to the senator flows, and the detected anomalies are correlated to identify the most likely anomalous time bins. Finally, in the decision stage, a machine learning technique is applied to the senator flows of each anomalous time bin to find the root cause of the anomalies. We evaluate SENATUS using traffic traces collected from the pan-European network GEANT and compare it against another approach that detects anomalies using lossless compression of traffic histograms. We show the effectiveness of SENATUS in diagnosing two anomaly types: network scans and DoS/DDoS attacks.
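The abstract names the election and voting stages without giving their algorithms. The toy pipeline below is only meant to make the stage structure concrete, under loudly stated assumptions: senators are elected as the k highest-volume flows, each senator is checked with a simple z-score detector, and a time bin is reported when at least `min_votes` senators flag it. None of these choices come from the paper, the decision stage (the machine learning root-cause step) is omitted, and all names are hypothetical.

```python
import statistics


def elect_senators(flows, k):
    """Hypothetical election: keep the k flows with the largest total volume.

    `flows` maps a flow id to a list of per-time-bin byte counts.
    """
    return sorted(flows, key=lambda f: sum(flows[f]), reverse=True)[:k]


def flag_bins(series, threshold=2.0):
    """Stand-in detector (assumed): flag time bins whose z-score exceeds `threshold`."""
    mean = statistics.mean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return set()
    return {t for t, x in enumerate(series) if abs(x - mean) / std > threshold}


def vote(flows, senators, min_votes, threshold=2.0):
    """Return the time bins flagged by at least `min_votes` senator flows."""
    votes = {}
    for f in senators:
        for t in flag_bins(flows[f], threshold):
            votes[t] = votes.get(t, 0) + 1
    return sorted(t for t, v in votes.items() if v >= min_votes)


# Toy traffic: two large flows spike in the last bin, one small flow stays flat.
flows = {
    "a": [10] * 9 + [100],
    "b": [20] * 9 + [200],
    "c": [1] * 10,
}
senators = elect_senators(flows, k=2)
print(senators)                            # ['b', 'a']
print(vote(flows, senators, min_votes=2))  # [9]
```

Correlating votes across senators, rather than trusting a single flow, is what lets a spike shared by several representative flows stand out as an anomalous time bin.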

arXiv.org e-Print Archive

Quantitative Redundancy in Partial Implications

Author: Balcázar José L.
Publication date: 01/01/2015

We survey the different properties of an intuitive notion of redundancy, as a function of the precise semantics given to the notion of partial implication. The final version of this survey will appear in the Proceedings of the Int. Conf. Formal Concept Analysis, 2015.
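For readers unfamiliar with partial implications, the sketch below covers only one common semantics: association rules X -> Y measured by support and confidence. It includes a syntactic test for redundancy: if X2 ⊆ X and X ∪ Y ⊆ X2 ∪ Y2, then s(X) ≤ s(X2) and s(X ∪ Y) ≥ s(X2 ∪ Y2), so X -> Y has at least the support and confidence of X2 -> Y2 in every dataset. This condition is sufficient; the survey itself discusses how redundancy varies with the chosen semantics, which this sketch does not attempt to reproduce. Function names and data are illustrative.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the partial implication antecedent -> consequent."""
    s_x = support(antecedent, transactions)
    if s_x == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / s_x


def is_redundant(rule, other):
    """Sufficient test: X -> Y is redundant w.r.t. X2 -> Y2 when X2 <= X and
    X | Y <= X2 | Y2, because then s(X) <= s(X2) and s(X | Y) >= s(X2 | Y2)."""
    (x, y), (x2, y2) = rule, other
    x, y, x2, y2 = map(frozenset, (x, y, x2, y2))
    return x2 <= x and (x | y) <= (x2 | y2)


# Items are single characters; each transaction is a frozenset of them.
T = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
print(is_redundant(("ab", "c"), ("a", "bc")))                      # True
print(confidence("ab", "c", T) >= confidence("a", "bc", T))        # True
```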

arXiv.org e-Print Archive

A Survey on Data Mining Algorithm for Market Basket Analysis

Author: Dr. M. Dhanabhakyam
Publication venue: Global Journals Inc. (US)
Publication date: 26/05/2011

Association rule mining identifies notable associations or relationships among items in large datasets. With huge quantities of data constantly being collected and stored in databases, several industries are becoming interested in mining association rules from their databases. For example, discovering interesting associations within large volumes of business transaction data can assist in catalog design, cross-marketing, loss-leader analysis, and various business decision-making processes. A typical example of association rule mining is market basket analysis. This method examines customer buying patterns by identifying associations among the items that customers place in their shopping baskets. Identifying such associations can help retailers develop marketing strategies by revealing which items customers frequently purchase together. By focusing on point-of-sale transaction data, market basket analysis helps in examining customer purchasing behavior, increasing sales, and conserving inventory. This work provides a broad foundation for researchers to develop better data mining algorithms. This paper presents a survey of existing data mining algorithms for market basket analysis.
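As a concrete illustration of market basket analysis, the sketch below implements a minimal version of the classic Apriori algorithm: it grows frequent itemsets level by level, prunes candidates that have an infrequent subset, and then derives rules whose confidence meets a threshold. The survey covers several algorithms; Apriori is chosen here only as a familiar example, and the baskets and thresholds are made up.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent, k = {}, 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for confident rules."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[ante]  # ante is frequent by downward closure
                if conf >= min_confidence:
                    out.append((ante, itemset - ante, sup, conf))
    return out


baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = apriori(baskets, min_support=0.6)
for ante, cons, sup, conf in rules(freq, min_confidence=0.7):
    if ante == {"beer"}:
        print(sorted(ante), "->", sorted(cons), sup, conf)  # ['beer'] -> ['diapers'] 0.6 1.0
```

The pruning step relies on downward closure: any superset of an infrequent itemset is also infrequent, so such candidates never need to be counted.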

Digging deep into weighted patient data through multiple-level patterns

Author: BARALIS ELENA MARIA
CAGLIERO LUCA
CERQUITELLI TANIA
CHIUSANO SILVIA ANNA
GARZA PAOLO
Publication venue: Elsevier BV
Publication date: 01/01/2015

Large data volumes have been collected by healthcare organizations at an unprecedented rate. Today, both physicians and healthcare system managers are very interested in extracting value from such data. Nevertheless, the increasing complexity and heterogeneity of the data prompt the need for new, efficient and effective data mining approaches to analyzing large patient datasets. Generalized association rule mining algorithms can be exploited to automatically extract hidden multiple-level associations among patient data items (e.g., examinations, drugs) from large datasets equipped with taxonomies. However, current approaches assume that all data items are equally relevant within each transaction, even though this assumption rarely holds. This paper presents a new data mining environment targeted at patient data analysis. It tackles the issue of extracting generalized rules from weighted patient data, where items may carry different weights according to their importance within each transaction. To this end, it proposes a novel type of association rule, namely the Weighted Generalized Association Rule (W-GAR). The usefulness of the proposed pattern has been evaluated on real patient datasets equipped with a taxonomy built over examinations and drugs. The achieved results demonstrate the effectiveness of the proposed approach in mining interesting and actionable knowledge in a real medical care scenario.
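The abstract does not give the formal definition of W-GARs, so the following is only a sketch of the two ingredients it names: per-item weights within a transaction and a taxonomy over items. Two definitions here are assumptions made for illustration: an ancestor item inherits the maximum weight of its descendants, and the weighted support of an itemset averages the weights of its items in each supporting transaction. The drug and examination names and their weights are hypothetical.

```python
def generalize(transaction, taxonomy):
    """Extend a weighted transaction {item: weight} with its ancestors.

    `taxonomy` maps child -> parent. Assumption: an ancestor takes the
    maximum weight among its descendants present in the transaction.
    """
    extended = dict(transaction)
    for item, w in transaction.items():
        parent = taxonomy.get(item)
        while parent is not None:
            extended[parent] = max(extended.get(parent, 0.0), w)
            parent = taxonomy.get(parent)
    return extended


def weighted_support(itemset, transactions, taxonomy):
    """Hypothetical weighted support: for each (generalized) transaction that
    contains every item, add the mean weight of those items; then divide by
    the total number of transactions."""
    total = 0.0
    for t in transactions:
        ext = generalize(t, taxonomy)
        if all(i in ext for i in itemset):
            total += sum(ext[i] for i in itemset) / len(itemset)
    return total / len(transactions)


taxonomy = {"amoxicillin": "antibiotic", "azithromycin": "antibiotic",
            "hba1c": "blood test", "glucose": "blood test"}
visits = [
    {"amoxicillin": 1.0, "hba1c": 0.5},
    {"azithromycin": 0.5, "glucose": 1.0},
    {"glucose": 0.5},
]
print(weighted_support(["antibiotic", "blood test"], visits, taxonomy))  # 0.5
print(weighted_support(["amoxicillin", "hba1c"], visits, taxonomy))      # 0.25
```

The generalized pair reaches twice the weighted support of the specific pair, which mirrors why multiple-level patterns can surface associations that are too sparse to notice at the level of individual drugs and examinations.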

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino