Towards a Domain-Specific Comparative Analysis of Data Mining Tools
Advancement in technology has brought widespread adoption and utilization of data mining tools. Successful implementation of data mining requires a careful assessment of the various data mining tools. Although several works have compared data mining tools based on usability, open-source availability, integration for statistical analysis, big/small scale, and data visualization, none of them has suggested tools for specific industry sectors. This paper provides a comparative study of various data mining tools based on popularity and usage among industry sectors such as business, education, and healthcare. The factors used in the comparison are performance and scalability, data access, data preparation, data exploration and visualization, advanced modeling capabilities, programming language, operating system, interfaces, ease of use, and price/license. The following popular data mining tools are assessed: SAS Enterprise Miner, KNIME, and R for business; Moodle Learning Analytics, Blackboard Analytics, and Canvas for education; and RapidMiner, IBM Watson Health, and Tableau for healthcare. The paper also discusses the critical issues and challenges associated with the adoption of data mining tools. Furthermore, it suggests possible solutions to help industries choose the data mining tool that best covers their respective requirements.
On data integration workflows for an effective management of multidimensional petroleum digital ecosystems in Arabian Gulf Basins
Data integration of multiple heterogeneous datasets from multidimensional petroleum digital ecosystems is an effective way of extracting information and adding value to the knowledge domain across multiple producing onshore and offshore basins. At present, data from multiple basins are scattered and unusable for integration because of scale and format differences. Ontology-based warehousing and mining modeling are recommended for resolving the scaling and formatting issues of multidimensional datasets; seismic and well-domain datasets are described as cases in point. Issues such as semantics among different data dimensions and their associated attributes are also addressed by ontology modeling. Intelligent relationships are built among several petroleum system domains (structure, reservoir, source, and seal, for example) at global scale, facilitating the integration process among multiple dimensions in a data warehouse environment. For this purpose, integrated workflows are designed for capturing and modeling unknown relationships among petroleum system data attributes in interpretable knowledge domains. This study presents an effective approach to mining and interpreting data views drawn from warehoused exploration and production metadata, with special reference to Arabian onshore and offshore basins.
Quantifying discrepancies in opinion spectra from online and offline networks
Online social media such as Twitter are widely used for mining public opinions and sentiments on various issues and topics. The sheer volume of the data generated and the eager adoption by the online-savvy public are helping to raise the profile of online media as a convenient source of news and public opinions on social and political issues as well. Due to the uncontrollable biases in the population who heavily use the media, however, it is often difficult to measure how accurately the online sphere reflects the offline world at large, undermining the usefulness of online media. One way of identifying and overcoming the online-offline discrepancies is to apply a common analytical and modeling framework to comparable data sets from online and offline sources and to cross-analyze the patterns found therein. In this paper we study the political spectra constructed from Twitter and from legislators' voting records as an example to demonstrate the potential limits of online media as a source for accurate public opinion mining.
Mining Public Opinion about Economic Issues: Twitter and the U.S. Presidential Election
Opinion polls have been the bridge between public opinion and politicians in elections. However, developing surveys to disclose people's feedback with respect to economic issues is limited, expensive, and time-consuming. In recent years, social media such as Twitter have enabled people to share their opinions regarding elections and have provided a platform for collecting large amounts of social media data. This paper proposes a computational public opinion mining approach to explore the discussion of economic issues in social media during an election. Related studies use text mining methods independently for election analysis and prediction; this research combines two text mining methods: sentiment analysis and topic modeling. The proposed approach has been effectively deployed on millions of tweets to analyze the economic concerns of people during the 2012 US presidential election.
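The combined pipeline (sentiment analysis plus topic assignment per tweet) can be sketched with a toy lexicon-based version; the lexicon, topic keywords, and sample tweets below are illustrative assumptions, not the paper's data or models:

```python
from collections import Counter

# Tiny illustrative sentiment lexicon (an assumption, not the paper's lexicon)
POS = {"growth", "jobs", "recovery", "strong"}
NEG = {"debt", "unemployment", "crisis", "weak"}

# Illustrative economic topic keyword sets (also assumptions)
TOPICS = {
    "employment": {"jobs", "unemployment", "hiring"},
    "fiscal": {"debt", "deficit", "taxes"},
}

def sentiment(tokens):
    """Net sentiment: +1 per positive word, -1 per negative word."""
    return sum((t in POS) - (t in NEG) for t in tokens)

def topic(tokens):
    """Assign the topic whose keyword set overlaps the tweet most."""
    scores = {name: len(kw & set(tokens)) for name, kw in TOPICS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

def mine(tweets):
    """Aggregate net sentiment per topic over a tweet stream."""
    by_topic = Counter()
    for text in tweets:
        tokens = text.lower().split()
        by_topic[topic(tokens)] += sentiment(tokens)
    return dict(by_topic)

tweets = [
    "strong jobs growth this quarter",
    "debt crisis looms over the deficit",
]
print(mine(tweets))  # → {'employment': 3, 'fiscal': -2}
```

A real deployment would replace the hand-built lexicon with a trained sentiment classifier and the keyword overlap with a topic model such as LDA, but the aggregation step is the same.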
Conservation science in NOAA’s National Marine Sanctuaries: description and recent accomplishments
This report describes cases relating to the management of national marine sanctuaries in which certain scientific information was required so managers could make decisions that effectively protected trust resources. The cases presented represent only a fraction of difficult issues that marine sanctuary managers deal with daily. They include, among others, problems related to wildlife disturbance, vessel routing, marine reserve placement, watershed management, oil spill response, and habitat restoration. Scientific approaches to address these problems vary significantly, and include literature surveys, data mining, field studies (monitoring, mapping, observations, and measurement), geospatial and biogeographic analysis, and modeling. In most cases there is also an element of expert consultation and collaboration among multiple partners, agencies with resource protection responsibilities, and other users and stakeholders. The resulting management responses may involve direct intervention (e.g., for spill response or habitat restoration issues), proposal of boundary alternatives for marine sanctuaries or reserves, changes in agency policy or regulations, making recommendations to other agencies with resource protection responsibilities, proposing changes to international or domestic shipping rules, or development of new education or outreach programs. (PDF contains 37 pages.)
Challenging Issues of Spatio-Temporal Data Mining
The spatio-temporal database (STDB) has received considerable attention during the past few years, due to the emergence of numerous applications (e.g., flight control systems, weather forecasting, mobile computing) that demand efficient management of moving objects. These applications record objects' geographical locations (sometimes also shapes) at various timestamps and support queries that explore their historical and future (predictive) behaviors. The STDB significantly extends the traditional spatial database, which deals only with stationary data and hence is inapplicable to moving objects, whose dynamic behavior requires re-investigation of numerous topics including data modeling, indexes, and the related query algorithms. In many application areas, huge amounts of data are generated that explicitly or implicitly contain spatial or spatio-temporal information. However, the ability to analyze these data remains inadequate, and the need for adapted data mining tools is a major challenge. In this paper, we present the challenging issues of spatio-temporal data mining.
Keywords: database, data mining, spatial, temporal, spatio-temporal
Examining Granular Computing from a Modeling Perspective
In this paper, we use a set of unified components to conduct granular modeling for problem solving paradigms in several fields of computing. Each identified component may represent a potential research direction in the field of granular computing. A granular computing model for information analysis is proposed. The model may suggest that granular computing is an instrument for implementing perception based computing based on numeric computing. In addition, a novel granular language modeling technique is proposed for information extraction from web pages. This paper also suggests that the study of data mining in the framework of granular computing may address the issues of interpretability and usage of discovered patterns
Analysis of WEKA data mining algorithms Bayes net, random forest, MLP and SMO for heart disease prediction system: A case study in Iraq
Data mining is defined as a search through large amounts of data for valuable information. Association rules, grouping, clustering, prediction, and sequence modeling are among the most essential and general strategies for data extraction. The processing of data plays a major role in disease detection in the healthcare industry. A variety of examinations is normally required to diagnose a patient; using data mining strategies, however, the number of examinations can be decreased. This reduction is crucial in terms of time and results. Heart disease is a life-threatening disorder, and the healthcare burden it imposes is immense given the prevalence of such conditions and the variety of situations in which they occur. Today, the hidden information in healthcare data is important for decision making. For the prediction of cardiovascular problems, the Weka 3.8.3 tool is used in this analysis to evaluate data mining algorithms: sequential minimal optimization (SMO), multilayer perceptron (MLP), random forest, and Bayes net. The results combine the prediction accuracy, the receiver operating characteristic (ROC) curve, and the PRC value. Bayes net (94.5%) and random forest (94%) show better performance than the SMO and MLP methods.
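A comparable experiment can be sketched outside Weka with scikit-learn; the synthetic dataset below is a stand-in (the study's Iraqi clinical data is not reproduced here), so the scores will differ from the reported 94-94.5%:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the heart-disease data; 13 features echo the
# attribute count of common public heart datasets (an assumption).
X, y = make_classification(n_samples=500, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Two of the four algorithms compared in the study (their Weka
# counterparts are RandomForest and MultilayerPerceptron).
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "mlp": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # ROC AUC mirrors the "ROC area" metric Weka reports per classifier.
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: ROC AUC = {auc:.3f}")
```

Weka's SMO and BayesNet have scikit-learn analogues (`SVC` with a linear kernel, and naive Bayes variants) that could be added to the same loop.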
RANDOMIZATION BASED PRIVACY PRESERVING CATEGORICAL DATA ANALYSIS
The success of data mining relies on the availability of high quality data. To ensure quality data mining, effective information sharing between organizations becomes a vital requirement in today's society. Since data mining often involves sensitive information of individuals, the public has expressed a deep concern about their privacy. Privacy-preserving data mining is a study of eliminating privacy threats while, at the same time, preserving useful information in the released data for data mining.
This dissertation investigates data utility and privacy of randomization-based models in privacy preserving data mining for categorical data. For the analysis of data utility in the randomization model, we first investigate the accuracy analysis for association rule mining in market basket data. Then we propose a general framework to conduct theoretical analysis on how the randomization process affects the accuracy of various measures adopted in categorical data analysis.
We also examine data utility when randomization mechanisms are not provided to data miners to achieve better privacy. We investigate how various objective association measures between two variables may be affected by randomization. We then extend it to multiple variables by examining the feasibility of hierarchical loglinear modeling. Our results provide a reference to data miners about what they can and cannot do with certainty upon randomized data directly, without knowledge about the original distribution of the data and the distortion information.
Data privacy and data utility are commonly considered as a pair of conflicting requirements in privacy preserving data mining applications. In this dissertation, we investigate privacy issues in randomization models. In particular, we focus on attribute disclosure under linking attack in data publishing. We propose efficient solutions to determine optimal distortion parameters such that we can maximize utility preservation while still satisfying privacy requirements. We compare our randomization approach with l-diversity and anatomy in terms of utility preservation (under the same privacy requirements) from three aspects (reconstructed distributions, accuracy of answering queries, and preservation of correlations). Our empirical results show that randomization incurs significantly smaller utility loss.
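One common randomization mechanism for categorical data works along these lines (a minimal sketch; the dissertation's actual distortion model and parameter optimization are not reproduced here): each value is kept with probability p and otherwise replaced by a uniformly random category, and the true category proportions are reconstructed from the perturbed reports:

```python
import random
from collections import Counter

def randomize(value, categories, p, rng):
    """Keep the true category with probability p; otherwise report a
    category drawn uniformly at random (possibly the true one)."""
    return value if rng.random() < p else rng.choice(categories)

def estimate_distribution(reported, categories, p):
    """Unbiased reconstruction of true proportions: since
    P(report = c) = p*pi_c + (1-p)/k for k categories,
    pi_c = (observed_c - (1-p)/k) / p."""
    k = len(categories)
    n = len(reported)
    obs = Counter(reported)
    return {c: (obs[c] / n - (1 - p) / k) / p for c in categories}

rng = random.Random(42)  # seeded for reproducibility
categories = ["A", "B", "C"]
# Hypothetical true data: 60% A, 30% B, 10% C
true_data = ["A"] * 6000 + ["B"] * 3000 + ["C"] * 1000
reported = [randomize(v, categories, p=0.7, rng=rng) for v in true_data]
est = estimate_distribution(reported, categories, p=0.7)
print({c: round(v, 2) for c, v in est.items()})  # close to {'A': 0.6, 'B': 0.3, 'C': 0.1}
```

The utility/privacy trade-off the dissertation studies shows up directly in p: larger p preserves the distribution more accurately but reveals more about individual records.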
Application of Data Mining in Air Traffic Forecasting
The main goal of the study centers on developing a model for air traffic forecasting using off-the-shelf data mining and machine learning techniques. Although data-driven modeling has been extensively applied in the aviation sector, little research has been done in the area of air traffic forecasting. This study is inspired by previous research focused on improving the Federal Aviation Administration (FAA) Terminal Area Forecast (TAF) methodology, which historically assumed that the US air transportation system (ATS) network structure was static. Recent developments use data mining algorithms to predict the likelihood of previously un-connected airport-pairs becoming connected in the future, and the likelihood of connected airport-pairs becoming un-connected. Despite the innovation of this research, it does not focus on improving the FAA's existing methodology for forecasting future air traffic levels on existing routes, which is based on relatively simple regression and growth models. We investigate different approaches for improving and developing new features within the existing data mining applications in air traffic forecasting. We focus particularly on predicting detailed traffic information for the US ATS. Initially, a 2-stage log-log model is applied to establish the significance of different inputs and to identify issues of endogeneity and multicollinearity, while maintaining the simplicity of current models. Although the model shows high goodness of fit, it tested positive for both of these issues and also presented problems with causality. To address these issues, a 3-stage model that is under development is introduced. This model employs logistic regression and discrete choice modeling. As part of future work, machine learning techniques such as clustering and neural networks will be applied to improve this model's performance.
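The log-log stage can be illustrated with a minimal ordinary-least-squares fit in log space; the single predictor and the 0.8 elasticity in the synthetic data below are hypothetical assumptions, not the study's actual covariates or estimates:

```python
import math

def fit_loglog(x, y):
    """Fit log(y) = a + b*log(x) by closed-form OLS; b is the elasticity
    of y with respect to x (a constant in a log-log specification)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / \
        sum((u - mx) ** 2 for u in lx)
    a = my - b * mx
    return a, b

# Synthetic illustration: traffic generated as 3 * pop**0.8,
# so the fit should recover an elasticity of 0.8 exactly.
pops = [1e5, 5e5, 1e6, 5e6, 1e7]
traffic = [3.0 * p ** 0.8 for p in pops]
a, b = fit_loglog(pops, traffic)
print(round(b, 3))  # → 0.8
```

Endogeneity and multicollinearity, the issues the study diagnoses, arise when predictors are driven by the outcome or by each other; OLS coefficients like b then stop being interpretable as causal elasticities, which motivates the 3-stage model.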