1,341 research outputs found

Data Mining

Publication venue: 'IntechOpen'
Publication date: 27/07/2022

The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining.

Directory of Open Access Books (DOAB)

Fraud detection for online banking for scalable and distributed data

Author: Haq Ikram
Publication venue: 'Federation University Australia'
Publication date: 01/01/2020

Online fraud causes billions of dollars in losses for banks, which makes online banking fraud detection an important field of study. However, research in fraud detection faces many challenges. One constraint is that bank datasets are often unavailable for research, or the data lack attributes with the required characteristics. Numeric data usually yield better performance from machine learning algorithms, but most transaction data also have categorical or nominal features. Moreover, some platforms, such as Apache Spark, recognize only numeric data. Techniques such as one-hot encoding (OHE) are therefore needed to transform categorical features into numerical features. However, OHE has its own challenges, including the sparseness of the transformed data and the fact that the distinct values of an attribute are not always known in advance. Efficient feature engineering can improve an algorithm’s performance but usually requires detailed domain knowledge to identify the correct features. Techniques like Ripple Down Rules (RDR) are suitable for fraud detection because of their low maintenance and incremental learning features. However, achieving high classification accuracy on mixed datasets, especially for scalable data, is challenging. Evaluating RDR on distributed platforms is also challenging because it is not available on these platforms. The thesis proposes the following solutions to these challenges:
• We developed a technique, Highly Correlated Rule Based Uniformly Distribution (HCRUD), to generate highly correlated, rule-based, uniformly distributed synthetic data.
• We developed a technique, One-hot Encoded Extended Compact (OHE-EC), to transform categorical features into numeric features by compacting sparse data even if not all distinct values are known.
• We developed a technique, Feature Engineering and Compact Unified Expressions (FECUE), to improve model efficiency through feature engineering where the domain of the data is not known in advance.
• A Unified Expression RDR fraud deduction technique (UE-RDR) for Big data has been proposed and evaluated on the Spark platform.
Empirical tests were executed on a multi-node Hadoop cluster using well-known classifiers on bank data, synthetic bank datasets and publicly available datasets from the UCI repository. These evaluations demonstrated substantial improvements in terms of classification accuracy, ruleset compactness and execution speed.
Doctor of Philosophy
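
The two OHE difficulties named in the abstract above (sparse output and category values that are not known in advance) can be illustrated with a minimal sketch. This is plain one-hot encoding with one reserved slot for unseen values, not the thesis's OHE-EC technique; the function names and the `channels` sample data are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def fit_vocabulary(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category seen while fitting to a column index."""
    vocab: Dict[str, int] = {}
    for value in values:
        if value not in vocab:
            vocab[value] = len(vocab)
    return vocab


def one_hot(value: str, vocab: Dict[str, int]) -> List[int]:
    """Dense one-hot vector; the extra last slot catches unseen values."""
    vector = [0] * (len(vocab) + 1)
    vector[vocab.get(value, len(vocab))] = 1
    return vector


def one_hot_index(value: str, vocab: Dict[str, int]) -> int:
    """Compact form: keep only the position of the single non-zero entry."""
    return vocab.get(value, len(vocab))


# Hypothetical categorical feature (assumed sample data, not from the thesis).
channels = ["web", "mobile", "atm", "web"]
vocab = fit_vocabulary(channels)

encoded_known = one_hot("mobile", vocab)
encoded_unseen = one_hot("branch", vocab)  # "branch" was never seen in fitting
compact_known = one_hot_index("atm", vocab)
```

Every dense vector holds a single 1 among zeros, which is the sparseness the abstract refers to; storing only the index, as `one_hot_index` does, is one simple way to avoid materializing the zeros.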

Federation ResearchOnline

Human-competitive automatic topic indexing

Author: Medelyan Olena
Publication venue: The University of Waikato
Publication date: 01/01/2009

Topic indexing is the task of identifying the main topics covered by a document. These topics are useful for many purposes: as subject headings in libraries, as keywords in academic publications and as tags on the web. Knowing a document's topics helps people judge its relevance quickly. However, assigning topics manually is labor intensive. This thesis shows how to generate them automatically in a way that competes with human performance. Three kinds of indexing are investigated: term assignment, a task commonly performed by librarians, who select topics from a controlled vocabulary; tagging, a popular activity of web users, who choose topics freely; and a new method of keyphrase extraction, where topics are equated to Wikipedia article names. A general two-stage algorithm is introduced that first selects candidate topics and then ranks them by significance based on their properties. These properties draw on statistical, semantic, domain-specific and encyclopedic knowledge. They are combined using a machine learning algorithm that models human indexing behavior from examples. This approach is evaluated by comparing automatically generated topics to those assigned by professional indexers and by amateurs. We claim that the algorithm is human-competitive because it chooses topics that are as consistent with those assigned by humans as their topics are with each other. The approach is generalizable, requires little training data and applies across different domains and languages.
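
The two-stage structure described in the abstract above (select candidates, then rank them by their properties) might look roughly like the sketch below. It is a toy under stated assumptions: candidates are word n-grams, and the ranking uses a fixed hand-written score (frequency weighted by first position) in place of the learned model the thesis describes. The stopword list, function names and sample text are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Tiny illustrative stopword list (an assumption, not taken from the thesis).
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def select_candidates(tokens: List[str], max_len: int = 2) -> List[Tuple[str, int]]:
    """Stage 1: n-grams (with start position) not bounded by a stopword."""
    found = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            found.append((" ".join(gram), i))
    return found


def rank_topics(text: str, top_k: int = 3) -> List[str]:
    """Stage 2: score = frequency * (1 - relative first position)."""
    tokens = tokenize(text)
    found = select_candidates(tokens)
    freq = Counter(phrase for phrase, _ in found)
    first_pos: Dict[str, int] = {}
    for phrase, pos in found:
        first_pos.setdefault(phrase, pos)  # keeps the earliest position
    total = max(len(tokens), 1)
    scores = {p: freq[p] * (1.0 - first_pos[p] / total) for p in freq}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


doc = "Topic indexing assigns topics. Topic indexing helps readers find documents."
top_topics = rank_topics(doc)
```

Replacing the fixed score in `rank_topics` with a classifier trained on human-indexed examples would bring the sketch closer to the approach the abstract outlines.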

Research Commons@Waikato

CERN Document Server

Integrating sensors data in optimization methods for sustainable urban logistic

Author: FADDA EDOARDO
Publication venue: country:Italy
Publication date: 20/03/2018

The abstract is in the attachment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Efficiency of Hybrid Algorithms for Estimating the Shear Strength of Deep Reinforced Concrete Beams

Author: Barkhordari Mohammad Sadegh; Feng De-Cheng; Tehranizadeh Mohsen
Publication venue: 'Periodica Polytechnica Budapest University of Technology and Economics'
Publication date: 30/03/2022

Earthquakes that have occurred in recent years have highlighted the need to examine the strength of reinforced concrete (RC) members. RC beams are one of the elements of reinforced concrete structures. Due to the dramatic increase in the population and in the number of medium- and high-rise buildings, in recent years the beams of buildings have been mainly designed and built as deep beams. In this study, an artificial neural network (ANN) combined with optimization algorithms, including particle swarm optimization (PSO), the Archimedes optimization algorithm (AOA), and the sparrow search algorithm (SSA), is used to determine the shear strength of reinforced concrete deep (RCD) beams. A total of 271 samples from experimental tests are employed to develop the algorithms. The results of this study are compared with design code equations and previous research. The comparison shows that the PSO-ANN algorithm is more accurate than previous methods. Finally, the SHapley Additive exPlanations (SHAP) method is used to explain the predictions. SHAP reveals that the beam span and the ratio of the beam span to beam depth have the highest impact on the predicted shear strength.

Periodica Polytechnica (Budapest University of Technology and Economics)

Natural Language Processing in-and-for Design Research

Author: Blessing Lucienne T. M.; Luo Jianxi; Siddharth L
Publication date: 27/11/2021

We review the scholarly contributions that utilise Natural Language Processing (NLP) methods to support the design process. Using a heuristic approach, we collected 223 articles published in 32 journals between 1991 and the present. We present state-of-the-art NLP in-and-for design research by reviewing these articles according to the type of natural language text source: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. After summarizing these contributions and identifying their gaps, we utilise an existing design innovation framework to identify the applications that are currently supported by NLP. We then propose a few methodological and theoretical directions for future NLP in-and-for design research.

arXiv.org e-Print Archive

New Fundamental Technologies in Data Mining

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

The progress of data mining technology and its wide public popularity have established a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Besides supporting a deep understanding of each section, the two books present useful hints and strategies for solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

Directory of Open Access Books (DOAB)

Review of feature selection techniques in Parkinson's disease using OCT-imaging data

Author: Reyero Lobo Paula
Publication date: 09/06/2020

Several spectral-domain optical coherence tomography (OCT) studies have reported thinning of the macular region of the retina in Parkinson’s disease. Yet the implication of retinal thinning for visual disability is still unclear. Macular scans acquired from patients with Parkinson’s disease (n = 100) and a control group (n = 248) were used to train several supervised classification models. The goal was to determine the most relevant retinal layers and regions for diagnosis, for which univariate and multivariate filter and wrapper feature selection methods were used. In addition, we evaluated the classification ability of the patient group to assess the applicability of OCT measurements as a biomarker of the disease.

Archivo Digital para la Docencia y la Investigación

Applied Metaheuristic Computing

Publication venue: 'MDPI AG'
Publication date: 06/12/2022

For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, and facility layout planning, among others. This is partly because the classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality that impairs the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications which drive the advances of AMC.

Directory of Open Access Books (DOAB)

A survey on pre-processing techniques: relevant issues in the context of environmental data mining

Author: Gibert Karina; Izquierdo Joaquín; Sànchez-Marrè Miquel
Publication venue: 'IOS Press'
Publication date: 01/01/2016

One of the important issues in all types of data analysis, whether statistical data analysis, machine learning, data mining, data science or any other form of data-driven modeling, is data quality. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Models built on incorrect or incomplete data will be useless. As a consequence, the quality of the decisions based on these models also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not been properly systematized yet, and little research focuses on it. This paper presents a survey of the most popular pre-processing steps required in environmental data analysis, together with a proposal to systematize them. Rather than providing technical details on specific pre-processing techniques, the paper focuses on providing general ideas to non-expert users, who, after reading them, can decide which technique is most suitable for solving their problem.
Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

RiuNet