878 research outputs found

Attribute Selection Algorithm with Clustering based Optimization Approach based on Mean and Similarity Distance

Author: Kaseebhotla Rajasekhar
Rao K. Raghava
Rao Mallikarjuna
Publication venue: Auricle Global Society of Education and Research
Publication date: 18/08/2023

High-dimensional data, with hundreds or thousands of attributes, imposes a heavy computational workload. Attributes that have no meaningful influence on class predictions during classification add to this load. This article aims to use attribute selection to reduce the dimensionality of such data and so lessen the computational load, while ensuring that the selected attribute subset covers all attributes. The process therefore has two stages: filtering out superfluous attributes, and choosing a single representative attribute for each group of similar, redundant attributes. Numerous studies on attribute selection, including backward and forward selection, have been undertaken. Based on the experiments and the resulting classification accuracy, this work recommends attribute selection using k-means-based particle swarm optimization (PSO) clustering. Related attributes are likely to fall in the same cluster, while irrelevant attributes are not assigned to any cluster. The Credit Approval, Ionosphere, Annealing, Madelon, Isolet, and Multiple Attributes datasets are used, alongside two other high-dimensional datasets. Every dataset includes a class label for each data point. Our experiments show that k-means clustering can be used for attribute selection to obtain a subset of attributes, and that doing so yields classification accuracy above 80%.

International Journal on Recent and Innovation Trends in Computing and Communication
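The two-stage idea in the abstract above (group similar attributes, then keep one representative per group) can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes plain Lloyd's k-means over z-scored attribute columns in place of the PSO-optimized clustering, which the abstract does not specify, and the function names and the farthest-point initialization are hypothetical choices. Each attribute (column) becomes one point, and the member nearest its cluster mean is kept, loosely echoing the "mean and similarity distance" in the title.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def standardize(column: Sequence[float]) -> Vector:
    """Z-score one attribute so attributes on different scales are comparable."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        return [0.0] * n  # a constant attribute carries no class information
    return [(v - mean) / std for v in column]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points: List[Vector], k: int) -> List[Vector]:
    """Deterministic farthest-point initialization (hypothetical, chosen for reproducibility)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd's k-means: alternate assignment and mean-update steps."""
    centroids = init_centroids(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_attributes(rows: List[Sequence[float]], k: int) -> List[int]:
    """Cluster the attributes (columns) and keep the one nearest each cluster mean."""
    points = [standardize(col) for col in zip(*rows)]  # one point per attribute
    labels, centroids = kmeans(points, k)
    selected = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            selected.append(min(members, key=lambda i: sq_dist(points[i], centroids[c])))
    return sorted(selected)


# Usage: attribute 1 is 2*attr0 + 1 and attribute 3 is 3*attr2 - 2, so each pair is redundant.
rows = [[1, 3, 2, 4], [2, 5, 5, 13], [3, 7, 1, 1], [4, 9, 4, 10], [5, 11, 3, 7]]
print(select_attributes(rows, k=2))  # one index from {0, 1} and one from {2, 3}
```

For z-scored columns, the squared Euclidean distance between two attributes equals 2n(1 - r), where r is their Pearson correlation, so highly correlated attributes land in the same cluster. Strongly anti-correlated attributes end up far apart under this distance even though they are equally redundant; clustering on absolute correlation would be one way to handle that.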

Homophily-Related: Adaptive Hybrid Graph Filter for Multi-View Graph Clustering

Author: Chen Jianpeng
Hao Zhifeng
He Lifang
Ling Yawen
Pu Xiaorong
Ren Yazhou
Wen Zichen
Wu Tianyi
Publication date: 05/01/2024

Graph data has recently attracted growing attention, and multi-view graph clustering has become a popular research topic. Most existing methods apply only to homophilous graphs, yet much real-world graph data hardly satisfies the homophily assumption that connected nodes tend to belong to the same class. Several studies have pointed out that the poor performance on heterophilous graphs stems from the fact that conventional graph neural networks (GNNs), which are essentially low-pass filters, discard all but the low-frequency information on the graph. On certain graphs, particularly heterophilous ones, neglecting high-frequency information and focusing solely on low-frequency information impedes the learning of node representations. To break this limitation, our motivation is to perform graph filtering that is closely tied to the homophily degree of the given graph, with the aim of fully leveraging both low-frequency and high-frequency signals to learn distinguishable node embeddings. In this work, we propose the Adaptive Hybrid Graph Filter for Multi-View Graph Clustering (AHGFC). Specifically, a graph joint process and a graph joint aggregation matrix are first designed using the intrinsic node features and adjacency relationships, which makes the low- and high-frequency signals on the graph more distinguishable. We then design an adaptive hybrid graph filter, tied to the homophily degree, that learns node embeddings based on the graph joint aggregation matrix. After that, the node embeddings of all views are weighted and fused into a consensus embedding for the downstream task. Experimental results show that the proposed model performs well on six datasets containing both homophilous and heterophilous graphs. Comment: Accepted by AAAI202

arXiv.org e-Print Archive
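To make the low-pass/high-pass distinction in the AHGFC abstract concrete, here is a minimal sketch of a homophily-weighted hybrid filter. It is not AHGFC: it assumes the edge homophily ratio h is computed from known node labels (an unsupervised clustering method would have to estimate it), uses the standard symmetrically normalized adjacency with self-loops, A_hat, as the low-pass filter and its complement I - A_hat as the high-pass filter, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Edge = Tuple[int, int]


def edge_homophily(edges: Sequence[Edge], labels: Sequence[int]) -> float:
    """Fraction of edges whose two endpoints share a class label (1.0 = fully homophilous)."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def normalized_adjacency(n: int, edges: Sequence[Edge]) -> Matrix:
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, the usual GCN-style low-pass propagation matrix."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0  # undirected graph
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Dense matrix product of x (r x m) and y (m x c)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def hybrid_filter(n: int, edges: Sequence[Edge], features: Matrix, h: float) -> Matrix:
    """h * (A_hat X) + (1 - h) * ((I - A_hat) X): smooth when homophilous, sharpen otherwise."""
    a_hat = normalized_adjacency(n, edges)
    low = matmul(a_hat, features)  # low-pass: averages each node with its neighbours
    cols = len(features[0])
    high = [[features[i][j] - low[i][j] for j in range(cols)] for i in range(n)]  # (I - A_hat) X
    return [[h * low[i][j] + (1 - h) * high[i][j] for j in range(cols)] for i in range(n)]


# Usage: a 4-node path graph whose two halves carry different labels.
edges = [(0, 1), (1, 2), (2, 3)]
h = edge_homophily(edges, [0, 0, 1, 1])  # 2 of 3 edges are intra-class
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
Z = hybrid_filter(4, edges, X, h)
```

At h = 1 the filter reduces to ordinary low-pass smoothing, and at h = 0 it keeps only the high-frequency differences between a node and its neighbourhood, which is the intuition behind adapting the filter to the homophily degree.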

Knowledge Extraction From PV Power Generation With Deep Learning Autoencoder and Clustering-Based Algorithms

Author: Brenna M.
Longo M.
Miraftabzadeh S.
Publication date: 01/01/2023

The unpredictable nature of photovoltaic (PV) solar power generation, caused by changing weather conditions, creates challenges for grid operators as they work to balance supply and demand. As solar power becomes a larger part of the energy mix, managing this intermittency will be increasingly important. This paper uses unsupervised learning algorithms to identify daily PV power production patterns and thereby gain new knowledge of generation patterns throughout the year. The proposed data-driven model extracts typical daily PV power generation patterns by transforming the high-dimensional temporal features of the daily PV power output into a lower-dimensional latent feature space learned by a deep learning autoencoder. The Partitioning Around Medoids (PAM) clustering algorithm is then employed to identify six distinct dominant patterns. Finally, a new algorithm is proposed to reconstruct these patterns in their original subspace. The proposed model is applied to two distinct datasets for further analysis. The results indicate that four of the patterns identified in both datasets exhibit high correlation (over 95%) and similar temporal trends. These patterns correspond to distinct weather conditions, namely entirely sunny, mostly sunny, cloudy, and negligible-generation days, which were observed during approximately 61% of the analyzed period. These typical patterns can be expected to occur in other locations as well. The identified PV power generation patterns can improve forecasting models, optimize energy management systems, and help implement and efficiently schedule energy storage or demand response programs.

Archivio istituzionale della ricerca - Politecnico di Milano
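The PAM step in the PV abstract above can be sketched with the classic BUILD and SWAP phases. This minimal version assumes the autoencoder's latent vectors are already available as plain lists of floats (the autoencoder itself is not shown), uses Euclidean distance, and evaluates every medoid/non-medoid swap exhaustively, which is fine for a few hundred days of data but slow for large inputs. The names `pam` and `total_cost` are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(dist: List[List[float]], medoids: List[int]) -> float:
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[int], float]:
    """Partitioning Around Medoids: greedy BUILD, then best-improvement SWAP."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]  # precomputed distance matrix

    # BUILD: add, one at a time, the point that lowers the total cost the most.
    medoids: List[int] = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda i: total_cost(dist, medoids + [i])))
    cost = total_cost(dist, medoids)

    # SWAP: try every (medoid, non-medoid) exchange and apply the best one if it helps.
    for _ in range(max_iter):
        best_trial, best_cost = None, cost
        for pos in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:pos] + [o] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < best_cost - 1e-12:
                    best_trial, best_cost = trial, trial_cost
        if best_trial is None:
            break  # no swap lowers the cost: local optimum reached
        medoids, cost = best_trial, best_cost

    # Each point joins the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels, cost


# Usage with toy 2-D "latent" vectors standing in for encoded daily profiles.
latent = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
medoids, labels, cost = pam(latent, k=2)
```

Unlike k-means centroids, each PAM medoid is an actual observed day, which is one reason medoid-based clustering is attractive when the representative patterns must later be mapped back and interpreted.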

Automatic assistants for database exploration

Author: Sellam T.H.J. (Thibault)
Publication date: 03/11/2016

CWI's Institutional Repository

An overview of clustering methods with guidelines for application in mental health research

Author: Gao Caroline X.
Publication venue: Universidad de Granada
Publication date: 27/05/2023

Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and their increasing popularity, there is little guidance on model choice, analytical framework, and reporting requirements. In this paper, we aim to address this gap by introducing the philosophy, design, advantages and disadvantages, and implementation of the major algorithms that are particularly relevant to mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles, are subsequently introduced. We then discuss how to choose algorithms to address common issues, as well as methods for pre-clustering data processing and for clustering evaluation and validation. Importantly, we also provide general guidance on the clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.

Repositorio Institucional Universidad de Granada
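As one example of the clustering evaluation methods mentioned in the overview above, the silhouette coefficient compares each point's mean distance to the rest of its own cluster (a) with its mean distance to the nearest other cluster (b): s = (b - a) / max(a, b). The sketch below is a plain-Python illustration, not code from the paper (which points readers to R); in R, the `cluster` package's `silhouette()` function computes the same per-point quantity. Points in singleton clusters get s = 0, following Rousseeuw's convention; the function name here is illustrative.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points: List[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient; values near 1 indicate compact, well-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i: int, cluster: int) -> float:
        """Mean distance from point i to the other members of `cluster`."""
        ds = [euclidean(points[i], q)
              for j, (q, lab) in enumerate(zip(points, labels)) if lab == cluster and j != i]
        return sum(ds) / len(ds)

    scores = []
    for i, own in enumerate(labels):
        if list(labels).count(own) == 1:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, c) for c in clusters if c != own)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Usage: a sensible labeling scores near 1, a scrambled one goes negative.
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
```

Computing the mean silhouette across several candidate numbers of clusters and preferring the highest is a common, if heuristic, way to support the model-choice decisions the paper discusses.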

Statistical Data Modeling and Machine Learning with Applications

Publication venue: MDPI AG
Publication date: 11/01/2022

The modeling and processing of empirical data is one of the main subjects and goals of statistics. Nowadays, with the development of computer science, these goals have expanded to include extracting useful and often hidden information and patterns from data sets of varying volume and from complex data sets stored in data warehouses. New and powerful statistical techniques based on machine learning (ML) and data mining paradigms have been developed. To one degree or another, all of these techniques and algorithms rest on a rigorous mathematical basis, including probability theory and mathematical statistics, operations research, mathematical analysis, and numerical methods. Popular ML methods, such as artificial neural networks (ANN), support vector machines (SVM), decision trees, and random forests (RF), have produced models that can be regarded as straightforward applications of optimization theory and statistical estimation. The wide arsenal of classical statistical approaches, combined with powerful ML techniques, makes it possible to solve many challenging practical problems. This Special Issue belongs to the section “Mathematics and Computer Science”. Its aim is to assemble a concise collection of carefully selected papers presenting new and original methods, data analyses, case studies, comparative studies, and other research on statistical data modeling and ML, as well as their applications. Particular attention is given, though not exclusively, to theories and applications in diverse areas such as computer science, medicine, engineering, banking, education, sociology, and economics. The resulting palette of methods, algorithms, and applications for statistical modeling and ML presented in this Special Issue is expected to contribute to the further development of research in this area.
We also believe that the new knowledge acquired here, as well as the applied results, will be attractive and useful for young scientists, doctoral students, and researchers from various scientific specialties.

Directory of Open Access Books (DOAB)

DivClust: Controlling Diversity in Deep Clustering

Author: Metaxas IM
Patras I
Tzimiropoulos G
Publication date: 03/04/2023

Queen Mary Research Online

Recommended from our members

A Machine Learning Approach: Socio-economic Analysis to Support and Identify Resilient Analog Communities in Texas

Author: Mabadeje Ademide O.
Publication date: 26/08/2022

Identifying analog resources or items is important during the planning and development of new communities because available information is usually limited or absent. Conventionally, analogs are identified by domain experts; however, such expertise is not always readily available. Compounding this challenge, most of the available data on socioeconomic systems is high-dimensional, which makes these datasets difficult to interpret and visualize. Hence, it is crucial to adopt a workflow that can identify analogs regardless of the data's high dimensionality. To this end, we present two systematic and unbiased measures, the group similarity score (GCS) and the similarity scoring metric (SSM), to support the predictive search for missing properties of target communities and the identification of analogous cities based on available socioeconomic data and modeling. Since each Texan community can be characterized by its associated properties, the workflow combines spatial and multivariate statistics in a novel manner to determine the GCS and SSM while visualizing the associated uncertainty space. The workflow consists of three major steps: 1) key parameter selection via feature engineering; 2) multivariate and spatial analysis using multidimensional scaling (MDS) and density-based spatial clustering of applications with noise (DBSCAN) for clustering analysis; and 3) similarity ranking using a modified Mahalanobis distance function as a clustering basis on the preprocessed data. Afterwards, to assess the quality of the predicted features and the analog communities obtained, the k-nearest neighbor algorithm is applied to find the analog cities. The workflow is demonstrated on high-dimensional socioeconomic data. We identify analogs, with their GCS and SSM, for each community cluster in relation to four randomly selected communities used for testing.
We therefore recommend integrating this workflow into uncertainty exploration, trend mapping, community analog assignment, and benchmarking to support decision making.
IC2 Institute; Petroleum and Geosystems Engineering

Texas ScholarWorks
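The similarity-ranking and nearest-neighbor steps of the Texas workflow above can be illustrated with the standard Mahalanobis distance d(x, y) = sqrt((x - y)^T S^{-1} (x - y)), where S is the sample covariance of the community features. The abstract does not describe how the authors modified the distance, so this sketch uses the unmodified form; the feature vectors, the function names, and the choice to estimate S from the candidate communities themselves are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix (n - 1 denominator) of the feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular; add data or drop collinear features")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]  # scale pivot row so the pivot becomes 1
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]  # eliminate column
    return [row[n:] for row in aug]


def mahalanobis(x: Sequence[float], y: Sequence[float], inv_cov: Matrix) -> float:
    """sqrt((x - y)^T S^{-1} (x - y)) for a precomputed inverse covariance."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative round-off


def nearest_analogs(target: Sequence[float], candidates: List[Sequence[float]],
                    k: int) -> List[Tuple[int, float]]:
    """Return (index, distance) for the k candidates closest to target, nearest first."""
    inv_cov = invert(covariance(candidates))
    scored = [(i, mahalanobis(target, c, inv_cov)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda t: t[1])[:k]


# Usage with hypothetical two-feature community profiles.
communities = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
analogs = nearest_analogs([3.0, 4.0], communities, k=3)
```

Because S^{-1} rescales and decorrelates the features, a socioeconomic variable measured in large units (such as income) does not dominate one measured in small units (such as a rate), which plain Euclidean distance would allow.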