133 research outputs found

Nonnegative factorization and the maximum edge biclique problem

Author: GILLIS Nicolas
GLINEUR François
Publication venue
Publication date
Field of study

Nonnegative matrix factorization (NMF) is a data analysis technique based on the approximation of a nonnegative matrix with a product of two nonnegative factors, which allows compression and interpretation of nonnegative data. In this paper, we study the case of rank-one factorization and show that when the matrix to be factored is not required to be nonnegative, the corresponding problem (R1NF) becomes NP-hard. This sheds new light on the complexity of NMF since any algorithm for fixed-rank NMF must be able to solve at least implicitly such rank-one subproblems. Our proof relies on a reduction of the maximum edge biclique problem to R1NF. We also link stationary points of R1NF to feasible solutions of the biclique problem, which allows us to design a new type of biclique finding algorithm based on the application of a block-coordinate descent scheme to R1NF. We show that this algorithm, whose algorithmic complexity per iteration is proportional to the number of edges in the graph, is guaranteed to converge to a biclique and that it performs competitively with existing methods on random graphs and text mining datasets.

Keywords: nonnegative matrix factorization, rank-one factorization, maximum edge biclique problem, algorithmic complexity, biclique finding algorithm
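As a rough illustration of the block-coordinate descent idea mentioned in this abstract, the following minimal sketch alternates closed-form nonnegative updates of the two rank-one factors. The abstract does not give the matrix construction, so several details here are assumptions made for illustration: edges are encoded as 1 and non-edges as -d with d = max(m, n), the column factor is initialised with column degrees, and the function name `biclique_bcd` is hypothetical.

```python
def biclique_bcd(adj, d=None, iters=100, tol=1e-9):
    """Sketch: alternate nonnegative least-squares updates of u and v
    for min ||M - u v^T||_F^2 with u >= 0, v >= 0.

    adj is a 0/1 biadjacency matrix (list of lists).
    Assumed encoding (not stated in the abstract): M[i][j] = 1 for an
    edge and -d for a non-edge, with d = max(m, n) by default.
    """
    m, n = len(adj), len(adj[0])
    if d is None:
        d = max(m, n)
    M = [[1.0 if adj[i][j] else -float(d) for j in range(n)] for i in range(m)]

    # Assumed initialisation: column degrees of the bipartite graph.
    v = [float(sum(adj[i][j] for i in range(m))) for j in range(n)]
    u = [0.0] * m

    for _ in range(iters):
        vv = sum(x * x for x in v)
        if vv == 0.0:
            break
        # Optimal nonnegative u for fixed v: projection of M v / ||v||^2.
        u = [max(0.0, sum(M[i][j] * v[j] for j in range(n))) / vv
             for i in range(m)]
        uu = sum(x * x for x in u)
        if uu == 0.0:
            break
        # Optimal nonnegative v for fixed u: projection of M^T u / ||u||^2.
        v = [max(0.0, sum(M[i][j] * u[i] for i in range(m))) / uu
             for j in range(n)]

    rows = [i for i in range(m) if u[i] > tol]
    cols = [j for j in range(n) if v[j] > tol]
    if not rows or not cols:
        return [], []  # degenerate run: no biclique recovered
    return rows, cols


# Usage: two vertices fully connected to two others, plus one isolated edge.
example = [[1, 1, 0],
           [1, 1, 0],
           [0, 0, 1]]
print(biclique_bcd(example))
```

This dense version costs O(mn) per iteration; it does not attain the per-iteration cost proportional to the number of edges that the abstract reports, which would require exploiting sparsity.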

Research Papers in Economics

Bicluster Analysis of Cheng and Church's Algorithm to Identify Patterns of People's Welfare in Indonesia

Author: Marifni Laradea
Sumertajaya I Made
Syafitri Utami Dyah
Publication venue: Department of Informatics Engineering, Universitas Muhammadiyah Purwokerto
Publication date: 17/11/2023
Field of study

Biclustering is a method of grouping numerical data in which rows and columns are grouped simultaneously. The Cheng and Church (CC) algorithm is a biclustering algorithm that tries to find the largest bicluster with high similarity, as measured by the MSR (Mean Square Residue). A set of rows and columns is called a bicluster if its MSR is lower than a predetermined threshold value (delta). Detecting people's welfare in Indonesia using biclustering is essential to get an overview of the characteristics of people's interest in each province in Indonesia. Biclustering with the CC algorithm requires a threshold value (delta), which is determined from the MSR of the actual data and must be smaller than that MSR. The threshold values tested in this study are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8. After evaluating the candidate deltas by considering the MSR value and the biclusters formed, the optimum delta is found to be 0.1, which yields 4 biclusters.
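The MSR criterion described in this abstract can be sketched as follows. This is a minimal illustration of the standard mean squared residue of a submatrix (each entry minus its row mean and column mean, plus the overall mean, squared and averaged), not the study's own code. The function names and the toy matrices are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols."""
    n_r, n_c = len(rows), len(cols)
    row_mean = {i: sum(matrix[i][j] for j in cols) / n_c for i in rows}
    col_mean = {j: sum(matrix[i][j] for i in rows) / n_r for j in cols}
    all_mean = sum(row_mean[i] for i in rows) / n_r

    total = 0.0
    for i in rows:
        for j in cols:
            residue = matrix[i][j] - row_mean[i] - col_mean[j] + all_mean
            total += residue ** 2
    return total / (n_r * n_c)


def is_bicluster(matrix, rows, cols, delta):
    """True if the submatrix's MSR is below delta, as the abstract words it."""
    return mean_squared_residue(matrix, rows, cols) < delta


# Toy data (hypothetical): rows differ only by a constant shift, so MSR is ~0.
additive = [[1, 2, 3],
            [2, 3, 4],
            [5, 6, 7]]
print(is_bicluster(additive, [0, 1, 2], [0, 1, 2], delta=0.1))
```

A submatrix whose rows differ only by additive shifts has an MSR of zero, which is why a small delta such as 0.1 selects tightly coherent biclusters.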

Jurnal Online Universitas Muhammadiyah Purwokerto

Discovery of error-tolerant biclusters from noisy gene expression data

Author: A Ben-Dor
A Gyenesei
A Poernomo
A Prelic
A Subramanian
A Tanay
C Becquet
C Creighton
C Yang
G Pandey
H Cheng
I Dhillon
J Besson
J Han
J Liu
J Seppänen
M Ashburner
M Zhang
Navneet Rao
R Gupta
R Rastogi
R Srikant
Rohit Gupta
S Bergmann
S Hanhijärvi
SC Madeira
T Calders
T Fukuda
T Hughes
T Mcintosh
Vipin Kumar
Y Cheng
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations, such as the inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; and the inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining also produces biclusters as its result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, whic

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Biclustering electronic health records to unravel disease presentation patterns

Author: Matos Joana Sofia Santos de
Publication venue
Publication date: 01/01/2019
Field of study

Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2019

Amyotrophic Lateral Sclerosis (ALS) is a heterogeneous neurodegenerative disease with a high variability of presentation patterns. Given the heterogeneous nature of ALS patients and targeting a better prognosis, clinicians usually estimate disease progression at diagnosis using the rate of decay computed from the Revised ALS Functional Rating Scale (ALSFRS-R). In this context, the use of Machine Learning models able to unravel the complexity of disease presentation patterns is paramount for disease understanding, targeting improved patient care and longer survival times. Furthermore, explainable models are vital, since clinicians must be able to understand the reasoning behind a given model's result before making a decision that can impact a patient's life. Therefore we aim to unravel disease presentation patterns by proposing a new Data Mining approach called Discriminative Meta-features Discovery (DMD), which uses a combination of Biclustering, Biclustering-based Classification and Class Association Rule Mining. These patterns (called Meta-features) are composed of discriminative subsets of features together with their values, making it possible to distinguish and characterize subgroups of patients with similar disease presentation patterns. The Electronic Health Record (EHR) data used in this work comes from the JPND ONWebDUALS (ONTology-based Web Database for Understanding Amyotrophic Lateral Sclerosis) dataset, comprised of standardized questionnaire answers regarding risk factors, genetic mutations, clinical features and survival information from a cohort of patients and controls from ENCALS (European Network to Cure ALS), a consortium of diverse European countries, including Portugal. In this work the proposed methodology was used on the ONWebDUALS Portuguese EHR data to find disease presentation patterns that: 1) distinguish the ALS patients from their controls and 2) characterize groups of ALS patients with different progression rates (categorized into Slow, Neutral and Fast groups). No clear pattern emerged from the experiments performed for the first task. However, in the second task the patterns found for each of the three progression groups were recognized and validated by ALS expert clinicians as being relevant characteristics of slow, neutral and fast progressing patients. These results suggest that our generic Biclustering approach is a promising way to unravel disease presentation patterns and could be applied to similar problems and other diseases.

Universidade de Lisboa: Repositório.UL

Pattern Recognition of Food Security in Indonesia Using Biclustering Plaid Model

Author: Afendi Farit Mochamad
Hikmah Nur
Sumertajaya I Made
Publication venue: Universitas Muhammadiyah Mataram
Publication date: 01/10/2023
Field of study

Biclustering comes in many algorithmic variants, and selecting the most suitable biclustering algorithm can be a challenging task. The performance of algorithms can vary significantly depending on the specific data characteristics. The Plaid model, one of the popular biclustering algorithms, has gained recognition for its efficiency and versatility across various applications, including food security. Indonesia deals with complex food security challenges. The nation's unique geographic and socioeconomic diversity demands region-specific food security solutions. Identifying province-specific food security patterns is crucial for effective policymaking and resource allocation, ultimately promoting food sufficiency and stability at the regional level. This study assesses the performance of the Plaid model in identifying food security patterns at the provincial level in Indonesia. To optimize biclusters, we explore various parameter tuning scenarios (the choice of model, the number of layers, and the threshold value for row and column releases). The selection criteria are based on the change ratio of the initial matrix's mean square residue to the mean square residue of the Plaid model, the average mean square residue, and the number of biclusters. The constant column model was selected, with a mean square residue change ratio of 0.52 and an average Plaid model mean square residue of 4.81; it generates 6 overlapping biclusters. The results show that each bicluster has unique characteristics. Notably, Bicluster 1, which consists of 2 provinces, exhibits the lowest food security levels, marked by variables X1, X2, X4, and X7. Furthermore, the variables X1, X4, and X7 consistently appear across several biclusters. This highlights the importance of prioritizing these three variables to improve the food security status of the regions.

Directory of Open Access Journals

UMMAT Scientific Journals (Universitas Muhammadiyah Mataram)

Genetic algorithm based two-mode clustering of metabolomics data

Author: Berg R.A., van den
Hageman J.A.
Smilde A.K.
Werf M.J., van der
Westerhuis J.A.
Publication venue
Publication date: 01/01/2008
Field of study

Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods and more specifically two-mode clustering methods are excellent tools for analyzing this type of data. Two-mode clustering methods allow for analysis of the behavior of subsets of metabolites under different experimental conditions. In addition, the results are easily visualized. In this paper we introduce a two-mode clustering method based on a genetic algorithm that uses a criterion that searches for homogeneous clusters. Furthermore, we introduce a cluster stability criterion to validate the clusters and we provide an extended knee plot to select the optimal number of clusters in both experimental and metabolite modes. The genetic algorithm-based two-mode clustering gave biologically relevant results when it was applied to two real-life metabolomics data sets. It was, for instance, able to identify a catabolic pathway for growth on several of the carbon sources.

Springer - Publisher Connector

Wageningen University & Research Publications

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Design Methodology for Self-organized Mobile Networks Based

Author: Anzola John
Bolaños Castro Sandro Javier
Tarazona Bermúdez Giovanny Mauricio
Publication venue: Universidad Internacional de La Rioja
Publication date: 21/04/2021
Field of study

The methodology proposed in this article enables the systematic design of routing algorithms based on biclustering schemes. It allows a researcher to combine suitable techniques, clustering heuristics of their own, and a focused approach to routing in the choice of clusterhead nodes. The process uses heuristics aimed at improving the different communication costs across groups called biclusters. Overall, the methodology accommodates a variety of clustering techniques and heuristics that have been addressed in routing algorithms, although we have not explored all possible alternatives and their assessments. Research-oriented design of routing algorithms based on biclustering schemes will therefore allow new concepts of evolutionary routing, along with the ability to adapt to the topological changes that occur in self-organized data networks.

Re-UNIR