2,459 research outputs found

Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

Author: Korenblum Daniel
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references

arXiv.org e-Print Archive

Directory of Open Access Journals

Formal Concept Analysis Applications in Bioinformatics

Author: Roscoe Sarah
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 10/11/2020

Bioinformatics is an important field that seeks to solve biological problems with the help of computation. One specific field in bioinformatics is genomics, the study of genes and their functions. Genomics can provide valuable analysis of how genes interact with their environment. One way to measure this interaction is through gene expression data, which indicates whether (and how much) a certain gene activates in a given situation. Analyzing this data can be critical for predicting diseases or other biological reactions. One method used for analysis is Formal Concept Analysis (FCA), a computing technique based on partial orders that allows the user to examine the structural properties of binary data based on which subsets of the data set depend on each other. This thesis surveys, in breadth and depth, the current literature related to the use of FCA for bioinformatics, with particular focus on gene expression data. This includes descriptions of current data management techniques specific to FCA, such as lattice reduction, discretization, and variations of FCA to account for different data types. Advantages and shortcomings of using FCA for genomic investigations, as well as the feasibility of using FCA for this application, are addressed. Finally, several areas for future doctoral research are proposed. Adviser: Jitender S. Deogu
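
To make the idea of FCA on binary data concrete, the following is a minimal sketch, not taken from the thesis. It enumerates the formal concepts (pairs of an object set and the attribute set they share) of a tiny binary context by brute force. The gene names, the condition labels, and the function name `formal_concepts` are all hypothetical, and the exhaustive subset enumeration is an assumption chosen for readability; it is exponential in the number of objects and is not one of the lattice algorithms the thesis surveys.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a binary context by brute force.

    context maps each object to the set of attributes it has.
    Returns a set of (extent, intent) pairs of frozensets.
    """
    objects = sorted(context)
    all_attrs = set().union(*context.values()) if context else set()

    def common_attrs(objs):
        # Attributes shared by every object in objs
        # (all attributes when objs is empty).
        shared = set(all_attrs)
        for obj in objs:
            shared &= context[obj]
        return shared

    def objects_having(attrs):
        # Objects that carry every attribute in attrs.
        return {obj for obj in objects if attrs <= context[obj]}

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attrs(subset)
            extent = objects_having(intent)  # closure of the chosen subset
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical discretized expression data: gene -> conditions where it is active.
toy_context = {
    "gene_a": {"stress", "heat"},
    "gene_b": {"stress"},
    "gene_c": {"heat", "cold"},
}

for extent, intent in sorted(formal_concepts(toy_context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the partial order (the concept lattice) that the abstract refers to.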

Cancer prediction using graph-based gene selection and explainable classifier

Author: Oussalah Mourad
Rostami Mehrdad
Publication venue: 'Finnish Journal of eHealth and eWelfare'
Publication date: 01/01/2022

Several Artificial Intelligence-based models have been developed for cancer prediction. In spite of the promise of artificial intelligence, there are very few models which bridge the gap between traditional human-centered prediction and the potential future of machine-centered cancer prediction. In this study, an efficient and effective model is developed for gene selection and cancer prediction. Moreover, this study proposes an artificial intelligence decision system to provide physicians with a simple and human-interpretable set of rules for cancer prediction. In contrast to previous deep learning-based cancer prediction models, which are difficult to explain to physicians due to their black-box nature, the proposed prediction model is based on a transparent and explainable decision forest model. The performance of the developed approach is compared to three state-of-the-art cancer prediction methods, including TAGA, HPSO and LL. The reported results on five cancer datasets indicate that the developed model can improve the accuracy of cancer prediction and reduce the execution time.

University of Oulu Repository - Jultika

Journal.fi

Dynamic Data Mining: Synergy of Bio-Inspired Clustering Methods

Author: Elena N. Benderskaya
Sofya V. Zhukova
Publication venue: 'IntechOpen'
Publication date: 21/01/2011

IntechOpen

Literature Review of the Recent Trends and Applications in various Fuzzy Rule based systems

Author: Torra Vicenç
Varshney Ayush K.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2023

Fuzzy rule based systems (FRBSs) are rule-based systems that use linguistic fuzzy variables as antecedents and consequents to represent human-understandable knowledge. They have been applied to various applications and areas throughout the soft computing literature. However, FRBSs suffer from many drawbacks, such as uncertainty representation, a high number of rules, interpretability loss, and high computational time for learning. To overcome these issues, many extensions of FRBSs exist. This paper presents an overview and literature review of recent trends in various types and prominent areas of fuzzy systems (FRBSs), namely genetic fuzzy systems (GFS), hierarchical fuzzy systems (HFS), neuro fuzzy systems (NFS), evolving fuzzy systems (eFS), FRBSs for big data, FRBSs for imbalanced data, interpretability in FRBSs, and FRBSs which use cluster centroids as fuzzy rules. The review covers the years 2010-2021. This paper also highlights important contributions, publication statistics and current trends in the field. The paper also addresses several open research areas which need further attention from the FRBSs research community. Comment: 49 pages, Accepted for publication in ijf

arXiv.org e-Print Archive

Publikationer från Umeå universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Data Mining

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Data mining is a branch of computer science that is used to automatically extract meaningful, useful knowledge and previously unknown, hidden, interesting patterns from a large amount of data to support the decision-making process. This book presents recent theoretical and practical advances in the field of data mining. It discusses a number of data mining methods, including classification, clustering, and association rule mining. This book brings together many different successful data mining studies in various areas such as health, banking, education, software engineering, animal science, and the environment.

Directory of Open Access Books (DOAB)

A survey on pre-processing techniques: relevant issues in the context of environmental data mining

Author: Gibert Karina
Izquierdo Joaquín
Sànchez-Marrè Miquel
Publication venue: 'IOS Press'
Publication date: 01/01/2016

One of the important issues related to all types of data analysis, whether statistical data analysis, machine learning, data mining, data science or any other form of data-driven modeling, is data quality. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Unfortunately, real data often contain noise, uncertainty, errors, redundancies or even irrelevant information. Models built on incorrect or incomplete data will be useless. As a consequence, the quality of decisions based on these models also depends on data quality. This is why pre-processing is one of the most critical steps of data analysis in any of its forms. However, pre-processing has not been properly systematized yet, and little research focuses on this. This paper presents a survey of the most popular pre-processing steps required in environmental data analysis, together with a proposal to systematize them. Rather than providing technical details on specific pre-processing techniques, the paper focuses on providing general ideas to a non-expert user, who, after reading them, can decide which technique is most suitable for solving his/her problem. Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

RiuNet

A Spatio-Temporal Data Imputation Model for Supporting Analytics at the Edge

Author: D Bertsimas
D Stekhoven
G Carpenter
H Cai
J Honaker
J Xing
L Jiang
L Kim
M Satyanarayanan
N Jiang
NC Guan
O Troyanskaya
P Schmitt
PJ Escamilla-Ambrosio
R Little
R Mazumder
S Buuren
S Oba
T Bo
T Raghunathan
X Wang
Y He
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Current applications developed for the Internet of Things (IoT) usually involve processing collected data to deliver analytics and support efficient decision making. The basis for any processing mechanism is data analysis, whose outcome is usually responses to various analytics queries defined by end users or applications. However, as already noted in the literature, data analysis cannot be efficient when missing values are present. The research community has already proposed various missing data imputation methods, paying more attention to the statistical aspect of the problem. In this paper, we study the problem and propose a method that combines machine learning and a consensus scheme. We focus on the clustering of the IoT devices, assuming they observe the same phenomenon and report the collected data to the edge infrastructure. Through a sliding window approach, we try to detect IoT nodes that report similar contextual values to edge nodes and rely on them to deliver the replacement value for missing data. We provide the description of our model together with results obtained from an extensive set of simulations on real data. Our aim is to reveal the potential of the proposed scheme and place it in the relevant literature.

Crossref

Enlighten
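
The sliding-window idea in the abstract above can be illustrated with a minimal sketch; it is not the authors' model. It assumes each device reports one numeric value per time step (with `None` for a missing value), treats a device as a "similar" peer when its mean absolute difference from the target over the preceding window is within a tolerance, and uses the plain mean of the peers' current values as the replacement. The function name `impute_from_peers`, the `window` and `tolerance` parameters, and the averaging step (a stand-in for the clustering and consensus scheme, which the abstract does not specify) are all assumptions.

```python
from statistics import mean


def impute_from_peers(readings, target, t, window=3, tolerance=1.0):
    """Estimate the missing value readings[target][t] from similar devices.

    readings maps a device name to a list of floats (None marks a gap).
    Returns the mean of the peers' values at step t, or None if no
    device was similar enough over the preceding window.
    """
    start = max(0, t - window)
    target_history = readings[target][start:t]
    peer_values = []
    for device, series in readings.items():
        if device == target or series[t] is None:
            continue
        # Compare only the steps in the window where both devices reported.
        diffs = [
            abs(a - b)
            for a, b in zip(target_history, series[start:t])
            if a is not None and b is not None
        ]
        if diffs and mean(diffs) <= tolerance:
            peer_values.append(series[t])
    return mean(peer_values) if peer_values else None


# Hypothetical temperature reports from four edge-connected devices.
sample_readings = {
    "n1": [20.0, 20.5, 21.0, None],
    "n2": [20.1, 20.4, 21.2, 21.5],
    "n3": [19.9, 20.6, 20.9, 21.3],
    "n4": [30.0, 31.0, 32.0, 33.0],
}

print(impute_from_peers(sample_readings, "n1", 3))
```

In this toy data, `n4` tracks a different range and is excluded, so the estimate for `n1` comes only from `n2` and `n3`.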

A fuzzy gene expression-based computational approach improves breast cancer prognostication

Author: Bontempi Gianluca
Desmedt Christine
Haibe-Kains Benjamin
Piccart Martine
Rothé Françoise
Sotiriou Christos
Publication venue: BioMed Central
Publication date: 01/01/2010

A fuzzy computational approach that takes into account several molecular subtypes in order to provide more accurate breast cancer prognosis.

Crossref

Springer - Publisher Connector

PubMed Central

DI-fusion