1,516 research outputs found
Fractal feature selection model for enhancing high-dimensional biological problems
The integration of biology, computer science, and statistics has given rise to the interdisciplinary field of bioinformatics, which aims to decode biological intricacies. The field produces extensive and diverse features, which makes classifying bioinformatics problems an enormous challenge. Therefore, an intelligent bioinformatics classification system must select the most relevant features to enhance machine learning performance. This paper proposes a feature selection model based on the fractal concept to improve the performance of intelligent systems in classifying high-dimensional biological problems. The proposed fractal feature selection (FFS) model divides features into blocks, measures the similarity between blocks using root mean square error (RMSE), and determines the importance of features based on low RMSE. The proposed FFS is tested and evaluated on ten high-dimensional bioinformatics datasets. The experimental results showed that the model significantly improved machine learning accuracy. The average accuracy of the machine learning algorithms was 79% with the full feature set, while FFS delivered promising results with an accuracy of 94%.
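The abstract describes FFS only at a high level (blocks of features, RMSE between blocks, importance from low RMSE). The following is a minimal sketch of one possible reading of that description, not the authors' implementation. The block size, the use of the mean RMSE to all other blocks as a score, and every function name here are assumptions made for illustration.

```python
import math

def rmse(a, b):
    """Root mean square error between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def split_into_blocks(n_features, block_size):
    """Partition feature indices 0..n_features-1 into consecutive blocks."""
    return [list(range(s, min(s + block_size, n_features)))
            for s in range(0, n_features, block_size)]

def block_vector(X, block):
    """Flatten the columns of one block into a single vector (column by column)."""
    return [row[j] for j in block for row in X]

def rank_blocks_by_rmse(X, block_size):
    """Score each full block by its mean RMSE to every other block (assumed rule).

    A trailing block shorter than block_size is dropped so that all
    block vectors have the same length. Lowest score comes first.
    """
    n_features = len(X[0])
    blocks = [b for b in split_into_blocks(n_features, block_size)
              if len(b) == block_size]
    if len(blocks) < 2:
        raise ValueError("need at least two full blocks to compare")
    vectors = [block_vector(X, b) for b in blocks]
    scores = []
    for i, v in enumerate(vectors):
        others = [rmse(v, w) for k, w in enumerate(vectors) if k != i]
        scores.append(sum(others) / len(others))
    order = sorted(range(len(blocks)), key=lambda i: scores[i])
    return [(blocks[i], scores[i]) for i in order]

def select_features(X, block_size, n_blocks):
    """Return the sorted feature indices of the n_blocks lowest-RMSE blocks."""
    ranked = rank_blocks_by_rmse(X, block_size)
    return sorted(j for block, _ in ranked[:n_blocks] for j in block)

# Toy data: 2 samples, 6 features; blocks 0 and 2 are identical.
X = [[0, 0, 10, 10, 0, 0],
     [1, 1, 11, 11, 1, 1]]
selected = select_features(X, block_size=2, n_blocks=2)
```

In this toy case the two identical blocks receive the lowest mean RMSE and are selected. In practice the features would normally be scaled to a common range first, since RMSE is sensitive to magnitude.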
ENGINEERING HIGH-RESOLUTION EXPERIMENTAL AND COMPUTATIONAL PIPELINES TO CHARACTERIZE HUMAN GASTROINTESTINAL TISSUES IN HEALTH AND DISEASE
In recent decades, new high-resolution technologies have transformed how scientists study complex cellular processes and the mechanisms responsible for maintaining homeostasis and for the emergence and progression of gastrointestinal (GI) disease. These advances have paved the way for the use of primary human cells in experimental models which together can mimic specific aspects of the GI tract such as compartmentalized stem-cell zones, gradients of growth factors, and shear stress from fluid flow. The work presented in this dissertation has focused on integrating high-resolution bioinformatics with novel experimental models of the GI epithelium to describe the pathophysiology of the human small intestine, colon, and stomach in homeostasis and disease. Here, I used three novel microphysiological systems and developed four computational pipelines to describe comprehensive gene expression patterns of the GI epithelium in various states of health and disease. First, I used single-cell RNAseq (scRNAseq) to establish the transcriptomic landscape of the entire epithelium of the small intestine and colon from three human donors, describing cell-type-specific gene expression patterns in high resolution. Second, I used single-cell and bulk RNAseq to model intestinal absorption of fatty acids and show that fatty acid oxidation is a critical regulator of the flux of long- and medium-chain fatty acids across the epithelium. Third, I used bulk RNAseq and a machine learning model to describe how inflammatory cytokines can regulate proliferation of intestinal stem cells in an experimental model of inflammatory hypoxia. Finally, I developed a high-throughput platform that can associate phenotype with gene expression in clonal organoids, providing unprecedented resolution into the relationship between comprehensive gene expression patterns and their accompanying phenotypic effects.
Through these studies, I have demonstrated how the integration of computational and experimental approaches can measurably advance our understanding of human GI physiology. Doctor of Philosophy
An improved dandelion optimizer algorithm for spam detection next-generation email filtering system
Spam emails have become a pervasive issue in recent years, as internet users receive increasing amounts of unwanted or fake emails. To combat this issue, automatic spam detection methods have been proposed, which aim to classify emails into spam and non-spam categories. Machine learning techniques have been utilized for this task with considerable success. In this paper, we introduce a novel approach to spam email detection by presenting significant advancements to the Dandelion Optimizer (DO) algorithm. DO is a relatively new nature-inspired optimization algorithm based on the flight of dandelion seeds. While DO shows promise, it faces challenges, especially in high-dimensional problems such as feature selection for spam detection. Our primary contributions focus on enhancing the DO algorithm. Firstly, we introduce a new local search algorithm based on flipping (LSAF), designed to improve DO's ability to find the best solutions. Secondly, we propose a reduction equation that streamlines the population size during algorithm execution, reducing computational complexity. To showcase the effectiveness of our modified DO algorithm, which we refer to as Improved DO (IDO), we conduct a comprehensive evaluation using the Spambase dataset from the UCI repository. However, we emphasize that our primary objective is to advance the DO algorithm, with spam email detection serving as a case study application. Comparative analysis against several popular algorithms, including Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Generalized Normal Distribution Optimization (GNDO), Chimp Optimization Algorithm (ChOA), Grasshopper Optimization Algorithm (GOA), Ant Lion Optimizer (ALO), and Dragonfly Algorithm (DA), demonstrates the superior performance of our proposed IDO algorithm. It excels in accuracy, fitness, and the number of selected features, among other metrics.
Our results clearly indicate that IDO overcomes the local optima problem commonly associated with the standard DO algorithm, owing to the incorporation of LSAF and the reduction equation methods. In summary, our paper underscores the significant advancement made in the form of the IDO algorithm, which represents a promising approach for solving high-dimensional optimization problems, with a keen focus on practical applications in real-world systems. While we employ spam email detection as a case study, our primary contribution lies in the improved DO algorithm, which is efficient, accurate, and outperforms several state-of-the-art algorithms in various metrics. This work opens avenues for enhancing optimization techniques and their applications in machine learning.
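The abstract does not spell out how LSAF works, so the sketch below shows only the general idea of a flip-based local search over a binary feature mask: flip one bit at a time and keep the flip if the fitness improves. It is a generic bit-flip local search, not the authors' LSAF, and the reduction equation is not sketched because its form is not given. The fitness function, its weight `alpha`, and the set of "relevant" features are hypothetical stand-ins for a real classifier-based fitness.

```python
def flip_local_search(mask, fitness, max_passes=10):
    """Greedy bit-flip local search over a binary mask (lower fitness is better)."""
    best = list(mask)
    best_fit = fitness(best)
    for _ in range(max_passes):
        improved = False
        for i in range(len(best)):
            best[i] = 1 - best[i]          # tentatively flip feature i
            cand_fit = fitness(best)
            if cand_fit < best_fit:
                best_fit = cand_fit        # keep the improving flip
                improved = True
            else:
                best[i] = 1 - best[i]      # revert the flip
        if not improved:                   # local optimum under single flips
            break
    return best, best_fit

def toy_fitness(mask, relevant=(0, 2), alpha=0.9):
    """Hypothetical fitness: weighted sum of an error proxy and mask size.

    The error proxy is the fraction of 'relevant' features left out;
    a real wrapper method would use classifier error instead.
    """
    missed = sum(1 for j in relevant if mask[j] == 0)
    error = missed / len(relevant)
    size = sum(mask) / len(mask)
    return alpha * error + (1 - alpha) * size

start = [0, 1, 0, 1, 1]
best_mask, best_fit = flip_local_search(start, toy_fitness)
```

In a metaheuristic such as DO, a routine like this would typically be applied to the best candidate of each iteration to refine it before the next population update.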
On the Utility of Representation Learning Algorithms for Myoelectric Interfacing
Electrical activity produced by muscles during voluntary movement is a reflection of the firing patterns of relevant motor neurons and, by extension, the latent motor intent driving the movement. Once transduced via electromyography (EMG) and converted into digital form, this activity can be processed to provide an estimate of the original motor intent and is as such a feasible basis for non-invasive efferent neural interfacing. EMG-based motor intent decoding has so far received the most attention in the field of upper-limb prosthetics, where alternative means of interfacing are scarce and the utility of better control apparent. Whereas myoelectric prostheses have been available since the 1960s, available EMG control interfaces still lag behind the mechanical capabilities of the artificial limbs they are intended to steer, a gap at least partially due to limitations in current methods for translating EMG into appropriate motion commands. As the relationship between EMG signals and concurrent effector kinematics is highly non-linear and apparently stochastic, finding ways to accurately extract and combine relevant information from across electrode sites is still an active area of inquiry. This dissertation comprises an introduction and eight papers that explore issues afflicting the status quo of myoelectric decoding and possible solutions, all related through their use of learning algorithms and deep Artificial Neural Network (ANN) models. Paper I presents a Convolutional Neural Network (CNN) for multi-label movement decoding of high-density surface EMG (HD-sEMG) signals. Inspired by the successful use of CNNs in Paper I and the work of others, Paper II presents a method for automatic design of CNN architectures for use in myocontrol. Paper III introduces an ANN architecture with an appertaining training framework from which simultaneous and proportional control emerges. Paper IV introduces a dataset of HD-sEMG signals for use with learning algorithms.
Paper V applies a Recurrent Neural Network (RNN) model to decode finger forces from intramuscular EMG. Paper VI introduces a Transformer model for myoelectric interfacing that does not need additional training data to function with previously unseen users. Paper VII compares the performance of a Long Short-Term Memory (LSTM) network to that of classical pattern recognition algorithms. Lastly, Paper VIII describes a framework for synthesizing EMG from multi-articulate gestures, intended to reduce training burden.
Computational approaches for single-cell omics and multi-omics data
Single-cell omics and multi-omics technologies have enabled the study of cellular heterogeneity with unprecedented resolution and the discovery of new cell types. Identifying heterogeneous cell types, both existing and novel, relies on efficient computational approaches, especially cluster analysis. Additionally, gene regulatory network analysis and various integrative approaches are needed to combine data across studies and different multi-omics layers. This thesis comprehensively compared Bayesian clustering models for single-cell RNA-sequencing (scRNA-seq) data, and selected integrative approaches were used to study the cell-type-specific gene regulation of the uterus. Additionally, single-cell multi-omics data integration approaches for cell heterogeneity analysis were investigated.
Article I investigated analytical approaches for cluster analysis in scRNA-seq data, particularly latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP) models. The comparison of LDA and HDP with existing state-of-the-art methods revealed that topic-modeling-based models can be useful in scRNA-seq cluster analysis. Evaluation of the cluster quality for LDA and HDP with intrinsic and extrinsic cluster quality metrics indicated that the clustering performance of these methods is dataset dependent.
Article II and Article III focused on cell-type specific integrative analysis of uterine or decidual stromal (dS) and natural killer (dNK) cells that are important for successful pregnancy. Article II integrated the existing preeclampsia RNA-seq studies of the decidua together with recent scRNA-seq datasets in order to investigate cell-type-specific contributions of early onset preeclampsia (EOP) and late onset preeclampsia (LOP). It was discovered that the dS marker genes were enriched for LOP downregulated genes and the dNK marker genes were enriched for upregulated EOP genes. Article III presented a gene regulatory network analysis for the subpopulations of dS and dNK cells. This study identified novel subpopulation specific transcription factors that promote decidualization of stromal cells and dNK mediated maternal immunotolerance.
In Article IV, different strategies and methodological frameworks for data integration in single-cell multi-omics data analysis were reviewed in detail. Data integration methods were grouped into early, late, and intermediate data integration strategies. The specific stage and order of data integration can have a substantial effect on the results of the integrative analysis. The central details of the approaches were presented, and potential future directions were discussed.
Computational methods for the analysis of single-cell sequencing and multi-omics results
Single-cell sequencing technologies enable the study of cellular heterogeneity at unprecedented resolution and the discovery of new cell types. Clustering, that is, cluster analysis, plays a central role in identifying cell types. Combining gene regulatory networks and different layers of molecular data is also central to the analysis. The thesis compares Bayesian clustering methods and combines data collected with different methods in a cell-type-specific gene regulation analysis of the uterus. In addition, integration methods for single-cell data are examined comprehensively.
Publication I focuses on studying analytical methods, in particular models based on latent Dirichlet allocation (LDA) and the hierarchical Dirichlet process (HDP), in the cluster analysis of single-cell data. A comprehensive comparison of these two models with existing methods revealed that topic-modeling-based methods can be useful in the cluster analysis of single-cell data. The performance of the methods also depended on the properties of each dataset analyzed.
Publications II and III focus on the cell-type-specific analysis of uterine stromal cells and NK immune cells, which are important for female reproductive health. Article II combined existing results on preeclampsia with the most recent single-cell sequencing results and found cell-type-specific effects of early-onset preeclampsia (EOP) and late-onset preeclampsia (LOP). The expression of marker genes of differentiated stroma was found to decrease in LOP, and the expression of NK marker genes to increase in EOP. Publication III analyzes the subpopulation-specific gene regulatory networks of stromal and NK cells and their transcription factors. The study identified new subpopulation-specific regulators that promote stromal differentiation and NK-cell-mediated immunotolerance.
Publication IV examines in detail strategies and methods for integrating different single-cell data layers (multi-omics). The integration methods were grouped into early, late, and intermediate strategies, and the methods of each approach were presented in more detail. In addition, possible future directions were discussed.
Modelling tree biomass using direct and additive methods with point cloud deep learning in a temperate mixed forest
ABSTRACT: Airborne laser scanning (ALS) data has been widely used for total aboveground tree biomass (AGB) modelling; however, less research has focused on estimating specific tree biomass components (wood, branches, bark, and foliage). Knowledge about these biomass components is essential for carbon accounting, understanding forest nutrient cycling, and other applications. In this study, we compare additive AGB estimation (sum of estimated components) with direct AGB estimation using deep neural network (DNN) and random forest (RF) models. We utilise two point cloud DNNs: the point-based Dynamic Graph Convolutional Neural Network (DGCNN) and the Octree-based Convolutional Neural Network (OCNN). DNN and RF models were trained using a dataset comprised of 2336 sample plots from a mixed temperate forest in New Brunswick, Canada. Results indicate that additive AGB models perform similarly to direct models in terms of coefficient of determination (R2) and root mean square error (RMSE), and reduce the mean absolute percentage error (MAPE) by 22% on average. Compared to RF, the DNNs provided a small improvement in performance, with OCNN explaining 5% more variation in the data (R2 = 0.76) and reducing MAPE by 20% on average. Overall, this study showcases the effectiveness of additive tree AGB models and highlights the potential of DNNs for enhanced AGB estimation. To further improve DNN performance, we recommend using larger training datasets, implementing hyperparameter optimization, and incorporating additional data such as multispectral imagery.
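To make the additive approach and the three reported metrics concrete, here is a small sketch using the standard definitions of R2, RMSE, and MAPE. The component predictions and reference values are made-up numbers chosen only to keep the arithmetic easy to follow; they are not data from the study, and the function names are illustrative.

```python
import math

def additive_agb(component_preds):
    """Sum per-plot component predictions (wood, branches, bark, foliage) into AGB."""
    return [sum(vals) for vals in zip(*component_preds.values())]

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical per-plot component predictions for two plots.
components = {
    "wood": [10.0, 20.0],
    "branches": [2.0, 4.0],
    "bark": [1.0, 2.0],
    "foliage": [1.0, 2.0],
}
agb_pred = additive_agb(components)   # additive estimate per plot
agb_true = [14.0, 30.0]               # hypothetical reference AGB

scores = (r2(agb_true, agb_pred), rmse(agb_true, agb_pred), mape(agb_true, agb_pred))
```

A direct model would instead predict `agb_pred` in one step; the same three functions can then be used to compare the two approaches on equal terms.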
EDIBLE FISH IDENTIFICATION BASED ON MACHINE LEARNING
Automated fish identification systems have a beneficial role in various fields. Fish species are usually identified through visual observation and human experience, and misidentification can cause food poisoning. The proposed system aims to efficiently and effectively distinguish edible fish from poisonous ones using three machine learning (ML) techniques. A total of 300 fish images are used, collected from 20 species that differ in shape, size, and color. Hybrid features were extracted and then fed to three types of ML techniques: k-nearest neighbor (KNN), support vector machine (SVM), and neural networks (NN). The 300 fish images are divided into two sets: 70% for training and 30% for testing. The accuracy rates for the presented system were 91.1%, 92.2%, and 94.4% for KNN, SVM, and NN, respectively. The proposed system is evaluated using four metrics: precision, sensitivity, F1-score, and accuracy. Results show that the proposed approach achieved higher accuracy than other recent pertinent studies.
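The four evaluation metrics can be computed from a binary confusion matrix as sketched below, treating "edible" as the positive class (an assumption; the abstract does not say which class is positive). The confusion counts are hypothetical and are not taken from the paper. The last lines check a piece of arithmetic: if the 30% test split of 300 images is 90 images, the three reported accuracies are consistent with 82, 83, and 85 correctly classified images.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, sensitivity (recall), F1-score, and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "sensitivity": sensitivity,
            "f1": f1, "accuracy": accuracy}

# Hypothetical confusion counts on a 90-image test set (not from the paper).
metrics = binary_metrics(tp=40, fp=3, fn=5, tn=42)

# Consistency check of the reported accuracies against a 90-image test set.
n_test = round(300 * 0.30)
implied = [round(100 * correct / n_test, 1) for correct in (82, 83, 85)]
```

Sensitivity is the metric to watch in this setting if "poisonous" is taken as the positive class instead, since a missed poisonous fish is the costly error.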
Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)
The book offers a gentle introduction to the field of Facial Micro Expressions
Recognition (FMER) using color and depth images, with the aid of the MATLAB
programming environment. FMER is a subset of image processing and a
multidisciplinary topic of analysis, so it requires familiarity with related
topics in Artificial Intelligence (AI) and beyond, such as machine learning,
digital image processing, and psychology. This makes it a good opportunity to
write a book that covers all of these topics for readers ranging from beginners
to professionals in AI, and even for those without an AI background. Our goal
is to provide a standalone introduction to FMER analysis, combining theoretical
descriptions for readers with no background in image processing with
reproducible, practical MATLAB examples. We also describe the basic definitions
for FMER analysis and the MATLAB library used in the text, which helps the
reader apply the experiments in real-world applications. We believe that this
book is suitable for students, researchers, and professionals alike who need to
develop practical skills along with a basic understanding of the field. We
expect that, after reading this book, the reader will feel comfortable with key
stages such as color and depth image processing, color and depth image
representation, classification, machine learning, facial micro-expressions
recognition, feature extraction, and dimensionality reduction.
Comment: This is the second edition of the book