2,258 research outputs found
Computational techniques to interpret the neural code underlying complex cognitive processes
Advances in large-scale neural recording technology have significantly improved the
capacity to further elucidate the neural code underlying complex cognitive processes.
This thesis aimed to investigate two research questions in rodent models. First, what
is the role of the hippocampus in memory and specifically what is the underlying
neural code that contributes to spatial memory and navigational decision-making.
Second, how is social cognition represented in the medial prefrontal cortex at the
level of individual neurons. The thesis begins by investigating memory and
social cognition in healthy and diseased states using non-invasive
methods (i.e. fMRI and animal behavioural studies). The main body of the thesis
then shifts to developing our fundamental understanding of the neural mechanisms
underpinning these cognitive processes by applying computational techniques to analyse stable large-scale neural recordings. To achieve this, tailored calcium imaging
and behaviour preprocessing computational pipelines were developed and optimised
for use in social interaction and spatial navigation experimental analysis. In parallel,
a review was conducted on methods for multivariate/neural population analysis. A
comparison of multiple neural manifold learning (NML) algorithms identified that non-linear algorithms such as UMAP are more adaptable across datasets of varying noise
and behavioural complexity. Furthermore, the review visualises how NML can be
applied to disease states in the brain and introduces the secondary analyses that
can be used to enhance or characterise a neural manifold. Lastly, the preprocessing
and analytical pipelines were combined to investigate the neural mechanisms involved in social cognition and spatial memory. The social cognition study explored
how neural firing in the medial prefrontal cortex changed as a function of the social
dominance paradigm, the "Tube Test". The univariate analysis identified an ensemble
of behavioural-tuned neurons that fire preferentially during specific behaviours such
as "pushing" or "retreating" for the animal’s own behaviour and/or the competitor’s
behaviour. Furthermore, in dominant animals, the neural population exhibited greater
average firing than that of subordinate animals. Next, to investigate spatial memory,
a spatial recency task was used, where rats learnt to navigate towards one of three
reward locations and then recall the rewarded location of the session. During the
task, over 1000 neurons were recorded from the hippocampal CA1 region for five rats
over multiple sessions. Multivariate analysis revealed that the sequence of neurons encoding an animal’s spatial position leading up to a rewarded location was also active
in the decision period before the animal navigates to the rewarded location. This result
suggests that prospective replay of neural sequences in the hippocampal CA1 region
could provide a mechanism by which decision-making is supported.
Tensor representation-based transferability analytics and selective transfer learning of prognostic knowledge for remaining useful life prediction across machines
In recent years, deep transfer learning techniques have been successfully applied to remaining useful life (RUL) prediction across different working conditions. However, for RUL prediction across different machines, in which the data distribution and fault evolution characteristics vary largely, the extraction and transfer of prognostic knowledge become more challenging. Even if fault mode information can assist in the knowledge transfer, model bias will inevitably exist on a target machine with mixed or unknown faults. To address this issue from a transferability perspective, this paper proposes a novel selective transfer learning approach for RUL prediction across machines. First, the paper uses a tensor representation to construct the meta-degradation trend of each fault mode and evaluates the transferability of source domain data in terms of fault mode and degradation characteristics through a new cross-machine transfer degree indicator (M-TDI). Second, a Long Short-Term Memory (LSTM)-based selective transfer strategy is proposed using the M-TDIs. The paper designs a training algorithm with an alternating optimization scheme to seek the optimal tensor decomposition and knowledge transfer effect. Theoretical analysis proves that the proposed approach significantly reduces the upper bound of the prediction error. Furthermore, experimental results on three benchmark datasets demonstrate the effectiveness of the proposed approach.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Smart Gas Sensors: Materials, Technologies, Practical Applications, and Use of Machine Learning – A Review
The electronic nose, popularly known as the E-nose, combines gas sensor arrays (GSAs) with machine learning and has gained a strong foothold in gas sensing technology. The E-nose, designed to mimic the human olfactory system, is used for the detection and identification of various volatile compounds. The GSAs develop a unique signal fingerprint for each volatile compound to enable pattern recognition using machine learning algorithms. The inexpensive, portable and non-invasive characteristics of the E-nose system have rendered it indispensable within the gas-sensing arena. As a result, E-noses have been widely employed in the food industry, health management, disease diagnosis, water and air quality control, and toxic gas leakage detection. This paper reviews the various sensor fabrication technologies of GSAs and highlights the main operational framework of the E-nose system. The paper details vital signal pre-processing techniques, namely feature extraction and feature selection, in addition to machine learning algorithms such as SVM, kNN, ANN, and Random Forests for determining the type of gas and estimating its concentration in a competitive environment. The paper further explores the potential applications of E-noses for diagnosing diseases, monitoring air quality, assessing the quality of food samples and estimating concentrations of volatile organic compounds (VOCs) in air and in food samples. The review concludes with some challenges faced by E-noses, alternative ways to tackle them, and recommendations for future work on the further development and design enhancement of E-noses.
Using Machine Learning in Forestry
Advances in technology have increased the demand for innovative approaches that apply traditional methods more economically, effectively, quickly and easily in forestry, as in other disciplines. In particular, recently emerging terms such as forestry informatics, precision forestry, smart forestry, Forestry 4.0, climate-intelligent forestry, digital forestry and forestry big data have started to appear on the agenda of the forestry discipline. As a result, significant increases are observed in the number of academic studies in which modern approaches such as machine learning and the recently emerged automatic machine learning (AutoML) are integrated into decision-making processes in forestry. This study aims to make machine learning algorithms more comprehensible in the Turkish language, to make them more widespread, and to serve as a resource for researchers interested in their use in forestry. It thus aims to contribute to the national literature a review article that shows both how machine learning has been used in various forestry activities from the past to the present and its potential for use in the future.
Data- and expert-driven feature selection for predictive models in healthcare : towards increased interpretability in underdetermined machine learning problems
Modern data acquisition techniques in healthcare generate large collections of data from multiple sources, such as novel diagnosis and treatment methodologies. Some concrete examples are electronic healthcare record systems, genomics, and medical images. This leads to situations with often unstructured, high-dimensional heterogeneous patient cohort data where classical statistical methods may not be sufficient for optimal utilization of the data and informed decision-making. Instead, investigating such data structures with modern machine learning techniques promises to improve the understanding of patient health issues and may provide a better platform for informed decision-making by clinicians. Key requirements for this purpose include (a) sufficiently accurate predictions and (b) model interpretability. Achieving both aspects in parallel is difficult, particularly for datasets with few patients, which are common in the healthcare domain. In such cases, machine learning models encounter mathematically underdetermined systems and may overfit easily on the training data. An important approach to overcome this issue is feature selection, i.e., determining a subset of informative features from the original set of features with respect to the target variable. While potentially raising the predictive performance, feature selection fosters model interpretability by identifying a low number of relevant model parameters to better understand the underlying biological processes that lead to health issues.
Interpretability requires that feature selection is stable, i.e., small changes in the dataset do not lead to changes in the selected feature set. A concept to address instability is ensemble feature selection, i.e. the process of repeating the feature selection multiple times on subsets of samples of the original dataset and aggregating results in a meta-model. This thesis presents two approaches for ensemble feature selection, which are tailored towards high-dimensional data in healthcare: the Repeated Elastic Net Technique for feature selection (RENT) and the User-Guided Bayesian Framework for feature selection (UBayFS). While RENT is purely data-driven and builds upon elastic net regularized models, UBayFS is a general framework for ensembles with the capabilities to include expert knowledge in the feature selection process via prior weights and side constraints. A case study modeling the overall survival of cancer patients compares these novel feature selectors and demonstrates their potential in clinical practice.
Beyond the selection of single features, UBayFS also allows for selecting whole feature groups (feature blocks) that were acquired from multiple data sources, as those mentioned above. Importance quantification of such feature blocks plays a key role in tracing information about the target variable back to the acquisition modalities. Such information on feature block importance may lead to positive effects on the use of human, technical, and financial resources if systematically integrated into the planning of patient treatment by excluding the acquisition of non-informative features. Since a generalization of feature importance measures to block importance is not trivial, this thesis also investigates and compares approaches for feature block importance rankings.
This thesis demonstrates that high-dimensional datasets from multiple data sources in the medical domain can be successfully tackled by the presented approaches for feature selection. Experimental evaluations demonstrate favorable properties in terms of predictive performance, stability, and interpretability of results, which carries a high potential for better data-driven decision support in clinical practice.
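The ensemble feature selection idea described in this abstract (repeat a selector on random subsets of the samples, then aggregate the results) can be illustrated with a minimal sketch. This is not RENT or UBayFS: the base selector here is a simple correlation ranking standing in for the elastic net models, and the function names, the `fraction` and `min_freq` parameters, and the synthetic data are all assumptions made for illustration.

```python
import random
from collections import Counter


def top_k_by_correlation(X, y, k):
    """Stand-in base selector (an assumption, not elastic net):
    return the indices of the k features with the largest |Pearson r| to y."""
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((b - mean_y) ** 2 for b in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        var_x = sum((a - mean_x) ** 2 for a in col)
        denom = (var_x * var_y) ** 0.5
        scores.append(abs(cov / denom) if denom > 0 else 0.0)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def ensemble_select(X, y, k=2, runs=50, fraction=0.7, min_freq=0.8, seed=0):
    """Repeat the base selector on random sample subsets and keep the
    features that are selected in at least min_freq of the runs."""
    rng = random.Random(seed)
    n = len(y)
    counts = Counter()
    for _ in range(runs):
        idx = rng.sample(range(n), max(2, int(fraction * n)))
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        counts.update(top_k_by_correlation(X_sub, y_sub, k))
    # Selection frequency per feature acts as a simple stability measure.
    freq = {j: counts[j] / runs for j in range(len(X[0]))}
    selected = sorted(j for j, f in freq.items() if f >= min_freq)
    return selected, freq


# Synthetic data (hypothetical): only feature 0 carries signal.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
y = [3 * row[0] + 0.1 * data_rng.gauss(0, 1) for row in X]

selected, freq = ensemble_select(X, y, k=1)
print("selected features:", selected)
print("selection frequencies:", freq)
```

Features with a selection frequency close to 1 are chosen consistently across subsamples, which is the kind of stability the abstract argues interpretability depends on.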
Detection and diabetic retinopathy grading using digital retinal images
Diabetic retinopathy is an eye disorder that affects people suffering from diabetes. Higher sugar levels in the blood lead to damage of blood vessels in the eyes and may even cause blindness. Diabetic retinopathy is identified by red spots known as microaneurysms and bright yellow lesions called exudates. It has been observed that early detection of exudates and microaneurysms may save the patient’s vision, and this paper proposes a simple and effective technique for detecting and grading diabetic retinopathy. Both publicly available and real-time datasets of colored images captured by a fundus camera have been used for the empirical analysis. In the proposed work, grading is performed to determine the severity of diabetic retinopathy, i.e. whether it is mild, moderate or severe, using exudates and microaneurysms in the fundus images. An automated approach is proposed that uses image processing, feature extraction and machine learning models to accurately predict the presence of the exudates and microaneurysms, which can then be used for grading. The research is carried out in two segments: one for exudates and another for microaneurysms. Grading via exudates is based on their distance from the macula, whereas grading via microaneurysms is based on their count. For grading using exudates, support vector machine and K-nearest neighbor show the highest accuracy of 92.1%, and for grading using microaneurysms, decision tree shows the highest accuracy of 99.9% in predicting the severity levels of the disease.
A Robust Multilabel Method Integrating Rule-based Transparent Model, Soft Label Correlation Learning and Label Noise Resistance
Model transparency, label correlation learning and the robustness to label
noise are crucial for multilabel learning. However, few existing methods study
these three characteristics simultaneously. To address this challenge, we
propose the robust multilabel Takagi-Sugeno-Kang fuzzy system (R-MLTSK-FS) with
three mechanisms. First, we design a soft label learning mechanism to reduce
the effect of label noise by explicitly measuring the interactions between
labels, which is also the basis of the other two mechanisms. Second, the
rule-based TSK FS is used as the base model to efficiently model the inference
relationship between features and soft labels in a more transparent way than
many existing multilabel models. Third, to further improve the performance of
multilabel learning, we build a correlation enhancement learning mechanism
based on the soft label space and the fuzzy feature space. Extensive
experiments are conducted to demonstrate the superiority of the proposed
method. Comment: This paper has been accepted by IEEE Transactions on Fuzzy Systems.
Boosting precision crop protection towards agriculture 5.0 via machine learning and emerging technologies: A contextual review
Crop protection is a key activity for the sustainability and feasibility of agriculture in the current context of climate change, which is destabilizing agricultural practices and increasing the incidence of current or invasive pests, and of a growing world population, which requires guaranteeing the food supply chain and ensuring food security. In view of these events, this article provides a contextual review in six sections on the role of artificial intelligence (AI), machine learning (ML) and other emerging technologies in solving current and future challenges of crop protection. Over time, crop protection has progressed from a primitive agriculture 1.0 (Ag1.0) through various technological developments to reach a level of maturity closely in line with Ag5.0 (section 1), which is characterized by successfully leveraging ML capacity and modern agricultural devices and machines that perceive, analyze and actuate following the main stages of precision crop protection (section 2). Section 3 presents a taxonomy of ML algorithms that support the development and implementation of precision crop protection, while section 4 analyses the scientific impact of ML on the basis of an extensive bibliometric study of >120 algorithms, outlining the most widely used ML and deep learning (DL) techniques currently applied in relevant case studies on the detection and control of crop diseases, weeds and plagues. Section 5 describes 39 emerging technologies in the fields of smart sensors and other advanced hardware devices, telecommunications, proximal and remote sensing, and AI-based robotics that will foreseeably lead the next generation of perception-based, decision-making and actuation systems for digitized, smart and real-time crop protection in a realistic Ag5.0. Finally, section 6 highlights the main conclusions and final remarks.
A Literature Review of Fault Diagnosis Based on Ensemble Learning
The accuracy of fault diagnosis is an important indicator for ensuring the reliability of key equipment systems. Ensemble learning integrates different weak learning methods to obtain a stronger learner and has achieved remarkable results in the field of fault diagnosis. This paper reviews the recent research on ensemble learning from both technical and field application perspectives. The paper surveys a total of 209 recent papers from 87 journals in Web of Science and other academic resources. It summarizes 78 different ensemble learning based fault diagnosis methods, involving 18 public datasets and more than 20 different equipment systems. In detail, the paper summarizes the accuracy rates, fault classification types, fault datasets, data signals used, learners (traditional machine learning or deep learning-based learners), and ensemble learning methods (bagging, boosting, stacking and other ensemble models) of these fault diagnosis models. The paper uses the accuracy of fault diagnosis as the main evaluation metric, supplemented by generalization and imbalanced data processing ability, to evaluate the performance of those ensemble learning methods. The discussion and evaluation of these methods provide valuable research references for identifying and developing appropriate intelligent fault diagnosis models for various equipment. This paper also discusses the technical challenges, lessons learned from the review and future development directions in the field of ensemble learning based fault diagnosis and intelligent maintenance.
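As a rough illustration of how an ensemble can combine weak learners into a stronger one, the sketch below applies hard majority voting to the label predictions of three base learners. It is not taken from the reviewed paper: the fault labels, the three prediction lists and the function names are hypothetical, and majority voting is only one simple way to combine learners such as those produced by bagging.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-learner label lists (all the same length) by hard voting.
    Ties go to the label that was encountered first for that sample."""
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(predicted, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical fault labels for four samples; each learner errs on a different one.
truth = ["normal", "bearing", "bearing", "normal"]
learner_a = ["normal", "bearing", "bearing", "bearing"]
learner_b = ["normal", "bearing", "normal", "normal"]
learner_c = ["bearing", "bearing", "bearing", "normal"]

ensemble = majority_vote([learner_a, learner_b, learner_c])
print("ensemble prediction:", ensemble)
print("ensemble accuracy:", accuracy(ensemble, truth))
```

Because the three learners make their mistakes on different samples, the vote corrects each individual error. When base learners make the same mistakes, voting gives no such benefit.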