145 research outputs found

    Optimal Motor Unit Subset Selection for Accurate Motor Intention Decoding: Towards Dexterous Real-Time Interfacing

    Get PDF
    Objective: Motor unit (MU) discharge timings encode human motor intentions to the finest degree. Whilst tapping into such information can bring significant gains to a range of applications, current approaches to MU decoding from surface signals do not scale well with the demands of dexterous human-machine interfacing (HMI). To optimize the forward estimation accuracy and time-efficiency of such systems, we propose the inclusion of task-wise initialization and MU subset selection. Methods: Offline analyses were conducted on data recorded from 11 non-disabled subjects. Task-wise decomposition was applied to identify MUs from high-density surface electromyography (HD-sEMG) pertaining to 18 wrist/forearm motor tasks. The activities of a selected subset of MUs were extracted from test data and used for forward estimation of intended motor tasks and joint kinematics. To that end, various combinations of subset selection and estimation algorithms (both regression and classification-based) were tested for a range of subset sizes. Results: The mutual information-based minimum Redundancy Maximum Relevance (mRMR-MI) criterion retained MUs with the highest predicative power. When the portion of tracked MUs was reduced down to 25%, the regression performance decreased only by 3% (R2=0.79) while classification accuracy dropped by 2.7% (accuracy = 74%) when kernel-based estimators were considered. Conclusion and Significance: Careful selection of tracked MUs can optimize the efficiency of MU-driven interfacing. In particular, prioritization of MUs exhibiting strong nonlinear relationships with target motions is best leveraged by kernel-based estimators. Hence, this frees resources for more robust and adaptive MU decoding techniques to be implemented in future

    Data- og ekspertdreven variabelseleksjon for prediktive modeller i helsevesenet : mot økt tolkbarhet i underbestemte maskinlæringsproblemer

    Get PDF
    Modern data acquisition techniques in healthcare generate large collections of data from multiple sources, such as novel diagnosis and treatment methodologies. Some concrete examples are electronic healthcare record systems, genomics, and medical images. This leads to situations with often unstructured, high-dimensional heterogeneous patient cohort data where classical statistical methods may not be sufficient for optimal utilization of the data and informed decision-making. Instead, investigating such data structures with modern machine learning techniques promises to improve the understanding of patient health issues and may provide a better platform for informed decision-making by clinicians. Key requirements for this purpose include (a) sufficiently accurate predictions and (b) model interpretability. Achieving both aspects in parallel is difficult, particularly for datasets with few patients, which are common in the healthcare domain. In such cases, machine learning models encounter mathematically underdetermined systems and may overfit easily on the training data. An important approach to overcome this issue is feature selection, i.e., determining a subset of informative features from the original set of features with respect to the target variable. While potentially raising the predictive performance, feature selection fosters model interpretability by identifying a low number of relevant model parameters to better understand the underlying biological processes that lead to health issues. Interpretability requires that feature selection is stable, i.e., small changes in the dataset do not lead to changes in the selected feature set. A concept to address instability is ensemble feature selection, i.e. the process of repeating the feature selection multiple times on subsets of samples of the original dataset and aggregating results in a meta-model. This thesis presents two approaches for ensemble feature selection, which are tailored towards high-dimensional data in healthcare: the Repeated Elastic Net Technique for feature selection (RENT) and the User-Guided Bayesian Framework for feature selection (UBayFS). While RENT is purely data-driven and builds upon elastic net regularized models, UBayFS is a general framework for ensembles with the capabilities to include expert knowledge in the feature selection process via prior weights and side constraints. A case study modeling the overall survival of cancer patients compares these novel feature selectors and demonstrates their potential in clinical practice. Beyond the selection of single features, UBayFS also allows for selecting whole feature groups (feature blocks) that were acquired from multiple data sources, as those mentioned above. Importance quantification of such feature blocks plays a key role in tracing information about the target variable back to the acquisition modalities. Such information on feature block importance may lead to positive effects on the use of human, technical, and financial resources if systematically integrated into the planning of patient treatment by excluding the acquisition of non-informative features. Since a generalization of feature importance measures to block importance is not trivial, this thesis also investigates and compares approaches for feature block importance rankings. This thesis demonstrates that high-dimensional datasets from multiple data sources in the medical domain can be successfully tackled by the presented approaches for feature selection. Experimental evaluations demonstrate favorable properties of both predictive performance, stability, as well as interpretability of results, which carries a high potential for better data-driven decision support in clinical practice.Moderne datainnsamlingsteknikker i helsevesenet genererer store datamengder fra flere kilder, som for eksempel nye diagnose- og behandlingsmetoder. Noen konkrete eksempler er elektroniske helsejournalsystemer, genomikk og medisinske bilder. Slike pasientkohortdata er ofte ustrukturerte, høydimensjonale og heterogene og hvor klassiske statistiske metoder ikke er tilstrekkelige for optimal utnyttelse av dataene og god informasjonsbasert beslutningstaking. Derfor kan det være lovende å analysere slike datastrukturer ved bruk av moderne maskinlæringsteknikker for å øke forståelsen av pasientenes helseproblemer og for å gi klinikerne en bedre plattform for informasjonsbasert beslutningstaking. Sentrale krav til dette formålet inkluderer (a) tilstrekkelig nøyaktige prediksjoner og (b) modelltolkbarhet. Å oppnå begge aspektene samtidig er vanskelig, spesielt for datasett med få pasienter, noe som er vanlig for data i helsevesenet. I slike tilfeller må maskinlæringsmodeller håndtere matematisk underbestemte systemer og dette kan lett føre til at modellene overtilpasses treningsdataene. Variabelseleksjon er en viktig tilnærming for å håndtere dette ved å identifisere en undergruppe av informative variabler med hensyn til responsvariablen. Samtidig som variabelseleksjonsmetoder kan lede til økt prediktiv ytelse, fremmes modelltolkbarhet ved å identifisere et lavt antall relevante modellparametere. Dette kan gi bedre forståelse av de underliggende biologiske prosessene som fører til helseproblemer. Tolkbarhet krever at variabelseleksjonen er stabil, dvs. at små endringer i datasettet ikke fører til endringer i hvilke variabler som velges. Et konsept for å adressere ustabilitet er ensemblevariableseleksjon, dvs. prosessen med å gjenta variabelseleksjon flere ganger på en delmengde av prøvene i det originale datasett og aggregere resultater i en metamodell. Denne avhandlingen presenterer to tilnærminger for ensemblevariabelseleksjon, som er skreddersydd for høydimensjonale data i helsevesenet: "Repeated Elastic Net Technique for feature selection" (RENT) og "User-Guided Bayesian Framework for feature selection" (UBayFS). Mens RENT er datadrevet og bygger på elastic net-regulariserte modeller, er UBayFS et generelt rammeverk for ensembler som muliggjør inkludering av ekspertkunnskap i variabelseleksjonsprosessen gjennom forhåndsbestemte vekter og sidebegrensninger. En case-studie som modellerer overlevelsen av kreftpasienter sammenligner disse nye variabelseleksjonsmetodene og demonstrerer deres potensiale i klinisk praksis. Utover valg av enkelte variabler gjør UBayFS det også mulig å velge blokker eller grupper av variabler som representerer de ulike datakildene som ble nevnt over. Kvantifisering av viktigheten av variabelgrupper spiller en nøkkelrolle for forståelsen av hvorvidt datakildene er viktige for responsvariablen. Tilgang til slik informasjon kan føre til at bruken av menneskelige, tekniske og økonomiske ressurser kan forbedres dersom informasjonen integreres systematisk i planleggingen av pasientbehandlingen. Slik kan man redusere innsamling av ikke-informative variabler. Siden generaliseringen av viktighet av variabelgrupper ikke er triviell, undersøkes og sammenlignes også tilnærminger for rangering av viktigheten til disse variabelgruppene. Denne avhandlingen viser at høydimensjonale datasett fra flere datakilder fra det medisinske domenet effektivt kan håndteres ved bruk av variabelseleksjonmetodene som er presentert i avhandlingen. Eksperimentene viser at disse kan ha positiv en effekt på både prediktiv ytelse, stabilitet og tolkbarhet av resultatene. Bruken av disse variabelseleksjonsmetodene bærer et stort potensiale for bedre datadrevet beslutningsstøtte i klinisk praksis

    Neuromuscular Control Modelling of Human Perturbed Posture Through Piecewise Affine Autoregressive With Exogenous Input Models

    Get PDF
    In this study, the neuromuscular control modeling of the perturbed human upright stance is assessed through piecewise affine autoregressive with exogenous input (PWARX) models. Ten healthy subjects underwent an experimental protocol where visual deprivation and cognitive load are applied to evaluate whether PWARX can be used for modeling the role of the central nervous system (CNS) in balance maintenance in different conditions. Balance maintenance is modeled as a single-link inverted pendulum; and kinematic, dynamic, and electromyography (EMG) data are used to fit the PWARX models of the CNS activity. Models are trained on 70% and tested on the 30% of unseen data belonging to the remaining dataset. The models are able to capture which factors the CNS is subjected to, showing a fitting accuracy higher than 90% for each experimental condition. The models present a switch between two different control dynamics, coherent with the physiological response to a sudden balance perturbation and mirrored by the data-driven lag selection for data time series. The outcomes of this study indicate that hybrid postural control policies, yet investigated for unperturbed stance, could be an appropriate motor control paradigm when balance maintenance undergoes external disruption

    Filter-wrapper combination and embedded feature selection for gene expression data

    Get PDF
    Biomedical and bioinformatics datasets are generally large in terms of their number of features - and include redundant and irrelevant features, which affect the effectiveness and efficiency of classification of these datasets. Several different features selection methods have been utilised in various fields, including bioinformatics, to reduce the number of features. This study utilised Filter-Wrapper combination and embedded (LASSO) feature selection methods on both high and low dimensional datasets before classification was performed. The results illustrate that the combination of filter and wrapper feature selection to create a hybrid form of feature selection provides better performance than using filter only. In addition, LASSO performed better on high dimensional data

    Supervised And Semi-supervised Learning Using Informative Feature Subspaces

    Get PDF
    Tez (Doktora) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2010Thesis (PhD) -- İstanbul Technical University, Institute of Science and Technology, 2010Web madenciliği, biyoinformatik ve konuşma tanıma gibi birçok farklı alanda çok yüksek miktarda etiketsiz veri ve farklı öznitelik uzayları bulunmaktadır. Birlikte öğrenme (Co-training) algoritması gibi yarı-eğitmenli algoritmalar etiketsiz verinin kullanımını amaçlamaktadır. Rastgele öznitelik alt uzayları (RAS) metodu farklı öznitelik alt uzaylarını kullanarak sınıflandırıcı eğitmeyi ve bu sınıflandırıcıları, topluluklarda birleştirmeyi amaçlamaktadır. Bu tez çalışmasında, sınıflandırıcı toplulukları için ilişkili öznitelik alt uzayları rastgele seçilerek; bilgi içeren ve çeşitliliği sağlanmış öznitelik alt uzaylarının oluşturulması sağlanmıştır. Oluşturulan sınıflandırıcı toplulukları, eğitmenli ve yarı-eğitmenli öğrenme için kullanılmıştır. Önerdiğimiz ilk yöntem, öznitelik alt uzaylarını karşılıklı bilgi miktarına bağlı ilişki değerlerini kullanarak seçmektedir. Bu yöntem Rel-RAS (eğitmenli) ve Rel-RASCO (yarı-eğitmenli) algoritmalarında kullanılmıştır. İkinci yöntem, ilişkili ve artık olmayan öznitelik alt uzaylarını seçmek için, mRMR (en düşük artıklık ve en yüksek ilişkili) öznitelik seçme algoritmasının değiştirilmiş şeklini kullanmaktadır. Bu yöntem mRMR-RAS (eğitmenli) ve mRMR-RASCO (yarı-eğitmenli) algoritmalarında kullanılmıştır. Önerilen yöntemlerin deneysel analizleri belirli sayıda veri kümesinde gerçekleştirilmiş ve mevcut yöntemlerle karşılaştırılmıştır. Aynı zamanda önerilen yöntemlerle oluşturulmuş sınıflandırıcı topluluklarının teorik analizleri; Kohavi Wolpert (KW) varyans, bilgi kuramı tabanlı düşük düzeyli çeşitlilik (LOD) ve bilgi kuramı sayısı (ITS) kullanılarak gerçekleştirilmiştir. LOD ve KW-varyansının davranışları arasında benzerlik bulunmuş ve topluluk sınıflandırma başarımının ITS ile açıklanabileceği görülmüştür.In many different fields, such as web mining, bioinformatics, speech recognition, there is an abundance of unlabeled data and different feature views. Semi-supervised learning algorithms such as Co-training aim to make use of unlabeled data. Random (feature) subspace (RAS) methods aim to use different feature subspaces to train different classifiers and combine them in an ensemble. In this thesis, we obtain informative and diverse feature subspaces for classifier ensembles by means of randomly drawing relevant feature subspaces. We then use these ensembles for supervised and semi-supervised learning. Our first algorithm produces relevant random subspaces using the mutual information based relevance values. This method is used in Rel-RAS (supervised) and Rel-RASCO (semi-supervised) algorithms. The second algorithm modifies the mRMR (Minimum Redundancy Maximum Relevance) feature selection algorithm to produce random feature subsets that are both relevant and non-redundant. This method is used in mRMR-RAS (supervised) and mRMR-RASCO (semi-supervised) algorithms. We perform experimental analysis of our methods on a number of datasets and compare them to existing methods. We also do theoretical analysis of classifier ensembles produced by our methods using Kohavi Wolpert (KW) variance, information theory based low order diversity (LOD) and information theoretic scores (ITS). We find out that LOD has a similar tendency with KW-variance and ensemble accuracy of the algorithms can be explained using ITS.DoktoraPh

    OFSET_mine:an integrated framework for cardiovascular diseases risk prediction based on retinal vascular function

    Get PDF
    As cardiovascular disease (CVD) represents a spectrum of disorders that often manifestfor the first time through an acute life-threatening event, early identification of seemingly healthy subjects with various degrees of risk is a priority.More recently, traditional scores used for early identification of CVD risk are slowly being replaced by more sensitive biomarkers that assess individual, rather than population risks for CVD. Among these, retinal vascular function, as assessed by the retinal vessel analysis method (RVA), has been proven as an accurate reflection of subclinical CVD in groups of participants without overt disease but with certain inherited or acquired risk factors. Furthermore, in order to correctly detect individual risk at an early stage, specialized machine learning methods and featureselection techniques that can cope with the characteristics of the data need to bedevised.The main contribution of this thesis is an integrated framework, OFSET_mine, that combinesnovel machine learning methods to produce a bespoke solution for Cardiovascular Risk Prediction based on RVA data that is also applicable to other medical datasets with similar characteristics. The three identified essential characteristics are 1) imbalanced dataset,2) high dimensionality and 3) overlapping feature ranges with the possibility of acquiring new samples. The thesis proposes FiltADASYN as an oversampling method that deals with imbalance, DD_Rank as a feature selection method that handles high dimensionality, and GCO_mine as a method for individual-based classification, all three integrated within the OFSET_mine framework.The new oversampling method FiltADASYN extends Adaptive Synthetic Oversampling(ADASYN) with an additional step to filter the generated samples and improve the reliability of the resultant sample set. The feature selection method DD_Rank is based on Restricted Boltzmann Machine (RBM) and ranks features according to their stability and discrimination power. GCO_mine is a lazy learning method based on Graph Cut Optimization (GCO), which considers both the local arrangements and the global structure of the data.OFSET_mine compares favourably to well established composite techniques. Itex hibits high classification performance when applied to a wide range of benchmark medical datasets with variable sample size, dimensionality and imbalance ratios.When applying OFSET _mine on our RVA data, an accuracy of 99.52% is achieved. In addition, using OFSET, the hybrid solution of FiltADASYN and DD_Rank, with Random Forest on our RVA data produces risk group classifications with accuracy 99.68%. This not only reflects the success of the framework but also establishes RVAas a valuable cardiovascular risk predicto
    corecore