
    Advances and Applications of DSmT for Information Fusion. Collected Works, Volume 5

    This fifth volume on Advances and Applications of DSmT for Information Fusion collects theoretical and applied contributions of researchers working in different fields of application and in mathematics, and is available in open access. The contributions collected in this volume have either been published or presented in international conferences, seminars, workshops and journals since the dissemination of the fourth volume in 2015, or they are new. The contributions in each part of this volume are ordered chronologically. The first part of this book presents theoretical advances on DSmT, dealing mainly with modified Proportional Conflict Redistribution (PCR) rules of combination with degree of intersection, coarsening techniques, interval calculus for PCR thanks to set inversion via interval analysis (SIVIA), rough set classifiers, canonical decomposition of dichotomous belief functions, fast PCR fusion, fast inter-criteria analysis with PCR, and improved PCR5 and PCR6 rules preserving the (quasi-)neutrality of (quasi-)vacuous belief assignments in the fusion of sources of evidence, with their Matlab codes.
Because more applications of DSmT have emerged in the years since the appearance of the fourth book in 2015, the second part of this volume covers selected applications of DSmT, mainly in building change detection, object recognition, quality of data association in tracking, perception in robotics, risk assessment for torrent protection and multi-criteria decision-making, multi-modal image fusion, coarsening techniques, recommender systems, levee characterization and assessment, human heading perception, trust assessment, robotics, biometrics, failure detection, GPS systems, inter-criteria analysis, group decision, human activity recognition, storm prediction, data association for autonomous vehicles, identification of maritime vessels, fusion of support vector machines (SVM), the Silx-Furtif RUST code library for information fusion including PCR rules, and networks for ship classification. Finally, the third part presents contributions related to belief functions in general, published or presented over the years since 2015. These contributions concern decision-making under uncertainty, belief approximations, probability transformations, new distances between belief functions, non-classical multi-criteria decision-making problems with belief functions, generalization of the Bayes theorem, image processing, data association, entropy and cross-entropy measures, fuzzy evidence numbers, the negator of a belief mass, human activity recognition, information fusion for breast cancer therapy, imbalanced data classification, and hybrid techniques mixing deep learning with belief functions.
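Purely as an illustrative sketch (not the book's reference implementation, whose Matlab and RUST codes are cited above), the PCR5 rule for two sources can be written compactly: combine conjunctively, then send each partial conflicting product back to the two conflicting focal elements in proportion to the masses that produced it. The frame and mass values below are hypothetical.

```python
from itertools import product

def pcr5_two_sources(m1, m2):
    """Combine two basic belief assignments (dicts: frozenset -> mass)
    with the conjunctive rule, redistributing each partial conflict
    proportionally to the conflicting masses (PCR5)."""
    out = {}
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter, w = x & y, mx * my
        if inter:
            out[inter] = out.get(inter, 0.0) + w
        elif mx + my > 0:
            # conflicting product m1(x)*m2(y) returns to x and y
            # in proportion to the masses involved in the conflict
            out[x] = out.get(x, 0.0) + mx * w / (mx + my)
            out[y] = out.get(y, 0.0) + my * w / (mx + my)
    return out

A, B = frozenset({"A"}), frozenset({"B"})
m1 = {A: 0.6, B: 0.1, A | B: 0.3}
m2 = {A: 0.2, B: 0.5, A | B: 0.3}
combined = pcr5_two_sources(m1, m2)
```

Unlike Dempster's rule, no conflicting mass is discarded or renormalised away, so the combined masses still sum to one.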

    Hard and soft clustering of categorical time series based on two novel distances with an application to biological sequences

    [Abstract]: Two novel distances between categorical time series are introduced. Both of them measure discrepancies between extracted features describing the underlying serial dependence patterns. One distance is based on well-known association measures, namely Cramér's v and Cohen's κ. The other relies on the so-called binarization of a categorical process, which indicates the presence of each category by means of a canonical vector. Binarization is used to construct a set of innovative association measures which make it possible to identify different types of serial dependence. The metrics are used to perform crisp and fuzzy clustering of nominal series. The proposed approaches are able to group together series generated from similar stochastic processes, achieve accurate results with series coming from a broad range of models, and are computationally efficient. Extensive simulation studies show that both hard and soft clustering algorithms outperform several alternative procedures proposed in the literature. Two applications involving biological sequences from different species highlight the usefulness of the introduced techniques. The research of Ángel López-Oriona and José A. Vilar has been supported by the Ministerio de Economía y Competitividad (MINECO) grants MTM2017-82724-R and PID2020-113578RB-100, the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020-14), and the Centro de Investigación del Sistema Universitario de Galicia "CITIC" grant ED431G 2019/01, all of them through the European Regional Development Fund (ERDF). This work has received funding for the open access charge from Universidade da Coruña/CISUG. The author Ángel López-Oriona is very grateful to researcher Maite Freire for her lessons about DNA theory.
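A minimal sketch of the first idea, not the paper's own code: describe each series by Cramér's v computed at several lags, then take a Euclidean distance between the feature vectors. The lag set, function names, and toy DNA-like sequences below are illustrative assumptions.

```python
import numpy as np

def cramers_v(x, y, categories):
    """Cramér's v from the contingency table of two aligned categorical samples."""
    idx = {c: i for i, c in enumerate(categories)}
    table = np.zeros((len(categories), len(categories)))
    for a, b in zip(x, y):
        table[idx[a], idx[b]] += 1
    n = table.sum()
    expected = np.outer(table.sum(1), table.sum(0)) / n
    mask = expected > 0
    chi2 = ((table[mask] - expected[mask]) ** 2 / expected[mask]).sum()
    return float(np.sqrt(chi2 / (n * (len(categories) - 1))))

def series_distance(s1, s2, lags=(1, 2, 3)):
    """Euclidean distance between vectors of lagged Cramér's v values,
    i.e. between serial-dependence feature profiles of two series."""
    cats = sorted(set(s1) | set(s2))
    f1 = [cramers_v(s1[:-l], s1[l:], cats) for l in lags]
    f2 = [cramers_v(s2[:-l], s2[l:], cats) for l in lags]
    return float(np.linalg.norm(np.array(f1) - np.array(f2)))

rng = np.random.default_rng(0)
s = rng.choice(list("ACGT"), size=300)  # toy nucleotide series
t = rng.choice(list("ACGT"), size=300)
d = series_distance(s, t)
```

The feature vectors, rather than the raw series, are what a hard or fuzzy clustering algorithm would then operate on.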

    Potential Alzheimer's Disease Plasma Biomarkers

    In this series of studies, we examined the potential of a variety of blood-based plasma biomarkers for the identification of Alzheimer's disease (AD) progression and cognitive decline. With the end goal of studying these biomarkers via mixture modeling, we began with a literature review of the methodology. An examination of the biomarkers with demographics and other health factors found evidence of minimal risk of confounding along the causal pathway from biomarkers to cognitive performance. Further study examined the usefulness of linear combinations of biomarkers, achieved via partial least squares (PLS) analysis, as predictors of various cognitive assessment scores and clinical cognitive diagnosis. The identified biomarker linear combinations were not effective at predicting cognitive outcomes. The final study of our biomarkers utilized mixture modeling through the extension of group-based trajectory modeling (GBTM). We modeled five biomarkers, covering a range of functions within the body, to identify distinct trajectories over time. Final models showed statistically significant differences in baseline risk factors and cognitive assessments between developmental trajectories of the biomarker outcomes. This course of study has added valuable information to the field of plasma biomarker research in relation to Alzheimer's disease and cognitive decline.
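GBTM itself is fitted by maximum likelihood in specialised software, which is not reproduced here. Purely as a conceptual sketch of what "distinct trajectories over time" means, each subject's biomarker series can be summarised by OLS coefficients (intercept and slope) and those coefficients grouped; the two-group toy setup and all names below are hypothetical.

```python
import numpy as np

def two_trajectory_groups(times, outcomes, iters=25):
    """Summarise each subject's series by OLS slope and intercept,
    then split subjects into two groups with a tiny k-means."""
    coefs = np.array([np.polyfit(times, y, 1) for y in outcomes])
    # deterministic init: the two subjects with the extreme slopes
    centers = coefs[[coefs[:, 0].argmin(), coefs[:, 0].argmax()]].copy()
    for _ in range(iters):
        d = np.linalg.norm(coefs[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in (0, 1):
            if np.any(labels == j):
                centers[j] = coefs[labels == j].mean(axis=0)
    return labels

times = np.array([0.0, 1.0, 2.0, 3.0])
# simulated biomarker series: five "declining" and five "stable" subjects
declining = [10 - 2.0 * times + 0.1 * np.sin(i) for i in range(5)]
stable = [10 + 0.05 * times + 0.1 * np.cos(i) for i in range(5)]
labels = two_trajectory_groups(times, declining + stable)
```

A real GBTM additionally estimates group-membership probabilities and within-group polynomial shapes, which this crude two-step sketch ignores.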

    Using Probability Density Functions to Evaluate Models (PDFEM, v1.0) to compare a biogeochemical model with satellite-derived chlorophyll

    Global biogeochemical ocean models are invaluable tools to examine how physical, chemical, and biological processes interact in the ocean. Satellite-derived ocean color properties, on the other hand, provide observations of the surface ocean, with unprecedented coverage and resolution. Advances in our understanding of marine ecosystems and biogeochemistry are strengthened by the combined use of these resources, together with sparse in situ data. Recent modeling advances allow the simulation of the spectral properties of phytoplankton and remote sensing reflectances, bringing model outputs closer to the kind of data that ocean color satellites can provide. However, comparisons between model outputs and analogous satellite products (e.g., chlorophyll a) remain problematic. Most evaluations are based on point-by-point comparisons in space and time, where spuriously large errors can occur from small spatial and temporal mismatches, whereas global statistics provide no information on how well a model resolves processes at regional scales. Here, we employ a unique suite of methodologies, the Probability Density Functions to Evaluate Models (PDFEM), which generate a robust comparison of these resources. The probability density functions of physical and biological properties of Longhurst's provinces are compared to evaluate how well a model resolves related processes. Differences in the distributions of chlorophyll a concentration (mg m−3) provide information on matches and mismatches between models and observations. In particular, mismatches help isolate regional sources of discrepancy, which can lead to improving both simulations and satellite algorithms. Furthermore, the use of radiative transfer in the model to mimic remotely sensed products facilitates model–observation comparisons of optical properties of the ocean.
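The PDFEM suite itself is not reproduced here; as a hedged sketch of the underlying idea, a region's model and satellite chlorophyll distributions can be compared with a histogram-based divergence instead of point-by-point errors. The Hellinger distance, lognormal toy samples, and bin count below are illustrative assumptions, not PDFEM's actual choices.

```python
import numpy as np

def hellinger(sample_a, sample_b, bins=30):
    """Histogram-based Hellinger distance between two empirical
    distributions, on a shared set of bin edges; 0 = identical, 1 = disjoint."""
    lo = min(sample_a.min(), sample_b.min())
    hi = max(sample_a.max(), sample_b.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(sample_a, bins=edges)
    q, _ = np.histogram(sample_b, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

# toy "model" and "satellite" chlorophyll fields, compared on a log10 scale
# (chlorophyll a is commonly close to lognormal)
rng = np.random.default_rng(1)
model_chl = rng.lognormal(mean=-1.0, sigma=0.5, size=5000)
sat_chl = rng.lognormal(mean=-0.8, sigma=0.5, size=5000)
d_same = hellinger(np.log10(model_chl), np.log10(model_chl))
d_diff = hellinger(np.log10(model_chl), np.log10(sat_chl))
```

A distribution-level score like this is insensitive to small spatial and temporal mismatches, which is the motivation the abstract gives for moving away from point-by-point comparison.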

    An overview of clustering methods with guidelines for application in mental health research

    Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles are subsequently introduced. How to choose algorithms to address common issues, as well as methods for pre-clustering data processing, clustering evaluation and validation, are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.
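The paper points readers to R functions; as a language-neutral illustration of the basic workflow it recommends (pre-process, cluster, evaluate), here is a small Python sketch using a hand-rolled k-means and the within-cluster sum of squares as an elbow criterion. The toy data and all names are ours, not the paper's.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Tiny k-means with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

def within_ss(X, labels, centers):
    """Total within-cluster sum of squares (the elbow-plot quantity)."""
    return float(sum(((X[labels == j] - c) ** 2).sum()
                     for j, c in enumerate(centers)))

# two well-separated blobs; standardise features first, then scan k
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
X = (X - X.mean(0)) / X.std(0)
wss = {k: within_ss(X, *kmeans(X, k)) for k in (1, 2, 3)}
```

The sharp drop in `wss` when moving from k=1 to the true k=2 is the elbow that guides model choice; in practice the paper's fuller workflow would combine this with validation indices and stability checks.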

    Unsupervised learning methods for identifying and evaluating disease clusters in electronic health records

    Introduction: Clustering algorithms are a class of algorithms that can discover groups of observations in complex data and are often used to identify subtypes of heterogeneous diseases in electronic health records (EHR). Evaluating clustering experiments for biological and clinical significance is a vital but challenging task due to the lack of consensus on best practices. As a result, the translation of findings from clustering experiments to clinical practice is limited. Aim: The aim of this thesis was to investigate and evaluate approaches that enable the evaluation of clustering experiments using EHR. Methods: We conducted a scoping review of clustering studies in EHR to identify common evaluation approaches. We systematically investigated the performance of the identified approaches using a cohort of Alzheimer's Disease (AD) patients as an exemplar, comparing four different clustering methods (K-means, Kernel K-means, Affinity Propagation and Latent Class Analysis). Using the same population, we developed and evaluated a method (MCHAMMER) that tests whether clusterable structures exist in EHR. To develop this method we tested several cluster validation indices and methods of generating null data to see which are best at discovering clusters. To enable robust benchmarking of evaluation approaches, we created a tool that generates synthetic EHR data containing known cluster labels across a range of clustering scenarios. Results: Across 67 EHR clustering studies, the most popular internal evaluation approach was comparing cluster results across multiple algorithms (30% of studies). We examined this approach by conducting a clustering experiment on a population of 10,065 AD patients with 21 demographic, symptom and comorbidity features. K-means found 5 clusters, Kernel K-means found 2, Affinity Propagation found 5 and Latent Class Analysis found 6.
The K-means solution was found to be the best, with the highest silhouette score (0.19), and was more predictive of outcomes. The five clusters found were: typical AD (n=2026), non-typical AD (n=1640), a cardiovascular disease cluster (n=686), a cancer cluster (n=1710) and a cluster of mental health issues, smoking and early disease onset (n=1528), which has been found in previous research as well as in the results of other clustering methods. We created a synthetic data generation tool which allows for the generation of realistic EHR clusters that can vary in separation and number of noise variables to alter the difficulty of the clustering problem. We found that decreasing cluster separation significantly increased clustering difficulty, whereas adding noise variables increased difficulty but not significantly. To develop the tool for assessing cluster existence, we tested different methods of null dataset generation and cluster validation indices; the best performing null dataset method was the min-max method, and the best performing indices were the Calinski-Harabasz index, which had an accuracy of 94%, the Davies-Bouldin index (97%), the silhouette score (93%) and the BWC index (90%). We further found that when clusters were identified using the Calinski-Harabasz index they were more likely to have significantly different outcomes between clusters. Lastly, we repeated the initial clustering experiment, comparing 10 different pre-processing methods. The three best performing methods were the RBF kernel (2 clusters), MCA (4 clusters) and MCA with PCA (6 clusters). The MCA approach gave the best results, with the highest silhouette score (0.23) and meaningful clusters, producing 4 clusters: heart and circulatory disease (n=1379), early-onset mental health (n=1761), a male cluster with memory loss (n=1823) and a female cluster with more problems (n=2244). Conclusion: We have developed and tested a series of methods and tools to enable the evaluation of EHR clustering experiments. We developed and proposed a novel cluster evaluation metric and provided a tool for benchmarking evaluation approaches in synthetic but realistic EHR data.
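A hedged sketch of the min-max null idea described above (not the MCHAMMER code): generate reference datasets in which each feature is drawn uniformly within its observed range, cluster the real and null data identically, and check whether a validation index such as Calinski-Harabasz stands out against the null. All names and the toy data are ours.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Tiny k-means with a deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def calinski_harabasz(X, labels):
    """CH index: between- over within-group dispersion, df-adjusted."""
    n, k = len(X), int(labels.max()) + 1
    overall = X.mean(0)
    W = sum(((X[labels == j] - X[labels == j].mean(0)) ** 2).sum()
            for j in range(k))
    B = sum((labels == j).sum() * ((X[labels == j].mean(0) - overall) ** 2).sum()
            for j in range(k))
    return float((B / (k - 1)) / (W / (n - k)))

def minmax_null(X, rng):
    """Min-max null: each feature sampled uniformly within its observed range."""
    return rng.uniform(X.min(0), X.max(0), size=X.shape)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (60, 2)), rng.normal(4, 0.5, (60, 2))])
ch_real = calinski_harabasz(X, kmeans(X, 2))
ch_null = float(np.mean([calinski_harabasz(Xn, kmeans(Xn, 2))
                         for Xn in (minmax_null(X, rng) for _ in range(20))]))
```

When `ch_real` clearly exceeds the null distribution of the index, that is evidence of genuinely clusterable structure rather than an artefact of forcing k clusters onto unstructured data.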

    Evaluation of optimal solutions in multicriteria models for intelligent decision support

    This dissertation falls within optimization and its use for decision-making. The logical sequence has been modelling, implementation, resolution and validation leading to a decision. For this, we have used tools from multicriteria analysis, multiobjective optimization and artificial intelligence techniques. The work is structured in two parts (each divided into three chapters) corresponding to the theoretical and the experimental part. The first part analyses the context of the field of study with a review of its historical framework; a chapter is then devoted to multicriteria optimization, collecting well-known models together with original contributions of this work. The third chapter, devoted to artificial intelligence, presents the foundations of statistical learning and the machine learning and deep learning techniques needed for the contributions of the second part. The second part contains seven real cases to which the described techniques have been applied. The first chapter studies two cases: the academic performance of students at the Universidad Industrial de Santander (Colombia) and an objective system for awarding the MVP prize in the NBA. The next chapter applies artificial intelligence techniques to musical similarity (plagiarism detection on YouTube), the prediction of a company's closing price on the New York stock exchange, and the automatic classification of spatial acoustic signals in immersive environments. In the last chapter, multicriteria analysis techniques are combined with the power of artificial intelligence to detect university dropout early (at the Universidad Industrial de Santander), and multicriteria methods are used to establish a ranking of artificial intelligence models.
To close the dissertation, although each chapter contains partial conclusions, chapter 8 gathers the main conclusions of the whole work together with a fairly exhaustive bibliography of the topics covered. In addition, the work ends with three appendices containing the programs and tools which, although useful for understanding the dissertation, have been kept separate so that the chapters read more fluently.
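The dissertation's own ranking procedure for artificial intelligence models is not reproduced here; as one standard multicriteria possibility, a TOPSIS sketch for ranking hypothetical models on accuracy (a benefit criterion) and latency (a cost criterion) could look like this. The weights and scores are invented for illustration.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """TOPSIS: rank alternatives (rows) on criteria (columns) by relative
    closeness to the ideal solution. benefit[i] is True if higher is better."""
    M = matrix / np.linalg.norm(matrix, axis=0)  # vector-normalise each column
    M = M * weights
    ideal = np.where(benefit, M.max(0), M.min(0))
    anti = np.where(benefit, M.min(0), M.max(0))
    d_pos = np.linalg.norm(M - ideal, axis=1)
    d_neg = np.linalg.norm(M - anti, axis=1)
    return d_neg / (d_pos + d_neg)  # closeness coefficient, higher = better

# hypothetical models scored on accuracy (benefit) and latency in ms (cost)
models = np.array([[0.92, 120.0],
                   [0.90,  40.0],
                   [0.70,  30.0]])
scores = topsis(models, np.array([0.6, 0.4]), np.array([True, False]))
best = int(scores.argmax())
```

With these weights the second model wins: it trades a sliver of accuracy for a large latency gain, which is exactly the kind of compromise multicriteria methods are designed to arbitrate.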

    Development of a novel approach to treating recreational runners with Achilles tendinopathy

    Introduction: Achilles tendinopathy (AT) is one of the most common running injuries, for which pain-guided progressive tendon loading is the most recommended treatment. However, 21-44% of individuals with AT who complete a rehabilitation program report continued pain and dysfunction. I developed and conducted four linked studies that synthesised the existing biomechanical literature, identified new impairments, tested feasibility, and developed the rationale for a new intervention. Aims: The overarching aim of this PhD was to synthesise and extend existing knowledge so that clinicians have the information and procedures required to better implement rehabilitation in recreational runners with AT. Specific objectives were to synthesize and update data on biomechanical impairments in AT; to identify clinically measurable physical and psychological impairments in runners with AT versus controls; to assess the feasibility of a novel intervention focussed on a stretch-shorten cycle (SSC) activity progression for recreational runners with AT; and to provide further rationale for end-stage SSC rehabilitation by investigating the Achilles tendon forces produced during different exercises used in the rehabilitation of runners with AT. The impact of success would be improved outcomes in the athletically active population. Methods: Study 1 was a systematic review in which the quality, risk of bias and evidence levels of manuscripts investigating the biomechanical aspects that precede or are associated with AT during running and hopping were synthesised. Study 2 was a case-control study in which clinically accessible measures were implemented to investigate physical and psychological impairments associated with disease severity, and impairments in runners with AT in comparison to matched controls. Study 3 evaluated the feasibility of education and exercise supplemented by a pain-guided hopping intervention as a novel treatment approach for recreational runners with AT.
Finally, in study 4, I measured Achilles tendon forces in a laboratory, and investigated their relationship with pain, in runners with AT during 12 rehabilitation exercises. Results: The systematic review reported that only 17% of the investigated biomechanical parameters differed between groups, suggesting that either the correct biomechanical factors have not been investigated or such factors are rarely associated with the development and presence of AT. However, the lack of prospective (2 of 16) and high-quality (4 of 16) studies precluded strong conclusions. The case-control study found that runners with AT had lower physical activity scores (OR = 0.19, 95% CI = 0.05-0.71, p = 0.01), lower seated heel raise 6-repetition-maximum (6RM) strength (OR < 0.001, 95% CI = 2.39e-06 to 0.006, p < 0.001) and were shorter in stature (OR = 0.86, 95% CI = 0.77-0.96, p = 0.01) than their healthy counterparts. No psychological differences were found. Further, 46% of AT severity variance was explained by higher BMI (β = -0.41; p = 0.001), weaker leg curl 6RM (β = 0.32; p = 0.009), and higher pain during hopping (β = -0.43; p = 0.001). The cross-sectional study identified a progression of Achilles tendon forces across the 12 evaluated exercises, and two distinct exercise clusters with discrepant Achilles tendon force profiles, which may inform AT rehabilitation. Further, a moderate to strong correlation between Achilles tendon forces and pain was present in only 27% of the participants. Finally, the feasibility study reported that education and exercise supplemented by a pain-guided hopping intervention was feasible, with some caveats to be addressed in terms of improving adherence and fidelity and reducing adverse events. Conclusion: In this thesis, I have found that activity level, plantar flexor strength and education appear to be more important than biomechanical and psychological aspects in the presentation of runners with AT.
A novel management approach including education and a hopping intervention is feasible, and a hierarchical exercise progression based on Achilles tendon forces has been proposed. These findings will inform a novel biopsychosocial approach that will be tested in a future randomised controlled trial.

    MIXING IT UP: THE IMPACT OF EPISODIC INTROGRESSION ON THE EVOLUTION OF HIGH-LATITUDE MESOCARNIVORES

    At high latitudes, climatic oscillations have triggered repeated episodes of organismal divergence by geographically isolating populations. For terrestrial species, extended isolation in glacial refugia – ice-free regions that enable terrestrial species to persist through glacial maxima – is hypothesized to stimulate allopatric divergence. Upon glacial recession, divergent populations expanded from independent glacial refugia and often contacted other diverging populations. In the absence of reproductive isolating mechanisms, this biogeographic process may trigger hybridization and, ultimately, gene flow between divergent taxa. My dissertation research aims to understand how these episodic periods of isolation and contact have impacted the evolution and diversification of high-latitude species. To do so, it integrates genetic, genomic, and morphometric characters across multiple high-latitude mesocarnivore mammals within the hyper-diverse family Mustelidae. Overall, I identified substantial cryptic diversity in the Arctic and highlight the complementary roles of glacial and interglacial cycles in the evolution and structuring of high-latitude biota.

    Flexible non-parametric tests of sample exchangeability and feature independence

    In scientific studies involving analyses of multivariate data, two questions often arise for the researcher. First, is the sample exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units? Second, are the features independent of one another, or can the features be grouped so that the groups are mutually independent? We propose a non-parametric approach that addresses these two questions. Our approach is conceptually simple, yet fast and flexible. It controls the Type I error across realistic scenarios, and handles data of arbitrary dimensions by leveraging large-sample asymptotics. In the exchangeability detection setting, through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our approach compares favorably in various scenarios of interest. We apply our method to problems in population and statistical genetics, including stratification detection and linkage disequilibrium splitting. We also consider other application domains, applying our approach to post-clustering single-cell chromatin accessibility data and World Values Survey data, where we show how users can partition features into independent groups, which helps generate new scientific hypotheses about the features. (Main text: 25 pages; supplementary material: 39 pages.)
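The paper's actual test statistic is not reproduced here; purely as a conceptual sketch of exchangeability testing, an ordering effect can be probed with a permutation test on the mean distance between consecutive rows, since under exchangeability any reordering of the rows is equally likely. The statistic, toy data and names below are illustrative assumptions.

```python
import numpy as np

def exchangeability_pvalue(X, n_perm=500, seed=0):
    """Permutation check: compares the mean distance between consecutive
    rows with its distribution under random row reorderings."""
    rng = np.random.default_rng(seed)

    def stat(M):
        return float(np.linalg.norm(np.diff(M, axis=0), axis=1).mean())

    obs = stat(X)
    null = np.array([stat(X[rng.permutation(len(X))]) for _ in range(n_perm)])
    mu = null.mean()
    extreme = int(np.sum(np.abs(null - mu) >= abs(obs - mu)))
    return (extreme + 1) / (n_perm + 1)  # add-one corrected two-sided p-value

rng = np.random.default_rng(4)
iid = rng.normal(size=(80, 3))                      # exchangeable rows
walk = np.cumsum(rng.normal(size=(80, 3)), axis=0)  # strong serial structure
p_walk = exchangeability_pvalue(walk)
p_iid = exchangeability_pvalue(iid)
```

The random walk's consecutive rows are far closer together than randomly paired rows, so its p-value is tiny, while the i.i.d. sample behaves like any of its permutations. The paper's own approach is more general than this toy statistic, in particular in also handling feature-independence questions.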