2,445 research outputs found

    Supervised Classification Using Homogeneous Logical Proportions for Binary and Nominal Features

    The notion of homogeneous logical proportions has recently been introduced in close relation with the idea of analogical proportion. The four homogeneous proportions have intuitive meanings that can be related to classification tasks. In this paper, we propose a supervised classification algorithm using homogeneous logical proportions and provide results for each of the four proportions. A final comparison with previous work using similar methodologies and with other classifiers is provided.
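For readers unfamiliar with analogy-based classification, a minimal sketch over binary features may help. This is not the paper's algorithm (its proportion set and voting scheme differ); it uses the standard Boolean reading a : b :: c : d ⇔ a − b = c − d, and the toy dataset is invented for illustration:

```python
from itertools import permutations
from collections import Counter

def analogy(a, b, c, d):
    """Boolean analogical proportion a : b :: c : d, i.e. a - b == c - d."""
    return (a - b) == (c - d)

def holds(x, y, z, t):
    """Componentwise proportion over binary feature vectors."""
    return all(analogy(a, b, c, d) for a, b, c, d in zip(x, y, z, t))

def solve_label(la, lb, lc):
    """Solve la : lb :: lc : x for a binary label x; None if unsolvable."""
    for x in (0, 1):
        if (la - lb) == (lc - x):
            return x
    return None

def classify(train, query):
    """Vote over all ordered training triples whose features form a
    proportion with the query, solving the label equation for each."""
    votes = Counter()
    for (x, lx), (y, ly), (z, lz) in permutations(train, 3):
        if holds(x, y, z, query):
            lab = solve_label(lx, ly, lz)
            if lab is not None:
                votes[lab] += 1
    return votes.most_common(1)[0][0] if votes else None
```

The cubic scan over triples is only workable for tiny training sets; practical analogical classifiers prune candidate triples first.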

    Fusion of aerial images and sensor data from a ground vehicle for improved semantic mapping

    This work investigates the use of semantic information to link ground-level occupancy maps and aerial images. A ground-level semantic map, which shows open ground and indicates the probability of cells being occupied by walls of buildings, is obtained by a mobile robot equipped with an omnidirectional camera, GPS and a laser range finder. This semantic information is used for local and global segmentation of an aerial image. The result is a map where the semantic information has been extended beyond the range of the robot's sensors and predicts where the mobile robot can find buildings and potentially drivable ground.
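The cell-occupancy probabilities in such a map are typically maintained with log-odds updates; a minimal sketch (the sensor-model value 0.7 is an assumed illustration, not a figure from the paper):

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def update_cell(l, p_hit, l0=0.0):
    """Fuse one observation into a cell's log-odds occupancy;
    l0 is the prior log-odds (0.0 for a 50/50 prior)."""
    return l + logit(p_hit) - l0

def prob(l):
    """Convert log-odds back to a probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))

l = 0.0                      # prior: p = 0.5
for _ in range(3):           # three laser returns suggesting a wall
    l = update_cell(l, 0.7)  # assumed inverse sensor model P(occ | hit) = 0.7
```

After three consistent hits the cell's occupancy probability exceeds 0.9, which is how a handful of range readings hardens into a confident wall cell.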

    Predicting progression of mild cognitive impairment to dementia using neuropsychological data: a supervised learning approach using time windows

    Background: Predicting progression from a stage of Mild Cognitive Impairment (MCI) to dementia is a major pursuit in current research. It is broadly accepted that cognition declines along a continuum between MCI and dementia. As such, cohorts of MCI patients are usually heterogeneous, containing patients at different stages of the neurodegenerative process. This hampers the prognostic task. Nevertheless, when learning prognostic models, most studies use the entire cohort of MCI patients regardless of their disease stages. In this paper, we propose a Time Windows approach to predict conversion to dementia, learning with patients stratified using time windows, thus fine-tuning the prognosis with regard to the time to conversion. Methods: In the proposed Time Windows approach, we grouped patients based on the clinical information of whether they converted (converter MCI) or remained MCI (stable MCI) within a specific time window. We tested time windows of 2, 3, 4 and 5 years. We developed a prognostic model for each time window using clinical and neuropsychological data and compared this approach with the one commonly used in the literature, in which all patients are used to learn the models (named the First Last approach). This makes it possible to move from the traditional question "Will an MCI patient convert to dementia somewhere in the future?" to the question "Will an MCI patient convert to dementia in a specific time window?". Results: The proposed Time Windows approach outperformed the First Last approach. The results showed that we can predict conversion to dementia as early as 5 years before the event, with an AUC of 0.88 in the cross-validation set and 0.76 in an independent validation set. Conclusions: Prognostic models using time windows have higher performance when predicting progression from MCI to dementia, compared to the prognostic approach commonly used in the literature.
    Furthermore, the proposed Time Windows approach is more relevant from a clinical point of view, predicting conversion within a temporal interval rather than at some unspecified time in the future, and allowing clinicians to adjust treatments and clinical appointments in a timely manner. Funding: FCT under the Neuroclinomics2 project [PTDC/EEI-SII/1937/2014, SFRH/BD/95846/2013]; INESC-ID plurianual [UID/CEC/50021/2013]; LASIGE Research Unit [UID/CEC/00408/2013].
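The stratification step described in the Methods can be sketched as follows. The rule that a patient only counts as stable MCI for a window if observed at least that long is an assumption of this sketch, as is the tuple layout of the patient records:

```python
def window_label(years_to_event, followup_years, window):
    """cMCI if conversion occurred within the window; sMCI only if the
    patient was observed without converting for at least the window
    length; otherwise the patient is dropped for that window (None)."""
    if years_to_event is not None and years_to_event <= window:
        return "cMCI"
    if followup_years >= window:
        return "sMCI"
    return None

def stratify(patients, windows=(2, 3, 4, 5)):
    """Build one labelled training set per time window; a separate
    prognostic model is then learned from each set.
    patients: iterable of (features, years_to_event, followup_years),
    with years_to_event = None for patients who never converted."""
    sets = {}
    for w in windows:
        rows = []
        for feats, yte, fu in patients:
            lab = window_label(yte, fu, w)
            if lab is not None:
                rows.append((feats, lab))
        sets[w] = rows
    return sets
```

Note how the same patient can be cMCI for a long window and sMCI for a short one, which is exactly what lets each window-specific model specialize.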

    A phase II two stage clinical trial design to handle latent heterogeneity for a binary response.

    Phase II clinical trials are generally single-arm trials in which a homogeneity assumption is placed on the response. In practice, this assumption may be violated, resulting in a heterogeneous response. This heterogeneous, or overdispersed, response can be decomposed into distinct subgroups based on the etiology of the heterogeneity. A general classification model is developed to quantify the heterogeneity. The most common Phase II trial design used in practice is the Simon 2-stage design, which relies on the assumption of response homogeneity. This design is shown to be flawed under the assumption of heterogeneity, with errors exceeding the target trial errors. To correct for the error inflation, a modification is made to the Simon design if heterogeneity is detected after the first-stage trial conduct. The trial sample size is increased using an empirical estimate of the variance inflation factor, and the trial is then completed with design parameters constructed through the posterior predictive Beta-binomial distribution given the first-stage results. The new design, denoted the 2-stage Heterogeneity Adaptive (2HA) design, is applied to a two-subgroup problem under latent heterogeneity. Latent heterogeneity represents the most general form of heterogeneity: no information is known prior to trial conduct. The results, through simulation, show that the target errors can be maintained with this modification to the Simon design under a wide range of heterogeneity.
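The posterior predictive Beta-binomial computation mentioned above can be sketched as follows. The Beta(1, 1) default prior and the "at least r2 responses" success convention are illustrative assumptions, not the 2HA design's actual parameter choices:

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b), via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binom_pmf(k, n, a, b):
    """P(K = k) when K | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return math.exp(math.log(math.comb(n, k))
                    + log_beta(k + a, n - k + b) - log_beta(a, b))

def pred_prob_success(x1, n1, n2, r2, a=1.0, b=1.0):
    """Posterior predictive P(at least r2 responses among n2 stage-2
    patients), given x1 responses in n1 stage-1 patients and a
    Beta(a, b) prior on the response rate."""
    a1, b1 = a + x1, b + n1 - x1          # conjugate posterior update
    return sum(beta_binom_pmf(k, n2, a1, b1) for k in range(r2, n2 + 1))
```

Because the Beta prior is conjugate to the binomial likelihood, the stage-2 predictive distribution is available in closed form, which is what makes recomputing design parameters after stage 1 cheap.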

    Predictive Modelling Approach to Data-Driven Computational Preventive Medicine

    This thesis contributes novel predictive modelling approaches to data-driven computational preventive medicine and offers an alternative framework to statistical analysis in preventive medicine research. In the early parts of this research, this thesis proposes a synergy of machine learning methods for detecting patterns and developing inexpensive predictive models from healthcare data to classify the potential occurrence of adverse health events. In particular, the data-driven methodology is founded upon a heuristic-systematic assessment of several machine learning methods, data preprocessing techniques, model training, estimation and optimisation, and performance evaluation, yielding a novel computational data-driven framework, Octopus. Midway through this research, this thesis advances research in preventive medicine and data mining by proposing several new extensions in data preparation and preprocessing. It offers new recommendations for data quality assessment checks, a novel multimethod imputation (MMI) process for missing data mitigation, and a novel imbalanced-resampling approach, minority pattern reconstruction (MPR), guided by information theory. This thesis also extends the area of model performance evaluation with a novel classification performance ranking metric called XDistance. In particular, the experimental results show that building predictive models with the methods guided by our new framework (Octopus) yields domain experts' approval of the new reliable models' performance. Also, performing the data quality checks and applying the MMI process led healthcare practitioners to favour predictive reliability over interpretability. The application of MPR and its hybrid resampling strategies led to better performance, in line with experts' success criteria, than the traditional imbalanced data resampling techniques.
    Finally, the use of the XDistance performance ranking metric was found to be more effective in ranking several classifiers' performances while offering an indication of class bias, unlike existing performance metrics. The overall contributions of this thesis can be summarised as follows. First, several data mining techniques were thoroughly assessed to formulate the new Octopus framework and produce new reliable classifiers. In addition, we offer a further understanding of the impact of newly engineered features, the physical activity index (PAI) and biological effective dose (BED). Second, new data preparation, imputation, resampling and evaluation methods were developed within the new framework. Third, the newly developed and accepted predictive models help detect adverse health events, namely visceral fat-associated diseases and advanced breast cancer radiotherapy toxicity side effects. These contributions could be used to guide future theories, experiments and healthcare interventions in preventive medicine and data mining.
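As a point of reference for the imbalanced-resampling discussion, the kind of traditional baseline the thesis compares against (plain random oversampling of the minority class, not the MPR method itself) can be written as:

```python
import random

def oversample(X, y, seed=0):
    """Randomly duplicate minority-class rows until all classes match
    the majority-class count. A generic baseline, not MPR."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for r in rows + extra:
            Xb.append(r)
            yb.append(label)
    return Xb, yb
```

Duplicating rows balances class frequencies but adds no new minority-class information, which is the weakness that reconstruction-style approaches such as MPR aim to address.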

    From Graph Coloring to Receptor Clustering

    1. Hued colorings for planar graphs, graphs of higher genus, and K4-minor-free graphs.
    For integers k, r > 0, a (k,r)-coloring of a graph G is a proper coloring of the vertices of G with k colors such that every vertex v of degree d(v) is adjacent to vertices with at least min{d(v), r} different colors. The r-hued chromatic number, denoted X_r(G), is the smallest integer k for which a graph G has a (k,r)-coloring. A list assignment L of G is a function that assigns to every vertex v of G a set L(v) of positive integers. For a given list assignment L of G, an (L,r)-coloring of G is a proper coloring c of the vertices such that every vertex v of degree d(v) is adjacent to vertices with at least min{d(v), r} different colors and c(v) ∈ L(v). The r-hued choice number of G, X_{L,r}(G), is the least integer k such that every list assignment L with |L(v)| = k for all v ∈ V(G) permits an (L,r)-coloring. It is known that X_r(G) ≀ X_{L,r}(G) for any graph G. Using Euler distributions, we proved the following results, where (ii) and (iii) are best possible. (i) If G is planar, then X_{L,2}(G) ≀ 6; moreover, X_{L,2}(G) ≀ 5 when Δ(G) ≀ 4. (ii) If G is planar, then X_2(G) ≀ 5. (iii) If G is a graph with genus g(G) ≄ 1, then X_{L,2}(G) ≀ (7 + √(1 + 48 g(G)))/2.
    Let K(r) = r + 3 if 2 ≀ r ≀ 3, and K(r) = 3r/2 + 1 if r ≄ 4. We proved that if G is a K4-minor-free graph, then (i) X_r(G) ≀ K(r), and the bound can be attained; (ii) X_{L,r}(G) ≀ K(r) + 1. This extends a previous result in [Discrete Math. 269 (2003) 303-309].
    2. Quantitative description and impact of VEGF receptor clustering.
    Cell membrane-bound receptors control signal initiation in many important cellular signaling pathways. Microscopic imaging and modern labeling techniques reveal that certain receptor types tend to co-localize in clusters, ranging from a few to hundreds of members. Here, we further develop a method of defining receptor clusters in the membrane based on their mutual distance, and apply it to a set of transmission electron microscopy (TEM) images of vascular endothelial growth factor (VEGF) receptors. We clarify the difference between the observed distributions and random placement. Moreover, we outline a model of clustering based on the hypothesis of pre-existing domains that have a high affinity for receptors. The observed results are consistent with the combination of two distributions: one corresponding to the placement of clusters, and the other to the random placement of individual receptors within the clusters. Further, we use the pre-existing domain model to calculate the probability distribution of cluster sizes. By comparing to the experimental results, we estimate the likely area and attractiveness of the clustering domains.
    Furthermore, as VEGF signaling is involved in the process of blood vessel development and maintenance, it is of interest to investigate the impact of VEGF receptor (VEGFR) clustering. VEGF signaling is initiated by binding of the bivalent VEGF ligand to the membrane-bound receptors (VEGFR), which in turn stimulates receptor dimerization. To address these questions, we have formulated the simplest possible model. We have postulated the existence of a single high-affinity region in the cell membrane, which acts as a transient trap for receptors. We have defined an ODE model by introducing high- and low-density receptor variables and the corresponding reactions from a realistic model of VEGF signal initiation. Finally, we use the model to investigate the relation between the degree of VEGFR concentration, ligand availability, and signaling. In conclusion, our simulation results provide a deeper understanding of the role of receptor clustering in cell signaling.
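The (k,r)-coloring condition from part 1 can be checked mechanically. A small verifier, assuming an adjacency-set representation of the graph (this checks a given coloring; it does not compute X_r(G)):

```python
def is_kr_coloring(adj, color, r):
    """Check that `color` is a proper coloring in which every vertex v
    sees at least min(deg(v), r) distinct colors among its neighbors.
    adj: dict mapping each vertex to the set of its neighbors."""
    for v, nbrs in adj.items():
        if any(color[u] == color[v] for u in nbrs):
            return False                                  # not proper
        if len({color[u] for u in nbrs}) < min(len(nbrs), r):
            return False                                  # hued condition fails
    return True
```

On the 4-cycle, for example, the alternating 2-coloring is proper but gives every vertex a monochromatic neighborhood, so it fails for r = 2; using four distinct colors satisfies the condition.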

    Analysis Tools for Small and Big Data Problems

    The dissertation focuses on two separate problems, each informed by real-world applications.
    The first problem involves the assessment of an ordinal measurement system in a manufacturing setting. A random-effects model is proposed that is applicable to this repeatability and reproducibility context, and a Bayesian framework is adopted to facilitate inference. This first problem is an example of an analysis tool for solving a small data problem.
    The second problem involves statistical machine learning applied to big data problems. As more and more data become available, the need increases to automate the identification of particularly relevant features in a prediction or forecasting context. This often involves expanding features using kernel functions to better facilitate predictive capabilities. Simultaneously, there are often manifolds embedded within big data structures that can be exploited to improve predictive performance on real data sets. Bringing together manifold learning with kernel methods provides the powerful and novel tool developed in this dissertation.
    This dissertation has the advantage of contributing both to a more classical problem in statistics involving ordinal data and to cutting-edge machine learning techniques for the analysis of big data. It is our contention that statisticians need to understand both problem types. The novel tools developed here are demonstrated on practical applications with strong results.
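The kernel-expansion half of the second problem can be illustrated with plain kernel ridge regression, a standard method rather than the dissertation's combined manifold-kernel construction; the RBF kernel and the hyperparameter values below are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_krr(X, y, lam=1e-3, gamma=1.0):
    """Kernel ridge regression: solve (K + lam*I) alpha = y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_krr(X_train, alpha, X_new, gamma=1.0):
    """Predict at new points via the kernel expansion K_new @ alpha."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

The kernel matrix implicitly expands the features into a rich function space while the model stays a small linear solve; a manifold-learning step would additionally replace the raw inputs with coordinates adapted to the data's intrinsic geometry.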
    • 

    corecore