
    Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics.

    Background: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve.

    Results: We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods.

    Conclusions: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.
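    As a rough illustration of the lineage-topology stage of this kind of approach, the sketch below connects clusters of cells by a minimum spanning tree over their centroids in a reduced-dimensional space. The toy data, cluster count, and variable names are assumptions rather than code from the Slingshot package, and the simultaneous-principal-curves pseudotime stage is omitted.

```python
# A rough sketch of the tree-construction stage only: clusters of cells are
# connected by a minimum spanning tree over their centroids, which defines the
# branching lineage topology. The toy data and cluster count are assumptions;
# Slingshot's simultaneous-principal-curves pseudotime stage is omitted.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
cells = rng.normal(size=(300, 2))      # stand-in for reduced-dimension expression data

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(cells)
dist = cdist(km.cluster_centers_, km.cluster_centers_)

# Edges of the MST over cluster centroids give the inferred lineage tree
mst = minimum_spanning_tree(dist).toarray()
edges = np.argwhere(mst > 0)
print("lineage tree edges (cluster pairs):", edges.tolist())
```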

    Unsupervised Anomaly Detection of High Dimensional Data with Low Dimensional Embedded Manifold

    Anomaly detection techniques are expected to identify anomalies in large volumes of seemingly homogeneous data; being able to do so leads to timely, pivotal, and actionable decisions, guarding against potential human, financial, and informational loss. An often encountered situation in anomaly detection is the absence of prior knowledge about the nature of anomalies. Such circumstances call for 'unsupervised' learning-based anomaly detection techniques. Compared to their 'supervised' counterparts, which have the luxury of a labeled training dataset containing both normal and anomalous samples, unsupervised problems are far more difficult. Moreover, high dimensional streaming data from the many interconnected sensors in modern industries makes the task more challenging. An investigative effort to address these challenges is the overarching theme of this dissertation.

    This dissertation first reassesses the fundamental issue of the similarity measure among observations, a central piece of any anomaly detection technique. The manifold hypothesis suggests that a low dimensional manifold structure may be embedded in high dimensional data. In the presence of such structured space, traditional similarity measures fail to capture the true intrinsic similarity. In light of this, reevaluating the notion of similarity is more pressing than providing incremental improvements over existing techniques. A graph-theoretic similarity measure is therefore proposed to differentiate, and thus identify, anomalies from normal observations. Specifically, the minimum spanning tree (MST), a graph-based construct, is proposed to approximate the similarities among data points in the presence of a high dimensional structured space. It tracks the structure of the embedded manifold better than existing measures and helps distinguish anomalies from normal observations.

    The dissertation then investigates three different aspects of the anomaly detection problem and develops three sets of solution approaches, all revolving around the newly proposed MST-based similarity measure. In the first part, a local MST (LoMST) based anomaly detection approach is proposed to detect anomalies using the data in the original space; a two-step procedure detects both cluster and point anomalies. The next two sets of methods, proposed in the subsequent two parts, target anomaly detection in a reduced data space. In the second part, a neighborhood structure assisted version of the nonnegative matrix factorization approach (NS-NMF) is proposed; to detect anomalies, it uses the neighborhood information captured by a sparse MST similarity matrix along with the original attribute information. To meet industry demands, online versions of both LoMST and NS-NMF are also developed for real-time anomaly detection. In the last part, a graph regularized autoencoder is proposed which adds an MST regularizer to the original loss function and is thus capable of maintaining the local invariance property. All of the proposed approaches are tested on 20 benchmark datasets and one real-life hydropower dataset; compared with state-of-the-art approaches, all three produce statistically significantly better outcomes.

    “Industry 4.0” is now a reality, and it calls for anomaly detection techniques capable of processing the large amounts of high dimensional data generated in real time. The proposed MST-based similarity measure, together with the individual techniques developed in this dissertation, is equipped to tackle each of these issues and provides an effective and reliable real-time anomaly identification platform.
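    As a simplified illustration of the MST-based similarity idea, the sketch below builds one global MST over a toy dataset and scores each point by the longest MST edge attached to it, so points joined to the rest of the data only through long edges surface as anomalies. The data and scoring rule are illustrative assumptions; the dissertation's actual LoMST method builds local MSTs per neighborhood instead.

```python
# A simplified, global variant of the MST-based similarity idea: points that
# attach to the rest of the data only through long MST edges are scored as
# anomalies. The data and the scoring rule are illustrative assumptions; the
# dissertation's LoMST builds local MSTs per neighborhood instead.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
normal = rng.normal(size=(200, 5))
outliers = rng.normal(loc=8.0, size=(5, 5))    # clearly separated anomalies
X = np.vstack([normal, outliers])

mst = minimum_spanning_tree(cdist(X, X)).toarray()
mst = mst + mst.T                              # symmetrize to read per-point edges

scores = mst.max(axis=1)                       # longest MST edge touching each point
flagged = np.argsort(scores)[-5:]
print("flagged indices:", sorted(flagged.tolist()))
```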

    A high-performance IoT solution to reduce frost damages in stone fruits

    Agriculture is one of the key sectors where technology is opening new opportunities to disrupt the market. The Internet of Things (IoT) can reduce production costs and increase product quality by providing intelligence services via IoT analytics. However, harsh weather conditions and the lack of connectivity in the field limit the successful deployment of such services, as they require both fully connected infrastructures and substantial computational resources. Edge computing has emerged as a solution that brings computing power into close proximity to the sensors, providing energy savings, highly responsive web services, and the ability to mask transient cloud outages. In this paper, we propose an IoT monitoring system that activates anti-frost techniques to avoid crop loss, defining two intelligent services to detect outliers caused by sensor errors. The former is a nearest-neighbor technique; the latter is the k-means algorithm, which provides better quality results but at a higher computational cost. Cloud and edge computing approaches are analyzed by targeting two different low-power GPUs. Our experimental results show that cloud-based approaches provide the highest performance in general, but edge computing is a compelling alternative for masking transient cloud outages and providing highly responsive data analytics services in technologically hostile environments.

    This work was partially supported by the Fundación Séneca del Centro de Coordinación de la Investigación de la Región de Murcia under Project 20813/PI/18, and by the Spanish Ministry of Science, Innovation and Universities under grants TIN2016-78799-P (AEI/FEDER, UE) and RTC-2017-6389-5. Finally, we thank the farmers for making their resources available to assess and improve the proposed IoT monitoring system.

    Guillén-Navarro, MA.; Martínez-España, R.; López, B.; Cecilia-Canales, JM. (2021). A high-performance IoT solution to reduce frost damages in stone fruits. Concurrency and Computation: Practice and Experience (Online). 33(2):1-14. https://doi.org/10.1002/cpe.5299
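    As an illustrative sketch of the second intelligent service described above (the k-means one), the snippet below clusters a window of temperature readings and flags those unusually far from their centroid. The cluster count, 3-sigma threshold, and synthetic readings are assumptions, not values taken from the paper.

```python
# An illustrative version of the k-means outlier service: cluster a window of
# temperature readings and flag those unusually far from their centroid. The
# cluster count, 3-sigma threshold, and synthetic readings are assumptions,
# not values taken from the paper.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
temps = rng.normal(loc=2.0, scale=0.5, size=(500, 1))   # degrees C, near frost
temps[::97] += 15.0                                     # inject faulty-sensor spikes

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(temps)
dists = np.linalg.norm(temps - km.cluster_centers_[km.labels_], axis=1)

outliers = np.where(dists > dists.mean() + 3 * dists.std())[0]
print(f"{outliers.size} suspect readings out of {temps.shape[0]}")
```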

    A systematic review of data quality issues in knowledge discovery tasks

    Large volumes of data are accumulating because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of this data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks, together with a case study applied to the agricultural disease known as coffee rust.

    DESIGN AND SIMULATION OF AN EFFICIENT MODEL FOR CREDIT CARDS FRAUD DETECTION

    In this study, a model which can improve the accuracy and reliability of credit card fraud detection is proposed, with a view to mitigating contentious issues surrounding online credit card transactions, such as the number of transactions that have resulted in payment default and the number of recorded credit card fraud cases, all of which have put the economy in jeopardy. To address this challenge, a sample dataset was sourced from the online repository database of Kaggle. Feature extraction was performed using Principal Component Analysis (PCA). The credit card fraud detection model was designed using a neuro-fuzzy logic technique, and clustering was done using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). The simulation of the proposed model was done in a Python programming environment. The performance of the model was evaluated by comparing the proposed model with the Neuro-Fuzzy (NF) technique using metrics such as precision, recall, F1-score, and accuracy. On the training dataset, the proposed model (NF + HDBSCAN) had a precision of 98.75%, recall of 98.70%, F1-score of 97.65%, and accuracy of 99.75%, while NF had a precision of 94.60%, recall of 94.50%, F1-score of 95.50%, and accuracy of 95.70%. Likewise, on the test dataset, the proposed model (NF + HDBSCAN) had a precision of 93.50%, recall of 95.50%, F1-score of 94.50%, and accuracy of 95.50%, while NF had a precision of 92.50%, recall of 93.00%, F1-score of 94.00%, and accuracy of 93.50%. The simulation results showed that the proposed model is viable and reliable, and that it could be designed as a module to be integrated into existing credit card systems, lowering the fraud rate and assisting fraud investigators.
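    The sketch below illustrates the unsupervised stage of the pipeline described above, PCA feature extraction followed by HDBSCAN, treating HDBSCAN noise points (label -1) as fraud candidates. The neuro-fuzzy classifier is omitted, the synthetic data stands in for the Kaggle dataset, and scikit-learn >= 1.3 is assumed for sklearn.cluster.HDBSCAN.

```python
# A sketch of the unsupervised stage only: PCA features, then HDBSCAN, with
# noise points (label -1) treated as fraud candidates. The synthetic data
# stands in for the Kaggle dataset, the neuro-fuzzy classifier is omitted,
# and sklearn.cluster.HDBSCAN assumes scikit-learn >= 1.3.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import HDBSCAN

rng = np.random.default_rng(3)
legit = rng.normal(size=(1000, 10))
fraud = rng.normal(loc=6.0, size=(10, 10))       # rare, atypical transactions
X = np.vstack([legit, fraud])

features = PCA(n_components=5).fit_transform(X)  # mirrors the PCA step above
labels = HDBSCAN(min_cluster_size=25).fit_predict(features)

candidates = np.where(labels == -1)[0]
print(f"{candidates.size} transactions flagged for fraud review")
```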

    Data-Driven and Hybrid Methods for Naval Applications

    The goal of this PhD thesis is to study, design, and develop data analysis methods for naval applications. Data analysis is improving our ways of understanding complex phenomena by profitably taking advantage of the information lying behind a collection of data. By adopting algorithms from the world of statistics and machine learning, it is possible to extract valuable information without requiring specific domain knowledge of the system generating the data. The application of such methods to marine contexts opens new research scenarios, since typical naval problems can now be solved with higher accuracy than with more classical techniques based on the physical equations governing the naval system. During this study, several major naval problems have been addressed using state-of-the-art and novel data analysis techniques: condition-based maintenance, consisting of asset monitoring, maintenance planning, and real-time anomaly detection; energy and consumption monitoring, in order to reduce vessel consumption and gas emissions; system safety for maneuvering control and collision avoidance; and component design, in order to detect possible defects at the design stage. A review of the state of the art of data analysis and machine learning techniques, together with preliminary results from applying such methods to the aforementioned problems, shows a growing interest in these research topics and that effective data-driven solutions can be applied in the naval context. Moreover, for some applications, data-driven models have been used in conjunction with domain-dependent methods modelling physical phenomena, in order to exploit both mechanistic knowledge of the system and available measurements. These hybrid methods are shown to provide more accurate and interpretable results than either the purely physical or the purely data-driven approaches taken singly, showing that in the naval context it is possible to offer valuable new methodologies by either providing novel statistical methods or improving state-of-the-art ones.
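    As a minimal sketch of the hybrid-modelling idea, the snippet below combines a mechanistic baseline with a data-driven correction fitted to its residuals. The cubic speed-power law for fuel consumption, the variables, and the random-forest regressor are illustrative assumptions, not the models used in the thesis.

```python
# A minimal sketch of a hybrid model: a mechanistic baseline (here a cubic
# speed-power law for fuel consumption, an assumed stand-in rather than the
# thesis's actual model) plus a data-driven correction fitted to its residuals.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
speed = rng.uniform(5, 20, size=500)             # knots
trim = rng.uniform(-2.0, 2.0, size=500)          # metres
fuel = 0.8 * speed**3 + 50 * trim**2 + rng.normal(scale=40, size=500)

physics = 0.8 * speed**3                         # mechanistic estimate alone
X = np.column_stack([speed, trim])
residual_model = RandomForestRegressor(random_state=0).fit(X, fuel - physics)

hybrid = physics + residual_model.predict(X)     # baseline + learned correction
print("mean absolute error:", np.mean(np.abs(hybrid - fuel)))
```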

    Topological inference in graphs and images
