
    Stochastic information granules extraction for graph embedding and classification

    Graphs are data structures able to efficiently describe real-world systems and, as such, have been used extensively in recent years by many branches of science, including machine learning engineering. However, the design of efficient graph-based pattern recognition systems is bottlenecked by the intrinsic problem of how to properly match two graphs. In this paper, we investigate a granular computing approach for the design of a general-purpose graph-based classification system. The overall framework relies on the extraction of meaningful pivotal substructures, on top of which an embedding space can be built and in which classification can be performed without limitations. Given the importance of this extraction step, we investigate whether information can be preserved by performing a stochastic extraction on the training data instead of an exhaustive extraction procedure, which is likely to be infeasible for large datasets. Tests on benchmark datasets show that stochastic extraction can yield a meaningful set of pivotal substructures with a much lower memory footprint and overall computational burden, making the proposed strategies suitable for big datasets as well.
    Baldini, Luca; Martino, Alessio; Rizzi, Antonello
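The abstract does not spell out how substructures are extracted or how graphs are embedded, so the following is only a simplified, hypothetical analogue of the idea. It assumes that substructures are labelled simple paths (the paper works with more general subgraphs), builds the alphabet of pivotal substructures by random sampling instead of exhaustive enumeration, and embeds each graph as a histogram of symbol occurrences. The graph representation and all function names are illustrative.

```python
import random
from collections import Counter

# Hypothetical graph representation: (node_labels, adjacency), e.g.
# ({0: "C", 1: "O"}, {0: [1], 1: [0]}) for an undirected labelled graph.

def canonical(labels, path):
    """Label string of a path; a path and its reverse map to the same symbol."""
    seq = [labels[n] for n in path]
    return min("-".join(seq), "-".join(reversed(seq)))

def simple_paths(graph, max_len):
    """Exhaustively yield symbols for all simple paths with 1..max_len nodes.

    Paths with two or more nodes are enumerated from both ends, so they are
    counted twice; this is harmless as long as every graph is treated alike.
    """
    labels, adj = graph

    def extend(path):
        yield canonical(labels, path)
        if len(path) < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    yield from extend(path + [nxt])

    for start in labels:
        yield from extend([start])

def sample_path(graph, max_len, rng):
    """Stochastically extract one substructure: a self-avoiding random walk."""
    labels, adj = graph
    path = [rng.choice(list(labels))]
    target = rng.randint(1, max_len)
    while len(path) < target:
        options = [n for n in adj[path[-1]] if n not in path]
        if not options:
            break
        path.append(rng.choice(options))
    return canonical(labels, path)

def build_alphabet(train_graphs, samples_per_graph, max_len, seed=0):
    """Alphabet of pivotal substructures from random samples, not full enumeration."""
    rng = random.Random(seed)
    symbols = set()
    for graph in train_graphs:
        for _ in range(samples_per_graph):
            symbols.add(sample_path(graph, max_len, rng))
    return sorted(symbols)

def embed(graph, alphabet, max_len):
    """Histogram embedding: occurrences of each alphabet symbol in the graph."""
    counts = Counter(simple_paths(graph, max_len))
    return [counts[symbol] for symbol in alphabet]

if __name__ == "__main__":
    triangle = ({0: "C", 1: "C", 2: "O"}, {0: [1, 2], 1: [0, 2], 2: [0, 1]})
    chain = ({0: "C", 1: "O", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})
    train = [triangle, chain]
    exhaustive = sorted(set(s for g in train for s in simple_paths(g, 3)))
    sampled = build_alphabet(train, samples_per_graph=4, max_len=3)
    print(f"{len(sampled)} of {len(exhaustive)} symbols sampled")
    print(embed(chain, sampled, 3))
```

Every sampled path is also produced by the exhaustive enumeration, so the sampled alphabet is a subset of the exhaustive one; the memory saving comes from storing only the symbols that sampling happened to hit. The resulting fixed-length vectors can be fed to any standard vector-space classifier.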

    Similarity Discriminant Analysis

    A user-friendly guide to using distance measures to compare time series in ecology

    Time series are a critical component of ecological analysis, used to track changes in biotic and abiotic variables. Information can be extracted from the properties of time series for tasks such as classification (e.g., assigning species to individual bird calls); clustering (e.g., clustering similar responses in population dynamics to abrupt changes in the environment or management interventions); prediction (e.g., assessing the accuracy of model predictions against the original time series data); and anomaly detection (e.g., detecting possible catastrophic events from population time series). These common tasks in ecological research all rely on the notion of (dis)similarity, which can be determined using distance measures. A plethora of distance measures have been described, predominantly in the computer and information sciences, but many have not been introduced to ecologists. Furthermore, little is known about how to select appropriate distance measures for time-series-related tasks. Therefore, many potential applications remain unexplored. Here, we describe 16 properties of distance measures that are likely to be of importance to a variety of ecological questions involving time series. We then test 42 distance measures for each property and use the results to develop an objective method to select appropriate distance measures for any task and ecological dataset. We demonstrate our selection method by applying it to a set of real-world data on breeding bird populations in the UK and discuss other potential applications for distance measures, along with associated technical issues common in ecology. Our real-world population trends exhibit a common challenge for time series comparisons: a high level of stochasticity. We demonstrate two different ways of overcoming this challenge: first, by selecting distance measures with properties that make them well suited to comparing noisy time series, and second, by applying a smoothing algorithm before selecting appropriate distance measures. In both cases, the distance measures chosen through our selection method are not only fit for purpose but also consistent in their rankings of the population trends. The results of our study should lead to an improved understanding of, and greater scope for, the use of distance measures for comparing ecological time series, and help us answer new ecological questions.
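To make the idea of distance-measure properties concrete, the sketch below contrasts two widely used measures, lock-step Euclidean distance and dynamic time warping (DTW), and adds a simple centred moving average as a stand-in smoothing step. These particular choices are illustrative assumptions; the abstract does not say which of the 42 measures or which smoothing algorithm the study used.

```python
import math

def euclidean(a, b):
    """Lock-step distance: compares points at the same time index (equal lengths)."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Dynamic time warping with absolute-difference cost and no warping window."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def moving_average(series, window):
    """Centred moving average; the window shrinks at the edges of the series."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

if __name__ == "__main__":
    a = [0, 1, 2, 3, 2, 1, 0]
    b = [0, 0, 1, 2, 3, 2, 1]  # same shape as a, delayed by one step
    print(euclidean(a, b), dtw(a, b))
    print(euclidean(moving_average(a, 3), moving_average(b, 3)))
```

For the two example series, which share a shape but are offset by one time step, Euclidean distance penalises the shift (√6 ≈ 2.45) while DTW absorbs almost all of it (1.0). Whether shift invariance is desirable depends on the ecological question, which is the kind of trade-off a property-based selection method is meant to expose.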

    Single class classifier using FMCD based non-metric distance for timber defect detection

    In this work, we propose a robust Mahalanobis one-class classifier with a Fast Minimum Covariance Determinant estimator (MC-FMCD) for species-independent timber defect detection. Because timber inspection research faces a shortage of defect samples relative to defect-free samples (imbalanced data), this unsupervised approach applies an outlier detection concept and requires no training samples. We employ a non-segmenting approach in which a timber image is divided into non-overlapping local regions and statistical texture features are extracted from each region. Defects are detected by calculating the Mahalanobis distance (MD) between each region's features and the estimated distribution average. The distance distribution is approximated by a chi-square distribution to identify outliers (defects). The approach is further improved by a robust distribution estimator derived from the FMCD algorithm, which enhances defect detection performance. MC-FMCD performs well in detecting various types of defects across various defect ratios and multiple timber species; however, detection of blue stain is consistently poor across all timber species. Moreover, MC-FMCD performs significantly better than the classical MD, confirming that the robust estimator improves timber defect detection over using the conventional mean as the average estimator.
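The detection pipeline described above (per-region features, Mahalanobis distance to a robust location estimate, chi-square cutoff) can be sketched as follows. Several assumptions keep it short and dependency-free: each region is summarised by only two features, so the chi-square quantile has a closed form; and the robust estimator is a simplified version of FAST-MCD that runs concentration steps (C-steps) from a single deterministic start (the h points nearest the coordinate-wise median) instead of many random starts, with no reweighting or consistency correction. Feature values and function names are hypothetical and not taken from the paper.

```python
import math
import statistics

def mean_cov(points):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of 2-D feature vectors."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def det2(cov):
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2

def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det2(cov)

def mcd_estimate(points, h, max_iter=50):
    """Simplified FAST-MCD: concentration steps from one deterministic start."""
    med = (statistics.median([p[0] for p in points]),
           statistics.median([p[1] for p in points]))
    # Assumed start: the h points nearest the coordinate-wise median
    # (FAST-MCD proper draws many random starting subsets).
    subset = sorted(points, key=lambda p: (p[0] - med[0]) ** 2 + (p[1] - med[1]) ** 2)[:h]
    mean, cov = mean_cov(subset)
    for _ in range(max_iter):
        # C-step: refit on the h points closest to the current fit.
        # The covariance determinant cannot increase, so stop when it stalls.
        subset = sorted(points, key=lambda p: mahalanobis_sq(p, mean, cov))[:h]
        new_mean, new_cov = mean_cov(subset)
        if det2(new_cov) >= det2(cov):
            break
        mean, cov = new_mean, new_cov
    return mean, cov

def chi2_quantile_2dof(p):
    """Chi-square quantile for 2 degrees of freedom, where the CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - p)

def flag_defects(features, h, p=0.975):
    """True for each region whose squared robust MD exceeds the chi-square cutoff."""
    mean, cov = mcd_estimate(features, h)
    cutoff = chi2_quantile_2dof(p)
    return [mahalanobis_sq(f, mean, cov) > cutoff for f in features]

if __name__ == "__main__":
    # Hypothetical two-feature texture descriptors; the last region is anomalous.
    regions = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.2), (1.0, 1.8),
               (0.8, 2.0), (1.1, 2.0), (0.95, 1.95), (5.0, 6.0)]
    print(flag_defects(regions, h=7))
```

Because the location and scatter are fitted only to the h most concentrated regions, a defective region cannot pull the estimate towards itself, which is the motivation the abstract gives for preferring the robust estimator over the plain mean.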

    Improving the robustness and reliability of population-based global biodiversity indicators

    The current global biodiversity crisis is complicated by a data crisis. Reliable tools are needed to guide scientific research and conservation policy decisions, but the data underlying those tools are incomplete and biased. For example, the Living Planet Index (LPI) tracks the changing status of global vertebrate biodiversity, but gaps, biases and quality issues plague the aggregated data used to calculate trends. Unfortunately, we have little understanding of how reliable biodiversity indicators are. In this thesis I develop a suite of tools to assess and improve the reliability of trends in the LPI and similar indicators. First, I explore distance measures as a flexible toolset for comparing time series and trends. I test distance measures for properties related to time series comparisons and rate their relative sensitivities, then expand the results into a framework for choosing an appropriate distance measure for any time series comparison task in ecology. I use the framework to select an appropriate metric for determining trend accuracy. Second, I construct a model of trend reliability from accuracy measurements of sampled trend replicates calculated from artificially generated time series datasets. I apply the model to the LPI to reveal that the majority of trends need more data to be considered reliable, particularly across the global south, and for reptiles and amphibians everywhere. Finally, I develop a method to account for sampling error and serial correlation in confidence intervals of indicators that use aggregated abundance data from different sources. I show that the new method results in more robust and accurate confidence intervals across a wide range of dataset parameters, without reducing trend accuracy. I also apply the method to the LPI to reveal that the current method used by the LPI results in inaccurate and overly wide confidence intervals.
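The abstract gives neither the LPI formula nor the details of the new confidence-interval method, so the sketch below shows only a generic, LPI-style aggregation under stated assumptions: every population has a complete annual series of positive counts, annual rates are log10 ratios of successive counts averaged across populations, and the index is chained from a baseline of 1. The interval is a plain percentile bootstrap that resamples whole populations; it does not account for sampling error or serial correlation, which the thesis's method is designed to address. Function names are hypothetical.

```python
import math
import random

def index_from_populations(populations):
    """Chain the mean annual log10 rate across populations into an index starting at 1."""
    years = len(populations[0])
    index = [1.0]
    for t in range(1, years):
        rates = [math.log10(p[t] / p[t - 1]) for p in populations]
        index.append(index[-1] * 10 ** (sum(rates) / len(rates)))
    return index

def bootstrap_ci(populations, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the final index value, resampling populations."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_boot):
        sample = [rng.choice(populations) for _ in populations]
        finals.append(index_from_populations(sample)[-1])
    finals.sort()
    lo = finals[int((alpha / 2) * n_boot)]
    hi = finals[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

if __name__ == "__main__":
    # Hypothetical abundance series: one growing 10%/yr, one declining 10%/yr.
    pops = [[100, 110, 121], [50, 45, 40.5]]
    print(index_from_populations(pops))  # final value is 0.99 (geometric mean of 1.21 and 0.81)
    print(bootstrap_ci(pops))
```

Averaging log rates makes the index a geometric mean of population changes, so a 10% annual gain and a 10% annual loss do not cancel exactly. With only two populations the bootstrap interval is very coarse, which illustrates why the number and quality of contributing series matter for indicator reliability.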

    Machine Learning

    Machine learning can be defined in various ways; broadly, it is a scientific domain concerned with the design and development of theoretical and implementation tools for building systems that exhibit some human-like intelligent behavior. More specifically, machine learning addresses the ability to improve automatically through experience.