
    Query-driven learning for predictive analytics of data subspace cardinality

    Fundamental to many predictive analytics tasks is the ability to estimate the cardinality (number of data items) of multi-dimensional data subspaces defined by query selections over datasets. This is crucial for data analysts dealing with, e.g., interactive data subspace exploration, data subspace visualization, and query processing optimization. However, in many modern data systems, predictive analytics may be (i) too costly, e.g., in pay-per-use clouds, (ii) unreliable, e.g., in modern Big Data query engines, where accurate statistics are difficult to obtain and maintain, or (iii) infeasible, e.g., due to privacy constraints. We contribute a novel, query-driven, function estimation model of analyst-defined data subspace cardinality. The proposed model is highly accurate and accommodates the well-known selection query types: multi-dimensional range queries and distance-based nearest-neighbor (radius) queries. Our function estimation model: (i) quantizes the vectorial query space by learning the analysts’ access patterns over a data space, (ii) associates query vectors with the cardinalities of their corresponding analyst-defined data subspaces, (iii) abstracts and employs query vectorial similarity to predict the cardinality of an unseen/unexplored data subspace, and (iv) identifies and adapts to possible changes of the query subspaces based on the theory of optimal stopping. The proposed model is decentralized, facilitating the scaling-out of such predictive analytics queries. The research significance of the model lies in that (i) it is an attractive solution when data-driven statistical techniques are undesirable or infeasible, (ii) it offers a scale-out, decentralized training solution, (iii) it is applicable to different selection query types, and (iv) it offers performance superior to that of data-driven approaches.
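Steps (i)-(iii) of the abstract above admit a short sketch. The class name, inverse-distance weighting, and k-nearest-prototype prediction below are illustrative assumptions, not the paper's actual quantization scheme (which also adapts to query-pattern changes via optimal stopping):

```python
import math

def _dist(a, b):
    # Euclidean distance between two query vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class QueryDrivenEstimator:
    """Toy sketch: store observed (query, cardinality) pairs as prototypes
    and predict the cardinality of an unseen query from its nearest ones."""

    def __init__(self, k=2):
        self.k = k               # number of neighbors used for prediction
        self.prototypes = []     # list of (query_vector, cardinality) pairs

    def train(self, query, cardinality):
        # Steps (i)+(ii): associate a query vector with its observed
        # cardinality. A real quantizer would merge nearby queries.
        self.prototypes.append((tuple(query), float(cardinality)))

    def predict(self, query):
        # Step (iii): similarity-weighted average over the k nearest prototypes.
        ranked = sorted(self.prototypes, key=lambda p: _dist(p[0], query))[: self.k]
        weights = [1.0 / (1.0 + _dist(q, query)) for q, _ in ranked]
        total = sum(weights)
        return sum(w * c for w, (_, c) in zip(weights, ranked)) / total
```

Queries close to previously seen ones thus receive predictions dominated by the cardinalities observed for those neighbors, which is the core of the query-driven (rather than data-driven) idea.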

    Image processing system based on similarity/dissimilarity measures to classify binary images from contour-based features

    Image Processing Systems (IPS) try to solve tasks like image classification or segmentation based on image content. Many authors have proposed a variety of techniques to tackle the image classification task. Plenty of methods address the performance of the IPS [1], as well as the influence of external circumstances such as illumination, rotation, and noise [2]. However, there is an increasing interest in classifying shapes from binary images (BI). Shape Classification (SC) from BI considers a segmented image as a sample (background segmentation [3]) and aims to identify objects based on their shape.
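A minimal sketch of the idea: extract a few contour-based features from a binary image and classify by similarity to labeled exemplars. The specific features (area, boundary-pixel perimeter, circularity) and nearest-exemplar rule are illustrative assumptions, not the descriptors used in the work above:

```python
import math

def contour_features(img):
    """Extract simple contour-based features from a binary image
    (a list of 0/1 rows): area, perimeter (boundary pixel count),
    and circularity."""
    h, w = len(img), len(img[0])
    area, perimeter = 0, 0
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                area += 1
                # A foreground pixel is on the contour if any 4-neighbor
                # is background or lies outside the image.
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w) or not img[ny][nx]:
                        perimeter += 1
                        break
    circularity = 4 * math.pi * area / (perimeter ** 2) if perimeter else 0.0
    return (area, perimeter, circularity)

def classify(img, exemplars):
    """Nearest-exemplar shape classification: exemplars is a list of
    (feature_vector, label) pairs; dissimilarity is squared Euclidean
    distance in feature space."""
    f = contour_features(img)
    return min(exemplars,
               key=lambda e: sum((a - b) ** 2 for a, b in zip(f, e[0])))[1]
```

Real contour descriptors (e.g., Fourier descriptors or shape contexts) would replace `contour_features`, but the similarity/dissimilarity decision rule has the same shape.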

    Nuclei/Cell Detection in Microscopic Skeletal Muscle Fiber Images and Histopathological Brain Tumor Images Using Sparse Optimizations

    Nuclei/Cell detection is usually a prerequisite procedure in many computer-aided biomedical image analysis tasks. In this thesis we propose two automatic nuclei/cell detection frameworks. One is for nuclei detection in skeletal muscle fiber images and the other is for brain tumor histopathological images. For skeletal muscle fiber images, the major challenges include: i) shape and size variations of the nuclei, ii) overlapping nuclear clumps, and iii) a series of z-stack images with out-of-focus regions. We propose a novel automatic detection algorithm consisting of the following components: 1) The original z-stack images are first converted into one all-in-focus image. 2) A sufficient number of hypothetical ellipses are then generated for each nucleus contour. 3) Next, a set of representative training samples and discriminative features are selected by a two-stage sparse model. 4) A classifier is trained using the refined training data. 5) Final nuclei detection is obtained by mean-shift clustering based on inner distance. The proposed method was tested on a set of images containing over 1500 nuclei, and the results outperform the current state-of-the-art approaches. For brain tumor histopathological images, the major challenges are to handle significant variations in cell appearance and to split touching cells. The proposed automatic cell detection consists of: 1) sparse reconstruction for splitting touching cells, and 2) adaptive dictionary learning for handling cell appearance variations. The method was extensively tested on a data set with over 2000 cells and outperforms other state-of-the-art algorithms with an F1 score of 0.96.
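Step 5 of the muscle-fiber pipeline, clustering candidate detections into final nuclei positions, can be sketched with a minimal flat-kernel mean shift. Note the thesis clusters using the inner distance; plain Euclidean distance is used here as a simplifying assumption:

```python
import math

def mean_shift(points, bandwidth=1.0, iters=30):
    """Minimal flat-kernel mean-shift sketch: each candidate point is
    repeatedly shifted to the mean of the points within `bandwidth`,
    so candidates belonging to the same nucleus converge to one mode."""
    modes = [list(p) for p in points]
    for _ in range(iters):
        for i, m in enumerate(modes):
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            modes[i] = [sum(c) / len(near) for c in zip(*near)]
    # Merge modes that converged to (almost) the same location;
    # each surviving center is one detected nucleus.
    centers = []
    for m in modes:
        if all(math.dist(m, c) > bandwidth / 2 for c in centers):
            centers.append(m)
    return centers
```

With candidate points drawn from two well-separated nuclei, the routine returns two cluster centers, one per nucleus.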

    Fuzzy-Neural Cost Estimation for Engine Tests

    This chapter discusses artificial computational intelligence methods as applied to cost prediction. We present the development of a suite of hybrid fuzzy-neural systems for predicting the cost of performing engine tests at NASA’s Stennis Space Center testing facilities. The system is composed of several adaptive network-based fuzzy inference systems (ANFIS), with or without neural subsystems. The output produced by each system in the suite is a rough order of magnitude (ROM) cost estimate for performing the engine test. Basic systems predict cost based solely on raw test data, whereas others preprocess these data, e.g., via principal components or locally linear embedding (LLE), before entering the fuzzy engines. Backpropagation neural networks and radial basis function networks (RBFNs) are also used to aid the cost prediction by merging the costs estimated by several ANFIS into a final cost estimate.
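The final merge stage can be sketched in a few lines. The chapter merges the ANFIS outputs with trained backpropagation and RBF networks; the fixed inverse-validation-error weighting below is a much simpler stand-in used only to illustrate the ensemble idea:

```python
def merge_estimates(estimates, val_errors):
    """Toy sketch of the merge stage: combine ROM cost estimates from
    several subsystems into one figure, weighting each estimate by the
    inverse of that subsystem's validation error, so historically more
    accurate subsystems dominate the final estimate."""
    weights = [1.0 / (e + 1e-9) for e in val_errors]
    total = sum(weights)
    return sum(w * est for w, est in zip(weights, estimates)) / total
```

A trained merging network can additionally learn interactions between subsystem outputs, which this static weighting cannot capture.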

    Evaluation of algorithms for estimating wheat acreage from multispectral scanner data

    The author has identified the following significant results. Fourteen different classification algorithms were tested for their ability to estimate the proportion of wheat in an area. For some algorithms, accuracy of classification in field centers was observed. The data base consisted of ground truth and LANDSAT data from 55 sections (1 x 1 mile) from five LACIE intensive test sites in Kansas and Texas. Signatures obtained from training fields selected at random from the ground truth were generally representative of the data distribution patterns. LIMMIX, an algorithm that chooses a pure signature when the data point is close enough to a signature mean and otherwise chooses the best mixture of a pair of signatures, reduced the average absolute error to 6.1% and the bias to 1.0%. QRULE run with a null test achieved a similar reduction.
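The LIMMIX decision rule as described above lends itself to a short sketch. The Euclidean distance to signature means, the coarse grid over mixing proportions, and the data shapes are all illustrative assumptions; the original LACIE implementation is not reproduced here:

```python
import math

def limmix(point, signatures, threshold=1.0):
    """Sketch of the LIMMIX rule: if the pixel vector is within
    `threshold` of some signature mean, label it as that pure class;
    otherwise find the two-signature mixture (proportion `a` of class i,
    1-a of class j) whose blended mean best explains the pixel.
    `signatures` is a list of (label, mean_vector) pairs; returns
    (label_i, label_j, a), with a pure pixel returning a = 1.0."""
    best = min(signatures, key=lambda s: math.dist(point, s[1]))
    if math.dist(point, best[1]) <= threshold:
        return (best[0], best[0], 1.0)
    best_fit, best_err = None, float("inf")
    for li, mi in signatures:
        for lj, mj in signatures:
            if li == lj:
                continue
            # Try a grid of mixing proportions between the two means.
            for a in [k / 10 for k in range(11)]:
                mix = [a * x + (1 - a) * y for x, y in zip(mi, mj)]
                err = math.dist(point, mix)
                if err < best_err:
                    best_fit, best_err = (li, lj, a), err
    return best_fit
```

Summing the wheat proportions over all pixels then yields the acreage estimate; allowing mixtures is what reduces the bias for pixels straddling field boundaries.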

    Improved Aircraft Environmental Impact Segmentation via Metric Learning

    Accurate modeling of aircraft environmental impact is pivotal to the design of operational procedures and policies that mitigate aviation's negative environmental impact. Aircraft environmental impact segmentation is a process that clusters aircraft types with similar environmental impact characteristics based on a set of aircraft features. This practice helps model a large population of aircraft types for which noise and performance models are insufficient, and contributes to a better understanding of aviation environmental impact. Because segmentation measures the similarity between aircraft types, the distance metric is its kernel. Traditional approaches to aircraft segmentation use plain distance metrics and assign equal weight to all features in an unsupervised clustering process. In this work, we utilize weakly-supervised metric learning and partial information on aircraft fuel burn, emissions, and noise to learn weighted distance metrics for aircraft environmental impact segmentation. We show in a comprehensive case study that the tailored distance metrics can indeed make aircraft segmentation better reflect the actual environmental impact of aircraft. The metric learning approach can help refine a number of similar data-driven analytical studies in aviation.
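A toy stand-in for the metric-learning step, under loud assumptions: instead of a proper weakly-supervised metric learner, each feature is simply weighted by the inverse of its average spread within pairs labeled as having similar environmental impact, so features that agree on similar pairs dominate the distance:

```python
import math

def learn_weights(similar_pairs):
    """Illustrative weight learning from weak supervision: given pairs
    of feature vectors labeled 'similar impact', weight each feature
    inversely to its mean absolute difference within those pairs."""
    dims = len(similar_pairs[0][0])
    spread = [sum(abs(a[d] - b[d]) for a, b in similar_pairs) / len(similar_pairs)
              for d in range(dims)]
    return [1.0 / (s + 1e-9) for s in spread]

def weighted_dist(a, b, w):
    # Weighted Euclidean distance used when clustering aircraft types.
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))
```

This is a diagonal (per-feature) metric; the weakly-supervised metric-learning methods the paper builds on optimize such weights against the partial fuel-burn, emissions, and noise labels rather than computing them in closed form.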

    Similarity-based methods for machine diagnosis

    This work presents a data-driven condition-based maintenance system based on similarity-based modeling (SBM) for automatic machinery fault diagnosis. The proposed system provides information about the equipment's current state (degree of anomaly) and returns a set of exemplars that can be employed to describe the current state in a sparse fashion, which the operator can examine to assess the decision to be made. The system is modular and data-agnostic, enabling its use with different equipment and data sources with small modifications. The main contributions of this work are: an extensive study of the proposition and use of multiclass SBM on different databases, either as a stand-alone classification method or in combination with an off-the-shelf classifier; novel methods for selecting prototypes for the SBM models; the use of new similarity functions; and a new production-ready fault detection service. These contributions achieved the goal of increasing the SBM models' performance in a fault classification scenario while reducing their computational complexity. The proposed system was evaluated on three different databases, achieving higher or similar performance when compared with previous works on the same databases. Comparisons with other methods are shown for the recently developed Machinery Fault Database (MaFaulDa) and for the Case Western Reserve University (CWRU) bearing database. The proposed techniques increase the generalization power of the similarity model and of the associated classifier, achieving accuracies of 98.5% on MaFaulDa and 98.9% on the CWRU database. These results indicate that the proposed approach based on SBM is worth further investigation.
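The core SBM mechanism can be sketched briefly: reconstruct the current observation as a similarity-weighted combination of stored exemplars and use the residual as the degree of anomaly. The Gaussian similarity kernel and function names below are illustrative assumptions, not the specific similarity functions proposed in the work:

```python
import math

def sbm_estimate(x, prototypes, gamma=1.0):
    """Minimal similarity-based-modeling sketch: reconstruct observation
    `x` as a similarity-weighted combination of prototype exemplars.
    A large residual between `x` and its reconstruction signals that the
    equipment state is far from any known exemplar (degree of anomaly)."""
    sims = [math.exp(-gamma * math.dist(x, p) ** 2) for p in prototypes]
    total = sum(sims)
    est = [sum(s * p[d] for s, p in zip(sims, prototypes)) / total
           for d in range(len(x))]
    residual = math.dist(x, est)
    return est, residual
```

The similarity weights also identify which exemplars best describe the current state, which is the sparse, operator-interpretable output the system returns.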