6,636 research outputs found

    PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

    This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM), and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach---declarative in the large, trusting the programmer's ability to utilize the PC object model efficiently in the small---results in a system that is ideal for the development of reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex object manipulation and non-trivial, library-style computations on top of PlinyCompute can result in a speedup of 2x to more than 50x compared to equivalent implementations on Spark. Comment: 48 pages, including references and Appendi

    API design for machine learning software: experiences from the scikit-learn project

    Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper also comments on implementation details specific to the Python ecosystem and analyzes obstacles faced by users and developers of the library.
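The idea of one interface shared by all learning and processing units, and why it helps composition, can be sketched in plain Python. This is a minimal illustration and not scikit-learn's code: the method names `fit`, `transform`, and `predict` follow the conventions the library is known for, while the classes `ToyScaler`, `ToyThresholdClassifier`, and `ToyPipeline` and the sample data are hypothetical.

```python
from statistics import mean, pstdev


class ToyScaler:
    """Hypothetical transformer: learns mean and spread in fit, applies them in transform."""

    def fit(self, X, y=None):
        self.mean_ = mean(X)
        self.scale_ = pstdev(X) or 1.0  # fall back to 1.0 for constant input
        return self  # returning self lets calls be chained

    def transform(self, X):
        return [(x - self.mean_) / self.scale_ for x in X]


class ToyThresholdClassifier:
    """Hypothetical predictor: splits 1-D inputs at the midpoint of the two class means."""

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.threshold_ = (mean(pos) + mean(neg)) / 2
        return self

    def predict(self, X):
        return [1 if x > self.threshold_ else 0 for x in X]


class ToyPipeline:
    """Chains transformers and a final predictor behind the same fit/predict interface."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        # Fit each transformer in turn and pass its output to the next step.
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


# Made-up 1-D data: two low values labelled 0, two high values labelled 1.
model = ToyPipeline([ToyScaler(), ToyThresholdClassifier()])
model.fit([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
predictions = model.predict([1.5, 8.5])
print(predictions)
```

Because the pipeline exposes the same `fit` and `predict` methods as its parts, it can itself be used wherever a single predictor is expected, which is the composition benefit the abstract points to.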

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, inevitably RS draws from many of the same theories as CV; e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL. Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensin

    Ship manoeuvring model parameter identification using intelligent machine learning method and the beetle antennae search algorithm

    In order to identify the unknown parameters of a ship motion model more accurately and efficiently, a novel Nonlinear Least Squares Support Vector Machine (NLSSVM) algorithm, whose penalty factor and Radial Basis Function (RBF) kernel parameters are optimised by the Beetle Antennae Search algorithm (BAS), is proposed and investigated. To validate its accuracy and applicability, the proposed method is employed to identify the linear and nonlinear parameters of the first-order nonlinear Nomoto model with training samples from numerical simulation and experimental data. Subsequently, the identified parameters are applied in predicting the ship motion. The predicted results illustrate that the new NLSSVM-BAS algorithm can be applied in identifying a ship motion model, and its effectiveness is verified. Compared with traditional identification approaches, the proposed method improves accuracy. Moreover, the robustness and stability of the NLSSVM-BAS are verified by adding noise to the training sample data.

    A Study on Comparison of Classification Algorithms for Pump Failure Prediction

    The reliability of pumps can be compromised by faults, impacting their functionality. Detecting these faults is crucial, and many studies have utilized motor current signals for this purpose. However, as pumps are rotating equipment, vibrations also play a vital role in fault identification. Rising pump failures have led to increased maintenance costs and unavailability, emphasizing the need for cost-effective and dependable machinery operation. This study addresses the challenge of defect classification through predictive modeling. With a problem statement centered on achieving accurate and efficient identification of defects, this study’s objective is to evaluate the performance of five distinct algorithms: Fine Decision Tree, Medium Decision Tree, Bagged Trees (Ensemble), RUS-Boosted Trees, and Boosted Trees. Using a comprehensive dataset, the study trained and tested each model, analyzing training accuracy, test accuracy, and Area Under the Curve (AUC) metrics. The results showcase the supremacy of the Fine Decision Tree (91.2% training accuracy, 74% test accuracy, AUC 0.80), the robustness of the Ensemble approach (Bagged Trees with 94.9% training accuracy, 99.9% test accuracy, and AUC 1.00), and the competitiveness of Boosted Trees (89.4% training accuracy, 72.2% test accuracy, AUC 0.79) in defect classification. Notably, Support Vector Machines (SVM), Artificial Neural Networks (ANN), and k-Nearest Neighbors (KNN) exhibited comparatively lower performance. Our study contributes valuable insights into the efficacy of these algorithms, guiding practitioners toward optimal model selection for defect classification scenarios. This research lays a foundation for enhanced decision-making in quality control and predictive maintenance, fostering advancements in defect prediction and classification.
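The three metrics the abstract reports (training accuracy, test accuracy, and AUC) can be computed as in the sketch below. It is an illustration under stated assumptions, not the study's procedure: the labels, scores, the 0.5 decision threshold, and the helper names `accuracy`, `roc_auc`, and `evaluate` are all hypothetical. AUC is computed here as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(labels, scores, threshold=0.5):
    """Return (accuracy, AUC) for one data split; the threshold is an assumed cut-off."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return accuracy(labels, preds), roc_auc(labels, scores)


# Made-up classifier scores for a training split and a held-out test split.
train_labels = [0, 0, 1, 1]
train_scores = [0.1, 0.4, 0.6, 0.9]
test_labels = [0, 1, 0, 1]
test_scores = [0.2, 0.3, 0.5, 0.8]

train_acc, train_auc = evaluate(train_labels, train_scores)
test_acc, test_auc = evaluate(test_labels, test_scores)
print(train_acc, train_auc, test_acc, test_auc)
```

Reporting the training and test figures side by side matters because a large gap between them is the usual sign that a model has overfitted the training data.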

    Building multiclass classifiers for remote homology detection and fold recognition

    BACKGROUND: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems and they have not been extensively used to solve the more general multiclass remote homology prediction and fold recognition problems. RESULTS: We present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes. CONCLUSION: Analyzing the performance achieved by the different approaches on four different datasets, we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems. Schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend not only to lead to lower error rates but also to reduce the number of errors in which a superfamily is assigned to an entirely different fold and a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.