
    Developing a Generic Predictive Computational Model using Semantic Data Pre-Processing with Machine Learning Techniques and its Application for Stock Market Prediction Purposes

    In this paper, we present a Generic Predictive Computational Model (GPCM) and apply it by building a use case for FTSE 100 index forecasting. This involves mining heterogeneous data using semantic methods (ontology), graph-based methods (knowledge graphs, graph databases), and advanced Machine Learning methods. The main focus of our research is data pre-processing aimed at a more efficient selection of input features. Each cycle of the GPCM pipeline propagates the (initially raw) data to the Graph Database, which is structured by an ontology, and regularly updates the features' weights in the Graph Database through a feedback loop from the Machine Learning Engine. Queries on the Graph Database output the most valuable features, which in turn serve as the input for the Machine Learning-based prediction. The end product of this process is fed back to the Graph Database to update the weights. We report on practical experiments evaluating the effectiveness of the GPCM in forecasting the FTSE 100 index. The underlying dataset contains multiple parameters relevant to predicting time-series data, for which Long Short-Term Memory (LSTM) is known to be one of the most efficient machine learning methods. The most challenging task was to overcome the known restriction of LSTM, which is capable of analysing only one input parameter. We solved this problem by combining several parallel LSTMs, a Concatenation unit that merges the LSTMs' outputs into a time-series matrix, and a Linear Regression Unit that produces the final result.
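    The parallel-LSTM arrangement described in this abstract can be sketched concretely. Below is a minimal, hypothetical Keras version: one univariate LSTM branch per input parameter, a Concatenate layer standing in for the Concatenation unit, and a single linear Dense layer playing the role of the Linear Regression Unit. The window length, branch width, and number of parameters are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of the parallel-LSTM architecture described above.
# Layer sizes, window length, and parameter count are assumptions.
import numpy as np
from tensorflow.keras import layers, models

WINDOW = 30        # length of each input time series (assumed)
N_PARAMS = 5       # number of input parameters, one LSTM each (assumed)

# One univariate LSTM branch per input parameter.
inputs, branches = [], []
for i in range(N_PARAMS):
    inp = layers.Input(shape=(WINDOW, 1), name=f"param_{i}")
    branches.append(layers.LSTM(32)(inp))
    inputs.append(inp)

# Concatenation unit merges the parallel LSTM outputs; a single linear
# layer plays the role of the Linear Regression Unit.
merged = layers.Concatenate()(branches)
output = layers.Dense(1, activation="linear")(merged)

model = models.Model(inputs=inputs, outputs=output)
model.compile(optimizer="adam", loss="mse")

# Train on dummy data shaped (samples, window, 1) for each parameter.
X = [np.random.rand(100, WINDOW, 1) for _ in range(N_PARAMS)]
y = np.random.rand(100, 1)
model.fit(X, y, epochs=2, verbose=0)
```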

    Evaluation of machine learning classifiers for mineralogy mapping based on near infrared hyperspectral imaging

    The exploration of mineral resources is a major challenge in a world that seeks sustainable energy, renewable energy, advanced engineering, and new commercial technological devices. The rapid decrease in mineral reserves has shifted the focus to under-explored, low-accessibility areas, which has led to the use of on-site portable techniques for mineral mapping, such as near infrared hyperspectral image sensors. The large datasets acquired with these instruments need data pre-processing, a series of mathematical manipulations that can be achieved using machine learning. The aim of this thesis is to improve an existing method for mineralogy mapping by focusing on the mineral classification phase. More specifically, a spectral similarity index was utilized to support machine learning classifiers. This was introduced because of the inability of the employed classification models to recognize samples that are not part of a given database; the models always classified samples under one of the known labels of the database. This can be a problem in hyperspectral images, as a pure component found in a sample could correspond to a mineral but also to noise or artefacts arising for a variety of reasons, such as baseline correction. The spectral similarity index calculates the similarity between a sample spectrum and its assigned database class spectrum; a threshold then defines whether the sample belongs to that class or not. The metrics utilized in the spectral similarity index were the spectral angle mapper, the correlation coefficient, and five different distances. The machine learning classifiers used to evaluate the spectral similarity index were the decision tree, k-nearest neighbor, and support vector machine. Simulated distortions were also introduced into the dataset to test the robustness of the indexes and to choose the best classifier. The spectral similarity index was assessed on a dataset of nine minerals from the Geological Survey of Finland, acquired with a Specim SWIR camera. The validation of the indexes was carried out on two mine samples obtained with a VTT active hyperspectral sensor prototype. The support vector machine was chosen after the comparison between the three classifiers, as it showed the highest tolerance to distorted data. In the evaluation of the spectral similarity indexes, the best performances were achieved with SAM and the Chebyshev distance, which maintained high stability under both smaller and larger threshold changes. The best threshold value found was the one that, in the dataset analysed, corresponded to the number of spectra available for each class. No reference was available for the validation procedure; for this reason, the results obtained for the mine samples with the spectral similarity index were compared with those obtainable through visual interpretation, and the two were in agreement. The method proposed can be useful for future mineral exploration, as it is of great importance to correctly classify minerals found during exploration, regardless of the database utilized.
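    The rejection idea described in this abstract can be illustrated with a small sketch using the spectral angle mapper: a sample takes the label of the closest reference spectrum, or is rejected as unknown when the best angle exceeds the threshold. The spectra, class labels, and threshold value below are invented for demonstration.

```python
# Minimal sketch of a spectral-angle-mapper (SAM) similarity check with
# a rejection threshold. All data and the threshold are illustrative.
import numpy as np

def spectral_angle(a, b):
    """Angle (radians) between two spectra; 0 means identical shape."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def classify_with_rejection(sample, references, threshold):
    """Assign the closest class label, or 'unknown' if even the best
    spectral angle exceeds the threshold."""
    angles = {label: spectral_angle(sample, ref)
              for label, ref in references.items()}
    best = min(angles, key=angles.get)
    return best if angles[best] <= threshold else "unknown"

refs = {"quartz": np.random.rand(256), "calcite": np.random.rand(256)}
print(classify_with_rejection(np.random.rand(256), refs, threshold=0.1))
```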

    A duct mapping method using least squares support vector machines

    This paper introduces a “refractivity from clutter” (RFC) approach with an inversion method based on a pregenerated database. The RFC method exploits the information contained in the radar sea-clutter return to estimate the refractive-index profile. Whereas initial efforts were based on algorithms that achieve good accuracy at a high computational cost, the present method is based on a learning-machine algorithm in order to obtain a real-time system. This paper shows the feasibility of an RFC technique based on the least squares support vector machine inversion method by comparing it to a genetic algorithm on simulated, noise-free data at 1 and 5 GHz. These data are simulated in the presence of ideal trilinear surface-based ducts. The learning machine is based on a pregenerated database computed using Latin hypercube sampling to improve the efficiency of the learning. The results show that little accuracy is lost compared to the genetic-algorithm approach, whose computational time is very high, whereas the learning-machine approach runs in real time. The advantage of a real-time RFC system is that it could work on several azimuths in near real time.
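    The database-driven inversion can be sketched in a few lines. In the toy version below, Latin hypercube sampling builds the pregenerated database and kernel ridge regression, a close relative of the least squares SVM, stands in for the learning machine. The duct-parameter bounds and the forward model are placeholders, not the paper's electromagnetic simulation.

```python
# Rough sketch: sample duct parameters with Latin hypercube sampling,
# simulate clutter (stubbed here), and train a fast inverse regressor.
import numpy as np
from scipy.stats import qmc
from sklearn.kernel_ridge import KernelRidge

# Trilinear surface-based duct described by 3 parameters (assumed bounds).
lower, upper = [0.0, 10.0, 50.0], [50.0, 100.0, 300.0]
params = qmc.scale(qmc.LatinHypercube(d=3, seed=0).random(500), lower, upper)

def simulate_clutter(p):
    """Placeholder for the forward model mapping a refractivity
    profile to a radar sea-clutter return."""
    return np.array([np.sin(p).sum(), np.cos(p).sum(), p.prod() ** 0.25])

clutter = np.array([simulate_clutter(p) for p in params])

# Learn the inverse mapping: clutter return -> duct parameters.
# KernelRidge stands in for the least squares SVM of the paper.
model = KernelRidge(kernel="rbf", alpha=1e-3).fit(clutter, params)
estimate = model.predict(simulate_clutter(params[0]).reshape(1, -1))
print(estimate)  # near-real-time inversion once the database is built
```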

    An Assessment Tool for Academic Research Managers in the Third World

    The academic evaluation of the publication record of researchers is relevant for identifying talented candidates for promotion and funding. A key tool for this is the use of the indexes provided by Web of Science and SCOPUS, costly databases whose fees sometimes exceed the budgets of academic institutions in many parts of the world. We show here how the data in one of the databases can be used to infer the main index of the other. Data analysis methods used in Machine Learning allow us to select just a few of the hundreds of variables in one database, which are later used in a panel regression, yielding a good approximation to the main index of the other database. Since the information in SCOPUS can be freely scraped from the Web, this approach allows one to infer, at no cost, the Impact Factor of publications, the main index used in research assessments around the globe.
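    The two-stage procedure sketched in this abstract (sparse variable selection, then regression on the survivors) can be illustrated with a toy example. An L1-penalized model stands in for the selection step and plain least squares for the panel regression; the data are synthetic stand-ins, not the SCOPUS or Web of Science variables.

```python
# Hedged illustration: select a few of many candidate variables with
# Lasso, then fit an ordinary regression on the survivors.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))          # hundreds of candidate variables
y = X[:, 3] * 2.0 + X[:, 17] * -1.5 + rng.normal(scale=0.1, size=200)

# Stage 1: sparse selection keeps only the informative columns.
selector = Lasso(alpha=0.1).fit(X, y)
keep = np.flatnonzero(selector.coef_)

# Stage 2: plain regression on the selected variables (a stand-in for
# the panel regression used in the paper).
model = LinearRegression().fit(X[:, keep], y)
print(keep, model.score(X[:, keep], y))
```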

    Galaxy classification: deep learning on the OTELO and COSMOS databases

    Context. The accurate classification of hundreds of thousands of galaxies observed in modern deep surveys is imperative if we want to understand the universe and its evolution. Aims. Here, we report the use of machine learning techniques to classify early- and late-type galaxies in the OTELO and COSMOS databases using optical and infrared photometry and available shape parameters: either the Sérsic index or the concentration index. Methods. We used three classification methods for the OTELO database: 1) u-r color separation, 2) linear discriminant analysis using u-r and a shape parameter, and 3) a deep neural network using the r magnitude, several colors, and a shape parameter. We analyzed the performance of each method by sample bootstrapping and tested the performance of our neural network architecture on COSMOS data. Results. The accuracy achieved by the deep neural network is greater than that of the other classification methods, and it can also operate with missing data. Our neural network architecture is able to classify both the OTELO and COSMOS datasets regardless of small differences in the photometric bands used in each catalog. Conclusions. In this study we show that the use of deep neural networks is a robust method to mine the cataloged data.
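    A minimal sketch of the kind of dense network named in the Methods is shown below, with assumed layer sizes and a simple sentinel-value treatment of missing photometry; the paper's exact architecture and missing-data handling are not reproduced here.

```python
# Illustrative binary classifier (early vs. late type) from photometric
# features plus a shape parameter. Architecture and imputation assumed.
import numpy as np
from tensorflow.keras import layers, models

N_FEATURES = 8                      # r magnitude, colors, shape parameter
X = np.random.rand(500, N_FEATURES)
X[np.random.rand(*X.shape) < 0.1] = np.nan     # simulate missing entries
y = np.random.randint(0, 2, size=500)          # 0 = early, 1 = late type

X = np.nan_to_num(X, nan=-1.0)      # flag missing data with a sentinel

model = models.Sequential([
    layers.Input(shape=(N_FEATURES,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # P(late type)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)
```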

    Restructuring databases for knowledge discovery by consolidation and link formation

    Databases often inaccurately identify entities of interest. Two operations, consolidation and link formation, which complement the usual machine learning techniques that use similarity-based clustering to discover classifications, are proposed as essential components of KDD systems for certain applications. Consolidation relates identifiers present in a database to a set of real-world entities (RWEs) that are not uniquely identified in the database. Consolidation may also be viewed as a transformation of representation from the identifiers present in the original database to the RWEs. Link formation constructs structured relationships between consolidated RWEs through identifiers and events explicitly represented in the database. Consolidation and link formation are easily implemented as index creation in relational database management systems. An operational knowledge discovery system uses consolidation and link formation to identify potential money laundering in a database of large cash transactions.
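    A toy sketch of the two operations follows, using invented records and a single made-up matching rule (a shared phone number). A real system would consolidate on many attributes and realize the links as database indexes rather than in-memory dictionaries.

```python
# Toy consolidation and link formation. Records, the matching rule, and
# the transaction data are invented for illustration only.
from collections import defaultdict

records = [  # (identifier, phone)
    ("J. Smith", "555-0101"), ("John Smith", "555-0101"),
    ("A. Jones", "555-0202"),
]
transactions = [("J. Smith", "A. Jones", 9500)]

# Consolidation: identifiers sharing an attribute value are mapped to
# one real-world entity (RWE).
by_phone = defaultdict(set)
for ident, phone in records:
    by_phone[phone].add(ident)
rwe_of = {ident: f"RWE-{i}"
          for i, group in enumerate(by_phone.values()) for ident in group}

# Link formation: project events (transactions) between identifiers
# onto structured relationships between the consolidated RWEs.
links = defaultdict(list)
for src, dst, amount in transactions:
    links[(rwe_of[src], rwe_of[dst])].append(amount)

print(rwe_of)
print(dict(links))
```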