3,460 research outputs found

    Machine Learning for Fluid Mechanics

    Full text link
    The field of fluid mechanics is rapidly advancing, driven by unprecedented volumes of data from field measurements, experiments and large-scale simulations at multiple spatiotemporal scales. Machine learning offers a wealth of techniques to extract information from data that could be translated into knowledge about the underlying fluid mechanics. Moreover, machine learning algorithms can augment domain knowledge and automate tasks related to flow control and optimization. This article presents an overview of past history, current developments, and emerging opportunities of machine learning for fluid mechanics. It outlines fundamental machine learning methodologies and discusses their uses for understanding, modeling, optimizing, and controlling fluid flows. The strengths and limitations of these methods are addressed from the perspective of scientific inquiry that considers data as an inherent part of modeling, experimentation, and simulation. Machine learning provides a powerful information processing framework that can enrich, and possibly even transform, current lines of fluid mechanics research and industrial applications.Comment: To appear in the Annual Reviews of Fluid Mechanics, 202

    Features extraction using random matrix theory.

    Get PDF
    Representing the complex data in a concise and accurate way is a special stage in data mining methodology. Redundant and noisy data affects generalization power of any classification algorithm, undermines the results of any clustering algorithm and finally encumbers the monitoring of large dynamic systems. This work provides several efficient approaches to all aforementioned sides of the analysis. We established, that notable difference can be made, if the results from the theory of ensembles of random matrices are employed. Particularly important result of our study is a discovered family of methods based on projecting the data set on different subsets of the correlation spectrum. Generally, we start with traditional correlation matrix of a given data set. We perform singular value decomposition, and establish boundaries between essential and unimportant eigen-components of the spectrum. Then, depending on the nature of the problem at hand we either use former or later part for the projection purpose. Projecting the spectrum of interest is a common technique in linear and non-linear spectral methods such as Principal Component Analysis, Independent Component Analysis and Kernel Principal Component Analysis. Usually the part of the spectrum to project is defined by the amount of variance of overall data or feature space in non-linear case. The applicability of these spectral methods is limited by the assumption that larger variance has important dynamics, i.e. if the data has a high signal-to-noise ratio. If it is true, projection of principal components targets two problems in data mining, reduction in the number of features and selection of more important features. Our methodology does not make an assumption of high signal-to-noise ratio, instead, using the rigorous instruments of Random Matrix Theory (RNIT) it identifies the presence of noise and establishes its boundaries. The knowledge of the structure of the spectrum gives us possibility to make more insightful projections. For instance, in the application to router network traffic, the reconstruction error procedure for anomaly detection is based on the projection of noisy part of the spectrum. Whereas, in bioinformatics application of clustering the different types of leukemia, implicit denoising of the correlation matrix is achieved by decomposing the spectrum to random and non-random parts. For temporal high dimensional data, spectrum and eigenvectors of its correlation matrix is another representation of the data. Thus, eigenvalues, components of the eigenvectors, inverse participation ratio of eigenvector components and other operators of eigen analysis are spectral features of dynamic system. In our work we proposed to extract spectral features using the RMT. We demonstrated that with extracted spectral features we can monitor the changing dynamics of network traffic. Experimenting with the delayed correlation matrices of network traffic and extracting its spectral features, we visualized the delayed processes in the system. We demonstrated in our work that broad range of applications in feature extraction can benefit from the novel RMT based approach to the spectral representation of the data

    Econometrics meets sentiment : an overview of methodology and applications

    Get PDF
    The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software

    Feature Space Modeling for Accurate and Efficient Learning From Non-Stationary Data

    Get PDF
    A non-stationary dataset is one whose statistical properties such as the mean, variance, correlation, probability distribution, etc. change over a specific interval of time. On the contrary, a stationary dataset is one whose statistical properties remain constant over time. Apart from the volatile statistical properties, non-stationary data poses other challenges such as time and memory management due to the limitation of computational resources mostly caused by the recent advancements in data collection technologies which generate a variety of data at an alarming pace and volume. Additionally, when the collected data is complex, managing data complexity, emerging from its dimensionality and heterogeneity, can pose another challenge for effective computational learning. The problem is to enable accurate and efficient learning from non-stationary data in a continuous fashion over time while facing and managing the critical challenges of time, memory, concept change, and complexity simultaneously. Feature space modeling is one of the most effective solutions to address this problem. For non-stationary data, selecting relevant features is even more critical than stationary data due to the reduction of feature dimension which can ensure the best use a computational resource to produce higher accuracy and efficiency by data mining algorithms. In this dissertation, we investigated a variety of feature space modeling techniques to improve the overall performance of data mining algorithms. In particular, we built Relief based feature sub selection method in combination with data complexity iv analysis to improve the classification performance using ovarian cancer image data collected in a non-stationary batch mode. We also collected time series health sensor data in a streaming environment and deployed feature space transformation using Singular Value Decomposition (SVD). This led to reduced dimensionality of feature space resulting in better accuracy and efficiency produced by Density Ration Estimation Method in identifying potential change points in data over time. We have also built an unsupervised feature space modeling using matrix factorization and Lasso Regression which was successfully deployed in conjugate with Relative Density Ratio Estimation to address the botnet attacks in a non-stationary environment. Relief based feature model improved 16% accuracy of Fuzzy Forest classifier. For change detection framework, we observed 9% improvement in accuracy for PCA feature transformation. Due to the unsupervised feature selection model, for 2% and 5% malicious traffic ratio, the proposed botnet detection framework exhibited average 20% better accuracy than One Class Support Vector Machine (OSVM) and average 25% better accuracy than Autoencoder. All these results successfully demonstrate the effectives of these feature space models. The fundamental theme that repeats itself in this dissertation is about modeling efficient feature space to improve both accuracy and efficiency of selected data mining models. Every contribution in this dissertation has been subsequently and successfully employed to capitalize on those advantages to solve real-world problems. Our work bridges the concepts from multiple disciplines ineffective and surprising ways, leading to new insights, new frameworks, and ultimately to a cross-production of diverse fields like mathematics, statistics, and data mining
    corecore