15,442 research outputs found

    Big Data Preprocessing for Multivariate Time Series Forecast

    Get PDF
    Big data platforms alleviate collecting and organizing large datasets of varying content. A downside of this is the heavy preprocessing required to analyze their data by conventional analysis techniques. Especially time series data is found challenging to transform from platform-provided raw format into tables of feature and target values, required by supervised machine learning models. This thesis presents an experiment of preprocessing a data-platform-extracted collection of multivariate time series and forecasting it by machine learning models such as neural networks and support vector machines. Reviewed techniques of data preprocessing and time series analysis literature are utilized, but also custom solutions such as log level-based target variable, and valuedistribution-based feature elimination are developed. No significant forecasting accuracies are achieved, which indicates the difficulty of modelling big data. The expected reason for this is the inadequate validation of model parameters and preprocessing decisions, which would require excessive testing to improve.Big data -alustat helpottavat isojen datamäärien talletusta ja hallintaa. Niiden haittapuolena on kuitenkin laaja data-analyysiin vaadittava esikäsittelyn tarve, mikäli halutaan käyttää tavanomaisia analyysimenetelmiä. Erityisen haastavaksi todetaan aikasarjojen muuntaminen alustan tarjoamasta muodosta ohjatun koneoppimisen vaatimaan taulumuotoon, koostuen ennustettavasta kohdemuuttujasta sekä muista ominaisuusmuuttujista. Tässä tutkielmassa tutkitaan usean muuttujan aikasarjadatan esikäsittelyä, sekä käsitellyn datan ennustamista koneoppimismenetelmillä, kuten neuroverkoilla ja tukivektorimallinnuksella. Tutkimusmenetelmät perustuvat kirjallisuuteen datan esikäsittelystä ja aikasarja-analyysistä, mutta myös uusia menetelmiä kehitetään, kuten lokitasoon perustuva kohdemuuttuja sekä muuttujien arvojakaumaan perustuva karsiminen. Ennustustulokset jättävät kuitenkin toivomisen varaa, mikä kertoo big datan mallinnuksen vaikeudesta. Epäiltyinä syinä ovat liian vähäinen malliparametrien ja esikäsittelyvalintojen optimointi, joiden täydentäminen vaatisi resursseihin nähden liian kattavaa testausta

    Damage identification in structural health monitoring: a brief review from its implementation to the Use of data-driven applications

    Get PDF
    The damage identification process provides relevant information about the current state of a structure under inspection, and it can be approached from two different points of view. The first approach uses data-driven algorithms, which are usually associated with the collection of data using sensors. Data are subsequently processed and analyzed. The second approach uses models to analyze information about the structure. In the latter case, the overall performance of the approach is associated with the accuracy of the model and the information that is used to define it. Although both approaches are widely used, data-driven algorithms are preferred in most cases because they afford the ability to analyze data acquired from sensors and to provide a real-time solution for decision making; however, these approaches involve high-performance processors due to the high computational cost. As a contribution to the researchers working with data-driven algorithms and applications, this work presents a brief review of data-driven algorithms for damage identification in structural health-monitoring applications. This review covers damage detection, localization, classification, extension, and prognosis, as well as the development of smart structures. The literature is systematically reviewed according to the natural steps of a structural health-monitoring system. This review also includes information on the types of sensors used as well as on the development of data-driven algorithms for damage identification.Peer ReviewedPostprint (published version
    corecore