
    Data normalization in machine learning

    In machine learning, input data are often given in different units of measurement and on different types of scales. A review of the literature shows that such data should be converted into a single representation by normalization or standardization, and the paper explains the difference between these operations. The main types of scales, the operations permitted on data expressed in those scales, and the principal variants of normalization functions are systematized. A new scale of parts is proposed, and examples of data normalization for more correct analysis are given. The review shows that no universal normalization method superior to all others currently exists, but normalizing the input data makes it possible to increase classification accuracy. Clustering by methods that rely on distance functions is best performed after all features have been converted to a single scale. Classification and clustering results obtained by different methods can be compared with various scoring functions, which often have different ranges of values; to select the most accurate one, several such functions can be normalized and their scores compared on a single scale. The splitting rules of tree-based classifiers are invariant to the scales of quantitative features, since they use only the comparison operation. Perhaps because of this property, the random forest classifier has been recognized in numerous experiments as one of the best for analyzing data of diverse nature.
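The normalization/standardization distinction the abstract draws can be sketched in a few lines; this is a generic illustration (the function names and toy data are not from the paper), assuming min-max rescaling for normalization and the z-score for standardization:

```python
import numpy as np

def min_max_normalize(x):
    """Normalization: rescale each feature (column) to the range [0, 1]."""
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

def standardize(x):
    """Standardization: shift each feature to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Toy features on very different scales (e.g. metres vs. grams).
data = np.array([[1.0,  500.0],
                 [2.0, 1500.0],
                 [3.0, 2500.0]])

norm = min_max_normalize(data)  # every column now spans [0, 1]
std = standardize(data)         # every column now has mean 0, std 1
```

After either transform, distance-based clustering treats both features on an equal footing, which is the point the abstract makes about converting all features to a single scale first.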


    Recurrent Neural Network Dual Resistance Control of Multiple Memory Shape Memory Alloys

    Shape memory alloys (SMAs) are materials with extraordinary thermomechanical properties that have enabled numerous engineering advances. NiTi SMAs in particular have been studied for decades, revealing many useful characteristics relative to other SMA compositions. Their application has correspondingly been widespread, seeing use in the robotics, automotive, and aerospace industries, among others. Nevertheless, several inherent limitations inhibit their applicability, including their single transformation temperature and their complex hysteretic actuation behaviour. To overcome the former challenge, one method utilizes high-energy laser processing to perform localized vaporization of nickel and accurately adjust the transformation temperatures. This method can reliably produce NiTi SMAs with multiple monolithic transformation memories. There have also been attempts to overcome the latter challenge by designing systems that model NiTi's hysteretic behaviour. When applied to actuators with a single transformation memory, these methods require external sensors to model actuators under varying current and load, driving up the cost, weight, and complexity of the actuator. Embedding a second transformation memory with a different phase into NiTi actuators can overcome this issue: by measuring electrical resistance across the two phases, sufficient information can be extracted to differentiate events caused by heating from those caused by applied load. The current study examines NiTi wires with two embedded transformation memories and utilizes recurrent neural networks to interpret the sensed data. The knowledge gained through this study was used to create a recurrent neural network-based model which can accurately estimate the position of, and force applied to, the NiTi actuator without the use of external sensors. 
    The first part of the research focused on obtaining a comprehensive thermomechanical characterization of laser-processed and thermomechanically post-processed NiTi wires with two embedded transformation memories, with one memory exhibiting the full shape memory effect (SME) and the second partial pseudoelasticity (PE) at room temperature. A second objective of this section was to acquire cycling data from the processed wires to be used for training the artificial neural networks in the following section of the study. The selected laser processing and post-processing parameters resulted in transformation temperature increases of 61.5°C and 35.3°C for the austenite finish (Af) and martensite start (Ms) temperatures, respectively, relative to the base metal. Furthermore, the post-processing was found to successfully restore the majority of the lost mechanical properties, with the ultimate tensile strength recovered to 84% of its corresponding base metal value. This research resulted in the fabrication of NiTi wires with two distinct embedded transformation memories, exhibiting sufficient mechanical and cyclic properties for the next phase of the research. Once an acceptable amount of NiTi actuation cycling data was acquired, the second part of the research consisted of training multiple recurrent neural network architectures with varying hyperparameters on the data and selecting the model which achieved the best performance. The hyperparameter optimization was performed on data with constant applied load, resulting in a model which estimated the actuator's position with 99.2% accuracy. The optimized hyperparameters were then used to create a recurrent neural network model trained to estimate both position and force using the full acquired data set, capitalizing on the two embedded memories. The model achieved overall position and force estimation accuracies of 98.5% and 96.0%, respectively, on the data used to train it, and 96.6% and 89.8%, respectively, on previously unseen data. 
    The result of this study was the successful development of an accurate RNN-based position and force estimation model for NiTi actuators with two embedded phases. Using this model, a position controller was implemented which achieved 95.9% position accuracy under varying applied loads.
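The core idea of mapping a sequence of resistance readings to a position/force estimate can be illustrated with a minimal Elman-style recurrent network. This is a generic, untrained sketch with made-up dimensions and random weights, not the thesis's actual architecture or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: two resistance channels in (one per embedded phase),
# two quantities out (position, force). All sizes are illustrative.
n_in, n_hidden, n_out = 2, 8, 2

W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output

def rnn_estimate(sequence):
    """Run an Elman RNN over a sequence of sensor readings and
    return the final (position, force) estimate from the last state."""
    h = np.zeros(n_hidden)
    for x in sequence:
        h = np.tanh(W_xh @ x + W_hh @ h)  # recurrent state update
    return W_hy @ h

# 50 time steps of simulated two-channel resistance measurements.
readings = rng.normal(size=(50, n_in))
estimate = rnn_estimate(readings)  # shape (2,): position, force
```

In practice the weights would be learned by backpropagation through time on the cycling data; the recurrence is what lets the model exploit the hysteretic (history-dependent) behaviour that a memoryless regressor cannot capture.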