Extremely fast decision tree mining for evolving data streams

Author: Bifet Albert
Fan Wei
He Cheng
Holmes Geoffrey
Pfahringer Bernhard
Qian Jianfeng
Zhang Jiajin
Zhang Jianfeng
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2017

Nowadays, real-time industrial applications continuously generate huge amounts of data every day. To process these large data streams, we need fast and efficient methodologies and systems. A feature that data scientists and analysts value is machine learning models that are easy to visualize and understand. Decision trees are preferred in many real-time applications for this reason, and also because, when combined in an ensemble, they are among the most powerful methods in machine learning. In this paper, we present a new system called STREAMDM-C++ that implements decision trees for data streams in C++ and that has been used extensively at Huawei. Streaming decision trees adapt to changes in streams, a huge advantage, since standard decision trees are built from a snapshot of data and cannot evolve over time. STREAMDM-C++ is easy to extend and contains more powerful ensemble methods and more efficient, easier-to-use adaptive decision trees. We compare our new implementation with VFML, the current state-of-the-art implementation in C, and show that our new system outperforms VFML in speed while using fewer resources.
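Unlike a batch tree built from a snapshot, a streaming tree in the Hoeffding-tree family grows from sufficient statistics kept at each leaf: every arriving example increments a few counters and is then discarded. The abstract does not describe STREAMDM-C++ internals, so the following is only a minimal Python sketch of that general idea, assuming nominal attributes and information gain as the split criterion. The `LeafStats` class and its methods are hypothetical names, not part of STREAMDM-C++ or VFML.

```python
import math
from collections import defaultdict


class LeafStats:
    """Sufficient statistics kept at one leaf of a streaming tree (sketch).

    counts[attr][value][label] = number of examples seen at this leaf with
    attribute `attr` equal to `value` and class `label`. Each example updates
    the counters once and is then discarded; no snapshot of data is stored.
    """

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def update(self, x, y):
        """Absorb one example: x maps attribute name -> nominal value, y is the label."""
        self.class_counts[y] += 1
        for attr, value in x.items():
            self.counts[attr][value][y] += 1

    @staticmethod
    def entropy(label_counts):
        total = sum(label_counts.values())
        if total == 0:
            return 0.0
        return -sum((c / total) * math.log2(c / total)
                    for c in label_counts.values() if c > 0)

    def info_gain(self, attr):
        """Leaf entropy minus the weighted entropy of the children after splitting on attr."""
        n = sum(self.class_counts.values())
        if n == 0:
            return 0.0
        after = sum((sum(lc.values()) / n) * self.entropy(lc)
                    for lc in self.counts[attr].values())
        return self.entropy(self.class_counts) - after


# Usage: feed a tiny stream one example at a time.
leaf = LeafStats()
stream = [({"color": "red", "size": "S"}, "yes"),
          ({"color": "red", "size": "L"}, "yes"),
          ({"color": "blue", "size": "S"}, "no"),
          ({"color": "blue", "size": "L"}, "no")]
for x, y in stream:
    leaf.update(x, y)
print(leaf.info_gain("color"))  # 1.0: color separates the classes perfectly
print(leaf.info_gain("size"))   # 0.0: size carries no information here
```

Because the leaf only stores counts, memory depends on the number of attribute values and classes rather than on how many examples have streamed past.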

Crossref

Research Commons@Waikato

Online structural damage classification methodology for offshore wind turbine foundations using data stream analysis

Author: León Medina Jersson Xavier
Parés Mariné Núria
Pozo Montero Francesc
Publication venue: University of Patras
Publication date: 01/01/2023

Structural health monitoring (SHM) of wind turbines is crucial to improve maintenance and extend their lifespan. This study develops an online data analysis methodology using data stream analysis to classify damage in the links of an offshore wind turbine foundation. The methodology is validated using a laboratory-scaled jacket-type wind turbine foundation structure. 2460 measurements of the healthy structure were acquired, and a 5 mm crack was applied to four different links to define the four unhealthy classes. 820 measurements were taken for each of the unhealthy structures, resulting in a dataset with 5740 instances. As this is an imbalanced multiclass classification problem, a random sampler approach was used to treat the data. The only data used came from eight triaxial accelerometers distributed throughout the structure. Three different tree-based stream data classifiers were compared: the Hoeffding Tree classifier, the Extremely Fast Decision Tree classifier, and the Hoeffding Adaptive Tree classifier. Each classification model underwent a parameter-tuning procedure, and high values of the receiver operating characteristic area under the curve (ROC AUC) metric were achieved as a result. It is important to note that stream learning differs from batch learning.

Peer Reviewed. Postprint (published version)
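The abstract says only that "a random sampler approach" was used for the class imbalance (2460 healthy instances versus 820 for each of the four damaged links, so 2460 + 4 × 820 = 5740 in total). As an illustration, the sketch below assumes random oversampling, which duplicates randomly drawn minority instances until every class matches the majority class; random undersampling would be an equally plausible reading. `random_oversample` is a hypothetical helper, the integer placeholders stand in for the accelerometer features, and the resampling is shown offline on a list rather than inside a streaming pipeline.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Random oversampling: duplicate randomly drawn instances of each smaller
    class until every class is as large as the biggest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for xi in items + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# Class sizes reported in the abstract: 2460 healthy, 820 per damaged link (4 links).
y = ["healthy"] * 2460 + [f"crack_link_{k}" for k in range(1, 5) for _ in range(820)]
X = list(range(len(y)))  # hypothetical placeholders for the accelerometer features
print(len(y))  # 5740
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # each of the 5 classes now has 2460 instances
```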

UPCommons. Portal del coneixement obert de la UPC

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Author: Ise Masayuki
Konishi Osamu
Minegishi Tatsuya
Niimi Ayahiko
Publication venue: IEEE SMC Hiroshima Chapter
Publication date: 01/11/2009

Recently, because of the increasing amount of data in society, data stream mining targeting large-scale data has attracted attention. Data mining is a technology for discovering new knowledge and patterns in massive amounts of data, and data mining applied to data streams is called data stream mining. In this paper, we propose feature selection using an online decision tree. First, we construct an online decision tree, treating credit card transaction data as a data stream. Second, we select the attributes considered important for detecting illegal use. We apply the VFDT (Very Fast Decision Tree learner) algorithm to construct the online decision tree.
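VFDT decides when a leaf has seen enough examples to split by using the Hoeffding bound: after n examples, the observed mean of a variable with range R lies within ε = √(R² ln(1/δ) / (2n)) of its true mean with probability 1 − δ. The leaf splits when the best attribute's information gain exceeds the runner-up's by more than ε. The sketch below illustrates that test in Python; the tie threshold τ and the defaults δ = 10⁻⁷ and τ = 0.05 are assumed values borrowed from common Hoeffding-tree implementations, not taken from this paper, and the function names are hypothetical.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """epsilon such that, with probability 1 - delta, the mean of n observations
    of a variable with range `value_range` is within epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(gain_best, gain_second, n, num_classes, delta=1e-7, tau=0.05):
    """VFDT-style test: split if the best attribute beats the runner-up by more
    than epsilon, or if epsilon has shrunk below the tie threshold tau."""
    value_range = math.log2(num_classes)  # information gain lies in [0, log2(#classes)]
    eps = hoeffding_bound(value_range, delta, n)
    return gain_best - gain_second > eps or eps < tau


print(round(hoeffding_bound(1.0, 1e-7, 200), 4))        # 0.2007
print(should_split(0.30, 0.20, n=200, num_classes=2))   # False
print(should_split(0.30, 0.20, n=5000, num_classes=2))  # True
```

With two classes (R = 1), a gain difference of 0.10 is not enough after 200 examples (ε ≈ 0.20) but is after 5000 examples (ε ≈ 0.04), which is how VFDT trades a small, bounded error probability for reading each example only once.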

Hiroshima University Institutional Repository

Okayama University Scientific Achievement Repository

Non-uniform Feature Sampling for Decision Tree Ensembles

Author: Kyrillidis Anastasios
Zouzias Anastasios
Publication date: 24/03/2014

We study the effectiveness of non-uniform randomized feature selection in decision tree classification. We experimentally evaluate two feature selection methodologies based on information extracted from the provided dataset: (i) leverage-score-based and (ii) norm-based feature selection. Experimental evaluation of the proposed feature selection techniques indicates that such approaches might be more effective than naive uniform feature selection and, moreover, have performance comparable to the random forest algorithm [3].

Comment: 7 pages, 7 figures, 1 table
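Norm-based selection samples each feature (column) j with probability proportional to its squared Euclidean norm, p_j = ‖A_{:,j}‖² / ‖A‖_F², where A is the samples-by-features data matrix. The sketch below assumes that definition, which is the standard one in randomized linear algebra; the abstract does not give the exact formula, and leverage-score sampling (which needs a singular value decomposition) is left out. Both function names are hypothetical. In an ensemble, each tree would be trained on its own sample of features, in place of the uniform feature sampling used by random forests.

```python
import random


def norm_based_probabilities(A):
    """A is a list of rows (samples) of equal length d. Returns, for each
    feature column j, p_j = ||A[:, j]||^2 / ||A||_F^2."""
    d = len(A[0])
    col_sq = [sum(row[j] ** 2 for row in A) for j in range(d)]
    total = sum(col_sq)
    if total == 0:
        raise ValueError("all-zero data matrix")
    return [c / total for c in col_sq]


def sample_features(A, k, seed=0):
    """Draw k distinct feature indices, each draw weighted by p_j."""
    rng = random.Random(seed)
    p = norm_based_probabilities(A)
    if k > sum(1 for q in p if q > 0):
        raise ValueError("k exceeds the number of features with non-zero norm")
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.choices(range(len(p)), weights=p)[0])
    return sorted(chosen)


A = [[1.0, 0.0, 3.0],
     [1.0, 0.0, 4.0]]
print(norm_based_probabilities(A))  # p = 2/27, 0, 25/27 for the three columns
print(sample_features(A, k=2))      # [0, 2]
```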

arXiv.org e-Print Archive

Crossref

Fast Supervised Hashing with Decision Trees for High-Dimensional Data

Author: Hengel Anton van den
Lin Guosheng
Shen Chunhua
Shi Qinfeng
Suter David
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

Supervised hashing aims to map the original features to compact binary codes that preserve label-based similarity in the Hamming space. Non-linear hash functions have demonstrated an advantage over linear ones due to their powerful generalization capability. In the literature, kernel functions are typically used to achieve non-linearity in hashing; they achieve encouraging retrieval performance at the price of slow evaluation and training. Here we propose using boosted decision trees to achieve non-linearity in hashing; these are fast to train and evaluate and hence more suitable for hashing high-dimensional data. In our approach, we first propose sub-modular formulations for the hashing binary code inference problem and an efficient GraphCut-based block search method for solving large-scale inference. Then we learn hash functions by training boosted decision trees to fit the binary codes. Experiments demonstrate that our proposed method significantly outperforms most state-of-the-art methods in retrieval precision and training time. Especially for high-dimensional data, our method is orders of magnitude faster than many methods in terms of training time.

Comment: Appearing in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2014, Ohio, US
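In hashing-based retrieval, a query is encoded into a short binary code and database items are ranked by Hamming distance, the number of differing bits. The sketch below shows only that encode-and-rank step, with each bit produced by a decision stump (a depth-1 tree). In the paper the hash functions are boosted decision trees fitted to binary codes inferred with the GraphCut-based method; here the stumps and their thresholds are hypothetical and fixed by hand, so this illustrates tree-based hash functions in general, not the proposed learning procedure.

```python
from typing import List, Tuple

# Hypothetical hash functions: each bit is a decision stump (feature index, threshold).
# The paper learns boosted trees per bit; these stumps are fixed by hand for illustration.
Stump = Tuple[int, float]


def encode(x: List[float], stumps: List[Stump]) -> int:
    """Map a feature vector to a compact binary code packed into an int."""
    code = 0
    for bit, (feature, threshold) in enumerate(stumps):
        if x[feature] > threshold:
            code |= 1 << bit
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: List[float], database: List[List[float]],
                    stumps: List[Stump]) -> List[int]:
    """Return database indices sorted by Hamming distance to the query code."""
    q = encode(query, stumps)
    codes = [encode(x, stumps) for x in database]
    return sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))


stumps = [(0, 0.5), (1, 0.5), (2, 0.5)]  # hypothetical: one stump per bit
database = [[0.9, 0.1, 0.8], [0.1, 0.9, 0.2], [0.8, 0.2, 0.7]]
query = [0.7, 0.3, 0.9]
print(rank_by_hamming(query, database, stumps))  # [0, 2, 1]
```

Packing codes into integers reduces the Hamming distance to one XOR followed by a bit count, which is part of why compact binary codes are fast to search.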

arXiv.org e-Print Archive

CiteSeerX

Crossref

Adelaide Research & Scholarship