7 research outputs found

    Perceptually Important Points-Based Data Aggregation Method for Wireless Sensor Networks

    Transmitting and receiving data consume most of the resources in Wireless Sensor Networks (WSNs). The energy supplied by a sensor node's battery is the most important resource determining a WSN's lifespan, and because sensor nodes run on limited batteries, saving energy is essential. Data aggregation is a procedure that eliminates redundant transmissions and delivers fused information to the base stations, which in turn improves energy efficiency and extends the lifespan of energy-constrained WSNs. In this paper, a Perceptually Important Points Based Data Aggregation (PIP-DA) method for Wireless Sensor Networks is proposed to reduce redundant data before it is sent to the sink. The efficiency of the proposed method was measured on the Intel Berkeley Research Lab (IBRL) dataset. The experimental findings show that the proposed method reduces the overhead at the sensor node level, leaving as little as 1.25% of the data, and reduces energy consumption by up to 93% compared with the prefix frequency filtering (PFF) and ATP protocols.
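    PIP selection can be sketched as follows. This is a minimal sketch, not the PIP-DA implementation: it assumes the common vertical-distance criterion (the abstract does not say which distance PIP-DA uses), and the function `find_pips`, the parameter `n_pips`, and the sample readings are hypothetical. A node would transmit only the selected (index, value) pairs instead of the full window.

```python
from typing import List, Sequence


def find_pips(values: Sequence[float], n_pips: int) -> List[int]:
    """Return the sorted indices of the n_pips perceptually important points.

    Assumed criterion: vertical distance (VD). The next PIP is the point whose
    value deviates most from the straight line joining its two neighbouring PIPs.
    """
    n = len(values)
    if n_pips >= n:
        return list(range(n))  # nothing to drop
    if n_pips < 2:
        raise ValueError("n_pips must be at least 2 (the two endpoints)")
    pips = [0, n - 1]  # the first and last readings are always PIPs
    while len(pips) < n_pips:
        best_idx, best_dist = -1, -1.0
        # Examine every gap between consecutive PIPs.
        for left, right in zip(pips, pips[1:]):
            slope = (values[right] - values[left]) / (right - left)
            for i in range(left + 1, right):
                # Value predicted at i by the line through the neighbouring PIPs.
                predicted = values[left] + slope * (i - left)
                dist = abs(values[i] - predicted)
                if dist > best_dist:
                    best_idx, best_dist = i, dist
        pips.append(best_idx)
        pips.sort()
    return pips


if __name__ == "__main__":
    # Hypothetical temperature window with one spike.
    readings = [20.1, 20.1, 20.2, 20.2, 23.5, 20.3, 20.2, 20.2]
    kept = find_pips(readings, 3)
    print(kept)  # [0, 4, 7]
    print([(i, readings[i]) for i in kept])  # what the node would send
```

The endpoints plus the spike survive, while the flat stretches, which the sink can approximate by interpolation, are dropped.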

    A prediction scheme using perceptually important points and dynamic time warping

    An algorithmic method for statistically assessing the efficient market hypothesis (EMH) is developed based on two data mining tools: perceptually important points (PIPs), used to dynamically segment price series into subsequences, and dynamic time warping (DTW), used to find similar historical subsequences. Predictions are then made from the mappings of the most similar subsequences, and the prediction error statistic is used for the EMH assessment. The predictions are assessed on simulated price paths composed of stochastic trend and chaotic deterministic time series, and on real financial data from 18 world equity markets and the GBP/USD exchange rate. The main results establish that the proposed algorithm can capture the deterministic structure in the simulated series, confirm the validity of the EMH for the examined equity indices, and indicate that predicting the exchange rate using PIPs and DTW could, in some cases, beat the prediction given by the last available price.
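    The similarity search step can be sketched with classic DTW. This is a hedged sketch under stated assumptions: the absolute-difference local cost, the absence of a warping window, and the names `dtw_distance` and `most_similar` are choices made here, not details from the paper, and the mapping from the matched subsequence to a price prediction is omitted because the abstract does not specify it.

```python
import math
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b's point is repeated
                cost[i][j - 1],      # b advances, a's point is repeated
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def most_similar(query: Sequence[float], history: List[Sequence[float]]) -> int:
    """Index of the historical subsequence closest to `query` under DTW."""
    return min(range(len(history)), key=lambda k: dtw_distance(query, history[k]))


if __name__ == "__main__":
    # Hypothetical PIP-delimited subsequences of a price series.
    history = [[5.0, 5.0, 5.0], [1.0, 2.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
    print(most_similar([1.0, 2.0, 3.0], history))  # 1
```

Because DTW may repeat points, subsequences of different lengths (as PIP segmentation produces) can be compared directly; the cost is O(nm) per pair.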

    Development of a Tool for Segmenting Customer Purchasing Behaviour Based on Their Morphological Data

    The analysis of commercial markets is currently a scientific as well as an industrial process. It consists of collecting and exploring information about customers in order to better understand their behaviours, habits and interests, and companies use it widely to guide their operational and strategic decisions. This thesis presents one of the most widely used market analysis tools in the literature: segmentation. This approach divides a set of heterogeneous individuals into more homogeneous groups based on predetermined criteria. Market segmentation is used in several areas to divide a set of customers into smaller groups with similar behaviours. It can be used to improve the impact and revenues of existing products and services or to prepare the introduction of new products to markets. In our case, customer segmentation is based on the evolution of the sizes of the clothes customers order, which we call their morphological data. Customers make their purchases on an online shopping platform provided by our industrial partner, a company that manages outsourced uniform programs for its clients. In this work, we used data mining methods to analyze the purchase history of one of our partner's clients over a defined study period. The analyses then focus on the most frequently ordered type of clothing, namely shirts. A time series of the sizes ordered is built for each customer, which makes it possible to study how sizes evolve over time. Segmentation is then applied to these time series to obtain groups with similar behaviours. The similarity between customers is based on a metric called Dynamic Time Warping. This distance was chosen because it is the most suitable for comparing time series by their shape, regardless of offsets along the time axis. Several tests based on different methods and algorithms were carried out, and several structures and variants of the time series were tested. The time series retained express the variation in measurements over time and were z-normalized. Two segmentations, with six and ten groups, were performed. Finally, the segmentation results were evaluated and analyzed to validate them. First, the evaluation was based on visualization of the resulting groups and on the silhouette criterion, which measures the quality of a given segmentation from intra-group homogeneity and inter-group heterogeneity. We found that the ten-group segmentation produced better groups, improving on the silhouette index obtained with six groups. Second, for each case, analyzing the groups allowed us to understand their structure and characteristics. We found that most customers have increasing size curves. We also found that groups with stable curves have relatively high return rates compared with their purchase rates. The distribution of men and women in the groups is similar to that of the original customer group; however, groups with a higher percentage of women have smaller average sizes.
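    The normalization and evaluation steps can be sketched as below. This is a minimal sketch under assumptions: the distance matrix is taken as precomputed (the thesis uses DTW, and any DTW implementation could fill it), the group labels are taken as given because the abstract does not name the clustering algorithm, and `z_normalize` uses the population standard deviation. Both function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def z_normalize(series: Sequence[float]) -> List[float]:
    """Rescale a series to zero mean and unit population standard deviation."""
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series)
    if sigma == 0:
        # A flat series carries no shape information; map it to all zeros.
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def silhouette(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient computed from a precomputed distance matrix.

    dist[i][j] is the distance (e.g. DTW) between customers i and j;
    labels[i] is the group assigned to customer i.
    """
    n = len(labels)
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("the silhouette needs at least two groups")
    scores = []
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # usual convention for singleton groups
            continue
        # a: mean distance to the other members of i's own group (cohesion)
        a = statistics.fmean(dist[i][j] for j in same)
        # b: mean distance to the nearest other group (separation)
        b = min(
            statistics.fmean(dist[i][j] for j in range(n) if labels[j] == g)
            for g in groups
            if g != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return statistics.fmean(scores)
```

Scores range from -1 to 1, and a higher mean indicates tighter, better-separated groups, which is the sense in which one segmentation (here, ten groups versus six) can be said to improve on another.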

    Features Extraction from Time Series

    Time series arise in many domains, such as medicine, engineering, and finance. Generally speaking, a time series is a sequence of values recorded for a phenomenon over time. This thesis studies four time series mining tasks: transformation and distance measurement, anomaly detection (of a single anomaly or several), clustering, and remaining useful life estimation. For the first task (transformation and distance measurement), we introduce a novel calculation of the distance between symbols to increase the accuracy of distance measurement between transformed (symbolic) series. Integrating this newly defined distance into symbolic aggregate approximation and its extensions yields promising experimental results. For the second task (anomaly detection), we propose a distance measure and an anomaly detection calculation to improve detection accuracy. These methods, together with previously published anomaly detection methods, are applied to real ECG data selected from the MIT-BIH database, and the experimental results show that our proposed methods outperform the others. For the third task (clustering), we present an automatic clustering method, called AT-means, which carries out clustering of a given time series dataset automatically: from calculating the global average time series to setting the initial centres and determining the number of clusters. The method was tested on 10 benchmark time series datasets from the UCR database; for comparison, the K-means method under three different conditions was applied to the same datasets, and the experimental results show that the proposed method outperforms the compared K-means approaches. For the fourth task (remaining useful life estimation), all the original data are first transformed into a low-dimensional space through principal component analysis. We then propose a novel multidimensional time series distance measure, called multivariate time series warping distance (MTWD), for remaining useful life (RUL) estimation. The whole process is tested on the CMAPSS (Commercial Modular Aero Propulsion System Simulation) datasets and compared with two existing methods; the estimated RUL values are closer to the real RUL values than those of the comparison methods. Our work contributes to time series mining by introducing novel approaches to distance measurement, anomaly detection, clustering, and RUL estimation. We further apply our proposed methods and related methods to benchmark datasets, and the experimental results show that our methods are better than previously published methods in terms of accuracy and efficiency.
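    The symbolic transformation that the first task builds on can be sketched as standard SAX. The thesis's own symbol-to-symbol distance is not described in the abstract, so this sketch shows only the standard equiprobable Gaussian breakpoints and the standard lower-bounding distance (MINDIST); the `cell` lookup inside `mindist` is the piece such a method would replace. The requirement that the series length be divisible by the word length `w`, and all function names, are assumptions made here.

```python
import bisect
import math
from statistics import NormalDist, fmean, pstdev
from typing import List, Sequence


def breakpoints(alphabet: int) -> List[float]:
    """Cut points splitting N(0, 1) into `alphabet` equiprobable regions."""
    return [NormalDist().inv_cdf(k / alphabet) for k in range(1, alphabet)]


def sax(series: Sequence[float], w: int, alphabet: int) -> List[int]:
    """SAX word as symbol indices 0..alphabet-1: z-normalise, PAA, discretise."""
    n = len(series)
    if n % w:
        raise ValueError("this sketch assumes len(series) is divisible by w")
    mu, sigma = fmean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma else 0.0 for x in series]
    size = n // w
    # Piecewise aggregate approximation: the mean of each of the w frames.
    frames = [fmean(z[k * size:(k + 1) * size]) for k in range(w)]
    cuts = breakpoints(alphabet)
    # The symbol index is the number of breakpoints at or below the frame mean.
    return [bisect.bisect_right(cuts, v) for v in frames]


def mindist(word_a: Sequence[int], word_b: Sequence[int], n: int, alphabet: int) -> float:
    """Standard SAX lower-bounding distance between words built from length-n series."""
    cuts = breakpoints(alphabet)
    total = 0.0
    for r, c in zip(word_a, word_b):
        # Identical or adjacent symbols contribute nothing (their regions touch).
        if abs(r - c) > 1:
            cell = cuts[max(r, c) - 1] - cuts[min(r, c)]
            total += cell * cell
    return math.sqrt(n / len(word_a)) * math.sqrt(total)


if __name__ == "__main__":
    word = sax([1, 2, 3, 4, 5, 6, 7, 8], w=4, alphabet=4)
    print(word)  # [0, 1, 2, 3]
    print(mindist(word, word[::-1], n=8, alphabet=4))
```

Because adjacent symbols score zero, standard MINDIST is coarse for small alphabets; that coarseness is the kind of inaccuracy a refined symbol distance would target.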