NSL-BP: A Meta Classifier Model Based Prediction of Amazon Product Reviews
In machine learning, predicting product ratings from the semantic analysis of consumer reviews is a relevant topic. Amazon is one of the most popular online retailers, with millions of customers purchasing and reviewing products. Many research projects in the literature address the prediction of the rating associated with a given review. In this work, we introduce a novel approach that enhances rating-prediction accuracy by processing the review text with a combination of machine learning methods. First, using k-means and LDA, we cluster products and topics so that products and reviews of the same kind are grouped together, making the ratings easier to predict. We then train low, neutral, and high models based on the product clusters and topics. Finally, adopting a stacking ensemble, we combine Naïve Bayes, Logistic Regression, and SVM into a two-level stack to predict the ratings. We call this newly introduced model the NSL model and compare its prediction performance with state-of-the-art methods.
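A minimal sketch of the two-level stacking idea, assuming scikit-learn and TF-IDF features; the k-means/LDA clustering step described above is omitted here, and the toy reviews, labels, and parameters are illustrative only.

```python
# Two-level stacking sketch: Naive Bayes and SVM as level-1 learners,
# Logistic Regression as the level-2 meta-classifier (illustrative setup).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier

reviews = [
    "great product, works perfectly", "excellent value, highly recommend",
    "broke after two days", "terrible, waste of money",
    "average quality, nothing special", "it is okay but a bit overpriced",
]
ratings = ["high", "high", "low", "low", "neutral", "neutral"]  # toy rating classes

base_learners = [("nb", MultinomialNB()), ("svm", LinearSVC())]
stack = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(estimators=base_learners,
                       final_estimator=LogisticRegression(),
                       cv=2),  # small cv only because the toy set is tiny
)
stack.fit(reviews, ratings)
print(stack.predict(["works perfectly, great value"]))
```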
Drift-Aware Methodology for Anomaly Detection in Smart Grid
Energy efficiency and sustainability are important factors to address in the context of smart cities. In this sense, smart metering and non-intrusive load monitoring play a crucial role in fighting energy theft and in optimizing the energy consumption of homes, buildings, cities, and so forth. The number of smart meters was estimated to exceed 800 million by 2020. By providing near real-time data about power consumption, smart meters can be used to analyze electricity usage trends and to point out anomalies, guaranteeing companies' safety and avoiding energy waste. In the literature, there are many proposals approaching the problem of anomaly detection. Most of them are limited because they lack context and time awareness, and their false positive rate is affected by changes in consumer habits. This research work focuses on the need for an anomaly detection method capable of facing concept drift, for instance, when the family structure changes, a house becomes a second residence, and so forth. The proposed methodology adopts a long short-term memory (LSTM) network to profile and forecast consumers' behavior based on their recent past consumption. Continuous monitoring of the consumption prediction errors allows us to distinguish between possible anomalies and changes (drifts) in normal behavior, which correspond to different error motifs. The experimental results demonstrate the suitability of the proposed framework by pointing out an anomaly in near real time after a training period of one week.
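A hedged sketch of the drift-aware idea using Keras: an LSTM forecasts the next consumption value from a window of recent readings, and sustained forecast errors are then inspected to tell one-off anomalies from behavior drifts. The window size, thresholds, and the error-run rule below are illustrative assumptions, not the paper's exact parameters or error-motif analysis.

```python
import numpy as np
import tensorflow as tf

WINDOW = 24  # assumed: one day of hourly smart-meter readings

def make_windows(series, window=WINDOW):
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., None], y  # LSTM expects (samples, timesteps, features)

# Toy consumption profile: a daily sinusoid plus noise stands in for one week of data.
t = np.arange(24 * 7)
train = 1.0 + 0.5 * np.sin(2 * np.pi * t / 24) + 0.05 * np.random.randn(t.size)
X, y = make_windows(train)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)

def classify_errors(errors, spike=0.5, persist=12):
    """Illustrative rule: a short burst of large errors is flagged as an anomaly,
    while errors that stay large for many consecutive steps suggest a drift."""
    large = np.abs(errors) > spike
    run = 0
    for flag in large:
        run = run + 1 if flag else 0
        if run >= persist:
            return "drift"
    return "anomaly" if large.any() else "normal"

# Monitor new readings: forecast each step and track the prediction error.
X_new, y_new = make_windows(train + 0.8)  # shifted series mimics a behavior change
errors = y_new - model.predict(X_new, verbose=0).ravel()
print(classify_errors(errors))
```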
Performance Assessment in Fingerprinting and Multi Component Quantitative NMR Analyses
An interlaboratory comparison (ILC) was organized with the aim of setting up quality control indicators suitable for multicomponent quantitative analysis by nuclear magnetic resonance (NMR) spectroscopy. A total of 36 NMR data sets (corresponding to 1260 NMR spectra) were produced by 30 participants using 34 NMR spectrometers. The calibration line method was chosen for the quantification of a five-component model mixture. The results show that quantitative NMR is a robust quantification tool and that 26 out of 36 data sets yielded statistically equivalent calibration lines for all considered NMR signals. The performance of each laboratory was assessed by means of a new performance index (named Qp-score), which is related to the difference between the experimental and the consensus values of the slope of the calibration lines. Laboratories whose Qp-score falls within the acceptability range are qualified to produce NMR spectra that can be considered statistically equivalent in terms of the relative intensities of the signals. In addition, the specific response of nuclei to the experimental excitation/relaxation conditions was addressed by means of a parameter named NR, which is related to the difference between the theoretical and the consensus slopes of the calibration lines and is specific to each signal produced by a well-defined set of acquisition parameters.
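An illustrative sketch of the calibration-line idea behind the Qp-score: each laboratory fits signal intensity versus analyte concentration, and a lab's score is derived from how far its slope deviates from the consensus slope. The deviation score below is an assumption made for illustration, not the paper's exact Qp-score formula, and all numbers are toy values.

```python
import numpy as np

concentrations = np.array([1.0, 2.0, 5.0, 10.0, 20.0])  # toy concentrations (mM)

# Toy integrated signal areas reported by three labs for the same NMR signal.
lab_signals = {
    "lab_A": np.array([0.98, 2.05, 4.9, 10.1, 19.8]),
    "lab_B": np.array([1.10, 2.20, 5.4, 10.9, 21.6]),
    "lab_C": np.array([0.95, 1.98, 5.0, 9.9, 20.2]),
}

# Least-squares calibration line (slope, intercept) for each laboratory.
slopes = {lab: np.polyfit(concentrations, sig, 1)[0] for lab, sig in lab_signals.items()}

consensus = np.median(list(slopes.values()))                        # robust consensus slope
spread = np.median(np.abs(np.array(list(slopes.values())) - consensus)) or 1e-9

for lab, slope in slopes.items():
    score = (slope - consensus) / spread                            # deviation in robust units
    print(f"{lab}: slope={slope:.3f}, deviation score={score:+.2f}")
```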
A case study in smart manufacturing: predictive analysis of cure cycle results for a composite component
Aim: This work proposes a workflow that monitors sensor observations over time to identify and predict relevant changes or anomalies in the cure cycle (CC) industrial process. The CC is a procedure carried out in an autoclave, in which high temperatures are applied to produce composite materials. Knowing about anomalies in advance could improve efficiency and avoid discarding products due to poor quality, benefiting sustainability and the environment. Methods: The proposed workflow exploits machine learning techniques to monitor and validate the CC process early, according to its time-temperature constraints, in a real industrial case study. It uses CC data produced by the thermocouples in the autoclave throughout the cycle to train an LSTM model. The Fast Low-cost Online Semantic Segmentation algorithm is used to better characterize the temperature time series. The final objective is to predict future temperatures minute by minute, in order to forecast whether the cure will satisfy the quality-control constraints or to raise alerts so that the process can possibly be recovered. Results: Experimentation, conducted on 142 time series (of 550 measurements on average), shows that the framework identifies invalid CCs with significant precision and recall values after the first 2 hours of the process. Conclusion: By acting as an early-alerting system for the quality control office, the proposal aims to reduce defect rates and resource usage, bringing positive environmental impacts. Moreover, the framework could be adapted to other manufacturing targets by adopting specific datasets and tuning the thresholds.
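A hedged sketch of the early-alert step: given minute-by-minute temperature forecasts for the remainder of a cure cycle, check a time-temperature constraint and raise an alert if the predicted cure would violate it. The dwell target (180 °C for at least 120 minutes within ±5 °C) is a hypothetical constraint for illustration, not the real process specification, and the forecasting model itself is assumed to exist upstream.

```python
import numpy as np

def dwell_minutes(profile_c, target=180.0, tol=5.0):
    """Longest consecutive run of minutes within +/- tol of the target temperature."""
    within = np.abs(np.asarray(profile_c) - target) <= tol
    best = run = 0
    for ok in within:
        run = run + 1 if ok else 0
        best = max(best, run)
    return best

def check_cure(observed_c, forecast_c, required_dwell=120):
    """Combine observed and forecast temperatures and validate the dwell constraint."""
    full_profile = np.concatenate([observed_c, forecast_c])
    dwell = dwell_minutes(full_profile)
    return ("OK", dwell) if dwell >= required_dwell else ("ALERT", dwell)

# Toy profiles: ramp observed so far, plus a forecast whose plateau is too short.
observed = np.linspace(25, 178, 90)             # first 90 minutes, ramping up
forecast = np.concatenate([np.full(60, 181.0),  # predicted plateau (too short)
                           np.linspace(181, 60, 120)])
print(check_cure(observed, forecast))           # -> ('ALERT', ...)
```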
A Perceived Risk Index Leveraging Social Media Data: Assessing Severity of Fire on Microblogging
Fires represent a significant threat to the environment, infrastructure, and human safety, often spreading rapidly with wide-ranging consequences such as economic losses and risks to life. Early detection and swift response to fire outbreaks are crucial to mitigating their impact. While satellite-based monitoring is effective, it may miss brief or indoor fires. This paper introduces a novel Perceived Risk Index (PRI) that, complementing satellite data, leverages social media data to provide insights into the severity of fire events. In light of the results of a statistical analysis, the PRI incorporates the number of fire-related tweets and the associated emotional expressions to gauge the perceived risk. The index's evaluation involves the development of a comprehensive system that collects, classifies, annotates, and correlates social media posts with satellite data, presenting the findings in an interactive dashboard. Experimental results obtained on diverse datasets of real fire tweets show an average best correlation of 77% between the PRI and the brightness values of fires detected by satellites. This correlation extends to the real intensity of the corresponding fires, showcasing the potential of social media platforms to furnish information for emergency response and decision-making. The proposed PRI proves to be a valuable tool for ongoing monitoring efforts, having the potential to capture data on fires missed by satellites and thereby contributing to the development of more effective strategies for mitigating the environmental, infrastructural, and safety impacts of fire events.
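An illustrative sketch of correlating a perceived-risk signal with satellite brightness: the index below simply combines tweet volume with a mean emotion-intensity score per event and then computes a Pearson correlation. This particular weighting and all numbers are assumptions for illustration; the paper's PRI formulation may differ.

```python
import numpy as np

# Toy per-event data: number of fire-related tweets and mean emotion intensity (0-1).
tweet_counts = np.array([12, 85, 40, 230, 5, 150])
emotion_scores = np.array([0.20, 0.75, 0.40, 0.90, 0.10, 0.65])
brightness = np.array([305.0, 340.0, 322.0, 367.0, 300.0, 351.0])  # satellite brightness (K)

# Simple perceived-risk index: log-scaled tweet volume weighted by emotional intensity.
pri = np.log1p(tweet_counts) * emotion_scores

# Pearson correlation between the index and the satellite brightness values.
correlation = np.corrcoef(pri, brightness)[0, 1]
print(f"PRI vs brightness correlation: {correlation:.2f}")
```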
Time-aware adaptive tweets ranking through deep learning
Generally, tweets about brands, news, and so forth are delivered to the Twitter user in reverse chronological order, chosen among those posted by the so-called followed users. Recently, Twitter has been addressing information overload by introducing new filtering features, such as "while you were away", which shows only a few tweets summarizing the posted ones and ranks tweets by quality in addition to timeliness. The strategy for ranking tweets so as to maximize user engagement and, possibly, increase the tweet and re-tweet rates is not unique: several dimensions affect the ranking, such as time, location, semantics, publisher authority, and quality. We argue that the tweet ranking model should vary according to the user's context and interests, and to how these change along the timeline, cyclically, weekly, or at the specific date and time when the user logs in. In this work, we introduce a deep learning method that re-adapts the ranking of tweets by preferring those that are more likely to be interesting for the user. The user's interests are extracted mainly from previous re-tweets and replies, also considering the time when they occurred. We evaluate the ranking model by measuring how many tweets that will be re-tweeted in the near future are included in the top-ranked tweet list. The results of the proposed ranking model reveal good performance, outperforming methods that consider only the reverse chronological order or the user's interest score. In addition, we point out that, on our dataset, the features with the greatest impact on the performance of the proposed ranking model are publisher authority, tweet content measures, and time-awareness.
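A minimal sketch of the evaluation idea: given a ranked list of candidate tweets and the set of tweets the user actually re-tweets shortly afterwards, measure how many of those future re-tweets appear in the top k of the ranking (a recall@k-style metric). The identifiers, cutoff k, and function name are illustrative assumptions, not the paper's exact protocol.

```python
def recall_at_k(ranked_tweet_ids, future_retweeted_ids, k=10):
    """Fraction of soon-to-be-retweeted tweets found in the top-k ranked list."""
    if not future_retweeted_ids:
        return 0.0
    top_k = set(ranked_tweet_ids[:k])
    hits = sum(1 for tid in future_retweeted_ids if tid in top_k)
    return hits / len(future_retweeted_ids)

# Toy example: the model ranks tweets t1..t20; the user later re-tweets t3, t7, and t15.
ranking = [f"t{i}" for i in range(1, 21)]
future_retweets = {"t3", "t7", "t15"}
print(recall_at_k(ranking, future_retweets, k=10))  # -> 0.666..., since t3 and t7 are in the top 10
```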
Toward reliable machine learning with Congruity: a quality measure based on formal concept analysis
The spread of machine learning (ML) and deep learning (DL) methods in different and critical application domains, such as medicine and healthcare, introduces many opportunities but also raises risks and ethical issues, mainly pertaining to the lack of transparency. This contribution deals with the lack of transparency of ML and DL models, focusing on the lack of trust in the predictions and decisions they generate. To this end, the paper establishes a measure, named Congruity, that provides information about the reliability of ML/DL model results. Congruity is defined on the lattice extracted through formal concept analysis built on the training data. It measures how close incoming data items are to those used at the training stage of the ML and DL models. The general idea is that the reliability of a trained model's results is highly correlated with the similarity between the input data and the training set. The objective of the paper is to demonstrate the correlation between Congruity and the well-known Accuracy of the whole ML/DL model. Experimental results reveal that the correlation between Congruity and Accuracy is greater than 80% across different ML models.
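A hedged sketch of the underlying formal concept analysis machinery: the training set is encoded as a binary formal context, and an incoming item is scored by how well its attribute set is reflected in the concepts derivable from the training data. The closure-based score and the toy attributes below are illustrative stand-ins for the paper's Congruity measure, not its exact definition.

```python
# Toy formal context: training objects and the binary attributes they hold.
training_context = {
    "obj1": {"fever", "cough"},
    "obj2": {"fever", "cough", "fatigue"},
    "obj3": {"cough", "fatigue"},
    "obj4": {"fever", "headache"},
}

def extent(attributes, context):
    """Objects in the context that share all the given attributes (derivation B')."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}

def intent(objects, context):
    """Attributes common to all the given objects (derivation A')."""
    if not objects:
        return set()
    return set.intersection(*(context[obj] for obj in objects))

def closure_score(item_attrs, context):
    """Illustrative closeness score: overlap between the item's attributes and
    the closure (B'') those attributes generate over the training context."""
    closed = intent(extent(item_attrs, context), context)
    if not item_attrs or not closed:
        return 0.0
    return len(item_attrs & closed) / len(item_attrs | closed)

print(closure_score({"fever", "cough"}, training_context))      # seen pattern -> high score
print(closure_score({"fever", "dizziness"}, training_context))  # unseen pattern -> 0.0
```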