SMOTE for regression
Several real-world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance that has already been studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works address this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks with sampling approaches. These approaches change the distribution of the given training data set to reduce the imbalance between the rare target cases and the most frequent ones. We present a modification of the well-known SMOTE algorithm that allows its use on these regression tasks. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals on these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous target variable.
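The abstract does not spell out how synthetic cases are built, so the following is only a minimal sketch of SMOTE-style over-sampling adapted to a continuous target, not the authors' implementation. It assumes that rare cases are flagged by a user-supplied predicate (`is_rare`, a hypothetical name), that new feature vectors are interpolated between a rare case and one of its nearest rare neighbours, and that the new target is a distance-weighted average of the two seed targets. Any under-sampling of the frequent cases is left out.

```python
import random


def smoter_oversample(X, y, is_rare, n_new_per_case=1, k=3, seed=0):
    """Create synthetic rare cases for a regression data set.

    X: list of numeric feature vectors, y: list of continuous targets.
    is_rare: hypothetical predicate deciding which targets count as rare.
    Returns (new_X, new_y) holding only the synthetic cases.
    """
    rng = random.Random(seed)
    rare = [(x, t) for x, t in zip(X, y) if is_rare(t)]
    if len(rare) < 2:
        return [], []  # interpolation needs at least two rare cases

    def dist(a, b):
        # Euclidean distance between two feature vectors
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    new_X, new_y = [], []
    for i, (x, t) in enumerate(rare):
        others = [rare[j] for j in range(len(rare)) if j != i]
        neighbours = sorted(others, key=lambda c: dist(x, c[0]))[:k]
        for _ in range(n_new_per_case):
            nx, nt = rng.choice(neighbours)
            frac = rng.random()
            # new point on the segment between the case and its neighbour
            synth = [xi + frac * (ni - xi) for xi, ni in zip(x, nx)]
            d1, d2 = dist(synth, x), dist(synth, nx)
            if d1 + d2 == 0:
                target = (t + nt) / 2
            else:
                # the closer seed gets the larger weight
                target = (d2 * t + d1 * nt) / (d1 + d2)
            new_X.append(synth)
            new_y.append(target)
    return new_X, new_y
```

The synthetic cases would be appended to the training set before fitting any regression algorithm, which is what makes a sampling approach independent of the learner.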
A Benchmark dataset for predictive maintenance
The paper describes the MetroPT data set, an outcome of an eXplainable
Predictive Maintenance (XPM) project with an urban metro public transportation
service in Porto, Portugal. The data was collected in 2022 with the aim of
evaluating machine learning methods for online anomaly detection and failure
prediction. By capturing several analog sensor signals (pressure,
temperature, current consumption), digital signals (control signals, discrete
signals), and GPS information (latitude, longitude, and speed), we provide a
dataset that can be easily used to evaluate online machine learning methods.
This dataset contains some interesting characteristics and can be a good
benchmark for predictive maintenance models.
Are the States United? An analysis of US hotels’ offers through TripAdvisor’s eyes
This empirical data-driven research aims to unveil thought-provoking insights on the U.S. hotel offer across its 50 states. Information on more than 30,000 hotels was collected through web scraping from TripAdvisor. Using such data, 50 support vector machine models were trained to model the TripAdvisor score, one per state, to assess the convergent and divergent factors in customer satisfaction across all the U.S. states. A conceptual model is proposed and validated through the data-driven support vector machine models developed for each state to identify convergent features across the states that explain customer satisfaction (here represented by the TripAdvisor score). The effects of hotel size, price, and stars are not moderated by location (expressed by the corresponding state), although these features strongly influence satisfaction, whereas both the number of published photos and the amenities are affected by location. Thus, adaptation issues were found regarding amenities and published photos within each state's offer.
Leveraging national tourist offices through data analytics
Purpose
This study aims to propose a data-driven approach, based on open-source tools, that makes it possible to understand customer satisfaction with the accommodation offer of a whole country.
Design/methodology/approach
The method starts by extracting information from all hotels of Portugal available at TripAdvisor through Web scraping. Then, a support vector machine is adopted for modeling the TripAdvisor score, which is considered a proxy of customer satisfaction. Finally, knowledge extraction from the model is achieved using sensitivity analysis to unveil the influence of features on the score.
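The abstract does not say which form of sensitivity analysis was used, so the sketch below assumes a simple one-at-a-time variant: each feature is swept over a grid while the others stay at a baseline, and the spread of the predicted score is taken as that feature's relevance. The function names, the feature names and the toy model are all hypothetical; in practice `predict` would wrap the trained support vector machine.

```python
def sensitivity(predict, baseline, grids):
    """One-at-a-time sensitivity analysis (an assumed variant).

    predict: callable taking a dict of feature values, returning a score.
    baseline: dict with a reference value for every feature.
    grids: dict mapping each feature to the values it is swept over.
    Returns each feature's share of the total prediction spread.
    """
    spreads = {}
    for name, values in grids.items():
        preds = []
        for v in values:
            point = dict(baseline)  # copy so the baseline stays untouched
            point[name] = v
            preds.append(predict(point))
        spreads[name] = max(preds) - min(preds)
    total = sum(spreads.values())
    return {n: (s / total if total else 0.0) for n, s in spreads.items()}


def toy_score(hotel):
    # Invented stand-in for a fitted model; the coefficients are NOT
    # results from the study, they only make the example runnable.
    return (4.0 - 0.004 * hotel["rooms"]
            + 0.002 * hotel["min_price"]
            + 0.0 * hotel["latitude"])


baseline = {"rooms": 100, "min_price": 80, "latitude": 40}
grids = {
    "rooms": [10, 100, 300],
    "min_price": [40, 80, 200],
    "latitude": [37, 40, 42],
}
relevance = sensitivity(toy_score, baseline, grids)
```

A feature whose sweep barely moves the prediction ends up with a share near zero, which is how such an analysis can flag a feature as holding little relevance.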
Findings
The model of the TripAdvisor score achieved a mean absolute percentage error of around 5 per cent, proving the value of modeling the extracted data. The number of rooms of the unit and the minimum price are the two most relevant features, showing that customers appreciate smaller and more expensive units, whereas the location of the hotel does not hold significant relevance.
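For reference, the mean absolute percentage error averages the absolute errors relative to the true values. A minimal sketch follows; the function name and the sample numbers are illustrative and not taken from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up scores: each prediction is off by 5% of the true value.
example_error = mape([4.0, 5.0], [3.8, 5.25])
```

Because the error is divided by the true value, the metric suits targets that cannot be zero, such as a review score with a minimum of one.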
Originality/value
National tourist offices can use the proposed approach to understand what drives tourists’ satisfaction, helping to shape a country’s strategy. For example, licensing new hotels may take into account the unit size and other characteristics that make it more attractive to tourists. Furthermore, the procedure can be replicated at any time and in any country, making it a valuable tool for data-driven decision support on a national scale.
The mediating role of self-criticism, experiential avoidance and negative urgency on the relationship between ED-related symptoms and difficulties in emotion regulation
Objective: Difficulties in emotion regulation are thought to play a transdiagnostic role across eating disorders (ED). In the current study, we used a path analysis to explore the mediating role of self-criticism, experiential avoidance and negative urgency in the relationship between ED-related symptoms and dimensions of difficulties in emotion regulation. Method: Participants were 103 female outpatients recruited at a Portuguese ED hospital unit, diagnosed with an ED, aged 14–60 years (M = 28.0, SD = 10.5), with a body mass index (BMI) ranging from 11.72 to 39.44 (M = 20.1, SD = 5.4). Results: The path analysis resulted in a model with an adequate fit to the data (SRMR = 0.05; RMSEA = 0.07 [0.00, 0.12], PCLOSE = 0.269; TLI = 0.97; IFI = 0.99; GFI = 0.95). A final model, in which the relationship between ED-related symptoms and dimensions of difficulties in emotion regulation was mediated by self-criticism, experiential avoidance and negative urgency, accounted for 71% of the variance in strategies, 57% in non-acceptance, 62% in impulses, 56% in goals and 20% in clarity. Conclusion: Results suggest that self-criticism, experiential avoidance and negative urgency, combined, are relevant in the relationship between ED-related symptoms and difficulties in emotion regulation. ED treatment and emotion regulation skills may be enhanced through the inclusion of specific components that target self-criticism, experiential avoidance and negative urgency, as they become prominent during the therapeutic process. Funding: FCT - Fundação para a Ciência e a Tecnologia (POCI‐01‐0145‐FEDER‐028145)