10 research outputs found

    Sentiment analysis of health care tweets: review of the methods used.

    BACKGROUND: Twitter is a microblogging service where users can send and read short 140-character messages called "tweets." Many unstructured, free-text tweets relating to health care are shared on Twitter, which is becoming a popular source for health care research. Sentiment is a metric commonly used to investigate the positive or negative opinion within these messages. Exploring the methods used for sentiment analysis in Twitter health care research may allow us to better understand the options available for future research in this growing field.
    OBJECTIVE: The first objective of this study was to identify the tools available for sentiment analysis in Twitter health care research, by reviewing existing studies in this area and the methods they used. The second objective was to determine which methods work best in health care settings, by analyzing how the methods were used to answer specific health care questions, how the tools were produced, and how their accuracy was assessed.
    METHODS: A review was conducted of the literature pertaining to Twitter and health care research that used a quantitative method of sentiment analysis for the free-text messages (tweets). The study compared the types of tools used in each case and examined methods for tool production, tool training, and analysis of accuracy.
    RESULTS: A total of 12 papers studying the quantitative measurement of sentiment in the health care setting were found. More than half of these studies produced tools specifically for their research, 4 used freely available open source tools, and 2 used commercially available software. Moreover, 4 of the 12 tools were trained using a smaller sample of the study's final data; on average, the sentiment method was trained against 0.45% (2816/627,024) of the total sample data. Only 1 of the 12 papers commented on the accuracy of the tool used.
    CONCLUSIONS: Multiple methods are used for sentiment analysis of tweets in the health care setting, ranging from self-produced basic categorizations to more complex and expensive commercial software. The open source and commercial methods were developed on product reviews and generic social media messages, and none of them has been extensively tested against a corpus of health care messages to check its accuracy. This study suggests a need for an accurate, tested tool for sentiment analysis of tweets, trained on a health care-specific corpus of manually annotated tweets.
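
    As a concrete illustration of the kind of freely available, general-purpose tool the review discusses, the following is a minimal sketch that scores tweet text with NLTK's VADER analyzer. VADER is named here as an example only (the review does not necessarily evaluate it), and the sample tweets are invented; the general-purpose lexicon is exactly the limitation the review raises, since it was not built or validated on health care messages.

    # Minimal sketch: lexicon-based sentiment scoring of tweet text with NLTK's
    # VADER analyzer. Sample tweets below are invented for illustration.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download

    tweets = [
        "Finally got my flu shot, quick and painless!",
        "Third day in the ER waiting room. This is exhausting.",
    ]

    analyzer = SentimentIntensityAnalyzer()
    for tweet in tweets:
        scores = analyzer.polarity_scores(tweet)  # neg/neu/pos plus compound in [-1, 1]
        label = "positive" if scores["compound"] >= 0.05 else (
            "negative" if scores["compound"] <= -0.05 else "neutral")
        print(f"{label:8s} {scores['compound']:+.2f}  {tweet}")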

    Exploiting Action Categories in Learning Complex Games


    Potato yield prediction using machine learning techniques and Sentinel 2 data

    Traditional potato growth models exhibit certain limitations, such as the cost of obtaining the input data required to run them, the lack of spatial information in some instances, or the quality of the input data itself. To address these issues, we develop a model to predict potato yield using satellite remote sensing. In an effort to offer a good predictive model that improves the state of the art in potato precision agriculture, we use images from the twin Sentinel 2 satellites (European Space Agency, Copernicus Programme) over three growing seasons, applying different machine learning models. First, we fitted nine machine learning algorithms with various pre-processing scenarios using variables from July, August and September based on the red, red-edge and infrared bands of the spectrum. Second, we selected the best performing models and evaluated them against independent test data. Finally, we repeated the previous two steps using only variables corresponding to July and August. Our results showed that the feature selection step proved vital during data pre-processing in order to reduce multicollinearity among predictors. The Regression Quantile Lasso model (11.67% Root Mean Square Error, RMSE; R² = 0.88; 9.18% Mean Absolute Error, MAE) and the Leap Backwards model (10.94% RMSE, R² = 0.89, 8.95% MAE) performed better when predictors with a correlation coefficient > 0.5 were removed from the dataset. In contrast, the Support Vector Machine Radial (svmRadial) performed better with no feature selection (11.7% RMSE, R² = 0.93, 8.64% MAE). In addition, we used a random forest model to predict potato yields in Castilla y León (Spain) 1–2 months prior to harvest, and obtained satisfactory results (11.16% RMSE, R² = 0.89, 8.71% MAE). These results demonstrate the suitability of our models for predicting potato yields in the region studied.
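
    To make the pre-processing and modelling step concrete, the following is a minimal sketch that drops one feature from every highly correlated pair and then fits an L1-penalized quantile regressor. The synthetic data stands in for the Sentinel 2 band features, scikit-learn's QuantileRegressor is used as a stand-in for the paper's Regression Quantile Lasso, and the 0.5 correlation threshold mirrors the abstract; all other values are illustrative.

    # Sketch: correlation-based feature filter (|r| > 0.5) followed by an
    # L1-penalized median (quantile 0.5) regression. Data are synthetic.
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import QuantileRegressor
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(300, 9)),
                     columns=[f"band_{i}" for i in range(9)])
    X["band_1"] = 0.9 * X["band_0"] + 0.1 * rng.normal(size=300)  # deliberately collinear
    y = 30 + 4 * X["band_0"] - 3 * X["band_3"] + rng.normal(scale=2, size=300)

    # Correlation filter: drop the later feature of any pair with |r| > 0.5.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > 0.5).any()]
    X = X.drop(columns=to_drop)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = QuantileRegressor(quantile=0.5, alpha=0.01, solver="highs").fit(X_tr, y_tr)

    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"RMSE={rmse:.2f}  R2={r2_score(y_te, pred):.2f}  "
          f"MAE={mean_absolute_error(y_te, pred):.2f}")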

    Exploring the Evolution of New Mobile Services


    Energy Data Analytics for Smart Meter Data

    The principal advantage of smart electricity meters is their ability to transfer digitized electricity consumption data to remote processing systems. The data collected by these devices make many novel use cases possible, providing benefits to electricity providers and customers alike. This book includes 14 research articles that explore and exploit the information content of smart meter data, and provides insights into the realization of new digital solutions and services that support the transition towards a sustainable energy system. This volume has been edited by Andreas Reinhardt, head of the Energy Informatics research group at Technische Universität Clausthal, Germany, and Lucas Pereira, research fellow at Técnico Lisboa, Portugal.

    Predictive and reactive multi-objective optimization for the production control of a matrix production system

    An increasingly diverse production program with uncertain volumes makes it difficult to operate production systems economically. If product individualization causes different processing times at the production stations, cycle time losses arise. Fluctuations in the shares of product variants can additionally lead to dynamic bottlenecks. The concept of matrix production pursues a more flexible production structure by dissolving rigid chaining and cycle-time coupling, and by using redundant multi-purpose stations. These measures allow production control to vary the sequence of operations within the limits of the precedence graph and to adapt the route of each job. A reactive multi-objective control is required to exploit these degrees of freedom and to meet the various objectives of production systems. Using domain knowledge in the optimization can increase efficiency for specific problems; however, given the variety of production systems and objectives, production control should be able to adapt itself autonomously to the respective use case and objectives. Since the durations of processing, transport, and setup operations are important inputs for production control, a method for determining realistic values is needed. Owing to the complexity of the control decision, heuristics are best suited; in particular, Monte Carlo Tree Search (MCTS), as an iterative search-tree method, has good properties for use as a reactive production control. Until now, however, approaches meeting the requirements of controlling a matrix production have been lacking. In this work, a reactive multi-objective control based on MCTS is developed for the production control of a matrix production, taking setup and transport operations into account. In addition, a post-optimization based on local search is integrated into the MCTS procedure. To quickly reach high solution quality for different objectives and production systems, two methods for autonomous adaptation of the production control are developed. To ensure the accuracy of the durations used in production control, a method for deriving and updating the underlying distributions is presented. Detailed evaluations on various use cases show that the production control is able to optimize different objectives successfully. The methods for autonomous adaptation also lead to a faster increase in solution quality. Comparison with optimal reference solutions and with benchmark problems from the literature likewise confirms the high solution quality. Application to a real-world industrial example demonstrates the behavior of the production control under failures and deviations. This work examines in detail the behavior of the production control and the influence of the developed methods on the attainability of the different objectives, the growth of solution quality, and the absolute solution quality achieved.
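
    For background on the search method the thesis builds on, the following is a minimal, generic Monte Carlo Tree Search skeleton with UCT selection and random rollouts. It is a textbook sketch, not the thesis's multi-objective variant with local-search post-optimization, and the state interface (legal_actions, step, is_terminal, reward) is an assumption for illustration.

    # Generic, minimal MCTS skeleton: UCT selection, one-node expansion,
    # random rollout, and backpropagation of the terminal reward.
    import math
    import random

    class Node:
        def __init__(self, state, parent=None, action=None):
            self.state, self.parent, self.action = state, parent, action
            self.children, self.visits, self.value = [], 0, 0.0
            self.untried = list(state.legal_actions())

        def uct_child(self, c=1.4):
            # Child maximizing the UCT score (exploitation + exploration).
            return max(self.children, key=lambda n: n.value / n.visits
                       + c * math.sqrt(math.log(self.visits) / n.visits))

    def mcts(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1) Selection: descend while fully expanded and non-terminal.
            while not node.untried and node.children:
                node = node.uct_child()
            # 2) Expansion: try one untried action, if any remain.
            if node.untried:
                action = node.untried.pop()
                child = Node(node.state.step(action), parent=node, action=action)
                node.children.append(child)
                node = child
            # 3) Simulation: random rollout to a terminal state.
            state = node.state
            while not state.is_terminal():
                state = state.step(random.choice(state.legal_actions()))
            reward = state.reward()
            # 4) Backpropagation: update statistics up to the root.
            while node is not None:
                node.visits += 1
                node.value += reward
                node = node.parent
        return max(root.children, key=lambda n: n.visits).action  # most-visited action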

    Water requirements and cereal yields in the southern Mediterranean: observation, seasonal forecasting, and the impact of climate change

    The agricultural sector is one of the pillars of the Moroccan economy. In addition to contributing 15% of GDP and providing 35% of employment opportunities, it has an impact on growth rates, which are affected, negatively or positively, by climatic conditions and rainfall in particular. During drought years, characterized by a decline in agricultural production and in particular cereal production, the growth of the Moroccan economy was severely affected and the kingdom's food imports increased significantly. In this context, it is important to assess the impact of agricultural drought on cereal yields and to develop early yield prediction models, as well as to determine the future impact of climate change on wheat yield and water requirements. The aim of this work is, first, to further understand the link between cereal yield and agricultural drought in Morocco. To detect drought, we used agricultural drought indices derived from different satellite data, together with the outputs of a Land Data Assimilation System (LDAS). Second, we developed empirical models for early prediction of cereal yields at the provincial scale. To achieve this goal, we built forecasting models using multi-source data as predictors, including remote sensing-based indices, weather data, and regional climate indices. To build these models, we relied on machine learning algorithms such as Multiple Linear Regression (MLR), Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boost (XGBoost). Finally, we evaluated the impact of climate change on wheat yield and its water requirements. To do this, we relied on five regional climate models available in the Med-CORDEX database under two scenarios (RCP4.5 and RCP8.5), together with the AquaCrop model, and we considered three periods: the reference period 1991-2010, a second period 2041-2060, and a third period 2081-2100. The results showed a close correlation between cereal yield and the drought indices related to canopy condition during the heading stage (March and April), to surface temperature during the development stage (January-February), and to soil moisture during the emergence stage (November-December). The results also showed that the LDAS outputs are able to accurately monitor agricultural drought. Concerning cereal yield forecasting, the results showed that combining data from multiple sources outperformed models based on a single data set. In this context, XGBoost was able to predict cereal yield as early as January (about four months before harvest) with satisfactory statistical metrics (R² = 0.88 and RMSE = 0.22 t ha^-1). Regarding the impact of climate change on wheat yield and water requirements, the results showed that the increase in air temperature will shorten the wheat growth cycle by about 50 days. The results also showed a decrease in wheat yield of up to 30% if the rise in CO2 is not taken into account. However, the CO2 fertilization effect can offset the yield losses, and yield can increase by up to 27%. Finally, water requirements are expected to decrease by 13 to 42%, and this decrease is associated with a change in the irrigation calendar, with peak requirements coming two months earlier than under current conditions.
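
    To make the forecasting step concrete, the following is a minimal sketch of an XGBoost regressor trained on multi-source predictors available by January. The feature names, synthetic data, and hyperparameters are invented stand-ins, not the thesis's actual inputs or tuned values.

    # Sketch: early-season cereal yield forecast from multi-source predictors.
    # All features and data below are synthetic stand-ins for illustration.
    import numpy as np
    import pandas as pd
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split
    from xgboost import XGBRegressor

    rng = np.random.default_rng(42)
    n = 400
    X = pd.DataFrame({
        "ndvi_dec": rng.uniform(0.2, 0.8, n),         # remote sensing index
        "soil_moisture_nov": rng.uniform(0.1, 0.4, n),
        "rainfall_oct_jan": rng.uniform(50, 350, n),  # weather data (mm)
        "nao_index": rng.normal(0, 1, n),             # regional climate index
    })
    y = (0.5 + 3.0 * X["ndvi_dec"] + 0.004 * X["rainfall_oct_jan"]
         + rng.normal(0, 0.2, n))                     # yield, t/ha (synthetic)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
    model.fit(X_tr, y_tr)

    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"R2 = {r2_score(y_te, pred):.2f}, RMSE = {rmse:.2f} t/ha")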

    Low-resource learning in complex games

    This project is concerned with learning to take decisions in complex domains, in games in particular. Previous work assumes that massive data resources are available for training, but aside from a few very popular games this is generally not the case, and the state of the art in such circumstances is to rely extensively on hand-crafted heuristics. Human players, on the other hand, are able to learn quickly from only a handful of examples, exploiting specific characteristics of the learning problem to accelerate their learning process. Designing algorithms that function in a similar way is an open area of research with many applications to today's complex decision problems. One solution presented in this work is to design learning algorithms that exploit the inherent structure of the game. Specifically, we take into account how the action space can be clustered into sets called types and exploit this characteristic to improve planning at decision time. Action types can also be leveraged to extract high-level strategies from a sparse corpus of human play, which generates more realistic trajectories during planning and further improves performance. Another approach that proved successful is using an accurate model of the environment to reduce the complexity of the learning problem. Similar to how human players have an internal model of the world that allows them to focus on the relevant parts of the problem, we decouple learning to win from learning the rules of the game, thereby making supervised learning more data efficient. Finally, in order to handle the partial observability usually encountered in complex games, we propose an extension to Monte Carlo Tree Search that plans in the Belief Markov Decision Process. We found that this algorithm does not outperform the state-of-the-art models on our chosen domain. Our error analysis indicates that the method struggles to handle the high uncertainty of the conditions required for the game to end. Furthermore, our relaxed belief model can cause rollouts in the belief space to be inaccurate, especially in complex games. We assess the proposed methods in an agent playing the highly complex board game Settlers of Catan. Building on previous research, our strongest agent combines planning at decision time with prior knowledge extracted from an available corpus of general human play; but unlike that prior work, our human corpus consists of only 60 games, as opposed to many thousands. Our agent defeats the current state-of-the-art agent by a large margin, showing that the proposed modifications aid in exploiting general human play in highly complex games.
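
    To illustrate the action-type idea in a generic form, the following is a hypothetical sketch that estimates a prior over action types from a small corpus of human play and samples rollout actions type-first. The corpus format, type labels, and smoothing scheme are invented for illustration; this is not the thesis's exact algorithm.

    # Sketch: bias rollout action selection with a type-level prior estimated
    # from a sparse corpus of human play. All names below are hypothetical.
    import random
    from collections import Counter

    # Hypothetical sparse corpus: the type of each action a human chose.
    human_corpus = ["build", "trade", "build", "develop", "trade", "build"]

    def type_prior(corpus, all_types, alpha=1.0):
        # Laplace-smoothed distribution over action types.
        counts = Counter(corpus)
        total = len(corpus) + alpha * len(all_types)
        return {t: (counts[t] + alpha) / total for t in all_types}

    def sample_action(legal_actions, action_type, prior):
        # Sample a type in proportion to the prior (over types currently
        # available), then an action uniformly within that type.
        by_type = {}
        for a in legal_actions:
            by_type.setdefault(action_type(a), []).append(a)
        types = list(by_type)
        weights = [prior[t] for t in types]
        chosen = random.choices(types, weights=weights, k=1)[0]
        return random.choice(by_type[chosen])

    # Usage with toy actions tagged by type:
    prior = type_prior(human_corpus, all_types=["build", "trade", "develop"])
    actions = [("build", "road"), ("trade", "wood->brick"), ("develop", "card")]
    print(sample_action(actions, action_type=lambda a: a[0], prior=prior))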