65 research outputs found

    Attitudes expressed in online comments about environmental factors in the tourism sector: an exploratory study

    The aim of this exploratory study is to identify the positive, neutral and negative environmental factors that affect users who visit Spanish hotels, in order to help hotel managers decide how to improve the quality of the services provided. To carry out the research, a sentiment analysis was first performed, grouping the sample of tweets (n = 14459) according to the feelings shown; a textual analysis was then used to identify the key environmental factors behind these feelings, using the qualitative analysis software Nvivo (QSR International, Melbourne, Australia). The results of the exploratory study present the key environmental factors that affect the users' experience when visiting hotels in Spain, such as actions that support local traditions and products, the maintenance of rural areas respecting the local environment and nature, or respecting air quality in the areas where hotels have facilities and offer services. The conclusions of the research can help hotels improve their services and their impact on the environment, as well as improve the visitors' experience, based on the positive, neutral and negative environmental factors that the visitors themselves identified.

    Identifying the urban space for locals and tourists through “Foursquare” data in Barcelona


    Social search in collaborative tagging networks : the role of ties

    [no abstract]

    Can we predict a riot? Disruptive event detection using Twitter

    In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook, and YouTube. In these highly interactive systems, the general public are able to post real-time reactions to “real world” events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task but would be of high value to public safety organisations such as local police, who need to respond accordingly. To address this challenge, we present an end-to-end integrated event detection framework that comprises five main components: data collection, pre-processing, classification, online clustering, and summarization. The integration between classification and clustering enables events to be detected, as well as related smaller-scale “disruptive events,” smaller incidents that threaten social safety and security or could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely temporal, spatial, and textual content. We evaluate our framework on a large-scale, real-world dataset from Twitter. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We use ground-truth data based on intelligence gathered by the London Metropolitan Police Service, which provides a record of actual terrestrial events and incidents during the riots, and show that our system can perform as well as terrestrial sources, and even better in some cases
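As a rough illustration of the online clustering component mentioned in the abstract above, the sketch below assigns each incoming post to the most similar existing cluster or opens a new one. The Jaccard similarity over word sets, the 0.3 threshold, the helper names, and the sample posts are all assumptions chosen for the example; the abstract does not specify the algorithm the authors use.

```python
def tokens(text):
    """Lowercase word set; a simple stand-in for real pre-processing."""
    return {w.strip(".,!?#@").lower() for w in text.split()} - {""}

def jaccard(a, b):
    """Share of words two sets have in common (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def online_cluster(posts, threshold=0.3):
    """Single-pass clustering: each post is seen once, in arrival order."""
    clusters = []  # each cluster: {"vocab": set of words, "posts": [str]}
    for post in posts:
        words = tokens(post)
        best = max(clusters, key=lambda c: jaccard(words, c["vocab"]), default=None)
        if best is not None and jaccard(words, best["vocab"]) >= threshold:
            best["posts"].append(post)   # similar enough: join the cluster
            best["vocab"] |= words
        else:
            clusters.append({"vocab": set(words), "posts": [post]})
    return clusters

# Hypothetical stream of posts, in arrival order.
stream = [
    "Fire reported on High Street",
    "Huge fire on High Street right now",
    "Traffic jam near the bridge",
    "Fire crews arriving on High Street",
]
events = online_cluster(stream)
```

With these sample posts, the three fire-related messages end up in one cluster and the traffic message in another. A real system would also weigh the temporal and spatial features that the abstract lists.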

    Identifying urban space by residents and tourists through “Foursquare” data in Barcelona

    Barcelona is one of the world's major tourist cities. According to the Annual Report of Tourism of Barcelona (2014), more than 7.5 million tourists visited the city that year. Studies of tourism in Barcelona are numerous; however, the activities and land uses of tourists and locals have rarely been compared. In fact, tourism may be a dominant factor in urban development as well as a source of social conflict. It is therefore crucial to understand how tourists and residents coexist in a tourist city. The main objective of the study is to identify tourist users and local users through their Foursquare behavior. It also explores the differences in geospatial activity and POI usage between the two groups. The analysis period runs from April 2012 to September 2013, based on the monitoring span of the Foursquare data. After filtering, the dataset contains 80,936 check-ins from 4,250 Foursquare users and 13,887 Foursquare POIs in Barcelona. The geographic range of the data roughly covers the central conurbation of the metropolitan area of Barcelona. The methodology has four parts. The first step is to select behavioral indicators and standardize them. The second step consists of selecting two short-period samples and classifying their users into tourists and locals by K-means clustering. After manual examination of the initial result, a classification threshold is introduced to improve it. Finally, the same identification method is applied to the whole dataset. According to the results, the difference in POI usage confirms that the identification is effective: it reflects the typical activities of tourists and of locals in the city. The POIs most visited by tourists are outdoor resorts, transport, restaurants, hotels, and stores. The corresponding ranking for locals is restaurants, workplaces, outdoor resorts, educational places, and transport.
Moreover, the two groups show different Foursquare behavior regardless of the length of the analysis period. In general, the indicators for tourists (stay duration, number of check-ins, and total travel distance) are smaller than those for the local group. K-means clustering can effectively identify users whose attributes take extreme values. However, manual intervention is unavoidable for users without extreme characteristics. The geospatial distribution and active time also differ between locals and tourists. In terms of movement scale, tourists appear more concentrated than residents. With regard to active time, tourists' active period is similar every day, whereas locals show evident daily and weekly periodic variation. This paper has several limitations. First, Foursquare data are biased: a high proportion of check-ins are at restaurants because Foursquare aims to provide practical information about places. In addition, the lack of demographic information about users, a consequence of the privacy policy, limits the scope of the study. In sum, this study demonstrates that it is possible to distinguish tourists from locals using Foursquare data, although the uncertainty of the data is acknowledged. How to improve the accuracy of the unsupervised identification and how to combine it with other datasets will be the object of further investigation. Whether the identification model can be applied universally is another question worth testing in the future.
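The identification steps described in the abstract above (standardize the behavioral indicators, split users with K-means, then apply a classification threshold) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sample values, the deterministic centroid initialization, and the `margin` rule standing in for the threshold are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so indicators on different scales are comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against zero spread
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_means(points, iterations=50):
    """Plain K-means with k=2. Assumption: centroids start at the points with
    the smallest and largest coordinate sums, so the run is deterministic."""
    ordered = sorted(points, key=sum)
    centroids = [ordered[0][:], ordered[-1][:]]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if dist(p, centroids[0]) <= dist(p, centroids[1]) else 1
            groups[nearest].append(p)
        new = [[mean(c) for c in zip(*g)] if g else centroids[i]
               for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids

def classify(point, centroids, margin=0.2):
    """Label a user, or return 'uncertain' when both centroids are about
    equally far away. The margin rule is a hypothetical threshold."""
    d_low, d_high = dist(point, centroids[0]), dist(point, centroids[1])
    if abs(d_low - d_high) <= margin * (d_low + d_high):
        return "uncertain"
    # Centroid 0 is the low-activity cluster, read here as tourists.
    return "tourist" if d_low < d_high else "local"

# Hypothetical users: (stay duration in days, check-ins, travel distance in km)
users = [(3, 5, 12.0), (2, 4, 8.0), (4, 6, 15.0),
         (300, 120, 900.0), (280, 150, 1100.0), (320, 110, 950.0)]
scaled = standardize(users)
centroids = two_means(scaled)
labels = [classify(p, centroids) for p in scaled]
```

Users labelled "uncertain" correspond to the cases without extreme characteristics, for which the abstract says manual intervention is needed.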

    Event detection in social networks


    Sampling Techniques to Overcome Class Imbalance in a Cyberbullying Context

    The majority of datasets suffer from class imbalance, where samples of a dominant class significantly outnumber the samples available for the minority class that is to be detected. Prediction and classification machine learning models work best when there are roughly equal numbers of each class type. This paper explores sampling techniques that can be used to overcome this class imbalance problem in a cyberbullying context. A newly classified cyberbullying dataset, including detailed descriptions of the criteria used in its classification, was used to examine the feasibility of applying text mining techniques to automate the detection of cyberbullying text when the dataset shows a significant class imbalance between the positive (cyberbullying) samples and the negative (not cyberbullying) samples. The paper investigates whether oversampling the minority positive class or undersampling the majority negative class affects the performance of a prediction model. A compromise solution, in which the positive class is partially oversampled and the negative class is partially undersampled, is also examined. Although not strictly a class imbalance solution, sampling using the most frequently observed features was also explored.
    Colton, D.; Hofmann, M. (2019). Sampling Techniques to Overcome Class Imbalance in a Cyberbullying Context. Journal of Computer-Assisted Linguistic Research. 3(3):21-40. https://doi.org/10.4995/jclr.2019.11112
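To make the sampling strategies described in the abstract above concrete, the sketch below shows random oversampling of the minority class, random undersampling of the majority class, and a partial compromise between the two. It is an illustrative assumption, not the procedure used by Colton and Hofmann: the function name `resample`, the `target` parameter, and the toy data are hypothetical, and the selection is purely random.

```python
import random

def resample(majority, minority, target, seed=0):
    """Return (majority, minority) lists that each contain `target` items.

    A class larger than `target` is undersampled without replacement;
    a class smaller than `target` is oversampled by duplicating random items.
    """
    rng = random.Random(seed)

    def fit(items):
        if len(items) >= target:
            return rng.sample(items, target)              # undersample
        extra = rng.choices(items, k=target - len(items))
        return list(items) + extra                        # oversample

    return fit(majority), fit(minority)

# Hypothetical toy data with a 9:1 class imbalance.
negatives = [f"neg_{i}" for i in range(90)]   # not cyberbullying (majority)
positives = [f"pos_{i}" for i in range(10)]   # cyberbullying (minority)

# Full oversampling: grow the minority class to the majority size.
over = resample(negatives, positives, target=len(negatives))
# Full undersampling: shrink the majority class to the minority size.
under = resample(negatives, positives, target=len(positives))
# Compromise: partially undersample one class and oversample the other.
mixed = resample(negatives, positives, target=50)
```

Resampling of this kind is normally applied to the training split only, so that evaluation still reflects the original class distribution.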