515 research outputs found

    The role of emotional variables in the classification and prediction of collective social dynamics

    We demonstrate the power of data mining techniques for the analysis of collective social dynamics within British Tweets during the Olympic Games 2012. The classification accuracy of online activities related to the successes of British athletes significantly improved when emotional components of tweets were taken into account, but employing emotional variables for activity prediction decreased the classifiers' quality. The approach could be easily adopted for any prediction or classification study with a set of problem-specific variables. Comment: 16 pages, 9 figures, 2 tables and 1 appendix.

    COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts

    © 2020 The Authors. Published by MIT Press. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher’s website: https://doi.org/10.1162/qss_a_00066 The COVID-19 pandemic requires a fast response from researchers to help address biological, medical and public health issues to minimize its impact. In this rapidly evolving context, scholars, professionals and the public may need to quickly identify important new studies. In response, this paper assesses the coverage of scholarly databases and impact indicators during 21 March to 18 April 2020. The rapidly increasing volume of research is particularly accessible through Dimensions, and less through Scopus, the Web of Science, and PubMed. Google Scholar’s results included many false matches. A few COVID-19 papers from the 21,395 in Dimensions were already highly cited, with substantial news and social media attention. For this topic, in contrast to previous studies, there seems to be a high degree of convergence between articles shared in the social web and citation counts, at least in the short term. In particular, articles that are extensively tweeted on the day first indexed are likely to be highly read and relatively highly cited three weeks later. Researchers needing wide scope literature searches (rather than health focused PubMed or medRxiv searches) should start with Dimensions (or Google Scholar) and can use tweet and Mendeley reader counts as indicators of likely importance.

    Early Mendeley readers correlate with later citation counts

    This is an accepted manuscript of an article published by Springer in Scientometrics on 26/03/2018, available online: https://doi.org/10.1007/s11192-018-2715-9 The accepted version of the publication may differ from the final published version. Counts of the number of readers registered in the social reference manager Mendeley have been proposed as an early impact indicator for journal articles. Although previous research has shown that Mendeley reader counts for articles tend to have a strong positive correlation with synchronous citation counts after a few years, no previous studies have compared early Mendeley reader counts with later citation counts. In response, this first diachronic analysis compares reader counts within a month of publication with citation counts after 20 months for ten fields. There were moderate or strong correlations in eight out of ten fields, with the two exceptions being the smallest categories (n=18, 36) with wide confidence intervals. The correlations are higher than the correlations between later citations and early citations, showing that Mendeley reader counts are more useful early impact indicators than citation counts.

    Opinion Mining on Non-English Short Text

    As the type and the number of such venues increase, automated analysis of sentiment on textual resources has become an essential data mining task. In this paper, we investigate the problem of mining opinions on the collection of informal short texts. Both positive and negative sentiment strength of texts are detected. We focus on a non-English language that has few resources for text mining. This approach would help enhance the sentiment analysis in languages where a list of opinionated words does not exist. We propose a new method that projects the text into dense and low dimensional feature vectors according to the sentiment strength of the words. We detect the mixture of positive and negative sentiments on a multi-variant scale. Empirical evaluation of the proposed framework on Turkish tweets shows that our approach gets good results for opinion mining.

    Emotional persistence in online chatting communities

    How do users behave in online chatrooms, where they instantaneously read and write posts? We analyzed about 2.5 million posts covering various topics in Internet relay channels, and found that user activity patterns follow known power-law and stretched exponential distributions, indicating that online chat activity is not different from other forms of communication. Analysing the emotional expressions (positive, negative, neutral) of users, we revealed a remarkable persistence both for individual users and channels. That is, despite their anonymity, users tend to follow social norms in repeated interactions in online chats, which results in a specific emotional "tone" of the channels. We provide an agent-based model of emotional interaction, which recovers qualitatively both the activity patterns in chatrooms and the emotional persistence of users and channels. While our assumptions about agents' emotional expressions are rooted in psychology, the model allows testing different hypotheses regarding their emotional impact in online communication. Comment: 34 pages, 4 main and 12 supplementary figures.

    Negative emotions boost users activity at BBC Forum

    We present an empirical study of user activity in online BBC discussion forums, measured by the number of posts written by individual debaters and the average sentiment of these posts. Nearly 2.5 million posts from over 18 thousand users were investigated. Scale-free distributions were observed for activity in individual discussion threads as well as for overall activity. The number of unique users in a thread normalized by the thread length decays with thread length, suggesting that thread life is sustained by mutual discussions rather than by independent comments. Automatic sentiment analysis shows that most posts contain negative emotions and the most active users in individual threads express predominantly negative sentiments. It follows that the average emotion of longer threads is more negative and that threads can be sustained by negative comments. An agent-based computer simulation model has been used to reproduce several essential characteristics of the analyzed system. The model stresses the role of discussions between users, especially emotionally laden quarrels between supporters of opposite opinions, and represents many observed statistics of the forum. Comment: 29 pages, 6 figures.

    U.S. academic libraries: understanding their web presence and their relationship with economic indicators

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11192-013-1001-0 The main goal of this research is to analyze the web structure and performance of units and services belonging to U.S. academic libraries in order to check their suitability for webometric studies. Our objectives include studying their possible correlation with economic data and assessing their use for complementary evaluation purposes. We conducted a survey of library homepages, institutional repositories, digital collections, and online catalogs (a total of 374 URLs) belonging to the 100 U.S. universities with the highest total expenditures in academic libraries according to data provided by the National Center for Education Statistics. Several data points were taken and analyzed, including web variables (page count, external links, and visits) and economic variables (total expenditures, expenditures on printed and electronic books, and physical visits). The results indicate that the variety of URL syntaxes is wide, diverse and complex, which produces a misrepresentation of academic libraries’ web resources and reduces the accuracy of web analysis. On the other hand, institutional and web data indicators are not highly correlated. Better results are obtained by correlating total library expenditures with URL mentions measured by Google (r = 0.546) and visits measured by Compete (r = 0.573), respectively. Because correlation values obtained are not highly significant, we estimate such correlations will increase if users can avoid linkage problems (due to the complexity of URLs) and gain direct access to log files (for more accurate data about visits). Orduña Malea, E.; Regazzi, JJ. (2014). U.S. academic libraries: understanding their web presence and their relationship with economic indicators. Scientometrics. 98(1):315-336. doi:10.1007/s11192-013-1001-0

    Proposal for a multilevel university cybermetric analysis model

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11192-012-0868-5 Universities’ online seats have gradually become complex systems of dynamic information where all their institutions and services are linked and potentially accessible. These online seats now constitute a central node around which universities construct and document their main activities and services. This information can be quantitatively measured by cybermetric techniques in order to design university web rankings, taking the university as a global reference unit. However, previous research into web subunits shows that it is possible to carry out systemic web analyses, which open up the possibility of carrying out studies which address university diversity, necessary for both describing the university in greater detail and for establishing comparable ranking units. To address this issue, a multilevel university cybermetric analysis model is proposed, based on parts (core and satellite), levels (institutional and external) and sublevels (contour and internal), providing a deeper analysis of institutions. Finally the model is integrated into another which is independent of the technique used, and applied by analysing Harvard University as an example of use. Orduña Malea, E.; Ontalba Ruipérez, JA. (2013). Proposal for a multilevel university cybermetric analysis model. Scientometrics. 95(3):863-884. doi:10.1007/s11192-012-0868-5

    The Early Bird Catches The Term: Combining Twitter and News Data For Event Detection and Situational Awareness

    Twitter updates now represent an enormous stream of information originating from a wide variety of formal and informal sources, much of which is relevant to real-world events. In this paper we adapt existing bio-surveillance algorithms to detect localised spikes in Twitter activity corresponding to real events with a high level of confidence. We then develop a methodology to automatically summarise these events, both by providing the tweets which fully describe the event and by linking to highly relevant news articles. We apply our methods to outbreaks of illness and events strongly affecting sentiment. In both case studies we are able to detect events verifiable by third party sources and produce high-quality summaries.

    Emotional Analysis of Blogs and Forums Data

    We perform a statistical analysis of emotionally annotated comments in two large online datasets, examining chains of consecutive posts in the discussions. Using comparisons with randomised data we show that there is a high level of correlation for the emotional content of messages. Comment: REVTEX format, 5 pages, 6 figures, 2 tables, accepted to Acta Physica Polonica.