18 research outputs found

    Are web mentions accurate substitutes for inlinks for Spanish universities?

    This article is (c) Emerald Group Publishing and permission has been granted for this version to appear here. Emerald does not grant permission for this article to be further copied/distributed or hosted elsewhere without the express permission from Emerald Group Publishing Limited.
    Purpose – Title and URL mentions have recently been proposed as web visibility indicators instead of inlink counts. The objective of this study is to determine the accuracy of these alternative web mention indicators in the Spanish academic system, taking into account their complexity (multi-domains) and diversity (different official languages). Design/methodology/approach – Inlinks, title and URL mentions from 76 Spanish universities were manually extracted from the main search engines (Google, Google Scholar, Yahoo!, Bing and Exalead). Several statistical methods, such as correlation, difference tests and regression models, were used. Findings – Web mentions, despite some limitations, can be used as substitutes for inlinks in the Spanish academic system, although these indicators are more likely to be influenced by the environment (language, web domain policy, etc.) than inlinks. Research limitations/implications – Title mentions provide unstable results caused by the multiple name variants which an institution can present (such as acronyms and other language versions). URL mentions are more stable, but they may present atypical points due to some shortcomings, the effect of which is that URL mentions do not have the same meaning as inlinks. Practical implications – Web mentions should be used with caution and after a cleaning-up process. Moreover, these counts do not necessarily signify connectivity, so their use in global web analysis should be limited.
Originality/value – Web mentions have previously been used in some specific academic systems (US, UK and China), but this study analyses, in depth and for the first time, an entire non-English-speaking European country (Spain), with complex academic web behaviour, which helps to better explain previous web mention results.
    Ortega, JL.; Orduña Malea, E.; Aguillo, IF. (2014). Are web mentions accurate substitutes for inlinks for Spanish universities? Online Information Review. 38(1):59-77. doi:10.1108/OIR-10-2012-0189
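
    The abstract lists correlation among the statistical methods but does not say which coefficient was used. As a hedged illustration, the sketch below assumes Spearman's rank correlation, a common choice for skewed web counts, between per-university inlink counts and URL mention counts. The counts are invented for illustration and are not data from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts for five universities (illustrative only, not study data).
inlinks = [12000, 8500, 4300, 900, 15000]
url_mentions = [20500, 9100, 6000, 700, 18000]
print(round(spearman(inlinks, url_mentions), 3))  # 0.9
```

    A rank-based coefficient is less sensitive than Pearson's r to the few very large universities that dominate raw counts; the study itself may have used a different coefficient or transformation.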

    Disclosing the network structure of private companies on the web: the case of Spanish IBEX 35 share index

    Purpose - It is common for an international company to have different brands, products or services, information for investors, a corporate blog, affiliates, branches in different countries, etc. If all these contents appear as independent additional web domains (AWDs), the company should be represented on the web by all these web domains, since many of these AWDs may achieve remarkable performance that could mask or distort the real web performance of the company, thereby affecting the understanding of web metrics. The purpose of this paper is to determine the number, type, web impact and topology of the AWDs of commercial companies in order to gain a better understanding of their complete web impact and structure. Design/methodology/approach - The set of companies belonging to the Spanish IBEX-35 stock index was analysed as a test bench. The authors identified and categorised all AWDs belonging to these companies, and applied both web impact (web presence and visibility) and network metrics. Findings - The results show that AWDs have high web presence but relatively low web visibility, due to the opacity or limited dissemination of some AWDs, which favours their isolation. This is confirmed by the low network density values obtained, which occur because AWDs are strongly connected with the corporate domain (although asymmetrically) but very weakly linked to each other. Research limitations/implications - The categories used to classify the various AWDs, although clearly distinguishable conceptually, have certain limitations in practice, since they depend on the form companies adopt to publish certain content or to provide certain services or products. Furthermore, the use of web indicators presents certain accuracy problems that could be mitigated if the indicators are applied with caution and on a relational basis.
    Originality/value - Although the processes of AWD creation and categorisation are complex (web policy seems not to be driven by a defined or conscious plan), their influence on the web performance of IBEX 35 companies is meaningful. This research measures the influence of AWDs on companies in webometric terms for the first time.
    This research has been funded under the project APOSTD/2013/002 from the Regional Ministry of Education, Culture and Sport (Generalitat Valenciana) in Spain.
    Orduña Malea, E.; Delgado López-Cózar, E.; Serrano-Cobos, J.; Lloret Romero, MN. (2015). Disclosing the network structure of private companies on the web: the case of Spanish IBEX 35 share index. Online Information Review. 39(3):360-382. https://doi.org/10.1108/OIR-11-2014-0282
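
    To make the density finding concrete, the sketch below computes the density of a directed domain network as observed arcs divided by the n(n - 1) possible arcs. The domain names and links are hypothetical, chosen only to mimic the pattern the abstract describes (a corporate domain linked asymmetrically with its AWDs, and few links among the AWDs); the paper's exact network definition may differ.

```python
def directed_density(nodes, edges):
    """Density of a directed graph: observed arcs / possible arcs n(n - 1).

    Only arcs between members of `nodes` are counted; self-loops and
    duplicate arcs are ignored.
    """
    node_set = set(nodes)
    n = len(node_set)
    if n < 2:
        return 0.0
    arcs = {(s, t) for s, t in edges if s != t and s in node_set and t in node_set}
    return len(arcs) / (n * (n - 1))


# Hypothetical company: one corporate domain and four AWDs (illustrative names).
corporate = "corp.example"
awds = ["brand.example", "investors.example", "blog.example", "branch.example"]
edges = [(corporate, a) for a in awds]          # corporate domain links to every AWD
edges += [("brand.example", corporate),         # only two AWDs link back (asymmetry)
          ("blog.example", corporate)]
edges += [("brand.example", "blog.example")]    # a single link between AWDs

print(directed_density([corporate] + awds, edges))  # 7 / 20 = 0.35
print(directed_density(awds, edges))                # 1 / 12, about 0.083
```

    Even with the corporate domain linking out to every AWD, the whole network stays sparse, and the AWD-only subnetwork is sparser still.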

    Webometric analysis of institutional repositories of Malaysian public universities

    An institutional repository (IR) is one of the resources available in most university libraries, and IRs have attracted external publishers, search engines and social media to link to, share and index their content. Traditional citation-based indicators of a publication may not reflect IR quality, which has led to the creation of new indicators such as webometrics or web metrics. This study aims to analyse and explore the visibility of Malaysian public universities’ IRs through the number of external links, page count, PDF count and URL web mentions. We used a backlink web crawler and a web search engine to collect raw data. A visualisation was created using a force-directed graph layout to interpret the IR network in the web space. This study revealed that two research universities, Universiti Malaya (UM) and Universiti Putra Malaysia (UPM), dominate web visibility based on webometric indicators, while all non-research universities are at the bottom of the rankings. The study shows that institutional repositories of research universities are more visible in academic social networks and digital library sites. In contrast, non-research universities need to improve their visibility by mapping their IR websites through hyperlink exchange and collaboration activities between universities, and by promoting university publications on academic social network sites.
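
    The abstract names a force-directed method without saying which algorithm or tool was used. The sketch below assumes the classic Fruchterman-Reingold scheme (all nodes repel, linked nodes attract, step size cools over time), simplified by omitting the usual frame clamping; the node names are hypothetical and the study most likely used an existing graph tool rather than code like this.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, width=1.0, seed=0):
    """Minimal Fruchterman-Reingold layout; returns {node: (x, y)}.

    Linked nodes attract each other, all pairs of nodes repel, and the
    maximum step ("temperature") shrinks every iteration so the layout settles.
    """
    nodes = list(nodes)
    rng = random.Random(seed)                      # fixed seed -> reproducible layout
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in nodes}
    k = width / math.sqrt(len(nodes))              # ideal distance between nodes
    temp = width / 10                              # largest move allowed per step

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsive force k^2 / d between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attractive force d^2 / k along each link.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[v][0] += dx / d * step
            pos[v][1] += dy / d * step
        temp *= 0.95                               # cool down

    return {v: (x, y) for v, (x, y) in pos.items()}


# Hypothetical repository network (illustrative node names only).
nodes = ["UM-IR", "UPM-IR", "other-IR", "scholar-index", "social-network"]
edges = [("UM-IR", "scholar-index"), ("UPM-IR", "scholar-index"),
         ("UM-IR", "social-network"), ("UPM-IR", "social-network")]
for name, (x, y) in force_directed_layout(nodes, edges).items():
    print(f"{name:15s} {x:6.2f} {y:6.2f}")
```

    Because the random start is seeded, the same network always yields the same picture, which helps when comparing layouts across data collections.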

    U.S. academic libraries: understanding their web presence and their relationship with economic indicators

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11192-013-1001-0
    The main goal of this research is to analyze the web structure and performance of units and services belonging to U.S. academic libraries in order to check their suitability for webometric studies. Our objectives include studying their possible correlation with economic data and assessing their use for complementary evaluation purposes. We conducted a survey of library homepages, institutional repositories, digital collections, and online catalogs (a total of 374 URLs) belonging to the 100 U.S. universities with the highest total expenditures in academic libraries according to data provided by the National Center for Education Statistics. Several data points were collected and analyzed, including web variables (page count, external links, and visits) and economic variables (total expenditures, expenditures on printed and electronic books, and physical visits). The results indicate that URL syntaxes are wide-ranging, diverse and complex, which produces a misrepresentation of academic libraries’ web resources and reduces the accuracy of web analysis. In addition, institutional and web data indicators are not highly correlated. Better results are obtained by correlating total library expenditures with URL mentions measured by Google (r = 0.546) and with visits measured by Compete (r = 0.573). Because the correlation values obtained are not very high, we estimate that such correlations will increase if users can avoid linkage problems (due to the complexity of URLs) and gain direct access to log files (for more accurate data about visits).
    Orduña Malea, E.; Regazzi, JJ. (2014). U.S. academic libraries: understanding their web presence and their relationship with economic indicators. Scientometrics. 98(1):315-336. doi:10.1007/s11192-013-1001-0
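
    The abstract attributes part of the measurement problem to the variety of URL syntaxes. A hedged sketch of one way to sort a library's URLs by syntax before collecting data follows; the four categories and the example.edu URLs are assumptions for illustration, not the classification used in the paper.

```python
from urllib.parse import urlsplit


def classify_url(resource_url, university_domain):
    """Classify how a library resource URL relates to its university's domain.

    Categories are illustrative: "main site", "subdirectory" of the main
    site, "subdomain" of the university domain, or "external domain".
    """
    if "://" not in resource_url:
        resource_url = "http://" + resource_url   # urlsplit needs a scheme to find the host
    parts = urlsplit(resource_url)
    host = (parts.hostname or "").lower()
    domain = university_domain.lower()
    if host in (domain, "www." + domain):
        return "subdirectory" if parts.path.strip("/") else "main site"
    if host.endswith("." + domain):
        return "subdomain"
    return "external domain"


# Hypothetical URLs for one university whose main domain is example.edu.
urls = [
    "https://www.example.edu/",
    "https://www.example.edu/library/",
    "http://lib.example.edu/",
    "repository.example.edu/handle/123",
    "https://example.catalog-vendor.com/",
]
for url in urls:
    print(classify_url(url, "example.edu"), "<-", url)
```

    Each category generally needs a different search or crawl scope (a whole host versus a path prefix), which is one reason counts gathered across mixed URL syntaxes are hard to compare.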

    TV in the Age of the Internet: Information Quality of Science Fiction TV Fansites

    Thesis (Ph.D.) - Indiana University, Information Science, 2011
    Communally created Web 2.0 content on the Internet has begun to compete with information provided by traditional gatekeeper institutions, such as academic journals, medical professionals, and large corporations. On the one hand, such gatekeepers need to understand the nature of this competition, as well as to try to ensure that the general public are not endangered by poor-quality information. On the other hand, advocates of free and universal access to basic social services have argued that communal efforts can provide versions of commonly needed resources of as good or better quality. This dissertation arises from the need to understand the nature and quality of information being produced on such websites. Website-oriented information quality (IQ) literature spans at least 15 different academic fields, a survey of which identified two types of IQ: perceptual and artifactual fitness-related, and representational accuracy and completeness-related. The current project studied websites in terms of all of these, except perceptual fitness. This study may be the only one of its kind to have targeted fansites: websites made by fans of a mass media franchise. Despite the Internet's becoming a primary means by which millions of people consume and co-produce their entertainment, little academic attention has been paid to the IQ of sites about the mass media. For this study, the four central non-studio-affiliated sites about a highly popular and fan-engaging science fiction television franchise, Stargate, were chosen, and their IQ was examined across sites of different sizes and with different editorial and business models. Samples as exhaustive as possible were collected from each site. Based on 21 relevant variables from the IQ literature, four qualitative and 17 exploratory statistical analyses were conducted.
Key findings include: five possibly new IQ criteria; smaller sites being more concerned with pleasing connoisseur fans than the general public; larger sites being targeted towards older users; professional editors serving their own interests more than users'; wikis' greater user freedom attracting more invested and balanced writers; for-profit sites being more imposing upon, and less protective of, users than non-profit sites; and the emergence of common writing styles, themes, data fields, advertisement types, linking strategies, and page types.

    Collaboration between UK Universities: A machine-learning based webometric analysis

    A thesis submitted
    Collaboration is essential for some types of research, which is why some agencies include collaboration among the requirements for funding research projects. Studying collaborative relationships is important because analyses of collaboration networks can give insights into knowledge-based innovation systems, the roles that different organisations play in a research field and the relationships between scientific disciplines. Co-authored publication data is widely used to investigate collaboration between organisations, but this data is not free and thus may not be accessible to some researchers. Hyperlinks have some similarities with citations, so hyperlink data may be used as an indicator to estimate the extent of collaboration between academic institutions and may be able to show types of relationships that are not present in co-authorship data. However, it has been shown that using raw hyperlink counts for webometric research can sometimes produce unreliable results, so researchers have attempted to find alternative counting methods and have tried to identify the reasons why hyperlinks may have been created in academic websites. This thesis uses machine learning techniques, an approach that has not previously been widely used in webometric research, to automatically classify hyperlinks and text in university websites in an attempt to filter out irrelevant hyperlinks when investigating collaboration between academic institutions. Supervised machine learning methods were used to automatically classify the web page types that can be found in Higher Education Institutions’ websites. The results were assessed to see whether automatically filtered hyperlink data gave better results than raw hyperlink data in terms of identifying patterns of collaboration between UK universities.
Unsupervised learning methods were used to automatically identify groups of university departments that are collaborating or that may benefit from collaborating, based on their co-appearance in research clusters. Results show that the machine learning methods used in this thesis can automatically identify both the source and target web page categories of hyperlinks in university websites with up to 78% accuracy, which can make hyperlink classification more effective or help identify the reasons why hyperlinks may have been created in university websites, if those reasons can be inferred from the relationship between the source and target page types. When machine learning techniques were used to remove from the hyperlink data those hyperlinks that may not have been created because of collaboration, the correlation between hyperlink data and other collaboration indicators increased. This emphasises the potential of machine learning methods to make hyperlink data a more reliable data source for webometric research. The reasons for university name mentions in the different web page types found in an academic institution’s website are broadly the same as the reasons for link creation; this means that classification based on inter-page relationships may also be used to improve name mention data for webometrics research. Clustering research groups based on the text in their homepages may be useful for identifying research groups or departments with similar research interests. This may be valuable for policy makers monitoring research fields (based on the sizes of the identified clusters) and for identifying future collaborators (based on co-appearances in clusters), if identical research interests are a factor that can influence the choice of a future collaborator.
In conclusion, this thesis shows that machine learning techniques can be used to significantly improve the quality of hyperlink data for webometrics research, and can also be used to analyse other web-based data to give additional insights that may be beneficial for webometrics studies.
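
    The filtering step described above can be sketched as follows, assuming page types have already been predicted for both ends of each hyperlink. The category names, the set of kept page-type pairs and the university labels are hypothetical; the sketch does not reproduce the thesis's classifiers or its actual filtering criteria.

```python
from collections import Counter

# Hypothetical (source page type, target page type) pairs treated as
# plausibly collaboration-related. In the thesis, page types come from
# trained classifiers; here they are simply given as input.
COLLABORATION_PAIRS = {
    ("research_group", "research_group"),
    ("project", "research_group"),
    ("staff", "staff"),
}


def filtered_link_counts(links, keep_pairs=COLLABORATION_PAIRS):
    """Count inter-university links whose page-type pair is in keep_pairs.

    Each link is (source_uni, source_type, target_uni, target_type); counts
    are keyed by the unordered university pair.
    """
    counts = Counter()
    for src_uni, src_type, tgt_uni, tgt_type in links:
        if src_uni != tgt_uni and (src_type, tgt_type) in keep_pairs:
            counts[tuple(sorted((src_uni, tgt_uni)))] += 1
    return counts


links = [
    ("uniA", "research_group", "uniB", "research_group"),  # kept
    ("uniA", "news", "uniB", "homepage"),                  # filtered out
    ("uniB", "project", "uniA", "research_group"),         # kept
    ("uniA", "staff", "uniC", "teaching"),                 # filtered out
]
print(len(links), "raw links ->", dict(filtered_link_counts(links)))
```

    The filtered counts per university pair could then be correlated with another collaboration indicator, which is the kind of comparison the thesis reports for its own data.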

    Web manifestations of knowledge-based innovation systems in the UK

    Innovation is widely recognised as essential to the modern economy. The term knowledge-based innovation system has been used to refer to innovation systems which recognise the importance of an economy’s knowledge base and the efficient interactions between important actors from the different sectors of society. Such interactions are thought to enable greater innovation by the system as a whole. Whilst it may not be possible to fully understand all the complex relationships involved within knowledge-based innovation systems, within the field of informetrics, bibliometric methodologies have emerged that allow us to analyse some of the relationships that contribute to the innovation process. However, due to the limitations of traditional bibliometric sources it is important to investigate new potential sources of information. The web is one such source. This thesis documents an investigation into the potential of the web to provide information about knowledge-based innovation systems in the United Kingdom. Within this thesis the link analysis methodologies that have previously been successfully applied to investigations of the academic community (Thelwall, 2004a) are applied to organisations from different sectors of society to determine whether link analysis of the web can provide a new source of information about knowledge-based innovation systems in the UK. This study makes the case that data may be collected ethically to provide information about the interconnections between web sites of various sizes and from within different sectors of society, that there are significant differences in the linking practices of web sites within different sectors, and that reciprocal links provide a better indication of collaboration than uni-directional web links. Most importantly, the study shows that the web provides new information about the relationships between organisations, rather than just a repetition of the same information from an alternative source.
Whilst the study has shown that the web has considerable potential as a source of information on knowledge-based innovation systems, the same richness that makes it such a potentially useful source also makes large-scale studies very labour intensive.
EThOS - Electronic Theses Online Service, United Kingdom

    The Janus Faced Scholar: a Festschrift in honour of Peter Ingwersen


    Making Sense of Online Public Health Debates with Visual Analytics Systems

    Online debates occur frequently and on a wide variety of topics. In particular, online debates about public health topics (e.g., vaccines, statins, cannabis, dieting plans) are prevalent in today's society. These debates matter because of the real-world implications they can have for public health. It is therefore important for public health stakeholders (i.e., those with a vested interest in public health) and the general public to be able to make sense of these debates quickly and effectively. This dissertation investigates ways of enabling sense-making of these debates through visual analytics systems (VASes). VASes are computational tools that integrate data analytics (e.g., webometrics or natural language processing), data visualization, and human-data interaction. The dissertation consists of three stages. In the first stage, I describe the design and development of a novel VAS, called VINCENT (VIsual aNalytiCs systEm for investigating the online vacciNe debaTe), for making sense of the online vaccine debate. VINCENT helps users make sense of data (i.e., online presence, geographic location, sentiments, and focus) from a collection of vaccine-focused websites. In the second stage, I discuss the results of a user study of VINCENT, in which participants were asked to complete a set of ten sense-making tasks that required investigating a provided set of websites. Based on the positive outcomes of the study, in the third stage I generalize the findings from the first two stages and present a framework called ODIN (Online Debate entIty aNalyzer). This framework consists of various attributes that are important to consider when analyzing online public health debates and provides methods for collecting and analyzing those data.
Overall, this dissertation provides visual analytics researchers with an in-depth analysis of the considerations and challenges involved in creating VASes for making sense of online public health debates.