HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web
When users interact with the Web today, they leave sequential digital trails
on a massive scale. Examples of such human trails include Web navigation,
sequences of online restaurant reviews, or online music play lists.
Understanding the factors that drive the production of these trails can be
useful for, e.g., improving underlying network structures, predicting user
clicks or enhancing recommendations. In this work, we present a general
approach called HypTrails for comparing a set of hypotheses about human trails
on the Web, where hypotheses represent beliefs about transitions between
states. Our approach utilizes Markov chain models with Bayesian inference. The
main idea is to incorporate hypotheses as informative Dirichlet priors and to
leverage the sensitivity of Bayes factors on the prior for comparing hypotheses
with each other. For eliciting Dirichlet priors from hypotheses, we present an
adaption of the so-called (trial) roulette method. We demonstrate the general
mechanics and applicability of HypTrails by performing experiments with (i)
synthetic trails for which we control the mechanisms that have produced them
and (ii) empirical trails stemming from different domains including website
navigation, business reviews and online music played. Our work expands the
repertoire of methods available for studying human trails on the Web.
Comment: Published in the proceedings of WWW'1
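The comparison described in this abstract can be illustrated with a minimal sketch. It assumes a first-order Markov chain with an independent Dirichlet prior on each row of the transition matrix, for which the marginal likelihood has a standard closed form. The helper names and the `strength` parameter are hypothetical, and `prior_from_hypothesis` is a simplified stand-in for prior elicitation, not the (trial) roulette adaptation the paper presents.

```python
from math import lgamma

def transition_counts(trails, n_states):
    """Count observed transitions n[i][j] between consecutive states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return counts

def prior_from_hypothesis(hypothesis, strength):
    """Turn row-wise beliefs into Dirichlet pseudo-counts.

    Assumption: a flat pseudo-count of 1 plus `strength` times the belief.
    This is a simplification, not the paper's elicitation procedure.
    """
    return [[1.0 + strength * belief for belief in row] for row in hypothesis]

def log_evidence(counts, prior):
    """Log marginal likelihood of the counts under row-wise Dirichlet priors."""
    total = 0.0
    for n_row, a_row in zip(counts, prior):
        total += lgamma(sum(a_row)) - lgamma(sum(a_row) + sum(n_row))
        total += sum(lgamma(n + a) - lgamma(a) for n, a in zip(n_row, a_row))
    return total

# Toy data: one trail that cycles 0 -> 1 -> 2 -> 0.
trails = [[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]]
counts = transition_counts(trails, 3)

# Two hypotheses expressed as row-stochastic belief matrices.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
uniform = [[1 / 3] * 3 for _ in range(3)]

# Log Bayes factor of the "cycle" hypothesis against the "uniform" one.
log_bf = (log_evidence(counts, prior_from_hypothesis(cycle, 5.0))
          - log_evidence(counts, prior_from_hypothesis(uniform, 5.0)))
print("log Bayes factor (cycle vs. uniform):", log_bf)
```

A positive log Bayes factor indicates that the data are more plausible under the first hypothesis. Both priors receive the same total pseudo-count per row, so the comparison reflects where the belief mass is placed rather than how much of it there is.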
A Semantic Grid Oriented to E-Tourism
With the increasing complexity of tourism business models and tasks, there is a
clear need for a next-generation e-Tourism infrastructure to support flexible
automation, integration, computation, storage, and collaboration. Currently
several enabling technologies such as semantic Web, Web service, agent and grid
computing have been applied in different e-Tourism applications; however,
there is no unified framework able to integrate all of them. This
paper presents a promising e-Tourism framework based on emerging semantic grid,
in which a number of key design issues are discussed including architecture,
ontologies structure, semantic reconciliation, service and resource discovery,
role-based authorization and intelligent agents. Finally, the paper provides an
implementation of the framework.
Comment: 12 PAGES, 7 Figure
Data triangulation in a user evaluation of the sealife semantic web browsers
There is a need for greater attention to triangulation of data in user-centred evaluation of Semantic Web Browsers. This paper discusses triangulation of data gathered during development of a novel framework for user-centred evaluation of Semantic Web Browsers. The data was triangulated from three sources: quantitative data from web server logs and questionnaire results, and qualitative data from semi-structured interviews. This paper shows how triangulation was essential in validation and completeness of the results, and was indispensable in ensuring accurate interpretation of the results in determining user satisfaction.
Computational fact checking from knowledge networks
Traditional fact checking by expert journalists cannot keep up with the
enormous volume of information that is now generated online. Computational fact
checking may significantly enhance our ability to evaluate the veracity of
dubious information. Here we show that the complexities of human fact checking
can be approximated quite well by finding the shortest path between concept
nodes under properly defined semantic proximity metrics on knowledge graphs.
Framed as a network problem this approach is feasible with efficient
computational techniques. We evaluate this approach by examining tens of
thousands of claims related to history, entertainment, geography, and
biographical information using a public knowledge graph extracted from
Wikipedia. Statements independently known to be true consistently receive
higher support via our method than do false ones. These findings represent a
significant step toward scalable computational fact-checking methods that may
one day mitigate the spread of harmful misinformation.
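The shortest-path idea in this abstract can be sketched as follows. The abstract does not spell out the proximity metric, so this sketch assumes one plausible choice: a path is penalized by the logarithm of the degree of each intermediate node, so that routes through very generic, highly connected concepts count for less. The graph, the node names, and the `support` function are all hypothetical.

```python
import heapq
from math import log

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def support(graph, source, target):
    """Return 1 / (1 + cost) for the cheapest path from source to target.

    Assumed metric: cost is the sum of log(degree) over intermediate nodes.
    Returns 0.0 when either node is unknown or no path exists.
    """
    if source not in graph or target not in graph:
        return 0.0
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:  # Dijkstra; step costs are non-negative because degree >= 1
        cost, node = heapq.heappop(heap)
        if node == target:
            return 1.0 / (1.0 + cost)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in graph[node]:
            # Entering the target is free; intermediate nodes pay log(degree).
            step = 0.0 if nxt == target else log(len(graph[nxt]))
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return 0.0

# Hypothetical toy knowledge graph; "human" acts as a generic hub concept.
graph = build_graph([
    ("alice", "springfield"),
    ("alice", "human"),
    ("bob", "human"),
    ("carol", "human"),
    ("bob", "shelbyville"),
])

direct = support(graph, "alice", "springfield")    # linked by a single edge
indirect = support(graph, "alice", "shelbyville")  # only reachable via the hub
print(direct, indirect)
```

Under this assumed metric, a statement whose subject and object are directly linked receives the maximum score, while one connected only through a generic hub scores lower.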
Rule-based knowledge aggregation for large-scale protein sequence analysis of influenza A viruses
Background: The explosive growth of biological data provides opportunities for new statistical and comparative analyses of large information sets, such as alignments comprising tens of thousands of sequences. In such studies, sequence annotations frequently play an essential role, and reliable results depend on metadata quality. However, the semantic heterogeneity and annotation inconsistencies in biological databases greatly increase the complexity of aggregating and cleaning metadata. Manual curation of datasets, traditionally favoured by life scientists, is impractical for studies involving thousands of records. In this study, we investigate quality issues that affect major public databases, and quantify the effectiveness of an automated metadata extraction approach that combines structural and semantic rules. We applied this approach to more than 90,000 influenza A records, to annotate sequences with protein name, virus subtype, isolate, host, geographic origin, and year of isolation.
Results: Over 40,000 annotated Influenza A protein sequences were collected by combining information from more than 90,000 documents from NCBI public databases. Metadata values were automatically extracted, aggregated and reconciled from several document fields by applying user-defined structural rules. For each property, values were recovered from ≥88.8% of records, with accuracy exceeding 96% in most cases. Because of semantic heterogeneity, each property required up to six different structural rules to be combined. Significant quality differences between databases were found: GenBank documents yield values more reliably than documents extracted from GenPept. Using a simple set of semantic rules and a reasoner, we reconstructed relationships between sequences from the same isolate, thus identifying 7640 isolates. Validation of isolate metadata against a simple ontology highlighted more than 400 inconsistencies, leading to over 3,000 property value corrections.
Conclusion: To overcome the quality issues inherent in public databases, automated knowledge aggregation with embedded intelligence is needed for large-scale analyses. Our results show that user-controlled intuitive approaches, based on combination of simple rules, can reliably automate various curation tasks, reducing the need for manual corrections to approximately 5% of the records. Emerging semantic technologies possess desirable features to support today's knowledge aggregation tasks, with a potential to bring immediate benefits to this field.
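The idea of combining several structural rules per property can be sketched as an ordered list of (field, pattern) pairs in which the first matching rule supplies the value. This is only an illustration: the field names, the patterns, the `extract` helper, and the second record are hypothetical and are not the rules or data used by the authors.

```python
import re

# Hypothetical structural rules for the "virus subtype" property, tried in order.
SUBTYPE_RULES = [
    ("strain", re.compile(r"\((H\d+N\d+)\)")),                      # e.g. "...(H1N1)"
    ("organism", re.compile(r"\b(H\d+N\d+)\b")),                    # subtype anywhere
    ("note", re.compile(r"subtype[:\s]+(H\d+N\d+)", re.IGNORECASE)),
]

def extract(record, rules):
    """Return the value captured by the first rule that matches, else None."""
    for field, pattern in rules:
        match = pattern.search(record.get(field, ""))
        if match:
            return match.group(1)
    return None

records = [
    {"strain": "A/Puerto Rico/8/1934(H1N1)"},
    # Hypothetical record where the subtype only appears in a free-text note.
    {"strain": "A/duck/example/1/2005", "note": "Subtype: H5N1"},
    {},  # record with no usable fields
]

subtypes = [extract(record, SUBTYPE_RULES) for record in records]
print(subtypes)
```

Ordering the rules from most to least specific keeps a loose fallback pattern from overriding a value that a stricter rule could have recovered.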
RDFScape: Semantic Web meets Systems Biology
Background: The recent availability of high-throughput data in molecular biology has increased the need for a formal representation of this knowledge domain. New ontologies are being developed to formalize knowledge, e.g. about the functions of proteins. As the Semantic Web is being introduced into the Life Sciences, the basis for a distributed knowledge-base that can foster biological data analysis is laid. However, there still is a dichotomy, in tools and methodologies, between the use of ontologies in biological investigation, that is, in relation to experimental observations, and their use as a knowledge-base.
Results: RDFScape is a plugin that has been developed to extend a software oriented to biological analysis with support for reasoning on ontologies in the semantic web framework. We show with this plugin how the use of ontological knowledge in biological analysis can be extended through the use of inference. In particular, we present two examples relative to ontologies representing biological pathways: we demonstrate how these can be abstracted and visualized as interaction networks, and how reasoning on causal dependencies within elements of pathways can be implemented.
Conclusions: The use of ontologies for the interpretation of high-throughput biological data can be improved through the use of inference. This allows the use of ontologies not only as annotations, but as a knowledge-base from which new information relevant for specific analysis can be derived.
Proposal for a multilevel university cybermetric analysis model
The final publication is available at Springer via http://dx.doi.org/10.1007/s11192-012-0868-5
Universities’ online seats have gradually become complex systems of dynamic information where all their institutions and services are linked and potentially accessible. These online seats now constitute a central node around which universities construct and document their main activities and services. This information can be quantitatively measured by cybermetric techniques in order to design university web rankings, taking the university as a global reference unit. However, previous research into web subunits shows that it is possible to carry out systemic web analyses, which open up the possibility of studies that address university diversity, necessary both for describing the university in greater detail and for establishing comparable ranking units. To address this issue, a multilevel university cybermetric analysis model is proposed, based on parts (core and satellite), levels (institutional and external) and sublevels (contour and internal), providing a deeper analysis of institutions. Finally, the model is integrated into another that is independent of the technique used, and is applied to Harvard University as an example of use.
Orduña Malea, E.; Ontalba Ruipérez, JA. (2013). Proposal for a multilevel university cybermetric analysis model. Scientometrics. 95(3):863-884. doi:10.1007/s11192-012-0868-5