Search CORE

168,265 research outputs found

Web Browsing Behavior Analysis and Interactive Hypervideo

Author: Brooke J.
Guyon I.
Hauger D.
Leiva L. A.
Luis A. Leiva
Müller-Tomfelde C.
Roberto Vivó
Smith J.
Špakov O.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2013

© ACM, 2013. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on the Web, Vol. 7, No. 4, Article 20, Publication date: October 2013. http://doi.acm.org/10.1145/2529995.2529996

[EN] Processing data on any sort of user interaction is well known to be cumbersome and mostly time consuming. In order to assist researchers in easily inspecting fine-grained browsing data, current tools usually display user interactions as mouse cursor tracks, a video-like visualization scheme. However, to date, traditional online video inspection has not explored the full capabilities of hypermedia and interactive techniques. In response to this need, we have developed SMT2ε, a Web-based tracking system for analyzing browsing behavior using feature-rich hypervideo visualizations. We compare our system to related work in academia and the industry, showing that ours features unprecedented visualization capabilities. We also show that SMT2ε efficiently captures browsing data and is perceived by users to be both helpful and usable. A series of prediction experiments illustrates that raw cursor data are accessible and can be easily handled, providing evidence that the data can be used to construct and verify research hypotheses. Considering its limitations, it is our hope that SMT2ε will assist researchers, usability practitioners, and other professionals interested in understanding how users browse the Web.

This work was partially supported by the MIPRCV Consolider Ingenio 2010 program (CSD2007-00018) and the TIN2009-14103-C03-03 project. It is also supported by the 7th Framework Program of the European Commission (FP7/2007-13) under grant agreement No. 287576 (CasMaCat).

Leiva Torres, LA.; Vivó Hernando, RA. (2013). Web Browsing Behavior Analysis and Interactive Hypervideo. ACM Transactions on the Web. 7(4):20:1-20:28. https://doi.org/10.1145/2529995.2529996

Crossref

RiuNet

Soft Concurrent Constraint Programming

Author: Bella G.
Bistarelli S.
Boer F. D.
Chen S.
De Nicola R.
Dubois D.
Fargier H.
Francesca Rossi
Schiex T.
Scott D.
Stefano Bistarelli
Ugo Montanari
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2002

Soft constraints extend classical constraints to represent multiple consistency levels, and thus provide a way to express preferences, fuzziness, and uncertainty. While there are many soft constraint solving formalisms, even distributed ones, so far there seems to be no concurrent programming framework where soft constraints can be handled. In this paper we show how the classical concurrent constraint (cc) programming framework can work with soft constraints, and we also propose an extension of cc languages which can use soft constraints to prune and direct the search for a solution. We believe that this new programming paradigm, called soft cc (scc), can also be very useful in many web-related scenarios. In fact, the language level allows web agents to express their interaction and negotiation protocols, and also to post their requests in terms of preferences, and the underlying soft constraint solver can find an agreement among the agents even if their requests are incompatible.

Comment: 25 pages, 4 figures, submitted to the ACM Transactions on Computational Logic (TOCL), zipped file
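The idea of agents posting requests as preferences and still reaching an agreement can be illustrated with a minimal sketch. It assumes a fuzzy-style reading of soft constraints that the abstract does not spell out: each agent assigns every option a level in [0, 1], levels are combined with `min`, and the option with the highest combined level wins. The names `combine`, `best_option`, `buyer`, and `seller` are hypothetical and are not part of the scc language described in the paper.

```python
def combine(levels):
    """Combine preference levels fuzzy-style: an option is only as good
    as its least satisfied constraint (assumption: min as combination)."""
    return min(levels)


def best_option(options, agents):
    """Return (option, level) with the highest combined preference.

    `agents` is a list of dicts mapping option -> level in [0, 1];
    an option an agent does not mention is treated as level 0.0.
    """
    scored = {o: combine([prefs.get(o, 0.0) for prefs in agents]) for o in options}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical negotiation: the two agents disagree on their favourite day,
# yet a compromise with a non-zero consistency level still exists.
buyer = {"mon": 0.9, "tue": 0.4, "wed": 0.0}
seller = {"mon": 0.2, "tue": 0.7, "wed": 1.0}
print(best_option(["mon", "tue", "wed"], [buyer, seller]))
```

With crisp (hard) constraints, each agent would accept only its top choice and no agreement would exist; graded levels are what make the compromise expressible.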

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università di Padova

Emotion Dynamics of Public Opinions on Twitter

Author: Kumar Durgesh
Nandi Sukumar
Naskar Debashis
Onaindia De La Rivaherrera Eva
Singh Sanasam Ranbir
Publication venue: Association for Computing Machinery
Publication date: 01/03/2020

[EN] Recently, social media has been considered the fastest medium for information broadcasting and sharing. Considering the wide range of applications such as viral marketing, political campaigns, social advertisement, and so on, influencing characteristics of users or tweets have attracted several researchers. It is observed from various studies that influential messages or users create a high impact on a social ecosystem. In this study, we assume that public opinion on a social issue on Twitter carries a certain degree of emotion, and there is an emotion flow underneath the Twitter network. In this article, we investigate social dynamics of emotion present in users' opinions and attempt to understand (i) changing characteristics of users' emotions toward a social issue over time, (ii) influence of public emotions on individuals' emotions, (iii) cause of changing opinion by social factors, and so on. We study users' emotion dynamics over a collection of 17.65M tweets with 69.36K users and observe that 63% of the users are likely to change their emotional state toward the topic in their subsequent tweets. Tweets coming from the member community show higher influencing capability than those from other community sources. It is also observed that retweets influence users more than hashtags, mentions, and replies.

The work described in this article was carried out in the OSiNT Lab (https://www.iitg.ac.in/cseweb/osint/), Indian Institute of Technology Guwahati, India. The creation of the dataset used in this study was partly supported by the Ministry of Information and Electronic Technology, Government of India.

Naskar, D.; Singh, SR.; Kumar, D.; Nandi, S.; Onaindia De La Rivaherrera, E. (2020). Emotion Dynamics of Public Opinions on Twitter. ACM Transactions on Information Systems. 38(2):1-24. https://doi.org/10.1145/3379340

RiuNet

Study about the different use of explicit and implicit tags in social bookmarking

Author: Bar-Ilan
Bateman
Ding
Farooq
Fu
Furnas
Illig
Jäschke
Koutrika
Körner
Lipczak
Liu
Marinho
Marlow
Mason
Melenhorst
Millen
Oliveira
Robu
Schmitz
Subramanya
Taylor
Yeung
Zhang
Publication venue: 'Wiley'
Publication date: 01/02/2012

This is the accepted version of the following article: Arolas, E. E., & Ladrón-de-Guevar, F. G. (2012). Uses of explicit and implicit tags in social bookmarking. Journal of the American Society for Information Science and Technology, 63(2), 313-322. doi:10.1002/asi.21663, which has been published in final form at http://dx.doi.org/10.1002/asi.21663

Although Web 2.0 contains many tools with different functionalities, they all share a common social nature. One tool in particular, social bookmarking systems (SBSs), allows users to store and share links to different types of resources, e.g., websites, videos, images. To identify and classify these resources so that they can be retrieved and shared, fragments of text are used. These fragments of text, usually words, are called tags. A tag that is found inside the resource text is referred to as an obvious or explicit tag. There are also nonobvious or implicit tags, which do not appear in the resource text. The purpose of this article is to describe the present situation of SBSs and then to determine the principal features of explicit tags and how they are used. Special consideration is given to which HTML tags most frequently contain explicit tags.

Estelles Arolas, E.; González Ladrón De Guevara, FR. (2012). Study about the different use of explicit and implicit tags in social bookmarking. Journal of the American Society for Information Science and Technology. 63(2):313-322. doi:10.1002/asi.21663
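The distinction between explicit and implicit tags in the abstract above can be shown with a small sketch. It assumes the simplest possible reading: a tag is explicit if it occurs as a whole word in the resource text (case-insensitive) and implicit otherwise. The function name `split_tags` and the sample text are hypothetical, and the article's own procedure (which also considers the HTML elements in which a tag appears) may well differ.

```python
import re


def split_tags(tags, resource_text):
    """Partition tags into (explicit, implicit) lists.

    A tag counts as explicit when it appears as a whole word in the
    resource text, ignoring case; every other tag is implicit.
    """
    words = set(re.findall(r"\w+", resource_text.lower()))
    explicit = [t for t in tags if t.lower() in words]
    implicit = [t for t in tags if t.lower() not in words]
    return explicit, implicit


# Hypothetical bookmark: two tags occur in the page text, one does not.
text = "Flask is a lightweight web framework for Python."
explicit, implicit = split_tags(["python", "flask", "tutorial"], text)
print(explicit)  # tags found in the text
print(implicit)  # tags supplied by the user but absent from the text
```

Whole-word matching is a deliberate simplification: multi-word tags, stemming, and markup-aware matching would need extra handling.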

Crossref

RiuNet

CAP Theorem: Revision of its related consistency models

Author: Bernabeu Aubán José Manuel
García Escriva José Ramón
GONZÁLEZ DE MENDÍVIL MORENO JOSÉ RAMÓN
Juan Marín Rubén de
Muñoz-Escoí Francesc D.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2019

[EN] The CAP theorem states that only two of these properties can be simultaneously guaranteed in a distributed service: (i) consistency, (ii) availability, and (iii) network partition tolerance. This theorem was stated and proved assuming that "consistency" refers to atomic consistency. However, multiple consistency models exist and atomic consistency is located at the strongest edge of that spectrum. Many distributed services deployed in cloud platforms should be highly available and scalable. Network partitions may arise in those deployments and should be tolerated. One way of dealing with CAP constraints consists in relaxing consistency. Therefore, it is interesting to explore the set of consistency models not supported in an available and partition-tolerant service (CAP-constrained models). Other weaker consistency models could be maintained when scalable services are deployed in partitionable systems (CAP-free models). Three contributions arise: (1) multiple other CAP-constrained models are identified, (2) a borderline between CAP-constrained and CAP-free models is set, and (3) a hierarchy of consistency models depending on their strength and convergence is built.

Muñoz-Escoí, FD.; Juan Marín, RD.; García Escriva, JR.; González De Mendívil Moreno, JR.; Bernabeu Aubán, JM. (2019). CAP Theorem: Revision of its related consistency models. The Computer Journal. 62(6):943-960. https://doi.org/10.1093/comjnl/bxy142

RiuNet

Academica-e

What Web Template Extractor Should I Use? A Benchmarking and Comparison for Five Template Extractors

Author: Alarte Julián
Silva Josep
Tamarit Muñoz Salvador
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/04/2019

"© ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in PUBLICATION, {VOL 13, ISS 2, (APR 2019)} http://doi.acm.org/10.1145/3316810"

[EN] A Web template is a resource that implements the structure and format of a website, making it ready for plugging content into already formatted and prepared pages. For this reason, templates are one of the main development resources for website engineers, because they increase productivity. Templates are also useful for the final user, because they provide uniformity and a common look and feel for all webpages. However, from the point of view of crawlers and indexers, templates are an important problem, because templates usually contain irrelevant information, such as advertisements, menus, and banners. Processing and storing this information leads to a waste of resources (storage space, bandwidth, etc.). It has been measured that templates represent between 40% and 50% of data on the Web. Therefore, identifying templates is essential for indexing tasks. There exist many techniques and tools for template extraction, but, unfortunately, it is not at all clear which template extractor a user or system should use, because they have never been compared, and because they present different (complementary) features such as precision, recall, and efficiency. In this work, we compare the most advanced template extractors. We implemented and evaluated five of the most advanced template extractors in the literature. To compare all of them, we implemented a workbench, where they have been integrated and evaluated. Thanks to this workbench, we can provide a fair empirical comparison of all methods using the same benchmarks, technology, implementation language, and evaluation criteria.

This work has been partially supported by the EU (FEDER) and the Spanish Ministerio de Ciencia, Innovacion y Universidades/AEI under grant TIN2016-76843-C4-1-R and by the Generalitat Valenciana under grants PROMETEO-II/2015/013 (SmartLogic) and Prometeo/2019/098 (DeepTrust).

Alarte, J.; Silva, J.; Tamarit Muñoz, S. (2019). What Web Template Extractor Should I Use? A Benchmarking and Comparison for Five Template Extractors. ACM Transactions on the Web. 13(2):9:1-9:19. https://doi.org/10.1145/3316810

RiuNet