
    TeMex: The Web Template Extractor

    © ACM 2015. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM, In Proceedings of the 24th International Conference on World Wide Web (pp. 155-158), http://dx.doi.org/10.1145/2740908.2742835
    This paper presents and describes TeMex, a site-level web template extractor. TeMex is fully automatic, and it can work with online webpages without any preprocessing stage (no information about the template or the associated webpages is needed) and, more importantly, it does not need a predefined set of webpages to perform the analysis. TeMex only needs a URL. Unlike previous approaches, it includes a mechanism to identify webpage candidates that share the same template. This mechanism increases both recall and precision, and it also reduces the number of webpages loaded and processed. We describe the tool and its internal architecture, and we present the results of its empirical evaluation.
    This work has been partially supported by the EU (FEDER) and the Spanish Ministerio de Economía y Competitividad (Secretaría de Estado de Investigación, Desarrollo e Innovación) under Grant TIN2013-44742-C4-1-R and by the Generalitat Valenciana under Grant PROMETEOII/2015/013. David Insa was partially supported by the Spanish Ministerio de Educación under FPU Grant AP2010-4415. Salvador Tamarit was partially supported by research project POLCA, Programming Large Scale Heterogeneous Infrastructures (610686), funded by the European Union, STREP FP7.
    Alarte, J.; Insa Cabrera, D.; Silva Galiana, JF.; Tamarit Muñoz, S. (2015). TeMex: The Web Template Extractor. ACM. https://doi.org/10.1145/2740908.2742835

    A flexible service selection for executing virtual services

    With the adoption of a service-oriented paradigm on the Web, many software services are likely to fulfil similar functional needs for end users. We propose to aggregate functionally equivalent software services within one single virtual service, that is, to associate a functionality, a graphical user interface (GUI), and a set of selection rules. When an end user invokes such a virtual service through its GUI to meet a functional need, the software service that best matches the end user's selection policy is selected and executed, and the result is then rendered to the end user through the GUI of the virtual service. A key innovation in this paper is the flexibility of our proposed service selection policy. First, each selection policy can refer to heterogeneous parameters (e.g., service price, end-user location, and QoS). Second, additional parameters can be added to an existing or new policy with little investment. Third, the end users themselves define the selection policy to apply during the selection process, thanks to the GUI element added as part of the virtual service design. This approach was validated through the design, implementation, and testing of an end-to-end architecture, including the implementation of several virtual services and utilizing several software services available today on the Web.
    This work was supported in part by SERVERY (Service Platform for Innovative Communication Environment), a CELTIC project that aims to create a Service Marketplace that bridges the Internet and Telco worlds by merging the flexibility and openness of the former with the trustworthiness and reliability of the latter, enabling effective and profitable cooperation among actors.
    Laga, N.; Bertin, E.; Crespi, N.; Bedini, I.; Molina Moreno, B.; Zhao, Z. (2013). A flexible service selection for executing virtual services. World Wide Web. 16(3):219-245. doi:10.1007/s11280-012-0184-2
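    The idea of a selection policy over heterogeneous parameters can be illustrated with a minimal sketch. It is not the architecture from the paper: the `Service` class, the `select_service` function, the attribute names, and the weighted-cost rule are all assumptions made for illustration, with lower attribute values treated as better.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Service:
    """Hypothetical description of one functionally equivalent service."""
    name: str
    attributes: Dict[str, float]  # e.g. price, latency_ms; lower is better


def select_service(candidates: List[Service], policy: Dict[str, float]) -> Service:
    """Pick the candidate with the lowest weighted cost under the policy.

    `policy` maps a parameter name to the weight the end user gives it.
    Candidates that do not publish every parameter named in the policy
    are skipped (an assumption of this sketch).
    """
    eligible = [s for s in candidates if all(k in s.attributes for k in policy)]
    if not eligible:
        raise ValueError("no candidate provides all parameters in the policy")

    def cost(service: Service) -> float:
        return sum(weight * service.attributes[key] for key, weight in policy.items())

    return min(eligible, key=cost)


services = [
    Service("fast-but-pricey", {"price": 5.0, "latency_ms": 100.0}),
    Service("cheap-but-slow", {"price": 1.0, "latency_ms": 300.0}),
]
cheapest = select_service(services, {"price": 1.0})
quickest = select_service(services, {"latency_ms": 1.0})
```

    Because a policy here is only a mapping from parameter names to weights, adding a new parameter means adding one key, which loosely mirrors the "little investment" claim in the abstract.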

    Distributed resource discovery using a context sensitive infrastructure

    Distributed Resource Discovery in a World Wide Web environment using full-text indices will never scale. The distinct properties of WWW information (volume, rate of change, topical diversity) limit the scalability of traditional approaches to distributed Resource Discovery. An approach combining metadata clustering and query routing can, on the other hand, be proven to scale much better. This paper presents the Content-Sensitive Infrastructure, which is a design building on these results. We also present an analytical framework for comparing the scalability of different distribution strategies.

    The Options for UK Domestic Water Reduction: A Review

    Demand pressure on UK water supplies is expected to increase in the next 20 years, driven by increasing population, new housing development and reducing household size. Regionally and locally, migration will also affect demand, particularly in the South-East. The measures expected to reduce UK consumption the most are:
    1. For new homes: metering and new efficiencies in design and construction (e.g. low flush toilets, heating and plumbing efficiencies)
    2. For established housing: metering and modern washing machines

    A Model for Personalized Keyword Extraction from Web Pages using Segmentation

    The World Wide Web caters to the needs of billions of users in heterogeneous groups. Each user accessing the World Wide Web might have his or her own specific interests and would expect the web to respond to those specific requirements. Making the web react in a customized manner is achieved through personalization. This paper proposes a novel model for extracting keywords from a web page that incorporates personalization. The keyword extraction problem is approached with the help of web page segmentation, which simplifies the problem and helps solve it effectively. The proposed model is implemented as a prototype, and the experiments conducted on it empirically validate the model's efficiency.
    Comment: 6 Pages, 2 Figures

    Invisible Pixels Are Dead, Long Live Invisible Pixels!

    Privacy has deteriorated in the world wide web ever since the 1990s. The tracking of browsing habits by different third parties has been at the center of this deterioration. Web cookies and so-called web beacons have been the classical ways to implement third-party tracking. Due to the introduction of more sophisticated technical tracking solutions and other fundamental transformations, the use of classical image-based web beacons might be expected to have lost its appeal. According to a sample of over thirty thousand images collected from popular websites, this paper shows that such an assumption is a fallacy: classical 1 x 1 images are still commonly used for third-party tracking in the contemporary world wide web. While it seems that ad-blockers are unable to fully block these classical image-based tracking beacons, the paper further demonstrates that even limited information can be used to accurately distinguish the third-party 1 x 1 images from other images. An average classification accuracy of 0.956 is reached in the empirical experiment. With these results the paper contributes to the ongoing attempts to better understand the lack of privacy in the world wide web, and the means by which the situation might be eventually improved.
    Comment: Forthcoming in the 17th Workshop on Privacy in the Electronic Society (WPES 2018), Toronto, AC
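    To make the notion of a third-party 1 x 1 image concrete, the sketch below scans an HTML string for `img` tags declared as 1 x 1 whose source lives on a different host than the page. It is a simple heuristic filter, not the classifier evaluated in the paper; `PixelFinder`, `find_third_party_pixels`, and the example hosts are hypothetical, and comparing exact hostnames (so that subdomains count as third parties) is a simplifying assumption.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class PixelFinder(HTMLParser):
    """Collects 1 x 1 <img> tags whose host differs from the page host."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.beacons: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Only images explicitly declared as 1 x 1 are considered here.
        if attributes.get("width") != "1" or attributes.get("height") != "1":
            return
        # Resolve relative sources against the page URL before comparing hosts.
        src = urljoin(self.page_url, attributes.get("src") or "")
        host = urlparse(src).hostname
        if host and host != self.page_host:
            self.beacons.append(src)


def find_third_party_pixels(page_url: str, html: str) -> List[str]:
    finder = PixelFinder(page_url)
    finder.feed(html)
    finder.close()
    return finder.beacons


sample = (
    '<img src="https://tracker.example/p.gif" width="1" height="1">'
    '<img src="/spacer.gif" width="1" height="1">'
    '<img src="https://cdn.example/banner.png" width="300" height="200">'
)
pixels = find_third_party_pixels("https://site.example/index.html", sample)
```

    A filter like this misses images that are sized through CSS or scripts, which is one reason a learned classifier over several features may be preferable.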

    Basis Token Consistency: A Practical Mechanism for Strong Web Cache Consistency

    With web caching and cache-related services like CDNs and edge services playing an increasingly significant role in the modern internet, the problem of the weak consistency and coherence provisions in current web protocols is becoming increasingly significant and drawing the attention of the standards community [LCD01]. Toward this end, we present definitions of consistency and coherence for web-like environments, that is, distributed client-server information systems where the semantics of interactions with resources are more general than the read/write operations found in memory hierarchies and distributed file systems. We then present a brief review of proposed mechanisms which strengthen the consistency of caches in the web, focusing upon their conceptual contributions and their weaknesses in real-world practice. These insights motivate a new mechanism, which we call "Basis Token Consistency" or BTC; when implemented at the server, this mechanism allows any client (independent of the presence and conformity of any intermediaries) to maintain a self-consistent view of the server's state. This is accomplished by annotating responses with additional per-resource application information which allows client caches to recognize the obsolescence of currently cached entities and identify responses from other caches which are already stale in light of what has already been seen. The mechanism requires no deviation from the existing client-server communication model, and does not require servers to maintain any additional per-client state. We discuss how our mechanism could be integrated into a fragment-assembling Content Management System (CMS), and present a simulation-driven performance comparison between the BTC algorithm and the use of the Time-To-Live (TTL) heuristic.
    National Science Foundation (ANI-9986397, ANI-0095988
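    The client-side behaviour described above can be sketched loosely as follows. This is an illustration inspired by the abstract, not the BTC algorithm itself: the `TokenCache` class, the representation of the per-resource annotation as a mapping from token name to integer version, and the invalidation rule are all assumptions.

```python
from typing import Dict, Optional, Tuple


class TokenCache:
    """Client cache that uses versioned tokens attached to each response."""

    def __init__(self) -> None:
        self.latest: Dict[str, int] = {}  # highest version seen per token
        self.entries: Dict[str, Tuple[str, Dict[str, int]]] = {}  # url -> (body, tokens)

    def _is_stale(self, tokens: Dict[str, int]) -> bool:
        # A response is stale if any token is older than one already seen.
        return any(version < self.latest.get(name, version)
                   for name, version in tokens.items())

    def store(self, url: str, body: str, tokens: Dict[str, int]) -> bool:
        """Accept a response unless it is already stale; return True if kept."""
        if self._is_stale(tokens):
            return False
        for name, version in tokens.items():
            self.latest[name] = max(version, self.latest.get(name, version))
        # Drop cached entities made obsolete by the versions just learned.
        self.entries = {u: e for u, e in self.entries.items()
                        if not self._is_stale(e[1])}
        self.entries[url] = (body, dict(tokens))
        return True

    def get(self, url: str) -> Optional[str]:
        entry = self.entries.get(url)
        return entry[0] if entry else None


cache = TokenCache()
first = cache.store("/a", "A v1", {"stock": 1})
second = cache.store("/b", "B v2", {"stock": 2})   # newer token invalidates /a
late = cache.store("/a", "A v1 again", {"stock": 1})  # stale copy from another cache
```

    In this sketch the client keeps only its own view of token versions, so the server needs no per-client state, which matches the property claimed in the abstract. A TTL heuristic, by contrast, would keep `/a` until its timer expired regardless of what the client had since learned.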