Search CORE

617,488 research outputs found

Cosmology from Type Ia Supernovae

Author: Aldering G.
Boyle B. J.
Couch W. J.
Deustua S.
Ellis R. S.
Fabbro S.
Filippenko A. V.
Fruchter A. S.
Goldhaber G.
Goobar A.
Groom D. E.
Hook I. M.
Irwin M.
Kim A. G.
Kim M. Y.
Knop R. A.
Lidman C.
Matheson T.
McMahon R. G.
Newberg H. J. M.
Nugent P.
Pain R.
Panagia N.
Pennypacker C. R.
Perlmutter S.
Ruiz-Lapuente P.
Schaefer B.
Walton N.
Publication date: 01/01/1998

This presentation reports on first evidence for a low-mass-density/positive-cosmological-constant universe that will expand forever, based on observations of a set of 40 high-redshift supernovae. The experimental strategy, data sets, and analysis techniques are described. More extensive analyses of these results with some additional methods and data are presented in the more recent LBNL report #41801 (Perlmutter et al., 1998; accepted for publication in Ap.J.), astro-ph/9812133. This Lawrence Berkeley National Laboratory reprint is a reduction of a poster presentation from the Cosmology Display Session #85 on 9 January 1998 at the American Astronomical Society meeting in Washington D.C. It is also available on the World Wide Web at http://supernova.LBL.gov/. This work has also been referenced in the literature by the pre-meeting abstract citation: Perlmutter et al., B.A.A.S., volume 29, page 1351 (1997).

Comment: 9 pages, 8 color figs. Presented at Jan '98 AAS Meeting, also cited as BAAS,29,1351(1997). Archived here in response to requests; see more extensive analyses in ApJ paper (astro-ph/9812133).

arXiv.org e-Print Archive

HAL-IN2P3

eScholarship - University of California

CERN Document Server

Hal-Diderot

What Web Template Extractor Should I Use? A Benchmarking and Comparison for Five Template Extractors

Author: Alarte Julián
Silva Josep
Tamarit Muñoz Salvador
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/04/2019

"© ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in PUBLICATION, {VOL 13, ISS 2, (APR 2019)} http://doi.acm.org/10.1145/3316810"

A Web template is a resource that implements the structure and format of a website, making it ready for plugging content into already formatted and prepared pages. For this reason, templates are one of the main development resources for website engineers, because they increase productivity. Templates are also useful for the final user, because they provide uniformity and a common look and feel for all webpages. However, from the point of view of crawlers and indexers, templates are an important problem, because they usually contain irrelevant information, such as advertisements, menus, and banners. Processing and storing this information wastes resources (storage space, bandwidth, etc.). It has been measured that templates represent between 40% and 50% of data on the Web. Therefore, identifying templates is essential for indexing tasks. Many techniques and tools exist for template extraction but, unfortunately, it is not at all clear which template extractor a user or system should use, because they have never been compared, and because they present different (complementary) features such as precision, recall, and efficiency. In this work, we implemented and evaluated five of the most advanced template extractors in the literature. To compare them, we implemented a workbench in which they have been integrated and evaluated. Thanks to this workbench, we can provide a fair empirical comparison of all methods using the same benchmarks, technology, implementation language, and evaluation criteria.

This work has been partially supported by the EU (FEDER) and the Spanish Ministerio de Ciencia, Innovación y Universidades/AEI under grant TIN2016-76843-C4-1-R and by the Generalitat Valenciana under grants PROMETEO-II/2015/013 (SmartLogic) and Prometeo/2019/098 (DeepTrust).

Alarte, J.; Silva, J.; Tamarit Muñoz, S. (2019). What Web Template Extractor Should I Use? A Benchmarking and Comparison for Five Template Extractors. ACM Transactions on the Web. 13(2):9:1-9:19. https://doi.org/10.1145/3316810

RiuNet

Summary health statistics for the U.S. population: National Health Interview Survey, 2005

Author: National Health Interview Survey (U.S.)
Publication venue: U.S. Dept. of Health and Human Services, Centers for Disease Control and Prevention, National Center for Health Statistics

"This report is one in a set of reports summarizing data from the 2005 National Health Interview Survey (NHIS), a multipurpose health survey conducted by the Centers for Disease Control and Prevention's (CDC) National Center for Health Statistics (NCHS). This report provides national estimates for a broad range of health measures for the U.S. civilian noninstitutionalized population. Two other reports in this year's set provide data on health measures for children and for adults. These three data reports are published for each year of NHIS, and they replace the annual, one-volume Current Estimates series." - p. 1

"By Patricia F. Adams, Achintya N. Dey, M.A., and Jackline L. Vickerie, M.G.A., Division of Health Interview Statistics" - p. 1

"January 2007."

Also available via the World Wide Web as an Acrobat .pdf file (2.44 MB, 113 p.). Includes bibliographical references (p. 8).

Suggested citation: Adams PF, Dey AN, Vickerie JL. Summary health statistics for the U.S. population: National Health Interview Survey, 2005. National Center for Health Statistics. Vital Health Stat 10(233). 2007.

CDC Stacks

An Emotional Analysis of False Information in Social Media and News Articles

Author: Ghanem Bilal Hisham Hasan
Rangel Francisco
Rosso Paolo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/05/2020

Fake news is risky because it is created to manipulate readers' opinions and beliefs. In this work, we compared the language of false news with that of real news from an emotional perspective, considering a set of false information types (propaganda, hoax, clickbait, and satire) from social media and online news article sources. Our experiments showed that false information has different emotional patterns in each of its types, and that emotions play a key role in deceiving the reader. Based on that, we proposed an emotionally-infused LSTM neural network model to detect false news.

The work of the second author was partially funded by the Spanish MICINN under the research project MISMISFAKEnHATE on Misinformation and Miscommunication in social media: FAKEnews and HATE speech (PGC2018-096212B-C31).

Ghanem, BHH.; Rosso, P.; Rangel, F. (2020). An Emotional Analysis of False Information in Social Media and News Articles. ACM Transactions on Internet Technology. 20(2):1-18. https://doi.org/10.1145/3381750

RiuNet

Distributed resource discovery using a context sensitive infrastructure

Author: Eliassen F.
Ferguson I.
Fongen A.
Stobart S.
Tait J.
Publication venue: 'IOS Press'
Publication date: 01/01/2001

Distributed Resource Discovery in a World Wide Web environment using full-text indices will never scale. The distinct properties of WWW information (volume, rate of change, topical diversity) limit the scalability of traditional approaches to distributed Resource Discovery. An approach combining metadata clustering and query routing can, on the other hand, be proven to scale much better. This paper presents the Content-Sensitive Infrastructure, a design that builds on these results. We also present an analytical framework for comparing the scalability of different distribution strategies.

CiteSeerX

University of Strathclyde Institutional Repository

Recommended from our members

Optimizing genetics online resources for diverse readers.

Author: Chang Jiyoo
Penon-Portmann Monica
Shieh Joseph T
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Purpose: Clear and accurate genetic information should be available to health-care consumers at an individualized level of comprehension. The objective of this study is to evaluate the complexity of common online resources and to simplify text content using automated text processing tools.

Methods: We extracted all text from Genetics Home Reference and MedlinePlus in bulk and analyzed content using natural language processing. We applied custom tools to improve the readability and compared readability before and after text optimization.

Results: Commonly used educational materials were more complex than the recommended reading level for the general public. Genetic health information entries from Genetics Home Reference (n = 1279) were written at a median 13.0 grade level. MedlinePlus entries, which are not exclusively genetic (n = 1030), had a median grade level of 7.7. When we optimized text for the 59 actionable conditions by prioritizing medical details using a standard structure, the average reading grade level improved.

Conclusion: Factors that increase complexity are long sentences and difficult words. Future strategies to reduce complexity include prioritizing relevant details and using more illustrations. Simplifying and providing standardized online health resources would benefit diverse consumers and promote inclusivity.
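As an illustration of how a reading grade level can depend on sentence length and word difficulty, here is a minimal Python sketch of the Flesch-Kincaid grade formula. This is an assumption for illustration only: the abstract does not say which readability metric or tools the authors used, and the names `fk_grade` and `estimate_counts` are hypothetical. The syllable count is a crude vowel-group heuristic, not a dictionary lookup.

```python
import re


def fk_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level computed from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def estimate_counts(text: str) -> tuple[int, int, int]:
    """Rough (words, sentences, syllables) counts for English text.

    Heuristics (assumptions, not the study's method):
    words are alphabetic runs, sentences are runs of . ! ?,
    and syllables are groups of consecutive vowels per word.
    """
    words = re.findall(r"[A-Za-z]+", text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return len(words), sentences, syllables
```

Under this formula, longer sentences (more words per sentence) and longer words (more syllables per word) both raise the grade level, which matches the two complexity factors named in the conclusion.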

eScholarship - University of California

The Options for UK Domestic Water Reduction: A Review

Author: McDonald A.
Parsons J.
Rees P.
Sim P.
Publication venue: The School of Geography, University of Leeds
Publication date: 01/08/2005

Demand pressure on UK water supplies is expected to increase in the next 20 years, driven by increasing population, new housing development and reducing household size. Regionally and locally, migration will also affect demand, particularly in the South-East. The water reduction trends that will have the greatest effect on UK consumption are:
1. For new homes: metering and new efficiencies in design and construction (e.g. low flush toilets, heating and plumbing efficiencies).
2. For established housing: metering and modern washing machines.

White Rose Research Online

A Model for Personalized Keyword Extraction from Web Pages using Segmentation

Author: Aghila G.
Kuppusamy K. S.
Publication venue: 'Foundation of Computer Science'
Publication date: 02/04/2012

The World Wide Web caters to the needs of billions of users in heterogeneous groups. Each user accessing the World Wide Web might have his or her own specific interests and would expect the web to respond to those specific requirements. Making the web react in a customized manner is achieved through personalization. This paper proposes a novel model for extracting keywords from a web page that incorporates personalization. The keyword extraction problem is approached with the help of web page segmentation, which makes the problem simpler and helps solve it effectively. The proposed model is implemented as a prototype, and the experiments conducted on it empirically validate the model's efficiency.

Comment: 6 Pages, 2 Figures

arXiv.org e-Print Archive

Crossref