12 research outputs found

    Web Data Extraction, Applications and Techniques: A Survey

    Full text link
    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow researchers to gather large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, offering unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed to work in a given domain in other domains. Comment: Knowledge-Based Systems

    How comprehensive is the PubMed Central Open Access full-text database?

    Get PDF
    The comprehensiveness of a database is a prerequisite for the quality of scientific work built on this increasingly significant infrastructure. This is especially so for large-scale text-mining analyses of scientific publications facilitated by open-access full-text scientific databases. Given the lack of research concerning the comprehensiveness of this type of academic resource, we conducted a project to analyze the coverage of materials in the PubMed Central Open Access Subset (PMCOAS), a popular source for open-access scientific publications, in terms of the PubMed database. The preliminary results show that PMCOAS coverage has increased rapidly in recent years, despite vast differences across MeSH descriptors.
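The coverage analysis described above reduces, at its core, to a per-descriptor ratio of PMCOAS records to PubMed records. A minimal sketch of that computation, using made-up placeholder counts rather than the study's actual figures:

```python
# Illustrative coverage computation: fraction of PubMed records for a MeSH
# descriptor that also appear in the PMC Open Access Subset. The counts
# below are made-up placeholders, not the study's data.
def coverage(pmcoas_count: int, pubmed_count: int) -> float:
    """Return PMCOAS coverage as a fraction of PubMed records."""
    return pmcoas_count / pubmed_count if pubmed_count else 0.0

# Hypothetical per-descriptor counts: (PMCOAS, PubMed)
counts = {"Neoplasms": (120_000, 400_000), "Humans": (900_000, 9_000_000)}
for mesh, (oa, pm) in counts.items():
    print(f"{mesh}: {coverage(oa, pm):.1%}")
```

Real counts could be obtained by querying the two databases per descriptor, which is how vast differences across descriptors would surface.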

    Adverse Drug Event Detection, Causality Inference, Patient Communication and Translational Research

    Get PDF
    Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all the adverse events that occur in hospitalized patients. ADEs have been shown to increase the cost of health care and the length of hospital stays. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce costs in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and FDA Adverse Event Reporting System reports. The ADEtector system employs novel natural language processing approaches for ADE detection and provides a user interface to display ADE information. The ADEtector employs machine learning techniques to automatically process narrative text and identify the adverse event (AE) and medication entities that appear in it. The system analyzes the recognized entities to infer causal relations between AEs and medications by automating the elements of the Naranjo score using knowledge- and rule-based approaches. The Naranjo Adverse Drug Reaction Probability Scale is a validated tool for assessing the causality of a drug-induced adverse event or ADE. The scale calculates the likelihood that an adverse event is related to a drug based on a list of weighted questions. The ADEtector also presents the user with evidence for ADEs by extracting figures that contain ADE-related information from the biomedical literature. A brief summary is generated for each extracted figure to help users comprehend it and the ADE information it conveys.
The ADEtector also helps patients better understand narrative text by recognizing complex medical jargon and abbreviations that appear in it and providing definitions and explanations from external knowledge resources. This system could help clinicians and researchers discover novel ADEs and drug relations and hypothesize new research questions within the ADE domain.
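The Naranjo scale referred to above is a published, fixed questionnaire, so its scoring step can be sketched directly. The weights and interpretation thresholds below follow the published scale (Naranjo et al., 1981); the function and variable names are illustrative and not taken from ADEtector:

```python
# Sketch of the Naranjo Adverse Drug Reaction Probability Scale scoring.
# Weights and thresholds follow the published scale; names are hypothetical.

# (yes, no, unknown) points for each of the ten questions
NARANJO_WEIGHTS = [
    (+1,  0, 0),  # 1. Previous conclusive reports on this reaction?
    (+2, -1, 0),  # 2. Did the event appear after the drug was given?
    (+1,  0, 0),  # 3. Did it improve when the drug was stopped?
    (+2, -1, 0),  # 4. Did it reappear on readministration?
    (-1, +2, 0),  # 5. Are there alternative causes for the event?
    (-1, +1, 0),  # 6. Did it reappear with placebo?
    (+1,  0, 0),  # 7. Was the drug detected at a toxic concentration?
    (+1,  0, 0),  # 8. Was the reaction dose-dependent?
    (+1,  0, 0),  # 9. Similar reaction to the same/similar drugs before?
    (+1,  0, 0),  # 10. Confirmed by any objective evidence?
]

ANSWER_INDEX = {"yes": 0, "no": 1, "unknown": 2}

def naranjo_score(answers):
    """answers: list of ten strings ('yes' / 'no' / 'unknown')."""
    return sum(NARANJO_WEIGHTS[i][ANSWER_INDEX[a]] for i, a in enumerate(answers))

def causality(score):
    """Map a total score to the scale's causality category."""
    if score >= 9:
        return "definite"
    if score >= 5:
        return "probable"
    if score >= 1:
        return "possible"
    return "doubtful"

answers = ["yes", "yes", "yes", "unknown", "no", "unknown",
           "unknown", "yes", "no", "yes"]
score = naranjo_score(answers)
print(score, causality(score))
```

Automating the scale, as the dissertation describes, amounts to answering these ten questions from the clinical narrative with NLP rather than by hand; the arithmetic itself is this simple.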

    Mapping Scholarly Communication Infrastructure: A Bibliographic Scan of Digital Scholarly Communication Infrastructure

    Get PDF
    This bibliography scan covers a lot of ground. In it, I have attempted to capture relevant recent literature across the whole of the digital scholarly communications infrastructure. I have used that literature to identify significant projects and then document them with descriptions and basic information. Structurally, this review has three parts. In the first, I begin with a diagram showing the way the projects reviewed fit into the research workflow; then I cover a number of topics and functional areas related to digital scholarly communication. I make no attempt to be comprehensive, especially regarding the technical literature; rather, I have tried to identify major articles and reports, particularly those addressing the library community. The second part of this review is a list of projects or programs arranged by broad functional categories. The third part lists individual projects and the organizations—both commercial and nonprofit—that support them. I have identified 206 projects. Of these, 139 are nonprofit and 67 are commercial. There are 17 organizations that support multiple projects, and six of these—Artefactual Systems, Atypon/Wiley, Clarivate Analytics, Digital Science, Elsevier, and MDPI—are commercial. The remaining 11—Center for Open Science, Collaborative Knowledge Foundation (Coko), LYRASIS/DuraSpace, Educopia Institute, Internet Archive, JISC, OCLC, OpenAIRE, Open Access Button, Our Research (formerly Impactstory), and the Public Knowledge Project—are nonprofit. Andrew W. Mellon Foundation.

    A General Architecture to Enhance Wiki Systems with Natural Language Processing Techniques

    Get PDF
    Wikis are web-based software applications that allow users to collaboratively create and edit web page content through a Web browser using a simplified syntax. The ease-of-use and “open” philosophy of wikis has brought them to the attention of organizations and online communities, leading to widespread adoption as a simple and “quick” way of collaborative knowledge management. However, these characteristics of wiki systems can act as a double-edged sword: When wiki content is not properly structured, it can turn into a “tangle of links”, making navigation, organization and content retrieval difficult for end-users. Since wiki content is mostly written in unstructured natural language, we believe that existing state-of-the-art techniques from the Natural Language Processing (NLP) and Semantic Computing domains can help mitigate these common problems when using wikis and improve their users’ experience by introducing new features. The challenge, however, is to find a solution for integrating novel semantic analysis algorithms into the multitude of existing wiki systems, without the need to modify their engines. In this research work, we present a general architecture that allows wiki systems to benefit from NLP services made available through the Semantic Assistants framework – a service-oriented architecture for brokering NLP pipelines as web services. Our main contributions in this thesis include an analysis of wiki engines, the development of collaboration patterns between wikis and NLP, and the design of a cohesive integration architecture. As a concrete application, we deployed our integration to MediaWiki – the powerful wiki engine behind Wikipedia – to prove its practicability.
Finally, we evaluate the usability and efficiency of our integration through a number of user studies performed in real-world projects from various domains, including cultural heritage data management, software requirements engineering, and biomedical literature curation.

    B!SON: A Tool for Open Access Journal Recommendation

    Get PDF
    Finding a suitable open access journal to publish scientific work is a complex task: Researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders’ conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on the title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project.

    Figure summarizer browser extensions for PubMed Central

    No full text
    Summary: Figures in biomedical articles present visual evidence for research facts and help readers understand the article better. However, when figures are taken out of context, it is difficult to understand their content. We developed a summarization algorithm to summarize the content of figures and used it in our figure search engine (http://figuresearch.askhermes.org/). In this article, we report on the development of web browser extensions for Mozilla Firefox, Google Chrome and Apple Safari to display summaries for figures in PubMed Central and NCBI Images.

    Knowledge Management, Trust and Communication in the Era of Social Media

    Get PDF
    The article entitled "Selected Aspects of Evaluating Knowledge Management Quality in Contemporary Enterprises" broadens the understanding of knowledge management and evaluates selected aspects of knowledge management quality in modern enterprises from theoretical and practical perspectives. The seventh article presents the results of pilot studies on the involvement of the four largest Information and Communication Technology (ICT) companies in promoting the Sustainable Development Goals (SDGs) through social media, examining which communication strategies the companies use. The primary purpose of the eighth article is to present the relationship between trust and knowledge sharing, taking into account the importance of this issue for business efficiency. The results showed that trust is vital in sharing knowledge and essential in achieving a high level of performance efficiency. The ninth article presents the impact of social media on consumer choices in tourism and the specificity of tourist products. The study's main purpose was to identify the social media most commonly used by Generation Y in selecting a tourist destination and planning their travel. The tenth article aims to identify the most important purposes for which women use social media, according to respondents' age and their countries' level of economic development. The research was conducted through an online survey in 2017–2018, followed by an analysis of the results from eight countries. The article entitled "Integrated Question-Answering System for Natural Disaster Domains Based on Social Media Messages Posted at the Time of Disaster" presents the framework of a question-answering system developed using a Twitter dataset of more than 9 million tweets compiled during the Osaka North Earthquake of 18 June 2018.
The authors also study the structure of the questions posed and develop methods for classifying them into particular categories to find answers from the dataset using an ontology, word similarity, keyword frequency, and natural language processing. The book provides a theoretical and practical background on trust, knowledge management, and communication in the era of social media. The editor believes that this collection of articles will be relevant to professionals, researchers, and students. The authors diagnose the current situation and point out new challenges and future directions in this area.
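The keyword-frequency classification step mentioned above can be illustrated with a minimal sketch. The category names and keyword lists here are invented for demonstration; the paper's ontology- and similarity-based matching is not reproduced:

```python
# Illustrative keyword-based classifier for disaster-related questions.
# Categories and keyword sets are invented, not taken from the paper.
CATEGORY_KEYWORDS = {
    "transport": {"train", "subway", "bus", "running", "line"},
    "utilities": {"power", "electricity", "gas", "water", "outage"},
    "safety":    {"aftershock", "evacuation", "shelter", "safe"},
}

def classify(question: str) -> str:
    """Assign the category whose keywords overlap the question the most."""
    words = set(question.lower().split())
    scores = {cat: len(words & kw) for cat, kw in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

print(classify("is the subway line running again"))  # transport
print(classify("when will power be restored"))       # utilities
```

The paper's actual pipeline would then retrieve candidate answers from the tweet dataset within the predicted category, using word similarity and an ontology to rank them.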

    Preface

    Get PDF

    The Future of Information Sciences : INFuture2015 : e-Institutions – Openness, Accessibility, and Preservation

    Get PDF