Search CORE

26,410 research outputs found

Web Data Extraction, Applications and Techniques: A Survey

Author: Emilio Ferrara
Pasquale De Meo
Giacomo Fiumara
Robert Baumgartner
Publication venue: 'Elsevier BV'
Publication date: 09/06/2014

Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, and this offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain in other domains.
Comment: Knowledge-based System

arXiv.org e-Print Archive

Crossref

Ontology Driven Web Extraction from Semi-structured and Unstructured Data for B2B Market Analysis

Author: Darlington John
Imtiaz Hazzaz
Zuo Landong
Publication date: 01/09/2009

The Market Blended Insight project aims to improve UK business-to-business marketing performance using semantic web technologies. In this project, we are implementing an ontology-driven web extraction and translation framework to supplement our backend triple store of UK companies, people and geographical information. It handles both semi-structured data and unstructured text on the web, annotating the extracted data and then translating it according to the backend schema.

Southampton (e-Prints Soton)

A literature survey of methods for analysis of subjective language

Author: Täckström Oscar
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/2009

Subjective language is used to express attitudes and opinions towards things, ideas and people. While content and topic centred natural language processing is now part of everyday life, analysis of subjective aspects of natural language has until recently been largely neglected by the research community. The explosive growth of personal blogs, consumer opinion sites and social network applications in recent years has, however, created increased interest in subjective language analysis. This paper provides an overview of recent research conducted in the area.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Sentiment Analysis Using Collaborated Opinion Mining

Author: Malhotra Vikrant
Tyagi Ridhi
Virmani Deepali
Publication date: 12/01/2014

Opinion mining and sentiment analysis have emerged as a field of study since the widespread adoption of the World Wide Web and the Internet. Opinion mining refers to the extraction of those lines or phrases in large volumes of raw data that express an opinion. Sentiment analysis, on the other hand, identifies the polarity of the opinion being extracted. In this paper we propose sentiment analysis in collaboration with opinion extraction, summarization, and tracking of student records. The paper modifies the existing algorithm in order to obtain the collaborated opinion about the students. The resultant opinion is represented as very high, high, moderate, low or very low. The paper is based on a case study in which teachers give their remarks about the students; by applying the proposed sentiment analysis algorithm, the opinion is extracted and represented.
Comment: 5 pages, 6 figures

arXiv.org e-Print Archive

CiteSeerX

Investigating people: a qualitative analysis of the search behaviours of open-source intelligence analysts

Author: Finch E.
Guha R.
Hanbury A.
Holland B. R.
Liu C.
Pirolli P.
Zipf G.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

The Internet and the World Wide Web have become integral parts of the lives of many modern individuals, enabling almost instantaneous communication, sharing and broadcasting of thoughts, feelings and opinions. Much of this information is publicly facing, and as such, it can be utilised in a multitude of online investigations, ranging from employee vetting and credit checking to counter-terrorism and fraud prevention/detection. However, the search needs and behaviours of these investigators are not well documented in the literature. In order to address this gap, an in-depth qualitative study was carried out in cooperation with a leading investigation company. The research contribution is an initial identification of Open-Source Intelligence investigator search behaviours, the procedures and practices that they undertake, along with an overview of the difficulties and challenges that they encounter as part of their domain. This lays the foundation for future research into the varied domain of Open-Source Intelligence gathering.

Crossref

Enlighten

A unified view of data-intensive flows in business intelligence systems : a survey

Author: Abelló Gamazo Alberto
Jovanovic Petar
Romero Moral Óscar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing the challenges that are still to be addressed and how current solutions can be applied to address them.
Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Research Directions, Challenges and Issues in Opinion Mining

Author: Hariharan Shanmugasundaram
Lu Joan
Sudhakaran Periakaruppan
Publication venue: 'Science and Engineering Research Support Society'
Publication date: 01/01/2013

The rapid growth of the Internet and the availability of user reviews on the web for any product have created a need for an effective system to analyze web reviews. Such reviews are useful, to some extent, to both customers and product manufacturers. For any popular product, the number of reviews can be in the hundreds or even thousands. This makes it difficult for a customer to analyze them and make important decisions on whether or not to purchase the product. Mining such product reviews or opinions is termed opinion mining, which is broadly classified into two main categories, namely facts and opinions. Though there are several approaches to opinion mining, deciding on the recommendation provided by the system remains a challenge. In this paper, we analyze the basics of opinion mining and the challenges, pros and cons of past opinion mining systems, and provide some directions for future research work, focusing on the challenges and issues.

Crossref

University of Huddersfield Repository

THE OPTIMIZATION OF THE INTERNAL AND EXTERNAL REPORTING IN FINANCIAL ACCOUNTING: ADOPTING XBRL INTERNATIONAL STANDARD

Author: Catalin Georgel Tudor
Vasile Florescu

More and more enterprises, especially listed companies, have adopted new accounting norms and regulations (IFRS or US GAAP, Basel II and, in perspective, SURFI), showing interest in publishing financial reports using a standard format able to considerably improve their communication, data collection in the receiving units, and control and analysis of financial information. When switching to the new accounting rules specified in international or regional standards and norms, regulatory and control bodies recommend the XBRL format for financial reporting, with recognition of the regional jurisdiction. Our paper reviews the literature, presents the XBRL-specific elements and proposes possible solutions for the internal and external financial reporting of an enterprise. Finally, it concludes on the benefits of adopting XBRL at the national level in a potential XBRL Romania project.
Keywords: accounting norms, financial reporting, XBRL, taxonomy, XBRL jurisdiction.

Research Papers in Economics