Search CORE

15,976 research outputs found

The Hidden Web, XML and Semantic Web: A Scientific Data Management Perspective

Authors: Nayak Richi
Senellart Pierre
Suchanek Fabian
Varde Aparna
Publication date: 01/01/2011

The World Wide Web no longer consists of HTML pages alone. Our work sheds light on a number of trends on the Internet that go beyond simple Web pages. The hidden Web provides a wealth of data in semi-structured form, accessible through Web forms and Web services. These services, like numerous other applications on the Web, commonly use XML, the eXtensible Markup Language. XML has become the lingua franca of the Internet, allowing customized markup to be defined for specific domains. On top of XML, the Semantic Web is growing as a common structured data source. In this work, we first explain each of these developments in detail. Using real-world examples from scientific domains of great interest today, we then demonstrate how these developments can assist in managing, harvesting, and organizing data on the Web. Along the way, we also illustrate current research avenues in these domains. We believe that this effort will help bridge multiple database tracks, thereby attracting researchers who seek to extend database technology.
Comment: EDBT Tutorial (2011)

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

Montclair State University Digital Commons

INRIA a CCSD electronic archive server

Queensland University of Technology ePrints Archive

Hal-Diderot

HAL-Rennes 1

Deep Temporal-Recurrent-Replicated-Softmax for Topical Trends over Time

Authors: Andrassy Bernt
Gupta Pankaj
Rajaram Subburam
Schütze Hinrich
Publication date: 01/01/2018

Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model, the Recurrent Neural Network-Replicated Softmax Model (RNN-RSM), in which the topics discovered at each time step influence topic discovery in subsequent time steps. We account for the temporal ordering of documents by explicitly modeling a joint distribution of latent topical dependencies over time, using distributional estimators with temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP research, we demonstrate that, compared to state-of-the-art topic models, RNN-RSM shows better generalization, topic interpretation, evolution, and trends. We also introduce a metric, SPAN, to quantify the capability of a dynamic topic model to capture word evolution in topics over time.
Comment: In Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)

arXiv.org e-Print Archive

Crossref

Basic tasks of sentiment analysis

Authors: DM Blei
E Cambria
G Murray
G Qiu
GE Hinton
GW Taylor
H Tang
I Chaturvedi
L Oneto
R Collobert
R Ortega
S Branavan
S Poria
S Rill
T Wang
X Ding
Y Hu
Publication venue: Springer Science and Business Media LLC
Publication date: 17/10/2017

Subjectivity detection is the task of distinguishing objective sentences from subjective ones. Objective sentences are those that do not express any sentiment, so a sentiment analysis engine should identify and filter them out before further analysis of the subjective sentences, e.g., polarity detection. In subjective sentences, opinions can often be expressed on one or multiple topics. Aspect extraction is a subtask of sentiment analysis that consists of identifying opinion targets in opinionated text, i.e., detecting the specific aspects of a product or service that the opinion holder is praising or complaining about.

arXiv.org e-Print Archive

Crossref