Search CORE

452 research outputs found

Identifying high-impact sub-structures for convolution kernels in document-level sentiment classification

Author: Foster Jennifer
He Yifan
Liu Qun
Shouxun Lin
Tu Zhaopeng
van Genabith Josef
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 11/07/2012

Convolution kernels support the modeling of complex syntactic information in machine-learning tasks. However, such models are highly sensitive to the type and size of the syntactic structure used. It is therefore an important challenge to automatically identify high-impact sub-structures relevant to a given task. In this paper we present a systematic study investigating (combinations of) sequence and convolution kernels using different types of sub-structures in document-level sentiment classification. We show that minimal sub-structures extracted from constituency and dependency trees, guided by a polarity lexicon, yield a 1.45-point absolute improvement in accuracy over a bag-of-words classifier on a widely used sentiment corpus.

Irish Universities

DCU Online Research Access Service

XML Schema Clustering with Semantic and Hierarchical Similarity Measures

Author: Iryadi Wina
Nayak Richi
Publication venue: 'Elsevier BV'
Publication date: 01/01/2007

With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic properties and context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.

Crossref

Queensland University of Technology ePrints Archive

Data Mining: Masa Lalu, Sekarang, dan Masa Mendatang (Data Mining: Past, Present, and Future)

Author: Purba R. (Ronsen)
Publication venue: None
Publication date: 01/01/2012

Data mining has become a discipline built within the domains of artificial intelligence (AI) and knowledge engineering (KE). It is rooted in machine learning and statistics, but it extends into other areas of computer science and into other fields such as biology, environmental science, finance, and networking. Data mining has received a great deal of attention over the last decade, as hardware advances have provided the computing power needed to process large volumes of data. Unlike other areas of study in AI and KE, data mining can arguably be regarded as an application rather than a technology, and it is therefore expected to remain a widely discussed topic in the future, given the exponential growth of data. This paper reviews the history of data mining, its current state, and some views on future developments.

Neliti

E-Jurnal Mikroskil (STMIK - STIE Mikroskil)

Semi-supervised co-clustering on attributed heterogeneous information networks

Author: FANG Yuan
JI Yugang
KONG Xiangnan
SHI Chuan
YIN Mingyang
Publication venue: 'Elsevier BV'
Publication date: 01/07/2020


Institutional Knowledge at Singapore Management University

Fast Distributed PageRank Computation

Author: Andersen
Anisur Rahaman Molla
Atish Das Sarma
Avrachenkov
Bahmani
Berkhin
Bianchini
Brin
Cook
Das Sarma
Eli Upfal
Gopal Pandurangan
Grolmusz
Iván
Langville
Mitzenmacher
Page
Perra
Sankaralingam
Shi
Wang
Publication venue: 'Elsevier BV'
Publication date: 25/11/2015

Over the last decade, PageRank has gained importance in a wide range of applications and domains, ever since it first proved to be effective in determining node importance in large graphs (and was a pioneering idea behind Google's search engine). In distributed computing alone, the PageRank vector, or more generally random-walk-based quantities, have been used for several different applications, ranging from determining important nodes, load balancing, and search to identifying connectivity structures. Surprisingly, however, there has been little work towards designing provably efficient fully distributed algorithms for computing PageRank. The difficulty is that traditional iterative methods in the style of matrix-vector multiplication may not always adapt well to the distributed setting, owing to communication bandwidth restrictions and convergence rates. In this paper, we present fast random-walk-based distributed algorithms for computing PageRank in general graphs and prove strong bounds on the round complexity. We first present a distributed algorithm that takes $O(\log n/\epsilon)$ rounds with high probability on any graph (directed or undirected), where $n$ is the network size and $\epsilon$ is the reset probability used in the PageRank computation (typically $\epsilon$ is a fixed constant). We then present a faster algorithm that takes $O(\sqrt{\log n}/\epsilon)$ rounds in undirected graphs. Both of the above algorithms are scalable, as each node sends only a small ($\mathrm{polylog}\, n$) number of bits over each edge per round. To the best of our knowledge, these are the first fully distributed algorithms for computing the PageRank vector with provably efficient running time.

Comment: 14 pages
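The random-walk view of PageRank mentioned in this abstract can be illustrated with a minimal, centralized Monte Carlo sketch. It is not the distributed algorithm from the paper: the function name `monte_carlo_pagerank`, the parameter `walks_per_node`, the adjacency-dictionary input, and the choice to end a walk at a node without out-edges are all assumptions made for illustration. Each walk stops with probability `eps` at every step, and a node's estimated PageRank is its share of all recorded visits.

```python
import random
from collections import Counter


def monte_carlo_pagerank(graph, eps=0.15, walks_per_node=200, seed=0):
    """Estimate PageRank from visit counts of terminating random walks.

    graph: dict mapping every node to a list of its out-neighbours
           (hypothetical input format; every node must appear as a key).
    eps:   reset probability; a walk stops with this probability per step.
    """
    rng = random.Random(seed)
    visits = Counter()
    for start in graph:
        for _ in range(walks_per_node):
            node = start
            while True:
                visits[node] += 1
                # Stop on reset, or at a node with no out-edges
                # (the latter is an assumption of this sketch).
                if rng.random() < eps or not graph[node]:
                    break
                node = rng.choice(graph[node])
    total = sum(visits.values())
    # Normalise by total visits so the estimates sum to 1.
    return {v: visits[v] / total for v in graph}
```

Since a walk survives each step with probability $1-\epsilon$, its expected length is about $1/\epsilon$ steps, and walks longer than order $\log n/\epsilon$ steps are unlikely. This is consistent with the role that $\epsilon$ and $\log n$ play in the round bounds quoted above, although the sketch itself makes no claim about round complexity.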

arXiv.org e-Print Archive

Crossref