Search CORE

1,029 research outputs found

Linking thesauri to the linked open data cloud for improved media retrieval

Author: Braeckman Karel; De Sutter Robbie; Debevere Pedro; Mannens Erik; Van de Walle Rik; Van Deursen Davy
Publication date: 01/01/2011

Ghent University Academic Bibliography

A Text Classifier Based on Sentence Category VSM

Author: Zhang Quan; Zhang Yun-Liang
Publication venue: Tsinghua University Press
Publication date: 01/10/2006

PACLIC 20 / Wuhan, China / 1-3 November, 200

Waseda University Repository

COMPARATIVE ANALYSIS OF PARTICLE SWARM OPTIMIZATION ALGORITHMS FOR TEXT FEATURE SELECTION

Author: Wu Shuang
Publication venue: SJSU ScholarWorks
Publication date: 13/05/2015

With the rapid growth of the Internet, more and more natural-language text documents are available in electronic format, making automated text categorization a must in most fields. Due to the high dimensionality of text categorization tasks, feature selection is needed before executing document classification. There are basically two kinds of feature selection approaches: the filter approach and the wrapper approach. The wrapper approach requires a search algorithm for feature subsets and an evaluation algorithm for assessing the fitness of the selected feature subset. In this work, I focus on the comparison between two wrapper approaches, both of which use Particle Swarm Optimization (PSO) as the search algorithm. The first is the PSO-based K-Nearest Neighbors (KNN) algorithm, while the second is the PSO-based Rocchio algorithm. Three datasets are used in this study. The results show that BPSO-KNN gives slightly better classification results than BPSO-Rocchio, while BPSO-Rocchio has a far shorter computation time than BPSO-KNN.

SJSU ScholarWorks

Experiment on Methods for Clustering and Categorization of Polish Text

Author: Dabrowska-Boruch Agnieszka; Fraczek Rafał; Jamro Ernest; Pietroń Marcin; Russek Paweł; Wiatr Kazimierz; Wielgosz Maciej
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 09/05/2017

The main goal of this work was to experimentally verify methods for the challenging task of categorizing and clustering Polish text. Supervised and unsupervised learning were employed for categorization and clustering, respectively. The employed methods were examined in depth on a custom-built corpus of Polish texts, which the authors assembled from Internet resources. The corpus data was acquired from a news portal and had therefore been sorted by type by journalists according to their specialization. The presented algorithms employ the Vector Space Model (VSM) and the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme. A series of experiments was conducted that revealed certain properties of the algorithms and their accuracy. The accuracy of the algorithms was evaluated in terms of their ability to match the human arrangement of the documents by topic. For both categorization and clustering, the authors used the F-measure to assess the quality of allocation.
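The VSM and TF-IDF weighting named in this abstract can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant the authors used, so the choices here (relative term frequency, idf = log(N / df), cosine similarity) are assumptions, and the three tiny documents are invented for illustration only.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy corpus (not from the paper): two local-government items, one sports item.
docs = [
    "rada miasta przyjmuje nowy plan".split(),
    "rada miasta zmienia plan transportu".split(),
    "klub wygrywa mecz w lidze".split(),
]
vecs = tf_idf_vectors(docs)
same_topic = cosine(vecs[0], vecs[1])
other_topic = cosine(vecs[0], vecs[2])
```

In this toy corpus the two local-government documents share weighted terms, so `same_topic` is larger than `other_topic`; a classifier or clustering algorithm would operate on such similarities.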

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

A hydrogen peroxide biosensor based on nanoparticle PANI/HRP electrode

Author: Khoo Kok Siong; Othman Siti Amira; Radiman Shahidan
Publication date: 01/01/2010

Recently, conducting polymers have attracted much interest in the development of biosensors. They contain a π-electron backbone responsible for their unusual electronic properties, such as electrical conductivity, low-energy optical transitions, low ionization potential and high electron affinity. When horseradish peroxidase (HRP) is immobilized on the conducting polymers, these polymers possess the ability to bind oppositely charged complex entities in their neutral insulating state. Determination of hydrogen peroxide (H2O2) and other organic peroxides is of practical importance in clinical, environmental and many other fields. This study examines the role and properties of the PANI/HRP layer towards H2O2 by measuring its current. The Langmuir-Blodgett technique was used to form the PANI monolayer, and the HRP was deposited on the PANI monolayer by electrodeposition. The UV-visible spectra of PANI with and without HRP show two sharp absorption peaks at 320 nm and 720 nm. VPSEM revealed that PANI forms as nanoparticles. AFM shows the surface roughness before and after the HRP was deposited on the PANI monolayer. The current and response of the PANI/HRP electrode towards H2O2 increase, demonstrating effective electrocatalytic reduction of H2O2. The PANI/HRP electrode not only acts as an excellent material for rapid electron transfer but is also suitable for the fabrication of efficient biosensors.

UTHM Institutional Repository

Human-Level Performance on Word Analogy Questions by Latent Relational Analysis

Author: Turney Peter D.
Publication date: 01/01/2004

This paper introduces Latent Relational Analysis (LRA), a method for measuring relational similarity. LRA has potential applications in many areas, including information extraction, word sense disambiguation, machine translation, and information retrieval. Relational similarity is correspondence between relations, in contrast with attributional similarity, which is correspondence between attributes. When two words have a high degree of attributional similarity, we call them synonyms. When two pairs of words have a high degree of relational similarity, we say that their relations are analogous. For example, the word pair mason/stone is analogous to the pair carpenter/wood; the relations between mason and stone are highly similar to the relations between carpenter and wood. Past work on semantic similarity measures has mainly been concerned with attributional similarity. For instance, Latent Semantic Analysis (LSA) can measure the degree of similarity between two words, but not between two relations. Recently the Vector Space Model (VSM) of information retrieval has been adapted to the task of measuring relational similarity, achieving a score of 47% on a collection of 374 college-level multiple-choice word analogy questions. In the VSM approach, the relation between a pair of words is characterized by a vector of frequencies of predefined patterns in a large corpus. LRA extends the VSM approach in three ways: (1) the patterns are derived automatically from the corpus (they are not predefined), (2) the Singular Value Decomposition (SVD) is used to smooth the frequency data (it is also used this way in LSA), and (3) automatically generated synonyms are used to explore reformulations of the word pairs. LRA achieves 56% on the 374 analogy questions, statistically equivalent to the average human score of 57%. On the related problem of classifying noun-modifier relations, LRA achieves similar gains over the VSM, while using a smaller corpus
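The VSM baseline described in this abstract, where each word pair is represented by a vector of pattern frequencies and analogies are scored by vector similarity, can be sketched as follows. This is only an illustration of that baseline, not of LRA itself: it omits automatic pattern derivation, SVD smoothing and synonym reformulation. The patterns and all frequency counts are hypothetical values invented for the example, and cosine similarity is assumed as the comparison measure.

```python
import math

# Hypothetical predefined patterns; X and Y stand for the two words of a pair.
PATTERNS = ["X cuts Y", "X works with Y", "X eats Y"]

# Hypothetical corpus frequencies of each pattern for each word pair (invented numbers).
RELATION_VECTORS = {
    ("mason", "stone"): [12, 30, 0],
    ("carpenter", "wood"): [15, 28, 0],
    ("cat", "mouse"): [0, 1, 22],
}


def cosine(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_analogy(stem, choices):
    """Pick the choice pair whose relation vector is most similar to the stem pair's."""
    return max(
        choices,
        key=lambda pair: cosine(RELATION_VECTORS[stem], RELATION_VECTORS[pair]),
    )


answer = best_analogy(
    ("mason", "stone"),
    [("carpenter", "wood"), ("cat", "mouse")],
)
```

With these invented counts, `answer` is the pair carpenter/wood, matching the analogy given in the abstract.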

arXiv.org e-Print Archive

NRC Publications Archive

Blog Analysis with Fuzzy TFIDF

Author: Ho Chi-Shu
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2007

These days blogs are becoming increasingly popular because they allow anyone to share personal diaries, opinions, and comments on the World Wide Web. Many blogs contain valuable information, but it is difficult to extract this information from a large number of blog comments. The goal is to analyze a large number of blog comments by clustering them into smaller groups according to their similarity, based on keyword relevance. TF-IDF weighting has been used to classify documents by measuring how frequently each keyword appears in a document, but it is not effective at differentiating semantic similarities between words. By applying fuzzy semantics to TF-IDF, TF-IDF becomes fuzzy TF-IDF and gains the ability to rank semantic relevancy. Fuzzy VSM can be effective in exploring hidden relationships between blog comments by adapting fuzzy TF-IDF and fuzzy semantics to extend the Vector Space Model to a fuzzy VSM. Therefore, fuzzy VSM can cluster a large number of blog comments into a small number of groups based on document similarity and semantic relevancy.

SJSU ScholarWorks