Comparing SVM and Naive Bayes classifiers for text categorization with Wikitology as knowledge enrichment
Labeling documents according to their content is known as
text categorization. Many experiments have been carried out to enhance text
categorization by adding background knowledge to the document using knowledge
repositories such as WordNet, the Open Directory Project (ODP), Wikipedia and
Wikitology. In our previous work, we carried out intensive experiments by
extracting knowledge from Wikitology and evaluating it with a Support
Vector Machine under 10-fold cross-validation. The results clearly indicated
that Wikitology is far better than the other knowledge bases. In this paper we
compare Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers under
text enrichment through Wikitology. We validated the results with 10-fold
cross-validation and showed that NB gives an improvement of +28.78%, whereas
SVM gives an improvement of +6.36%, when compared with baseline results. The
Naïve Bayes classifier is the better choice when documents are enriched through
any external knowledge base.
Comment: 5 pages
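The 10-fold cross-validation protocol mentioned in this abstract can be sketched in plain Python as follows. This is a minimal illustration and not the authors' code: the names `k_fold_splits`, `cross_validate` and the `train_and_score` callback are hypothetical, and the shuffling seed is an arbitrary choice.

```python
import random
from typing import Callable, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_splits(n_items: int, k: int = 10, seed: int = 0) -> List[Split]:
    """Return k (train_indices, test_indices) pairs covering range(n_items)."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set: every index that is not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validate(
    n_items: int,
    train_and_score: Callable[[List[int], List[int]], float],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Average the score returned by train_and_score over the k folds."""
    scores = [
        train_and_score(train, test)
        for train, test in k_fold_splits(n_items, k, seed)
    ]
    return sum(scores) / len(scores)
```

Here `train_and_score` stands for any routine that fits a classifier (for example SVM or NB) on the training indices and returns its score on the test indices; running it once per classifier with the same seed keeps the folds identical, so the averaged scores are comparable.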
Context and Keyword Extraction in Plain Text Using a Graph Representation
Document indexing is an essential task performed by archivists or automatic
indexing tools. To retrieve the documents relevant to a query, the keywords
describing each document have to be chosen carefully. Archivists have to find out the
right topic of a document before starting to extract the keywords. For an
archivist indexing specialized documents, experience plays an important role.
But indexing documents on different topics is much harder. This article
proposes an innovative method for an indexing support system. This system takes
as input an ontology and a plain text document and provides as output
contextualized keywords of the document. The method has been evaluated by
exploiting Wikipedia's category links as a termino-ontological resource.
Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia
In this paper we investigate the nature and structure of the relation between
imposed classifications and real clustering in a particular case of a
scale-free network given by the on-line encyclopedia Wikipedia. We find a
statistical similarity in the distributions of community sizes both by using
the top-down approach of the categories division present in the archive and in
the bottom-up procedure of community detection given by an algorithm based on
the spectral properties of the graph. Regardless of the statistically similar
behaviour, the two methods provide a rather different division of the articles,
thereby signaling that the nature and presence of power laws is a general
feature of these systems and cannot be used as a benchmark to evaluate the
suitability of a clustering method.
Comment: 5 pages, 3 figures, epl2 style
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research.
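One way to picture the setting described in this abstract is as a table with one row per instance and one column per target, in which some entries are unknown and have to be predicted. The sketch below uses that picture for a trivial baseline. The table representation, the type alias `Matrix` and the function `column_mean_baseline` are assumptions made for illustration and are not taken from the paper; the baseline ignores all input features and all dependencies between targets, so it only serves as a point of comparison.

```python
from typing import List, Optional

# Rows are instances, columns are targets; None marks an unobserved entry.
Matrix = List[List[Optional[float]]]


def column_mean_baseline(scores: Matrix) -> List[List[float]]:
    """Fill each missing entry with the mean of the observed values of its target."""
    n_targets = len(scores[0])
    means = []
    for t in range(n_targets):
        observed = [row[t] for row in scores if row[t] is not None]
        # Fall back to 0.0 for a target that has no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [row[t] if row[t] is not None else means[t] for t in range(n_targets)]
        for row in scores
    ]
```

A method that exploits relations between targets would be expected to improve on this per-target baseline on the held-out entries.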