Context and Keyword Extraction in Plain Text Using a Graph Representation
Document indexing is an essential task performed by archivists or automatic
indexing tools. To retrieve the documents relevant to a query, the keywords
describing each document have to be carefully chosen. Archivists have to find out the
right topic of a document before starting to extract the keywords. For an
archivist indexing specialized documents, experience plays an important role.
But indexing documents on different topics is much harder. This article
proposes an innovative method for an indexing support system. This system takes
as input an ontology and a plain text document and provides as output
contextualized keywords of the document. The method has been evaluated by
exploiting Wikipedia's category links as a termino-ontological resource.
Context Based Indexing On Synonym System Using Hierarchical Clustering In Web Mining
Nowadays, the World Wide Web is a large collection of information that grows day by day. This growth calls for an efficient and effective indexing structure. Indexing has become a major issue in improving the performance of Web search engines, so that the most relevant web documents are retrieved in the minimum possible time. To this end, a new indexing mechanism for search engines is proposed, based on indexing the synonym terms of web documents, that is, terms that appear in multiple contexts with the same meaning. The indexing is performed on the basis of a hierarchical clustering method, which groups documents with similar terms into the same cluster; these clusters are then combined into mega clusters on the basis of synonym terms. Using the similarity between clusters, the method optimizes the search process by forming different levels of hierarchy. As a result, it provides fast and relevant retrieval of web documents for the user.
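The clustering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym table `SYNONYMS`, the Jaccard similarity measure, the merge `threshold`, and all function names are assumptions made for illustration. Terms are first mapped to a canonical synonym, then the most similar clusters are merged bottom-up (agglomerative clustering) until no pair is similar enough.

```python
# Hypothetical synonym table: maps a term to a canonical representative.
SYNONYMS = {"car": "automobile", "auto": "automobile", "film": "movie"}


def canonical_terms(text):
    """Return the set of terms in text, with synonyms collapsed."""
    return {SYNONYMS.get(word, word) for word in text.lower().split()}


def jaccard(a, b):
    """Jaccard similarity between two term sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.3):
    """Greedy agglomerative clustering of documents by shared canonical terms."""
    # Each cluster is (document indices, union of the members' canonical terms).
    clusters = [([i], canonical_terms(d)) for i, d in enumerate(docs)]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = jaccard(clusters[i][1], clusters[j][1])
                if score > best:
                    best, pair = score, (i, j)
        # Stop when no remaining pair is similar enough to merge.
        if pair is None or best < threshold:
            break
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(ids) for ids, _ in clusters]
```

With the documents "car repair guide", "auto repair manual" and "film review", the first two land in one cluster because "car" and "auto" collapse to the same canonical term, while the third stays on its own. Recording the order of merges would give the levels of hierarchy the abstract mentions.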
Conception d'un outil d'aide à l'indexation de ressources pédagogiques - Extraction automatique des thématiques et des mots-clefs de documents UNIT
Indexing learning documents using the Learning Object Metadata (LOM) is often
carried out manually by archivists. Filling out the LOM fields is a long and
difficult task, requiring a complete reading and full knowledge of the topic
dealt with in the document. In this paper, we present an innovative model and
method to assist the archivists in finding the important concepts and keywords
of a learning document. The application is performed using Wikipedia's category
links.
Algoritmos de aprendizaje automático no supervisado para la extracción de palabras clave en trabajos de investigación de pregrado
The volume of information managed by the Universidad Nacional del Altiplano de Puno has grown in recent years, especially research work by undergraduate students and graduates. Keywords for this work are selected using empirical techniques, even though technical methods that can help with this process exist to date. Meanwhile, information and communication technologies have gained relevance and importance in the administration and monitoring of research work, as with the Plataforma de Investigación Integrada a la Labor Académica con Responsabilidad (PILAR), which records information on research projects (title, abstract, keywords) in their different modalities. In this research, 7430 research project records were analyzed, and predictions were made on them with each of the 9 unsupervised machine learning models implemented. The results show that the TF-IDF model is the most efficient in both time and keyword extraction precision, achieving 72 % precision and an extraction time of 0.4786 (SD 0.0501) for each document processed by this model.
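As a rough illustration of TF-IDF keyword extraction, the following minimal sketch ranks the terms of each document by term frequency times inverse document frequency. It is not the thesis implementation: the tokenizer, the unsmoothed `log(N / df)` weighting, the `top_k` parameter, and the function names are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

Terms that occur in every document, such as "the", receive a score of zero and drop to the bottom of the ranking, which is why TF-IDF needs no labelled training data to pick out distinctive words.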