    Context and Keyword Extraction in Plain Text Using a Graph Representation

    Document indexing is an essential task performed by archivists or by automatic indexing tools. For relevant documents to be retrieved in response to a query, the keywords describing each document have to be chosen carefully. Archivists must first identify the topic of a document before they start extracting its keywords. When an archivist indexes documents in a specialized field, experience plays an important role, but indexing documents on many different topics is much harder. This article proposes a new method for an indexing support system. The system takes an ontology and a plain-text document as input and returns contextualized keywords for the document. The method has been evaluated using Wikipedia's category links as a termino-ontological resource.
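
    The abstract does not spell out the algorithm, so the following is only a hypothetical sketch of one way to derive a document's context from a category graph and keep the keywords attached to that context. The miniature graph (TERM_CATEGORIES, CATEGORY_PARENTS), the depth limit and the 1/depth weighting are illustrative assumptions. They are not the authors' method or real Wikipedia data.

```python
from collections import Counter, deque

# Hypothetical miniature termino-ontological resource, loosely modeled on
# Wikipedia category links: term -> categories, category -> parent categories.
TERM_CATEGORIES = {
    "index": ["Information retrieval"],
    "query": ["Information retrieval"],
    "keyword": ["Information retrieval", "Linguistics"],
    "archivist": ["Archival science"],
}
CATEGORY_PARENTS = {
    "Information retrieval": ["Computer science"],
    "Archival science": ["Library science"],
    "Linguistics": [],
    "Computer science": [],
    "Library science": [],
}


def reachable_categories(term, max_depth=2):
    """Breadth-first walk from a term up the category graph.

    Returns {category: shortest number of links from the term}, up to max_depth.
    """
    dist = {}
    queue = deque((cat, 1) for cat in TERM_CATEGORIES.get(term, []))
    while queue:
        cat, depth = queue.popleft()
        if cat in dist or depth > max_depth:
            continue
        dist[cat] = depth  # BFS order guarantees the first visit is the shortest
        for parent in CATEGORY_PARENTS.get(cat, []):
            queue.append((parent, depth + 1))
    return dist


def contextualized_keywords(tokens, max_depth=2):
    """Choose the best-supported category as the document's context and
    return it together with the document terms that reach it."""
    terms = sorted({t for t in tokens if t in TERM_CATEGORIES})
    reach = {t: reachable_categories(t, max_depth) for t in terms}
    support = Counter()
    for dist in reach.values():
        for cat, depth in dist.items():
            support[cat] += 1.0 / depth  # nearer categories weigh more
    if not support:
        return None, []
    # Highest support wins; ties are broken alphabetically for determinism.
    context = min(support, key=lambda c: (-support[c], c))
    return context, [t for t in terms if context in reach[t]]
```

    For the sentence "the archivist builds an index so a query matches each keyword", the terms index, keyword and query all link directly to "Information retrieval", which becomes the context. The term archivist is left out. Weighting by 1/depth keeps broad parent categories such as "Computer science" from tying with the more specific category.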

    Context Based Indexing On Synonym System Using Hierarchical Clustering In Web Mining

    Nowadays, the World Wide Web holds a large amount of information that grows day by day, and this growth calls for an efficient and effective indexing structure. Indexing has become a major issue in improving the performance of web search engines, so that the most relevant web documents are retrieved in the minimum possible time. To this end, a new indexing mechanism for search engines is proposed. It indexes the synonym terms of web documents, that is, terms that appear in multiple contexts but carry the same meaning. Indexing is performed with a hierarchical clustering method: documents with similar terms are grouped into the same cluster, and these clusters are then combined into a mega cluster on the basis of a synonym term. By exploiting the similarity between clusters and forming several levels of hierarchy, the mechanism optimizes the search process and ultimately gives the user fast and relevant retrieval of web documents.
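
    As a rough illustration of the two-level structure described above, the sketch below groups documents by surface term and then merges those groups under a shared synonym head. A query for any synonym then reaches the whole mega cluster. The SYNONYMS dictionary is hypothetical, and terms outside it are simply not indexed. This fixed two-level grouping is simpler than the similarity-based hierarchical clustering the abstract refers to and only stands in for it.

```python
from collections import defaultdict

# Hypothetical synonym dictionary: every surface term maps to a canonical head.
SYNONYMS = {
    "car": "automobile", "auto": "automobile", "automobile": "automobile",
    "film": "movie", "movie": "movie", "picture": "movie",
}


def build_synonym_index(docs):
    """Two-level index over {doc_id: text}.

    Level 1: a cluster of documents per surface term.
    Level 2: those term clusters grouped into one mega cluster per synonym head.
    """
    index = defaultdict(lambda: defaultdict(set))  # head -> term -> doc ids
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            head = SYNONYMS.get(term)
            if head is not None:  # terms outside the dictionary are not indexed
                index[head][term].add(doc_id)
    return index


def search(index, query_term):
    """Map the query to its synonym head and return every document id in that mega cluster."""
    head = SYNONYMS.get(query_term.lower(), query_term.lower())
    mega_cluster = index.get(head, {})  # dict.get does not create a default entry
    return sorted(set().union(*mega_cluster.values()))
```

    Keeping the term-level clusters inside each mega cluster preserves which surface form matched. A ranking step could use this to prefer exact matches over synonyms.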

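    Conception d'un outil d'aide à l'indexation de ressources pédagogiques - Extraction automatique des thématiques et des mots-clefs de documents UNIT (Design of an Indexing Support Tool for Educational Resources: Automatic Extraction of Topics and Keywords from UNIT Documents)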

    Indexing learning documents with Learning Object Metadata (LOM) is often done manually by archivists. Filling out the LOM fields is a long and difficult task that requires reading the whole document and having full knowledge of the topic it covers. In this paper, we present a new model and method that help archivists find the important concepts and keywords of a learning document. The method is applied using Wikipedia's category links.

    Algoritmos de aprendizaje automático no supervisado para la extracción de palabras clave en trabajos de investigación de pregrado (Unsupervised Machine Learning Algorithms for Keyword Extraction in Undergraduate Research Papers)

    In recent years, the amount of information managed by the Universidad Nacional del Altiplano de Puno has grown, above all research papers written by undergraduate students and graduates. Keywords for these papers are selected with empirical techniques, even though technical methods that can help with this process exist today. Meanwhile, information and communication technologies have become important for managing and tracking research papers, for example through the Plataforma de Investigación Integrada a la Labor Académica con Responsabilidad (PILAR), which records information on research projects (title, abstract, keywords) across their different modalities. In this work, 7,430 research project records were analyzed, and predictions were made on them with each of the 9 unsupervised machine learning models implemented. The results show that the TF-IDF model is the most efficient in both time and keyword extraction precision, reaching 72% precision with an extraction time of 0.4786 (SD 0.0501) per document processed. [Thesis]
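
    A minimal pure-Python sketch of TF-IDF keyword ranking, the model the study found most effective, follows. It uses one common variant: term frequency normalized by document length, with idf = log(N/df). It also uses a letters-only tokenizer. The abstract does not give the thesis's actual preprocessing, weighting or stop-word list, so these choices are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of Unicode letters (digits and '_' dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def tfidf_keywords(docs, top_k=3, stopwords=frozenset()):
    """Return the top_k terms of each document, ranked by TF-IDF.

    tf = count / document length, idf = log(N / df): one common variant.
    """
    tokenized = [[t for t in tokenize(d) if t not in stopwords] for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per document
    results = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1
        scores = {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

    With this idf variant, a term that occurs in every document scores zero and is ranked last. In practice, a stop-word list for the corpus language (Spanish, for the PILAR records) would be passed through `stopwords`.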