Scientific Documents clustering based on Text Summarization

Author: Amoli Pedram Vahdani
Sojoodi Sh. Omid
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/08/2015

In this paper, a novel method is proposed for scientific document clustering. The proposed method is a summarization-based hybrid algorithm that includes a preprocessing phase. In the preprocessing phase, unimportant words that occur frequently in the text are removed. This reduces the amount of data to be clustered. Furthermore, frequent items cause overlap between clusters, which makes cluster separation less effective. After the preprocessing phase, Term Frequency/Inverse Document Frequency (TFIDF) is calculated for all words and stems in the document to score them. Text summarization is then performed at the sentence level. Finally, document clustering is done according to the calculated TFIDF scores. The hybrid process of the proposed scheme, from the preprocessing phase to document clustering, yields a rapid and efficient clustering method, which is evaluated on 400 English texts extracted from scientific databases covering 11 different topics. The proposed method is compared with the CSSA, SMTC and Max-Capture methods. The results demonstrate the proficiency of the proposed scheme in terms of computation time and efficiency, using the F-measure criterion.
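The scoring and sentence-level summarization steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the tokenizer, the plain `tf * log(N / df)` weighting, and the function names `tokenize`, `tfidf` and `summarize` are all assumptions made for illustration, and the final clustering step is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which words are removed.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "to", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def summarize(doc, term_scores, n_sentences=1):
    """Keep the n sentences with the highest mean term score, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]

    def sentence_score(sentence):
        tokens = tokenize(sentence)
        return sum(term_scores.get(t, 0.0) for t in tokens) / (len(tokens) or 1)

    top = sorted(sentences, key=sentence_score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]


docs = [
    "Clustering groups documents. Clustering uses term weights.",
    "Summaries shorten documents. Sentences are ranked by score.",
]
scores = tfidf(docs)
summary = summarize(docs[0], scores[0], n_sentences=1)
```

A term that occurs in every document, such as "documents" here, receives a score of zero under this weighting, which is one simple way frequent terms stop contributing to overlap between clusters.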

IAES journal

Institute of Advanced Engineering and Science

Document clustering for knowledge synthesis and project portfolio funding decision in R&D organizations

Author: Gupta Richa
Kumar Abhishek
Kumar Naresh
Publication venue: Annals of Library and Information Studies (ALIS)
Publication date: 23/11/2019

The paper discusses a method of using document clustering for information/knowledge synthesis and decision facilitation in R&D organisations. The emerging methodologies of machine learning, artificial intelligence and data science, in conjunction with fuzzy mathematics, can be exploited to catalyse the development of an information bank for research organisations. The proposed mechanism can use this knowledge ecosystem to accelerate and reinforce interdisciplinary research in R&D organisations and to empower them to make effective, information-driven decisions on project portfolio selection and proposal funding.

Online Publishing @ NISCAIR

Study on Different Sentence Level Clustering Algorithms for Text Mining

Author: PROF Mangrulkar
PROF Rakhi S Ram Waghmare
Vaishali Bhujade
Publication venue: International Journal of Pure and Applied Research in Engineering and Technology
Publication date: 11/04/2020

Abstract: Clustering is the process of grouping data items. Sentence clustering is used in a variety of applications, e.g. classification and categorization of documents, automatic summary generation, etc. In text mining, sentence clustering plays a vital role and is used in many text-processing activities. The size of clusters can vary from one cluster to another. In existing systems, many clustering methods and algorithms are used for clustering documents at the sentence level. In this paper, we survey the different sentence clustering algorithms. The main aim of this study is to present an overview of sentence-level clustering techniques, to identify the drawbacks of the existing work, and to consider how these drawbacks could be overcome. From this, a more efficient technique may be obtained, or a new method may be proposed, to overcome problems in existing methods such as time redundancy and data accuracy.

CiteSeerX

An Improved Similarity Matching based Clustering Framework for Short and Sentence Level Text

Author: Basha M. John
Kaliyamurthie K.P.
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2017

Text clustering plays a key role in navigation and browsing. For efficient text clustering, a large amount of information is grouped into meaningful clusters. Many text clustering techniques do not address issues such as high time and space complexity, inability to understand the relational and contextual attributes of a word, low robustness, and risks related to privacy exposure. To address these issues, an efficient text-based clustering framework is proposed. The Reuters dataset is chosen as the input dataset. Once the input dataset is preprocessed, the similarity between the words is computed using cosine similarity. The similarities between the components are compared and the vector data is created. From the vector data, the clustering particle is computed. To optimize the clustering results, mutation is applied to the vector data. The performance of the proposed text-based clustering framework is analyzed using metrics such as Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR) and processing time. The experimental results show that the proposed framework produces optimal MSE, PSNR and processing time when compared to the existing Fuzzy C-Means (FCM) and Pairwise Random Swap (PRS) methods.
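The cosine similarity step mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the framework's own code: it compares two short texts through raw term-count vectors with whitespace tokenization, and the function name `cosine_similarity` is hypothetical. The abstract does not specify how the word vectors are actually built.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two short texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


same = cosine_similarity("oil prices rise", "oil prices rise")
partial = cosine_similarity("oil prices rise", "oil prices fall")
disjoint = cosine_similarity("oil prices", "wheat exports")
```

Because the measure is normalized by vector length, it is not biased toward longer texts, which matters for the short and sentence-level inputs this record targets.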

Crossref

ZENODO

Institute of Advanced Engineering and Science

A New Clustering Technique On Text In Sentence For Text Mining

Author: Kumar S Phani
Narayana B Lakshmmi
Publication venue: Kakinada Institute of Engineering and Technology for Women
Publication date: 01/04/2015

Clustering is a commonly considered data mining problem in the text domain. The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. In this paper, sentence-level clustering algorithms are discussed as a survey. The survey explains the problems of clustering at the sentence level and the solutions to overcome them. This paper presents a novel fuzzy clustering algorithm that operates on relational input data, i.e., data in the form of a square matrix of pairwise similarities between data objects. The Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA) is an extension of FRECCA, which is used for the clustering of sentences. The contents of text documents have a hierarchical structure, and many terms in the documents relate to more than one theme; hence HFRECCA will be a useful algorithm for natural language documents. In this algorithm, a single object may belong to more than one cluster.
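To make the idea of eigenvector centrality on relational data concrete, the sketch below ranks objects directly from a square matrix of pairwise similarities using power iteration. It is not HFRECCA or FRECCA: it shows only the centrality computation and omits fuzzy memberships and the hierarchy. The function name `eigenvector_centrality`, the iteration limits, and the example matrix are assumptions for illustration.

```python
def eigenvector_centrality(sim, iterations=100, tol=1e-9):
    """Power iteration on a non-negative square similarity matrix.

    Returns one score per object, normalized to sum to 1.
    """
    n = len(sim)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each object collects score from its neighbours, weighted by similarity.
        new = [sum(sim[j][i] * scores[j] for j in range(n)) for i in range(n)]
        total = sum(new)
        if total == 0:
            return scores  # no similarities at all: keep the uniform scores
        new = [v / total for v in new]
        if max(abs(x - y) for x, y in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Hypothetical pairwise similarities for three sentences:
# sentences 0 and 1 are close to each other, sentence 2 is an outlier.
similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
centrality = eigenvector_centrality(similarity)
```

Working from the similarity matrix alone means the objects never need an explicit vector representation, which is the sense in which the input data is relational.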

International Journal of Science Engineering and Advance Technology (IJSEAT)

An Emergent Approach to Text Analysis Based on a Connectionist Model and the Web

Author: Cimino Mario Giovanni Cosimo Antonio
Vaglini Gigliola
Publication venue: MDPI AG
Publication date: 01/01/2013

In this paper, we present a method to provide proactive assistance in text checking, based on usage relationships between words as structured on the Web. For a given sentence, the method builds a connectionist structure of relationships between word n-grams. This structure is then parameterized by means of an unsupervised and language-agnostic optimization process. Finally, the method provides a representation of the sentence that brings out the least prominent usage-based relational patterns, making it easier to find badly written and unpopular text. The study includes the problem statement and its characterization in the literature, as well as the proposed solving approach and some experimental use.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Archivio della Ricerca - Università di Pisa

An Automatic Intelligent System for Document Processing and Fruition

Author: Stefano Ferilli
Publication date: 01/01/2018

With the increasing number of documents available online, the need for intelligent digital libraries that automate document processing tasks and suitably organize and make the documents available, so as to provide personalized and focused access, becomes more and more pressing. This paper proposes an integrated system that merges intelligent modules covering all the phases of a document lifecycle, from acquisition, to processing, to information extraction, to personalized fruition for final users. The role and possible cooperation of Machine Learning and Data Mining techniques in the system are highlighted and discussed, along with their importance in providing effective support to both the building and the fruition of the Digital Library and the underlying knowledge base.

Archivio istituzionale della ricerca - Università di Bari