5 research outputs found

    Text documents clustering using data mining techniques

    Increasing progress in numerous research fields and information technologies has led to a surge in the publication of research papers, so researchers spend considerable time finding papers close to their field of specialization. In this paper we propose a document classification approach that clusters the text documents of research papers into meaningful categories, each covering a similar scientific field. The approach is based on the essential focus and scope of the target categories, where each category includes many topics. Accordingly, we extract word tokens from the topics that relate to each specific category separately. The frequency of word tokens in a document affects the document's weight, which is calculated using the term frequency-inverse document frequency (TF-IDF) statistic. The proposed approach uses the title, abstract, and keywords of a paper, in addition to the category topics, to perform the classification. Documents are then classified and clustered into the primary categories based on the highest cosine similarity between the category weight and the document weights.
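    The pipeline the abstract describes (TF-IDF weighting, then assignment by highest cosine similarity) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation; the token lists and category names are invented for the example.

    ```python
    import math
    from collections import Counter

    def tfidf_vectors(docs):
        """Build a sparse TF-IDF vector (dict) for each tokenized document."""
        n = len(docs)
        # document frequency of each token across the corpus
        df = Counter(tok for doc in docs for tok in set(doc))
        vecs = []
        for doc in docs:
            tf = Counter(doc)
            vecs.append({t: (c / len(doc)) * math.log(n / df[t])
                         for t, c in tf.items()})
        return vecs

    def cosine(a, b):
        """Cosine similarity between two sparse vectors stored as dicts."""
        dot = sum(w * b[t] for t, w in a.items() if t in b)
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # Two hypothetical category topic profiles, then a paper to classify
    # (tokens drawn from its title/abstract/keywords).
    corpus = [
        ["data", "mining", "clustering"],               # category 0: data mining
        ["network", "security", "attack"],              # category 1: security
        ["text", "clustering", "mining", "documents"],  # the new paper
    ]
    vecs = tfidf_vectors(corpus)
    paper = vecs[2]
    # Assign the paper to the category with the highest cosine similarity.
    best = max(range(2), key=lambda i: cosine(vecs[i], paper))
    ```

    Here the paper shares the tokens "mining" and "clustering" with category 0 and nothing with category 1, so it is assigned to the data-mining category.
    
    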

    A web content mining application for detecting relevant pages using Jaccard similarity

    The tremendous growth in the availability of text data from a variety of sources creates a slew of concerns and obstacles to discovering meaningful information. This advancement of technology in the digital realm has dispersed texts over millions of web sites. Unstructured texts are densely packed with textual information, and discovering valuable and intriguing relationships in them demands more computer processing, so text mining has developed into an attractive area of study for obtaining organized and useful data. One purpose of this research is to discuss text pre-processing in the automobile marketing domain in order to create a structured database. Regular expressions were used to extract data from unstructured vehicle advertisements, resulting in a well-organized database; we manually developed rule-based methods for extracting structured data from unstructured web pages. The information retrieved from these advertisements supports a systematic search over certain noteworthy attributes. There are numerous approaches to query recommendation, and it is vital to understand which one should be employed. Additionally, this research attempts to determine the optimal similarity value for query suggestions based on user-supplied parameters by comparing MySQL pattern matching and Jaccard similarity.
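    Jaccard similarity, the measure compared against MySQL pattern matching above, is the size of the intersection of two token sets divided by the size of their union. A minimal sketch of ranking advertisements against a user query with it (the ads and query strings are invented for illustration):

    ```python
    def jaccard(a: str, b: str) -> float:
        """Jaccard similarity of two strings, tokenized on whitespace."""
        sa, sb = set(a.lower().split()), set(b.lower().split())
        if not sa or not sb:
            return 0.0
        return len(sa & sb) / len(sa | sb)

    ads = [
        "toyota corolla 2015 automatic",
        "toyota camry 2018 manual",
        "honda civic 2015 automatic",
    ]
    query = "toyota automatic"
    # Rank ads by decreasing similarity to the user query.
    ranked = sorted(ads, key=lambda ad: jaccard(query, ad), reverse=True)
    ```

    The first ad shares two of its four tokens with the query (Jaccard 2/4 = 0.5), so it ranks highest; in a query-recommendation setting the same score can be thresholded to decide which suggestions to show.
    
    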

    Text Mining: Design of Interactive Search Engine Based Regular Expressions of Online Automobile Advertisements

    No full text
    The technology world has greatly evolved over the past decades, leading to inflated data volumes. This progress of technology in digital form has generated scattered texts across millions of web pages. Unstructured texts contain a vast amount of textual data, and discovering useful and interesting relations in them requires more processing by computers. Therefore, text mining and information extraction have become an exciting research field for obtaining structured and valuable information. This paper focuses on text pre-processing of the automotive advertisement domain to build a structured database. The database was created by extracting information from unstructured automotive advertisements, an application of natural language processing. Information extraction deals with finding factual information in text using learned regular expressions; we manually craft rule-based approaches to extract structured information from unstructured web pages. The structured information is exposed through a user-friendly search engine designed for topic-specific knowledge. Consequently, the information extracted from these advertisements is used to perform a structured search over certain interesting attributes. The tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries.
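    Rule-based extraction of structured fields from a free-text advertisement, as described above, can be illustrated with a few regular expressions. The sample ad, field names, and patterns below are hypothetical, not the paper's actual rules.

    ```python
    import re

    AD = "For sale: 2016 Ford Focus, 45,000 km, $8,500, excellent condition"

    # Illustrative hand-crafted patterns; one capture group per field.
    PATTERNS = {
        "year":    re.compile(r"\b((?:19|20)\d{2})\b"),
        "mileage": re.compile(r"([\d,]*\d)\s*km"),
        "price":   re.compile(r"\$([\d,]*\d)"),
    }

    def extract(ad: str) -> dict:
        """Turn one unstructured ad into a structured record."""
        record = {}
        for field, pattern in PATTERNS.items():
            m = pattern.search(ad)
            record[field] = m.group(1) if m else None
        return record
    ```

    Each extracted record becomes one tuple in the structured database, over which attribute-specific queries (e.g. by year or price range) can then be run.
    
    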

    Data loss prevention (DLP) by using MRSH-v2 algorithm

    No full text
    Sensitive data may be stored in different forms, and not only legal owners but also malicious actors are interested in obtaining it. Exposing valuable data to others leads to severe consequences: customers, organizations, and companies lose money and reputation due to data breaches. There are many reasons for data leakage; internal threats such as human mistakes and external threats such as DDoS attacks are two main causes of data loss. In general, data can be categorized into three kinds: data in use, data at rest, and data in motion. Data Loss Prevention (DLP) tools are good at identifying important data: DLP can analyze data content and send feedback to administrators, who decide on actions such as filtering, deleting, or encryption. DLP tools are not a final solution to data breaches, but they are considered good security tools for limiting malicious activity and protecting sensitive information. There are many kinds of DLP techniques, and approximate matching is one of them. MRSH-v2 is one type of approximate matching; it is implemented and evaluated here using the TS dataset and a confusion matrix. MRSH-v2 achieves a high true positive rate and sensitivity, and a low false negative rate.
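    The evaluation described above reports sensitivity and false-negative scores derived from a confusion matrix. MRSH-v2 itself is a fuzzy-hashing scheme (rolling-hash chunking with Bloom filters) and is not reimplemented here; the sketch below only shows how the reported metrics follow from confusion-matrix counts, with invented example numbers.

    ```python
    def rates(tp: int, fp: int, tn: int, fn: int) -> dict:
        """Sensitivity (true positive rate) and false negative rate
        from confusion-matrix counts."""
        positives = tp + fn
        return {
            # share of truly sensitive items the DLP tool flagged
            "sensitivity": tp / positives if positives else 0.0,
            # share of truly sensitive items it missed
            "false_negative_rate": fn / positives if positives else 0.0,
        }

    # Hypothetical counts: 95 sensitive files detected, 5 missed,
    # 10 benign files wrongly flagged, 890 correctly ignored.
    m = rates(tp=95, fp=10, tn=890, fn=5)
    ```

    With these counts the sensitivity is 0.95 and the false negative rate 0.05; by construction the two always sum to 1 over the truly positive items.
    
    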

    Students' participation in collaborative research should be recognised

    No full text
    Letter to the editor