76 research outputs found

    A Statistical Study of Technological Trends in Logistics: Patent Analysis

    The aim of the study is to develop a methodology for constructing distribution series of patents based on selected characteristics, and to analyze patterns in the structure of logistics processes and structural changes in them, as an information base for a statistical study of technological trends in logistics. Research on technological trends is shown to be highly relevant for developing a successful technological strategy in logistics, given the strategic importance of identifying opportunities for and threats to technological development while achieving sustainable competitiveness in the logistics market. Given the rapid pace of technological development, both in logistics and in related fields, logistics service providers need methodological support and relevant patent data to facilitate research on technological trends and so increase their competitiveness. The study finds that the method of expert assessments, qualification and company analysis is ineffective for studying technological trends in logistics under modern conditions, while statistical study of technological trends based on patent analysis is undervalued. The need for a statistical study of technological trends in logistics using a four-stage methodology for patent analysis is substantiated. The proposed methodology uses Latent Dirichlet Allocation (LDA) to identify the logistics-related technological topics behind patents. The information on technological topics and their trends obtained with the proposed methodology will help to better understand the technological landscape in logistics.

    KNN with TF-IDF based Framework for Text Categorization

    KNN is a very popular algorithm for text classification. This paper presents the possibility of using the KNN algorithm with the TF-IDF method, together with a framework for text classification. The framework enables classification according to various parameters, as well as measurement and analysis of results. Evaluation of the framework focused on the speed and quality of classification. Testing revealed the strengths and weaknesses of the algorithm, providing guidance for the further development of similar frameworks.
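
As a rough illustration of the approach this abstract describes, the sketch below weights terms with TF-IDF and classifies a text by majority vote among its k nearest training documents. It is not the paper's framework: the whitespace tokenizer, the smoothed IDF formula, cosine similarity as the distance measure, the toy training set, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing is not given.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in training are dropped."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(text, train, idf, k=3):
    """Majority vote among the k training docs most similar to text."""
    q = tfidf(text, idf)
    scored = sorted(((cosine(q, tfidf(d, idf)), label) for d, label in train),
                    reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy training data, not taken from the paper.
train = [
    ("the team won the football match", "sport"),
    ("the striker scored a late goal", "sport"),
    ("football fans cheered the goal", "sport"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
    ("investors sold bank stock", "finance"),
]
idf = fit_idf([d for d, _ in train])
print(knn_classify("a goal in the football match", train, idf))
print(knn_classify("interest rates and bank stock", train, idf))
```

The value of k and the similarity measure are the kind of parameters such a framework would expose; here they are fixed only to keep the sketch short.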

    System of Information Feedback on Archive Using Term Frequency-Inverse Document Frequency and Vector Space Model Methods

    Archives are an important class of documents. They are stored systematically to simplify storage and retrieval. Information retrieval is the process of retrieving relevant documents while not retrieving documents that are not relevant, and doing so requires a method. The Term Frequency-Inverse Document Frequency and Vector Space Model methods can find relevant documents according to their level of closeness or similarity. In addition, applying the Nazief-Adriani stemming algorithm can improve information retrieval performance by transforming the words in a document or text to their base form. The system then indexes the documents to simplify and speed up the search process. Relevance is determined by calculating similarity values between the existing documents and the query, each represented in a certain form. The system then sorts the retrieved documents by their level of relevance to the query.
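
The pipeline in this abstract (stem, index, weight, compare, sort) can be sketched roughly as follows. This is an illustration only: `stem` is an identity placeholder standing in for the Nazief-Adriani stemmer, the unsmoothed IDF and cosine similarity are assumed choices, and the archive titles and function names are hypothetical.

```python
import math
from collections import Counter

def stem(word):
    # Placeholder: a real system would reduce the word to its base form here
    # (the abstract uses the Nazief-Adriani stemmer for this step).
    return word

def terms(text):
    return [stem(w) for w in text.lower().split()]

def weight(text, idf):
    """TF-IDF weights for one text, restricted to indexed terms."""
    tf = Counter(terms(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def build_index(docs):
    """Index a dict of doc_id -> text; returns (idf, doc_vectors)."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(terms(text)))
    idf = {t: math.log(n / c) for t, c in df.items()}
    vectors = {doc_id: weight(text, idf) for doc_id, text in docs.items()}
    return idf, vectors

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, idf, vectors):
    """Return doc ids sorted by decreasing similarity to the query; zero scores are dropped."""
    q = weight(query, idf)
    ranked = sorted(((cosine(q, v), doc_id) for doc_id, v in vectors.items()),
                    reverse=True)
    return [doc_id for score, doc_id in ranked if score > 0]

# Hypothetical archive entries, not taken from the paper.
docs = {
    "A1": "annual budget report for the finance office",
    "A2": "staff leave request letter",
    "A3": "finance office meeting minutes",
}
idf, vectors = build_index(docs)
print(search("finance report", idf, vectors))
```

Building the document vectors once at indexing time is what lets each query be answered with only one vector computation plus the similarity comparisons.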


    Research, one of the dharmas of higher education, is carried out by every lecturer, and producing new and useful findings is a challenge for lecturers. Research results are published in national and international journals, and Sinta, a website published by Ristekbirn, covers all research works in Indonesia. The problem addressed in this research is the growing accumulation of data, which needs to be analyzed with text mining by searching the content of abstract documents and presenting part of the information. The purpose of this study is to classify documents by their similarity to other documents, so that knowledge is found and documents are placed in groups according to existing topics. This is done by classifying documents based on the abstracts of published scientific papers. The solution involves two mutually supporting algorithms with different tasks, namely TF-IDF and Cosine Similarity. TF-IDF determines the weight of each document, and the weighted documents are then compared using Cosine Similarity. This research uses text mining as part of the search for related patterns and documents that have been tested. For the calculation, 1 document was used as test data and 15 documents as training data. With the TF-IDF calculation, the weights of the documents from Q, D2 to D15 are 10,946; 28,050; 27,176; 39,043; 36,535; 30,696; 25,612; 12,581; 42,335; 29,661; 33,867; 31,706; 22,654; 15,450; 59,832; 42,127. Similarity was tested with k = 4, which yields similarity to Expert System and Cryptography, while k = 5 yields the highest similarity to Expert System.

    Tf*Idf and Random Walk For Term Candidate Selection On Automatic Subject Indexing

    Subject indexing is the act of describing or classifying a document by index terms or other symbols in order to indicate what the document is about, to summarize its content, or to increase its findability. The selection of candidate terms in automatic subject indexing is very important, because it can influence the result of topic extraction from a document. Current approaches to candidate term selection in automatic subject indexing consider only terms in the document collection. In contrast, in manual subject indexing the indexer prefers to choose general terms as candidates. In this paper, we propose a new strategy for selecting candidate terms in automatic subject indexing in order to extract the main topic of a document. The proposed method uses a combination of Term Frequency Inverse Document Frequency (TF*IDF) and Random Walk on the structure of a thesaurus. Experimental results show that the proposed method can select candidate terms that are relevant to the topic of the document, with an F-Measure of 0.24.
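
For readers unfamiliar with the metric, the F-Measure is commonly the harmonic mean of precision and recall. The abstract does not state which weighting was used, so the balanced form below is an assumption; here P is the share of selected candidate terms that are relevant and R is the share of relevant terms that were selected.

```latex
P = \frac{|\text{selected} \cap \text{relevant}|}{|\text{selected}|}, \qquad
R = \frac{|\text{selected} \cap \text{relevant}|}{|\text{relevant}|}, \qquad
F_1 = \frac{2\,P\,R}{P + R}
```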

    Clustering Software Components for Program Restructuring and Component Reuse Using Hybrid XNOR Similarity Function

    Component-based software development has gained considerable practical importance in software engineering, among both academic researchers and industry. Finding components for efficient software reuse is one of the important problems addressed by researchers. Clustering reduces the search space of components by grouping similar entities together, which reduces the search time for component retrieval. In this research, we propose a generalized approach for clustering a given set of documents or software components by defining a similarity function, called the hybrid XNOR function, to find the degree of similarity between two document sets or software components. A similarity matrix is obtained for a given set of documents or components by applying the hybrid XNOR function. We define and design an algorithm for component or document clustering that takes the similarity matrix as input and produces a set of clusters as output. The output is a set of highly cohesive pattern groups or components.

    Impact of Online Education and Sentiment Analysis from Twitter Data using Topic Modeling Algorithms

    During a pandemic, all industries suffer greatly, and every sector of the world suffers in some way, including the education sector. Internet expressions reflect users' feelings about a product or service. Sentiment analysis determines the polarity of information in source data toward a subject under investigation. The goal of this study is to examine social media expressions about online teaching and learning, as online education will become a part of everyday life in the future. For the prototype implementation, we collected data from Twitter using keywords related to online education, and from a Google Form completed by engineering undergraduate students. This analysis will assist teachers, parents, and the student community in understanding the benefits and drawbacks of the education industry, allowing for further improvement in educational outcomes. We used aspect-based sentiment analysis and topic modeling to determine sentiment polarity and important topics for education sector stakeholders. First, we used the TextBlob Python package to determine sentiment polarity, and Bag of Words, LDA, and LSA models to discover topics. After modeling topics from the collected data, topic coherence is used to assess the degree of semantic similarity between high-scoring words in each topic. Word clouds and LDAvis are used to visualize the data. The experimental results are promising and will assist education stakeholders in addressing the concerns identified in social media expressions.

    Sentiment Analysis of Conservation Studies Captures Successes of Species Reintroductions

    Learning from the rapidly growing body of scientific articles is constrained by human bandwidth. Existing methods in machine learning have been developed to extract knowledge from human language and may automate this process. Here, we apply sentiment analysis, a type of natural language processing, to facilitate a literature review in reintroduction biology. We analyzed 1,030,558 words from 4,313 scientific abstracts published over four decades using four previously trained lexicon-based models and one recursive neural tensor network model. We find frequently used terms share both a general and a domain-specific value, with either positive (success, protect, growth) or negative (threaten, loss, risk) sentiment. Sentiment trends suggest that reintroduction studies have become less variable and increasingly successful over time and seem to capture known successes and challenges for conservation biology. This approach offers promise for rapidly extracting explicit and latent information from a large corpus of scientific texts
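
To make the idea of a lexicon-based model concrete, the sketch below scores a sentence by averaging the values of the words it shares with a sentiment lexicon. It is not one of the models used in the study: the lexicon contains only the six example terms quoted in the abstract, the +1/-1 values are assumptions, and `sentiment_score` is a hypothetical name.

```python
# Toy lexicon built from the example terms quoted in the abstract.
# The +1 / -1 values are assumptions; real lexicons assign graded scores.
LEXICON = {
    "success": 1, "protect": 1, "growth": 1,
    "threaten": -1, "loss": -1, "risk": -1,
}

def sentiment_score(text, lexicon=LEXICON):
    """Mean lexicon value over matched words; 0.0 when no word matches."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

print(sentiment_score("Population growth marked the success of the release."))
print(sentiment_score("Habitat loss and predation risk threaten the population."))
```

A score near +1 or -1 indicates consistently positive or negative wording, while mixed or unmatched text falls toward 0. Applying such a score to every abstract and grouping by publication year is one way the kind of trend described above could be tracked.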