140,124 research outputs found

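    Rancang Bangun Aplikasi Text Mining dalam Mengelompokkan Judul Penelitian Dosen Menggunakan Metode Shared Nearest Neighbor dan Euclidean Similarity (Design and Development of a Text Mining Application for Clustering Lecturers' Research Titles Using the Shared Nearest Neighbor Method and Euclidean Similarity)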

    Data mining is the process of extracting hidden information and turning it into knowledge. Types of data mining include web mining, text mining, sequence mining, graph mining, temporal data mining, spatial data mining, distributed data mining, and multimedia mining. Document clustering is one of the techniques of text mining. The aim of this research is to build an application that clusters lecturers' research titles using the shared nearest neighbor method. The method used is one of the clustering methods in text mining, namely shared nearest neighbor (SNN) with Euclidean similarity. Testing was carried out using black-box testing. The result of this research is a text mining application capable of clustering lecturers' research titles
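The abstract names shared nearest neighbor (SNN) clustering with Euclidean similarity but does not give the procedure. As a rough illustration only, the sketch below computes a common form of SNN similarity: two items are linked only if each appears in the other's k-nearest-neighbor list, and the link strength is the number of neighbors they share. The function names, the `1 / (1 + distance)` similarity mapping, and the toy vectors are assumptions for this example, not details taken from the paper.

```python
import math


def euclidean_similarity(a, b):
    """Map Euclidean distance to a similarity in (0, 1] (assumed mapping)."""
    return 1.0 / (1.0 + math.dist(a, b))


def k_nearest(vectors, k):
    """For each vector, return the set of indices of its k most similar others."""
    neighbors = []
    for i, v in enumerate(vectors):
        ranked = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: euclidean_similarity(v, vectors[j]),
            reverse=True,
        )
        neighbors.append(set(ranked[:k]))
    return neighbors


def snn_similarity(vectors, k):
    """Count shared k-nearest neighbors for pairs that list each other."""
    nn = k_nearest(vectors, k)
    n = len(vectors)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if i in nn[j] and j in nn[i]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim


# Toy document vectors (hypothetical): two well-separated groups of three.
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
matrix = snn_similarity(docs, k=2)
```

In a real application the vectors would come from a term-weighting step over the titles, and clusters would be formed by thresholding the SNN similarity matrix; both steps are left out here.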

    Concept Relation Discovery and Innovation Enabling Technology (CORDIET)

    Concept Relation Discovery and Innovation Enabling Technology (CORDIET) is a toolbox for gaining new knowledge from unstructured text data. At the core of CORDIET is the C-K theory, which captures the essential elements of innovation. The tool uses Formal Concept Analysis (FCA), Emergent Self Organizing Maps (ESOM) and Hidden Markov Models (HMM) as the main artifacts in the analysis process. The user can define temporal, text mining and compound attributes. The text mining attributes are used to analyze the unstructured text in documents, while the temporal attributes use the documents' timestamps for analysis. The compound attributes are XML rules based on text mining and temporal attributes. The user can cluster objects with object-cluster rules and can chop the data into pieces with segmentation rules. The artifacts are optimized for efficient data analysis; object labels in the FCA lattice and ESOM map contain a URL on which the user can click to open the selected document
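To make the FCA step more concrete, here is a minimal sketch of how formal concepts (pairs of an object set and the attribute set they share) can be enumerated from a small document-by-attribute table. It is a brute-force illustration under assumed names and toy data, not CORDIET's implementation, and it only scales to very small contexts.

```python
from itertools import combinations


def intent(objects, context):
    """Attributes shared by every object in `objects`."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def extent(attrs, context):
    """Objects that have every attribute in `attrs`."""
    return {obj for obj, obj_attrs in context.items() if attrs <= obj_attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset."""
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            attrs = intent(subset, context)
            concepts.add((frozenset(extent(attrs, context)), frozenset(attrs)))
    return concepts


# Hypothetical context: documents mapped to the text mining attributes they match.
context = {
    "doc1": {"x", "y"},
    "doc2": {"y"},
    "doc3": {"y", "z"},
}
concepts = formal_concepts(context)
```

Each resulting pair corresponds to one node of the concept lattice; ordering the pairs by extent inclusion would give the lattice structure itself.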

    Temporal Text Mining: From Frequencies to Word Embeddings

    The last decade has witnessed a tremendous growth in the amount of textual data available from web pages and social media posts, as well as from digitized sources, such as newspapers and books. However, as new data is continuously created to record the events of the moment, old data is archived day by day, for months, years, and decades. From this point of view, web archives play an important role not only as sources of data, but also as testimonials of history. In this respect, state-of-the-art machine learning models for word representations, namely word embeddings, are not able to capture the dynamic nature of semantics, since they represent a word as a single-state vector that does not consider different time spans of the corpus. Although diachronic word embeddings have started appearing in recent works, the small body of literature leaves several open questions that must be addressed. Moreover, these works model language evolution from a strong linguistic perspective. We approach this problem from a slightly different perspective. In particular, we discuss temporal word embedding models trained on highly evolving corpora, in order to model the knowledge that textual archives have accumulated over the years. This allows us to discover the semantic evolution of words, and also to find temporal analogies and compute temporal translations. Moreover, we conducted experiments on word frequencies. The results of an in-depth temporal analysis of shifts in word semantics, in comparison to word frequencies, show that these two variations are related

    Detecting temporal cognition in text: Comparison of Judgements by self, expert and machine

    There is a growing research focus on temporal cognition, due to its importance in memory and planning, and links with psychological wellbeing. Researchers are increasingly using diary studies, experience sampling and social media data to study temporal thought. However, it remains unclear whether such reports can be accurately interpreted for temporal orientation. In this study, temporal orientation judgements about text reports of thoughts were compared across human coding, automatic text mining, and participant self-report

    Terrorist threat assessment with formal concept analysis.

    The National Police Service Agency of the Netherlands developed a model to classify (potential) jihadists in four sequential phases of radicalism. The goal of the model is to signal the potential jihadist as early as possible to prevent him or her from entering the next phase. Until now, this model has never been used to actively find new subjects. In this paper, we use Formal Concept Analysis to extract and visualize potential jihadists in the different phases of radicalism from a large set of reports describing police observations. We employ Temporal Concept Analysis to visualize how a possible jihadist radicalizes over time. The combination of these instruments allows for easy decision-making on where and when to act.
    Keywords: Formal concept analysis; Temporal concept analysis; Contextual attribute logic; Text mining; Terrorist threat assessment

    Document Clustering with Bursty Information

    Nowadays, almost all text corpora, such as blogs, emails and RSS feeds, are collections of text streams. The traditional vector space model (VSM), or bag-of-words representation, cannot capture the temporal aspect of these text streams. So far, only a few bursty features have been proposed to create text representations with temporal modeling for text streams. We propose bursty feature representations that perform better than VSM on various text mining tasks, such as document retrieval, topic modeling and text categorization. For text clustering, we propose a novel framework to generate a bursty distance measure. We evaluated it on the UPGMA, Star and K-Medoids clustering algorithms. The bursty distance measure not only performed equally well on various text collections, but was also able to cluster the news articles related to specific events much better than other models