140,124 research outputs found

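Rancang Bangun Aplikasi Text Mining dalam Mengelompokkan Judul Penelitian Dosen Menggunakan Metode Shared Nearest Neighbor dan Euclidean Similarity [Design and Development of a Text Mining Application for Clustering Lecturers' Research Titles Using the Shared Nearest Neighbor Method and Euclidean Similarity]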

Author: Mushlihudin Mushlihudin
Zahrotun Lisna
Publication venue: 'Universitas Ahmad Dahlan, Kampus 3'
Publication date: 30/12/2017

Data mining is the process of extracting hidden information and turning it into knowledge. Types of data mining include web mining, text mining, sequence mining, graph mining, temporal data mining, spatial data mining, distributed data mining, and multimedia mining. Document clustering is one of the techniques of text mining. The aim of this research is to build an application that clusters lecturers' research titles using the shared nearest neighbor method. The method used is one of the clustering methods in text mining, namely shared nearest neighbor (SNN) with Euclidean similarity. Testing was carried out using black-box testing. The result of this research is a text mining application capable of clustering lecturers' research titles
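The abstract names shared nearest neighbor (SNN) clustering over Euclidean distance but does not give the details. As a minimal sketch of the general idea, assuming each title has already been turned into a numeric vector, the SNN similarity of two items can be taken as the number of neighbors their k-nearest-neighbor lists share, counted only when each item appears in the other's list. The function names, the value of k, and the toy points are illustrative assumptions, not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_sets(vectors, k):
    """For each vector, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, v in enumerate(vectors):
        ranked = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: (euclidean(v, vectors[j]), j),  # index breaks ties
        )
        neighbors.append(set(ranked[:k]))
    return neighbors

def snn_similarity(vectors, k):
    """SNN similarity matrix: shared-neighbor count for mutual neighbors, else 0."""
    nn = knn_sets(vectors, k)
    n = len(vectors)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim

# Hypothetical 2-D points standing in for title vectors: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sim = snn_similarity(points, k=2)
```

Items in the same group end up with a positive SNN similarity, while items from different groups score zero; a clustering step would then link items whose similarity exceeds a chosen threshold.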

Journal of Education and Learning (EduLearn)

UAD Journal Management System

Concept Relation Discovery and Innovation Enabling Technology (CORDIET)

Author: Dedene Guido
Elzinga Paul
Ignatov Dmitry
Kuznetsov Sergei O.
Neznanov Alexey
Poelmans Jonas
Viaene Stijn
Publication date: 01/01/2011

Concept Relation Discovery and Innovation Enabling Technology (CORDIET) is a toolbox for gaining new knowledge from unstructured text data. At the core of CORDIET is the C-K theory, which captures the essential elements of innovation. The tool uses Formal Concept Analysis (FCA), Emergent Self Organizing Maps (ESOM) and Hidden Markov Models (HMM) as main artifacts in the analysis process. The user can define temporal, text mining and compound attributes. The text mining attributes are used to analyze the unstructured text in documents, while the temporal attributes use the documents' timestamps for analysis. The compound attributes are XML rules based on text mining and temporal attributes. The user can cluster objects with object-cluster rules and can split the data into pieces with segmentation rules. The artifacts are optimized for efficient data analysis; object labels in the FCA lattice and ESOM map contain a URL on which the user can click to open the selected document
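To make the FCA step concrete, here is a minimal sketch of how formal concepts can be derived from a small document-by-attribute table. It is a brute-force enumeration for illustration only and says nothing about how CORDIET itself is implemented; the document names and attributes are hypothetical.

```python
from itertools import combinations

def extent(context, attrs):
    """All objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def intent(context, objs, all_attrs):
    """All attributes shared by every object in objs."""
    if not objs:
        return frozenset(all_attrs)
    return frozenset.intersection(*(frozenset(context[o]) for o in objs))

def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), r):
            objs = extent(context, frozenset(subset))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts

# Hypothetical context: documents (objects) and the text mining attributes found in them.
context = {
    "doc1": {"weapon", "threat"},
    "doc2": {"threat", "travel"},
    "doc3": {"threat"},
}
concepts = formal_concepts(context)
```

Each concept pairs a set of documents with exactly the attributes they all share; ordering the concepts by inclusion of their extents gives the lattice that an FCA tool displays. Enumerating all attribute subsets is exponential, so this approach only suits toy contexts.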

arXiv.org e-Print Archive

Vlerick Repository

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Temporal Text Mining: From Frequencies to Word Embeddings

Author: Del Coco Pierpaolo Elio Jr
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 15/03/2018

The last decade has witnessed a tremendous growth in the amount of textual data available from web pages and social media posts, as well as from digitized sources, such as newspapers and books. However, as new data is continuously created to record the events of the moment, old data is archived day by day, for months, years, and decades. From this point of view, web archives play an important role not only as sources of data, but also as testimonials of history. In this respect, state-of-the-art machine learning models for word representations, namely word embeddings, are not able to capture the dynamic nature of semantics, since they represent a word as a single-state vector that does not consider different time spans of the corpus. Although diachronic word embeddings have started appearing in recent works, the very small literature leaves several open questions that must be addressed. Moreover, these works model language evolution from a strong linguistic perspective. We approach this problem from a slightly different perspective. In particular, we discuss temporal word embedding models trained on highly evolving corpora, in order to model the knowledge that textual archives have accumulated over the years. This allows us to discover the semantic evolution of words, and also to find temporal analogies and compute temporal translations. Moreover, we conducted experiments on word frequencies. The results of an in-depth temporal analysis of shifts in word semantics, in comparison to word frequencies, show that these two variations are related
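One common way to quantify the semantic evolution of a word is to compare its vectors from two time slices. The sketch below measures shift as cosine distance, assuming the two embedding spaces are already aligned so that vectors are comparable across slices. The embeddings are invented toy values, and the thesis may use a different measure.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def semantic_shift(word, slice_a, slice_b):
    """Cosine distance between a word's vectors in two time slices."""
    return 1.0 - cosine_similarity(slice_a[word], slice_b[word])

# Hypothetical 2-D embeddings for two time slices (illustration only).
emb_2000 = {"cloud": [0.9, 0.1], "rain": [0.8, 0.2]}
emb_2015 = {"cloud": [0.2, 0.9], "rain": [0.8, 0.2]}

shifts = {w: semantic_shift(w, emb_2000, emb_2015) for w in emb_2000}
most_shifted = max(shifts, key=shifts.get)
```

A word whose vector barely moves between slices gets a shift near zero, while a word that has drifted toward a new context gets a larger value. Comparing these shifts with changes in word frequency is the kind of analysis the abstract describes.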

AMS Tesi di Laurea

Detecting temporal cognition in text: Comparison of Judgements by self, expert and machine

Author: Busby Grant Janie
Walsh Erin
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2018

There is a growing research focus on temporal cognition, due to its importance in memory and planning, and links with psychological wellbeing. Researchers are increasingly using diary studies, experience sampling and social media data to study temporal thought. However, it remains unclear whether such reports can be accurately interpreted for temporal orientation. In this study, temporal orientation judgements about text reports of thoughts were compared across human coding, automatic text mining, and participant self-report

Directory of Open Access Journals

University of Canberra Research Repository

Frontiers - Publisher Connector

The Australian National University

Terrorist threat assessment with formal concept analysis.

Author: Dedene Guido
Elzinga Paul
Morsing Shanti
Poelmans Jonas
Viaene Stijn

The National Police Service Agency of the Netherlands developed a model to classify (potential) jihadists in four sequential phases of radicalism. The goal of the model is to signal the potential jihadist as early as possible to prevent him or her from entering the next phase. This model has, up to now, never been used to actively find new subjects. In this paper, we use Formal Concept Analysis to extract and visualize potential jihadists in the different phases of radicalism from a large set of reports describing police observations. We employ Temporal Concept Analysis to visualize how a possible jihadist radicalizes over time. The combination of these instruments allows for easy decision-making on where and when to act. Keywords: Formal concept analysis; Temporal concept analysis; Contextual attribute logic; Text mining; Terrorist threat assessment

Research Papers in Economics

Document Clustering with Bursty Information

Author: Chaoji Vineet
Hoonlor Apirak
Szymanski Bolesław K.
Zaki Mohamed J.
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 30/01/2013

Nowadays, almost all text corpora, such as blogs, emails and RSS feeds, are collections of text streams. The traditional vector space model (VSM), or bag-of-words representation, cannot capture the temporal aspect of these text streams. So far, only a few bursty features have been proposed to create text representations with temporal modeling for the text streams. We propose bursty feature representations that perform better than VSM on various text mining tasks, such as document retrieval, topic modeling and text categorization. For text clustering, we propose a novel framework to generate a bursty distance measure. We evaluated it on the UPGMA, Star and K-Medoids clustering algorithms. The bursty distance measure not only performed equally well on various text collections, but also clustered the news articles related to specific events much better than other models
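The abstract does not define its bursty features, so the following is only a simple heuristic that illustrates what "bursty" means for a term in a text stream: the term's count in one time window divided by its mean count over all windows. The scoring rule, function names, and sample stream are assumptions made for illustration and are not the paper's representation or distance measure.

```python
from collections import Counter

def window_counts(stream, num_windows):
    """Count term occurrences per time window; stream holds (window_index, text) pairs."""
    counts = [Counter() for _ in range(num_windows)]
    for window, text in stream:
        counts[window].update(text.lower().split())
    return counts

def burst_scores(counts, term):
    """Ratio of a term's count in each window to its mean count over all windows."""
    per_window = [c[term] for c in counts]
    mean = sum(per_window) / len(per_window)
    if mean == 0:
        return [0.0] * len(per_window)
    return [n / mean for n in per_window]

# Hypothetical stream of short news snippets tagged with a time window index.
stream = [
    (0, "election results delayed"),
    (1, "storm hits coast storm warning"),
    (1, "storm damage report"),
    (2, "election recount begins"),
]
counts = window_counts(stream, num_windows=3)
storm = burst_scores(counts, "storm")
election = burst_scores(counts, "election")
```

A term concentrated in a single window scores well above 1 there and 0 elsewhere, which a plain bag-of-words count over the whole stream would not reveal. Such scores could be used to reweight term vectors before clustering.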

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)