7 research outputs found

    i-JEN: Visual interactive Malaysia crime news retrieval system

    Supporting crime news investigation requires a mechanism for monitoring the current and past status of criminal events. We believe this can be facilitated by focusing on the user interface and the crime event model. In this paper we discuss the development of the Visual Interactive Malaysia Crime News Retrieval System (i-JEN) and describe the approach, the user studies (including those planned), the system architecture and the future plan. Our main objectives are to construct crime-based events; investigate the use of crime-based events in improving classification and clustering; develop an interactive crime news retrieval system; visualize crime news in an effective and interactive way; integrate these components into a usable and robust system; and evaluate its usability and performance. The system will serve as a news monitoring system that automatically organizes, retrieves and presents crime news so as to support effective monitoring, searching and browsing for the target user groups: the general public, news analysts, and police officers or crime investigators. The study will contribute to a better understanding of crime data consumption in the Malaysian context, and the developed system, with its visualisation features, will help address crime data with the eventual goal of combating crime.

    Topic Tracking for Punjabi Language

    This paper introduces topic tracking for the Punjabi language. Text mining is a field that automatically extracts previously unknown and useful information from unstructured textual data. It has strong connections with natural language processing (NLP). NLP has produced technologies that teach computers natural language so that they can analyze, understand and even generate text. Topic tracking is one such technology and can be used in the text mining process. Its main purpose is to identify and follow events presented in multiple news sources, including newswires, radio and TV broadcasts. It gathers dispersed information and makes it easy for the user to get a general understanding. Little work has been done on topic tracking for Indian languages in general and Punjabi in particular. We first survey the approaches available for topic tracking, then present our approach for Punjabi and show the experimental results.

    Multi-objective of wind-driven optimization as feature selection and clustering to enhance text clustering

    Text clustering groups objects of similar categories. The choice of initial centroids influences the behaviour of the system, which can become trapped in local optima. A second issue is the impact of a very large number of features on the determination of optimal initial centroids. The dimensionality problem can be reduced by feature selection. Therefore, Wind Driven Optimization (WDO) was employed for feature selection to remove unimportant words from the text. In addition, the study integrates a novel clustering optimization technique, also based on WDO, to determine the most suitable initial centroids. The new meta-heuristic is thus used for multiple objectives: first as unsupervised feature selection (WDOFS) and second as a clustering algorithm (WDOC). For example, WDOC outperformed Harmony Search and Particle Swarm Optimization, reaching an F-measure of 93.3%; text clustering performance improved by a further 0.9% when the suggested clustering was applied on top of the proposed feature selection. With WDOFS, more than 50 percent of the features were removed. The best result was obtained by the multi-objective approach, with an F-measure of 98.3%.

    Trending topic extraction from social media

    Social media has become the first source of information for many people. The amount of information posted on social media daily has become so vast that it is difficult to track. One of the most popular social media applications is Twitter. Users follow many news accounts, public figures and friends so that they can keep up with the latest events around them. Since dialect and writing style differ from one region to another, our objective in this research is to extract trending topics for an Egyptian Twitter user. In this way, the user can get at a glance the trending topics discussed by the people he follows. To find the best approach for achieving this objective, we investigate the document pivot and the feature pivot approaches. By applying the document pivot approach to the baseline data, using the tf-itf (term frequency-inverse tweet frequency) representation, the repeated bisecting k-means clustering technique, and extraction of the most frequent n-grams from each cluster, we achieved a recall of 100% and an F1 measure of 0.8. Applying the feature pivot approach to the baseline data, using the content similarity algorithm to group related unigrams together, achieved a recall of 100% and an F1 measure of 0.923. To validate our results we collected 12 data sets of different sizes (200, 400, 600, and 1200) from three different domains (sports, entertainment, and news) and applied both approaches to them. The average recall, precision and F1 measure obtained with the feature pivot approach are higher than those obtained with the document pivot approach. To make sure this difference is statistically significant, we applied a two-sample one-tailed paired t-test, which showed that the results are significantly better at a confidence level of 90%. The document pivot approach extracted the trending topics for an Egyptian Twitter user with an average recall of 0.714, an average precision of 0.521 and an average F1 measure of 0.556, versus average recall, precision and F1 measure values of 0.981, 0.754 and 0.833, respectively, for the feature pivot approach.
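The tf-itf representation named in the abstract above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code assumes the common form tf × log(N / tweet frequency), by analogy with tf-idf; the function name `tf_itf`, the whitespace tokenisation and the sample tweets are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_itf(tweets):
    """Return one {term: weight} dict per tweet.

    Assumed weighting (not confirmed by the source):
        weight = (term count / tweet length) * log(N / tweet frequency)
    where N is the number of tweets and tweet frequency is the number
    of tweets that contain the term at least once.
    """
    # Naive lowercase + whitespace tokenisation, for illustration only.
    tokenized = [tweet.lower().split() for tweet in tweets]
    n_tweets = len(tokenized)

    # Count, for every term, how many tweets it appears in.
    tweet_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_tweets / tweet_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample tweets.
sample = ["ahly wins match", "ahly match tonight", "new film tonight"]
weights = tf_itf(sample)
```

Under this assumed weighting, a term that appears in every tweet gets weight zero, and terms confined to a few tweets are weighted more heavily, which is what makes the vectors usable as input to a clustering step such as bisecting k-means.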