43 research outputs found

    Extracting News Events from Microblogs

    The Twitter stream has become a large source of information for many people, but the sheer volume of tweets and the noisy nature of their content have long made harvesting knowledge from Twitter a challenging task for researchers. Aiming to overcome some of the main challenges of extracting hidden information from tweet streams, this work proposes a new approach for real-time detection of news events from the Twitter stream. We divide our approach into three steps. The first step is to use a neural network or deep learning to detect news-relevant tweets in the stream. The second step is to apply a novel streaming data clustering algorithm to the detected news tweets to form news events. The third and final step is to rank the detected events based on the size of the event clusters and the growth speed of the tweet frequencies. We evaluate the proposed system on a large, publicly available corpus of annotated news events from Twitter. As part of the evaluation, we compare our approach with a related state-of-the-art solution. Overall, our experiments and user-based evaluation show that our approach to detecting current (real) news events delivers state-of-the-art performance.
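
    The third step, ranking events by cluster size and by how fast their tweet frequency grows, can be illustrated with a small sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the `EventCluster` type, the per-window tweet counts, the relative-growth definition, and the weighted sum of log-size and growth are all hypothetical and are not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class EventCluster:
    """Hypothetical event cluster: a label plus tweet counts per time window."""
    label: str
    counts_per_window: List[int]

    @property
    def size(self) -> int:
        # Total number of tweets assigned to the cluster so far.
        return sum(self.counts_per_window)

    @property
    def growth(self) -> float:
        # Relative change between the last two windows (assumed definition).
        if len(self.counts_per_window) < 2:
            return 0.0
        previous, latest = self.counts_per_window[-2], self.counts_per_window[-1]
        return (latest - previous) / max(previous, 1)


def rank_events(clusters, size_weight=1.0, growth_weight=1.0):
    """Sort clusters by an assumed score: weighted log-size plus weighted growth."""
    def score(cluster):
        return size_weight * math.log1p(cluster.size) + growth_weight * cluster.growth
    return sorted(clusters, key=score, reverse=True)


if __name__ == "__main__":
    steady = EventCluster("steady story", [10, 10])
    breaking = EventCluster("breaking story", [2, 12])
    for cluster in rank_events([steady, breaking]):
        print(cluster.label, cluster.size, round(cluster.growth, 2))
```

    With these assumed weights, a small but fast-growing cluster can outrank a larger, flat one; tuning the two weights shifts the balance between established and emerging events.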

    VIDEO SCENE DETECTION USING CLOSED CAPTION TEXT

    Issues in Automatic Video Biography Editing are similar to those in Video Scene Detection and Topic Detection and Tracking (TDT). The techniques of Video Scene Detection and TDT can be applied to interviews to reduce the time needed to edit a video biography. The system addresses the problems of video text extraction, story segmentation, and correlation. This thesis project was divided into three parts: extraction, scene detection, and correlation. The project successfully detected scene breaks in television series episodes and displayed scenes that had similar content.

    A Temporal Frequent Itemset-Based Clustering Approach For Discovering Event Episodes From News Sequence

    When performing environmental scanning, organizations typically deal with numerous events and topics concerning their core business, relevant technical standards, competitors, and markets, where each event or topic to monitor or track is generally associated with many news documents. To reduce information overload and information fatigue when monitoring or tracking such events, it is essential to develop an effective event episode discovery mechanism for organizing all news documents pertaining to an event of interest. In this study, we propose the time-adjoining frequent itemset-based event-episode discovery (TAFIED) technique. Based on the frequent itemset-based hierarchical clustering (FIHC) approach, our proposed TAFIED further considers the temporal characteristics of news articles, including the burst, novelty, and temporal proximity of features in an event episode, when discovering event episodes from the sequence of news articles pertaining to a specific event. Using the traditional feature-based HAC, HAC with a time-decaying function (HAC+TD), and FIHC techniques as performance benchmarks, our empirical evaluation results suggest that the proposed TAFIED technique outperforms all evaluation benchmarks in cluster recall and cluster precision.
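
    The idea behind the HAC+TD benchmark, discounting the similarity of two news articles by how far apart they were published, can be sketched as follows. The abstract does not specify the decay function, so the exponential half-life form and the names `time_decay`, `decayed_similarity`, and `half_life_days` are assumptions for illustration only.

```python
def time_decay(gap_days: float, half_life_days: float = 7.0) -> float:
    """Assumed exponential decay: the factor halves every `half_life_days`."""
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 0.5 ** (abs(gap_days) / half_life_days)


def decayed_similarity(content_similarity: float,
                       day_a: float,
                       day_b: float,
                       half_life_days: float = 7.0) -> float:
    """Scale a content similarity in [0, 1] by the publication-time gap."""
    return content_similarity * time_decay(day_a - day_b, half_life_days)


if __name__ == "__main__":
    # Two articles with the same content similarity, published 0 and 14 days apart.
    print(decayed_similarity(0.8, day_a=0, day_b=0))
    print(decayed_similarity(0.8, day_a=0, day_b=14))
```

    A clustering routine would then merge articles using the decayed value instead of the raw content similarity, so that reports far apart in time are less likely to land in the same episode.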

    Analyse der Meinungsentwicklung in Online Foren – Konzept und Fallstudie

    Web 2.0 is, among other things, a worldwide platform for expressing opinions. More and more customers discuss products online and exchange experiences. The analysis of online posts is therefore an important market research instrument. An approach for the automatic identification, aggregation, and analysis of opinions by means of text mining is presented, and its application is demonstrated using an example from the sporting goods industry.

    TEXT MINING AND TEMPORAL TREND DETECTION ON THE INTERNET FOR TECHNOLOGY ASSESSMENT: MODEL AND TOOL

    In today's world, organizations conduct technology assessment (TAS) prior to decision making about investments in existing, emerging, and hot technologies to avoid costly mistakes and survive in the hyper-competitive business environment. Relying on web search engines to look for relevant information for TAS processes, decision makers face abundant unstructured information that limits their ability to assess technologies within a reasonable time frame. Thus the following question arises: how to extract valuable TAS knowledge from a diverse corpus of textual data on the web? To address this question, this paper presents a web-based model and tool for knowledge mapping. The proposed knowledge maps are constructed on the basis of a novel method of co-word analysis, based on webometric web counts and a temporal trend detection algorithm which employs the vector space model (VSM). The approach is demonstrated and validated for a spectrum of information technologies. Results show that the research model assessments are highly correlated with subjective expert (n=136) assessment (r > 0.91), with predictive validity values above 85%. Thus, it seems safe to assume that this work can probably be generalized to other domains. The model's contribution is underscored by the current growing attention to the big-data phenomenon.

    Proximity-based document representation for named entity retrieval

    Pengembangan Pencarian Produk Terkait Menggunakan Euclidean Distance Dan Cosine Similarity Pada Aplikasi Halal Nutrition Food

    As time goes on, more innovations and new food product variations emerge, but only a small fraction of them are certified. In 2016, the Halal Nutrition Food application was developed by Jauhar Fatawi. The app makes it easy for users to search for halal products. The application was developed further in 2017 by Adnan Mauludin Fajriyadi, who added a search feature using the Okapi BM25F algorithm to improve the relevance of search results. This research extends the previous work by displaying related halal products based on their composition. Related products are found by composition similarity, using Euclidean distance and cosine similarity. A product that closely resembles a certified product can increase the user's confidence even when the product being searched for has not yet been certified. The app can notify the user when there is a halal product similar to the product the user is viewing. This study shows that related-product search using cosine similarity has a precision of 84%, while Euclidean distance has a precision of 72%. This study also tested the MoreLikeThis feature of Apache Lucene, which has a precision of 80%, slightly lower than cosine similarity.
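
    The two measures compared in this abstract can be illustrated on ingredient lists. This is a minimal sketch under stated assumptions: products are represented as simple ingredient-count vectors, and the function names, the sample catalog, and the ranking helper are hypothetical rather than taken from the Halal Nutrition Food application.

```python
import math
from collections import Counter


def vectorize(ingredients, vocabulary):
    """Turn an ingredient list into a count vector over a shared vocabulary."""
    counts = Counter(ingredients)
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def related_products(query, catalog, top_k=3):
    """Rank catalog products by cosine similarity of composition to the query."""
    vocabulary = sorted({term for items in [query, *catalog.values()] for term in items})
    query_vec = vectorize(query, vocabulary)
    scored = [
        (name, cosine_similarity(query_vec, vectorize(items, vocabulary)))
        for name, items in catalog.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical certified products and an uncertified query product.
    catalog = {
        "biscuit A": ["wheat flour", "sugar", "palm oil"],
        "biscuit B": ["wheat flour", "sugar", "butter"],
        "jelly": ["gelatin", "sugar"],
    }
    query = ["wheat flour", "sugar", "palm oil", "salt"]
    for name, score in related_products(query, catalog):
        print(name, round(score, 3))
```

    Cosine similarity ignores vector length and looks only at the overlap pattern, whereas Euclidean distance also grows with differences in how many ingredients each product lists, which is one plausible reason the two measures can rank products differently.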