52 research outputs found

PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts

Author: Bridges Robert A.
Czejdo Bogdan
Goodall John R.
Iannacone Michael D.
McNeil Nikki
Perez Nicolas
Publication date: 11/10/2013

Public disclosure of important security information, such as knowledge of vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and other online sources months before it is properly classified into structured databases. To facilitate timely discovery of such knowledge, we propose a novel semi-supervised learning algorithm, PACE, for identifying and classifying relevant entities in text sources. The main contribution of this paper is an enhancement of the traditional bootstrapping method for entity extraction: a time-memory trade-off that circumvents a costly corpus search while strengthening pattern nomination, which should increase accuracy. We discuss an implementation in the cyber-security domain, as well as the challenges that the security domain poses for Natural Language Processing.

Comment: 6 pages, 3 figures, ieeeTran conference. International Conference on Machine Learning and Applications 201

arXiv.org e-Print Archive

Crossref
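
The PACE abstract above builds on the traditional bootstrapping method for entity extraction. As a rough illustration of that baseline only, here is a minimal sketch, not PACE itself. The pattern form (the single word preceding an entity), the `min_support` threshold, the function name `bootstrap`, and the toy sentences with made-up identifiers are all assumptions chosen for brevity.

```python
from collections import Counter


def bootstrap(sentences, seeds, rounds=2, min_support=2):
    """Toy bootstrapping loop (illustrative; not the PACE algorithm).

    Alternates between nominating context patterns from known entities
    and harvesting new entities that appear after an accepted pattern.
    """
    entities = set(seeds)
    patterns = set()
    tokenized = [s.split() for s in sentences]
    for _ in range(rounds):
        # Pattern nomination: count the word that precedes each known entity.
        counts = Counter(
            toks[i - 1]
            for toks in tokenized
            for i in range(1, len(toks))
            if toks[i] in entities
        )
        # Keep only patterns seen at least min_support times.
        patterns |= {p for p, c in counts.items() if c >= min_support}
        # Entity harvesting: accept any token that follows an accepted pattern.
        entities |= {
            toks[i]
            for toks in tokenized
            for i in range(1, len(toks))
            if toks[i - 1] in patterns
        }
    return entities, patterns


# Hypothetical toy corpus; the identifiers are invented for illustration.
corpus = [
    "attackers exploit CVE-1 remotely",
    "attackers exploit CVE-2 locally",
    "worms exploit CVE-3 quickly",
    "users patch CVE-1 today",
]
found, used = bootstrap(corpus, {"CVE-1", "CVE-2"})
print(sorted(found))  # ['CVE-1', 'CVE-2', 'CVE-3']
print(sorted(used))   # ['exploit']
```

Each round in this sketch rescans the whole corpus twice. The abstract says PACE circumvents a costly corpus search through a time-memory trade-off; the sketch makes no attempt to reproduce that.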

Conservative Direct Estimation of Likelihood Ratios Based on Observed Frequencies

Author: 吉田光男
川上賢十
梅村恭司
菊地真人
Publication venue: 'University of St. Thomas (Project Muse)'
Publication date: 01/04/2019

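In problems that treat data probabilistically, the estimation of statistical measures plays a foundational role in designing methods and analyzing data. This paper addresses the problem of estimating the likelihood ratio, one such statistical measure, from observed frequencies obtained from a discrete sample space. The naive approach follows the definition of the likelihood ratio: estimate the two probability distributions that make up the ratio by maximum likelihood, then take their quotient. However, when the likelihood ratio is computed from low frequencies, this approach can overestimate it unreasonably. We therefore apply uLSIF, a direct likelihood-ratio estimation method, and propose a method that estimates the likelihood ratio on the low side (conservatively). The proposed method is a framework that adjusts the likelihood ratio obtained by maximum likelihood estimation by means of a regularization parameter. Experiments clarify the behavior of the proposed method and demonstrate its effectiveness. Further experiments using a bootstrapping method in natural language processing demonstrate its practical utility.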

Toyohashi University of Technology Repository
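
The likelihood-ratio abstract above contrasts a naive maximum-likelihood ratio with a conservative, regularized one. The sketch below illustrates that contrast under an assumption: it uses the closed form obtained when uLSIF is given one indicator basis function per discrete outcome, namely r(x) = p_nu(x) / (p_de(x) + lambda). Whether this matches the paper's exact estimator is not confirmed by the abstract. The function names, counts, and the value of `lam` are hypothetical.

```python
def mle_ratio(f_nu, n_nu, f_de, n_de):
    """Naive estimate: ratio of the two maximum-likelihood probabilities.

    Raises ZeroDivisionError when f_de is 0.
    """
    return (f_nu / n_nu) / (f_de / n_de)


def regularized_ratio(f_nu, n_nu, f_de, n_de, lam):
    """Conservative estimate (assumed form): lam > 0 is added to the
    denominator probability, which pulls the estimate downward."""
    return (f_nu / n_nu) / (f_de / n_de + lam)


# Hypothetical counts: (f_nu, n_nu, f_de, n_de).
rare = (1, 1_000, 1, 100_000)         # outcome observed once in each sample
common = (500, 1_000, 5_000, 100_000)  # outcome observed many times
lam = 0.001

for name, counts in (("rare", rare), ("common", common)):
    naive = mle_ratio(*counts)
    conservative = regularized_ratio(*counts, lam)
    print(f"{name}: naive={naive:.3f} conservative={conservative:.3f}")
```

With these made-up counts, the rare outcome's naive ratio of about 100 shrinks to roughly 1, while the common outcome's ratio of 10 moves only slightly. That is the kind of conservative behavior at low frequencies the abstract describes.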

Comparing Statistical and Data Mining Techniques for Enrichment Ontology with Instances

Author: Imsombut Aurawan
Kajornrit Jesada
Publication venue: 'Lifescience Global'
Publication date: 09/06/2017

Enriching an ontology with instances is an important task because it extends the knowledge in the ontology to cover the domain of interest more extensively, so that greater benefits can be obtained. Many techniques exist for classifying instances of concepts; two popular families are statistical methods and data mining methods. This paper compares the two for classifying instances in order to enrich an ontology with more domain knowledge, using a conditional random field as the statistical method and feature-weight k-nearest neighbor classification as the data mining method. The experiments are conducted on a tourism ontology. The results show that conditional random fields achieve higher precision and recall than feature-weight k-nearest neighbor classification: the F1-measure is 74.09% for conditional random fields and 60.04% for feature-weight k-nearest neighbor classification.

Publication Management System

Information extraction for social media

Author: Habib Mena B.
Keulen Maurice van
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2014

The rapid growth of IT over the last two decades has led to a growth in the amount of information available online. Social media has emerged as a new way of sharing information and is a continuously and instantly updated source. In this position paper, we propose a framework for Information Extraction (IE) from unstructured user-generated content on social media. The framework proposes solutions to the IE challenges in this domain, such as short context, noisy and sparse content, and uncertain content. To overcome these challenges, state-of-the-art approaches need to be adapted to suit the nature of social media posts. The key components and aspects of our proposed framework are noisy text filtering, named entity extraction, named entity disambiguation, feedback loops, and uncertainty handling.

CiteSeerX

University of Twente Research Information