11 research outputs found

    Crime Hot-Spot Modeling via Topic Modeling and Relative Density Estimation

    We present a method to capture groupings of similar calls and determine their relative spatial distribution from a collection of crime record narratives. We first obtain a topic distribution for each narrative, and then propose a nearest neighbors relative density estimation (kNN-RDE) approach to obtain spatial relative densities per topic. Experiments over a large corpus (n = 475,019) of narrative documents from the Atlanta Police Department demonstrate the viability of our method in capturing geographic hot-spot trends which call dispatchers do not initially pick up on and which go unnoticed due to conflation with elevated event density in general. Comment: 9 pages, 12 figures
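The idea of a per-topic relative density can be illustrated with a minimal sketch. It assumes a plain 2-D kNN density estimate (k divided by the number of points times the area of the disc reaching the k-th neighbor) and takes the ratio of the topic density to the overall event density. The function names, the choice of k, and the coordinates are hypothetical; this is not the authors' kNN-RDE formulation, only one simple reading of the abstract.

```python
import math

def knn_density(query, points, k):
    """2-D kNN density estimate: k / (n * area of the disc reaching the k-th neighbor)."""
    dists = sorted(math.dist(query, p) for p in points)
    r_k = max(dists[k - 1], 1e-12)  # guard against a zero radius
    return k / (len(points) * math.pi * r_k ** 2)

def relative_density(query, topic_points, all_points, k=3):
    """Ratio of the topic-specific density to the overall event density at `query`."""
    return knn_density(query, topic_points, k) / knn_density(query, all_points, k)

# Hypothetical event coordinates: the topic clusters near (0, 0),
# while most other events cluster near (10, 10).
topic_points = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (10.0, 10.5)]
other_points = [(10.0, 10.1), (10.1, 10.0), (9.9, 10.0), (10.0, 9.9), (0.0, 0.2)]
all_points = topic_points + other_points

hot = relative_density((0.0, 0.0), topic_points, all_points)
cold = relative_density((10.0, 10.0), topic_points, all_points)
print(hot > 1.0 > cold)
```

A ratio above 1 marks a location where the topic is over-represented relative to overall event density, which is the kind of conflation with general density that the abstract says the method is meant to separate out.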

    Online NMF with Multiple Dictionaries Accounting for Outliers

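    The real world contains a wide variety of mixed signals, such as environmental sounds. Many of these signals can be represented with non-negative values, and some contain not only noise such as Gaussian noise but also outliers. We focus on the signal of a specific component within such real-world mixed signals and aim at signal analysis that takes the characteristics of that signal into account. Analyzing mixed signals that contain outliers enables tasks such as denoising and super-resolution for images, and source separation and automatic transcription for audio. In addition, understanding the structure of the data makes it possible to handle data from various perspectives, such as entertainment and security. Non-negative matrix factorization (NMF) is one method for analyzing such signals. Any signal that can be represented as a non-negative matrix can be decomposed into two matrices, called the basis matrix and the coefficient matrix, and the basis matrix captures the frequently occurring patterns of the signal. Extensions include models that are robust to noise and online learning models that can handle large-scale data. Prior work has also studied online NMF that accounts for outliers. In this study, we analyze signals for which the characteristics of some component signals can be known in advance, such as a reporter's voice during a live typhoon broadcast or instrumental music amid the cheering at a high school baseball game. Furthermore, making the method capable of online learning allows it to be applied to large-scale data analysis. Because sequentially added signals can be analyzed, the method can also handle diverse mixed signals. In the proposed method, in addition to the conventional method, we prepare data in which the characteristics of a specific signal within the mixture have been learned in advance, and analyze the mixed signal on that basis. We focus on the advantage that learning with prior knowledge of the characteristics of the signal of interest makes it possible to extract that signal from various mixed signals. Signal separation experiments with the proposed method on artificial data, image data, and audio data did not yield good results, but they identified several issues to be examined, such as the constraints on and the initialization of the multiple coefficient and basis matrices. The University of Electro-Communications, 201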
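The factorization into a basis matrix and a coefficient matrix described above can be sketched as follows. This is standard batch NMF with multiplicative updates for the squared (Frobenius) error, written in plain Python. It is not the proposed online, multi-dictionary, outlier-aware method; the helper names, the rank, the iteration count, and the toy matrix are all assumptions chosen for illustration.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Factor non-negative V (n x m) into basis W (n x rank) and coefficients H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h

# Toy non-negative matrix of rank 2 (third row = 2 * first row + second row).
V = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 5.0]]
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout, which is what lets the basis be read as additive patterns of the signal.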

    Topic space trajectories: A case study on machine learning literature

    The annual number of publications at scientific venues, for example, conferences and journals, is growing quickly. Hence, even for researchers it becomes harder and harder to keep track of research topics and their progress. In this task, researchers can be supported by automated publication analysis. Yet, many such methods result in uninterpretable, purely numerical representations. As an attempt to support human analysts, we present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We demonstrate how these trajectories can be interpreted based on eight different analysis approaches. To obtain comprehensible results, we employ non-negative matrix factorization as well as suitable visualization techniques. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. In addition to a thorough introduction of our method, our focus is on an extensive analysis of the results we achieved. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work. An advantage in these applications over previous methods lies in the good interpretability of the results obtained through our methods.

    Unsupervised Anomaly Detection of High Dimensional Data with Low Dimensional Embedded Manifold

    Anomaly detection techniques are supposed to identify anomalies from loads of seemingly homogeneous data, and being able to do so can lead us to timely, pivotal and actionable decisions, saving us from potential human, financial and informational loss. In anomaly detection, an often encountered situation is the absence of prior knowledge about the nature of anomalies. Such circumstances advocate for ‘unsupervised’ learning-based anomaly detection techniques. Compared to its ‘supervised’ counterpart, which possesses the luxury to utilize a labeled training dataset containing both normal and anomalous samples, unsupervised problems are far more difficult. Moreover, high dimensional streaming data from tons of interconnected sensors present in modern day industries makes the task more challenging. To carry out an investigative effort to address these challenges is the overarching theme of this dissertation. In this dissertation, the fundamental issue of similarity measure among observations, which is a central piece in any anomaly detection technique, is reassessed. The manifold hypothesis suggests the possibility of a low dimensional manifold structure embedded in high dimensional data. In the presence of such structured space, traditional similarity measures fail to measure the true intrinsic similarity. In light of this revelation, reevaluating the notion of similarity measure seems more pressing than providing incremental improvements over any of the existing techniques. A graph theoretic similarity measure is proposed to differentiate and thus identify the anomalies from normal observations. Specifically, the minimum spanning tree (MST), a graph-based approach, is proposed to approximate the similarities among data points in the presence of high dimensional structured space. It can track the structure of the embedded manifold better than the existing measures and help to distinguish the anomalies from normal observations.
This dissertation further investigates three different aspects of the anomaly detection problem and develops three sets of solution approaches, all of them revolving around the newly proposed MST based similarity measure. In the first part of the dissertation, a local MST (LoMST) based anomaly detection approach is proposed to detect anomalies using the data in the original space. A two-step procedure is developed to detect both cluster and point anomalies. The next two sets of methods are proposed in the subsequent two parts of the dissertation, for anomaly detection in reduced data space. In the second part of the dissertation, a neighborhood structure assisted version of the nonnegative matrix factorization approach (NS-NMF) is proposed. To detect anomalies, it uses the neighborhood information captured by a sparse MST similarity matrix along with the original attribute information. To meet industry demands, online versions of both LoMST and NS-NMF are also developed for real-time anomaly detection. In the last part of the dissertation, a graph regularized autoencoder is proposed which uses an MST regularizer in addition to the original loss function and is thus capable of maintaining the local invariance property. All of the approaches proposed in the dissertation are tested on 20 benchmark datasets and one real-life hydropower dataset. When compared with the state-of-the-art approaches, all three approaches produce statistically significantly better outcomes. “Industry 4.0” is a reality now and it calls for anomaly detection techniques capable of processing a large amount of high dimensional data generated in real-time. The proposed MST based similarity measure followed by the individual techniques developed in this dissertation are equipped to tackle each of these issues and provide an effective and reliable real-time anomaly identification platform.
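To make the MST idea concrete, here is a minimal sketch that builds a minimum spanning tree over a small point set with Prim's algorithm and scores each point by the mean length of its incident MST edges. The scoring rule, the function names, and the sample points are assumptions for illustration only; the dissertation's LoMST, NS-NMF, and autoencoder methods are not reproduced here.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns edges (i, j, length)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_scores(points):
    """Hypothetical anomaly score: mean length of the MST edges touching each point.

    Requires at least two points so that every point has an incident edge.
    """
    incident = [[] for _ in points]
    for i, j, length in minimum_spanning_tree(points):
        incident[i].append(length)
        incident[j].append(length)
    return [sum(lengths) / len(lengths) for lengths in incident]

# Four points on a unit square plus one distant point (illustrative data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (8.0, 8.0)]
scores = mst_scores(points)
print(scores.index(max(scores)))
```

A point that can only be attached to the tree through a long edge receives a high score, which is one simple way a tree-based similarity can separate isolated observations from densely connected ones.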