1,319 research outputs found

    KACST Arabic Text Classification Project: Overview and Preliminary Results

    Electronically formatted Arabic free texts can be found in abundance these days on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined category based on identifiable linguistic features. Such a process has various useful applications including, but not restricted to, email spam detection, web page content filtering, and automatic message routing. This paper presents an overview of the King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project along with some preliminary results. The project will contribute to a better understanding and elaboration of Arabic text classification techniques.

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. Comment: Accepted for publication in ACM Computing Surveys.
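The inductive process described in this abstract (building a classifier from preclassified documents) can be illustrated with a minimal sketch. The abstract does not name a specific learner, so the multinomial Naive Bayes model, the whitespace tokenizer, and the toy `DOCS` corpus below are all assumptions chosen for brevity, not the survey's own method.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of preclassified documents (not from the survey).
DOCS = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status report", "ham"),
]

def train_naive_bayes(labeled_docs):
    """Learn per-category document and word counts from labeled examples."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the category with the highest log-posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

MODEL = train_naive_bayes(DOCS)
```

Calling `classify("buy cheap pills", MODEL)` assigns the text to whichever category makes its words most probable under the learned counts. Document representation here is a plain bag of words, and no evaluation step is shown.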

    Masked Conditional Neural Networks for sound classification

    The remarkable success of deep convolutional neural networks in image-related applications has led to their adoption also for sound processing. Typically the input is a time–frequency representation such as a spectrogram, and in some cases this is treated as a two-dimensional image. However, spectrogram properties are very different to those of natural images. Instead of an object occupying a contiguous region in a natural image, frequencies of a sound are scattered about the frequency axis of a spectrogram in a pattern unique to that particular sound. Applying conventional convolutional neural networks has therefore required extensive hand-tuning, and presented the need to find an architecture better suited to the time–frequency properties of audio. We introduce the ConditionaL Neural Network (CLNN) and its extension, the Masked ConditionaL Neural Network (MCLNN), designed to exploit the nature of sound in a time–frequency representation. The CLNN is, broadly speaking, linear across frequencies but non-linear across time: it conditions its inference at a particular time based on preceding and succeeding time slices, and the MCLNN uses a controlled systematic sparseness that embeds a filterbank-like behavior within the network. Additionally, the MCLNN automates the concurrent exploration of several feature combinations analogous to hand-crafting the optimum combination of features for a recognition task. We have applied the MCLNN to the problems of music genre classification and environmental sound recognition on several music (Ballroom, GTZAN, ISMIR2004, and Homburg) and environmental sound (Urbansound8K, ESC-10, and ESC-50) datasets. The classification accuracy of the MCLNN surpasses neural-network-based architectures including state-of-the-art Convolutional Neural Networks and several hand-crafted attempts.

    Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization

    This paper discusses a novel hybrid approach for text categorization that combines a machine learning algorithm, which provides a base model trained with a labeled corpus, with a rule-based expert system, which is used to improve the results provided by the previous classifier, by filtering false positives and dealing with false negatives. The main advantage is that the system can be easily fine-tuned by adding specific rules for those noisy or conflicting categories that have not been successfully trained. We also describe an implementation based on k-Nearest Neighbor and a simple rule language to express lists of positive, negative and relevant (multiword) terms appearing in the input text. The system is evaluated in several scenarios, including the popular Reuters-21578 news corpus for comparison to other approaches, and categorization using IPTC metadata, EUROVOC thesaurus and others. Results show that this approach achieves a precision that is comparable to top ranked methods, with the added value that it does not require a demanding human expert workload to train.
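The two-stage idea in this abstract (a k-Nearest Neighbor base model whose output is corrected by term rules) might be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the rule format, the cosine similarity over raw term counts, and the `TRAINING` and `RULES` data are hypothetical, and the "relevant" term lists are omitted because the abstract does not define their semantics.

```python
import math
from collections import Counter

# Hypothetical labeled corpus for the base model (not from the paper).
TRAINING = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("shares fell on the exchange", "finance"),
    ("the bank raised interest rates", "finance"),
]

# Hypothetical rules: a positive term adds a category (recovers a false
# negative); a negative term removes it (filters a false positive).
RULES = {
    "sports": {"positive": ["grand prix"], "negative": ["stock exchange"]},
    "finance": {"positive": ["stock exchange"], "negative": []},
}

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(text, training, k=3):
    """Base model: majority label among the k most similar training texts."""
    query = Counter(text.lower().split())
    ranked = sorted(
        training,
        key=lambda ex: cosine(query, Counter(ex[0].lower().split())),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return {votes.most_common(1)[0][0]}

def apply_rules(text, labels, rules):
    """Second stage: adjust the base labels with multiword term rules."""
    lowered = text.lower()
    result = set(labels)
    for category, rule in rules.items():
        if any(term in lowered for term in rule["positive"]):
            result.add(category)
        if any(term in lowered for term in rule["negative"]):
            result.discard(category)  # negative terms take precedence
    return result

def hybrid_classify(text, training=TRAINING, rules=RULES, k=3):
    return apply_rules(text, knn_predict(text, training, k), rules)
```

Keeping the rules in a separate stage means a noisy category can be tuned by editing `RULES` alone, without retraining the base model, which is the advantage the abstract emphasizes.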

    Advances in Deep Learning Towards Fire Emergency Application : Novel Architectures, Techniques and Applications of Neural Networks

    Paper IV is not published yet. With respect to copyright, papers IV and VI were excluded from the dissertation. Deep learning has been successfully used in various applications, and recently there has been increasing interest in applying deep learning in emergency management. However, there are still many significant challenges that limit the use of deep learning in the latter application domain. In this thesis, we address some of these challenges and propose novel deep learning methods and architectures. The challenges we address fall into three areas of emergency management: detection of the emergency (fire), analysis of the situation without human intervention, and evacuation planning. We have used the computer vision tasks of image classification and semantic segmentation, as well as sound recognition, for detection and analysis. For evacuation planning, we have used deep reinforcement learning.

    Indexing and retrieval in digital libraries : developing taxonomies for a repository of decision technologies

    DecisionNet is an online Internet-based repository of decision technologies. It links remote users with these technologies and provides a directory service to enable search and selection of suitable technologies. The ability to retrieve relevant objects through search mechanisms is basic to any repository's success and usability and depends on effective classification of the decision technologies. This thesis develops classification methods to enable indexing of the DecisionNet repository. Existing taxonomies for software and other online repositories are examined. Criteria and principles for a good taxonomy are established and systematically applied to develop DecisionNet taxonomies. A database design is developed to store the taxonomies and to classify the technologies in the repository. User interface issues for navigation of a hierarchical classification system are discussed. A user interface for remote World Wide Web users is developed. This user interface is designed for browsing the taxonomy structure and creating search parameters online. Recommendations for the implementation of a repository search mechanism are given. http://archive.org/details/indexingndretrie1094532199 U.S. Navy (U.S.N.) author. Approved for public release; distribution is unlimited.

    MFIRE-2: A Multi Agent System for Flow-based Intrusion Detection Using Stochastic Search

    Detecting attacks targeted against military and commercial computer networks is a crucial element in the domain of cyberwarfare. The traditional method of signature-based intrusion detection is a primary mechanism to alert administrators to malicious activity. However, signature-based methods are not capable of detecting new or novel attacks. This research continues the development of a novel simulated, multiagent, flow-based intrusion detection system called MFIRE. Agents in the network are trained to recognize common attacks, and they share data with other agents to improve the overall effectiveness of the system. A Support Vector Machine (SVM) is the primary classifier with which agents determine whether an attack is occurring. Agents are prompted to move to different locations within the network to find better vantage points, and two methods for achieving this are developed. One uses a centralized reputation-based model, and the other uses a decentralized model optimized with stochastic search. The latter is tested for basic functionality. The reputation model is extensively tested in two configurations, and results show that it is significantly superior to a system with non-moving agents. The resulting system, MFIRE-2, demonstrates exciting new network defense capabilities and should be considered for implementation in future cyberwarfare applications.

    Modelling forest landscape dynamics in Glen Affric, northern Scotland

    Consideration of forest management at the landscape scale is essential if commitments to the conservation of biodiversity are to be upheld. The ecosystem management approach, developed largely in North America, has made use of various landscape modelling tools to assist in planning for biodiversity maintenance and ecological restoration. The roles of habitat suitability models, metapopulation models, spatially explicit population models (SEPMs) and forest landscape dynamics models (FLDMs) in the planning process are discussed and a review of forest dynamics models is presented. Potential is identified for developing landscape models in the UK for both landscape restoration projects and semi-natural woodland management. Glen Affric, in northern Scotland, contains a large area of native pine and birch woodland and is the subject of a long-term restoration project. A new model, GALDR (Glen Affric Landscape Dynamics Reconstruction), is introduced and is believed to be the first FLDM developed for British woodland. The theory behind the model is described in detail, and preliminary results and sensitivity analyses are presented. Furthermore, GALAM (Glen Affric Lichen Abundance Model), a new SEPM for the rare epiphytic lichen Bryoria furcellata, is also described. Results of simulations from the linked GALDR and GALAM models are presented, which shed light on the role of landscape heterogeneity in determining the dynamics of lichen habitats and populations. It is concluded that, whilst much work will be required to develop a management-oriented decision support system from the GALDR model, the modelling process may aid researchers in the identification of knowledge gaps in ecological theory relevant to management and restoration.