
    Indexing Highly Repetitive String Collections

    Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through applications like bioinformatics, the string collections experienced a growth that outpaces Moore's Law and challenges our ability to handle them even in compressed form. It turns out, fortunately, that many of these rapidly growing string collections are highly repetitive, so that their information content is orders of magnitude lower than their plain size. The statistical compression methods used for classical collections, however, are blind to this repetitiveness, and therefore a new set of techniques has been developed in order to properly exploit it. The resulting indexes form a new generation of data structures able to handle the huge repetitive string collections that we are facing. In this survey we cover the algorithmic developments that have led to these data structures. We describe the distinct compression paradigms that have been used to exploit repetitiveness, the fundamental algorithmic ideas that form the base of all the existing indexes, and the various structures that have been proposed, comparing them in both theoretical and practical aspects. We conclude with the current challenges in this fascinating field.

    Text-to-picture tools, systems, and approaches: a survey

    Text-to-picture systems attempt to facilitate high-level, user-friendly communication between humans and computers while promoting understanding of natural language. These systems interpret a natural language text and transform it into a visual format as pictures or images that are either static or dynamic. In this paper, we aim to identify current difficulties and the main problems faced by prior systems, and in particular, we seek to investigate the feasibility of automatic visualization of Arabic story text through multimedia. Hence, we analyzed a number of well-known text-to-picture systems, tools, and approaches. We showed their constituent steps, such as knowledge extraction, mapping, and image layout, as well as their performance and limitations. We also compared these systems based on a set of criteria, mainly natural language processing, natural language understanding, and input/output modalities. Our survey showed that currently emerging techniques in natural language processing tools and computer vision have made promising advances in analyzing general text and understanding images and videos. Furthermore, important remarks and findings have been deduced from these prior works, which would help in developing an effective text-to-picture system for learning and educational purposes. © 2019, The Author(s). This work was made possible by NPRP grant #10-0205-170346 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

    Harnessing Big Data and Machine Learning for Event Detection and Localization

    Anomalous events are rare and deviate significantly from expected patterns and other data instances, making them hard to predict. Detecting anomalous severe events correctly and promptly can help reduce risks and losses. Many anomalous event detection techniques are studied in the literature. Recently, big data and machine learning based techniques have shown remarkable success in a wide range of fields. It is important to tailor these techniques to each application; otherwise they may result in expensive computation, slow prediction, false alarms, and improper prediction granularity. We aim to address the above challenges by harnessing big data and machine learning techniques for fast and reliable prediction and localization of severe events. First, to improve storage failure prediction, we develop a new lightweight and high-performing tensor decomposition-based method, named SEFEE, for storage error forecasting in large-scale enterprise storage systems. SEFEE employs a tensor decomposition technique to capture latent spatio-temporal information embedded in storage event logs. By utilizing this latent spatio-temporal information, we can forecast storage errors accurately without the training requirements of typical machine learning techniques. The training-free method allows for live prediction of storage errors and their locations in the storage system, based on previous observations that were used in the tensor decomposition pipeline to extract meaningful latent correlations. Moreover, we propose an extension that includes the severity of the errors as contextual information to improve the accuracy of the tensor decomposition, which in turn improves the prediction accuracy. We further provide a detailed characterization of a NetApp dataset to give the community additional insight into the dynamics of typical large-scale enterprise storage systems. Next, we focus on another application: AI-driven wildfire prediction.
    Wildfires cause billions of dollars in property damage and loss of life, and pose serious health threats. We aim to correctly detect and localize wildfire events at an early stage and also to classify wildfire smoke based on the perceived pixel density of camera images. Due to the lack of a publicly available dataset for early wildfire smoke detection, we first collect and process images from the AlertWildfire camera network. The images are annotated with bounding boxes and densities for deep learning methods to use. We then adapt a transformer-based end-to-end object detection model for wildfire detection using our dataset. The dataset and detection model together form a benchmark named the Nevada smoke detection benchmark, or Nemo for short. Nemo is the first open-source benchmark for wildfire smoke detection focused on the early incipient stage. We further provide a weakly supervised Nemo version to enable wider support as a benchmark.

    Iterated function systems and shape representation

    We propose the use of iterated function systems as an isomorphic shape representation scheme for use in a machine vision environment. A concise description of the basic theory and salient characteristics of iterated function systems is presented, and from this we develop a formal framework within which to embed a representation scheme. Concentrating on the problem of obtaining automatically generated two-dimensional encodings, we describe implementations of two solutions. The first is based on a deterministic algorithm and makes simplifying assumptions which limit its range of applicability. The second employs a novel formulation of a genetic algorithm and is intended to function with general data input. Keywords: Machine Vision, Shape Representation, Iterated Function Systems, Genetic Algorithms
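    As a rough illustration of how an iterated function system encodes a shape, the sketch below decodes a three-map IFS by random iteration (the "chaos game"). The maps, the names `VERTICES`, `apply_map`, and `chaos_game`, and the Sierpinski-triangle example are all assumptions chosen for illustration; this covers only the decoding direction and is not the deterministic or genetic encoding algorithm the abstract describes.

```python
import random

# Three contractive affine maps w_i(p) = (p + v_i) / 2, one per triangle vertex.
# Their attractor is the Sierpinski triangle (an illustrative choice, not taken
# from the abstract above).
VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def apply_map(point, vertex):
    """Apply one map of the IFS: move halfway from `point` toward `vertex`."""
    x, y = point
    vx, vy = vertex
    return ((x + vx) / 2.0, (y + vy) / 2.0)


def chaos_game(n_points, start=(0.1, 0.2), seed=0):
    """Approximate the IFS attractor by applying randomly chosen maps in turn."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    point = start
    points = []
    for _ in range(n_points):
        point = apply_map(point, rng.choice(VERTICES))
        points.append(point)
    return points


if __name__ == "__main__":
    pts = chaos_game(5000)
    print(len(pts), "points generated; first point:", pts[0])
```

    The whole shape is described by the three vertex parameters alone, which is what makes an IFS code compact; plotting the returned points would show the attractor.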

    Bio-inspired Landing Approaches and Their Potential Use On Extraterrestrial Bodies

    Automatic landing on extraterrestrial bodies is still a challenging and hazardous task. Here we propose a new type of autopilot designed to solve landing problems, which is based on neurophysiological, behavioral, and biorobotic findings on flying insects. Flying insects excel in optic flow sensing techniques and cope with highly parallel data at a low energy and computational cost using lightweight dedicated motion processing circuits. In the first part of this paper, we present our biomimetic approach in the context of a lunar landing scenario, assuming a 2-degree-of-freedom spacecraft approaching the moon, which is simulated with the PANGU software. The autopilot we propose relies only on optic flow (OF) and inertial measurements, and aims at regulating the OF generated during the landing approach, by means of a feedback control system whose sensor is an OF sensor. We put forward an estimation method based on a two-sensor setup to accurately estimate the orientation of the lander's velocity vector, which is mandatory to control the lander's pitch in a near-optimal way with respect to the fuel consumption. In the second part, we present a lightweight Visual Motion Sensor (VMS) which draws on the results of neurophysiological studies on the insect visual system. The VMS was able to perform local 1-D angular speed measurements in the range 1.5°/s - 25°/s. The sensor was mounted on an 80 kg unmanned helicopter and test-flown outdoors over various fields. The OF measured onboard was shown to match the ground-truth optic flow despite the dramatic disturbances and vibrations experienced by the sensor.
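    To make the optic flow quantity concrete, here is a minimal sketch under simplifying assumptions that are not stated in the abstract: pure horizontal translation over flat ground, with the sensor looking straight down, so that the ventral optic flow is the ratio of ground speed to height. The function names are hypothetical; only the 1.5°/s to 25°/s range is taken from the abstract.

```python
import math


def ventral_optic_flow_deg_s(ground_speed_m_s, height_m):
    """Ventral optic flow omega = v / h (rad/s), returned in degrees per second.

    Assumes pure horizontal translation over flat ground with a
    downward-looking sensor (an illustrative simplification).
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return math.degrees(ground_speed_m_s / height_m)


def in_sensor_range(of_deg_s, low=1.5, high=25.0):
    """Check a value against the 1.5-25 deg/s range quoted for the VMS."""
    return low <= of_deg_s <= high


if __name__ == "__main__":
    omega = ventral_optic_flow_deg_s(2.0, 10.0)  # 2 m/s at 10 m above ground
    print(f"optic flow: {omega:.2f} deg/s, measurable: {in_sensor_range(omega)}")
```

    Because the ratio v / h stays constant only if speed falls in proportion to height, holding the measured optic flow at a setpoint makes a lander slow down as it descends, which is the intuition behind regulating OF during the approach.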

    Multi-texture image segmentation

    Visual perception of images is closely related to the recognition of the different texture areas within an image. Identifying the boundaries of these regions is an important step in image analysis and image understanding. This thesis presents supervised and unsupervised methods which allow an efficient segmentation of the texture regions within multi-texture images. The features used by the methods are based on a measure of the fractal dimension of surfaces in several directions, which allows the transformation of the image into a set of feature images; however, no direct measurement of the fractal dimension is made. Using this set of features, supervised and unsupervised statistical processing schemes are presented which produce low classification error rates. Natural texture images are examined with particular application to the analysis of sonar images of the seabed. A number of processes based on fractal models for texture synthesis are also presented. These are used to produce realistic images of natural textures, again with particular reference to sonar images of the seabed, and show the importance of phase and directionality in our perception of texture. A further extension is shown to give possible uses for image coding and object identification.