171 research outputs found

    Partially Overlapping Channel Assignments in Wireless Mesh Networks

    Get PDF

    Text Detection in Natural Scenes and Technical Diagrams with Convolutional Feature Learning and Cascaded Classification

    Get PDF
    An enormous amount of digital images are being generated and stored every day. Understanding text in these images is an important challenge with large impacts for academic, industrial and domestic applications. Recent studies address the difficulty of separating text targets from noise and background, all of which vary greatly in natural scenes. To tackle this problem, we develop a text detection system to analyze and utilize visual information in a data driven, automatic and intelligent way. The proposed method incorporates features learned from data, including patch-based coarse-to-fine detection (Text-Conv), connected component extraction using region growing, and graph-based word segmentation (Word-Graph). Text-Conv is a sliding window-based detector, with convolution masks learned using the Convolutional k-means algorithm (Coates et. al, 2011). Unlike convolutional neural networks (CNNs), a single vector/layer of convolution mask responses are used to classify patches. An initial coarse detection considers both local and neighboring patch responses, followed by refinement using varying aspect ratios and rotations for a smaller local detection window. Different levels of visual detail from ground truth are utilized in each step, first using constraints on bounding box intersections, and then a combination of bounding box and pixel intersections. Combining masks from different Convolutional k-means initializations, e.g., seeded using random vectors and then support vectors improves performance. The Word-Graph algorithm uses contextual information to improve word segmentation and prune false character detections based on visual features and spatial context. Our system obtains pixel, character, and word detection f-measures of 93.14%, 90.26%, and 86.77% respectively for the ICDAR 2015 Robust Reading Focused Scene Text dataset, out-performing state-of-the-art systems, and producing highly accurate text detection masks at the pixel level. To investigate the utility of our feature learning approach for other image types, we perform tests on 8- bit greyscale USPTO patent drawing diagram images. An ensemble of Ada-Boost classifiers with different convolutional features (MetaBoost) is used to classify patches as text or background. The Tesseract OCR system is used to recognize characters in detected labels and enhance performance. With appropriate pre-processing and post-processing, f-measures of 82% for part label location, and 73% for valid part label locations and strings are obtained, which are the best obtained to-date for the USPTO patent diagram data set used in our experiments. To sum up, an intelligent refinement of convolutional k-means-based feature learning and novel automatic classification methods are proposed for text detection, which obtain state-of-the-art results without the need for strong prior knowledge. Different ground truth representations along with features including edges, color, shape and spatial relationships are used coherently to improve accuracy. Different variations of feature learning are explored, e.g. support vector-seeded clustering and MetaBoost, with results suggesting that increased diversity in learned features benefit convolution-based text detectors

    Advances in Subspace-based Solutions for Diarization in the Broadcast Domain

    Get PDF
    La motivación de esta tesis es la necesidad de soluciones robustas al problema de diarización. Estas técnicas de diarización deben proporcionar valor añadido a la creciente cantidad disponible de datos multimedia mediante la precisa discriminación de los locutores presentes en la señal de audio. Desafortunadamente, hasta tiempos recientes este tipo de tecnologías solamente era viable en condiciones restringidas, quedando por tanto lejos de una solución general. Las razones detrás de las limitadas prestaciones de los sistemas de diarización son múltiples. La primera causa a tener en cuenta es la alta complejidad de la producción de la voz humana, en particular acerca de los procesos fisiológicos necesarios para incluir las características discriminativas de locutor en la señal de voz. Esta complejidad hace del proceso inverso, la estimación de dichas características a partir del audio, una tarea ineficiente por medio de las técnicas actuales del estado del arte. Consecuentemente, en su lugar deberán tenerse en cuenta aproximaciones. Los esfuerzos en la tarea de modelado han proporcionado modelos cada vez más elaborados, aunque no buscando la explicación última de naturaleza fisiológica de la señal de voz. En su lugar estos modelos aprenden relaciones entre la señales acústicas a partir de un gran conjunto de datos de entrenamiento. El desarrollo de modelos aproximados genera a su vez una segunda razón, la variabilidad de dominio. Debido al uso de relaciones aprendidas a partir de un conjunto de entrenamiento concreto, cualquier cambio de dominio que modifique las condiciones acústicas con respecto a los datos de entrenamiento condiciona las relaciones asumidas, pudiendo causar fallos consistentes en los sistemas.Nuestra contribución a las tecnologías de diarización se ha centrado en el entorno de radiodifusión. Este dominio es actualmente un entorno todavía complejo para los sistemas de diarización donde ninguna simplificación de la tarea puede ser tenida en cuenta. Por tanto, se deberá desarrollar un modelado eficiente del audio para extraer la información de locutor y como inferir el etiquetado correspondiente. Además, la presencia de múltiples condiciones acústicas debido a la existencia de diferentes programas y/o géneros en el domino requiere el desarrollo de técnicas capaces de adaptar el conocimiento adquirido en un determinado escenario donde la información está disponible a aquellos entornos donde dicha información es limitada o sencillamente no disponible.Para este propósito el trabajo desarrollado a lo largo de la tesis se ha centrado en tres subtareas: caracterización de locutor, agrupamiento y adaptación de modelos. La primera subtarea busca el modelado de un fragmento de audio para obtener representaciones precisas de los locutores involucrados, poniendo de manifiesto sus propiedades discriminativas. En este área se ha llevado a cabo un estudio acerca de las actuales estrategias de modelado, especialmente atendiendo a las limitaciones de las representaciones extraídas y poniendo de manifiesto el tipo de errores que pueden generar. Además, se han propuesto alternativas basadas en redes neuronales haciendo uso del conocimiento adquirido. La segunda tarea es el agrupamiento, encargado de desarrollar estrategias que busquen el etiquetado óptimo de los locutores. La investigación desarrollada durante esta tesis ha propuesto nuevas estrategias para estimar el mejor reparto de locutores basadas en técnicas de subespacios, especialmente PLDA. Finalmente, la tarea de adaptación de modelos busca transferir el conocimiento obtenido de un conjunto de entrenamiento a dominios alternativos donde no hay datos para extraerlo. Para este propósito los esfuerzos se han centrado en la extracción no supervisada de información de locutor del propio audio a diarizar, sinedo posteriormente usada en la adaptación de los modelos involucrados.<br /

    Interference aware cluster-based joint channel assignment scheme in multi-channel multi-radio wireless mesh networks

    Get PDF
    Wireless Mesh Networks (WMNs) are emerging as a promising solution for robust and ubiquitous broadband Internet access in both urban and rural areas. WMNs extend the coverage and capacity of traditionalWi-Fi islands through multi-hop,multichannel and multi-radio wireless connectivity. The foremost challenge, encountered in deploying a WMN, is the interference present between the co-located links, which limits the throughput of the network. Thus, the objective of this research is to improve the throughput, fairness and channel utilization of WMNs by mitigating the interference using optimized spatial re-usability of joint channels available in the 2.4 GHz Industrial, Scientific, and Medical (ISM) band. Interference is quantified depending on the relative location of the interfering links. Further, the Interference aware Non-Overlapping Channel assignment (I-NOC) model is developed to mitigate the interference by utilizing optimized spectral re-usability of Non-Overlapping Channels (NOCs). NOCs are limited in number. Therefore, I-NOC model is extended by using joint channels available in the free spectrum, and termed as Interference aware Joint Channel Assignment (I-JCA) model. Normally, joint channel assignment is considered harmful due to adjacent channel interference. However, by systematic optimization, the I-JCA model has utilized the spectral re-usability of joint channels. I-JCA model cannot be solved at the time of network initialization because it requires prior knowledge of the geometric locations of the nodes. Thus, Interference aware Cluster-based Joint Channel Assignment Scheme (I-CJCAS) is developed. I-CJCAS partitions the network topology into tangential non-overlapping clusters, with each cluster consisting of intra- and inter-cluster links. I-CJCAS mitigates the interference effect of a cluster’s intra-cluster links by assigning a distinct common channel within its interference domain. On the other hand, the inter-cluster links are assigned to a channel based on the transmitter of the inter-cluster link. I-CJCAS is benchmarked with Hyacinth, Breadth-First Search Channel Assignment (BFS-CA) and Cluster- Based Channel Assignment Scheme (CCAS) in terms of throughput, fairness, channel utilization, and impact of traffic load in single-hop and multi-hop flows. Results show that I-CJCAS has outperformed the benchmark schemes at least by a factor of 15 percent. As a part of future work, I-CJCAS can be extended to incorporate dynamic traffic load, topology control, and external interference from co-located wireless network deployments

    Duality based optical flow algorithms with applications

    Get PDF
    We consider the popular TV-L1 optical flow formulation, and the so-called dual-ity based algorithm for minimizing the TV-L1 energy. The original formulation is extended to allow for vector valued images, and minimization results are given. In addition we consider di↵erent definitions of total variation regulariza-tion, and related formulations of the optical flow problem that may be used with a duality based algorithm. We present a highly optimized algorithmic setup to estimate optical flows, and give five novel applications. The first application is registration of medical images, where X-ray images of di↵erent hands, taken using di↵erent imaging devices are registered using a TV-L1 optical flow algo-rithm. We propose to regularize the input images, using sparsity enhancing regularization of the image gradient to improve registration results. The second application is registration of 2D chromatograms, where registration only have to be done in one of the two dimensions, resulting in a vector valued registration problem with values having several hundred dimensions. We propose a nove

    Heuristic Evaluation for Novice Programming Systems

    Get PDF
    The past few years has seen a proliferation of novice programming tools. The availability of a large number of systems has made it difficult for many users to choose among them. Even for education researchers, comparing the relative quality of these tools, or judging their respective suitability for a given context, is hard in many instances. For designers of such systems, assessing the respective quality of competing design decisions can be equally difficult. Heuristic evaluation provides a practical method of assessing the quality of alternatives in these situations and of identifying potential problems with existing systems for a given target group or context. Existing sets of heuristics, however, are not specific to the domain of novice programming and thus do not evaluate all aspects of interest to us in this specialised application domain. In this article, we propose a set of heuristics to be used in heuristic evaluations of novice programming systems. These heuristics have the potential to allow a useful assessment of the quality of a given system with lower cost than full formal user studies and greater precision than the use of existing sets of heuristics. The heuristics are described and discussed in detail. We present an evaluation of the effectiveness of the heuristics that suggests that the new set of heuristics provides additional useful information to designers not obtained with existing heuristics sets

    Reinforcement Learning for Active Visual Perception

    Get PDF
    Visual perception refers to automatically recognizing, detecting, or otherwise sensing the content of an image, video or scene. The most common contemporary approach to tackle a visual perception task is by training a deep neural network on a pre-existing dataset which provides examples of task success and failure, respectively. Despite remarkable recent progress across a wide range of vision tasks, many standard methodologies are static in that they lack mechanisms for adapting to any particular settings or constraints of the task at hand. The ability to adapt is desirable in many practical scenarios, since the operating regime often differs from the training setup. For example, a robot which has learnt to recognize a static set of training images may perform poorly in real-world settings, where it may view objects from unusual angles or explore poorly illuminated environments. The robot should then ideally be able to actively position itself to observe the scene from viewpoints where it is more confident, or refine its perception with only a limited amount of training data for its present operating conditions.In this thesis we demonstrate how reinforcement learning (RL) can be integrated with three fundamental visual perception tasks -- object detection, human pose estimation, and semantic segmentation -- in order to make the resulting pipelines more adaptive, accurate and/or faster. In the first part we provide object detectors with the capacity to actively select what parts of a given image to analyze and when to terminate the detection process. Several ideas are proposed and empirically evaluated, such as explicitly including the speed-accuracy trade-off in the training process, which makes it possible to specify this trade-off during inference. In the second part we consider active multi-view 3d human pose estimation in complex scenarios with multiple people. We explore this in two different contexts: i) active triangulation, which requires carefully observing each body joint from multiple viewpoints, and ii) active viewpoint selection for monocular 3d estimators, which requires considering which viewpoints yield accurate fused estimates when combined. In both settings the viewpoint selection systems face several challenges, such as partial observability resulting e.g. from occlusions. We show that RL-based methods outperform heuristic ones in accuracy, with negligible computational overhead. Finally, the thesis concludes with establishing a framework for embodied visual active learning in the context of semantic segmentation, where an agent should explore a 3d environment and actively query annotations to refine its visual perception. Our empirical results suggest that reinforcement learning can be successfully applied within this framework as well

    Spaceborne synthetic aperture radar pilot study

    Get PDF
    A pilot study of a spaceborne sidelooking radar is summarized. The results of the system trade studies are given along with the electrical parameters for the proposed subsystems. The mechanical aspects, packaging, thermal control and dynamics of the proposed design are presented. Details of the data processor are given. A system is described that allows the data from a pass over the U. S. to be in hard copy form within two hours. Also included are the proposed schedule, work breakdown structure, and cost estimate

    Digital VLSI Architectures for Advanced Channel Decoders

    Get PDF
    Error-correcting codes are strongly adopted in almost every modern digital communication and storage system, such as wireless communications, optical communications, Flash memories, computer hard drives, sensor networks, and deep-space probes. New and emerging applications demand codes with better error-correcting capability. On the other hand, the design and implementation of those high-gain error-correcting codes pose many challenges. They usually involve complex mathematical computations, and mapping them directly to hardware often leads to very high complexity. This work aims to focus on Polar codes, which are a recent class of channel codes with the proven ability to reduce decoding error probability arbitrarily small as the block-length is increased, provided that the code rate is less than the capacity of the channel. This property and the recursive code-construction of this algorithms attracted wide interest from the communications community. Hardware architectures with reduced complexity can efficiently implement a polar codes decoder using either successive cancellation approximation or belief propagation algorithms. The latter offers higher throughput at high signal-to-noise ratio thanks to the inherently parallel decision-making capability of such decoder type. A new analysis on belief propagation scheduling algorithms for polar codes and on interconnection structure of the decoding trellis not covered in literature is also presented. It allowed to achieve an hardware implementation that increase the maximum information throughput under belief propagation decoding while also minimizing the implementation complexity
    corecore