
    Picture processing for enhancement and recognition

    Recent years have been characterized by an incredible growth in computing power, storage capability, communication speed, and bandwidth availability, for both desktop platforms and mobile devices. The combination of these factors has led to a new era of multimedia applications: browsing of huge image archives, consultation of online video databases, location-based services, and many others. Multimedia is almost everywhere and requires high-quality data, easy retrieval of multimedia content, and increased network access capacity and bandwidth per user. Meeting all of these requirements demands effort in several research areas, ranging from signal processing and image and video analysis to communication protocols. The research activity developed during these three years concerns the field of multimedia signal processing, with particular attention to image and video analysis and processing. Two main topics have been addressed: the first relates to image and video reconstruction/restoration (using super-resolution techniques) in web-based applications for the consumption of multimedia content; the second relates to image analysis for location-based systems in indoor scenarios.
    The first topic concerns image and video processing; in particular, the focus has been on developing algorithms for super-resolution reconstruction of images and video sequences, in order to ease the consumption of multimedia data over the web. On one hand, recent years have seen an incredible proliferation and surprising success of user-generated multimedia content, as well as of distributed and collaborative multimedia databases on the web. This has raised serious management and maintenance issues: bandwidth limitations and service costs are important factors in mobile consumption of multimedia content. On the other hand, the current multimedia consumer market is characterized by the advent of cheap but rather high-quality high-definition displays. This trend is only partially supported by the deployment of high-resolution multimedia services, so the resulting disparity between content and display formats has to be addressed: older productions need to be either re-mastered or post-processed in order to be broadcast in HD. In this scenario, super-resolution reconstruction represents a major solution. Image and video super-resolution techniques restore the original spatial resolution from low-resolution compressed data; in this way, content providers, service providers, and end users alike are relieved of the burden of providing and supporting large multimedia data transfers.
    The second topic addressed during my PhD research activity is the implementation of an image-based positioning system for an indoor navigator. As modern mobile devices become faster, classical signal processing can be applied to new applications such as location-based services. The exponential growth of wearable devices such as smartphones and PDAs, equipped with embedded motion (accelerometer) and rotation (gyroscope) sensors, Internet connectivity, and high-resolution cameras, makes them ideal for INS (Inertial Navigation System) applications that support the localization and navigation of objects and/or users in indoor environments, where common localization systems such as GPS (Global Positioning System) fail; hence the need for alternative positioning techniques.
    A series of intensive tests has been carried out, showing how modern signal processing techniques can be successfully applied in different scenarios, from image and video enhancement to image recognition for localization purposes, providing low-cost solutions and ensuring real-time performance.
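
    To make the super-resolution idea above concrete, the following is a minimal sketch of multi-frame super-resolution by iterative back-projection, one classical technique in this family. It assumes pre-registered low-resolution frames whose dimensions divide evenly by the scale factor; all names and parameters are illustrative, not the thesis's actual algorithm.

```python
# Minimal iterative back-projection sketch (illustrative only).
import numpy as np
from scipy.ndimage import zoom

def back_project_sr(low_res_frames, scale=2, n_iters=20, step=0.1):
    """Estimate a high-resolution image from registered low-res frames."""
    # Initial guess: bicubic upsampling of the first frame.
    hr = zoom(low_res_frames[0].astype(float), scale, order=3)
    for _ in range(n_iters):
        for lr in low_res_frames:
            # Simulate the imaging model by downsampling the estimate...
            simulated = zoom(hr, 1.0 / scale, order=3)
            # ...and back-project the residual error into the HR estimate.
            error = lr.astype(float) - simulated
            hr += step * zoom(error, scale, order=3)
    return np.clip(hr, 0, 255)

# Usage: hr = back_project_sr([frame0, frame1, frame2], scale=2)
```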

    Multiple Media Correlation: Theory and Applications

    This thesis introduces multiple media correlation, a new technology for the automatic alignment of multiple media objects such as text, audio, and video. This research began with the question: what can be learned when multiple multimedia components are analyzed simultaneously? Most ongoing research in computational multimedia has focused on queries, indexing, and retrieval within a single media type. Video is compressed and searched independently of audio; text is indexed without regard to temporal relationships it may have to other media data. Multiple media correlation provides a framework for locating and exploiting correlations between multiple, potentially heterogeneous, media streams. The goal is computed synchronization: the determination of temporal and spatial alignments that optimize a correlation function and indicate commonality and synchronization between media objects. The model also provides a basis for comparison of media in unrelated domains. There are many real-world applications for this technology, including speaker localization, musical score alignment, and degraded media realignment. Two applications, text-to-speech alignment and parallel text alignment, are described in detail with experimental validation. Text-to-speech alignment computes the alignment between a textual transcript and speech-based audio. The presented solutions are effective for a wide variety of content and are useful not only for retrieval of content but also in support of automatic captioning of movies and video. Parallel text alignment provides a tool for the comparison of alternative translations of the same document that is particularly useful to the classics scholar interested in comparing translation techniques or styles. The results presented in this thesis include (a) new media models more useful in analysis applications, (b) a theoretical model for multiple media correlation, (c) two practical application solutions with widespread applicability, and (d) Xtrieve, a multimedia database retrieval system that demonstrates this new technology and its application to information retrieval. This thesis demonstrates that computed alignment of media objects is practical and can provide immediate solutions to many information retrieval and content presentation problems. It also introduces a new area for research in media data analysis.
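
    As one concrete illustration of computed synchronization, the sketch below aligns two feature sequences with dynamic time warping, a standard technique for monotonic temporal alignment; the feature vectors and function names here are assumptions for illustration, not the media models or correlation functions developed in the thesis.

```python
# Dynamic time warping sketch for aligning two media feature streams.
import numpy as np

def dtw_align(features_a, features_b):
    """Return a monotonic alignment path between two feature sequences."""
    a = np.asarray(features_a, dtype=float)
    b = np.asarray(features_b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local dissimilarity
            cost[i, j] = d + min(cost[i - 1, j],       # advance stream A only
                                 cost[i, j - 1],       # advance stream B only
                                 cost[i - 1, j - 1])   # advance both
    # Backtrack from the end to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min(((i - 1, j), (i, j - 1), (i - 1, j - 1)),
                   key=lambda p: cost[p])
    return path[::-1]

# Usage: pair per-frame audio features with per-word transcript features.
# alignment = dtw_align(audio_feats, text_feats)
```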

    Exploring Sparse, Unstructured Video Collections of Places

    The abundance of mobile devices and digital cameras with video capture makes it easy to obtain large collections of video clips that contain the same location, environment, or event. However, such an unstructured collection is difficult to comprehend and explore. We propose a system that analyses collections of unstructured but related video data to create a Videoscape: a data structure that enables interactive exploration of video collections by visually navigating, spatially and/or temporally, between different clips. We automatically identify transition opportunities, or portals. From these portals, we construct the Videoscape, a graph whose edges are video clips and whose nodes are portals between clips. Once structured, the videos can be interactively explored by walking the graph or via a geographic map. Given this system, we gauge preference for different video transition styles in a user study, and generate heuristics that automatically choose an appropriate transition style. We evaluate our system in three further user studies, which allow us to conclude that the Videoscape provides significant benefits over related methods. Our system leads to previously unseen ways of interactive spatio-temporal exploration of casually captured videos, and we demonstrate this on several video collections.
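
    The graph described above, with portals as nodes and clip segments as edges, might be represented as in this sketch; all class and field names are hypothetical, not the authors' implementation.

```python
# Illustrative Videoscape graph: nodes are portals, edges are clip segments.
from dataclasses import dataclass

@dataclass(frozen=True)
class Portal:
    portal_id: int      # a visually matched moment shared by several clips
    geo: tuple          # optional (lat, lon) for map-based exploration

@dataclass
class ClipEdge:
    clip_path: str      # source video file
    start_s: float      # segment start time within the clip
    end_s: float        # segment end time within the clip

class Videoscape:
    def __init__(self):
        self.adjacency = {}  # Portal -> list of (ClipEdge, Portal)

    def add_segment(self, a: Portal, edge: ClipEdge, b: Portal):
        self.adjacency.setdefault(a, []).append((edge, b))
        self.adjacency.setdefault(b, []).append((edge, a))

    def walk_options(self, here: Portal):
        """Clips the viewer can transition into from a given portal."""
        return self.adjacency.get(here, [])

# Usage: two clips meet at the same doorway (a portal).
door, plaza = Portal(1, (51.50, -0.12)), Portal(2, (51.51, -0.11))
vs = Videoscape()
vs.add_segment(door, ClipEdge("clip_a.mp4", 3.0, 12.5), plaza)
```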

    Computer Generation of Integral Images using Interpolative Shading Techniques

    Research to produce artificial 3D images that duplicate human stereo vision has been ongoing for hundreds of years. What has taken millions of years to evolve in humans is proving elusive even for present-day technological advancements, and the difficulties are compounded when real-time generation is contemplated. The problem is one of depth. When perceiving the world around us, the sense of depth has been shown to result from many different factors, which can be described as monocular and binocular. Monocular depth cues include overlapping or occlusion, shading and shadows, texture, etc. Another monocular cue (and binocular to some extent) is accommodation, where the focal length of the crystalline lens is adjusted to view an image. The important binocular cues are convergence and parallax. Convergence allows the observer to judge distance by the difference in angle between the viewing axes of the left and right eyes when both are focussing on a point. Parallax relates to the fact that each eye sees a slightly shifted view of the image. If a system can be produced that requires the observer to use all of these cues, as when viewing the real world, then the transition to and from viewing a 3D display will be seamless. However, for many of the 3D imaging techniques towards which current work is primarily directed, this is not the case, and this raises a serious issue of viewer comfort: physiological disturbances that can cause nausea in some observers will not be acceptable. Researchers worldwide, in universities and industry, are pursuing their own approaches to the development of 3D systems.
    The ideal 3D system would require, as a minimum, accurate depth reproduction, multi-viewer capability, and all-round seamless viewing. Not having to wear stereoscopic or polarising glasses would be ideal, and lack of viewer fatigue is essential. Finally, whatever the use of the system, be it CAD, medical, scientific visualisation, remote inspection, etc. on the one hand, or consumer markets such as 3D video games and 3DTV on the other, the system has to be relatively inexpensive. Integral photography is a 'real camera' system that attempts to comply with this ideal; it was invented in 1908 but, for technological reasons, was not then capable of serving as a useful autostereoscopic system. More recently, with advances in technology, it has become a more attractive proposition for those interested in developing a suitable system for 3DTV.
    The fast computer generation of integral images is the subject of this thesis, the adjective 'fast' distinguishing it from the much slower technique of ray tracing integral images. These two techniques mirror the standards in monoscopic computer graphics, where ray tracing generates photo-realistic images and the fast forward geometric approach using interpolative shading techniques is the method used for real-time generation. Before this work began it was not known whether it was possible to create volumetric integral images using a fast approach similar to that employed by standard computer graphics, but it soon became apparent that it would be successful and hence a valuable contribution to this area. Presented herein is a full description of the development of two derived methods for producing rendered integral image animations using interpolative shading. The main body of the work is the development of code to put these methods into practice, along with many observations and discoveries that the author made during this task.
    This work was supported by the Defence Evaluation and Research Agency (DERA), by a contract (LAIRD) under the European Link/EPSRC photonics initiative, and by DTI/EPSRC sponsorship within the PROMETHEUS project.
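
    The fast generation approach can be pictured as rendering one small sub-image per lenslet from a laterally shifted viewpoint and tiling the results into the final integral image. In the sketch below, render_view() is a hypothetical stub standing in for an interpolative-shading rasterizer; none of this is the thesis code.

```python
# Illustrative tiling loop for computer-generated integral images.
import numpy as np

def render_view(offset_x, offset_y, size):
    """Hypothetical stub: rasterize the scene from a shifted camera."""
    # Placeholder gradient so the sketch runs end to end.
    y, x = np.mgrid[0:size, 0:size]
    return ((x + offset_x * 10 + y + offset_y * 10) % 256).astype(np.uint8)

def integral_image(n_lenses=32, sub_size=16, pitch_mm=1.0):
    out = np.zeros((n_lenses * sub_size, n_lenses * sub_size), np.uint8)
    for row in range(n_lenses):
        for col in range(n_lenses):
            # Each lenslet sees the scene from its own array position.
            sub = render_view(col * pitch_mm, row * pitch_mm, sub_size)
            out[row * sub_size:(row + 1) * sub_size,
                col * sub_size:(col + 1) * sub_size] = sub
    return out
```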

    Towards Making JavaScript Applications Secure and Private

    JavaScript is a popular programming language widely used on both the browser and the server sides. Researchers have extensively studied different aspects of the security and privacy of JavaScript, for instance, vulnerability detection in server-side Node.js applications and browser-side fingerprinting techniques. Despite these research efforts, multiple challenges remain unsolved: on the server side, existing vulnerability detection approaches do not generalize to a wide range of popular vulnerabilities and their detection rates are not satisfactory; on the client side, service providers can only fingerprint users within a single browser, not across different browsers. In this dissertation, we propose a flow-, branch- and context-sensitive static analysis approach that generates a novel graph structure, named the Object Dependence Graph (ODG), to address the server-side vulnerability detection challenges, and a cross-browser fingerprinting method that utilizes multiple novel OS- and hardware-level features to solve the client-side fingerprinting challenge. On the server side, the ODG represents JavaScript objects as nodes and their relations with the Abstract Syntax Tree (AST) as edges, and allows users to detect multiple types of vulnerabilities during and after ODG generation via graph queries. Our evaluation shows that, for server-side vulnerability detection, our approach outperforms all state-of-the-art JavaScript vulnerability detection tools in terms of false-positive and false-negative rates. We apply our tool to detect six types of vulnerabilities in an NPM package dataset; it correctly reports 241 zero-day vulnerable packages, 81 of which have been assigned CVE identifiers. On the client side, our approach utilizes multiple novel OS- and hardware-level features, such as those from graphics cards and CPUs, to achieve better accuracy and stability. The evaluation shows that our approach can identify 99.24% of browsers and 84.64% of devices, as opposed to 90.83% and 68.98%, respectively, for the state-of-the-art approaches.
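
    As a toy illustration of detecting a vulnerability by graph query, the sketch below checks an ODG-like structure for a dependency path from a user-controlled source to a dangerous sink; node labels and method names are hypothetical, and the dissertation's actual ODG (flow-, branch- and context-sensitive, with AST-derived edges) is far richer.

```python
# Toy object-dependence graph with a taint-style reachability query.
from collections import defaultdict, deque

class ObjectDependenceGraph:
    def __init__(self):
        self.edges = defaultdict(set)   # object node -> dependent nodes

    def add_dependency(self, src, dst):
        """Record that object `dst` is derived from object `src`."""
        self.edges[src].add(dst)

    def flows(self, source, sink):
        """Graph query: does any dependency path link source to sink?"""
        seen, queue = {source}, deque([source])
        while queue:
            node = queue.popleft()
            if node == sink:
                return True
            for nxt in self.edges[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        return False

# Example: user input flows through a string into an eval argument.
odg = ObjectDependenceGraph()
odg.add_dependency("req.query.cmd", "cmdString")
odg.add_dependency("cmdString", "eval#arg0")
assert odg.flows("req.query.cmd", "eval#arg0")  # potential code injection
```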

    Diverse Contributions to Implicit Human-Computer Interaction

    When people interact with computers, a great deal of information is provided unintentionally. By studying these implicit interactions, it is possible to understand which characteristics of the user interface are beneficial (or not), thereby deriving implications for the design of future interactive systems. The main advantage of leveraging implicit user data in computer applications is that any interaction with the system can contribute to improving its usefulness. Furthermore, such data remove the cost of having to interrupt users so that they explicitly submit information about a topic that, in principle, need not be related to their intention in using the system. On the other hand, implicit interactions do not always provide clear and concrete data, so special attention must be paid to how this source of information is managed. The purpose of this research is twofold: 1) to apply a new vision to both the design and the development of applications that can react appropriately to users' implicit interactions, and 2) to provide a series of methodologies for the evaluation of such interactive systems. Five scenarios illustrate the feasibility and suitability of the thesis framework. Empirical results with real users demonstrate that leveraging implicit interaction is both an adequate and a convenient means of improving interactive systems in multiple ways.
    Leiva Torres, LA. (2012). Diverse Contributions to Implicit Human-Computer Interaction [unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17803