47 research outputs found

    Efficient Privacy Preserving Viola-Jones Type Object Detection via Random Base Image Representation

    Suppose a cloud server has invested considerable time, energy and money to train a Viola-Jones type object detector with high accuracy. Clients can upload their photos to the cloud server to find objects, but a client does not want the content of his or her photos to leak. At the same time, the cloud server is reluctant to leak any parameters of the trained object detector. Ten years ago, Avidan & Butman introduced Blind Vision, a method for securely evaluating a Viola-Jones type object detector. Blind Vision uses standard cryptographic tools and is painfully slow to compute, taking a couple of hours to scan a single image. The purpose of this work is to explore an efficient method that can speed up the process. We propose the Random Base Image (RBI) representation: the original image is divided into random base images, and only these base images are submitted, in random order, to the cloud server, so the content of the image cannot leak. Meanwhile, a random vector and the secure Millionaire protocol are leveraged to protect the parameters of the trained object detector. The RBI representation also re-enables the integral-image technique, which greatly accelerates detection. The experimental results reveal that our method retains the detection accuracy of the plain vision algorithm and is significantly faster than traditional Blind Vision, with only a theoretically very low probability of information leakage.
Comment: 6 pages, 3 figures. To appear in the proceedings of the IEEE International Conference on Multimedia and Expo (ICME), Jul 10-14, 2017, Hong Kong
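The abstract leaves the exact construction of the base images unspecified; one natural reading, consistent with the claim that the integral image becomes usable again, is an additive sharing scheme in which uniformly random images plus one residual image sum back to the original. A minimal sketch (function names and the choice of three bases are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def split_into_random_bases(image, n=3):
    """Additive sharing: n-1 uniformly random base images plus one residual
    base that makes the sum equal the original, so no individual base
    resembles the original content."""
    bases = [rng.integers(0, 256, size=image.shape).astype(np.int64)
             for _ in range(n - 1)]
    bases.append(image.astype(np.int64) - sum(bases))
    return bases

image = rng.integers(0, 256, size=(4, 4))
bases = split_into_random_bases(image)

# The bases recompose exactly ...
assert (sum(bases) == image).all()

# ... and because the integral image is a linear operator, the server can
# compute per-base integral images and sum them, which is what re-enables
# the Viola-Jones box-sum trick on the shared data.
integral = lambda img: img.cumsum(axis=0).cumsum(axis=1)
assert (sum(integral(b) for b in bases) == integral(image.astype(np.int64))).all()
```

The linearity check in the last two lines is the property the acceleration claim rests on: per-base processing composes into processing of the original.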

    Interim research assessment 2003-2005 - Computer Science

    This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. The report also provides information for others interested in our research activities

    Biosignal controlled recommendation in entertainment systems

    With the explosive growth of entertainment content and ubiquitous access to it via fixed or mobile computing devices, recommendation systems have become essential tools to help the user find the right entertainment at the right time and location. I envision that integrating biosignal input into the recommendation process will help users not only to find interesting content, but also to increase their comfort level by taking the biosignal feedback from the users into account. The goal of this project was to develop a biosignal-controlled entertainment recommendation system that increases the user's comfort level by reducing the level of stress. As a starting point, this project aims to contribute to the field of recommendation systems on two points. The first is a mechanism for embedding the biosignal non-intrusively into the recommendation process. The second is a strategy for biosignal-controlled recommendation to reduce stress. Heart-rate-controlled in-flight music recommendation was chosen as the application domain. The hypothesis of this application is that the passenger's heart rate deviates from normal due to the unusual long-haul flight cabin environment. By properly designing a music recommendation system that recommends heart-rate-controlled personalized music playlists to the passenger, the passenger's heart rate can be uplifted or down-lifted back to normal, or kept within normal, and thus their stress can be reduced. Four research questions were formulated based on this hypothesis. After the literature study, the project went through three main phases to answer these research questions: framework design, system implementation and user evaluation. During the framework design phase, the heart rate was first modeled as the states of bradycardia, normal and tachycardia.
The objective of the framework is that, if the user's heart rate is higher or lower than the normal heart rate, the system recommends a personalized music playlist to transfer the user's heart rate back to normal; otherwise, it keeps the heart rate at normal. The adaptive framework integrates the concepts of context-adaptive systems and user profiling, and the methods of using music to adjust the heart rate, in a feedback control system. In the feedback loop, the playlists are composed using a Markov decision process. Yet, the framework allows the user to reject the recommendations and to manually select favorite music items. During this process, the system logs the interactions between the user and the system for later learning of the user's latest music preferences. The designed framework was then implemented with a platform-independent software architecture. The architecture has five abstraction levels. The lowest, resource level contains the music source, the heart rate sensors and the user profile information. The second layer is for resource management; it contains the manager components that manage the resources from the first layer and modulate the access from upper layers to these resources. The third layer is the database, acting as a data repository. The fourth layer is for adaptive control, and includes the user feedback log, the inference engine and the preference learning component. The top layer is the user interface. In this architecture, the layers and the components within the layers are loosely coupled, which ensures flexibility. The implemented system was used in user experiments to validate the hypothesis. The experiments simulated long-haul flights from Amsterdam to Shanghai with the same time schedule as the KLM flights. Twelve subjects were invited to participate in the experiments. Six were allocated to the control group and the other six to the treatment group.
In addition to the normal entertainment system provided to the control group, the treatment group was also provided with the heart-rate-controlled music recommendation system. The experimental results validated the hypothesis and answered the research questions. The passenger's heart rate does deviate from normal: in our user experiments, the passenger's heart rate was in the bradycardia state 24.6% of the time and in the tachycardia state 7.3% of the time. The recommended uplifting music reduces the average bradycardia state duration from 14.78 seconds in the control group to 6.86 seconds in the treatment group. The recommended keeping music increases the average normal state duration from 24.66 seconds in the control group to 29.79 seconds in the treatment group. The recommended down-lifting music reduces the average tachycardia state duration from 13.89 seconds in the control group to 6.53 seconds in the treatment group. Compared to the control group, the stress of the treatment group was reduced significantly
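The three-state heart-rate model and its mapping to uplifting, keeping and down-lifting playlists can be sketched as follows; the 60/100 bpm thresholds are common clinical conventions assumed here, and the function names are illustrative:

```python
def heart_rate_state(bpm, low=60, high=100):
    """Classify a heart-rate reading into the framework's three states.
    The 60/100 bpm cut-offs are standard clinical conventions; the thesis
    may calibrate them per user."""
    if bpm < low:
        return "bradycardia"
    if bpm > high:
        return "tachycardia"
    return "normal"

def playlist_type(state):
    """Map the current state to the recommendation goal: steer the heart
    rate back toward the normal range, or keep it there."""
    return {"bradycardia": "uplifting",
            "tachycardia": "down-lifting",
            "normal": "keeping"}[state]

# e.g. a passenger measured at 52 bpm gets an uplifting playlist
assert playlist_type(heart_rate_state(52)) == "uplifting"
```

In the actual framework this mapping sits inside a feedback loop: after each playlist the heart rate is re-measured and the state re-evaluated.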

    Adaptive Methods for Color Vision Impaired Users

    Color plays a key role in the understanding of information in computer environments. About 5% of the world population is affected by color vision deficiency (CVD), also called color blindness. This visual impairment hampers color perception, ultimately limiting the overall perception that CVD people have of the surrounding environment, whether real or virtual. In fact, a CVD individual may not distinguish between two different colors, which often causes confusion or a biased understanding of reality, including web environments, whose pages are full of media elements like text, still images, video, sprites, and so on. Aware of the difficulties that color-blind people may face in interpreting colored contents, a significant number of recoloring algorithms have been proposed in the literature with the purpose of somehow improving the visual perception of those people. However, most of those algorithms lack a systematic study of subjective assessment, which undermines their validity, not to say their usefulness. Thus, following the research work behind this Ph.D. thesis, the central question that needs to be answered is whether or not recoloring algorithms are of any use and help to colorblind people. With this in mind, we conceived a few preliminary recoloring algorithms that were published in conference proceedings elsewhere. Except for the algorithm detailed in Chapter 3, these conference algorithms are not described in this thesis, though they have been important to engender those presented here. The first algorithm (Chapter 3) was designed and implemented for people with dichromacy to improve their color perception. The idea is to project the reddish hues onto other hues that are perceived more regularly by dichromat people.
The second algorithm (Chapter 4) is also intended for people with dichromacy to improve their perception of color, but its applicability covers the adaptation of text and images in HTML5-compliant web environments. This enhancement of the color contrast of text and images in web pages is done while keeping the naturalness of color as much as possible. Also, to the best of our knowledge, this is the first web recoloring approach targeted at dichromat people that takes both text and image recoloring into consideration in an integrated manner. The third algorithm (Chapter 5) primarily focuses on enhancing some of the object contours in still images, instead of recoloring the pixels of the regions bounded by such contours. Enhancing contours is particularly suited to increasing contrast in images where we find adjacent regions whose colors are indistinguishable from a dichromat's point of view. To the best of our knowledge, this is one of the first algorithms that takes advantage of image analysis and processing techniques for region contours. After accurate subjective assessment studies with color-blind people, we concluded that the CVD adaptation methods are useful in general. Nevertheless, no single method is efficient enough to adapt all sorts of images; that is, the adequacy of each method depends on the type of image (photo-images, graphical representations, etc.). Furthermore, we noted that the experience-based perceptual learning of colorblind people throughout their lives determines their visual perception. That is, color adaptation algorithms must satisfy requirements such as color naturalness and consistency, to ensure that dichromat people improve their visual perception without artifacts. On the other hand, CVD adaptation algorithms should be object-oriented, instead of pixel-oriented (as typically done), to judiciously select the pixels that should be adapted.
This perspective opens a window of opportunity for future research on color accessibility in the field of human-computer interaction (HCI)
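As a toy illustration of the first algorithm's idea of projecting reddish hues onto hues that dichromats perceive more reliably, here is a simple hue remapping; the hue band and shift amount are arbitrary stand-ins, not the thesis's parameters:

```python
import colorsys

def remap_reddish_hue(r, g, b, shift=0.15):
    """Illustrative only: rotate hues near red (hue ~ 0) toward yellow,
    which dichromats tend to perceive more reliably, while leaving other
    hues untouched. Band and shift values are arbitrary stand-ins."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    if h < 0.08 or h > 0.92:   # the "reddish" band around hue 0
        h = (h + shift) % 1.0  # rotate away from the confusable region
    return colorsys.hls_to_rgb(h, l, s)

adapted = remap_reddish_hue(0.9, 0.1, 0.1)    # pure red is remapped
unchanged = remap_reddish_hue(0.1, 0.9, 0.1)  # green passes through
```

The thesis's actual methods additionally constrain the mapping for naturalness and consistency, which a blind per-pixel rotation like this one does not.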

    Machine learning approaches to video activity recognition: from computer vision to signal processing

    244 p.
    The research presented here focuses on classification techniques for two different, though related, tasks, such that the second can be considered part of the first: human action recognition in videos and sign language recognition. In the first part, the starting hypothesis is that transforming the signals of a video by means of the Common Spatial Patterns (CSP) algorithm, commonly used in electroencephalography systems, can give rise to new features that are useful for the subsequent classification of the videos with supervised classifiers. Different experiments have been carried out on several databases, including one created during this research from the point of view of a humanoid robot, with the intention of deploying the developed recognition system to improve human-robot interaction. In the second part, the techniques developed earlier have been applied to sign language recognition; in addition, a method based on the decomposition of signs is proposed to recognize them, adding the possibility of better explainability. The final objective is to develop a sign language tutor capable of guiding users through the learning process, making them aware of the errors they make and the reasons for those errors
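The CSP transform used in the first part admits a compact sketch: it whitens the summed class covariances and then diagonalizes one class, yielding spatial filters whose output variances discriminate the two classes. This is the standard EEG formulation, assumed here; the thesis's video variant may differ in detail:

```python
import numpy as np

def csp_filters(trials_a, trials_b):
    """Common Spatial Patterns: spatial filters W whose components have
    maximal variance for one class and minimal for the other.
    trials_*: arrays of shape (n_trials, n_channels, n_samples)."""
    def mean_cov(trials):
        return np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0)
    c_a, c_b = mean_cov(trials_a), mean_cov(trials_b)
    # Whiten the composite covariance, then diagonalize class A in that space.
    evals, evecs = np.linalg.eigh(c_a + c_b)
    whiten = np.diag(evals ** -0.5) @ evecs.T
    _, rot = np.linalg.eigh(whiten @ c_a @ whiten.T)
    return rot.T @ whiten  # rows ordered by ascending class-A variance

rng = np.random.default_rng(1)
# Synthetic 4-channel "signals": class A is strong on channel 0, B on channel 3.
a = rng.normal(size=(20, 4, 100)) * np.array([3, 1, 1, 1])[:, None]
b = rng.normal(size=(20, 4, 100)) * np.array([1, 1, 1, 3])[:, None]
w = csp_filters(a, b)

# Log-variance of the extreme filtered components is the usual feature vector.
features = np.log(np.var(w @ a[0], axis=1))
```

By construction W jointly diagonalizes both class covariances, so the first and last rows are the most discriminative components.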

    Image and Video Forensics

    Nowadays, images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia contents are generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated instant distribution and sharing of digital images on social platforms, generating a great amount of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with the use of deep learning techniques. In response to these threats, the multimedia forensics community has produced major research efforts regarding the identification of the source and the detection of manipulation. In all cases (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks) where images and videos serve as critical evidence, forensic technologies that help to determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book collects a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics to tackle the new and serious challenges of ensuring media authenticity

    From GeoVisualization to visual-analytics: methodologies and techniques for human-information discourse

    2010 - 2011
    The objective of our research is to support decision makers facing problems which require rapid solutions despite the complexity of the scenarios under investigation. To achieve this goal, our studies have focused on the GeoVisualization and GeoVisual Analytics research fields, which play a relevant role in this scope because they exploit results from several disciplines, such as exploratory data analysis and GIScience, to provide expert users with highly interactive tools by which they can both visually synthesize information from large datasets and perform complex analytical tasks. The research we are carrying out along this line is meant to develop software applications capable both of building an immediate overview of a scenario and of exploring the elements that feature in it. To this aim, we are defining methodologies and techniques which embed key aspects from different disciplines, such as augmented reality and location-based services. Their integration is targeted at realizing advanced tools in which the geographic component plays a primary role, contributing to a human-information discourse... [edited by author]

    Voice Modeling Methods for Automatic Speaker Recognition

    Building a voice model means capturing the characteristics of a speaker's voice in a data structure. This data structure is then used by a computer for further processing, such as comparison with other voices. Voice modeling is a vital step in the process of automatic speaker recognition, which itself is the foundation of several applied technologies: (a) biometric authentication, (b) speech recognition and (c) multimedia indexing. Several challenges arise in the context of automatic speaker recognition. First, there is the problem of data shortage, i.e., the unavailability of sufficiently long utterances for speaker recognition. It stems from the fact that the speech signal conveys different aspects of the sound in a single, one-dimensional time series: linguistic (what is said?), prosodic (how is it said?), individual (who said it?), locational (where is the speaker?) and emotional features of the speech sound itself (to name a few) are contained in the speech signal, as well as acoustic background information. To analyze a specific aspect of the sound regardless of the other aspects, analysis methods have to be applied to a specific time scale (length) of the signal in which this aspect stands out from the rest. For example, linguistic information (i.e., which phone or syllable has been uttered?) is found in very short time spans of only milliseconds in length. On the contrary, speaker-specific information emerges more clearly the longer the analyzed sound is. Long utterances, however, are not always available for analysis. Second, the speech signal is easily corrupted by background sound sources (noise, such as music or sound effects). Their characteristics, if present, tend to dominate a voice model, such that model comparison might then be driven mainly by background features instead of speaker characteristics.
Current automatic speaker recognition works well under relatively constrained circumstances, such as studio recordings, or when prior knowledge of the number and identity of occurring speakers is available. Under more adverse conditions, such as in feature films or amateur material on the web, the achieved speaker recognition scores drop below a rate that is acceptable for an end user or for further processing. For example, the typical speaker turn duration of only one second and the sound-effect background in cinematic movies render most current automatic analysis techniques useless. In this thesis, methods for voice modeling that are robust with respect to short utterances and background noise are presented. The aim is to facilitate movie analysis with respect to occurring speakers. Therefore, algorithmic improvements are suggested that (a) improve the modeling of very short utterances, (b) facilitate voice model building even in the case of severe background noise and (c) allow for efficient voice model comparison to support the indexing of large multimedia archives. The proposed methods improve the state of the art in terms of recognition rate and computational efficiency. Going beyond selective algorithmic improvements, subsequent chapters also investigate the question of what is lacking in principle in current voice modeling methods. By reporting on a study with human subjects, it is shown that the exclusion of time coherence information from a voice model induces an artificial upper bound on the recognition accuracy of automatic analysis methods. A proof-of-concept implementation confirms the usefulness of exploiting this kind of information by halving the error rate. This result questions the general speaker modeling paradigm of the last two decades and points to a promising new direction. The approach taken to arrive at the previous results is based on a novel methodology of algorithm design and development called “eidetic design”.
It uses a human-in-the-loop technique that analyzes existing algorithms in terms of their abstract intermediate results. The aim is to detect flaws or failures in them intuitively and to suggest solutions. The intermediate results often consist of large matrices of numbers whose meaning is not clear to a human observer. Therefore, the core of the approach is to transform them into a suitable domain of perception (such as, e.g., the auditory domain of speech sounds in the case of speech feature vectors) where their content, meaning and flaws are intuitively clear to the human designer. This methodology is formalized, and the corresponding workflow is explicated by several use cases. Finally, the use of the proposed methods in video analysis and retrieval is presented. This shows the applicability of the developed methods and the accompanying software library sclib by means of improved results using a multimodal analysis approach. The sclib's source code is available to the public upon request to the author. A summary of the contributions together with an outlook on short- and long-term future work concludes this thesis
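As a skeleton of what "building a voice model" and comparing it against an utterance means, here is a single diagonal-Gaussian model over feature frames. Real systems, including those improved in this thesis, use Gaussian mixture models over MFCC-like features; this sketch only illustrates the model-and-score idea:

```python
import numpy as np

class DiagonalGaussianVoiceModel:
    """A minimal stand-in for a voice model: per-dimension mean and variance
    of feature frames. Real systems use mixture models (GMMs) and richer
    features; this single Gaussian only shows the fit/score skeleton."""

    def fit(self, features):
        # features: (n_frames, n_dims), e.g. MFCC frames of one speaker
        self.mean = features.mean(axis=0)
        self.var = features.var(axis=0) + 1e-6  # floor to avoid div-by-zero
        return self

    def avg_log_likelihood(self, features):
        # Average per-frame log-likelihood under the diagonal Gaussian.
        z = (features - self.mean) ** 2 / self.var
        return float(np.mean(-0.5 * (z + np.log(2 * np.pi * self.var)).sum(axis=1)))

rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(500, 13))  # pretend feature frames
speaker_b = rng.normal(2.0, 1.0, size=(500, 13))
model_a = DiagonalGaussianVoiceModel().fit(speaker_a)
probe = rng.normal(0.0, 1.0, size=(50, 13))       # a short utterance of speaker A
```

Scoring `probe` against `model_a` yields a higher likelihood than scoring speaker B's frames, which is exactly the comparison step that becomes unreliable when utterances are short or noisy.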

    THE USE OF TUNED FRONT END OPTICAL RECEIVER AND PULSE POSITION MODULATION

    The aim of this work is to investigate the use of tuned front-ends with OOK and PPM schemes, and to establish a theory for baseband tuned front-end receivers. In this thesis, a background of baseband receivers, tuned receivers, and modulation schemes used in baseband optical communication is presented. Also, the noise theory of baseband receivers is reviewed, which establishes a grounding for developing the theory relating to optical baseband tuned receivers. This work presents novel analytical expressions for the tuned transimpedance, tuned components, noise integrals and equivalent input and output noise densities of two tuned front-end receivers employing bipolar junction transistors and field effect transistors at the input. It also presents novel expressions for optimising the collector current of tuned receivers. The noise modelling developed in this work overcomes some limitations of conventional noise modelling and allows tuned receivers to be optimised and analysed. This work also provides an in-depth investigation of optical baseband tuned receivers for on-off keying (OOK), pulse position modulation (PPM), and di-code pulse position modulation (di-code PPM). This investigation aims to give quantitative predictions of the receiver performance for various types of receivers with different photodetectors (PIN photodetector and avalanche photodetector), different input transistors (bipolar junction transistor, BJT, and field effect transistor, FET), different pre-detection filters (1st-order low-pass filter and 3rd-order Butterworth filter), different detection methods, and different tuned configurations (inductive shunt feedback front end, tuned A, and serial tuned front end, tuned B). This investigation considers various optical links, such as the line-of-sight (LOS) optical link, the non-line-of-sight (NLOS) link and the optical fibre link.
All simulations, modelling, and calculations (including channel modelling, receiver modelling, noise modelling, pulse shape and inter-symbol interference simulations, and error probability and receiver calculations) are performed using a computer program (PTC Mathcad Prime 4, version M010/2017), which is used to evaluate and analyse the performance of these optical links. As an outcome of this investigation, the noise power in tuned receivers is significantly reduced for all examined configurations and under different conditions compared to non-tuned receivers. The overall receiver performance is improved by over 3 dB in some cases. This investigation provides an overview and demonstration of cases where tuned receivers can be optimised for baseband transmission, offering much better performance compared to non-tuned receivers. The performance improvement that tuned receivers offer can benefit a wide range of optical applications. This investigation also offers some recommendations and suggestions for further work in some emerging applications, such as underwater optical wireless communication (UOWC), visible light communication (VLC), and implantable medical devices (IMD). Keywords: optical communications, baseband receivers, noise modelling, tuned front end, pulse position modulation (PPM)
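For readers unfamiliar with PPM, the modulation itself is easy to sketch: each group of log2(M) bits selects which of a frame's M time slots carries the single pulse (a generic illustration, independent of the receiver modelling in this thesis):

```python
import math

def ppm_encode(bits, order=4):
    """Group bits into symbols of log2(order) bits; each symbol selects
    which of the frame's `order` slots carries the single pulse."""
    k = int(math.log2(order))
    frames = []
    for i in range(0, len(bits), k):
        symbol = int("".join(map(str, bits[i:i + k])), 2)
        frame = [0] * order
        frame[symbol] = 1
        frames.append(frame)
    return frames

def ppm_decode(frames, order=4):
    """Recover the bit stream from the pulse positions."""
    k = int(math.log2(order))
    bits = []
    for frame in frames:
        bits.extend(int(c) for c in format(frame.index(1), f"0{k}b"))
    return bits

bits = [1, 0, 1, 1, 0, 0]
frames = ppm_encode(bits)
assert ppm_decode(frames) == bits
```

Because each frame carries exactly one pulse regardless of the data, PPM trades bandwidth (M slots per symbol) for average-power efficiency, which is why it pairs well with the sensitive tuned front-ends studied here.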