
    Combining crowd worker, algorithm, and expert efforts to find boundaries of objects in images

    While traditional approaches to image analysis have typically relied upon either manual annotation by experts or purely algorithmic approaches, the rise of crowdsourcing now provides a new source of human labor to create training data or perform computations at run-time. Given this richer design space, how should we utilize algorithms, crowds, and experts to better annotate images? To answer this question for the important task of finding the boundaries of objects or regions in images, I focus on image segmentation, an important precursor to solving a variety of fundamental image analysis problems, including recognition, classification, tracking, registration, retrieval, and 3D visualization. The first part of the work presents a detailed analysis of the relative strengths and weaknesses of three different approaches to demarcating object boundaries in images: by experts, by crowdsourced laymen, and by automated computer vision algorithms. The second part of the work describes three hybrid system designs that integrate computer vision algorithms and crowdsourced laymen to demarcate boundaries in images. Experiments revealed that the hybrid system designs yielded more accurate results than relying on algorithms or crowd workers alone and could produce segmentations indistinguishable from those created by biomedical experts. To encourage community-wide efforts to continue developing methods and systems for image-based studies with real, measurable benefits to society at large, the datasets and code are publicly shared (http://www.cs.bu.edu/~betke/BiomedicalImageSegmentation/).
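
    The dissertation's three hybrid designs are its own; as a generic illustration of combining annotations from several sources, the sketch below fuses binary segmentation masks from crowd workers and an algorithm by pixel-wise weighted majority vote. The mask format and weighting scheme are assumptions made for this example, not the methods evaluated in the work.

```python
import numpy as np

# Minimal sketch: fuse binary segmentation masks from several annotators
# (crowd workers, an algorithm) by pixel-wise weighted majority vote.
# The 2x weight for the algorithm is an arbitrary illustrative choice.

def fuse_masks(masks, weights=None):
    """masks: list of HxW binary arrays; returns a fused HxW boolean mask."""
    stack = np.stack([np.asarray(m, dtype=float) for m in masks])  # (N, H, W)
    if weights is None:
        weights = np.ones(len(masks))
    weights = np.asarray(weights, dtype=float)
    votes = np.tensordot(weights, stack, axes=1)                   # (H, W)
    # A pixel is foreground if it wins a strict weighted majority.
    return votes > weights.sum() / 2.0

# Two crowd masks and one algorithm mask, with the algorithm weighted 2x.
crowd1 = np.array([[1, 1], [0, 0]])
crowd2 = np.array([[1, 0], [0, 0]])
algo   = np.array([[1, 1], [1, 0]])
print(fuse_masks([crowd1, crowd2, algo], weights=[1, 1, 2]).astype(int))
# [[1 1]
#  [0 0]]
```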

    Automatic crowdflow estimation enhanced by crowdsourcing

    Video surveillance systems are evolving from simple closed-circuit television (CCTV) towards intelligent systems capable of understanding the recorded scenes. This trend is accompanied by a widespread increase in the number of cameras, which makes continuous monitoring of all video feeds a practically impossible task. In this scenario, video surveillance systems make intensive use of video analytics and image processing to allow their scalability and boost their effectiveness. One such video analytic performed in video surveillance systems is crowd analysis, which plays a fundamental role in security applications. For instance, keeping a rough estimate of the number of people present in a given area or inside a building is critical to prevent jams in an emergency or when planning the distribution of entry and exit nodes. In this thesis, we focus on crowd flow estimation. Crowd flow is defined as the number of people that have crossed a specific region over time; the goal of the method is to estimate the crowd flow as accurately as possible in real time. Many automatic methods have been proposed in the literature to estimate crowd flow. However, video analytics techniques often face a wide range of difficulties such as occlusions, shadows, changes in environmental conditions, or distortions in the video, and existing methods struggle to maintain high accuracy in such situations. Crowdsourcing has been shown to be an effective solution to problems that involve complex cognitive tasks: by incorporating human assistance, the performance of automatic methods can be enhanced in adverse situations. In this thesis, an automatic crowd flow estimation method, previously developed in the Video and Image Processing Laboratory at Purdue University, is implemented, and crowdsourcing is used to enhance its performance. A web platform is also developed so that the system operator can control the whole system remotely and so that the crowdsourcing members can perform their tasks.
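
    As a toy illustration of the crowd flow definition above (the number of people that have crossed a specific region over time), the sketch below counts how often tracked centroids change sides of a virtual counting line. The track format and the line test are assumptions made for this example; the thesis implements a previously developed method from Purdue's Video and Image Processing Laboratory rather than this sketch.

```python
# Hypothetical input: per-person centroid tracks, one (x, y) point per frame.

def side_of_line(p, a, b):
    """Signed area test: which side of the line through a and b is p on?"""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def crowd_flow(tracks, a, b):
    """Count crossings of the virtual line a-b by any track."""
    crossings = 0
    for centroids in tracks.values():
        sides = [side_of_line(p, a, b) for p in centroids]
        # A sign change between consecutive frames means the line was crossed.
        crossings += sum(1 for s0, s1 in zip(sides, sides[1:]) if s0 * s1 < 0)
    return crossings

# One person walking downward across the horizontal line y = 100.
tracks = {0: [(50, 80), (52, 95), (53, 110)]}
print(crowd_flow(tracks, a=(0, 100), b=(200, 100)))  # -> 1
```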

    Analysis and Decision-Making with Social Media

    The rapid advancement of technology has greatly extended the ubiquitous nature of smartphones, which act as a gateway to numerous social media applications. This brings immense convenience to users of these applications who wish to stay connected to other individuals by sharing their statuses and posting their opinions, experiences, and suggestions on online social networks (OSNs). Exploring and analyzing this data has great potential to enable deep and fine-grained insights into the behavior, emotions, and language of individuals in a society. This dissertation focuses on utilizing these online social footprints to research two main threads: 1) Analysis, studying the behavior of individuals online (content analysis), and 2) Synthesis, building models that influence the behavior of individuals offline (incomplete action models for decision-making). A large percentage of posts shared online are in an unrestricted natural-language format meant for human consumption, and one of the demanding problems in this context is to develop approaches that automatically extract important insights from this incessant, massive data pool. Efforts in this direction emphasize mining the wealth of latent information in the data from multiple OSNs independently. The first thread of this dissertation focuses on analytics to investigate the differentiated content-sharing behavior of individuals. The second thread attempts to build decision-making systems using social media data. The results emphasize the importance of considering multiple data types when interpreting the content shared on OSNs. They highlight the unique ways in which the data and the extracted patterns from text-based and visual-based platforms complement and contrast each other in terms of their content. The research demonstrated that, in many ways, results obtained by focusing on either only the textual or only the visual elements of content shared online could lead to biased insights. It also shows the power of a sequential set of patterns with precedence relationships, and of collaboration between humans and automated planners.

    Visual saliency computation for image analysis

    Visual saliency computation is about detecting and understanding salient regions and elements in a visual scene. Algorithms for visual saliency computation can give clues to where people will look in images, what objects are visually prominent in a scene, and so on. Such algorithms could be useful in a wide range of applications in computer vision and graphics. In this thesis, we study the following visual saliency computation problems. 1) Eye Fixation Prediction: predicting where people look in a visual scene. For this problem, we propose a Boolean Map Saliency (BMS) model which leverages the global surroundedness cue using a Boolean map representation. We draw a theoretical connection between BMS and the Minimum Barrier Distance (MBD) transform to provide insight into our algorithm. Experimental results show that BMS compares favorably with state-of-the-art methods on seven benchmark datasets. 2) Salient Region Detection: computing a saliency map that highlights the regions of dominant objects in a scene. We propose a salient region detection method based on the MBD transform, presenting a fast approximate MBD transform algorithm with an error-bound analysis. Powered by this fast MBD transform algorithm, our method can run at about 80 FPS and achieves state-of-the-art performance on four benchmark datasets. 3) Salient Object Detection: localizing each salient object instance in an image. We propose a method using a Convolutional Neural Network (CNN) model for proposal generation and a novel subset optimization formulation for bounding box filtering. In experiments, our subset optimization formulation consistently outperforms heuristic bounding box filtering baselines, such as non-maximum suppression, and our method substantially outperforms previous methods on three challenging datasets. 4) Salient Object Subitizing: a new visual saliency computation task we propose, which is to predict the existence and the number of salient objects in an image using holistic cues. To this end, we present an image dataset of about 14K everyday images annotated using an online crowdsourcing marketplace. We show that an end-to-end trained CNN subitizing model can achieve promising performance without requiring any localization process, and we propose a method to further improve its training by leveraging synthetic images. 5) Top-down Saliency Detection: unlike the aforementioned tasks, top-down saliency detection entails generating task-specific saliency maps. We propose a weakly supervised top-down saliency detection approach that models the top-down attention of a CNN image classifier, introducing Excitation Backprop and the concept of contrastive attention to generate highly discriminative top-down saliency maps. Our method achieves superior performance in weakly supervised localization tasks on challenging datasets, and its usefulness is further validated in the text-to-region association task, where it provides state-of-the-art performance using only weakly labeled web images for training.
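
    To make the "global surroundedness" cue behind BMS concrete, here is a rough sketch of the Boolean-map idea: threshold each channel at several levels, treat connected components that do not touch the image border as "surrounded", and average the resulting activation maps. The threshold step and channel handling are assumptions for illustration; the published BMS model includes further steps (color space choice, map post-processing) not shown here.

```python
import numpy as np
from scipy.ndimage import label

def boolean_map_saliency(img, step=32):
    """img: HxWxC float array in [0, 255]. Returns an HxW saliency map."""
    h, w, c = img.shape
    attention = np.zeros((h, w))
    n_maps = 0
    for ch in range(c):
        for t in range(step, 256, step):
            for bmap in (img[..., ch] > t, img[..., ch] <= t):
                labels, _ = label(bmap)
                # Components whose label appears on the border are not
                # surrounded; everything else set in the map is.
                border = np.unique(np.concatenate(
                    [labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
                attention += bmap & ~np.isin(labels, border)
                n_maps += 1
    return attention / n_maps

# A bright square on a dark background should be the salient region.
img = np.zeros((64, 64, 3))
img[24:40, 24:40] = 200.0
sal = boolean_map_saliency(img)
print(sal[32, 32] > sal[0, 0])  # True: center beats corner
```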

    Learning Visual Appearance: Perception, Modeling and Editing.

    Visual appearance determines how we understand an object or image and is therefore a fundamental aspect of digital content creation. It is a broad term, encompassing others such as material appearance, defined as the impression we have of a material, which involves both the physical interaction between light and matter and the way our visual system is able to perceive it. However, computationally modeling the behavior of our visual system is a difficult task, not least because no definitive, unified theory of human visual perception exists. Moreover, although we have developed algorithms capable of faithfully modeling the interaction between light and matter, there is a disconnect between the physical parameters these algorithms use and the perceptual parameters the human visual system understands. This makes manipulating these physical representations, and their interactions, a tedious and costly task, even for expert users. This thesis seeks to improve our understanding of the perception of material appearance and to use that knowledge to improve existing algorithms for visual content generation. Specifically, the thesis contributes in three areas: proposing new computational models to measure appearance similarity; investigating the interaction between illumination and geometry; and developing intuitive applications for appearance manipulation, specifically for relighting humans and editing material appearance. The first part of the thesis explores methods to measure appearance similarity. Being able to measure how similar two materials, or images, are is a classical problem in visual computing fields such as computer vision and computer graphics. We first address the problem of material appearance similarity, proposing a deep-learning-based method that combines images with subjective judgments of material similarity collected through user studies. We then explore the problem of similarity between icons; in this second case, siamese neural networks are used, and the style and identity conveyed by the artists play a key role in the similarity measure. The second part advances our understanding of how confounding factors affect our perception of material appearance. Two key confounding factors are the geometry of objects and the illumination of the scene. We begin by investigating the effect of these factors on material recognition through various experiments and statistical studies. We also investigate the effect of object motion on the perception of material appearance. In the third part, we explore intuitive applications for manipulating visual appearance. First, we address the problem of relighting humans: we propose a new formulation of the problem and, building on it, design and train a deep-neural-network-based model to relight a scene. Finally, we address the problem of intuitive material editing. To that end, we collect human judgments of the perception of different attributes and present a deep-neural-network-based model capable of editing materials realistically, simply by varying the values of the collected attributes.
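
    For the icon-similarity thread, a siamese architecture is named explicitly; the sketch below shows the general pattern in PyTorch, where two inputs share one encoder and a contrastive loss pulls human-judged similar pairs together in embedding space. The encoder layers, embedding size, and loss are illustrative assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    """Shared-weight encoder applied to both images of a pair."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, a, b):
        return self.net(a), self.net(b)   # same weights for both branches

def contrastive_loss(za, zb, same, margin=1.0):
    """same = 1 for pairs humans judged similar, 0 otherwise."""
    d = torch.norm(za - zb, dim=1)
    return (same * d.pow(2)
            + (1 - same) * torch.clamp(margin - d, min=0).pow(2)).mean()

model = SiameseEncoder()
a, b = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
za, zb = model(a, b)
loss = contrastive_loss(za, zb, same=torch.randint(0, 2, (8,)).float())
loss.backward()
```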

    Communication of Digital Material Appearance Based on Human Perception

    In daily life, we encounter digital materials and interact with them in numerous situations, for instance when we play computer games, watch a movie, see a billboard in the metro station, or buy new clothes online. While some of these virtual materials are given by computational models that describe the appearance of a particular surface based on its material and the illumination conditions, others are presented as simple digital photographs of real materials, as is usually the case for material samples from online retail stores. The use of computer-generated materials entails significant advantages over plain images, as they allow realistic experiences in virtual scenarios, cooperative product design, advertising in the prototype phase, or the exhibition of furniture and wearables in specific environments. However, even though exceptional material reproduction quality has been achieved in the domain of computer graphics, current technology is still far from highly accurate photo-realistic virtual material reproductions for the wide range of existing categories, and, for this reason, many material catalogs still use pictures or even physical material samples to illustrate their collections. An important reason for this gap between digital and real material appearance is that the connections between physical material characteristics and the visual quality perceived by humans are far from well understood. Our investigations intend to shed some light in this direction. Concretely, we explore the ability of state-of-the-art digital material models to communicate physical and subjective material qualities, observing that part of the tactile/haptic information (e.g., thickness, hardness) is missing due to the geometric abstractions intrinsic to the model. Consequently, to account for the information deteriorated during the digitization process, we investigate the interplay between different sensing modalities (vision and hearing) and discover that particular sound cues, in combination with visual information, facilitate the estimation of such tactile material qualities. One of the shortcomings when studying material appearance is the lack of perceptually derived metrics able to answer questions like "are materials A and B more similar than C and D?", which arise in many computer graphics applications. In the absence of such metrics, our studies compare different appearance models in terms of how capable they are of conveying a collection of meaningful perceptual qualities. To address this problem, we introduce a methodology to compute the perceived pairwise similarity between textures from material samples that makes use of patch-based texture synthesis algorithms and is inspired by the notion of Just-Noticeable Differences. Our technique is able to overcome some of the issues posed by previous texture similarity collection methods and produces meaningful distances between samples. In summary, with the contents presented in this thesis we delve deeply into how humans perceive digital and real materials through different senses, acquire a better understanding of texture similarity by developing a perceptually based metric, and provide a groundwork for further investigations in the perception of digital materials.

    On the Robustness of Object Detection Based Deep Learning Models

    Object detection is one of the most popular areas in the fields of computer vision and deep learning, and several advances reported in the literature show promising object detection results. However, most of these results use databases of images collected under almost ideal conditions and are tested with input images mostly not representative of real-life imagery. When tested with challenging data, most of these object detection models break down. The objective of this work is to quantify the performance of the most recent object detection models in the presence of realistic degradation in the form of differing levels of brightness, saturation, contrast, Gaussian blur, image size, sharpness, Gaussian noise, speckle noise, and salt-and-pepper noise. We have selected Faster R-CNN as a typical model representative of the state of the art. For testing, we have used a binary-class dataset from our laboratory, Aphylla, as well as VOC2007, a popular multi-class dataset widely used by the community. We have conducted the following experiments: (1) ran the model on the original pristine datasets and recorded the mAP scores, (2) ran the model on nine methods of degradation with 12 levels of each and recorded the mAP scores, and (3) compared the degradation results to one another to determine the model's robustness. These experiments led to the clustering of the degradations into three categories, high, medium, and low impact, based on the fluctuations within the results. The first class, containing brightness and contrast, resembles a Gaussian-like bell-shaped curve with a plateau at the top. The second cluster, containing Gaussian blur, image size, and all three types of noise, resembles an exponential decay. The third category, containing saturation and sharpness, shows a small reduction in performance that stays mostly uniform throughout the range. The value of this research comes from studying the results and providing consistent guidance to the user as to which level of image degradation needs to be dealt with at a pre-processing stage to alleviate the drop in performance.
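
    The degradation sweep itself is straightforward to reproduce in outline: apply one degradation at each of 12 levels and record the mAP at each level. In the sketch below, `evaluate_map` is a placeholder standing in for running Faster R-CNN and scoring mAP on the degraded images; the specific degradation formulas are illustrative assumptions, not the exact parameterizations used in the study.

```python
import numpy as np

def degrade(img, method, level):
    """img: HxWx3 float array in [0, 255]; level: 0.0 (none) to 1.0 (max)."""
    if method == "brightness":
        out = img + 128.0 * level                            # brighten
    elif method == "contrast":
        out = (img - 127.5) * (1.0 - 0.9 * level) + 127.5    # flatten contrast
    elif method == "gaussian_noise":
        out = img + np.random.normal(0.0, 50.0 * level, img.shape)
    elif method == "salt_and_pepper":
        out = img.copy()
        mask = np.random.rand(*img.shape[:2]) < 0.1 * level
        out[mask] = np.random.choice([0.0, 255.0], size=mask.sum())[:, None]
    else:
        raise ValueError(f"unknown degradation: {method}")
    return np.clip(out, 0.0, 255.0)

def sweep(dataset, method, evaluate_map, n_levels=12):
    """Return the mAP curve for one degradation over n_levels levels."""
    curve = []
    for level in np.linspace(0.0, 1.0, n_levels):
        degraded = [degrade(img, method, level) for img in dataset]
        curve.append(evaluate_map(degraded))  # placeholder detector + scorer
    return curve
```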