71 research outputs found

    An online algorithm for constrained face clustering in videos

    Get PDF
    We address the problem of face clustering in long, real-world videos. This is a challenging task because faces in such videos exhibit wide variability in scale, pose, illumination, and expression, and may also be partially occluded. The majority of existing face clustering algorithms are offline, i.e., they assume the availability of the entire data at once. However, in many practical scenarios, the complete data may not be available at the same time, may be too large to process, or may exhibit significant variation in its distribution over time. We propose an online clustering algorithm that processes data sequentially in short segments of variable length. The faces detected in each segment are either assigned to an existing cluster or used to create a new one. Our algorithm uses several spatiotemporal constraints, together with a convolutional neural network (CNN) representation of the faces, to achieve high clustering accuracy on two benchmark video databases (82.1% and 93.8%). Despite being an online method (usually expected to have lower accuracy), our algorithm achieves comparable or better results than state-of-the-art offline and online methods.
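The assign-or-create rule described in the abstract can be sketched as a simple online procedure. This is a minimal illustration with a cosine-similarity threshold; the `threshold` value and the running-mean centroid update are assumptions, not the paper's actual algorithm.

```python
import numpy as np

def online_cluster(embeddings, threshold=0.6):
    """Assign each face embedding to the most similar existing cluster
    centroid, or start a new cluster if no centroid is similar enough."""
    centroids, counts, labels = [], [], []
    for e in embeddings:
        e = e / np.linalg.norm(e)          # unit-normalize the embedding
        if centroids:
            sims = [float(c @ e) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                # running-mean update of the matched centroid
                counts[best] += 1
                centroids[best] += (e - centroids[best]) / counts[best]
                labels.append(best)
                continue
        # no sufficiently similar cluster: open a new one
        centroids.append(e.copy())
        counts.append(1)
        labels.append(len(centroids) - 1)
    return labels
```

In a real system the per-segment spatiotemporal constraints (e.g. two faces in the same frame cannot share a cluster) would veto some assignments; that logic is omitted here.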

    Face clustering in digital videos.

    Get PDF
    Human faces are among the most important entities frequently encountered in videos. As a result of the currently high volume of digital video production and consumption (both personal videos and those from the communication and entertainment industries), automatic extraction of relevant information from such videos has become an active research topic. Many efforts in this area have focused on the use of face clustering and recognition to aid the process of annotating faces in videos. However, current face clustering algorithms are not robust to the variations in appearance that the same face may undergo under typical acquisition conditions. Hence, this thesis addresses the problem of face clustering in digital videos, proposing a novel approach with superior performance (in terms of clustering quality and computational cost) relative to the state of the art on reference video databases from the literature. Grounded in a systematic literature review and experimental evaluations, the proposed approach comprises the following modules: preprocessing, face detection, tracking, feature extraction, clustering, temporal similarity analysis, and spatial reclustering. The proposed face clustering approach achieved its planned objectives, obtaining better results (according to different metrics) than the methods evaluated on the YouTube Celebrities (KIM et al., 2008) and SAIVT-Bnews (GHAEMMAGHAMI, DEAN and SRIDHARAN, 2013) video datasets.

    Taken by Surprise: Contrast effect for Similarity Scores

    Full text link
    Accurately evaluating the similarity of object vector embeddings is of critical importance for natural language processing, information retrieval, and classification tasks. Popular similarity scores (e.g., cosine similarity) are based on pairs of embedding vectors and disregard the distribution of the ensemble from which the objects are drawn. Human perception of object similarity depends significantly on the context in which the objects appear. In this work we propose the surprise score, an ensemble-normalized similarity metric that encapsulates the contrast effect of human perception and significantly improves classification performance on zero- and few-shot document classification tasks. This score quantifies the surprise of finding a given similarity between two elements relative to the pairwise ensemble similarities. We evaluate this metric on zero/few-shot classification and clustering tasks and typically find 10-15% better performance compared to raw cosine similarity. Our code is available at https://github.com/MeetElise/surprise-similarity.
    Comment: 9 pages, 2 figures and 4 tables
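One simple way to realize an ensemble-normalized similarity in this spirit is to rank a pair's cosine similarity against all pairwise similarities in the ensemble. This quantile-based variant is an illustration of the idea, not necessarily the paper's exact definition of the surprise score.

```python
import numpy as np

def ensemble_normalized_similarity(a_idx, b_idx, embeddings):
    """How unusual is the cosine similarity of pair (a, b) relative to
    all pairwise similarities in the ensemble?  Returned as the
    empirical quantile of the pair's similarity, in [0, 1]."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    S = X @ X.T                              # all pairwise cosine similarities
    iu = np.triu_indices(len(X), k=1)        # each distinct pair once
    ensemble = S[iu]
    s_ab = S[a_idx, b_idx]
    return float(np.mean(ensemble <= s_ab))
```

A pair whose similarity exceeds most ensemble pairs scores near 1 ("surprisingly similar"), even if its raw cosine value is modest, which is exactly the context dependence the abstract describes.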

    A framework for clustering and adaptive topic tracking on evolving text and social media data streams.

    Get PDF
    Recent advances in and widespread usage of online web services and social media platforms, coupled with ubiquitous low-cost devices, mobile technologies, and the increasing capacity of lower-cost storage, have led to a proliferation of Big Data, ranging from news, e-commerce clickstreams, and online business transactions to continuous event logs and social media expressions. These large amounts of online data are often referred to as data streams because they are generated at extremely high throughput, or velocity, and they can make conventional and classical data analytics methodologies obsolete. For these reasons, the management and analysis of data streams have been researched extensively in recent years. The special case of social media Big Data brings additional challenges, particularly because of the unstructured nature of the data, specifically free text. One classical approach to mining text data has been topic modeling. Topic models are statistical models that can be used for discovering the abstract "topics" that may occur in a corpus of documents. They have emerged as a powerful technique in machine learning and data science, providing a good balance between simplicity and complexity, and delivering sophisticated insight without the need for true natural language understanding. However, they were not designed to cope with the type of text data that is abundant on social media platforms, but rather for traditional medium-sized corpora consisting of longer documents that adhere to a specific language and typically span a stable set of topics. Unlike traditional document corpora, social media messages tend to be very short, sparse, and noisy, and do not adhere to a standard vocabulary, linguistic patterns, or stable topic distributions.
    They are also generated at a high velocity that imposes heavy demands on topic modeling, and their evolving, dynamic nature makes any set of topic modeling results quickly become stale as the textual content and topics discussed within social media streams change. In this dissertation, we propose an integrated topic modeling framework built on top of an existing stream-clustering framework called Stream-Dashboard, which can extract, isolate, and track topics over any given time period. In this new framework, Stream-Dashboard first clusters the data stream points into homogeneous groups. Data from each group is then passed to the topic modeling framework, which extracts finer topics from the group. The proposed framework tracks the evolution of the clusters over time to detect milestones corresponding to changes in topic evolution, and to trigger an adaptation of the learned groups and topics at each milestone. The proposed approach differs from generic topic modeling because it works in a compartmentalized fashion: the input document stream is split into distinct compartments, and topic modeling is applied to each compartment separately. Furthermore, we propose extensions to existing topic modeling and stream clustering methods, including: an adaptive query reformulation approach to help focus topic discovery over time; a topic modeling extension with adaptive hyper-parameters and an infinite vocabulary; and an adaptive stream clustering algorithm incorporating automated estimation of dynamic, cluster-specific temporal scales for adaptive forgetting, to facilitate clustering in a fast-evolving data stream.
    Our experimental results show that the proposed adaptive forgetting clustering algorithm can mine better-quality clusters; that the proposed compartmentalized framework mines topics of better quality than competitive baselines; and that the framework can automatically adapt to focus on changing topics using the proposed query reformulation strategy.
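The cluster-specific adaptive forgetting idea can be illustrated with a minimal sketch. The exponential decay form, the `tau` time scale, and the weighted running-mean update below are assumptions chosen for illustration, not Stream-Dashboard's actual mechanics.

```python
import math

class DecayedCluster:
    """Cluster summary with exponential forgetting: the weight decays
    with a cluster-specific time scale tau, so clusters that stop
    receiving points fade away at their own rate."""

    def __init__(self, center, t, tau=50.0):
        self.center = list(center)
        self.weight = 1.0
        self.last_t = t
        self.tau = tau

    def decay(self, t):
        # forget in proportion to the time elapsed since the last update
        self.weight *= math.exp(-(t - self.last_t) / self.tau)
        self.last_t = t

    def absorb(self, point, t):
        self.decay(t)
        self.weight += 1.0
        # weighted running mean: recent points dominate a decayed center
        for i, x in enumerate(point):
            self.center[i] += (x - self.center[i]) / self.weight
```

Estimating `tau` per cluster from the observed arrival pattern (rather than fixing it globally) is what the dissertation's "automated estimation of dynamic, cluster-specific temporal scales" refers to.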

    Development of a Bicycle Level of Service Methodology for Two-Way Stop-Controlled (TWSC) Intersections

    Get PDF
    This thesis fills a missing piece in research on multimodal performance measures for traffic on streets and highways. The Highway Capacity Manual (HCM), published by the Transportation Research Board (TRB), provides Level of Service (LOS) methodologies that enable engineers and planners to evaluate the overall performance of roadways and highways based on the physical characteristics of facilities. This allows for the evaluation of those facilities and offers a means for recognizing issues and for planning, designing, implementing, and ultimately assessing improvements. Originally, level of service was developed for automotive traffic only, but with recent developments as part of the complete streets movement, the performance of infrastructure for alternative transportation modes has also started being assessed in this fashion. HCM 2010 contains methodologies for bicycle traffic at signalized intersections, all-way stop-controlled intersections, and roadway and highway segments, but as yet no bicycle level of service methodology exists for two-way stop-controlled intersections. This work attempts to fill this gap. The methodology utilized for this report includes video collection at sample two-way stop-controlled intersections throughout California, collection of survey responses from viewers of the video, and linear regression of the collected survey responses with physical attributes of each sample intersection as the explanatory variables. Data were analyzed from both combined and individual street movements to determine the final equation set. The final methodology involves two separate procedures for major and minor streets at TWSC intersections. The factors ultimately deemed significant in bicycle level of service analysis include sight distances, speed limits, presence of bus stops, presence and type of bicycle infrastructure, street widths and types of lanes present, pavement quality, and traffic flows.
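The regression step described above can be sketched with ordinary least squares. The attribute names and all numbers below are invented for illustration; they are not the thesis data or its final equation set.

```python
import numpy as np

# Hypothetical intersection attributes (explanatory variables):
# [speed_limit_mph, has_bike_lane (0/1), lanes_crossed]
X = np.array([
    [25, 1, 2],
    [35, 0, 4],
    [30, 1, 2],
    [40, 0, 5],
    [25, 0, 3],
], dtype=float)

# Hypothetical mean survey LOS ratings per intersection (6 best, 1 worst)
y = np.array([5.1, 2.4, 4.6, 1.8, 3.5])

A = np.column_stack([np.ones(len(X)), X])      # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
predicted = A @ coef                           # fitted LOS scores
```

The thesis fits separate equations of this general form for major- and minor-street movements; the fitted coefficients then translate measurable intersection attributes into a bicycle LOS score.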

    Content based video retrieval via spatial-temporal information discovery.

    Get PDF
    Content-based video retrieval (CBVR) has been strongly motivated by a variety of real-world applications. Most state-of-the-art CBVR systems are built on the Bag-of-visual-Words (BovW) framework for visual resource representation and access. The framework, however, ignores the spatial and temporal information contained in videos, which plays a fundamental role in unveiling semantic meanings. This information includes not only the spatial layout of visual content on a still frame (image), but also temporal changes across sequential frames. In particular, spatially and temporally co-occurring visual words, extracted under the BovW framework, often tend to collaboratively represent objects, scenes, or events in the videos. Discovering this spatial and temporal information can advance CBVR technology. In this thesis, we propose to explore and analyse the spatial and temporal information from a new perspective: i) co-occurrence of the visual words is formulated as a correlation matrix, and ii) spatial proximity and temporal coherence are analytically and empirically studied to refine this correlation. Following this, a quantitative spatial and temporal correlation (STC) model is defined. The STC discovered from either the query example (denoted QC) or the data collection (denoted DC) is assumed to determine the specificity of the visual words in the retrieval model, i.e., selected Words-Of-Interest are found to be more important for certain topics. Based on this hypothesis, we utilize the STC matrix to establish a novel visual content similarity measurement method and a query reformulation scheme for the retrieval model. Additionally, the STC also characterizes the context of the visual words, and accordingly an STC-based context similarity measurement is proposed to detect synonymous visual words. This method partially solves an inherent error of the visual vocabulary under the BovW framework.
    Systematic experimental evaluations on the public TRECVID and CC_WEB_VIDEO video collections demonstrate that the proposed STC-based methods can substantially improve the retrieval effectiveness of the BovW framework. The retrieval model based on the STC outperforms state-of-the-art CBVR methods on these data collections without additional storage or computational expense. Furthermore, the rebuilt visual vocabulary in this thesis is more compact and effective. The above methods can be incorporated together for an effective and efficient CBVR system implementation. Based on the experimental results, it is concluded that the spatial-temporal correlation effectively approximates the semantic correlation. This discovered correlation approximation can be utilized for both visual content representation and similarity measurement, which are key issues for CBVR technology development.
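A minimal sketch of the co-occurrence matrix underlying such an STC model: count how often two visual words appear together within a short temporal window of frames. The window size and plain counting scheme are illustrative assumptions, not the thesis's exact STC definition (which also refines counts by spatial proximity).

```python
import numpy as np

def cooccurrence(frames, vocab_size, window=2):
    """Count temporal co-occurrences of visual words.

    frames      : list of per-frame lists of visual-word ids
    vocab_size  : number of visual words in the vocabulary
    window      : frames i .. i+window-1 are treated as co-occurring
    Returns a symmetric vocab_size x vocab_size count matrix."""
    C = np.zeros((vocab_size, vocab_size))
    for i in range(len(frames)):
        words = set()
        for frame in frames[i:i + window]:   # pool words over the window
            words.update(frame)
        for a in words:
            for b in words:
                if a != b:
                    C[a, b] += 1
    return C
```

Normalizing such a count matrix (e.g. row-wise) yields a correlation-style matrix; words with strongly correlated rows are candidates for the "synonymous visual words" the thesis detects via context similarity.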

    Downward continued multichannel seismic refraction analysis of Atlantis Massif oceanic core complex, 30°N, Mid-Atlantic Ridge

    Get PDF
    Author Posting. © American Geophysical Union, 2012. This article is posted here by permission of American Geophysical Union for personal use, not for redistribution. The definitive version was published in Geochemistry Geophysics Geosystems 13 (2012): Q0AG07, doi:10.1029/2012GC004059.
    Detailed seismic refraction results show striking lateral and vertical variability of velocity structure within the Atlantis Massif oceanic core complex (OCC), contrasting notably with its conjugate ridge flank. Multichannel seismic (MCS) data are downward continued using the Synthetic On Bottom Experiment (SOBE) method, providing unprecedented detail in tomographic models of the P-wave velocity structure to subseafloor depths of up to 1.5 km. Velocities can vary by up to 3 km/s over several hundred meters, and unusually high velocities (~5 km/s) are found immediately beneath the seafloor in key regions. Correlation with in situ and dredged rock samples, video and records from submersible dives, and a 1.415 km drill core allows us to infer dominant lithologies. A high-velocity body (or bodies) found to shoal near the seafloor in multiple locations is interpreted as gabbro and is displaced along isochrons within the OCC, indicating a propagating magmatic source as the origin of this pluton (or plutons). The western two-thirds of the Southern Ridge is capped in serpentinite that may extend nearly to the base of our ray coverage. The distribution of inferred serpentinite indicates that the gabbroic pluton(s) was emplaced into a dominantly peridotitic host rock. Presumably the mantle host rock was later altered via seawater penetration along the detachment zone, which controlled the development of the OCC. The asymmetric distribution of seismic velocities and the morphology of Atlantis Massif are consistent with a detachment fault with a component of dip to the southeast. The lowest velocities observed atop the eastern Central Dome and the conjugate crust are most likely volcanics.
    Here, an updated model of the magmatic and extensional faulting processes at Atlantis Massif is deduced from the seismic results, contributing more generally to understanding the processes controlling the formation of heterogeneous lithosphere at slow-spreading centers. NSF support was provided via grant OCE-0927442.

    Measurement of transmission efficiency for 400 MeV proton beam through collimator at Fermilab MuCool Test Area using Chromox-6 scintillation screen

    Get PDF
    The MuCool Test Area (MTA) at Fermilab is a facility to develop the technology required for ionization cooling for a future Muon Collider and/or Neutrino Factory. As part of this research program, feasibility studies of various types of RF cavities in a high-magnetic-field environment are in progress. As a unique approach, we have tested an RF cavity filled with high-pressure hydrogen gas with a 400 MeV proton beam in an external magnetic field (B = 3 T). Quantitative information about the number of protons passing through this cavity is an essential requirement of the beam test. The MTA is a flammable gas (hydrogen) hazard zone, so for safety reasons no active (energized) beam diagnostic instrument can be used. Moreover, when the magnetic field is on, the current transformers (toroids) used for beam intensity measurements do not work, due to saturation of the ferrite material of the transformer. Given these requirements, we have developed a passive beam diagnostic instrument using a combination of a Chromox-6 scintillation screen and a CCD camera. This paper describes details of the beam profile and position obtained from the CCD image at B = 0 T and B = 3 T, for both high- and low-intensity proton beams. A comparison is made with the beam size obtained from a multi-wire detector. The beam transmission efficiency through a collimator with a 4 mm diameter hole is measured by the toroids and by the CCD image of the scintillation screen. Results show that the transmission efficiency estimated from the CCD image is consistent with the toroid measurement, which enables us to monitor the beam transmission efficiency even in a high magnetic field environment.
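The CCD-based efficiency estimate can be illustrated as a ratio of background-subtracted integrated scintillation intensities upstream and downstream of the collimator. This is a simplified stand-in for the paper's analysis; the function name and its arguments are hypothetical.

```python
import numpy as np

def transmission_efficiency(img_up, img_down, bg_up, bg_down):
    """Estimate transmission efficiency from two scintillation-screen
    CCD images: integrated light downstream of the collimator divided
    by integrated light upstream, each after background subtraction."""
    s_up = np.clip(img_up - bg_up, 0, None).sum()      # clip removes negative noise
    s_down = np.clip(img_down - bg_down, 0, None).sum()
    return float(s_down / s_up) if s_up > 0 else float("nan")
```

Because scintillation light scales with the number of traversing protons, this ratio tracks the toroid measurement while requiring no energized instrumentation in the hydrogen hazard zone, and it remains usable at B = 3 T where the toroids saturate.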

    A closer look at web questionnaire design

    Get PDF