12 research outputs found

    Exploring the topical structure of short text through probability models: from tasks to fundamentals

    Recent technological advances have radically changed the way we communicate. Today’s communication has become ubiquitous, and it has fostered the need for information that is easier to create, spread and consume. As a consequence, we have experienced the shortening of text messages in mediums ranging from electronic mail and instant messaging to microblogging. Moreover, the ubiquity and fast-paced nature of these mediums have promoted their use for previously unthinkable tasks. For instance, reporting real-world events was classically carried out by news reporters, but nowadays most interesting events are first disclosed on social networks like Twitter by eyewitnesses through short text messages. As a result, the exploitation of the thematic content in short text has captured the interest of both research and industry. Topic models are a type of probability model that has traditionally been used to explore this thematic content, a.k.a. topics, in regular text. The most popular topic models fall into the sub-class of LVMs (Latent Variable Models), which include several latent variables at the corpus, document and word levels to summarise the topics at each level. However, classical LVM-based topic models struggle to learn semantically meaningful topics in short text because the lack of co-occurring words within a document hampers the estimation of the local latent variables at the document level. To overcome this limitation, pooling and hierarchical Bayesian strategies that leverage contextual information have been essential to improve the quality of topics in short text. In this thesis, we study the problem of learning semantically meaningful and predictive representations of text in two distinct phases:
    • In the first phase, Part I, we investigate the use of LVM-based topic models for the specific task of event detection in Twitter. In this situation, the use of contextual information to pool tweets together comes naturally. Thus, we first extend an existing clustering algorithm for event detection to use the topics learned from pooled tweets. Then, we propose a probability model that integrates topic modelling and clustering to enable the flow of information between both components.
    • In the second phase, Part II and Part III, we challenge the use of local latent variables in LVMs, especially when the context of short messages is not available. First of all, we study the evaluation of the generalization capabilities of LVMs such as PFA (Poisson Factor Analysis) and propose unbiased estimation methods to approximate it. With the most accurate method, we compare the generalization of chordal models without latent variables to that of PFA topic models in short and regular text collections.
    In summary, we demonstrate that by integrating clustering and topic modelling, the performance of event detection techniques in Twitter improves due to the interaction between both components. Moreover, we develop several unbiased likelihood estimation methods for assessing the generalization of PFA and empirically validate their accuracy in different document collections. Finally, we show that we can learn chordal models without latent variables in text through Chordalysis, and that they can be a competitive alternative to classical topic models, especially in short text.
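
    The thesis's own unbiased estimators are not reproduced here, but as a minimal sketch of the quantity being estimated, the following illustrative snippet computes the naive prior-sampling Monte Carlo estimate of a held-out document's likelihood under a toy Poisson Factor Analysis model; the dimensions, priors and hyperparameters are assumptions.

        # Naive unbiased Monte Carlo estimate of p(x_d | Phi) under a toy PFA model.
        # Illustration only; not one of the estimators proposed in the thesis.
        import numpy as np
        from scipy.stats import poisson

        rng = np.random.default_rng(0)

        V, K, S = 1000, 20, 5000                      # vocabulary size, topics, MC samples (assumed)
        Phi = rng.dirichlet(np.full(V, 0.1), size=K)  # K x V topic-word matrix (assumed known)
        x_d = rng.poisson(0.05, size=V)               # a held-out document as word counts

        # theta_d ~ Gamma(a, scale=1/b) prior on the K topic intensities (assumed hyperparameters)
        a, b = 0.5, 1.0
        theta = rng.gamma(a, 1.0 / b, size=(S, K))

        rates = theta @ Phi                           # S x V Poisson rates, one row per prior sample
        log_lik = poisson.logpmf(x_d, rates).sum(axis=1)

        # Averaging the likelihoods is unbiased for p(x_d | Phi); its logarithm,
        # computed stably below, is a downward-biased estimate of log p(x_d | Phi).
        log_p_hat = np.logaddexp.reduce(log_lik) - np.log(S)
        print(f"estimated log p(x_d | Phi): {log_p_hat:.2f}")

    The naive estimator above is unbiased but typically high-variance, which is why more accurate estimation methods are needed for reliable model comparison.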

    Deep Probabilistic Models for Camera Geo-Calibration

    The ultimate goal of image understanding is to transform visual images into numerical or symbolic descriptions of the scene that are helpful for decision making. By establishing when, where, and in which direction a picture was taken, geo-calibration makes it possible to use imagery to understand the world and how it changes over time. Current models for geo-calibration are mostly deterministic, which in many cases fails to capture the inherent uncertainty when the image content is ambiguous. Furthermore, without a proper modeling of the uncertainty, subsequent processing can yield overly confident predictions. To address these limitations, we propose a probabilistic model for camera geo-calibration using deep neural networks. While our primary contribution is geo-calibration, we also show that learning to geo-calibrate a camera allows us to implicitly learn to understand the content of the scene.
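
    As a rough illustration of predicting distributions rather than point estimates (not the paper's architecture; the feature size, cell count and bin count are assumptions), a probabilistic head could output categorical distributions over discretized geographic cells and camera-heading bins and be trained by negative log-likelihood:

        # Hypothetical probabilistic geo-calibration head (illustrative sketch).
        import torch
        import torch.nn as nn

        class ProbGeoCalibrationHead(nn.Module):
            def __init__(self, feat_dim=2048, n_geo_cells=4096, n_heading_bins=36):
                super().__init__()
                self.geo = nn.Linear(feat_dim, n_geo_cells)         # logits over geographic cells
                self.heading = nn.Linear(feat_dim, n_heading_bins)  # logits over heading bins

            def forward(self, features):
                # Ambiguous images should yield flat distributions instead of an
                # overconfident single location and direction.
                return (
                    torch.distributions.Categorical(logits=self.geo(features)),
                    torch.distributions.Categorical(logits=self.heading(features)),
                )

        head = ProbGeoCalibrationHead()
        feats = torch.randn(8, 2048)                   # placeholder backbone features
        cell_target = torch.randint(0, 4096, (8,))     # placeholder ground-truth cells
        heading_target = torch.randint(0, 36, (8,))    # placeholder ground-truth heading bins
        geo_dist, heading_dist = head(feats)
        loss = -(geo_dist.log_prob(cell_target) + heading_dist.log_prob(heading_target)).mean()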

    Classification of two-dimensional face images to distinguish children from adults based on anthropometry

    Classification of face images can be done in various ways. This research uses two-dimensional photographs of people's faces to detect children in images. An algorithm for classifying face images into children and adults is developed, and existing algorithms are analysed. The algorithm will also be used for age estimation. Through an analysis of state-of-the-art research on facial landmarks for age estimation, combined with the changes that occur in human facial morphology during growth and aging, the facial landmarks needed for age classification and age estimation are identified. The algorithm is based on ratios of Euclidean distances between those landmarks. Based on these ratios, children can be detected and age can be estimated.
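
    A minimal sketch of the ratio-based rule follows (illustrative only: the landmark names, the chosen ratio and the threshold are hypothetical placeholders, not the values identified in this research), assuming 2-D landmark coordinates from any facial-landmark detector:

        # Toy child/adult decision from a ratio of Euclidean distances between landmarks.
        import numpy as np

        def euclidean(p, q):
            return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))

        def classify_child_adult(landmarks, threshold=0.82):
            # landmarks: dict of (x, y) points on the face image; keys are assumptions.
            eye_distance = euclidean(landmarks["left_eye"], landmarks["right_eye"])
            face_height = euclidean(landmarks["nasion"], landmarks["chin"])
            ratio = eye_distance / face_height
            # Facial proportions change with growth (the lower face lengthens), so a larger
            # eye-distance-to-face-height ratio is taken here to indicate a child.
            return "child" if ratio > threshold else "adult"

        example = {"left_eye": (120, 140), "right_eye": (200, 142),
                   "nasion": (160, 150), "chin": (162, 260)}
        print(classify_child_adult(example))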

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a supernova.
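
    A minimal sketch of the first-order Markovian propagation, using scikit-learn's GaussianMixture as a stand-in for the spatial probabilistic model (the snapshot data, component count and covariance settings are placeholders, not the paper's model):

        # Fit a spatial mixture at each snapshot, warm-started from the previous one.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        def fit_snapshot(points, previous=None, n_components=8):
            if previous is None:
                gmm = GaussianMixture(n_components=n_components, random_state=0)
            else:
                # First-order Markov step: initialise from the adjacent temporal stage.
                gmm = GaussianMixture(
                    n_components=n_components,
                    weights_init=previous.weights_,
                    means_init=previous.means_,
                    precisions_init=previous.precisions_,
                    random_state=0,
                )
            return gmm.fit(points)

        # Toy "snapshots": 3-D particle positions drifting over time (placeholder data).
        rng = np.random.default_rng(0)
        snapshots = [rng.normal(loc=t * 0.1, scale=1.0, size=(2000, 3)) for t in range(5)]

        model = None
        for t, points in enumerate(snapshots):
            model = fit_snapshot(points, previous=model)
            print(t, model.lower_bound_)   # average per-sample evidence lower bound at stage t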

    Innovative Methods and Materials in Structural Health Monitoring of Civil Infrastructures

    In the past, when structural elements were made of perishable materials such as wood, the maintenance of houses, bridges, etc., was considered of vital importance for their safe use and to preserve their efficiency. With the advent of materials such as reinforced concrete and steel, given their relatively long useful life, periodic and constant maintenance has often been considered a secondary concern. When it was realized that even structures built with these materials have a useful life that ends, and that this end was being approached, maintenance planning became an important and non-negligible aspect. Thus, the concept of structural health monitoring (SHM) was introduced, designed, and implemented as a multidisciplinary method. Computational mechanics, static and dynamic analysis of structures, electronics, sensors, and, recently, the Internet of Things (IoT) and artificial intelligence (AI) are required, but it is also important to consider new materials, especially those with intrinsic self-diagnosis characteristics, and to use measurement and survey methods typical of modern geomatics, such as satellite surveys and highly sophisticated laser tools.

    Intelligent Sensors for Human Motion Analysis

    The book "Intelligent Sensors for Human Motion Analysis" contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects of the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems.

    Contributions for the automatic description of multimodal scenes

    Doctoral thesis. Electrical and Computer Engineering. Faculdade de Engenharia, Universidade do Porto. 200

    Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

    Peer reviewed