
    Automatic caption generation for news images

    This thesis is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Automatic description generation for video frames would help security authorities manage and exploit large volumes of monitoring data more efficiently. Image search engines could potentially benefit from image descriptions in supporting more accurate and targeted queries for end users. Importantly, generating image descriptions would aid blind or partially sighted people who cannot access visual information in the same way as sighted people can. However, previous work has relied on fine-grained resources, manually created for specific domains and applications. In this thesis, we explore the feasibility of automatic caption generation for news images in a knowledge-lean way. We depart from previous work in that we learn a model of caption generation from publicly available data that has not been explicitly labelled for our task. The model consists of two components, namely extracting image content and rendering it in natural language. Specifically, we exploit data resources where images and their textual descriptions co-occur naturally. We present a new dataset consisting of news articles, images, and their captions that we retrieved from the BBC News website. Rather than laboriously annotating images with keywords, we simply treat the captions as the labels. We show that it is possible to learn the visual and textual correspondence under such noisy conditions by extending an existing generative annotation model (Lavrenko et al., 2003). We also find that the accompanying news documents substantially complement the extraction of the image content. In order to provide better modelling and representation of image content, we propose a probabilistic image annotation model that exploits the synergy between visual and textual modalities under the assumption that images and their textual descriptions are generated by a shared set of latent variables (topics). Using Latent Dirichlet Allocation (Blei and Jordan, 2003), we represent visual and textual modalities jointly as a probability distribution over a set of topics. Our model takes these topic distributions into account while finding the most likely keywords for an image and its associated document. The availability of news documents in our dataset allows us to perform the caption generation task in a fashion akin to text summarization, save for one important difference: our model is not solely based on text but uses the image in order to select content from the document that should be present in the caption. We propose both extractive and abstractive caption generation models to render the extracted image content in natural language without relying on rich knowledge resources, sentence templates, or grammars. The backbone of both approaches is our topic-based image annotation model. Our extractive models examine how to best select sentences that overlap in content with our image annotation model. We adapt an existing abstractive headline generation model to our scenario by incorporating visual information. Our own model operates over image description keywords and document phrases, taking dependency and word order constraints into account. Experimental results show that both approaches can generate human-readable captions for news images. Our phrase-based abstractive model yields captions as informative as those written by BBC journalists.
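
    The extractive idea described above can be illustrated with a small sketch: represent the document's sentences and the image's predicted keywords as distributions over a shared set of LDA topics, then pick the sentence whose topic distribution best matches the image's. The toy sentences, the keyword string standing in for the output of an annotation model, and the use of scikit-learn LDA are assumptions for illustration, not the BBC News pipeline or the models developed in the thesis.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Toy news document sentences and a keyword string standing in for the
    # output of an image annotation model (both hypothetical).
    document_sentences = [
        "The prime minister visited flood-hit areas in the north of England.",
        "Heavy rain is expected to continue throughout the week.",
        "Rescue teams used boats to reach stranded residents.",
    ]
    image_keywords = "flood boats rescue residents"

    # Represent sentences and keywords in a shared LDA topic space.
    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(document_sentences + [image_keywords])
    lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)
    theta = lda.transform(X)                      # per-text topic distributions
    sent_topics, image_topics = theta[:-1], theta[-1]

    # Extractive step: pick the sentence whose topics are closest (by cosine
    # similarity) to the topics of the image keywords.
    sims = sent_topics @ image_topics / (
        np.linalg.norm(sent_topics, axis=1) * np.linalg.norm(image_topics))
    print(document_sentences[int(np.argmax(sims))])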

    Application of topic modelling to Airbnb lodging reviews in Berlin from 2010 to 2019

    Natural Language Processing has become a key discipline in the digital age. With the growing amount of information generated every day in different formats, tools are needed to analyze, understand, and categorize the data. This work addresses that need by exploring one such tool, topic modelling. The work is divided into two main parts: in the first, three topic models (LDA, GSDMM and BERTopic) are compared methodologically, and in the second, a topic analysis is carried out with the model selected in the first part. The work was developed on a database of reviews of Airbnb accommodations in Berlin. Before the two main parts, the database was preprocessed, including steps such as column selection, language detection of the reviews, translation into English, tokenization, stopword removal, and stemming. The models were then compared using the following metrics: coherence value, visual analysis, and execution time. From this comparison, the LDA model was selected because it gave the best results on all metrics. Its parameters were then optimized by sweeping the possible values over a defined range. The optimized model found five main topics in the set of reviews, categorized as “General Positive”, “General Negative”, “Airbnb Location”, “Positive Host” and “Airbnb Rooms/Space”. “General Positive” is the main topic in the collection, and the topic “Airbnb Rooms/Space” is most prominent in reviews written in Spanish and French.
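
    A minimal sketch of the comparison criteria described above, assuming gensim as the toolkit: fit an LDA model on preprocessed (tokenized, stopword-free, stemmed) reviews and report its c_v coherence value and execution time. The toy reviews and parameter values are placeholders, not the Berlin Airbnb data or the settings used in the work.

    import time
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel, CoherenceModel

    # Toy preprocessed reviews (tokenized, stopwords removed, stemmed).
    reviews = [
        ["great", "locat", "close", "metro"],
        ["host", "friendli", "help", "checkin"],
        ["room", "small", "clean", "quiet"],
        ["locat", "perfect", "walk", "center"],
    ]
    dictionary = Dictionary(reviews)
    corpus = [dictionary.doc2bow(doc) for doc in reviews]

    # Fit LDA and record execution time, one of the comparison metrics.
    start = time.perf_counter()
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
                   passes=10, random_state=0)
    elapsed = time.perf_counter() - start

    # Coherence value (c_v), the main quality metric used in the comparison.
    coherence = CoherenceModel(model=lda, texts=reviews, dictionary=dictionary,
                               coherence="c_v").get_coherence()
    print(f"c_v coherence: {coherence:.3f}, training time: {elapsed:.2f}s")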

    Text miner's little helper: scalable self-tuning methodologies for knowledge exploration

    The abstract is in the attachment.

    Learning semantic structures from in-domain documents

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 175-184). Semantic analysis is a core area of natural language understanding that has typically focused on predicting domain-independent representations. However, such representations are unable to fully realize the rich diversity of technical content prevalent in a variety of specialized domains. Taking the standard supervised approach to domain-specific semantic analysis requires expensive annotation effort for each new domain of interest. In this thesis, we study how multiple granularities of semantic analysis can be learned from unlabeled documents within the same domain. By exploiting in-domain regularities in the expression of text at various layers of linguistic phenomena, including lexicography, syntax, and discourse, the statistical approaches we propose induce multiple kinds of structure: relations at the phrase and sentence level, content models at the paragraph and section level, and semantic properties at the document level. Each of our models is formulated in a hierarchical Bayesian framework with the target structure captured as latent variables, allowing them to seamlessly incorporate linguistically-motivated prior and posterior constraints, as well as multiple kinds of observations. Our empirical results demonstrate that the proposed approaches can successfully extract hidden semantic structure over a variety of domains, outperforming multiple competitive baselines. By Harr Chen. Ph.D.
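
    As a rough illustration of the kind of inference such latent-variable models rely on, the sketch below runs collapsed Gibbs sampling to assign each paragraph a hidden label under Dirichlet priors. This is a much-simplified stand-in for the thesis' content models; the toy paragraphs, the hyperparameters, and the absence of any linguistic constraints are all assumptions for illustration.

    import random
    from collections import Counter

    # Toy "in-domain" paragraphs and hyperparameters (all hypothetical).
    paragraphs = [
        "the patient reported fever and cough".split(),
        "treatment included rest and fluids".split(),
        "symptoms were fever chills and headache".split(),
    ]
    K, alpha, beta = 2, 1.0, 0.1                  # labels, Dirichlet priors
    V = len({w for p in paragraphs for w in p})   # vocabulary size

    z = [random.randrange(K) for _ in paragraphs] # latent label per paragraph
    topic_counts = Counter(z)                     # paragraphs assigned to each label
    word_counts = [Counter() for _ in range(K)]   # word counts per label
    for p, t in zip(paragraphs, z):
        word_counts[t].update(p)

    def resample(i):
        # Remove paragraph i, then resample its label from the collapsed conditional.
        p, old = paragraphs[i], z[i]
        topic_counts[old] -= 1
        word_counts[old].subtract(p)
        weights = []
        for t in range(K):
            w = topic_counts[t] + alpha
            n_t, local = sum(word_counts[t].values()), Counter()
            for j, word in enumerate(p):          # sequential predictive word probabilities
                w *= (word_counts[t][word] + local[word] + beta) / (n_t + j + V * beta)
                local[word] += 1
            weights.append(w)
        new = random.choices(range(K), weights=weights)[0]
        z[i] = new
        topic_counts[new] += 1
        word_counts[new].update(p)

    for _ in range(50):                           # a few Gibbs sweeps
        for i in range(len(paragraphs)):
            resample(i)
    print(z)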

    Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features.

    Relational data refers to data that contains explicit relations among objects. Nowadays, relational data are ubiquitous and have broad appeal in many different application domains. The problem of estimating similarity between objects is a core requirement for many standard Machine Learning (ML), Natural Language Processing (NLP) and Information Retrieval (IR) problems such as clustering, classification, and word sense disambiguation. Traditional machine learning approaches represent the data using simple, concise representations such as feature vectors. While this works very well for homogeneous data, i.e., data with a single feature type such as text, it does not fully exploit the availability of different feature types. For example, scientific publications have text, citations, authorship information, and venue information, and each of these features can be used for estimating similarity. Representing such objects has been a key issue in efficient mining (Getoor and Taskar, 2007). In this thesis, we propose natural representations for relational data using multiple, connected layers of graphs, one for each feature type. We also propose novel algorithms for estimating similarity using multiple heterogeneous features, and we present novel algorithms for tasks such as topic detection and music recommendation using the estimated similarity measure. We demonstrate superior performance of the proposed algorithms (root mean squared error of 24.81 on the Yahoo! KDD Music recommendation data set and classification accuracy of 88% on the ACL Anthology Network data set) over many state-of-the-art algorithms, such as Latent Semantic Analysis (LSA), Multiple Kernel Learning (MKL) and spectral clustering, and over baselines on large, standard data sets. Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89824/1/mpradeep_1.pd
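
    A minimal sketch of the multi-layer representation described above: one similarity graph per feature type (text, citations, authorship), combined into a single heterogeneous similarity estimate. The toy data, the fixed layer weights, and the simple weighted-sum combination are illustrative assumptions, not the algorithms proposed in the thesis, where learning the combination is the central problem.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    papers = ["graph based similarity learning",
              "kernel methods for text classification",
              "spectral clustering of citation networks"]

    # Layer 1: text similarity graph from tf-idf cosine similarity.
    text_sim = cosine_similarity(TfidfVectorizer().fit_transform(papers))

    # Layer 2: citation graph (1 if one paper cites the other), symmetric.
    citations = np.array([[0, 0, 1],
                          [0, 0, 0],
                          [1, 0, 0]], dtype=float)

    # Layer 3: shared-author graph.
    coauthors = np.array([[0, 1, 0],
                          [1, 0, 0],
                          [0, 0, 0]], dtype=float)

    # Combine the layers with fixed weights into one heterogeneous similarity
    # matrix; learning this combination is the hard part the thesis addresses.
    weights = [0.5, 0.3, 0.2]
    combined = sum(w * g for w, g in zip(weights, [text_sim, citations, coauthors]))

    # Most similar paper to paper 0, excluding itself.
    scores = combined[0].copy()
    scores[0] = -np.inf
    print("closest to paper 0:", papers[int(np.argmax(scores))])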

    Shallow and deep learning for audio and natural language processing

    Many machine learning algorithms can be viewed as optimization problems that seek the optimum hypothesis in a hypothesis space. To model the complex dependencies in real-world artificial intelligence tasks, machine learning algorithms require high expressive power (high degrees of freedom, or richness of a family of functions) and a large amount of training data. Deep learning models and kernel machines are regarded as models with high expressive power, the former through the composition of multiple layers of nonlinearities and the latter through nonlinearly mapping data to a high-dimensional space. While the majority of deep learning work focuses on pure classification problems given input data, there are many other challenging Artificial Intelligence (AI) problems beyond classification. In real-world applications, there are cases with structured relationships between and among input data and output targets, which have not been fully taken into account in deep learning models. On the other hand, although kernel machines involve convex optimization and have strong theoretical grounding in tractable optimization techniques, for large-scale applications they often suffer from significant memory requirements and computational expense. Resolving this computational limitation, and thereby enhancing the expressibility of kernel machines, is important for large-scale real-world applications. This dissertation develops learning models based on deep learning and kernel machines for audio and natural language processing tasks. In particular, we address the challenges of deep learning with structured relationships among data and the computational limitations of large-scale kernel machines. A general framework is proposed to consider the relationships among output predictions and enforce constraints between a mixture input and the output predictions for monaural source separation tasks. To model the structured relationships among inputs, deep structured semantic models are introduced for an information retrieval task: queries and documents are modeled as inputs to the deep learning models and their relevance is measured through the similarity at the output layer. A discriminative objective function is proposed to exploit the similarity and dissimilarity between queries and web documents. To address the scalability and efficiency of large-scale kernel machines, we investigate deep architectures, ensemble models, and a scalable parallel solver to further scale up kernel machines approximated by randomized feature maps. The proposed techniques are shown to match the expressive power of deep neural network based models in spoken language understanding and speech recognition tasks.
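
    One of the building blocks mentioned above, kernel machines approximated by randomized feature maps, can be sketched as follows: map the inputs through random Fourier features approximating an RBF kernel and train a linear classifier on top, so that a linear model stands in for an otherwise expensive kernel method. The dataset, dimensions, and downstream classifier below are placeholders, not the dissertation's actual systems or tasks.

    from sklearn.kernel_approximation import RBFSampler
    from sklearn.linear_model import LogisticRegression
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Placeholder data; the dissertation targets audio and language tasks instead.
    X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Random Fourier features approximating an RBF kernel map.
    rff = RBFSampler(gamma=0.05, n_components=500, random_state=0)
    Z_train = rff.fit_transform(X_train)
    Z_test = rff.transform(X_test)

    # A linear model on the randomized features plays the role of the kernel machine.
    clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
    print("test accuracy:", clf.score(Z_test, y_test))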

    Data Mining Techniques to Understand Textual Data

    More than ever, online information delivery and storage rely heavily on text. Billions of texts are produced every day in the form of documents, news, logs, search queries, ad keywords, tags, tweets, messenger conversations, social network posts, and more. Text understanding is a fundamental and essential task involving broad research topics, and it contributes to many applications in areas such as text summarization, search engines, recommendation systems, online advertising, and conversational bots. However, understanding text is never a trivial task for computers, especially for noisy and ambiguous text such as logs and search queries. This dissertation focuses on textual understanding tasks drawn from two domains, disaster management and IT service management, which mainly use textual data as an information carrier. Improving situation awareness in disaster management and alleviating the human effort involved in IT service management call for more intelligent and efficient solutions to understand the textual data acting as the main information carrier in these two domains. From the perspective of data mining, four directions are identified: (1) intelligently generating a storyline that summarizes the evolution of a hurricane from a relevant online corpus; (2) automatically recommending resolutions according to the textual symptom description in a ticket; (3) gradually adapting the resolution recommendation system to time-correlated features derived from text; and (4) efficiently learning distributed representations for short and noisy ticket symptom descriptions and resolutions. Provided with different types of textual data, the data mining techniques proposed in these four research directions successfully address our tasks of understanding and extracting valuable knowledge from textual data. Concretely, this dissertation designs and develops data mining methodologies to better understand textual information, including (1) a storyline generation method for efficient summarization of natural hurricanes based on a crawled online corpus; (2) a recommendation framework for automated ticket resolution in IT service management; (3) an adaptive recommendation system for time-varying, temporally correlated features derived from text; and (4) a deep neural ranking model that not only recommends resolutions but also efficiently outputs distributed representations for ticket descriptions and resolutions.
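
    A minimal sketch of the resolution recommendation idea, assuming a simple tf-idf representation in place of the dissertation's learned distributed representations and neural ranking model: represent historical ticket symptom descriptions as vectors and recommend the resolution attached to the most similar past ticket. The tickets below are toy examples.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Toy historical tickets: (symptom description, resolution) pairs.
    history = [
        ("disk usage above 95 percent on /var", "clean up old log files and extend the volume"),
        ("service httpd not responding on port 80", "restart httpd and verify firewall rules"),
        ("high memory usage by java process", "increase heap size and schedule a restart"),
    ]
    symptoms = [s for s, _ in history]

    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(symptoms)

    def recommend(new_symptom):
        # Return the resolution attached to the most similar historical ticket.
        sims = cosine_similarity(vectorizer.transform([new_symptom]), matrix)[0]
        return history[int(sims.argmax())][1]

    print(recommend("web server down, connection refused on port 80"))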