23 research outputs found

    Toward Efficient and Incremental Spectral Clustering via Parametric Spectral Clustering

    Spectral clustering is a popular method for effectively clustering nonlinearly separable data. However, computational limitations, memory requirements, and the inability to perform incremental learning challenge its widespread application. To overcome these limitations, this paper introduces a novel approach called parametric spectral clustering (PSC). By extending the capabilities of spectral clustering, PSC addresses the challenges associated with big data and real-time scenarios and enables efficient incremental clustering with new data points. Experimental evaluations conducted on various open datasets demonstrate the superiority of PSC in terms of computational efficiency while achieving clustering quality mostly comparable to standard spectral clustering. The proposed approach has significant potential for incremental and real-time data analysis applications, facilitating timely and accurate clustering in dynamic and evolving datasets. The findings of this research contribute to the advancement of clustering techniques and open new avenues for efficient and effective data analysis. We publish the experimental code at https://github.com/109502518/PSC_BigData

    Learning visual attributes from contextual explanations

    In computer vision, attributes are mid-level concepts shared across categories. They provide a natural communication between humans and machines for image retrieval. They also provide detailed information about objects. Finally, attributes can describe properties of unfamiliar objects. These are some very appealing properties of attributes, but learning attributes is a challenging task. Since attributes are less well-defined, capturing them with computational models poses a different set of challenges than capturing object categories does. There is a miscommunication of attributes between humans and machines, since machines may not understand what humans have in mind when referring to a particular attribute. Humans usually provide labels indicating whether an object or attribute is present, without any explanation. However, attributes are more complex and may require explanations for a better understanding. This Ph.D. thesis aims to tackle these challenges in learning automatic attribute predictive models. In particular, it focuses on enhancing attribute predictive power with contextual explanations. These explanations aim to enhance data quality with human knowledge, which can be expressed in the form of interactions and may be affected by our personality. First, we emulate human learning skills to understand unfamiliar situations. Humans try to infer properties from what they already know (background knowledge). Hence, we study attribute learning in data-scarce and non-related domains, emulating human understanding skills. We discover transferable knowledge to learn attributes from different domains. Our previous project inspires us to request contextual explanations to improve attribute learning. Thus, we enhance attribute learning with context in the form of gaze, captioning, and sketches. Human gaze captures subconscious intuition and associates certain components with the meaning of an attribute. For example, gaze associates the tiptoe of a shoe with a pointy attribute. To complement this gaze representation, captioning follows conscious thinking with prior analysis. An annotator may analyze an image and may provide the following description: “This shoe is pointy because its sharp form at the tiptoe”. Finally, in image search, sketches provide a holistic view of an image query, which complements specific details encapsulated via attribute comparisons. To conclude, our methods with contextual explanations outperform many baselines via quantitative and qualitative evaluation

    Shallow Representations, Profound Discoveries : A methodological study of game culture in social media

    This thesis explores the potential of representation learning techniques in game studies, highlighting their effectiveness and addressing challenges in data analysis. The primary focus of this thesis is shallow representation learning, which utilizes simpler model architectures but is able to yield effective modeling results. This thesis investigates the following research objectives: disentangling the dependencies of data, modeling temporal dynamics, learning multiple representations, and learning from heterogeneous data. The contributions of this thesis are made from two perspectives: empirical analysis and methodology development, to address these objectives. Chapters 1 and 2 provide a thorough introduction, motivation, and necessary background information for the thesis, framing the research and setting the stage for subsequent publications. Chapters 3 to 5 summarize the contributions of the six publications, each of which contributes to demonstrating the effectiveness of representation learning techniques in addressing various analytical challenges. In Chapters 1 and 2, the research objectives and questions are also motivated and described. In particular, an introduction to the primary application field, game studies, is provided, and the connections between data analysis and game culture are highlighted. The basic notions of representation learning and canonical techniques such as probabilistic principal component analysis, topic modeling, and embedding models are described. Analytical challenges and data types are also described to motivate the research of this thesis. Chapter 3 presents two empirical analyses, conducted in Publications I and II, of player typologies and the temporal dynamics of player perceptions. The first empirical analysis takes advantage of a factor model to offer a flexible player typology analysis. The results and analytical framework are particularly useful for personalized gamification. The second empirical analysis uses topic modeling to analyze the temporal dynamics of player perceptions of the game No Man’s Sky in relation to game changes. The results reflect a variety of player perceptions, including general gaming activities and game mechanics. Moreover, a set of underlying topics that are directly related to game updates and changes is extracted, and their temporal dynamics show that players respond differently to different updates and changes. Chapter 4 presents two method developments that are related to factor models. The first method, DNBGFA, developed in Publication III, is a matrix factorization model for modeling the temporal dynamics of non-negative matrices from multiple sources. The second method, CFTM, developed in Publication IV, introduces a factor model to a topic model to handle sophisticated document-level covariates. The methods developed in Chapter 4 are also demonstrated for analyzing text data. Chapter 5 summarizes Publication V and Publication VI, which develop embedding models. Publication V introduces Bayesian non-parametrics to a graph embedding model to learn multiple representations for nodes. Publication VI utilizes a Gaussian copula model to deal with heterogeneous data in representation learning. The methods developed in Chapter 5 are also demonstrated for data analysis tasks in the context of online communities. Lastly, Chapter 6 presents discussions and conclusions: the contributions of this thesis are highlighted, and limitations, ongoing challenges, and potential future research directions are discussed

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus both on emerging approaches for language learning, understanding, production, and grounding interactively or autonomously from data in cognitive and neural systems, and on their potential or real applications in different domains

    Computational Stylistics in Poetry, Prose, and Drama

    The contributions in this edited volume approach poetry, narrative, and drama from the perspective of Computational Stylistics. They exemplify methods of computational textual analysis and explore the possibility of computational generation of literary texts. The volume presents a range of computational and Natural Language Processing applications to literary studies, such as motif detection, network analysis, machine learning, and deep learning

    Task Recommendation in Crowdsourcing Platforms

    Task distribution platforms, such as micro-task markets, project assignment portals, and job search engines, support the assignment of tasks to workers. Public crowdsourcing platforms support the assignment of tasks in micro-task markets to help task requesters to complete their tasks and allow workers to earn money. Enterprise crowdsourcing platforms provide a marketplace within enterprises for the internal placement of tasks from employers to employees. Most platforms of both types rely on the workers' selection capabilities or provide simple filtering steps to reduce the number of tasks a worker can choose from. This self-selection mechanism unfortunately allows for tasks to be performed by under- or over-qualified workers. Supporting the workers by introducing a task recommender system helps to solve such deficits of existing task distributions. In this thesis, the requirements for task recommendation in task distribution platforms are gathered with a focus on the worker's perspective, the design of appropriate assignment strategies is described, and innovative methods to recommend tasks based on their textual descriptions are provided. Different viewpoints are taken into account by analyzing the domains of micro-tasks, project assignments, and job postings. The requirements of enterprise crowdsourcing platforms are compiled based on the literature and a qualitative study, providing a conceptual design of task assignment strategies. The demands of workers and their perception of task similarity on public crowdsourcing platforms are identified, leading to the design and implementation of additional methods to determine the similarity of micro-tasks. The textual descriptions of micro-tasks, projects, and job postings are analyzed in order to provide innovative methods for task recommendation in these domains

    Data bases and data base systems related to NASA's Aerospace Program: A bibliography with indexes

    This bibliography lists 641 reports, articles, and other documents introduced into the NASA scientific and technical information system during the period January 1, 1981 through June 30, 1982. The directory was compiled to assist in the location of numerical and factual data bases and data base handling and management systems

    Geographic information extraction from texts

    A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although considerable progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data, to applications and privacy. Therefore, this workshop will provide a timely opportunity not only to discuss recent advances, new ideas, and concepts but also to identify research gaps in geographic information extraction

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown