
    From Frequency to Meaning: Vector Space Models of Semantics

    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
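    As a rough illustration of the term-document class, the following minimal Python sketch builds a matrix whose rows are terms and whose columns are documents, then compares two documents by the cosine of their column vectors, a common similarity measure in VSMs. It assumes whitespace tokenization and raw term counts with no weighting or smoothing, which practical VSMs usually add. The function names and toy documents are hypothetical and are not taken from the survey.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build a term-document matrix of raw term frequencies.

    Rows are terms (sorted alphabetically), columns are documents,
    so matrix[i][j] counts how often term i occurs in document j.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def column(matrix, j):
    """Return document j's column vector."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply",
]
vocab, X = term_document_matrix(docs)
sim_01 = cosine(column(X, 0), column(X, 1))
sim_02 = cosine(column(X, 0), column(X, 2))
print(f"doc0 vs doc1: {sim_01:.3f}")  # doc0 vs doc1: 0.750
print(f"doc0 vs doc2: {sim_02:.3f}")  # doc0 vs doc2: 0.000
```

    The first two documents share "the" (twice each), "cat" and "on", so their cosine is 6/8 = 0.75; the third shares no terms with the first and scores 0.0.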

    Math Search for the Masses: Multimodal Search Interfaces and Appearance-Based Retrieval

    We summarize math search engines and search interfaces produced by the Document and Pattern Recognition Lab in recent years, in particular the min math search interface and the Tangent search engine. Source code for both systems is publicly available. "The Masses" refers to our emphasis on creating systems for mathematical non-experts, who may be looking to define unfamiliar notation or to browse documents based on the visual appearance of formulae rather than their mathematical semantics. Comment: Paper for an invited talk at the 2015 Conference on Intelligent Computer Mathematics (July, Washington DC).

    Automated Refactoring in Software Automation Platforms

    Software Automation Platforms (SAPs) enable faster development and reduce the need to write code to construct applications. SAPs provide abstraction and automation, resulting in a low entry barrier that lets users with fewer programming skills become proficient developers. An unfortunate consequence of using SAPs is the production of code with higher technical debt, since such developers are less familiar with software development best practices. Hence, SAPs should aim to provide a simpler software construction and evolution pipeline, beyond offering a rapid software development environment. One simple example of high technical debt is the Unlimited Records anti-pattern, which occurs whenever queries are unbounded, i.e. the maximum number of records to be fetched is not explicitly limited. Limiting the number of records retrieved may, in many cases, improve application performance by reducing screen-loading time, making applications faster and more responsive, which is a top priority for developers. A second example is the Duplicated Code anti-pattern, which severely affects code readability and maintainability and can even cause bug propagation. To overcome these problems, we resort to automated refactoring, as it accelerates the refactoring process and provides provably correct modifications. This dissertation aims to study and develop a solution for automated refactoring in the context of OutSystems (an industry-leading SAP). This was carried out by implementing techniques for automatically refactoring a selected set of anti-patterns in OutSystems logic that are currently detected by the OutSystems technical debt monitoring tool.
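    The Unlimited Records anti-pattern can be illustrated outside OutSystems with plain SQL. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `orders` table; it is not the dissertation's refactoring tool, only a before/after view of what bounding a query means: the refactored query states its maximum number of records explicitly with a `LIMIT` clause.

```python
import sqlite3


def make_db(n_rows):
    """Create an in-memory database with a hypothetical 'orders' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany(
        "INSERT INTO orders (total) VALUES (?)",
        [(float(i),) for i in range(n_rows)],
    )
    return conn


def fetch_unbounded(conn):
    # Anti-pattern: no upper bound, so every row in the table is fetched.
    return conn.execute("SELECT id, total FROM orders ORDER BY id").fetchall()


def fetch_bounded(conn, max_records):
    # Refactored: the maximum number of records is explicit (bound parameter).
    return conn.execute(
        "SELECT id, total FROM orders ORDER BY id LIMIT ?", (max_records,)
    ).fetchall()


conn = make_db(10_000)
print(len(fetch_unbounded(conn)))   # 10000
print(len(fetch_bounded(conn, 50)))  # 50
```

    Choosing the bound (here the hypothetical `max_records`) is the design decision a screen's needs should drive, e.g. the number of rows actually displayed per page.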

    A Curvature Sensitive Filter and its Application in Microfossil Image Characterisation


    Social media analytics: a survey of techniques, tools and platforms

    This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication (RSS) feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and news services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis, and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper reviews leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discusses the requirements of an experimental computational environment for social media research and, as an illustration, presents the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to use social media scraping and analytics in their research or business. The data retrieval techniques presented in this paper were valid at the time of writing (June 2014), but they are subject to change because social media data scraping APIs change rapidly.
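    The paper introduces data cleaning and sentiment analysis as steps in the same pipeline. As a sketch under stated assumptions of how the two fit together, the code below strips URLs and @mentions from a tweet-like string and then sums scores from a tiny hypothetical lexicon. This is a simple lexicon-based approach, not a method taken from the paper or from any tool it reviews; the lexicon, regular expressions and function names are illustrative only.

```python
import re

# Hypothetical toy lexicon for illustration; real lexicons are far larger.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean(text):
    """Remove URLs and @mentions, lowercase, and return alphabetic tokens."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Sum lexicon scores of the cleaned tokens: >0 positive, <0 negative."""
    return sum(LEXICON.get(tok, 0) for tok in clean(text))


print(sentiment("Love the new phone! https://t.co/x @shop"))  # 2
print(sentiment("Awful service, I hate waiting @help"))       # -4
```

    Cleaning first matters here: without it, URL fragments and user handles would be tokenized and could accidentally match lexicon entries.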

    The 4th Conference of PhD Students in Computer Science
