825 research outputs found

    From Theory to Practice: A Data Quality Framework for Classification Tasks

    Data preprocessing is an essential step in knowledge discovery projects. Experts affirm that preprocessing tasks take between 50% and 70% of the total time of the knowledge discovery process, and several authors consider data cleaning one of the most cumbersome and critical of these tasks. Failure to ensure high data quality in the preprocessing stage significantly reduces the accuracy of any data analytics project. In this paper, we propose a framework to address data quality issues in classification tasks (DQF4CT). Our approach is composed of: (i) a conceptual framework that guides the user in dealing with data problems in classification tasks; and (ii) an ontology that represents data cleaning knowledge and suggests appropriate data cleaning approaches. We present two case studies on real datasets: physical activity monitoring (PAM) and occupancy detection of an office room (OD). To evaluate our proposal, the datasets cleaned by DQF4CT were used to train the same classification algorithms used by the authors of PAM and OD. We also evaluated DQF4CT on datasets from the Repository of Machine Learning Databases of the University of California, Irvine (UCI). In 84% of cases, the models trained on the datasets cleaned by DQF4CT outperformed the models reported by the datasets' authors.

    This work has also been supported by: Project "Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca", Convocatoria 03-2018 Publicación de artículos en revistas de alto impacto; Project "Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT - ID 4633", financed by Convocatoria 04C-2018 "Banco de Proyectos Conjuntos UEES-Sostenibilidad" of Project "Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca"; and the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).
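
    As a rough illustration of the kind of pipeline such guidance produces, here is a minimal sketch, assuming Python with pandas and scikit-learn; the file name pam.csv, the activity column, and the cleaning thresholds are hypothetical choices for illustration, not part of DQF4CT itself.

```python
# A minimal sketch, not the authors' implementation: the kind of cleaning
# DQF4CT guides users through (deduplication, imputation, outlier removal)
# before classification. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score

def clean_for_classification(df: pd.DataFrame, target: str) -> pd.DataFrame:
    df = df.drop_duplicates().dropna(subset=[target])  # labels are not imputed
    num_cols = df.drop(columns=[target]).select_dtypes("number").columns
    # impute missing numeric values with the column median
    df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
    # drop rows with extreme outliers (beyond 3 standard deviations)
    z = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()
    return df[(z.abs() <= 3).all(axis=1)]

df = clean_for_classification(pd.read_csv("pam.csv"), target="activity")
X = df.drop(columns=["activity"]).select_dtypes("number")
print(cross_val_score(RandomForestClassifier(), X, df["activity"], cv=5).mean())
```

    The same cleaned frame can then be fed to whatever classifier the original dataset authors used, which is how the comparison above is set up.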

    Personalizing the web: A tool for empowering end-users to customize the web through browser-side modification

    Web applications delegate the final rendering of their pages to the browser. This permits browser-based transcoding (a.k.a. Web Augmentation) that can be ultimately singularized for each browser installation. This creates an opportunity for Web consumers to customize their Web experiences. This vision requires provisioning adequate tooling that makes Web Augmentation affordable to laymen. We consider this a special class of End-User Development, integrating Web Augmentation paradigms. The dominant paradigms in End-User Development are scripting languages and visual languages. This thesis advocates a Google Chrome browser extension for Web Augmentation, realized through WebMakeup, a visual DSL programming tool that lets end-users customize their own websites. WebMakeup removes, moves and adds web nodes from different web pages in order to reduce tab switching, scrolling, clicking, and cutting and pasting. However, Web Augmentation extensions have difficulty finding web elements after a website update. As a consequence, browser extensions stop working and users might abandon them. This is why two different locators have been implemented with the aim of improving web locator robustness.
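
    WebMakeup itself is a visual tool rather than code, but the idea of pairing two locators can be sketched in a few lines. The following is a minimal illustration, assuming Python with Selenium (not the extension's actual implementation); the selectors and the example URL are hypothetical.

```python
# Illustrative sketch (not WebMakeup's implementation): pairing two locators
# so an augmentation still finds its target node after a site update.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

def locate(driver, xpath, css_fallback):
    """Try a structural XPath first; fall back to an attribute-based CSS
    selector, which often survives layout changes that break the XPath."""
    try:
        return driver.find_element(By.XPATH, xpath)
    except NoSuchElementException:
        return driver.find_element(By.CSS_SELECTOR, css_fallback)

driver = webdriver.Chrome()
driver.get("https://example.com")                    # hypothetical page
node = locate(driver, "/html/body/div[2]/aside", "[data-widget='sidebar']")
driver.execute_script("arguments[0].remove()", node)  # e.g., remove the node
```

    The design point is redundancy: each locator fails under different kinds of page evolution, so keeping both raises the chance that at least one still resolves.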

    How to Address the Data Quality Issues in Regression Models: A Guided Process for Data Cleaning

    Today, data availability has gone from scarce to superabundant. Technologies like IoT, trends in social media, and the capabilities of smartphones are producing and digitizing lots of data that was previously unavailable. This massive increase of data creates opportunities for new business models, but it also demands new techniques and methods of data quality in knowledge discovery, especially when the data comes from different sources (e.g., sensors, social networks, cameras). The conclusions drawn from a dataset depend on the quality of the information it contains, and that quality is increasingly ensured with the aid of data cleaning approaches. Guaranteeing high data quality is therefore considered a primary goal of the data scientist. In this paper, we propose a process for data cleaning in regression models (DC-RM). The proposed data cleaning process is evaluated on real datasets from the UCI Repository of Machine Learning Databases. To assess the data cleaning process, the datasets cleaned by DC-RM were used to train the same regression models proposed by the authors of the UCI datasets. The results achieved by the models trained on the data produced by DC-RM are better than or equal to those presented by the datasets' authors.

    This work has also been supported by the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).
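
    As an illustration of this evaluation setup, here is a minimal sketch, assuming Python with pandas and scikit-learn; the file uci_dataset.csv, the target column, and the single outlier-filtering step standing in for DC-RM's full process are assumptions.

```python
# Hedged sketch of the evaluation described above: train the same regression
# model on raw and cleaned data and compare cross-validated errors.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def rmse(model, X, y):
    # cross-validated RMSE (sklearn reports it negated)
    return -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()

raw = pd.read_csv("uci_dataset.csv").dropna()      # hypothetical file name
num = raw.select_dtypes("number")
z = (num - num.mean()) / num.std()                 # per-column z-scores
cleaned = raw[(z.abs() <= 3).all(axis=1)]          # stand-in for DC-RM's steps
for name, df in [("raw", raw), ("cleaned", cleaned)]:
    X = df.drop(columns=["target"]).select_dtypes("number")
    print(name, rmse(LinearRegression(), X, df["target"]))
```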

    Air Force Institute of Technology Research Report 2018

    This Research Report presents the FY18 research statistics and contributions of the Graduate School of Engineering and Management (EN) at AFIT. AFIT research interests and faculty expertise cover a broad spectrum of technical areas related to USAF needs, as reflected by the range of topics addressed in the faculty and student publications listed in this report. In most cases, the research work reported herein is directly sponsored by one or more USAF or DOD agencies. AFIT welcomes the opportunity to conduct research on additional topics of interest to the USAF, DOD, and other federal organizations when adequate manpower and financial resources are available and/or provided by a sponsor. In addition, AFIT provides research collaboration and technology transfer benefits to the public through Cooperative Research and Development Agreements (CRADAs). Interested individuals may discuss ideas for new research collaborations, potential CRADAs, or research proposals with individual faculty using the contact information in this document.

    Detecting Sarcasm in Multimodal Social Platforms

    Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment. Detection of sarcasm on social media platforms has so far been applied mainly to textual utterances, using lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles or past conversations) to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages in which audiovisual content is integrated with text, making the analysis of any single mode in isolation incomplete. In our work, we first study the relationship between the textual and visual aspects of multimodal posts from three major social media platforms (Instagram, Tumblr and Twitter), and we run a crowdsourcing task to quantify the extent to which images are perceived as necessary by human annotators. We then propose two computational frameworks for sarcasm detection that integrate the textual and visual modalities. The first exploits visual semantics trained on an external dataset and concatenates the semantic features with state-of-the-art textual features. The second adapts a visual neural network initialized with parameters trained on ImageNet to multimodal sarcastic posts. Results show the positive effect of combining modalities for the detection of sarcasm across platforms and methods.

    Comment: 10 pages, 3 figures, final version published in the Proceedings of ACM Multimedia 201
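
    As a rough sketch of the second approach (not the authors' exact architecture), the following assumes PyTorch and torchvision; the ResNet-18 backbone, the 300-dimensional text features, and the binary output head are illustrative assumptions.

```python
# A minimal fusion sketch: an ImageNet-initialized CNN supplies visual
# features that are concatenated with text features before a binary
# sarcasm classifier. All dimensions are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class MultimodalSarcasm(nn.Module):
    def __init__(self, text_dim: int = 300):
        super().__init__()
        cnn = models.resnet18(weights="IMAGENET1K_V1")  # ImageNet init
        cnn.fc = nn.Identity()                          # expose 512-d features
        self.visual = cnn
        self.classifier = nn.Linear(512 + text_dim, 2)  # sarcastic / not

    def forward(self, image, text_feats):
        v = self.visual(image)                          # (batch, 512)
        return self.classifier(torch.cat([v, text_feats], dim=1))

model = MultimodalSarcasm()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 300))
```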

    Large-Scale Analysis of Framework-Specific Exceptions in Android Apps

    Mobile apps have become ubiquitous. For app developers, ensuring their apps' correctness and reliability is a key priority. However, many apps still suffer from occasional to frequent crashes, weakening their competitive edge. Large-scale, deep analyses of the characteristics of real-world app crashes can provide useful insights to guide developers or help improve testing and analysis tools. However, such studies did not exist -- this paper fills that gap. Over a four-month effort, we collected 16,245 unique exception traces from 2,486 open-source Android apps, and observed that framework-specific exceptions account for the majority of these crashes. We then extensively investigated the 8,243 framework-specific exceptions (an effort of six person-months): (1) identifying their characteristics (e.g., manifestation locations, common fault categories), (2) evaluating their manifestation via state-of-the-art bug detection techniques, and (3) reviewing their fixes. Besides the insights they provide, these findings motivate and enable follow-up research on mobile apps, such as bug detection, fault localization and patch generation. To demonstrate the utility of our findings, we optimized Stoat, a dynamic testing tool, and implemented ExLocator, an exception localization tool, for Android apps. Stoat quickly uncovered three previously unknown, confirmed/fixed crashes in Gmail and Google+; ExLocator precisely locates the root causes of identified exceptions in real-world apps. Our substantial dataset is made publicly available to share with and benefit the community.

    Comment: ICSE'18: the 40th International Conference on Software Engineering
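
    As an illustration of how such traces can be classified, the following is a hypothetical Python sketch; the simplified trace format and the package-prefix heuristic are assumptions for illustration, not the paper's actual tooling.

```python
# Hypothetical sketch of the study's first analysis step: deciding whether a
# crash trace is framework-specific by checking whether its topmost stack
# frame lies in an Android framework package.
import re
from collections import Counter

FRAMEWORK_PREFIXES = ("android.", "androidx.", "com.android.", "java.")

def is_framework_specific(trace: str) -> bool:
    # the topmost "at package.Class.method(" frame decides the crash origin
    m = re.search(r"^\s*at\s+([\w.$]+)\(", trace, re.MULTILINE)
    return bool(m) and m.group(1).startswith(FRAMEWORK_PREFIXES)

def exception_type(trace: str) -> str:
    return trace.splitlines()[0].split(":")[0].strip()

sample = ("java.lang.NullPointerException: Attempt to invoke virtual method\n"
          "    at android.view.View.performClick(View.java:7448)\n"
          "    at com.example.app.MainActivity.onCreate(MainActivity.java:42)")
counts = Counter(exception_type(t) for t in [sample] if is_framework_specific(t))
print(counts.most_common())   # [('java.lang.NullPointerException', 1)]
```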

    A Systematic Survey of ML Datasets for Prime CV Research Areas-Media and Metadata

    The ever-growing capabilities of computers have enabled pursuing Computer Vision through Machine Learning (MLCV). ML tools require large amounts of information to learn from (ML datasets). These are costly to produce but have received little attention regarding standardization, which prevents the cooperative production and exploitation of these resources, impedes countless synergies, and hinders ML research. No global view of the MLCV dataset tissue exists, yet acquiring one is fundamental to enable standardization. We provide an extensive survey of the evolution and current state of MLCV datasets (1994 to 2019) for a set of specific CV areas, together with a quantitative and qualitative analysis of the results. Data were gathered from online scientific databases (e.g., Google Scholar, CiteSeerX). We reveal the heterogeneous plethora that comprises the MLCV dataset tissue; its continuous growth in volume and complexity; the specificities of the evolution of its media and metadata components across a range of aspects; and that MLCV progress requires the construction of a global, standardized MLCV "library" for structuring, manipulating, and sharing datasets. Accordingly, we formulate a novel interpretation of this dataset collective as a global tissue of synthetic cognitive visual memories and define the immediately necessary steps to advance its standardization and integration.
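
    As one possible reading of what a standardized record in such a library might look like, the following Python sketch defines an illustrative metadata schema; all fields and the example values are assumptions, not a published standard.

```python
# A minimal, shareable metadata record per dataset, as an illustration of
# the standardized "library" the survey argues for. Fields are assumptions.
from dataclasses import dataclass, field

@dataclass
class MLCVDatasetRecord:
    name: str
    year: int
    cv_area: str                          # e.g., "image classification"
    media_types: list[str]                # e.g., ["image"], ["video"]
    approx_samples: int                   # approximate count
    annotation_format: str                # e.g., "class labels", "masks"
    source_url: str = ""
    licenses: list[str] = field(default_factory=list)

record = MLCVDatasetRecord("ImageNet", 2009, "image classification",
                           ["image"], 14_000_000, "class labels",
                           "https://www.image-net.org")
```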

    Trustworthiness in Mobile Cyber Physical Systems

    Computing and communication capabilities are increasingly embedded in diverse objects and structures in the physical environment, linking the 'cyberworld' of computing and communications with the physical world. These applications are called cyber physical systems (CPS). The increased involvement of real-world entities leads to a greater demand for trustworthy systems. Hence, we use "system trustworthiness" here to mean the ability to guarantee continuous service in the presence of internal errors or external attacks. Mobile CPS (MCPS) is a prominent subcategory of CPS in which the physical component has no permanent location. Mobile Internet devices already provide ubiquitous platforms for building novel MCPS applications. The objective of this Special Issue is to contribute to research on modern and future trustworthy MCPS, including their design, modeling, simulation, and dependability. It is imperative to address the issues critical to their mobility, report significant advances in the underlying science, and discuss the challenges of development and implementation in various MCPS applications.