572 research outputs found

    An Exploration of Representation Learning and Sequential Modeling Approaches for Supervised Topic Classification in Job Advertisements

    This thesis borrows the exploratory double diamond design process to iteratively frame a research problem applicable in the context of a recruitment web service and then find the best approach to solve it. The focus is on multi-class classification, in particular the task of labelling sentences in job advertisements with one of six topics that were found to be covered in every typical job description. A dataset is obtained for evaluation, and conventional N-gram vector space models are compared with Representation Learning approaches, notably continuous distributed representations, and Sequential Modeling techniques using Recurrent Neural Networks. The experiments show that the Representation Learning and Sequential Modeling approaches perform on par with or better than traditional feature engineering methods and point to a promising direction in and beyond research in Computational Linguistics and Natural Language Processing.

    ACP Dashboard: an interactive visualization tool for selecting analytics configurations in an industrial setting

    The production process in a factory can be described by a large amount of data. This data is used to optimize the production process, reduce the number of failures, and control material waste. To this end, the data is processed, analyzed, and classified using analysis techniques, here text classification algorithms. An approach is therefore needed that supports the choice of algorithms at both the technical and the management level. We propose a tool called Analytics Configuration Performance Dashboard, which facilitates the comparison of algorithm configurations. It is based on a meta-learning approach. Additionally, we introduce three business metrics on which algorithms are compared; they map onto machine learning evaluation metrics and help to assess algorithms from an industry perspective. Moreover, we develop a visualization to provide a clear representation of the data. Clustering is used to define groups of algorithms that perform similarly on the business metrics. We conclude with an evaluation of the proposed approach and of the techniques chosen for its implementation.

    shinyReCoR. A shiny application for automatically coding text responses using R

    In this paper, we introduce shinyReCoR: a new app that utilizes a cluster-based method for automatically coding open-ended text responses. Reliable coding of text responses from educational or psychological assessments requires substantial organizational and human effort. The coding of natural language in responses to tests depends on the texts' complexity, corresponding coding guides, and the guides' quality. Manual coding is thus not only expensive but also error-prone. With shinyReCoR, we provide a more efficient alternative. The use of natural language processing makes texts utilizable for statistical methods. shinyReCoR is a Shiny app deployed as an R-package that allows users with varying technical affinity to create automatic response classifiers through a graphical user interface based on annotated data. The present paper describes the underlying methodology, including machine learning, as well as peculiarities of the processing of language in the assessment context. The app guides users through the workflow with steps like text corpus compilation, semantic space building, preprocessing of the text data, and clustering. Users can adjust each step according to their needs. Finally, users are provided with an automatic response classifier, which can be evaluated and tested within the process. (DIPF/Orig.)

    Looking Beyond Appearances: Synthetic Training Data for Deep CNNs in Re-identification

    Re-identification is generally carried out by encoding the appearance of a subject in terms of outfit, which presupposes scenarios where people do not change their attire. In this paper we overcome this restriction by proposing a framework based on a deep convolutional neural network, SOMAnet, that additionally models other discriminative aspects, namely structural attributes of the human figure (e.g. height, obesity, gender). Our method is unique in many respects. First, SOMAnet is based on the Inception architecture, departing from the usual siamese framework. This spares expensive data preparation (pairing images across cameras) and allows the understanding of what the network learned. Second, and most notably, the training data consists of a synthetic 100K instance dataset, SOMAset, created by photorealistic human body generation software. Synthetic data represents a good compromise between realistic imagery, usually not required in re-identification since surveillance cameras capture low-resolution silhouettes, and complete control of the samples, which is useful in order to customize the data w.r.t. the surveillance scenario at hand, e.g. ethnicity. SOMAnet, trained on SOMAset and fine-tuned on recent re-identification benchmarks, outperforms all competitors, matching subjects even with different apparel. The combination of synthetic data with Inception architectures opens up new research avenues in re-identification.

    MevaL: A Visual Machine Learning Model Evaluation Tool for Financial Crime Detection

    Data Science and Machine Learning are two valuable allies to fight financial crime, the domain where Feedzai seeks to leverage its value proposition in support of its mission: to make banking and commerce safe. Data is at the core of both fields and this domain, so structuring instances for visual consumption provides an effective way of understanding the data and communicating insights. The development of a solution for each project and use case requires a careful and effective Machine Learning Model Evaluation stage, as it is the major source of feedback before deployment. The tooling for this stage available at Feedzai can be improved, accelerated, visually supported, and diversified to enable data scientists to boost their daily work and the quality of the models. In this work, I propose to collect and compile internal and external input, in terms of workflow and Model Evaluation, in a proposal hierarchically segmented by well-defined objectives and tasks, to instantiate the proposal in a Python package, and to iteratively validate the package with Feedzai’s data scientists. Therefore, the first contribution is MevaL, a Python package for Model Evaluation with visual support, integrated into Feedzai’s Data Science environment by design. In fact, MevaL is already being leveraged as a visualization package on two internal reporting projects that are serving some of Feedzai’s major clients. In addition to MevaL, the second contribution of this work is the Model Evaluation Topology developed to ensure clear communication and design of features.
Data Science and Machine Learning are two valuable allies in the fight against economic and financial crime, the domain in which Feedzai seeks to leverage its value proposition in support of its mission: to make banking and commerce safe. Moreover, data is at the core of both fields and of this domain. Structuring it visually therefore provides an effective way of understanding it and conveying information. The development of a solution for each project and use case requires a careful and effective Machine Learning Model Evaluation stage, as this stage is the main source of feedback before the solution is deployed. The Model Evaluation tools available at Feedzai can be improved, accelerated, visually supported, and diversified to enable data scientists to boost their daily work and the quality of these models. In this work, I propose to collect and compile internal and external input, in terms of workflow and Model Evaluation, in a proposal hierarchically segmented by well-defined objectives and tasks, to instantiate this proposal in a Python package, and to validate this package iteratively in collaboration with Feedzai’s data scientists. Accordingly, the first contribution of this work is MevaL, a Python package for Model Evaluation with visual support, integrated into Feedzai’s Data Science environment. In fact, MevaL is already being used as a visualization package in two internal projects that prepare automatic reports for some of Feedzai’s major clients. In addition to MevaL, the second contribution of this work is the Model Evaluation Topology, developed to ensure clear communication and a well-framed design of the different features.

    Proficiency-aware systems

    In an increasingly digital world, technological developments such as data-driven algorithms and context-aware applications create opportunities for novel human-computer interaction (HCI). We argue that these systems have the latent potential to stimulate users and encourage personal growth. However, users increasingly rely on the intelligence of interactive systems. Thus, it remains a challenge to design for proficiency awareness, essentially demanding increased user attention whilst preserving user engagement. Designing and implementing systems that allow users to become aware of their own proficiency and encourage them to recognize learning benefits is the primary goal of this research. In this thesis, we introduce the concept of proficiency-aware systems as one solution. In our definition, proficiency-aware systems use estimates of the user's proficiency to tailor the interaction in a domain and facilitate a reflective understanding of this proficiency. We envision that proficiency-aware systems leverage collected data for learning benefit. Here, we see self-reflection as key for users to become aware of the efforts necessary to advance their proficiency. A key challenge for proficiency-aware systems is the fact that users often have a different self-perception of their proficiency. The benefits of personal growth and advancing one's repertoire might not necessarily be apparent to users, alienating them, and possibly leading to abandoning the system. To tackle this challenge, this work does not rely on learning strategies but rather focuses on the capabilities of interactive systems to provide users with the necessary means to reflect on their proficiency, such as showing calculated text difficulty to a newspaper editor or visualizing muscle activity to a passionate sportsperson. We first elaborate on how proficiency can be detected and quantified in the context of interactive systems using physiological sensing technologies.
Through developing interaction scenarios, we demonstrate the feasibility of gaze- and electromyography-based proficiency-aware systems by utilizing machine learning algorithms that can estimate users' proficiency levels for stationary vision-dominant tasks (reading, information intake) and dynamic manual tasks (playing instruments, fitness exercises). Secondly, we show how to facilitate proficiency awareness for users, including design challenges on when and how to communicate proficiency. We complement this second part by highlighting the necessity of toolkits for sensing modalities to enable the implementation of proficiency-aware systems for a wide audience. In this thesis, we contribute a definition of proficiency-aware systems, which we illustrate by designing and implementing interactive systems. We derive technical requirements for real-time, objective proficiency assessment and identify design qualities of communicating proficiency through user reflection. We summarize our findings in a set of design and engineering guidelines for proficiency awareness in interactive systems, highlighting that proficiency feedback makes performance interpretable for the user.
In an increasingly digital world, technological developments such as data-driven algorithms and context-aware applications create novel ways of interacting with digital devices. However, users often rely on the intelligence of these systems without working toward their own personal development. Pursuing such development demands increased attention from users. It is therefore challenging to establish a suitable design for proficiency awareness. The primary goal of this work is to establish a methodology for designing and implementing interactive systems that support users in reflecting on their own proficiency and thereby implicitly perceiving learning effects. This work presents a concept for proficiency-aware systems, which estimate users' abilities, adapt the interaction accordingly, and foster users' awareness of their abilities. To this end, the systems should use data collected from users to make learning effects visible. The opportunity for users to self-reflect is regarded as decisive here, serving as motivation to improve their own abilities. A central challenge for such systems is that users often have a self-perception of their proficiency that diverges from the system's estimate. At first, the benefits of personal development are therefore not necessarily apparent. For this reason, this research does not build on teaching users predefined learning strategies; instead, it draws on the capabilities of interactive systems to give users the means to reflect on their abilities themselves. For example, a newspaper editor could be shown the current text difficulty, while a passionate athlete could be shown a visualization of their muscle activity. First, we work out how users' abilities can be detected and quantified using physiological sensing technologies. The evaluation of interaction scenarios demonstrates the feasibility of proficiency-aware systems based on the analysis of eye movements and muscle activity. Machine learning algorithms are used to compute users' performance levels for various activities.
In particular, we analyze stationary activities that mainly engage vision (reading, information intake) as well as dynamic activities that demand users' motor skills (playing instruments, fitness exercises). The second part shows how systems can foster users' awareness of their own abilities, including the design challenges of when and how the system should communicate detected abilities. Finally, the necessity of toolkits for sensing technologies is highlighted, to enable the implementation of such systems for a wide audience. The research contributes a definition of proficiency-aware systems and illustrates this concept through the design and implementation of interactive systems. Furthermore, technical requirements for objective, real-time estimation of user abilities are investigated, and design qualities for communicating these estimates through self-reflection are identified. The findings are summarized in a set of design and engineering guidelines for such systems. In particular, communicating the proficiency detected by the system helps users interpret their own performance.

    BeSocratic: An Intelligent Tutoring System for the Recognition, Evaluation, and Analysis of Free-form Student Input

    This dissertation describes a novel intelligent tutoring system, BeSocratic, which aims to help fill the gap between simple multiple-choice systems and free-response systems. BeSocratic targets questions that are free-form in nature yet defined well enough to allow automatic evaluation and analysis. The system includes a set of modules which provide instructors with tools to assess student performance. Beyond text boxes and multiple-choice questions, BeSocratic contains several modules that recognize, evaluate, provide feedback on, and analyze student-drawn structures, including Euclidean graphs, chemistry molecules, computer science graphs, and simple drawings. Our system uses a visual, rule-based authoring system which enables the creation of activities for use within science, technology, engineering, and mathematics classrooms. BeSocratic records each action that students make within the system. Using a set of post-analysis tools, teachers can examine both individual and group performances. We accomplish this using hidden Markov model-based clustering techniques and visualizations. These visualizations can help teachers quickly identify common strategies and errors for large groups of students. Furthermore, analysis results can be used directly to improve activities through advanced detection of student errors and refined feedback. BeSocratic activities have been created and tested at several universities. We report specific results from several activities, and discuss how BeSocratic's analysis tools are being used with data from other systems. We specifically detail two chemistry activities and one computer science activity: (1) an activity focused on improving mechanism use, (2) an activity which assesses student understanding of Gibbs energy, and (3) an activity which teaches students the fundamentals of splay trees.
In addition to analyzing data collected from students within BeSocratic, we share our visualizations and results from analyzing data gathered with another educational system, PhET.