1,594 research outputs found

    Term-driven E-Commerce

    This work addresses the textual dimension of e-commerce. Its underlying hypothesis is that information and transactions in electronic commerce are bound to text. Wherever products and services are offered, requested, perceived, and evaluated, natural-language expressions come into play. It follows, first, that it is important to capture the variance of textual descriptions in e-commerce and, second, that the extensive textual resources generated by e-commerce interactions can be drawn on to gain a better understanding of natural language.

    Usability evaluation of digital libraries: a tutorial

    This one-day tutorial is an introduction to usability evaluation for digital libraries. In particular, we will introduce Claims Analysis. This approach focuses on the designers’ motivations and reasons for making particular design decisions and examines their effect on the user’s interaction with the system. The general approach, as presented by Carroll and Rosson (1992), has been tailored specifically to the design of digital libraries. Digital libraries are notoriously difficult to design well in terms of their eventual usability. In this tutorial, we will present an overview of usability issues and techniques for digital libraries, and a more detailed account of claims analysis, including two supporting techniques: simple cognitive analysis based on Norman’s ‘action cycle’, and scenarios and personas. Through a graduated series of worked examples, participants will get hands-on experience of applying this approach to developing more usable digital libraries. This tutorial assumes no prior knowledge of usability evaluation and is aimed at all those involved in the development and deployment of digital libraries.

    Usage and usability assessment: library practices and concerns

    This report offers a survey of the methods that are being deployed at leading digital libraries to assess the use and usability of their online collections and services. Focusing on 24 Digital Library Federation member libraries, the study's author, Distinguished DLF Fellow Denise Troll Covey, conducted numerous interviews with library professionals who are engaged in assessment. The report describes the application, strengths, and weaknesses of assessment techniques that include surveys, focus groups, user protocols, and transaction log analysis. Covey's work is also an essential methodological guidebook. For each method that she covers, she is careful to supply a definition and to explain why and how libraries use the method, what they do with the results, and what problems they encounter. The report includes an extensive bibliography of more detailed methodological information, as well as descriptions of assessment instruments that have proved particularly effective.

    Vector representation of Internet domain names using Word embedding techniques

    Word embeddings are a well-known set of techniques widely used in natural language processing (NLP). This thesis explores the use of word embeddings in a new scenario. A vector space model (VSM) for Internet domain names (DNS) is created by taking core ideas from NLP techniques and applying them to real, anonymized DNS query logs from a large Internet Service Provider (ISP). The main goal is to find semantically similar domains using only information from DNS queries, without any other knowledge about the content of those domains. A detailed preprocessing pipeline of eight specific transformation steps is defined to recast the original problem as a problem in the NLP field. Once the preprocessing pipeline is applied and the DNS log files are transformed into a standard text corpus, we show that state-of-the-art word embedding techniques can be successfully applied to build what we call a DNS-VSM (a vector space model for Internet domain names). Several word embedding techniques are evaluated in this work: Word2Vec (with Skip-Gram and CBOW architectures), App2Vec (with a CBOW architecture and time gaps added between DNS queries), and FastText (which includes sub-word information). The results are compared using various metrics from Information Retrieval theory, and the quality of the learned vectors is validated against a third-party source, namely the similar-sites service offered by Alexa Internet, Inc. Due to intrinsic characteristics of domain names, we found that FastText is the best option for building a vector space model for DNS. Furthermore, its performance (considering the top 3 most similar learned vectors for each domain) is compared against two baseline methods, Random Guessing (returning any domain name from the dataset at random) and Zero Rule (always returning the same most popular domains), and it outperforms both considerably.
The results presented in this work can be useful in many engineering activities, with practical applications in many areas. Some examples include website recommendations based on similar sites, competitive analysis, identification of fraudulent or risky sites, parental-control systems, UX improvements (based on recommendations, spell correction, etc.), click-stream analysis, representation and clustering of users' navigation profiles, and optimization of cache systems in recursive DNS resolvers, among others. Finally, as a contribution to the research community, a set of DNS-VSM vectors trained on a dataset similar to the one used in this thesis is released and made available for download through the GitHub page in [1]. We hope that these vectors will enable further work and research.
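    The abstract names the stages of the DNS-VSM approach but not their mechanics. The following is a minimal sketch of two of those stages, under loudly stated assumptions: log records use a hypothetical (timestamp, client_id, domain) format, queries are grouped into per-client "sentences" split on an assumed 60-second gap (loosely echoing App2Vec's use of time gaps), and top-k neighbours are scored against the Random Guessing and Zero Rule baselines. The vectors are toy stand-ins for what a Word2Vec or FastText model would learn, the "hit rate at k" metric is an illustrative choice rather than one of the thesis's IR metrics, and every function name is hypothetical. The thesis's eight preprocessing steps are not reproduced.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical log records: (timestamp_seconds, client_id, queried_domain).
# Sketch only: NOT the thesis's eight-step preprocessing pipeline.
def logs_to_sentences(records, max_gap=60.0):
    """Group queries per client in time order; start a new "sentence" when two
    consecutive queries are more than max_gap seconds apart (assumed value)."""
    by_client = defaultdict(list)
    for ts, client, domain in records:
        # Normalize: lower-case and drop a trailing root dot.
        by_client[client].append((ts, domain.lower().rstrip(".")))
    sentences = []
    for client in sorted(by_client):
        current, last_ts = [], None
        for ts, domain in sorted(by_client[client]):
            if last_ts is not None and ts - last_ts > max_gap:
                sentences.append(current)
                current = []
            current.append(domain)
            last_ts = ts
        if current:
            sentences.append(current)
    return sentences

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_similar(vectors, domain, k=3):
    """Return the k other domains with the highest cosine similarity."""
    query = vectors[domain]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != domain]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return [other for _, other in scored[:k]]

def zero_rule(popular):
    """Baseline: always predict the most popular domains (excluding the query)."""
    return lambda domain, k: [d for d in popular if d != domain][:k]

def random_guess(all_domains, seed=0):
    """Baseline: predict k distinct domains chosen at random."""
    rng = random.Random(seed)
    def predict(domain, k):
        return rng.sample([d for d in all_domains if d != domain], k)
    return predict

def hit_rate(predict, reference, k=3):
    """Illustrative metric: share of domains with at least one of the k
    predictions in their reference set of similar domains."""
    hits = sum(1 for d, similar in reference.items() if set(predict(d, k)) & similar)
    return hits / len(reference)

records = [
    (0, "c1", "news.example."), (10, "c1", "sports.example"),
    (200, "c1", "shop.example"), (5, "c2", "shop.example"),
    (15, "c2", "deals.example"),
]
sentences = logs_to_sentences(records)
# Toy 2-d vectors standing in for embeddings trained on `sentences`.
vectors = {
    "news.example": [1.0, 0.1], "sports.example": [0.9, 0.2],
    "shop.example": [0.1, 1.0], "deals.example": [0.2, 0.9],
}
# Hypothetical ground truth, in the spirit of a third-party similar-sites list.
reference = {"news.example": {"sports.example"}, "shop.example": {"deals.example"}}
popular = [d for d, _ in Counter(w for s in sentences for w in s).most_common()]

print(hit_rate(lambda d, k: top_k_similar(vectors, d, k), reference, k=1))  # 1.0
print(hit_rate(zero_rule(popular), reference, k=1))                         # 0.0
print(hit_rate(random_guess(list(vectors)), reference, k=1))
```

    In practice the `sentences` list would be fed to an embedding trainer; FastText's sub-word n-grams are what the abstract credits for its advantage on domain names, and that training step is deliberately left out of this sketch.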

    Using laptop computers to develop basic skills: a handbook for practitioners


    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for performing data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media, and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential for cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for a given domain in other domains.
    Comment: Knowledge-Based Systems

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, namely technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as coordinators of national initiatives. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.

    Individual Differences and Instructed Second Language Acquisition: Insights from Intelligent Computer Assisted Language Learning

    The present dissertation focuses on the role of cognitive individual difference factors in the acquisition of second language vocabulary in the context of intelligent computer assisted language learning (ICALL). The aim was to examine the association of working memory and declarative memory with the learning of English phrasal verbs in a web-based, ICALL-mediated experiment. Following a pretest-posttest design, 127 adult learners of English were assigned to two instructional conditions, namely meaning-focused and form-focused conditions. Learners in both conditions read news texts on the web for about two weeks; learners in the form-focused condition additionally interacted with the texts by selecting multiple-choice options. The results showed that both working memory and declarative memory were predictive of vocabulary acquisition. However, only the working memory effect was modulated by the instructional context, with the effect being found exclusively in the form-focused condition, thus suggesting the presence of an aptitude-treatment interaction. Finally, the findings also revealed that learning during treatment in the form-focused group was nonlinear, and that paying attention to form and meaning simultaneously impeded global reading comprehension for intermediate but not advanced learners. From a theoretical perspective, the findings provide evidence that individual differences in both working memory and declarative memory affect the acquisition of lexical knowledge in ICALL-supported contexts. Methodologically, the current study illustrates the advantages of interdisciplinary work between ICALL and second language acquisition, which allows experimental data to be collected through a web-based, all-encompassing ICALL system. Overall, the present dissertation represents an initial attempt at characterizing who is likely to benefit from ICALL-based interventions.