153 research outputs found

    Data Cleaning Methods for Client and Proxy Logs

    In this paper we present our experiences with the cleaning of Web client and proxy usage logs, based on a long-term browsing study with 25 participants. A detailed clickstream log, recorded using a Web intermediary, was combined with a second log of user interface actions, which was captured by a modified Firefox browser for a subset of the participants. The consolidated data from both records revealed many page requests that were not directly related to user actions. For participants who had no ad-filtering system installed, these artifacts made up one third of all transferred Web pages. Three major causes could be identified: HTML frames and iframes, advertisements, and automatic page reloads. The experience gained during the data cleaning process may help other researchers choose adequate filtering methods for their data.
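    A minimal sketch of the kind of filtering the paper motivates (my illustration, not the authors' actual pipeline): drop frame/iframe sub-requests, requests to known ad hosts, and rapid automatic reloads of the same URL. The log schema, ad-host list, and reload window are all assumptions.

```python
# Hypothetical filtering sketch; entry schema and thresholds are assumptions.
AD_HOSTS = {"ads.example.com", "tracker.example.net"}  # placeholder blocklist
RELOAD_WINDOW_S = 2.0  # repeats of a URL within this window count as auto-reloads

def clean(entries):
    """entries: dicts with 'url', 'host', 'timestamp' (s), 'is_frame' keys."""
    last_seen = {}
    kept = []
    for e in sorted(entries, key=lambda e: e["timestamp"]):
        if e["is_frame"]:                  # frame/iframe sub-request
            continue
        if e["host"] in AD_HOSTS:          # advertisement
            continue
        prev = last_seen.get(e["url"])
        last_seen[e["url"]] = e["timestamp"]
        if prev is not None and e["timestamp"] - prev < RELOAD_WINDOW_S:
            continue                       # automatic page reload
        kept.append(e)
    return kept

log = [
    {"url": "/news", "host": "example.org", "timestamp": 0.0, "is_frame": False},
    {"url": "/news", "host": "example.org", "timestamp": 0.5, "is_frame": False},
    {"url": "/ad",   "host": "ads.example.com", "timestamp": 0.6, "is_frame": False},
]
print(len(clean(log)))  # 1: the reload and the ad request are dropped
```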

    Off the Beaten Tracks: Exploring Three Aspects of Web Navigation

    This paper presents results of a long-term client-side Web usage study, updating previous studies that range in age from five to ten years. We focus on three aspects of Web navigation: changes in the distribution of navigation actions, the speed of navigation, and within-page navigation. “Navigation actions” corresponding to users’ individual page requests are discussed by type. We reconfirm that links are the most important navigation element, while backtracking has lost more than half of its previously reported share and form submission has become far more common. Changes to the Web and to browser interfaces are likely causes of these shifts. Analyzing the time users stayed on pages, we confirm Web navigation to be a rapidly interactive activity. A breakdown of page characteristics shows that users often do not take the time to read the available text or consider all links. The performance of the Web is analyzed and reassessed against the resulting requirements. Finally, habits of within-page navigation are presented. Although most selected hyperlinks are located in the top left corner of the screen, in nearly a quarter of all cases people choose links that require scrolling. We analyzed the available browser real estate to gain insights for the design of non-scrolling Web pages.
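    A hedged sketch of the stay-time computation the speed-of-navigation analysis implies: the time spent on a page is approximated by the gap between consecutive page requests in a timestamped clickstream. The log format here is my assumption.

```python
# Illustrative only; real studies must also handle session breaks and tabs.
def stay_times(clickstream):
    """clickstream: list of (timestamp_seconds, url) pairs in time order."""
    return [t1 - t0 for (t0, _), (t1, _) in zip(clickstream, clickstream[1:])]

log = [(0.0, "/home"), (4.0, "/search"), (4.5, "/result")]
print(stay_times(log))  # [4.0, 0.5] -- short stays signal rapid navigation
```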

    Enhancing Usability Evaluation of Web-Based Geographic Information Systems (WebGIS) with Visual Analytics

    Many websites nowadays incorporate geospatial data that users interact with, for example, to filter search results or compare alternatives. These web-based geographic information systems (WebGIS) pose new challenges for usability evaluations, as both the interaction with classic interface elements and the interaction with map-based visualizations have to be analyzed to understand user behavior. This paper proposes a new scalable approach that applies visual analytics to logged interaction data from WebGIS, which facilitates the interactive exploration and analysis of user behavior. In order to evaluate our approach, we implemented it as a toolkit that can be easily integrated into existing WebGIS. We then deployed the toolkit in a user study (N=60) with a realistic WebGIS and analyzed users' interaction in a second study with usability experts (N=7). Our results indicate that the proposed approach is practically feasible, easy to integrate into existing systems, and facilitates insights into the usability of WebGIS.
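    A minimal sketch (with an assumed event schema, not the toolkit's real log format) of the kind of aggregation a visual-analytics view could be fed with: counting logged WebGIS interactions per target and per event type, to see how users split attention between the map and classic UI elements.

```python
# Event fields and values are assumptions for illustration.
from collections import Counter

events = [
    {"target": "map",    "type": "pan"},
    {"target": "map",    "type": "zoom"},
    {"target": "filter", "type": "click"},
    {"target": "map",    "type": "pan"},
]

by_target = Counter(e["target"] for e in events)
by_type = Counter(e["type"] for e in events)
print(by_target)  # Counter({'map': 3, 'filter': 1})
print(by_type)    # Counter({'pan': 2, 'zoom': 1, 'click': 1})
```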

    Evaluating the Readability of Scientific Web Pages Using Intelligent Analysis Tools

    The World Wide Web (WWW) is a primary resource of information. With the growth of the WWW, it is no longer enough to have a web presence; the website must also be highly readable. Scientific web pages include text, tables, graphs, charts, images, and mathematical formulae that are difficult to present in a legible manner. This study involved creating a sample scientific website. Participants browsed the website and answered a survey questionnaire. The survey responses were analyzed using data mining techniques from SAS Enterprise Miner to determine the main factors affecting the readability of the website.

    Usability Tool Support for Model-Based Web Development

    When web engineering methods are used for the development of web applications, models that describe the website are created during the development process. Using the information present in these models, it is possible to create usability tool support that is more advanced than current approaches, which do not rely on the presence of models. This dissertation presents ideas for tool support during different phases of the development, such as the implementation phase or the testing phase. For example, if a tool knows from a model that the audience of a website is teenagers, it can examine whether the words and sentences used on the website are likely to be understood by teenagers. An approach is presented to augment existing web engineering models with this additional information ("age" in this case) and to make it available to tools, e.g., by embedding it in HTML code. Two prototypes demonstrate the concepts for integrating usability tool support into web engineering.
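    To make the embedding idea concrete, here is a minimal sketch (my illustration, not the dissertation's tool): audience metadata from a model is embedded in the HTML via a data-* attribute, and a checker reads it back and flags words unlikely to suit that audience. The attribute name and the word list are assumptions.

```python
# Hypothetical audience-aware readability check; lexicon is a placeholder.
from html.parser import HTMLParser

HARD_WORDS = {"paradigm", "heuristic", "orthogonal"}  # assumed "hard" lexicon

class AudienceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.audience = None
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "data-audience" in attrs:          # model info embedded in HTML
            self.audience = attrs["data-audience"]

    def handle_data(self, data):
        if self.audience == "teenagers":      # audience-specific rule
            self.flagged += [w for w in data.split()
                             if w.strip(".,!?").lower() in HARD_WORDS]

page = '<body data-audience="teenagers"><p>An orthogonal heuristic.</p></body>'
checker = AudienceChecker()
checker.feed(page)
print(checker.flagged)  # ['orthogonal', 'heuristic.']
```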

    The Wooster Voice (Wooster, OH), 2006-11-17

    This edition of the College of Wooster's student-run newspaper was published on November 17, 2006, and is eight pages long. Ralph Kuncl, the first candidate who may take over after President Stanton Hales' retirement, visited campus this past Monday. The Ebert Art Center had 35 covers of the Alumni Magazine on display from November 4 to 11 to celebrate the magazine's 120th year. Online registration was a success this year, with very few problems and very few calls. This Friday, the C.O.W. Country Fair will be held in Lowry as part of International Education Week. Page four has an article about the student comedy 'Don't Throw Shoes.' The athletic updates for the week are on pages seven and eight.

    Privacy-preserving machine learning system at the edge

    Data privacy in machine learning has become an urgent problem to be solved, along with machine learning's rapid development and the large attack surface being explored. Pre-trained deep neural networks are increasingly deployed in smartphones and other edge devices for a variety of applications, leading to potential disclosures of private information. In collaborative learning, participants keep private data locally and communicate deep neural networks updated on their local data, but the private information encoded in the networks' gradients can still be exposed to adversaries. This dissertation aims to perform dedicated investigations on privacy leakage from neural networks and to propose privacy-preserving machine learning systems for edge devices. Firstly, a systematization of knowledge is conducted to identify the key challenges and existing/adaptable solutions. Then a framework is proposed to measure the amount of sensitive information memorized in each layer's weights of a neural network, based on the generalization error. Results show that, when considered individually, the last layers encode a larger amount of information from the training data than the first layers. To protect such sensitive information in weights, DarkneTZ is proposed: a framework that uses an edge device's Trusted Execution Environment (TEE) in conjunction with model partitioning to limit the attack surface against neural networks. The performance of DarkneTZ is evaluated, including CPU execution time, memory usage, and power consumption, using two small and six large image classification models. Because of the limited memory of the edge device's TEE, model layers are partitioned into more sensitive layers (to be executed inside the device's TEE) and a set of layers to be executed in the untrusted part of the operating system. Results show that hiding even a single layer can provide reliable model privacy and defend against state-of-the-art membership inference attacks, with only a 3% performance overhead. This thesis then extends the investigation from neural network weights (in on-device machine learning deployment) to gradients (in collaborative learning). An information-theoretical framework is proposed, by adapting usable information theory and treating the attack outcome as a probability measure, to quantify private information leakage from network gradients. The private original information and latent information are localized in a layer-wise manner. After that, a sensitivity analysis over the gradients with respect to private information is performed to further explore the underlying cause of information leakage. Numerical evaluations are conducted on six benchmark datasets and four well-known networks, further measuring the impact of training hyper-parameters and defense mechanisms. Last but not least, to limit the privacy leakage in gradients, I propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems. TEEs are utilized on clients for local training, and on servers for secure aggregation, so that model/gradient updates are hidden from adversaries. This work leverages greedy layer-wise training to train each model layer inside the trusted area until it converges. The performance evaluation of the implementation shows that PPFL significantly improves privacy by defending against data reconstruction, property inference, and membership inference attacks, while incurring small communication and client-side system overheads.
    This thesis offers a better understanding of the sources of private information in machine learning and provides frameworks that guarantee privacy while achieving model utility and system overhead comparable to regular machine learning frameworks.
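    To make the model-partitioning idea concrete, here is a minimal sketch (my illustration, not DarkneTZ's actual implementation) of splitting a network so that only its most sensitive last layers would run inside a TEE. PyTorch, the architecture, and the layer counts are assumptions.

```python
# Hypothetical TEE-oriented partitioning sketch, not DarkneTZ code.
import torch
import torch.nn as nn

def partition(model: nn.Sequential, n_trusted: int):
    """Return (untrusted, trusted) halves; the trusted tail holds the last layers."""
    layers = list(model.children())
    return nn.Sequential(*layers[:-n_trusted]), nn.Sequential(*layers[-n_trusted:])

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
untrusted, trusted = partition(model, n_trusted=1)  # hide only the final layer

x = torch.randn(1, 1, 28, 28)   # dummy MNIST-sized input
h = untrusted(x)                # would run in the untrusted part of the OS
logits = trusted(h)             # would run inside the device's TEE
print(logits.shape)             # torch.Size([1, 10])
```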

    Web browsing interactions inferred from a flow-level perspective

    Since its use became widespread during the mid-1990s, the web has probably been the most popular Internet service. In fact, for many lay users, the web is almost a synonym for the Internet. Web users today access it from a myriad of different devices, from traditional computers to smartphones, tablets, ebook readers and even smart watches. Moreover, users have become accustomed to accessing multiple different services through their web browsers instead of through dedicated applications. This is the case, for example, of e-mail, video-streaming or office suites (such as the one provided by Google Docs). As a consequence, web traffic nowadays is complex and its effect on the networks is very important. The scientific community has reacted to this by providing many works that characterize the web and its traffic and propose ways of improving its operation. Nevertheless, studies focused on web traffic have often considered the traffic of web clients or servers as a whole in order to describe their particular performance, or have delved into the application level by focusing on HTTP messages. Few works have attempted to describe the effect of website sessions and webpage visits on web traffic. Those web browsing interactions are, however, the elements of web operation that the user actually experiences and thus are the most representative of their behavior. The work presented in this thesis revolves around these web interactions, with a special focus on identifying them in user traffic. This thesis offers a distinctive approach in that the problem at hand is faced from a flow-level perspective. That is, the study presented here centers on a characterization of web traffic obtained on a per-connection basis using information from the transport and network levels rather than relying on deep packet inspection. This flow-level perspective introduces various constraints to the proposals developed, but pays off by offering scalability, ease of deployment, and by avoiding the need to access potentially sensitive application data. In the chapters of this document, different methods for identifying website sessions and webpage downloads in user traffic are introduced. In order to develop those methods, web traffic is characterized from a connection perspective using traces captured by accessing the web automatically, with the help of volunteer users in a controlled environment, and in the wild from users of the Public University of Navarre. The methods rely on connection-level parameters, such as start and end timestamps or server IP addresses, in order to find related connections in the traffic of web users.
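    As an illustration of the flow-level idea, here is a hedged sketch of grouping a user's connections into candidate webpage downloads using only start/end timestamps (no payload inspection). The schema and the idle-gap threshold are my assumptions, not the thesis's tuned parameters.

```python
# Hypothetical flow-grouping sketch; real methods also use server IPs.
GAP_S = 1.0  # max idle gap between connections of the same page download

def group_flows(flows):
    """flows: dicts with 'start', 'end', 'server_ip' (seconds / dotted quad)."""
    groups, current, current_end = [], [], None
    for f in sorted(flows, key=lambda f: f["start"]):
        if current and f["start"] - current_end > GAP_S:
            groups.append(current)                 # close the previous group
            current, current_end = [], None
        current.append(f)
        current_end = f["end"] if current_end is None else max(current_end, f["end"])
    if current:
        groups.append(current)
    return groups

flows = [
    {"start": 0.0, "end": 0.4, "server_ip": "203.0.113.10"},
    {"start": 0.2, "end": 0.9, "server_ip": "203.0.113.11"},
    {"start": 5.0, "end": 5.3, "server_ip": "198.51.100.7"},
]
print([len(g) for g in group_flows(flows)])  # [2, 1] -> two candidate downloads
```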
    Evaluating the performance of the different methods has been problematic because of the absence of ground truth (labeled web traffic traces are hard to obtain and the labeling process is very complex) and the lack of similar research that could be used for comparison purposes. As a consequence, specific validation methods have been designed, and they are also described in this document. Identifying website sessions and webpage downloads in user traffic has multiple immediate applications for network administrators and Internet service providers, as it would allow them to gather additional insight into their users' browsing behavior and even block undesired traffic or prioritize important traffic. Moreover, the advantages of a connection-level perspective would be especially interesting for them. Finally, this work could also help in research directed at classifying the services provided through the web, as grouping the connections related to the same website session may offer additional information for the classification process.
    Programa Oficial de Doctorado en Tecnologías de las Comunicaciones (RD 1393/2007). Komunikazioen Teknologietako Doktoretza Programa Ofiziala (ED 1393/2007).

    Investigating value propositions in social media: studies of brand and customer exchanges on Twitter

    Social media presents one of the richest forums to investigate publicly explicit brand value propositions and the corresponding customer engagement. Seldom have researchers investigated the nature of value propositions available on social media and the insights that can be unearthed from available data. This work bridges this gap by studying the value propositions available on the Twitter platform. This thesis presents six different studies conducted to examine the nature of value propositions. The first study presents a value taxonomy comprising 15 value propositions that are identified in brand tweets. This taxonomy is tested for construct validity using a Delphi panel of 10 experts – 5 from information science and 5 from marketing. The second study demonstrates the utility of the taxonomy by identifying the 15 value propositions in brand tweets (nb=658) of the top-10 coffee brands using content analysis. The third study investigates the feedback provided by customers (nc=12077) for the values propositioned by the top-10 coffee brands (for the 658 brand tweets). It also investigates which value propositions embedded in brand tweets attract ‘shallow’ vs. ‘deep’ engagement from customers. The fourth study is a replication of studies 2 and 3 for a different time period. The data considered for studies 2 and 3 covered a 3-month period in 2015; in the fourth study, Twitter data for the same brands (nb=290, nc=8811) was analysed for a different 3-month period in 2018. This study thus examines the nature of change in value propositions across brands over time. The fifth study addressed generalizability, replicating the investigation of brand and customer tweets (nb=635, nc=7035) in the market domain of the top-10 car brands in 2018. Lastly, the sixth study evaluated a software system called the Value Analysis Toolkit (VAT), which was constructed based on the research findings of studies 1-5. This tool is targeted at researchers and practitioners, who can use it to obtain value proposition-based insights from social media data (brand value propositions and the corresponding feedback from customers). The tool is evaluated for external validity using 35 students and 5 industry participants along three dimensions (the tool's analytics features, usability, and usefulness). Overall, the contributions of this thesis are: a) a taxonomy to identify value propositions in Twitter (study 1); b) an approach to extract value proposition-based insights from brand tweets and the corresponding feedback from customers in the process of value co-creation (studies 2-5) for the top-10 coffee and car brands; and c) an operational tool (study 6) that can be used to analyse value propositions of various brands (e.g., compare value propositions of different brands) and identify which value propositions attract positive electronic word of mouth (eWOM). These value proposition-based insights can be used by social media managers to devise social media strategies that are likely to stimulate positive discussions about a brand in social media.

    Generating intelligent tutoring systems for teaching reading: combining phonological awareness and thematic approaches

    The objective of this thesis is to investigate the use of computers with artificial intelligence methods for the teaching of basic literacy skills, to be applied eventually to the teaching of illiterate adults in Brazil. In its development many issues related to adult education have been considered, and two very significant approaches to the teaching of reading were focused on in detail: Phonological Awareness (PA) and Generative Themes. After being excluded from literacy curricula for a long time during the ascendancy of the "Whole Word" approaches, activities for the development of phonological awareness are currently being accepted as fundamental for teaching reading, and are being incorporated in most English literacy programmes. Generative Themes, in turn, were first introduced in Brazil in a massive programme for teaching reading to adults, and have since then been used successfully in a number of developing countries for the same purpose. However, these two approaches are apparently conflicting in their principles and emphasis, for the first (PA) is generally centred on the technical aspects of phonology, based on well controlled experiments and research, whereas the second is socially inspired and focused mainly on meaning and social relationships. The main question addressed in this research, consequently, is whether these two apparently conflicting approaches could be combined to create a method that would be technically PA oriented but at the same time could concentrate on meaning by using thematic vocabularies as stimuli for teaching. Would it be possible to find words to illustrate all the phonological features with which a PA method deals using a thematic vocabulary? To answer this question diverse concepts, languages and tools have been developed as part of this research, in order to allow the selection of thematic vocabularies, the description of PA curricula, the distribution of thematic words across PA curricula, the description of teaching activities and the definition of the teaching strategy rules to orient the teaching sequence. The resultant vocabularies have been evaluated and the outcomes of the research have been assessed by literacy experts. A prototype system for delivering experimental teaching activities through the Internet has also been developed and demonstrated.