18 research outputs found

    Improving efficacy of library services: ARIMA modelling for predicting book borrowing for optimizing resource utilization

    Book borrowing is a key library service: users visit the library to borrow books more often than for any other service. To predict book borrowing in a college library, an Auto Regressive Integrated Moving Average (ARIMA) model was developed from book-borrowing data for the years 1998 to 2013. The study found that the numbers of books borrowed one month and twelve months earlier could estimate the number of books borrowed in a given month. The fitted model was used to predict book borrowing for 2014 by two alternative approaches: 12-steps ahead versus 1-step ahead. The calculations show no significant difference between the two approaches (P = 0.928; Wilcoxon signed-rank test), but the Root Mean Squared Error (RMSE) of the 1-step-ahead approach (109.57) was lower than that of the 12-steps-ahead approach (131.33). The findings indicate that ARIMA models are useful for monitoring book borrowing in institutional libraries and can predict library usage trends.
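
    The abstract reports results but no code; the following is a minimal sketch, assuming monthly counts in a hypothetical CSV and a (1,0,0)x(1,0,0,12) seasonal ARIMA consistent with the reported lag-1 and lag-12 dependence, of how the 12-steps-ahead and 1-step-ahead comparisons could be reproduced with statsmodels.

    ```python
    # A minimal sketch, not the authors' code: a seasonal ARIMA with a lag-1 AR
    # term and a seasonal lag-12 AR term, matching the abstract's finding that
    # borrowing 1 and 12 months earlier predicts the current month. The file
    # name, column names, and exact model order are assumptions.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    borrow = (pd.read_csv("monthly_borrowing.csv",  # hypothetical export
                          index_col="month", parse_dates=True)["books"]
              .asfreq("MS"))
    train, test = borrow[:"2013-12"], borrow["2014-01":"2014-12"]

    fit = SARIMAX(train, order=(1, 0, 0), seasonal_order=(1, 0, 0, 12)).fit(disp=False)

    # 12-steps ahead: forecast all of 2014 from the end of 2013.
    pred_12 = fit.forecast(steps=12)

    # 1-step ahead: extend the model with each observed month before predicting the next.
    preds, m = [], fit
    for date, y in test.items():
        preds.append(m.forecast(steps=1).iloc[0])
        m = m.append(pd.Series([y], index=[date]))  # feed in the observed value
    pred_1 = pd.Series(preds, index=test.index)

    rmse = lambda p: float(np.sqrt(((test - p) ** 2).mean()))
    print(f"RMSE, 12-steps ahead: {rmse(pred_12):.2f}")  # abstract reports 131.33
    print(f"RMSE, 1-step ahead:   {rmse(pred_1):.2f}")   # abstract reports 109.57
    ```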

    You are how (and where) you search? Comparative analysis of web search behavior using web tracking data.

    In this article, we conduct a comparative analysis of web search behavior in Switzerland and Germany. To this end, we rely on a combination of web tracking data and survey data collected over a period of two months from users in Germany (n = 558) and Switzerland (n = 563). We find that web search accounts for 13% of all desktop browsing, with the share being higher in Switzerland than in Germany. In over 50% of cases, users clicked on the first search result, and over 97% of all clicks were made on the first page of search results. Most users rely on Google when conducting searches, with some differences observed in users' preferences for other engines across demographic groups. Further, we observe differences in the temporal patterns of web search use between women and men, underscoring the need to disaggregate data by gender in observational studies of online information-seeking behavior. Our findings highlight contextual differences in web search behavior across countries and demographic groups that should be taken into account when examining search behavior and the potential effects of web search result quality on societies and individuals.
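
    As a rough illustration only, the headline figures above could be computed from tracking logs along the following lines; the file names, column names, and the 10-results-per-page assumption are illustrative, not the authors' data model.

    ```python
    # A minimal sketch (not the authors' pipeline) of deriving the headline
    # figures from web-tracking logs. All file and column names are assumed.
    import pandas as pd

    visits = pd.read_csv("browsing_log.csv")  # hypothetical: one row per page view
    clicks = pd.read_csv("serp_clicks.csv")   # hypothetical: one row per result click

    # Share of desktop browsing that is web search (abstract: 13% overall).
    search_share = visits["is_search_engine"].mean()

    # Click position: first-result and first-page shares (abstract: >50% and >97%),
    # assuming 10 results per page.
    first_result = (clicks["result_rank"] == 1).mean()
    first_page = (clicks["result_rank"] <= 10).mean()

    # Country contrast, mirroring the Switzerland-vs-Germany comparison.
    by_country = visits.groupby("country")["is_search_engine"].mean()

    print(f"search share: {search_share:.1%}, first result: {first_result:.1%}, "
          f"first page: {first_page:.1%}")
    print(by_country)
    ```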

    Race and Health Online: A Public Health Exploration of the Digital Landscape

    The Internet continues to reach new audiences and is an integral part of United States society. Social Cognitive Theory addresses the impact of the environment on health behavior, providing justification for surveillance of the digital environment in health behavior research. Health information headlines from two highly trafficked news sites were analyzed using content analysis. The search terms used were health, Blacks, African American, ethnicity, and 2011. The headlines were independently coded by graduate-level coders and assessed on nine indices of interest. A total of 209 headlines were analyzed. The headlines contained health information that correlated with social predictors and with indicators of moral exclusion and social injustice. This study indicates that racial assumptions remain evident in the reporting of news and the conveyance of health information, assumptions that shape attitudes in research, policy, and practice.

    Google Analytics based Temporal-Geospatial Analysis for Web Management: A Case Study of a K-12 Online Resource Website

    As Google Analytics becomes increasingly popular, more detailed records of users' behavior can be captured and analyzed to better understand the performance of websites. However, current Google Analytics research usually draws conclusions from rough estimates based on observing the dashboard or other basic statistical processing of the data. This study aims to provide a more accurate and informative analysis from both temporal and geospatial perspectives via clustering and a GIS application. The results of a resource-website case study demonstrate that the proposed method helps web managers examine temporal effects on users' visiting patterns through exact mathematical computation, and provides more geographical insight into website performance through the constructed density measure and 3D graphic presentation. By offering in-depth quantitative information mined from web logs, such a study can help web stakeholders make better decisions about maintaining and improving websites, especially when adjusting resources in light of temporal fluctuations and inequity in geographical distribution.
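
    To make the temporal side concrete, here is a minimal sketch under assumed data: daily visit profiles from a hypothetical hourly-sessions export are clustered with k-means (k = 3 is arbitrary); the paper's actual clustering setup may differ.

    ```python
    # A minimal sketch of clustering daily visiting patterns; the export format,
    # column layout, and k = 3 are assumptions, not the paper's setup.
    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans

    # Hypothetical export: one row per day, 24 columns of hourly session counts.
    hourly = pd.read_csv("ga_hourly_sessions.csv", index_col="date")

    # Normalize each day to a share-of-day profile so clusters reflect *when*
    # users visit rather than overall volume.
    profiles = hourly.div(hourly.sum(axis=1), axis=0).fillna(0)

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(profiles)
    hourly["pattern"] = km.labels_

    # Each centroid is a typical within-day visiting pattern; comparing them
    # exposes the temporal fluctuations web managers should plan around.
    for i, c in enumerate(km.cluster_centers_):
        days = int((hourly["pattern"] == i).sum())
        print(f"cluster {i}: peak hour {int(np.argmax(c)):02d}:00, {days} days")
    ```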

    A Systematic Review on Search Engine Advertising

    Search Engine Advertising (SEA) was first introduced in 1998. It soon became a very popular tool among practitioners for promoting websites on the Web and turned into a billion-dollar revenue source for search engines. In parallel with its rapid growth in use, SEA attracted the attention of academic researchers, resulting in a large number of publications on the topic. However, no comprehensive review of this accumulated body of knowledge is currently available. This shortcoming motivated us to conduct a systematic review of the SEA literature. We collected 101 papers on the topic of SEA, published in 72 journals from different disciplines, and analyzed them to answer the study's research questions. We identify the historical development of the SEA literature, the predominant journals publishing SEA research, the active reference disciplines, and the main researchers in the field. Moreover, we classify the SEA literature into four categories and 10 research topics. We also uncover a number of gaps in the SEA literature and provide future research directions accordingly. Available at: https://aisel.aisnet.org/pajais/vol7/iss3/2

    IPTV log events profiling

    Master's thesis in Information Security, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2010. The central goal of log analysis is to acquire new knowledge that helps systems administrators better understand how their systems are being used. Telecommunications networks are one of the areas where the amount of logged data is huge and where, most of the time, only a fraction of it is analyzed. That raises big questions for a network manager: Should the data continue to be logged? Is there any relevant information that can be extracted from it? Do the methods to extract it already exist? Could operations and management be improved if that information were available? Only by answering these questions can a manager decide what to do with a data log that consumes vast resources. The main objective of our work is to analyze what type of additional information can be extracted from the server log files of an IPTV platform. Our specific focus is on understanding whether it is possible to determine which sequences of events are triggered on the platform when a client executes a certain action. These sequences are unknown to us and are not documented by the software provider. For this work, we chose the "boot-up" action that the user performs when powering on his set-top-box (STB). We fully characterize the sequence of web service requests that an STB performs while booting up and logging into the IPTV platform. We then develop a method that can be applied automatically to isolate those sequences in the raw log data. The method starts by defining the start event and then tries to identify the sequence's end event by applying a set of empirical rules we defined. The events in between constitute the sequence; while most of them are mandatory, some occur only in certain cases, owing to the characteristics of a particular STB. Finally, after isolating the sequences, we perform a statistical analysis to validate the accuracy of our identification/isolation method. The methodology is developed and evaluated using a real dataset from a telecommunications company's IPTV platform. The results confirm that the chosen method produces output with a high level of accuracy, taking into account the specific characteristics of the input log data. With the additional knowledge this method extracts from the logs, network managers and engineers can better characterize the platform's many users, establishing typical event-sequence profiles and isolating anomalous ones for further inspection focused on both technical and security concerns.
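
    The isolation method described above reduces to a short, rule-based scan, sketched below. The event names and the inactivity threshold are illustrative assumptions; the thesis derives its own empirical rules from the platform's logs.

    ```python
    # A minimal sketch of the sequence-isolation method: open a sequence at the
    # boot start event and close it at an end event or after an inactivity gap.
    # START_EVENT, END_EVENT, and the 60 s threshold are assumed names/values.
    from datetime import timedelta

    START_EVENT = "BootRequest"      # assumed name for the boot-up start event
    END_EVENT = "AuthComplete"       # assumed name for the end-of-sequence event
    MAX_GAP = timedelta(seconds=60)  # assumed inactivity cut-off

    def isolate_sequences(events):
        """events: iterable of (timestamp, stb_id, event_name), time-ordered."""
        open_seqs, done = {}, []
        for ts, stb, name in events:
            seq = open_seqs.get(stb)
            # Empirical rule: a long silence closes this STB's open sequence.
            if seq and ts - seq[-1][0] > MAX_GAP:
                done.append(open_seqs.pop(stb))
                seq = None
            if name == START_EVENT:
                open_seqs[stb] = [(ts, name)]  # a new boot-up sequence begins
            elif seq is not None:
                seq.append((ts, name))
                if name == END_EVENT:          # explicit end-event rule
                    done.append(open_seqs.pop(stb))
        done.extend(open_seqs.values())        # sequences cut off at end of log
        return done
    ```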

    Renewable Estimation and Incremental Inference with Streaming Health Datasets

    The overarching objective of my dissertation is to develop a new methodology that allows parameter estimates and their standard errors to be updated sequentially along with data streams. The key technical novelty is that the proposed estimation method, termed renewable estimation in my dissertation, uses current data and summary statistics of historical data, but no historical subject-level data. To implement renewable estimation, I utilize the powerful Lambda architecture in Apache Spark to design a new paradigm that adds an inference layer to the existing speed layer. This expanded architecture, named the Rho architecture, accommodates inference-related statistics and facilitates sequential updating of the quantities involved in estimation and inference. The first project focuses on renewable estimation in the setting of generalized linear models (RenewGLM), in which I develop a new sequential updating algorithm to compute numerical solutions for parameter estimates and related inferential quantities. The proposed procedure aggregates both score functions and information matrices over streaming data batches through summary statistics. I show that the resulting estimation is asymptotically equivalent to the maximum likelihood estimation (MLE) obtained by processing the entire dataset at once. I demonstrate this new methodology on the analysis of the National Automotive Sampling System-Crashworthiness Data System (NASS CDS) to evaluate the effectiveness of graduated driver licensing (GDL) in the USA. The second project substantially extends the first to analyze streaming datasets with correlated outcomes, such as clustered and longitudinal data. I establish theoretical guarantees for the proposed renewable quadratic inference function (RenewQIF) for dependent outcomes and implement it within the Rho architecture. Furthermore, I relax the homogeneity assumption of the first project and consider regime-switching regression models with a structural change-point. I propose a real-time hypothesis testing procedure based on a goodness-of-fit test statistic that is shown to achieve both proper type I error control and desirable change-point detection power. The third project concerns data streams that involve both inter-batch correlation and dynamic heterogeneity, arising typically from electronic health records (EHR) and mobile health data. This project is built in the framework of state space models, in which the observed data stream is driven by a latent state process that may incorporate trend, seasonal, or time-varying covariate effects. In this setting, computing the online MLE is challenging due to high-dimensional integrals and complex covariance structures. I develop a Kalman filter to facilitate multivariate online regression analysis (MORA) in the context of linear state space mixed models. MORA renews both point estimates and standard errors of the fixed effects. We also apply the MORA method to an EHR data example, adjusting for heterogeneous batch-specific effects.
    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163085/1/luolsph_1.pd
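
    To make the renewable-estimation idea concrete, here is a minimal sketch for the Gaussian linear model, where aggregating per-batch summary statistics is exact; the dissertation's RenewGLM generalizes this to other GLMs with an incremental Newton step on aggregated score functions and information matrices. All names below are illustrative.

    ```python
    # A minimal sketch of renewable estimation in the Gaussian linear model:
    # each batch contributes only X'X (information) and X'y (score statistic),
    # so the running estimate and its standard errors are renewed without
    # keeping any subject-level history. This illustrates the idea only; it is
    # not the dissertation's RenewGLM algorithm.
    import numpy as np

    class RenewableOLS:
        def __init__(self, p):
            self.XtX = np.zeros((p, p))  # accumulated information matrix
            self.Xty = np.zeros(p)       # accumulated score statistic
            self.rss, self.n = 0.0, 0    # running pieces of the variance estimate

        def update(self, X, y):
            """Renew the estimate with one data batch, then discard the batch."""
            self.XtX += X.T @ X
            self.Xty += X.T @ y
            self.n += len(y)
            self.beta = np.linalg.solve(self.XtX, self.Xty)      # exact full-data OLS
            self.rss += float(((y - X @ self.beta) ** 2).sum())  # approximate RSS
            sigma2 = self.rss / max(self.n - len(self.beta), 1)
            self.se = np.sqrt(sigma2 * np.diag(np.linalg.inv(self.XtX)))

    rng = np.random.default_rng(0)
    true_beta = np.array([1.0, -2.0, 0.5])
    est = RenewableOLS(p=3)
    for _ in range(100):                 # 100 streaming batches
        X = rng.normal(size=(50, 3))
        y = X @ true_beta + rng.normal(size=50)
        est.update(X, y)                 # only summaries are retained
    print(est.beta, est.se)
    ```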