
    Methodologies for the Automatic Location of Academic and Educational Texts on the Internet

    Traditionally, online databases of web resources have been compiled by a human editor or through the submissions of authors or interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of increasing material quantity and quality, and much of what is held in databases is ephemeral. These pressures mean that many databases stagnate after an initial period of enthusiastic data entry. The solution to this problem would seem to be the automatic harvesting of resources; however, this process requires the automatic classification of resources as ‘appropriate’ to a given database, a problem that can only be solved by complex analysis of text content. This paper outlines the component methodologies necessary to construct such an automated harvesting system, including a number of novel approaches. In particular, it looks at the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data is presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, appropriate software is reviewed where it exists, and future directions are outlined.
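    As a minimal illustration of the classification step described above, the sketch below trains a simple "is this page academic?" text classifier. It assumes scikit-learn is available; the training examples and the TF-IDF-plus-logistic-regression setup are illustrative stand-ins, not the paper's own method.

```python
# A minimal sketch of the kind of text classifier an automated
# harvesting pipeline might use to label fetched pages as 'appropriate'
# (e.g. academic research) or not. The training data is purely
# illustrative; the paper's own methodology is more involved.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled examples: 1 = academic text, 0 = other.
texts = [
    "Abstract. We present a quantitative study of fluvial erosion...",
    "References [1] Smith, J. (1998). Sediment transport in rivers.",
    "Buy cheap holiday packages to the Mediterranean today!",
    "Welcome to my personal homepage and photo gallery.",
]
labels = [1, 1, 0, 0]

# TF-IDF features feed a linear classifier; simple, but a common
# baseline for 'is this page an academic document?' decisions.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

candidate = "Abstract. This paper examines coastal geomorphology..."
print(model.predict([candidate]))        # predicted class (1 = academic)
print(model.predict_proba([candidate]))  # class probabilities
```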

    Who is Tracking Me on the Web?

    Lately, online privacy has been gaining media attention because more and more companies have adopted business models that offer customers free services and generate value by processing, and possibly selling, the information gathered from their users on the web. The collection methods are diverse and have benefited from technological advances in internet browsers and the evolution of technologies such as HTML5 and JavaScript. This dissertation aims to gather and analyze information on the tracking that takes place in the Portuguese online market, supported by a detailed state of the art of the technologies and methods used to collect user information. To identify the major players in web tracking, the study analyzes the websites that Portuguese users most frequently visit and evaluates the prevalence of those entities and the methods they use to gather information. With this research, the author intends to inform the public about the main entities involved in collecting personal information, the methods they use, and how to prevent such collection. A paper based on a subset of 300 websites, using the same framework and analysis tool as this research, was published at the symposium INForum 2016, held in September 2016.
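    As a simplified illustration of one detection approach, the sketch below statically inspects which third-party domains a page loads scripts, iframes, and images from. The dissertation's framework is not detailed in the abstract, and real tracking studies typically instrument a full browser, since techniques such as HTML5 storage and JavaScript fingerprinting are only visible at runtime; the function name and the requests/BeautifulSoup usage are assumptions made for this sketch.

```python
# Simplified illustration of surfacing third-party trackers by static
# inspection of a page's embedded resources. Runtime-only tracking
# (HTML5 storage, fingerprinting scripts) would not be caught here.
from urllib.parse import urlparse
import requests
from bs4 import BeautifulSoup

def third_party_domains(page_url: str) -> set[str]:
    page_host = urlparse(page_url).netloc
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    domains = set()
    # Collect external hosts referenced by scripts, iframes, and images.
    for tag in soup.find_all(["script", "iframe", "img"], src=True):
        host = urlparse(tag["src"]).netloc
        if host and host != page_host:
            domains.add(host)
    return domains

# Domains seen here (e.g. google-analytics.com, doubleclick.net) are
# candidates for matching against a tracker list such as EasyPrivacy.
print(third_party_domains("https://example.com"))
```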

    Describing Web Archives: A Computer-Assisted Approach

    Currently, web archives are challenging for users to discover and use. Many archives and libraries are actively collecting web archives, but description in this area has been dominated by bibliographic approaches, which do not connect web archives to existing description or contextual information and have often resulted in format-based silos. This is primarily because web archiving tools such as Archive-It arrange materials by seeds and groups of seeds, which reflect the complex technical process of web crawling or web recording and are often not very meaningful to users or helpful for discovery. This article makes the case for arranging and describing web archives in meaningful aggregates according to established standards, showing how archival practices allow archivists to arrange the diversity of web content according to common forms and functions while empowering them to be creative with their time and thoughtful with their labor. It provides a path to exposing important provenance information to users and demonstrates an existing proof of concept. Finally, it outlines a possible integration between ArchivesSpace and Archive-It that is feasible for many archives to implement and would automate the repetitive parts of creating and updating description for new web crawls.
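    The sketch below gives a rough shape to such an integration, assuming Archive-It's documented WASAPI web-data endpoint and the standard ArchivesSpace REST API; the collection ID, repository and resource references, credentials, and field mapping are hypothetical placeholders rather than the article's implementation.

```python
# Rough sketch of the proposed integration: pull crawl metadata from
# Archive-It's WASAPI endpoint and create a corresponding archival
# object in ArchivesSpace. Endpoint paths follow the public docs, but
# all IDs, refs, and credentials below are hypothetical.
import requests

WASAPI = "https://partner.archive-it.org/wasapi/v1/webdata"
AS_BACKEND = "http://localhost:8089"  # assumed ArchivesSpace backend URL

# 1. List crawl-generated WARC files for one Archive-It collection.
crawls = requests.get(
    WASAPI,
    params={"collection": 12345},          # hypothetical collection ID
    auth=("ait_user", "ait_password"),     # placeholder credentials
).json()

# 2. Authenticate against ArchivesSpace and obtain a session token.
session = requests.post(
    f"{AS_BACKEND}/users/admin/login",
    data={"password": "admin"},            # placeholder credentials
).json()["session"]
headers = {"X-ArchivesSpace-Session": session}

# 3. Describe each crawl as an archival object under an existing
#    resource record, rather than leaving it in a seed-based silo.
for f in crawls.get("files", []):
    record = {
        "jsonmodel_type": "archival_object",
        "title": f"Web crawl: {f['filename']}",
        "level": "file",
        "resource": {"ref": "/repositories/2/resources/1"},  # assumed ref
    }
    requests.post(
        f"{AS_BACKEND}/repositories/2/archival_objects",
        headers=headers, json=record,
    )
```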

    Predicting Rising Follower Counts on Twitter Using Profile Information

    When evaluating the causes of one's popularity on Twitter, one thing is considered the main driver: many tweets. There is debate about the kind of tweet one should publish, but little attention is paid to anything beyond tweets. Of particular interest is the information provided on each Twitter user's profile page. One such feature is the given name shown on the profile. Studies in psychology and economics have identified correlations between first names and, e.g., school marks or the chance of getting a job interview in the US. We are therefore interested in the influence of this profile information on follower counts. We addressed this question by analyzing the profiles of about 6 million Twitter users. All profiles are separated into three groups: users that have a first name, English words, or neither of the two in their name field. The assumption is that names and words influence the discoverability of a user and subsequently his or her follower count. We propose a classifier that labels users who will increase their follower count within a month, applying different models based on the user's group. The classifiers are evaluated with the area under the receiver operating characteristic curve and achieve scores above 0.800.

    Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25-28, 2017, Troy, NY, USA
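    The evaluation setup described in the abstract can be sketched as follows: split users into the three name-field groups, fit one model per group, and score each with ROC AUC. The features, labels, and choice of a random-forest model below are synthetic stand-ins, since the paper's actual profile features are not listed in the abstract.

```python
# Sketch of the per-group classification and AUC evaluation described
# in the abstract, on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
groups = ["first_name", "english_word", "neither"]

for group in groups:
    # Hypothetical per-user profile features (e.g. account age, tweet
    # count, current followers) and a binary label:
    # 1 = follower count rose within a month, 0 = it did not.
    X = rng.normal(size=(1000, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{group}: AUC = {auc:.3f}")
```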

    Harbouring Dissent: Greek Independent and Social Media and the Antifascist Movement

    This article examines Greek activists’ use of a range of communication technologies, including social media, blogs, citizen journalism sites, Web radio, and anonymous networks. Drawing on Anna Tsing’s theoretical model, the article examines key frictions around digital technologies that emerged within a case study of the antifascist movement in Athens, focusing on the period around the 2013 shutdown of Athens Indymedia. Based on interviews with activists and analysis of online communications, including issue networks and social media activity, we find that the antifascist movement itself is created and recreated through a process of productive friction, as different groups and individuals with varying ideologies and experiences work together.