Methodologies for the Automatic Location of Academic and Educational Texts on the Internet
Traditionally, online databases of web resources have been compiled by a human editor or through the submissions of authors or other interested parties. Considerable resources are needed to maintain a constant level of input and relevance in the face of increasing material quantity and quality, and much of what is in such databases is ephemeral. These pressures mean that many databases stagnate after an initial period of enthusiastic data entry. The obvious solution is the automatic harvesting of resources; however, this process requires the automatic classification of resources as 'appropriate' to a given database, a problem that can only be solved by complex analysis of text content.
This paper outlines the component methodologies needed to construct such an automated harvesting system, including a number of novel approaches. In particular, it examines the specific problems of automatically identifying academic research work and Higher Education pedagogic materials. Where appropriate, experimental data are presented from searches in the field of Geography as well as the Earth and Environmental Sciences. In addition, relevant software is reviewed where it exists, and future directions are outlined.
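The core difficulty the abstract describes is deciding automatically whether a fetched page is 'appropriate' (e.g., academic) from its text content. As a minimal sketch of that idea, the following toy classifier scores a page by the presence of cue phrases typical of research writing. The cue list, threshold, and sample text are invented for illustration; they are not taken from the paper, which develops far more sophisticated content analysis.

```python
# Toy content-based classifier: score a page as "academic" by counting
# cue phrases typical of research writing. Cues and threshold are
# illustrative placeholders, not the paper's method.

ACADEMIC_CUES = [
    "abstract", "references", "methodology", "cited",
    "et al", "doi", "conclusion", "peer review",
]

def academic_score(text: str) -> float:
    """Fraction of cue phrases found in the page text."""
    lower = text.lower()
    hits = sum(1 for cue in ACADEMIC_CUES if cue in lower)
    return hits / len(ACADEMIC_CUES)

def is_academic(text: str, threshold: float = 0.4) -> bool:
    """Accept the page for harvesting if enough cues are present."""
    return academic_score(text) >= threshold

sample = ("Abstract. We present a methodology for locating texts. "
          "References: Smith et al (2001), doi:10.1000/xyz")
print(is_academic(sample))  # 5 of 8 cues present -> True
```

A real harvester would replace this with trained text classification over many features, but the shape of the decision (fetch, extract text, score, accept or reject) is the same.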
Who is Tracking Me on the Web?
Lately, online privacy has been gaining media attention because more and more companies have adopted business models that offer their customers free services, generating value by processing and possibly selling the information acquired by gathering data from their users on the web. These methods are diverse and have benefited from technological advances in internet browsers and the evolution of technologies such as HTML5 and JavaScript.
This dissertation aims to gather and analyze information about the tracking that takes place in the Portuguese online market, supported by a detailed state of the art of the technologies and methods used to gather user information.
To identify the major players in web tracking, the study analyzes the websites that Portuguese users frequently visit and evaluates the prevalence of those entities and the methods they use to gather information.
With this research, the author intends to inform the public about who the main entities involved in the collection of personal information are, the methods they use, and how to prevent such collection.
A paper based on a subset of 300 websites, using the same framework and analysis tool as this research, was published at the symposium INForum 2016, which took place in September 2016.
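A basic building block of studies like this is deciding, for each request a page makes, whether it goes to a third party (a candidate tracker) rather than the visited site itself. The sketch below illustrates that check by comparing request hostnames against the page's domain. The two-label domain split and the example hostnames are simplifications introduced here; real measurement tools instrument a browser and use a public suffix list to determine registrable domains.

```python
# Illustrative third-party detection: flag requests whose registrable
# domain differs from the visited site's. The naive "last two labels"
# rule is a placeholder (it mishandles suffixes like .co.uk).

from urllib.parse import urlparse

def base_domain(host: str) -> str:
    """Naive registrable domain: the last two dot-separated labels."""
    return ".".join(host.split(".")[-2:])

def third_party_hosts(page_url: str, request_urls: list) -> set:
    """Hostnames contacted by the page that belong to other domains."""
    site = base_domain(urlparse(page_url).hostname)
    return {
        urlparse(u).hostname
        for u in request_urls
        if base_domain(urlparse(u).hostname) != site
    }

requests_seen = [
    "https://www.example.pt/app.js",
    "https://cdn.example.pt/style.css",
    "https://www.google-analytics.com/analytics.js",
    "https://connect.facebook.net/sdk.js",
]
print(sorted(third_party_hosts("https://www.example.pt/", requests_seen)))
# ['connect.facebook.net', 'www.google-analytics.com']
```

Counting how often the same third-party domains recur across many popular sites is what lets a study rank the "major players" in tracking.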
Describing Web Archives: A Computer-Assisted Approach
Currently, web archives are challenging for users to discover and use. Many archives and libraries are actively collecting web archives, but description in this area has been dominated by bibliographic approaches, which do not connect web archives to existing description or contextual information and have often resulted in format-based silos. This is primarily because web archiving tools such as Archive-It arrange materials by seeds and groups of seeds, which reflect the complex technical process of web crawling or web recording and are often not very meaningful to users or helpful for discovery. This article makes the case for arranging and describing web archives in meaningful aggregates according to established standards, showing how archival practices allow archivists to arrange diverse web content according to common forms and functions while empowering them to be creative with their time and thoughtful with their labor. It provides a path to exposing important provenance information to users and demonstrates an existing proof of concept. Finally, it outlines a possible integration between ArchivesSpace and Archive-It that is feasible for many archives to implement and would automate the repetitive parts of creating and updating description for new web crawls.
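The move from per-seed records toward meaningful aggregates can be sketched very simply: take a crawl's seed URLs (which a tool like Archive-It tracks) and group them by site, so each group can be described once as an archival component. The sketch below is a hypothetical illustration of that regrouping step only, with seed data inlined; it is not the article's proof of concept, and any real integration would fetch seeds from the crawler's API and write description records via the repository's API.

```python
# Illustrative regrouping: collect crawl seeds by hostname so each site
# can be described as one aggregate rather than many seed-level records.
# Seed URLs are inlined placeholders for data a crawler would supply.

from collections import defaultdict
from urllib.parse import urlparse

def group_seeds_by_site(seed_urls):
    """Map each hostname to the list of seeds crawled under it."""
    groups = defaultdict(list)
    for url in seed_urls:
        groups[urlparse(url).hostname].append(url)
    return dict(groups)

seeds = [
    "https://library.example.edu/exhibits/",
    "https://library.example.edu/blog/",
    "https://athletics.example.edu/news/",
]
for host, urls in group_seeds_by_site(seeds).items():
    print(host, len(urls))  # one describable aggregate per site
```

Grouping by hostname is only one possible aggregate; the article's point is that archivists should choose aggregates by form and function, which this mechanical step merely makes easier to automate.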
Predicting Rising Follower Counts on Twitter Using Profile Information
When evaluating the causes of one's popularity on Twitter, one factor is usually considered the main driver: many tweets. There is debate about the kind of tweet one should publish, but little attention is paid to anything beyond tweets. Of particular interest is the information provided on each Twitter user's profile page, one feature being the name given on the profile. Studies in psychology and economics have identified correlations between a person's first name and, e.g., their school marks or chances of getting a job interview in the US. We are therefore interested in the influence of this profile information on the follower count. We addressed this question by analyzing the profiles of about 6 million Twitter users. All profiles are separated into three groups: users whose name field contains a first name, English words, or neither. The assumption is that names and words influence the discoverability of a user and subsequently his or her follower count. We propose a classifier that labels users who will increase their follower count within a month, applying different models based on the user's group. The classifiers are evaluated with the area under the receiver operating characteristic curve and achieve a score above 0.800.
Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25-28, 2017, Troy, NY, US
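The three-way split the abstract describes (name field contains a first name, an English word, or neither) can be sketched with a simple lexicon lookup. The tiny name and word lists below are placeholders invented for illustration; the study would use full name and dictionary lists, and would then fit a separate follower-growth model per group.

```python
# Illustrative grouping of Twitter-style profiles by name-field content,
# as described in the abstract. Lexicons are tiny placeholders, not the
# paper's actual name/word lists.

FIRST_NAMES = {"anna", "david", "maria", "john"}
ENGLISH_WORDS = {"sunshine", "dream", "coffee", "music"}

def profile_group(name_field: str) -> str:
    """Assign a profile to one of the three groups from the study."""
    tokens = name_field.lower().split()
    if any(t in FIRST_NAMES for t in tokens):
        return "first_name"
    if any(t in ENGLISH_WORDS for t in tokens):
        return "english_word"
    return "neither"

print(profile_group("Anna K."))        # first_name
print(profile_group("coffee addict"))  # english_word
print(profile_group("xx_9492_xx"))     # neither
```

Each group would then get its own classifier, evaluated by AUC as in the abstract (e.g., with `sklearn.metrics.roc_auc_score`).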
Harbouring Dissent: Greek Independent and Social Media and the Antifascist Movement
This article examines Greek activists' use of a range of communication technologies, including social media, blogs, citizen journalism sites, Web radio, and anonymous networks. Drawing on Anna Tsing's theoretical model, it examines key frictions around digital technologies that emerged within a case study of the antifascist movement in Athens, focusing on the period around the 2013 shutdown of Athens Indymedia. Based on interviews with activists and analysis of online communications, including issue networks and social media activity, we find that the antifascist movement itself is created and recreated through a process of productive friction, as different groups and individuals with varying ideologies and experiences work together.