7 research outputs found
Contributos para a eficácia do clustering usando o tagging social
Doctorate in Information and Communication in Digital Platforms
In recent years, the way information is made available online has changed. Widespread access to the web has made it easy to edit, publish, and share information, leading to a considerable increase in its volume. Systems soon emerged that allow this information to be collected and shared and that also let users describe resources with tags or comments. Organizing this information automatically is one of the biggest challenges in the current web context. Although several clustering algorithms exist, the trade-off between effectiveness (forming groups that make sense) and efficiency (running in acceptable time) is difficult to achieve.
This investigation therefore sets out to assess whether an automatic document clustering system becomes more effective when a social classification system is integrated into it.
We analyzed and discussed two methods for clustering documents, both based on the k-means algorithm, which allow social tagging to be integrated into the clustering process. The first method integrates tags directly into the Vector Space Model (VSM), and the second uses tags to select the initial seeds. The first method allows tags to be weighted according to their occurrence in the document through the Social Slider (SS) parameter. It was built on a prediction model suggesting that, under cosine similarity, documents that share tags move closer together while documents that do not share tags move further apart. The second method gave rise to an algorithm we call k-C. In addition to selecting the initial seeds through a network of tags, it also changes the way new centroids are calculated in each iteration. The change to the centroid calculation followed from a reflection on the use of Euclidean distance and cosine similarity in the k-means clustering algorithm.
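The prediction behind the first method can be illustrated with a small sketch. The weighting scheme below is an assumption made for illustration only (one extra dimension per tag, set to the slider value when the document carries that tag); the abstract does not give the thesis's actual formula, and the names `add_tags`, `social_slider`, and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def add_tags(term_vector, doc_tags, tag_vocabulary, social_slider):
    """Append one dimension per tag, weighted by the slider.

    Assumed scheme for illustration, not the formula from the thesis.
    """
    tag_part = [social_slider if tag in doc_tags else 0.0 for tag in tag_vocabulary]
    return list(term_vector) + tag_part


# Toy term vectors for two documents that share one term out of three.
doc_a = [1.0, 1.0, 0.0]
doc_b = [1.0, 0.0, 1.0]
tag_vocabulary = ["python", "cooking"]
ss = 1.0  # hypothetical Social Slider value

baseline = cosine(doc_a, doc_b)
shared = cosine(
    add_tags(doc_a, {"python"}, tag_vocabulary, ss),
    add_tags(doc_b, {"python"}, tag_vocabulary, ss),
)
disjoint = cosine(
    add_tags(doc_a, {"python"}, tag_vocabulary, ss),
    add_tags(doc_b, {"cooking"}, tag_vocabulary, ss),
)

print(f"no tags: {baseline:.3f}, shared tag: {shared:.3f}, different tags: {disjoint:.3f}")
```

Under these assumptions the shared-tag pair scores higher than the tag-free baseline and the pair with different tags scores lower, which is the behaviour the prediction model describes.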
For the evaluation of the algorithms, two further algorithms were proposed: the “Automatic Ground Truth” algorithm and the MCI algorithm. The first detects the structure of the data when it is unknown; the second is an internal evaluation measure based on the cosine similarity between each document and the document closest to it.
The analysis of preliminary results suggests that the first method, integrating tags into the VSM, has more impact on the k-means algorithm than on the k-C algorithm. In addition, the results show no correlation between the choice of the SS parameter and the quality of the clusters. The remaining tests were therefore conducted using only the k-C algorithm (without tag integration in the VSM), and their results indicate that this algorithm tends to generate more effective clusters.
Use of social networks by college students: A study case in the Iberian Peninsula
The research presented here was carried out during the second semester of the
2014/2015 academic year and concerns the use of social software by students of
the Degree in Primary Education at two higher education institutions: the
University of Jaén (Spain) and the School of Education of the Polytechnic
Institute of Viana do Castelo (Portugal). The objectives are to identify which
social software tools the students prefer to use, how they perceive the
academic potential of these tools, and whether there are differences depending
on the university. A questionnaire was used as the data collection instrument.
We conclude that the best known and most used social network is Facebook. The
main finding lies in the differences between the two samples with regard to
their use.
Uso de las redes sociales por los alumnos universitarios de educación: un estudio de caso de la península ibérica
Tendencias pedagógicas
Abstract taken from the publication. Monograph titled "Educación y Educación Superior en el Contexto Iberoamericano".
NEOTROPICAL CARNIVORES: a data set on carnivore distribution in the Neotropics
Mammalian carnivores are considered a key group in maintaining ecological health and can indicate potential ecological integrity in the landscapes where they occur. Carnivores also hold high conservation value, and their habitat requirements can guide management and conservation plans. The order Carnivora has 84 species from 8 families in the Neotropical region: Canidae, Felidae, Mephitidae, Mustelidae, Otariidae, Phocidae, Procyonidae, and Ursidae. Herein, we include published and unpublished data on native terrestrial Neotropical carnivores (Canidae, Felidae, Mephitidae, Mustelidae, Procyonidae, and Ursidae). NEOTROPICAL CARNIVORES is a publicly available data set that includes 99,605 data entries from 35,511 unique georeferenced coordinates. Detection/non-detection and quantitative data were obtained from 1818 to 2018 by researchers, governmental agencies, non-governmental organizations, and private consultants. Data were collected using several methods, including camera trapping, museum collections, roadkill, line transects, and opportunistic records. Literature (peer-reviewed and grey) in Portuguese, Spanish, and English was incorporated in this compilation. Most of the data set consists of detection data entries (n = 79,343; 79.7%), but it also includes non-detection data (n = 20,262; 20.3%). Overall, 43.3% of the entries (n = 43,151) also include count data. The information available in NEOTROPICAL CARNIVORES will help address macroecological, ecological, and conservation questions from multiple spatio-temporal perspectives. As carnivores play key roles in trophic interactions, a better understanding of their distribution and habitat requirements is essential to establish conservation management plans and safeguard the future ecological health of Neotropical ecosystems. Our data paper, combined with other large-scale data sets, has great potential to clarify species distribution and related ecological processes within the Neotropics.
There are no copyright restrictions and no restrictions on using data from this data paper, as long as the data paper is cited as the source of the information used. We also request that users inform us of how they intend to use the data.
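As a quick consistency check on the figures quoted in this abstract, the short sketch below recomputes the percentages from the stated counts. The variable names are illustrative; only the numbers come from the abstract. It also shows that the 43.3% share for count data is relative to all entries.

```python
# Counts as stated in the abstract.
total_entries = 99_605
detections = 79_343
non_detections = 20_262
with_counts = 43_151


def percent(part, whole):
    """Share of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Detection and non-detection entries should add up to the total.
assert detections + non_detections == total_entries

print(percent(detections, total_entries))      # 79.7
print(percent(non_detections, total_entries))  # 20.3
print(percent(with_counts, total_entries))     # 43.3
```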
Characterisation of microbial attack on archaeological bone
As part of an EU-funded project to investigate the factors influencing bone preservation in the archaeological record, more than 250 bones from 41 archaeological sites in five countries spanning four climatic regions were studied for diagenetic alteration. Sites were selected to cover a range of environmental conditions and archaeological contexts. Microscopic and physical (mercury intrusion porosimetry) analyses of these bones revealed that the majority (68%) had suffered microbial attack. Furthermore, significant differences were found between animal and human bone in both the state of preservation and the type of microbial attack present. These differences in preservation might result from differences in the early taphonomy of the bones. © 2003 Elsevier Science Ltd. All rights reserved.