
    Detecting incorrect product names in online sources for product master data

    The global trade item number (GTIN) is traditionally used to identify trade items and look up corresponding information within industrial supply chains. Recently, consumers have also started using GTINs to access additional product information with mobile barcode scanning applications. Providers of these applications use different sources to provide product names for scanned GTINs. In this paper, we analyze data from eight publicly available sources for a set of GTINs scanned by users of a mobile barcode scanning application. Our aim is to measure the correctness of product names in online sources and to quantify the problem of product data quality. We use a combination of string matching and supervised learning to estimate the number of incorrect product names. Our results show that approximately 2% of all product names are incorrect. The method lets brand owners monitor the data quality of their products and enables efficient data integration for application providers.
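The string-matching side of such a cross-source check can be sketched with Python's standard library. The function names, the source names, and the 0.6 similarity threshold below are illustrative assumptions, not the paper's actual pipeline (which also applies supervised learning on top of the string features):

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Normalized similarity between two product names (0..1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_suspect_names(names_by_source, threshold=0.6):
    """Flag sources whose name is not corroborated by any other source.

    A name is suspect when its best similarity to every other source's
    name for the same GTIN falls below the threshold.
    """
    suspects = []
    for src, name in names_by_source.items():
        best = max(name_similarity(name, other)
                   for s, other in names_by_source.items() if s != src)
        if best < threshold:
            suspects.append(src)
    return suspects

# Hypothetical names reported for one GTIN by three sources.
names = {
    "source_a": "Nutella Hazelnut Spread 400g",
    "source_b": "Nutella Spread 400 g",
    "source_c": "AA Batteries 4-pack",  # wrong product for this GTIN
}
print(flag_suspect_names(names))  # flags only source_c
```

Corroboration by best match (rather than average similarity) keeps one bad source from dragging down the scores of the correct ones.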

    The emerging landscape of Social Media Data Collection: anticipating trends and addressing future challenges

    Social media has become a powerful tool for creating and sharing user-generated content across the internet. Its widespread use has generated an enormous amount of information, presenting a great opportunity for digital marketing. Through social media, companies can reach millions of potential consumers and capture valuable consumer data, which can be used to optimize marketing strategies and actions. The potential benefits and challenges of using social media for digital marketing are also attracting growing interest in the academic community. While social media offers companies the opportunity to reach a large audience and collect valuable consumer data, the volume of information generated can lead to unfocused marketing and negative consequences such as social overload. To make the most of social media marketing, companies need to collect reliable data for specific purposes, such as selling products, increasing brand awareness, or fostering engagement, and to predict consumers' future behavior. The availability of quality data can help build brand loyalty, but consumers' willingness to share information depends on their level of trust in the company or brand requesting it. This thesis therefore aims to address this research gap through a bibliometric analysis of the field, a mixed-methods analysis of the profiles and motivations of users who provide their data on social media, and a comparison of supervised and unsupervised algorithms for clustering consumers. This research used a database of more than 5.5 million data collections spanning a 10-year period.
Technological advances now allow sophisticated analysis and reliable predictions based on the captured data, which is especially useful for digital marketing. Several studies have explored digital marketing through social media, some focusing on a specific field while others adopt a multidisciplinary approach. However, given the rapidly evolving nature of the discipline, a bibliometric approach is required to capture and synthesize the most up-to-date information and add further value to studies in the field. The contributions of this thesis are therefore as follows. First, it provides a comprehensive review of the literature on methods for collecting consumers' personal data from social media for digital marketing, and it establishes the most relevant trends through the analysis of significant articles, keywords, authors, institutions, and countries. Second, this thesis identifies which user profiles lie the most, and why. Specifically, this research shows that some user profiles are more inclined to make mistakes, while others provide false information intentionally. The study also shows that the main motivations for providing false information include fun and a lack of trust in data privacy and security measures. Finally, this thesis aims to fill the gap in the literature on which kind of algorithm, supervised or unsupervised, can best cluster consumers who provide their data on social media in order to predict their future behavior
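The unsupervised side of that comparison can be illustrated with a minimal k-means sketch. The consumer features (age, posts per week) and the data below are hypothetical, and a real study would use a library implementation with proper feature scaling:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: group feature vectors into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical consumer features: (age, posts per week).
consumers = [(18, 30), (21, 25), (19, 28), (45, 2), (50, 1), (48, 3)]
centroids, clusters = kmeans(consumers, k=2)
# Young, highly active users end up separated from older, quieter ones.
```

A supervised alternative would instead train a classifier on consumers with known group labels, which is exactly the trade-off the thesis evaluates.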

    Evaluation of a Secure Smart Contract Development in Ethereum

    In the Ethereum blockchain, smart contracts are the standard programs that can perform operations in the network using the platform currency (ether) and data. Once a contract is deployed, its code cannot be changed. This immutability means that if the contract has any vulnerability, it cannot be erased or modified. Ensuring that a contract is safe in the network therefore requires developers who know how to avoid these problems. Many tools explore and analyse contract security and behaviour and, as a result, detect the vulnerabilities present. This thesis aims to analyse and integrate different security analysis tools into the smart contract development process, giving developers better knowledge and awareness of best practices and of tools to test and verify contracts, and thus producing safer smart contracts to deploy. The final solution was developed in two stages. In the first stage, approaches, patterns and tools for developing smart contracts were studied and compared by running them on a standard set of vulnerable contracts, to understand how effective they are at detecting vulnerabilities. Seven existing tools were found that can support the detection of vulnerabilities during the development process. In the second stage, a framework called EthSential is introduced. EthSential was designed and implemented to initially integrate the security analysis tools Mythril, Securify and Slither, and can be used in two ways: from the command line and from Visual Studio Code. EthSential is published and publicly available through PyPI and as a Visual Studio Code extension. To evaluate the solution, two software testing methods and a usability and satisfaction questionnaire were applied. The results were positive in terms of software testing.
However, in terms of developer usability and satisfaction, the overall results did not meet expectations, so improvements should be made in the future to increase both.
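As a rough illustration of how a framework like EthSential might wrap one of these tools, the sketch below invokes Slither's command-line interface and extracts findings from its JSON report. The helper names are assumptions, and the exact report layout may vary across Slither versions:

```python
import json
import subprocess

def slither_findings(report_json):
    """Extract (check, impact) pairs from a Slither JSON report."""
    report = json.loads(report_json)
    detectors = report.get("results", {}).get("detectors", [])
    return [(d.get("check"), d.get("impact")) for d in detectors]

def run_slither(contract_path):
    """Invoke the Slither CLI (must be installed) and parse its findings."""
    proc = subprocess.run(
        ["slither", contract_path, "--json", "-"],
        capture_output=True, text=True,
    )
    return slither_findings(proc.stdout)

# A trimmed, hypothetical report in the shape Slither emits with --json.
sample = '{"results": {"detectors": [{"check": "reentrancy-eth", "impact": "High"}]}}'
print(slither_findings(sample))  # [('reentrancy-eth', 'High')]
```

Keeping the parsing separate from the subprocess call makes it easy to add further tools (e.g. Mythril) behind the same findings interface, which is the kind of integration the thesis describes.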

    Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

    We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as ["Q", said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm's output for computational social science by analyzing the sentiment expressed in our extracted quotations.
    Comment: Accepted at the 12th International Conference on Web and Social Media (ICWSM), 201
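One bootstrapping round can be sketched as follows. The regular expressions, helper names, and example articles are simplified assumptions rather than the actual Quootstrap implementation, which operates at corpus scale:

```python
import re

# Seed pattern ["Q", said S]: a quotation followed by "said" and a capitalized name.
SEED = re.compile(r'"([^"]+)",?\s+said\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)')

def extract_pairs(articles):
    """Apply the seed pattern to collect (quotation, speaker) pairs."""
    pairs = set()
    for text in articles:
        pairs.update(SEED.findall(text))
    return pairs

def discover_patterns(articles, pairs):
    """Find new cue patterns: short contexts linking a known speaker to a known quote."""
    patterns = set()
    for text in articles:
        for q, s in pairs:
            m = re.search(re.escape(s) + r'(\s\w+:?\s)"' + re.escape(q) + '"', text)
            if m:
                patterns.add(m.group(1))
    return patterns

articles = [
    'Before the vote, "we must act now", said Jane Doe.',
    'In a statement, Jane Doe declared: "we must act now".',
]
pairs = extract_pairs(articles)                # seed round yields one (Q, S) pair
patterns = discover_patterns(articles, pairs)  # learns the ' declared: ' cue
```

In the full algorithm the newly discovered patterns are converted back into extractors and the two steps alternate until no new pairs appear, which is what makes redundancy across articles essential.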

    Internet Filters: A Public Policy Report (Second edition; fully revised and updated)

    No sooner was the Internet upon us than anxiety arose over the ease of accessing pornography and other controversial content. In response, entrepreneurs soon developed filtering products. By the end of the decade, a new industry had emerged to create and market Internet filters. ... Yet filters were highly imprecise from the beginning. The sheer size of the Internet meant that identifying potentially offensive content had to be done mechanically, by matching "key" words and phrases; hence the blocking of Web sites for "Middlesex County," or words such as "magna cum laude". Internet filters are crude and error-prone because they categorize expression without regard to its context, meaning, and value. Yet these sweeping censorship tools are now widely used in companies, homes, schools, and libraries. Internet filters remain a pressing public policy issue for all those concerned about free expression, education, culture, and democracy. This fully revised and updated report surveys tests and studies of Internet filtering products from the mid-1990s through 2006. It provides an essential resource for the ongoing debate.
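The over-blocking the report describes is easy to reproduce with a naive substring filter; the blocklist and helper below are purely illustrative:

```python
# Purely illustrative blocklist; real products shipped far larger lists.
BLOCKLIST = {"sex", "cum"}

def naive_filter(text):
    """Block a page if any blocklisted keyword appears as a raw substring."""
    lower = text.lower()
    return any(word in lower for word in BLOCKLIST)

print(naive_filter("Welcome to Middlesex County"))    # True  (false positive)
print(naive_filter("She graduated magna cum laude"))  # True  (false positive)
print(naive_filter("Local library opening hours"))    # False
```

Matching on word boundaries would fix these two examples but still misjudges legitimate uses of the words themselves, which is precisely the context-blindness the report criticizes.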