    Spherical microaggregation: anonymizing sparse vector spaces

    Unstructured text is a very popular data type that remains largely unexplored in the privacy-preserving data mining field. We consider the problem of providing public information about a set of confidential documents. To that end, we have developed a method to protect a Vector Space Model (VSM), so that it can be made public even if the documents it represents are private. This method is inspired by microaggregation, a popular protection method from statistical disclosure control, and is adapted to work with sparse and high-dimensional data sets.
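
    As a rough illustration of the idea, the sketch below groups unit-normalized vectors into clusters of at least k records under cosine distance and replaces each record with its group's spherical centroid, which is the essence of microaggregation. This is a minimal MDAV-flavoured sketch assuming dense NumPy input; the function names and the farthest-point seeding heuristic are illustrative assumptions, not the authors' actual algorithm.

        import numpy as np

        def normalize(X):
            # Project rows onto the unit sphere so that cosine distance
            # between rows reduces to 1 - dot product.
            norms = np.linalg.norm(X, axis=1, keepdims=True)
            return X / np.where(norms == 0.0, 1.0, norms)

        def spherical_microaggregate(X, k):
            X = normalize(np.asarray(X, dtype=float))
            out = np.empty_like(X)
            remaining = list(range(len(X)))
            rep = normalize(X.mean(axis=0, keepdims=True))[0]
            while len(remaining) >= k:
                centroid = X[remaining].mean(axis=0)
                # The record farthest from the running centroid seeds a new
                # group made of its k nearest neighbours (itself included).
                seed = max(remaining, key=lambda i: 1.0 - X[i] @ centroid)
                group = sorted(remaining, key=lambda i: 1.0 - X[i] @ X[seed])[:k]
                rep = normalize(X[group].mean(axis=0, keepdims=True))[0]
                for i in group:
                    out[i] = rep          # every group member shares one representative
                    remaining.remove(i)
            for i in remaining:           # fewer than k records left: reuse last representative
                out[i] = rep
            return out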

    Measuring context dependency in birdsong using artificial neural networks

    Context dependency is a key feature of the sequential structure of human language, which requires reference between words far apart in the produced sequence. Assessing how far back the past context affects the current state provides crucial information for understanding the mechanisms behind complex sequential behaviors. Birdsong serves as a representative model for studying context dependency in sequential signals produced by non-human animals, although previous estimates were upper-bounded by methodological limitations. Here, we estimated the context dependency in birdsong in a more scalable way, using a modern neural-network-based language model whose accessible context length is sufficiently long. The detected context dependency was beyond the order of traditional Markovian models of birdsong, but was consistent with previous experimental investigations. We also studied the relation between the assumed/auto-detected vocabulary size of birdsong (i.e., fine- vs. coarse-grained syllable classifications) and the context dependency. It turned out that the larger the assumed vocabulary (i.e., the more fine-grained the classification), the shorter the detected context dependency.
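
    A hedged sketch of the measurement protocol described above: feed a trained language model histories truncated to increasing lengths and report the smallest context length whose predictive loss is within a tolerance of the full-context loss. `lm_logprob` is a hypothetical stand-in for the paper's neural language model.

        def mean_neg_logprob(songs, lm_logprob, context_len):
            # Average per-syllable negative log-probability when the model
            # only sees the last `context_len` syllables of history.
            total, count = 0.0, 0
            for seq in songs:                        # each song: a list of syllable labels
                for t in range(1, len(seq)):
                    history = seq[max(0, t - context_len):t]
                    total -= lm_logprob(history, seq[t])
                    count += 1
            return total / count

        def context_dependency(songs, lm_logprob, max_len=200, tol=1e-3):
            # Smallest truncation length whose loss is within `tol`
            # of the loss obtained with the full available context.
            full = mean_neg_logprob(songs, lm_logprob, max_len)
            for length in range(1, max_len + 1):
                if mean_neg_logprob(songs, lm_logprob, length) - full <= tol:
                    return length
            return max_len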

    On the difficulty of automatically detecting irony: beyond a simple case of negation

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10115-013-0652-8. It is well known that irony is one of the most subtle devices used to deny, in a refined way and without a negation marker, what is literally said. As such, its automatic detection would represent valuable knowledge for tasks as diverse as sentiment analysis, information extraction, and decision making. The research described in this article focuses on identifying key components that represent the underlying characteristics of this linguistic phenomenon. In the absence of a negation marker, we represent the core of irony by means of three conceptual layers, which involve eight different textual features. By representing four available data sets with these features, we look for hints about how to deal with this largely unexplored task from a computational point of view. Our findings are assessed by human annotators in two strata: isolated sentences and entire documents. The results show how complex and subjective the task of automatically detecting irony can be. The research work of Paolo Rosso was done in the framework of the European Commission WIQ-EI Web Information Quality Evaluation Initiative project (IRSES grant no. 269180) within the FP7 Marie Curie People programme, the DIANA-APPLICATIONS - Finding Hidden Knowledge in Texts: Applications project (TIN2012-38603-C02-01), and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. Reyes Pérez, A.; Rosso, P. (2014). On the difficulty of automatically detecting irony: beyond a simple case of negation. Knowledge and Information Systems 40(3):595-614. https://doi.org/10.1007/s10115-013-0652-8
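
    To make the layered representation concrete, here is an illustrative sketch that maps each document to a small vector of surface cues and trains a standard classifier on it. The five features below are simplified placeholders standing in for the paper's eight features, whose exact definitions are not reproduced here.

        import re
        from sklearn.linear_model import LogisticRegression

        def irony_features(text):
            # Simplified surface cues; placeholders, not the paper's features.
            tokens = text.split()
            return [
                text.count("!") + text.count("?"),                   # emphatic punctuation
                len(re.findall(r"\.\.\.", text)),                    # suspension points
                sum(w.isupper() and len(w) > 1 for w in tokens),     # fully capitalized words
                len(re.findall(r'["“”]', text)),                     # quotation marks
                sum(len(w) for w in tokens) / max(len(tokens), 1),   # mean word length
            ]

        def train_irony_classifier(texts, labels):
            X = [irony_features(t) for t in texts]
            return LogisticRegression(max_iter=1000).fit(X, labels)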

    Pattern-Based Vulnerability Discovery

    Contribution to privacy-enhancing technologies for machine learning applications

    For some time now, big data applications have been enabling revolutionary innovation in every aspect of our daily life by taking advantage of the vast amounts of data generated by users' interactions with technology. Supported by machine learning and unprecedented computation capabilities, different entities are capable of efficiently exploiting such data to obtain significant utility. However, since personal information is involved, these practices raise serious privacy concerns. Although multiple privacy protection mechanisms have been proposed, several challenges must be addressed for these mechanisms to be adopted in practice, i.e., to be "usable" beyond the privacy guarantee offered. To start, the real impact of privacy protection mechanisms on data utility is not clear, so an empirical evaluation of this impact is crucial. Moreover, since privacy is commonly obtained through the perturbation of large data sets, usable privacy technologies may require not only preservation of data utility but also algorithms that are efficient in terms of computation speed. Satisfying both requirements is key to encouraging the adoption of privacy initiatives. Although considerable effort has been devoted to designing less "destructive" privacy mechanisms, the utility metrics employed may not be appropriate, in which case the quality of such mechanisms would be measured incorrectly. On the other hand, despite the advent of big data, more efficient approaches are not being considered. Not complying with the requirements of current applications may hinder the adoption of privacy technologies. In the first part of this thesis, we address the problem of measuring the effect of k-anonymous microaggregation on the empirical utility of microdata. We quantify utility as the accuracy of classification models learned from microaggregated data and evaluated over original test data. Our experiments show that the impact of the de facto microaggregation standard on the performance of machine-learning algorithms is often minor for a variety of data sets. Furthermore, experimental evidence suggests that the traditional measure of distortion in the microdata anonymization community may be inappropriate for evaluating the utility of microaggregated data. Secondly, we address the problem of preserving the empirical utility of data. By transforming the original data records into a different data space, our approach, based on linear discriminant analysis, enables k-anonymous microaggregation to be adapted to the application domain of the data. To do this, the data is first rotated (projected) towards the direction of maximum discrimination and then scaled in this direction, penalizing distortion across the classification threshold. As a result, data utility is preserved in terms of the accuracy of machine-learned models for a number of standardized data sets. Afterwards, we propose a mechanism to reduce the running time of the k-anonymous microaggregation algorithm, obtained by simplifying the internal operations of the original algorithm. Through extensive experimentation over multiple data sets, we show that the new algorithm is significantly faster. Interestingly, this remarkable speedup is achieved with no additional loss of data utility. Finally, in a more applied vein, we propose a tool for protecting the privacy of individuals and organizations by anonymizing sensitive data included in security logs. Several anonymization mechanisms are designed and implemented according to the definition of a privacy policy, in the context of a European project whose goal is to build a unified security system.
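
    The evaluation protocol of the first part lends itself to a compact sketch: anonymize only the training split, fit a classifier on it, and score it on the untouched test split. `microaggregate` below is a hypothetical stand-in for any k-anonymous microaggregation routine (e.g., an MDAV implementation); the rest follows the methodology as described.

        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score
        from sklearn.model_selection import train_test_split

        def empirical_utility(X, y, microaggregate, k=5):
            # Accuracy of a model trained on k-anonymized data but
            # evaluated on original, unperturbed test records.
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
            X_anon = microaggregate(X_tr, k)          # perturb the training split only
            model = RandomForestClassifier(random_state=0).fit(X_anon, y_tr)
            return accuracy_score(y_te, model.predict(X_te))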

    Privacy preserving publishing of hierarchical data

    Many applications today rely on the storage and management of semi-structured information, e.g., XML databases and document-oriented databases. This data often has to be shared with untrusted third parties, which makes individuals' privacy a fundamental problem. In this thesis, we propose anonymization techniques for privacy-preserving publishing of hierarchical data. We show that the problem of anonymizing hierarchical data poses unique challenges that cannot be readily solved by existing mechanisms. We address these challenges by utilizing two major privacy techniques: generalization and anatomization. Data generalization encapsulates data by mapping low-level values (e.g., influenza) to higher-level concepts (e.g., respiratory system diseases). Using generalization and suppression of data values, we revise two standards for privacy protection: k-anonymity, which hides individuals within groups of k members, and ℓ-diversity, which bounds the probability of linking sensitive values with individuals. We then apply these standards to hierarchical data and present utility-aware algorithms that enforce them. To evaluate our algorithms and their heuristics, we experiment on synthetic and real datasets obtained from two universities. Our experiments show that we significantly outperform related methods that provide comparable privacy guarantees. Data anatomization masks the link between identifying attributes and sensitive attributes. This mechanism removes the need for generalization and opens up the possibility of higher utility. Even so, anatomization has not previously been proposed for hierarchical data, where utility is a serious concern due to high dimensionality. In this thesis we show how one can perform the non-trivial task of defining anatomization in the context of hierarchical data. Moreover, we extend the definition of classical ℓ-diversity and introduce (p,m)-privacy, which bounds by p the probability of being linked to more than m occurrences of any sensitive value. Again, in our experiments we observed that even under stricter privacy conditions our method performs exemplarily.
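
    A toy sketch of the generalization step, assuming a fixed taxonomy per attribute: level 0 keeps the raw value, and higher levels climb toward the suppression symbol "*". The hierarchy contents and function name are illustrative, not the thesis' implementation.

        # Illustrative taxonomy: leaf value -> ancestors, most specific first.
        HIERARCHY = {
            "influenza":  ["respiratory system diseases", "diseases", "*"],
            "bronchitis": ["respiratory system diseases", "diseases", "*"],
            "gastritis":  ["digestive system diseases",   "diseases", "*"],
        }

        def generalize(value, level):
            # Level 0 returns the raw value; deeper levels return ever
            # coarser ancestors, capped at full suppression ("*").
            if level == 0:
                return value
            ancestors = HIERARCHY.get(value, ["*"])
            return ancestors[min(level - 1, len(ancestors) - 1)]

        assert generalize("influenza", 0) == "influenza"
        assert generalize("influenza", 1) == "respiratory system diseases"
        assert generalize("influenza", 9) == "*"      # beyond the hierarchy: suppressed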

    Neural and Non-Neural Approaches to Authorship Attribution

    Semantic attack on anonymised transaction data

    Publishing data about individuals is a double-edged sword; it can provide significant benefit to a range of organisations, helping them understand issues concerning individuals and improve the services they offer. However, it can also represent a serious threat to individuals' privacy. To counter these threats, researchers have worked on developing anonymisation methods. However, these anonymisation methods do not take into consideration the semantic relationships and meaning of data, which can be exploited by attackers to expose protected data. In our work, we study a specific anonymisation method called disassociation and investigate whether it provides adequate protection for transaction data. The disassociation method hides sensitive links between transaction items by dividing them into chunks. We propose a de-anonymisation approach for attacking transaction data anonymised by the disassociation method. The approach exploits the semantic relationships between transaction items to reassociate them. Our findings reveal that the disassociation method may not effectively protect transaction data: our de-anonymisation approach can recombine approximately 60% of the disassociated items and can break the privacy of nearly 70% of the protected itemsets in disassociated transactions.
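
    A hedged sketch of the reassociation idea: given two item chunks published by disassociation for the same record, greedily link items across chunks whenever their semantic similarity clears a threshold. `similarity` is a hypothetical stand-in (e.g., cosine similarity of word embeddings), and the greedy one-to-one matching is an illustrative simplification of the attack.

        from itertools import product

        def reassociate(chunk_a, chunk_b, similarity, threshold=0.6):
            # Score every cross-chunk pair, then greedily keep the most
            # semantically plausible one-to-one links.
            scored = sorted(((similarity(a, b), a, b)
                             for a, b in product(chunk_a, chunk_b)), reverse=True)
            linked, used_a, used_b = [], set(), set()
            for score, a, b in scored:
                if score >= threshold and a not in used_a and b not in used_b:
                    linked.append((a, b, score))
                    used_a.add(a)
                    used_b.add(b)
            return linked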