    Leveraging Recursive Neural Networks on Dependency Trees for Online-Toxicity Detection on Twitter

    Current social dynamics are strongly linked to what happens on Social Media. Opinions, emotions, and how people perceive the world around them are strongly influenced by what they see or read on Social Platforms. This field includes Social Media phenomena such as Fake News, Hate Speech, Propaganda, and Race and Gender biases. These phenomena are considered among the most significant threats to social stability and among the most effective means of influencing people. Much work has been done by researchers from different areas of Computer Science, in particular Natural Language Processing and Network Analysis, focusing on textual information in the first case (articles, posts, comments, etc.) and on graph structures and node activities in the second (detection of malicious spreaders, polarization, etc.). In this thesis, we clarify the main problems in this area of research, commonly known as Computational Social Science, and provide the theoretical basis of its most widely used tools. We then turn to the detection of toxic messages on Twitter at the level of the single tweet, comparing different Deep Learning models, including some novel solutions that we propose, in order to answer the following question: can Natural Language syntax be useful in such a task? Unlike in Sentiment Analysis, for instance, high performance has not yet been achieved on this task, largely because the models typically used tend, given a sentence, to focus on the words that occur rather than on the meaning of the sentence itself. Our idea starts from the assumption that exploiting syntactic information can help overcome this obstacle. Finally, we present the results of our experiments and possible interpretations of them, offer scientific and ethical reflections, and try to convince the reader of why research should invest effort in this topic and which future scenarios we should focus on.

    Advanced deep learning for medical image segmentation: Towards global and data-efficient learning

    (at)america.jp: Identity, nationalism, and power on the Internet, 1969-2000

    america.jp explores identity, nationalism, and power on the Internet between 1969 and 2000 through a cultural analysis of Internet code and the creative processes behind it. The dissertation opens with an examination of a real-time Internet Blues jam that linked Japanese and American musicians between Tokyo and Mississippi in 1999. The technological, cultural, and linguistic uncertainties that characterized the Internet jam, combined with the inventive reactions of the musicians who participated, help to introduce the fundamental conceptual question of the dissertation: is code a cultural product and if so can the Internet be considered a distinctly American technology? A comparative study of the Internet's origins in the United States and Japan finds that code is indeed a cultural entity but that it is a product not of one nation, but of many. A cultural critique of the Internet's domain name conventions explores the heavily-gendered creation of code and the institutional power that supports it. An ethnography of the Internet's managing organization, The Internet Corporation for Assigned Names and Numbers (ICANN), investigates conflicts and identity formation within and among nations at a time when new Internet technologies have blurred humans' understanding of geographic boundaries. In the year 2000, an effort to prevent United States domination of ICANN produced unintended consequences: disputes about the definition of geographic regions and an eruption of anxiety, especially in China, that the Asian seat on the ICANN board would be dominated by Japan. These incidents indicate that the Internet simultaneously destabilizes identity and ossifies it. In this paradoxical situation, cultures and the people in them are forced to reconfigure the boundaries that circumscribe who they think they are.

    Alzheimer’s Dementia Recognition Through Spontaneous Speech

    OddAssist - An eSports betting recommendation system

    It is widely accepted that sports betting has been around for as long as sport itself. Back in the 1st century, circuses hosted chariot races and fans would bet on who they thought would emerge victorious. As technology evolved, so did sports and, above all, bookmakers. Owing to mass digitization, bookmakers are now available online, from anywhere, which makes this market inherently more tempting. In fact, this transition has propelled sports betting into a multi-billion-dollar industry that can rival the sports industry. Similarly, younger generations are increasingly attached to the digital world, including electronic sports (eSports). In fact, young men are more likely to follow eSports than traditional sports. Counter-Strike: Global Offensive, the video game on which this dissertation focuses, is one of the pillars of this industry: during 2022, 15 million dollars were distributed in tournament prizes and there was a peak of 2 million concurrent viewers. This factor, combined with the digitization of bookmakers, makes the eSports betting market extremely appealing for exploring machine learning techniques, since young people who follow this type of sport also find it easy to bet online. In this dissertation, a betting recommendation system is proposed, implemented, tested, and validated, which considers the match history of each team, the odds of several bookmakers, and the general sentiment of fans in a discussion forum. The individual machine learning models achieved good results by themselves. More specifically, the match history model achieved an accuracy of 66.66% with an expected calibration error of 2.10%, and the bookmaker odds model achieved an accuracy of 65.05% with an expected calibration error of 2.53%. Combining the models through stacking increased the accuracy to 67.62% but worsened the expected calibration error to 5.19%. 
By contrast, merging the datasets and training a new model on the combined data yielded an accuracy of 66.81% with an expected calibration error of 2.67%. The solution is thoroughly tested in a betting simulation covering 2500 matches. The system's final odds are compared with the odds of the bookmakers and the expected long-term return is computed. A bet is placed only if that expected return is above a certain threshold. This strategy, called positive expected value betting, was applied at multiple thresholds and the results were compared. While the stacking solution did not perform well in a betting environment, the match history model achieved profits from 8% to 90%, the odds model had profits ranging from 13% to 211%, and the dataset merging solution profited from 11% to 77%, all depending on the minimum expected value threshold. This work therefore resulted in several machine learning approaches capable of profiting from Counter-Strike: Global Offensive bets in the long term.
It is widely accepted that sports betting has existed for as long as sport itself. Even in the first century, circuses hosted chariot races and fans bet on who they thought would emerge victorious, much like today's horse races. As technology evolved, so did sports and, above all, bookmakers. Owing to the wave of mass digitization, bookmakers became available online, from anywhere, which makes this market inherently more tempting. Indeed, this transition propelled sports betting into a multi-billion-dollar industry that can now be compared to the sports industry itself. Similarly, younger generations are increasingly connected to the digital world, including digital sports (eSports). 
Counter-Strike: Global Offensive, the video game on which this dissertation focuses, is one of the major drivers of this industry: during 2022, 15 million dollars were distributed in tournament prizes and there was a peak of 2 million concurrent viewers. Although this reality is not as pronounced in Portugal, in several countries young adult men are more likely to follow eSports than traditional sports. This factor, together with the digitization of bookmakers, makes the eSports betting market very appealing for exploring machine learning techniques, since young people who follow this type of sport find it easy to bet online. This dissertation proposes, implements, tests, and validates a betting recommendation system that considers each team's match history, the odds of several bookmakers, and the general sentiment of fans in a discussion forum, HLTV. To this end, three machine learning systems were initially developed. To evaluate them, the period from October 2020 to March 2023 was considered, which corresponds to 2500 matches. However, because the test period is so long, the competitiveness of the teams varies considerably within it. To prevent the models from becoming obsolete during the test period, they were therefore retrained at least once a month throughout it. The first machine learning system makes predictions from previous results, that is, the history of matches between the teams. The best solution was to incorporate the players into the prediction, together with the team ranking, while giving more weight to the most recent matches. This approach, using logistic regression, achieved an accuracy of 66.66% with an expected calibration error of 2.10%. 
The second system compiles the odds of the various bookmakers and makes predictions based on patterns in their variations. In this case, incorporating the individual bookmakers reached an accuracy of 65.88% using logistic regression; however, this model was worse calibrated than the model that used the average of the odds with a gradient boosting machine, which showed an accuracy of 65.06% but better calibration metrics, with an expected calibration error of 2.53%. The third system is based on the sentiment of fans on the HLTV forum. First, GPT 3.5 is used to extract the sentiment of each comment, with an overall accuracy of 84.28%. Considering only the comments classified as conclusive, however, the accuracy is 91.46%. Once classified, the comments are passed to a support vector machine model that incorporates the commenter and their accuracy on previous matches. This solution correctly predicted only 59.26% of cases, with an expected calibration error of 3.22%. To aggregate the predictions of these three models, two approaches were tested. The first trains a new model on the predictions of the others (stacking), obtaining an accuracy of 67.62% but an expected calibration error of 5.19%. The second aggregates the data used to train the three individual models and trains a new model on this more complex dataset. This approach, using a support vector machine, obtained a lower accuracy, 66.81%, but also a lower expected calibration error, 2.67%. Finally, the approaches are put to the test in a betting simulator, where each system makes a prediction and compares it with the odds offered by the bookmakers. The simulation is run for several minimum expected return thresholds, and the systems bet only if the expected rate of return of the odds is above the threshold. 
This final odd is then compared with the bookmakers' odds and, if a bookmaker offers higher odds, a bet is placed. This strategy is called positive expected value betting, that is, betting on odds that are too high relative to the probability of the outcome occurring and that generate profit in the long term. In this simulation, for a minimum rate of 5%, the best results came from the models built from bookmaker odds, with profits between 13% and 211%, followed by the historical data model, which profited between 8% and 90%, and finally the composite model, with profits between 11% and 77%. This work thus resulted in several machine-learning-based systems capable of making a long-term profit from betting on Counter-Strike: Global Offensive.
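
The positive expected value rule described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's implementation: it assumes decimal odds, a unit stake, and a model that outputs a win probability, and the names `expected_value`, `pick_bet`, and `min_ev` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Bet:
    bookmaker: str
    odds: float
    expected_value: float


def expected_value(p_win: float, decimal_odds: float) -> float:
    """Expected profit per unit staked, assuming decimal odds.

    A winning bet returns `decimal_odds` per unit staked (stake included),
    a losing bet returns nothing, so EV = p_win * decimal_odds - 1.
    """
    return p_win * decimal_odds - 1.0


def pick_bet(p_win: float, offers: Dict[str, float], min_ev: float = 0.05) -> Optional[Bet]:
    """Return the offer with the highest EV above `min_ev`, or None.

    `offers` maps a (hypothetical) bookmaker name to its decimal odds for
    the outcome whose probability the model estimates as `p_win`.
    """
    best: Optional[Bet] = None
    for bookmaker, odds in offers.items():
        ev = expected_value(p_win, odds)
        if ev > min_ev and (best is None or ev > best.expected_value):
            best = Bet(bookmaker, odds, ev)
    return best
```

With a model probability of 0.60 and offers of 1.70 and 1.80, only the second offer clears a 5% threshold, since 0.60 × 1.80 − 1 = 0.08 while 0.60 × 1.70 − 1 = 0.02. Raising `min_ev` makes the rule bet less often but only on larger estimated edges, which is the threshold trade-off the simulation explores.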

    Beyond Quantity: Research with Subsymbolic AI

    How do artificial neural networks and other forms of artificial intelligence interfere with methods and practices in the sciences? Which interdisciplinary epistemological challenges arise when we think about the use of AI beyond its dependency on big data? Not only the natural sciences, but also the social sciences and the humanities seem to be increasingly affected by current approaches of subsymbolic AI, which master problems of quality (fuzziness, uncertainty) in a hitherto unknown way. But what are the conditions, implications, and effects of these (potential) epistemic transformations, and how must research on AI be configured to address them adequately?

    Designing a Patient-Centered Clinical Workflow to Assess Cyberbully Experiences of Youths in the U.S. Healthcare System

    Cyberbullying, or online harassment, is often defined as repeatedly and intentionally harassing, mistreating, or making fun of others using electronic devices, with the aim of scaring, angering, or shaming them [296]. Youths experiencing cyberbullying report higher levels of anxiety and depression, mental distress, suicidal thoughts, and substance abuse than their non-bullied peers [360, 605, 261, 354]. Even though bullying is associated with significant health problems, to date very few youth anti-bullying efforts have been initiated and directed in clinical settings. There is presently no standardized procedure or workflow across health systems for systematically assessing cyberbullying or other equally dangerous online activities among vulnerable groups like children or adolescents [599]. Therefore, I developed a series of research projects to link digital indicators of cyberbullying or online harassment to clinical practices by advocating design considerations for a patient-centered clinical assessment and workflow that addresses patients' needs and expectations to ensure quality care. Through this dissertation, I aim to answer these high-level research questions: RQ1. How does the presence of severe online harassment on online platforms contribute to negative experiences and risky behaviors within vulnerable populations? RQ2. How efficient is the current mechanism for screening these risky negative online experiences and behaviors, specifically those related to cyberbullying, within at-risk populations like adolescents in clinical settings? RQ3. How might evidence of activities and negative harassing experiences on online platforms best be integrated into electronic health records during clinical treatment? 
I first explore how harassment is presented within different social media platforms from diverse contexts and cultural norms (studies 1, 2, and 3); next, by analyzing actual patient data, I address current limitations in the screening process in clinical settings, which fails to efficiently address core aspects of cyberbullying and their consequences for adolescent patients (studies 4 and 5); finally, connecting all my findings, I recommend specific design guidelines for a refined screening tool and structured processes for implementing and integrating the screened data into patients' electronic health records (EHRs), for better assessment and treatment outcomes around cyberbullying in adolescent patients (study 6).

    Word Knowledge and Word Usage

    Word storage and processing define a multi-factorial domain of scientific inquiry whose thorough investigation goes well beyond the boundaries of traditional disciplinary taxonomies and requires the synergic integration of a wide range of methods, techniques, and empirical and experimental findings. The present book approaches a few central issues concerning the organization, structure, and functioning of the Mental Lexicon by asking domain experts to look at common, central topics from complementary standpoints and to discuss the advantages of developing converging perspectives. The book explores the connections between computational and algorithmic models of the mental lexicon, word frequency distributions and information-theoretical measures of word families, statistical correlations across psycho-linguistic and cognitive evidence, principles of machine learning, and integrative brain models of word storage and processing. The main goal of the book is to map out the landscape of future research in this area, to foster the development of interdisciplinary curricula, and to help single-domain specialists understand and address issues and questions as they are raised in other disciplines.