DeeProBot: a hybrid deep neural network model for social bot detection based on user profile data
Use of online social networks (OSNs) undoubtedly brings the world closer. OSNs like Twitter provide a space for expressing one's opinions on a public platform. This great potential is misused by the creation of bot accounts, which spread fake news and manipulate opinions. Hence, distinguishing genuine human accounts from bot accounts has become a pressing issue for researchers. In this paper, we propose a deep-learning-based framework to classify Twitter accounts as either 'human' or 'bot'. We use information from the user profile metadata of the Twitter account, such as the description, follower count, and tweet count. We name the framework 'DeeProBot', which stands for Deep Profile-based Bot detection framework. The raw text from the description field of the Twitter account is also used as a feature for training the model, embedded using pre-trained Global Vectors (GloVe) for word representation. Using only user profile-based features considerably reduces the feature-engineering overhead compared with user timeline-based features such as tweets and retweets. DeeProBot handles mixed types of features, including numerical, binary, and text data, making the model hybrid. The network is designed with long short-term memory (LSTM) units and dense layers to accept and process the mixed input types. The proposed model is evaluated on a collection of publicly available labeled datasets and is designed to generalize across different datasets. The model is evaluated in two ways: testing on a hold-out set of the same dataset, and training with one dataset and testing with a different one. With these experiments, the proposed model achieved an AUC as high as 0.97 with a selected set of features.
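The hybrid architecture the abstract describes can be sketched as a plain NumPy forward pass: GloVe-style embeddings feed an LSTM over the description tokens, and the final hidden state is concatenated with numerical/binary profile features before a dense sigmoid output. All sizes, weights, and feature choices below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper)
VOCAB, EMB, HIDDEN, N_NUM = 50, 8, 16, 3  # vocab, embedding dim, LSTM units, profile features

# Random table standing in for pre-trained GloVe word vectors
embedding = rng.normal(size=(VOCAB, EMB))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(tokens, W, U, b):
    """Run a single-layer LSTM over a token-id sequence; return the last hidden state."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for t in tokens:
        x = embedding[t]
        z = W @ x + U @ h + b  # all four gates stacked: i, f, o, g
        i, f, o = (sigmoid(z[k * HIDDEN:(k + 1) * HIDDEN]) for k in range(3))
        g = np.tanh(z[3 * HIDDEN:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Random weights for the sketch (an untrained network)
W = rng.normal(size=(4 * HIDDEN, EMB)) * 0.1
U = rng.normal(size=(4 * HIDDEN, HIDDEN)) * 0.1
b = np.zeros(4 * HIDDEN)
W_out = rng.normal(size=(HIDDEN + N_NUM,)) * 0.1  # dense head over [text ++ profile]

def predict(description_tokens, profile_features):
    """Hybrid input: LSTM-encoded description text concatenated with profile features."""
    text_vec = lstm_forward(description_tokens, W, U, b)
    mixed = np.concatenate([text_vec, profile_features])
    return sigmoid(W_out @ mixed)  # probability the account is a bot

# Hypothetical account: token ids + scaled follower count, tweet count, verified flag
p = predict([3, 17, 42], np.array([0.5, 1.0, 0.0]))
```

The sigmoid output keeps the score in (0, 1), so a threshold (e.g. 0.5) yields the 'human'/'bot' decision.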
Crowdsensing-driven route optimisation algorithms for smart urban mobility
Urban mobility is often considered one of the main enablers of sustainable urban development. Today, however, it would require a significant shift towards cleaner and more efficient urban transport, which would support the growing social and economic concentration of resources in cities. A key priority for cities around the world is to support citizens' mobility within urban environments while reducing congestion, accidents, and pollution. Developing a more efficient and greener (or, in a word, smarter) urban mobility is one of the hardest problems facing large metropolitan areas. In this thesis, we approach the problem from the perspective of the rapidly evolving urban ICT landscape, which makes it possible to build mobility solutions without major investments or sophisticated sensor technology.
In particular, we propose leveraging the Mobile Crowdsensing (MCS) paradigm, in which citizens use their mobile communication devices and/or mobile sensors to voluntarily collect, distribute, locally process, and analyze geo-referenced information. Mobility-sensing data (e.g. events, traffic intensity, noise, and air pollution) gathered from volunteers among the population can provide valuable information about current mobility conditions in a city, which, with adequate data-processing algorithms, can be used to route and manage people's movement flows within the urban environment.
This thesis therefore combines two very promising Smart Mobility Enablers, namely MCS and travel/route planning, and thereby unites, to some extent, the research challenges of the two. We divide our research objectives into two stages: (1) architectural challenges in the design of MCS systems and (2) algorithmic challenges in applications of MCS-driven route planning.
We aim to show a logical research progression over time, starting from human-driven mobility-sensing systems such as MCS and ending with automated route-optimization algorithms tailored to specific MCS applications. Although we rely on heuristic solutions and algorithms for NP-hard routing problems, we use real-world applications to demonstrate the benefits of the proposed algorithms and infrastructure.
Exploring Societal Computing based on the Example of Privacy
Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. This thesis consists of the following four projects, which aim to address issues of privacy and software engineering.
First, only a little is known about how users and developers perceive privacy and which concrete measures would mitigate their privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and their personal data such as location than about their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives.
Second, a challenge related to the above is to make privacy more understandable in complex systems that may have a variety of user interface options, which may change often. As social network platforms have evolved, the ability for users to control how and with whom information is being shared introduces challenges concerning the configuration and comprehension of privacy settings. To address these concerns, our crowdsourced approach simplifies the understanding of privacy settings by using data collected from 512 users over a 17-month period to generate visualizations that allow users to compare their personal settings to an arbitrary subset of individuals of their choosing. To validate our approach, we conducted an online survey with closed and open questions, collected 59 valid responses, and then conducted follow-up interviews with 10 respondents. Our results showed that 70% of respondents found visualizations using crowdsourced data useful for understanding privacy settings, and 80% preferred a crowdsourced tool for configuring their privacy settings over current privacy controls.
Third, as software evolves over time, this might introduce bugs that breach users' privacy. Further, there might be system-wide policy changes that could change users' settings to be more or less private than before. We present a novel technique that can be used by end-users for detecting changes in privacy, i.e., regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective.
Fourth, approaches to addressing these privacy concerns typically require substantial extra computational resources, which might be beneficial where privacy is concerned but may have a significant negative impact with respect to Green Computing and sustainability, another major societal concern. Spending more computation time results in spending more energy and other resources, making the software system less sustainable. Ideally, we would like techniques for designing software systems that address these privacy concerns but are also sustainable: systems where privacy can be achieved "for free", i.e., without having to spend extra computational effort. We describe how privacy can indeed be achieved for free, as an accidental and beneficial side effect of doing some existing computation, in web applications and online systems that have access to user data. We show the feasibility, sustainability, and utility of our approach and the types of privacy threats it can mitigate.
Finally, we generalize the problem of privacy and its tradeoffs. As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and on using artifacts derived from it to provide recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incurs societal costs, particularly with regard to privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, Societal Computing, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence, as they will provide useful avenues of research leading to increasing benefits for society as a whole.
Communal knowledge building and networked expertise on Twitter: Case #okfest
Aims. This qualitative study explored the phenomenon of epistemic communality around a Twitter hashtag. The primary aim of the study was to explore communal epistemic production on the Twitter platform, especially in the context of a mutually shared hashtag. The study explored the peer production of knowledge and epistemic structures in the context of a specialist domain collaborating on the open Web. The secondary aim was to explore how Twitter functions as a platform for networked expertise and as a public agora for practitioners' expert discourse. This nascent mode of cultural production leads to the development of expert cultures on Twitter and on the open Web, creating new contexts for informal collaborative learning and cultural production that potentially answer some of the competence challenges presented by the 21st century.
Methods. The hashtag #okfest was launched for the 'Open Knowledge Festival' conference held in Helsinki, Finland (17–22.9.2012). The participants of the study were open knowledge practitioners who participated in the hashtag discourse of #okfest on Twitter. All public tweets containing the string '#okfest' were collected as data. Tweets were analyzed with qualitative thematic analysis exploring the epistemic contributions either included in the tweets or as hyperlinked attachments.
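The data-collection step described above — gathering all public tweets containing the string '#okfest' — amounts to a simple filter over tweet records, which can be sketched as follows. The record field names ('text', 'is_public') are illustrative assumptions, not the study's actual data schema.

```python
def collect_hashtag_tweets(tweets, tag="#okfest"):
    """Keep only public tweets whose text contains the hashtag string.

    `tweets` is assumed to be an iterable of dicts with 'text' and
    'is_public' keys (hypothetical field names). Matching is
    case-insensitive, so '#OKFest' and '#okfest' both qualify.
    """
    tag = tag.lower()
    return [t for t in tweets if t.get("is_public") and tag in t["text"].lower()]

sample = [
    {"text": "Greetings from #OKFest in Helsinki!", "is_public": True},
    {"text": "Unrelated tweet", "is_public": True},
    {"text": "#okfest slides online now", "is_public": False},
]
matches = collect_hashtag_tweets(sample)
# → only the first tweet matches (public and contains '#okfest')
```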
Results and conclusions. The analysis indicated how the hashtag was appropriated to serve as a node of communal knowledge sharing beyond mere reporting from the conference. The analysis observed six themes of communal knowledge building in the hashtag space. The communal epistemic activities in #okfest were likened to the properties of a community of practice (Wenger, 1998). A network of practitioners engaging in a mutual domain creates a dynamic 'social learning system' combining social interaction with the production and dissemination of knowledge. The study yielded a novel theoretical concept of 'expert microblogging', recognized as a significant genre of cultural production in a specialist domain on Twitter and in the open Web. Finally, the Twitter platform was ascertained as a site for the manifestation of cultures of networked expertise.
Abstraction and cartographic generalization of geographic user-generated content: use-case motivated investigations for mobile users
On a daily basis, a conventional internet user queries different internet services (available on different platforms) to gather information and make decisions. In most cases, knowingly or not, this user consumes data that has been generated by other internet users about his/her topic of interest (e.g. an ideal holiday destination for a family traveling by van for 10 days). Commercial service providers, such as search engines, travel booking websites, video-on-demand providers, and food takeaway mobile apps, have found it useful to rely on the data provided by other users who have commonalities with the querying user. Examples of commonalities are demography, location, interests, internet address, etc. This process has been in practice for more than a decade and helps service providers tailor their results based on the collective experience of the contributors. There has also been interest in different research communities (including GIScience) in analyzing and understanding the data generated by internet users.
The research focus of this thesis is on finding answers to real-world problems in which a user interacts with geographic information. The interactions can take the form of exploration, querying, zooming, and panning, to name but a few. We have aimed our research at investigating the potential of using geographic user-generated content to provide new ways of preparing and visualizing these data. Based on different scenarios that fulfill user needs, we have investigated the potential of finding new visual methods relevant to each scenario. The proposed methods are mainly based on pre-processing and analyzing data offered by data providers (both commercial and non-profit organizations). In all cases, however, the data was contributed by ordinary internet users in an active way (as opposed to passive data collection by sensors).
The main contributions of this thesis are proposals for new ways of abstracting geographic information based on user-generated content. Addressing different use-case scenarios and based on different input parameters, data granularities, and, evidently, geographic scales, we have provided proposals for contemporary users (with a focus on users of location-based services, or LBS). The findings are based on different methods, such as semantic analysis, density analysis, and data enrichment. If the findings of this dissertation are realized, LBS users will benefit by being able to explore large amounts of geographic information in more abstract and aggregated ways and to get results based on the contributions of other users. The research outcomes can be classified at the intersection of cartography, LBS, and GIScience. Based on our first use case, we proposed the inclusion of an extended semantic measure directly in the classic map generalization process. In our second use case, we focused on simplifying geographic data depiction by reducing the amount of information using a density-triggered method. Finally, the third use case focused on summarizing and visually representing relatively large amounts of information by depicting geographic objects matched to the salient topics that emerged from the data.
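The density-triggered reduction mentioned for the second use case can be illustrated with a minimal grid-based thinning sketch: points are binned into grid cells, and overcrowded cells are sub-sampled. The cell size, per-cell threshold, and random sub-sampling strategy are illustrative assumptions, not the dissertation's actual method.

```python
import random

def density_thin(points, cell_size=1.0, max_per_cell=2):
    """Reduce point clutter: keep at most `max_per_cell` points per grid cell.

    Each (x, y) point is assigned to a grid cell; cells whose point count
    exceeds the threshold are randomly sub-sampled, which lowers the depicted
    information density in crowded areas while leaving sparse areas intact.
    """
    cells = {}
    for p in points:
        key = (int(p[0] // cell_size), int(p[1] // cell_size))
        cells.setdefault(key, []).append(p)
    rng = random.Random(42)  # fixed seed for reproducibility
    kept = []
    for members in cells.values():
        if len(members) > max_per_cell:
            members = rng.sample(members, max_per_cell)
        kept.extend(members)
    return kept

# 100 points crowded into one unit cell, plus one isolated point
crowded = [(0.1 + 0.001 * i, 0.2) for i in range(100)] + [(5.0, 5.0)]
thinned = density_thin(crowded)
# → 2 survivors from the crowded cell + the isolated point = 3 points
```

At smaller map scales one would enlarge `cell_size` so that more points compete per cell, mirroring how generalization strength grows as the scale decreases.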
A Novel Design Science Approach for Integrating Chinese User-Generated Content in Non-Chinese Market Intelligence
Market research has long relied on reactive means of data gathering, such as questionnaires or focus groups. With the widespread use of social media, millions of comments containing customer opinions and feedback on products and brands are available. However, before this 'wisdom of the crowd' can be used as a source for marketing research, several challenges have to be tackled: the sheer volume of posts, their unstructured format, and the dozens of different languages used on the internet all make automated use of this data challenging. In this paper, we draw on dashboard design principles and follow a design science research approach to develop a framework for the search, integration, and analysis of cross-language user-generated content. With 'MarketMiner', we implement the framework in the automotive industry by analyzing Chinese auto forums. The results are promising in that MarketMiner can dramatically improve the utilization of foreign-language social media content for market intelligence purposes.