    DeeProBot: a hybrid deep neural network model for social bot detection based on user profile data

    Online social networks (OSNs) undoubtedly bring the world closer. OSNs like Twitter provide a space for expressing one’s opinions on a public platform. This great potential is misused through the creation of bot accounts, which spread fake news and manipulate opinions. Hence, distinguishing genuine human accounts from bot accounts has become a pressing issue for researchers. In this paper, we propose a deep learning framework to classify Twitter accounts as either ‘human’ or ‘bot.’ We use information from the user profile metadata of the Twitter account, such as description, follower count, and tweet count. We name the framework ‘DeeProBot,’ which stands for Deep Profile-based Bot detection framework. The raw text of the account's description field is also used as a feature for training the model, embedded using pre-trained Global Vectors (GloVe) for word representation. Using only user profile-based features considerably reduces the feature engineering overhead compared with user timeline-based features such as tweets and retweets. DeeProBot handles mixed feature types, including numerical, binary, and text data, which makes the model hybrid. The network is built from long short-term memory (LSTM) units and dense layers that accept and process the mixed input types. The proposed model is evaluated on a collection of publicly available labeled datasets and is designed to generalize across different datasets. It is evaluated in two ways: testing on a hold-out set of the same dataset, and training on one dataset and testing on a different one. In these experiments, the proposed model achieved an AUC as high as 0.97 with a selected set of features.
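For readers unfamiliar with the AUC figure quoted above, the following is a minimal sketch of how the metric can be computed in plain Python. It is not part of DeeProBot: the function name `roc_auc` and the toy labels and scores are assumptions made for illustration. In both evaluation settings described in the abstract, the same metric would be computed on the test portion of the data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Returns the fraction of (bot, human) pairs in which the bot account
    received the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]  # bot accounts
    neg = [s for y, s in zip(labels, scores) if y == 0]  # human accounts
    if not pos or not neg:
        raise ValueError("need at least one bot and one human account")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = bot, 0 = human.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 1.0 means every bot is scored above every human, while 0.5 corresponds to random ranking. The pairwise loop is quadratic and meant only to make the definition explicit.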

    Crowdsensing-driven route optimisation algorithms for smart urban mobility

    Urban mobility is often considered one of the main enablers of sustainable urban development. Today, however, it requires a significant shift towards cleaner and more efficient urban transport that can support an increasing concentration of social and economic resources in cities. A key priority for cities around the world is to support citizens' mobility within urban environments while reducing congestion, accidents, and pollution. Developing a more efficient and greener (or, in one word, smarter) urban mobility is one of the most difficult problems for large metropolitan areas to address. In this thesis, we approach the problem from the perspective of the rapidly evolving ICT landscape in cities, which makes it possible to build mobility solutions without large investments or sophisticated sensor technology. In particular, we propose to leverage the Mobile Crowdsensing (MCS) paradigm, in which citizens use their mobile communication and/or mobile sensing devices to voluntarily collect, distribute, locally process, and analyse geo-referenced information. Mobility data gathered from volunteers (e.g. events, traffic intensity, noise, and air pollution) can provide valuable information about current mobility conditions in the city, which, with adequate data processing algorithms, can be used to route and manage the flow of people within urban environments. This thesis therefore combines two very promising smart mobility enablers, MCS and journey/route planning, and to some extent brings together the research challenges of the two. We divide our research objectives into two stages: (1) architectural challenges in the design of MCS systems and (2) algorithmic challenges in applications of MCS-driven route planning. We aim to show a logical research progression over time, starting from human-centred sensing systems such as MCS and ending with automated route optimisation algorithms tailored to specific MCS applications. Although we rely on heuristic solutions and algorithms for NP-hard routing problems, we use real-world applications to demonstrate the advantages of the proposed algorithms and infrastructures.
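As an illustration of how crowdsensed (MCS) data could feed into route planning, here is a minimal sketch under stated assumptions. It is not one of the thesis's algorithms: it uses plain Dijkstra shortest-path search rather than the NP-hard routing variants the thesis addresses. The graph, the `reports` dictionary of congestion factors, and the function name `shortest_route` are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# node -> {neighbour: base travel time in minutes}
Graph = Dict[str, Dict[str, float]]
# (from_node, to_node) -> multiplicative slowdown reported by volunteers
Reports = Dict[Tuple[str, str], float]


def shortest_route(graph: Graph, reports: Reports,
                   start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra search where edge cost = base time * crowdsensed factor.

    Edges without a report keep their base time (factor 1.0).
    """
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for nxt, base in graph.get(node, {}).items():
            new_cost = cost + base * reports.get((node, nxt), 1.0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    raise ValueError(f"no route from {start} to {goal}")


# Hypothetical four-node street network.
city: Graph = {"A": {"B": 5.0, "C": 7.0}, "B": {"D": 5.0}, "C": {"D": 6.0}}
print(shortest_route(city, {}, "A", "D"))                 # no reports
print(shortest_route(city, {("B", "D"): 2.0}, "A", "D"))  # congestion on B->D
```

In this toy network a reported slowdown on one street segment is enough to change the recommended route, which is the basic effect an MCS-driven planner relies on.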

    Communal knowledge building and networked expertise on Twitter: Case #okfest

    Aims. This qualitative study explored the phenomenon of epistemic communality around a Twitter hashtag. The primary aim of the study was to explore communal epistemic production on the Twitter platform, especially in the context of a mutually shared hashtag. The study explored the peer-production of knowledge and epistemic structures in the context of a specialist domain collaborating in the open Web. The secondary aim was to explore how Twitter functions as a platform for networked expertise and as a public agora for practitioners' expert discourse. This nascent mode of cultural production leads to the development of expert cultures on Twitter and in the open Web. This creates new contexts for informal collaborative learning and cultural production, potentially answering some of the competence challenges presented by the 21st century. Methods. The hashtag #okfest was launched for the 'Open Knowledge Festival' conference held in Helsinki, Finland (17–22 September 2012). The participants of the study were open knowledge practitioners who participated in the hashtag discourse of #okfest on Twitter. All public tweets containing the string '#okfest' were collected as data. Tweets were analyzed with qualitative thematic analysis exploring the epistemic contributions either included in the tweets or as hyperlinked attachments. Results and conclusions. The analysis indicated how the hashtag was appropriated to serve as a node of communal knowledge sharing beyond mere reporting from the conference. The analysis identified six themes of communal knowledge building in the hashtag space. The communal epistemic activities in #okfest were likened to the properties of a community of practice (Wenger, 1998). A network of practitioners engaging in a mutual domain creates a dynamic 'social learning system' combining social interaction with the production and dissemination of knowledge. 
The study yielded a novel theoretical concept of 'expert microblogging', recognized as a significant genre of cultural production in a specialist domain on Twitter and in the open Web. Finally, the Twitter platform was identified as a site where cultures of networked expertise manifest.

    Abstraction and cartographic generalization of geographic user-generated content: use-case motivated investigations for mobile users

    On a daily basis, a conventional internet user queries different internet services (available on different platforms) to gather information and make decisions. In most cases, knowingly or not, this user consumes data that has been generated by other internet users about his/her topic of interest (e.g. an ideal holiday destination for a family traveling by van for 10 days). Commercial service providers, such as search engines, travel booking websites, video-on-demand providers, food takeaway mobile apps and the like, have found it useful to rely on the data provided by other users who have commonalities with the querying user. Examples of commonalities are demography, location, interests, internet address, etc. This process has been in practice for more than a decade and helps the service providers to tailor their results based on the collective experience of the contributors. There has also been interest in different research communities (including GIScience) in analyzing and understanding the data generated by internet users. The research focus of this thesis is on finding answers to real-world problems in which a user interacts with geographic information. The interactions can take the form of exploration, querying, zooming and panning, to name but a few. We have aimed our research at investigating the potential of geographic user-generated content to provide new ways of preparing and visualizing these data. Based on different scenarios that fulfill user needs, we have investigated the potential of finding new visual methods relevant to each scenario. The methods proposed are mainly based on pre-processing and analyzing data that has been offered by data providers (both commercial and non-profit organizations). In all cases, however, the data were actively contributed by ordinary internet users, as opposed to being passively collected by sensors. 
The main contributions of this thesis are proposals for new ways of abstracting geographic information based on user-generated content. Addressing different use-case scenarios and based on different input parameters, data granularities and, evidently, geographic scales, we have provided proposals for contemporary users (with a focus on the users of location-based services, or LBS). The findings are based on different methods such as semantic analysis, density analysis and data enrichment. If the findings of this dissertation are put into practice, LBS users will be able to explore large amounts of geographic information in more abstract and aggregated ways and obtain results based on the contributions of other users. The research outcomes lie at the intersection of cartography, LBS and GIScience. In our first use case, we proposed the inclusion of an extended semantic measure directly in the classic map generalization process. In our second use case, we focused on simplifying the depiction of geographic data by reducing the amount of information using a density-triggered method. Finally, the third use case focused on summarizing and visually representing relatively large amounts of information by depicting geographic objects matched to the salient topics that emerged from the data.
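To make the idea of density-triggered reduction more concrete, the following is a minimal sketch under stated assumptions. The abstract does not specify the thesis's actual method, so this is only one possible reading: points are binned into a square grid, and any cell holding more than a threshold number of points is replaced by a single aggregate marker. The function name `thin_by_density`, the grid approach, and the parameters `cell_size` and `max_per_cell` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]            # (x, y) in projected map units
Cluster = Tuple[float, float, int]     # (centroid x, centroid y, point count)


def thin_by_density(points: List[Point], cell_size: float,
                    max_per_cell: int) -> Tuple[List[Point], List[Cluster]]:
    """Collapse crowded grid cells into one aggregate marker each.

    Cells with at most `max_per_cell` points keep their individual points;
    denser cells are replaced by their centroid and a point count.
    """
    cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

    kept: List[Point] = []
    clusters: List[Cluster] = []
    for members in cells.values():
        if len(members) <= max_per_cell:
            kept.extend(members)
        else:
            cx = sum(x for x, _ in members) / len(members)
            cy = sum(y for _, y in members) / len(members)
            clusters.append((cx, cy, len(members)))
    return kept, clusters


# Made-up points: three close together, one isolated.
pts = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (50.0, 50.0)]
print(thin_by_density(pts, cell_size=10.0, max_per_cell=2))
```

Under this reading, the cell size would be tied to the map scale, so that zooming out enlarges the cells and triggers more aggregation.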

    A Novel Design Science Approach for Integrating Chinese User-Generated Content in Non-Chinese Market Intelligence

    Market research has long relied on reactive means of data gathering, such as questionnaires or focus groups. With the widespread use of social media, millions of comments expressing customer opinions and feedback on products and brands are available. However, before this ‘wisdom of the crowd’ can be used as a source for marketing research, several challenges have to be tackled: the sheer volume of posts, their unstructured format, and the dozens of different languages used on the internet. All of these make automated use of this data challenging. In this paper, we draw on dashboard design principles and follow a design science research approach to develop a framework for the search, integration, and analysis of cross-language user-generated content. With ‘MarketMiner’, we implement the framework in the automotive industry by analyzing Chinese auto forums. The results are promising in that MarketMiner can dramatically improve the utilization of foreign-language social media content for market intelligence purposes.