615 research outputs found

    Sentiment Analysis of Indonesian Public Opinion on the Economic Aspect (Gatra Ekonomi) of National Resilience Using Fuzzy Ontology-Based Semantic Knowledge

    Get PDF
    Campaigning between the two camps regularly enlivens social media, which has become a campaign target now that social media users in Indonesia number 130 million. Taking advantage of this surge in social media activity during the election and campaign year, the authors explore public sentiment on Twitter toward the economic aspect (gatra ekonomi) of national resilience using fuzzy ontology-based semantic knowledge. Conventional ontologies are generally considered ineffective at extracting information from tweets, which motivates the fuzzy ontology-based approach. Fuzzy ontology-based semantic knowledge is a sentiment-analysis method that combines a lexicon-based approach, an ontology, and fuzzy logic to categorize a tweet as strong negative, negative, neutral, positive, or strong positive. A crisp ontology cannot decide which sentiment class a tweet belongs to when the tweet carries more than one SentiWord value. Of 2,032 sentiment-bearing tweets, 205 had more than one SentiWord value, so FuzzyDL was applied to resolve these cases. The method achieved an accuracy of 78%, with 93% precision, 73% recall, and an F-measure of 82%.
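The five-way fuzzy categorization described above can be sketched with triangular membership functions over an aggregated SentiWord score. This is a minimal illustrative sketch, not the paper's FuzzyDL implementation; the class boundaries below are assumptions.

```python
# Hypothetical sketch: map an aggregated SentiWord score in [-1, 1] to the five
# fuzzy sentiment classes via triangular membership functions. The (a, b, c)
# breakpoints are illustrative, not taken from the paper.
def triangular(x, a, b, c):
    """Triangular membership: rises from a to peak at b, falls to zero at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

CLASSES = {
    "strong_negative": (-1.5, -1.0, -0.5),
    "negative":        (-1.0, -0.5,  0.0),
    "neutral":         (-0.5,  0.0,  0.5),
    "positive":        ( 0.0,  0.5,  1.0),
    "strong_positive": ( 0.5,  1.0,  1.5),
}

def classify(score):
    """Pick the class in which the score has the highest membership degree."""
    memberships = {name: triangular(score, *abc) for name, abc in CLASSES.items()}
    return max(memberships, key=memberships.get)
```

A tweet whose words yield conflicting SentiWord values can be scored in each class simultaneously, with the highest membership deciding the label instead of a hard cutoff.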

    An Extended Semantic Interoperability Model for Distributed Electronic Health Record Based on Fuzzy Ontology Semantics

    Get PDF
    Semantic interoperability of distributed electronic health record (EHR) systems is a crucial problem for querying EHRs and for machine-learning projects. The main contribution of this paper is to propose and implement a fuzzy ontology-based semantic interoperability framework for distributed EHR systems. First, a separate standard ontology is created for each input source. Second, a unified ontology is created that merges the previously created ontologies. This crisp ontology, however, cannot answer vague or uncertain queries. Third, to handle this limitation, the integrated crisp ontology is extended into a fuzzy ontology using a standard methodology and fuzzy logic. The dataset used includes identified data of 100 patients. The resulting fuzzy ontology includes 27 classes, 58 properties, 43 fuzzy data types, 451 instances, 8,376 axioms, 5,232 logical axioms, 1,216 declarative axioms, 113 annotation axioms, and 3,204 data property assertions. The resulting ontology is tested using real data from the MIMIC-III intensive care unit dataset and real archetypes from openEHR. This fuzzy ontology-based system helps physicians accurately query any required patient data from distributed locations using near-natural-language queries. Domain specialists validated the accuracy and correctness of the obtained results. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (NRF-2021R1A2B5B02002599).
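The idea of a fuzzy data type answering a vague query can be sketched without any ontology machinery. This is a toy illustration, not the paper's FuzzyOWL/fuzzy-reasoner implementation; the "high blood pressure" cut-offs are assumed for the example.

```python
# Hypothetical sketch: a trapezoidal fuzzy data type attached to a crisp EHR
# field, so a vague query like "patients with high systolic blood pressure"
# returns matches by membership degree. Thresholds (mmHg) are illustrative.
def trapezoidal(x, a, b, c, d):
    """Membership rises over [a, b], is 1 on [b, c], and falls over [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def high_bp(mmhg):
    """Degree to which a systolic reading counts as 'high' (assumed cut-offs)."""
    return trapezoidal(mmhg, 120, 140, 200, 220)

def fuzzy_query(patients, degree=0.5):
    """Return ids of patients matching 'high blood pressure' above a cut."""
    return [p["id"] for p in patients if high_bp(p["systolic"]) >= degree]

patients = [{"id": "p1", "systolic": 118},
            {"id": "p2", "systolic": 135},
            {"id": "p3", "systolic": 160}]
```

A crisp ontology would force a hard 140 mmHg boundary and miss the borderline 135 mmHg reading; the fuzzy data type ranks it with partial membership instead.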

    Development of a System for Sentiment Analysis of User Reviews on the «AUTOSTRADA.INFO/RU» Portal

    Get PDF
    The analysis revealed that social networks (VKontakte, Facebook), thematic communities on microblogging networks (Twitter), resources for travelers (TripAdvisor), and transport portals (Autostrada) are sources of current, operational information about the traffic situation, the quality of transport services, and passenger satisfaction with the level of transport service. However, existing transport monitoring systems lack software tools capable of collecting and analyzing traffic information available on the Internet. This paper addresses the task of building a system that automatically retrieves and classifies road-traffic information from transport Internet portals, and tests the developed system on the transport networks of Crimea and the city of Sevastopol. To solve this problem, open-source libraries for thematic data collection and analysis were reviewed, and an algorithm for extracting and analyzing texts was developed. A crawler was built with the Scrapy package in Python 3, and user reviews on the state of the transport system of Crimea and Sevastopol were collected from the portal http://autostrada.info/ru. For text lemmatization and vector transformation, the tf, idf, and tf-idf methods and their Scikit-Learn implementations (CountVectorizer and TfidfVectorizer) were considered, along with the Bag-of-Words and n-gram text-processing methods. For the classifier model, the naive Bayes algorithm (MultinomialNB) and a linear classifier optimized with stochastic gradient descent (SGDClassifier) were used. A corpus of 225,000 labeled texts from Twitter served as the training sample. The classifier was trained using a cross-validation strategy with the ShuffleSplit method, and the sentiment classification results were tested and compared. According to the validation results, the linear model with the n-gram scheme (1, 3) and the TF-IDF vectorizer performed best. During the trial of the developed system, reviews concerning the quality of the transport networks of the Republic of Crimea and the city of Sevastopol were collected and analyzed. Conclusions are drawn and prospects for further functional development of the toolset are outlined.
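The classifier comparison described above can be sketched with Scikit-Learn directly, since the abstract names the exact components (TfidfVectorizer, MultinomialNB, SGDClassifier, ShuffleSplit). The toy corpus below is an assumption standing in for the 225,000-text Twitter training set.

```python
# Minimal sketch of the compared pipelines: TF-IDF features with the winning
# n-gram scheme (1, 3) feeding either MultinomialNB or SGDClassifier, scored
# with ShuffleSplit cross-validation. The tiny corpus is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great road, smooth traffic", "terrible potholes everywhere",
         "smooth ride today", "awful congestion and potholes",
         "great smooth highway", "terrible awful jam"] * 5
labels = [1, 0, 1, 0, 1, 0] * 5  # 1 = positive review, 0 = negative

cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
for model in (MultinomialNB(), SGDClassifier(random_state=0)):
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), model)
    scores = cross_val_score(pipe, texts, labels, cv=cv)
    print(f"{type(model).__name__}: mean accuracy {scores.mean():.2f}")
```

On the real corpus the paper reports that the SGD-trained linear model with this n-gram scheme and vectorizer outperformed the naive Bayes baseline.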


    Design of Smart Open Parking Using Background Subtraction in the IoT Architecture

    Get PDF
    The Internet of Things (IoT) has evolved and penetrated our lives since the end of the last century; nowadays, devices for almost any purpose are connected through the Internet. In a smart campus environment, a smart node can detect the availability of an open parking space by counting the vehicles that enter or exit the space. The node applies a background subtraction method deployed in an IoT architecture. A Gaussian Mixture Model (GMM) is used to separate the foreground from the background image in order to detect moving objects in an open area. Furthermore, the node can discriminate the type of vehicle with high accuracy. The vehicle-type classification result is transmitted by the node through the Internet and saved to the data server. We observe that the designed system delivers good performance, with average accuracies of 93.47% for detecting cars and 91.73% for motorcycles.
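The background subtraction step can be sketched with a simplified per-pixel model. The paper uses a full Gaussian Mixture Model; the sketch below keeps a single Gaussian per pixel to show the mechanism, and all parameter values are assumptions.

```python
# Simplified single-Gaussian background model (the paper's method is a GMM):
# a pixel is foreground when it deviates from the running mean by more than
# k standard deviations; background statistics update with learning rate alpha.
import numpy as np

class BackgroundModel:
    def __init__(self, first_frame, alpha=0.05, k=2.5):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, 50.0)  # initial variance guess
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        """Return a boolean foreground mask and update the background model."""
        frame = frame.astype(float)
        diff = frame - self.mean
        foreground = diff ** 2 > (self.k ** 2) * self.var
        bg = ~foreground  # only adapt where the pixel looks like background
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] += self.alpha * (diff[bg] ** 2 - self.var[bg])
        return foreground

frames = [np.zeros((4, 4)) for _ in range(10)]       # static empty lot
vehicle = np.zeros((4, 4)); vehicle[1:3, 1:3] = 255  # bright entering object
model = BackgroundModel(frames[0])
for f in frames[1:]:
    model.apply(f)
mask = model.apply(vehicle)  # foreground pixels mark the vehicle
```

A GMM extends this by keeping several weighted Gaussians per pixel, which handles multimodal backgrounds such as swaying trees or lighting changes in an open lot.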

    Urban Transport Evaluation Using Knowledge Extracted from Social Media

    Get PDF
    Public opinion is nowadays a valuable data source for many sectors. In the transportation and mobility sector, it makes it possible to collect information in real time at low cost compared with other methods of information extraction. In this dissertation, we define a methodology to extract knowledge from Twitter messages in order to analyse urban mobility. The methodology is structured in three main modules: system configuration, data analytics, and visualization. The messages used to demonstrate the proposed methodology were collected over two months for three cities: New York, London, and Melbourne. Extracting text from social media and analysing it are very time-consuming tasks because of the volume of messages produced. Each message extracted from Twitter is typically short and informal, with plenty of slang and misspellings. To deal with this, NLP (Natural Language Processing) techniques were applied using the NLTK (Natural Language Toolkit) so the text could be cleaned and understood by the algorithm. For the classification of travel-related messages, a BERT (Bidirectional Encoder Representations from Transformers) embedding model was used; the model is pre-trained, unsupervised, and was released in 2018. To understand whether a simple model could perform well, a unigram approach was used with three lists of travel-related words: (i) a small list of 10 words, (ii) a medium list of 35 words, and (iii) a large list of 344 words. The results show high model performance, with precision and accuracy above 0.80 and 0.90, respectively. The most popular words are train, walk, street, car, station, and avenue, and the results are consistent across the three cities. To evaluate public opinion, the messages related to transportation and mobility were classified by sentiment. The polarity of each message (positive, neutral, or negative) was assessed with the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment tool. VADER is easy to use, works well with social media messages and informal texts, and is a lexicon- and rule-based tool that computes the compound sentiment value of a text from its words. The developed methodology attained good sentiment-analysis performance: average precision reached 0.77, while recall, accuracy, and F1-score were around 0.78. A specific analysis was made of a car-crash event in New York on May 18, 2017; it demonstrates that the methodology can recognize spatial changes and mobility flows, pointing to the potential causes at their origin. This work shows that the proposed methodology can be very helpful to transport engineers, urban planners, researchers, and policymakers seeking insight into public opinion on urban mobility.
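The lexicon-and-rule scoring that VADER performs can be illustrated with a toy version. The word lexicon and negation rule below are assumptions; only the final normalization, compound = s / sqrt(s² + α) with α = 15, follows VADER's published formula.

```python
# Toy lexicon-and-rule sentiment scorer in the spirit of VADER (not the real
# VADER lexicon): word valences are summed, a preceding negation flips a
# word's sign, and the total is squashed into [-1, 1] with VADER's
# normalization compound = s / sqrt(s^2 + alpha), alpha = 15.
import math

LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -3.4,
           "delay": -1.6, "fast": 1.2}  # assumed sample valences
NEGATIONS = {"not", "no", "never"}

def compound(text, alpha=15.0):
    words = text.lower().split()
    total = 0.0
    for i, w in enumerate(words):
        valence = LEXICON.get(w, 0.0)
        if i > 0 and words[i - 1] in NEGATIONS:
            valence = -valence  # simplified negation rule
        total += valence
    return total / math.sqrt(total * total + alpha)

def polarity(text, threshold=0.05):
    """Classify a message as positive, neutral, or negative by compound score."""
    c = compound(text)
    return "positive" if c >= threshold else "negative" if c <= -threshold else "neutral"
```

The real VADER additionally handles intensifiers, punctuation, capitalization, and emoticons, which is what makes it well suited to informal social media text.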

    Investigating transportation research based on social media analysis: A systematic mapping review

    Get PDF
    Social media is a pool of users' thoughts, opinions, surroundings, and situations. This pool can serve as a real-time feedback data source for many domains, including transportation: it can provide instant feedback from commuters, their opinions on the transportation network and their complaints, as well as the traffic situation, road conditions, event detection, and more. The problem is how to utilize social media data to achieve one or more of these targets. A systematic review was conducted of transportation-related research based on social media analysis (TRRSMA) between 2008 and 2018; 74 papers were identified from an initial set of 703 papers extracted from 4 digital libraries. This review structures the field and gives an overview along the following dimensions: activity, keywords, approaches, social media data and platforms, and research focus. It shows trends in research subjects by country, as well as activity trends, platform-usage trends, and others. Further analysis of the most employed approach (lexicons) and data type (text) is also presented. Finally, challenges and future work are identified and proposed.

    Multi-class twitter data categorization and geocoding with a novel computing framework

    Get PDF
    This study details progress in transportation data analysis with a novel computing framework, in keeping with the continuous evolution of computing technology. The framework combines a Labeled Latent Dirichlet Allocation (L-LDA)-incorporated Support Vector Machine (SVM) classifier with a supporting computing strategy on publicly available Twitter data to detect transportation-related events and provide reliable information to travelers. The analytical approach includes classifying tweet text and geocoding locations based on string similarity. A case study of New York City and its surrounding areas demonstrates the feasibility of the approach: approximately 700,010 tweets collected over one week are analyzed to extract relevant transportation-related information. The SVM classifier achieves > 85% accuracy in identifying transportation-related tweets from structured data. To further categorize the transportation-related tweets into the sub-classes incident, congestion, construction, special events, and other events, three supervised classifiers are used: L-LDA, SVM, and L-LDA-incorporated SVM. Findings demonstrate that the framework using the L-LDA-incorporated SVM can classify roadway transportation-related data from Twitter with over 98.3% accuracy, significantly higher than the accuracies achieved by standalone L-LDA and SVM.
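The geocoding-by-string-similarity step can be sketched with the standard library. The gazetteer entries and the similarity cutoff below are assumptions; the paper does not specify its similarity metric.

```python
# Hypothetical sketch of the geocoding step: a tweet's free-text location
# mention is matched against a gazetteer of known place names by string
# similarity (difflib's ratio here; the study's actual metric is unspecified).
from difflib import SequenceMatcher

GAZETTEER = {  # assumed sample entries, not the study's location database
    "Lincoln Tunnel": (40.7614, -74.0111),
    "George Washington Bridge": (40.8517, -73.9527),
    "Queens Midtown Tunnel": (40.7472, -73.9689),
}

def geocode(mention, min_ratio=0.6):
    """Return (name, coords) of the best fuzzy match, or None below the cutoff."""
    def ratio(name):
        return SequenceMatcher(None, mention.lower(), name.lower()).ratio()
    best = max(GAZETTEER, key=ratio)
    return (best, GAZETTEER[best]) if ratio(best) >= min_ratio else None
```

Fuzzy matching matters here because tweets abbreviate and misspell place names; an exact-match lookup would discard a large share of otherwise locatable reports.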

    Measurement Design of Sensor Node for Landslide Disaster Early Warning System (review)

    Get PDF