A Biased Topic Modeling Approach for Case Control Study from Health Related Social Media Postings
abstract: Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyse complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Events and Reaction (ADR) information is extracted and automatically processed using a biased topic modeling method. This framework improves and extends existing topic modeling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, and scores indicating the presence of ADR are generated. A case control study was performed on a data set of Twitter timelines of women who announced their pregnancy; the goal of the study is to compare the ADR risk of medication usage for each medication category during pregnancy.
In addition, to evaluate the predictive power of this approach, another important aspect of personalized medicine was addressed: the prediction of medication usage through the identification of risk groups. During the prediction process, the health information from Twitter timelines, such as diseases, symptoms, treatments, and effects, is summarized by the topic modeling process, and the summarization results are used for prediction. Dimension reduction and topic similarity measurement are integrated into this framework for timeline classification and prediction. This work could be applied to provide guidelines for FDA drug risk categories. Currently, this process is based on laboratory results and reported cases.
Finally, a multi-dimensional text data warehouse (MTD) is proposed to manage the output of the topic modeling. Some attempts have also been made to incorporate topic structure (ontology) and the MTD hierarchy. Results demonstrate that the proposed methods show promise and that this system represents a low-cost approach to drug safety early warning.
Dissertation/Thesis. Doctoral Dissertation, Computer Science, 201
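The abstract above mentions topic similarity measurement for timeline classification without giving details. The following is a minimal sketch of one way this could look, assuming cosine similarity between per-timeline topic proportions and per-group centroid vectors. The function names, group labels, and numbers are hypothetical and are not taken from the dissertation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_timeline(timeline_topics, group_centroids):
    """Return the label of the group whose centroid is most similar to the timeline."""
    return max(
        group_centroids,
        key=lambda label: cosine(timeline_topics, group_centroids[label]),
    )


# Hypothetical topic proportions (3 topics) for two risk groups and one timeline.
centroids = {
    "medication_a": [0.7, 0.2, 0.1],
    "no_medication": [0.1, 0.3, 0.6],
}
timeline = [0.6, 0.3, 0.1]
predicted = classify_timeline(timeline, centroids)
print(predicted)
```

Other similarity measures for probability vectors (for example Hellinger distance) would fit the same structure; cosine similarity is used here only because it is simple to state.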
ATM: Adversarial-neural topic model
Topic models are widely used for thematic structure discovery in text. However, traditional topic models often require dedicated inference procedures for the specific task at hand, and they are not designed to generate word-level semantic representations. To address these limitations, we propose a neural topic modeling approach based on Generative Adversarial Nets (GANs), called the Adversarial-neural Topic Model (ATM). To the best of our knowledge, this work is the first attempt to use adversarial training for topic modeling. The proposed ATM models topics with a Dirichlet prior and employs a generator network to capture the semantic patterns among latent topics. The generator can also produce word-level semantic representations. In addition, to illustrate the feasibility of porting ATM to tasks other than topic modeling, we apply ATM to open domain event extraction. To validate the effectiveness of the proposed ATM, two topic modeling benchmark corpora and an event dataset are employed in the experiments. Our experimental results on the benchmark corpora show that ATM generates more coherent topics (considering five topic coherence measures), outperforming a number of competitive baselines. Moreover, the experiments on the event dataset also validate that the proposed approach is able to extract meaningful events from news articles.
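As a rough illustration of the generator side described above, the sketch below draws topic proportions from a Dirichlet prior and maps them to a word distribution through a single linear layer followed by a softmax. The layer shape, the random untrained weights, and all names are assumptions for illustration. The abstract does not specify the generator architecture, and the discriminator and adversarial training loop are omitted.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def generate_word_distribution(theta, weights):
    """Map topic proportions theta (length K) to a distribution over V words.

    weights is a V x K matrix; a real generator would learn it adversarially.
    """
    logits = [sum(w * t for w, t in zip(row, theta)) for row in weights]
    return softmax(logits)


rng = random.Random(0)
num_topics, vocab_size = 3, 5  # hypothetical sizes
theta = sample_dirichlet([0.5] * num_topics, rng)
weights = [[rng.gauss(0.0, 1.0) for _ in range(num_topics)] for _ in range(vocab_size)]
word_dist = generate_word_distribution(theta, weights)
print(theta)
print(word_dist)
```

In an adversarial setup, a discriminator would compare such generated word distributions with representations of real documents; that part is left out here because the abstract gives no detail on it.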
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Chinese Traditional Medicine and biodiversity.
Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages
with 3 table
Spatio-temporal distribution analysis of brand interest in social networks
Social network applications such as Facebook and Twitter have become part of many people’s
lives and are used daily by millions of users. In such platforms, users share their emotions,
opinions, experiences, and thoughts. Twitter, in particular, is used to discuss diverse topics,
including brands, their products and services. In this thesis, we analyse how brand interest is
reflected on Twitter and how this platform can be used to monitor what people say about specific
brands, as an indicator of brand interest. Brand interest can be defined as the level of interest
one has in a brand, and the level of curiosity one has to learn more about a brand. For this work,
the volume of tweets is used as a measure of brand interest. Our methodology is based on time,
location, and the number of brand-related tweets to perform a spatio-temporal analysis.
Additionally, we propose a framework for discovering latent patterns (topics) from a large
dataset of grouped short messages to analyse brand interest, using Twitter as a data source. We
applied a well-known Text Mining technique called Topic Modelling, which is an unsupervised
learning technique used when dealing with text data, useful to uncover topics in a collection
of documents. This technique provides a convenient way to retrieve information from unstructured text. Topic Modelling has been applied to track events and trends and to uncover topics
in domains such as academia, public health, and marketing. The framework consists of training LDA (Latent Dirichlet Allocation) topic models on aggregated tweets, and then
applying the models to different documents, also composed of grouped Twitter posts. Furthermore, we describe a set of pre-processing tasks that helped to improve the performance of the topic
models, yielding better output and therefore a better analysis. The experiments demonstrated that Topic Modelling can successfully track people’s discussions on social
networks, even in massive datasets such as the one used in the current work, and capture those
topics triggered by real-life events.
Nowadays, platforms such as Twitter and Facebook are part of many people's daily lives and are used by millions of users. On these platforms, known as social networks, users share information including opinions, feelings, experiences, and thoughts. Twitter, in particular, is used to discuss diverse topics, which may include discussions about brands and their products and/or services. This study analyses how interest in a brand is reflected on Twitter and presents a methodology for using Twitter as an information source to monitor what users say about particular brands. Brand interest can be defined as the level of interest an individual has in a brand, and the level of curiosity that leads the individual to learn more about that brand. In this study, the number of published tweets is used to measure interest in the chosen brands. The methodology is based on the date on which the tweet was published, the location, and the number of posts, in order to perform a spatio-temporal analysis.
Additionally, a framework is presented that enables the exploration of a vast dataset, with the aim of revealing latent patterns and analysing interest in the selected brands, using Twitter as a data source. To this end, Topic Modelling was applied, a Text Mining technique widely used to discover topics in unstructured text. Topic Modelling algorithms have been widely used to monitor events and trends and to discover topics in areas such as education, marketing, and health, among others. The framework consists of training the LDA (Latent Dirichlet Allocation) topic model on grouped tweets (according to a given criterion) and then applying the trained model to another set of grouped tweets (according to another criterion). A set of data pre-processing tasks is described that helped to improve the performance of the models, obtain better results and, consequently, carry out a better analysis. The experiments show that Topic Modelling makes it possible to track the discussions of social network users over a long period of time and to capture changes related to real events.
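The volume-based measure of brand interest described in this abstract (tweet counts by time and location) can be sketched as a simple aggregation. This is a minimal illustration under assumed inputs: the record fields `brand`, `created_at`, and `location`, and the sample data, are hypothetical and do not come from the thesis.

```python
from collections import Counter
from datetime import datetime


def brand_interest(tweets):
    """Count tweets per (brand, day, location) as a simple proxy for brand interest."""
    counts = Counter()
    for tweet in tweets:
        day = datetime.fromisoformat(tweet["created_at"]).date().isoformat()
        counts[(tweet["brand"], day, tweet["location"])] += 1
    return counts


# Hypothetical sample records.
tweets = [
    {"brand": "BrandA", "created_at": "2019-03-01T10:15:00", "location": "Lisbon"},
    {"brand": "BrandA", "created_at": "2019-03-01T18:40:00", "location": "Lisbon"},
    {"brand": "BrandA", "created_at": "2019-03-02T09:05:00", "location": "Porto"},
    {"brand": "BrandB", "created_at": "2019-03-01T12:00:00", "location": "Lisbon"},
]
interest = brand_interest(tweets)
for key, volume in sorted(interest.items()):
    print(key, volume)
```

The same grouping keys could also define the aggregated documents on which a topic model is trained, since the abstract describes training on grouped tweets.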
ORÁCULO: Detection of Spatiotemporal Hot Spots of Conflict-Related Events Extracted from Online News Sources
Dissertation presented as the partial requirement for obtaining a Master's degree in Geographic Information Systems and Science
Achieving situational awareness in peace operations requires understanding
where and when conflict-related activity is most intense. However, the irregular nature
of most factions hinders the use of remote sensing, while winning the trust of the host
populations to allow the collection of wide-ranging human intelligence is a slow process.
Thus, our proposed solution, ORÁCULO, is an information system which detects
spatiotemporal hot spots of conflict-related activity by analyzing the patterns of events
extracted from online news sources, allowing immediate situational awareness. To do so,
it combines a closed-domain supervised event extractor with emerging hot spots analysis
of event space-time cubes. The prototype of ORÁCULO was tested on tweets scraped
from the Twitter accounts of local and international news sources covering the Central
African Republic Civil War, and its test results show that it achieved near state-of-the-art
event extraction performance, significant overlap with a reference event dataset, and
strong correlation with the hot spots space-time cube generated from the reference event
dataset, proving the viability of the proposed solution. Future work will focus on
improving the event extraction performance and on testing ORÁCULO in cooperation
with peacekeeping organizations.
Keywords: event extraction, natural language understanding, spatiotemporal analysis,
peace operations, open-source intelligence.
Achieving and maintaining situational awareness in peace operations requires
knowing when and where conflict-related activity is most intense. However, the
irregular nature of most factions hinders the use of remote sensing, and winning
the trust of the populations to allow the collection of information is a slow
process. Thus, our proposed solution, ORÁCULO, is an information system that
detects spatiotemporal hot spots of conflict-related activity by analysing the
patterns of events extracted from online news sources (including social networks),
allowing immediate situational awareness. To that end, our solution combines a
closed-domain event extractor based on supervised learning with emerging hot spot
analysis of event space-time cubes. The ORÁCULO prototype was tested on tweets
collected from local and international news sources covering the Central African
Republic Civil War. Its test results show that it achieved event extraction
performance close to the state of the art, significant overlap with a reference
event set, and strong correlation with the hot spot space-time cube generated from
that reference set, demonstrating the viability of the proposed solution. Given the
results achieved, future work will focus on improving event extraction performance
and on testing the ORÁCULO system in cooperation with organizations that conduct
peace operations.
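To make the idea of an event space-time cube concrete, the sketch below bins events into grid cells and time steps and then computes the Mann-Kendall S statistic per cell as a simple trend indicator. Using S alone to flag an intensifying location is an assumption made for illustration; it is not the hot spot analysis described in the dissertation, and the coordinates, cell size, and function names are hypothetical.

```python
from collections import defaultdict


def build_space_time_cube(events, cell_size, n_steps):
    """Bin (x, y, t) events into a dict mapping grid cell -> list of counts per time step."""
    cube = defaultdict(lambda: [0] * n_steps)
    for x, y, t in events:
        if 0 <= t < n_steps:
            cell = (int(x // cell_size), int(y // cell_size))
            cube[cell][t] += 1
    return dict(cube)


def mann_kendall_s(series):
    """Mann-Kendall S: sum of sign(series[j] - series[i]) over all pairs i < j."""
    s = 0
    for i in range(len(series) - 1):
        for j in range(i + 1, len(series)):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s


# Hypothetical events as (x, y, time_step) tuples.
events = [
    (0.2, 0.3, 0), (0.4, 0.1, 1), (0.3, 0.3, 1),
    (0.1, 0.2, 2), (0.2, 0.2, 2), (0.3, 0.1, 2),
    (1.5, 1.5, 0), (1.6, 1.4, 0), (1.5, 1.6, 1),
]
cube = build_space_time_cube(events, cell_size=1.0, n_steps=3)
trends = {cell: mann_kendall_s(counts) for cell, counts in cube.items()}
print(cube)
print(trends)
```

A positive S suggests event counts in a cell are rising over time and a negative S suggests they are falling. A fuller analysis would also test each cell for spatial clustering and for the statistical significance of the trend.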
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensin