Mining Reaction and Diffusion Dynamics in Social Activities
Large quantities of online user activity data, such as weekly web search
volumes, which co-evolve under the mutual influence of multiple queries and
locations, serve as an important social sensor. Accurately forecasting future
activity by discovering the latent interactions in such data, i.e., the
ecosystem among queries and the flow of influences among areas, is an
important task. However, this is a difficult problem in terms of both the
data quantity and the complex patterns governing the dynamics. To tackle this
problem, we
propose FluxCube, an effective mining method that forecasts large collections
of co-evolving online user activities and provides good interpretability. Our
model is an expansion of a combination of two mathematical models: a
reaction-diffusion system provides a framework for modeling the flow of
influences between local area groups, and an ecological system models the
latent interactions among queries. Moreover, by leveraging the concept of
physics-informed neural networks, FluxCube achieves both high
interpretability, derived from the model parameters, and high forecasting
performance. Extensive experiments on real datasets showed that FluxCube
outperforms comparable models in forecasting accuracy, and that each
component of FluxCube contributes to the enhanced performance. We then
present case studies showing that FluxCube can extract useful latent
interactions between queries and area groups.
Comment: Accepted by CIKM 202
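As a rough illustration of the two dynamics the abstract combines, the following is a minimal sketch of one explicit-Euler step coupling diffusion of influence between area groups with Lotka-Volterra-style interactions among queries. All variable names, shapes, and parameter values are illustrative assumptions, not the authors' implementation (FluxCube additionally builds on physics-informed neural networks).

```python
import numpy as np

# Hedged sketch: x[i, q] is the activity of query q in area group i.
# D, A, r and all values below are illustrative assumptions.

def step(x, D, A, r, dt=0.1):
    """One Euler step combining diffusion between areas with
    ecological (Lotka-Volterra style) interactions among queries.

    x : (n_areas, n_queries) activity levels
    D : (n_areas, n_areas) influence-flow matrix between area groups
    A : (n_queries, n_queries) interaction matrix among queries
    r : (n_queries,) intrinsic growth rates
    """
    # Graph diffusion: inflow from other areas minus outflow.
    diffusion = D @ x - x * D.sum(axis=1, keepdims=True)
    # Ecological reaction: each query grows/shrinks with its interactions.
    reaction = x * (r + x @ A.T)
    return x + dt * (diffusion + reaction)

rng = np.random.default_rng(0)
x = rng.random((3, 2))            # 3 area groups, 2 queries
D = rng.random((3, 3)) * 0.1
A = np.array([[-0.5, 0.1], [0.2, -0.4]])  # competition/mutualism
r = np.array([0.3, 0.2])
x_next = step(x, D, A, r)
```

The diffusion term is the standard graph-diffusion form, and the reaction term is a generalized Lotka-Volterra update; the two are simply summed here for illustration.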
An Exploratory Empirical Assessment of Italian Open Government Data Quality with an Eye to Enabling Linked Open Data
Context: The diffusion of Linked Data and Open Data has kept a very fast pace in recent years. However, evidence from practitioners shows that disclosing data without proper quality control may jeopardize dataset reuse in terms of apps, linking, and other transformations.
Objective: Our goals are to understand the practical problems experienced by open data users in using and integrating such data, and to build a set of concrete metrics to assess the quality of disclosed data and better support the transition towards linked open data.
Method: We focus on Open Government Data (OGD), collecting problems experienced by developers and mapping them to a data quality model available in the literature. We then derived a set of metrics and applied them to evaluate a few samples of Italian OGD.
Result: We present empirical evidence concerning the common quality problems experienced by open data users when using and integrating datasets. The measurement effort revealed a few established good practices, common weaknesses, and a set of discriminant factors among datasets.
Conclusion: The study represents the first empirical attempt to evaluate the quality of open datasets at an operational level. Our long-term goal is to support the transition towards Linked Open Government Data (LOGD) with a quality improvement process in the wake of current practices in Software Quality.
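As a concrete example of the kind of operational metric the study derives, here is a minimal sketch of a column-level completeness check on a tabular open dataset. The metric definition (share of non-empty cells per field) and the sample data are illustrative assumptions, not taken from the paper.

```python
import csv, io

# Hedged sketch of one concrete data-quality metric: completeness.
# The sample dataset below is invented for illustration.

def completeness(rows, field):
    """Fraction of rows where `field` is present and non-empty."""
    filled = sum(1 for row in rows if (row.get(field) or "").strip())
    return filled / len(rows) if rows else 0.0

sample = io.StringIO("city,population\nRome,2873000\nMilan,\nTurin,870000\n")
rows = list(csv.DictReader(sample))
print(completeness(rows, "population"))  # 2 of 3 rows are filled
```

Metrics of this shape can be computed per column and aggregated into a dataset-level score, which is one plausible way to make quality problems measurable.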
GI Systems for public health with an ontology based approach
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Health is an indispensable attribute of human life. In the modern age,
utilizing technologies for health is one of the emergent concepts in
several applied fields; computer science and (geographic) information
systems are among the interdisciplinary fields that motivate this
thesis.
The inspiring idea of the study originates from a rhetorical disease,
DbHd (Database Hugging Disorder), coined by Hans Rosling in a
World Bank Open Data speech in May 2010. The cure for this disease
can be offered as linked open data, which contains ontologies for
health science, diseases, genes, drugs, GEO species, etc. LOD (Linked
Open Data) provides the systematic application of information by
publishing and connecting structured data on the Web.
In the context of this study we aimed to reduce the boundaries
between the semantic web and the geo web. For this reason, use case
data are studied from the Valencia CSISP (Research Center of Public
Health), in which the mortality rates for particular diseases are
represented spatio-temporally. The use case data are divided into three
conceptual domains (health, spatial, statistical) and enhanced with
semantic relations and descriptions by following Linked Data principles.
Finally, in order to convey complex health-related information, we offer
an infrastructure integrating the geo web and the semantic web. Based on
the established outcome, user access methods are introduced and future
research directions are outlined.
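To illustrate how the three conceptual domains can be linked following Linked Data principles, here is a minimal sketch using RDF-style subject-predicate-object triples in plain Python. All URIs, prefixes, and property names are hypothetical placeholders, not the thesis's actual vocabulary.

```python
# Hedged sketch: a single mortality observation linked across the
# health, statistical, and spatial domains as RDF-style triples.
# All identifiers below are invented placeholders.

triples = [
    # health domain
    ("ex:obs1", "ex:disease", "dbpedia:Lung_cancer"),
    # statistical domain
    ("ex:obs1", "ex:mortalityRate", 42.7),
    ("ex:obs1", "ex:year", 2008),
    # spatial domain
    ("ex:obs1", "ex:region", "geo:Valencia"),
]

def query(triples, predicate):
    """Return (subject, object) pairs matching a predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]

print(query(triples, "ex:region"))  # [('ex:obs1', 'geo:Valencia')]
```

In a real deployment these triples would be serialized in a standard RDF syntax (e.g. Turtle) and the placeholder URIs would dereference to the health, statistical, and geospatial vocabularies the infrastructure integrates.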
WHISPER – service integrated incident management system
ABSTRACT: This paper presents a cohesive summary of existing emergency response systems. We investigate and integrate principles, theories, and practices from four diverse, yet related, fields of knowledge with respect to information representation and decision support capability requirements for emergency planning and response (EPR) systems. This enables cooperation between constituent agencies (e.g., fire, police, and medical) and surrounding municipalities, which operate using assorted decision support protocols, system architectures, networking strategies, and different levels of data security needs. Based upon our investigation, we have built a service architectural framework for providing and disseminating an integrated platform of knowledge capable of serving as an intelligent interconnect between distributed EPR systems. Such a framework can support affordable integration for municipalities of all sizes, in particular smaller municipalities that often cannot afford costly off-the-shelf software solutions consisting of proprietary logic and requiring extensive customization and support costs. We also present a prototype web-service-based implementation and summarize the limitations of such an approach. Index Terms: Emergency response system, emergency planning and response, emergency management, decision support, web service
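One way to picture such an intelligent interconnect is a shared incident representation that is routed to the relevant constituent agencies. A minimal sketch follows, in which the message fields, incident kinds, and routing rules are invented for illustration and are not the paper's actual protocol.

```python
from dataclasses import dataclass

# Hedged sketch of a shared incident message routed between agencies.
# Fields, incident kinds, and routing rules are illustrative assumptions.

@dataclass
class Incident:
    kind: str        # e.g. "fire", "accident", "hazmat"
    location: str
    severity: int    # 1 (minor) .. 5 (critical)

# Each entry lists the agencies that handle a given incident kind.
ROUTING = {
    "fire": ["fire"],
    "accident": ["police", "medical"],
    "hazmat": ["fire", "medical", "police"],
}

def dispatch(incident):
    """Return the agencies an incident should be forwarded to."""
    agencies = ROUTING.get(incident.kind, ["police"])
    if incident.severity >= 4:            # critical: notify all agencies
        agencies = ["fire", "police", "medical"]
    return agencies

print(dispatch(Incident("accident", "Main St", 2)))  # ['police', 'medical']
```

In the framework described above, such messages would be exchanged over web services, letting each municipality keep its own internal systems while agreeing only on the shared message format and routing contract.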
Transforming Web Data Into Knowledge - Implications for Management
Much of one’s online behavior, including browsing, shopping, and posting, is recorded daily in databases on companies’ computers. Such data sets are referred to as web data. Patterns that indicate one’s interests, habits, preferences, or behaviors are stored within those data. More useful than an individual indicator is when a company records data on all its users and gains insight into their collective habits and tendencies. Detecting and interpreting such patterns can help managers make informed decisions and serve their customers better. Applying data mining to web data is said to turn it into web knowledge. The study conducted in this paper demonstrates, on the one hand, how data mining methods and models can be applied to web-based forms of data and, on the other, what the implications of uncovering patterns in web content, structure, and usage are for management.
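A small example of the kind of pattern extraction discussed above: counting the most frequent consecutive page pairs across clickstream sessions, a basic web-usage-mining step. The sessions and page names are invented for illustration.

```python
from collections import Counter

# Hedged sketch: frequent consecutive page pairs in clickstream data.
# The sessions below are invented sample data.

sessions = [
    ["home", "shop", "cart", "checkout"],
    ["home", "shop", "cart"],
    ["home", "blog", "shop", "cart"],
]

# Count every consecutive (page_a, page_b) transition across sessions.
pairs = Counter(
    (a, b)
    for session in sessions
    for a, b in zip(session, session[1:])
)

print(pairs.most_common(1))  # [(('shop', 'cart'), 3)]
```

A manager could read the dominant transition here as evidence that the shop-to-cart step works well, and then look for pages with unusually few outgoing transitions as candidates for redesign.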
A Law for Determining the Vocabulary Size of Texts: Heaps' Law and the Determination of Vocabulary Size in Croatian Texts
The existing formula of Heaps' Law for the size of a text's vocabulary,
Vr(n) = Kn^β, is not universal, so the law needs to be redefined in order
to be usable for the analysis of corpora in different languages. The
analysis of a corpus of texts in the Croatian language confirms the
hypothesis that the number of functional items (F) in a text is constant
and amounts to 21% of the size of the text n (functional items make up
26% of English texts). The author proves that the percentage of
functional items in a text can be used as the value of the parameter K,
and that K is a constant for every language corpus. Empirical research
has confirmed the author's thesis that the number of functional items in
a text can be calculated by the formula F = nK/100, and that the value
of the most frequent item (MF) is given by MF = n(K/100)^2. The value of
the other parameter of Heaps' Law can also be accurately determined:
β = log K/100. The author therefore suggests a new form of the text
vocabulary size law: Vr(n) = (Kn)^β. The number of words appearing only
once in the text (HL) can be calculated by the formula HL = ((Kn)/2)^β.
Research confirms a very high correlation between the calculated and
real values of the vocabulary size, i.e. between the real and calculated
numbers of single-occurrence words in the text. Interpreted and defined
in this way, the law of text vocabulary size enables the calculation of
a text's vocabulary size in any language, provided the percentage of
functional words, which is constant for that language, is known. Beyond
the vocabulary size itself, this interpretation of the law also enables
the calculation of the number of functional items in a text, the
frequency of the most frequent word, and the number of single-occurrence
items comprising the text's vocabulary.
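The quantities in the abstract can be checked numerically. A minimal sketch follows, assuming K = 21 for Croatian as stated; the exponent β is left as a free parameter here, since its numerical value for a given corpus is not quoted in the abstract.

```python
# Hedged sketch of the redefined Heaps' Law quantities described above.
# K = 21 (Croatian) comes from the abstract; beta = 0.6 is an
# illustrative placeholder, not a value given by the author.

def heaps_quantities(n, K=21.0, beta=0.6):
    """Return (F, MF, Vr, HL) for a text of n running words."""
    F = n * K / 100                 # number of functional items
    MF = n * (K / 100) ** 2         # frequency of the most frequent item
    Vr = (K * n) ** beta            # redefined vocabulary size: Vr(n) = (Kn)^beta
    HL = ((K * n) / 2) ** beta      # hapax legomena: words occurring once
    return F, MF, Vr, HL

F, MF, Vr, HL = heaps_quantities(100_000)
print(F, MF, Vr, HL)
```

For a 100,000-word Croatian text this predicts 21,000 functional items and a most frequent item occurring 4,410 times, with the vocabulary size and hapax count depending on the chosen β.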