Comparative Analysis of Five XML Query Languages
XML is becoming the most relevant new standard for data representation and
exchange on the WWW. Novel languages for extracting and restructuring XML
content have been proposed, some in the tradition of database query languages
(e.g., SQL, OQL), others more closely inspired by XML. No standard XML query
language has yet been agreed upon, but the discussion is ongoing within the World
Wide Web Consortium and within many academic institutions and major
Internet-related companies. We present a comparison of five representative query
languages for XML, highlighting their common features and differences.
Comment: TeX v3.1415, 17 pages, 6 figures, to be published in ACM SIGMOD
Record, March 200
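The two operations this abstract names, extracting and restructuring XML content, can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree, which supports only a limited XPath subset and is not one of the five languages compared; the <bib> document, its titles and publishers, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography document, used only for illustration.
BIB_XML = """<bib>
  <book year="1999"><title>Title A</title><publisher>Pub X</publisher></book>
  <book year="2001"><title>Title B</title><publisher>Pub Y</publisher></book>
  <book year="2003"><title>Title C</title><publisher>Pub X</publisher></book>
  <article year="2000"><title>Title D</title></article>
</bib>"""


def titles_after(xml_text: str, min_year: int) -> list[str]:
    """Extraction: titles of books published strictly after min_year."""
    root = ET.fromstring(xml_text)
    # ElementTree paths cannot compare numbers (no [@year > 2000]),
    # so the path selection "./book" is refined with a Python filter.
    return [
        book.findtext("title")
        for book in root.findall("./book")
        if int(book.get("year")) > min_year
    ]


def group_by_publisher(xml_text: str) -> ET.Element:
    """Restructuring: build a new tree with book titles grouped by publisher."""
    root = ET.fromstring(xml_text)
    result = ET.Element("results")
    groups: dict[str, ET.Element] = {}
    for book in root.iter("book"):
        pub = book.findtext("publisher")
        if pub not in groups:
            # One <publisher name="..."> element per distinct publisher.
            groups[pub] = ET.SubElement(result, "publisher", name=pub)
        ET.SubElement(groups[pub], "title").text = book.findtext("title")
    return result


if __name__ == "__main__":
    print(titles_after(BIB_XML, 2000))
    print(ET.tostring(group_by_publisher(BIB_XML), encoding="unicode"))
```

A dedicated XML query language would typically express the year filter and the regrouping declaratively in a single query; here the numeric comparison and the grouping are done in Python because ElementTree's path syntax cannot express them.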
False News On Social Media: A Data-Driven Survey
In the past few years, the research community has dedicated growing interest
to the issue of false news circulating on social networks. The widespread
attention to detecting and characterizing false news has been motivated by
the considerable real-world repercussions of this threat. Indeed,
social media platforms exhibit peculiar characteristics, compared with
traditional news outlets, that have been particularly favorable to the
proliferation of deceptive information. They also present unique challenges for
all kinds of potential interventions on the subject. As this issue has become a
global concern, it is also gaining more attention in academia. The aim of this
survey is to offer a comprehensive study of the recent advances in the
detection, characterization and mitigation of false news that propagates on
social media, as well as the challenges and open questions that await
future research in the field. We use a data-driven approach, focusing on a
classification of the features used in each study to characterize
false information and on the datasets used for training classification
methods. At the end of the survey, we highlight the emerging approaches that look
most promising for addressing false news.
Topology comparison of Twitter diffusion networks effectively reveals misleading information
In recent years, malicious information has grown explosively on social
media, with serious social and political repercussions. Recent important studies,
featuring large-scale analyses, have produced deeper knowledge about this
phenomenon, showing that misleading information spreads faster, deeper and more
broadly than factual information on social media, where echo chambers,
algorithmic and human biases play an important role in diffusion networks.
Following these directions, we explore the possibility of classifying news
articles circulating on social media based exclusively on a topological
analysis of their diffusion networks. To this end, we collected a large dataset
of diffusion networks on Twitter pertaining to news articles published by two
distinct classes of sources, namely outlets that convey mainstream, reliable
and objective information and those that fabricate and disseminate various
kinds of misleading articles, including false news intended to harm, satire
intended to make people laugh, click-bait news that may be entirely factual or
rumors that are unproven. We carried out an extensive comparison of these
networks using several alignment-free approaches including basic network
properties, centrality measure distributions, and network distances. We
accordingly evaluated to what extent these techniques allow us to discriminate
between the networks associated with the aforementioned news domains. Our results
highlight that the communities of users spreading mainstream news, compared to
those sharing misleading news, tend to shape diffusion networks with subtle yet
systematic differences that might be effectively employed to identify
misleading and harmful information.
Comment: A revised version is available in Scientific Reports
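The abstract compares diffusion networks with alignment-free descriptors rather than node-to-node alignment. A minimal sketch of that idea, under our own assumptions: each network is a list of hypothetical (spreader, re-sharer) edges rooted at the first poster; the descriptors are node and edge counts, density and cascade depth; and two networks are compared through the Jensen-Shannon distance between their out-degree distributions. Jensen-Shannon is a swapped-in choice of network distance, not necessarily one the authors used, so treat this as an illustration rather than their pipeline.

```python
import math
from collections import Counter, defaultdict, deque

# (spreader, re-sharer): e.g. user B re-shared the article from user A.
Edge = tuple[str, str]


def basic_properties(edges: list[Edge], root: str) -> dict[str, float]:
    """A few alignment-free descriptors of a single diffusion network."""
    children: dict[str, list[str]] = defaultdict(list)
    nodes = {root}
    for src, dst in edges:
        children[src].append(dst)
        nodes.update((src, dst))
    n, m = len(nodes), len(edges)
    # Cascade depth: longest breadth-first distance from the original poster.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in children[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return {
        "nodes": n,
        "edges": m,
        "density": m / (n * (n - 1)) if n > 1 else 0.0,  # directed density
        "max_depth": max(depth.values()),
    }


def out_degree_distribution(edges: list[Edge]) -> dict[int, float]:
    """Empirical out-degree distribution, counting nodes that never re-share."""
    out_deg = Counter(src for src, _ in edges)
    nodes = {u for edge in edges for u in edge}
    hist = Counter(out_deg.get(u, 0) for u in nodes)
    return {k: c / len(nodes) for k, c in hist.items()}


def js_distance(p: dict[int, float], q: dict[int, float]) -> float:
    """Jensen-Shannon distance (base 2, so within [0, 1]) between distributions."""
    support = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_mix(a: dict[int, float]) -> float:
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    # max() guards against a tiny negative value caused by float rounding.
    return math.sqrt(max(0.0, 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)))


if __name__ == "__main__":
    star = [("r", "a"), ("r", "b"), ("r", "c"), ("r", "d")]   # broad, shallow
    chain = [("r", "a"), ("a", "b"), ("b", "c"), ("c", "d")]  # narrow, deep
    print(basic_properties(star, "r"), basic_properties(chain, "r"))
    print(js_distance(out_degree_distribution(star), out_degree_distribution(chain)))
```

The two toy cascades have identical node counts, edge counts and density, yet differ in depth and degree distribution; this is the kind of structural signal such a comparison relies on.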
Exploring the evolution of research topics during the COVID-19 pandemic
The COVID-19 pandemic has changed the research agendas of most scientific
communities, resulting in an overwhelming production of research articles in a
variety of domains, including medicine, virology, epidemiology, economics,
psychology, and so on. Several open-access corpora and literature hubs were
established; among them, the COVID-19 Open Research Dataset (CORD-19) has
systematically gathered scientific contributions for 2.5 years by collecting
and indexing over one million articles. Here, we present the CORD-19 Topic
Visualizer (CORToViz), a method and associated visualization tool for
inspecting the CORD-19 textual corpus of scientific abstracts. Our method is
based upon a careful selection of up-to-date technologies (including large
language models), resulting in an architecture for clustering articles along
orthogonal dimensions and extraction techniques for temporal topic mining.
Topic inspection is supported by an interactive dashboard, providing fast,
one-click visualization of topic contents as word clouds and topic trends as
time series, equipped with easy-to-drive statistical testing for analyzing the
significance of topic emergence along arbitrarily selected time windows. The
processes of data preparation and results visualization are completely general
and applicable to virtually any corpus of textual documents, and are thus
readily adaptable to other contexts.
Comment: 16 pages, 6 figures, 1 table
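The abstract does not say which statistical test the dashboard applies to topic emergence. As a hedged illustration, the sketch below swaps in a standard one-sided two-proportion z-test that compares the share of abstracts assigned to a topic in two user-selected time windows. It assumes topic assignments already come from the clustering step; the record format, the topic_a label and the function names are hypothetical.

```python
import math
from datetime import date


def window_counts(records, topic, start, end):
    """Count (articles on `topic`, all articles) dated within [start, end].

    `records` is an iterable of (publication_date, topic_id) pairs, assumed to
    come from an earlier clustering step that assigns one topic per abstract.
    """
    in_window = [t for d, t in records if start <= d <= end]
    return sum(1 for t in in_window if t == topic), len(in_window)


def topic_emergence_test(k1, n1, k2, n2):
    """One-sided two-proportion z-test: is the topic's share larger in window 2?

    Returns (z, p_value), where p_value = P(Z >= z) under the null hypothesis
    that the topic's share is the same in both windows.
    """
    if n1 == 0 or n2 == 0:
        raise ValueError("each time window must contain at least one article")
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Topic absent from (or present in) every article: no evidence of change.
        return 0.0, 1.0
    z = (p2 - p1) / se
    # Upper-tail standard normal probability via the complementary error function.
    return z, 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    records = (
        [(date(2020, 6, 1), "topic_a")] * 10 + [(date(2020, 6, 1), "other")] * 90
        + [(date(2021, 6, 1), "topic_a")] * 40 + [(date(2021, 6, 1), "other")] * 60
    )
    k1, n1 = window_counts(records, "topic_a", date(2020, 1, 1), date(2020, 12, 31))
    k2, n2 = window_counts(records, "topic_a", date(2021, 1, 1), date(2021, 12, 31))
    print(topic_emergence_test(k1, n1, k2, n2))
```

Because the windows are arbitrary date ranges, the same two functions cover any pair of periods a user selects; testing many windows would call for a multiple-comparison correction.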
Searching COVID-19 clinical research using graphical abstracts
Objective. Graphical abstracts are small graphs of concepts that visually
summarize the main findings of scientific articles. While graphical abstracts
are customarily used in scientific publications to anticipate and summarize
their main results, we propose them as a means for expressing graph searches
over existing literature. Materials and methods. We consider the COVID-19 Open
Research Dataset (CORD-19), a corpus of more than one million abstracts; each
of them is described as a graph of co-occurring ontological terms, selected
from the Unified Medical Language System (UMLS) and the Ontology of Coronavirus
Infectious Disease (CIDO). Graphical abstracts are also expressed as graphs of
ontological terms, possibly augmented by utility terms describing their
interactions (e.g., "associated with", "increases", "induces"). We build a
co-occurrence network of concepts mentioned in the corpus; we then identify the
best matches of graphical abstracts on the network. We exploit graph database
technology and shortest-path queries. Results. We build a large co-occurrence
network, consisting of 128,249 entities and 47,198,965 relationships. A
well-designed interface allows users to explore the network by formulating or
adapting queries in the form of an abstract; it produces a bibliography of
publications, globally ranked; each publication is further associated with the
specific parts of the abstract that it explains, thereby allowing the user to
understand each aspect of the matching. Discussion and Conclusion. Our approach
supports the process of scientific hypothesis formulation and evidence search;
it can be reapplied to any scientific domain, although our use of UMLS
makes it best suited to clinical domains.
Comment: 12 pages, 6 figures
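The matching step (a co-occurrence network plus shortest-path queries) can be sketched in memory. The sketch assumes each abstract has already been reduced to a set of ontology terms, treats a graphical abstract as a list of concept pairs (ignoring utility terms such as "increases"), and uses unweighted breadth-first search in place of the graph database queries the authors describe. The toy corpus, document ids and function names are illustrative only and make no biomedical claim.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_cooccurrence(corpus):
    """Build an undirected co-occurrence network from {doc_id: set_of_terms}.

    Returns (adj, support): adjacency sets per term, and for each edge
    (stored as an alphabetically sorted pair) the documents mentioning both terms.
    """
    adj = defaultdict(set)
    support = defaultdict(set)
    for doc_id, terms in corpus.items():
        for a, b in combinations(sorted(terms), 2):
            adj[a].add(b)
            adj[b].add(a)
            support[(a, b)].add(doc_id)
    return adj, support


def shortest_path(adj, src, dst):
    """Unweighted breadth-first shortest path from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in sorted(adj.get(u, ())):  # sorted for deterministic tie-breaking
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None


def match_graphical_abstract(corpus, query_edges):
    """Rank documents by how many query edges (concept pairs) they help explain.

    Each query edge is matched to a shortest path in the network; every document
    supporting an edge on that path is credited with explaining the query edge.
    """
    adj, support = build_cooccurrence(corpus)
    explains = defaultdict(set)  # doc_id -> indices of explained query edges
    for i, (a, b) in enumerate(query_edges):
        path = shortest_path(adj, a, b)
        if path is None:
            continue
        for u, v in zip(path, path[1:]):
            for doc_id in support[tuple(sorted((u, v)))]:
                explains[doc_id].add(i)
    # Most explained query edges first, then by document id.
    return sorted(explains.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    toy_corpus = {
        "d1": {"COVID-19", "ACE2"},
        "d2": {"ACE2", "hypertension"},
        "d3": {"COVID-19", "fever"},
        "d4": {"COVID-19", "fever", "ACE2"},
    }
    query = [("COVID-19", "hypertension"), ("COVID-19", "fever")]
    print(match_graphical_abstract(toy_corpus, query))
```

Returning, for each document, the indices of the query edges it supports mirrors the abstract's idea of associating every publication with the parts of the graphical abstract it explains; edge weights such as co-occurrence counts could further refine the ranking.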
Explorative search of distributed bio-data to answer complex biomedical questions
Background
The huge and growing amount of biomedical-molecular data being produced provides scientists with potentially valuable information. Yet this sheer quantity of data makes it difficult to find and extract the data that are most reliable and most relevant to the biomedical questions to be answered, which are increasingly complex and often involve many different biomedical-molecular aspects. Such questions can be addressed only by comprehensively searching and exploring different types of data, which are frequently ordered and provided by different data sources. Search Computing has been proposed for the management and integration of ranked results from heterogeneous search services. Here, we present its novel application to the explorative search of distributed biomedical-molecular data and the integration of the search results to answer complex biomedical questions.
Results
A set of available bioinformatics search services has been modelled and registered in the Search Computing framework, and a Bioinformatics Search Computing application (Bio-SeCo) using such services has been created and made publicly available at http://www.bioinformatics.deib.polimi.it/bio-seco/seco/. It offers an integrated environment which eases search, exploration and ranking-aware combination of heterogeneous data provided by the available registered services, and supplies global results that can support answering complex multi-topic biomedical questions.
Conclusions
By using Bio-SeCo, scientists can explore the very large and very heterogeneous biomedical-molecular data available. They can easily make different explorative search attempts, inspect the results obtained, select the most appropriate ones, expand or refine them, and move forward and backward in the construction of a global, complex biomedical query on multiple distributed sources that could eventually find the most relevant results. Thus, it provides extremely useful automated support for exploratory integrated bio search, which is fundamental to data-driven knowledge discovery in the life sciences.
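The central operation described here, combining ranked results from different search services, can be illustrated with a deliberately simple join. The sketch assumes two hypothetical services whose results share a join attribute (here a gene identifier) and carry scores normalized to [0, 1], and it ranks each joined pair by the product of its scores. It is not the Bio-SeCo or Search Computing API; identifiers such as GENE_A and the payload strings are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    key: str      # join attribute shared by both services (e.g. a gene id)
    payload: str  # what the service returned for that key
    score: float  # service-specific relevance, assumed normalized to [0, 1]


def ranked_join(left, right, top_k=None):
    """Join two ranked result lists on `key`; rank pairs by the product of scores.

    For non-negative scores the product is monotone: a higher score in either
    service never lowers the combined score of a joined pair.
    """
    right_by_key = defaultdict(list)
    for r in right:
        right_by_key[r.key].append(r)
    combined = [
        (lt.key, lt.payload, rt.payload, lt.score * rt.score)
        for lt in left
        for rt in right_by_key.get(lt.key, [])
    ]
    # Highest combined score first; ties broken alphabetically for stable output.
    combined.sort(key=lambda row: (-row[3], row[0], row[1], row[2]))
    return combined[:top_k]  # top_k=None keeps every joined pair


if __name__ == "__main__":
    genes = [Result("GENE_A", "associated with disease X", 0.9),
             Result("GENE_B", "associated with disease X", 0.6)]
    pathways = [Result("GENE_B", "pathway P", 1.0),
                Result("GENE_A", "pathway Q", 0.5),
                Result("GENE_C", "pathway R", 0.8)]
    for row in ranked_join(genes, pathways):
        print(row)
```

This exhaustive join reads both result lists completely; rank-join algorithms can instead stop early once no unseen combination can beat the current top-k, which matters when services return long result lists.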