Building an Archive with Saada
Saada transforms a set of heterogeneous FITS files or VOTables of various
categories (images, tables, spectra ...) into a database without writing code.
Databases created with Saada come with a rich Web interface and an Application
Programming Interface (API). They support the four most common VO services.
Such databases can mix various categories of data in multiple collections. They
allow direct access to the original data while providing a homogeneous view
thanks to an internal data model compatible with the characterization axis
defined by the VO. The data collections can be bound to each other with
persistent links, creating relevant browsing paths and allowing data-mining
oriented queries.
Comment: 18 pages, 5 figures Special VO issu
Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media
Social media is often viewed as a sensor into various societal events such as
disease outbreaks, protests, and elections. We describe the use of social media
as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our
approach detects a broad range of cyber-attacks (e.g., distributed denial of
service (DDOS) attacks, data breaches, and account hijacking) in an
unsupervised manner using just a limited fixed set of seed event triggers. A
new query expansion strategy based on convolutional kernels and dependency
parses helps model reporting structure and aids in identifying key event
characteristics. Through a large-scale analysis over Twitter, we demonstrate
that our approach consistently identifies and encodes events, outperforming
existing methods.
Comment: 13 single column pages, 5 figures, submitted to KDD 201
Knowledge Rich Natural Language Queries over Structured Biological Databases
Increasingly, keyword, natural language and NoSQL queries are being used for
information retrieval from traditional as well as non-traditional databases
such as web, document, image, GIS, legal, and health databases. While their
popularity is undeniable for obvious reasons, their engineering is far from
simple. For the most part, semantics- and intent-preserving mapping of a well
understood natural language query expressed over a structured database schema
to a structured query language is still a difficult task, and research to tame
the complexity is intense. In this paper, we propose a multi-level
knowledge-based middleware to facilitate such mappings that separate the
conceptual level from the physical level. We augment these multi-level
abstractions with a concept reasoner and a query strategy engine to dynamically
link arbitrary natural language querying to well defined structured queries. We
demonstrate the feasibility of our approach by presenting a Datalog-based
prototype system, called BioSmart, that can compute responses to arbitrary
natural language queries over arbitrary databases once a syntactic
classification of the natural language query is made.
Challenges in Bridging Social Semantics and Formal Semantics on the Web
This paper describes several results of Wimmics, a research lab whose name
stands for web-instrumented man-machine interactions, communities, and
semantics. The approaches introduced here rely on graph-oriented knowledge
representation, reasoning and operationalization to model and support actors,
actions and interactions in web-based epistemic communities. The research
results are applied to support and foster interactions in online communities
and manage their resources.
You can't always sketch what you want: Understanding Sensemaking in Visual Query Systems
Visual query systems (VQSs) empower users to interactively search for line
charts with desired visual patterns, typically specified using intuitive
sketch-based interfaces. Despite decades of past work on VQSs, these efforts
have not translated to adoption in practice, possibly because VQSs are largely
evaluated in unrealistic lab-based settings. To remedy this gap in adoption, we
collaborated with experts from three diverse domains---astronomy, genetics, and
material science---via a year-long user-centered design process to develop a
VQS that supports their workflow and analytical needs, and evaluate how VQSs
can be used in practice. Our study results reveal that ad-hoc sketch-only
querying is not as commonly used as prior work suggests, since analysts are
often unable to precisely express their patterns of interest. In addition, we
characterize three essential sensemaking processes supported by our enhanced
VQS. We discover that participants employ all three processes, but in different
proportions, depending on the analytical needs in each domain. Our findings
suggest that all three sensemaking processes must be integrated in order to
make future VQSs useful for a wide range of analytical inquiries.
Comment: Accepted for presentation at IEEE VAST 2019, to be held October 20-25
in Vancouver, Canada. Paper will also be published in a special issue of IEEE
Transactions on Visualization and Computer Graphics (TVCG) IEEE VIS
(InfoVis/VAST/SciVis) 2019 ACM 2012 CCS - Human-centered computing,
Visualization, Visualization design and evaluation method
Metagenomic sequencing unravels gene fragments with phylogenetic signatures of O2-tolerant NiFe membrane-bound hydrogenases in lacustrine sediment
Many promising hydrogen technologies utilising hydrogenase enzymes have been slowed by the fact that most hydrogenases are extremely sensitive to O2. Within the group 1 membrane-bound NiFe hydrogenases, naturally occurring tolerant enzymes do exist, and O2 tolerance has been largely attributed to changes in iron–sulphur clusters coordinated by different numbers of cysteine residues in the enzyme’s small subunit. Indeed, previous work has provided a robust phylogenetic signature of O2 tolerance [1], which, when combined with new sequencing technologies, makes bioprospecting in nature a far more viable endeavour. However, making sense of such vast diversity is still challenging and could be simplified if known species with O2-tolerant enzymes were annotated with information on metabolism and natural environments. Here, we utilised a bioinformatics approach to compare O2-tolerant and O2-sensitive membrane-bound NiFe hydrogenases from 177 bacterial species with fully sequenced genomes for differences in their taxonomy, O2 requirements, and natural environment. Following this, we interrogated a metagenome from lacustrine surface sediment for novel hydrogenases via high-throughput shotgun DNA sequencing using the Illumina™ MiSeq platform. We found 44 new NiFe group 1 membrane-bound hydrogenase sequence fragments, five of which segregated with the tolerant group on the phylogenetic tree of the enzyme’s small subunit, and four with the large subunit, indicating de novo O2-tolerant protein sequences that could help engineer more efficient hydrogenases.