193,797 research outputs found
Novel Metaknowledge-based Processing Technique for Multimedia Big Data clustering challenges
Past research has challenged us with the task of showing relational patterns
between text-based data and then clustering for predictive analysis using Golay
Code technique. We focus on a novel approach to extract metaknowledge in
multimedia datasets. Our collaboration has been an on-going task of studying
the relational patterns between datapoints based on metafeatures extracted from
metaknowledge in multimedia datasets. Those selected are significant to suit
the mining technique we applied, Golay Code algorithm. In this research paper
we summarize findings in optimization of metaknowledge representation for
23-bit representation of structured and unstructured multimedia data in order
toComment: IEEE Multimedia Big Data (BigMM 2015
Ensuring Cyber-Security in Smart Railway Surveillance with SHIELD
Modern railways feature increasingly complex embedded computing systems for surveillance, that are moving towards fully wireless smart-sensors. Those systems are aimed at monitoring system status from a physical-security viewpoint, in order to detect intrusions and other environmental anomalies. However, the same systems used for physical-security surveillance are vulnerable to cyber-security threats, since they feature distributed hardware and software architectures often interconnected by ‘open networks’, like wireless channels and the Internet. In this paper, we show how the integrated approach to Security, Privacy and Dependability (SPD) in embedded systems provided by the SHIELD framework (developed within the EU funded pSHIELD and nSHIELD research projects) can be applied to railway surveillance systems in order to measure and improve their SPD level. SHIELD implements a layered architecture (node, network, middleware and overlay) and orchestrates SPD mechanisms based on ontology models, appropriate metrics and composability. The results of prototypical application to a real-world demonstrator show the effectiveness of SHIELD and justify its practical applicability in industrial settings
Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications
Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene-and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq
Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality of traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association
for Information Science and Technology (JASIST), 201
Assessing Causation in Breast Implant Litigation: The Role of Science Panels
In two recent cases, federal judges appointed panels of scientific experts to help assess conflicting scientific testimony regarding causation of systemic injuries by silicone gel breast implants. This article will describe the circumstances that gave rise to the appointments, the procedures followed in making the appointments and reporting to the courts, and the reactions of the participants in the proceedings
Recommended from our members
Characterisation of FAD-family folds using a machine learning approach
Flavin adenine dinucleotide (FAD) and its derivatives play a crucial role in
biological processes. They are major organic cofactors and electron carriers
in both enzymatic activities and biochemical pathways. We have analysed
the relationships between sequence and structure of FAD-containing proteins
using a machine learning approach. Decision trees were generated using the
C4.5 algorithm as a means of automatically generating rules from biological
databases (TOPS, CATH and PDB). These rules were then used as
background knowledge for an ILP system to characterise the four different
classes of FAD-family folds classified in Dym and Eisenberg (2001). These
FAD-family folds are: glutathione reductase (GR), ferredoxin reductase (FR),
p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO). Each FADfamily
was characterised by a set of rules. The “knowledge patterns”
generated from this approach are a set of rules containing conserved sequence
motifs, secondary structure sequence elements and folding information.
Every rule was then verified using statistical evaluation on the measured
significance of each rule. We show that this machine learning approach is
capable of learning and discovering interesting patterns from large biological
databases and can generate “knowledge patterns” that characterise the FADcontaining
proteins, and at the same time classify these proteins into four
different families
Knowledge Discovery in Online Repositories: A Text Mining Approach
Before the advent of the Internet, the newspapers were the prominent instrument of
mobilization for independence and political struggles. Since independence in Nigeria, the
political class has adopted newspapers as a medium of Political Competition and
Communication. Consequently, most political information exists in unstructured form and
hence the need to tap into it using text mining algorithm.
This paper implements a text mining algorithm on some unstructured data format in some newspapers. The algorithm involves the following natural language processing techniques: tokenization, text filtering and refinement. As a follow-up to the natural language techniques, association rule mining technique of data mining is used to extract knowledge using the Modified Generating Association Rules based on Weighting scheme (GARW).
The main contributions of the technique are that it integrates information retrieval scheme (Term Frequency Inverse Document Frequency) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) with Data Mining technique for association rules discovery. The program is applied to Pre-Election information gotten from the website of the Nigerian Guardian newspaper. The extracted association rules contained important features and described the informative news included in the documents collection when related to the concluded 2007 presidential election. The system presented useful information that could help sanitize the polity as well as protect the nascent democracy
The Genomic HyperBrowser: inferential genomics at the sequence level
The immense increase in the generation of genomic scale data poses an unmet
analytical challenge, due to a lack of established methodology with the
required flexibility and power. We propose a first principled approach to
statistical analysis of sequence-level genomic information. We provide a
growing collection of generic biological investigations that query pairwise
relations between tracks, represented as mathematical objects, along the
genome. The Genomic HyperBrowser implements the approach and is available at
http://hyperbrowser.uio.no
- …