Applications of sparse approximation in communications
Sparse approximation problems abound in many scientific, mathematical, and engineering applications. These problems are defined by two competing notions: we approximate a signal vector as a linear combination of elementary atoms, and we require that the approximation be both as accurate and as concise as possible. We introduce two natural and direct applications of these problems and their algorithmic solutions in communications. We do so by constructing enhanced codebooks from base codebooks. We show that we can decode these enhanced codebooks in the presence of Gaussian noise. For MIMO wireless communication channels, we construct simultaneous sparse approximation problems and demonstrate that our algorithms can both decode the transmitted signals and estimate the channel parameters.
An empirical comparison of supervised machine learning techniques in bioinformatics
Research in bioinformatics is driven by experimental data. Current biological databases are populated by vast amounts of experimental data. Machine learning has been widely applied to bioinformatics and has achieved considerable success in this research area. At present, with many learning algorithms available in the literature, researchers find it difficult to choose the best method for their data. We performed an empirical study of 7 individual learning systems and 9 combined methods on 4 biological data sets, and we suggest issues to consider when answering the following questions: (i) How does one choose the algorithm best suited to a given data set? (ii) Are combined methods better than a single approach? (iii) How does one compare the effectiveness of a particular algorithm with that of the others?
Characterisation of FAD-family folds using a machine learning approach
Flavin adenine dinucleotide (FAD) and its derivatives play a crucial role in biological processes. They are major organic cofactors and electron carriers in both enzymatic activities and biochemical pathways. We have analysed the relationships between the sequence and structure of FAD-containing proteins using a machine learning approach. Decision trees were generated with the C4.5 algorithm as a means of automatically generating rules from biological databases (TOPS, CATH and PDB). These rules were then used as background knowledge for an ILP system to characterise the four classes of FAD-family folds classified by Dym and Eisenberg (2001): glutathione reductase (GR), ferredoxin reductase (FR), p-cresol methylhydroxylase (PCMH) and pyruvate oxidase (PO). Each FAD family was characterised by a set of rules. The “knowledge patterns” generated by this approach are sets of rules containing conserved sequence motifs, secondary structure sequence elements and folding information. Each rule was then verified by a statistical evaluation of its measured significance. We show that this machine learning approach is capable of learning and discovering interesting patterns from large biological databases and can generate “knowledge patterns” that characterise the FAD-containing proteins while also classifying these proteins into four different families.
Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit
This paper demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previous results, which require O(m^2) measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.
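A minimal NumPy sketch of the OMP loop may help make the greedy step concrete. It assumes a real-valued measurement matrix Phi with roughly unit-norm columns and a known sparsity level m; the function name omp and the stopping tolerance tol are illustrative choices, not the authors' code.

```python
import numpy as np

def omp(Phi, v, m, tol=1e-10):
    """Greedy recovery of an m-sparse x from v = Phi @ x (hypothetical helper)."""
    d = Phi.shape[1]
    residual = v.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(m):
        if np.linalg.norm(residual) < tol:
            break  # the measurements are already explained exactly
        # Greedy step: pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        # Orthogonal step: least-squares fit of v on every column chosen so far.
        coef, *_ = np.linalg.lstsq(Phi[:, support], v, rcond=None)
        residual = v - Phi[:, support] @ coef
    x = np.zeros(d)
    x[support] = coef
    return x
```

Because each iteration refits all selected coefficients, the residual stays orthogonal to every chosen column, so a column is never selected twice. In a noiseless test with a Gaussian Phi and a few hundred dimensions, O(m ln d) rows are typically enough for exact recovery.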
Integrative machine learning approach for multi-class SCOP protein fold classification
Classification and prediction of protein structure has been a central research theme in structural bioinformatics. Because proteins are distributed unevenly across the many SCOP classes, most discriminative machine learning methods suffer from the well-known ‘false positives’ problem on these tasks. We have devised eKISS, an ensemble machine learning system specifically designed to increase the coverage of positive examples when learning from imbalanced multi-class data sets. We have applied eKISS to classify 25 SCOP folds and show that our learning system improves on classical learning methods.
Simultaneous sparse approximation via greedy pursuit
A simple sparse approximation problem requests an approximation of a given input signal as a linear combination of T elementary signals drawn from a large, linearly dependent collection. An important generalization is simultaneous sparse approximation. Now one must approximate several input signals at once using different linear combinations of the same T elementary signals. This formulation appears, for example, when analyzing multiple observations of a sparse signal that have been contaminated with noise. A new approach to this problem is presented here: a greedy pursuit algorithm called simultaneous orthogonal matching pursuit. The paper proves that the algorithm calculates simultaneous approximations whose error is within a constant factor of the optimal simultaneous approximation error. This result requires that the collection of elementary signals be weakly correlated, a property that is also known as incoherence. Numerical experiments demonstrate that the algorithm often succeeds, even when the inputs do not meet the hypotheses of the proof.
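A short sketch of simultaneous orthogonal matching pursuit, assuming the signals are stacked as the columns of a matrix S and that each step selects the atom maximizing the sum of absolute correlations with all residual columns. The function name somp, the tolerance, and the NumPy formulation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def somp(Phi, S, T, tol=1e-10):
    """Approximate every column of S with the same T atoms of Phi (hypothetical helper)."""
    d = Phi.shape[1]
    residual = S.astype(float).copy()
    support = []
    coef = np.zeros((0, S.shape[1]))
    for _ in range(T):
        if np.linalg.norm(residual) < tol:
            break  # all signals are already represented exactly
        # Score each atom by its total absolute correlation with all residual columns.
        scores = np.abs(Phi.T @ residual).sum(axis=1)
        support.append(int(np.argmax(scores)))
        # Joint least-squares fit: each signal gets its own coefficients on the shared atoms.
        coef, *_ = np.linalg.lstsq(Phi[:, support], S, rcond=None)
        residual = S - Phi[:, support] @ coef
    X = np.zeros((d, S.shape[1]))
    X[support, :] = coef
    return X, support
```

Pooling the correlations across observations is what lets the algorithm exploit several noisy copies of a signal with a common sparse support: an atom that is only weakly visible in one observation can still win when its contributions are summed.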
Relations between extensional tectonics and magmatism within the Southern Oklahoma aulacogen
Variations in the geometry, distribution and thickness of Cambrian igneous and sedimentary units within southwest Oklahoma are related to a late Proterozoic to early Paleozoic rifting event that formed the Southern Oklahoma aulacogen. These rock units are exposed in the Wichita Mountains, southwest Oklahoma, located on the northern margin of a Proterozoic basin identified in the subsurface by COCORP reflection data. Overprinting of the Cambrian extensional event by Pennsylvanian tectonism obscured the influence of pre-existing basement structures and contrasting basement lithologies on the initial development of the aulacogen.
Sparse Approximation Via Iterative Thresholding
The well-known shrinkage technique is still relevant for contemporary signal processing problems over redundant dictionaries. We present theoretical and empirical analyses of two iterative algorithms for sparse approximation that use shrinkage. The General IT algorithm amounts to a Landweber iteration with nonlinear shrinkage at each iteration step. The Block IT algorithm arises in morphological component analysis. We present a sufficient condition under which General IT exactly recovers a sparse signal; the cumulative coherence function arises naturally in this condition. This analysis extends previous results concerning the Orthogonal Matching Pursuit (OMP) and Basis Pursuit (BP) algorithms to IT algorithms.
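As an illustration only, the loop below implements a generic iterative soft-thresholding scheme: a Landweber (gradient) step followed by soft shrinkage. The choice of soft thresholding, the fixed threshold parameter lam, and the step size 1/||Phi||_2^2 are assumptions made for this sketch; the paper's General IT may use a different shrinkage rule or threshold schedule.

```python
import numpy as np

def soft_threshold(z, tau):
    """Shrinkage: move each entry toward zero by tau, zeroing entries smaller than tau."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def iterative_thresholding(Phi, v, lam, n_iter=3000):
    """Landweber step plus soft shrinkage, repeated n_iter times (hypothetical helper)."""
    # A step size of 1/L with L = ||Phi||_2^2 (squared spectral norm) keeps the iteration stable.
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        # Landweber (gradient) step on 0.5 * ||v - Phi @ x||^2 ...
        z = x + step * (Phi.T @ (v - Phi @ x))
        # ... followed by nonlinear shrinkage with threshold step * lam.
        x = soft_threshold(z, step * lam)
    return x
```

With a fixed threshold this iteration converges to the minimizer of 0.5 * ||v - Phi x||^2 + lam * ||x||_1, so the recovered nonzero entries are slightly shrunk toward zero; a small lam keeps that bias small while still zeroing the off-support entries.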
Cross-Sender Bit-Mixing Coding
Scheduling to avoid packet collisions is a long-standing challenge in networking, and it has become even trickier in wireless networks with multiple senders and multiple receivers. In fact, researchers have proved that even perfect scheduling can only achieve R = O(1/ln N). Here N is the number of nodes in the network, and R is the medium utilization rate. Ideally, one would hope to achieve R = Θ(1) while avoiding all the complexities of scheduling. To this end, this paper proposes cross-sender bit-mixing coding (BMC), which does not rely on scheduling. Instead, users transmit simultaneously on suitably chosen slots, and the amount of overlap between different users' slots is controlled via coding. We prove that in all possible network topologies, using BMC enables us to achieve R = Θ(1). We also prove that the space and time complexities of BMC encoding and decoding are all low-order polynomials.
Comment: Published in the International Conference on Information Processing in Sensor Networks (IPSN), 201
List decoding of noisy Reed-Muller-like codes
First- and second-order Reed-Muller (RM(1) and RM(2), respectively) codes are two fundamental error-correcting codes that arise in communication as well as in probabilistically checkable proofs and learning. In this paper, we take the first steps toward extending the quick randomized decoding tools of RM(1) into the realm of quadratic binary and, equivalently, Z_4 codes. Our main algorithmic result is an extension of the RM(1) techniques from the Goldreich-Levin and Kushilevitz-Mansour algorithms to the Hankel code, a code between RM(1) and RM(2). That is, given a signal s of length N, we find a list that is a superset of all Hankel codewords phi with dot product to s at least (1/sqrt(k)) times the norm of s, in time polynomial in k and log(N). We also give a new and simple formulation of a known Kerdock code as a subcode of the Hankel code. As a corollary, we can list-decode Kerdock codes too. We also obtain a fast algorithm for finding a sparse Kerdock approximation. That is, for k small compared with 1/sqrt(N) and for epsilon > 0, we find, in time polynomial in (k log(N)/epsilon), a k-Kerdock-term approximation s~ to s with Euclidean error at most (1 + epsilon + O(k^2/sqrt(N))) times that of the best such approximation.
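For contrast with the sublinear-time randomized methods described above, the sketch below shows the brute-force baseline for RM(1) only: it correlates s with every unit-norm codeword (-1)^(a·x + b)/sqrt(N) at once using a fast Walsh-Hadamard transform, in O(N log N) time, and keeps those meeting the ||s||/sqrt(k) threshold. It is not the paper's Hankel-code algorithm; the function names fwht and rm1_list_decode are hypothetical, and N is assumed to be a power of two.

```python
import numpy as np

def fwht(s):
    """Fast Walsh-Hadamard transform: out[a] = sum_x s[x] * (-1)^popcount(a & x)."""
    h = np.array(s, dtype=float)
    n = len(h)  # assumed to be a power of two
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            left = h[i:i + step].copy()
            right = h[i + step:i + 2 * step].copy()
            h[i:i + step] = left + right
            h[i + step:i + 2 * step] = left - right
        step *= 2
    return h

def rm1_list_decode(s, k):
    """List every (a, sign) whose unit-norm RM(1) codeword has |<s, phi>| >= ||s|| / sqrt(k)."""
    N = len(s)
    # Inner products of s with the unit-norm codewords (-1)^(a.x) / sqrt(N), for every a.
    corr = fwht(s) / np.sqrt(N)
    thresh = np.linalg.norm(s) / np.sqrt(k)
    # The sign records the constant term b: +1 for b = 0, -1 for b = 1.
    return [(a, 1 if corr[a] > 0 else -1) for a in range(N) if abs(corr[a]) >= thresh]
```

By Parseval's identity the squared correlations sum to ||s||^2, so at most k codewords can pass the threshold; this is why the list size is bounded in terms of k alone.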