Protein sectors: statistical coupling analysis versus conservation
Statistical coupling analysis (SCA) is a method for analyzing multiple
sequence alignments that was used to identify groups of coevolving residues
termed "sectors". The method applies spectral analysis to a matrix obtained by
combining correlation information with sequence conservation. It has been
asserted that the protein sectors identified by SCA are functionally
significant, with different sectors controlling different biochemical
properties of the protein. Here we reconsider the available experimental data
and note that it involves almost exclusively proteins with a single sector. We
show that in this case sequence conservation is the dominating factor in SCA,
and can alone be used to make statistically equivalent functional predictions.
Therefore, we suggest shifting the experimental focus to proteins for which SCA
identifies several sectors. Correlations in protein alignments, which have been
shown to be informative in a number of independent studies, would then be less
dominated by sequence conservation. Comment: 36 pages, 17 figures
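The spectral step described above can be sketched as follows. This is a simplified SCA-style computation, not the authors' code. The per-residue weights φ_i^a = |ln[f(1−q)/((1−f)q)]| and the reduction of each (i, j) block by its Frobenius norm follow common SCA formulations. The following are assumptions of this sketch: uniform background frequencies q, a small pseudocount, no sequence weighting, and the hypothetical helper `sca_sketch`. It returns per-position conservation (relative entropy) next to the top eigenvector, so the two can be compared directly.

```python
import numpy as np

N_STATES = 21  # assumed encoding: 20 amino acids + gap as integers 0..20

def sca_sketch(msa, pseudocount=0.03):
    """Simplified SCA-style analysis of an (n_seq, L) integer alignment.

    Returns per-position conservation D, the L x L conservation-weighted
    correlation matrix, and its top eigenvector.
    """
    X = np.eye(N_STATES)[msa]                        # one-hot, shape (n_seq, L, A)
    n_seq, L, A = X.shape
    q = np.full(A, 1.0 / A)                          # assumed uniform background frequencies
    f_raw = X.mean(axis=0)                           # single-site frequencies f_i^a, (L, A)
    f = (1 - pseudocount) * f_raw + pseudocount / A  # regularized so that 0 < f < 1
    # conservation: relative entropy of each column against the background
    D = np.sum(f * np.log(f / q), axis=1)
    # per-residue conservation weights phi_i^a
    phi = np.abs(np.log(f * (1 - q) / ((1 - f) * q)))
    # connected correlations C_ij^ab = f_ij^ab - f_i^a f_j^b, flattened to (L*A, L*A)
    Xf = X.reshape(n_seq, L * A)
    C = Xf.T @ Xf / n_seq - np.outer(f_raw.ravel(), f_raw.ravel())
    # weight by conservation, then reduce each (i, j) block to a scalar by its Frobenius norm
    w = phi.ravel()
    Ct = (w[:, None] * C * w[None, :]).reshape(L, A, L, A)
    C_pos = np.sqrt(np.sum(Ct ** 2, axis=(1, 3)))
    # spectral analysis: eigenvector belonging to the largest eigenvalue
    _, eigvecs = np.linalg.eigh(C_pos)
    top = eigvecs[:, -1]
    top = top * np.sign(top.sum())                   # fix the arbitrary eigenvector sign
    return D, C_pos, top

# toy alignment: positions 0-2 mostly conserved (residue 5), the rest random
rng = np.random.default_rng(1)
msa = rng.integers(0, N_STATES, size=(200, 10))
msa[:, :3] = np.where(rng.random((200, 3)) < 0.9, 5, msa[:, :3])
D, C_pos, top = sca_sketch(msa)
print(np.corrcoef(D, top)[0, 1])
```

A high correlation between `top` and `D` on real single-sector alignments is the situation the abstract describes, where conservation alone carries the functional prediction.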
t-Exponential Memory Networks for Question-Answering Machines
Recent advances in deep learning have brought to the fore models that can
make multiple computational steps in the service of completing a task; these
are capable of describing long-term dependencies in sequential data. Novel
recurrent attention models over possibly large external memory modules
constitute the core mechanisms that enable these capabilities. Our work
addresses learning subtler and more complex underlying temporal dynamics in
language modeling tasks that deal with sparse sequential data. To this end, we
improve upon these recent advances, by adopting concepts from the field of
Bayesian statistics, namely variational inference. Our proposed approach
consists of treating the network parameters as latent variables with a prior
distribution imposed over them. Our statistical assumptions go beyond the
standard practice of postulating Gaussian priors. Indeed, to allow for handling
outliers, which are prevalent in long observed sequences of multivariate data,
multivariate t-exponential distributions are imposed. On this basis, we proceed
to infer corresponding posteriors; these can be used for inference and
prediction at test time, in a way that accounts for the uncertainty in the
available sparse training data. Specifically, to allow our approach to best
exploit the merits of the t-exponential family, our method considers a new
t-divergence measure, which generalizes the concept of the Kullback-Leibler
divergence. We perform an extensive experimental evaluation of our approach,
using challenging language modeling benchmarks, and illustrate its superiority
over existing state-of-the-art techniques.
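The abstract gives no formulas. This sketch uses definitions standard in the t-exponential family literature: the deformed logarithm log_t(x) = (x^(1−t) − 1)/(1 − t), its inverse exp_t(x) = [1 + (1 − t)x]_+^(1/(1−t)), and a t-divergence that weights the difference of deformed logarithms by the escort distribution p^t / Σ p^t. Whether the paper uses exactly this t-divergence is an assumption, and the function names are hypothetical. The code works on discrete distributions only and checks numerically that the divergence reduces to the Kullback-Leibler divergence as t → 1.

```python
import numpy as np

def log_t(x, t):
    """Deformed logarithm (x^(1-t) - 1) / (1 - t); equals ln(x) at t = 1."""
    if np.isclose(t, 1.0):
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Deformed exponential [1 + (1-t) x]_+^(1/(1-t)); the inverse of log_t."""
    if np.isclose(t, 1.0):
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

def escort(p, t):
    """Escort distribution p^t / sum(p^t)."""
    w = p ** t
    return w / w.sum()

def t_divergence(p, q, t):
    """D_t(p || q) = sum_i escort(p)_i * (log_t p_i - log_t q_i), for strictly positive p, q."""
    return float(np.sum(escort(p, t) * (log_t(p, t) - log_t(q, t))))

def kl(p, q):
    """Kullback-Leibler divergence for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])
for t in (1.0, 1.0001, 1.5):
    print(t, t_divergence(p, q, t), kl(p, q))
```

For t > 1, exp_t decays polynomially rather than exponentially, which gives the heavy tails that make these distributions attractive for outlier-prone sequence data.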
Replication and discovery of musculoskeletal QTLs in LG/J and SM/J advanced intercross lines
AR056280 awarded to DAB and AL. AIHC supported by IMS and Elphinstone Scholarship from the University of Aberdeen. GRV supported by Medical Research Scotland (Vac-929-2016). Peer reviewed. Publisher PDF
K2P: A photometry pipeline for the K2 mission
With the loss of a second reaction wheel, resulting in the inability to point
continuously and stably at the same field of view, the NASA Kepler satellite
recently entered a new mode of observation known as the K2 mission. The data
from this redesigned mission present a specific challenge: the targets
systematically drift in position on a ~6 hour time scale, inducing a
significant instrumental signal in the photometric time series. This greatly
impacts the ability to detect planetary signals and perform asteroseismic
analysis. Here we detail our version of a reduction pipeline for K2 target
pixel data, which automatically: defines masks for all targets in a given
frame; extracts the target's flux and position time series; corrects the time
series based on the apparent movement on the CCD (either in 1D or 2D) combined
with the correction of instrumental and/or planetary signals via the KASOC
filter (Handberg & Lund 2014), thus rendering the time series ready for
asteroseismic analysis; computes power spectra for all targets, and identifies
potential contaminations between targets. From a test of our pipeline on a
sample of targets from the K2 campaign 0, the recovery of data for multiple
targets increases the amount of potential light curves by a factor .
Our pipeline could be applied to the upcoming TESS (Ricker et al. 2014) and
PLATO 2.0 (Rauer et al. 2013) missions. Comment: 14 pages, 20 figures. Accepted for publication in The Astrophysical
Journal (ApJ)
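The correction step described above, removing the flux variation caused by apparent movement on the CCD, can be illustrated with a simple 1D decorrelation. This is a minimal sketch under stated assumptions and not K2P's implementation. The 2D centroids are projected onto their principal direction of motion, and a low-order polynomial in that position (degree 3, a hypothetical choice) is divided out. Masking, the KASOC filter, and power spectra are omitted. `arc_position` and `decorrelate_1d` are hypothetical names, and the sawtooth stands in for the periodic drift mentioned above.

```python
import numpy as np

def arc_position(row_c, col_c):
    """Project 2D centroid positions onto their principal drift direction (1D position)."""
    xy = np.column_stack([row_c - row_c.mean(), col_c - col_c.mean()])
    _, _, vt = np.linalg.svd(xy, full_matrices=False)
    return xy @ vt[0]

def decorrelate_1d(flux, position, degree=3):
    """Divide the flux by a polynomial fit of flux versus position."""
    coeffs = np.polyfit(position, flux, degree)
    return flux / np.polyval(coeffs, position)

# synthetic light curve: sawtooth drift (slow motion, then a reset) plus a position-dependent flux loss
rng = np.random.default_rng(0)
n = 1000
phase = (np.arange(n) % 50) / 50.0
row_c = 5.0 + 0.4 * phase + rng.normal(0, 0.005, n)
col_c = 7.0 + 0.2 * phase + rng.normal(0, 0.005, n)
s = arc_position(row_c, col_c)
flux = 1000.0 * (1.0 + 0.05 * s - 0.2 * s ** 2) + rng.normal(0, 0.1, n)
corrected = decorrelate_1d(flux, s)
print(np.std(flux / flux.mean()), np.std(corrected))
```

A 2D variant would fit a surface in (row, column) instead of a polynomial along one projected coordinate.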
Multi-document Summarization Based on Sentence Clustering Improved Using Topic Words
Information in the form of news text has become one of the most important commodities of the information era. Many news articles are produced every day, but they often convey the same contextual content through different narratives. A method is therefore needed to condense this information into a simple summary. The subtasks involved in multi-document summarization include sentence extraction, topic detection, and the extraction of representative sentences. In this paper, we propose a new method for representing sentences based on topic keywords extracted from the text with Latent Dirichlet Allocation (LDA). The method consists of three basic steps. First, we cluster the sentences in the document set using similarity histogram clustering (SHC). Next, the clusters are ranked by cluster importance. Finally, representative sentences are selected according to the topics identified by LDA. The proposed method was tested on the DUC2004 dataset. The results show average scores of 0.3419 for ROUGE-1 and 0.0766 for ROUGE-2. In addition, from the reader's perspective, the proposed method produces a coherent, well-ordered arrangement of representative sentences, which eases reading comprehension and reduces the time needed to read the summary.
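The three steps above can be sketched as a small pipeline. This is a hedged illustration and not the authors' method. The clustering is only in the spirit of SHC: a sentence joins a cluster if the fraction of pairwise cosine similarities reaching `sim_threshold` stays at or above `hr_min`. Both thresholds are hypothetical values. LDA itself is not run. A hand-picked list of topic words stands in for its output. "Cluster importance" is assumed to be the total count of topic words in a cluster.

```python
import itertools
import math
import re
from collections import Counter

def bow(sentence):
    """Bag-of-words term counts for a sentence."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def histogram_ratio(vectors, sim_threshold):
    """Fraction of pairwise similarities within a cluster that reach sim_threshold."""
    pairs = list(itertools.combinations(vectors, 2))
    if not pairs:
        return 1.0
    return sum(cosine(a, b) >= sim_threshold for a, b in pairs) / len(pairs)

def shc_cluster(sentences, sim_threshold=0.2, hr_min=0.6):
    """Incremental clustering loosely modeled on similarity histogram clustering."""
    vecs = [bow(s) for s in sentences]
    clusters = []  # each cluster is a list of sentence indices
    for i, v in enumerate(vecs):
        best, best_hr = None, -1.0
        for c in clusters:
            hr = histogram_ratio([vecs[j] for j in c] + [v], sim_threshold)
            if hr >= hr_min and hr > best_hr:
                best, best_hr = c, hr
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

def summarize(sentences, topic_words, n_sentences=2):
    """Rank clusters by topic-word mass, then take each top cluster's most topical sentence."""
    topic = set(topic_words)

    def topic_score(i):
        # number of topic-word occurrences in sentence i
        return sum(c for w, c in bow(sentences[i]).items() if w in topic)

    clusters = shc_cluster(sentences)
    ranked = sorted(clusters, key=lambda c: sum(topic_score(i) for i in c), reverse=True)
    picks = [max(c, key=topic_score) for c in ranked[:n_sentences]]
    return [sentences[i] for i in sorted(picks)]  # keep the original sentence order

docs = [
    "The earthquake struck the coastal city on Monday morning.",
    "A strong earthquake hit the coastal city early Monday.",
    "Rescue teams searched collapsed buildings for survivors.",
    "Rescue teams pulled survivors from collapsed buildings.",
    "Local football club announced a new coach.",
]
topic_words = ["earthquake", "city", "rescue", "survivors"]  # stand-in for LDA output
print(shc_cluster(docs))  # [[0, 1], [2, 3], [4]]
print(summarize(docs, topic_words))
```

Returning the picks in their original order is one simple way to address the sentence-ordering concern the abstract raises.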
Correspondence matching with modal clusters
The modal correspondence method of Shapiro and Brady aims to match point-sets by comparing the eigenvectors of a pairwise point proximity matrix. Although its matrix representation is elegant, the method is notoriously susceptible to differences in the relational structure of the point-sets under consideration. In this paper, we demonstrate how the method can be rendered robust to structural differences by adopting a hierarchical approach. To do this, we place the modal matching problem in a probabilistic setting in which the correspondences between pairwise clusters can be used to constrain the individual point correspondences. We demonstrate the utility of the method on a number of synthetic and real-world point-pattern matching problems.
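The baseline modal method named above can be sketched as follows. Build a proximity matrix for each point-set, take its eigenvectors as a modal matrix whose i-th row describes point i, and match points whose rows are closest. This is a minimal sketch of that baseline only, not the hierarchical, probabilistic extension proposed in this paper. The Gaussian weighting exp(−r²/2σ²) is the one usually associated with Shapiro and Brady. The following are assumptions of this sketch: equal-sized point-sets, a hand-picked σ, keeping only the `n_modes` strongest modes, and a sorting-based fix for the eigenvector sign ambiguity. The function names are hypothetical.

```python
import numpy as np

def proximity_matrix(points, sigma):
    """Gaussian-weighted proximity H_ij = exp(-r_ij^2 / (2 sigma^2))."""
    diff = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))

def modal_matrix(points, sigma, n_modes):
    """Eigenvectors of H ordered by decreasing eigenvalue; row i describes point i."""
    eigvals, eigvecs = np.linalg.eigh(proximity_matrix(points, sigma))
    order = np.argsort(eigvals)[::-1][:n_modes]
    return eigvecs[:, order]

def align_signs(U, V):
    """Flip columns of V so that their sorted entries best match those of U.
    Sorting makes the comparison independent of point ordering."""
    V = V.copy()
    for k in range(U.shape[1]):
        target = np.sort(U[:, k])
        if np.linalg.norm(np.sort(-V[:, k]) - target) < np.linalg.norm(np.sort(V[:, k]) - target):
            V[:, k] = -V[:, k]
    return V

def modal_match(X, Y, sigma=0.5, n_modes=4):
    """For each point in X, the index of the point in Y whose modal row is closest."""
    U = modal_matrix(X, sigma, n_modes)
    V = align_signs(U, modal_matrix(Y, sigma, n_modes))
    dist = np.linalg.norm(U[:, None, :] - V[None, :, :], axis=-1)
    return np.argmin(dist, axis=1)

# Y is a permuted, rotated, and translated copy of X; distances (and hence H) are preserved
rng = np.random.default_rng(0)
X = rng.random((8, 2))
perm = rng.permutation(8)
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y = X[perm] @ R.T + np.array([2.0, -1.0])
print(modal_match(X, Y), np.argsort(perm))
```

Because the proximity matrix depends only on inter-point distances, rigid motions leave the modal rows unchanged. Structural differences such as missing or extra points do not, and that fragility is what the abstract sets out to address.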