Exploratory Analysis of Highly Heterogeneous Document Collections
We present an effective multifaceted system for exploratory analysis of
highly heterogeneous document collections. Our system is based on intelligently
tagging individual documents in a purely automated fashion and exploiting these
tags in a powerful faceted browsing framework. Tagging strategies employed
include both unsupervised and supervised approaches based on machine learning
and natural language processing. As one of our key tagging strategies, we
introduce the KERA algorithm (Keyword Extraction for Reports and Articles).
KERA extracts topic-representative terms from individual documents in a purely
unsupervised fashion and is revealed to be significantly more effective than
state-of-the-art methods. Finally, we evaluate our system in its ability to
help users locate documents pertaining to military critical technologies buried
deep in a large heterogeneous sea of information. Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining
Inferential mistakes in population proxies: A response to Torfing's "Neolithic population and summed probability distribution of 14C-dates"
In his paper "Neolithic population and summed probability distribution of 14C-dates" Torfing opposes the widely held principle originally proposed by Rick (1987) that variation through time in the amount of archaeological material discovered in a region will reflect variation in the size of that local human population. His argument illustrates a persistent divide in archaeology between analytical and descriptive approaches when using proxies for past population size. We critically evaluate the numerous inferential mistakes he makes, showing that his conclusion is unjustified.
Scaling and Universality in the Counterion-Condensation Transition at Charged Cylinders
We address the critical and universal aspects of counterion-condensation
transition at a single charged cylinder in both two and three spatial
dimensions using numerical and analytical methods. By introducing a novel
Monte-Carlo sampling method in logarithmic radial scale, we are able to
numerically simulate the critical limit of infinite system size (corresponding
to infinite-dilution limit) within tractable equilibration times. The critical
exponents are determined for the inverse moments of the counterionic density
profile (which play the role of the order parameters and represent the inverse
localization length of counterions) both within mean-field theory and within
Monte-Carlo simulations. In three dimensions (3D), correlation effects
(neglected within mean-field theory) lead to an excessive accumulation of
counterions near the charged cylinder below the critical temperature
(condensation phase), while surprisingly, the critical region exhibits
universal critical exponents in accord with the mean-field theory. In two
dimensions (2D), we demonstrate, using both numerical and analytical
approaches, that the mean-field theory becomes exact at all temperatures
(Manning parameters), when the number of counterions tends to infinity. For finite
particle number, however, the 2D problem displays a series of peculiar singular
points (with diverging heat capacity), which reflect successive de-localization
events of individual counterions from the central cylinder. In both 2D and 3D,
the heat capacity shows a universal jump at the critical point, and the energy
develops a pronounced peak. The asymptotic behavior of the energy peak location
is used to locate the critical temperature, which is also found to be universal
and in accordance with the mean-field prediction. Comment: 31 pages, 16 figures
Sofic-Dyck shifts
We define the class of sofic-Dyck shifts which extends the class of
Markov-Dyck shifts introduced by Inoue, Krieger and Matsumoto. Sofic-Dyck
shifts are shifts of sequences whose finite factors form unambiguous
context-free languages. We show that they correspond exactly to the class of
shifts of sequences whose sets of factors are visibly pushdown languages. We
give an expression of the zeta function of a sofic-Dyck shift.
Genome-Wide Association with Diabetes-Related Traits in the Framingham Heart Study
BACKGROUND: Susceptibility to type 2 diabetes may be conferred by genetic variants having modest effects on risk. Genome-wide fixed marker arrays offer a novel approach to detect these variants. METHODS: We used the Affymetrix 100K SNP array in 1,087 Framingham Offspring Study family members to examine genetic associations with three diabetes-related quantitative glucose traits (fasting plasma glucose (FPG), hemoglobin A1c, 28-yr time-averaged FPG (tFPG)), three insulin traits (fasting insulin, HOMA-insulin resistance, and 0–120 min insulin sensitivity index), and with risk for diabetes. We used additive generalized estimating equations (GEE) and family-based association test (FBAT) models to test associations of SNP genotypes with sex-age-age²-adjusted residual trait values, and Cox survival models to test incident diabetes. RESULTS: We found 415 SNPs associated (at p 1%) 100K SNPs in LD (r² > 0.05) with ABCC8 A1369S (rs757110), KCNJ11 E23K (rs5219), or SNPs in CAPN10 or HNFa. PPARG P12A (rs1801282) was not significantly associated with diabetes or related traits. CONCLUSION: Framingham 100K SNP data is a resource for association tests of known and novel genes with diabetes and related traits posted at. Framingham 100K data replicate the TCF7L2 association with diabetes. National Heart, Lung, and Blood Institute's Framingham Heart Study (N01-HC-25195); National Institutes of Health National Center for Research Resources Shared Instrumentation grant (1S10RR163736-01A1); National Center for Research Resources General Clinical Research Center (M01-RR-01066); American Diabetes Association Career Development Award; GlaxoSmithKline; Merck; Lilly; National Institutes of Health Research Career Award (K23 DK659678-03)
Why is the condensed phase of DNA preferred at higher temperature? DNA compaction in the presence of a multivalent cation
Upon the addition of multivalent cations, a giant DNA chain exhibits a large
discrete transition from an elongated coil into a folded compact state. We
performed single-chain observation of long DNAs in the presence of a
tetravalent cation (spermine), at various temperatures and monovalent salt
concentrations. We confirmed that the compact state is preferred at higher
temperatures and at lower monovalent salt concentrations. This result is
interpreted in terms of an increase in the net translational entropy of small
ions due to ionic exchange between higher and lower valence ions. Comment: 4 pages, 3 figures
24Mg(p, α)21Na reaction study for spectroscopy of 21Na
The 24Mg(p, α)21Na reaction was measured at the Holifield
Radioactive Ion Beam Facility at Oak Ridge National Laboratory in order to
better constrain spins and parities of energy levels in 21Na for the
astrophysically important 17F(α, p)20Ne reaction rate
calculation. 31 MeV proton beams from the 25-MV tandem accelerator and enriched
24Mg solid targets were used. Recoiling 4He particles from the
24Mg(p, α)21Na reaction were detected by a highly segmented
silicon detector array which measured the yields of 4He particles over a
range of angles simultaneously. A new level at 6661 ± 5 keV was observed in
the present work. The extracted angular distributions for the first four levels
of 21Na and Distorted Wave Born Approximation (DWBA) calculations were
compared to verify and extract angular momentum transfer. Comment: 11 pages, 6 figures, proceedings of the 18th International Conference
on Accelerators and Beam Utilization (ICABU2014)
TK: The Twitter Top-K Keywords Benchmark
Information retrieval from textual data focuses on the construction of
vocabularies that contain weighted term tuples. Such vocabularies can then be
exploited by various text analysis algorithms to extract new knowledge, e.g.,
top-k keywords, top-k documents, etc. Top-k keywords are casually used for
various purposes, are often computed on-the-fly, and thus must be efficiently
computed. To compare competing weighting schemes and database implementations,
benchmarking is customary. To the best of our knowledge, no benchmark currently
addresses these problems. Hence, in this paper, we present a top-k keywords
benchmark, TK, which features a real tweet dataset and queries with
various complexities and selectivities. TK helps evaluate weighting
schemes and database implementations in terms of computing performance. To
illustrate TK's relevance and genericity, we successfully performed
tests on the TF-IDF and Okapi BM25 weighting schemes, on one hand, and on
different relational (Oracle, PostgreSQL) and document-oriented (MongoDB)
database implementations, on the other hand.
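
As a rough illustration of what a top-k keywords query computes under a TF-IDF weighting scheme, the following minimal Python sketch scores terms over a tiny in-memory corpus. It is not the benchmark's own queries or schema: the function names `tokenize` and `top_k_keywords`, the whitespace tokenizer, and the particular TF-IDF variant (term frequency normalized by document length, idf = log(N / df)) are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, keep alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def top_k_keywords(docs, k=3):
    """Return the k terms with the highest summed TF-IDF weight over docs."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Sum each term's TF-IDF weight over all documents.
    scores = Counter()
    for tokens in tokenized:
        tf = Counter(tokens)
        for term, count in tf.items():
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(tokens)) * idf

    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    corpus = ["cats chase mice", "dogs chase cats", "mice fear cats"]
    for term, score in top_k_keywords(corpus, k=2):
        print(term, round(score, 3))
```

A term that occurs in every document (here "cats") gets an idf of zero and drops out of the ranking, which is the behavior a weighting scheme such as TF-IDF is meant to provide. Okapi BM25 would replace only the per-document weight inside the inner loop.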