
    Exploratory Analysis of Highly Heterogeneous Document Collections

    We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is shown to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system in its ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information.
    Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

    Inferential mistakes in population proxies: A response to Torfing's "Neolithic population and summed probability distribution of 14C-dates"

    In his paper "Neolithic population and summed probability distribution of 14C-dates", Torfing opposes the widely held principle, originally proposed by Rick (1987), that variation through time in the amount of archaeological material discovered in a region will reflect variation in the size of that local human population. His argument illustrates a persistent divide in archaeology between analytical and descriptive approaches when using proxies for past population size. We critically evaluate the numerous inferential mistakes he makes, showing that his conclusion is unjustified.

    Scaling and Universality in the Counterion-Condensation Transition at Charged Cylinders

    We address the critical and universal aspects of the counterion-condensation transition at a single charged cylinder in both two and three spatial dimensions using numerical and analytical methods. By introducing a novel Monte-Carlo sampling method in logarithmic radial scale, we are able to numerically simulate the critical limit of infinite system size (corresponding to the infinite-dilution limit) within tractable equilibration times. The critical exponents are determined for the inverse moments of the counterionic density profile (which play the role of the order parameters and represent the inverse localization length of counterions) both within mean-field theory and within Monte-Carlo simulations. In three dimensions (3D), correlation effects (neglected within mean-field theory) lead to an excessive accumulation of counterions near the charged cylinder below the critical temperature (condensation phase), while surprisingly, the critical region exhibits universal critical exponents in accord with the mean-field theory. In two dimensions (2D), we demonstrate, using both numerical and analytical approaches, that the mean-field theory becomes exact at all temperatures (Manning parameters) when the number of counterions tends to infinity. For finite particle number, however, the 2D problem displays a series of peculiar singular points (with diverging heat capacity), which reflect successive de-localization events of individual counterions from the central cylinder. In both 2D and 3D, the heat capacity shows a universal jump at the critical point, and the energy develops a pronounced peak. The asymptotic behavior of the energy peak location is used to locate the critical temperature, which is also found to be universal and in accordance with the mean-field prediction.
    Comment: 31 pages, 16 figures

    Sofic-Dyck shifts

    We define the class of sofic-Dyck shifts, which extends the class of Markov-Dyck shifts introduced by Inoue, Krieger and Matsumoto. Sofic-Dyck shifts are shifts of sequences whose finite factors form unambiguous context-free languages. We show that they correspond exactly to the class of shifts of sequences whose sets of factors are visibly pushdown languages. We give an expression of the zeta function of a sofic-Dyck shift.

    Genome-Wide Association with Diabetes-Related Traits in the Framingham Heart Study

    BACKGROUND: Susceptibility to type 2 diabetes may be conferred by genetic variants having modest effects on risk. Genome-wide fixed marker arrays offer a novel approach to detect these variants. METHODS: We used the Affymetrix 100K SNP array in 1,087 Framingham Offspring Study family members to examine genetic associations with three diabetes-related quantitative glucose traits (fasting plasma glucose (FPG), hemoglobin A1c, 28-yr time-averaged FPG (tFPG)), three insulin traits (fasting insulin, HOMA-insulin resistance, and 0–120 min insulin sensitivity index); and with risk for diabetes. We used additive generalized estimating equations (GEE) and family-based association test (FBAT) models to test associations of SNP genotypes with sex-age-age$^2$-adjusted residual trait values, and Cox survival models to test incident diabetes. RESULTS: We found 415 SNPs associated (at p 1%) 100K SNPs in LD ($r^2$ > 0.05) with ABCC8 A1369S (rs757110), KCNJ11 E23K (rs5219), or SNPs in CAPN10 or HNFa. PPARG P12A (rs1801282) was not significantly associated with diabetes or related traits. CONCLUSION: Framingham 100K SNP data is a resource for association tests of known and novel genes with diabetes and related traits posted at. Framingham 100K data replicate the TCF7L2 association with diabetes.
    National Heart, Lung, and Blood Institute's Framingham Heart Study (N01-HC-25195); National Institutes of Health National Center for Research Resources Shared Instrumentation grant (1S10RR163736-01A1); National Center for Research Resources General Clinical Research Center (M01-RR-01066); American Diabetes Association Career Development Award; GlaxoSmithKline; Merck; Lilly; National Institutes of Health Research Career Award (K23 DK659678-03

    Why is the condensed phase of DNA preferred at higher temperature? DNA compaction in the presence of a multivalent cation

    Upon the addition of multivalent cations, a giant DNA chain exhibits a large discrete transition from an elongated coil into a folded compact state. We performed single-chain observation of long DNAs in the presence of a tetravalent cation (spermine), at various temperatures and monovalent salt concentrations. We confirmed that the compact state is preferred at higher temperatures and at lower monovalent salt concentrations. This result is interpreted in terms of an increase in the net translational entropy of small ions due to ionic exchange between higher and lower valence ions.
    Comment: 4 pages, 3 figures

    $^{24}$Mg($p$, $\alpha$)$^{21}$Na reaction study for spectroscopy of $^{21}$Na

    The $^{24}$Mg($p$, $\alpha$)$^{21}$Na reaction was measured at the Holifield Radioactive Ion Beam Facility at Oak Ridge National Laboratory in order to better constrain spins and parities of energy levels in $^{21}$Na for the astrophysically important $^{17}$F($\alpha$, $p$)$^{20}$Ne reaction rate calculation. 31 MeV proton beams from the 25-MV tandem accelerator and enriched $^{24}$Mg solid targets were used. Recoiling $^{4}$He particles from the $^{24}$Mg($p$, $\alpha$)$^{21}$Na reaction were detected by a highly segmented silicon detector array which measured the yields of $^{4}$He particles over a range of angles simultaneously. A new level at 6661 $\pm$ 5 keV was observed in the present work. The extracted angular distributions for the first four levels of $^{21}$Na and Distorted Wave Born Approximation (DWBA) calculations were compared to verify and extract angular momentum transfer.
    Comment: 11 pages, 6 figures, proceedings of the 18th International Conference on Accelerators and Beam Utilization (ICABU2014)

    T$^2$K$^2$: The Twitter Top-K Keywords Benchmark

    Information retrieval from textual data focuses on the construction of vocabularies that contain weighted term tuples. Such vocabularies can then be exploited by various text analysis algorithms to extract new knowledge, e.g., top-k keywords, top-k documents, etc. Top-k keywords are casually used for various purposes, are often computed on-the-fly, and thus must be efficiently computed. To compare competing weighting schemes and database implementations, benchmarking is customary. To the best of our knowledge, no benchmark currently addresses these problems. Hence, in this paper, we present a top-k keywords benchmark, T$^2$K$^2$, which features a real tweet dataset and queries with various complexities and selectivities. T$^2$K$^2$ helps evaluate weighting schemes and database implementations in terms of computing performance. To illustrate T$^2$K$^2$'s relevance and genericity, we successfully performed tests on the TF-IDF and Okapi BM25 weighting schemes, on the one hand, and on different relational (Oracle, PostgreSQL) and document-oriented (MongoDB) database implementations, on the other hand.
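As a rough illustration of what a top-k keywords computation under the two weighting schemes named above can look like, here is a minimal in-memory sketch. It is not the benchmark's implementation: the abstract does not give its exact formulas or queries, so the function names (`tokenize`, `top_k_keywords`), the whitespace tokenizer, the textbook TF-IDF and BM25 variants, and the default parameters `k1=1.2` and `b=0.75` are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace,
    # keep purely alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]


def top_k_keywords(docs, k=3, scheme="tfidf", k1=1.2, b=0.75):
    """Return the k terms with the highest weight summed over all docs.

    scheme="tfidf" uses tf * log(N / df); any other value uses a common
    Okapi BM25 variant. Both are textbook forms assumed for illustration.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = Counter()
    for tokens in tokenized:
        tf = Counter(tokens)
        for term, f in tf.items():
            if scheme == "tfidf":
                weight = f * math.log(n_docs / df[term])
            else:
                idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
                norm = f + k1 * (1 - b + b * len(tokens) / avg_len)
                weight = idf * f * (k1 + 1) / norm
            scores[term] += weight

    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    sample = [
        "database benchmark query",
        "database index query",
        "tweet keyword benchmark",
    ]
    print(top_k_keywords(sample, k=3, scheme="tfidf"))
    print(top_k_keywords(sample, k=3, scheme="bm25"))
```

In a database setting the same aggregation would typically be expressed as a grouped query over a term-document table; the sketch only shows the weighting and ranking logic.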