198 research outputs found

    Large-scale cluster-based retrieval experiments on Turkish texts

    We present cluster-based retrieval (CBR) experiments on the largest available Turkish document collection. Our experiments evaluate retrieval effectiveness and efficiency on both an automatically generated clustering structure and a manual classification of documents. In particular, we compare CBR effectiveness with full-text search (FS) and evaluate several implementation alternatives for CBR. Our findings reveal that CBR yields effectiveness comparable to that of FS. Furthermore, by using a specifically tailored cluster-skipping inverted index, we significantly improve the in-memory query processing efficiency of CBR in comparison to other traditional CBR techniques and even FS.

    Information retrieval on Turkish texts

    In this study, we investigate information retrieval (IR) on Turkish texts using a large-scale test collection that contains 408,305 documents and 72 ad hoc queries. We examine the effects of several stemming options and query-document matching functions on retrieval performance. We show that a simple word truncation approach, a word truncation approach that uses language-dependent corpus statistics, and an elaborate lemmatizer-based stemmer provide similar retrieval effectiveness in Turkish IR. We investigate the effects of a range of search conditions on the retrieval performance; these include scalability issues, query and document length effects, and the use of a stop-word list in indexing. © 2007 Wiley Periodicals, Inc.

    First large-scale information retrieval experiments on Turkish texts

    We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which has been created for this study, contains 95.5 million words, 408,305 documents, and 72 ad hoc queries, and has a size of about 800 MB. All documents come from the Turkish newspaper Milliyet. We implement and apply simple to sophisticated stemmers and various query-document matching functions and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.
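    The prefix-truncation stemming mentioned in this abstract can be illustrated with a minimal sketch. The function names `truncate_stem` and `stem_tokens` are hypothetical and not taken from the paper, and whitespace tokenization plus Python's default lowercasing (which is not Turkish-locale aware for dotted and dotless i) are simplifying assumptions.

```python
def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Return the first `prefix_len` characters of the lowercased word.

    Assumption: str.lower() is used for case folding; it does not apply
    Turkish-specific rules for I/ı and İ/i.
    """
    return word.lower()[:prefix_len]


def stem_tokens(text: str, prefix_len: int = 5) -> list[str]:
    """Split on whitespace (an assumption) and truncate every token."""
    return [truncate_stem(token, prefix_len) for token in text.split()]


if __name__ == "__main__":
    # Inflected forms of "kitap" (book) collapse to the same index term;
    # words shorter than the prefix length are kept whole.
    print(stem_tokens("Kitaplar kitaplarımız ev"))
```

    Because Turkish is agglutinative and suffixing, inflected forms of a word usually share their leading characters, which is why a fixed-length prefix can act as a rough stem.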

    Bilkent news portal: A personalizable system with new event detection and tracking capabilities

    [No abstract available]

    New event detection and topic tracking in Turkish

    Topic detection and tracking (TDT) applications aim to organize the temporally ordered stories of a news stream according to the events. Two major problems in TDT are new event detection (NED) and topic tracking (TT). These problems focus on finding the first stories of new events and identifying all subsequent stories on a certain topic defined by a small number of sample stories. In this work, we introduce the first large-scale TDT test collection for Turkish, and investigate the NED and TT problems in this language. We present our test-collection-construction approach, which is inspired by the TDT research initiative. We show that in TDT for Turkish with some similarity measures, a simple word truncation stemming method can compete with a lemmatizer-based stemming approach. Our findings show that, contrary to our earlier observations on Turkish information retrieval, word stopping has an impact on effectiveness in NED. We demonstrate that the confidence scores of two different similarity measures can be combined in a straightforward manner for higher effectiveness. The influence of several similarity measures on effectiveness is also investigated. We show that it is possible to deploy TT applications in Turkish that can be used in operational settings. © 2010 ASIS&T

    Search for a W' boson decaying to a bottom quark and a top quark in pp collisions at sqrt(s) = 7 TeV

    Results are presented from a search for a W' boson using a dataset corresponding to 5.0 inverse femtobarns of integrated luminosity collected during 2011 by the CMS experiment at the LHC in pp collisions at sqrt(s)=7 TeV. The W' boson is modeled as a heavy W boson, but different scenarios for the couplings to fermions are considered, involving both left-handed and right-handed chiral projections of the fermions, as well as an arbitrary mixture of the two. The search is performed in the decay channel W' to t b, leading to a final state signature with a single lepton (e, mu), missing transverse energy, and jets, at least one of which is tagged as a b-jet. A W' boson that couples to fermions with the same coupling constant as the W, but to the right-handed rather than left-handed chiral projections, is excluded for masses below 1.85 TeV at the 95% confidence level. For the first time using LHC data, constraints on the W' gauge coupling for a set of left- and right-handed coupling combinations have been placed. These results represent a significant improvement over previously published limits. Comment: Submitted to Physics Letters B. Replaced with version publishe

    Search for the standard model Higgs boson decaying into two photons in pp collisions at sqrt(s)=7 TeV

    A search for a Higgs boson decaying into two photons is described. The analysis is performed using a dataset recorded by the CMS experiment at the LHC from pp collisions at a centre-of-mass energy of 7 TeV, which corresponds to an integrated luminosity of 4.8 inverse femtobarns. Limits are set on the cross section of the standard model Higgs boson decaying to two photons. The expected exclusion limit at 95% confidence level is between 1.4 and 2.4 times the standard model cross section in the mass range between 110 and 150 GeV. The analysis of the data excludes, at 95% confidence level, the standard model Higgs boson decaying into two photons in the mass range 128 to 132 GeV. The largest excess of events above the expected standard model background is observed for a Higgs boson mass hypothesis of 124 GeV with a local significance of 3.1 sigma. The global significance of observing an excess with a local significance greater than 3.1 sigma anywhere in the search range 110-150 GeV is estimated to be 1.8 sigma. More data are required to ascertain the origin of this excess. Comment: Submitted to Physics Letters
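    To put the local and global significances quoted in this abstract on a probability scale, the short sketch below converts a significance in sigma to a tail probability. It assumes the one-sided Gaussian tail convention, which the abstract does not state; the function name `one_sided_p_value` is hypothetical.

```python
import math


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability of a standard normal beyond z sigma.

    Assumption: one-sided convention, p = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Significances quoted in the abstract.
p_local = one_sided_p_value(3.1)   # excess at the 124 GeV mass hypothesis
p_global = one_sided_p_value(1.8)  # anywhere in the 110-150 GeV search range

if __name__ == "__main__":
    print(f"local p-value:  {p_local:.2e}")
    print(f"global p-value: {p_global:.2e}")
```

    Under this assumption the global probability is much larger than the local one, which reflects the chance of a fluctuation appearing at any mass in the search range rather than at one fixed mass.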

    Measurement of the Lambda(b) cross section and the anti-Lambda(b) to Lambda(b) ratio with Lambda(b) to J/Psi Lambda decays in pp collisions at sqrt(s) = 7 TeV

    The Lambda(b) differential production cross section and the cross section ratio anti-Lambda(b)/Lambda(b) are measured as functions of transverse momentum pt(Lambda(b)) and rapidity abs(y(Lambda(b))) in pp collisions at sqrt(s) = 7 TeV using data collected by the CMS experiment at the LHC. The measurements are based on Lambda(b) decays reconstructed in the exclusive final state J/Psi Lambda, with the subsequent decays J/Psi to an opposite-sign muon pair and Lambda to proton pion, using a data sample corresponding to an integrated luminosity of 1.9 inverse femtobarns. The product of the cross section times the branching ratio for Lambda(b) to J/Psi Lambda versus pt(Lambda(b)) falls faster than that of b mesons. The measured value of the cross section times the branching ratio for pt(Lambda(b)) > 10 GeV and abs(y(Lambda(b))) < 2.0 is 1.06 +/- 0.06 +/- 0.12 nb, and the integrated cross section ratio for anti-Lambda(b)/Lambda(b) is 1.02 +/- 0.07 +/- 0.09, where the uncertainties are statistical and systematic, respectively. Comment: Submitted to Physics Letters

    Search for new physics in events with opposite-sign leptons, jets, and missing transverse energy in pp collisions at sqrt(s) = 7 TeV

    A search is presented for physics beyond the standard model (BSM) in final states with a pair of opposite-sign isolated leptons accompanied by jets and missing transverse energy. The search uses LHC data recorded at a center-of-mass energy sqrt(s) = 7 TeV with the CMS detector, corresponding to an integrated luminosity of approximately 5 inverse femtobarns. Two complementary search strategies are employed. The first probes models with a specific dilepton production mechanism that leads to a characteristic kinematic edge in the dilepton mass distribution. The second strategy probes models of dilepton production with heavy, colored objects that decay to final states including invisible particles, leading to very large hadronic activity and missing transverse energy. No evidence for an event yield in excess of the standard model expectations is found. Upper limits on the BSM contributions to the signal regions are deduced from the results, which are used to exclude a region of the parameter space of the constrained minimal supersymmetric extension of the standard model. Additional information related to detector efficiencies and response is provided to allow testing specific models of BSM physics not considered in this paper. Comment: Replaced with published version. Added journal reference and DO

    Measurement of isolated photon production in pp and PbPb collisions at sqrt(sNN) = 2.76 TeV

    Isolated photon production is measured in proton-proton and lead-lead collisions at nucleon-nucleon centre-of-mass energies of 2.76 TeV in the pseudorapidity range |eta|<1.44 and transverse energies ET between 20 and 80 GeV with the CMS detector at the LHC. The measured ET spectra are found to be in good agreement with next-to-leading-order perturbative QCD predictions. The ratio of PbPb to pp isolated photon ET-differential yields, scaled by the number of incoherent nucleon-nucleon collisions, is consistent with unity for all PbPb reaction centralities. Comment: Submitted to Physics Letters