
    Constructing computer virus phylogenies

    There has been much recent algorithmic work on the problem of reconstructing the evolutionary history of biological species. Computer virus specialists are interested in finding the evolutionary history of computer viruses - a virus is often written using code fragments from one or more other viruses, which are its immediate ancestors. A phylogeny for a collection of computer viruses is a directed acyclic graph whose nodes are the viruses and whose edges map ancestors to descendants and satisfy the property that each code fragment is "invented" only once. To provide a simple explanation for the data, we consider the problem of constructing such a phylogeny with a minimum number of edges. In general, this optimization problem is NP-complete; some associated approximation problems are also hard, but others are easy. When tree solutions exist, they can be constructed and randomly sampled in polynomial time.
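
    To make the model concrete, here is a minimal, hypothetical sketch (not taken from the paper) of the "each code fragment is invented only once" condition: viruses are represented as sets of fragment identifiers, a candidate phylogeny as a set of ancestor-descendant edges, and the check verifies that every fragment first appears in exactly one virus. The function name and toy data are illustrative only.

        # Hypothetical sketch: check the "each code fragment is invented only once"
        # property of a candidate phylogeny DAG. Viruses are sets of fragment ids;
        # edges map ancestors to descendants. Names and data are illustrative.
        def fragment_invented_once(fragments, edges):
            """fragments: dict virus -> set of fragment ids
               edges: set of (ancestor, descendant) pairs forming a DAG."""
            inventors = {}  # fragment id -> set of viruses where it is not inherited
            for virus, frags in fragments.items():
                parents = {a for (a, d) in edges if d == virus}
                for f in frags:
                    if not any(f in fragments[p] for p in parents):
                        inventors.setdefault(f, set()).add(virus)
            # Valid phylogeny: every fragment has exactly one inventor.
            return all(len(v) == 1 for v in inventors.values())

        # Toy data: virus C reuses fragments of its ancestors A and B and adds fragment 4.
        fragments = {"A": {1, 2}, "B": {3}, "C": {1, 3, 4}}
        edges = {("A", "C"), ("B", "C")}
        print(fragment_invented_once(fragments, edges))  # True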

    Approximating the partition function of the ferromagnetic Potts model

    We provide evidence that it is computationally difficult to approximate the partition function of the ferromagnetic q-state Potts model when q > 2. Specifically, we show that the partition function is hard for the complexity class #RHΠ1 under approximation-preserving reducibility. Thus, it is as hard to approximate the partition function as it is to find approximate solutions to a wide range of counting problems, including that of determining the number of independent sets in a bipartite graph. Our proof exploits the first-order phase transition of the "random cluster" model, which is a probability distribution on graphs that is closely related to the q-state Potts model.
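
    For reference, a standard statement (textbook material, not quoted from the paper) of the ferromagnetic q-state Potts partition function and its random-cluster (Fortuin-Kasteleyn) expansion, which is the representation whose first-order phase transition the hardness proof exploits:

        Z_{\mathrm{Potts}}(G; q, \beta)
          = \sum_{\sigma : V \to \{1,\dots,q\}} \exp\!\Big( \beta \sum_{\{u,v\} \in E} \delta(\sigma_u, \sigma_v) \Big)
          = \sum_{A \subseteq E} (e^{\beta} - 1)^{|A|} \, q^{\kappa(V, A)}

    Here G = (V, E), \delta is the Kronecker delta, \kappa(V, A) is the number of connected components of the subgraph (V, A), and \beta \ge 0 in the ferromagnetic case.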

    A comparison of walk-in counselling and the wait list model for delivering counselling services

    Background: Walk-in counselling has been used to reduce wait times, but there are few controlled studies comparing outcomes between walk-in and the traditional model of service delivery. Aims: To compare change in psychological distress among clients receiving services from two models of service delivery, a walk-in counselling model and a traditional counselling model involving a wait list. Method: Mixed methods sequential explanatory design including quantitative comparison of groups with one pre-test and two follow-ups, and qualitative analysis of interviews with a subsample. 524 participants 16 years and older were recruited from two Family Counselling Agencies; the General Health Questionnaire assessed change in psychological distress; prior use of other mental health and instrumental services was also reported. Results: Hierarchical linear modelling revealed that clients of the walk-in model improved faster and were less distressed at the 4-week follow-up compared to the traditional service delivery model. At the 10-week follow-up, both groups had improved and were similar. Participants receiving instrumental services prior to baseline improved more slowly. Qualitative interviews confirmed that participants valued the accessibility of the walk-in model. Conclusions: This study improves methodologically on previous studies of walk-in counselling, an approach to service delivery that is not conducive to randomized controlled trials.
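
    As a rough illustration of the analysis described above, the sketch below fits a hierarchical linear (mixed-effects) model of distress over the two follow-ups with a random intercept per participant. The synthetic data and column names (ghq, weeks, service_model, participant_id) are hypothetical; this is not the authors' analysis code.

        # Hedged sketch of a hierarchical linear model of GHQ distress over follow-up,
        # comparing a walk-in group with a wait-list group, on synthetic data.
        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        records = []
        for pid in range(120):
            group = "walk_in" if pid % 2 == 0 else "wait_list"
            base = rng.normal(15, 4)                      # baseline distress
            slope = -0.9 if group == "walk_in" else -0.5  # assumed improvement rates
            for weeks in (0, 4, 10):                      # pre-test and two follow-ups
                records.append({"participant_id": pid, "service_model": group,
                                "weeks": weeks,
                                "ghq": base + slope * weeks + rng.normal(0, 2)})
        df = pd.DataFrame(records)

        # Random intercept per participant; the weeks x service_model interaction asks
        # whether the walk-in group improves faster, as reported in the abstract.
        model = smf.mixedlm("ghq ~ weeks * service_model", df, groups=df["participant_id"])
        print(model.fit().summary())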

    Approximation via Correlation Decay when Strong Spatial Mixing Fails

    Approximate counting via correlation decay is the core algorithmic technique used in the sharp delineation of the computational phase transition that arises in the approximation of the partition function of antiferromagnetic 2-spin models. Previous analyses of correlation-decay algorithms implicitly depended on the occurrence of strong spatial mixing. This, roughly, means that one uses worst-case analysis of the recursive procedure that creates the subinstances. In this paper, we develop a new analysis method that is more refined than the worst-case analysis. We take the shape of instances in the computation tree into consideration and we amortize against certain “bad” instances that are created as the recursion proceeds. This enables us to show correlation decay and to obtain a fully polynomial-time approximation scheme (FPTAS) even when strong spatial mixing fails. We apply our technique to the problem of approximately counting independent sets in hypergraphs with degree upper bound Δ and with a lower bound k on the arity of hyperedges. Liu and Lin gave an FPTAS for k ≥ 2 and Δ ≤ 5 (lack of strong spatial mixing was the obstacle preventing this algorithm from being generalized to Δ = 6). Our technique gives a tight result for Δ = 6, showing that there is an FPTAS for k ≥ 3 and Δ ≤ 6. The best previously known approximation scheme for Δ = 6 is the Markov-chain simulation based fully polynomial-time randomized approximation scheme (FPRAS) of Bordewich, Dyer, and Karpinski, which only works for k ≥ 8. Our technique also applies for larger values of k, giving an FPTAS for k ≥ Δ. This bound is not substantially stronger than existing randomized results in the literature. Nevertheless, it gives the first deterministic approximation scheme in this regime. Moreover, unlike existing results, it leads to an FPTAS for counting dominating sets in regular graphs with sufficiently large degree. We further demonstrate that in the hypergraph independent set model, approximating the partition function is NP-hard even within the uniqueness regime. Also, approximately counting dominating sets of bounded-degree graphs (without the regularity restriction) is NP-hard.
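
    As a minimal illustration of the correlation-decay technique discussed above (on ordinary graphs rather than hypergraphs, and without the paper's refined amortized analysis), the sketch below estimates the marginal occupation probability of a vertex in the hardcore (independent set) model by recursing through the computation tree and truncating at a fixed depth. The function names and the boundary value used at the cutoff are illustrative assumptions.

        # Illustrative correlation-decay recursion for the hardcore (independent set)
        # model on a graph, with a truncated computation tree. This is a simplified
        # stand-in for the hypergraph algorithm analysed in the paper.
        def occupied_prob(adj, present, v, lam, depth):
            """Estimate Pr[v is occupied] in a random independent set (fugacity lam)
            of the subgraph induced by `present`, truncating recursion at `depth`."""
            if depth == 0:
                return lam / (1.0 + lam)   # arbitrary boundary value at the cutoff
            rest = present - {v}
            prod, removed = 1.0, set()
            for u in adj[v]:
                if u not in rest:
                    continue
                prod *= 1.0 - occupied_prob(adj, rest - removed, u, lam, depth - 1)
                removed.add(u)
            r = lam * prod                 # telescoped ratio for vertex v
            return r / (1.0 + r)

        # Toy example: a 4-cycle with lam = 1; the exact answer is 2/7.
        adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
        print(occupied_prob(adj, set(adj), 0, lam=1.0, depth=8))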

    Accurate HLA type inference using a weighted similarity graph

    Background: The human leukocyte antigen system (HLA) contains many highly variable genes. HLA genes play an important role in the human immune system, and HLA gene matching is crucial for the success of human organ transplantations. Numerous studies have demonstrated that variation in HLA genes is associated with many autoimmune, inflammatory and infectious diseases. However, typing HLA genes by serology or PCR is time consuming and expensive, which limits large-scale studies involving HLA genes. Since it is much easier and cheaper to obtain single nucleotide polymorphism (SNP) genotype data, accurate computational algorithms to infer HLA gene types from SNP genotype data are in need. To infer HLA types from SNP genotypes, the first step is to infer SNP haplotypes from genotypes. However, for the same SNP genotype data set, the haplotype configurations inferred by different methods are usually inconsistent, and it is often difficult to decide which one is true. Results: In this paper, we design an accurate HLA gene type inference algorithm by utilizing SNP genotype data from pedigrees, known HLA gene types of some individuals and the relationship between inferred SNP haplotypes and HLA gene types. Given a set of haplotypes inferred from the genotypes of a population consisting of many pedigrees, the algorithm first constructs a weighted similarity graph based on a new haplotype similarity measure and derives constraint edges from known HLA gene types. Based on the principle that different HLA gene alleles should have different background haplotypes, the algorithm searches for an optimal labeling of all the haplotypes with unknown HLA gene types such that the total weight among the same HLA gene types is maximized. To deal with ambiguous haplotype solutions, we use a genetic algorithm to select haplotype configurations that tend to maximize the same optimization criterion. Our experiments on a previously typed subset of the HapMap data show that the algorithm is highly accurate, achieving an accuracy of 96% for gene HLA-A, 95% for HLA-B, 97% for HLA-C, 84% for HLA-DRB1, 98% for HLA-DQA1 and 97% for HLA-DQB1 in a leave-one-out test. Conclusions: Our algorithm can infer HLA gene types from neighboring SNP genotype data accurately. Compared with a recent approach on the same input data, our algorithm achieved a higher accuracy. The code of our algorithm is available to the public for free upon request to the corresponding authors.
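
    A toy sketch of the weighted-similarity-graph idea (a greatly simplified stand-in for the paper's global optimization and genetic-algorithm step): each unknown haplotype is assigned the HLA allele whose already-labeled haplotypes it resembles most, under a naive shared-allele similarity. The function names, similarity measure, and toy data are hypothetical.

        # Toy sketch: label unknown haplotypes with the HLA allele whose labeled
        # haplotypes they are most similar to. A simplified stand-in for the paper's
        # weighted-similarity-graph optimization; all names and data are hypothetical.
        def similarity(h1, h2):
            """Naive similarity: number of SNP positions with matching alleles."""
            return sum(a == b for a, b in zip(h1, h2))

        def infer_hla_labels(haplotypes, known_labels):
            inferred = dict(known_labels)
            for hid, hap in haplotypes.items():
                if hid in inferred:
                    continue
                # Total similarity weight to each candidate allele's labeled haplotypes.
                scores = {}
                for other, allele in known_labels.items():
                    scores[allele] = scores.get(allele, 0) + similarity(hap, haplotypes[other])
                inferred[hid] = max(scores, key=scores.get)
            return inferred

        haps = {"h1": "AACGT", "h2": "AACGA", "h3": "TTCGG", "h4": "TTCGT"}
        known = {"h1": "HLA-A*01", "h3": "HLA-A*02"}
        print(infer_hla_labels(haps, known))  # h2 -> HLA-A*01, h4 -> HLA-A*02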

    Evaluating Visual Acuity in the American Academy of Ophthalmology IRIS® Registry

    OBJECTIVE: To describe visual acuity data representation in the American Academy of Ophthalmology Intelligent Research in Sight (IRIS) Registry and present a data-cleaning strategy. DESIGN: Reliability and validity study. PARTICIPANTS: Patients with visual acuity records from 2018 in the IRIS Registry. METHODS: Visual acuity measurements and metadata were identified and characterized from 2018 IRIS Registry records. Metadata, including laterality, assessment method (distance, near, and unspecified), correction (corrected, uncorrected, and unspecified), and flags for refraction or pinhole assessment were compared between Rome (frozen April 20, 2020) and Chicago (frozen December 24, 2021) versions. We developed a data-cleaning strategy to infer patients' corrected distance visual acuity in their better-seeing eye. MAIN OUTCOME MEASURES: Visual acuity data characteristics in the IRIS Registry. RESULTS: The IRIS Registry Chicago data set contains 168 920 049 visual acuity records among 23 001 531 unique patients and 49 968 974 unique patient visit dates in 2018. Visual acuity records were associated with refraction in 5.3% of cases, and with pinhole in 11.0%. The mean (standard deviation) of all measurements was 0.26 (0.41) logarithm of the minimum angle of resolution (logMAR), with a range of -0.3 to 4.0. A plurality of visual acuity records were labeled corrected (corrected visual acuity [CVA], 39.1%), followed by unspecified (37.6%) and uncorrected (uncorrected visual acuity [UCVA], 23.4%). Corrected visual acuity measurements were paradoxically worse than same-day UCVA 15% of the time. In aggregate, mean and median values were similar for CVA and unspecified visual acuity. Most visual acuity measurements were at distance (59.8%, vs. 32.1% unspecified and 8.2% near). Rome contained more duplicate visual acuity records than Chicago (10.8% vs. 1.4%). Near visual acuity was classified with Jaeger notation and (in Chicago only) also assigned logMAR values by Verana Health. LogMAR values for hand motion and light perception visual acuity were lower in Chicago than in Rome. The impact of data entry errors or outliers on analyses may be reduced by filtering and averaging visual acuity per eye over time. CONCLUSIONS: The IRIS Registry includes similar visual acuity metadata in Rome and Chicago. Although fewer duplicate records were found in Chicago, both versions include duplicate and atypical measurements (i.e., CVA worse than UCVA on the same day). Analyses may benefit from using algorithms to filter outliers and average visual acuity measurements over time. FINANCIAL DISCLOSURES: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
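
    A hedged sketch of the kind of cleaning pipeline the abstract outlines: keep corrected distance measurements in the plausible logMAR range, average per patient and eye to dampen entry errors and outliers, then take the better-seeing eye. The column names and toy records are hypothetical, not the IRIS Registry schema.

        # Hedged sketch of the described cleaning strategy; column names and records
        # are hypothetical, not the IRIS Registry schema.
        import pandas as pd

        va = pd.DataFrame([
            ("p1", "OD", "distance", "corrected",   0.10),
            ("p1", "OD", "distance", "corrected",   0.30),
            ("p1", "OS", "distance", "corrected",   0.00),
            ("p2", "OD", "distance", "uncorrected", 0.40),
            ("p2", "OD", "distance", "corrected",   0.18),
            ("p2", "OS", "near",     "corrected",   0.20),
        ], columns=["patient_id", "laterality", "assessment", "correction", "logmar"])

        # Keep corrected distance acuity within the plausible logMAR range.
        cleaned = va[(va["assessment"] == "distance")
                     & (va["correction"] == "corrected")
                     & va["logmar"].between(-0.3, 4.0)]

        # Average per patient and eye to dampen entry errors and outliers...
        per_eye = cleaned.groupby(["patient_id", "laterality"])["logmar"].mean()

        # ...then keep each patient's better-seeing eye (lower logMAR = better acuity).
        better_eye = per_eye.groupby(level="patient_id").min()
        print(better_eye)  # p1 -> 0.00 (OS), p2 -> 0.18 (OD)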

    Search for the standard model Higgs boson in the H to ZZ to 2l 2nu channel in pp collisions at sqrt(s) = 7 TeV

    A search for the standard model Higgs boson in the H to ZZ to 2l 2nu decay channel, where l = e or mu, in pp collisions at a center-of-mass energy of 7 TeV is presented. The data were collected at the LHC, with the CMS detector, and correspond to an integrated luminosity of 4.6 inverse femtobarns. No significant excess is observed above the background expectation, and upper limits are set on the Higgs boson production cross section. The presence of the standard model Higgs boson with a mass in the 270-440 GeV range is excluded at 95% confidence level.

    Measurement of the Z/gamma* + b-jet cross section in pp collisions at 7 TeV

    The production of b jets in association with a Z/gamma* boson is studied using proton-proton collisions delivered by the LHC at a centre-of-mass energy of 7 TeV and recorded by the CMS detector. The inclusive cross section for Z/gamma* + b-jet production is measured in a sample corresponding to an integrated luminosity of 2.2 inverse femtobarns. The Z/gamma* + b-jet cross section with Z/gamma* to ll (where ll = ee or mu mu) for events with the invariant mass 60 < M(ll) < 120 GeV, at least one b jet at the hadron level with pT > 25 GeV and abs(eta) < 2.1, and a separation between the leptons and the jets of Delta R > 0.5 is found to be 5.84 +/- 0.08 (stat.) +/- 0.72 (syst.) +(0.25)/-(0.55) (theory) pb. The kinematic properties of the events are also studied and found to be in agreement with the predictions made by the MadGraph event generator with the parton shower and the hadronisation performed by PYTHIA.
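
    For illustration, the sketch below applies the fiducial selection quoted above (60 < M(ll) < 120 GeV, at least one b jet with pT > 25 GeV and |eta| < 2.1, and Delta R(lepton, jet) > 0.5) to a toy event record. The event structure and field names are hypothetical and unrelated to the CMS software.

        # Illustrative fiducial selection from the abstract applied to a toy event.
        # The event structure and field names are hypothetical.
        import math

        def delta_r(eta1, phi1, eta2, phi2):
            dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
            return math.hypot(eta1 - eta2, dphi)

        def dilepton_mass(l1, l2):
            """Invariant mass of two (approximately massless) leptons from (pt, eta, phi)."""
            return math.sqrt(2 * l1["pt"] * l2["pt"]
                             * (math.cosh(l1["eta"] - l2["eta"]) - math.cos(l1["phi"] - l2["phi"])))

        def passes_selection(event):
            l1, l2 = event["leptons"]
            if not 60.0 < dilepton_mass(l1, l2) < 120.0:
                return False
            good_bjets = [j for j in event["bjets"]
                          if j["pt"] > 25.0 and abs(j["eta"]) < 2.1
                          and all(delta_r(l["eta"], l["phi"], j["eta"], j["phi"]) > 0.5
                                  for l in (l1, l2))]
            return len(good_bjets) >= 1

        event = {"leptons": [{"pt": 45.0, "eta": 0.3, "phi": 0.1},
                             {"pt": 38.0, "eta": -0.5, "phi": 2.9}],
                 "bjets":   [{"pt": 30.0, "eta": 1.0, "phi": -1.5}]}
        print(passes_selection(event))  # True for this toy event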