
Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation

Authors: Gao Lixin, Gao Qixin, Wang Cuirong, Zhang Yanfeng
Publication date: 16/10/2017

A myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in large-scale distributed environments in order to scale to massive data sets. To accelerate these large-scale graph-based iterative computations, we propose delta-based accumulative iterative computation (DAIC). Unlike traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC updates the result by accumulating the "changes" between iterations. With DAIC, we can process only the "changes" and thereby avoid negligible updates. Furthermore, we can perform DAIC asynchronously to bypass the high-cost synchronous barriers in heterogeneous distributed environments. Based on the DAIC model, we design and implement an asynchronous graph processing framework, Maiter. We evaluate Maiter on a local cluster as well as on the Amazon EC2 cloud. The results show that Maiter achieves as much as 60x speedup over Hadoop and outperforms other state-of-the-art frameworks.
Comment: ScienceCloud 2012, TKDE 201

arXiv.org e-Print Archive

CiteSeerX
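
The delta-accumulation idea described in the Maiter abstract above can be illustrated with a small single-machine sketch. It assumes PageRank as the example algorithm and uses a hypothetical function name, `delta_pagerank`, with an assumed damping factor and tolerance. None of this is Maiter's actual API, and the work queue only imitates asynchronous scheduling within one process.

```python
from collections import deque


def delta_pagerank(graph, damping=0.8, tol=1e-10):
    """Illustrative delta-based accumulative PageRank (not Maiter's API).

    graph maps each vertex to a list of its out-neighbours; every
    neighbour must also appear as a key. Values are unnormalised and
    satisfy v = (1 - damping) + damping * sum(v_in / outdeg_in).
    """
    value = {v: 0.0 for v in graph}            # accumulated result so far
    delta = {v: 1.0 - damping for v in graph}  # pending, unapplied change
    queue = deque(graph)                       # vertices with pending deltas
    queued = set(graph)

    while queue:
        v = queue.popleft()
        queued.discard(v)
        change = delta[v]
        if change <= tol:
            continue                           # skip negligible updates
        delta[v] = 0.0
        value[v] += change                     # accumulate the change
        out = graph[v]
        if not out:
            continue                           # dangling vertex: nothing to send
        share = damping * change / len(out)
        for u in out:
            delta[u] += share                  # propagate only the change
            if u not in queued and delta[u] > tol:
                queue.append(u)
                queued.add(u)
    return value


if __name__ == "__main__":
    toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, score in sorted(delta_pagerank(toy).items()):
        print(vertex, round(score, 4))
```

Because each vertex applies its pending delta whenever it is dequeued, no global iteration barrier is needed. The order of processing can change without affecting the accumulated result, which is the property that makes asynchronous execution possible in this kind of scheme.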

Inferring Histories of Adaptive Divergence with Gene Flow: Genetic, Demographic and Geographic Effects.

Author: He Qixin

As genomic data is increasingly available even for non-model organisms, the traditional boundaries among fields such as phylogenetics, phylogeography and genetics of adaptation are disappearing. This thesis provides a synthetic framework for studying ecological genomics, which considers selective processes (such as adaptation to new niches) and neutral processes (such as population size changes due to environmental shifts) simultaneously. Conventionally, studies that look for targets of selection on a genome assume a simple demographic model without validation from the species' ecological or phylogeographic histories. The work demonstrates that one cannot reliably identify selection unless realistic demographic histories are inferred for the species or even a specific genomic region. In particular, I investigate the evolutionary history of large polymorphic inversions in Anopheles gambiae, which maintain adaptive divergence among ecologically divergent populations. By modeling the unique origin and introgression histories of each inversion, I am able to identify target regions of selection within inversions through training discriminant functions with pure drift versus selection simulations. The thesis also extends the existing theory of local adaptation via chromosomal inversions to consider the source of inversion variation, and evaluates the likelihood of such adaptations under different parameter spaces. The findings are particularly important for understanding mosaic genomic evolution in the early stages of speciation, where accumulation of divergence is dampened by gene flow. Finally, I examine how historical events, such as habitat contractions or recolonization, influence current genetic patterns, and the application of spatially-explicit demographic modeling under Approximate Bayesian Computation statistics to distinguish different phylogeographic scenarios.
The work represents a flexible framework for researchers to translate dynamic phylogeographic hypotheses into testable coalescent models by integrating all the available information of the species, such as distribution records, habitat preference, paleo-climate models, and competition between species. In general, with the amount of information as well as inherent heterogeneity of genomic data, this thesis contributes to the ongoing paradigm shift from studying separate evolutionary processes towards a holistic analysis of the interactions of selective and neutral processes under a rigorous statistical framework.
PHD, Ecology and Evolutionary Biology, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/111443/1/heqixin_1.pd

Deep Blue Documents

Quantifying asymptomatic infection and transmission of COVID-19 in New York City using observed cases, serology, and testing capacity

Authors: He Qixin, Pascual Mercedes, Subramanian Rahul
Publication date: 10/11/2023

The contributions of asymptomatic infections to herd immunity and community transmission are key to the resurgence and control of COVID-19, but are difficult to estimate using current models that ignore changes in testing capacity. Using a model that incorporates daily testing information fit to the case and serology data from New York City, we show that the proportion of symptomatic cases is low, ranging from 13 to 18%, and that the reproductive number may be larger than often assumed. Asymptomatic infections contribute substantially to herd immunity, and to community transmission together with presymptomatic ones. If asymptomatic infections transmit at similar rates as symptomatic ones, the overall reproductive number across all classes is larger than often assumed, with estimates ranging from 3.2 to 4.4. If they transmit poorly, then symptomatic cases have a larger reproductive number ranging from 3.9 to 8.1. Even in this regime, presymptomatic and asymptomatic cases together comprise at least 50% of the force of infection at the outbreak peak. We find no regimes in which all infection subpopulations have reproductive numbers lower than three. These findings elucidate the uncertainty that current case and serology data cannot resolve, despite consideration of different model structures. They also emphasize how temporal data on testing can reduce and better define this uncertainty, as we move forward through longer surveillance and second epidemic waves. Complementary information is required to determine the transmissibility of asymptomatic cases, which we discuss. Regardless, current assumptions about the basic reproductive number of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) should be reconsidered.

Knowledge UChicago

Synthetic Data as Validation

Authors: Hu Qixin, Yuille Alan, Zhou Zongwei
Publication date: 24/10/2023

This study leverages synthetic data as a validation set to reduce overfitting and ease the selection of the best model in AI development. While synthetic data have been used for augmenting the training set, we find that synthetic data can also significantly diversify the validation set, offering marked advantages in domains like healthcare, where data are typically limited, sensitive, and from out-domain sources (i.e., hospitals). In this study, we illustrate the effectiveness of synthetic data for early cancer detection in computed tomography (CT) volumes, where synthetic tumors are generated and superimposed onto healthy organs, thereby creating an extensive dataset for rigorous validation. Using synthetic data as validation can improve AI robustness in both in-domain and out-domain test sets. Furthermore, we establish a new continual learning framework that continuously trains AI models on a stream of out-domain data with synthetic tumors. The AI model trained and validated on dynamically expanding synthetic data can consistently outperform models trained and validated exclusively on real-world data. Specifically, the DSC score for liver tumor segmentation improves from 26.7% (95% CI: 22.6%-30.9%) to 34.5% (30.8%-38.2%) when evaluated on an in-domain dataset and from 31.1% (26.0%-36.2%) to 35.4% (32.1%-38.7%) on an out-domain dataset. Importantly, the performance gain is particularly significant in identifying very tiny liver tumors (radius < 5mm) in CT volumes, with Sensitivity improving from 33.1% to 55.4% on an in-domain dataset and 33.9% to 52.3% on an out-domain dataset, justifying the efficacy in early detection of cancer. The application of synthetic data, from both training and validation perspectives, underlines a promising avenue to enhance AI robustness when dealing with data from varying domains.

arXiv.org e-Print Archive

Quantifying AS Path Inflation by Routing Policies

Authors: Gao Lixin, Gao Qixin, Wang Feng
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2016

A route in the Internet may take a longer AS path than the shortest AS path due to routing policies. In this paper, we systematically analyze AS paths and quantify the extent to which routing policies inflate AS paths. The results show that AS path inflation in the Internet is more prevalent than expected. We first present the extent of AS path inflation observed from the RouteView and RIPE routing tables. We then employ three common routing policies to show the extent of AS path inflation. We find that the No-Valley routing policy causes the least AS path inflation among the three routing policies, while the PreferCustomer-and-Peer-over-Provider policy causes the most. In addition, we find that single-homed stub ASes experience more path inflation than transit ASes and multi-homed ASes. The AS pairs whose shortest AS path is 3 AS hops experience more path inflation than other AS pairs. Finally, we investigate the AS path inflation on the end-to-end path from end users to two popular content providers, Google and Comcast. Although the majority of the shortest AS paths from end users to the two providers consist of no more than three AS hops, the actual end-to-end paths that the traffic will take are longer than the shortest AS paths in many cases. Quantifying AS path inflation in the Internet has important implications for the extent of routing policies, traffic engineering performed on the Internet, and BGP convergence speed.

Crossref

ScholarWorks@UMass Amherst
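
The notion of AS path inflation in the abstract above (the path actually taken versus the shortest AS path) can be made concrete with a short sketch. The topology, AS numbers, and function names below are hypothetical, and the code ignores routing policies entirely: it only compares an observed path's hop count against a breadth-first-search shortest path, under the assumption that the AS graph is given as an adjacency mapping.

```python
from collections import deque


def shortest_hops(adj, src, dst):
    """Return the minimum number of AS hops from src to dst, or None."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is unreachable in this topology


def path_inflation(adj, observed_path):
    """Extra hops of an observed AS path over the shortest AS path."""
    shortest = shortest_hops(adj, observed_path[0], observed_path[-1])
    if shortest is None:
        raise ValueError("endpoints are not connected in the given topology")
    return (len(observed_path) - 1) - shortest


if __name__ == "__main__":
    # Hypothetical four-AS topology; links are listed in both directions.
    toy_adj = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
    print(path_inflation(toy_adj, [1, 2, 3, 4]))  # observed 3 hops, shortest 1
```

An inflation of zero means the observed path is already a shortest path in the topology. Modelling a specific policy such as No-Valley would require annotating each link with its business relationship, which this sketch does not attempt.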

Eugenol Nanoencapsulated by Sodium Caseinate: Physical, Antimicrobial, and Biophysical Properties

Authors: Pan Kang, Zhang Yue, Zhong Qixin
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/01/2017

To improve the application of essential oils as natural antimicrobial preservatives, the objective of the present study was to determine physical, antimicrobial, and biophysical properties of eugenol after nanoencapsulation by sodium caseinate (NaCas). Emulsions were prepared by mixing eugenol in 20.0 mg/mL NaCas solution at an overall eugenol content of 5.0–137.9 mg/mL using shear homogenization. Stable emulsions were observed up to 38.5 mg/mL eugenol, with droplet diameters smaller than 125 nm at pH 5–9 after ambient storage for up to 30 days. The encapsulated eugenol had minimal inhibitory and minimal bactericidal concentrations similar to those of free eugenol against Escherichia coli O157:H7 ATCC 43895, Listeria monocytogenes Scott A, and Salmonella Enteritidis but showed better inhibition of E. coli O157:H7 than free eugenol during incubation at 37 °C for 48 h. After 20 min interaction at 21 °C, bacteria treated with encapsulated eugenol had a greater reduction of intracellular ATP and a greater increase of extracellular ATP than those treated with free eugenol, suggesting enhanced permeation of eugenol after nanoencapsulation. However, this overall trend was not observed when examining bacterial morphology and uptake of crystal violet, suggesting possible membrane adaptation. Findings from this study showed the feasibility of preparing nanoemulsions with high loading of eugenol using NaCas.

DigitalCommons@University of Nebraska

The Role of Chinese Medicine in Elderly Care and Nursing (中國醫學在養老護老中的作用)

Author: SUN Qixin
Publication venue: Digital Commons @ Lingnan University
Publication date: 27/06/2012

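Introduction: The Kangfu Home for the Elderly (康福敬养院) in Jiading, Shanghai, is a privately run, non-enterprise institution built with social investment, managed under a model that separates operation from investment. The home represents a total investment of 25 million, occupies 28 mu, and has 200 beds. It has become a new home for the elderly that combines hotel-style accommodation, medical care, nursing, and recreation, where residents "receive medical care, are provided for, and enjoy their later years." In the more than nine years since its founding, the home has built a distinctive elderly-care brand around traditional Chinese medicine (TCM) meridian-based health preservation, TCM elderly care, incorporating medicine into diet, and the shared origin of medicine and food, using the "three therapies" (diet therapy, tea therapy, and music therapy) together with emotional support to manage residents' health. The average lifespan of residents is now 85 years, with the oldest aged 104, and all live in comfort and contentment. Over these nine-plus years, the home has also achieved good social benefits and stable economic returns.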

Digital Commons @ Lingnan University