
    Efficient Processing of k Nearest Neighbor Joins using MapReduce

    k nearest neighbor join (kNN join), designed to find the k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join efficiently on a centralized machine. In this paper, we investigate how to perform kNN join using MapReduce, which is a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups; the reducers perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering, and hence reduces both the shuffling and computational costs. To reduce the shuffling cost, we propose two approximate algorithms to minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust and scalable. Comment: VLDB201
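The map/reduce split described in this abstract can be illustrated with a minimal single-process sketch. It is not the paper's algorithm: the pivot-based grouping, the function names (`nearest_pivot`, `map_phase`, `reduce_phase`, `knn_join`) and the sample data are all assumptions made for illustration, and the pruning rules and replica minimization are omitted, so every group is simply joined against the whole of S.

```python
import heapq
import math
from collections import defaultdict


def nearest_pivot(point, pivots):
    """Return the index of the pivot closest to point (hypothetical grouping rule)."""
    return min(range(len(pivots)), key=lambda i: math.dist(point, pivots[i]))


def map_phase(R, pivots):
    """Map step: assign every object of R to the group of its nearest pivot."""
    groups = defaultdict(list)
    for r in R:
        groups[nearest_pivot(r, pivots)].append(r)
    return groups


def reduce_phase(group, S, k):
    """Reduce step: brute-force kNN from S for each object in one group.

    No pruning is applied here; S is fully replicated to every group.
    """
    return {r: heapq.nsmallest(k, S, key=lambda s: math.dist(r, s)) for r in group}


def knn_join(R, S, pivots, k):
    """Run the map step once, then the reduce step on each group separately."""
    result = {}
    for group in map_phase(R, pivots).values():
        result.update(reduce_phase(group, S, k))
    return result


# Illustrative data only; points are tuples so they can serve as dict keys.
R = [(0.0, 0.0), (10.0, 10.0)]
S = [(1.0, 0.0), (0.0, 2.0), (9.0, 10.0), (10.0, 13.0)]
pivots = [(0.0, 0.0), (10.0, 10.0)]
joined = knn_join(R, S, pivots, k=2)
```

Because each group is joined against all of S, the result is exact but the shuffling cost is maximal; the abstract's pruning rules exist to avoid exactly this full replication. Duplicate points in R would collapse into one dictionary key in this sketch.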

    Extracting transcription factor binding sites from unaligned gene sequences with statistical models

    Background: Transcription factor binding sites (TFBSs) are crucial in the regulation of gene transcription. Recently, chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-chip array) has been used to identify potential regulatory sequences, but the procedure can only map the probable protein-DNA interaction loci to within 1–2 kb resolution. To find the exact binding motifs, a computational method is needed to examine the ChIP-chip array binding sequences and search for possible motifs representing the transcription factor binding sites. Results: We developed a program to find accurate motif sites in a set of unaligned DNA sequences from the yeast genome. The prediction results suggest that, overall, our algorithm outperforms MDscan, since the predicted motifs are more consistent with previously known specificities reported in the literature and have better prediction ranks. Our program also outperforms the constraint-less Cosmo program, especially in the elimination of false positives. Conclusion: In this study, an improved sampling algorithm is proposed that incorporates the binomial probability model to build significant initial candidate motif sets. The method also examines the statistical dependence between base positions in TFBSs by combining dependency graphs with their expanded Bayesian networks. The results show that our program satisfactorily extracts transcription factor binding sites from unaligned gene sequences.
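The abstract does not say how the binomial probability model is applied. One common use, assumed here purely for illustration, is to score a candidate word by the upper-tail probability of seeing it in at least `k` of `n` sequences when the background chance per sequence is `p`. The function name and all three parameters are hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Illustrative numbers only: a word seen in 5 of 20 sequences,
# with an assumed background probability of 0.05 per sequence.
p_value = binomial_tail(n=20, k=5, p=0.05)
```

Under this assumption, a small tail probability would mark the word as over-represented and therefore a plausible seed for the initial candidate motif set.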

    Performance evaluation on the implementation of Pre-established Medical Processes for nurse practitioners in the hospitals

    In 2015, Taiwan announced the establishment of “Pre-established Medical Processes” and related regulations to assist nurse practitioners in clinical tasks, maintain medical quality and patient safety, and provide protection in clinical practice. However, the effectiveness of their implementation still needs to be improved and strengthened. This study adopts the Technology Acceptance Model (TAM) and Task-Technology Fit (TTF) as the research framework, with a cross-sectional design. The questionnaires were administered to professional nurse practitioners in hospitals in central Taiwan. A total of 300 questionnaires were distributed, and Smart PLS 3.0 and SPSS 24.0 were both applied to verify interpretability. The questionnaire response rate was 88.3%, and the overall predictive power was 65.2%. Technological characteristics and TTF had a significant impact on perceived usefulness.

    Self-supervised learning-based general laboratory progress pretrained model for cardiovascular event detection

    The inherent nature of patient data poses several challenges. Prevalent cases amass substantial longitudinal data owing to their patient volume and consistent follow-ups; however, longitudinal laboratory data are known for their irregularity, temporality, absenteeism, and sparsity. In contrast, recruitment for rare or specific cases is often constrained by their limited patient size and episodic observations. This study employed self-supervised learning (SSL) to pretrain a generalized laboratory progress (GLP) model that captures the overall progression of six common laboratory markers in prevalent cardiovascular cases, with the intention of transferring this knowledge to aid in the detection of specific cardiovascular events. GLP implemented a two-stage training approach, leveraging the information embedded within interpolated data to amplify the performance of SSL. After pretraining, GLP is transferred for TVR detection. The proposed two-stage training improved the performance of pure SSL, and the transferability of GLP exhibited distinctiveness. After GLP processing, classification improved notably, with averaged accuracy rising from 0.63 to 0.90. All evaluated metrics were significantly better (p < 0.01) than before GLP processing. Our study effectively engages in translational engineering by transferring patient progression of cardiovascular laboratory parameters from one patient group to another, transcending the limitations of data availability. The transferability of disease progression optimizes the strategies of examinations and treatments and improves patient prognosis while using commonly available laboratory parameters. The potential for expanding this approach to encompass other diseases holds great promise. Comment: published in IEEE Journal of Translational Engineering in Health & Medicin

    Metal-free sp(3) C-H functionalization: a novel approach for the syntheses of selenide ethers and thioesters from methyl arenes

    A DTBP-promoted metal-free and solvent-free formation of C-Se and C-S bonds through sp(3) C-H functionalization of methyl arenes with diselenides and disulfides is described

    Zigzag magnetic order in a novel tellurate compound Na$_{4-\delta}$NiTeO$_{6}$ with $S$ = 1 chains

    Na$_{4-\delta}$NiTeO$_{6}$ is a rare example in the transition-metal tellurate family of realizing an $S$ = 1 spin-chain structure. By performing neutron powder diffraction measurements, the ground-state magnetic structure of Na$_{4-\delta}$NiTeO$_{6}$ is determined. These measurements reveal that below $T_{\rm N} \sim 6.8(2)$ K, the Ni$^{2+}$ moments form a screwed ferromagnetic (FM) spin-chain structure running along the crystallographic $a$ axis but these FM spin chains are coupled antiferromagnetically along the $b$ and $c$ directions, giving rise to a magnetic propagation vector of $k$ = (0, 1/2, 1/2). This zigzag magnetic order is well supported by first-principles calculations. The moment size of Ni$^{2+}$ spins is determined to be 2.1(1) $\mu_{\rm B}$ at 3 K, suggesting a significant quenching of the orbital moment due to the crystalline electric field (CEF) effect. The previously reported metamagnetic transition near $H_{\rm C} \sim 0.1$ T can be understood as a field-induced spin-flip transition. The relatively easy tunability of the dimensionality of its magnetism by external parameters makes Na$_{4-\delta}$NiTeO$_{6}$ a promising candidate for further exploring various types of novel spin-chain physics. Comment: 10 pages, 6 figure

    Recent Development of Graphene-Based Cathode Materials for Dye-Sensitized Solar Cells

    Dye-sensitized solar cells (DSSCs) have attracted extensive attention as potential low-cost alternatives to silicon-based solar cells. As a vital component of a typical DSSC, the counter electrode (CE) is generally employed to collect electrons via the external circuit and speed up the reduction reaction of I3- to I- in the redox electrolyte. The noble metal Pt is usually deposited on a conductive glass substrate as the CE material due to its excellent electrical conductivity, electrocatalytic activity, and electrochemical stability. To achieve cost-efficient DSSCs, considerable efforts have been made to explore Pt-free alternatives. Recently, graphene-based CEs have been intensively investigated as replacements for the high-cost noble Pt CE. In this paper, we provide an overview of studies on the electrochemical and photovoltaic characteristics of graphene-based CEs, including graphene, graphene/Pt, graphene/carbon materials, graphene/conducting polymers, and graphene/inorganic compounds. We also summarize the design and advantages of each graphene-based material and suggest possible directions for designing new graphene-based catalysts in future research on high-performance and low-cost DSSCs.

    Discriminating Glucose Tolerance Status by Regions of Interest of Dual-Energy X-Ray Absorptiometry: Clinical Implications of Body Fat Distribution

    OBJECTIVE. To determine whether measuring body fat distribution by dual-energy X-ray absorptiometry (DEXA) can be used to discriminate glucose tolerance status. RESEARCH DESIGN AND METHODS. Using a 75-g oral glucose tolerance test, a total of 1,015 Chinese subjects (559 men and 456 women) were categorized as having normal glucose tolerance (NGT), impaired glucose tolerance (IGT), or diabetes. Blood pressure and lipid profiles of these subjects were measured. Waist-to-hip ratio (WHR) and DEXA were used to evaluate the varying patterns of body fat distribution among the groups. RESULTS. Body fat distribution, as reflected by WHR and the centrality index, showed significant partial correlation coefficients with glycosylated hemoglobin, blood pressure, and lipid profiles in all subjects. After adjusting for age and BMI, there were significant differences among the three glycemic groups for all the cardiovascular risk factors except for total cholesterol level. The diabetic group had a significantly higher WHR and centrality index, but lower femoral fat percentage than the NGT and IGT groups. The diabetic group also showed higher abdominal fat percentage than the NGT group. Moreover, the IGT group had a higher centrality index than the NGT group. However, no significant differences were found in the percentage of lean tissue mass among the three groups. Using multiple stepwise logistic regression models, the centrality index remained a significant factor for discriminating different glucose tolerance status independent of the percentage total body fat. CONCLUSIONS. Central obesity showed significant correlation with cardiovascular risk factors among the three different glycemic groups. Centrality index measured by DEXA appears to be a better predictor of glucose intolerance than WHR, abdominal fat, and general obesity (reflected by percentage total body fat or BMI) in a large cohort of the Chinese population.