523 research outputs found

    Pattern mining under different conditions

    New requirements and demands on pattern mining arise in modern applications that cannot be met by conventional methods. For example, in scientific research, scientists are more interested in unknown knowledge, which usually hides in significant but infrequent patterns. However, existing itemset mining algorithms are designed for very frequent patterns. Furthermore, scientists need to repeat an experiment many times to ensure reproducibility. A series of datasets is generated at once, awaiting clustering, and each can contain an unknown number of clusters with various densities and shapes. Using existing clustering algorithms is time-consuming because parameter tuning is necessary for each dataset. Many scientific datasets are also extremely noisy: they contain considerably more noise points than in-cluster data points, whereas most existing clustering algorithms can only handle noise up to a moderate level. Temporal pattern mining is also important in scientific research. Existing temporal pattern mining algorithms only consider point-based events. However, most real-world activities are interval-based, with a starting and an ending timestamp. This thesis develops novel pattern mining algorithms for various data mining tasks under different conditions. The first part of this thesis investigates the problem of mining less frequent itemsets in transactional datasets. In contrast to existing frequent itemset mining algorithms, this part focuses on itemsets that occur infrequently. The algorithms NIIMiner, RaCloMiner, and LSCMiner are proposed to identify such itemsets efficiently. NIIMiner utilizes the negative itemset tree to extract, in a top-down depth-first manner, all patterns whose support is below a given threshold. RaCloMiner combines existing bottom-up frequent itemset mining algorithms with a top-down itemset mining algorithm to achieve better performance in mining less frequent patterns. LSCMiner investigates the problem of mining less frequent closed patterns. The second part of this thesis studies the problem of interval-based temporal pattern mining in the stream environment. Interval-based temporal patterns are sequential patterns in which each event is associated with a starting and an ending timestamp. The ability to handle interval-based events and stream data is lacking in existing approaches. A novel interval-based temporal pattern mining algorithm for stream data is described in this part. The last part of this thesis studies new problems in clustering on numeric datasets. The first problem tackled in this part is shape alternation adaptability in clustering. In applications such as scientific data analysis, scientists need to deal with a series of datasets generated from one experiment. Cluster sizes and shapes differ across those datasets. A kNN density-based clustering algorithm, kadaClus, is proposed to provide shape alternation adaptability so that users do not need to tune parameters for each dataset. The second problem studied in this part is clustering in an extremely noisy dataset. Many real-world datasets contain considerably more noise points than in-cluster data points. A novel clustering algorithm, kenClus, is proposed to identify clusters of arbitrary shape in extremely noisy datasets. Both clustering algorithms are kNN-based and require only one parameter, k. In each part, the efficiency and effectiveness of the presented techniques are thoroughly analyzed. Extensive experiments on synthetic and real-world datasets are conducted to show the benefits of the proposed algorithms over conventional approaches.
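    To make the notion of a "less frequent itemset" concrete, here is a minimal brute-force sketch. It is not NIIMiner, RaCloMiner, or LSCMiner (the abstract gives no implementation details for those); it simply counts every itemset that occurs in at least one transaction and keeps those whose support falls below a threshold. The function names and the `max_size` cap are assumptions made for illustration.

```python
from itertools import combinations


def support_counts(transactions, max_size):
    """Count how many transactions contain each itemset of up to max_size items."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))  # deduplicate and fix an item order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return counts


def less_frequent_itemsets(transactions, max_support, max_size=3):
    """Return itemsets that occur at least once but fewer than max_support times.

    Naive baseline for illustration only: it enumerates every subset of
    every transaction, so it is exponential in transaction length.
    """
    counts = support_counts(transactions, max_size)
    return {s: c for s, c in counts.items() if c < max_support}


transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["a"]]
rare = less_frequent_itemsets(transactions, max_support=2)
print(rare)
```

    Exhaustive enumeration of this kind grows exponentially with transaction length, which is presumably why the thesis proposes dedicated top-down and hybrid search strategies instead.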

    Prevalence, risk factors and outcomes of multidrug resistant Tuberculosis identified by GeneXpert MTB/RIF Assay in Bukavu, Democratic Republic of the Congo

    Background: Tuberculosis (TB) has long ranked among the deadliest infectious diseases in human history. According to the World Health Organization (WHO), the Democratic Republic of the Congo (DRC) has been a high HIV/TB co-infection burden area for decades. However, multidrug-resistant tuberculosis (MDR-TB) has been underreported in the DRC because of inadequate healthcare infrastructure and a lack of diagnostic methods. Method: This study aims to provide more epidemiological information on the prevalence of MDR-TB infection identified by the GeneXpert MTB/RIF assay, a new diagnostic method. The data were collected at the General Referral Hospital of Bukavu, DRC. The study population consisted of suspected TB patients who visited the hospital. Retrospective analyses were performed on those data to identify risk factors. In addition, multiple statistical analyses, including multivariate logistic regression and the Pearson chi-square test, were performed to investigate how TB prevalence and treatment outcomes relate to risk factors and the GeneXpert assay. Results: The prevalence of all TB in the study population is 15.6% as identified by the GeneXpert assay. MDR-TB prevalence is 3.2% among the whole study population and 20% among the TB-positive subgroup. Logistic regression showed that a previous TB episode and known HIV status are risk factors for developing MDR-TB. However, no risk factor for treatment failure was found in this study. Conclusions: The GeneXpert assay provided reliable TB and MDR-TB prevalence data in the resource-limited area of Bukavu, DRC. MDR-TB prevalence in our study is significantly higher than the national level, which suggests severe underestimation of the MDR-TB burden in the DRC. However, loss to follow-up on treatment outcomes and a high proportion of unknown HIV status significantly affect the data validity of this study. Public Health Significance: The control of MDR-TB in the DRC requires diagnostic methods that are not only easy to implement in resource-limited areas but also fast in turnaround time for both TB infection and drug resistance detection. The GeneXpert MTB/RIF assay is a promising test but has lacked field testing. This study provides solid evidence of the performance of the GeneXpert assay in the field as a solution to the underreporting of MDR-TB.
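    As a quick consistency check on the prevalence figures in the Results, the MDR-TB share among TB-positive patients can be recomputed from the two population-level percentages. This sketch uses only the rounded values quoted in the abstract (15.6% and 3.2%); the underlying patient counts are not given, so the result is approximate.

```python
# Rounded prevalence figures quoted in the abstract (fractions of the study population).
tb_prevalence = 0.156   # all TB, as identified by the GeneXpert assay
mdr_prevalence = 0.032  # MDR-TB among the whole study population

# Share of TB-positive patients who have MDR-TB.
mdr_share_among_tb = mdr_prevalence / tb_prevalence
print(f"MDR-TB share among TB-positive patients: {mdr_share_among_tb:.1%}")
```

    The ratio comes out at roughly one in five, in line with the 20% the abstract reports for the TB-positive subgroup.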

    Tri-Attention: Explicit Context-Aware Attention Mechanism for Natural Language Processing

    In natural language processing (NLP), the context of a word or sentence plays an essential role. Contextual information, such as the semantic representation of a passage or historical dialogue, forms an essential part of a conversation and of a precise understanding of the present phrase or sentence. However, despite their great success in modeling sequence alignment, standard attention mechanisms typically generate weights using only query and key and ignore context, forming a Bi-Attention framework. This Bi-Attention mechanism does not explicitly model the interactions between the contexts, queries, and keys of target sequences, so it misses important contextual information and yields poor attention performance. Accordingly, a novel and general triple-attention (Tri-Attention) framework expands the standard Bi-Attention mechanism and explicitly models the interactions among query, key, and context by incorporating context as the third dimension in calculating relevance scores. Four variants of Tri-Attention are generated by expanding the two-dimensional vector-based additive, dot-product, scaled dot-product, and bilinear operations in Bi-Attention to tensor operations for Tri-Attention. Extensive experiments on three NLP tasks demonstrate that Tri-Attention outperforms about 30 state-of-the-art non-attention, standard Bi-Attention, and contextual Bi-Attention approaches and pretrained neural language models.
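    The contrast between query-key scoring and query-key-context scoring can be sketched in a few lines. The trilinear score below (an elementwise product of query, key, and context, summed and scaled) is only one plausible way to bring context into a scaled dot-product score; it is an assumption for illustration and not necessarily the formulation used in the paper. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bi_attention_weights(query, keys):
    """Standard scaled dot-product attention weights from query and keys only."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def tri_attention_weights(query, keys, context):
    """Assumed trilinear variant: context enters the score as a third factor."""
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k * c for q, k, c in zip(query, key, context)) / scale
        for key in keys
    ]
    return softmax(scores)


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
print(bi_attention_weights(query, keys))
print(tri_attention_weights(query, keys, context=[0.0, 1.0]))
```

    With an all-ones context vector this trilinear score reduces to the ordinary scaled dot product, so the two-way mechanism appears as a special case; a context that zeroes out the dimension the query relies on flattens the weights.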

    Treated amblyopes remain deficient in spatial vision: A contrast sensitivity and external noise study

    To evaluate residual spatial vision deficits in treated amblyopia, we recruited five clinically treated amblyopes (mean age=10.6 years). Contrast sensitivity functions (CSF) in both the previously amblyopic eyes (pAE; visual acuity=0.944±0.019 MAR) and fellow eyes (pFE; visual acuity=0.936±0.021 MAR) were measured using a standard psychophysical procedure for all the subjects. The results indicated that the treated amblyopes remained deficient in spatial vision, especially at high spatial frequencies, although their Snellen visual acuity had become normal in the pAEs. To identify the mechanisms underlying spatial vision deficits of treated amblyopes, threshold vs external noise contrast (TvC) functions – the signal contrast necessary for the subject to maintain a threshold performance level in varying amounts of external noise (“TV snow”) – were measured in both eyes of four of the subjects in a sine-wave grating detection task at several spatial frequencies. Two mechanisms of amblyopia were identified: increased internal noise at low to medium spatial frequencies, and both increased internal noise and increased impact of external noise at high spatial frequencies. We suggest that, in addition to visual acuity, other tests of spatial vision (e.g., CSF, TvC) should be used to assess treatment outcomes of amblyopia therapies. In addition to occlusion therapy, training at intermediate and high spatial frequencies may be necessary to fully recover spatial vision in amblyopia.

    Conditional Goal-oriented Trajectory Prediction for Interacting Vehicles with Vectorized Representation

    This paper tackles the interactive behavior prediction task and proposes a novel Conditional Goal-oriented Trajectory Prediction (CGTP) framework to jointly generate scene-compliant trajectories of two interacting agents. Our CGTP framework is an end-to-end and interpretable model with three main stages: context encoding, goal interactive prediction, and trajectory interactive prediction. First, a Goals-of-Interest Network (GoINet) is designed to extract agent-to-agent and agent-to-goals interactive features using a graph-based vectorized representation. Further, the Conditional Goal Prediction Network (CGPNet) focuses on goal interactive prediction via a combined form of marginal and conditional goal predictors. Finally, the Goal-oriented Trajectory Forecasting Network (GTFNet) is proposed to implement trajectory interactive prediction via the conditional goal-oriented predictors, with the predicted future states of the other interacting agent taken as inputs. In addition, a new goal interactive loss is developed to better learn the joint probability distribution over goal candidates between two interacting agents. Finally, the proposed method is evaluated on the Argoverse motion forecasting dataset, an in-house cut-in dataset, and the Waymo Open Motion Dataset. The comparative results demonstrate the superior performance of the proposed CGTP model over mainstream prediction methods. Comment: 14 pages, 4 figures.
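    The combination of marginal and conditional goal predictors mentioned above rests on the chain rule of probability: a joint distribution over the goals of two agents factors as P(goal_A, goal_B) = P(goal_A) · P(goal_B | goal_A). The sketch below illustrates only that factorization with hand-written probability tables; it is not the CGPNet architecture, and the goal labels and function name are hypothetical.

```python
def joint_goal_distribution(marginal_a, conditional_b_given_a):
    """Combine P(goal_A) and P(goal_B | goal_A) into P(goal_A, goal_B)."""
    joint = {}
    for goal_a, p_a in marginal_a.items():
        for goal_b, p_b in conditional_b_given_a[goal_a].items():
            joint[(goal_a, goal_b)] = p_a * p_b
    return joint


# Hypothetical goal candidates for two interacting vehicles.
marginal_a = {"left": 0.25, "straight": 0.75}
conditional_b_given_a = {
    "left": {"yield": 0.5, "go": 0.5},
    "straight": {"yield": 1.0, "go": 0.0},
}

joint = joint_goal_distribution(marginal_a, conditional_b_given_a)
best_pair = max(joint, key=joint.get)
print(joint)
print(best_pair)
```

    In a learned model the two tables would come from network outputs rather than constants, and the most probable goal pair would then condition the trajectory prediction stage.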