65 research outputs found

    Embed and Conquer: Scalable Embeddings for Kernel k-Means on MapReduce

    The kernel k-means is an effective method for data clustering which extends the commonly-used k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very complex, as it requires the complete data matrix to be calculated and stored. Further, the kernelized nature of the kernel k-means algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. In this paper, we define a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. Afterwards, we propose two methods for low-dimensional embedding that adhere to our definition of the embedding family. Exploiting the proposed parallelization strategy, we present two scalable MapReduce algorithms for kernel k-means. We demonstrate the effectiveness and efficiency of the proposed algorithms through an empirical evaluation on benchmark data sets. Comment: Appears in Proceedings of the SIAM International Conference on Data Mining (SDM), 201

    Scalable Embeddings for Kernel Clustering on MapReduce

    There is an increasing demand from businesses and industries to make the best use of their data. Clustering is a powerful tool for discovering natural groupings in data. The k-means algorithm is the most commonly-used data clustering method, having gained popularity for its effectiveness on various data sets and ease of implementation on different computing architectures. It assumes, however, that data are available in an attribute-value format, and that each data instance can be represented as a vector in a feature space where the algorithm can be applied. These assumptions are impractical for real data, and they hinder the use of complex data structures in real-world clustering applications. The kernel k-means is an effective method for data clustering which extends the k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very complex, as it requires the complete data matrix to be calculated and stored. Further, the kernelized nature of the kernel k-means algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. This thesis defines a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. Then, three practical methods for low-dimensional embedding that adhere to our definition of the embedding family are proposed. Combining the proposed parallelization strategy with any of the three embedding methods constitutes a complete, scalable, and efficient MapReduce algorithm for kernel k-means. The efficiency and the scalability of the presented algorithms are demonstrated analytically and empirically.
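
The general idea of replacing the full kernel matrix with a low-dimensional embedding can be illustrated with a minimal single-machine sketch. The abstract does not name the embedding methods, so this example swaps in a Nyström-style embedding as a stand-in; the RBF kernel, the landmark choice, and all function names here are assumptions for illustration, not the thesis's algorithms. Each point is embedded independently of the others (a map-like step), and ordinary k-means then runs on the embedded vectors.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian (RBF) kernel.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_embedding(X, landmarks, gamma=1.0, eps=1e-10):
    # Map each point x to K(x, landmarks) @ K_mm^{-1/2}, so that dot products
    # between embedded points approximate kernel values.
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    w, V = np.linalg.eigh(K_mm)
    keep = w > eps  # drop numerically null directions
    return K_nm @ (V[:, keep] / np.sqrt(w[keep]))

def kmeans(Y, k, n_iter=50):
    # Farthest-first initialization keeps this sketch deterministic.
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([((Y - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Y[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute the means.
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(0)
    return labels

# Toy data (hypothetical): two well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
Y = nystrom_embedding(X, X[::5], gamma=1.0)  # every 5th point as a landmark
labels = kmeans(Y, 2)
```

Only the small landmark kernel matrix is ever formed in full, which is the property that makes this kind of embedding attractive when the complete kernel matrix is too large to compute and store.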

    Human-in-the-Loop Question Answering with Natural Language Interaction

    Generalizing beyond the training examples is the primary goal of machine learning. In natural language processing (NLP), impressive models struggle to generalize when faced with test examples that differ from the training examples: e.g., in genre, domain, or language. I study interactive methods that overcome such limitations by seeking feedback from human users to successfully complete the task at hand and improve over time while on the job. Unlike previous work that adopts simple forms of feedback (e.g., labeling predictions as correct/wrong or answering yes/no clarification questions), I focus on using free-form natural language as the communication interface for providing feedback which can convey richer information and offer a more flexible interaction. An essential skill that language-based interactive systems should have is to understand user utterances in conversational contexts. I study conversational question answering (CQA) in which humans interact with a question answering (QA) system by asking a sequence of related questions. CQA requires models to link questions together to resolve the conversational dependencies between them such as coreference and ellipsis. I introduce question-in-context rewriting to reduce context-dependent conversational questions to independent stand-alone questions that can be answered with existing QA models. I collect a large dataset of human rewrites and I use it to evaluate a set of models for the question rewriting task. Next, I study semantic parsing in interactive settings in which users correct parsing errors using natural language feedback. Most existing work frames semantic parsing as a one-shot mapping task. I establish that the majority of parsing mistakes that recent neural text-to-SQL parsers make are minor. Hence, it is often feasible for humans to detect and suggest corrections for such mistakes if they have the opportunity to provide precise feedback. 
    I describe an interactive text-to-SQL parsing system that enables users to inspect the inferred parses and correct any errors they find by providing feedback in free-form natural language. I construct SPLASH: a large dataset of SQL correction instances paired with a diverse set of human-authored natural language feedback utterances. Using SPLASH, I pose a new task: given a question paired with an initial erroneous SQL parse, to what extent can we correct the parse based on the provided natural language feedback? Then, I present NL-EDIT: a neural model for the correction task. NL-EDIT combines two key ideas: (1) interpreting the feedback in the context of the other elements of the interaction, and (2) explicitly generating edit operations to correct the initial query instead of re-generating the full query from scratch. I create a simple SQL editing language whose basic units are add/delete operations applied to different SQL clauses. I discuss evaluation methods that help understand the usefulness and limitations of semantic parse correction models. I conclude this thesis by identifying three broad research directions for further advancing collaborative human-computer NLP: (1) developing user-centered explanations, (2) designing and evaluating interaction mechanisms, and (3) learning from interactions.
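
The notion of correcting a query through add/delete operations on clauses, rather than regenerating it, can be sketched as follows. The abstract does not give the syntax of the editing language, so the `Edit` record, the dictionary representation of a query, and the example query below are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edit:
    op: str      # "add" or "delete"
    clause: str  # hypothetical clause name, e.g. "select" or "where"
    arg: str     # the clause item to add or remove

def apply_edits(query, edits):
    # query maps clause name -> list of items; the input is left unmodified.
    result = {clause: list(items) for clause, items in query.items()}
    for e in edits:
        items = result.setdefault(e.clause, [])
        if e.op == "add":
            if e.arg not in items:
                items.append(e.arg)
        elif e.op == "delete":
            if e.arg in items:
                items.remove(e.arg)
        else:
            raise ValueError(f"unknown edit operation: {e.op}")
    return result

# Hypothetical initial parse and the edits a piece of feedback might map to.
initial = {"select": ["name"], "from": ["employees"], "where": ["age > 30"]}
edits = [
    Edit("delete", "where", "age > 30"),
    Edit("add", "where", "age > 40"),
    Edit("add", "select", "salary"),
]
corrected = apply_edits(initial, edits)
```

Because an edit touches only the clause it names, the parts of the initial parse that were already correct (here the `from` clause) carry over unchanged.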

    Reconfiguration technique for Optimization of the Photovoltaic array output power under partial shading conditions

    Under partial shading conditions, a PV array suffers from several problems: its output power drops, and it exhibits more than one maximum power point (MPP), which makes the MPP difficult to track. This paper presents the effect of different partial shading patterns on the PV array characteristics and output power, and provides a comparative literature review of methods to mitigate these effects and of their drawbacks. It also proposes a new reconfiguration strategy that increases the output power of the PV array by 13.8% of the total power under shading conditions, and a new technique that enhances the output power of the PV array by 20% of the total power under full illumination conditions by controlling the switch matrix between the photovoltaic array and an adaptive battery bank. The proposed strategy also addresses the difficulty of tracking the MPP, because it leaves only one MPP. The simulation was carried out using MATLAB Simulink under different shading patterns. Citation: Mohamed, A. M., Saafan, S. M., Attalla, A. M., and Elgohary, H. (2018). Reconfiguration technique for Optimization of the Photovoltaic array output power under partial shading conditions. Trends in Renewable Energy, 4, 111-124. DOI: 10.17737/tre.2018.4.2.006

    Complications of percutaneous endoscopic gastrostomy (PEG) tube applied to endoscopy unit patients

    Background: In patients who have difficulty eating or who have lost the ability to swallow food, percutaneous endoscopic gastrostomy (PEG) is the preferred method of long-term tube feeding. Although PEG is usually a safe technique, complications sometimes arise. Objective: To study the advantages and disadvantages of PEG in order to improve the maneuver and increase its success rate, by identifying the complications of PEG and their management and by evaluating the efficacy of PEG in improving patients' lifestyles. Patients and methods: This retrospective, single-center study was done on 60 patients who needed a PEG tube in the endoscopy unit of the internal medicine department, Zagazig University Hospital, during the period from December 2020 to May 2021. Before the study, all patients underwent a complete relevant evaluation consisting of full history taking, clinical examination, laboratory investigations, pelvi-abdominal ultrasound, multislice CT or MRI to assess advanced cancer or peritoneal metastasis, and endoscopic examination for outlet patency. Results: A total of 16 patients (26.67%) had PEG-related complications. Fourteen (23.3%) patients had minor complications. The most common minor complication recorded was insertion site infection, which was found in 5 (8.3%) patients. Two (3.3%) patients in our study reported major PEG-related complications. One (1.7%) patient had massive hematemesis and melena, and one (1.7%) patient reported buried bumper syndrome. Conclusion: We concluded that PEG has gained global acceptance as a safe approach for administering enteral feeding in patients with inadequate oral intake for more than 28 days and a functioning GI system.

    Adaptive Fuzzy Supplementary Controller for SSR Damping in a Series-Compensated DFIG-Based Wind Farm

    Although using a series compensation technique in a long transmission line effectively increases the transmittable power, it may cause a sub-synchronous resonance (SSR) phenomenon. A gate-controlled series capacitor (GCSC) is an effective means of SSR damping by controlling the turn-off angle. In previous studies, a constant supplementary damping controller (SDC) was used for controlling the turn-off angle, which can mitigate the SSR phenomenon. However, these methods cannot capture the maximum transmittable power at different operating points. In this paper, a fuzzy logic controller (FLC) is proposed to compute the gain of the SDC based on the wind speed and the error between the measured and reference line currents, in order to transfer as much power as possible and damp the SSR phenomenon simultaneously. Using the MATLAB/SIMULINK program, the proposed method is tested at different operating points to validate its effectiveness and robustness. Compared to the traditional method (constant SDC), the maximum transmittable power, as well as SSR damping, is achieved in all studied cases by the proposed method (variable SDC).
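
How a fuzzy controller might turn wind speed and current error into a variable gain can be sketched with a very small zero-order Sugeno-style rule base. The abstract gives no membership functions, rule table, or numeric ranges, so everything below (the two fuzzy sets per input, the ranges, the four rule gains, and the function names) is a hypothetical placeholder chosen only to show the mechanism, not the controller from the paper.

```python
def clamp01(v):
    # Restrict a normalized input to the interval [0, 1].
    return max(0.0, min(1.0, v))

def fuzzy_sdc_gain(wind_speed, current_error,
                   wind_range=(6.0, 12.0),    # assumed wind-speed span (m/s)
                   error_range=(0.0, 1.0),    # assumed |error| span (per unit)
                   rule_gains=((0.5, 1.0),    # wind low:  error small, large
                               (1.5, 3.0))):  # wind high: error small, large
    # Normalize both inputs onto [0, 1].
    u = clamp01((wind_speed - wind_range[0]) / (wind_range[1] - wind_range[0]))
    e = clamp01((abs(current_error) - error_range[0])
                / (error_range[1] - error_range[0]))
    # Two linear membership functions per input: (low, high) and (small, large).
    wind_mu = (1.0 - u, u)
    err_mu = (1.0 - e, e)
    # Fire all four rules with the product t-norm, then defuzzify by taking
    # the firing-strength-weighted average of the rule gains.
    num = 0.0
    den = 0.0
    for i in range(2):
        for j in range(2):
            w = wind_mu[i] * err_mu[j]
            num += w * rule_gains[i][j]
            den += w
    return num / den

gain_low = fuzzy_sdc_gain(6.0, 0.0)    # only the "low wind, small error" rule fires
gain_mid = fuzzy_sdc_gain(9.0, 0.5)    # all four rules fire equally
gain_high = fuzzy_sdc_gain(12.0, 1.0)  # only the "high wind, large error" rule fires
```

With these placeholder sets the gain varies smoothly between the rule values as the operating point moves, in contrast to a constant SDC, whose gain is fixed at every operating point.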