11 research outputs found

    CLP: A Platform for Competitive Learning

    We introduce the Competitive Learning Platform (CLP), an online continuous improvement tool that provides automatic partial performance feedback to students or groups of students on individual or collaborative assignments. CLP motivates students to do their best and to come up with new solutions that can improve their assignment results before the deadline. In this work, we describe the CLP system and present the results of a comprehensive set of analyses aimed at gauging the impact of using this platform on student motivation, engagement, and performance. The analyses are based on a rich dataset containing CLP submission, student outcome, and student feedback data obtained from a variety of undergraduate and graduate classes using the tool at two universities over a period of five years. The sample includes 18 courses, 606 students, and 15,782 CLP submissions. Results indicate that CLP is beneficial in this setting, leading to active student participation and improved motivation.

    Tutorial: Are You My Neighbor?: Bringing Order to Neighbor Computing Problems

    Finding nearest neighbors is an important topic that has attracted much attention over the years and has applications in many fields, such as market basket analysis, plagiarism and anomaly detection, community detection, and ligand-based virtual screening. As data become ever easier to collect, finding neighbors has become a potential bottleneck in analysis pipelines. Comparing all pairs of objects is no longer feasible for today's massive datasets. The high computational complexity of the task has led researchers to develop approximate methods, which find many but not all of the nearest neighbors. Yet, for some types of data, efficient exact solutions have been found by carefully partitioning or filtering the search space in a way that avoids most unnecessary comparisons. In recent years, there have been several fundamental advances in our ability to efficiently identify appropriate neighbors, especially in non-traditional data, such as graphs or document collections. In this tutorial, we provide an in-depth overview of recent methods for finding (nearest) neighbors, focusing on the intuition behind choices made in the design of those algorithms and on the utility of the methods in real-world applications. Our tutorial aims to provide a unifying view of neighbor computing problems, spanning from numerical data to graph data, from categorical data to sequential data, and related application scenarios. For each type of data, we will review the current state-of-the-art approaches used to identify neighbors and discuss how neighbor search methods are used to solve important problems.
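
    The cost of exhaustive pairwise comparison is easy to see in a brute-force baseline. The following is a minimal Python sketch, not a method from the tutorial: it assumes dense numeric vectors and cosine similarity, compares every pair of points, and keeps the k most similar neighbors of each point. The function names `cosine` and `knn_graph` are hypothetical.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def knn_graph(points, k):
    """Exact k-nearest-neighbor lists built by comparing every pair of points.

    Performs n*(n-1)/2 similarity computations: the quadratic cost that
    filtering, partitioning, and approximate methods try to avoid.
    """
    n = len(points)
    sims = [[] for _ in range(n)]  # sims[i] holds (similarity, j) for all j != i
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(points[i], points[j])
            sims[i].append((s, j))
            sims[j].append((s, i))
    # Keep the ids of the k most similar neighbors of each point.
    return [[j for _, j in heapq.nlargest(k, row)] for row in sims]
```

    For example, with `points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]`, `knn_graph(points, 1)` pairs the first two points with each other and the last two with each other. Exact filtering methods aim to return this same answer while skipping most of the comparisons.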

    Identification of Distinct Characteristics of Antibiofilm Peptides and Prospection of Diverse Sources for Efficacious Sequences

    A majority of microbial infections are associated with biofilms. Targeting biofilms is considered an effective strategy to limit microbial virulence while minimizing the development of antibiotic resistance. Toward this end, antibiofilm peptides are an attractive option because they have properties orthogonal to those of small-molecule drugs. In this work, we developed machine learning models to identify the distinguishing characteristics of known antibiofilm peptides, and to mine peptide databases from diverse habitats to classify new peptides with potential antibiofilm activities. Additionally, we used the reported minimum biofilm inhibitory/eradication concentration (MBIC/MBEC) of the antibiofilm peptides to create a regression model on top of the classification model to predict the effectiveness of new antibiofilm peptides. We used a positive dataset containing 242 antibiofilm peptides, and a negative dataset which, unlike previous datasets, contains peptides that are likely to promote biofilm formation. Our model achieved a classification accuracy greater than 98% and F1 (harmonic mean of precision and recall) and Matthews correlation coefficient (MCC) scores greater than 0.90; the regression model achieved an MCC score greater than 0.81. We utilized our classification-regression pipeline to evaluate 135,015 peptides from diverse sources for potential antibiofilm activity, and we identified 185 candidates that are likely to be effective against preformed biofilms at micromolar concentrations. Structural analysis of the top 37 hits revealed a higher proportion of helices and coils than sheets, as well as common functional motifs. Sequence alignment of these hits with known antibiofilm peptides revealed that, while some of the hits showed relatively high sequence similarity with known peptides, others did not, indicating the presence of antibiofilm activity in novel sources or sequences. 
Further, some of the hits had previously recognized therapeutic properties or host defense traits suggestive of drug repurposing applications. Taken together, this work demonstrates a new in silico approach to predicting antibiofilm efficacy and identifies promising new candidates for biofilm eradication.
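
    The F1 and MCC scores reported above are both computed from a classifier's confusion matrix: F1 = 2TP / (2TP + FP + FN) and MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The short Python sketch below implements these standard definitions; the counts in it are hypothetical and not taken from the study.

```python
import math


def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall, from confusion-matrix counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 100 positive and 100 negative test peptides.
tp, fn, fp, tn = 90, 10, 5, 95
print(f"accuracy={(tp + tn) / (tp + tn + fp + fn):.3f}")  # accuracy=0.925
print(f"F1={f1_score(tp, fp, fn):.3f}")                  # F1=0.923
print(f"MCC={mcc(tp, tn, fp, fn):.3f}")                  # MCC=0.851
```

    MCC uses all four cells of the confusion matrix, whereas F1 ignores true negatives, which is why the two are often reported together.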

    pL2AP: Fast Parallel Cosine Similarity Search

    Solving the AllPairs similarity search problem entails finding all pairs of vectors in a high-dimensional sparse dataset that have a similarity value higher than a given threshold. The output from this problem is a crucial component in many real-world applications, such as clustering, online advertising, recommender systems, near-duplicate document detection, and query refinement. A number of serial algorithms have been proposed that solve the problem by pruning many of the possible similarity candidates for each query object after accessing only a few of their non-zero values. The pruning process results in unpredictable memory access patterns that can reduce search efficiency. In this context, we introduce pL2AP, which efficiently solves the AllPairs cosine similarity search problem in a multi-core environment. Our method uses a number of cache-tiling optimizations, combined with fine-grained, dynamically balanced parallel tasks, to solve the problem 1.5x–232x faster than existing parallel baselines on datasets with hundreds of millions of non-zeros.
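
    To make the problem definition concrete, the following Python sketch solves AllPairs cosine similarity search exactly with an inverted index over sparse vectors. It is a serial baseline without the pruning, cache tiling, or parallel task scheduling that pL2AP uses; the function names and the dictionary-based sparse vector format are assumptions for illustration.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector {feature: weight} to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {f: w / norm for f, w in vec.items()} if norm else {}


def all_pairs(vectors, threshold):
    """Return (a, b, sim) for every pair a < b with cosine sim > threshold.

    Each vector is used as a query against an inverted index of the
    previously processed vectors and is then added to the index. Dot
    products are accumulated only over shared non-zero features.
    """
    vectors = [normalize(v) for v in vectors]
    index = defaultdict(list)  # feature -> list of (vector id, weight)
    result = []
    for i, q in enumerate(vectors):
        acc = defaultdict(float)  # candidate id -> accumulated dot product
        for f, w in q.items():
            for j, wj in index[f]:
                acc[j] += w * wj
        for j, s in acc.items():
            if s > threshold:
                result.append((j, i, s))
        for f, w in q.items():
            index[f].append((i, w))
    return result
```

    For instance, `all_pairs([{"a": 1, "b": 1}, {"a": 1, "b": 1, "c": 0.1}, {"c": 1}], 0.9)` returns only the pair formed by the first two vectors, with a similarity of about 0.998. The accumulation already skips vectors that share no feature with the query, but every posting of a shared feature is still scanned; shortening that scan is what the pruning-based serial algorithms do.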

    An Extreme-Adaptive Time Series Prediction Model Based on Probability-Enhanced LSTM Neural Networks

    Forecasting time series with extreme events has been a challenging and prevalent research topic, especially when the time series data are affected by complicated, uncertain factors, as is the case in hydrologic prediction. Diverse traditional and deep learning models have been applied to discover the nonlinear relationships and recognize the complex patterns in these types of data. However, existing methods usually ignore the negative influence of imbalanced data, or severe events, on model training. Moreover, methods are usually evaluated on a small number of generally well-behaved time series, which does not demonstrate their ability to generalize. To tackle these issues, we propose a novel probability-enhanced neural network model, called NEC+, which concurrently learns extreme and normal prediction functions and a way to choose among them via selective back propagation. We evaluate the proposed model on the difficult 3-day-ahead hourly water level prediction task applied to 9 reservoirs in California. Experimental results demonstrate that the proposed model significantly outperforms state-of-the-art baselines and exhibits superior generalization ability on data with diverse distributions.

    Document Clustering: The Next Frontier

    The proliferation of documents, both on the Web and in private systems, makes knowledge discovery in document collections arduous. Clustering has long been recognized as a useful tool for the task. It groups like items together, maximizing intra-cluster similarity and inter-cluster distance. Clustering can provide insight into the make-up of a document collection and is often used as the initial step in data analysis. While most document clustering research to date has focused on moderate-length, single-topic documents, real-life collections are often made up of very short or long documents. Short documents do not contain enough text to accurately compute similarities. Long documents often span multiple topics that general document similarity measures do not take into account. In this paper, we first give an overview of general-purpose document clustering, and then focus on recent advancements in the next frontier in document clustering: long and short documents.
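
    The problem with short documents can be seen with a bag-of-words cosine similarity. In the minimal Python sketch below (naive whitespace tokenization and raw term frequencies, both assumptions chosen for brevity, and hypothetical example texts), two short texts about the same topic that use different words get a similarity of zero, because their term vectors share no non-zero entries.

```python
import math
from collections import Counter


def tf_cosine(doc_a, doc_b):
    """Cosine similarity of raw term-frequency vectors (whitespace tokens)."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


print(tf_cosine("car repair shop", "automobile mechanic garage"))  # 0.0
print(round(tf_cosine("car repair shop", "car repair"), 3))        # 0.816
```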

    Big Data Frequent Pattern Mining

    Frequent pattern mining is an essential data mining task whose goal is to discover knowledge in the form of repeated patterns. Many efficient pattern mining algorithms have been developed in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called "Big Data". Scalable parallel algorithms hold the key to solving the problem in this context. In this chapter, we review recent advances in parallel frequent pattern mining, analyzing them through the Big Data lens. We identify three areas as challenges to designing parallel frequent pattern mining algorithms: memory scalability, work partitioning, and load balancing. With these challenges as a frame of reference, we extract and describe key algorithmic design patterns from the wealth of research conducted in this domain.
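
    As a point of reference for the parallel methods the chapter reviews, the following minimal Python sketch implements the classic serial Apriori algorithm for frequent itemset mining; it is a standard textbook algorithm, not a method from this chapter. Support is an absolute transaction count here (an assumption), and the function name and example data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets in >= min_support transactions.

    Level-wise search: frequent k-itemsets are joined into (k+1)-candidates,
    candidates with an infrequent k-subset are pruned, and the survivors are
    counted with one pass over the transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 1
    while frequent:
        # Join step: unions of two frequent k-itemsets that have size k+1.
        prev = list(frequent)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        # Counting step: one full pass over the data per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

    With the transactions `[{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]` and `min_support=2`, the result contains the three single items plus {bread, milk} and {bread, butter}; {milk, butter} occurs only once, so the three-item candidate is pruned without being counted. The counting pass touches the whole dataset at every level, which illustrates why memory scalability, work partitioning, and load balancing matter once the data grow large.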

    A novel two-box search paradigm for query disambiguation

    Precision-oriented search results, such as those typically returned by the major search engines, are vulnerable to issues of polysemy. When the same term refers to different things, the dominant sense is preferred in the rankings of search results. In this paper, we propose a novel two-box technique in the context of Web search that utilizes contextual terms provided by users for query disambiguation, making it possible to prefer other senses without altering the original query. A prototype system, Bobo, has been implemented. In Bobo, contextual terms are used to capture domain knowledge from users, help estimate the relevance of search results, and route them toward a user-intended domain. A major advantage of Bobo is that a wide range of domain knowledge can be effectively utilized: helpful contextual terms do not even need to co-occur with query terms on any page. We have extensively evaluated the performance of Bobo on benchmark datasets, and the results demonstrate the utility and effectiveness of our approach.

    Alien flora of Europe: species diversity, temporal trends, geographical patterns and research needs

    The paper provides the first estimate of the composition and structure of alien plants occurring in the wild on the European continent, based on the results of the DAISIE project (2004–2008), funded by the 6th Framework Programme of the European Union and aimed at “creating an inventory of invasive species that threaten European terrestrial, freshwater and marine environments”. The plant section of the DAISIE database is based on national checklists from 48 European countries/regions and Israel; for many of them the data were compiled during the project, and for some countries DAISIE collected the first comprehensive checklists of alien species, based on primary data (e.g., Cyprus, Greece, F. Y. R. O. Macedonia, Slovenia, Ukraine). In total, the database contains records of 5789 alien plant species in Europe (including those native to a part of Europe but alien to another part), of which 2843 are alien to Europe (of extra-European origin). The research focus was on naturalized species; there are in total 3749 naturalized aliens in Europe, of which 1780 are alien to Europe. This represents a marked increase compared to the 1568 alien species reported by a previous analysis of data in Flora Europaea (1964–1980). Casual aliens were considered only marginally and are represented by 1507 species with European origins and 872 species whose native range falls outside Europe. The highest diversity of alien species is concentrated in industrialized countries with a tradition of good botanical recording or intensive recent research. The highest number of all alien species, regardless of status, is reported from Belgium (1969), the United Kingdom (1779) and the Czech Republic (1378). The United Kingdom (857), Germany (450), Belgium (447) and Italy (440) are the countries with the most naturalized neophytes. 
The number of naturalized neophytes in European countries is determined mainly by the interaction of temperature and precipitation; it increases with increasing precipitation, but only in climatically warm and moderately warm regions. Of the currently naturalized neophytes alien to Europe, 50% arrived after 1899, 25% after 1962 and 10% after 1989. At present, approximately 6.2 new species capable of naturalization arrive each year. Most alien species have relatively restricted European distributions; half of all naturalized species occur in four or fewer countries/regions, whereas 70% of non-naturalized species occur in only one region. Alien species are drawn from 213 families, dominated by large global plant families which have a weedy tendency and have undergone major radiations in temperate regions (Asteraceae, Poaceae, Rosaceae, Fabaceae, Brassicaceae). There are 1567 genera which have alien members in European countries, the commonest being globally diverse genera comprising mainly urban and agricultural weeds (e.g., Amaranthus, Chenopodium and Solanum) or cultivated for ornamental purposes (Cotoneaster, the genus richest in alien species). Only a few large genera which have successfully invaded (e.g., Oenothera, Oxalis, Panicum, Helianthus) are predominantly of non-European origin. Conyza canadensis, Helianthus tuberosus and Robinia pseudoacacia are the most widely distributed alien species. Of all naturalized aliens present in Europe, 64.1% occur in industrial habitats and 58.5% on arable land and in parks and gardens. Grasslands and woodlands are also highly invaded, with 37.4% and 31.5%, respectively, of all naturalized aliens in Europe present in these habitats. Mires, bogs and fens are the least invaded; only approximately 10% of aliens in Europe occur there. Intentional introductions to Europe (62.8% of the total number of naturalized aliens) prevail over unintentional ones (37.2%). 
Ornamental and horticultural introductions escaped from cultivation account for the highest number of species, 52.2% of the total. Among unintentional introductions, contaminants of seed, mineral materials and other commodities are responsible for 1091 alien species introductions to Europe (76.6% of all species introduced unintentionally), and 363 species are assumed to have arrived as stowaways (directly associated with human transport but arriving independently of commodities). Most aliens in Europe have a native range in the same continent (28.6% of all donor region records are from another part of Europe where the plant is native); in terms of species numbers, the contribution of Europe as a region of origin is 53.2%. Considering aliens to Europe separately, 45.8% of species have their native distribution in North and South America, 45.9% in Asia, 20.7% in Africa and 5.3% in Australasia. Based on species composition, the European alien flora can be classified into five major groups: (1) north-western, comprising Scandinavia and the UK; (2) west-central, extending from Belgium and the Netherlands to Germany and Switzerland; (3) Baltic, including only the former Soviet Baltic states; (4) east-central, comprising the remainder of central and eastern Europe; (5) southern, covering the entire Mediterranean region. The clustering patterns cut across some European bioclimatic zones; cultural factors such as regional trade links and traditional local preferences for crop, forestry and ornamental species are also important in influencing the introduced species pool. Finally, the paper evaluates the state of the art in the field of plant invasions in Europe, points to research gaps and outlines avenues of further research towards documenting alien plant invasions in Europe. The data are of varying quality and need to be further assessed with respect to the invasion status and residence time of the species included. 
This concerns especially the naturalized/casual status; so far, this information is available comprehensively for only 19 of the 49 countries/regions considered. Compiling an integrated database on the alien flora of Europe can form a principal contribution to developing a Europe-wide management strategy for alien species.