
    Effective sampling for large-scale automated writing evaluation systems

    Automated writing evaluation (AWE) has been shown to be an effective mechanism for quickly providing feedback to students. It has already seen wide adoption in enterprise-scale applications and is starting to be adopted in large-scale contexts. Training an AWE model has historically required a single batch of several hundred writing examples and human scores for each of them. This requirement limits large-scale adoption of AWE since human-scoring essays is costly. Here we evaluate algorithms for ensuring that AWE models are consistently trained using the most informative essays. Our results show how to minimize training set sizes while maximizing predictive performance, thereby reducing cost without unduly sacrificing accuracy. We conclude with a discussion of how to integrate this approach into large-scale AWE systems.

    Automated Crowdturfing Attacks and Defenses in Online Review Systems

    Malicious crowdsourcing forums are gaining traction as sources of online misinformation, but are limited by the costs of hiring and managing human workers. In this paper, we identify a new class of attacks that leverage deep learning language models (Recurrent Neural Networks or RNNs) to automate the generation of fake online reviews for products and services. Not only are these attacks cheap and therefore more scalable, but they can control the rate of content output to eliminate the signature burstiness that makes crowdsourced campaigns easy to detect. Using Yelp reviews as an example platform, we show how a two-phased review generation and customization attack can produce reviews that state-of-the-art statistical detectors cannot distinguish from real ones. We conduct a survey-based user study to show these reviews not only evade human detection, but also score high on "usefulness" metrics by users. Finally, we develop novel automated defenses against these attacks, by leveraging the lossy transformation introduced by the RNN training and generation cycle. We consider countermeasures against our mechanisms, show that they produce unattractive cost-benefit tradeoffs for attackers, and that they can be further curtailed by simple constraints imposed by online service providers.

    Data assimilation of in situ soil moisture measurements in hydrological models: first annual doctoral progress report, work plan and achievements

    Water scarcity and the availability of good-quality water are serious public concerns, since they determine how much water is available to society. Water scarcity, especially in arid climates and during extreme droughts related to climate change, drives water use technologies such as irrigation to become more efficient and sustainable. Plant root water and nutrient uptake is one of the most important processes in subsurface unsaturated flow and transport modeling, as root uptake controls actual plant evapotranspiration, water recharge and nutrient leaching to the groundwater, and exerts a major influence on predictions of global climate models. To improve irrigation strategies, water flow needs to be accurately described using advanced monitoring and modeling. Our study focuses on the assimilation of hydrological data in hydrological models that predict water flow, solute (pollutants and salts) transport and water redistribution in agricultural soils under irrigation. Field plots of a potato farmer in a sandy region in Belgium were instrumented to continuously monitor soil moisture and water potential before, during and after irrigation in dry summer periods. The aim is to optimize the irrigation process by assimilating online sensor field data into process-based models. Over the past year, we demonstrated the calibration and optimization of the Hydrus 1D model for an irrigated grassland on sandy soil. Direct and inverse calibration and optimization were applied for both heterogeneous and homogeneous conceptualizations. Results show that, after stepwise calibration, local sensitivity analysis and optimization of the Ks, n and α values, Hydrus 1D closely simulated soil water content at five depths as compared to water content measurements from soil moisture probes. The errors of the model, expressed by deviations between observed and modeled soil water content, were, however, different for each individual depth.
The smallest differences between observed and modeled soil water content were attained when using an automated inverse optimization method. The choice of the initial parameter value can be optimized using a stepwise approach. Our results show that statistical evaluation coefficients (R2, Ce and RMSE) are suitable benchmarks to evaluate the performance of the model in reproducing the data. The degree of water stress simulated with Hydrus 1D suggested increasing irrigation at least once, i.e. at the beginning of the simulation period, and distributing the remaining irrigation over the growing season, instead of applying a large amount of irrigation later in the season. In the next year, we will look further for the best method (using soft data and methods such as PTFs, EMI and penetrometer measurements) to derive and predict the spatial variability of soil hydraulic properties (saturated hydraulic conductivity) and link it to crop yield at the field scale. Linear and non-linear pedotransfer functions (PTFs) have been assessed to predict penetrometer resistance of soils from their water status (matric potential, ψ, and degree of saturation, S), bulk density, ρb, and some other soil properties such as sand content and Ks. The geophysical EMI (electromagnetic induction) technique provides a versatile and robust field instrument for determining apparent soil electrical conductivity (ECa). ECa, a quick and reliable measurement, is one of the ancillary properties (secondary information) of soil and can improve the spatial and temporal estimation of soil characteristics, e.g. salinity, water content, texture, porosity and bulk density, at different scales and depths. Following previous literature on penetrometer measurements, we determined the effective stress and used some models to find the relationships between soil properties, especially Ks, and penetrometer resistance as one of the prediction methods for Ks.
The initial results obtained in the first year showed that a new data set would be necessary to validate the results of this part. In the third year, quasi-3D modelling of water flow at the field scale will be conducted. In this modeling set-up, the field will be modeled as a collection of 1D columns representing the different field conditions (combination of soil properties, groundwater depth, root zone depth). The measured soil properties are extrapolated over the entire field by linking them to the available spatially distributed data (such as the EMI images). The data set of predicted Ks and other soil properties for the whole field constructed in the previous steps will be used for parameterising the model. Sensitivity analysis (SA) is essential to the model optimization or parametrization process. To avoid overparameterization, the use of global SA will be investigated. In order to include multiple objectives (irrigation management parameters, costs, …) in the parameter optimization strategy, multi-objective techniques such as AMALGAM have been introduced. We will investigate multi-objective strategies in the irrigation optimization.
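
    The abstract above names R2, Ce and RMSE as benchmarks for comparing observed and modeled soil water content. A minimal sketch of how such coefficients are commonly computed follows. It assumes that Ce denotes the Nash-Sutcliffe coefficient of efficiency and that R2 is the squared Pearson correlation; the report itself does not define them here. The function names and the sample water-content values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between paired observed and simulated values."""
    squared_errors = [(o - s) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def coefficient_of_efficiency(observed, simulated):
    """Nash-Sutcliffe efficiency (assumed meaning of Ce): 1 is a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed meaning of R2)."""
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical volumetric water contents at one depth (not data from the study).
observed = [0.10, 0.20, 0.30, 0.40]
simulated = [0.20, 0.30, 0.40, 0.50]

print(rmse(observed, simulated))
print(coefficient_of_efficiency(observed, simulated))
print(r_squared(observed, simulated))
```

    In this made-up example the simulation is offset by a constant 0.1, so R2 stays at about 1 while RMSE is about 0.1 and Ce drops to about 0.2. This illustrates why reporting the three coefficients together is more informative than reporting a correlation alone.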

    The Hierarchic treatment of marine ecological information from spatial networks of benthic platforms

    Measuring biodiversity simultaneously in different locations, at different temporal scales, and over wide spatial scales is of strategic importance for the improvement of our understanding of the functioning of marine ecosystems and for the conservation of their biodiversity. Monitoring networks of cabled observatories, along with other docked autonomous systems (e.g., Remotely Operated Vehicles [ROVs], Autonomous Underwater Vehicles [AUVs], and crawlers), are being conceived and established at a spatial scale capable of tracking energy fluxes across benthic and pelagic compartments, as well as across geographic ecotones. At the same time, optoacoustic imaging is undergoing an unprecedented expansion in marine ecological monitoring, enabling the acquisition of new biological and environmental data at an appropriate spatiotemporal scale. At this stage, one of the main problems for an effective application of these technologies is the processing, storage, and treatment of the acquired complex ecological information. Here, we provide a conceptual overview of the technological developments in the multiparametric generation, storage, and automated hierarchic treatment of biological and environmental information required to capture the spatiotemporal complexity of a marine ecosystem. In doing so, we present a pipeline of ecological data acquisition and processing divided into steps that are amenable to automation. We also give an example of population biomass, community richness and biodiversity data computation (as indicators for ecosystem functionality) with an Internet Operated Vehicle (a mobile crawler). Finally, we discuss the software requirements for that automated data processing at the level of cyber-infrastructures with sensor calibration and control, data banking, and ingestion into large data portals.

    Psychometrics in Practice at RCEC

    A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to ideas for an integrated formative use of data-driven decision making, assessment for learning and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically. All authors are connected to RCEC as researchers. They present one of their current research topics and provide some insight into the focus of RCEC. The topics were selected and edited so that the book should be of special interest to educational researchers, psychometricians and practitioners in educational assessment.

    Interactive retrieval of video using pre-computed shot-shot similarities

    A probabilistic framework for content-based interactive video retrieval is described. The developed indexing of video fragments originates from the probability of the user's positive judgment about key-frames of video shots. Initial estimates of the probabilities are obtained from low-level feature representation. Only statistically significant estimates are kept; the rest are replaced by an appropriate constant, which allows efficient access at search time without loss of search quality and leads to improvement in most experiments. With time, these probability estimates are updated from the relevance judgments of users performing searches, resulting in further substantial increases in mean average precision.
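
    The abstract above reports its gains in mean average precision. As a minimal sketch of that metric only (not of the authors' retrieval framework), the snippet below computes average precision for one ranked result list and averages it over queries. It assumes that every relevant shot appears somewhere in the ranked list, so the number of relevant hits in the list is used as the normaliser. The function names and the relevance lists are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance is a list of booleans in rank order: True means the
    result at that rank was judged relevant. Assumes all relevant items
    are present in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query average precision values."""
    return sum(average_precision(q) for q in queries) / len(queries)


# Hypothetical relevance judgments for two queries (not data from the paper).
queries = [
    [True, False, True],   # relevant shots at ranks 1 and 3
    [False, True],         # relevant shot at rank 2
]
print(mean_average_precision(queries))
```

    Moving a relevant shot higher in a ranking raises the precision at its position, which is how updated probability estimates that rank relevant shots earlier would show up as a higher score.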

    Exploiting low-cost 3D imagery for the purposes of detecting and analyzing pavement distresses

    Road pavement conditions have significant impacts on safety, travel times, costs, and environmental effects. It is the responsibility of road agencies to ensure these conditions are kept in an acceptable state. To this end, agencies are tasked with implementing pavement management systems (PMSs) which effectively allocate resources towards maintenance and rehabilitation. These systems, however, require accurate data. Currently, most agencies rely on manual distress surveys and, as a result, there is significant research into quick and low-cost pavement distress identification methods. Recent proposals have included the use of structure-from-motion techniques based on datasets from unmanned aerial vehicles (UAVs) and cameras, producing accurate 3D models and associated point clouds. The challenge with these datasets is then identifying and describing distresses. This paper focuses on utilizing images of pavement distresses in the city of Palermo, Italy, produced by mobile phone cameras. The work aims to assess the accuracy of using mobile phones for these surveys and to identify strategies for segmenting the generated 3D imagery, considering 3D image segmentation algorithms that detect shapes in point clouds to enable measurement of physical parameters and severity assessment. Case studies are considered for pavement distresses defined by the measurement of the area affected, such as different types of cracking and depressions. The use of mobile phones and the identification of these patterns on the 3D models provide further steps towards low-cost data acquisition and analysis for a PMS.