    A matter of words: NLP for quality evaluation of Wikipedia medical articles

    Automatic quality evaluation of Web information is a task with many fields of application and of great relevance, especially in critical domains like the medical one. We start from the intuition that the quality of the content of medical Web documents is affected by features related to the specific domain: first, the usage of a specific vocabulary (Domain Informativeness); then, the adoption of specific codes (like those used in the infoboxes of Wikipedia articles) and the type of document (e.g., historical and technical ones). In this paper, we propose to leverage specific domain features to improve the results of the evaluation of Wikipedia medical articles. In particular, we evaluate the articles adopting an "actionable" model, whose features are related to the content of the articles, so that the model can also directly suggest strategies for improving the quality of a given article. We rely on Natural Language Processing (NLP) and dictionary-based techniques to extract the biomedical concepts in a text. We demonstrate the effectiveness of our approach by classifying the medical articles of the Wikipedia Medicine Portal, which were previously manually labeled by the Wiki Project team. The results of our experiments confirm that, by considering domain-oriented features, it is possible to obtain appreciable improvements with respect to existing solutions, mainly for those articles that other approaches classified less accurately. Besides being interesting in their own right, the results call for further research in the area of domain-specific features suitable for Web data quality assessment.

    SMOTE: Synthetic Minority Over-sampling Technique

    An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world datasets are predominantly composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
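The abstract above does not spell out how the synthetic minority examples are built. The following is a minimal sketch, assuming the commonly described scheme of interpolating between a minority example and one of its k nearest minority neighbours. The function name `smote_sketch`, its parameters, and the random choice of the base example are illustrative assumptions, not the authors' code.

```python
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority examples.

    Assumption: each synthetic point lies on the segment between a randomly
    chosen minority example and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority examples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_sketch(minority, n_synthetic=3, k=2):
        print(point)
```

Because every synthetic point is a convex combination of two existing minority examples, it stays inside the region those examples span, which is the intuition behind over-sampling by synthesis rather than by replication.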

    Covering problems in edge- and node-weighted graphs

    This paper discusses the graph covering problem in which a set of edges in an edge- and node-weighted graph is chosen to satisfy some covering constraints while minimizing the sum of the weights. In this problem, because of the large integrality gap of a natural linear programming (LP) relaxation, LP rounding algorithms based on the relaxation yield poor performance. Here we propose a stronger LP relaxation for the graph covering problem. The proposed relaxation is applied to designing primal-dual algorithms for two fundamental graph covering problems: the prize-collecting edge dominating set problem and the multicut problem in trees. Our algorithms are an exact polynomial-time algorithm for the former problem, and a 2-approximation algorithm for the latter problem, respectively. These results match the currently known best results for purely edge-weighted graphs. Comment: To appear in SWAT 201

    When Random Sampling Preserves Privacy

    Many organizations such as the U.S. Census publicly release samples of data that they collect about private citizens. These datasets are first anonymized using various techniques and then a small sample is released so as to enable “do-it-yourself” calculations. This paper investigates the privacy of the second step of this process: sampling. We observe that rare values – values that occur with low frequency in the table – can be problematic from a privacy perspective. To our knowledge, this is the first work that quantitatively examines the relationship between the number of rare values in a table and the privacy in a released random sample. If we require ɛ-privacy (where the larger ɛ is, the worse the privacy guarantee) with probability at least 1 − δ, we say that a value is rare if it occurs in at most Õ(1/ɛ) rows of the table (ignoring log factors). If there are no rare values, then we establish a direct connection between sample size that is safe to release and privacy. Specifically, if we select each row of the table with probability at most ɛ then the sample is O(ɛ)-private with high probability. In the case that there are t rare values, then the sample is Õ(ɛδ/t)-private with probability at least 1 − δ.
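The two ingredients of the abstract above, selecting each row independently with a fixed probability and counting values that occur in only a few rows, can be sketched as follows. This is an illustration only: the helper names `bernoulli_sample` and `rare_values` are hypothetical, and the rarity threshold is left as a plain parameter because the exact bound (including log factors) is not recoverable from the abstract.

```python
import random
from collections import Counter


def bernoulli_sample(rows, p, seed=0):
    """Keep each row independently with probability p."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p]


def rare_values(column, threshold):
    """Return the values that occur in at most `threshold` rows."""
    counts = Counter(column)
    return {value for value, count in counts.items() if count <= threshold}


if __name__ == "__main__":
    column = ["a"] * 50 + ["b"] * 3 + ["c"]
    eps = 0.2
    # Assumed threshold of order 1/eps, ignoring log factors.
    print(sorted(rare_values(column, threshold=1 / eps)))
    print(len(bernoulli_sample(column, p=eps)))
```

The count of rare values returned by `rare_values` plays the role of t in the abstract's bound, so a table with many rare values would call for a smaller sampling probability.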

    Horizon effects with surface waves on moving water

    Surface waves on a stationary flow of water are considered, in a linear model that includes the surface tension of the fluid. The resulting gravity-capillary waves experience a rich array of horizon effects when propagating against the flow. In some cases three horizons (points where the group velocity of the wave reverses) exist for waves with a single laboratory frequency. Some of these effects are familiar in fluid mechanics under the name of wave blocking, but other aspects, in particular waves with negative co-moving frequency and the Hawking effect, were overlooked until surface waves were investigated as examples of analogue gravity [Schützhold R and Unruh W G 2002 Phys. Rev. D 66 044019]. A comprehensive presentation of the various horizon effects for gravity-capillary waves is given, with emphasis on the deep water/short wavelength case kh >> 1 where many analytical results can be derived. A similarity of the state space of the waves to that of a thermodynamic system is pointed out. Comment: 30 pages, 15 figures. Minor changes
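For orientation, the standard linear dispersion relation for gravity-capillary waves on a uniform current can be written as below. The abstract itself does not state the formula, so the notation is an assumption: U is the local flow velocity, g the gravitational acceleration, σ the surface tension, ρ the fluid density, h the water depth, k the wavenumber, and ω the laboratory frequency.

```latex
% Assumed notation (not taken from the abstract): U, g, \sigma, \rho, h, k, \omega.
% \omega - U k is the co-moving frequency; it may be negative.
\begin{align}
  (\omega - U k)^2 &= \left( g k + \frac{\sigma}{\rho} k^3 \right) \tanh(k h), \\
  (\omega - U k)^2 &\approx g k + \frac{\sigma}{\rho} k^3
      \qquad \text{for } k h \gg 1, \\
  \frac{\partial \omega}{\partial k} &= 0
      \qquad \text{at a horizon (the laboratory-frame group velocity vanishes).}
\end{align}
```

In the deep-water limit the right-hand side becomes a cubic polynomial in k, which is consistent with the abstract's remark that many analytical results can be derived in that regime.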

    Influence of atmospheric conditions on the power production of utility-scale wind turbines in yaw misalignment

    The intentional yaw misalignment of leading, upwind turbines in a wind farm, termed wake steering, has demonstrated potential as a collective control approach for wind farm power maximization. The optimal control strategy and the resulting effect of wake steering on wind farm power production are in part dictated by the power degradation of the upwind yaw misaligned wind turbines. In the atmospheric boundary layer, the wind speed and direction may vary significantly over the wind turbine rotor area, depending on atmospheric conditions and stability, resulting in freestream turbine power production which is asymmetric as a function of the direction of yaw misalignment and which varies during the diurnal cycle. In this study, we propose a model for the power production of a wind turbine in yaw misalignment based on aerodynamic blade elements, which incorporates the effects of wind speed and direction changes over the turbine rotor area in yaw misalignment. The proposed model can be used for the modeling of the angular velocity, aerodynamic torque, and power production of an arbitrary yaw misaligned wind turbine based on the incident velocity profile, wind turbine aerodynamic properties, and turbine control system. A field experiment is performed using multiple utility-scale wind turbines to characterize the power production of yawed freestream operating turbines depending on the wind conditions, and the model is validated using the experimental data. The resulting power production of a yaw misaligned variable speed wind turbine depends on a nonlinear interaction between the yaw misalignment, the atmospheric conditions, and the wind turbine control system.

    Influence of atmospheric conditions on the power production of utility-scale wind turbines in yaw misalignment

    The intentional yaw misalignment of leading, upwind turbines in a wind farm, termed wake steering, has demonstrated potential as a collective control approach for wind farm power maximization. The optimal control strategy, and resulting effect of wake steering on wind farm power production, are in part dictated by the power degradation of the upwind yaw misaligned wind turbines. In the atmospheric boundary layer, the wind speed and direction may vary significantly over the wind turbine rotor area, depending on atmospheric conditions and stability, resulting in freestream turbine power production which is asymmetric as a function of the direction of yaw misalignment and which varies during the diurnal cycle. In this study, we propose a model for the power production of a wind turbine in yaw misalignment based on aerodynamic blade elements which incorporates the effects of wind speed and direction changes over the turbine rotor area in yaw misalignment. A field experiment is performed using multiple utility-scale wind turbines to characterize the power production of yawed freestream operating turbines depending on the wind conditions, and the model is validated using the experimental data. The resulting power production of a yaw misaligned variable speed wind turbine depends on a nonlinear interaction between the yaw misalignment, the atmospheric conditions, and the wind turbine control system. Comment: 37 pages, 15 figures

    Bulk and surface switching in Mn-Fe-based Prussian Blue Analogues

    Many Prussian Blue Analogues are known to show a thermally induced phase transition close to room temperature and a reversible, photo-induced phase transition at low temperatures. This work reports on magnetic measurements, X-ray photoemission and Raman spectroscopy on a particular class of these molecular heterobimetallic systems, specifically on Rb0.81Mn[Fe(CN)6]0.95·1.24H2O, Rb0.97Mn[Fe(CN)6]0.98·1.03H2O and Rb0.70Cu0.22Mn0.78[Fe(CN)6]0.86·2.05H2O, to investigate these transition phenomena both in the bulk of the material and at the sample surface. Results indicate a high degree of charge transfer in the bulk, while a substantially reduced conversion is found at the sample surface, even in the case of a near perfect (Rb:Mn:Fe=1:1:1) stoichiometry. Thus, the intrinsic incompleteness of the charge transfer transition in these materials is found to be primarily due to surface reconstruction. Substitution of a large fraction of charge transfer active Mn ions by charge transfer inactive Cu ions leads to a proportional conversion reduction with respect to the maximum conversion that is still stoichiometrically possible and shows the charge transfer capability of metal centers to be quite robust upon inclusion of a neighboring impurity. Additionally, a 532 nm photo-induced metastable state, reminiscent of the high temperature Fe(III)Mn(II) ground state, is found at temperatures of 50-100 K. The efficiency of photo-excitation to the metastable state is found to be maximized around 90 K. The photo-induced state is observed to relax to the low temperature Fe(II)Mn(III) ground state at a temperature of approximately 123 K. Comment: 12 pages, 8 figures