A matter of words: NLP for quality evaluation of Wikipedia medical articles
Automatic quality evaluation of Web information is a task with many fields of
application and of great relevance, especially in critical domains like the
medical one. We start from the intuition that the quality of content of medical
Web documents is affected by features related to the specific domain: first,
the use of a specific vocabulary (Domain Informativeness); then, the adoption
of specific codes (like those used in the infoboxes of Wikipedia articles) and
the type of document (e.g., historical and technical ones). In this paper, we
propose to leverage specific domain features to improve the results of the
evaluation of Wikipedia medical articles. In particular, we evaluate the
articles adopting an "actionable" model, whose features are related to the
content of the articles, so that the model can also directly suggest strategies
for improving a given article's quality. We rely on Natural Language Processing
(NLP) and dictionary-based techniques to extract the biomedical concepts in a
text. We demonstrate the effectiveness of our approach by classifying
the medical articles of the Wikipedia Medicine Portal, which have been
previously manually labeled by the Wiki Project team. The results of our
experiments confirm that, by considering domain-oriented features, it is
possible to obtain noticeable improvements with respect to existing solutions,
mainly for those articles that other approaches have classified less accurately.
Besides being interesting in their own right, the results call for further
research in the area of domain-specific features suitable for Web data quality
assessment.
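The abstract describes Domain Informativeness only as the usage of a specific vocabulary. As a minimal sketch, assuming it can be approximated by the share of word tokens found in a medical dictionary, the function below computes that ratio. The vocabulary `MEDICAL_TERMS` and the function name are hypothetical, and single-word matching is a simplification of the NLP- and dictionary-based concept extraction the authors use.

```python
import re

# Hypothetical toy vocabulary; a real system would draw on a curated
# biomedical dictionary rather than a hand-written set.
MEDICAL_TERMS = {"insulin", "diabetes", "glucose", "pancreas", "hormone"}


def domain_informativeness(text, vocabulary=MEDICAL_TERMS):
    """Fraction of word tokens in `text` that appear in `vocabulary`.

    Assumed, simplified definition for illustration only: case-insensitive,
    single-word matching (no multi-word concepts, no stemming).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits / len(tokens)


print(domain_informativeness("Insulin is a hormone produced by the pancreas."))  # 0.375
```

A score like this is "actionable" in the abstract's sense: raising it means adding domain terminology, although the weighting the paper actually uses is not stated here.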
SMOTE: Synthetic Minority Over-sampling Technique
An approach to the construction of classifiers from imbalanced datasets is
described. A dataset is imbalanced if the classification categories are not
approximately equally represented. Often real-world data sets are predominantly
composed of "normal" examples with only a small percentage of "abnormal" or
"interesting" examples. It is also the case that the cost of misclassifying an
abnormal (interesting) example as a normal example is often much higher than
the cost of the reverse error. Under-sampling of the majority (normal) class
has been proposed as a good means of increasing the sensitivity of a classifier
to the minority class. This paper shows that a combination of our method of
over-sampling the minority (abnormal) class and under-sampling the majority
(normal) class can achieve better classifier performance (in ROC space) than
only under-sampling the majority class. This paper also shows that a
combination of our method of over-sampling the minority class and
under-sampling the majority class can achieve better classifier performance (in
ROC space) than varying the loss ratios in Ripper or class priors in Naive
Bayes. Our method of over-sampling the minority class involves creating
synthetic minority class examples. Experiments are performed using C4.5, Ripper
and a Naive Bayes classifier. The method is evaluated using the area under the
Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
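The synthetic-example idea above can be sketched as follows: each new point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbors. This is a minimal NumPy sketch, not the authors' reference code; as a simplifying assumption, base samples are drawn uniformly at random instead of cycling through every minority sample a fixed number of times, and the function name `smote` is illustrative.

```python
import numpy as np


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate synthetic minority-class samples by SMOTE-style interpolation.

    Simplification (assumption): base samples are drawn uniformly at random.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances within the minority class only.
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]  # k nearest minority neighbors

    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbors[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))  # uniform in [0, 1)

    # Each synthetic point lies on the segment from a sample to a neighbor.
    return X[base] + gap * (X[chosen] - X[base])


minority = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]])
synthetic = smote(minority, n_synthetic=6, k=3, seed=0)
print(synthetic.shape)  # (6, 2)
```

The oversampled minority set would then be combined with an under-sampled majority class before training, as the abstract describes.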
Covering problems in edge- and node-weighted graphs
This paper discusses the graph covering problem in which a set of edges in an
edge- and node-weighted graph is chosen to satisfy some covering constraints
while minimizing the sum of the weights. In this problem, because of the large
integrality gap of a natural linear programming (LP) relaxation, LP rounding
algorithms based on the relaxation yield poor performance. Here we propose a
stronger LP relaxation for the graph covering problem. The proposed relaxation
is applied to designing primal-dual algorithms for two fundamental graph
covering problems: the prize-collecting edge dominating set problem and the
multicut problem in trees. Our algorithms are an exact polynomial-time
algorithm for the former problem and a 2-approximation algorithm for the
latter. These results match the best currently known results for purely
edge-weighted graphs.
Comment: To appear in SWAT 201
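To make the prize-collecting edge dominating set problem concrete, the sketch below solves tiny instances by exhaustive search. It is exponential-time and is not the paper's primal-dual algorithm. The cost model is an assumption for illustration: the weight of the chosen edges, plus the weight of every node touched by a chosen edge, plus the penalty of every edge left undominated (an edge is dominated when it shares an endpoint with a chosen edge).

```python
from itertools import combinations


def pc_eds_bruteforce(edges, edge_w, node_w, penalty):
    """Exhaustive search over edge subsets for a tiny prize-collecting EDS instance.

    Assumed cost: chosen edge weights + weights of touched nodes
    + penalties of undominated edges.
    """
    best_cost, best_set = float("inf"), None
    for r in range(len(edges) + 1):
        for chosen in combinations(range(len(edges)), r):
            touched = {node for i in chosen for node in edges[i]}
            cost = sum(edge_w[i] for i in chosen)
            cost += sum(node_w[node] for node in touched)
            # An edge is dominated if it shares an endpoint with a chosen edge.
            cost += sum(penalty[j] for j, (u, v) in enumerate(edges)
                        if u not in touched and v not in touched)
            if cost < best_cost:
                best_cost, best_set = cost, chosen
    return best_cost, best_set


# Path a-b-c-d: picking the middle edge dominates all three edges.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
zero_nodes = {n: 0 for n in "abcd"}
print(pc_eds_bruteforce(edges, [1, 1, 1], zero_nodes, [10, 10, 10]))  # (1, (1,))
```

With very small penalties, choosing no edges at all becomes optimal, which is the trade-off the "prize-collecting" variant introduces.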
When Random Sampling Preserves Privacy
Many organizations such as the U.S. Census publicly release samples of data that they collect about private citizens. These datasets are first anonymized using various techniques and then a small sample is released so as to enable “do-it-yourself” calculations. This paper investigates the privacy of the second step of this process: sampling. We observe that rare values (values that occur with low frequency in the table) can be problematic from a privacy perspective. To our knowledge, this is the first work that quantitatively examines the relationship between the number of rare values in a table and the privacy in a released random sample. If we require ɛ-privacy (where the larger ɛ is, the worse the privacy guarantee) with probability at least 1 − δ, we say that a value is rare if it occurs in at most Õ(1/ɛ) rows of the table (ignoring log factors). If there are no rare values, then we establish a direct connection between the sample size that is safe to release and privacy. Specifically, if we select each row of the table with probability at most ɛ, then the sample is O(ɛ)-private with high probability. In the case that there are t rare values, the sample is Õ(ɛδ/t)-private with probability at least 1 − δ.
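The two ingredients of this analysis, independent row sampling and the notion of a rare value, can be sketched in a few lines. In this minimal sketch the rarity threshold ceil(1/ɛ) is an assumption that drops the constants and log factors hidden in Õ(1/ɛ); it is illustrative, not the paper's exact cutoff, and the function names are hypothetical. The code only identifies rare values and draws a sample; it does not measure privacy.

```python
import math
import random
from collections import Counter


def bernoulli_sample(rows, p, seed=None):
    """Keep each row independently with probability p (p plays the role of ɛ)."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p]


def rare_values(column, eps):
    """Values occurring in at most ceil(1/eps) rows.

    Assumption: ceil(1/eps) stands in for the Õ(1/ɛ) bound, with constants
    and log factors dropped.
    """
    threshold = math.ceil(1 / eps)
    counts = Counter(column)
    return {value for value, count in counts.items() if count <= threshold}


column = ["A"] * 500 + ["B"] * 40 + ["C"] * 3
print(rare_values(column, eps=0.1))  # {'C'}
sample = bernoulli_sample(column, p=0.1, seed=1)
```

If the returned set is empty, the abstract's first result (sampling with probability at most ɛ gives O(ɛ)-privacy with high probability) is the relevant regime; otherwise the bound degrades with the number t of rare values.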
Horizon effects with surface waves on moving water
Surface waves on a stationary flow of water are considered, in a linear model
that includes the surface tension of the fluid. The resulting gravity-capillary
waves experience a rich array of horizon effects when propagating against the
flow. In some cases three horizons (points where the group velocity of the wave
reverses) exist for waves with a single laboratory frequency. Some of these
effects are familiar in fluid mechanics under the name of wave blocking, but
other aspects, in particular waves with negative co-moving frequency and the
Hawking effect, were overlooked until surface waves were investigated as
examples of analogue gravity [Schützhold R and Unruh W G 2002 Phys. Rev. D 66
044019]. A comprehensive presentation of the various horizon effects for
gravity-capillary waves is given, with emphasis on the deep water/short
wavelength case kh ≫ 1, where many analytical results can be derived. A
similarity of the state space of the waves to that of a thermodynamic system is
pointed out.
Comment: 30 pages, 15 figures. Minor change
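Wave blocking, the simplest of these horizon effects, can be explored numerically. The sketch below assumes the standard linear deep-water (kh ≫ 1) dispersion relation for gravity-capillary waves on a uniform current U, ω = Uk + √(gk + (σ/ρ)k³), keeps only the positive co-moving frequency branch with k > 0, and uses assumed values g = 9.81 m/s² and σ/ρ = 7.28×10⁻⁵ m³/s² for clean water. It scans k for points where the lab-frame group velocity dω/dk vanishes; it does not reproduce the paper's treatment of spatially varying flows, negative-frequency modes, or the Hawking effect.

```python
import math

G = 9.81                  # gravitational acceleration, m/s^2
SIGMA_OVER_RHO = 7.28e-5  # surface tension / density, clean water, m^3/s^2 (assumed)


def lab_group_velocity(k, U, g=G, s=SIGMA_OVER_RHO):
    """d(omega)/dk for omega = U*k + sqrt(g*k + s*k**3) (deep water, k > 0)."""
    return U + (g + 3 * s * k**2) / (2 * math.sqrt(g * k + s * k**3))


def horizon_wavenumbers(U, k_min=0.1, k_max=1e5, n=4000, iters=60):
    """Wavenumbers where the lab-frame group velocity changes sign (blocking points)."""
    ks = [k_min * (k_max / k_min) ** (i / (n - 1)) for i in range(n)]
    roots = []
    for a, b in zip(ks, ks[1:]):
        if lab_group_velocity(a, U) * lab_group_velocity(b, U) < 0:
            lo, hi = a, b
            for _ in range(iters):  # bisection on the bracketing interval
                mid = 0.5 * (lo + hi)
                if lab_group_velocity(lo, U) * lab_group_velocity(mid, U) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


# Counter-current of 0.3 m/s (U < 0 opposes waves travelling in +x).
print(horizon_wavenumbers(-0.3))
```

For a counter-current faster than the minimum intrinsic group velocity of gravity-capillary waves (roughly 0.18 m/s for clean water), two blocking wavenumbers appear, one on the gravity side and one on the capillary side; for slower currents there are none.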
Influence of atmospheric conditions on the power production of utility-scale wind turbines in yaw misalignment
The intentional yaw misalignment of leading, upwind turbines in a wind farm, termed wake steering, has demonstrated potential as a collective control approach for wind farm power maximization. The optimal control strategy and the resulting effect of wake steering on wind farm power production are in part dictated by the power degradation of the upwind yaw misaligned wind turbines. In the atmospheric boundary layer, the wind speed and direction may vary significantly over the wind turbine rotor area, depending on atmospheric conditions and stability, resulting in freestream turbine power production which is asymmetric as a function of the direction of yaw misalignment and which varies during the diurnal cycle. In this study, we propose a model for the power production of a wind turbine in yaw misalignment based on aerodynamic blade elements, which incorporates the effects of wind speed and direction changes over the turbine rotor area in yaw misalignment. The proposed model can be used for the modeling of the angular velocity, aerodynamic torque, and power production of an arbitrary yaw misaligned wind turbine based on the incident velocity profile, wind turbine aerodynamic properties, and turbine control system. A field experiment is performed using multiple utility-scale wind turbines to characterize the power production of yawed freestream operating turbines depending on the wind conditions, and the model is validated using the experimental data. The resulting power production of a yaw misaligned variable speed wind turbine depends on a nonlinear interaction between the yaw misalignment, the atmospheric conditions, and the wind turbine control system
Comment: 37 pages, 15 figure
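The asymmetry described above comes from wind speed and direction varying over the rotor. The sketch below is only a crude proxy that illustrates this effect; it is not the blade-element model proposed in the paper. It assumes a power-law shear profile, a linear veer with height, and power proportional to the rotor-averaged cube of the rotor-normal wind speed. All parameter values, the yaw sign convention, and the function name are hypothetical, and the control system is ignored.

```python
import numpy as np


def rotor_power_proxy(yaw_deg, hub_height=90.0, radius=60.0, u_hub=8.0,
                      shear_alpha=0.2, veer_deg_per_m=0.1, n=201):
    """Rotor-averaged cube of the rotor-normal wind speed (arbitrary units).

    Crude proxy, not a blade-element model: power-law shear, linear veer,
    power taken as proportional to (u * cos(yaw - wind direction))**3.
    """
    # Square grid over the rotor plane, masked to the circular disk.
    y = np.linspace(-radius, radius, n)
    z = np.linspace(-radius, radius, n)
    Y, Z = np.meshgrid(y, z)
    dz = Z[Y**2 + Z**2 <= radius**2]  # vertical offsets from hub height

    height = hub_height + dz
    speed = u_hub * (height / hub_height) ** shear_alpha  # power-law shear
    wind_dir = np.deg2rad(veer_deg_per_m * dz)             # linear veer
    normal = speed * np.cos(np.deg2rad(yaw_deg) - wind_dir)
    return np.mean(normal ** 3)


for yaw in (-20, 0, 20):
    print(yaw, rotor_power_proxy(yaw) / rotor_power_proxy(0))
```

Without veer the proxy is symmetric in the yaw direction; with both shear and veer, the faster upper part of the rotor favors one yaw direction, which is qualitatively the asymmetry the abstract reports.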
Bulk and surface switching in Mn-Fe-based Prussian Blue Analogues
Many Prussian Blue Analogues are known to show a thermally induced phase
transition close to room temperature and a reversible, photo-induced phase
transition at low temperatures. This work reports on magnetic measurements,
X-ray photoemission and Raman spectroscopy on a particular class of these
molecular heterobimetallic systems, specifically on
Rb0.81Mn[Fe(CN)6]0.95·1.24H2O, Rb0.97Mn[Fe(CN)6]0.98·1.03H2O and
Rb0.70Cu0.22Mn0.78[Fe(CN)6]0.86·2.05H2O, to investigate these transition
phenomena both in the bulk of the material and at the sample surface. Results
indicate a high degree of charge transfer in the bulk, while a substantially
reduced conversion is found at the sample surface, even in the case of a
near-perfect (Rb:Mn:Fe=1:1:1) stoichiometry. Thus, the intrinsic incompleteness of
the charge transfer transition in these materials is found to be primarily due
to surface reconstruction. Substituting a large fraction of the
charge-transfer-active Mn ions with charge-transfer-inactive Cu ions leads to a
proportional conversion reduction with respect to the maximum conversion that
is still stoichiometrically possible and shows the charge transfer capability
of the metal centers to be quite robust upon inclusion of a neighboring impurity.
Additionally, a 532 nm photo-induced metastable state, reminiscent of the high
temperature Fe(III)Mn(II) ground state, is found at temperatures of 50-100 K. The
efficiency of photo-excitation to the metastable state is found to be maximized
around 90 K. The photo-induced state is observed to relax to the low
temperature Fe(II)Mn(III) ground state at a temperature of approximately 123 K.
Comment: 12 pages, 8 figure
- …