A systematic review of data quality issues in knowledge discovery tasks
Data volumes are growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract knowledge useful for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.
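A basic data-quality audit of the kind such a review would survey can be sketched as follows. This is a minimal illustration, assuming records arrive as a list of field-to-value dicts; the function name, field names, and sample values (loosely themed on coffee-rust plot data) are all hypothetical.

```python
def quality_report(records, fields):
    """Count missing values per field and exact-duplicate records."""
    missing = {f: 0 for f in fields}
    for r in records:
        for f in fields:
            if r.get(f) is None:
                missing[f] += 1
    seen, duplicates = set(), 0
    for r in records:
        key = tuple(r.get(f) for f in fields)
        if key in seen:          # record already seen: count as duplicate
            duplicates += 1
        seen.add(key)
    return {"missing": missing, "duplicates": duplicates}

# Illustrative records: one missing humidity value, one duplicated row.
records = [
    {"plot": 1, "humidity": 0.81, "rust_incidence": 0.10},
    {"plot": 2, "humidity": None, "rust_incidence": 0.25},
    {"plot": 1, "humidity": 0.81, "rust_incidence": 0.10},
]
report = quality_report(records, ["plot", "humidity", "rust_incidence"])
```

Such a report covers only two of the quality dimensions (completeness and uniqueness) that knowledge discovery pipelines typically check before mining.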
Enhanced imaging of microcalcifications in digital breast tomosynthesis through improved image-reconstruction algorithms
PURPOSE: We develop a practical, iterative algorithm for image-reconstruction
in under-sampled tomographic systems, such as digital breast tomosynthesis
(DBT).
METHOD: The algorithm controls image regularity by minimizing the image total
p-variation (TpV), a function that reduces to the total variation when p = 1
or to the image roughness when p = 2. Constraints on the image, such as
image positivity and estimated projection-data tolerance, are enforced by
projection onto convex sets (POCS). The fact that the tomographic system is
under-sampled translates to the mathematical property that many widely varied
resultant volumes may correspond to a given data tolerance. Thus the
application of image regularity serves two purposes: (1) reduction of the
number of resultant volumes out of those allowed by fixing the data tolerance,
finding the minimum image TpV for fixed data tolerance, and (2) traditional
regularization, sacrificing data fidelity for higher image regularity. The
present algorithm allows for this dual role of image regularity in
under-sampled tomography.
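The alternating scheme the abstract describes (convex-set projections for positivity and data tolerance, plus descent on the image TpV) can be illustrated on a toy 1D under-sampled system. This is a hedged sketch, not the paper's clinical DBT algorithm: the system matrix, phantom, step sizes, and function names are all illustrative assumptions.

```python
import numpy as np

def tpv_grad(x, p=1.0, eps=1e-6):
    """Gradient of the smoothed TpV sum((d_i^2 + eps)^(p/2)), d_i = x[i+1]-x[i]."""
    d = np.diff(x)
    w = p * (d * d + eps) ** (p / 2 - 1) * d
    g = np.zeros_like(x)
    g[:-1] -= w
    g[1:] += w
    return g

def reconstruct(A, b, tol, n_iter=500, beta=0.1, alpha=0.005):
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        r = A @ x - b
        if np.linalg.norm(r) > tol:          # enforce data tolerance ||Ax-b|| <= tol
            x = x - beta * (A.T @ r)         # gradient step toward the data set
        x = np.maximum(x, 0.0)               # POCS: positivity projection
        x = x - alpha * tpv_grad(x, p=1.0)   # reduce TpV among near-feasible images
        x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
x_true = np.zeros(30)
x_true[10:20] = 1.0                           # piecewise-constant phantom
A = rng.normal(size=(15, 30)) / np.sqrt(15)   # under-sampled: 15 measurements, 30 unknowns
b = A @ x_true
x_rec = reconstruct(A, b, tol=1e-3)
```

With p = 1 the TpV term plays the dual role noted above: it selects among the many volumes consistent with the data tolerance, and trades data fidelity for regularity.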
RESULTS: The proposed image-reconstruction algorithm is applied to three
clinical DBT data sets. The DBT cases include one with microcalcifications and
two with masses.
CONCLUSION: Results indicate that there may be a substantial advantage in
using the present image-reconstruction algorithm for microcalcification
imaging.
Comment: Submitted to Medical Physics
Tools for producing formal specifications : a view of current architectures and future directions
During the last decade, one important contribution towards requirements engineering has been the advent of formal specification languages. They offer a well-defined notation that can improve consistency and avoid ambiguity in specifications.
However, the process of obtaining formal specifications that are consistent with the requirements is itself a difficult activity. Hence various researchers are developing systems that aid the transition from informal to formal specifications.
The kind of problems tackled and the contributions made by these proposed systems are very diverse. This paper brings these studies together to provide a vision for future architectures that aim to aid the transition from informal to formal specifications. The new architecture, which is based on the strengths of existing studies, tackles a
number of key issues in requirements engineering such as identifying ambiguities, incompleteness, and reusability.
The paper concludes with a discussion of the research problems that need to be addressed in order to realise the proposed architecture.
OpenJML: Software verification for Java 7 using JML, OpenJDK, and Eclipse
OpenJML is a tool for checking code and specifications of Java programs. We
describe our experience building the tool on the foundation of JML, OpenJDK and
Eclipse, as well as on many advances in specification-based software
verification. The implementation demonstrates the value of integrating
specification tools directly in the software development IDE and in automating
as many tasks as possible. The tool, though still in progress, has now been
used for several college-level courses on software specification and
verification and for small-scale studies on existing Java programs.
Comment: In Proceedings F-IDE 2014, arXiv:1404.578
Testing the binary hypothesis: pulsar timing constraints on supermassive black hole binary candidates
The advent of time domain astronomy is revolutionizing our understanding of
the Universe. Programs such as the Catalina Real-time Transient Survey (CRTS)
or the Palomar Transient Factory (PTF) surveyed millions of objects for several
years, allowing variability studies on large statistical samples. The
inspection of 250k quasars in CRTS resulted in a catalogue of 111
potentially periodic sources, put forward as supermassive black hole binary
(SMBHB) candidates. A similar investigation on PTF data yielded 33 candidates
from a sample of 35k quasars. Working under the SMBHB hypothesis, we
compute the implied SMBHB merger rate and we use it to construct the expected
gravitational wave background (GWB) at nano-Hz frequencies, probed by pulsar
timing arrays (PTAs). After correcting for incompleteness and assuming virial
mass estimates, we find that the GWB implied by the CRTS sample exceeds the
current most stringent PTA upper limits by almost an order of magnitude. After
further correcting for the implicit bias in virial mass measurements, the
implied GWB drops significantly but is still in tension with the most stringent
PTA upper limits. Similar results hold for the PTF sample. Bayesian model
selection shows that the null hypothesis (whereby the candidates are false
positives) is preferred over the binary hypothesis for both the CRTS and PTF
samples. Although not decisive,
our analysis highlights the potential of PTAs as astrophysical probes of
individual SMBHB candidates and indicates that the CRTS and PTF samples are
likely contaminated by several false positives.
Comment: 14 pages, 11 figures, 3 tables. Resubmitted to the Astrophysical
Journal after some major revision of the results including a proper estimate
of the intrinsic mass of the binary candidate
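The comparison the abstract makes, between an implied gravitational wave background and a PTA upper limit, rests on the standard power-law characteristic strain for a GW-driven SMBHB population, h_c(f) = A (f/f_yr)^(-2/3). A minimal sketch of that comparison follows; the amplitude values are placeholders for illustration, not the paper's numbers.

```python
F_YR = 1.0 / (365.25 * 24 * 3600)  # reference frequency of 1/yr, in Hz

def h_c(f_hz, amp_at_fyr):
    """Characteristic strain of a -2/3 power-law GWB with amplitude A at f = 1/yr."""
    return amp_at_fyr * (f_hz / F_YR) ** (-2.0 / 3.0)

implied_amp = 5e-15   # hypothetical amplitude implied by a candidate sample
pta_limit = 1e-15     # hypothetical PTA upper limit at f = 1/yr
tension = implied_amp / pta_limit   # >1 means the sample is in tension with the limit
```

Because the spectrum rises toward lower frequencies, limits quoted at f = 1/yr translate directly into constraints on any candidate population normalized at that frequency.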
- …