8,046 research outputs found

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data in pursuit of better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    Enhanced imaging of microcalcifications in digital breast tomosynthesis through improved image-reconstruction algorithms

    PURPOSE: We develop a practical, iterative algorithm for image reconstruction in under-sampled tomographic systems, such as digital breast tomosynthesis (DBT). METHOD: The algorithm controls image regularity by minimizing the image total p-variation (TpV), a function that reduces to the total variation when p=1.0 or the image roughness when p=2.0. Constraints on the image, such as image positivity and estimated projection-data tolerance, are enforced by projection onto convex sets (POCS). The fact that the tomographic system is under-sampled translates to the mathematical property that many widely varied resultant volumes may correspond to a given data tolerance. The application of image regularity therefore serves two purposes: (1) reducing the number of resultant volumes allowed by a fixed data tolerance, by finding the minimum image TpV for that tolerance, and (2) traditional regularization, sacrificing data fidelity for higher image regularity. The present algorithm allows for this dual role of image regularity in under-sampled tomography. RESULTS: The proposed image-reconstruction algorithm is applied to three clinical DBT data sets: one with microcalcifications and two with masses. CONCLUSION: Results indicate that there may be a substantial advantage in using the present image-reconstruction algorithm for microcalcification imaging. Comment: Submitted to Medical Physics
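
    As a minimal sketch of the optimization the abstract describes, assuming the common definition of TpV as the \ell_p norm of the gradient-magnitude image (the paper's exact formulation may differ):

        \min_{f \ge 0} \; \mathrm{TpV}(f) = \Big( \sum_i \big| (\nabla f)_i \big|^{p} \Big)^{1/p}
        \quad \text{subject to} \quad \| A f - g \| \le \epsilon,

    where f is the reconstructed volume, A the DBT projection operator, g the measured projection data, and \epsilon the data tolerance. Setting p=1.0 recovers the total variation and p=2.0 the image roughness, matching the reductions stated above; POCS supplies the projections that keep f within the constraint sets.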

    Tools for producing formal specifications: a view of current architectures and future directions

    During the last decade, one important contribution to requirements engineering has been the advent of formal specification languages. They offer a well-defined notation that can improve consistency and avoid ambiguity in specifications. However, the process of obtaining formal specifications that are consistent with the requirements is itself a difficult activity. Hence, various researchers are developing systems that aid the transition from informal to formal specifications. The kinds of problems tackled and the contributions made by these proposed systems are very diverse. This paper brings these studies together to provide a vision for future architectures that aim to aid that transition. The new architecture, which builds on the strengths of existing studies, tackles a number of key issues in requirements engineering, such as identifying ambiguity, incompleteness, and reusability. The paper concludes with a discussion of the research problems that need to be addressed in order to realise the proposed architecture.
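
    To make the informal-to-formal transition concrete, here is a hypothetical example (not taken from the paper): the informal requirement "a withdrawal must never overdraw the account" rewritten as a machine-checkable contract in JML, with the Account class and its field invented for illustration.

        public class Account {
            private /*@ spec_public @*/ int balance;

            //@ requires 0 < amount && amount <= balance; // informal: never overdraw
            //@ ensures balance == \old(balance) - amount; // informal: the amount leaves the account
            public void withdraw(int amount) {
                balance -= amount;
            }
        }

    The formal clauses resolve the ambiguity of the informal phrasing: "never overdraw" becomes a precondition that a verifier can check at every call site, which is exactly the consistency gain the abstract attributes to formal notations.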

    OpenJML: Software verification for Java 7 using JML, OpenJDK, and Eclipse

    OpenJML is a tool for checking code and specifications of Java programs. We describe our experience building the tool on the foundation of JML, OpenJDK and Eclipse, as well as on many advances in specification-based software verification. The implementation demonstrates the value of integrating specification tools directly in the software development IDE and in automating as many tasks as possible. The tool, though still in progress, has now been used for several college-level courses on software specification and verification and for small-scale studies on existing Java programs. Comment: In Proceedings F-IDE 2014, arXiv:1404.578
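
    As a small, hypothetical illustration of the kind of check OpenJML automates (the SafeDiv class is invented for this sketch; invocation details vary by tool version):

        public class SafeDiv {
            //@ requires d != 0;
            //@ ensures \result == n / d;
            public static int div(int n, int d) {
                return n / d;
            }

            public static void main(String[] args) {
                int x = div(10, 0); // static checking flags this call: the precondition d != 0 cannot hold
            }
        }

    Checked statically against its JML contract, the call with d == 0 is reported as a precondition violation before the program is ever run, the kind of IDE-integrated feedback loop the abstract argues for.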

    A Scenario-Driven Approach to Trace Dependency Analysis


    Testing the binary hypothesis: pulsar timing constraints on supermassive black hole binary candidates

    The advent of time-domain astronomy is revolutionizing our understanding of the Universe. Programs such as the Catalina Real-time Transient Survey (CRTS) and the Palomar Transient Factory (PTF) surveyed millions of objects for several years, allowing variability studies on large statistical samples. The inspection of ≈250k quasars in CRTS resulted in a catalogue of 111 potentially periodic sources, put forward as supermassive black hole binary (SMBHB) candidates. A similar investigation of PTF data yielded 33 candidates from a sample of ≈35k quasars. Working under the SMBHB hypothesis, we compute the implied SMBHB merger rate and use it to construct the expected gravitational wave background (GWB) at nano-Hz frequencies, probed by pulsar timing arrays (PTAs). After correcting for incompleteness and assuming virial mass estimates, we find that the GWB implied by the CRTS sample exceeds the current most stringent PTA upper limits by almost an order of magnitude. After further correcting for the implicit bias in virial mass measurements, the implied GWB drops significantly but is still in tension with the most stringent PTA upper limits. Similar results hold for the PTF sample. Bayesian model selection shows that the null hypothesis (whereby the candidates are false positives) is preferred over the binary hypothesis at about 2.3σ and 3.6σ for the CRTS and PTF samples, respectively. Although not decisive, our analysis highlights the potential of PTAs as astrophysical probes of individual SMBHB candidates and indicates that the CRTS and PTF samples are likely contaminated by several false positives. Comment: 14 pages, 11 figures, 3 tables. Resubmitted to the Astrophysical Journal after some major revision of the results, including a proper estimate of the intrinsic mass of the binary candidate.
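
    For context on the PTA comparison, the expected background from a population of circular, gravitational-wave-driven SMBHBs takes the standard power-law form used throughout the PTA literature (a general result, not a formula quoted from this paper):

        h_c(f) = A_{\mathrm{yr}} \left( \frac{f}{1\,\mathrm{yr}^{-1}} \right)^{-2/3},

    so the merger rate implied by the candidate samples fixes the characteristic-strain amplitude A_{\mathrm{yr}}, which can then be compared directly against published PTA upper limits.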