DRONE : a tool to detect and repair directive defects in Java APIs documentation
Application programming interface (API) documentation is the official reference for an API. Defects in API documentation pose serious hurdles to comprehension and usage. In this paper, we present DRONE, a tool that can automatically detect directive defects in API documents and recommend repair solutions to fix them. In particular, DRONE focuses on four defect types related to parameter usage constraints. To achieve this, DRONE leverages techniques from static program analysis, natural language processing, and logic reasoning. The implementation is based on the Eclipse plug-in architecture, which provides an integrated user interface. Extensive experiments demonstrate the efficacy of the tool.
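The kind of parameter-constraint defect the abstract describes can be sketched with a minimal heuristic. This is an illustrative toy, not DRONE's implementation (which combines static analysis, NLP, and logic reasoning); the function name and the regex-based check are assumptions for demonstration only.

```python
import re

def detect_directive_defect(javadoc: str, method_body: str, param: str) -> bool:
    """Flag one kind of directive defect: the method enforces a non-null
    constraint on a parameter, but the Javadoc never documents it.
    (Toy heuristic only; not how DRONE actually works.)"""
    # Does the code enforce non-null for this parameter?
    enforces_null = bool(re.search(
        rf"if\s*\(\s*{param}\s*==\s*null\s*\)", method_body))
    # Does the documentation mention the null constraint at all?
    documents_null = "null" in javadoc.lower()
    return enforces_null and not documents_null

javadoc = "@param name the user's name"
body = "if (name == null) { throw new IllegalArgumentException(); }"
print(detect_directive_defect(javadoc, body, "name"))  # True: defect found
```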
A Two-Level Information Modelling Translation Methodology and Framework to Achieve Semantic Interoperability in Constrained GeoObservational Sensor Systems
As geographical observational data capture, storage and sharing technologies such as in situ remote monitoring systems and spatial data infrastructures evolve, the vision of a Digital Earth, first articulated by Al Gore in 1998 is getting ever closer. However, there are still many challenges and open research questions. For example, data quality, provenance and heterogeneity remain an issue due to the complexity of geo-spatial data and information representation.
Observational data are often inadequately semantically enriched by geo-observational information systems or spatial data infrastructures and so they often do not fully capture the true meaning of the associated datasets. Furthermore, data models underpinning these information systems are typically too rigid in their data representation to allow for the ever-changing and evolving nature of geo-spatial domain concepts. This impoverished approach to observational data representation reduces the ability of multi-disciplinary practitioners to share information in an interoperable and computable way.
The health domain experiences similar challenges with representing complex and evolving domain information concepts. Within any complex domain (such as Earth system science or health) two categories or levels of domain concepts exist. Those concepts that remain stable over a long period of time, and those concepts that are prone to change, as the domain knowledge evolves, and new discoveries are made. Health informaticians have developed a sophisticated two-level modelling systems design approach for electronic health documentation over many years, and with the use of archetypes, have shown how data, information, and knowledge interoperability among heterogenous systems can be achieved.
This research investigates whether two-level modelling can be translated from the health domain to the geo-spatial domain and applied to observing scenarios to achieve semantic interoperability within and between spatial data infrastructures, beyond what is possible with current state-of-the-art approaches.
A detailed review of state-of-the-art SDIs, geo-spatial standards and the two-level modelling methodology was performed. A cross-domain translation methodology was developed, and a proof-of-concept geo-spatial two-level modelling framework was defined and implemented. The Open Geospatial Consortium’s (OGC) Observations & Measurements (O&M) standard was re-profiled to aid investigation of the two-level information modelling approach. An evaluation of the method was undertaken using two specific use-case scenarios. Information modelling was performed using the two-level modelling method to show how existing historical ocean observing datasets can be expressed semantically and harmonized using two-level modelling. The flexibility of the approach was also investigated by applying the method to an air quality monitoring scenario using a technologically constrained monitoring sensor system.
This work has demonstrated that two-level modelling can be translated to the geo-spatial domain and then further developed for use within a constrained technological sensor system, using traditional wireless sensor networks, semantic web technologies and Internet of Things based technologies. Domain-specific evaluation results show that two-level modelling presents a viable approach to achieving semantic interoperability between constrained geo-observational sensor systems and spatial data infrastructures for ocean observing and city-based air quality observing scenarios. This has been demonstrated through the re-purposing of selected existing geo-spatial data models and standards. However, it was found that re-using existing standards requires careful ontological analysis per domain concept, and so caution is recommended in assuming the wider applicability of the approach.
While the benefits of adopting a two-level information modelling approach to geo-spatial information modelling are potentially great, translation to a new domain was found to be complex. The complexity of the approach was found to be a barrier to adoption, especially in commercial projects where standards implementation is low on implementation road maps and the perceived benefits of standards adherence are low. Arising from this work, a novel set of base software components, methods and fundamental geo-archetypes have been developed. However, during this work it was not possible to form the rich community of supporters required to fully validate the geo-archetypes. Therefore, the findings of this work are not exhaustive, and the archetype models produced are only indicative. The findings can be used as the basis to encourage further investigation and uptake of two-level modelling within the Earth system science and geo-spatial domains. Ultimately, this work recommends further development and evaluation of the approach, building on the positive results thus far and on the base software artefacts developed to support it.
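The two-level separation the abstract relies on can be sketched in a few lines: a stable reference model carries the data, while an archetype expresses the evolving domain constraints against it. All names below (the record shape, the archetype fields, the UCUM unit code) are hypothetical illustrations, not the thesis's actual geo-archetype framework.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Level 1: a stable, generic reference model for any observation."""
    observed_property: str
    value: float
    unit: str

# Level 2: an archetype constrains the reference model for one domain
# concept, and can evolve without changing the reference model itself.
sea_temperature_archetype = {
    "observed_property": {"SeaSurfaceTemperature"},
    "unit": {"Cel"},                 # UCUM code for degrees Celsius
    "value_range": (-5.0, 45.0),     # plausible physical bounds
}

def validate(obs: Observation, archetype: dict) -> bool:
    """Check a reference-model instance against an archetype's constraints."""
    lo, hi = archetype["value_range"]
    return (obs.observed_property in archetype["observed_property"]
            and obs.unit in archetype["unit"]
            and lo <= obs.value <= hi)

obs = Observation("SeaSurfaceTemperature", 18.2, "Cel")
print(validate(obs, sea_temperature_archetype))  # True
```

A new domain concept (say, air quality) would add a new archetype rather than a new data model, which is the interoperability argument the thesis makes.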
Heliophysics Event Knowledgebase for the Solar Dynamics Observatory and Beyond
The immense volume of data generated by the suite of instruments on SDO requires new tools for efficiently identifying and accessing the data most relevant to research investigations. We have developed the Heliophysics Events Knowledgebase (HEK) to fill this need. The HEK system combines automated data mining using feature-detection methods and high-performance visualization systems for data markup. In addition, web services and clients are provided for searching the resulting metadata, reviewing results, and efficiently accessing the data. We review these components and present examples of their use with SDO data.
Comment: 17 pages, 4 figures
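The metadata-search workflow the abstract describes can be sketched generically. The record shape below (event type code, start time, peak flux) is a loose hypothetical stand-in, not the actual HEK schema or its web-service API.

```python
from datetime import datetime

# Hypothetical event records, shaped loosely like event metadata
# (event type code, start time, peak flux); not the real HEK schema.
events = [
    {"event_type": "FL", "start": datetime(2011, 2, 15, 1, 44), "peak_flux": 2.2e-4},
    {"event_type": "AR", "start": datetime(2011, 2, 15, 0, 0), "peak_flux": None},
    {"event_type": "FL", "start": datetime(2011, 3, 7, 19, 43), "peak_flux": 3.7e-5},
]

def search(events, event_type, start, end):
    """Return events of one type whose start time falls in [start, end)."""
    return [e for e in events
            if e["event_type"] == event_type and start <= e["start"] < end]

flares = search(events, "FL", datetime(2011, 2, 1), datetime(2011, 3, 1))
print(len(flares))  # 1
```

In practice one would query the HEK through its web services (for example via a client library) rather than filter local records like this.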
On Using Machine Learning to Identify Knowledge in API Reference Documentation
Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge in 12 types, including knowledge about the Functionality, Structure, and Quality of an API. We study how well modern text classification approaches can automatically identify documentation containing specific knowledge types. We compared conventional machine learning (k-NN and SVM) and deep learning approaches trained on manually annotated Java and .NET API documentation (n = 5,574). When classifying the knowledge types individually (i.e., with multiple binary classifiers), the best AUPRC was up to 87%. The deep learning and SVM classifiers seem complementary: for four knowledge types (Concept, Control, Pattern, and Non-Information), SVM clearly outperforms deep learning, which, on the other hand, is more accurate for identifying the remaining types. When considering multiple knowledge types at once (i.e., multi-label classification), deep learning outperforms naïve baselines and traditional machine learning, achieving a MacroAUC of up to 79%. We also compared classifiers using embeddings pre-trained on generic text corpora and on StackOverflow but did not observe significant improvements. Finally, to assess the generalizability of the classifiers, we re-tested them on a different, unseen Python documentation dataset. Classifiers for Functionality, Concept, Purpose, Pattern, and Directive seem to generalize from Java and .NET to Python documentation. The accuracy for the remaining types seems API-specific. We discuss our results and how they inform the development of tools for supporting developers in sharing and accessing API knowledge.
Published article: https://doi.org/10.1145/3338906.333894
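The "multiple binary classifiers" setup the abstract contrasts with multi-label classification can be illustrated with a toy: one independent yes/no decision per knowledge type. The keyword rules below stand in for the SVM and deep learning models actually used, and the cue words are invented for illustration.

```python
# One binary "classifier" per knowledge type; together their independent
# decisions form a multi-label prediction. Keyword cues are hypothetical
# stand-ins for trained models.
KNOWLEDGE_TYPES = {
    "Functionality": {"returns", "computes", "performs"},
    "Directive": {"must", "should", "never"},
    "Non-Information": {"see", "above", "below"},
}

def classify(sentence: str) -> dict:
    """Run every binary classifier on one documentation sentence;
    each knowledge type gets an independent yes/no decision."""
    words = set(sentence.lower().split())
    return {ktype: bool(words & cues)
            for ktype, cues in KNOWLEDGE_TYPES.items()}

labels = classify("The argument must be positive; the method returns its square root.")
print(labels["Directive"])      # True
print(labels["Functionality"])  # True
```

A real multi-label model would instead predict all types jointly, which is the configuration where the paper reports deep learning doing best.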
The NASA Exoplanet Archive: Data and Tools for Exoplanet Research
We describe the contents and functionality of the NASA Exoplanet Archive, a database and tool set funded by NASA to support astronomers in the exoplanet community. The current content of the database includes interactive tables containing properties of all published exoplanets, Kepler planet candidates, threshold-crossing events, data validation reports and target stellar parameters, light curves from the Kepler and CoRoT missions and from several ground-based surveys, and spectra and radial velocity measurements from the literature. Tools provided to work with these data include a transit ephemeris predictor, both for single planets and for observing locations, light curve viewing and normalization utilities, and a periodogram and phased light curve service. The archive can be accessed at http://exoplanetarchive.ipac.caltech.edu.
Comment: Accepted for publication in the Publications of the Astronomical Society of the Pacific, 4 figures
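Beyond the interactive tables, the archive can also be queried programmatically. The sketch below builds a request URL for a Table Access Protocol (TAP) style endpoint; the endpoint path, table name (`ps`), and column names are assumptions for illustration only, so consult the archive's own API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed TAP-style endpoint; verify against the archive's API docs.
BASE = "https://exoplanetarchive.ipac.caltech.edu/TAP/sync"

def transit_query(min_radius: float) -> str:
    """Build a request URL selecting transiting planets larger than
    min_radius Earth radii (table and column names are assumptions)."""
    adql = ("select pl_name, pl_orbper, pl_rade from ps "
            f"where tran_flag = 1 and pl_rade > {min_radius}")
    return BASE + "?" + urlencode({"query": adql, "format": "csv"})

url = transit_query(2.0)
print(url.startswith(BASE))  # True
```

The returned URL can then be fetched with any HTTP client to retrieve the matching rows as CSV.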