60 research outputs found

    Object-oriented query language facilitating construction of new objects

    In object-oriented database systems, messages can be used to manipulate the database; however, a query language remains a required component of any database system. In this paper, we describe a query language for object-oriented databases that handles both objects and the behaviour defined in them. Not only are existing objects manipulated; the language also facilitates the introduction of new relationships and of new objects constructed from existing ones. The operations it supports subsume those of the relational algebra, the aim being a query language more powerful than the relational algebra. Among the additional operators is one that applies an aggregate function to the objects in an operand while the result still possesses the characteristics of an operand. Both the result of a query and its operands are considered to be pairs of sets: a set of objects and a set of message expressions, where a message expression is a sequence of messages. A message expression handles both stored and derived values and hence provides full computational power without an embedded query language and its impedance mismatch. The closure property is therefore maintained: the result of a query possesses the characteristics of an operand. Furthermore, we define a set of objects and derive a set of message expressions for every class, so any class can be an operand. Moreover, the result of a query has the characteristics of a class, and its superclass/subclass relationships with the operands are established to make it persistent. © 1993
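The operand structure the abstract describes can be sketched informally. This is a minimal Python illustration, not the paper's formal algebra; the class and method names (`Operand`, `send`, `Employee`) are assumptions made for the example. It shows an operand as a pair (objects, message expressions), a message expression as a sequence of messages applied left to right, and a selection operator whose result is itself an operand (the closure property).

```python
# Illustrative sketch only: an operand is a pair (objects, message
# expressions); a message expression is a sequence of message names.

def send(obj, message_expression):
    """Apply a message expression -- a sequence of messages -- to an object."""
    result = obj
    for msg in message_expression:
        attr = getattr(result, msg)
        result = attr() if callable(attr) else attr  # stored or derived value
    return result

class Operand:
    """A query operand/result: a set of objects plus message expressions."""
    def __init__(self, objects, message_expressions):
        self.objects = list(objects)
        self.message_expressions = list(message_expressions)

    def select(self, predicate):
        # Selection returns another Operand, so the result keeps the
        # characteristics of an operand (closure property).
        return Operand([o for o in self.objects if predicate(o)],
                       self.message_expressions)

class Employee:
    def __init__(self, name, salary):
        self.name, self.salary = name, salary
    def monthly(self):           # derived value
        return self.salary / 12

staff = Operand([Employee("a", 24000), Employee("b", 60000)],
                [("name",), ("salary",), ("monthly",)])
rich = staff.select(lambda e: e.salary > 30000)
print([send(o, ("monthly",)) for o in rich.objects])  # [5000.0]
```

Note how `send` handles both stored attributes and derived (computed) values uniformly, which is the point of message expressions in the abstract.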

    BreCaHAD: A dataset for breast cancer histopathological annotation and diagnosis

    Objectives: Histopathological tissue analysis by a pathologist determines the diagnosis and prognosis of most tumors, such as breast cancer. To estimate the aggressiveness of cancer, a pathologist evaluates the microscopic appearance of a biopsied tissue sample based on morphological features which have been correlated with patient outcome. Data description: This paper introduces a dataset of 162 breast cancer histopathology images, namely the breast cancer histopathological annotation and diagnosis dataset (BreCaHAD), which allows researchers to optimize and evaluate the usefulness of their proposed methods. The dataset includes various malignant cases. The task associated with this dataset is to automatically classify histological structures in these hematoxylin and eosin (H&E) stained images into six classes, namely mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, and non-tubule. By providing this dataset to the biomedical imaging community, we hope to encourage researchers in computer vision, machine learning and medical fields to contribute and develop methods/tools for automatic detection and diagnosis of cancerous regions in breast cancer histology images. © 2019 The Author(s)
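The abstract defines a six-class structure-detection task but does not specify an evaluation protocol. The sketch below is a common hypothetical protocol for such annotation datasets, not BreCaHAD's official metric: a prediction counts as a true positive when it carries the right class and lies within `radius` pixels of a not-yet-matched ground-truth annotation, and detection quality is summarized by F1.

```python
import math

# Hypothetical evaluation sketch (not BreCaHAD's official metric): match
# predicted structure locations to ground-truth annotations by class and
# pixel distance, then compute precision/recall/F1.

CLASSES = {"mitosis", "apoptosis", "tumor_nuclei",
           "non_tumor_nuclei", "tubule", "non_tubule"}

def detection_f1(truth, preds, radius=15.0):
    """truth, preds: lists of (x, y, class_name) detections."""
    unmatched = list(truth)
    tp = 0
    for x, y, cls in preds:
        for cand in unmatched:
            tx, ty, tcls = cand
            if cls == tcls and math.hypot(x - tx, y - ty) <= radius:
                unmatched.remove(cand)   # each annotation matches at most once
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truth) if truth else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

truth = [(10, 10, "mitosis"), (50, 50, "tubule")]
preds = [(12, 11, "mitosis"), (90, 90, "tubule")]
print(detection_f1(truth, preds))  # 0.5
```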

    Deepfield connect, an innovative decision support system for crops irrigation management under Mediterranean conditions

    Irrigation management in the Mediterranean region is an important technique for reaching sustainable yields and improving crop quality. The use of decision support systems and water-saving techniques has gained importance during the last decades, mainly in arid and semi-arid countries where water is considered a precious resource. DeepField Connect by BOSCH is an innovative tool able to support farmers in irrigation management and consists of three main parts: hardware (sensors, device-to-web data logger and thermo-hygrometer), an algorithm, and a graphical user interface (app). The system is based on GIS analysis, which represents the most innovative and functional tool for such studies and provides a mapping of soil hydrological characteristics at the regional level. As a reference, we used soil data analysis obtained at the regional level from the ACLA II Project. In this way, the system creates an interactive mapping system, matching each point of the Apulian surface to the texture composition of the soil and the values of the hydrological constants (wilting point, WP, and field capacity, FC), for irrigation planning. These data are integrated with the recharging point (RP), a value calculated for the main regional irrigated crops, which represents the level of soil moisture that, together with FC, defines the range of plant-available water. In addition, the tool provides different irrigation strategies, such as deficit irrigation or complete restitution of evapotranspiration losses, according to farmer needs. DeepField Connect transmits the data via the Bosch Cloud to the smartphone. This allows farmers to keep track of their fields at any given time and provides assistance on when to irrigate and which irrigation volumes to use.
    This intelligent system can be considered an application of one of the best practices the agricultural sector can implement to improve its environmental performance and contribute to sustainable food production
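The decision rule implied by the hydrological constants above can be sketched numerically. This is a minimal illustration, not Bosch's actual algorithm; the function name, parameters, and thresholds are assumptions for the example. Soil moisture is kept between the recharging point (RP) and field capacity (FC), and an irrigation event refills the depletion from FC, either fully or partially under a deficit strategy.

```python
# Illustrative sketch only (not the DeepField Connect algorithm): decide
# whether to irrigate and how much, from the hydrological constants.
# moisture, wp, rp, fc are volumetric water contents (m3/m3); WP is the
# wilting point, below which water is unavailable to the plant.

def irrigation_advice(moisture, wp, rp, fc, root_depth_mm, deficit=1.0):
    """Return (irrigate?, gross irrigation depth in mm).

    deficit < 1.0 implements a deficit-irrigation strategy by refilling
    only a fraction of the depletion from field capacity.
    """
    assert wp < rp < fc, "hydrological constants must satisfy WP < RP < FC"
    if moisture > rp:
        return False, 0.0        # still inside the readily available range
    depletion_mm = (fc - moisture) * root_depth_mm   # water needed to reach FC
    return True, deficit * depletion_mm

# Example: 400 mm root zone, soil moisture has dropped to the RP.
print(irrigation_advice(0.18, wp=0.10, rp=0.18, fc=0.30, root_depth_mm=400))
```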

    A Dynamic Ontology Mapping Architecture for a Grid Database System

    Abstract — Most large-scale heterogeneous distributed computing systems, such as Grids, rely on Service Oriented Architectures (SOA) to interact with other systems on different platforms and in different computing languages. However, we still need to solve the semantic heterogeneity problem of data: data from different systems must be interpreted in semantically related ways. Ontologies are the most common and well-accepted methodology for handling this problem at multiple levels of granularity across different systems. Nevertheless, using ontologies in a dynamic environment such as a Grid to share common concepts is still a challenge. It is difficult to keep a static mapping between ontologies; the corresponding semantic mapping changes must occur consistently. Therefore, we adopt the concept of Tuple Space and propose a flexible approach for managing ontologies in a Grid. It enables systems and users to interoperate semantically and dynamically by sharing and managing concepts and semantic ontology mappings in a flexible way.
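The Tuple Space idea the abstract adopts can be sketched in a few lines. This is an illustrative toy, not the paper's architecture; real tuple spaces (Linda, JavaSpaces) add blocking reads, leases, and transactions. A mapping is published as a tuple (source ontology, source concept, target ontology, target concept), any node can look it up with a wildcard pattern, and a stale mapping can be taken out and replaced, which is what makes the mapping dynamic.

```python
# Minimal tuple-space sketch for ontology mappings (illustrative only).

class TupleSpace:
    def __init__(self):
        self._tuples = []

    def write(self, tup):
        self._tuples.append(tup)

    def _matches(self, tup, pattern):
        # None in the pattern acts as a wildcard field.
        return len(tup) == len(pattern) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def read(self, pattern):
        """Return the first matching tuple without removing it."""
        return next((t for t in self._tuples if self._matches(t, pattern)), None)

    def take(self, pattern):
        """Remove and return the first matching tuple (used when a mapping changes)."""
        t = self.read(pattern)
        if t is not None:
            self._tuples.remove(t)
        return t

space = TupleSpace()
space.write(("ontoA", "Author", "ontoB", "Writer"))
# A node asks: what does ontoA's "Author" map to in ontoB?
print(space.read(("ontoA", "Author", "ontoB", None)))
# Dynamic remapping: take the stale tuple out, publish the new one.
space.take(("ontoA", "Author", "ontoB", None))
space.write(("ontoA", "Author", "ontoB", "Creator"))
```

Decoupling producers and consumers of mappings through the shared space is what lets Grid nodes update mappings without coordinating directly with every peer.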

    MCNN-LSTM: Combining CNN and LSTM to classify multi-class text in imbalanced news data

    Searching, retrieving, and arranging text in ever-larger document collections necessitate more efficient information processing algorithms. Document categorization is a crucial component of various information processing systems for supervised learning. As the quantity of documents grows, the performance of classic supervised classifiers has deteriorated because of the number of document categories. Assigning documents to a predetermined set of classes is called text classification, and it is used extensively in a wide range of data-intensive applications. However, the fact that real-world implementations of these models are plagued with shortcomings begs for more investigation; in particular, imbalanced datasets hinder the most prevalent high-performance algorithms. In this paper, we propose an approach named multi-class Convolutional Neural Network (MCNN)-Long Short-Term Memory (LSTM), which combines two deep learning techniques, Convolutional Neural Networks (CNNs) and Long Short-Term Memory, for text classification on news data. The CNNs serve as feature extractors for the LSTMs on text input data and capture the spatial structure of words in a sentence, paragraph, or document. Because the dataset is imbalanced, we use the Tomek-Link algorithm to balance it before applying our model, which shows better performance in terms of F1-score (98%) and accuracy (99.71%) than existing works. The combination of deep learning techniques used in our approach is well suited to the classification of imbalanced datasets with underrepresented categories; hence, our method outperformed other machine learning algorithms in text classification by a large margin. We also compare our results with traditional machine learning algorithms on both imbalanced and balanced datasets
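The Tomek-Link balancing step named above can be sketched from its textbook definition (this is not the paper's code, and the function name is an assumption): two samples of opposite classes form a Tomek link when each is the other's nearest neighbour, and the majority-class member of every such link is removed, cleaning the class boundary before training.

```python
import math

# Tomek-link undersampling sketch, written from the definition
# (illustrative; libraries such as imbalanced-learn provide a tuned version).

def tomek_link_filter(X, y, majority_label):
    """X: list of feature vectors; y: parallel list of class labels."""
    n = len(X)
    # Index of each sample's nearest neighbour under Euclidean distance.
    nn = [min((j for j in range(n) if j != i),
              key=lambda j: math.dist(X[i], X[j]))
          for i in range(n)]
    drop = set()
    for i in range(n):
        j = nn[i]
        if nn[j] == i and y[i] != y[j]:        # mutual NNs, opposite classes
            drop.add(i if y[i] == majority_label else j)
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy 1-D example: majority class 0 has a point intruding on class 1.
X = [[0.0], [1.0], [5.0], [5.1], [6.0]]
y = [0, 0, 0, 1, 1]
Xb, yb = tomek_link_filter(X, y, majority_label=0)
print(yb)  # [0, 0, 1, 1] -- the boundary point at 5.0 was dropped
```

In the paper's pipeline, the balanced `(Xb, yb)` would then feed the CNN-LSTM classifier.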

    Model inference for spreadsheets

    Many errors in spreadsheet formulas can be avoided if spreadsheets are built automatically from higher-level models that can encode and enforce consistency constraints in the generated spreadsheets. Employing this strategy for legacy spreadsheets is difficult, because the model has to be reverse engineered from an existing spreadsheet and existing data must be transferred into the new model-generated spreadsheet. We have developed and implemented a technique that automatically infers relational schemas from spreadsheets. This technique uses particularities of the spreadsheet realm to create better schemas. We have evaluated this technique in two ways: first, we have demonstrated its applicability by using it on a set of real-world spreadsheets; second, we have run an empirical study with users. The study has shown that the results produced by our technique are comparable to those developed by experts starting from the same (legacy) spreadsheet data. Although relational schemas are very useful for modeling data, they do not fit spreadsheets well, as they do not allow layout to be expressed. Thus, we have also introduced a mapping between relational schemas and ClassSheets. A ClassSheet controls further changes to the spreadsheet and safeguards it against a large class of formula errors. The developed tool is a contribution to spreadsheet (reverse) engineering, because it fills an important gap and allows a promising design method (ClassSheets) to be applied to a huge collection of legacy spreadsheets with minimal effort. We would like to thank Orlando Belo for his help in running and analyzing the empirical study, Paulo Azevedo for his help in conducting the statistical analysis of our empirical study, and the anonymous reviewers for their suggestions, which helped us to improve the paper.
    This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-010048. The first author was also supported by FCT grant SFRH/BPD/73358/2010.
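One kind of spreadsheet particularity the abstract alludes to can be illustrated with a toy schema-inference pass. This is not the authors' algorithm; the heuristics and names (`infer_schema`, `candidate_key`) are assumptions for the example: the first row is treated as column labels, column types are guessed from the data cells, and a column whose values are all distinct becomes a candidate key of the inferred relation.

```python
# Illustrative schema-inference sketch (not the paper's technique).

def infer_schema(rows):
    """rows: list of spreadsheet rows, each a list of cell strings."""
    header, data = rows[0], rows[1:]

    def column_type(values):
        try:
            for v in values:
                float(v)             # every cell parses as a number?
            return "NUMBER"
        except ValueError:
            return "TEXT"

    schema = []
    for c, name in enumerate(header):
        values = [row[c] for row in data]
        schema.append({
            "name": name,
            "type": column_type(values),
            # All-distinct values suggest a candidate key for the relation.
            "candidate_key": len(set(values)) == len(values),
        })
    return schema

rows = [["id", "product", "price"],
        ["1", "apple", "0.40"],
        ["2", "pear", "0.40"]]
schema = infer_schema(rows)
print(schema)
```

A real tool would refine this with formula analysis and layout cues before generating the model-level (e.g. ClassSheet) description.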

    Representative transcript sets for evaluating a translational initiation sites predictor

    Background: Translational initiation site (TIS) prediction is a very important and actively studied topic in bioinformatics. In order to complete a comparative analysis, it is desirable to have several benchmark data sets which can be used to test the effectiveness of different algorithms. An ideal benchmark data set should be reliable, representative and readily available. Preferably, proteins encoded by members of the data set should also be representative of the protein population actually expressed in cellular specimens.
    Results: In this paper, we report a general algorithm for constructing a reliable sequence collection that only includes mRNA sequences whose corresponding protein products present an average profile of the general protein population of a given organism with respect to three major structural parameters. Four representative transcript collections, each derived from a model organism, have been obtained following the proposed algorithm. Evaluation of these data sets shows that they are reasonable representations of the spectrum of proteins obtained from cellular proteomic studies. Six state-of-the-art predictors have been used to test the usefulness of the construction algorithm. A comparative study reporting the predictors' performance on our data sets as well as on three other existing benchmark collections demonstrates the merits of our data sets as benchmark testing collections.
    Conclusion: The proposed data set construction algorithm has demonstrated that it is a general and widely applicable scheme. Our comparison with published proteomic studies has shown that the expression of our data set of transcripts generates a polypeptide population representative of that obtained from evaluation of biological specimens. Our data sets thus represent "real world" transcripts that allow more accurate evaluation of algorithms dedicated to the identification of TISs, as well as of other translational regulatory motifs within mRNA sequences. The algorithm we propose aims at compiling a redundancy-free data set by removing redundant copies of homologous proteins; such data sets may be useful for conducting statistical analyses of protein sequence-structure relations. At the current stage, our approach focuses on obtaining an "average" protein data set for any particular organism without introducing much selection bias. However, with the three major protein structural parameters deeply integrated into the scheme, it would be a trivial task to extend the current method to obtain a more selective protein data set, which may facilitate the study of particular protein structures.
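The selection idea can be sketched as a filter that keeps only transcripts whose protein products are close to the population average on every structural parameter. The abstract does not name the three parameters or the exact thresholds, so the parameter name "length" and the one-standard-deviation window below are illustrative assumptions, not the paper's settings.

```python
import statistics

# Hedged sketch of "average profile" selection: keep proteins within
# k standard deviations of the population mean on every parameter.

def representative_subset(proteins, parameters, k=1.0):
    """proteins: list of dicts holding a numeric value per parameter name."""
    stats = {}
    for p in parameters:
        values = [protein[p] for protein in proteins]
        stats[p] = (statistics.mean(values), statistics.stdev(values))

    def average_like(protein):
        return all(abs(protein[p] - mean) <= k * sd
                   for p, (mean, sd) in stats.items())

    return [protein for protein in proteins if average_like(protein)]

# Toy population: three typical proteins and one outlier.
proteins = [{"length": 100}, {"length": 110}, {"length": 105}, {"length": 500}]
kept = representative_subset(proteins, ["length"])
print([p["length"] for p in kept])  # [100, 110, 105]
```

With the paper's three structural parameters plugged in as dictionary keys, the same filter would yield the "average" protein set described above; tightening `k` would give the more selective data set mentioned in the conclusion.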