5,896 research outputs found

    GOGGLES: Automatic Image Labeling with Affinity Coding

    Generating large labeled training data is becoming the biggest bottleneck in building and deploying supervised machine learning models. Recently, the data programming paradigm has been proposed to reduce the human cost in labeling training data. However, data programming relies on designing labeling functions which still requires significant domain expertise. Also, it is prohibitively difficult to write labeling functions for image datasets as it is hard to express domain knowledge using raw features for images (pixels). We propose affinity coding, a new domain-agnostic paradigm for automated training data labeling. The core premise of affinity coding is that the affinity scores of instance pairs belonging to the same class on average should be higher than those of pairs belonging to different classes, according to some affinity functions. We build the GOGGLES system that implements affinity coding for labeling image datasets by designing a novel set of reusable affinity functions for images, and propose a novel hierarchical generative model for class inference using a small development set. We compare GOGGLES with existing data programming systems on 5 image labeling tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a minimum of 71% to a maximum of 98% without requiring any extensive human annotation. In terms of end-to-end performance, GOGGLES outperforms the state-of-the-art data programming system Snuba by 21% and a state-of-the-art few-shot learning technique by 5%, and is only 7% away from the fully supervised upper bound. Comment: Published at 2020 ACM SIGMOD International Conference on Management of Data
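    The core premise stated in this abstract (same-class pairs score higher on average than different-class pairs) can be checked on toy data. The sketch below is only an illustration under stated assumptions: the cosine affinity, the function names, and the four hand-made feature vectors are all hypothetical and are not the affinity functions or data used by GOGGLES.

```python
from itertools import combinations
from statistics import mean


def cosine_affinity(u, v):
    """Cosine similarity between two feature vectors (an assumed affinity function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def mean_affinities(features, labels, affinity=cosine_affinity):
    """Return (mean same-class affinity, mean different-class affinity) over all pairs."""
    same, diff = [], []
    for i, j in combinations(range(len(features)), 2):
        score = affinity(features[i], features[j])
        (same if labels[i] == labels[j] else diff).append(score)
    return mean(same), mean(diff)


# Hypothetical toy data: two instances per class, with made-up 2-D features.
features = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
labels = ["a", "a", "b", "b"]

same_mean, diff_mean = mean_affinities(features, labels)
print(f"same-class mean: {same_mean:.3f}, different-class mean: {diff_mean:.3f}")
```

    If the premise holds for a given affinity function and dataset, the first value exceeds the second; labels are used here only to check the premise, not to perform the labeling itself.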

    EXTRACTION OF UNDERLYING GEOLOGICAL STRUCTURE FROM SEISMIC DATA USING DATA MINING TECHNIQUES

    The development of seismic-imaging technology has substantially improved the exploration of subsurface deposits of crude oil, natural gas and minerals. Recent advances in data capture, processing power and storage capabilities have enabled us to analyze large volumes of seismic data. In this study we report on the implementation of machine learning and data mining techniques for analysis of seismic data to reveal salt deposits underneath the soil. Several seismic attributes have been extracted from these datasets. Using information gain, the best six attributes (homogeneity, contrast, energy, median, peaks and average energy) have been selected for further classification. Finally we compared the results obtained using four different clustering techniques: k-means algorithm, expectation maximization algorithm, min-cut algorithm and Euclidean clustering. Computer Science, Department of
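    Ranking attributes by information gain, as this abstract describes, can be sketched as follows. This is a minimal illustration assuming attributes have already been discretized into categories; the table values, the "salt"/"rock" labels, and the function names are hypothetical and do not come from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_attributes(table, labels, k):
    """Return the names of the k attributes with the highest information gain."""
    gains = {name: information_gain(column, labels) for name, column in table.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Hypothetical, already-discretized attribute columns and class labels.
labels = ["salt", "salt", "rock", "rock"]
table = {
    "contrast": ["high", "high", "low", "low"],  # separates the classes perfectly
    "energy": ["high", "low", "high", "low"],    # carries no class information
}

best = top_attributes(table, labels, k=1)
print(best)
```

    Continuous seismic attributes would need to be binned (or handled with a threshold search) before a computation like this applies.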

    Reference face graph for face recognition

    Face recognition has been studied extensively; however, real-world face recognition still remains a challenging task. The demand for unconstrained practical face recognition is rising with the explosion of online multimedia such as social networks, and video surveillance footage where face analysis is of significant importance. In this paper, we approach face recognition in the context of graph theory. We recognize an unknown face using an external reference face graph (RFG). An RFG is generated and recognition of a given face is achieved by comparing it to the faces in the constructed RFG. Centrality measures are utilized to identify distinctive faces in the reference face graph. The proposed RFG-based face recognition algorithm is robust to the changes in pose and it is also alignment free. The RFG recognition is used in conjunction with DCT locality sensitive hashing for efficient retrieval to ensure scalability. Experiments are conducted on several publicly available databases and the results show that the proposed approach outperforms the state-of-the-art methods without any preprocessing necessities such as face alignment. Due to the richness in the reference set construction, the proposed method can also handle illumination and expression variation

    Data Mining Technique for Predicting Telecommunications Industry Customer Churn Using both Descriptive and Predictive Algorithms

    As markets have become increasingly saturated, companies have acknowledged that their business strategies need to focus on identifying those customers who are most likely to churn. It is becoming common knowledge in business that retaining existing customers is the best core marketing strategy to survive in industry. In this research, both descriptive and predictive data mining techniques were used to determine the calling behaviour of subscribers and to recognise subscribers with high probability of churn in a telecommunications company subscriber database. First, a data model for the input data variables obtained from the subscriber database was developed. Then Simple K-Means and Expectation Maximization (EM) clustering algorithms were used for the clustering stage, while Decision Stump, M5P and RepTree Decision Tree algorithms were used for the classification stage. The best algorithms in both the clustering and classification stages were used for the prediction process, where customers that were likely to churn were identified. Keywords: customer churn; prediction; clustering; classification

    Geometric and photometric affine invariant image registration

    This thesis aims to present a solution to the correspondence problem for the registration of wide-baseline images taken from uncalibrated cameras. We propose an affine invariant descriptor that combines the geometry and photometry of the scene to find correspondences between both views. The geometric affine invariant component of the descriptor is based on the affine arc-length metric, whereas the photometry is analysed by invariant colour moments. A graph structure represents the spatial distribution of the primitive features; i.e. nodes correspond to detected high-curvature points, whereas arcs represent connectivities by extracted contours. After matching, we refine the search for correspondences by using a maximum likelihood robust algorithm. We have evaluated the system over synthetic and real data. The method is endemic to propagation of errors introduced by approximations in the system. BAE Systems; Selex Sensors and Airborne Systems