Efficient Latent Semantic Extraction from Cross Domain Data with Declarative Language

Author: Li Mingda
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

With large amounts of data continuously generated by intelligent devices, efficient analysis of huge data collections to unearth valuable insights has become one of the most elusive challenges for both academia and industry. The key elements of a scalable analysis framework should include (1) an intuitive interface to describe the desired outcome, (2) a well-crafted model that integrates all available information sources to derive the optimal outcome, and (3) an efficient algorithm that performs the data integration and extraction within a reasonable amount of time. In this dissertation, we address these challenges by proposing (1) a cross-language interface for the succinct expression of recursive queries, (2) a domain-specific neural network model that can incorporate information from multiple modalities, and (3) a sample-efficient training method that can be used even for classifiers with an extremely large number of output classes. Our contributions in this thesis are thus threefold. First, for the ubiquitous recursive queries in advanced data analytics, we design, on top of BigDatalog and Apache Spark, a succinct and expressive analytics tool encapsulating the functionality and classical algorithms of Datalog, a quintessential logic programming language. We provide the Logical Library (LLib), a Spark MLlib-like high-level API supporting a wide range of recursive algorithms, and the Logical DataFrame (LFrame), an extension to Spark DataFrame supporting both relational and logical operations. LLib and LFrame enable smooth collaboration between logical applications and other Spark libraries, as well as cross-language logical programming in Scala, Java, or Python. Second, we use variants of the recurrent neural network (RNN) to incorporate informative sequential information overlooked by conventional work in two different domains: Spoken Language Understanding (SLU) and Internet Embedding (IE).
In SLU, we address the problem caused by relying solely on the first-best interpretation (hypothesis) of an audio command through a series of new architectures comprising bidirectional LSTM and pooling layers. These architectures jointly use the texts or embedding vectors of the other hypotheses, which are usually neglected but carry valuable information missed by the first-best hypothesis. In IE, we propose DIP, an extension of the RNN, to build an internet coordinate system from IP address sequences, which conventional distance-based internet embedding algorithms also ignore but which encode structural information about the network. Both DIP and the integration of all hypotheses bring significant performance improvements on the corresponding downstream tasks. Finally, we investigate the training algorithm for multi-class classifiers with a large number of output classes, which are common in deep neural networks and are typically implemented as a softmax final layer with one output neuron per class. To avoid the expensive computation of the intractable normalizing constant of softmax for each training data point, we analyze the well-known negative sampling method and improve it to the amplified negative sampling algorithm, which achieves much higher performance at lower training cost.
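As a rough illustration of why negative sampling is cheaper than a full softmax, the sketch below compares the two losses for a single training example. It shows only the standard negative sampling objective, not the amplified variant proposed in the dissertation, whose details the abstract does not give. The function names, the uniform sampling of negatives, and the toy scores are all assumptions made for this example.

```python
import math
import random


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def softmax_loss(scores, target):
    # Full softmax cross-entropy: the normalizer sums over every class.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def negative_sampling_loss(scores, target, num_negatives, rng):
    # Standard negative sampling: one positive term plus a few sampled
    # negative terms. Negatives are drawn uniformly here (an assumption).
    # A real model would compute scores only for the sampled classes;
    # this toy takes the full score list for simplicity.
    candidates = [i for i in range(len(scores)) if i != target]
    negatives = rng.sample(candidates, num_negatives)
    loss = -log_sigmoid(scores[target])
    loss -= sum(log_sigmoid(-scores[j]) for j in negatives)
    return loss


# Hypothetical scores for a 6-class problem; class 2 is the true label.
toy_scores = [0.1, -1.2, 2.3, 0.0, -0.5, 0.7]
rng = random.Random(0)
full = softmax_loss(toy_scores, target=2)
sampled = negative_sampling_loss(toy_scores, target=2, num_negatives=2, rng=rng)
```

The full loss touches all classes through the normalizer, while the sampled loss touches only the target and `num_negatives` other classes, which is where the saving comes from when the number of classes is very large.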

eScholarship - University of California

Development of Multi-Representation Test As A Solution to Train High-Order Thinking Skills High School Students in Newton’s Law

Author: Puspitaningrum Hidayah Zuliana
Tjipto Prastowo
Wasis
Publication venue: 'Indonesia Approach Education'
Publication date: 31/01/2021

This research aims to develop a multi-representation-based test instrument that can be used to measure students' higher-order thinking skills, especially on Newton's laws. The development procedure used the Plomp development model, whose stages were design, construction/realization, test, evaluation, revision, and implementation. The subjects of this study were 36 class X students at a high school in Surabaya. At the implementation stage, tests were given to the students and analysed using Rasch analysis with the help of Winsteps software. The multi-representation test instrument was an essay-type test whose representation model consisted of visual, verbal, and mathematical representations adapted to the higher-order cognitive domain of Bloom's taxonomy. Data were collected through instrument validation and tests. The result was 9 valid test items, based on logical and empirical validity, and a reliable instrument, based on calculations using the Cronbach's alpha equation. Based on these results, it can be concluded that the multi-representation test can train students' higher-order thinking skills. Studying with a multi-representation test is expected to make it easier for students to develop higher-order thinking skills; in this research, the students were categorized as having sufficient higher-order thinking skills.
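The reliability calculation mentioned in this abstract can be sketched with the textbook formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the score matrix below are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row of k item scores per student."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two students")
    # Sample variance of each item (column) across students.
    item_vars = [variance(col) for col in zip(*scores)]
    # Sample variance of each student's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 4 students, 3 essay items scored 0-4.
toy_scores = [
    [4, 3, 4],
    [2, 2, 3],
    [1, 1, 1],
    [3, 3, 2],
]
alpha = cronbach_alpha(toy_scores)
```

Values closer to 1 indicate that the items vary together, which is the sense in which the abstract calls the instrument reliable.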

IJORER : International Journal of Recent Educational Research

Unconventional machine learning of genome-wide human cancer data

Author: Bajaj Sweta R.
Chittenden Thomas W.
Cilfone Nicholas
Gamel Omar E.
Gujja Sharvari
Gulcher Jeffrey R.
Li Richard Y.
Lidar Daniel A.
Publication date: 13/05/2020

Recent advances in high-throughput genomic technologies coupled with exponential increases in computer processing and memory have allowed us to interrogate the complex aberrant molecular underpinnings of human disease from a genome-wide perspective. While the deluge of genomic information is expected to increase, a bottleneck in conventional high-performance computing is rapidly approaching. Inspired in part by recent advances in physical quantum processors, we evaluated several unconventional machine learning (ML) strategies on actual human tumor data. Here we show for the first time the efficacy of multiple annealing-based ML algorithms for classification of high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas. To assess algorithm performance, we compared these classifiers to a variety of standard ML methods. Our results indicate the feasibility of using annealing-based ML to provide competitive classification of human cancer types and associated molecular subtypes and superior performance with smaller training datasets, thus providing compelling empirical evidence for the potential future application of unconventional computing architectures in the biomedical sciences

arXiv.org e-Print Archive

Directory of Open Access Journals

Products of effective topological spaces and a uniformly computable Tychonoff Theorem

Author: Rettinger Robert
Weihrauch Klaus
Publication venue: 'Logical Methods in Computer Science e.V.'
Publication date: 13/11/2013

This article is a fundamental study in computable analysis. In the framework of Type-2 effectivity (TTE), we investigate computability aspects of finite and infinite products of effective topological spaces. To obtain uniform results, we introduce natural multi-representations of the class of all effective topological spaces, of their points, of their subsets and of their compact subsets. We show that the binary, finite and countable product operations on effective topological spaces are computable. For spaces with non-empty base sets, the factors can be retrieved from the products. We study computability of the product operations on points, on arbitrary subsets and on compact subsets. For the case of compact sets, the results are uniformly computable versions of Tychonoff's Theorem (stating that every Cartesian product of compact spaces is compact) for both the cover multi-representation and the "minimal cover" multi-representation.

arXiv.org e-Print Archive

CiteSeerX

Episciences.org