Efficient Latent Semantic Extraction from Cross Domain Data with Declarative Language
With large amounts of data continuously generated by intelligent devices, efficiently analyzing huge data collections to unearth valuable insights has become one of the most elusive challenges for both academia and industry. A scalable analysis framework needs three key elements: (1) an intuitive interface to describe the desired outcome, (2) a well-crafted model that integrates all available information sources to derive the optimal outcome, and (3) an efficient algorithm that performs the data integration and extraction within a reasonable amount of time. In this dissertation, we address these challenges by proposing (1) a cross-language interface for the succinct expression of recursive queries, (2) a domain-specific neural network model that can incorporate information from multiple modalities, and (3) a sample-efficient training method that can be used even for classifiers with an extremely large number of output classes. Our contributions in this thesis are thus threefold. First, for the recursive queries that are ubiquitous in advanced data analytics, we design, on top of BigDatalog and Apache Spark, a succinct and expressive analytics tool encapsulating the functionality and classical algorithms of Datalog, a quintessential logic programming language. We provide the Logical Library (LLib), a Spark MLlib-like high-level API supporting a wide range of recursive algorithms, and the Logical DataFrame (LFrame), an extension of the Spark DataFrame supporting both relational and logical operations. LLib and LFrame enable smooth collaboration between logical applications and other Spark libraries, as well as cross-language logical programming in Scala, Java, or Python. Second, we use variants of recurrent neural networks (RNNs) to incorporate informative sequential information overlooked by conventional work in two different domains: Spoken Language Understanding (SLU) and Internet Embedding (IE).
In SLU, we address the problem caused by relying solely on the first best interpretation (hypothesis) of an audio command through a series of new architectures comprising bidirectional LSTM and pooling layers. These architectures jointly use the texts or embedding vectors of the other hypotheses, which are usually neglected but carry valuable information missing from the first best hypothesis. In IE, we propose DIP, an extension of the RNN, to build an internet coordinate system from IP address sequences, which conventional distance-based internet embedding algorithms also ignore but which encode structural information about the network. Both DIP and the integration of all hypotheses bring significant performance improvements on the corresponding downstream tasks. Finally, we investigate training algorithms for multi-class classifiers with a large number of output classes, which are common in deep neural networks and typically implemented as a final softmax layer with one output neuron per class. To avoid the expensive computation of the softmax normalizing constant for each training data point, we analyze the well-known negative sampling algorithm and improve it into the amplified negative sampling algorithm, which achieves much higher performance at lower training cost.
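To illustrate why negative sampling avoids the softmax normalizing constant, here is a minimal sketch of the standard negative sampling loss for a single training example. It is not the amplified variant proposed in the thesis, whose details the abstract does not give. The function names, the uniform sampling of negatives, and the toy scores are assumptions made for illustration only.

```python
import math
import random


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def negative_sampling_loss(scores, target, num_negatives, rng):
    """Loss for one example that touches only 1 + num_negatives classes.

    scores: raw class scores (logits), one per output class.
    target: index of the true class.
    Negatives are drawn uniformly here (an assumption); real systems
    often sample from a frequency-based distribution instead.
    """
    negatives = []
    while len(negatives) < num_negatives:
        c = rng.randrange(len(scores))
        if c != target and c not in negatives:
            negatives.append(c)

    # Push the true class score up and the sampled negatives down.
    loss = -log_sigmoid(scores[target])
    for c in negatives:
        loss -= log_sigmoid(-scores[c])
    return loss, negatives


rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(10000)]  # hypothetical logits
loss, negatives = negative_sampling_loss(scores, target=42, num_negatives=5, rng=rng)
print(f"loss over 6 of {len(scores)} classes: {loss:.4f}")
```

A full softmax would sum over all 10,000 scores for every example, whereas this loss evaluates only the true class and five sampled negatives.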
Development of Multi-Representation Test As A Solution to Train High-Order Thinking Skills High School Students in Newton’s Law
This research aims to develop a multi-representation-based test instrument that can be used to measure students' higher-order thinking skills, especially on the topic of Newton's laws. The development procedure followed the Plomp development model, whose stages are design, construction/realization, test, evaluation, revision, and implementation. The subjects in this study were 36 class X students at a high school in Surabaya. At the implementation stage, tests were given to students and analysed using Rasch analysis with the help of the Winsteps software. The multi-representation test instrument consisted of essay questions whose representation model comprised visual, verbal, and mathematical representations adapted to the higher-order thinking levels of the cognitive domain of Bloom's taxonomy. Data were collected through instrument validation and tests. The result of this study was a set of 9 test items that are valid in terms of logical and empirical validity, and an instrument that is reliable according to Cronbach's alpha. Based on these results, it can be concluded that the multi-representation test can train students' higher-order thinking skills. Studying with a multi-representation test is expected to make it easier for students to develop higher-order thinking skills; in this research, students were categorized as having sufficient higher-order thinking skills.
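For readers unfamiliar with the reliability measure mentioned above, the following is a minimal sketch of Cronbach's alpha, computed as k/(k-1) × (1 − Σ item variances / variance of total scores). The score matrix is made-up illustration data, not data from the study, and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per student, one column per test item.
    Uses sample variances (n - 1 in the denominator).
    """
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 5 students x 3 essay items, each scored 0-4.
example_scores = [
    [3, 4, 3],
    [2, 2, 1],
    [4, 4, 4],
    [1, 2, 2],
    [3, 3, 4],
]
print(f"alpha = {cronbach_alpha(example_scores):.3f}")
```

Values closer to 1 indicate that the items vary together across students; the threshold for calling an instrument reliable is a convention that the abstract does not state.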
Unconventional machine learning of genome-wide human cancer data
Recent advances in high-throughput genomic technologies coupled with
exponential increases in computer processing and memory have allowed us to
interrogate the complex aberrant molecular underpinnings of human disease from
a genome-wide perspective. While the deluge of genomic information is expected
to increase, a bottleneck in conventional high-performance computing is rapidly
approaching. Inspired in part by recent advances in physical quantum
processors, we evaluated several unconventional machine learning (ML)
strategies on actual human tumor data. Here we show for the first time the
efficacy of multiple annealing-based ML algorithms for classification of
high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas.
To assess algorithm performance, we compared these classifiers to a variety of
standard ML methods. Our results indicate the feasibility of using
annealing-based ML to provide competitive classification of human cancer types
and associated molecular subtypes and superior performance with smaller
training datasets, thus providing compelling empirical evidence for the
potential future application of unconventional computing architectures in the
biomedical sciences.
Products of effective topological spaces and a uniformly computable Tychonoff Theorem
This article is a fundamental study in computable analysis. In the framework
of Type-2 effectivity, TTE, we investigate computability aspects on finite and
infinite products of effective topological spaces. For obtaining uniform
results we introduce natural multi-representations of the class of all
effective topological spaces, of their points, of their subsets and of their
compact subsets. We show that the binary, finite and countable product
operations on effective topological spaces are computable. For spaces with
non-empty base sets the factors can be retrieved from the products. We study
computability of the product operations on points, on arbitrary subsets and on
compact subsets. For the case of compact sets the results are uniformly
computable versions of Tychonoff's Theorem (stating that every Cartesian
product of compact spaces is compact) for both the cover multi-representation
and the "minimal cover" multi-representation.