262 research outputs found
Neural Machine Translation for Code Generation
Neural machine translation (NMT) methods developed for natural language
processing have been shown to be highly successful in automating translation
from one natural language to another. Recently, these NMT methods have been
adapted to the generation of program code. In NMT for code generation, the task
is to generate output source code that satisfies constraints expressed in the
input. In the literature, a variety of different input scenarios have been
explored, including generating code based on natural language description,
lower-level representations such as binary or assembly (neural decompilation),
partial representations of source code (code completion and repair), and source
code in another language (code translation). In this paper we survey the NMT
for code generation literature, cataloging the variety of methods that have
been explored according to input and output representations, model
architectures, optimization techniques used, data sets, and evaluation methods.
We discuss the limitations of existing methods and future research directions.
Comment: 33 pages, 1 figure
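The survey's key observation is that every input scenario it catalogs (natural language description, binary/assembly, partial source, or source in another language) reduces to the same sequence-to-sequence formulation. A minimal sketch of that framing, with purely illustrative example pairs not taken from any surveyed system:

```python
# Each NMT-for-code scenario is a text-to-text task: the model is trained on
# (input sequence, target code sequence) pairs. Examples below are invented
# toy pairs for illustration only.
SCENARIOS = {
    "nl_to_code":      ("reverse a list",        "def rev(xs): return xs[::-1]"),
    "decompilation":   ("mov eax, 1; ret",       "int f(void) { return 1; }"),
    "code_completion": ("def add(a, b): ret",    "def add(a, b): return a + b"),
    "translation":     ("System.out.println(1)", "print(1)"),
}

def as_seq2seq_pair(scenario):
    """Return the (input tokens, target tokens) pair a model would train on."""
    src, tgt = SCENARIOS[scenario]
    return src.split(), tgt.split()
```

Because all four scenarios share this interface, the same encoder-decoder architectures and optimization techniques can be compared across them, which is how the survey organizes its catalog.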
Students' computational thinking skills through an Internet-of-Things technology-based learning module
students' computational thinking skills, towards more creative and critical thinking, through the use of the Internet of Things Technology-Based Learning Module (MP-IoT) developed by the researchers. The development of MP-IoT followed the ADDIE model and involved Arduino technology applied in 5 hands-on learning activities. This quantitative, quasi-experimental study was conducted on 52 Form 4 students from 2 schools in the districts of Batu Pahat, Johor and Kuala Kangsar, Perak. The data were analysed descriptively and inferentially. A set of pre- and post-achievement tests was developed as the instrument. Item analysis using the Difficulty Index (IK), the Discrimination Index, and the interpretation of Cronbach's Alpha scores was applied to ensure the achievement test questions were suitable for use. In the MP-IoT module development process, 6 Computer Science teachers were selected as experts to assess the suitability of the module's format, content, and usability; a five-point Likert scale was used in this study. Overall, the findings, based on a paired-samples t-test, showed a significant difference in achievement between the control group exposed to the conventional method and the treatment group exposed to the MP-IoT module, with a p-value of .000, which is less than .05 (p < 0.05). In addition, students' computational thinking skills also improved after exposure to the MP-IoT module.
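The study's inferential step is a paired-samples t-test on pre/post achievement scores. A minimal standard-library sketch of that test (the scores below are invented for illustration; the real data comes from the 52 students):

```python
import math
import statistics

# Illustrative pre- and post-test scores for the same students (paired data).
pre  = [45, 50, 38, 60, 55, 42, 48, 51]
post = [62, 68, 55, 74, 70, 59, 66, 69]

diffs = [b - a for a, b in zip(pre, post)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)            # sample standard deviation of differences
t_stat = mean_d / (sd_d / math.sqrt(n))   # paired t statistic, df = n - 1

# Two-tailed critical value for df = 7 at alpha = .05 is about 2.365;
# |t| above this threshold indicates a significant pre/post difference.
significant = abs(t_stat) > 2.365
```

In the study, the reported p-value of .000 corresponds to a t statistic far beyond the critical value, hence the conclusion that the MP-IoT treatment group outperformed the control group.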
An automated cloud-based big data analytics platform for customer insights
Product reviews strongly influence strategic decisions for both businesses and customers about what to produce or buy. However, given the large amount of information available online, manual analysis of reviews is costly and time consuming, as well as subjective and prone to error. In this work, we present an automated, scalable, cloud-based system that harnesses large volumes of customer product reviews for customer insights through a data pipeline spanning data acquisition, analysis, and visualisation in an efficient way. Experimental evaluation has shown that the proposed system achieves good performance in terms of accuracy and computing time.
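The abstract describes a three-stage pipeline: acquisition, analysis, visualisation. A minimal sketch of that shape, using a toy lexicon-based sentiment step (the real system is cloud-based and scalable; the function names and lexicons here are illustrative assumptions):

```python
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "broken", "hate"}

def acquire():
    """Stand-in for pulling product reviews from an online source."""
    return ["Great battery, love it", "Poor build, broken in a week"]

def analyse(reviews):
    """Score each review by counting positive minus negative lexicon hits."""
    scores = []
    for r in reviews:
        words = set(r.lower().replace(",", "").split())
        scores.append(len(words & POSITIVE) - len(words & NEGATIVE))
    return scores

def visualise(scores):
    """Stand-in for a dashboard chart: summarise scores as text."""
    pos = sum(s > 0 for s in scores)
    return f"{pos}/{len(scores)} positive reviews"

summary = visualise(analyse(acquire()))
```

Chaining the three stages as functions mirrors the pipeline design: each stage can be scaled or swapped independently, which is what makes the cloud deployment described in the paper practical.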
BLAST: a Loosely Schema-aware Meta-blocking Approach for Entity Resolution
Identifying records that refer to the same entity is a fundamental step for data integration. Since it is prohibitively expensive to compare every pair of records, blocking techniques are typically employed to reduce the complexity of this task. These techniques partition records into blocks and limit the comparison to records co-occurring in a block. Generally, to deal with highly heterogeneous and noisy data (e.g. semi-structured data of the Web), these techniques rely on redundancy to reduce the chance of missing matches.
Meta-blocking is the task of restructuring blocks generated by redundancy-based blocking techniques, removing superfluous comparisons. Existing meta-blocking approaches rely exclusively on schema-agnostic features.
In this paper, we demonstrate how "loose" schema information (i.e., statistics collected directly from the data) can be exploited to enhance the quality of the blocks in a holistic, loosely schema-aware (meta-)blocking approach that can be used to speed up your favorite Entity Resolution algorithm. We call it Blast (Blocking with Loosely-Aware Schema Techniques). We show how Blast can automatically extract this loose information by adopting an LSH-based step for efficiently scaling to large datasets. We experimentally demonstrate, on real-world datasets, how Blast outperforms the state-of-the-art unsupervised meta-blocking approaches, and, in many cases, also the supervised ones.
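The two steps the abstract builds on can be sketched concretely: redundancy-based token blocking creates blocks of co-occurring records, and a meta-blocking-style pruning step then drops superfluous comparisons. A minimal illustration (Blast's loose schema-aware weighting and LSH step are more sophisticated; the records and threshold below are invented):

```python
from collections import defaultdict
from itertools import combinations

records = {
    1: "john smith new york",
    2: "j. smith ny",
    3: "mary jones boston",
    4: "john smith nyc",
}

# Token blocking: each distinct token defines a block of co-occurring records,
# so a record appears in several blocks (redundancy reduces missed matches).
blocks = defaultdict(set)
for rid, text in records.items():
    for tok in text.split():
        blocks[tok].add(rid)

# Meta-blocking-style pruning: weight each candidate pair by how many blocks
# it shares, then keep only pairs above a threshold, removing superfluous
# comparisons before the expensive pairwise matching step.
weights = defaultdict(int)
for ids in blocks.values():
    for a, b in combinations(sorted(ids), 2):
        weights[(a, b)] += 1

kept = {pair for pair, w in weights.items() if w >= 2}
```

Here only the pair sharing two blocks survives, so the downstream Entity Resolution algorithm compares far fewer candidates than the naive quadratic approach.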
Source Code Generation from Descriptions in a Natural Language
This work introduces CodeFormer, a Python source code generator pre-trained on a massive GitHub crawl consisting of 230M Python functions. The released model, built on the BART architecture, generates Python functions based on descriptions in English. On the CodeSearchNet dataset, CodeFormer sets a new state of the art with 46.12 BLEU, representing an improvement of 13.86 BLEU. We also release a new parallel corpus for code generation called the Stack Overflow Code Generation Dataset (SOCGD), on which our model sets a baseline of 47.68 BLEU. The resulting model is ready to be integrated into a source code suggestion system in an IDE, where it can improve software developers' productivity. During our research, we also discovered a better way of training BART for machine translation. However, the applicability of our approach to other domains must be verified in subsequent work.
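The thesis evaluates generated code with BLEU, which compares model output against reference code as token sequences. A deliberately simplified sketch of the idea (unigram precision with brevity penalty only; the reported scores use full n-gram corpus BLEU, and the token sequences below are invented):

```python
import math
from collections import Counter

def unigram_bleu(candidate, reference):
    """Simplified BLEU: clipped unigram precision times brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    overlap = Counter(cand) & Counter(ref)   # clip counts by the reference
    precision = sum(overlap.values()) / len(cand)
    # Penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = unigram_bleu("def rev ( xs ) : return xs [ :: - 1 ]",
                     "def rev ( xs ) : return reversed ( xs )")
```

Even this toy variant shows why BLEU rewards surface overlap rather than functional equivalence, a known caveat when the metric is applied to source code.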