Towards the integration of functions, relations and types in an AI programming language
This paper describes the design and implementation of the programming language PC-Life. The language integrates the functional and logic-oriented programming styles with feature types that support inheritance. This combination yields a language particularly suited to knowledge representation, especially for applications in computational linguistics.
Language Processing and the Artificial Mind: Teaching Code Literacy in the Humanities
Humanities majors often find themselves in jobs where they either manage programmers or work with them in close collaboration. These interactions often pose difficulties because specialists in literature, history, philosophy, and so on are not usually code literate. They do not understand what tasks computers are best suited to, or how programmers solve problems. Learning code literacy would be a great benefit to humanities majors, but the traditional computer science curriculum is heavily math oriented, and students outside of science and technology majors are often math averse. Yet they are often interested in language, linguistics, and science fiction. This thesis is a case study to explore whether computational linguistics and artificial intelligence provide a suitable setting for teaching basic code literacy. I researched, designed, and taught a course called "Language Processing and the Artificial Mind." Instead of math, it focuses on language processing, artificial intelligence, and the formidable challenges that programmers face when trying to create machines that understand natural language. This thesis is a detailed description of the material, how the material was chosen, and the outcome for student learning. Student performance on exams indicates that students learned code literacy basics and important linguistics issues in natural language processing. An exit survey indicates that students found the course to be valuable, though a minority reacted negatively to the material on programming. Future studies should explore teaching code literacy with less programming and new ways to make coding more interesting to the target audience.
Concurrent Lexicalized Dependency Parsing: The ParseTalk Model
A grammar model for concurrent, object-oriented natural language parsing is
introduced. Complete lexical distribution of grammatical knowledge is achieved
building upon the head-oriented notions of valency and dependency, while
inheritance mechanisms are used to capture lexical generalizations. The
underlying concurrent computation model relies upon the actor paradigm. We
consider message passing protocols for establishing dependency relations and
ambiguity handling. Comment: 90kB, 7 pages, Postscript
Generating Aspect-oriented Multi-document Summarization with Event-Aspect Model
In this paper, we propose a novel approach to the automatic generation of aspect-oriented summaries from multiple documents. We first develop an event-aspect LDA model to cluster sentences into aspects. We then use an extended LexRank algorithm to rank the sentences in each cluster, and Integer Linear Programming for sentence selection. Key features of our method include automatic grouping of semantically related sentences and sentence ranking based on an extension of the random walk model. We also implement a new sentence compression algorithm that uses a dependency tree instead of a parse tree. We compare our method with four baseline methods. Quantitative evaluation based on the ROUGE metric demonstrates the effectiveness and advantages of our method.
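The ranking step mentioned in this abstract builds on LexRank, which scores sentences by a random walk over a sentence-similarity graph. The following is a minimal sketch of plain LexRank only, not the extended variant the authors describe. The function names `cosine` and `lexrank`, the whitespace tokenization, and the default threshold and damping values are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences with a damped random walk over a similarity graph.

    Hypothetical helper: whitespace tokenization and raw term counts
    stand in for whatever preprocessing a real system would use.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    if n == 0:
        return []

    # Edge weights: cosine similarity between distinct sentences,
    # dropped when below the threshold.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = cosine(bags[i], bags[j])
                if sim >= threshold:
                    weights[i][j] = sim
    row_sums = [sum(row) for row in weights]

    # Power iteration. A sentence with no neighbours spreads its
    # score uniformly so that the scores keep summing to 1.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                if row_sums[j] > 0:
                    incoming += scores[j] * weights[j][i] / row_sums[j]
                else:
                    incoming += scores[j] / n
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


if __name__ == "__main__":
    cluster = [
        "the cat sat on the mat",
        "the cat lay on the mat",
        "stocks fell sharply today",
    ]
    for sentence, score in zip(cluster, lexrank(cluster)):
        print(f"{score:.3f}  {sentence}")
```

In this sketch the two overlapping sentences reinforce each other and outrank the unrelated one. A selection step such as the Integer Linear Programming stage named in the abstract would then choose among the ranked sentences.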
Concurrent Lexicalized Dependency Parsing: A Behavioral View on ParseTalk Events
The behavioral specification of an object-oriented grammar model is
considered. The model is based on full lexicalization, head-orientation via
valency constraints and dependency relations, inheritance as a means for
non-redundant lexicon specification, and concurrency of computation. The
computation model relies upon the actor paradigm, with concurrency entering
through asynchronous message passing between actors. In particular, we here
elaborate on principles of how the global behavior of a lexically distributed
grammar and its corresponding parser can be specified in terms of event type
networks and event networks, respectively. Comment: 68kB, 5 pages, Postscript
Teaching machine translation and translation technology: a contrastive study
The Machine Translation course at Dublin City University is taught to undergraduate students in Applied Computational
Linguistics, while Computer-Assisted Translation is taught on two translator-training programmes, one undergraduate and
one postgraduate. Given the differing backgrounds of these sets of students, the course material, methods of teaching and assessment all differ. We report here on our experiences of teaching these courses over a number of years, which we hope will be of interest to lecturers of similar existing courses, as well as providing a reference point for others who may be considering the introduction of such material.
Semantic Source Code Models Using Identifier Embeddings
The emergence of online open source repositories in recent years has led
to an explosion in the volume of openly available source code, coupled with
metadata that relate to a variety of software development activities. As a
result, in line with recent advances in machine learning research, software
maintenance activities are switching from symbolic formal methods to
data-driven methods. In this context, the rich semantics hidden in source code
identifiers provide opportunities for building semantic representations of code
which can assist tasks of code search and reuse. To this end, we deliver in the
form of pretrained vector space models, distributed code representations for
six popular programming languages, namely, Java, Python, PHP, C, C++, and C#.
The models are produced using fastText, a state-of-the-art library for learning
word representations. Each model is trained on data from a single programming
language; the code mined for producing all models amounts to over 13,000
repositories. We indicate dissimilarities between natural language and source
code, as well as variations in coding conventions between the different
programming languages we processed. We describe how these heterogeneities
guided the data preprocessing decisions we took and the selection of the
training parameters in the released models. Finally, we propose potential
applications of the models and discuss limitations of the models. Comment: 16th International Conference on Mining Software Repositories (MSR
2019): Data Showcase Track