14,280 research outputs found

    Automated schema matching techniques: an exploratory study

    Get PDF
    Manual schema matching is a problem for many database applications that use multiple data sources including data warehousing and e-commerce applications. Current research attempts to address this problem by developing algorithms to automate aspects of the schema-matching task. In this paper, an approach using an external dictionary facilitates automated discovery of the semantic meaning of database schema terms. An experimental study was conducted to evaluate the performance and accuracy of five schema-matching techniques with the proposed approach, called SemMA. The proposed approach and results are compared with two existing semi-automated schema-matching approaches and suggestions for future research are made

    Feedback Generation for Performance Problems in Introductory Programming Assignments

    Full text link
    Providing feedback on programming assignments manually is a tedious, error prone, and time-consuming task. In this paper, we motivate and address the problem of generating feedback on performance aspects in introductory programming assignments. We studied a large number of functionally correct student solutions to introductory programming assignments and observed: (1) There are different algorithmic strategies, with varying levels of efficiency, for solving a given problem. These different strategies merit different feedback. (2) The same algorithmic strategy can be implemented in countless different ways, which are not relevant for reporting feedback on the student program. We propose a light-weight programming language extension that allows a teacher to define an algorithmic strategy by specifying certain key values that should occur during the execution of an implementation. We describe a dynamic analysis based approach to test whether a student's program matches a teacher's specification. Our experimental results illustrate the effectiveness of both our specification language and our dynamic analysis. On one of our benchmarks consisting of 2316 functionally correct implementations to 3 programming problems, we identified 16 strategies that we were able to describe using our specification language (in 95 minutes after inspecting 66, i.e., around 3%, implementations). Our dynamic analysis correctly matched each implementation with its corresponding specification, thereby automatically producing the intended feedback.Comment: Tech report/extended version of FSE 2014 pape

    Polygraph: Automatically generating signatures for polymorphic worms

    Get PDF
    It is widely believed that content-signature-based intrusion detection systems (IDSes) are easily evaded by polymorphic worms, which vary their payload on every infection attempt. In this paper, we present Polygraph, a signature generation system that successfully produces signatures that match polymorphic worms. Polygraph generates signatures that consist of multiple disjoint content sub-strings. In doing so, Polygraph leverages our insight that for a real-world exploit to function properly, multiple invariant substrings must often be present in all variants of a payload; these substrings typically correspond to protocol framing, return addresses, and in some cases, poorly obfuscated code. We contribute a definition of the polymorphic signature generation problem; propose classes of signature suited for matching polymorphic worm payloads; and present algorithms for automatic generation of signatures in these classes. Our evaluation of these algorithms on a range of polymorphic worms demonstrates that Polygraph produces signatures for polymorphic worms that exhibit low false negatives and false positives. Ā© 2005 IEEE

    Automatic Recognition of Public Transport Trips from Mobile Device Sensor Data and Transport Infrastructure Information

    Full text link
    Automatic detection of public transport (PT) usage has important applications for intelligent transport systems. It is crucial for understanding the commuting habits of passengers at large and over longer periods of time. It also enables compilation of door-to-door trip chains, which in turn can assist public transport providers in improved optimisation of their transport networks. In addition, predictions of future trips based on past activities can be used to assist passengers with targeted information. This article documents a dataset compiled from a day of active commuting by a small group of people using different means of PT in the Helsinki region. Mobility data was collected by two means: (a) manually written details of each PT trip during the day, and (b) measurements using sensors of travellers' mobile devices. The manual log is used to cross-check and verify the results derived from automatic measurements. The mobile client application used for our data collection provides a fully automated measurement service and implements a set of algorithms for decreasing battery consumption. The live locations of some of the public transport vehicles in the region were made available by the local transport provider and sampled with a 30-second interval. The stopping times of local trains at stations during the day were retrieved from the railway operator. The static timetable information of all the PT vehicles operating in the area is made available by the transport provider, and linked to our dataset. The challenge is to correctly detect as many manually logged trips as possible by using the automatically collected data. This paper includes an analysis of challenges due to missing or partially sampled information in the data, and initial results from automatic recognition using a set of algorithms. Improvement of correct recognitions is left as an ongoing challenge.Comment: 22 pages, 7 figures, 10 table

    Highly Scalable Algorithms for Robust String Barcoding

    Full text link
    String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to further extend the applicability range to thousands of bacterial size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome based selection results in a number of distinguishers nearly matching the information theoretic lower bounds for the problem

    Dictionary matching in a stream

    Get PDF
    We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary
    • ā€¦
    corecore