    The German turnover tax statistics panel

    Based on the yearly turnover tax statistics, the German turnover tax statistics panel allows, for the first time, detailed longitudinal analyses of nearly all economic sectors. In addition to turnover-tax-related variables, the dataset provides information about exports and imports and, through the combination with the German business register (Unternehmensregister), about employees liable to social insurance contributions. The panel contains more than 4.3 million enterprises, 1.9 million of which are covered over the whole period from 2001 to 2005. No other German statistic covers nearly all economic sectors with such completeness. In the following, we give an overview of the turnover tax statistics and the matching process (Sections 2 and 3). Section 4 describes the variables included in the dataset, and Section 5 presents examples of the research potential. The paper closes with information on how to access the data (Section 6).
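
    The matching process itself is described in the paper; as a rough illustration of how yearly cross-sections are combined into such a panel, here is a minimal sketch assuming pandas and hypothetical file and column names (turnover_tax_<year>.csv, enterprise_id).

```python
# Minimal sketch of building a panel from yearly cross-sections.
# File and column names are hypothetical; the actual matching uses
# official enterprise identifiers and the business register.
import pandas as pd

years = range(2001, 2006)  # 2001-2005, as covered by the panel

frames = []
for year in years:
    df = pd.read_csv(f"turnover_tax_{year}.csv")  # one cross-section per year
    df["year"] = year
    frames.append(df)

panel = pd.concat(frames, ignore_index=True)

# Enterprises observed in every year form the balanced sub-panel
# (1.9 million of the 4.3 million enterprises, per the abstract).
counts = panel.groupby("enterprise_id")["year"].nunique()
balanced_ids = counts[counts == len(years)].index
balanced_panel = panel[panel["enterprise_id"].isin(balanced_ids)]
```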

    Main Memory Adaptive Indexing for Multi-core Systems

    Adaptive indexing is a concept that treats index creation in databases as a by-product of query processing, as opposed to traditional full index creation, where the indexing effort is performed up front, before any queries are answered. Adaptive indexing has received a considerable amount of attention, and several algorithms have been proposed over the past few years, including a recent experimental study comparing a large number of existing methods. Until now, however, most adaptive indexing algorithms have been designed as single-threaded algorithms; with multi-core systems now well established, designing parallel algorithms for adaptive indexing is a natural next step. So far, only one parallel adaptive indexing algorithm has appeared in the literature: the parallel version of standard cracking. In this paper we describe three alternative parallel algorithms for adaptive indexing, including a second variant of a parallel standard cracking algorithm. Additionally, we describe a hybrid parallel sorting algorithm and a NUMA-aware method based on sorting. We then thoroughly compare all these algorithms experimentally, along with a variant of a recently published parallel version of radix sort. Parallel sorting algorithms serve as a realistic baseline for multi-threaded adaptive indexing techniques. In total we experimentally compare seven parallel algorithms, and we extensively profile all of them. The initial set of experiments considered in this paper indicates that our parallel algorithms significantly improve over previously known ones. Our results suggest that, although adaptive indexing algorithms are a good design choice in single-threaded environments, the rules change considerably in the parallel case: in future highly parallel environments, sorting algorithms could be serious alternatives to adaptive indexing.
    Comment: 26 pages, 7 figures
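
    To make the idea concrete, here is a minimal single-threaded sketch of standard cracking, the baseline the paper parallelizes. Class and helper names are illustrative; real implementations operate on columnar arrays and keep the cracks in a tree index.

```python
# Standard cracking: each range query partitions ("cracks") the column
# around its bounds, so an index is built incrementally as a by-product
# of query processing.
import bisect

class CrackedColumn:
    def __init__(self, values):
        self.data = list(values)  # copied column, reordered in place
        self.cracks = []          # sorted list of (value, position)

    def _piece(self, v):
        """Bounds of the contiguous piece that may contain value v."""
        i = bisect.bisect_left(self.cracks, (v,))
        lo = self.cracks[i - 1][1] if i > 0 else 0
        hi = self.cracks[i][1] if i < len(self.cracks) else len(self.data)
        return lo, hi

    def _crack(self, v):
        """Partition the relevant piece around v and remember the boundary."""
        if any(c[0] == v for c in self.cracks):
            return
        lo, hi = self._piece(v)
        piece = self.data[lo:hi]
        left = [x for x in piece if x < v]
        right = [x for x in piece if x >= v]
        self.data[lo:hi] = left + right
        bisect.insort(self.cracks, (v, lo + len(left)))

    def range_query(self, low, high):
        """Return all values in [low, high); cracking is the side effect."""
        self._crack(low)
        self._crack(high)
        pos = {v: p for v, p in self.cracks}
        return self.data[pos[low]:pos[high]]

col = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
print(col.range_query(4, 13))  # [4, 9, 12, 7] (order within a piece is arbitrary)
```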

    Only Aggressive Elephants are Fast Elephants

    Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider's orders. Some clever riders have trained their yellow elephants to only consume parts of the inputs before responding. However, the teaching time to make an elephant do that is high. So high that the teaching lessons often do not pay off. We take a different approach. We make elephants aggressive; only this will make them very fast. We propose HAIL (Hadoop Aggressive Indexing Library), an enhancement of HDFS and Hadoop MapReduce that dramatically improves runtimes of several classes of MapReduce jobs. HAIL changes the upload pipeline of HDFS in order to create different clustered indexes on each data block replica. An interesting feature of HAIL is that we typically create a win-win situation: we improve both data upload to HDFS and the runtime of the actual Hadoop MapReduce job. In terms of data upload, HAIL improves over HDFS by up to 60% with the default replication factor of three. In terms of query execution, we demonstrate that HAIL runs up to 68x faster than Hadoop. In our experiments, we use six clusters including physical and EC2 clusters of up to 100 nodes. A series of scalability experiments also demonstrates the superiority of HAIL.
    Comment: VLDB201
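
    The core idea of per-replica clustered indexes can be illustrated with a toy sketch (this is not HAIL's actual implementation, which hooks into the HDFS upload pipeline): each replica of a block is sorted on a different attribute at upload time, so a job can pick the replica matching its filter and read a narrow range instead of scanning the whole block.

```python
# Toy illustration of per-replica clustered indexes. Attribute names
# and records are invented; replication factor three as in HDFS.
import bisect

ATTRS = ["id", "name", "price"]
records = [(3, "c", 9.5), (1, "a", 2.0), (2, "b", 7.1)]

# "Upload": build one clustered copy of the block per replica.
replicas = {attr: sorted(records, key=lambda r: r[i])
            for i, attr in enumerate(ATTRS)}

def range_scan(attr, low, high):
    """Pick the replica clustered on attr and binary-search the range."""
    block = replicas[attr]
    keys = [r[ATTRS.index(attr)] for r in block]
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return block[lo:hi]

print(range_scan("price", 2.0, 8.0))  # [(1, 'a', 2.0), (2, 'b', 7.1)]
```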

    Automatic, Descriptor-Based Support of Document Analysis for the Focusing and Classification of Business Letters

    This work was carried out within the ALV project (Automatisches Lesen und Verstehen, i.e. automatic reading and understanding) at the German Research Center for Artificial Intelligence (DFKI). The goal of the ALV project is the development of an intelligent paper-computer interface. By imitating human reading behaviour, it aims to take a step towards the paperless office. Business letters serve as the exemplary domain studied in ALV. Subareas within the ALV project are layout extraction, logical labeling, text recognition, and text analysis; this work belongs to the area of text analysis. The task was to determine the type of a letter, as well as first hints about the author's intention, from the words occurring in the letter body. Such information can be used by other experts for the further processing, distribution, and archiving of the letters. The INFOCLAS system, developed and implemented as part of a diploma thesis, therefore attempts to provide the following functionality on the basis of statistical techniques and methods from information retrieval: i) extraction and weighting of content-bearing words; ii) determination of the central message (focus) of a business letter; iii) classification of a business letter into predefined message types. The modules developed for this purpose (indexer, focusser, and classifier) use, in addition to concepts from information retrieval, a database containing a collection of business letters as well as specific word lists that represent the modelled letter classes. A morphological tool for the grammatical analysis of words serves as a further aid. With these knowledge sources, hypotheses about the letter class and the central message of the letter content are generated.

    In this documentation, existing techniques of information retrieval (IR) are compared and evaluated for their application in document analysis and understanding. Moreover, we have developed a system called INFOCLAS which uses appropriate statistical methods of IR, primarily for the classification of German business letters into corresponding message types such as order, offer, confirmation, inquiry, and advertisement. INFOCLAS is a first step towards the understanding of business letters. It currently comprises three modules: the central indexer (extraction and weighting of indexing terms), the classifier (classification of business letters into given types), and the focusser (highlighting relevant parts of the letter). INFOCLAS integrates several knowledge sources, including a database of about 120 letters, word frequency statistics for German, message-type-specific words, morphological knowledge, and the underlying document model (layout and logical structure). As output, the system computes a set of weighted hypotheses about the type of the letter at hand. A classification of documents allows the automatic distribution or archiving of letters and is also an excellent starting point for higher-level document analysis.
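
    The abstract does not spell out the exact weighting scheme; the sketch below uses TF-IDF as a representative statistical IR method, with invented English letters and message-type word lists standing in for the system's German database.

```python
# Sketch of term weighting plus word-list classification in the spirit
# of INFOCLAS. All letters and word lists are invented stand-ins.
import math
from collections import Counter

letters = [
    "we order 20 units of part 4711 for immediate delivery",
    "we offer our new product line at a special price",
    "we confirm the receipt of your order of 20 units",
]
docs = [set(l.split()) for l in letters]

type_words = {                      # hypothetical message-type word lists
    "order":        {"order", "units", "delivery"},
    "offer":        {"offer", "price", "product"},
    "confirmation": {"confirm", "receipt"},
}

def tfidf(tokens):
    """Weight of each term in one letter: term frequency times idf."""
    tf = Counter(tokens)
    n = len(docs)
    return {t: tf[t] * math.log(n / sum(t in d for d in docs)) for t in tf}

def classify(letter):
    """Score each message type by the summed weight of its cue words."""
    weights = tfidf(letter.split())
    scores = {mt: sum(weights.get(w, 0.0) for w in words)
              for mt, words in type_words.items()}
    return max(scores, key=scores.get)

print(classify(letters[0]))  # -> order
```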

    Energy Efficiency in Machining of Aircraft Components

    High production costs and high material removal rates characterize the manufacturing of aircraft components made of titanium. Due to competitive pressure, the manufacturing processes are highly optimized from an economic perspective, whereas environmental aspects are usually not considered. One example is the recycling of titanium chips: because of process-induced contamination, the chips do not meet the quality required for recycling into high-grade titanium alloys. The components therefore need to be manufactured from primary material, which leads to a poor energy balance. This paper describes a methodology to increase the recycling rate and energy efficiency of the manufacturing process by investigating the parameters of the machining process that influence chip quality, with the aim of raising chip quality to a recyclable level while remaining economically viable. The analysis shows that the recycling rate can be significantly increased through dry cutting, which also brings economic benefits.
    Funding: German Federal Ministry for Economic Affairs and Energy (BMWi)/03ET1174

    Automatic Feature-Based Point Cloud Registration for a Moving Sensor Platform

    The automatic and accurate alignment of multiple point clouds is a basic requirement for an adequate digitization, reconstruction and interpretation of large 3D environments. Thanks to recent technological advancements, modern devices are available that simultaneously capture intensity and range images at high update rates. Such devices can therefore even be used for dynamic scene analysis and for rapid mapping, which is particularly required for environmental applications and disaster management; unfortunately, they also come with severe restrictions. Facing challenges with respect to noisy range measurements, a limited non-ambiguous range, a limited field of view and the occurrence of scene dynamics, the adequate alignment of captured point clouds has to satisfy additional constraints compared to the classical registration of terrestrial laser scanning (TLS) point clouds for describing static scenes. In this paper, we propose a new methodology for point cloud registration which considers such constraints while maintaining the fundamental properties of high accuracy and low computational effort, without relying on a good initial alignment or human interaction. Exploiting 2D image features and 2D/2D correspondences, sparse point clouds of physically almost identical 3D points are derived. Subsequently, these point clouds are aligned with a fast procedure that directly takes into account the reliability of the detected correspondences with respect to geometric and radiometric information. The proposed methodology is evaluated and its performance is demonstrated for data captured with a moving sensor platform designed for monitoring from low altitudes. Due to the provided reliability and a fast processing scheme, the proposed methodology offers a high potential for dynamic scene capture and analysis.
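
    The abstract does not detail the alignment step itself; a standard building block for aligning two sparse point sets with known correspondences is the weighted least-squares rigid transform (the Kabsch/Horn method via SVD), sketched below, with the weights standing in for the geometric and radiometric reliability scores mentioned above.

```python
# Weighted rigid alignment of corresponding 3D point sets via SVD
# (Kabsch/Horn). Finds R, t minimizing sum_i w_i * ||R @ p_i + t - q_i||^2.
import numpy as np

def weighted_rigid_align(P, Q, w):
    w = w / w.sum()
    p_bar = (w[:, None] * P).sum(axis=0)    # weighted centroids
    q_bar = (w[:, None] * Q).sum(axis=0)
    Pc, Qc = P - p_bar, Q - q_bar
    H = (w[:, None] * Pc).T @ Qc            # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_bar - R @ p_bar
    return R, t

# Synthetic check: recover a known rotation and translation.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.5, -1.0, 2.0])
Q = P @ R_true.T + t_true
R, t = weighted_rigid_align(P, Q, np.ones(len(P)))
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```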

    Against Bureaucracy. Why Flexibility and Decentralisation Cannot Solve Organisational Problems

    Kühl S, Dittrich EJ. Against Bureaucracy. Why Flexibility and Decentralisation Cannot Solve Organisational Problems. In: Makó C, Warhurst C, eds. The Management and Organisation of Firms in the Global Context. Budapest: University of Gödöllo; 1999: 119-125.

    Investigations on a standardized process chain and support structure related rework procedures of SLM manufactured components

    For the successful production of high-quality parts by selective laser melting (SLM), various process steps are required. Besides the SLM process itself, different preparatory and rework steps are needed to produce a final component. The first part of this paper therefore presents a concept for a standardized process chain covering the necessary planning and production procedures. For this purpose, the CAD model is enriched with information regarding support structures, the desired surface quality, and the position of tooling points. Since major steps in the rework procedure are the removal of residual powder, the removal of support structures, and the finishing of functional component surfaces, selected experimental results concerning these steps are presented in the second part of the paper. Based on the results, recommendations for the design of support structures are given.
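
    How such an enriched CAD model might be represented is sketched below; all field names and values are illustrative, not the paper's actual data format.

```python
# Hypothetical record attaching planning information (supports, surface
# quality targets, tooling points) to a CAD model for the process chain.
from dataclasses import dataclass, field

@dataclass
class SupportStructure:
    region: str   # CAD face or region identifier
    kind: str     # e.g. "block" or "lattice"
    removal: str  # planned rework step, e.g. "milling" or "wire EDM"

@dataclass
class SlmBuildJob:
    cad_model: str
    supports: list[SupportStructure] = field(default_factory=list)
    surface_quality: dict[str, float] = field(default_factory=dict)  # face -> Ra target in um
    tooling_points: list[tuple[float, float, float]] = field(default_factory=list)

job = SlmBuildJob(
    cad_model="bracket_v3.step",
    supports=[SupportStructure("face_07", "block", "milling")],
    surface_quality={"face_12": 1.6},  # functional surface, Ra 1.6 um
    tooling_points=[(10.0, 0.0, 5.0), (-10.0, 0.0, 5.0)],
)
print(job.supports[0].removal)  # milling
```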

    On the Dynamics of Export and Import Participation of German Manufacturing Enterprises: Empirical Evidence from the Turnover Tax Panel 2001–2006

    In 2008, the cross-sectional datasets of the turnover tax statistics were linked into a panel dataset for the first time, initially for the period 2001 to 2005; since mid-2009, the current version of the turnover tax panel, covering 2001 to 2006, has been available for analysis. This dataset offers the unique possibility of following all enterprises liable to turnover tax in this period over time. Since the data also contain information on the enterprises' export and import activities, the turnover tax panel can be used, among other things, to provide information on the prevalence of export and import activities as well as on the dynamics of export and import participation at the enterprise level. In 2006, a good 20 percent of West German manufacturing enterprises and just under 14 percent of East German manufacturing enterprises show both export and import activities. The share of manufacturing enterprises that neither exported nor imported in 2006 is 59 percent in West Germany and 67 percent in East Germany. An examination of the patterns of export and import participation over the years 2001 to 2006, as well as transition matrices from 2001 to 2006, shows that the vast majority of enterprises do not change their status (neither exporter nor importer, exporter only, importer only, both exporter and importer) over time. Nevertheless, one third of the enterprises contained in the dataset in all years considered changed their status at least once between 2001 and 2006.
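
    Such transition matrices can be computed directly from the panel; below is a minimal sketch assuming pandas and a hypothetical DataFrame `panel` with columns `enterprise_id`, `year`, `exports`, and `imports`.

```python
# Sketch of a 2001 -> 2006 trade-status transition matrix.
# `panel` is a hypothetical DataFrame with one row per enterprise-year.
import pandas as pd

def trade_status(df):
    """Classify each enterprise into the four trade statuses."""
    e, i = df["exports"] > 0, df["imports"] > 0
    labels = ["two-way" if ex and im else "exporter only" if ex
              else "importer only" if im else "neither"
              for ex, im in zip(e, i)]
    return pd.Series(labels, index=df.index)

s2001 = trade_status(panel[panel["year"] == 2001].set_index("enterprise_id"))
s2006 = trade_status(panel[panel["year"] == 2006].set_index("enterprise_id"))

# Restrict to enterprises present in both years, then compute
# row-normalized shares; the diagonal shows the status persistence
# described in the abstract.
both = pd.concat({"y2001": s2001, "y2006": s2006}, axis=1, join="inner")
transition = pd.crosstab(both["y2001"], both["y2006"], normalize="index")
print(transition.round(2))
```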