261 research outputs found
The United States–Latin America Relationship: A Challenge for Obama
The crisis in the relationship between the United States and Latin America, which emerged at the turn of the new millennium, reached its lowest point in recent decades. It was characterized by a lack of trust toward the United States, and it posed a major challenge for President Barack Obama while also representing a fascinating opportunity. According to the article, the old inter-American relations of dependency and subordination faced the possibility of becoming an alliance in which Latin America and the Caribbean would assume, for the first time, a leading role in global governance. ITESO, A.C.
Data mining with the SAP NetWeaver BI accelerator
The new SAP NetWeaver Business Intelligence accelerator is an engine that supports online analytical processing. It performs aggregation in memory and at query runtime over large volumes of structured data. This paper first briefly describes the accelerator and its main architectural features, and cites test results that indicate its power. Then it describes in detail how the accelerator may be used for data mining. The accelerator can perform data mining in the same large repositories of data and using the same compact index structures that it uses for analytical processing. A first such implementation of data mining is described and the results of a performance evaluation are presented. Association rule mining in a distributed architecture was implemented with a variant of the BUC iceberg cubing algorithm. Test results suggest that useful online mining should be possible with wait times of less than 60 seconds on business data that has not been preprocessed.
Data-Centric Determination of Association Rules in Parallel Database Architectures
This thesis addresses the everyday usability of modern mass data processing, in particular the problem of association rule analysis. Available data volumes are growing rapidly, but evaluating them is difficult for inexperienced users. As a result, companies forgo information that is, in principle, available. In such data, association rules reveal dependencies between the elements of a dataset, for example between products sold. These rules can be annotated with interestingness measures that help the user recognize important relationships. Approaches are presented that make it easier for the user to evaluate the data. This concerns both the robust operation of the methods and the simple evaluation of the rules. The algorithms presented adapt to the data being processed, which distinguishes them from other methods.
Association rule mining requires the extraction of frequent combinations (abbreviated EHK, from the German "Extraktion häufiger Kombinationen"). To this end, ways are shown to adapt solution approaches to the properties of modern systems. As one approach, methods for computing the most frequent combinations are explained which, unlike known approaches, are easy to configure. Modern systems also often compute in a distributed fashion. Such clusters can process large data volumes in parallel, but they require local results to be merged. For distributed top-n EHK on realistic partitionings, approaches with different properties are presented.
Association rules are formed from the frequent combinations, and preparing them for evaluation should likewise be simple. Many measures have been proposed in the literature. Depending on the requirements, each corresponds to a subjective assessment, though not necessarily that of the user. The thesis therefore investigates how several interestingness measures can be combined into one global measure. This finds rules that appeared important under several measures. The user can narrow down the search goal using these suggestions. A second approach groups rules. Grouping is based on the frequencies of the rule elements, which form the basis of interestingness measures. The rules in such a group are therefore similar with respect to many interestingness measures and can be evaluated together. This reduces the user's manual effort.
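As a rough illustration of combining several interestingness measures into one global ranking, the sketch below computes support, confidence, and lift from raw counts and orders rules by their summed rank across the three measures. The choice of measures, the rank-sum aggregation, and the names `rule_measures` and `global_ranking` are assumptions made for this example; the abstract does not say how the thesis performs the combination.

```python
def rule_measures(n, n_a, n_b, n_ab):
    """Support, confidence and lift of a rule A -> B from raw counts.

    n: total transactions, n_a / n_b: transactions containing A / B,
    n_ab: transactions containing both A and B.
    """
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return {"support": support, "confidence": confidence, "lift": lift}


def global_ranking(rules, n):
    """Order rule names by summed rank over all measures (assumed scheme).

    rules maps a rule name to its counts (n_a, n_b, n_ab).
    A lower rank sum means the rule looked important under several measures.
    """
    scored = {name: rule_measures(n, *counts) for name, counts in rules.items()}
    names = list(scored)
    rank_sum = dict.fromkeys(names, 0)
    for measure in ("support", "confidence", "lift"):
        ordered = sorted(names, key=lambda r: scored[r][measure], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            rank_sum[name] += rank
    return sorted(names, key=lambda r: rank_sum[r])


# Hypothetical counts over 100 transactions.
example_rules = {
    "A->B": (50, 60, 40),
    "C->D": (10, 20, 9),
    "E->F": (80, 90, 70),
}
ranking = global_ranking(example_rules, n=100)
```

A rank-based combination avoids having to normalize measures with different value ranges, although it discards the size of the differences between rules.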
This thesis shows ways to extend association rule mining to a broad range of users and to reach new users. Association rule mining is simplified to the point that it can be used as an easy-to-use data analysis tool rather than as a specialist application.
The importance of data mining is widely acknowledged today. Mining for association rules and frequent patterns is a central activity in data mining. Three main strategies are available for such mining: APRIORI, FP-tree-based approaches like FP-GROWTH, and algorithms based on vertical data structures and depth-first mining strategies like ECLAT and CHARM.
Unfortunately, most of these algorithms are only moderately suitable for many “real-world” scenarios because their usability and the special characteristics of the data are two aspects of practical association rule mining that require further work.
All mining strategies for frequent patterns use a parameter called minimum support to define a minimum occurrence frequency for searched patterns. This parameter cuts down the number of patterns searched to improve the relevance of the results. In complex business scenarios, it can be difficult and expensive to define a suitable value for the minimum support because it depends strongly on the particular datasets. Users are often unable to set this parameter for unknown datasets, and unsuitable minimum-support values can extract millions of frequent patterns and generate enormous runtimes. For this reason, it is not feasible to permit ad-hoc data mining by unskilled users. Such users do not have the knowledge and time to define suitable parameters by trial-and-error procedures. Discussions with users of SAP software have revealed great interest in the results of association-rule mining techniques, but most of these users are unable or unwilling to set very technical parameters. Given such user constraints, several studies have addressed the problem of replacing the minimum-support parameter with more intuitive top-n strategies.
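To make the top-n idea concrete, here is a minimal sketch in which the user supplies only n and the support threshold is raised automatically as better patterns are found. It uses an ECLAT-style depth-first search over transaction-id sets. The function name `top_n_frequent_itemsets` and the sample data are hypothetical, and this simple version is exact (ties at the cut-off are broken arbitrarily); it is not the adaptive algorithm the abstract goes on to describe.

```python
import heapq
from itertools import count


def top_n_frequent_itemsets(transactions, n):
    """Return the n most frequent itemsets as (itemset, support) pairs."""
    if n <= 0:
        return []
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    heap = []           # min-heap of (support, tiebreak, itemset)
    tiebreak = count()  # unique counter so itemsets are never compared

    def threshold():
        # Until n patterns are known, any non-empty pattern qualifies.
        return heap[0][0] if len(heap) == n else 1

    def offer(itemset, support):
        entry = (support, next(tiebreak), itemset)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif support > heap[0][0]:
            heapq.heapreplace(heap, entry)

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < threshold():
                continue  # supersets cannot have higher support
            itemset = prefix + (item,)
            offer(itemset, len(tids))
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                joint = tids & other_tids
                if len(joint) >= threshold():
                    suffix.append((other, joint))
            extend(itemset, suffix)

    # Start with the most frequent items so the threshold rises quickly.
    singles = sorted(tidsets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    extend((), singles)
    return sorted(((itemset, s) for s, _, itemset in heap),
                  key=lambda pair: (-pair[1], pair[0]))


baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
top2 = top_n_frequent_itemsets(baskets, 2)
```

Because the threshold only ever increases, pruning a branch once its support falls below the current n-th best count is safe.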
We have developed an adaptive mining algorithm to give untrained SAP users a tool to analyze their data easily without the need for elaborate data preparation and parameter determination. Previously implemented approaches of distributed frequent-pattern mining were expensive and time-consuming tasks for specialists. In contrast, we propose a method to accelerate and simplify the mining process by using top-n strategies and relaxing some requirements on the results, such as completeness. Unlike such data approximation techniques as sampling, our algorithm always returns exact frequency counts. The only drawback is that the result set may fail to include some of the patterns up to a specific frequency threshold.
Another aspect of real-world datasets is the fact that they are often partitioned for shared-nothing architectures, following business-specific parameters like location, fiscal year, or branch office. Users may also want to conduct mining operations spanning data from different partners, even if the local data from the respective partners cannot be integrated at a single location for data security reasons or due to their large volume.
Almost every data mining solution is constrained by the need to hide complexity. As far as possible, the solution should offer a simple user interface that hides technical aspects like data distribution and data preparation. Given that BW Accelerator users have such simplicity and distribution requirements, we have developed an adaptive mining algorithm to give unskilled users a tool to analyze their data easily, without the need for complex data preparation or consolidation.
For example, Business Intelligence scenarios often partition large data volumes by fiscal year to enable efficient optimizations for the data used in actual workloads. For most mining queries, more than one data partition is of interest, and therefore, distribution handling that leaves the data unaffected is necessary.
The algorithms presented in this paper have been developed to work with data stored in SAP BW. A salient feature of SAP BW Accelerator is that it is implemented as a distributed landscape that sits on top of a large number of shared-nothing blade servers. Its main task is to execute OLAP queries that require fast aggregation of many millions of rows of data. Therefore, the distribution of data over the dedicated storage is optimized for such workloads. Data mining scenarios use the same data from storage, but reporting takes precedence over data mining, and hence, the data cannot be redistributed without massive costs. Distribution by special data semantics or user-defined selections can produce many partitions and very different partition sizes. The handling of such real-world distributions for frequent-pattern mining is an important task, but it conflicts with the requirement of balanced partition
Robust Real-time Query Processing with QStream
Processing data streams with Quality-of-Service (QoS) guarantees is an emerging area in existing streaming applications. Although it is possible to negotiate the result quality and to reserve the required processing resources in advance, it remains a challenge to adapt the DSMS to data stream characteristics that are not known in advance or are difficult to obtain. In this paper, we present the second generation of our QStream DSMS, which addresses this challenge by using a real-time capable operating system environment for resource reservation and by applying an adaptation mechanism if the data stream characteristics change spontaneously.
Robust and distributed top-n frequent-pattern mining with SAP BW accelerator
Mining for association rules and frequent patterns is a central activity in data mining. However, most existing algorithms are only moderately suitable for real-world scenarios. Most strategies use parameters like minimum support, for which it can be very difficult to define a suitable value for unknown datasets. Since most untrained users are unable or unwilling to set such technical parameters, we address the problem of replacing the minimum-support parameter with top-n strategies. In our paper, we start by extending a top-n implementation of the ECLAT algorithm to improve its performance by using heuristic search strategy optimizations. Also, real-world datasets are often distributed and modern database architectures are switching from expensive SMPs to cheaper shared-nothing blade servers. Thus, most mining queries require distribution handling. Since partitioning can be forced by user-defined semantics, it is often forbidden to transform the data. Therefore, we developed an adaptive top-n frequent-pattern mining algorithm that simplifies the mining process on real distributions by relaxing some requirements on the results. We first combine the PARTITION and the TPUT algorithms to handle distributed top-n frequent-pattern mining. Then, we extend this new algorithm for distributions with real-world data characteristics. For frequent-pattern mining algorithms, equal distributions are important conditions, and tiny partitions can cause performance bottlenecks. Hence, we implemented an approach called MAST that defines a minimum absolute-support threshold. MAST prunes patterns with low chances of reaching the global top-n result set and high computing costs. In total, our approach simplifies the process of frequent-pattern mining for real customer scenarios and data sets. This may make frequent-pattern mining accessible for very new user groups. 
Finally, we present results of our algorithms when run on the SAP NetWeaver BW Accelerator with standard and real business datasets.
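The distributed part can be pictured with the plain TPUT (three-phase uniform threshold) scheme, sketched below on per-partition pattern counts. This is only the generic TPUT idea, not the authors' combination with PARTITION or their MAST extension. It assumes each partition has already produced a dictionary of local pattern counts, and the name `tput_top_n` and the sample data are hypothetical.

```python
from collections import defaultdict


def kth_largest(values, k):
    """k-th largest value, or 0 if fewer than k values exist."""
    ordered = sorted(values, reverse=True)
    return ordered[k - 1] if len(ordered) >= k else 0


def tput_top_n(partitions, n):
    """Exact global top-n patterns from per-partition count dictionaries."""
    m = len(partitions)
    seen = defaultdict(dict)  # pattern -> {partition index: local count}

    # Phase 1: every partition reports its local top-n patterns.
    for i, local in enumerate(partitions):
        best = sorted(local.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for pattern, c in best:
            seen[pattern][i] = c
    tau1 = kth_largest([sum(v.values()) for v in seen.values()], n)
    uniform = tau1 / m  # any true top-n pattern reaches this on some partition

    # Phase 2: every partition reports all patterns with count >= uniform.
    for i, local in enumerate(partitions):
        for pattern, c in local.items():
            if c >= uniform:
                seen[pattern][i] = c
    tau2 = kth_largest([sum(v.values()) for v in seen.values()], n)

    # Unreported local counts are below `uniform`, which bounds the total.
    candidates = [p for p, v in seen.items()
                  if sum(v.values()) + (m - len(v)) * uniform >= tau2]

    # Phase 3: fetch exact counts for the remaining candidates only.
    exact = {p: sum(local.get(p, 0) for local in partitions)
             for p in candidates}
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


parts = [
    {"ab": 5, "ac": 4, "bc": 1},
    {"ab": 1, "ac": 4, "bc": 4, "cd": 3},
    {"ab": 4, "bc": 4, "cd": 1},
]
global_top2 = tput_top_n(parts, 2)
```

The uniform threshold treats all partitions alike, which hints at why very small or skewed partitions, as mentioned in the abstract, can be a problem for this kind of scheme.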
Real-time Scheduling for Data Stream Management Systems
Quality-aware management of data streams is gaining more and more importance as the amount of data produced by streams grows continuously. The resources required for data stream processing depend on different factors and are limited by the environment of the data stream management system (DSMS). Thus, with a potentially unbounded amount of stream data and limited processing resources, some of the data stream processing tasks (originating from different users) may not be answered satisfactorily, and therefore users should be able to negotiate a certain quality for the execution of their stream processing tasks. After the negotiation process, it is the responsibility of the DSMS to meet the quality constraints by using adequate resource reservation and scheduling techniques. In this paper, we consider different aspects of real-time scheduling for operations within a DSMS. We propose a scheduling concept that enables us to meet certain time-dependent quality-of-service requirements for user-given processing tasks. Furthermore, we describe the implementation of our scheduling concept within a real-time capable data stream management system, and we give experimental results for it.
Regional Governance of COVID-19 in the European Union and Latin America and the Caribbean
The text explores the regional responses in Europe and Latin America/Caribbean to the public health challenge posed by the COVID-19 virus. To this end, we identify five regional governance challenges and apply them to the management of the pandemic: information and knowledge, norms and principles, public policies, institutions, and material resources. From a comparative perspective, we analyse in two separate sections whether regional cooperation in the EU and Latin America/Caribbean increased (or not) in these areas. Finally, in a third section, we highlight similarities and differences and explain the causes behind the diverse trajectories of collective pandemic management in both regions.
Interleaving with Coroutines: A Practical Approach for Robust Index Joins
Index join performance is determined by the efficiency of the lookup operation on the involved index. Although database indexes are highly optimized to leverage processor caches, main memory accesses inevitably increase lookup runtime when the index outsizes the last-level cache; hence, index join performance drops. Still, robust index join performance becomes possible with instruction stream interleaving: given a group of lookups, we can hide cache misses in one lookup with instructions from other lookups by switching among their respective instruction streams upon a cache miss. In this paper, we propose interleaving with coroutines for any type of index join. We showcase our proposal on SAP HANA by implementing binary search and CSB+-tree traversal for an instance of index join related to dictionary compression. Coroutine implementations not only perform similarly to prior interleaving techniques, but also resemble the original code closely, while supporting both interleaved and non-interleaved execution. Thus, we claim that coroutines make interleaving practical for use in real DBMS codebases.
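The control flow of interleaved lookups can be sketched with Python generators standing in for coroutines. Each binary search suspends just before it would touch memory, and a round-robin scheduler resumes the other lookups in the group in the meantime. This shows only the structure: Python gives no control over prefetching or cache misses, and the names `binary_search_coro`, `run_interleaved`, and `run_sequential` are hypothetical, not taken from the paper or from SAP HANA.

```python
from collections import deque


def binary_search_coro(sorted_values, key):
    """Lower-bound binary search that suspends before each probe."""
    low, high = 0, len(sorted_values)
    while low < high:
        mid = (low + high) // 2
        # Native code would issue a prefetch for sorted_values[mid] here
        # and suspend, so other lookups run while the cache line loads.
        yield
        if sorted_values[mid] < key:
            low = mid + 1
        else:
            high = mid
    return low


def run_interleaved(sorted_values, keys, group_size=4):
    """Run lookups in groups, switching coroutine at every suspension."""
    results = [None] * len(keys)
    pending = deque(enumerate(keys))
    active = deque()
    while pending or active:
        while pending and len(active) < group_size:
            idx, key = pending.popleft()
            active.append((idx, binary_search_coro(sorted_values, key)))
        idx, coro = active.popleft()
        try:
            next(coro)                  # resume until the next probe
            active.append((idx, coro))  # not finished: back of the queue
        except StopIteration as done:
            results[idx] = done.value   # finished: record the position
    return results


def run_sequential(sorted_values, keys):
    """Non-interleaved execution of the same coroutine, one key at a time."""
    return run_interleaved(sorted_values, keys, group_size=1)


index = [1, 3, 5, 7, 9, 11]
positions = run_interleaved(index, [7, 0, 12, 4])
```

The same coroutine body serves both execution modes, which mirrors the abstract's point that coroutine code stays close to the original lookup code.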
Bridging the Latency Gap between NVM and DRAM for Latency-bound Operations
Non-Volatile Memory (NVM) technologies exhibit 4× the read access latency of conventional DRAM. When the working set does not fit in the processor cache, this latency gap between DRAM and NVM leads to more than 2× runtime increase for queries dominated by latency-bound operations such as index joins and tuple reconstruction. We explain how to easily hide NVM latency by interleaving the execution of parallel work in index joins and tuple reconstruction using coroutines. Our evaluation shows that interleaving applied to the non-trivial implementations of these two operations in a production-grade codebase accelerates end-to-end query runtimes on both NVM and DRAM by up to 1.7× and 2.6× respectively, thereby reducing the performance difference between DRAM and NVM by more than 60%.
Biased Signaling of CCL21 and CCL19 Does Not Rely on N-Terminal Differences, but Markedly on the Chemokine Core Domains and Extracellular Loop 2 of CCR7
Chemokine receptors play important roles in the immune system and are linked to several human diseases. Targeting chemokine receptors has so far shown very little success owing, to some extent, to the promiscuity of the immune system and the high degree of biased signaling within it. CCR7 and its two endogenous ligands display biased signaling, and here we investigate the differences between the two ligands, CCL21 and CCL19, with respect to their biased activation of CCR7. We use bystander bioluminescence resonance energy transfer (BRET) based signaling assays and Transwell migration assays to determine (A) how swapping of domains between the two ligands affects their signaling patterns and (B) how receptor mutagenesis impacts signaling. Using chimeric ligands, we find that the chemokine core domains are central for determining signaling outcome, as the lack of β-arrestin-2 recruitment displayed by CCL21 is linked to its core domain and not its N-terminus. Through a mutagenesis screen, we identify the extracellular domains of CCR7 to be important for both ligands and show that the two chemokines interact differentially with extracellular loop 2 (ECL-2). By using in silico modeling, we propose a link between ECL-2 interaction and CCR7 signal transduction. Our mutagenesis study also suggests a lysine at the top of TM3, K130^3.26, to be important for G protein signaling, but not β-arrestin-2 recruitment. Taken together, the bias in CCR7 between CCL19 and CCL21 relies on the chemokine core domains, where interactions with ECL-2 seem particularly important. Moreover, TM3 selectively regulates G protein signaling, as found for other chemokine receptors.