International Research Workshop on

Advanced High Performance Computing Systems

 

Cetraro (Italy) – June 27-29, 2011

 

Workshop Agenda

 

Monday, June 27th

9:00 – 9:15     Welcome Address

Session I – Grid and Cloud Computing 1

9:15 – 9:45     V. GETOV: Smart Cloud Computing: Autonomy, Intelligence and Adaptation
9:45 – 10:15    P. MARTIN: Provisioning Data-Intensive Workloads in the Cloud
10:15 – 10:45   J.L. LUCAS: A Multi-cloud Management Architecture and Early Experiences
10:45 – 11:15   COFFEE BREAK
11:15 – 11:45   D. PETCU: How is built a mosaic of Clouds
11:45 – 12:15   M. KUNZE: Towards High Performance Cloud Computing (HPCC)
12:15 – 12:45   R. MAINIERI: The future of cloud computing and its impact on transforming industries
12:45 – 13:00   CONCLUDING REMARKS

Session II – Grid and Cloud Computing 2

17:00 – 17:30   B. SOTOMAYOR: Reliable File Transfers with Globus Online
17:30 – 18:00   K. MIURA: RENKEI: A Light-weight Grid Middleware for e-Science Community
18:00 – 18:30   COFFEE BREAK
18:30 – 19:00   P. KACSUK: Supporting Scientific and Web-2 Communities by Desktop Grids
19:00 – 19:30   L. LEFEVRE: Energy efficiency from networks to large scale distributed systems
19:30 – 20:00   M. STILLWELL: Dynamic Fractional Resource Scheduling
20:00 – 20:10   CONCLUDING REMARKS

 

 

Tuesday, June 28th

Session III – New Heterogeneous Architectures and Software for HPC 1

9:00 – 9:30     E. D’HOLLANDER: High-performance Computing for low-power systems
9:30 – 10:00    L. SOUSA: Distributed Computing on Highly Heterogeneous Systems
10:00 – 10:30   F. PINEL: Utilizing GPUs to Solve Large Instances of the Tasks Mapping Problem
10:30 – 11:00   R. ANSALONI: Cray’s Approach to Heterogeneous Computing
11:00 – 11:30   COFFEE BREAK
11:30 – 12:00   C. PEREZ: Resource management system for complex and non-predictably evolving applications
12:00 – 12:30   G. BOSILCA: Flexible Development of Dense Linear Algebra Algorithms on Heterogeneous Parallel Architectures with DAGuE
12:30 – 13:00   M. SHEIKHALISHAHI: Resource Management and Green Computing
13:00 – 13:15   CONCLUDING REMARKS

Session IV – New Heterogeneous Architectures and Software for HPC 2

17:00 – 17:30   A. SHAFARENKO: New-Age Component Linking: Compilers Must Speak Constraints
17:30 – 18:00   A. BENOIT: Energy-aware mappings of series-parallel workflows onto chip multiprocessors
18:00 – 18:30   COFFEE BREAK
18:30 – 19:00   H. KAISER: ParalleX - A Cure for Scaling-Impaired Parallel Applications
19:00 – 19:30   L. MIRTAHERI: An Algebraic Model for a Runtime High Performance Computing Systems Reconfiguration
19:30 – 19:45   CONCLUDING REMARKS

 

 

Wednesday, June 29th

Session V – Advanced software issues for top scale HPC

9:00 – 9:30     E. LAURE: CRESTA - Collaborative Research into Exascale Systemware, Tools and Applications
9:30 – 10:00    T. LIPPERT: Amdahl hits the Exascale
10:00 – 10:30   S. DOSANJH: On the Path to Exascale
10:30 – 11:00   C. SIMMENDINGER: Petascale in CFD
11:00 – 11:30   COFFEE BREAK

Session VI – Advanced Infrastructures, Projects and Applications

11:30 – 13:00   PANEL DISCUSSION: Exascale Computing: from utopia to reality
13:00 – 13:15   CONCLUDING REMARKS

 

 

 

 

INVITED SPEAKERS

 

R. Ansaloni – Cray Italy (Italy)
A. Benoit – ENS Lyon and Institut Universitaire de France (France)
G. Bosilca – University of Tennessee (USA)
E. D’Hollander – Ghent University (Belgium)
S. Dosanjh – Sandia National Laboratories (USA)
V. Getov – School of Electronics and Computer Science, University of Westminster (United Kingdom)
P. Kacsuk – MTA SZTAKI (Hungary)
H. Kaiser – Center for Computation and Technology (CCT), Louisiana State University (USA)
M. Kunze – Karlsruhe Institute of Technology (Germany)
E. Laure – Royal Institute of Technology Stockholm (Sweden)
L. Lefevre – INRIA RESO – LIP (France)
T. Lippert – Juelich Supercomputing Centre (Germany)
J.L. Lucas – Complutense University of Madrid (Spain)
R. Mainieri – IBM Italy (Italy)
P. Martin – Queen’s University, Kingston, Ontario (Canada)
L. Mirtaheri – National Technical University, Teheran (Iran)
K. Miura – Center for Grid Research and Development, National Institute of Informatics (Japan)
C. Perez – INRIA – LIP (France)
D. Petcu – Research Institute e-Austria Timisoara and West University of Timisoara (Romania)
F. Pinel – University of Luxembourg (Luxembourg)
A. Shafarenko – University of Hertfordshire (United Kingdom)
M. Sheikhalishahi – University of Calabria (Italy)
C. Simmendinger – T-Systems Solutions for Research GmbH (Germany)
B. Sotomayor – Computation Institute, University of Chicago (USA)
L. Sousa – INESC and TU Lisbon (Portugal)
M. Stillwell – INRIA, University of Lyon, LIP Laboratory (France)

 

 

 

ABSTRACTS

 

R. Ansaloni

Cray’s Approach to Heterogeneous Computing

 

There seems to be a general consensus in the HPC community that exascale performance cannot be reached with systems based only on multi-core chips. Heterogeneous nodes, where the traditional CPU is combined with many-core accelerators, have the potential to provide a much more energy-efficient solution capable of overcoming the power consumption challenge.

However, this emerging hybrid node architecture is expected to pose significant challenges for application developers, who must program these systems efficiently in order to achieve a significant fraction of the available peak performance.

This is certainly the case for today’s GPU-based accelerators with separate memory space, but it also holds true for future unified nodes with CPU and many-core accelerator on chip sharing common memory.

In this talk I’ll describe Cray’s approach to heterogeneous computing and the first Cray hybrid supercomputing system with its unified programming environment.

I’ll also describe Cray’s proposal to extend the OpenMP standard to support a wide range of accelerators.

 

 

A. Benoit

Energy-aware mappings of series-parallel workflows onto chip multiprocessors

 

In this talk, we will study the problem of mapping streaming applications that can be modelled by a series-parallel graph onto a 2-dimensional tiled CMP architecture. The objective of the mapping is to minimize the energy consumption, using dynamic voltage scaling techniques, while maintaining a given level of performance, reflected by the rate of processing the data streams. This mapping problem turns out to be NP-hard, but we identify simpler instances whose optimal solution can be computed by a dynamic programming algorithm in polynomial time. Several heuristics are proposed to tackle the general problem, building upon the theoretical results. Finally, we assess the performance of the heuristics through a set of comprehensive simulations.
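For intuition, a minimal sketch of the dynamic voltage scaling cost model commonly used in this line of work (stated here as background; the talk may use a refined variant with static power or communication costs): if processor i executes an amount of work w_i per data set at speed s_i, and the stream must be processed at rate \rho, then

\[
E_{\text{per data set}} \;=\; \sum_i \frac{w_i}{s_i}\, s_i^{\alpha} \;=\; \sum_i w_i\, s_i^{\alpha-1},
\qquad \text{subject to} \quad \max_i \frac{w_i}{s_i} \;\le\; \frac{1}{\rho},
\]

with \alpha typically close to 3, so lowering speeds saves energy until the throughput constraint becomes tight.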

 

 

G. Bosilca

Flexible Development of Dense Linear Algebra Algorithms on Heterogeneous Parallel Architectures with DAGuE

 

In the context of dense linear algebra, developing algorithms that seamlessly scale to thousands of cores can be achieved using DPLASMA (Distributed PLASMA). DPLASMA takes advantage of a novel generic distributed Directed Acyclic Graph Engine (DAGuE). The engine has been designed for fine-granularity tasks and thus enables scaling of tile algorithms, originating in PLASMA, on large distributed-memory systems. The underlying DAGuE framework has many appealing features when considering distributed-memory platforms with heterogeneous multicore nodes: a DAG representation that is independent of the problem size, automatic extraction of the communication from the dependencies, overlapping of communication and computation, task prioritization, and architecture-aware scheduling and management of tasks.

 

 

E. D’Hollander

High-performance computing for low-power systems

 

Intelligent low-power devices such as portable phones, tablet computers, embedded systems and sensor networks require low-power solutions for high-performance applications. GPUs have a highly parallel multithreaded architecture and an efficient programming model, but are power-hungry. On the other hand, field-programmable gate arrays (FPGAs) have a highly configurable parallel architecture and substantially better energy efficiency, but are difficult to program. An approach is presented which maps the GPU architecture and programming model onto the configuration synthesis and the programming of FPGAs. Implementation details, benefits and trade-offs are discussed. In particular, the architecture, memory and communication issues are addressed when porting a biomedical image application with a 20-fold GPU speedup onto an FPGA accelerator.

 

 

S. Dosanjh

On the path to Exascale

This presentation will describe technical and programmatic progress in Exascale computing. The Exascale Initiative was included in the U.S. Department of Energy’s budget starting in the U.S. Government’s 2012 fiscal year. Several partnerships are forming, a number of projects have already been funded and several co-design centers are being planned. These co-design centers will develop applications for Exascale systems and will provide feedback to computer companies on the impact of computer architecture changes on application performance. An enabling technology for these efforts is the Structural Simulation Toolkit (SST), which allows hardware/software co-simulation. Another key aspect of this work is the development of mini-applications. One difficulty of co-design in high performance computing (HPC) is the complexity of HPC applications, many of which have millions of lines of code. Mini-applications, which are typically one thousand lines of code, have the potential to reduce the complexity of co-design by a factor of one thousand. Mini-applications representative of finite elements, molecular dynamics, contact algorithms, and shock physics are described. The performance of these mini-applications on different computer systems is compared to the performance of the full application.

 

 

 

V. Getov

Smart Cloud Computing: Autonomy, Intelligence and Adaptation

 

In recent years, cloud computing has rapidly emerged as a widely accepted computing paradigm, appearing shortly after the introduction of the “invisible” grid concepts. The research and development community has quickly reached consensus on the core cloud properties such as on-demand computing resources, elastic scaling, elimination of up-front capital and operational expenses, and the establishment of a pay-as-you-go business model for computing and information technology services. With the widespread adoption of virtualization, service-oriented architectures, and utility computing, there is also consensus on the enabling technologies necessary to support this new consumption and delivery model for information technology services. Additionally, the need to meet quality-of-service requirements and service-level agreements, including security, is well understood. Important limitations of current cloud computing systems include the lack of sufficient autonomy and intelligence based on dynamic non-functional properties. Such properties, together with support for adaptation, can completely change the quality of the computerised services provided by future cloud systems. In this presentation, we plan to address these issues and demonstrate the significant advantages that smart cloud computing platforms provide to their users. Some of the available directions for future work are also discussed.

 

 

 

P. Kacsuk

Supporting scientific and Web-2 communities by desktop grids

 

Although the natures of scientific and Web-2 communities are different, they both require more and more processing power to run compute-intensive applications for the sake of their members. Scientific communities typically need to run large parameter-sweep simulations that are ideal for both volunteer and institutional desktop grids. Web-2 communities use community portals like Facebook through which they organize their social relationships and activities. Such activities can also include time-consuming processing, like watermarking the photos of community members.

Both kinds of communities prefer to use affordable distributed infrastructures in order to minimize the processing cost. Such a low-cost infrastructure could be a volunteer or institutional desktop grid. The EU EDGI project developed technology and infrastructure to support scientific communities by desktop grids, while the Web2Grid Hungarian national project provides desktop grid technology and the corresponding business model for Web-2 communities.

The talk will discuss the main characteristics of such desktop grid support and also show the major architectural components of the supporting architecture. The application areas and the possible business models of using volunteer desktops will also be addressed in the talk.

 

 

 

H. Kaiser

ParalleX – A Cure for Scaling-Impaired Parallel Applications

 

High Performance Computing is experiencing a phase change with the challenges of programming and managing heterogeneous multicore system architectures and large-scale system configurations. It is estimated that by the end of this decade Exaflops computing systems, requiring hundreds of millions of cores and demanding multi-billion-way parallelism within a power budget of 50 Gflops/watt, may emerge. At the same time, there are many scaling-challenged applications that, although taking many weeks to complete, cannot scale even to a thousand cores using conventional distributed programming models. This talk describes an experimental methodology, ParalleX, that addresses these challenges through a change in the fundamental model of parallel computation from that of communicating sequential processes (e.g. MPI) to an innovative synthesis of concepts involving message-driven work-queue computation in the context of a global address space. We will present early but promising results of tests using a proof-of-concept runtime system implementation guiding future work towards full-scale parallel programming.
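As a toy illustration of the message-driven, future-based style the talk contrasts with lock-step message passing, the following sketch uses only standard Python concurrency primitives; it is not the ParalleX/HPX API, and grind and combine are made-up placeholders.

    # Illustrative only: data-driven composition of work via futures, with no
    # global barrier or explicit message exchange between the pieces of work.
    from concurrent.futures import ThreadPoolExecutor

    def grind(n):
        # stand-in for a chunk of real computation
        return sum(i * i for i in range(n))

    def combine(a, b):
        return a + b

    with ThreadPoolExecutor(max_workers=4) as pool:
        # fire off independent work; nothing blocks yet
        fa = pool.submit(grind, 100_000)
        fb = pool.submit(grind, 200_000)
        # the dependent task waits only on the futures it needs,
        # overlapping with any other outstanding work in the queue
        fc = pool.submit(lambda: combine(fa.result(), fb.result()))
        print(fc.result())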

 

 

 

M. Kunze

Towards High Performance Cloud Computing (HPCC)

 

Today’s HPC clusters are typically operated and administered by a single organization. Demand fluctuates, however, resulting in periods where dedicated resources are either underutilized or overloaded. A cloud-based Infrastructure-as-a-Service (IaaS) model for HPC promises cost savings and more flexibility, as it allows users to move away from physically owned and potentially underutilized HPC clusters to virtualized and elastic HPC resources available on demand from large, consolidated cloud computing providers.

The talk discusses specific issues of the introduction of a resource virtualization layer in HPC environments such as latency, jitter and performance.

 

 

 

E. Laure

CRESTA - Collaborative Research into Exascale Systemware, Tools and Applications

 

For the past thirty years, the need for ever greater supercomputer performance has driven the development of many computing technologies which have subsequently been exploited in the mass market. Delivering an exaflop (or 10^18 calculations per second) by the end of this decade is the challenge that the supercomputing community worldwide has set itself. The Collaborative Research into Exascale Systemware, Tools and Applications project (CRESTA) brings together four of Europe’s leading supercomputing centres, with one of the world’s major equipment vendors, two of Europe’s leading programming tools providers and six application and problem owners to explore how the exaflop challenge can be met.

CRESTA focuses on the use of six applications with exascale potential and uses them as co-design vehicles to develop: the development environment, algorithms and libraries, user tools, and the underpinning and cross-cutting technologies required to support the execution of applications at the exascale. The applications represented in CRESTA have been chosen as a representative sample from across the supercomputing domain including: biomolecular systems, fusion energy, the virtual physiological human, numerical weather prediction and engineering.

No one organisation, be it a hardware or software vendor or a service provider, can deliver the necessary range of technological innovations required to enable computing at the exascale. This is recognised through the on-going work of the International Exascale Software Project and, in Europe, the European Exascale Software Initiative. CRESTA will actively engage with European and International collaborative activities to ensure that Europe plays its full role worldwide. Over its 36 month duration the project will deliver key, exploitable technologies that will allow the co-design applications to successfully execute on multi-petaflop systems in preparation for the first exascale systems towards the end of this decade.

In this talk we will give an overview of CRESTA, outline the challenges we face in reaching exascale performance and how CRESTA intends to respond to them.

 

 

L. Lefevre

Energy efficiency from networks to large scale distributed systems

 

Energy efficiency is beginning to be widely addressed for distributed systems such as Grids, Clouds and networks. These large-scale distributed systems need an ever-increasing amount of energy and urgently require effective and scalable solutions to manage and limit their electrical consumption.

The challenge is to coordinate all low-level improvements at the middleware level to improve the energy efficiency of the overall system. Resource-management solutions can indeed benefit from a broader view to pool resources and to share them according to the needs of each user. During this talk, I will describe some solutions adopted for large-scale monitoring of distributed infrastructures, and I will present our work on energy-efficient approaches for reservation-based large-scale distributed systems. In particular, I will present the ERIDIS model, an Energy-efficient Reservation Infrastructure for large-scale DIstributed Systems, which provides a unified and generic framework to manage resources from Grids, Clouds and dedicated networks in an energy-efficient way.

 

 

T. Lippert

Amdahl hits the Exascale

 

With the advent of Petascale supercomputers, the scalability of scientific application codes on such systems becomes a most pressing issue. The current world record holder as far as the number of concurrent cores is concerned, the IBM Blue Gene/P system "JUGENE" at the Jülich Supercomputing Centre with 294,912 cores, will soon be displaced by systems comprising millions of cores. In this talk I am going to review the constraints put on scalability by Amdahl’s and Gustafson’s Laws. I will propose architectural concepts that are optimized for the concurrency hierarchies of application codes and give a glimpse of the DEEP Exascale supercomputer project, to be funded by the European Community, which explicitly addresses concurrency hierarchies at the hardware, system software and application software levels.
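For reference, the two laws in question, with f the serial fraction of the code and N the number of cores:

\[
S_{\text{Amdahl}}(N) \;=\; \frac{1}{f + \dfrac{1-f}{N}},
\qquad
S_{\text{Gustafson}}(N) \;=\; f + (1-f)\,N .
\]

Amdahl’s law bounds the speedup of a fixed-size problem, while Gustafson’s law describes the scaled speedup obtained when the problem size grows with the machine.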

 

 

 

J.L. Lucas

A Multi-cloud Management Architecture and Early Experiences

 

In this talk we present a cloud broker architecture for deploying virtual infrastructures across multiple IaaS clouds. We analyse the main challenges of the brokering problem in multi-cloud environments, and we propose different scheduling policies, based on several criteria, that can guide the brokering decisions. Moreover, we present some preliminary results that show the benefits of this broker architecture for the execution of virtualized computing clusters in multi-cloud environments.

 

 

 

R. Mainieri

The future of cloud computing and its impact on transforming industries

 

In its centennial year, IBM demonstrated that long-term success requires vision, strategy and managing for the long term: deciding how and where to invest and allocate resources, shaping talent development, and taking decisive action. Three years ago IBM started talking about a smarter planet and how it was driving innovation across industries. On a Smarter Planet, successful companies think differently about computing and build IT infrastructure that is designed for data, tuned to the task, and managed in the cloud.

The talk will illustrate IBM’s cloud computing vision, strategy and management plan for the long term: smarter computing for a smarter planet. It will discuss the resources invested in research and development, present the most important global projects, and show how specific actions, such as laboratories around the world, new cloud data centers, software company acquisitions and the fostering of open standards, are going to lead a sustainable transformation in specific industries.

 

 

P. Martin

Provisioning Data-Intensive Workloads in the Cloud

 

Data-intensive workloads involve significant amounts of data access. The individual requests composing these workloads can vary from complex processing of large data sets, such as in business analytics and OLAP workloads, to small transactions randomly accessing individual records within the large data sets, such as in OLTP workloads. In the cloud, applications generating these workloads may be built on different frameworks from shared-nothing database management systems to MapReduce or even some mix of the two. We believe that effective provisioning methods for data-intensive workloads in the cloud must consider where to place the data in the cloud when they are allocating resources to the workloads.

In the talk, I will provide an overview of an approach that provisions a workload in a public cloud while simultaneously placing the data in an optimal configuration for the workload. We solve this data placement problem by solving two subproblems: first, how to partition the data to suit the workload, and second, how to allocate the data partitions to virtual machines in the cloud.
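A minimal, hypothetical sketch of such a two-phase placement (the partitioning rule, the load figures and the function names are illustrative assumptions, not the method presented in the talk): first group data items that are co-accessed by the same queries, then greedily pack the resulting partitions onto virtual machines by load.

    # Hypothetical two-phase placement: (1) partition co-accessed data items,
    # (2) greedily assign partitions to VMs, heaviest first, least-loaded VM first.
    def partition_by_coaccess(queries):
        """queries: list of sets of data items accessed together."""
        partitions = []
        for q in queries:
            merged, keep = set(q), []
            for p in partitions:
                if p & merged:
                    merged |= p        # co-accessed with an existing partition: merge
                else:
                    keep.append(p)
            keep.append(merged)
            partitions = keep
        return partitions

    def assign_to_vms(partitions, load, n_vms):
        vms = [{"items": set(), "load": 0.0} for _ in range(n_vms)]
        for part in sorted(partitions, key=lambda p: sum(load[i] for i in p), reverse=True):
            target = min(vms, key=lambda v: v["load"])
            target["items"] |= part
            target["load"] += sum(load[i] for i in part)
        return vms

    queries = [{"orders", "customers"}, {"lineitem", "orders"}, {"logs"}]
    load = {"orders": 3.0, "customers": 1.0, "lineitem": 5.0, "logs": 2.0}
    for vm in assign_to_vms(partition_by_coaccess(queries), load, n_vms=2):
        print(vm)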

 

 

L. Mirtaheri

An Algebraic Model for a Runtime High Performance Computing Systems Reconfiguration

 

Tailored High Performance Computing Systems (HPCS) deliver the best performance because their configuration is customized to the features of the problem to be solved. 21st-century processes, however, are dynamic in nature, either because the dimensions of today’s problems are not determined in advance or because of the dynamicity of the underlying platform. A drawback of this dynamicity is that systems customized at design time face challenges at runtime and consequently show worse performance. The reason for these challenges may be that processes with a dynamic nature evolve in a direction opposite to that of the system configuration. Many approaches, such as dynamic reconfiguration with dynamic load balancing, have been introduced to address these challenges. In this talk, I will present a mathematical model based on vector algebra for system reconfiguration. This model determines the element (process) causing the opposition and discovers its reason, with regard to both software and hardware, at runtime. Results of the presented model show that by defining a general status vector, whose direction is towards reaching high performance and whose size is based on the initial features and explicit requirements of the problem, and by defining a vector for each process in the problem at runtime, we can trace changes in the directions and find out the reason as well.
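Purely as an illustration of the vector-algebra idea (the components and threshold below are invented for the example and do not reproduce the model of the talk), each process vector can be compared against the global status vector via the angle between them, and a process pointing away from the goal is flagged as the likely cause of degradation.

    # Illustrative only: flag processes whose runtime vector diverges from the goal vector.
    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # goal (status) vector: desired direction in some feature space
    goal = (1.0, 1.0, 0.5)
    process_vectors = {
        "p0": (0.9, 1.1, 0.4),    # roughly aligned with the goal
        "p1": (-0.5, 0.2, 1.5),   # pointing away: candidate cause of the opposition
    }

    for name, vec in process_vectors.items():
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cosine(vec, goal)))))
        print(f"{name}: angle to goal = {angle:.1f} deg ->",
              "aligned" if angle < 45 else "diverging")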

 

 

K. Miura

RENKEI: A Light-weight Grid Middleware for e-Science Community

 

The “RENKEI (Resources Linkages for e-Science) Project” started in September 2008 under the auspices of the Ministry of Education, Culture, Sports, Science and Technology (MEXT). In this project, a new light-weight grid middleware and software tools are being developed in order to provide a user-friendly connection between the major grid environment and users’ local computing environment. In particular, technology enabling flexible and seamless access to resources at the national computing center level and at the departmental/laboratory level, such as computers, storage and databases, is one of the key objectives. Another key ingredient of this project is “interoperability” with the major international grids along the lines of OGF standardization activities, such as GIN, PGI, SAGA and RNS.

With the RENKEI workflow tool, users can submit jobs from the local environment, or even from a cloud, to the TSUBAME2 supercomputer system at the Tokyo Institute of Technology via the networking infrastructure called SINET4, for example.

 

http://www.naregi.org/index_e.html

http://www.e-sciren.org

 

 

 

C. Perez

Resource management system for complex and non-predictably evolving applications

 

High-performance scientific applications are becoming increasingly complex, in particular because of the coupling of parallel codes. This results in applications having a complex structure, characterized by multiple deploy-time parameters, such as the number of processes of each code. In order to optimize the performance of these applications, the parameters have to be carefully chosen, a process which is highly resource dependent. Moreover, some applications are (non-predictably) changing their resource requirements during their execution.

Abstractions provided by current Resource Management Systems (RMS) appear insufficient to efficiently select resources for such applications. This talk will discuss CooRM, an RMS architecture designed to support such applications. It will also show how applications can benefit from it to achieve more efficient resource usage.

 

 

D. Petcu

How is built a mosaic of Clouds

 

The developers of Cloud-compliant applications face the dilemma of which Cloud provider API to select, knowing that this decision will later lead to provider dependence. mOSAIC (www.mosaic-cloud.eu) is addressing this issue by proposing a vendor- and language-independent API for developing Cloud-compliant applications. Moreover, it promises to build a Platform-as-a-Service solution that will allow the selection at run-time of Cloud services from multiple offers, based on semantic processing and agent technologies.

The presentation will focus on the problems raised by implementing the Sky computing concept (cluster of Clouds), the issues of Virtual Cluster deployment on top of multiple Clouds, and the technical solutions that were adopted by mOSAIC.

 

 

 

F. Pinel

Utilizing GPUs to Solve Large Instances of the Tasks Mapping Problem

 

In this work, we present and analyze a local search algorithm designed to solve large instances of the independent tasks mapping problem. The genesis of the algorithm is the sensitivity analysis of a cellular genetic algorithm, which illustrates the benefits of such an analysis for algorithmic design activities.

Moreover, to solve instances of up to 65,536 tasks over 2,048 machines and to achieve scalability, the local search is accelerated by utilizing a GPU. The proposed local search algorithm improves the results of other well-known algorithms in the modern literature.
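To make the problem concrete, here is a minimal CPU-only sketch of a local search for the independent tasks mapping problem (an illustrative toy, not the GPU-accelerated algorithm derived from the cellular genetic algorithm analysis): repeatedly try to move a task from the most loaded machine to the least loaded one, keeping only moves that cannot worsen the makespan.

    # Toy local search for mapping independent tasks onto machines (minimize makespan).
    # Purely illustrative; the talk's algorithm is GPU-accelerated and far more elaborate.
    import random

    def local_search(cost, n_machines, iters=10_000, seed=0):
        """cost[t] = execution time of task t; machines are assumed identical here."""
        rng = random.Random(seed)
        assign = [rng.randrange(n_machines) for _ in cost]       # random initial mapping
        loads = [0.0] * n_machines
        for t, m in enumerate(assign):
            loads[m] += cost[t]
        for _ in range(iters):
            src = max(range(n_machines), key=loads.__getitem__)  # most loaded machine
            dst = min(range(n_machines), key=loads.__getitem__)  # least loaded machine
            tasks_on_src = [t for t, m in enumerate(assign) if m == src]
            if not tasks_on_src:
                break
            t = rng.choice(tasks_on_src)
            # accept only if the heavier of the two affected machines ends up
            # below the current maximum load (the move cannot hurt the makespan)
            if max(loads[src] - cost[t], loads[dst] + cost[t]) < loads[src]:
                assign[t] = dst
                loads[src] -= cost[t]
                loads[dst] += cost[t]
        return assign, max(loads)

    rng = random.Random(1)
    tasks = [rng.uniform(1, 10) for _ in range(512)]
    _, ms = local_search(tasks, n_machines=16)
    print(f"makespan after local search: {ms:.2f}")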

 

 

 

A. Shafarenko

New-Age Component Linking: Compilers Must Speak Constraints

 

This presentation will focus on the agenda of the FP7 project ADVANCE. The project is seeking to redefine the concept of component technology by investigating the possibility of exporting out of a component not only interfaces, but functional and extrafunctional constraints as well. The new, rich component interface requires a hardware model for the aggregation and resolution of constraints, but if that is available, then a much more targeted approach can be defined for compiling distributed applications down to heterogeneous architectures.

Constraint aggregation can deliver the missing global (program-wide) intelligence to a component compiler and enable it to tune up the code for alternative hardware, communication harness or memory model.

The talk will discuss these ideas in some detail and provide a sketch of a Constraint Aggregation Language, developed in the project.

 

 

 

M. Sheikhalishahi

Resource Management and Green Computing

 

In this talk, we review the green and performance aspects of resource management. The components of a resource management system are explored in detail, seeking new developments that exploit emerging technologies, computing paradigms and energy-efficient operations to define, design and develop new metrics, techniques, mechanisms, models, policies and algorithms. In addition, the modelling of relationships within and between the various layers is considered in order to present some novel approaches. In particular, as a case study, we define and model a resource contention metric and consequently develop two energy-aware consolidation policies.

 

 

 

C. Simmendinger

Petascale in CFD

 

In this talk we outline a highly scalable and highly efficient PGAS implementation for the CFD solver TAU.

TAU is an unstructured RANS CFD solver and one of the key applications in the European aerospace ecosystem. We show that our implementation is able to scale to petascale systems within the constraints of a single regular production run.

To reach this goal, we have implemented a novel approach to shared memory parallelization based on an asynchronous thread pool model. Due to its asynchronous operation, the model is implicitly load-balanced, free of global barriers, and allows for a near-optimal overlap of communication and computation. We have complemented this model with an asynchronous global communication strategy that makes use of the PGAS API of GPI.

We briefly outline this strategy and show first results.

 

 

 

 

B. Sotomayor

Reliable File Transfers with Globus Online

 

File transfer is both a critical and frustrating aspect of high-performance computing. For a relatively mundane task, moving terabytes of data reliably and efficiently can be surprisingly complicated. One must discover endpoints, determine available protocols, negotiate firewalls, configure software, manage space, negotiate authentication, configure protocols, detect and respond to failures, determine expected and actual performance, identify, diagnose and correct network misconfigurations, integrate with file systems, and a host of other things.  Automating these makes users’ lives much, much easier.

In this presentation I will provide a technical overview of Globus Online: a fast, reliable file transfer service that simplifies large-scale, secure data movement without requiring construction of custom end-to-end systems. The presentation will include a demonstration as well as highlights from several user case studies.

 

 

 

L. Sousa

Distributed computing on highly heterogeneous systems

 

The approaches used in traditional heterogeneous distributed computing to achieve efficient execution across a set of architecturally similar compute nodes (such as CPU-only distributed systems) are only partially applicable to systems with a high degree of architectural heterogeneity, for example, clusters of multi-core CPUs equipped with specialized accelerators/co-processors such as GPUs. This is mainly due to the fact that efficient load balancing decisions must also be made at the level of each compute node, in addition to the decisions made at the overall system level.

In this work, we propose a method for dynamic load balancing and performance modeling for heterogeneous distributed systems, where all available compute nodes and all devices in the compute nodes are employed for collaborative execution. Contrary to common practice in task scheduling, we do not make any pre-execution assumptions to ease the modeling of either the application or the system. The heterogeneous system is modeled as it is, by monitoring and recording the behavior of the essential parts affecting performance.

Parallel execution requires explicit data transfers to be performed prior to and after any actual computation. In order to exploit the concurrency between data transfers and computation, we investigate processing in an iterative multi-installment divisible-load setting at both the overall system and compute node levels. Namely, the proposed approach dispatches the load as many sub-loads, whose sizes are carefully determined to allow the best overlap between communication and computation. The load division is performed according to several factors: i) the current performance models (per-device and per-node), ii) the modeled bidirectional interconnection bandwidths (between compute nodes and between devices in each compute node), and iii) the amount of concurrency supported by the node/device hardware.

The problem that we tackle is how to find a task distribution such that the overall application makespan is as short as possible according to the current performance models of devices, interconnections and compute nodes. Performance models are application-centric piece-wise linear approximations constructed during the application runtime to direct further load-balancing decisions according to the exact task requirements.

The proposed approach is evaluated in a real distributed environment consisting of quad-core CPU+GPU nodes, for iterative scientific applications such as matrix multiplication (DGEMM) and 2D batch Fast Fourier Transform (FFT). Due to the ability to overlap the execution of several sub-loads, our approach results in more accurate performance models compared to current state-of-the-art approaches.
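As a small illustration of the load-division step (an illustrative simplification; the models described in the talk are piece-wise linear and also account for interconnect bandwidths and per-node concurrency), a sub-load can be split across devices in proportion to their most recently measured throughput:

    # Illustrative simplification: split one sub-load across devices in proportion
    # to their measured throughput (items/second) from previous installments.
    def split_subload(total_items, measured_throughput):
        """measured_throughput: dict device -> items/second."""
        total_rate = sum(measured_throughput.values())
        shares, assigned = {}, 0
        devices = list(measured_throughput)
        for dev in devices[:-1]:
            n = int(round(total_items * measured_throughput[dev] / total_rate))
            shares[dev] = n
            assigned += n
        shares[devices[-1]] = total_items - assigned   # remainder goes to the last device
        return shares

    # e.g. a node with one CPU socket and two GPUs of different speeds
    print(split_subload(10_000, {"cpu": 120.0, "gpu0": 900.0, "gpu1": 780.0}))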

 

 

 

M. Stillwell

Dynamic Fractional Resource Scheduling

 

Dynamic Fractional Resource Scheduling is a novel approach for scheduling jobs on cluster computing platforms. Its key feature is the use of virtual machine technology to share fractional node resources in a precise and controlled manner. Our previous work focused on the development of task placement and resource allocation heuristics to maximize an objective metric correlated with job performance, and our results were based on simulation experiments run against real traces and established models. We are currently performing a new round of experiments using synthetic workloads that launch parallel benchmark applications in multiple virtual machine instances on a real cluster. Our goals are to see how well our ideas work in practice and determine how they can be improved, and to develop empirically validated models of the interaction between resource allocation decisions and application performance.