HPC 2016

 

High Performance Computing

 

From Clouds and Big Data to Exascale and Beyond

 

An International Advanced Workshop

 

 

 

June 27 – July 1, 2016, Cetraro, Italy

 

 


 

 


 

Final Programme

 

Programme Committee

L. GRANDINETTI (Chair), University of Calabria, ITALY

F. BAETKE, Hewlett Packard, U.S.A.

P. BECKMAN, Argonne National Lab., U.S.A.

C. CATLETT, Argonne National Lab. and University of Chicago, U.S.A.

G. DE PIETRO, National Research Council of Italy, ITALY

J. DONGARRA, University of Tennessee, U.S.A.

S. S. DOSANJH, Lawrence Berkeley National Lab., U.S.A.

I. FOSTER, Argonne National Lab. and University of Chicago, U.S.A.

G. FOX, Indiana University, U.S.A.

W. GENTZSCH, The UberCloud, GERMANY

V. GETOV, University of Westminster, U.K.

G. JOUBERT, Technical University Clausthal, GERMANY

E. LAURE, Royal Institute of Technology Stockholm, SWEDEN

C. A. LEE, The Aerospace Corporation, U.S.A.

T. LIPPERT, Juelich Supercomputing Centre, GERMANY

I. LLORENTE, Universidad Complutense de Madrid, SPAIN

B. LUCAS, University of Southern California, U.S.A.

S. MATSUOKA, Tokyo Institute of Technology, JAPAN

P. MESSINA, Argonne National Laboratory, U.S.A.

V. PASCUCCI, University of Utah and Pacific Northwest National Lab, U.S.A.

N. PETKOV, University of Groningen, NETHERLANDS

J. QIU, School of Informatics and Computing, Indiana University, U.S.A.

M. SEAGER, Intel, U.S.A.

S. SEKIGUCHI, National Institute of Advanced Industrial Science and Technology, JAPAN

T. STERLING, Indiana University, U.S.A.

R. STEVENS, Argonne National Laboratory, U.S.A.

D. TALIA, University of Calabria, ITALY

W. TANG, Princeton University, U.S.A.

 

Co-Organizers

L. GRANDINETTI

Center of Excellence for High Performance Computing, UNICAL, Italy

T. LIPPERT

Institute for Advanced Simulation, Juelich Supercomputing Centre, Germany

Organizing Committee

L. GRANDINETTI (Co-Chair), ITALY

T. LIPPERT (Co-Chair), GERMANY

M. ALBAALI, OMAN

C. CATLETT, U.S.A.

J. DONGARRA, U.S.A.

W. GENTZSCH, GERMANY

O. PISACANE, ITALY

M. SHEIKHALISHAHI, ITALY

 

 

 

 

Sponsors

 

 

AMAZON WEB SERVICES

ARM

CRAY

CSCS – SWISS NATIONAL SUPERCOMPUTING CENTRE

HEWLETT PACKARD ENTERPRISE

INTEL

JUELICH SUPERCOMPUTING CENTRE, Germany

MELLANOX TECHNOLOGIES

MICRON TECHNOLOGY

NEC

SCHNEIDER ELECTRIC

DIPARTIMENTO DI INGEGNERIA DELL’INNOVAZIONE – UNIVERSITÀ DEL SALENTO

UNIVERSITÀ DELLA CALABRIA

NATIONAL RESEARCH COUNCIL OF ITALY – ICAR – INSTITUTE FOR HIGH PERFORMANCE COMPUTING AND NETWORKING

 

 

Media Partners

 

 

AMAZON WEB SERVICES

 

Free Amazon Web Services credits for all HPC 2016 delegates

 

Amazon is very pleased to be able to provide $200 in service credits to all HPC 2016 delegates. Amazon Web Services provides a collection of scalable high-performance and data-intensive computing services, storage, connectivity, and integration tools. AWS allows you to increase the speed of research and to reduce costs by providing Cluster Compute or Cluster GPU servers on demand. You have access to a full-bisection, high-bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications.

 

 

 

 

 

HPCWIRE

 

 

 

 

 

UBERCLOUD

UberCloud is the online community and marketplace platform for engineers and scientists to discover, try, and buy computing time, on demand, in the Cloud. Our novel software containers facilitate software packaging and portability, simplify access and use of cloud resources, and ease software maintenance and support for end-users and their service providers.

 

Please register for the UberCloud Voice Newsletter, or to perform an HPC Experiment in the Cloud.

 

 

 

 

 

Speakers

 

Jim Ahrens

Los Alamos National Laboratory

Los Alamos, NM

USA

 

James A. Ang

Exascale Computing Program

Center for Computing Research

Sandia National Laboratories

Albuquerque, NM

USA

 

Frank Baetke

HPC in Academia and Scientific Research

Hewlett Packard

Palo Alto, CA

USA

 

Peter Beckman

Exascale Technology and Computing Institute

Argonne National Laboratory

Argonne, IL

USA

 

Isabel Beichl

National Institute of Standards and Technology

Gaithersburg, MD

USA

 

Euro Beinat

University of Salzburg

Salzburg

AUSTRIA

 

Budhendra Bhaduri

Urban/GIS Center

Oak Ridge National Laboratory

Oak Ridge, TN

USA

 

Gil Bloch

Mellanox Technologies

Sunnyvale, CA

USA

 

Brendan Bouffler

Scientific Computing

Amazon Web Services

London

UNITED KINGDOM

 

Ronald Brightwell

Sandia National Laboratories

Albuquerque, NM

USA

 

Charlie Catlett

Math & Computer Science Div.

Argonne National Laboratory

Argonne, IL

and

Computation Institute of

The University of Chicago and Argonne National Laboratory

Chicago, IL

USA

 

Eugenio Cesario

National Research Council of Italy

ICAR – CNR

Rende – Cosenza

ITALY

 

David Chadwick

University of Kent

Canterbury

UNITED KINGDOM

 

Marcello Coppola

STMicroelectronics

Grenoble

FRANCE

 

Beniamino Di Martino

Department of Industrial and Information Engineering

Second University of Naples

Naples

ITALY

 

Jack Dongarra

Innovative Computing Laboratory

Computer Science Dept.

University of Tennessee

Knoxville, TN

USA

 

Sudip S. Dosanjh

National Energy Research Scientific Computing Center

Lawrence Berkeley National Laboratory

Berkeley, CA

USA

 

Ian Foster

Math & Computer Science Div.

Argonne National Laboratory

Argonne, IL

and

Dept of Computer Science

The University of Chicago

Chicago, IL

USA

 

Geoffrey Fox

Community Grid Computing Laboratory

Indiana University

Bloomington, IN

USA

 

Wolfgang Gentzsch

The UberCloud

GERMANY

 

Vladimir Getov

Department of Engineering

Faculty of Science and Technology

University of Westminster

London

UNITED KINGDOM

 

Brett Goldstein

University of Chicago

Chicago, IL

USA

 

Sergei Gorlatch

Universitaet Muenster

Institut für Informatik

Muenster

GERMANY

 

Torsten Hoefler

Scalable Parallel Computing Lab

Computer Science Department

ETH Zurich

Zurich

SWITZERLAND

 

Takeo Hosomi

System Platform Research Laboratories

NEC

Kanagawa

JAPAN

 

Carl Kesselman

Information Sciences Institute

University of Southern California

Marina del Rey, Los Angeles, CA

USA

 

David Keyes

King Abdullah University of Science and Technology

Thuwal

SAUDI ARABIA

 

Julia Lane

Wagner School

Center for Urban Science and Progress

New York University

New York, NY

USA

 

Craig Lee

Computer Systems Research Dept.

The Aerospace Corporation

El Segundo, CA

USA

 

Thomas Lippert

Juelich Supercomputing Centre

Forschungszentrum Juelich

Juelich

GERMANY

 

Yutong Lu

School of Computer Science

National University of Defense Technology

Changsha, Hunan Province

CHINA

 

Stefano Markidis

KTH Royal Institute of Technology

Stockholm

SWEDEN

 

Patrick Martin

School of Computing

Queen’s University

Kingston, Ontario

CANADA

 

Satoshi Matsuoka

Global Scientific Information and Computing Center

& Department of Mathematical and Computing Sciences

Tokyo Institute of Technology

Tokyo

JAPAN

 

Paul Messina

DOE Exascale Computing Project

Argonne National Laboratory

Argonne, IL

USA

 

Jarek Nabrzyski

Department of Computer Science and Engineering

University of Notre Dame

and

Center for Research Computing

and

Great Lakes Consortium for Petascale Computation

Notre Dame, Indiana

USA

 

Stefano Nativi

National Research Council of Italy

Florence

ITALY

 

Manish Parashar

Dept. of Computer Science

Rutgers University

Piscataway, NJ

USA

 

Valerio Pascucci

University of Utah

Center for Extreme Data Management, Analysis and Visualization,

Scientific Computing and Imaging Institute,

School of Computing

and

Pacific Northwest National Laboratory

Salt Lake City, UT

USA

 

Stephen Pawlowski

Advanced Computing Solutions

Micron Technology

Portland, OR

USA

 

Kristen Pudenz

Quantum Applications Engineering

Lockheed Martin

Fort Worth, TX

USA

 

Judy Qiu

School of Informatics and Computing

and

Pervasive Technology Institute

Indiana University

USA

 

Ulrich Ruede

Lehrstuhl fuer Simulation

Universitaet Erlangen-Nuernberg

Erlangen

GERMANY

 

Sébastien Rumley

Lightwave Research Laboratory

Department of Electrical Engineering

School of Engineering and Applied Science

Columbia University

New York, NY

USA

 

Thomas Schulthess

CSCS

Swiss National Supercomputing Centre

Lugano

and

ETH

Zurich

SWITZERLAND

 

John Shalf

Lawrence Berkeley National Laboratory

Computing Research Division

and

National Energy Research Scientific Computing Center

Berkeley, CA

USA

 

Sadasivan Shankar

Harvard University

School of Engineering and Applied Sciences

Cambridge, MA

USA

 

Karl Solchenbach

Intel

Exascale Labs Europe

GERMANY

 

Thomas Sterling

School of Informatics and Computing

and

CREST Center for Research in Extreme Scale Technologies

Indiana University

Bloomington, IN

USA

 

Rick Stevens

Argonne National Laboratory

and

Department of Computer Science, The University of Chicago

Argonne and Chicago

USA

 

Francis Sullivan

IDA/Center for Computing Sciences

Bowie, MD

USA

 

Domenico Talia

Department of Computer Engineering, Electronics, and Systems

University of Calabria

ITALY

 

William Tang

Princeton University

Dept. of Astrophysical Sciences, Plasma Physics Section

Fusion Simulation Program

Princeton Plasma Physics Lab.

and

Princeton Institute for Computational Science and Engineering

Princeton

USA

 

Adrian Tate

Cray EMEA Research Lab

UNITED KINGDOM

 

Steve Tuecke

Computation Institute

The University of Chicago

Chicago, IL

USA

 

Eric Van Hensbergen

ARM Research

Austin, TX

USA

 

Vladimir Voevodin

Moscow State University

Research Computing Center

Moscow

RUSSIA

 

 

 

 

Workshop Agenda

Monday, June 27th

Session

Time

Speaker/Activity

 

9:00 – 9:15

Welcome Address

Session I

 

State of the Art and Future Scenarios

 

9:15 – 9:45

J. DONGARRA

An Overview of HPC and the Changing Rules at Exascale

 

9:45 – 10:15

P. BECKMAN

What can we Change?

 

10:15 – 10:45

I. FOSTER

Accelerating discovery with science services

 

10:45 – 11:15

S. MATSUOKA

From FLOPS to BYTES: Disruptive End of Moore’s Law beyond Exascale

 

11:15 – 11:45

COFFEE BREAK

 

11:45 – 12:15

R. STEVENS

DOE-NCI Joint Development of Advanced Computing Solutions for Cancer

 

12:15 – 12:45

C. KESSELMAN

Big Data and The Internet of Important Things

 

12:45 – 13:00

CONCLUDING REMARKS

Session II

 

Emerging Computer Systems and Solutions

 

16:30 – 17:00

F. BAETKE

Trends in System Architectures: Towards “The Machine” and Beyond

 

17:00 – 17:25

K. SOLCHENBACH

The Challenges of Exascale Computing

 

17:25 – 17:50

A. TATE

Towards Support of Highly-Varied Workloads on Supercomputers

 

17:50 – 18:15

E. VAN HENSBERGEN

ARM’s Path to Exascale

 

18:15 – 18:45

COFFEE BREAK

 

18:45 – 19:10

T.B.A.

 

19:10 – 19:35

T. HOSOMI

Big Data Analytics on Vector Processor

 

19:35 – 20:00

B. BOUFFLER

HPC clusters as code in the (almost) infinite cloud

 

20:00 – 20:10

CONCLUDING REMARKS

 

 

Tuesday, June 28th

Session

Time

Speaker/Activity

Session III

 

Advances in HPC Technology and Systems

 

9:00 – 9:25

S. GORLATCH

Using Modern C++ with Multi-Staging for Unified Programming on GPU Systems

 

9:25 – 9:50

M. COPPOLA

Generic Packet Processing Unit: a novel way to implement low-cost and efficient FPGA computing

 

9:50 – 10:15

V. GETOV

Application-Specific Energy Modeling of Multi-Core Processors

 

10:15 – 10:40

J. NABRZYSKI

Topology, Application and User Behavior Aware Job Resource Management in Multidimensional Torus-Based HPC Systems

 

10:40 – 11:05

J. SHALF

Open Source HPC Hardware

 

11:05 – 11:15

CONCLUDING REMARKS

 

11:15 – 11:45

COFFEE BREAK

Session IV

 

Software and Architecture for Extreme Scale Computing I

 

11:45 – 12:10

S. DOSANJH

Preparing Applications for Next Generation Architectures

 

12:10 – 12:35

G. FOX

Application and Software Classifications that motivate Big Data and Big Simulation Convergence

 

12:35 – 13:00

R. BRIGHTWELL

The Myth of a Converged Software Stack for HPC and Big Data

Session V

 

Software and Architecture for Extreme Scale Computing II

 

16:30 – 17:00

T. LIPPERT

T.B.A.

 

17:00 – 17:25

J. AHRENS

Envisioning Human-in-the-loop Interactions with Massive Scientific Simulations and Experiments in the Age of Exascale HPC and Big Data

 

17:25 – 17:50

S. MARKIDIS

Towards a Continuous Description of Compute and Idle Phases in Scientific Parallel Applications

 

17:50 – 18:15

V. VOEVODIN

How Well Do We Know Properties of Parallel Algorithms?

 

18:15 – 18:45

COFFEE BREAK

 

18:45 – 19:15

T. HOEFLER

Progress in automatic GPU compilation and why you want to run MPI on your GPU

 

19:15 – 19:45

G. BLOCH

Exascale by Co-Design Architecture

 

19:45 – 20:00

CONCLUDING REMARKS

 

 

Wednesday, June 29th

Session

Time

Speaker/Activity

Session VI

 

Exascale Computing and Beyond

 

9:00 – 9:25

R. STEVENS

The potential to augment HPC systems with Neuromorphic Computing Accelerators

 

9:25 – 9:50

T. STERLING

The Asymptotic Computer - Undoing the Damage

 

9:50 – 10:15

S. RUMLEY

Role of Optical Interconnects in Extreme Scale Computing

 

10:15 – 10:40

S. PAWLOWSKI

Convergence of Memory and Computing

 

10:40 – 11:05

P. MESSINA

A Path to Exascale

11:05 – 11:35

COFFEE BREAK

 

11:35 – 12:00

T. SCHULTHESS

T.B.A.

 

12:00 – 12:25

J. ANG

Exascale System and Node Architectures: The Summit and Beyond

 

12:25 – 12:50

J. SHALF

Exascale will be successful by 2025… and then what?

 

12:50 – 13:00

CONCLUDING REMARKS

Session VII

 

Cloud Computing Technology and Systems

 

15:45 – 16:10

C. LEE

Update on a Keystone-based General Federation Agent

 

16:10 – 16:35

S. TUECKE

Globus Auth Identity and Access Management

 

16:35 - 17:00

S. NATIVI

High Performance Analytics Services and Infrastructures for Addressing Global Changes: the GEOSS perspective

 

17:00 – 17:25

D. CHADWICK

Homogeneous authorization policies in heterogeneous IaaS clouds

 

17:25 – 17:50

B. DI MARTINO

Semantic Technologies to support Cloud Applications’ Portability and Interoperability on Multiple Clouds

 

17:50 – 18:15

J. QIU

Convergence of HPC and Clouds for Large-Scale Data Enabled Science

 

18:15 – 18:45

COFFEE BREAK

 

18:45 – 20:00

PANEL DISCUSSION: What is Capable Exascale Computing?

Chairman: P. Messina, Argonne National Laboratory, DOE, U.S.A.

 

 

Thursday, June 30th

Session

Time

Speaker/Activity

Session VIII

 

BIG DATA Challenges and Perspectives I

 

9:00 – 9:25

V. PASCUCCI

Extreme Data Management, Analysis and Visualization for Exascale Supercomputers

 

9:25 – 9:50

F. SULLIVAN

Merging Data Science and Large Scale Computational Modeling

 

9:50 – 10:15

G. FOX

Implementing parts of HPC-ABDS in a multi-disciplinary collaboration

 

10:15 – 10:40

M. PARASHAR

Big Data Challenges in Simulation-based Science

 

10:40 – 11:05

Y. LU

Convergence of HPC and Big Data

11:05 – 11:35

COFFEE BREAK

 

11:35 – 12:00

P. MARTIN

Consumable Analytics for Big Data

 

12:00 – 12:50

D. TALIA

From Clouds to Exascale: Programming Issues in Big Data Analysis

 

12:50 – 13:00

CONCLUDING REMARKS

Session IX

 

BIG DATA Challenges and Perspectives II

 

16:00 – 16:25

C. CATLETT

A Proposed Exascale Agenda for Urban Sciences

 

16:25 – 16:50

J. LANE

Borrowing Concepts from Social Media to Enable Integration of Large-Scale Sensitive Data Sets

 

16:50 – 17:15

B. GOLDSTEIN

The New Code of Ethics: Justice and Transparency in the Age of Big Data and Deep Learning

 

17:15 – 17:40

B. BHADURI

Landscape Dynamics, Geographic Data and Scalable Computing: The Oak Ridge Experience

 

17:40 – 18:00

E. BEINAT

Collective Sensing and large-scale predictions: two case studies

 

18:00 – 18:30

COFFEE BREAK

 

18:30 – 20:00

PANEL DISCUSSION: The Potential for Deep Learning to Harness Increasing Flows of Urban Data

Chairman: C. Catlett, Argonne National Laboratory, DOE, U.S.A.

 

 

Friday, July 1st

Session

Time

Speaker/Activity

Session X

 

Challenging Applications of HPC and Clouds

 

9:00 – 9:30

W. GENTZSCH

Toward Democratization of HPC with Novel Software Containers

 

9:30 – 10:00

S. SHANKAR

Co-design 3.0 – Configurable Extreme Computing leveraging Moore’s Law for Real Applications

 

10:00 – 10:30

K. PUDENZ

Quantum Annealing and the Satisfiability Problem

 

10:30 – 11:00

D. KEYES

CFD Codes on Multicore and Manycore Architectures

 

11:00 – 11:30

COFFEE BREAK

 

11:30 – 12:00

W. TANG

Kinetic Turbulence Simulations on Top Supercomputers Worldwide

 

12:00 – 12:30

U. RUEDE

Lattice Boltzmann methods on the way to exascale

 

12:30 – 12:45

CONCLUDING REMARKS

 

 

Chairmen

 

 

SESSION I

 

Paul Messina

Argonne National Laboratory

Argonne, IL

USA

 

 

SESSION II

 

Wolfgang Gentzsch

The UberCloud

GERMANY

 

 

SESSION III

 

Gerhard Joubert

Technical University Clausthal

GERMANY

 

 

SESSION IV

 

Thomas Sterling

Indiana University

Bloomington, IN

USA

 

 

SESSION V

 

Thomas Sterling

Indiana University

Bloomington, IN

USA

 

 

SESSION VI

 

Peter Beckman

Argonne National Laboratory

Argonne, IL

USA

 

 

SESSION VII

 

Valerio Pascucci

University of Utah

and

Pacific Northwest National Laboratory

Salt Lake City, UT

USA

 

SESSION VIII

 

Craig A. Lee

The Aerospace Corporation

El Segundo, CA

USA

 

 

SESSION IX

 

David Keyes

King Abdullah University of Science and Technology

Thuwal

SAUDI ARABIA

 

 

SESSION X

 

Vladimir Getov

University of Westminster

London

U.K.

 

 

Panels

Panel Discussion 1

 

What is Capable Exascale Computing?

 

Chairman: P. Messina, Argonne National Laboratory, DOE, U.S.A.

 

 

Participants: P. Messina (Argonne National Laboratory), D. Keyes (King Abdullah University of Science and Technology), T. Lippert (Juelich Supercomputing Centre), S. Matsuoka (Tokyo Institute of Technology), M. Parashar (Rutgers University), T. Sterling (Indiana University)

 

Exascale computing that is “capable” must provide more than hardware whose theoretical peak speed is one or more ExaFLOPS. The panel participants will provide their viewpoints on the features of a computing ecosystem that deserves the adjective “capable,” such as a robust software stack that supports a broad variety of applications, the ability to process vast data volumes, and reliable, affordable operations.

 

Back to Session VII

 

Panel Discussion 2

 

The Potential for Deep Learning to Harness Increasing Flows of Urban Data

 

Chairman: C. Catlett, Argonne National Laboratory, DOE, U.S.A.

 

 

Participants: C. Catlett (Argonne National Laboratory), J. Lane (New York University), B. Goldstein (University of Chicago), B. Bhaduri (Oak Ridge National Laboratory), E. Beinat (University of Salzburg), E. Cesario (National Research Council of Italy)

 

New approaches to data analysis, including machine learning and “deep learning” techniques, are gaining traction in many science areas ranging from computational biology to advanced manufacturing. Cities, with their physical and human infrastructures and interconnected systems, provide many opportunities for the use of deep learning, particularly in conjunction with new data from sensor networks. For instance, could intelligent intersections apply deep learning to images and video in order to track “near misses” and adjust traffic signals in real time to improve safety? At the same time, cities hold sources of data that are not open but can be used internally. As an analog to predicting the failure of jet engines based on leading indicators, could deep learning techniques help cities and urban scientists discover models that describe the interdependencies between economics, public safety, education, and other factors, leading to proactive rather than reactive urban planning? A fundamental question is what opportunities exascale computing and deep learning offer to steer the present “smart city” and “Internet of Things” movements toward substantive, long-term urban challenges rather than the more common “smart city” focus areas of engineering and urban mechanics (such as reducing traffic congestion or improving parking).

 

Back to Session IX

 

 

 

 

Abstracts

 

Envisioning Human-in-the-loop Interactions with Massive Scientific Simulations and Experiments in the Age of Exascale HPC and Big Data

 

Jim Ahrens

Los Alamos National Laboratory, Los Alamos, NM, USA

 

This talk will describe a vision for interacting with massive scientific simulations and experiments to better understand underlying natural physical properties and processes. Specific solutions that take advantage of advances in HPC and Big Data technology will be discussed.

Back to Session V

Exascale System and Node Architectures: The Summit and Beyond

 

James A. Ang

Exascale Computing Program, Center for Computing Research

Sandia National Laboratories, USA

 

The U.S. Advanced Simulation and Computing (ASC) program has been applying a co-design methodology for over five years. Our use of co-design has evolved along a continuum: from the early reactive approach, to the current proactive methodology, and towards a proposed transformative path. The goal of transformative co-design is to leverage an opportunity to develop future hardware and system architecture designs that are unconstrained by current technology roadmaps [1]. The HPC community has been working with proxy applications to represent how our real HPC applications use advanced architectures. Proxy applications are also a communication vehicle for helping vendors and system architects understand how our real applications perform. The Advanced Scientific Computing Research (ASCR) program has been funding the Computer Architecture Lab (CAL) project since 2013 to develop a new co-design communication vehicle. CAL is developing abstract machine models and their associated proxy architectures to help the DOE application teams reason about and design their applications to map onto advanced system and node architectures [2].

 

On July 29, 2015, President Obama announced the U.S. National Strategic Computing Initiative (NSCI). As a part of this Presidential Directive, the U.S. DOE is launching the Exascale Computing Project (ECP) with direct support from both the ASC and ASCR programs. This project has four technical focus areas: Application Development, Software Technologies, Hardware Technologies, and Exascale Systems. Under Hardware Technology, a new effort has been launched called PathForward to support vendor-led system and node designs that offer the opportunity for “transformative” co-design in which the 2023 exascale system, node, and component designs are influenced by the needs of ECP applications and associated system software.

 

The U.S. DOE ECP has a goal of at least two diverse 2023 exascale systems. This goal may be viewed as The Summit. Overall, NSCI has goals that extend Beyond the Summit [3]. In my presentation I will address both ECP goals and NSCI goals from both technology and programmatic perspectives.

 

[1] ASC Co-Design Strategy, J.A. Ang, T.T. Hoang, S.M. Kelly, A. McPherson, R. Neely, SAND 2015-9821R, February 2016. http://nnsa.energy.gov/sites/default/files/nnsa/inlinefiles/ASC_Co-design.pdf

[2] Abstract Machine Models and Proxy Architectures for Exascale Computing, J.A. Ang, R.F. Barrett, R.E. Benner, D. Burke, C. Chan, D. Donofrio, S.D. Hammond, K.S. Hemmert, S.M. Kelly, H. Le, V.J. Leung, D.R. Resnick, A.F. Rodrigues, J. Shalf, D. Stark, D. Unat, N.J. Wright, May 2014. http://www.cal-design.org/publications

[3] Beyond the Summit, Todd Skinner, Penguin Group Inc., New York, NY, USA, October 2003.

 

Back to Session VI

Trends in System Architectures: Towards “The Machine” and Beyond

 

Frank Baetke

HPC in Academia and Scientific Research

Hewlett Packard, Palo Alto, CA, USA

 

The talk will address trends in system architecture for HPC and will include related aspects of Big Data and IoT. A specific focus will be on innovative components like next generation memory interconnects, non-volatile memory and silicon photonics that play a key role in future system designs. HPE’s “The Machine” will be used to bring those components into the context of an actual system implementation. Related options and challenges at the level of system software, middleware and programming paradigms will also be addressed.

 

Back to Session II

 

 

Peter Beckman

Exascale Technology and Computing Institute

Argonne National Laboratory, USA

 

 

 

Back to Session I

 

 

Isabel Beichl

National Institute of Standards and Technology, Gaithersburg, MD, USA

 

 

 


Collective Sensing and large-scale predictions: two case studies

 

Euro Beinat

University of Salzburg, Austria

 

We use the term “Collective Sensing” to describe the set of methods and tools used to analyze, describe and predict large-scale human dynamics based on the growing availability of digital transaction data (telecom, banking, transportation, sensors, social media). In the recent past an entire stream of literature has developed within and across disciplines with the aim of exploiting new data sources and data science methods to provide better ways to understand the collective behavior of cities, communities or economic sectors. The presentation positions collective sensing in this broad ecosystem and then focuses on two case studies. In the first, we explore the use of online learning algorithms for predicting the short-term location of entire populations. The method draws from sequential learning and leverages the history of millions of individuals, who are used as anonymous “experts” of each other's mobility, to improve individual predictability. Validation on one year of telecom data shows that the method significantly outperforms traditional prediction methods, especially when the data history is short (the case of tourists, for instance). The second case study focuses on the prediction of road incidents on the basis of traffic and weather data. It describes the data structuring and the design of three types of neural networks (deep learning, convolutional and LSTM) used for different types of predictions. The presentation illustrates the results of the networks after training on a five-year dataset and how they outperform purely statistical predictions.

 

Back to Session IX

Landscape Dynamics, Geographic Data and Scalable Computing: The Oak Ridge Experience

 

Budhendra Bhaduri

Urban/GIS Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA

 

Understanding change through analysis and visualization of landscape processes often provides the most effective tool for decision support.

Analysis of disparate and dynamic geographic data provides an effective component of an information extraction framework for multi-level reasoning, query, and extraction of geospatial-temporal features. With increasing temporal resolution of geographic data, there is a compelling motivation to couple the powerful modeling and analytical capability of a GIS to perform spatial-temporal analysis and visualization on dynamic data streams.

However, the challenge of processing large volumes of high-resolution earth observation and simulation data with traditional GIS has been compounded by the drive towards real-time applications and decision support. Drawing from our experience at Oak Ridge National Laboratory providing scientific and programmatic support for federal agencies, this presentation will highlight the progress and challenges of some of the emerging computational approaches, including algorithms and high performance computing, illustrated with population and urban dynamics, sustainable energy and mobility, and climate change science.

 

Back to Session IX

Exascale by Co-Design Architecture

 

Gil Bloch

Mellanox Technologies, Sunnyvale, CA, USA

 

High performance computing has begun scaling beyond Petaflop performance towards the Exaflop mark. One of the major concerns throughout the development toward such performance capability is scalability – at the component level, system level, middleware and the application level. A Co-Design approach between the development of the software libraries and the underlying hardware can help to overcome those scalability issues and to enable a more efficient design approach towards the Exascale goal.

 

Back to Session V

HPC clusters as code in the (almost) infinite cloud

 

Brendan Bouffler

Scientific Computing, Amazon Web Services, London, UNITED KINGDOM

 

HPC clusters have exploded in capability in the last decade, leading not only to breakthrough discoveries like gravitational waves but also to techniques that allow us to screen compounds for drug suitability and design better headlights for cars that reduce drag and silence cabin noise. HPC has become a tool that spans industry, research and education, and yet remains out of reach for many because owning a cluster often means a significant investment and complex integration.

 

In the cloud, however, not owning an HPC cluster can be one of the most productive ways to compute everything from the fluid dynamics of a milk bottle to the evolution of the universe, since clusters with specific purposes can be made available off-the-shelf and can be procured in just the right amounts of capacity. Coupled with access to large public datasets like those from earth observation satellites or genomic databases, it’s easy to imagine how HPC can become an even more common tool for humble and grand workloads alike.

In this talk we’ll discuss the technologies underpinning the AWS cloud, the scale at which we operate and how we build clusters on the fly, using all the same tools we’ve come to depend upon, and a few new ones to boot. We’ll show real examples from customers and partners who’ve broken new ground because of the agility the cloud offers.

 

Back to Session II

The Myth of a Converged Software Stack for HPC and Big Data

 

Ronald Brightwell

Sandia National Laboratories, Albuquerque, NM, USA

 

The notion that one operating system or a single software stack will support the emerging and future needs of the HPC and Big Data communities is a fantasy. There are many technical and non-technical reasons why functional partitioning through customized software stacks will continue to persist.

Rather than searching the ends of rainbows for a single software stack that satisfies a diverse and competing set of requirements, approaches that enable the use and integration of multiple software stacks should be pursued. This talk will describe the challenges that motivate the need to support multiple concurrent software stacks for enabling application composition, more complex application workflows, and a potentially richer set of usage models for extreme-scale HPC systems. The Hobbes project led by Sandia National Laboratories is exploring operating system infrastructure for supporting multiple concurrent software stacks. This talk will describe this infrastructure and discuss issues that motivate future exploration.

 

Back to Session IV

A Proposed Exascale Agenda for Urban Sciences

 

Charlie Catlett

Math & Computer Science Div., Argonne National Laboratory

& Computation Institute of The University of Chicago, Chicago, IL, USA

 

Urbanization is one of the great challenges and opportunities of this century, inextricably tied to global challenges ranging from climate change to sustainable use of energy and other natural resources, and from personal health and safety to accelerating innovation in metropolitan communities. Enabling science- and evidence-based urban design, policy, and operation will require discovery, characterization, and quantification of the interdependencies between major metropolitan sectors. Many of these sectors or systems are modeled individually, but in order to optimize the design, planned evolution, and operation of cities it is essential that we quantify and understand how they interact. Coupled multi-scale models will be essential for this discovery process. We will discuss the concept of a general framework for such coupled models, highlighting several example coupled systems as well as the challenges of integrating data from sensor networks.

 

Back to Session IX

 

 

Eugenio Cesario

National Research Council of Italy, Italy

 

 

 


Homogeneous authorization policies in heterogeneous IaaS clouds

 

David Chadwick, Carlos Ferraz and Ioram Sette

University of Kent, Canterbury, United Kingdom

 

How can a tenant administrator of multiple cloud accounts set and enforce a single authorisation policy throughout a multi-cloud infrastructure?

In this presentation we will describe the solution we have designed and implemented, using OpenStack and Amazon clouds as exemplars. We propose a Federated Authorisation Policy Management Service (FAPManS), which holds the global authorisation policy in disjunctive normal form (DNF), along with the authorisation ontology and rules for mapping from the global policy terms to cloud-specific ones. Policy adaptors convert the global policy into local ones, so that each cloud system keeps its existing authorisation mechanism without needing to change it. A publish-subscribe mechanism ensures the policies are synchronized. We will conclude by listing the strengths and weaknesses of our approach, and where further work still needs to be done.
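As a rough sketch of the DNF evaluation model described above (the types, names and matching rule here are illustrative assumptions, not FAPManS’s actual interfaces), a global policy can be held as a list of attribute-equality clauses, and a request is authorised if any single clause is satisfied in full:

    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    // One conjunction: every attribute/value pair must match the request.
    using Clause = std::map<std::string, std::string>;
    // A DNF policy grants access if ANY clause is satisfied in full.
    using DnfPolicy = std::vector<Clause>;
    using Request = std::map<std::string, std::string>;

    bool satisfies(const Clause& clause, const Request& request) {
        return std::all_of(clause.begin(), clause.end(),
            [&request](const std::pair<const std::string, std::string>& kv) {
                auto it = request.find(kv.first);
                return it != request.end() && it->second == kv.second;
            });
    }

    bool authorize(const DnfPolicy& policy, const Request& request) {
        return std::any_of(policy.begin(), policy.end(),
            [&request](const Clause& c) { return satisfies(c, request); });
    }

    int main() {
        // (role = admin) OR (role = developer AND project = hpc2016)
        DnfPolicy policy = {
            { {"role", "admin"} },
            { {"role", "developer"}, {"project", "hpc2016"} }
        };
        Request request = { {"role", "developer"}, {"project", "hpc2016"} };
        std::cout << (authorize(policy, request) ? "granted" : "denied") << "\n";
    }

In the architecture described above, a per-cloud policy adaptor would then translate the attribute names in such clauses into the local cloud’s vocabulary before enforcement.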

 

Back to Session VII

Generic Packet Processing Unit: a novel way to implement low-cost and efficient FPGA computing

 

Marcello Coppola

STMicroelectronics, Grenoble, FRANCE

 

Heterogeneous systems are now widely used in many computing markets, including consumer, HPC, automotive, and networking. With heterogeneous computing it is possible to improve performance-power tradeoffs compared to homogeneous solutions, because these systems can offload computation kernels to specific islands of computation. This presentation describes a novel technology, called the Generic Packet Processing Unit (GPPU), for offloading (or “dispatching”) computation kernels from the host processor to FPGA computation islands in a more efficient manner. The GPPU infrastructure, composed of the GPPU hardware module and the AQLSM software runtime, in contrast to traditional approaches that require explicit data movement, enables FPGA computing islands to operate on the same virtual address space as the host processor. During the presentation we also show how the GPPU allows the host processor to schedule work to the target FPGA in a smart and efficient way by removing the operating-system overhead. The talk will conclude by showing how using the GPPU in heterogeneous systems enhances programmability and further improves the latency of critical operations.

 

Back to Session III

Semantic Technologies to support Cloud Applications’ Portability and Interoperability on Multiple Clouds

 

Beniamino Di Martino

Department of Industrial and Information Engineering

University of Naples 2, Italy

 

Cloud vendor lock-in and interoperability gaps arise (among many reasons) when the semantics of resources and services, and of Application Programming Interfaces, are not shared. Standards and techniques borrowed from the SOA and Semantic Web Services areas might help in gaining shared, machine-readable descriptions of Cloud offerings (resources, services at platform and application level, and their API groundings), thus allowing automatic discovery and matchmaking, and thus supporting selection, brokering, interoperability and composition of Cloud services among multiple Clouds.

This talk will in particular illustrate the outcomes of the mOSAIC EU funded project (http://www.mosaic-cloud.eu): a Cloud Ontology, a Semantic Engine, a Dynamic Semantic Discovery System, and a uniform semantic representation of Cloud Services and Cloud Patterns, both agnostic and vendor specific.

 

Back to Session VII

An Overview of HPC and the Changing Rules at Exascale

 

Jack Dongarra

Innovative Computing Laboratory, Computer Science Dept.

University of Tennessee, Knoxville, TN, USA

 

In this talk we will look at the current state of high performance computing and at the next stage of extreme computing. With extreme computing there will be fundamental changes in the character of floating point arithmetic and data movement. We will also look at how extreme scale computing has caused algorithm and software developers to change their way of thinking about how to implement and program certain applications.

 

Back to Session I

Preparing Applications for Next Generation Architectures

 

Sudip S. Dosanjh

National Energy Research Scientific Computing Center

Lawrence Berkeley National Laboratory, Berkeley, CA, USA

 

NERSC’s primary mission is to accelerate scientific discovery at the DOE Office of Science through high performance computing and data analysis. NERSC supports the largest and most diverse research community of any computing facility within the DOE complex, providing large-scale, state-of-the-art computing for DOE’s unclassified research programs in alternative energy sources, climate change, environmental science, materials research, astrophysics and other science areas related to DOE’s science mission.

NERSC’s next supercomputer, Cori, is being deployed in 2016 in Berkeley Laboratory’s new Computational Research and Theory (CRT) Facility. Cori will include over 9300 manycore Intel Knights Landing processors, which introduce several technological advances, including higher intra-node parallelism; high-bandwidth, on-package memory; and longer hardware vector lengths. These enhanced features are expected to yield significant performance improvements for applications running on Cori. In order to take advantage of the new features, however, application developers will need to make code modifications because many of today’s applications are not optimized to take advantage of the manycore architecture and on-package memory.

To help users transition to the new architecture, in 2014 NERSC established the NERSC Exascale Scientific Applications Program (NESAP). Through NESAP, several code projects are collaborating with NERSC, Cray and Intel with access to early hardware, special training and “deep dive” sessions with Intel and Cray staff. Eight of the chosen projects also will be working with a postdoctoral researcher to investigate computational science issues associated with manycore systems. The NESAP projects span a range of scientific fields—including astrophysics, genomics, materials science, climate and weather modeling, plasma fusion physics and accelerator science—and represent a significant portion of NERSC’s current and projected computational workload.

Cori will include many enhancements to support a rapidly growing extreme data science workload at NERSC. Cori will have a partition of 1600 Intel Haswell processors with larger-memory nodes to enable extreme data analysis. A fast internet connection will let users stream data from experimental and observational facilities directly into the system. A “Burst Buffer”, a 1.5-Petabyte layer of NVRAM, will help accelerate I/O. Cori will also include a number of software enhancements to enable complex workflows.

For the longer term we are investigating whether a single system can meet the simulation and data analysis requirements of our users. For example, we are adding a genome assembly miniapp (Meraculous) to our benchmark suite and we are considering adding one for genome alignment (Blast). We are also investigating how data intensive workflows (e.g., cosmology and genomics) differ from our simulation workloads.

 

Back to Session IV

Accelerating discovery with science services

 

Ian Foster

Math & Computer Science Div., Argonne National Laboratory

& Dept of Computer Science, The University of Chicago, Chicago, IL, USA

 

Ever more data- and compute-intensive science makes computing increasingly important for research. But for the benefits of advanced computing to accrue to more than the scientific 1%, we need new delivery methods that slash access costs and new platform capabilities to accelerate the development of interoperable tools and services. In this talk, I describe a set of such new methods and report on experiences with their development and application. Specifically, I describe how software-as-a-service methods can be used to move complex and time-consuming research IT tasks out of the lab and into the cloud, thus greatly reducing the expertise and resources required to use them. I also describe how a new class of platform services can accelerate the development and use of an integrated ecosystem of advanced science applications, thus enabling access to powerful data and compute resources by many more people than is possible today.

 

Back to Session I

Application and Software Classifications that motivate Big Data and Big Simulation Convergence

 

Geoffrey Fox

Community Grid Computing Laboratory, Indiana University, Bloomington, IN, USA

 

We combine the NAS Parallel Benchmarks, the Berkeley Dwarfs, the Computational Giants of the NRC Massive Data Analysis Report and the NIST Big Data use cases to get an application classification: the convergence diamonds that link Big Data and Big Simulation in a unified framework. We combine this with the High Performance Computing enhanced Apache Big Data Software Stack (HPC-ABDS) and suggest a simple approach to computing systems that support data management, analytics, visualization and simulations without sacrificing performance. We describe a set of "software defined" application exemplars, using an Ansible DevOps tool, that we are producing.

 

Back to Session IV

Implementing parts of HPC-ABDS in a multi-disciplinary collaboration

 

Geoffrey Fox

Community Grid Computing Laboratory, Indiana University, Bloomington, IN, USA

 

We introduce the High Performance Computing enhanced Apache Big Data Software Stack (HPC-ABDS) and give several examples of advantageously linking HPC and ABDS. In particular we discuss a Scalable Parallel Interoperable Data Analytics Library (SPIDAL) that is being developed to embody these ideas and is the HPC-ABDS instantiation of the well-known Apache libraries Mahout and MLlib. SPIDAL covers some core machine learning, image processing, graph, simulation data analysis and network science kernels. It is a collaboration between teams at Arizona, Emory, Indiana (lead), Kansas, Rutgers, Virginia Tech, and Utah universities.

We give examples of data analytics running on HPC systems, including details on persuading Java to run fast.

 

Back to Session VIII

Toward Democratization of HPC with Novel Software Containers

 

Wolfgang Gentzsch

The UberCloud, Germany

 

Countless case studies demonstrate impressively the importance of High Performance Computing (HPC) for engineering insight, product innovation, and market competitiveness. But so far HPC has mostly been in the hands of a relatively small elite, not easily accessible to the large majority of scientists and engineers. In this presentation we argue that, despite the ever increasing complexity of HPC tools, hardware, and system components, engineers have never been this close to ubiquitous HPC as a common tool for everyone. The main reason for this progress is the continuous advance of HPC software tools, which assist enormously in the design, development, and optimization of manufacturing products and scientific research. Now, we believe that the next big step towards ubiquitous HPC will be made very soon with new software container technology, which will dramatically facilitate software packageability and portability, ease access and use, simplify software maintenance and support, and finally put HPC into the hands of every engineer.

During the past two years UberCloud has successfully built HPC containers for application software from ANSYS, CD-adapco STAR-CCM+, COMSOL Multiphysics, NICE DCV, Numeca FINE/Marine and FINE/Turbo, OpenFOAM, PSPP, Red Cedar HEEDS, Scilab, Gromacs, and more. These application containers are now running on cloud resources from Advania, Amazon AWS, CPU 24/7, Microsoft Azure, Nephoscale, OzenCloud, and others. In this presentation we will explain the concept and benefits of these novel software containers for engineering and scientific applications, together with a live and interactive demo of an engineering application in a cloud container.

 

Back to Session X

Application-Specific Energy Modeling of Multi-Core Processors

 

Vladimir Getov

Department of Engineering, Faculty of Science and Technology

University of Westminster, London, United Kingdom

 

During the last decade, further developments of computer architecture and microprocessor hardware have been hitting the so-called “energy wall” because of their excessive demands for more energy. Subsequently, we have been ushering in a new era with electric power and temperature as the primary concerns for scalable computing. For several years, reducing significantly the energy consumption for data processing and movement has been the most important challenge towards achieving higher computer performance at exascale level and beyond. This is a very difficult and complex problem which requires revolutionary disruptive methods with a stronger integration among hardware features, system software and applications. Equally important are the capabilities for fine-grained spatial and temporal instrumentation, measurement and optimization, in order to facilitate energy-efficient computing across all layers of current and future computer systems. Moreover, the interplay between power, temperature and performance adds another layer of complexity to this already difficult group of challenges.

Existing approaches for energy efficient computing rely heavily on power efficient hardware in isolation which is far from acceptable for the emerging challenges. Furthermore, hardware techniques, like dynamic voltage and frequency scaling, are often limited by their granularity (very coarse power management) or by their scope (a very limited system view). More specifically, recent developments of multi-core processors recognize energy monitoring and tuning as one of the main challenges towards achieving higher performance, given the growing power and temperature constraints. To address these challenges, one needs both suitable energy abstraction and corresponding instrumentation which are amongst the core topics of ongoing research and development work.

Since current methodologies and tools are limited by hardware capabilities and their lack of information about the application code, a promising approach is to consider together the characteristics of both the processor and the application-specific workload. Indeed, it is pivotal for hardware to expose mechanisms for optimizing dynamically consumed power and thermal energy for various workloads and for reducing data motion, a major component of energy use. Therefore, our abstract model is based on application-specific parameters such as power consumption, execution time, and equilibrium temperature as well as hardware-specific parameters such as half time for thermal rise or fall. Building upon this recent work, the ongoing and future research efforts involve the development of a novel tuning methodology and the evaluation of its advantages on real use cases. Experimental results demonstrate the efficient use of the model for analyzing and improving significantly the application-specific balance between power, temperature and performance.
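As one hedged illustration of how the parameters named above might combine (an assumed first-order form, not necessarily the model used in this work), the temperature trajectory after a workload change can be written directly in terms of the half time for thermal rise or fall:

    T(t) = T_{\mathrm{eq}} + \left( T_0 - T_{\mathrm{eq}} \right) 2^{-t/t_{1/2}}

where T_eq is the application-specific equilibrium temperature, T_0 the temperature at the moment the workload starts or stops, and t_{1/2} the hardware-specific half time: after each interval t_{1/2}, the remaining gap to equilibrium halves.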

 

Back to Session III

The New Code of Ethics: Justice and Transparency in the Age of Big Data and Deep Learning

 

Brett Goldstein

University of Chicago, USA

 

As the world is increasingly shaped by algorithms and machine learning, the underlying code and data also reflect the world around it, including the biases and discrimination that are embedded in society.

Brett Goldstein explores the broader implications of the black box techniques and methods that are prevalent in data-driven decision making in fields from law enforcement to product marketing, with a particular focus on the implications of streaming sensor data and the internet of things. Drawing on his experience as Chief Data Officer for the City of Chicago and his pioneering work on predictive crime analytics with the Chicago Police Department, he suggests a framework for thinking about these challenges based on transparency, awareness, and minimizing bias rather than seeking to eliminate it completely. He outlines a vision for proactive management of the concerns that arise in a high-performance-computing and algorithm-rich environment.

 

Back to Session IX

Using Modern C++ with Multi-Staging for Unified Programming of GPU Systems

 

Sergei Gorlatch

Universitaet Muenster, Institut für Informatik, Muenster, Germany

 

Writing and optimizing programs on systems with Graphics Processing Units (GPUs) remains a challenging task even for expert programmers.

We present PACXX, our approach to GPU programming using exclusively C++, with the convenient features of the modern C++14 standard: type deduction, lambda expressions, and algorithms from the standard template library (STL). Using PACXX, a GPU program is written as a single C++ program, rather than two distinct host and kernel programs as in OpenCL or CUDA. We extend PACXX with an easy-to-use and type-safe API for multi-stage programming that allows for optimizations during code generation. Using just-in-time compilation techniques, PACXX generates efficient GPU code at runtime.

Our evaluation shows that PACXX allows for writing GPU code more easily and safely than currently possible in CUDA or OpenCL, and that multi-stage programs can significantly outperform equivalent non-staged versions. Furthermore, we show that PACXX generates code with high performance, comparable to industrial-strength OpenCL compilers.
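To make the single-source style concrete, the following is a minimal sketch in plain standard C++14 of the kind of kernel such an approach targets: a SAXPY computation expressed as a lambda handed to an STL algorithm. Only the standard library is used here; the actual PACXX API that offloads such a call to the GPU is what the talk itself presents.

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
        const float alpha = 3.0f;

        // SAXPY as a lambda passed to an STL algorithm: one C++ source file,
        // no separate kernel language. A single-source framework can compile
        // a kernel of exactly this shape for the GPU at runtime, instead of
        // executing it on the host as the standard library does here.
        std::transform(a.begin(), a.end(), b.begin(), c.begin(),
                       [alpha](float x, float y) { return alpha * x + y; });

        std::cout << c.front() << "\n";  // prints 5
    }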

 

Back to Session III

Progress in automatic GPU compilation and why you want to run MPI on your GPU

 

Torsten Hoefler

Scalable Parallel Computing Lab., Computer Science Department

ETH Zurich, Zurich, SWITZERLAND

 

Auto-parallelization of programs that have not been developed with parallelism in mind is one of the holy grails in computer science. It requires understanding the source code's data flow to automatically distribute the data, parallelize the computations, and infer synchronizations where necessary. We will discuss our new LLVM-based research compiler Polly-ACC that enables automatic compilation to accelerator devices such as GPUs. Unfortunately, its applicability is limited to codes for which the iteration space and all accesses can be described as affine functions. In the second part of the talk, we will discuss dCUDA, a way to express parallel codes in MPI-RMA, a well-known communication library, to map them automatically to GPU clusters. The dCUDA approach enables simple and portable programming across heterogeneous devices due to programmer-specified locality. Furthermore, dCUDA enables hardware-supported overlap of computation and communication and is applicable to next-generation technologies such as NVLINK. We will demonstrate encouraging initial results and show limitations of current devices in order to start a discussion.
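For readers unfamiliar with MPI-RMA, the one-sided communication style that dCUDA maps onto GPU clusters looks like the following minimal host-side MPI-3 sketch (standard MPI shown for illustration; dCUDA’s device-side interface is the subject of the talk and is not reproduced here):

    #include <mpi.h>
    #include <cstdio>

    // One-sided (RMA) neighbour exchange: each rank writes its ID directly
    // into the memory window of its right neighbour; no matching receive.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int recv = -1;
        MPI_Win win;
        MPI_Win_create(&recv, sizeof(int), sizeof(int),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        const int right = (rank + 1) % size;
        MPI_Win_fence(0, win);                                  // open epoch
        MPI_Put(&rank, 1, MPI_INT, right, 0, 1, MPI_INT, win);  // remote write
        MPI_Win_fence(0, win);                                  // close epoch

        std::printf("rank %d received %d\n", rank, recv);
        MPI_Win_free(&win);
        MPI_Finalize();
    }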

 

Back to Session V

Big Data Analytics on Vector Processor

 

Takeo Hosomi

System Platform Research Laboratories, NEC, Kanagawa, JAPAN

 

Almost every industry is starting to utilize the power of Big Data analytics, and IoT makes it possible to apply Big Data analytics to physical systems by collecting sensor data from those systems. Among these analytics are many compute-intensive applications, such as image and video processing, data mining, and simulations.

NEC has developed the SX series of vector supercomputers for HPC markets and is developing a next-generation vector processor for both HPC and Big Data applications.

In this talk, I will show some performance evaluation results for analytics applications on the current system, and I will also present the concept of NEC’s next-generation vector supercomputer, called ‘Aurora’.

 

Back to Session II

Big Data and The Internet of Important Things

 

Carl Kesselman

Information Sciences Institute, University of Southern California

Marina del Rey, Los Angeles, CA, USA

 

In the early days of Grid computing, the integration of high-volume data-producing instruments, such as the Advanced Photon Source, into an integrated computational pipeline was investigated, and prototype systems were produced that combined data production with data analysis. Now, with the emergence of diverse types of scientific instruments and cloud computing, it is time to revisit these combined observational and computational systems. In this talk we consider an “internet of important things” which consists of many diverse network-connected instruments that are dynamically interconnected and produce potentially large amounts of data that must be securely and reliably obtained, combined, analyzed and disseminated. I will describe some of the basic problems that must be addressed, and provide an overview of a platform for managing the acquisition and management of these data.

 

Back to Session I

CFD Codes on Multicore and Manycore Architectures

 

David Keyes

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

 

Weak scaling over distributed memory is well established for structured and unstructured CFD simulations, as evidenced by (among other achievements of the CFD community) Gordon Bell Prizes over the decades. Strong scaling over shared memory to exploit multicore and manycore architectures is less satisfactory to date. In this talk, we report on separate campaigns to port successors to a pair of CFD codes that shared the 1999 Gordon Bell Prize to Intel multicore and manycore environments. Shared-memory parallelization of the flux kernel of PETSc-FUN3D, an unstructured tetrahedral-mesh Euler flow code, is evaluated on Ivy Bridge, Haswell, and KNC. We explore several thread-level optimizations to improve flux kernel performance. In addition, a geometry-simplified fork of a widely employed spectral element Navier–Stokes code, Nek5000, has been co-designed with many algorithmic and implementation innovations for multicore and manycore environments, using very high order elements and resolving duct flow at a record-high Reynolds number of 100,000. We emphasize features of the computations at the algorithm-architecture interface.

 

Back to Session X

Borrowing Concepts from Social Media to Enable Integration of Large-Scale Sensitive Data Sets

 

Julia Lane

Wagner School, Center for Urban Science and Progress

New York University, New York, NY, USA

 

Enabling access to data is a fundamental first step to deriving value from their content. That principle has been the driving force for much of the Open Data movement across national and city governments, and resulted in a great deal of citizen engagement. Access to large scale and sensitive data on human beings is much more restricted. As a result there is much less understanding of the availability and content of the datasets. This presentation describes ways to improve that understanding that apply gamification approaches used in social media. It provides a concrete example based on work with federal and city administrative data.

 

Back to Session IX

Update on a Keystone-based General Federation Agent

 

Craig Lee

Computer Systems Research Dept., The Aerospace Corporation, El Segundo, CA, USA

 

Cloud federation is an instance of general federation. That is to say, managing a federation of cloud services is essentially no different from managing a set of arbitrary, application-level services. While Keystone (the OpenStack security service) was built to manage access to a set of local OpenStack cloud services, it is actually quite amenable to managing arbitrary federations. This talk presents our current work on using Keystone as a general federation agent for arbitrary federations.

 

Back to Session VII

 

 

Thomas Lippert

Juelich Supercomputing Centre, Forschungszentrum Juelich, Germany

 

 

 

Back to Session

Convergence of HPC and Big Data

 

Yutong Lu

School of Computer Science

National University of Defense Technology

China

 

Nowadays, advanced computing and visualization tools enable scientists and engineers to perform virtual experiments and analyze large-scale datasets. Computing-driven and Big Data-driven scientific discovery has become a necessary approach in global environment, life science, nano-materials, high energy physics and other fields. Furthermore, the fast-increasing computing requirements of economic and social development also call for exascale systems. This talk will discuss the convergence of HPC and Big Data on the Tianhe-2 system.

 

Back to Session VIII

Towards a Continuous Description of Compute and Idle Phases in Scientific Parallel Applications

 

Stefano Markidis

KTH Royal Institute of Technology, Stockholm, Sweden

 

Parallel scientific applications distribute their workload among several processes that compute, communicate and synchronize with one another. Compute and idle phases alternate in parallel applications. Idle periods on a process often arise while it waits for data from late processes.

In this talk, we show that a scientific application can be considered as a continuous medium in the limit of a very large number of processes as on current petascale and future exascale supercomputers. Such a medium supports the propagation of idle periods among processes in the same way air supports acoustic waves. We formulate an equation to characterize the idle period propagation in parallel applications. Its characteristic propagation velocity is determined by both network parameters (latency and bandwidth) and application characteristics (average compute time).

This work lays the basis for understanding how local process imbalance globally impacts overall application performance. It highlights the role of point-to-point communication among processes. We suggest that many cases of unexpected performance degradation in parallel scientific applications can be explained in terms of propagating idle periods.
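The governing equation itself is not reproduced in the abstract, but the dependence it names can be made concrete with a back-of-the-envelope model (an assumption-laden sketch, not the talk's actual formulation).

% Illustration only -- not the talk's equation. Consider a 1-D chain of
% processes, each computing for an average time T_c between point-to-point
% exchanges of m bytes over links with latency alpha and inverse bandwidth
% beta. A late process delays its neighbor one exchange later, so an idle
% period hops one process per iteration and propagates with velocity roughly
\[
  v \;\approx\; \frac{1}{\,T_c + \alpha + \beta m\,}
  \quad \text{(processes per unit time),}
\]
% which is set by exactly the quantities named above: network latency and
% bandwidth, and the application's average compute time.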

 

Back to Session V

Consumable Analytics for Big Data

 

Patrick Martin

School of Computing, Queen’s University, Kingston, Ontario, Canada

 

Consumable analytics attempt to address the shortage of skilled data analysts in many organizations by offering analytic functionality in a form more familiar to employees. Providing consumable analytics for big data faces three main challenges: the large volumes of data require efficient distributed algorithms; the analytics must offer an easy interface that allows in-house experts to use these algorithms while minimizing the learning cycle and rewrites of existing code; and the analytics must work on data of different formats stored in heterogeneous data stores.

The talk will explore the above challenges and give an overview of QDrill, a system we are developing to provide consumable analytics for big data. QDrill extends Apache Drill, a schema-free SQL query engine for cloud storage, in two ways. The first is the inclusion of facilities to integrate any sequential single-node data mining library into Drill and run its algorithms in a distributed fashion from within Drill SQL statements. The second is the Distributed Analytics Query Language (DAQL), which provides users with a familiar SQL interface for training and using analytical models.
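DAQL's actual grammar is not given in the abstract. Purely as a hypothetical illustration of the idea, namely driving model training and scoring from SQL-like statements, such queries might look like the strings below; the keywords, model name and column names are all invented.

# Hypothetical illustration only: DAQL's real syntax is defined by QDrill
# and is not shown in the abstract. The point is that training and scoring
# are expressed as SQL-like statements rather than as library calls.
train_stmt = """
    CREATE MODEL churn_model
    USING weka.classifiers.trees.J48       -- any single-node library,
    AS SELECT age, plan, minutes, churned  -- run distributed by the engine
       FROM dfs.`/data/customers.parquet`
"""
score_stmt = """
    SELECT customer_id, PREDICT(churn_model, age, plan, minutes)
    FROM dfs.`/data/new_customers.json`
"""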

 

Back to Session VIII

From FLOPS to BYTES: Disruptive End of Moore’s Law beyond Exascale

 

Satoshi Matsuoka

Global Scientific Information and Computing Center

& Department of Mathematical and Computing Sciences

Tokyo Institute of Technology, Japan

 

The so-called “Moore’s Law”, by which processor performance increases exponentially by a factor of 4 every 3 years or so, is slated to end in a 10-15 year timeframe as VLSI lithography reaches its limits, combined with other physical factors. This is largely because transistor power is becoming essentially constant, so means of sustaining continuous performance increases must be sought other than raising the clock rate or the number of floating point units on the chip, i.e., increasing the FLOPS. The promising new parameter in place of the transistor count is the anticipated increase in the capacity and bandwidth of storage, driven by device, architectural, and packaging innovations: DRAM-alternative Non-Volatile Memory (NVM) devices, 3-D memory and logic stacking evolving from through-silicon vias to direct silicon stacking, and next-generation terabit optics and networks. The overall effect is that the trend of increasing computational intensity, as advocated today, will no longer yield performance increases; rather, exploiting memory capacity and bandwidth will be the right methodology. However, such a shift in compute-vs-data tradeoffs would not be an exact return to the old vector days, since other physical factors, such as latency, will not change when spatial communication in the X-Y directions is involved. Such a conversion of performance metrics from FLOPS to BYTES could lead to disruptive alterations in how computing systems, both hardware and software, evolve towards the future.

 

Back to Session I

A Path to Exascale

 

Paul Messina

DOE Exascale Computing Project, Argonne National Laboratory, Argonne, IL, USA

 

 

 

Back to Session

Topology, Application and User Behavior Aware Job Resource Management in Multidimensional Torus-Based HPC Systems

 

Jarek Nabrzyski

Department of Computer Science and Engineering, University of Notre Dame

& Center for Research Computing

& Great Lakes Consortium for Petascale Computation

Notre Dame, Indiana, USA

 

Communication networks in recent machines often have multidimensional torus topologies, which influence the way jobs should be scheduled on the system. This is the case for systems such as the Cray XE/XK with a 3D torus, BlueGene/Q with a 5D torus, and the K computer with a 6D torus. For example, BlueGene allows allocating network links exclusively to selected jobs to optimize their performance, but this can leave unused nodes within system partitions, which leads to lower system utilization. On Blue Waters, in order to improve application performance and runtime consistency, the system adopts a contiguous allocation strategy. Each job is allocated a convex prism, which reduces job-to-job interference and improves job performance, but degrades system utilization. On the other hand, purely non-contiguous allocation causes job performance to drop due to communication interference and increased latency, which can lead to substantial runtime variability. These trade-offs motivate our research, in which we develop and investigate topology-aware scheduling algorithms and strategies for mapping applications to machines. New scheduling methods that take advantage of job and system topology, user behavior and application communication patterns will be presented. Experimental results based on Blue Waters traces will demonstrate the need for such new schedulers.
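To make the contiguity trade-off concrete, the sketch below searches a 3-D torus for a free convex prism to host an incoming job; it is a toy illustration under assumed data structures, not a production scheduler.

# Minimal sketch of contiguous (convex-prism) allocation in a 3-D torus,
# in the spirit of Blue Waters' strategy; real schedulers are far subtler.
import itertools

def find_prism(free, torus, shape):
    """Return the corner of a free shape=(a,b,c) prism in a torus of
    dimensions torus=(X,Y,Z), wrapping around each dimension, or None."""
    X, Y, Z = torus
    a, b, c = shape
    for x0, y0, z0 in itertools.product(range(X), range(Y), range(Z)):
        prism = [((x0 + dx) % X, (y0 + dy) % Y, (z0 + dz) % Z)
                 for dx in range(a) for dy in range(b) for dz in range(c)]
        if all(node in free for node in prism):
            return (x0, y0, z0)
    return None  # fragmentation: wait, or fall back to scattered nodes

# Example: a 4x4x4 torus with one busy node can still host a 2x2x2 job.
free = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
free -= {(0, 0, 0)}
print(find_prism(free, (4, 4, 4), (2, 2, 2)))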

 

Back to Session III

High Performance Analytics Services and Infrastructures for Addressing Global Changes: the GEOSS perspective

 

Stefano Nativi

National Research Council of Italy, Italy

 

The presentation deals with the role of High Performance Analytics services, and the related infrastructures, in empowering advanced platforms dedicated to Earth system science and, in particular, to studying the effects of global change. The experience and viewpoint of GEOSS (Global Earth Observation System of Systems) will be discussed.

 

Back to Session VII

Big Data Challenges in Simulation-based Science

 

Manish Parashar

Dept. of Computer Science, Rutgers University, Piscataway, NJ, USA

 

Data-related challenges are quickly dominating computational and data-enabled sciences, and are limiting the potential impact of scientific applications enabled by current and emerging extreme-scale, high-performance computing environments. These data-intensive application workflows involve dynamic coordination, interactions and data coupling between multiple application processes that run at scale on different resources, together with services for monitoring, analysis, visualization and archiving, and they present challenges due to increasing data volumes and complex data-coupling patterns, system energy constraints, increasing failure rates, etc. In this talk I will explore data grand challenges in simulation-based science and investigate how solutions based on data sharing abstractions, managed data pipelines, in-memory data staging, in-situ placement and execution, and in-transit data processing can be used to address these data challenges at extreme scales.

 

Back to Session VIII

Extreme Data Management Analysis and Visualization for Exascale Supercomputers

 

Valerio Pascucci

University of Utah, Center for Extreme Data Management, Analysis and Visualization, Scientific Computing and Imaging Institute

School of Computing

& Pacific Northwest National Laboratory, USA

 

Effective use of data management techniques for analysis and visualization of massive scientific data is a crucial ingredient for the success of any supercomputing center and cyberinfrastructure for data-intensive scientific investigation. In the progress towards exascale computing, the data movement challenges have fostered innovation leading to complex streaming workflows that take advantage of any data processing opportunity arising while the data is in motion.

In this talk I will present a number of techniques developed at the Center for Extreme Data Management Analysis and Visualization (CEDMAV) that make it possible to build a scalable data movement infrastructure for fast I/O while organizing the data in a way that makes it immediately accessible for analytics and visualization. In addition, I will present a topological analytics framework that allows processing data in situ and achieving massive data reductions while maintaining the ability to explore the full parameter space for feature selection.

Overall, this leads to a flexible data streaming workflow that allows working with massive simulation models without compromising the interactive nature of the exploratory process that is characteristic of the most effective data analytics and visualization environments.
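The abstract does not name the underlying data layout. One well-known family of layouts in this spirit orders samples along a space-filling curve, so that coarse resolutions and local regions are contiguous on disk; the minimal 2-D Z-order (Morton) sketch below is offered as orientation, not as the CEDMAV implementation.

# Z- (Morton) order: interleave coordinate bits so spatially close samples
# tend to be close on disk; hierarchical variants of such orderings underlie
# progressive, multi-resolution access to massive grids.
def morton2d(x, y, bits=16):
    z = 0
    for i in range(bits):
        z |= (x >> i & 1) << (2 * i) | (y >> i & 1) << (2 * i + 1)
    return z

# Sorting samples by morton2d(x, y) stores each 2^k x 2^k block contiguously,
# so coarse previews can be read without touching the full dataset.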

 

Back to Session VIII

Convergence of Memory and Computing

 

Stephen Pawlowski

Advanced Computing Solutions, Micron Technology, Portland, OR, USA

 

This talk will discuss current trends in next-generation memory development. As this development proceeds, memory and logic are becoming more dependent on each other for correct operation. Given this linkage, opportunities for computing near memory and computing in memory are once again starting to emerge and be realized.

 

Back to Session VI

Quantum Annealing and the Satisfiability Problem

 

Kristen Pudenz

Quantum Applications Engineering, Lockheed Martin, Fort Worth, TX, USA

 

The utility of satisfiability (SAT) as an application-focused hard computational problem is well established. We explore the potential of quantum annealing to enhance classical SAT solving, especially where sampling from the space of all possible solutions is of interest. We address the formulation of SAT problems to make them suitable for commercial quantum annealers, practical concerns in their implementation, and how the performance of the resulting quantum solver compares to and complements classical SAT solvers.
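A standard formulation route, well known in the literature (the talk's specific encoding may differ), turns each clause into a quadratic penalty over 0/1 variables:

% Encode each Boolean variable as x_i in {0,1} and give every clause a
% penalty that vanishes exactly when the clause is satisfied. For the
% clause (x_1 OR x_2),
\[
  P(x_1, x_2) \;=\; (1 - x_1)(1 - x_2) \;=\; 1 - x_1 - x_2 + x_1 x_2 ,
\]
% which is 0 iff the clause holds. Summing penalties over all clauses gives
% a quadratic unconstrained binary optimization (QUBO) objective
\[
  H(\mathbf{x}) \;=\; \sum_{\text{clauses } c} P_c(\mathbf{x}) ,
\]
% whose ground states (H = 0) are exactly the satisfying assignments, so an
% annealer sampling low-energy states samples the solution space. Clauses
% with three literals need one ancilla variable to remain quadratic, and
% hardware embedding adds further practical constraints.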

 

Back to Session X

Convergence of HPC and Clouds for Large-Scale Data Enabled Science

 

Judy Qiu

School of Informatics and Computing

& Pervasive Technology Institute, Indiana University, USA

 

Scientific discovery via advances in computational science and data analytics is an ongoing national priority. A corresponding challenge is to sustain the research, development and deployment of the High Performance Computing (HPC) infrastructure needed to enable those discoveries. Early cloud data centers are being redesigned with new technologies to better support massive data analytics and machine learning. Programming models and tools are one point of divergence between the scientific computing and big data ecosystems. The maturing of cloud software around the Apache Big Data Stack (ABDS) has gained striking community support, while there is continued progress in HPC spanning up to exascale. Analysis of Big Data use cases identifies the need for HPC technologies in ABDS. Deep learning, using GPU clusters, is a clear example. But many machine learning algorithms also need iteration, high performance communication and other HPC optimizations. This rapid change in technology has further implications for research and education.

Our research has concentrated on runtime and data management to support HPC-ABDS, evolving from standalone systems to modules that can be used within existing software ecosystems. This work has been driven by applications from bioinformatics, computer vision, network science and analysis of simulations. We show promising results from this approach of reusing HPC-ABDS to enhance three well-known Apache systems (Hadoop, Storm and HBase) and construct what I have termed Data-Enabled Discovery Environments for Science and Engineering (DEDESE). Our architecture is based on using Map-Collective and Map-Streaming computation models for an integrated solution to handle large data size, complexity and speed. This is illustrated by our Harp plug-in for Hadoop, which can run K-means, Graph Layout, and Multidimensional Scaling algorithms with realistic application datasets over 4096 cores on the IU Big Red II Supercomputer and Intel Xeon architectures while achieving linear speedup. Future goals include an efficient data analysis library, where we are already looking at the Latent Dirichlet Allocation topic model on Wikipedia data and subgraph isomorphism algorithms on networks. These findings will hopefully increase interest in using HPC machines for Big Data problems, and we will continue to collaborate with national centers in exploring the computational capabilities and their scientific applications.
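Harp's Map-Collective model can be pictured as local computation followed by a collective merge of the model. The sketch below expresses one K-means iteration with mpi4py purely for illustration; Harp provides analogous collectives inside Hadoop rather than MPI.

# One K-means iteration in the Map-Collective style: local partial sums,
# then an allreduce to merge the model across workers.
import numpy as np
from mpi4py import MPI

def kmeans_step(points, centers, comm):
    k, d = centers.shape
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    for p in points:                     # map: assign to nearest center
        c = np.argmin(((centers - p) ** 2).sum(axis=1))
        sums[c] += p
        counts[c] += 1
    # collective: merge partial models across all workers
    comm.Allreduce(MPI.IN_PLACE, sums, op=MPI.SUM)
    comm.Allreduce(MPI.IN_PLACE, counts, op=MPI.SUM)
    nonempty = counts > 0
    centers[nonempty] = sums[nonempty] / counts[nonempty, None]
    return centers

# Driver: comm = MPI.COMM_WORLD; iterate kmeans_step until centers converge.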

 

Short Bio: Judy Qiu is an assistant professor of Computer Science at Indiana University. Her general area of research is data-intensive computing at the intersection of cloud and HPC multicore technologies, including a specialization in programming models that support iterative computation, ranging from storage to analysis, and that can scalably execute data-intensive applications. Her research has been funded by NSF, NIH, Microsoft, Google, Intel and Indiana University. Judy Qiu leads a new Intel Parallel Computing Center (IPCC) site at IU. She is the recipient of an NSF CAREER Award (2012), the Indiana University Trustees Award for Teaching Excellence (2013-2014), and the Indiana University Outstanding Junior Faculty Award (2015).

 

Back to Session VII

Lattice Boltzmann methods on the way to exa-scale

 

Ulrich Ruede

Lehrstuhl fuer Simulation, Universitaet Erlangen-Nuernberg, Germany

 

Lattice Boltzmann methods have become popular as an alternative method for simulating complex flows. One of their strengths is in simulating multiphase systems such as bubbly flows or foams.

When coupled with methods to model granular objects, the LBM can also be used to simulate fluids with a suspended particulate phase.

In this talk we will report on our scalable, adaptive LBM implementation that can reach up to a trillion (10^12) fluid cells on current petascale supercomputers.

The practical relevance of these methods will be illustrated with simulations of an additive manufacturing (3D-printing) process.
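For readers unfamiliar with the method, the core of an LBM solver is a stream-collide update over a small set of lattice velocities. The deliberately tiny single-node D2Q9 BGK sketch below shows that structure; the implementation discussed in the talk is adaptive, massively parallel and many orders of magnitude larger.

# Toy D2Q9 lattice Boltzmann step (BGK collision + periodic streaming).
import numpy as np

c = np.array([(0,0),(1,0),(0,1),(-1,0),(0,-1),(1,1),(-1,1),(-1,-1),(1,-1)])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, ux, uy):
    cu = 3 * (c[:, 0, None, None] * ux + c[:, 1, None, None] * uy)
    usq = 1.5 * (ux**2 + uy**2)
    return rho * w[:, None, None] * (1 + cu + 0.5 * cu**2 - usq)

def step(f, tau=0.6):
    rho = f.sum(axis=0)                                # macroscopic fields
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f += (equilibrium(rho, ux, uy) - f) / tau          # BGK collision
    for i, (cx, cy) in enumerate(c):                   # streaming
        f[i] = np.roll(np.roll(f[i], cx, axis=0), cy, axis=1)
    return f

f = equilibrium(np.ones((64, 64)), *np.zeros((2, 64, 64)))  # quiescent start
for _ in range(100):
    f = step(f)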

 

Back to Session X

Role of Optical Interconnects in Extreme Scale Computing

 

Sébastien Rumley

Lightwave Research Laboratory, Department of Electrical Engineering

School of Engineering and Applied Science, Columbia University, USA

 

Improving interconnect performance is one of the key parts of meeting the exascale challenge. With the core count heading toward a billion, and the node count approaching the hundred-thousand mark, the amount of data offloaded onto the interconnect every second is massive. There is a high risk of an even more massive amount of energy being dissipated by the interconnect if current technology is simply scaled. Furthermore, besides the energy constraint, multi-core packages will eventually see their off-chip bandwidth limited by pin-out, which may cause a dramatic decrease in computing efficiency.

Photonic technologies are among the best placed to alleviate these limitations. Meter-scale communication in current supercomputers already relies on optical cables, yet substantial progress can be made in integrating these links within the compute node structure. CMOS-compatible silicon photonic devices, in particular, can be integrated alongside the cores and caches on the same die. Such deeper integration will not only curb energy dissipation; it may also trigger important changes in node architectures. In particular, the organization of memory hierarchies might be completely revised if very large amounts of memory, not necessarily co-packaged with the main compute die, can be accessed in a high-bandwidth and energy-efficient way.

Next to improved integration of photonic end-points, transparent photonic switching can also be leveraged to increase interconnect flexibility at low cost, or to offload regular electronic packet routers. Yet photonic switching is subject to very distinct rules and constraints, and its insertion in very large scale architectures must be carefully engineered to be advantageous.

In this talk, we review the prospects of integrated photonics. The main figures of merit of future on-chip optical transceivers and of optical switches will be introduced. In light of these results, potential upcoming changes in interconnect and node architectures will be sketched and discussed.

 

Back to Session VI

 

 

Thomas Schulthess

CSCS Swiss National Supercomputing Centre, Lugano & ETH, Zurich, Switzerland

 

 

Back to Session

Open Source HPC Hardware

 

John Shalf

Lawrence Berkeley National Laboratory, Computing Research Division

& National Energy Research Supercomputing Center, USA

 

We are facing a possible end to Moore’s Law in the coming decade as photolithography approaches atomic scales, challenging future technology scaling for computing systems. Already the 10 nm node is under-performing, and the 7 nm and 5 nm technology nodes may be delayed indefinitely for lack of a compelling performance or economic advantage. Specialization is one of the additional tools in our toolbox for further increasing energy efficiency in the face of flagging technology scaling. However, custom hardware (even using FPGAs) has prohibitively high design and verification costs that have kept it at the margins for decades. The design costs for digital logic must be brought down dramatically for specialization to be cost-effective and agile.

Recent advances in domain-specific languages for hardware generation and other Hardware Description Languages (HDLs) have reduced the barriers to hardware design. Whereas a new processor core has typically taken hundreds of engineers several years to design and verify, HDLs like Chisel have enabled a small team of engineers to implement, in about 12 months, a family of RISC-V processor cores whose performance, energy efficiency and area efficiency are competitive with or even superior to commercial offerings. This demonstrates the power of these emerging HDLs and the promise of an open-hardware ecosystem built upon this infrastructure.

 

We have embarked upon a program to create all of the basic elements required for an open-source many-core chip architecture. OpenSoC uses the Chisel HDL to automate the generation of a large-scale Network-on-Chip that integrates processor cores, memory controllers, and other peripherals and specialized accelerators into an integrated System-on-Chip package. This approach takes the first steps towards an open ecosystem for hardware design and specialization, and provides the underpinnings for another decade or more of technology scaling without the benefit of lithographic improvements.

 

Back to Session III

Exascale will be successful by 2025...

            . . . and then what?

 

John Shalf

Lawrence Berkeley National Laboratory, Computing Research Division

& National Energy Research Supercomputing Center, USA

 

The US exascale project has made dramatic strides in organizing a detailed plan and holistic strategy for delivering a productive exascale system by 2023. Many challenges remain in delivering useful application performance and the scientific breakthroughs that justify the investment, but the roadmap has become much clearer. The investment in exascale, however, is not the end of the line. It is meant to deliver a “first of a kind system” (not just a one-off technical achievement). What happens AFTER the program ends in 2025?

This talk provides an updated view of what a 2023 system might look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It will also discuss the tapering of historical improvements in lithography that coincides with the completion of the exascale project, and how that might affect the roadmap beyond 2025. What options are available to continue scaling in successors to the first exascale machine? Will 2025 see a first-of-a-kind system, or will it arrive just in time for a new computing technology upheaval?

 

Back to Session VI

Co-design 3.0 – Configurable Extreme Computing leveraging Moore’s Law for Real Applications

 

Sadasivan Shankar

Harvard University, School of Engineering and Applied Sciences

Cambridge, MA, USA

 

In this talk, we will discuss Co-design 3.0, a more adaptable and scalable paradigm in which systems can be dynamically configured according to the specific needs of the applications. The premise is that, given the slowing of Moore’s law and the power of computing to address societal needs, we need to focus on real applications as they evolve in time, rather than on standard benchmarks. To be practically viable, this must be done in a scalable framework at lower cost. We think that major ongoing research and development centers of the computational and physical sciences need to be formally engaged in the co-design of hardware, software, numerical methods, algorithms, and applications. As we will demonstrate with a few examples, this will help address grand scientific and technological challenges associated with societal problems: materials and chemistry (energy); biology (environment, health); information processing (computing and communication). In addition, this will help disperse the benefits of computing more widely, rather than confining them to niche scientific communities. To accomplish this, it is likely that the computing framework currently in use will be replaced by different information processing architectures. As part of this talk, we will discuss the key applications and their needs in these areas and illustrate a new class we have developed in which students are taught hands-on about using extreme computing to address real applications.

 

Back to Session X

The Challenges of Exascale Computing

 

Karl Solchenbach

Intel, Exascale Labs Europe, GERMANY

 

Building, operating and using exascale systems requires the solution of several challenges:

-        The performance/energy ratio has to improve by an order of magnitude

-        A new memory architecture is needed

-        Applications have to become highly scalable, supporting 1M+ cores

 

The presentation will address Intel’s efforts in future HPC system architectures, including many-core nodes, high-bandwidth interconnects, and new memory concepts. A special focus will be on programming models and applications and the related work in the Intel Exascale Labs in Europe. In collaboration with leading European HPC organisations, Intel is establishing a co-design process to define the requirements for future HPC systems and to evaluate future system architectures. These systems won’t be pure number crunchers any more; they will solve problems through a mix of HPC, high performance analytics, and data-centric computing.

 

Back to Session II

The Asymptotic Computer – Undoing the Damage

 

Thomas Sterling

School of Informatics and Computing

& CREST Center for Research in Extreme Scale Technologies

Indiana University, Bloomington, IN, USA

 

While the very far future, well beyond exaflops computing, may encompass such paradigm shifts as quantum computing or neuromorphic computing, a critical window of change exists within the domain of semiconductor digital logic technology. As key parameters such as Dennard scaling, nano-scale component densities, clock rates, pin I/O, and voltage approach asymptotic operational regimes, one major area of untapped opportunity is computer architecture, which has been severely limited by conventional practices of organization and control semantics. Mainstream computer architecture in HPC has been inhibited in innovation by the original von Neumann architecture of seven decades ago. Although notably diverse in the forms of parallelism exploited, the six major epochs of computer architecture through to the present are all von Neumann derivatives. At their core (no pun intended) is the use of single instruction issue and the prioritization of floating point unit (FPU) utilization. At one time, floating point hardware was the precious resource that motivated architectural advances such as ILP, speculative execution, prefetching, cache hierarchies, TLBs, branch prediction, execution pipelining, and other techniques. In the modern age, however, FPUs consume only a small part of die real estate, while the plethora of mechanisms (including caches) used to achieve maximum floating point efficiency take up the vast majority of the chip. In the meantime, the von Neumann bottleneck, the separation of memory and processor, is retained. A revolution in computer architecture design is possible, even at the end of Moore’s Law, by undoing the damage of the von Neumann heritage and emphasizing the key challenges of data movement latency and bandwidth, which, along with operation/instruction issue control, are the true precious resources. This presentation will discuss the key tradeoffs that should drive computer architecture in what might be called the “Neo-Digital Age” and will give three stages of advances that are practical even with today’s technology. These include the author’s own work on the ParalleX architecture, the latent opportunity of Processor-in-Memory architectures, and the adoption of future cellular architectures, each of which relaxes the von Neumann assumptions and exploits the inherent opportunities of future computer architectures.

 

Back to Session VI

DOE-NCI Joint Development of Advanced Computing Solutions for Cancer

 

Rick Stevens

Argonne National Laboratory

& Department of Computer Science, The University of Chicago, USA

 

The U.S. has recently embarked on an “all government” approach to the problem of cancer. This is codified in the “Cancer Moonshot” initiative of the Obama administration, led by Vice President Biden. In this approach, all U.S. Government agencies were asked to propose ways to bring their resources and capabilities to bear on cancer research. As part of this initiative, the Department of Energy (DOE) has entered into a partnership with the National Cancer Institute (NCI) of the National Institutes of Health (NIH). This partnership has identified three key challenges that the combined resources of DOE and NCI can accelerate. The first challenge is to understand the molecular basis of key protein interactions in the RAS/RAF pathway, which is implicated in 30% of cancers. The second challenge is to develop predictive models of drug response that can be used to optimize pre-clinical drug screening and drive precision-medicine treatments for cancer patients. The third challenge is to automate the analysis and extraction of information from millions of cancer patient records to determine optimal cancer treatment strategies across a range of patient lifestyles, environmental exposures, cancer types and healthcare systems.

While each of these three challenges is at a different biological scale and has a specific scientific team collaborating on data acquisition, data analysis, model formulation, and runs of scientific simulations, they also share several common threads. They are all linked by common sets of cancer types that appear at all three scales (i.e., molecular, cellular and population), all have to address significant data management and data analysis problems, and all need to integrate simulation, data analysis and machine learning at large scale to make progress. I will outline the strategy for attacking these problems, the scale of the problems, and how we plan to utilize exascale computing. A major goal of this effort is to drive requirements for future systems beyond those needed for traditional scientific computing applications. Of particular priority are the requirements for large-scale data analysis and the application of deep learning to all three problems.

 

Back to Session I

The potential to augment HPC systems with Neuromorphic Computing Accelerators

 

Rick Stevens

Argonne National Laboratory

& Department of Computer Science, The University of Chicago, USA

 

As more businesses, researchers and governments embrace machine learning as a key technology, we see the emergence of hardware accelerators optimized for the execution of machine learning algorithms (e.g. SVM, ensemble and tree methods, lasso, multi-kernel methods, naïve Bayes and deep neural networks). NVIDIA, Intel and others have projects addressing this focused marketplace, and we are starting to see products placed into vendor roadmaps. The primary focus of these mainstream efforts appears to be accelerating the training phase of deep neural networks, since this use case is very compute-intensive and is a major bottleneck in machine learning productivity. In addition to CPU and GPU optimizations achieved by providing support for reduced precision, other groups have designed and/or built hardware that departs from simple functional-unit optimization to include entire data-path optimization for machine learning, and in some cases ASICs targeting a specific method or software stack, such as Google’s TensorFlow engine. The more extreme, but still von Neumann, of these designs claim factors of greater than 100 in power efficiency over conventional GPUs. While it is certainly the case that progress can be made in adapting CPUs and GPUs to the needs of large-scale machine learning, additional factors of 10 in power reduction may be achievable if we can demonstrate the effective use of neuromorphic hardware designs as an execution platform for large-scale machine learning applications. To date there is a considerable gap between the scale and scope of state-of-the-art deep learning applications and those that have been run effectively on neuromorphic hardware. In this talk I’ll review what we know about power and performance efficiency gains from custom but conventional accelerators, compare that to what might be achieved with neuromorphic hardware, and then discuss a path for integration into traditional HPC ecosystems and why this might be a good idea.

 

Back to Session VI

Merging Data Science and Large Scale Computational Modeling

 

Francis Sullivan

IDA/Center for Computing Sciences, Bowie, MD, USA

 

Future exascale computers will have to be suitable both for data science applications and for more “traditional” modeling and simulation. However, data science applications are often posed as questions about discrete objects such as graphs, while problems in modeling and simulation are usually stated initially in terms of classical mathematical analysis. We will present examples and arguments to show that the two points of view are not as distinct as one might think. Recognizing the connections between the two problem sets will be essential to the development of algorithms capable of exascale performance. We first touch briefly on methods such as particle swarm optimization, which seem ad hoc and suited to discrete machine learning applications but, when examined closely, can be seen to behave very much like classical methods such as gradient descent. Conversely, classical problems, such as inverting a Laplacian matrix, can be stated in terms of properties of the spanning trees and cycle space of a graph. Our main examples will be drawn from applications of Monte Carlo to attacking hard problems of the kind that occur both in data science and in computational modeling of physical phenomena. For classical problems, methods for determining convergence are often non-rigorous but capable of supplying a physically meaningful answer; in the discrete world, by contrast, rigorous results exist but establish complexity bounds that lead to methods that cannot be used in practice. We will illustrate how taking ideas from both worlds pays handsome dividends.
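One classical instance of the Laplacian/spanning-tree connection mentioned above is Kirchhoff's matrix-tree theorem, a textbook fact included here for orientation rather than as the talk's own example.

% Matrix-tree theorem: for a connected graph G with Laplacian L = D - A,
% the number of spanning trees t(G) equals any cofactor of L:
\[
  t(G) \;=\; \det\!\big( L^{(i)} \big) ,
\]
% where L^{(i)} is L with row and column i deleted. Entries of the
% Laplacian pseudo-inverse likewise admit interpretations as effective
% resistances, one concrete bridge between "inverting a Laplacian" and
% combinatorial objects such as spanning trees and the cycle space.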

 

Back to Session VIII

From Clouds to Exascale: Programming Issues in Big Data Analysis

 

Domenico Talia

Department of Computer Engineering, Electronics, and Systems

University of Calabria, Italy

 

Scalability is a key feature of big data analysis and machine learning tools and applications, which must analyze the very large, often real-time, data nowadays available from data repositories, social media, sensor networks, smartphones and the Web. Scalable big data analysis today can be achieved by parallel implementations that exploit the computing and storage facilities of HPC systems and clouds, whereas in the near future exascale systems will be used to implement extreme-scale data analysis. In this talk we discuss how clouds currently support the development of scalable data mining solutions and outline the main challenges to be addressed and solved in implementing future exascale data analysis systems.

 

Back to Session VIII

Kinetic Turbulence Simulations on Top Supercomputers Worldwide

 

William Tang

Princeton University, Dept. of Astrophysical Sciences, Plasma Physics Section

Fusion Simulation Program, Princeton Plasma Physics Lab.

& Princeton Institute for Computational Science and Engineering, USA

 

A major challenge for high performance computing (HPC) today is to demonstrate how advances in supercomputing technology translate into accelerated progress in key application domains. This is the focus of an exciting program being launched in the US -- the “National Strategic Computing Initiative (NSCI)” -- announced as an Executive Order on July 29, 2015, and involving all research & development (R&D) programs in the country, to “enhance strategic advantage in HPC for security, competitiveness, and discovery.” A strong associated focus in key application domains is to accelerate progress in advanced codes that model complex physical systems -- especially with respect to reductions in “time to solution” as well as “energy to solution.” If properly validated against experimental measurements/observational data and verified with mathematical tests and computational benchmarks, these codes can be expected to improve much-needed predictive capability in many strategically important areas of interest.

 

As an example application domain, computational advances in plasma physics and magnetic fusion energy research have produced particle-in-cell (PIC) simulations of turbulent kinetic dynamics for which computer run-time and problem size scale very well with the number of processors on massively parallel many-core supercomputers. For example, the GTC-Princeton (GTC-P) code, which has been developed with a “co-design” focus, has demonstrated the effective usage of the full power of current leadership class computational platforms worldwide at the petascale and beyond to produce efficient nonlinear PIC simulations that have advanced progress in understanding the complex nature of plasma turbulence and confinement in fusion systems for the largest problem sizes. Results have also provided strong encouragement for being able to include increasingly realistic dynamics in extreme-scale computing campaigns with the goal of enabling predictive simulations characterized by unprecedented physics resolution/realism for increasing problem size challenges. More generally, from a performance modelling perspective, important “lessons learned” from these studies hold significant promise for benefiting particle-in-cell (PIC) based software in other application domains.
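As background on the method, a particle-in-cell step alternates a gather (grid fields interpolated to particles), a push, and a scatter (particle charge deposited back onto the grid). The toy 1-D electrostatic sketch below shows that structure only; it bears no relation to GTC-P's gyrokinetic formulation.

# Toy 1-D electrostatic PIC push with linear weighting; illustrative only.
import numpy as np

def pic_step(x, v, E_grid, dx, dt, qm=-1.0):
    ng = E_grid.size
    # gather: interpolate the grid field to each particle
    idx = (x / dx).astype(int) % ng
    frac = x / dx - np.floor(x / dx)
    E_p = (1 - frac) * E_grid[idx] + frac * E_grid[(idx + 1) % ng]
    # leapfrog push
    v += qm * E_p * dt
    x = (x + v * dt) % (ng * dx)
    # scatter: deposit charge at the new positions for the next field solve
    idx = (x / dx).astype(int) % ng
    frac = x / dx - np.floor(x / dx)
    rho = np.zeros(ng)
    np.add.at(rho, idx, 1 - frac)
    np.add.at(rho, (idx + 1) % ng, frac)
    return x, v, rho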

 

Back to Session X

Towards Support of Highly-Varied Workloads on Supercomputers

 

Adrian Tate

Cray EMEA Research Lab, United Kingdom

 

The talk will describe current research projects that broadly support the seamless execution of highly varied, data-intensive workloads on supercomputers. Systems of the near future will run jobs comprising any mixture of compute-intensive processing, data-intensive processing, data analytics, visualization and machine learning. As well as describing fundamental software and hardware challenges, the talk will describe how more integrated systems could be used to match the pieces of varied workloads with appropriate hardware. The need for advancements in some key software areas will be described, including data- and memory-aware programming abstractions, mathematical optimization support, new task schedulers and intelligent runtimes.

 

Back to Session II

Globus Auth Identity and Access Management

 

Steve Tuecke

Computation Institute, The University of Chicago, Chicago, IL, USA

 

Globus Auth is a foundational identity and access management (IAM) platform service, used for brokering authentication and authorization interactions between end-users, identity providers, applications (including web, mobile, desktop, and command line), and services (including service to service). The goal of Globus Auth is to enable an extensible, integrated ecosystem of applications and services for the research community. In this talk I will introduce and demonstrate Globus Auth, and examine how it can be used to enhance applications and services such as data portals and science gateways with advanced IAM functionality.
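As a flavor of how a registered application obtains tokens from such an IAM broker, the sketch below performs a plain OAuth2 client-credentials exchange with the requests library. The endpoint URL and scope are assumptions for illustration; real integrations would normally use the official Globus SDK.

# OAuth2 client-credentials exchange, sketched with raw HTTP. The token
# endpoint and scope below are assumptions, not verified values; client id
# and secret are placeholders for a registered application's credentials.
import requests

TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"  # assumed endpoint
resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "client_credentials",
        "scope": "urn:globus:auth:scope:transfer.api.globus.org:all",  # assumed
    },
    auth=("MY_CLIENT_ID", "MY_CLIENT_SECRET"),
)
resp.raise_for_status()
access_token = resp.json()["access_token"]
# The token is then presented as a Bearer credential to downstream services.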

 

Back to Session VII

ARM’s Path to Exascale

 

Eric Van Hensbergen

ARM Research, Austin, TX, USA

 

In late 2011 ARM started exploring the use of its technologies in high performance computing through the FP7 Mont-Blanc project. Starting at that time with 32-bit mobile processors, the aim was to investigate how well power-efficient mobile cores could run HPC workloads. In the five years since that project started, combined with efforts from the US Department of Energy’s FastForward program and the European Horizon 2020 program, the ARM architecture has entered the enterprise server market with 64-bit processors, and architecture extensions will be announced to better address compute-intensive workloads while maintaining energy efficiency. We now believe we have the necessary compute capabilities to address exascale, and we are focusing our research efforts on scalability and memory bottlenecks, with a particular emphasis on increasing the utilization of existing resources and reducing unnecessary data movement. We believe this will allow ARM solutions to achieve higher performance while maintaining energy efficiency, and will also improve ARM’s capabilities in streaming analytics and other forms of big data. This talk will overview the past, present, and future of ARM technologies in both high-performance and data-intensive computing.

 

Back to Session II

How Well Do We Know Properties of Parallel Algorithms?

 

Vladimir Voevodin

Moscow State University, Research Computing Center, Moscow, RUSSIA

 

The computing world is changing and all devices – from mobile phones and personal computers to high-performance supercomputers – are becoming parallel. At the same time, if the efficient use of all the opportunities offered by modern computing systems already represents a global challenge, at extreme scale it turns into a large number of challenges. Using the full potential of parallel computing systems and distributed computing resources requires new knowledge, skills and abilities, and one of the main roles here belongs to understanding the key properties of parallel algorithms. What are these properties? What should be discovered and expressed explicitly in existing algorithms when a new parallel architecture appears? How can we ensure efficient implementation of an algorithm on an extreme-scale computing platform? All these issues, as well as many others, will be addressed in the talk.

The idea that we use in our practice is to split the description of an algorithm into two parts. This helps us explain what a good parallel algorithm is and what is important for its efficient implementation. The first part describes algorithms and their machine-independent properties; the second part is dedicated to particular aspects of their implementation on various computing platforms. The first part draws attention to the key theoretical properties, while the second puts emphasis on the aspects that are fundamentally important in practice. This division is made intentionally, to highlight the machine-independent properties of algorithms that determine their potential and the quality of their implementations on parallel computing systems, and to describe them separately from the issues related to the subsequent stages of coding and execution. In addition to classical algorithm properties such as serial complexity, we have to deal with concepts such as parallel complexity, parallel structure, determinacy, data locality, performance and scalability estimates, communication profiles for specific implementations, and many other aspects.
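A small worked instance of this split, for the reduction (summing n numbers), is given below; the machine-independent part is classical.

% Machine-independent properties of summation:
\[
  \text{serial complexity} \;=\; n - 1 \ \text{additions},
  \qquad
  \text{parallel depth} \;=\; \lceil \log_2 n \rceil ,
\]
% since a binary-tree reduction performs n/2, n/4, ... independent additions
% per level. Whether this potential is realized belongs to the
% implementation part: data locality, the communication profile, and the
% cost of each tree level on the target platform.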

This approach has been successfully implemented as AlgoWiki, an open encyclopedia available to the computational community at www.AlgoWiki-Project.org.

Back to Session V