HPC 2014

 

High Performance Computing

 

From Clouds and Big Data to Exascale and Beyond

 

An International Advanced Workshop

 

 

 

July 7 – 11, 2014, Cetraro, Italy

 

 


 

 

 

 

 

Final Programme

 

Programme Committee

L. GRANDINETTI (Chair), University of Calabria, ITALY
F. BAETKE, Hewlett Packard, U.S.A.
C. CATLETT, Argonne National Lab. and University of Chicago, U.S.A.
J. DONGARRA, University of Tennessee, U.S.A.
S. S. DOSANJH, Lawrence Berkeley National Lab, U.S.A.
I. FOSTER, Argonne National Lab. and University of Chicago, U.S.A.
G. FOX, Indiana University, U.S.A.
W. GENTZSCH, The UberCloud and EUDAT, GERMANY
V. GETOV, University of Westminster, U.K.
G. JOUBERT, Technical University Clausthal, GERMANY
C. KESSELMAN, University of Southern California, U.S.A.
E. LAURE, Royal Institute of Technology Stockholm, SWEDEN
T. LIPPERT, Juelich Supercomputing Centre, GERMANY
M. LIVNY, University of Wisconsin, U.S.A.
I. LLORENTE, Universidad Complutense de Madrid, SPAIN
B. LUCAS, University of Southern California, U.S.A.
S. MATSUOKA, Tokyo Institute of Technology, JAPAN
P. MESSINA, Argonne National Laboratory, U.S.A.
K. MIURA, National Institute of Informatics, Tokyo, JAPAN
V. PASCUCCI, University of Utah and Pacific Northwest National Lab, U.S.A.
N. PETKOV, University of Groningen, NETHERLANDS
J. QIU, School of Informatics and Computing, Indiana University, U.S.A.
S. SEKIGUCHI, National Institute of Advanced Industrial Science and Technology, JAPAN
T. STERLING, Indiana University, U.S.A.
A. WANG, The University of Hong Kong, HONG KONG

 

Co-Organizers

L. GRANDINETTI

Center of Excellence for High Performance Computing, UNICAL, Italy

T. LIPPERT

Institute for Advanced Simulation, Juelich Supercomputing Centre, Germany

Organizing Committee

L. GRANDINETTI (Co-Chair), ITALY
T. LIPPERT (Co-Chair), GERMANY
M. ALBAALI, OMAN
C. CATLETT, U.S.A.
J. DONGARRA, U.S.A.
W. GENTZSCH, GERMANY
O. PISACANE, ITALY
M. SHEIKHALISHAHI, ITALY

 

 

 

 

Sponsors

 

 

ARM

CRAY

DIMES - Department of Computer Engineering, Electronics, and Systems

University of Calabria – UNICAL

Dipartimento di Ingegneria dell’Innovazione

Università del Salento

Hewlett Packard

IBM

INTEL

Juelich Supercomputing Center

KISTI - Korea Institute of Science and Technology Information

MELLANOX TECHNOLOGIES

MICRON

National Research Council of Italy - ICAR - Institute for High Performance Computing and Networks

PARTEC

 

 

Media Partners

 

 

HPCwire

 

HPCwire is the leader in world-class journalism for HPC. With a legacy dating back to 1986, HPCwire is recognized worldwide for its breakthrough coverage of the fastest computers in the world and the people who run them. For topics ranging from the latest trends and emerging technologies, to expert commentary, in-depth analysis, and original feature coverage, HPCwire delivers it all, as the industry’s leading news authority and most reliable and trusted resource. Visit HPCwire.com and subscribe today!

 

 

 

 

 

AWS

 

Free Amazon Web Services credits for all HPC 2014 delegates

 

Amazon is very pleased to provide $200 in service credits to all HPC 2014 delegates. Amazon Web Services provides a collection of scalable high performance and data-intensive computing services, storage, connectivity, and integration tools. AWS allows you to increase the speed of research and to reduce costs by providing Cluster Compute or Cluster GPU servers on demand. You have access to a full-bisection, high-bandwidth 10 Gbps network for tightly coupled, I/O-intensive workloads, and you can scale out across thousands of cores for throughput-oriented applications.

 

 

 

 

 

 

The UberCloud

The UberCloud is an online community and marketplace platform where engineers and scientists can discover, try, and buy computing time on demand in the HPC Cloud, paying only for what they use.

Please register for the UberCloud Voice newsletter, or to perform an HPC Experiment in the Cloud.

 

 

 

 

 

Speakers

 

Katrin Amunts

Juelich Research Center

Institute of Neuroscience and Medicine

Structural and Functional Organization of the Brain (INM 1) at Forschungszentrum Jülich

Juelich

and

Structural-functional brain mapping at RWTH Aachen University

Aachen

GERMANY

 

Frank Baetke

Global HPC Programs

Academia and Scientific Research

Hewlett Packard

Palo Alto, CA

USA

 

Pete Beckman

Director, Exascale Technology and Computing Institute

Argonne National Laboratory

Argonne, IL

USA

 

Keren Bergman

Department of Electrical Engineering

Columbia University

New York, NY

USA

 

William Blake

Senior VP and CTO

CRAY

Seattle

USA

 

Charlie Catlett

Math & Computer Science Div.

Argonne National Laboratory

Argonne, IL

and

Computation Institute of

The University of Chicago and Argonne National Laboratory

Chicago, IL

USA

 

Alok Choudhary

Kellogg School of Management

Northwestern University

Evanston, IL

USA

 

Paul Coteus

IBM Research - Yorktown Heights

Data Centric Deep Computing Systems

Yorktown Heights, NY

USA

 

Patrick Demichel

Strategic System Architect in HPC

Hewlett Packard

Palo Alto, CA

USA

 

Jack Dongarra

Innovative Computing Laboratory

University of Tennessee

Knoxville, TN

and

Oak Ridge National Laboratory

USA

 

Mikhail Dorojevets

Stony Brook University

Dept. of Electrical and Computer Engineering

Stony Brook, NY

USA

 

Sudip S. Dosanjh

Director of the National Energy Research Scientific Computing Center

at Lawrence Berkeley National Laboratory

Berkeley, CA

USA

 

Paul F. Fischer

Argonne National Laboratory

Argonne, IL

USA

 

Ian Foster

Argonne National Laboratory

Argonne, IL

and

Dept of Computer Science

The University of Chicago

Chicago, IL

USA

 

Geoffrey Fox

Community Grid Computing Laboratory

Indiana University

Bloomington, IN

USA

 

Wolfgang Gentzsch

The UberCloud and EUDAT

GERMANY

 

Sergei Gorlatch

Universitaet Muenster

Institut für Informatik

GERMANY

 

Richard Graham

Mellanox

Sunnyvale, CA

USA

 

Bart ter Haar Romeny

Eindhoven University of Technology

Department of Biomedical Engineering

Biomedical Image Analysis & Interpretation

Eindhoven

THE NETHERLANDS

 

Takahiro Hirofuchi

Information Technology Research Institute

National Institute of Advanced Industrial Science and Technology (AIST)

JAPAN

 

Gerhard Joubert

Technical University Clausthal

GERMANY

 

Carl Kesselman

Information Sciences Institute

University of Southern California

Marina del Rey, Los Angeles, CA

USA

 

Ludek Kucera

Charles University

Faculty of Mathematics and Physics

Prague

CZECH REPUBLIC

 

Marcel Kunze

Forschungsgruppe Cloud Computing
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)

GERMANY

 

Erwin Laure

KTH Royal Institute of Technology

Stockholm

SWEDEN

 

John D. Leidel
Software Compiler Development Manager
Micron Technology, Inc.

Dallas/Fort Worth, Texas

USA

 

Thomas Lippert

Institute for Advanced Simulation

Jülich Supercomputing Centre

and

University of Wuppertal, Computational Theoretical Physics,

and

John von Neumann Institute for Computing (NIC)

also

European PRACE IP Projects and of the DEEP Exascale Project

GERMANY

 

Guy Lonsdale

Vorstand/CEO, scapos AG

Sankt Augustin

GERMANY

 

Bob Lucas

Computational Sciences Division

University of Southern California

Information Sciences Institute

Los Angeles, CA

USA

 

Stefano Markidis

KTH Royal Institute of Technology

Stockholm

SWEDEN

 

Victor Martin-Mayor

Departamento de Fisica Teorica

Universidad Complutense de Madrid

Madrid

SPAIN

 

Satoshi Matsuoka

Global Scientific Information and Computing Center

& Department of Mathematical and Computing Sciences

Tokyo Institute of Technology

Tokyo

JAPAN

 

Paul Messina

Argonne National Laboratory

Argonne, IL

USA

 

Ken Miura

Center for Grid Research and Development

National Institute of Informatics

Tokyo

JAPAN

 

Mark Moraes

Head Engineering Group

D.E. Shaw Research

New York

USA

 

Valerio Pascucci

University of Utah

Center for Extreme Data Management, Analysis and Visualization,

Scientific Computing and Imaging Institute

School of Computing

and

Pacific Northwest National Laboratory

Salt Lake City, UT

USA

 

David Pellerin

AWS High Performance Computing

AMAZON

USA

 

Dana Petcu

Computer Science Department

West University of Timisoara

ROMANIA

 

Judy Qiu

School of Informatics and Computing

and

Pervasive Technology Institute

Indiana University

USA

 

Mark Seager

CTO for HPC Systems

INTEL

Santa Clara, California

USA

 

Alex Shafarenko

School of Computer Science

University of Hertfordshire

Hatfield

UNITED KINGDOM

 

John Shalf

Lawrence Berkeley Laboratory

Berkeley, California

USA

 

Thomas Sterling

School of Informatics and Computing

and

CREST Center for Research in Extreme Scale Technologies

Indiana University

Bloomington, IN

USA

 

Rick Stevens

Argonne National Laboratory

and

Department of Computer Science,

The University of Chicago

Argonne & Chicago

USA

 

Domenico Talia

Department of Computer Engineering, Electronics, and Systems

University of Calabria

ITALY

 

William M. Tang

Princeton University

Dept. of Astrophysical Sciences, Plasma Physics Section

Fusion Simulation Program

Princeton Plasma Physics Laboratory

and

Princeton Institute for Computational Science and Engineering

Princeton

USA

 

Matthias Troyer

Institut für Theoretische Physik

ETH Zürich

SWITZERLAND

 

Eric Van Hensbergen

ARM Research
Austin, TX
USA

 

Priya Vashishta

Collaboratory for Advanced Computing and Simulations

Departments of Chemical Engineering & Materials Science, Physics & Astronomy, and Computer Science

University of Southern California

Los Angeles, CA

USA

 

Jose Luis Vazquez-Poletti

Distributed Systems Architecture Research Group (DSA-Research.org)

Universidad Complutense de Madrid

SPAIN

 

Vladimir Voevodin

Moscow State University

Research Computing Center

Moscow

RUSSIA

 

Robert Wisniewski

Chief Software Architect Exascale Computing

INTEL Corporation

New York, NY

USA

 

 

 

Workshop Agenda

Monday, July 7th

Session

Time

Speaker/Activity

 

9:00 – 9:15

Welcome Address

Session I

 

State of the Art and Future Scenarios

 

9:15 – 9:50

J. DONGARRA

High Performance Computing Today and Benchmark the Future

 

9:50 – 10:25

I. FOSTER

Networking materials data

 

10:25 – 11:00

G. FOX

Returning to Java Grande: High Performance Architecture for Big Data

 

11:00 – 11:30

COFFEE BREAK

 

11:30 – 12:05

S. Matsuoka

Convergence of Extreme Big Data and HPC - Managing the memory hierarchy and data movement the key towards future exascale

 

12:05 – 12:40

R. Stevens

Future Scenarios — Mobile Über Alles: Trends and Open Problems for the Coming Decade, How does HPC Stay Relevant?

 

12:40 – 13:00

CONCLUDING REMARKS

Session II

 

Emerging Computer Systems and Solutions

 

17:00 – 17:25

F. Baetke

Trends and Paradigm Shifts in High Performance Computing

 

17:25 – 17:50

B. Blake

The Fusion of Supercomputing and Big Data: The Role of Global Memory Architectures in Future Large Scale Data Analytics

 

17:50 – 18:15

P. Coteus

Data Centric Systems

 

18:15 – 18:45

COFFEE BREAK

 

18:45 – 19:10

J. Leidel

Programming Challenges in Future Memory Systems

 

19:10 – 19:35

D. Pellerin

Scalability in the Cloud: HPC Convergence with Big Data in Design, Engineering, Manufacturing

 

19:35 – 20:00

M. Kunze

Big Data Technologies

 

20:00 – 20:10

CONCLUDING REMARKS

 

 

Tuesday, July 8th

Session

Time

Speaker/Activity

Session III

 

Advances in HPC Technology and Systems

 

9:00 – 9:25

S. Gorlatch

Towards High-Level Programming for Many-Cores

 

9:25 – 9:50

A. Shafarenko

Coordination programming for self-tuning: the challenge of a heterogeneous open environment

 

9:50 – 10:15

K. Miura

Prospects for the Monte Carlo Methods in the Million Processor-core Era and Beyond

 

10:15 – 10:40

B. Lucas

Accelerating the Multifrontal Method

 

10:40 – 11:05

V. Martin-Mayor

Quantum versus Thermal annealing (or D-wave versus Janus): seeking a fair comparison

 

11:05 – 11:35

COFFEE BREAK

Session IV

 

Software and Architecture for Extreme Scale Computing I

 

11:35 – 12:00

S. Dosanjh

Big Computing, Big Data, Big Science

 

12:00 – 12:25

E. Laure

EPiGRAM - Towards Exascale Programming Models

 

12:25 – 12:50

M. Seager

Beowulf meets Exascale System Software: A horizontally integrated framework

 

12:50 – 13:15

B. Lucas

Adiabatic Quantum Annealing Update

Session V

 

Software and Architecture for Extreme Scale Computing II

 

17:00 – 17:25

P. Beckman

t.b.a.

 

17:25 – 17:50

J. Shalf

Exascale Programming Challenges: Adjusting to the new normal for computer architecture

 

17:50 – 18:15

L. Kucera

A lower bound to energy consumption of an exascale computer

 

18:15 – 18:45

COFFEE BREAK

Session VI

 

Brain related Simulation and Computing

 

18:45 – 19:10

K. Amunts

Ultra-high resolution models of the human brain – computational and neuroscientific challenges

 

19:10 – 19:35

T. Lippert

Creating the HPC Infrastructure for the Human Brain Project

 

19:35 – 20:00

B. ter Haar Romeny

Functional models for early vision circuits from first principles

 

20:00 – 20:10

CONCLUDING REMARKS

 

 

Wednesday, July 9th

Session

Time

Speaker/Activity

Session VII

 

Beyond Exascale Computing

 

9:00 – 9:15

P. Messina

Enabling technologies for beyond exascale computing

 

9:15 – 9:45

R. Stevens

Beyond Exascale — What will Sustain our Quest for Performance in a Post-Moore World?

 

9:45 – 10:15

M. Dorojevets

Energy-Efficient Superconductor Circuits for High-Performance Computing

 

10:15 – 10:45

M. TROYER

T.B.A.

               

10:45 – 11:15

COFFEE BREAK

 

11:15 – 11:45

M. Moraes

Scaling lessons from the software challenges in Anton, a special-purpose machine for molecular dynamics simulation

 

11:45 – 12:15

P. Demichel

New technologies that disrupt our complete ecosystem and their limits in the race to Zettascale

 

12:15 – 12:45

K. Bergman

Scalable Computing Systems with Optically Enabled Data Movement

 

12:45 – 13:00

CONCLUDING REMARKS

 

17:00 – 17:30

R. Wisniewski

System Software for PEZ(Y)

 

17:30 – 18:00

COFFEE BREAK

 

18:00 – 18:30

T. Sterling

Extreme-scale Architecture in the Neo-Digital Age

 

18:30 – 20:00

PANEL DISCUSSION: “Beyond Exascale Computing”

Organized and Chaired by P. Messina

 

Participants: F. Baetke (Hewlett Packard), P. Coteus (IBM), R. Graham (Mellanox), G. Fox (Indiana University), T. Lippert (Juelich Supercomputing Centre), S. Matsuoka (Tokyo Institute of Technology), J. Shalf (Lawrence Berkeley Laboratory), V. Voevodin (Moscow State University)

 

 

Thursday, July 10th

Session

Time

Speaker/Activity

Session VIII

 

Cloud Computing Technology and Systems

 

9:00 – 9:30

J. Qiu

Harp: Collective Communication on Hadoop

 

9:30 – 10:00

D. Petcu

Overcoming the Cloud heterogeneity: from uniform interfaces and abstract models to multi-cloud platforms

 

10:00 – 10:30

T. Hirofuchi

AIST Super Green Cloud: A build-once-run-everywhere high performance computing platform

 

10:30 – 11:00

D. Talia

Programming Script-based Data Analytics Workflows on Clouds

               

11:00 – 11:30

COFFEE BREAK

 

11:30 – 12:00

G. Lonsdale

The Fortissimo HPC-Cloud: an enabler for engineering and manufacturing SMEs

 

12:00 – 12:30

J. L. Vazquez-Poletti

Clouds for meteorology: two case studies

 

12:30 – 13:00

W. Gentzsch

UberCloud - from Project to Product

 

13:00 – 13:10

CONCLUDING REMARKS

Session IX

 

Big Data

 

17:00 – 17:25

V. Pascucci

The Big Gift of Big Data

 

17:25 – 17:50

A. Choudhary

BIG DATA + BIG COMPUTE = Power of Two for Scientific Discoveries

 

17:50 – 18:15

G. Fox

Parallelizing Data Analytics

 

18:15 – 18:45

COFFEE BREAK

 

18:45 – 19:10

G. Joubert

Modelling & Big Data

 

19:10 – 19:35

E. Van Hensbergen

From Sensors to Supercomputers, Big Data Begins With Little Data

 

19:35 – 20:00

C. Kesselman

A Software as a Service based approach to Digital Asset Management for Complex Big-Data

 

20:00 – 20:10

CONCLUDING REMARKS

 

 

Friday, July 11th

Session

Time

Speaker/Activity

Session X

 

Infrastructures, Solutions and Challenging applications of HPC,

Grids and Clouds

 

9:00 – 9:30

C. Catlett

New Opportunities for Computation and Big Data in Urban Sciences

 

9:30 – 10:00

R. GRAHAM

The Exascale Architecture

 

10:00 – 10:30

S. MARKIDIS

Challenges and Roadmap for Scientific Applications at Exascale

 

10:30 – 11:00

W. Tang

Extreme Scale Computing Advances & Challenges in PIC Simulations

 

11:00 – 11:30

COFFEE BREAK

 

11:30 – 12:00

P. Vashishta

Thermomechanical Behaviour and Materials Damage: Multimillion-Billion Atom Reactive Molecular Dynamics Simulations

 

12:00 – 12:30

P. Fischer

Scalable Simulations of Multiscale Physics

 

12:30 – 13:00

V. Voevodin

Medical practice: diagnostics, treatment and surgery in supercomputer centers

 

13:00 – 13:10

CONCLUDING REMARKS

 

 

Chairmen

 

 

 

Paul Messina

Argonne National Laboratory

Argonne, IL

USA

 

 

 

Gerhard Joubert

Technical University Clausthal

GERMANY

 

 

 

Jack Dongarra

Innovative Computing Laboratory

University of Tennessee

Knoxville, TN

and

Oak Ridge National Laboratory

USA

 

 

 

Bill Blake

Cray Inc.

Seattle, WA

USA

 

 

 

Sudip Dosanjh

Lawrence Berkeley National Laboratory

Berkeley, CA

USA

 

 

 

Nicolai Petkov

University of Groningen

Groningen

THE NETHERLANDS

 

 

 

Paul Messina

Argonne National Laboratory

Argonne, IL

USA

 

 

 

Geoffrey Fox

Indiana University

Bloomington, IN

USA

 

 

 

Bob Lucas

Computational Sciences Division

Univ. of Southern California

Information Sciences Institute

Los Angeles, CA

USA

 

 

 

Wolfgang Gentzsch

The UberCloud and EUDAT

Regensburg

GERMANY

formerly

SUN Microsystems and

Duke University, North Carolina

U.S.A.

 

 

 

Panel Discussion

Beyond Exascale Computing

 

Paul Messina (Chair)

Participants: F. Baetke (Hewlett Packard), P. Coteus (IBM), R. Graham (Mellanox), G. Fox (Indiana University), T. Lippert (Juelich Supercomputing Centre), S. Matsuoka (Tokyo Institute of Technology), J. Shalf (Lawrence Berkeley Laboratory), V. Voevodin (Moscow State University)

 

This panel session will be held at the end of a day dedicated to presentations on the topic “Beyond Exascale Computing.” Beyond exascale systems – as we are defining them – are ones that will be based on new technologies that will finally result in the much anticipated (but unknown) phase-change to truly new paradigms/methodologies. The presentations prior to the panel will have covered promising disruptive technologies and architecture advances that may be enabled as a consequence of technology progress.

 

The focus of this panel is to provide a forum for views on forward-looking technologies that may determine future operational opportunities and challenges for computer systems beyond the exascale regime, what impact they will have on computer architectures and the entire computing ecosystem, and what applications they might enable.

Back to Session VII

 

 

 

Abstracts

Ultra-high resolution models of the human brain – computational and neuroscientific challenges

 

Katrin Amunts

Jülich Research Centre, Germany

 

The human brain is characterized by a multi-level organization. Brain models at microscopic resolution provide the bridge between the cellular level of organization and that of cognitive systems. Data size and the complexity of brain organization, however, make it challenging to create them. Cytoarchitectonic mapping strategies, as well as 3D-Polarized Light Imaging for analysing nerve fibre bundles and single axons, will be discussed. Models of cellular and fiber architecture will be shown, including a new BigBrain data set, based on advanced ICT, thus opening new perspectives to decode the human brain.

 

Back to Session VI

Trends and Paradigm Shifts in High Performance Computing

 

Frank Baetke

HP Global HPC Marketing, Germany

 

HP’s HPC product portfolio, based on standards at the processor, node and interconnect levels, has conquered the High Performance Computing market across all industry and application segments. HP’s extended portfolio of compute, storage and interconnect components powers most HPC sites in the TOP500 list. For specific challenges at node and systems level HP has introduced the SL-series with proven Petascale scalability and leading energy efficiency.

 

The SL-series will continue to lead innovation with the recent addition of new GPU- and coprocessor architectures as well as advanced storage subsystems.

 

Power and cooling efficiency has become a key focus area. Once primarily an issue of cost, it now also concerns the power and thermal density that can be managed in a data center. Combining the associated technological advances will result in a new HPC paradigm shift towards data centers that not only run at extreme efficiencies but also enable extended energy recovery rates.

 

Moving towards Exascale computing, we will face additional challenges that again will have a significant impact on how large-scale systems have to be designed.

Back to Session II

Scalable Computing Systems with Optically Enabled Data Movement

 

Keren Bergman

Department of Electrical Engineering, Columbia University, USA

 

As future computing systems aim to realize Exascale performance, the challenge of energy-efficient data movement, rather than computation, is paramount. Silicon photonics has emerged as perhaps the most promising technology to address these challenges by providing ultra-high bandwidth density communication capabilities that are essentially distance independent. Recent advances in chip-scale silicon photonic technologies have created the potential for developing optical interconnection networks that offer highly energy efficient communications and significantly improve computing performance-per-Watt. This talk will explore the design of silicon photonic interconnected architectures for Exascale and their impact on system-level performance.

Back to Session VII

The Fusion of Supercomputing and Big Data: The Role of Global Memory Architectures in Future Large Scale Data Analytics

 

Bill Blake

Senior VP and CTO, CRAY, USA

 

High Performance Computing is gaining the capabilities needed to deliver exascale supercomputers that provide the required billion-way parallelism and extreme memory capabilities. At the same time, Big Data approaches to large-scale analytics are pursuing another path, leading to millions of servers and billions of cores in the cloud that deliver results through advanced distributed computing. This paper will explore the technology and architectural trends facing system and application developers and speculate on a future where the most powerful large-scale analytics needs will be met by highly integrated Global Memory Architectures.

 

Back to Session II

BIG DATA + BIG COMPUTE = Power of Two for Scientific Discoveries

 

Alok N. Choudhary

Henry & Isabelle Dever Professor of EECS, Northwestern University, USA

 

Knowledge discovery has been driven by theory, experiments and by large-scale simulations on high-performance computers. Modern experiments and simulations involving satellites, telescopes, high-throughput instruments, sensors, and supercomputers yield massive amounts of data. What has changed recently is that the world is creating massive amounts of data at an astonishing pace and diversity.

 

Processing, mining and analyzing this data effectively and efficiently will be a critical component, as we can no longer rely upon traditional ways of dealing with the data due to its scale and speed. But there is a social aspect of acceleration, which is the sharing of “big data”, unleashing thousands of people to ask questions and participate in discovery. This talk addresses the fundamental question of what the challenges and opportunities are for extreme-scale systems to be an effective platform not only for traditional simulations, but also for data-intensive and data-driven computing that accelerates time to insight.

 

Biography:

Alok Choudhary is the Henry & Isabelle Dever Professor of Electrical Engineering and Computer Science and a professor at the Kellogg School of Management. He is also the founder, chairman and chief scientist (and served as its CEO during 2011-2013) of 4C Insights (formerly Voxsup Inc.), a big data analytics and social media marketing company. He received the National Science Foundation's Young Investigator Award in 1993. He is a fellow of IEEE, ACM and AAAS. His research interests are in high-performance computing, data intensive computing, scalable data mining, computer architecture, high-performance I/O systems and software, and their applications in science, medicine and business. Alok Choudhary has published more than 400 papers in various journals and conferences and has graduated 33 PhD students. Techniques developed by his group can be found on every modern processor, and scalable software developed by his group can be found on many supercomputers. Alok Choudhary's work and interviews have appeared in many traditional media outlets including the New York Times, Chicago Tribune, The Telegraph, ABC, PBS, NPR, AdExchange, Business Daily and many international media outlets all over the world.

 

Back to Session IX

Data Centric Systems

 

Paul Coteus

IBM Fellow, Chief Engineer Data Centric Deep Computing Systems

Manager, IBM Research Systems Power, Packaging, and Cooling, USA

 

I will explain the motivation for IBM's Data Centric Systems and how they connect the needs of big data and high performance computing. Data Centric Systems are built on the principle that moving computing to the data will lead to more cost-effective, efficient, and easier-to-program systems than in the past. I will explain our vision for Data Centric Computing, covering hardware, software and programming models.

Back to Session II

New technologies that disrupt our complete ecosystem and their limits in the race to Zettascale

 

Patrick Demichel

Strategic System Architect in HPC, Hewlett Packard, USA

 

We now have a clear path toward the goal of Exaflop computing by 2020 for a power budget of ~20 MW.
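
Spelled out, the exaflop and ~20 MW figures imply a sustained efficiency target of roughly $10^{18}\,\mathrm{FLOP/s} \,/\, (2\times 10^{7}\,\mathrm{W}) = 5\times 10^{10}$ FLOP/s per watt, i.e. about 50 GFLOPS/W, a simple arithmetic consequence of the stated power budget rather than a figure from the talk itself.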

There are still many components to move from labs to development, but now is the time to start the journey toward the Zettaflop and to discover the future challenges that we will face, challenges that will require even more creativity and endurance to solve.

A Zettaflop system opens the door to solving many of the most fundamental problems that our societies face and creates opportunities in particular in climatology, energy, biosciences, security, Big Data, …

Clearly, power and economic constraints will remain the dominant drivers that influence almost all of our choices. In the era of the Internet of Things, with potentially trillions of connected objects and Yottabytes of data, we could achieve thousands of fundamental breakthroughs in all domains if we know how to extract meaning from this tsunami of information. We need these Zettascale systems, as they will be the brain at the center of this highly engineered planet.

 

Back to Session VII

High Performance Computing Today and Benchmark the Future

 

Jack Dongarra

Innovative Computing Laboratory, University of Tennessee

& Oak Ridge National Laboratory, USA

 

In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had, and will continue to have, a major impact on our numerical scientific software. In addition, benchmarking, and in particular the Linpack Benchmark, must change to match high performance computing today and in the next generation.

Back to Session I

Energy-Efficient Superconductor Circuits for High-Performance Computing

 

Mikhail Dorojevets

Stony Brook University, Dept. of Electrical and Computer Engineering, USA

 

Superconductor technology offers an opportunity to build processing circuits operating at very high frequencies of 20-50 GHz with ultra-low power consumption. The first generation of such circuits used a logic called Rapid Single-Flux-Quantum (RSFQ) to demonstrate ultra-high clock frequencies. However, RSFQ circuits have significant static power consumption in so-called bias resistors (~100x the dynamic power consumption in the Josephson junctions). Recently, the invention of new energy-efficient SFQ logics with practically zero static power dissipation has allowed superconductor designers to switch their focus from high frequencies to energy efficiency. First, I will talk about our design methodology, fabrication and demonstration of several wave-pipelined RSFQ processing units operating at 20 GHz and even higher frequencies. Then, I will discuss our recent work on the design, evaluation and projections for a new generation of energy-efficient superconductor circuits, using a benchmark set of 32-/64-bit integer and floating-point units, register files, and other local storage structures designed for a new superconductor fabrication process to be developed by 2016-2017.

 

Acknowledgment:

The research was funded in part by ARO contract W911NF-10-1-0012.

 

Back to Session VII

Big Computing, Big Data, Big Science

 

Sudip Dosanjh

Director, National Energy Research Scientific Computing (NERSC) Center

at Lawrence Berkeley National Laboratory, USA

 

With more than 5,000 users from universities, national laboratories and industry, the National Energy Research Scientific Computing (NERSC) Center supports the largest and most diverse research community of any computing facility within the U.S. Department of Energy (DOE). We provide large-scale, state-of-the-art computing for DOE's unclassified research programs in alternative energy sources, climate change, energy efficiency, environmental science and other fundamental science areas.

 

NERSC recently installed our newest supercomputing platform, a Cray XC30 system named “Edison” in honor of American inventor Thomas Alva Edison. Scientists from around the globe eagerly queued up to take advantage of the new supercomputer's capabilities. Edison was the first Cray supercomputer with Intel processors, a new Aries interconnect and a dragonfly topology. The system was designed to optimize data motion, which is the primary bottleneck for many of our applications, as opposed to peak speed. It has very high memory bandwidth, interconnect speed and bisection bandwidth. In addition, each node has twice the memory of many leading systems. This combination of fast data motion and large memory per node makes it well suited for both our traditional HPC workload and newly emerging data-intensive applications.

 

NERSC’s primary mission is to accelerate scientific discovery at the DOE’s Office of Science through high performance computing and data analysis. In 2013 we provided 1.25 billion computer hours to our users, and 2013 proved to be a productive year for scientific discovery at the center. In 2013, our users published 1,977 refereed papers and 18 journal cover stories based on the computations performed at NERSC. In addition, long-time NERSC user Martin Karplus, who has been computing at NERSC since 1998, was honored with a Nobel Prize in Chemistry for his contributions to the field of computational chemistry.

 

A clear trend at NERSC is that a growing number of scientific discoveries involve the analysis of extremely large data sets from experimental facilities. For the last four years, more data has been transferred to NERSC than away from NERSC, representing a paradigm shift for a supercomputing center. Most months we ingest more than a Petabyte of data.  A few recent data-intensive highlights are:

          Computing the properties of neutrinos from the Daya Bay Neutrino Experiment led to the discovery of a new type of neutrino oscillation which may help solve the riddle of matter-antimatter asymmetry in the universe (one of Science Magazine’s Top 10 Breakthroughs of 2012).

          NERSC’s integrated resources and services enabled the earliest-ever discovery of a supernova—within hours of its explosion—providing new information about supernova explosion dynamics.

          The IceCube South Pole Neutrino Observatory made the first observations of high-energy cosmic neutrinos, an achievement enabled in part by NERSC resources (This was Physics World's “Breakthrough of the Year” in 2013).

          Data analyzed from the European Space Agency's Planck space telescope revealed new information about the age and composition of the universe (one of Physics World’s “Top 10 Breakthroughs of the Year”).

          The South Pole Telescope made the first detection of a subtle twist in light from the Cosmic Microwave Background.

          The Materials Project, one of NERSC’s most popular Science Gateways, was featured as a “world changing idea” in a November 2013 Scientific American cover story, “How Supercomputers Will Yield a Golden Age of Materials Science.”

 

The demands for larger and more detailed simulations, massive numbers of simulations, and the explosion in the size and number of experimental data sets mean that there is no end in sight to the need for NERSC resources. This talk will describe NERSC's strategy for bringing together big computing and big data in the next decade to achieve big science.

 

Back to Session IV

Scalable Simulations of Multiscale Physics

 

Paul F. Fischer

Mathematics and Computer Science Division, Argonne National Laboratory, USA

 

Current high-performance computing platforms feature million-way parallelism, and it is anticipated that exascale computers will feature billion-way concurrency. This talk explores the potential of computing at these scales with a primary focus on fluid flow and heat transfer in application areas that include nuclear energy, combustion, oceanography, vascular flow, and astrophysics. Following Kreiss and Oliger (72), we argue that high-order methods are essential for efficient simulation of transport phenomena at petascale and beyond. We demonstrate that these methods can be realized at costs equivalent to those of low-order methods having the same number of gridpoints. We further show that, with care, efficient multilevel solvers having bounded iteration counts will scale to billion-way concurrency.

Using data from leading-edge platforms over the past 25 years, we analyze the scalability of (low- or high-order) domain decomposition approaches to predict parallel performance on exascale architectures. The analysis sheds light on the expected scope of exascale physics simulations and provides insight into design requirements for future algorithms, codes, and architectures.

 

Back to Session X

Networking materials data

 

Ian Foster

Computation Institute Argonne National Laboratory and Dept. of Computer Science, The University of Chicago, USA

 

The US Materials Genome Initiative seeks to develop an infrastructure that will accelerate advanced materials development and deployment. The term Materials Genome suggests a science that is fundamentally driven by the systematic capture of large quantities of elemental data. In practice, we know, things are more complex—in materials as in biology. Nevertheless, the ability to locate and reuse data is often essential to research progress. I discuss here three aspects of networking materials data: data publication and discovery; linking instruments, computations, and people to enable new research modalities based on near-real-time processing; and organizing data generation, transformation, and analysis software to facilitate understanding and reuse.

 

Back to Session I

Returning to Java Grande: High Performance Architecture for Big Data

 

Geoffrey Fox

Community Grid Computing Laboratory, Indiana University, USA

 

Here we use a sample of over 50 big data applications to identify characteristics of data-intensive applications and to deduce the needed runtime and architectures. We propose a big data version of the famous Berkeley dwarfs and NAS parallel benchmarks as the kernel big data applications. We suggest that one must unify HPC with the well-known Apache software stack that is widely used in modern cloud computing and is surely the most widely used data processing framework in the “real world”. We give some examples including clustering, deep learning and multi-dimensional scaling. This work suggests the value of a high performance Java (Grande) runtime that supports simulations and big data.

 

Back to Session I

Parallelizing Data Analytics

 

Geoffrey Fox

Community Grid Computing Laboratory, Indiana University, USA

 

We discuss a variety of large-scale optimization and data analytics problems, including deep learning, clustering, image processing, information retrieval, collaborative filtering and dimension reduction. We describe parallelization challenges and the nature of the kernel operations. We cover both batch and streaming operations and give some measured performance on both MPI and MapReduce frameworks.

 

Back to Session IX

UberCloud - from Project to Product

 

Wolfgang Gentzsch

The UberCloud and EUDAT, Germany

 

On Thursday, June 28, 2012, during the last HPC Workshop in Cetraro, the UberCloud HPC Experiment was announced at http://www.hpcwire.com/2012/06/28/the_uber-cloud_experiment/.

The day before, during breakfast on the beautiful terrace of Hotel San Michele, Tom Tabor and Wolfgang Gentzsch crafted the announcement, and Geoffrey Fox was the first to register for an HPC Experiment.

 

Since then, over 1500 organizations and individuals have registered and participated in 148 experiments exploring HPC, CAE, Bio, Finance, and Big Data in the Cloud. Compendiums 1 and 2 have appeared, with more than 40 case studies reporting on the benefits, challenges, and lessons learned from porting and running engineering and scientific applications in the Cloud.

Since then the UberCloud online community and marketplace platform has also been founded, with more in preparation.

 

This presentation provides a status update on the UberCloud Experiment and the online community and marketplace platform, discusses challenges and lessons learned, and presents several case studies.

 

Back to Session VIII

Towards High-Level Programming for Many-Cores

 

Sergei Gorlatch

Universitaet Muenster, Institut für Informatik, Germany

 

Application development for modern high-performance systems with many cores, i.e., comprising multiple Graphics Processing Units (GPUs) and multi-core CPUs, currently exploits low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs.

In this paper, we advocate a high-level programming approach for such systems, which relies on the following two main principles:

 

a) the model is based on the current OpenCL standard, such that programs remain portable across various many-core systems, independently of the vendor, and all low-level code optimizations can be applied;

b) the model extends OpenCL with three high-level features which simplify many-core programming and are automatically translated by the system into OpenCL code.

 

The high-level features of our programming model are as follows:

 

1) memory management is simplified and automated using parallel container data types (vectors and matrices);

2) an automatic data (re)distribution mechanism allows for implicit data movements between GPUs and ensures scalability on multiple GPUs;

3) computations are conveniently expressed using parallel algorithmic patterns (skeletons).

 

The well-defined skeletons allow for formal transformations of SkelCL programs which can be used both in the process of program development and in the compilation and optimization phase.

We demonstrate how our programming model and its implementation are used to express parallel applications on one- and two-dimensional data, and we report first experimental results to evaluate our approach in terms of programming effort and program size, as well as target performance.
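
As a rough illustration of feature 3) above, the following sketch (plain sequential Python, deliberately not the OpenCL-based model or the SkelCL API described in the talk; the skeleton names are hypothetical) shows how a computation such as a dot product can be composed from map/zip/reduce skeletons instead of explicit loops and buffer management:

from functools import reduce
import operator

# Hypothetical sequential stand-ins for parallel skeletons; in a skeleton
# library for many-core systems these would dispatch work to GPUs/CPUs and
# manage data transfers behind parallel container types (vectors, matrices).
def map_skel(f, xs):
    return [f(x) for x in xs]

def zip_skel(f, xs, ys):
    return [f(x, y) for x, y in zip(xs, ys)]

def reduce_skel(f, xs, init):
    return reduce(f, xs, init)

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Dot product as a composition of skeletons: no explicit loops and no
# explicit memory management in the application code.
dot = reduce_skel(operator.add, zip_skel(operator.mul, a, b), 0.0)
print(dot)  # 32.0

The point of the high-level model is that such a composition, written against parallel containers, can be translated automatically into portable, optimizable OpenCL code.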

 

Back to Session III

The Exascale Architecture

 

Richard Graham

Mellanox, Sunnyvale, CA, USA

 

Exascale levels of computing pose many system- and application-level computational challenges. Mellanox Technologies, Inc., as a provider of end-to-end communication services, is advancing the foundation of the InfiniBand architecture to meet the exascale challenges. This presentation will focus on recent technology improvements which significantly improve InfiniBand's scalability, performance, and ease of use.

 

Back to Session X

Functional models for early vision circuits from first principles

 

Bart M. ter Haar Romeny

Eindhoven University of Technology, Eindhoven, NL / Northeastern University, Shenyang, China

Dept. of Biomedical Engineering – Biomedical Image Analysis

 

There are many approaches to model functional understanding of early vision mechanisms, by (large-scale) numerical simulations, by neuro-computational mathematical modeling, by plasticity learning rules, pattern recognition paradigms, among others.

This presentation will focus on geometrical models for the visual front-end: the lowest level (V1) is considered as a geometry inference engine, with its extensive filterbanks, with a gradual increase in functional complexity to higher level operations up to V4 (Azzopardi, Petkov) with non-local topological models.

It is of interest to study the emergence and presence of known receptive fields and their interactions with a first-principled (axiomatic) approach. We will discuss in detail how the optimal aperture can be modeled as a Gaussian, from a minimum entropy requirement, and how the diffusion equation, first discussed by Koenderink, emerges. This PDE gives rise to a new model for the center-surround receptive fields in the retina as filters that signal back as interesting only those locations that respond to local variations in receptive field size. We discuss how Gaussian derivative kernels may emerge from a PCA analysis of eigen-patches of an image, generating robust differential operators both for invariant shape detection and for the measurement of color differential structure.
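
For reference, the two objects named above can be written down explicitly (standard scale-space notation, not quoted from the talk): the Gaussian aperture at scale $\sigma$ in $d$ dimensions is $G_\sigma(x) = (2\pi\sigma^2)^{-d/2} \exp\!\left(-\|x\|^2/(2\sigma^2)\right)$, and blurring an image $f$ with it, $L(\cdot,t) = G_{\sqrt{t}} * f$ with $t = \sigma^2$, is equivalent to evolving the linear diffusion (heat) equation $\partial_t L = \tfrac{1}{2}\nabla^2 L$, $L(\cdot,0) = f$, up to time $t$.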

We discuss in detail the optimal regularization properties of the Gaussian multi-scale differential operator receptive fields, as an instance of Tikhonov regularization. Making the requirements locally adaptive leads to non-linear diffusion PDEs, inspired by the strong cortico-thalamic feedback.

The famous pinwheel structure may be modeled as stacks of multi-orientation filtered outputs, so-called orientation scores. Assuming invertibility as a first principle, a new robust family of wavelets is found that generates these scores uniquely; these wavelets are similar to, but not the same as, Gabor kernels. These new ‘spaces’ give ample opportunities for more contextual image analysis operations, like denoising branching vessel patterns, and the enhancement and analysis of multiply crossing brain tracts from diffusion tensor imaging.

 

Short bio:

Prof. Bart M. ter Haar Romeny is professor at Eindhoven University of Technology in the Netherlands, heading the Biomedical Image Analysis (BMIA) group in the TU/e Department of Biomedical Engineering. He received his MSc in Applied Physics from Delft University of Technology in 1978 and his PhD from Utrecht University in 1983. He was Head of Physics at the Utrecht University Hospital Radiology Department and associate professor at the Image Sciences Institute (ISI) of Utrecht University (1989-2001).

He is co-appointed professor at Northeastern University (vice-dean research). His research interests focus on biologically inspired image analysis algorithms, multi-valued 3D visualization (especially brain connectivity), computer-aided diagnosis (in particular for diabetes), and image-guided neurosurgery.

He is President of the Dutch Society for Pattern Recognition and Image Processing, and has been President of the Dutch Society for Biophysics & Biomedical Engineering (1998 – 2002) and the Dutch Society of Clinical Physics (NVKF, 1990-1992).

He initiated the ‘Scale-Space’ conference series in 1997 (now SSVM). He is reviewer for many journals and conferences, and organized several Summer Schools. He is an awarded teacher, and a frequent keynote lecturer. Prof. Romeny is Senior Member of IEEE, Board member of IAPR, registered Clinical Physicist of NVKF, and partner in the Chinese Brainnetome consortium.

 

Back to Session VI

AIST Super Green Cloud: A build-once-run-everywhere high performance computing platform

 

Dr. Takahiro Hirofuchi

Senior Researcher, Information Technology Research Institute,

National Institute of Advanced Industrial Science and Technology (AIST), Japan

 

AIST Super Green Cloud (ASGC) is a high performance computing Cloud platform built on a supercomputer at the National Institute of Advanced Industrial Science and Technology (AIST). It aims at providing users with fully customizable and highly scalable high performance computing environments by taking advantage of advanced virtualization technologies and resource management mechanisms. Thanks to virtual machine technologies, users obtain their own virtualized supercomputers on physical resources in a build-once-run-everywhere manner; a virtualized supercomputer is portable and able to scale out to other commercial and academic cloud services. To overcome the performance overhead incurred by hypervisors, we have developed hypervisor-bypass I/O technologies and integrated them into Apache CloudStack. Although user environments are fully virtualized, performance degradation relative to bare-metal environments is negligible.

From the viewpoint of the administrative side, ASGC achieves energy-efficient and flexible management of physical resources by means of dynamic server placement. We have developed advanced migration technologies that work efficiently for HPC and enterprise virtual machines involving intensive memory and network I/O. These also enable us to migrate VMs among geographically distant HPC clouds. In this talk, I will present an overview of the ASGC project and our first experience obtained since launching the service in June 2014.

 

Back to Session VIII

Modelling & Big Data

 

Gerhard Joubert

Technical University Clausthal, Germany

 

The general concept of the scientific method or procedure consists in systematic observation, experiment and measurement, and the formulation, testing and modification of hypotheses. The method applies in practice to the solution of any real world problem. The order in which the two distinct components of the scientific method are applied in practice depends on the nature of the particular problem considered.

In many cases a hypothesis is formulated in the form of a model, for example a mathematical or simulation model. The correctness of a solution of the problem produced by this model is then verified by comparing it with collected data.

Alternatively, observational data are collected without a clear specification that the data could apply to the solution of one or more particular problems. This is, for example, often the case with medical data. In such cases data analytics are used to extract relationships from and detect structures in the (large) data sets. These can then be used to formulate one or more hypotheses, i.e. models, which lead to a deeper insight in the problem(s) considered.

Since the advent of widespread interest in so-called Big Data, there is a growing tendency to consider the results obtained through the analysis of large data sets in their own right, supplying satisfactory solutions to particular problems without the need for hypotheses and models. A notion is thus developing that the scientific method is becoming obsolete in the case of Big Data. This ignores the fact that a deeper understanding of the problem(s) considered may lead to different and more accurate solutions.

In this talk the relationship between data and models is briefly outlined and illustrated with a common problem. The limitations of results obtained from data analyses without the insight gained from appropriate models, which is fundamental to the scientific method, are exemplified. Considering Big Data in the context of the scientific method, one can state that the objective should be to gain “Insight, not Data”.

 

Back to Session IX

A Software as a Service based approach to Digital Asset Management for Complex Big-Data

 

Carl Kesselman

University of Southern California, USA

 

Trends in big data are resulting in large, complex data sets that are delivered in a wide variety of forms from diverse instruments, locations and sources. Simply keeping track of and managing the deluge of data (i.e. data wrangling) can be overwhelming, let alone integrating the data into ongoing scientific activities, imposing overheads that often slow or even impede the production of research results (1). In response, we have developed the Digital Asset Management System (BDAM) with the goal of drastically reducing the amount of time researchers currently spend managing their data rather than extracting knowledge. This ‘iPhoto’ for big data is delivered via a software-as-a-service model, and we have demonstrated its use in a variety of big-data management problems. I will describe the motivation and architecture of our system and illustrate it with an example from biomedical science.

 

Back to Session IX

A lower bound to energy consumption of an exascale computer

 

Ludek Kucera

Charles University, Faculty of Mathematics and Physics, Czech Republic

 

Building a computer system with computing power exceeding 1 exaflops is one of the greatest challenges of modern computer technology. There are two very important problems on the way to such a system - money and energy. Extrapolating the known data for Titan, the second most powerful system to date (and the largest for which the data are published), namely 17.59 petaflops (LINPACK), a cost of 97 million $ and 8.2 MW of power, we get 5.5 billion $ and 466 MW for an exaflops machine. Both values are feasible, but there is a consensus that the future exascale computer should be much more efficient.

 

The approach of the present paper is to find lower bounds on the achievable power, in order to understand how the roughly 500 MW of power is (or could be) distributed among the different tasks performed by the computer.

 

The first step is to figure out how much energy is necessary to perform $10^{18}$ 64-bit floating-point multiplications, excluding any other operations such as getting the operands from a cache or a memory. A very rough estimate can be based on the fact that the standard $O(n^2)$ multiplication algorithms couple each bit of the first operand with each bit of the second operand, and hence we can expect at least about 4000 bit operations per floating-point multiplication. Taking into account that recent semiconductor technology requires about 1 femtojoule per voltage-level change of one bi-state element, and assuming that about one half of the bi-state elements of a multiplier change their state, the lower bound (under current technology) for one 64-bit floating-point multiplication can be set to about 2 picojoules ($10^{-12}$ J). It follows that $10^{18}$ multiplications per second would need more than 2 MW. This value is substantially less than 1 percent of the extrapolated 500 MW, and therefore it would be very important to know where the remaining power disappears and whether there are ways to decrease this overhead.
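
Written out, the arithmetic of the paragraph above is $E_{\mathrm{mult}} \gtrsim \tfrac{1}{2} \times 4000 \times 1\,\mathrm{fJ} = 2\,\mathrm{pJ}$ per multiplication, and hence $P \gtrsim 10^{18}\,\mathrm{s}^{-1} \times 2\,\mathrm{pJ} = 2\,\mathrm{MW}$.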

 

The next step is to investigate the traffic between multipliers (and other arithmetic units) and cache and/or memory. Assuming a feasible value of 1 femtoJoule for transferring one bit along 1 cm of a wire within a 22 nm technology chip, and taking into account that we need to transport only 64 bits, we could see that in-chip feeding and draining of multipliers by data would match the multiplication energy only if the average path of the operands is about 64 cm, which is much more than distances within a chip. Thus, it seems that in-chip communication power would be only a small fraction of the multiplication power.

 

The present paper is a report on ongoing (indeed, just starting) research, and our next goal is to investigate the energy requirements for communication among different chips (ranging from data transfer between two neighboring chips on one board to a transfer from one corner of the building to the opposite corner, but always including a passage through chip pins and their drivers, which is much more energy hungry than in-chip communication). This problem is much more complex, because different problems have quite different locality, i.e. different amounts of long-distance data traffic within a computer system.

 

The conclusion is that the arithmetic operations and the communication of arithmetic circuits with the on-chip cache account (or could account) for a very small part of the power consumption of a recent supercomputer, and this opens a wide potential space for energy savings by reducing the computing overhead and off-chip communication. To achieve this goal we have to better understand the energy requirements of the different overhead activities of recent supercomputers.

 

Back to Session V

Big Data Technologies

 

Marcel Kunze

Forschungsgruppe Cloud Computing, Germany

 

This talk addresses the technical foundations and the non-technical framework of Big Data. A new era of data analytics promises tremendous value on the basis of cloud computing technology. Can we perform predictive analytics in real time? Can our models scale to rapidly growing data? The “Smart Data Innovation Lab” at KIT addresses these challenges by supporting R&D projects carried out in close cooperation between industry and science (http://www.sdil.de/en/). Some practical examples as well as open research questions are discussed.

 

Back to Session II

EPiGRAM - Towards Exascale Programming Models

 

Erwin Laure

KTH Royal Institute of Technology, Sweden

 

Exascale computing is posing many challenges, including the question of how to efficiently program systems exposing hundreds of millions of parallel activities. The Exascale Programming Models (EPiGRAM) project is addressing this challenge by improving one of the most widely used programming models, message passing, considering also the impact of PGAS approaches. In this talk we will discuss the exascale programming challenge, motivate our choices of message passing and PGAS, discuss initial findings on scalability limits, and propose directions to overcome them.

 

Back to Session IV

Programming Challenges in Future Memory Systems

 

John Leidel

Micron Technology, Inc.

 

Given the recent hurdles associated with the pursuit of power-performance scaling in traditional microprocessors, we have witnessed a resurgence in research associated with the overall memory hierarchy. The traditional symbiotic relationship of fast, multi-level caches and larger, DRAM-based main memories has given way to complex relationships between software-managed scratchpads, high-bandwidth memory protocols and the use of non-volatile memories in the shared address space. Furthermore, manufacturing technologies such as through-silicon-via [TSV] techniques have begun to blur the lines between processors and memory devices. The result is a significant challenge for those constructing compiler, runtime, programming model and application technology to address the ever-increasing heterogeneity of future system architectures. In this talk, we outline the pitfalls and potential solutions for the programming challenges of future system architectures based upon highly diverse memory technologies.

 

Back to Session II

Creating the HPC Infrastructure for the Human Brain Project

 

Thomas Lippert

Juelich Supercomputing Centre, Germany

 

The Human Brain Project, one of two European flagship projects, is a collaborative effort to reconstruct the brain, piece by piece, in multi-scale models and their supercomputer-based simulation, integrating and federating giant amounts of existing information and creating new information and knowledge about the human brain. A fundamental impact on our understanding of the human brain and its diseases as well as on novel brain-inspired computing technologies is expected.

 

The HPC Platform will be one of the central elements of the project. Including major European supercomputing centres and several universities, its mission is to build, integrate and operate the hardware, network and software components of the supercomputing and big data infrastructures from the cell to full-scale interactive brain simulations, with data management, processing and visualization.

 

In my contribution, I will discuss the requirements of the HBP on HPC hardware and software technology. These requirements follow the multi-scale approach of the HBP to decode the brain and recreate it virtually. On the cellular level, hardware-software architectures for quantum mechanical ab-initio molecular dynamics methods and for classical molecular dynamics methods will be included in the platform. On the level of the full-scale brain simulation, on the one hand, a development system to “build” the brain by integration of all accessible data distributed worldwide as well as for tests and evaluation of the brain software is foreseen, and, on the other hand, a system that acts as the central brain simulation facility, eventually allowing for interactive simulation and visualization of the entire human brain. Additionally, the brain needs to be equipped with the proper sensory environment, a body, provided by virtual robotics codes developed on a suitable hardware system.  It is expected that the human brain project can trigger innovative solutions for future exascale architectures permitting hierarchical memory structures and interactive operation.

 

Back to Session VI

The Fortissimo HPC-Cloud: an enabler for engineering and manufacturing SMEs

 

Guy Lonsdale

Scapos AG, Germany

 

The Fortissimo project1 is funded under the European Commission’s 7th Framework Programme and is part of the I4MS (ICT Innovation for Manufacturing SMEs) group of projects within the Factories of the Future initiative. Fortissimo’s principal objective is to enable engineering and manufacturing SMEs to benefit from the use of HPC and digital simulation. While the importance of advanced simulation to the competitiveness of both large and small companies is well established, its broader industrial take-up requires supportive actions, since digital simulation requires significant computing power and specialised software tools and services. Generally, large companies, which have a greater pool of skills and resources, find access to advanced simulation easier than SMEs, which can neither afford expensive High Performance Computing (HPC) equipment nor the licensing cost for the relevant tools. This means that SMEs are not able to take advantage of advanced simulation, even though it can clearly make them more competitive. The goal of Fortissimo is to overcome this impasse through the provision of simulation services and tools running on a cloud infrastructure. A “one-stop-shop” will greatly simplify access to advanced simulation, particularly for SMEs. This will make hardware, expertise, applications, visualisation and tools easily available and affordable on a pay-per-use basis. In doing this Fortissimo will create and demonstrate a viable and sustainable commercial ecosystem.

Fortissimo will be driven by end-user requirements: approximately 50 business-relevant application experiments will serve to develop, test and demonstrate both the infrastructure and the Fortissimo marketplace. 20 experiments – all HPC-cloud-based – have already been defined in fields such as the simulation of continuous and die casting, environmental control and urban planning, and aerodynamic design and optimisation. A second wave of 22 new experiments is set to commence as a result of the first open call, which broadens the engineering and manufacturing applications from an extended range of industrial sectors. Amongst the new partners who will be joining the project are a total of 34 SMEs, solving core business challenges with the support of application-domain and HPC experts and resources.

 

1 FP7 Project 609029, project title: FORTISSIMO: Factories of the Future Resources, Technology, Infrastructure and Services for Simulation and Modelling.

 

Back to Session VIII

Accelerating the Multifrontal Method

 

Robert Lucas

Computational Sciences Division, University of Southern California

Information Sciences Institute, USA


Solving large sparse systems of equations is the computational bottleneck in many scientific and engineering applications, and the multifrontal method is often the preferred approach as it turns the sparse problem into a graph of small dense ones, suitable for today’s virtual memory hierarchies.  Most of the arithmetic operations performed when factoring these dense frontal matrices can be performed with calls to highly tuned mathematical kernels such as DGEMM.  In the last few years, the multifrontal method’s dense factorization kernels have been ported to accelerators, first GPUs and more recently the Intel Phi.  This talk will present one strategy for porting the frontal matrix factorization kernel from a standard, multicore host to both of the accelerators.  We will compare and contrast the programming effort required, as well as discuss the resulting performance gains.
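To make the structure of that dense kernel concrete, the following minimal sketch (in Python/NumPy, purely illustrative and not the code discussed in the talk) factors the fully summed block of a frontal matrix and forms the Schur-complement update with a BLAS-3, DGEMM-like operation:

# Conceptual sketch of the dense kernel at the heart of the multifrontal method:
# partially factor a frontal matrix and form the Schur complement with a
# BLAS-3 (DGEMM-like) update. Illustrative only, not the talk's implementation.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def partial_factor(front, k):
    """Factor the leading k x k 'fully summed' block of a frontal matrix
    and return its LU factors plus the Schur complement (update matrix)."""
    F11 = front[:k, :k]
    F12 = front[:k, k:]
    F21 = front[k:, :k]
    F22 = front[k:, k:]

    lu, piv = lu_factor(F11)          # dense LU of the fully summed block
    X = lu_solve((lu, piv), F12)      # X = F11^{-1} F12 (triangular solves)
    schur = F22 - F21 @ X             # rank-k update; this is the DGEMM call
    return (lu, piv), schur

# Tiny example frontal matrix with 2 fully summed variables out of 4.
rng = np.random.default_rng(0)
front = rng.standard_normal((4, 4)) + 4 * np.eye(4)
factors, update = partial_factor(front, k=2)
print(update)  # contribution block passed to the parent front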

 

Back to Session III

Adiabatic Quantum Annealing Update

 

Robert Lucas

Computational Sciences Division, University of Southern California

Information Sciences Institute, USA

 

Two years ago, at HPC 2012, it was reported that the USC - Lockheed Martin Quantum Computing Center was the first to take delivery of a D-Wave open system, adiabatic quantum annealer.  In the past two-and-a-half years, we have studied its behavior, found its performance to be consistent with a quantum annealer, and to exhibit entanglement. We have also benchmarked its performance against algorithms running on classical computers, developed a tool for programming the machine, and begun studying applications for it.  This talk will give a brief overview of this body of research.

 

Back to Session IV

Challenges and Roadmap for Scientific Applications at Exascale

 

Stefano Markidis

KTH Royal Institute of Technology, Stockholm, Sweden

 

One of the main challenges for scientific applications on exascale supercomputers will be to deal with an unprecedented amount of parallelism. Current studies of exascale machines estimate a total of a billion processes available for computation. Collective communication and synchronization of such a large number of processes will constitute a bottleneck, and system noise might be amplified by non-blocking communication, making the use of an exascale machine ineffective. In this talk, we discuss the challenges that have been identified by studying the communication kernels of two applications from the EC-funded EPiGRAM project: the Nek5000 and iPIC3D codes. Nek5000 is a Computational Fluid Dynamics Fortran code based on the spectral element method to solve the Navier-Stokes equations in the incompressible limit. iPIC3D is a C++ Particle-in-Cell code used for space weather. Both application communication kernels are based on MPI. The communication kernels of the two applications, and simulations of them on very large numbers of processes with varying interconnection-network latency and bandwidth, are presented. Finally, a roadmap to bring these applications to exascale will be discussed.
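To give a feel for the scale of the collective-communication problem, the crude latency-bandwidth model below (all figures are illustrative assumptions, not measurements of any interconnect) estimates the cost of a tree-based allreduce as the process count grows towards a billion:

# Crude log(P) cost model for a tree-based allreduce: each of ceil(log2 P)
# stages pays one network latency plus the message transfer time.
# All figures are illustrative assumptions, not measurements of any system.
import math

def allreduce_time(procs, msg_bytes, latency_s, bandwidth_bytes_s):
    stages = math.ceil(math.log2(procs))
    return stages * (latency_s + msg_bytes / bandwidth_bytes_s)

for procs in (1e4, 1e6, 1e9):
    t = allreduce_time(procs, msg_bytes=8,          # one double
                       latency_s=1e-6,              # assumed 1 microsecond
                       bandwidth_bytes_s=10e9)      # assumed 10 GB/s
    print(f"P = {procs:>13,.0f}: allreduce of 8 bytes ~ {t*1e6:6.1f} us")
# The per-call cost grows only logarithmically, but at 10^9 processes even a
# single 8-byte allreduce pays ~30 latencies, and any noise-induced delay on
# one process stalls the whole collective.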

 

Back to Session X

Quantum versus Thermal annealing (or D-wave versus Janus):

seeking a fair comparison

 

Victor Martin-Mayor

Departamento de Fisica Teorica, Universidad Complutense de Madrid, Spain & Janus Collaboration

 

The D-Wave Two machine presumably exploits quantum annealing effects to solve optimization problems.

One of the preferred benchmarks is the search for ground states of spin glasses, one of the most computationally demanding problems in Physics. In fact, the Janus computer has been built specifically for spin-glass simulations. Janus has made it possible to extend the time scale of classical simulations by a factor of 1000, thus setting the standard against which D-wave should be measured.

Whether D-wave’s quantum annealing achieves a real speed-up compared to classical (thermal) annealing or not is a matter of investigation.a Difficulties are twofold. On the one hand, the number of q-bits (503), although a world record, is still small. On the other hand, the 503 q-bits are arranged in a particular topology (the chimera lattice), where hard-to-solve instances are extremely rare for a small system.

However, general physical arguments (temperature chaos) tell us that, given a large enough number of q-bits, rough free-energy landscapes should be the rule rather than the exception. The rough landscape implies that simulated annealing will get trapped in local minima and thus be inefficient in the search for the ground state.

Therefore, the meaningful question is: how well does quantum annealing perform on those instances displaying temperature chaos?

For a small number of q-bits, temperature chaos is rare but, fortunately, not nonexistent. The talk describes a program to identify chaotic instances with only 503 q-bits by means of state-of-the-art methods (multi-spin coding, parallel tempering simulations and the related stochastic time-series analysis). The performance of both thermal annealing (Janus) and quantum annealing (D-wave) will be assessed over this set of samples.

This is joint work with the Janus Collaboration and Itay Hen (Information Sciences Institute, USC).

 

a By thermal annealing we refer to a refined form of simulated annealing named parallel tempering (also known as exchange Monte Carlo).
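As a concrete illustration of the parallel tempering scheme referred to in the footnote, the following minimal sketch runs exchange Monte Carlo on a toy random ±J spin chain; it stands in neither for the multi-spin-coded Janus simulations nor for the production analysis, and all parameters are arbitrary.

# Minimal sketch of parallel tempering (exchange Monte Carlo) on a toy
# random +-J spin chain. Purely illustrative; not the Janus code.
import numpy as np

rng = np.random.default_rng(1)
N = 64                                  # toy problem: 1-D chain of N spins
J = rng.choice([-1.0, 1.0], size=N - 1) # quenched random couplings
betas = np.linspace(0.2, 2.0, 8)        # inverse temperatures of the replicas
spins = rng.choice([-1, 1], size=(len(betas), N))

def energy(s):
    return -np.sum(J * s[:-1] * s[1:])

for sweep in range(500):
    # Metropolis single-spin flips within each replica
    for r, beta in enumerate(betas):
        s = spins[r]
        for i in rng.integers(0, N, size=N):
            # local field acting on spin i
            h = (J[i - 1] * s[i - 1] if i > 0 else 0.0) + \
                (J[i] * s[i + 1] if i < N - 1 else 0.0)
            dE = 2.0 * s[i] * h
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                s[i] = -s[i]
    # Replica-exchange step between neighbouring temperatures
    for r in range(len(betas) - 1):
        dBeta = betas[r + 1] - betas[r]
        dE = energy(spins[r + 1]) - energy(spins[r])
        if rng.random() < min(1.0, np.exp(dBeta * dE)):
            spins[[r, r + 1]] = spins[[r + 1, r]]

print("lowest energy found:", min(energy(s) for s in spins))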

 

Back to Session III

Convergence of Extreme Big Data and HPC - managing the memory hierarchy and data movement as the key towards future exascale

 

Satoshi Matsuoka

Global Scientific Information and Computing Center

& Department of Mathematical and Computing Sciences

Tokyo Institute of Technology, Japan

 

Big data applications such as healthcare, systems biology, social networks, business intelligence, and electric power grids, etc., require fast and scalable data analytics capability, posing significant opportunities for HPC, as evidenced by recent attention to the Graph500 and Green Graph500 lists. In order to cope with the massive capacity requirements of such big data applications, emerging NVM (Non-Volatile Memory) devices, such as Flash, realize low cost and high energy efficiency compared to conventional DRAM devices, at the expense of lower throughput and higher latency, requiring a deepening of the memory hierarchy. As such, effective abstractions and efficient implementation techniques for big-data algorithms and data structures that overcome the deepening memory hierarchy are becoming essential.

Our recent JST-CREST EBD (Extreme Big Data) project aims to come up with a big data / HPC convergence architecture that provides such algorithms and abstractions. In particular, our objective is to control the deep memory hierarchy and data movement effectively to achieve a tremendous boost in performance per resource cost (power, dollars, etc.), which is becoming the dominant metric for future exascale supercomputers as well as big data.

Although we are still in the early stages of our research, we have already achieved several results, such as a novel graph-data offloading technique using NVMs for the hybrid BFS (Breadth-First Search) algorithm widely used in the Graph500 benchmark, achieving 4.35 MTEPS/Watt on a Scale 30 problem, ranked 4th in the big data category of the Green Graph500 (November 2013).
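For readers unfamiliar with the hybrid BFS mentioned above, the sketch below illustrates the direction-optimizing idea behind it (switching between top-down and bottom-up traversal as the frontier grows); it is a plain in-memory toy and does not reproduce the NVM-offloading technique of the talk. The switching threshold is an arbitrary assumption.

# Minimal sketch of the "hybrid" (direction-optimizing) BFS idea used in
# Graph500-style benchmarks: expand the frontier top-down while it is small,
# and scan unvisited vertices bottom-up once it grows large.
from collections import defaultdict

def hybrid_bfs(adj, source, n_vertices, switch_fraction=0.05):
    parent = {source: source}
    frontier = {source}
    while frontier:
        next_frontier = set()
        if len(frontier) < switch_fraction * n_vertices:
            # Top-down: expand every frontier vertex
            for u in frontier:
                for v in adj[u]:
                    if v not in parent:
                        parent[v] = u
                        next_frontier.add(v)
        else:
            # Bottom-up: each unvisited vertex looks for a parent in the frontier
            for v in range(n_vertices):
                if v not in parent:
                    for u in adj[v]:
                        if u in frontier:
                            parent[v] = u
                            next_frontier.add(v)
                            break
        frontier = next_frontier
    return parent

# Tiny example graph (undirected adjacency list); too small for the switch to
# matter, but large Graph500 frontiers are where the bottom-up phase pays off.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = defaultdict(list)
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)
print(hybrid_bfs(adj, source=0, n_vertices=6))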

 

Back to Session I

Prospects for the Monte Carlo Methods in the Million Processor-core Era and Beyond

 

Kenichi Miura, Ph.D.

National Institute of Informatics, Tokyo, Japan

 

With the recent trends in HPC architecture toward higher and higher degrees of parallelism, some of the traditional numerical algorithms need to be reconsidered due to their poor scalability. This is due to (1) declining memory capacity per CPU core, (2) limits in the inter-processor communication bandwidth (bytes/flop ratio), (3) fault-tolerance issues, etc.

The Monte Carlo Methods (MCMs) are numerical methods based on statistical sampling, and were systematically studied in the early days of computing in various application areas.

The MCMs have properties which match very well with the above-mentioned trends in HPC architecture. They are: (1) an inherently high degree of parallelism, (2) small memory requirements per processor, and (3) natural resilience due to their statistical approach.

In the MCMs, a good pseudo-random number generator is essential. The requirements are: (1) a long period, (2) good statistical characteristics (both within a processor and across processors), (3) fast generation of the random sequence, and (4) low overhead in initializing the generators on each processor. The last requirement is particularly important when the number of processors is very large. I have been proposing one such generator, called MRG8, with these properties.
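A minimal sketch of requirement (4) is given below: each (simulated) process receives its own, independently initialized random stream at negligible cost and contributes to an embarrassingly parallel Monte Carlo estimate. NumPy's generic seed-spawning facility is used here as a stand-in; MRG8 itself is not shown.

# Sketch of requirement (4): cheaply giving every process its own,
# statistically independent random stream, with a toy Monte Carlo estimate
# of pi as the embarrassingly parallel workload.
import numpy as np

def mc_pi_partial(seed_seq, samples):
    rng = np.random.default_rng(seed_seq)     # per-process generator
    xy = rng.random((samples, 2))
    return np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0)

n_procs = 8                                    # stand-in for many MPI ranks
samples_per_proc = 100_000
root = np.random.SeedSequence(2014)
child_seeds = root.spawn(n_procs)              # O(1) per process, no long jump-ahead
hits = sum(mc_pi_partial(s, samples_per_proc) for s in child_seeds)
print("pi ~", 4.0 * hits / (n_procs * samples_per_proc))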

In my talk, I will discuss the prospects and issues of the MCMs, as well as propose widening their application areas in the million-core era and beyond.

 

Back to Session III

Scaling lessons from the software challenges in Anton, a special-purpose machine for molecular dynamics simulation

 

Mark Moraes

Head Engineering Group, D.E. Shaw Research, Pelham, NY, USA

 

Anton is a massively parallel special-purpose machine that accelerates molecular dynamics simulations by orders of magnitude compared with the previous state of the art. The hardware architecture, many of the algorithms, and all of the software were developed specifically for Anton. We exploit the highly specialized nature of the hardware to gain both weak and strong scaling of molecular dynamics simulations.  However, the tradeoffs involved in specialized hardware create many interesting challenges for software. This talk describes how we tackled these challenges and the techniques used to achieve efficient scaling of simulation performance on two successive generations of a special-purpose machine.

 

Back to Session VII

The Big Gift of Big Data

 

Valerio Pascucci

Director, Center for Extreme Data Management Analysis and Visualization

Professor, School of Computing, University of Utah

Laboratory Fellow, Pacific Northwest National Laboratory

CTO, ViSUS Inc. (visus.net)

We live in the era of Big Data, which is characterized by an unprecedented increase in information generated from many sources including (i) massive simulations for science and engineering, (ii) sensing devices for experiments and diagnostics, and (iii) records of people's activities left actively or passively, primarily on the web. This is a gift to many disciplines in science and engineering since it will undoubtedly lead to a wide range of amazing new discoveries. This is also changing the nature of scientific investigation, combining theory, experiments, and simulations with the so-called “fourth paradigm” of data-driven discovery. Interdisciplinary work, traditionally confined to a few heroic efforts, will become a central requirement in most research activities since progress can only be achieved with a combination of intense computing infrastructures and domain expertise. For example, computational efforts can only be validated in the proper application context, such as in climate modeling, biology, economics, and social sciences, to name just a few.

 

In this talk I will discuss some of the experiences in Big Data discovery that have driven the activities at the Center for Extreme Data Management Analysis and Visualization. The technical work, for example, is systematically reshaped to involve integrated use of multiple computer science techniques such as data management, analytics, high performance computing, and visualization. Research agendas are motivated by grand challenges, for instance, the development of new, sustainable energy sources, or predicting and understanding climate change. Furthermore, such efforts rely on multi-disciplinary partnerships with teams that extend across academia, government laboratories and industry. Overall, the great opportunities of Big Data research come with great challenges in terms of how we reshape scientific investigation, collaborations across disciplines, and how we educate the future generations of scientists and engineers.

 

BIOGRAPHY

Valerio Pascucci is the founding Director of the Center for Extreme Data Management Analysis and Visualization (CEDMAV) of the University of Utah. Valerio is also a faculty member of the Scientific Computing and Imaging Institute, a Professor in the School of Computing, University of Utah, and a Laboratory Fellow of PNNL. Before joining the University of Utah, Valerio was the Data Analysis Group Leader of the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, and Adjunct Professor of Computer Science at the University of California Davis. Valerio's research interests include Big Data management and analytics, progressive multi-resolution techniques in scientific visualization, discrete topology, geometric compression, computer graphics, computational geometry, geometric programming, and solid modeling. Valerio is the coauthor of more than one hundred refereed journal and conference papers and has been an Associate Editor of the IEEE Transactions on Visualization and Computer Graphics.

 

Back to Session IX

Scalability in the Cloud: HPC Convergence with Big Data in Design, Engineering, Manufacturing

 

David Pellerin

AWS High Performance Computing

 

HPC in the cloud is now a common approach for research computing, and is becoming mainstream as well for commercial HPC across industries. Cloud enables the convergence of big data analytics, for example Hadoop, with scalable and low-cost computing, allowing commercial firms to perform analytics that would not otherwise be practical. Cloud allows customized, application-specific HPC clusters to be created, used, and decommissioned in a matter of minutes or hours, enabling entirely new kinds of parallel applications to be run against widely diverse datasets. Cloud also enables global collaboration for distributed teams, using remote visualization and remote login methods. Scalability in the cloud provides HPC users with large amounts of computing power, but also requires new thinking about application fault-tolerance, cluster right-sizing, and data storage architectures. This session will provide an overview of current cloud capabilities and best practices for scalable HPC and for global collaboration, using specific use-cases in design, engineering, and manufacturing.

 

Back to Session II

Overcoming the Cloud heterogeneity: from uniform interfaces and abstract models to multi-cloud platforms

 

Dana Petcu

Research Institute e-Austria Timisoara and West University of Timisoara, Romania

 

Cloud heterogeneity is manifested today in the set of interfaces of the services from different Public Clouds, in the set of services from the same provider, in the software and hardware stacks, and in terms of performance or user quality of experience. This heterogeneity favors the Cloud service providers, allowing them to be competitive in a very dynamic market, especially by exposing unique solutions. However, such heterogeneity hinders the interoperability between these services and the portability of the applications consuming the services, as well as the seamless migration of legacy applications towards Cloud environments.

Various solutions to overcome Cloud heterogeneity have been investigated in the last half decade, starting from the definition of uniform interfaces (capturing the commonalities, but losing the specificities) and arriving at domain-specific languages (allowing applications to be conceived at a Cloud-agnostic level, but introducing a high overhead).

We discuss the existing approaches and their completeness from the perspective of building support platforms for Multi-Clouds, identifying the gaps and potential solutions. Concrete examples are taken from recent experiments in developing Multi-Cloud platform prototypes: mOSAIC [1] for uniform interfaces, MODAClouds [2] for domain-specific languages, SPECS [3] for user quality of experience, and HOST [4] for the usage of Cloud HPC services.

References:

[1] D. Petcu, B. Di Martino, S. Venticinque, M. Rak, T. Máhr, G. Esnal Lopez, F. Brito, R. Cossu, M. Stopar, S. Sperka, V. Stankovski, Experiences in Building a mOSAIC of Clouds, Journal of Cloud Computing: Advances, Systems and Applications 2013, 2:12, on-line May 2013, doi: 10.1186/2192-113X-2-12

[2] D. Ardagna, E. Di Nitto, G. Casale, D. Petcu, P. Mohagheghi, S. Mosser, P. Matthews, A. Gericke, C. Ballagny, F. D’Andria, C.S. Nechifor, C. Sheridan, MODACLOUDS: A Model-Driven Approach for the Design and Execution of Applications on Multiple Clouds, Procs. MISE 2012, 50-56, doi: 10.1109/MISE.2012.6226014

[3] M. Rak, N. Suri, J. Luna, D. Petcu, V. Casola, U. Villano, Security as a Service Using an SLA-Based Approach via SPECS, 2013 IEEE 5th International Conference on Cloud Computing Technology and Science (CloudCom), vol. 2, 1-6, 2-5 Dec. 2013, doi: 10.1109/CloudCom.2013.165

[4] M.E. Frincu, D. Petcu, Resource Management for HPC on the Cloud, In: Emmanuel Jeannot, Julius Zilinskas (eds.), High-Performance Computing on Complex Environments, ISBN: 978-1-118-71205-4, June 2014, 303-323

 

Back to Session VIII

Harp: Collective Communication on Hadoop

 

Judy Qiu

Indiana University, U.S.A.

 

Many scientific applications are data-intensive. It is estimated that organizations with high-end computing infrastructures and data centers are doubling the amount of data that they archive every year. Harp extends MapReduce, enabling HPC-Cloud interoperability. We show how to apply Harp to support large-scale iterative computations that are common in many important data mining and machine learning applications. Furthermore, one needs additional communication patterns beyond those made familiar by MapReduce. This leads us to the Map-Collective programming model, which captures the full range of traditional MapReduce and MPI features and is built on a new communication abstraction, Harp, that is integrated with Hadoop. It provides optimized communication operations on different data abstractions such as arrays, key-values and graphs. With improved expressiveness and performance on collective communication, Hadoop/Harp can do in-memory communication between Map tasks without writing intermediate data to HDFS, enabling simultaneous support of applications from HPC to Cloud. Our work includes a detailed performance evaluation on IaaS and HPC environments such as FutureGrid and the Big Red II supercomputer, and provides useful insights to both frameworks and applications.
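The sketch below illustrates the Map-Collective data flow for iterative k-means in a generic way: map tasks produce partial centroid sums over their data partitions and an allreduce-style collective combines them for the next iteration. The collective is emulated here by a reduce and re-broadcast in the driver, purely for illustration; Harp's actual Java API performs it in memory among the map tasks themselves and is not reproduced here.

# Conceptual sketch of the Map-Collective pattern for iterative k-means.
import numpy as np
from multiprocessing import Pool

def map_task(args):
    """One 'map task': partial sums and counts per centroid for a partition."""
    partition, centroids = args
    k, d = centroids.shape
    sums, counts = np.zeros((k, d)), np.zeros(k)
    for x in partition:
        c = np.argmin(((centroids - x) ** 2).sum(axis=1))
        sums[c] += x
        counts[c] += 1
    return sums, counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((4000, 2)) + rng.choice([-4, 4], size=(4000, 2))
    partitions = np.array_split(data, 4)            # one partition per map task
    centroids = data[rng.choice(len(data), 3, replace=False)]

    with Pool(4) as pool:
        for _ in range(10):                          # iterations of the job
            results = pool.map(map_task, [(p, centroids) for p in partitions])
            sums = sum(r[0] for r in results)        # 'allreduce' of the partials
            counts = sum(r[1] for r in results)
            centroids = sums / np.maximum(counts, 1)[:, None]
    print(centroids)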

Short Bio

Dr. Judy Qiu is an assistant professor of Computer Science in the School of Informatics and Computing at Indiana University and an assistant director of the school’s Digital Science Center. Her research interests are parallel and distributed systems, cloud computing, and high-performance computing. Qiu leads the SALSA project, involving professional staff and Ph.D. students from the School of Informatics and Computing. SALSA focuses on data-intensive computing at the intersection of cloud and multicore technologies with an emphasis on scientific data analysis applications by using MapReduce and traditional parallel computing approaches. Her research has been funded by NSF, NIH, Microsoft, Google and Indiana University. She is a recipient of an NSF CAREER Award (2012) and the Indiana University Trustees Award for Teaching Excellence (2013-2014).

 

Link to website: http://www.cs.indiana.edu/~xqiu/

 

 

Back to Session VIII

Beowulf meets Exascale System Software: A horizontally integrated framework

 

Mark Seager

CTO for HPC Systems, INTEL, Santa Clara, California, USA

 

The challenges of system software for Exascale systems require co-design between hardware, system software and applications in order to address massive parallelism, enhanced RAS, and parallel I/O.  However, the current approach to Linux cluster software, with its roots in Beowulf clusters, is simply an uncoordinated collection of elements/services (e.g., resource management, RAS, parallel file system). We discuss a new scalable system software approach that enables individual system software elements/services to leverage a horizontally integrated set of system software services.  This allows Intel Architecture ecosystem contributors to enhance components to add value and leverage this infrastructure to more efficiently produce a robust system software stack for Exascale-class systems.  We also discuss how this system software stack can be scaled down to the broader HPC space.

 

Back to Session IV

 

Coordination programming for self-tuning: the challenge of a heterogeneous open environment

 

Alex Shafarenko

Compiler Technology and Computer Architecture Group, University of Hertfordshire, United Kingdom

 

Kahn process networks (KPNs) are a convenient basis for coordination programming as they effectively isolate a component, which encapsulates a self-contained algorithm, from the network that connects, controls and synchronises the components’ activities. The strength of KPNs lies in their clear parallel semantics, defined by Kahn’s famous paper of 1974. Their weakness is that they do not by themselves suggest relative priorities of the vertex processes for maximising performance on a finite system, and that, if they are not regulated properly, they may demand unlimited resources and deadlock when they don’t get them.
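For readers unfamiliar with KPNs, the following minimal sketch wires three self-contained vertex processes together with FIFO channels; it is a generic illustration (not AstraKahn), and its unbounded queues hint at exactly the unregulated-resource problem described above.

# Minimal sketch of a Kahn process network: each vertex is a self-contained
# process that communicates only through FIFO channels, so the network that
# wires them together is a separate (coordination) concern. Nothing here
# stops a fast producer from flooding a slow consumer.
import threading, queue

def producer(out_ch):
    for i in range(10):
        out_ch.put(i)
    out_ch.put(None)                 # end-of-stream token

def scaler(in_ch, out_ch, factor):
    while (x := in_ch.get()) is not None:
        out_ch.put(factor * x)
    out_ch.put(None)

def consumer(in_ch):
    while (x := in_ch.get()) is not None:
        print("received", x)

a, b = queue.Queue(), queue.Queue()  # unbounded FIFO channels
vertices = [threading.Thread(target=producer, args=(a,)),
            threading.Thread(target=scaler, args=(a, b, 3)),
            threading.Thread(target=consumer, args=(b,))]
for t in vertices:
    t.start()
for t in vertices:
    t.join()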

 

This talk will present the AstraKahn project, which introduces a coordination model for KPN self-regulation (also known as self-tuning), based on the concept of positive (supply) and negative (demand) pressures, the proliferation of vertices under large supply and demand, and the fragmentation of messages in order to improve the pressure situation. This is achieved by a map/reduce classification of KPN vertices, the use of synchro-vertices in the form of pressure-controlled FSMs, and the idea of vertex morphism enabling message fragmentation. The talk will touch upon the AstraKahn coordination language, which consists of three layers: a Topology and Progress Layer, a Constraint Aggregation Layer and a Data and Instrumentation Layer, which we maintain will in combination achieve a large degree of automatic self-tuning of coordination programs.

 

The practical significance of this work is in its attempt to propose an HPC approach for a non-autonomous computing platform, such as public clouds, where a priori optimisations may be inhibited and where run-time adaptation may be the only way to tackle platform unpredictability.

 

Back to Session III

Exascale Programming Challenges: Adjusting to the new normal for computer architecture

 

John Shalf

Lawrence Berkeley Laboratory, USA

 

For the past twenty-five years, a single model of parallel programming (largely bulk-synchronous MPI) has for the most part been sufficient to permit translation into reasonable parallel programs, even for complex applications. In 2004, however, a confluence of events changed forever the architectural landscape that underpinned our current assumptions about what to optimize for when we design new algorithms and applications.  We have been taught to prioritize and conserve things that were valuable 20 years ago, but the new technology trends have inverted the value of our former optimization targets. The time has come to examine the end result of our extrapolated design trends and use them as a guide to re-prioritize what resources to conserve in order to derive performance for future applications. This talk will describe the challenges of programming future computing systems. It will then provide some highlights from the search for durable programming abstractions that more closely track emerging computer technology trends, so that when we convert our codes over, they will last through the next decade.

 

Back to Session V

Extreme-scale Architecture in the Neo-Digital Age

 

Thomas Sterling

School of Informatics and Computing

& CREST Center for Research in Extreme Scale Technologies

Indiana University, USA

 

As the end of Moore’s Law nears and the challenges of exascale and beyond are present on the horizon, architecture innovation becomes the new frontier in performance growth in the next decade. Already such changes to conventional practices as multi-core and GPU streaming processing have explored some incremental advances in computer architecture. But even now, architecture is dominated by traditional forms and assumptions inherited from the classical age of von Neumann architecture. These are that the ALU is the precious resource and that architecture is based on variations of the core as the basic building block of all parallel architecture. Ironically, if anything, this presiding premise is exactly wrong with the floating-point units among the least expensive in terms of logic area and energy consumption. Yet, the memory hierarchy, cache structures, speculative execution, and reservation-stations are all intended to maximize their utilization. Instead, data movement and memory capacity are the dominant deployment and operational costs. Utilization of network and data access bandwidth is of principal importance. As much as this stresses conventional wisdom, ALUs should be optimized for low energy and high availability, suggesting wide arrays of these to make best use of other, more precious resources. This presentation will investigate possible alternatives to the use of basic logic, storage, and communication building blocks avoiding the conventional pitfalls of classical von Neumann derived architectures and exploiting the new architecture opportunities afforded by the Neo-Digital age based on nano-scale technologies in the next decade. Among these are new forms of prior art in research. These include advances on cellular architecture, processor in memory, and systolic arrays. In each case, the density of arithmetic units is allowed to increase, but with lower utilization to increase storage bandwidth and reduce effective latency. Ironically, lower utilization will actually improve performance and energy efficiency to enable the exascale performance regime. Such architectures will incorporate intrinsic mechanisms for asynchrony and locality management even in the context of a global address space. This talk will conclude with the anticipated limitations on the Neo-Digital age imposed by fundamental physics of atomic density, speed of light, and Boltzmann’s constant.

 

Back to Session VII

 

Programming Script-based Data Analytics Workflows on Clouds

 

Domenico Talia

Department of Computer Engineering, Electronics, and Systems

University of Calabria, Italy

 

Data analysis applications often involve large datasets and are complex software systems in which multiple data processing tools are executed in a coordinated way. Data analysis workflows are effective in expressing task coordination and they can be designed through visual and script-based frameworks.

End users prefer the visual approach whereas expert developers use workflow languages to program complex applications more effectively.

To provide Cloud users with an effective script-based data analysis workflow formalism, we designed the JS4Cloud language on the basis of the well-known JavaScript language, so that users do not have to learn a new programming language from scratch. JS4Cloud implements data-driven task parallelism that spawns ready-to-run tasks to Cloud resources. It exploits implicit parallelism, freeing users from duties such as work partitioning, synchronization and communication.
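The sketch below illustrates the data-driven task-parallelism idea in a generic way: a task is spawned as soon as all the data items it reads are available, and independent tasks run concurrently without user-written synchronization. It is a plain Python illustration of the concept, not JS4Cloud syntax (which is JavaScript-based) nor the DMCF runtime.

# Sketch of data-driven task parallelism: tasks become ready when their input
# data exist; independent ready tasks run concurrently.
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_workflow(tasks, initial_data):
    """tasks: list of (name, function, input names, output name)."""
    store, pending = dict(initial_data), list(tasks)
    with ThreadPoolExecutor() as pool:
        running = {}
        while pending or running:
            # spawn every task whose inputs are all available
            for t in [t for t in pending if all(i in store for i in t[2])]:
                name, fn, inputs, output = t
                running[pool.submit(fn, *[store[i] for i in inputs])] = t
                pending.remove(t)
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                store[running.pop(fut)[3]] = fut.result()
    return store

# Toy workflow: two independent "analysis" tasks feed a final "report" task.
tasks = [
    ("clean",  lambda d: [x for x in d if x >= 0],   ["raw"],          "cleaned"),
    ("stats",  lambda d: sum(d) / len(d),            ["cleaned"],      "mean"),
    ("top",    lambda d: sorted(d)[-3:],             ["cleaned"],      "top3"),
    ("report", lambda m, t: {"mean": m, "top3": t},  ["mean", "top3"], "report"),
]
print(run_workflow(tasks, {"raw": [3, -1, 7, 2, -5, 9, 4]})["report"])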

In this talk we present how JS4Cloud has been integrated within the Data Mining Cloud Framework (DMCF), a system supporting the scalable execution of data analysis workflows on Cloud platforms. We also describe how data analysis workflows are modeled as JS4Cloud scripts and executed in parallel on DMCF to enable scalable data processing on Clouds.

 

Back to Session VIII

Extreme Scale Computing Advances & Challenges in PIC Simulations

 

William M. Tang

Princeton University, Dept. of Astrophysical Sciences, Plasma Physics Section

Fusion Simulation Program, Princeton Plasma Physics Lab.

& Princeton Instit. for Computational Science and Engineering, USA

 

The primary challenge in extreme scale computing is to translate the rapid advances in supercomputing power, together with the emergence of effective new algorithms and computational methodologies, into corresponding increases in realism and reductions in the “time-to-solution” of advanced scientific and engineering codes used to model complex physical systems.

If properly validated against experimental measurements/observational data and verified with mathematical tests and computational benchmarks, these codes can greatly improve high-fidelity predictive capability for the behaviour of complex systems -- including fusion-energy-relevant high temperature plasmas. The nuclear fusion energy project “NuFuSE” within the International G8 Exascale Program has made excellent progress in developing advanced codes for which computer run-time and problem size scale very well with the number of processors on massively parallel many-core supercomputers. A good example is the effective usage of the full power of modern leadership class computational platforms at the petascale and beyond to produce nonlinear particle-in-cell (PIC) gyrokinetic simulations which have accelerated progress in understanding the nature of plasma turbulence in magnetically-confined high temperature plasmas. Illustrative results provide great encouragement for being able to include increasingly realistic dynamics in extreme-scale computing campaigns with the goal of enabling predictive simulations characterized by unprecedented physics realism.

This presentation will review progress and discuss open issues associated with new challenges encountered in extreme scale computing for the fusion energy science application domain. Some illustrative examples will be presented of the algorithmic advances for dealing with the low-memory-per-core challenges of extreme scale computing on prominent supercomputers worldwide. These include advanced homogeneous systems -- such as the IBM BlueGene/Q systems (“Mira” at the Argonne National Laboratory & “Sequoia” at the Lawrence Livermore National Laboratory in the US) and the Fujitsu K Computer at the RIKEN AICS, Japan – as well as leading heterogeneous systems – such as the GPU-CPU hybrid system “Titan” at the Oak Ridge National Laboratory, the world-leading TH-2 CPU/Xeon-Phi system in Guangzhou, China, and the new GPU-accelerated XC30 (“Piz Daint”) system at the CSCS in Switzerland.

 

Back to Session X

From Sensors to Supercomputers, Big Data Begins With Little Data

 

Eric Van Hensbergen

ARM Research, USA

 

Semiconductor technology has made it possible to build a 32-bit microprocessor subsystem with a sensor and a network connection on a piece of silicon the size of a speck of dust, and to do it almost for free. As a result, over the next 5-10 years pretty much anything that can benefit from being connected to the internet will be.

This talk will explore the technology challenges across the spectrum from sensors to supercomputers.

It will discuss the opportunity for collaboration on distributed system architectures which create a platform for deploying intelligence capable of coping with the lifecycle of information as it goes from little data to big data to valuable insights.

 

Back to Session IX

Thermomechanical Behaviour and Materials Damage:

Multimillion-Billion Atom Reactive Molecular Dynamics Simulations

 

Priya Vashishta

Collaboratory for Advanced Computing and Simulations

Departments of Chemical Engineering & Materials Science, Physics & Astronomy, and Computer Science, University of Southern California, USA

 

Advanced materials and devices with nanometer grain/feature sizes are being developed to achieve higher strength and toughness in materials and greater speeds in electronic devices. Below 100nm, however, continuum description of materials and devices must be supplemented by atomistic descriptions.  Reactive molecular dynamics simulations are used to investigate critical issues in the area of materials damage using structural and dynamical correlations, and reactive processes in metals and glasses under extreme conditions.

In this talk I will discuss three simulations.

Embrittlement of Nickel by Sulfur Segregation-Induced Amorphization:

Impurities segregated to grain boundaries of a material essentially alter its fracture behavior. A prime example is sulfur segregation-induced embrittlement of nickel, where an observed relation between sulfur-induced amorphization of grain boundaries and embrittlement remains unexplained. Here, 48-million-atom reactive-force-field molecular dynamics (MD) simulations, run for 45 million core hours using 64,000 cores of an IBM BlueGene/P, provide the missing link.

Nanobubble Collapse in Water – Billion-atom Reactive Molecular Dynamics Simulations:

Cavitation bubbles readily occur in fluids subjected to rapid changes in pressure. We use billion-atom reactive molecular dynamics simulations on the full 163,840-processor BlueGene/P supercomputer, run for 67 million core hours, to investigate chemical and mechanical damage caused by the shock-induced collapse of nanobubbles in water near a silica surface. Collapse of an empty nanobubble generates a high-speed nanojet, resulting in the formation of a pit whose volume is found to be directly proportional to the volume of the nanobubble. The gas-filled bubbles undergo partial collapse and consequently the damage on the silica surface is mitigated.

Hydrogen-on-demand Using Metallic Alloy Particles in Water – 16,611-atom Quantum Molecular Dynamics Simulations:

Hydrogen production from water using Al particles could provide a renewable energy cycle. However, its practical application is hampered by the low reaction rate and poor yield. Our large quantum molecular dynamics simulations, involving up to 16,611 atoms on a 786,432-processor BlueGene/Q, show that orders-of-magnitude faster reactions can be achieved by alloying Al particles with Li. A key nanostructural design is identified in which water dissociation and hydrogen production require very small activation energies. Furthermore, dissolution of Li atoms into water produces a corrosive basic solution that inhibits the formation of a reaction-stopping oxide layer on the particle surface, thereby increasing the hydrogen yield.

Acknowledgement: This research was supported by the DOE-BES-Theoretical Condensed Matter Physics Grant Number DE-FG02-04ER46130. The computing resources for this research were provided by a DOE—Innovative and Novel Computational Impact on Theory and Experiment (INCITE) award. We thank Paul Messina, Nichols Romero, William Scullin and the support team of Argonne Leadership Computing Facility for scheduling 67 million core hours in units of 60-hour blocks on full BlueGene/P for the simulations and Joseph Insley, Argonne Visualization Center for helping with the visualization.

 

Back to Session X

Clouds for meteorology: two case studies

 

Jose Luis Vazquez-Poletti

Dpt. de Arquitectura de Computadores y Automática

Universidad Complutense de Madrid, Spain

 

Meteorology is among the most promising areas to benefit from cloud computing, due to its intersection with critical aspects of society. Executing meteorological applications involves HPC and HTC challenges, but also economic ones.

The present talk will introduce two cases with different backgrounds and motivations, but sharing a similar cloud methodology: the first is about weather forecasting in the context of planet Mars exploration; the second deals with data processing from weather sensor networks, in the context of an agriculture-improvement plan in Argentina.

 

Back to Session VIII

Medical practice: diagnostics, treatment and surgery in supercomputer centers

 

Vladimir V. Voevodin

Moscow State University

 

We are used to the extraordinary capabilities of supercomputers and expect them to be applied in practice accordingly. These are reasonable expectations; however, reality isn’t so optimistic. It is widely known how inefficient supercomputers can be when applied to actual problems: only a tiny share of their peak performance is usually achieved. However, few people are aware of the efficiency levels demonstrated by a supercomputer center as a whole. While the efficiency factor of a supercomputer executing a particular application is comparable to that of a steam locomotive (about 5%), the total efficiency of a supercomputer center constitutes only a small fraction of it. The losses that occur at each stage may be insignificant, but they accumulate over the processing of the entire user job flow and multiply considerably. Every detail of this process is important and all elements of supercomputer centers should be taken into consideration, starting from job queue policy and job flow structure and ending with system software configuration and efficient operation of the engineering infrastructure.
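A back-of-the-envelope illustration of how per-stage losses compound is given below; only the roughly 5% application efficiency figure comes from the text above, while the other factors are assumed values chosen purely for illustration.

# Illustrative arithmetic: modest per-stage losses multiply across the center.
# Only the ~5% application efficiency figure comes from the abstract above;
# the other factors are assumptions chosen purely for illustration.
stages = {
    "application efficiency (share of peak)": 0.05,   # from the abstract
    "node utilization by the job queue":      0.80,   # assumption
    "availability (failures / maintenance)":  0.95,   # assumption
    "scheduling fragmentation":               0.90,   # assumption
}

total = 1.0
for name, factor in stages.items():
    total *= factor
    print(f"{name:<42s} x {factor:.2f} -> {total*100:5.2f}% of peak")
# Individually modest losses compound to well below the 5% that the
# application alone would achieve.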

Is it possible to significantly increase the efficiency of supercomputer centers without investing huge amounts of money into their upgrades? Yes, it is. But it requires permanent diagnostics, very similar to medical ones, and if necessary – intensive treatment and emergency surgery on supercomputer systems.

 

Back to Session X

System Software for PEZ(Y)

 

Robert Wisniewski

Chief Software Architect Exascale Computing

INTEL Corporation

 

Many discussions have occurred on the system software needed to support exascale.  In recent talks I described a simultaneously revolutionary and evolutionary approach, but observed that exascale is just a step along the PEZ (Peta-Exa-Zetta) path.  HPC researchers study both capacity computing (cloud-like workloads) and capability computing (classical HPC and big-science workloads).  Some researchers see a convergence of these needs and technologies.  Part of that answer will lie in the future needs of each of these paths; this talk will explore the capability aspect.  A zettascale or yottascale capability machine would open new frontiers in the contributions computing could make to fundamental science, including breakthroughs in the biological sciences that are just beginning to benefit from capability-class HPC machines.  It is important to recognize that although there need not be, and applications do not want, a discontinuity, there needs to be a significant shift in software models.  In this talk, I will present a peek into what system software might look like for a post-exascale machine.  While I will base this view of system software on potential hardware projections on one side and applications' needs on the other, I will be liberal in my assumptions about where each of these could get to.  The talk will therefore be a window into what system software might look like to support capability machines in the post-exascale era.

 

Back to Session VII