   Computing the Brain 
    
  Katrin
  Amunts 
Human Brain Project,
Chair of the Science and Infrastructure Board / Scientific Research Director,
Institute for Neuroscience and Medicine, Structural and Functional
Organisation of the Brain, Forschungszentrum Juelich GmbH, Juelich, Germany
  and 
  Institute for Brain Research, Heinrich Heine
  University Duesseldorf, University Hospital Duesseldorf, Germany 
    
Neuroscience research covers a large spectrum of empirical and theoretical
approaches, with increasing demands in terms of computation, data handling,
analytics and storage. This is true in particular for research targeting the
human brain, with its incredibly high number of neurons forming complex
networks. Demands for HPC arise from a heterogeneous portfolio of
neuroscientific approaches: (i) studying human brains at the cellular and
ultra-structural level, with petabytes for a single brain data set; (ii) the
reconstruction of the human connectome, i.e., the totality of connections
between nerve cells and their ways of interacting; (iii) modeling and
simulation at different levels of brain organization in ever finer detail, to
make models biologically more realistic; (iv) the analysis of large,
multimodal data sets using workflows that employ deep learning and machine
learning, simulation, graph-based inference, etc.; (v) large cohort studies
including many thousands of subjects, with data from neuroimaging, behavioral
tests, genetics, biochemical markers, etc., to disclose relationships between
genes, environment and the brain, while considering large variations between
subjects. To address such diverse requirements, the Human Brain Project is
developing its digital research infrastructure EBRAINS, with FENIX as the HPC
platform for Computing the Brain.
    
  Back to Session VI 
 
 
   The Role of EOFS and the Future of Parallel
  File Systems for HPC 
    
  Frank
  Baetke 
EOFS (European Open File Systems organization),
formerly Hewlett Packard Enterprise, Munich, GERMANY
    
Parallel file systems are an essential part of almost all HPC systems. The
need for this architectural concept originated with the growing influence,
and finally the complete takeover, of the HPC spectrum by parallel computers,
defined either as clusters or as MPPs following the nomenclature of the
TOP500.
A major step towards parallel file systems for the high end of HPC systems
occurred around 2001, when the US DoE funded the development of such an
architecture, called LUSTRE, as part of the ASCI Path Forward project, with
external contractors that included Cluster File Systems Inc. (CFS), Hewlett
Packard and Intel. The acquisition of the assets of CFS by Sun Microsystems
in 2007, and Sun's subsequent acquisition by Oracle in 2010, led to a crisis
with the cancellation of future work on LUSTRE.
To save the assets and ensure further development, a few HPC-focused
individuals founded organizations such as EOFS, OpenSFS and Whamcloud to move
LUSTRE to a community-driven development model. In 2019, EOFS and OpenSFS
jointly acquired the LUSTRE trademark, logo and related assets.
In Europe, development of a parallel file system focused on HPC began in 2005
at the German Fraunhofer Society, also as an open-source project, dubbed
FhGFS (Fraunhofer Global Parallel File System). Driven by its spin-off
ThinkParQ and renamed BeeGFS, it has since gained worldwide recognition and
visibility.
In contrast to these community-driven open-source concepts, several
proprietary parallel file systems are in wide use, with IBM's Spectrum Scale
(originally known as GPFS) having the lead in HPC, with a significant number
of installations at the upper ranks of the TOP500 list. But there are other
interesting proprietary concepts with specific areas of focus and related
benefits.
In this talk we will review the role of EOFS (European Open File Systems,
SCE) and provide hints about the future of the HPC parallel file systems
landscape.
    
  Note: all trademarks are the property of their
  respective owners 
    
  Back to Session II 
 
 
   High Performance
  Computing for Bioinformatics 
    
  Mario
  Cannataro 
  Department of Medical and Surgical Sciences,
  University of Catanzaro, ITALY 
    
Omics sciences (e.g. genomics, proteomics, and interactomics) are attracting
increasing interest in the scientific community due to the availability of
novel, high-throughput platforms for the investigation of the cell machinery,
and they have a central role in so-called P4 (predictive, preventive,
personalized and participatory) medicine and, in particular, in cancer
research. High-throughput experimental platforms and clinical diagnostic
tools, such as next-generation sequencing, microarrays, mass spectrometry,
and medical imaging, are producing overwhelming volumes of molecular and
clinical data, and the storage, integration, and analysis of such data is
today the main bottleneck of bioinformatics pipelines.
This Big Data trend in bioinformatics poses new challenges both for the
efficient storage and integration of the data and for its efficient
preprocessing and analysis. Thus, managing omics and clinical data requires
both support and space for data storage, as well as algorithms and software
pipelines for data preprocessing, integration, analysis, and sharing.
Moreover, as is already happening in several application fields, the
service-oriented model enabled by the Cloud is spreading more and more in
bioinformatics.
  Parallel Computing offers the computational power to
  face this Big Data trend, while Cloud Computing is a key technology to hide
  the complexity of computing infrastructures, to reduce the cost of the data
  analysis task, and to change the overall model of biomedical and
  bioinformatics research towards a service-oriented model. 
The talk introduces the main omics data types (e.g. gene expression and SNPs,
mass spectra, protein-protein interactions) and discusses some parallel and
distributed bioinformatics tools and their application in real case studies
in cancer research, as well as recent initiatives exploiting international
Electronic Health Records to face COVID-19, including:
  
   - preprocessing and mining of microarray data for
       pharmacogenomics applications,
 
   - biological networks alignment, community
       detection, and applications in brain connectome,
 
   - integrative bioinformatics, integration
       and enrichment of biological pathways,
 
   
 - analysis of international Electronic Health Records to face the COVID-19
     pandemic: the Consortium for Clinical Characterization of COVID-19 by
     EHR (4CE).
    
  Short bio 
Mario Cannataro is a Full Professor of computer engineering at the
University "Magna Græcia" of Catanzaro, Italy, and the Director of the Data
Analytics Research Center. His current research interests include parallel
computing, bioinformatics, health informatics, and artificial intelligence.
He has published three books and more than 300 papers in international
journals and conference proceedings. Mario Cannataro is a Senior Member of
ACM, ACM SIGBio, IEEE, BITS (Bioinformatics Italian Society) and SIBIM
(Italian Society of Biomedical Informatics).
    
  Back to Session VI 
 
 
   High Performance Computing and Cloud
  Computing, key enablers for digital transformation 
    
  Carlo Cavazzoni 
  Leonardo S.p.A., Head of Cloud Computing,
  Director High Performance Computing Lab, Chief Technology & Innovation
  Office, Genova, Italy 
    
For many industries, HPC is becoming a key technology for competitiveness
and digitalization. In particular, every industry will have to apply digital
technologies, determining a paradigm shift: the value of goods/services moves
from the exploitation of physical systems to the exploitation of knowledge.
AI, computer simulations and other digital technologies are tools that help
mine out more knowledge, faster. The more, the better.
In this scenario HPC is a tool: a tool to process Big Data, enable AI and
perform simulations; and increasingly often it is combined with Cloud
Computing services (virtual machines and containers, especially popular for
Big Data and AI frameworks).
  HPC can accelerate the creation of value thanks to
  the capability to generate new knowledge and perform more accurate
  predictions (e.g. developing Digital Twins). 
Whereas computational capacity is a fundamental resource for
competitiveness, raw computational capacity alone is useless: software is
the key to unlocking the value. This is why, besides the supercomputer, we
need to create the capability to implement new applications or improve
already existing ones.
In the talk I will present how Leonardo, with the key contribution of the
HPC Lab, intends to implement leadership software tools and a computational
infrastructure able to add value to the company and, ultimately, transform
it to be more digital than physical.
    
  Back to Session II 
 
 
   A domain wall encoding of variables for
  quantum annealing 
    
  Nicholas
  Chancellor 
  Department of Physics, Durham University,
  United Kingdom 
    
I will discuss the application of a relatively new method for encoding
discrete variables into binary ones on a quantum annealer. This encoding is
based on the physics of domain walls in frustrated Ising spin chains and can
be shown to perform better than the traditional one-hot encoding, both in
terms of the efficiency of embedding problems into quantum annealers and in
terms of performance on actual devices.
    
I first review this encoding strategy and contrast it with the one-hot
technique, along with numerical evidence of an embedding advantage, following
the discussion in [Chancellor, Quantum Sci. Technol. 4 045004]. Next, I will
discuss recent experimental evidence presented in [Chen, Stollenwerk,
Chancellor, arXiv:2102.12224], which shows that this encoding can lead to a
large improvement in the performance of quantum annealers on coloring
problems. This improvement is large enough that using the domain-wall
encoding on an older-generation D-Wave 2000Q quantum processing unit yields
superior results to using the one-hot encoding on the more advanced Advantage
QPU, indicating that better encodings can make a large difference in
performance. Additionally, I will touch on some more recent work involving
the quadratic assignment problem. Finally, I will discuss the importance of
this encoding for the simulation of quantum field theories directly on
transverse-field Ising model quantum annealers [Abel, Chancellor, Spannowsky,
Phys. Rev. D 103, 016008].
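
To make the contrast concrete, the following minimal Python sketch (an
illustration under one common sign convention, not code from the talk)
encodes a D-valued variable into the D-1 Ising spins of a domain-wall chain
and decodes it back; a one-hot encoding of the same variable would instead
use D binary variables plus penalty terms enforcing that exactly one is set.

    # Illustrative domain-wall encode/decode for a D-valued variable.
    # Convention (an assumption of this sketch): virtual boundary spins
    # are fixed to s_0 = -1 and s_D = +1, so a valid chain of D-1 spins
    # contains exactly one -1 -> +1 domain wall, whose position is the value.

    def encode(value, D):
        """Chain of D-1 spins representing `value` in range(D)."""
        assert 0 <= value < D
        return [-1] * value + [+1] * (D - 1 - value)

    def decode(spins):
        """Recover the value as the position of the single domain wall."""
        chain = [-1] + list(spins) + [+1]   # re-attach fixed boundaries
        walls = [i for i in range(len(chain) - 1)
                 if chain[i] == -1 and chain[i + 1] == +1]
        if len(walls) != 1:
            raise ValueError("invalid state: expected exactly one domain wall")
        return walls[0]

    for v in range(4):                      # D = 4 needs only 3 spins
        assert decode(encode(v, 4)) == v

Ferromagnetic couplings along the chain penalize every extra wall, which is
why the valid encodings occupy the ground space without the "exactly one"
penalty terms that one-hot requires.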
    
  Back to Session V 
 
 
   Quantum Computer, dream or reality? 
    
  Daniele Dragoni 
  Leonardo
  S.p.A., High Performance Computing Lab, Genova, ITALY 
    
As the miniaturization of semiconductor transistors approaches its physical
limits, the performance increase of microprocessors is slowing down, to the
point that the operating-frequency increase from one chip generation to the
next is almost nil. In an attempt to keep up with Moore's law, computing
architectures have evolved to take full advantage of parallelization
schemes: vector units, multicore processors, GPUs, and so on. Following the
current trends, however, it will never be possible to efficiently address
selected computational tasks of practical interest.
In this scenario, it is clear that any hypothesis that leads to a radical
overcoming of the limitations of digital computing is highly interesting. In
particular, quantum-computing devices, which operate by exploiting the
principles of quantum physics, are believed to provide a route to such a
paradigmatic shift. In practice, however, building a quantum computer is an
unmatched engineering challenge (comparable to nuclear fusion). To date,
quantum computers have been built with very few logical units (qubits), and
it is not yet fully clear if and when they will prove superior to digital
computers on concrete problems of practical interest. In the presentation we
will introduce the research streams that we are following in the quantum
computing domain, from quantum-inspired applications up to real quantum
applications that we will test on simulated and physical quantum computers.
Finally, we will analyze the elements and steps to consider for the
introduction of quantum computing within our own infrastructure.
    
  Back to Session V 
 
 
HPTMT: High-Performance Data Science and Data Engineering based on
Data-parallel Tensors, Matrices, and Tables
    
  Geoffrey
  Fox 
School of Informatics, Computing and Engineering, Department of Intelligent
Systems Engineering; Digital Science Center and Data Science Program,
Indiana University Bloomington, IN, USA
    
The continuously increasing size and complexity of data-intensive
applications demand high-performance but still highly usable environments.
We integrate a set of ideas developed in various data science and data
engineering frameworks. These employ a set of operators on specific data
abstractions that include vectors, matrices, arrays, tensors, graphs, and
tables. Our key concepts are inspired by systems like MPI, HPF
(High-Performance Fortran), NumPy, Pandas, Spark, Modin, PyTorch,
TensorFlow, RAPIDS (NVIDIA), and OneAPI (Intel). Further, it is crucial to
support the different languages in everyday use in the Big Data arena,
including Python, R, C++, and Java. We note the importance of Apache Arrow
and Parquet for enabling language-agnostic high performance and
interoperability. We identify the fundamental principles of an
operator-based architecture for data-intensive applications that are needed
for performance and usability success. We illustrate these principles with a
discussion of examples using our software environments, Cylon and Twister2,
which embody HPTMT. We also describe the results of benchmarks that are
being developed by MLCommons (MLPerf).
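
As a loose, single-node illustration of the operator-on-table style that
HPTMT generalizes (using pandas as a stand-in; Cylon and Twister2 expose
their own distributed operators, and the data and column names below are
invented), each processing step is one well-defined relational operator:

    # Operator-on-table style, illustrated with pandas on one node.
    import pandas as pd

    left = pd.DataFrame({"id": [1, 2, 3, 4], "x": [0.5, 1.5, 2.5, 3.5]})
    right = pd.DataFrame({"id": [2, 3, 4, 5], "y": [10, 20, 30, 40]})

    result = (
        left[left.x > 1.0]                    # select
        .merge(right, on="id")                # join
        .assign(z=lambda d: d.x * d.y)        # map / project
        .groupby("id").agg(z=("z", "sum"))    # group-by aggregate
    )
    print(result)

The point of the operator formulation is that each step has a well-defined
distributed-memory counterpart, which is what allows the same pipeline shape
to scale across nodes.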
    
  Back to Session I 
 
 
   Deep Learning for Time Series 
    
  Geoffrey
  Fox 
School of Informatics, Computing and Engineering, Department of Intelligent
Systems Engineering; Digital Science Center and Data Science Program,
Indiana University Bloomington, IN, USA
    
We show that one can study several sets of sequences or time series in terms
of an underlying evolution operator, which can be learned with a deep
learning network. We use the language of geospatial time series, as this is
a common application type, but the series can be any sequence, and the
sequences can be in any collection (bag), not just those in Euclidean
space-time: we just need sequences labeled in some way and having properties
dependent on this label (a position in an abstract space). This problem has
been successfully tackled by deep learning in many ways and in many fields.
Comparing deep learning for such time series with the coupled ordinary
differential equations used to describe multi-particle systems motivates the
introduction of an evolution operator that describes the time dependence of
complex systems. With an appropriate training process, we interpret deep
learning applied to spatial time series as a particular approach to finding
the time evolution operator for the complex system giving rise to the
spatial time series. Whimsically, we view this training process as
determining the hidden variables that represent the theory (as in Newton's
laws) of the complex system. We apply these ideas to predicting COVID-19
infections and earthquake occurrences.
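
As a hedged sketch of the core idea (a toy illustration, not the authors'
code), the PyTorch snippet below fits a small network F so that
F(x_t) approximates x_{t+1} on a synthetic two-dimensional trajectory; the
trained F then plays the role of the learned evolution operator and can be
rolled forward to forecast.

    # Learn an evolution operator F: x_t -> x_{t+1} from one toy trajectory.
    # Illustrative only; the applications in the talk use far richer models.
    import torch
    import torch.nn as nn

    # Synthetic dynamical system: a slightly damped rotation in 2-D.
    A = torch.tensor([[0.95, -0.20], [0.20, 0.95]])
    xs = [torch.tensor([1.0, 0.0])]
    for _ in range(200):
        xs.append(A @ xs[-1])
    seq = torch.stack(xs)                             # shape (201, 2)

    F = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
    opt = torch.optim.Adam(F.parameters(), lr=1e-2)
    for _ in range(500):
        loss = ((F(seq[:-1]) - seq[1:]) ** 2).mean()  # one-step prediction
        opt.zero_grad(); loss.backward(); opt.step()

    # Roll the learned operator forward from the last observed state.
    with torch.no_grad():
        x = seq[-1]
        for _ in range(5):
            x = F(x)
        print(x)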
    
  Back to Session IV 
 
 
   An automated, self-service, multi-cloud
  engineering simulation platform for a complex living heart simulation
  workflow with ML 
    
  Wolfgang
  Gentzsch 
  The UberCloud,
  Germany and Sunnyvale, CA, USA 
  Co-authors: Daniel Gruber, Director of
  Architecture at UberCloud; Yaghoub
  Dabiri, Scientist at 3DT Holdings; Julius Guccione, Professor of Surgery at the UCSF Medical Center, San Francisco; and Ghassan
  Kassab, President at California Medical Innovations
  Institute, San Diego. 
    
Many companies are finding that replicating an existing on-premises HPC
architecture in the Cloud does not lead to the desired breakthrough
improvements. With this in mind, a fully automated, self-service,
multi-cloud Engineering Simulation Platform has been developed from day one,
resulting in highly increased productivity for the HPC engineers,
significantly improved IT security, cloud costs and administrative overhead
reduced to a minimum, and full control for engineers and corporate IT over
their HPC cloud environment and corporate assets.
    
This platform has been implemented on the Google Cloud Platform (GCP) for
3DT Holdings for their highly complex Living Heart Project and Machine
Learning, with the final result of reducing simulation times from many hours
per simulation to just a few seconds for a highly accurate prediction of
optimal medical device placement during heart surgery.
    
The team ran the 1500 simulations needed to train the ML algorithm. The
whole simulation process took place as a multi-cloud approach, with all
computations running on 1500 HPC clusters in Google GCP, and management,
monitoring, and health checks orchestrated from the Azure Cloud and
performed through SUSE's Kubernetes management platform, Rancher.
    
Technology used: UberCloud Engineering Simulation Platform, multi-node
HPC-enhanced Docker containers, Kubernetes, SUSE Rancher, Dassault Abaqus,
TensorFlow, preemptible GCP instances (c2-standard-60), managed Kubernetes
clusters (GKE), Google Filestore, Terraform, and DCV remote visualization.
    
  Back to Session VII 
 
 
   Dynamic Decentralized Workload Scheduling for
  Cloud Computing 
    
  Vladimir
  Getov 
  Distributed and Intelligent Systems Research
  Group, School of Computer Science and Engineering, University of Westminster,
  London, UNITED KINGDOM 
    
Virtualized frameworks typically form the foundations of Cloud systems,
where Virtual Machine (VM) instances provide execution environments for a
diverse range of applications and services. Modern VMs support Live
Migration (LM), a feature whereby a VM instance is transferred to an
alternative node dynamically, without stopping its execution. This paper
presents a detailed design of a decentralized agent-based scheduler, which
can be used to manage workloads within the computing cells of a Cloud system
using Live Migration. Our proposed solution is based on the concept of
service allocation negotiation, whereby all system nodes communicate among
themselves and the scheduling logic is decentralized. The presented
architecture has been implemented, with multiple simulation runs using
real-world workloads.
The focus of this research is to analyze and evaluate the LM transfer cost,
which we define as the total size of data to be transferred to another node
for a particular migrated VM instance. Several different virtualization
approaches are categorized, with a shortlist of candidate VMs for
evaluation. The paper highlights the major areas of the LM transfer process
(CPU registers, memory, permanent storage, and network switching) and
analyzes their impact on the volume of information to be migrated, which
includes the VM instance with the required libraries, the application code,
and any data associated with it. Then, using several representative
applications, we report experimental results for the transfer cost of LM for
the respective VM instances. We also introduce a novel Live Migration Data
Transfer (LMDT) formula, which has been experimentally validated and
confirms the exponential nature of the LMDT process. Our estimation model
supports efficient design and development decisions in the process of
analyzing and building modern Cloud systems based on dynamic decentralized
workload scheduling.
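
As a purely hypothetical illustration of the bookkeeping behind such a cost
definition (the component names and the simple sum below are invented for
this sketch; the paper's actual LMDT formula is not reproduced here):

    # Hypothetical sketch of LM transfer-cost accounting; illustrative only,
    # not the paper's LMDT formula.
    from dataclasses import dataclass

    @dataclass
    class VMFootprint:
        cpu_state_mb: float       # CPU registers and execution state
        memory_mb: float          # resident memory pages
        storage_mb: float         # permanent storage to transfer
        network_state_mb: float   # network-switching state

    def transfer_cost_mb(vm: VMFootprint) -> float:
        """Total data moved to the destination node for one migrated VM."""
        return (vm.cpu_state_mb + vm.memory_mb
                + vm.storage_mb + vm.network_state_mb)

    print(transfer_cost_mb(VMFootprint(0.1, 2048.0, 8192.0, 0.5)))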
    
  Back to Session VII 
 
 
   Practical Quantum Computing 
    
  Victoria
  Goliber 
  Senior Technical Analyst, D-Wave Systems
  Inc., GERMANY 
    
D-Wave's mission is to unlock the power of quantum computing for the world.
We do this by delivering customer value with practical quantum applications
for a diverse set of problems. Join us to learn about the tools that D-Wave
has available and how they are impacting businesses around the world. We'll
conclude with a live demo showing how easy it is to get started and build
quantum applications today.
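
For a hedged taste of that first step, the sketch below uses D-Wave's
open-source Ocean library dimod to build and solve a tiny two-variable QUBO
with a local exact solver; the toy objective is invented for illustration,
and on real hardware the same model would be submitted through a QPU sampler
instead.

    # Toy QUBO solved locally with Ocean's dimod (problem invented here,
    # not from the talk): minimize -x - y + 2xy over binary x, y.
    import dimod

    bqm = dimod.BinaryQuadraticModel(
        {"x": -1.0, "y": -1.0},    # linear biases
        {("x", "y"): 2.0},         # quadratic coupling
        0.0,                       # constant offset
        dimod.BINARY,
    )

    # ExactSolver enumerates all 2^n states; a hardware sampler from
    # dwave-system would be substituted here to run on an annealer.
    best = dimod.ExactSolver().sample(bqm).first
    print(best.sample, best.energy)   # one of the two minima, energy -1.0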
    
  Back to Session V 
 
 
   AIOps as a future of Cloud Operations 
    
  Odej Kao 
  Distributed and Operating Systems Research
  Group and Einstein Center Digital Future, Berlin
  University of Technology, GERMANY 
    
Artificial Intelligence for IT Operations (AIOps) combines big data and
machine learning to replace a broad range of IT operations tasks, including
availability, performance, and monitoring of services. By exploiting log,
tracing, metric, and network data, AIOps aims at detecting service and
system anomalies before these turn into failures. This talk will present
methods developed for automated anomaly detection and root cause analysis,
for remediation and optimization, and for the automated initiation of
self-stabilizing activities. Extensive experimental measurements and initial
results show that AIOps platforms can help to reach the required levels of
availability, reliability, dependability, and serviceability for future
settings where latency and response times are of crucial importance. While
automation is mandatory due to the system complexity and the criticality of
a QoS-bounded response, the measures compiled and deployed by the
AI-controlled administration are not easily understood or reproducible.
Therefore, the explainability of actions taken by the automated system is
becoming a regulatory requirement for future IT infrastructures. Finally, we
describe a developed and deployed system named logsight.ai, in order to
provide an example of the design of the corresponding architecture, tools,
and methods.
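
To ground the anomaly-detection task in something concrete, here is a
minimal, hedged sketch (not logsight.ai's method) that flags points in a
metric stream whose deviation from a rolling baseline exceeds a z-score
threshold.

    # Toy metric anomaly detector: rolling z-score over a sliding window.
    # Illustrative only; production AIOps combines logs, traces and metrics
    # with far richer learned models.
    import numpy as np

    def rolling_zscore_anomalies(series, window=60, threshold=4.0):
        """Indices deviating > threshold sigmas from the trailing window."""
        hits = []
        for i in range(window, len(series)):
            past = series[i - window:i]
            mu, sigma = past.mean(), past.std()
            if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
                hits.append(i)
        return hits

    rng = np.random.default_rng(0)
    latency = rng.normal(100.0, 5.0, 1000)     # synthetic latency metric
    latency[500] += 60.0                       # injected incident
    print(rolling_zscore_anomalies(latency))   # expect the spike at 500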
    
    
  CV Odej Kao 
    
Odej Kao is a full professor at Technische Universität Berlin, head of the
research group on distributed and operating systems, chairman of the
Einstein Center Digital Future with 50 interdisciplinary professors, and
chairman of the DFN board. Moreover, he is the CIO of the university and a
principal investigator in the national centers on Big Data and on
Foundations of Learning and Data. Dr. Kao is a graduate of TU Clausthal
(master's degree in computer science in 1995, PhD in 1997, habilitation in
2002). In 2002, Dr. Kao joined Paderborn University as an associate
professor for operating systems and director of the center for parallel
computing. In 2006, he moved to Berlin and focused his research on AIOps,
big data / streaming analytics, cloud computing, and fault tolerance. He has
published over 350 papers in peer-reviewed proceedings and journals.
    
  Back to Session VII 
 
 
   Building
  the European EuroHPC Ecosystem 
    
  Kimmo Koski 
  CSC - Finnish IT Center
  for Science, Espoo, Finland 
    
LUMI is one of the three pre-exascale systems acquired by the EuroHPC Joint
Undertaking (JU); once fully operational, it will provide more than 500 PF
of computing power for European research and industry. The system, hosted by
CSC, the Finnish IT Center for Science, and run by a consortium of 10
European countries, will be installed in two phases: the first parts during
the summer of 2021 and the rest at the end of 2021.
LUMI will be an essential part of European HPC collaboration and one of the
main platforms for European research. It will fit together with the other
EuroHPC sites, such as the pre-exascale systems in Spain and Italy, five
petascale systems, and future exascale installations, all together forming
the European HPC ecosystem.
The talk introduces LUMI and its role in the European HPC ecosystem, and
discusses various aspects motivating the architectural and functional
choices made when building an international collaboration with heterogeneous
resources located in different countries. The talk discusses the different
needs and priorities of research, which drive the decisions aiming at
optimal performance for the most challenging applications. The benefits
obtained by research and industry are also addressed.
The talk further discusses the eco-efficient, low-carbon-footprint
operational environment and its impact on the European Green Deal. In
addition, it analyzes the opportunities for developing a European
competitive advantage through intensive collaboration in building the
European EuroHPC ecosystem.
    
  Back to Session III 
 
 
   Exascale Programming Models for Heterogeneous Systems 
    
  Stefano Markidis 
KTH Royal Institute of Technology, Computer Science Department /
Computational Science and Technology Division, Stockholm, SWEDEN
    
The first exascale supercomputer is likely to come online soon. A
production-quality programming environment, probably based on existing
dominant programming interfaces such as MPI, needs to be in place to support
application deployment and development on exascale machines. The most
striking characteristic of an exascale supercomputer will be the amount of
parallelism required to reach the exaFLOPS barrier with the High-Performance
LINPACK benchmark: the first exascale machine will provide programmers with
between 100 million and a billion threads. The second characteristic of an
exascale supercomputer will be the high level of heterogeneity of the
compute and memory subsystems. This heterogeneity drastically increases the
number of FLOPS per watt, making it feasible to build an exascale machine
within a power budget in the 20-100 MW range. Low-power microprocessors,
accelerators, and reconfigurable hardware are the main design choices for an
exascale machine. This heterogeneity in compute will also be accompanied by
deeper memory hierarchies comprising high-performance and low-power memory
technologies.
While it is not yet evident what the best programming approach for
developing applications on large-scale heterogeneous supercomputers will be,
there is a consensus in the HPC community that programmers need an extension
of the dominant programming models to ensure the programmability of new
architectures. In this talk, I introduce the EPiGRAM-HS project, which
addresses the heterogeneity challenge of programming exascale
supercomputers. EPiGRAM-HS improves their programmability by extending MPI
and GASPI to exploit accelerators, reconfigurable hardware, and
heterogeneous memory systems. In addition, EPiGRAM-HS places MPI and GASPI
at the core of the software stack and extends programmability and
productivity with additional software layers.
    
  Back to Session III 
 
 
   Brain-like Machine Learning and HPC 
    
  Stefano Markidis 
KTH Royal Institute of Technology, Computer Science Department /
Computational Science and Technology Division, Stockholm, SWEDEN
    
Modern deep learning methods based on backpropagation have surged in
popularity and are used in multiple domains and application areas. At the
same time, there are other machine learning algorithms inspired by modern
models of how the brain's neocortex functions. Unlike traditional deep
learning, these models use a localized (and unsupervised) brain-like rule to
determine the neural network's weights and biases. The learning of the graph
connection weights complies with Hebb's postulate: learning depends only on
the locally available information provided by the activities of the pre- and
post-synaptic units. A Hebbian learning rule allows higher scalability and
better utilization of HPC systems. In this talk, I introduce brain-like
machine learning and describe the Bayesian Confidence Propagation Neural
Network (BCPNN), one of the most established brain-inspired machine learning
methods. I also discuss the potential for these emerging methods to exploit
HPC systems and present an HPC BCPNN implementation, called StreamBrain, for
CPUs, GPUs, and FPGAs.
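
To make the locality point concrete, here is a hedged NumPy sketch of a
plain Hebbian update (not the BCPNN rule itself, which uses Bayesian
probability estimates): each weight changes using only the activities of its
own pre- and post-synaptic units, so no global error signal is needed and
the updates parallelize naturally.

    # Plain Hebbian weight update: purely local, no backpropagated error.
    # Illustrative sketch only; BCPNN uses a probabilistic variant of this.
    import numpy as np

    rng = np.random.default_rng(0)
    n_pre, n_post, eta = 8, 4, 0.1

    W = np.zeros((n_post, n_pre))
    for _ in range(100):
        pre = (rng.random(n_pre) < 0.3).astype(float)    # pre-synaptic activity
        post = (rng.random(n_post) < 0.3).astype(float)  # post-synaptic activity
        # Hebb's postulate: delta_w_ij depends only on post_i and pre_j.
        W += eta * np.outer(post, pre)

    print(W)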
    
  Back to Session IV 
 
 
   Data Analytics and AI on HPC Systems: About
  the impact on Science 
    
  Wolfgang
  Nagel 
  Center for Information Services and High Performance Computing, Technische Universitaet
  Dresden, GERMANY 
    
Methods and techniques of Artificial Intelligence (AI) and Machine Learning
(ML) have been investigated for decades in pursuit of a vision where
computers can mimic human intelligence. In recent years, these methods have
become more mature and, in some specialized applications, evolved to
super-human abilities, e.g. in image recognition or in games such as Chess
and Go. Nonetheless, formidable questions remain in the areas of fundamental
algorithms, training data usage, and explainability of results, to name a
few. The AI, and especially the ML, developments have been boosted by
powerful HPC systems, mainly driven by the GPU architectures built into many
if not most HPC systems these days. The talk will explain the challenges of
integrating AI and HPC into "monolithic" systems, and it will provide a
broad overview of the impact that the availability of such systems will have
on the science system.

Back to Session I
 
 
   Cloud Native Supercomputing 
    
  Gilad Shainer 
  NVIDIA, Menlo Park, CA, USA 
    
High performance computing and Artificial Intelligence are essential tools
fueling the advancement of science. In order to handle the ever-growing
demand for higher computational performance and the increasing complexity of
research problems, the world of scientific computing continues to reinvent
itself at a fast pace. The session will review recent developments in the
cloud-native supercomputing architecture, which aims to bring together
bare-metal performance and cloud services.
    
  Back to Session II 
 
 
   Towards an Active Memory Architecture for
  Time-Varying Graph-based Execution 
    
  Thomas
  Sterling 
  School of Informatics, Computing and
  Engineering and AI Computing Systems Laboratory, Indiana University,
  Bloomington, IN, USA 
    
A diversity of new GPUs and special-purpose devices is under development and
in production for the significant acceleration of a wide range of Machine
Learning and AI applications. For such problems exhibiting high data reuse,
these emerging platforms hold great promise in commercial, medical, and
defense domains. For workflows heavily dependent upon irregular graph
structures with rapidly changing topologies, defined by intra-graph metadata
like links, edges, and arcs, a new generation of innovative memory-centric
architectures is being literally invented, some by entrepreneurs through new
start-up companies. The integration and tight coupling of memory with
support logic has decades of prior experimentation behind it. The new
generation of architecture innovation is being pursued to address such
challenges as latency hiding, global naming, graph-processing idioms, and
the associated overheads for AI, ML, and AMR. Chief among these is extreme
scalability at the limits of Moore's Law and nanoscale semiconductor
fabrication technology. The Active Memory Architecture (AMA) is one possible
new class of graph-driven memory-centric architecture. The AMA is under
development, supported by NASA, to exploit opportunities exposed by classic
von Neumann architecture cores and advanced concepts for graph processing.
This address will present the innovative principles being explored through
the AMA and describe a prototype currently under testing. All questions from
the audience will be welcome throughout the presentation.
    
  Brief Biography 
Thomas Sterling is a Full Professor of Intelligent Systems Engineering at
Indiana University (IU), serving as Director of the AI Computing Systems
Laboratory at IU's Luddy School of Informatics, Computing, and Engineering.
Since receiving his Ph.D. from MIT as a Hertz Fellow, Dr. Sterling has
engaged in applied research in parallel computing system structures,
semantics, and operation in industry, government labs, and academia. Dr.
Sterling is best known as the "father of Beowulf" for his pioneering
research in commodity/Linux cluster computing, for which he shared the
Gordon Bell Prize in 1997. His current research is associated with
innovative extreme-scale computing through memory-centric non-von Neumann
architecture concepts to accelerate dynamic graph processing. In 2018, he
co-founded the new tech company Simultac and serves as its President and
Chief Scientist. Dr. Sterling was the recipient of the 2013 Vanguard Award
and is a Fellow of the AAAS. He is the co-author of seven books and holds
six patents. Most recently, he co-authored the introductory textbook "High
Performance Computing", published by Morgan Kaufmann in 2017.
    
  Back to Session I 
 
 
   Parallel Runtime Systems for Dynamic Resource
  Management and Task Scheduling 
    
  Thomas
  Sterling 
  School of Informatics, Computing and
  Engineering and AI Computing Systems Laboratory, Indiana University,
  Bloomington, IN, USA 
    
Runtime systems, implemented principally in software, play diverse roles in
the management of resources, expanding dynamic control and filling perceived
gaps between compilers and operating systems on the one hand and hardware
execution on the other. They can add workflow management of distributed
processing components, or more fine-grained supervision for optimality of
efficiency and scaling through introspection. MPI, OpenMP, and other user
programming interfaces (e.g., Java, Python, Lisp) incorporate some runtime
functionality, as do SLURM, Charm++, and Cilk++, to mention a few. Legion,
Habanero, and HPX operate at the intra-application multi-thread level.
Detailed experiments with HPX-5 explored the potential advantages, but also
the limitations, of runtime functionality and its sensitivity to application
flow-control properties. This presentation describes and discusses the
findings and conclusions of this investigation, demonstrating the potential
improvements in some cases, but also areas in which runtimes may prove a
hindrance due to software overheads with little or no gains. The talk
concludes by considering future runtimes optimized for objective functions,
such as memory bandwidth or latency, that differ from conventional ALU/FPU
utilization. Possible targets for hardware mechanisms that achieve greater
efficiency and scalability by reducing overhead times are exposed.
    
  Back to Session III 
 
 
   Knowing your quantum computer: benchmarking, verification and
  classical simulation at scale 
    
  Sergii Strelchuk 
  Department of Applied Mathematics and
  Theoretical Physics and Centre for Quantum Information and Foundations,
  University of Cambridge, United Kingdom 
    
To ensure that a quantum device operates as expected, we need to check its
functioning on two levels. On the lower level, we need to map all the noise
sources and ensure they do not render our device classical. On the higher
level, we need a practical method to confirm that the output to the
computational problem produced by the quantum computer can be trusted. In
this talk, I will explain the core ideas behind these tasks and discuss the
unexpected role of classical simulability that emerges in the above
scenarios.
    
  Back to Session V 
 
 
   Quantum computing for natural sciences and
  machine learning applications 
    
  Francesco
  Tacchino 
  Quantum Applications Researcher, IBM Quantum,
  IBM Research – Zurich, Switzerland 
    
The future of computing is being shaped today around rapidly growing
technologies, such as quantum and neuromorphic systems, in combination with
high-performance classical architectures. In the coming years, these
innovative information-processing paradigms may radically transform and
accelerate the mechanisms of scientific discovery, potentially opening new
avenues of research.
In particular, quantum computing could offer scalable and efficient
solutions for many classically intractable problems in different domains,
including physics, chemistry, biology and medicine, as well as optimisation,
artificial intelligence and finance. In this talk, I will review the state
of the art and recent progress in the field, both in terms of hardware and
software, and present some advanced applications, with a focus on natural
sciences, materials design and machine learning.

Back to Session V
 
 
Data-Centric Programming for Large-Scale Parallel Systems:
The DCEx Model
    
  Domenico
  Talia 
  Department of Computer Engineering,
  Electronics, and Systems and DtoK Lab,  
  University of Calabria, ITALY 
    
For designing scalable parallel applications, data-oriented programming
models are effective solutions, based on the exploitation of local data
structures and on limiting the amount of data shared among parallel
processes. This talk discusses the main features and the programming
mechanisms of the DCEx programming model, designed for the implementation of
data-centric large-scale parallel applications. The basic idea of the DCEx
model is structuring programs into data-parallel blocks to be managed by a
large number of parallel threads. Parallel blocks are the units of
distributed-memory parallel computation, communication, and migration in the
memory/storage hierarchy. Threads execute close to the data, using near-data
synchronization according to the PGAS model. A machine learning use case is
also discussed, showing the DCEx features for exascale programming.
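
The sketch below is a hedged, minimal illustration of the
data-parallel-block idea using plain mpi4py rather than the DCEx API itself:
each process owns a local block, computes on it in place, and shares only a
small aggregate instead of the block.

    # Minimal data-parallel-block illustration with mpi4py (not DCEx code).
    # Run with e.g.: mpiexec -n 4 python blocks.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each process owns one local data block; there is no global array.
    rng = np.random.default_rng(rank)
    block = rng.random(1_000_000)

    local_sum = block.sum()            # compute close to the data

    # Only a scalar crosses process boundaries, not the blocks themselves.
    total = comm.allreduce(local_sum, op=MPI.SUM)
    if rank == 0:
        print("global mean:", total / (size * block.size))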
    
  Back to Session VII 