HPC integrates systems administration (including network and security knowledge) and parallel programming into a multidisciplinary field that combines digital electronics, computer architecture, system software, programming languages, algorithms, and computational techniques.[1] HPC technologies are the tools and systems used to create and implement high performance computing systems.[2] In recent years, HPC systems have shifted from supercomputers to computing clusters and grids.[1] Because clusters and grids depend on networking, HPC deployments often adopt a collapsed network backbone, since the collapsed-backbone architecture is simple to troubleshoot and upgrades can be applied to a single router rather than to multiple ones.
The term is most commonly associated with computing used for scientific research or computational science. A related term, high-performance technical computing (HPTC), generally refers to the engineering applications of cluster-based computing (such as computational fluid dynamics and the building and testing of virtual prototypes). HPC has also been applied to business uses such as data warehouses, line-of-business (LOB) applications, and transaction processing.
The term "high-performance computing" (HPC) arose after the term "supercomputing".[3] HPC is sometimes used as a synonym for supercomputing, but in other contexts "supercomputer" refers to a more powerful subset of high-performance computers, which makes "supercomputing" a subset of high-performance computing. The potential for confusion over the use of these terms is apparent.
Because most current applications are not designed for HPC technologies but are retrofitted to them, they are not designed or tested for scaling to more powerful processors or machines.[2] Since networked clusters and grids use multiple processors and computers, these scaling problems can cripple critical systems in future supercomputing systems. Either the existing tools do not address the needs of the high performance computing community, or the HPC community is unaware of these tools.[2]
A list of the most powerful high-performance computers can be found on the TOP500 list. The TOP500 list ranks the world's 500 fastest high-performance computers, as measured by the High Performance LINPACK (HPL) benchmark. Not all computers are listed, either because they are ineligible (e.g., they cannot run the HPL benchmark) or because their owners have not submitted an HPL score (e.g., because they do not wish the size of their system to become public information for defense reasons). In addition, the use of the single LINPACK benchmark is controversial, in that no single measure can test all aspects of a high-performance computer. To help overcome the limitations of the LINPACK test, the U.S. government commissioned one of its originators, Jack Dongarra of the University of Tennessee, to create a suite of benchmark tests that includes LINPACK and others, called the HPC Challenge benchmark suite. This evolving suite has been used in some HPC procurements, but, because it is not reducible to a single number, it has been unable to overcome the publicity advantage of the less useful TOP500 LINPACK test. The TOP500 list is updated twice a year, once in June at the ISC European Supercomputing Conference and again at a US Supercomputing Conference in November.
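HPL ranks machines by their sustained floating-point rate on a dense linear solve. As a rough, hedged illustration of how such a FLOP/s figure is obtained (this is not HPL itself, which runs a distributed LU factorization across many nodes), the sketch below times a dense matrix multiply and reports GFLOP/s:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Illustrative mini-benchmark: times a dense N x N matrix multiply
 * (2*N^3 floating-point operations) and reports GFLOP/s. It only
 * demonstrates the idea behind FLOP/s-style rankings; it is NOT the
 * HPL benchmark. */
#define N 512

static double A[N][N], B[N][N], C[N][N];

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)rand() / RAND_MAX;
            B[i][j] = (double)rand() / RAND_MAX;
            C[i][j] = 0.0;
        }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)     /* i-k-j order for better cache reuse */
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d x %d matmul: %.3f s, %.2f GFLOP/s\n",
           N, N, secs, 2.0 * N * N * N / secs / 1e9);
    return 0;
}
```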
Computational research often requires resources that exceed those of a personal laptop or desktop computer. High-performance computing (HPC) aggregates the resources from individual computers (known as nodes) into a cluster that works together to perform advanced, specialized computing jobs.
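In practice, a cluster job runs many cooperating processes spread across the nodes. The minimal sketch below, which assumes an MPI implementation such as MPICH or Open MPI is installed, shows how each process discovers its place in the job and claims its slice of a shared workload:

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal MPI sketch: each process (one or more per node) learns its
 * rank and the total process count, then takes a slice of the problem.
 * Build with `mpicc`, launch with e.g. `mpirun -n 4 ./a.out`. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total processes in the job */

    const long total_items = 1000000;      /* illustrative workload size */
    long chunk = total_items / size;
    long begin = rank * chunk;
    long end   = (rank == size - 1) ? total_items : begin + chunk;

    printf("rank %d of %d handles items [%ld, %ld)\n", rank, size, begin, end);

    MPI_Finalize();
    return 0;
}
```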
Many academic fields, including epigenetics, geophysics, materials science, engineering, natural language translation, and health sciences, utilize high-performance computing to advance their research beyond what would be possible with a personal computer.
As the amount of data used in research continues to grow with the popularity of such technologies as artificial intelligence (AI) and advanced data analysis, high-performance computing is becoming increasingly necessary for technological advancement.
The Center for Advanced Research Computing launched its new high-performance computing cluster, Discovery, in August 2020. This new cluster includes additional compute nodes and a rebuilt software stack, as well as new system configurations to better serve CARC users. Discovery marks a significant upgrade to CARC's cyberinfrastructure and the first step in a major, user-focused overhaul of the program. The Discovery cluster consists of 2 shared login nodes and a total of around 21,000 CPU cores in around 600 compute nodes. Of these, over 200 nodes are equipped with graphics processing units (GPUs), with a total of over 180 NVIDIA GPUs available. The typical compute node has dual 8 to 16 core processors and resides on a 56 gigabit FDR InfiniBand backbone.
CARC uses a customized distribution of the Community Enterprise Operating System (CentOS), built using the publicly available RPM Package Manager (RPM). CentOS is a high-quality Linux distribution that gives CARC complete control over its open-source software packages, can be fully customized to suit advanced research computing needs, and requires no license fees.
The paper presents the state of the art of energy-aware high-performance computing (HPC), in particular the identification and classification of approaches by system and device types, optimization metrics, and energy/power control methods. System types include single devices, clusters, grids, and clouds, while the device types considered include CPUs, GPUs, multiprocessor, and hybrid systems. Optimization goals include various combinations of metrics such as execution time, energy consumption, and temperature, with consideration of imposed power limits. Control methods include scheduling, DVFS/DFS/DCT, power capping through programmatic APIs such as Intel RAPL and NVIDIA NVML, application optimizations, and hybrid methods. We discuss tools and APIs for energy/power management as well as tools and environments for prediction and/or simulation of energy/power consumption in modern HPC systems. Finally, programming examples, i.e., applications and benchmarks used in particular works, are discussed. Based on our review, we identified a set of open areas and important up-to-date problems concerning methods and tools that enable energy-aware processing on modern HPC systems.
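As an illustration of such programmatic control, the hedged sketch below uses NVIDIA's NVML library (headers and device access assumed; setting a limit typically requires root) to read a GPU's current power draw and apply a power cap:

```c
#include <stdio.h>
#include <nvml.h>

/* Sketch of GPU power monitoring and capping through NVML.
 * Compile with: gcc nvml_cap.c -lnvidia-ml
 * Reading power usually works as any user; setting a power
 * management limit generally requires root privileges. */
int main(void) {
    if (nvmlInit() != NVML_SUCCESS) {
        fprintf(stderr, "NVML init failed\n");
        return 1;
    }

    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);     /* first GPU in the system */

    unsigned int mw = 0;
    if (nvmlDeviceGetPowerUsage(dev, &mw) == NVML_SUCCESS)
        printf("current draw: %.1f W\n", mw / 1000.0);

    unsigned int limit_mw = 0;
    if (nvmlDeviceGetPowerManagementLimit(dev, &limit_mw) == NVML_SUCCESS)
        printf("current cap: %.1f W\n", limit_mw / 1000.0);

    /* Example: cap the GPU at 150 W (150000 mW); the value must lie
     * within the device's allowed min/max power management limits. */
    if (nvmlDeviceSetPowerManagementLimit(dev, 150000) != NVML_SUCCESS)
        fprintf(stderr, "could not set cap (root and device support required)\n");

    nvmlShutdown();
    return 0;
}
```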
Firstly, the matter of appropriate energy and performance metrics has been investigated in the literature [3]. There are several survey papers related to energy-aware high-performance computing, but as the field's technology and features are evolving very rapidly, these surveys lack certain aspects that we present in this paper.
Power- and energy-related analytical models for high-performance computing systems and applications are discussed in detail in [9], with references to contributions from other works in this particular subarea. Node architecture is discussed, models covering CPUs, GPUs, Intel Xeon Phis, and FPGAs are included, and counter-based models are analyzed. We focus more on methods and tools, as well as whole simulation environments, that can make use of such models.
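For instance, counter-based energy measurement on recent Intel CPUs can be done through the Linux powercap interface to RAPL. The hedged sketch below (the path assumes the intel_rapl driver is loaded, and reading may require elevated permissions) samples the package energy counter around a workload:

```c
#include <stdio.h>
#include <unistd.h>

/* Sketch of counter-based energy measurement via Intel RAPL, exposed by
 * the Linux powercap framework. The energy_uj file holds a cumulative
 * counter in microjoules; sampling it before and after a workload gives
 * the energy consumed by the CPU package (counter wraparound at
 * max_energy_range_uj is ignored here for brevity). */
static long long read_energy_uj(void) {
    long long uj = -1;
    FILE *f = fopen("/sys/class/powercap/intel-rapl:0/energy_uj", "r");
    if (!f) return -1;
    fscanf(f, "%lld", &uj);
    fclose(f);
    return uj;
}

int main(void) {
    long long before = read_energy_uj();
    if (before < 0) { fprintf(stderr, "RAPL not available\n"); return 1; }

    sleep(2);  /* stand-in for the measured workload */

    long long after = read_energy_uj();
    printf("package energy over interval: %.3f J\n", (after - before) / 1e6);
    return 0;
}
```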
In view of the existing reviews of work on energy-related aspects in high-performance computing, the contribution of our work can be considered an up-to-date survey and analysis of progress in the field, including the following aspects:
(1) Study of available APIs and tools for energy and power management in HPC
(2) Consideration of various target systems such as single devices, multiprocessor systems, cluster, grid, and cloud systems
(3) Consideration of various device types, including CPUs, GPUs, and also hybrid systems
(4) Consideration of the variety of optimization metrics and their combinations considered in the literature, including performance, power, energy, and temperature
(5) Consideration of various optimization methods, including known scheduling and DVFS/DFS/DCT approaches but also the latest power capping features for both CPUs and GPUs, application optimizations, and hybrid approaches
(6) Consideration of applications used for measurements and benchmarking in energy-aware works
(7) Tools for prediction and simulation of energy and power consumption in HPC systems
(8) Formulation of open research problems in the field based on the latest developments and results
In this paper, we have discussed APIs for controlling the energy and power aspects of high-performance computing systems incorporating state-of-the-art CPUs and GPUs, and presented tools for prediction and/or simulation of energy/power consumption in an HPC system. We analyzed approaches, control methods, optimization criteria, and programming examples, as well as benchmarks used in state-of-the-art research on energy-aware high-performance computing. Specifically, we presented solutions for systems such as workstations, clusters, grids, and clouds using various computing devices such as multi- and manycore CPUs and GPUs. Optimization metrics including combinations of execution time, energy used, power consumption, and temperature were analyzed. Control methods used in the available approaches include scheduling, DVFS/DFS/DCT, power capping, application optimizations, and hybrid approaches. Finally, we presented open areas and recommendations for future research in this field.
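Among these control methods, DVFS is commonly exposed on Linux through the cpufreq sysfs interface. The hedged sketch below (paths and limits depend on the governor and hardware; writing requires root) reads cpu0's current frequency and lowers its maximum allowed frequency:

```c
#include <stdio.h>

/* Sketch of DVFS control through the Linux cpufreq sysfs interface.
 * Frequencies are reported and written in kHz. */
int main(void) {
    long khz = 0;
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
    if (!f) { fprintf(stderr, "cpufreq not available\n"); return 1; }
    fscanf(f, "%ld", &khz);
    fclose(f);
    printf("cpu0 current frequency: %.2f GHz\n", khz / 1e6);

    /* Example: cap cpu0 at 2.0 GHz (2000000 kHz); must stay within the
     * range allowed by cpuinfo_min_freq / cpuinfo_max_freq. */
    f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq", "w");
    if (!f) { fprintf(stderr, "need root to change frequency limits\n"); return 1; }
    fprintf(f, "%d", 2000000);
    fclose(f);
    return 0;
}
```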
Powerful compute speeds enable researchers to investigate scientific questions at unprecedented levels of detail. From atomic-level simulations to massive cosmological studies, researchers use HPC to probe spaces inaccessible by any other experimental method. Scientific insights from supercomputers have driven forward research and technology across industrial sectors and disciplinary fields; examples include aerospace engineering, drug development, climate science, genomics, and the exploration of the fundamental particles that make up our universe. From industry to academia, the scientific need for compute power pushes the limits of current supercomputers and continues to drive innovation and development for future high-performance computers.