Supercomputers
Scientists are rethinking the universe’s deepest mysteries using numerical relativity, complex computer simulations of Einstein’s equations in extreme conditions. This method could help explore what happened before the Big Bang, test theories of cosmic inflation, investigate multiverse collisions, and even model cyclic universes that endlessly bounce through creation and destruction.

arXiv:2508.13523v1 Announce Type: cross Abstract: Since its inception in 1995, LAMMPS has grown to be a world-class molecular dynamics code, with thousands of users, over one million lines of code, and multi-scale simulation capabilities. We discuss how LAMMPS has adapted to the modern heterogeneous computing landscape by integrating the Kokkos performance portability library into the existing C++ code. We investigate performance portability of simple pairwise, many-body reactive, and machine-learned force-field interatomic potentials. We present results on GPUs across different vendors and generations, and analyze performance trends, probing FLOPS throughput, memory bandwidths, cache capabilities, and thread-atomic operation performance. Finally, we demonstrate strong scaling on all current US exascale machines -- OLCF Frontier, ALCF Aurora, and NNSA El Capitan -- for the three potentials.
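
LAMMPS's Kokkos kernels are C++, but the pairwise-potential pattern the abstract benchmarks is easy to sketch. Below is a minimal NumPy version of a Lennard-Jones force loop (naive O(N^2), no neighbor lists); it only illustrates the data-access pattern that performance-portability work targets, not LAMMPS's actual implementation.

```python
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0, rcut=2.5):
    """Naive O(N^2) Lennard-Jones forces. Production codes (LAMMPS)
    use neighbor lists and run this loop as a Kokkos C++ kernel."""
    f = np.zeros_like(pos)
    for i in range(len(pos)):
        d = pos - pos[i]                      # vectors from atom i to all atoms
        r2 = np.einsum("ij,ij->i", d, d)
        r2[i] = np.inf                        # exclude self-interaction
        near = r2 < rcut ** 2
        inv_r6 = (sigma ** 2 / r2[near]) ** 3
        # force magnitude / r for U = 4*eps*((sigma/r)^12 - (sigma/r)^6)
        fmag = 24.0 * eps * (2.0 * inv_r6 ** 2 - inv_r6) / r2[near]
        f[i] -= (d[near] * fmag[:, None]).sum(axis=0)
    return f

pos = np.random.rand(128, 3) * 10.0           # 128 atoms in a 10^3 box
print(lj_forces(pos).shape)                   # -> (128, 3)
```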

Simulations still can't predict exactly when an earthquake will happen, but with the incredible processing power of modern exascale supercomputers, they can now predict how one will unfold and how much damage it is likely to cause.

Scientists at Lawrence Livermore National Laboratory (LLNL) have helped develop an advanced, real-time tsunami forecasting system—powered by El Capitan, the world's fastest supercomputer—that could dramatically improve early warning capabilities for coastal communities near earthquake zones.

arXiv:2508.06710v1 Announce Type: new Abstract: Computational Fluid Dynamics (CFD) simulations are often constrained by the memory-bound nature of sparse matrix-vector operations, which eventually limits performance on modern high-performance computing (HPC) systems. This work introduces a novel approach to increase arithmetic intensity in CFD by leveraging repeated matrix block structures. The method transforms the conventional sparse matrix-vector product (SpMV) into a sparse matrix-matrix product (SpMM), enabling simultaneous processing of multiple right-hand sides. This shifts the computation towards a more compute-bound regime by reusing matrix coefficients. Additionally, an inline mesh-refinement strategy is proposed: simulations initially run on a coarse mesh to establish a statistically steady flow, then refine to the target mesh. This reduces the wall-clock time to reach transition, leading to faster convergence with equivalent computational cost. The methodology is evaluated…
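
To make the SpMV-to-SpMM idea concrete, here is a small SciPy sketch (illustrative, not the paper's code): applying one sparse matrix to several right-hand sides at once reuses each stored coefficient, raising arithmetic intensity.

```python
import numpy as np
import scipy.sparse as sp

n, nrhs = 20_000, 8
A = sp.random(n, n, density=1e-3, format="csr", random_state=0)
xs = [np.random.rand(n) for _ in range(nrhs)]

# SpMV baseline: A's coefficients stream from memory once per RHS.
ys = [A @ x for x in xs]

# SpMM: one pass over A serves all right-hand sides, so every stored
# coefficient is reused nrhs times (higher arithmetic intensity).
Y = A @ np.column_stack(xs)

assert np.allclose(np.column_stack(ys), Y)
```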

Tesla has pulled the plug on Dojo, scrapping the AI supercomputer project that Elon Musk once called essential.

Pete Bannon, who joined Tesla from Apple in 2016, is leaving the electric vehicle maker.

As Greenland's ice retreats, it's fueling tiny ocean organisms. To find out why, scientists turned to a computer model from JPL and MIT that's been called a laboratory in itself.

arXiv:2507.20719v1 Announce Type: new Abstract: Our fully kinetic, implicit Particle-in-Cell (PIC) simulations of global magnetospheres on up to 32,768 of El Capitan's AMD Instinct MI300A Accelerated Processing Units (APUs) represent an unprecedented computational capability that addresses a fundamental challenge in space physics: resolving the multi-scale coupling between microscopic (electron-scale) and macroscopic (global-scale) dynamics in planetary magnetospheres. The implicit scheme of iPIC3D supports time steps and grid spacing that are up to 10 times larger than those of explicit methods, without sacrificing physical accuracy. This enables the simulation of magnetospheres while preserving fine-scale electron physics, which is critical for key processes such as magnetic reconnection and plasma turbulence. Our algorithmic and technological innovations include GPU-optimized kernels, particle control, and physics-aware data compression using Gaussian Mixture Models. With…
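
The Gaussian-mixture compression idea can be sketched in a few lines with scikit-learn; this toy (my own construction, not the iPIC3D pipeline) replaces a cell's particle velocities with a handful of mixture parameters.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy stand-in for one cell's particle velocities: two drifting
# Maxwellian populations, 100,000 particles in 3-D velocity space.
rng = np.random.default_rng(0)
v = np.vstack([rng.normal(0.0, 1.0, (50_000, 3)),
               rng.normal(2.0, 0.5, (50_000, 3))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(v)
kept = gmm.weights_.size + gmm.means_.size + gmm.covariances_.size
print(f"stored {kept} floats instead of {v.size}")   # 26 vs 300000

# Particles can later be regenerated by sampling the stored model.
v_resampled, _ = gmm.sample(100_000)
```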

Scientists at the University of Stuttgart's Institute of Aerodynamics and Gas Dynamics (IAG) have produced a novel dataset that will improve the development of turbulence models. With the help of the Hawk supercomputer at the High-Performance Computing Center Stuttgart (HLRS), investigators in the laboratory of Dr. Christoph Wenzel conducted a large-scale direct numerical simulation of a spatially evolving turbulent boundary layer.

arXiv:2507.16697v1 Announce Type: new Abstract: Turbulence plays a crucial role in multiphysics applications, including aerodynamics, fusion, and combustion. Accurately capturing turbulence's multiscale characteristics is essential for reliable predictions of multiphysics interactions, but remains a grand challenge even for exascale supercomputers and advanced deep learning models. The extreme-resolution data required to represent turbulence, ranging from billions to trillions of grid points, pose prohibitive computational costs for models based on architectures like vision transformers. To address this challenge, we introduce a multiscale hierarchical Turbulence Transformer that reduces sequence length from billions to a few millions and a novel RingX sequence parallelism approach that enables scalable long-context learning. We perform scaling and science runs on the Frontier supercomputer. Our approach demonstrates excellent performance up to 1.1 EFLOPS on 32,768 AMD GPUs, with a…
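
The sequence-length reduction at the heart of a hierarchical transformer can be illustrated with simple token merging; the sketch below (a toy of the general idea, not the paper's Turbulence Transformer) average-pools neighbouring tokens so attention runs on a far shorter sequence.

```python
import numpy as np

def merge_tokens(seq, factor):
    """Average-pool neighbouring tokens: (n, d) -> (n // factor, d)."""
    n, d = seq.shape
    return seq[: n - n % factor].reshape(-1, factor, d).mean(axis=1)

tokens = np.random.rand(1_000_000, 16)      # stand-in for grid-point tokens
coarse = merge_tokens(merge_tokens(tokens, 8), 8)
print(coarse.shape)                          # (15625, 16): ~64x shorter
```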

The Isambard-AI supercomputer is made fully operational as the government unveils fresh AI plans.

arXiv:2507.11512v1 Announce Type: cross Abstract: Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by dense matrix operations have seen substantial speedups by utilizing low precision formats such as FP16. However, a majority of scientific simulation applications are memory bandwidth limited. Beyond preliminary studies, the practical gain from using mixed-precision algorithms on a given HPC system is largely unclear. The High Performance GMRES Mixed Precision (HPG-MxP) benchmark has been proposed to measure the useful performance of a HPC system on sparse matrix-based mixed-precision applications. In this work, we present a highly optimized implementation of the HPG-MxP benchmark for an exascale system and describe our algorithm enhancements. We show for the first time a speedup of 1.6x using a…
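
The generic pattern behind mixed-precision solvers of this kind is iterative refinement: do the cheap inner solve in low precision, correct with a high-precision residual. A dense NumPy toy (not the benchmark's sparse GMRES) looks like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
A = rng.random((n, n)) + n * np.eye(n)       # well-conditioned test matrix
b = rng.random(n)
A32 = A.astype(np.float32)                   # low-precision copy

x = np.zeros(n)
for _ in range(5):
    r = b - A @ x                            # residual in FP64
    dx = np.linalg.solve(A32, r.astype(np.float32))  # FP32 inner solve
    x += dx                                  # correction accumulated in FP64
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # -> ~1e-15
```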

arXiv:2507.09337v1 Announce Type: new Abstract: Heterogeneity is the prevalent trend in the rapidly evolving high-performance computing (HPC) landscape in both hardware and application software. The diversity in hardware platforms, currently comprising various accelerators and a future possibility of specializable chiplets, poses a significant challenge for scientific software developers aiming to harness optimal performance across different computing platforms while maintaining the quality of solutions when their applications are simultaneously growing more complex. Code synthesis and code generation can provide mechanisms to mitigate this challenge. We have developed a toolchain, ORCHA, which arises from the needs of a large multiphysics simulation software, Flash-X, which were not met by any of the existing solutions. ORCHA is composed of three stand-alone tools -- one to express high-level control flow and a map of what to execute where on the platform, a second one to express…

Our Milky Way could have many more satellite galaxies than we've detected so far. They're just too faint to be seen.

Kenneth Merz, Ph.D., of Cleveland Clinic's Center for Computational Life Sciences and his team are exploring how quantum computers can work with supercomputers to better simulate molecule behavior.

Working in tandem, a quantum computer and a supercomputer modelled the behaviour of several molecules, paving the way for useful applications in chemistry and pharmaceutical research

Elon Musk's artificial intelligence startup has obtained an official permit to power its Memphis supercomputer facility with natural gas-burning turbines.

Researchers have used machine learning to dramatically speed up the processing time when simulating galaxy evolution coupled with supernova explosions. This approach could help us understand the origins of our own galaxy, particularly the elements essential for life in the Milky Way.

Researchers have successfully demonstrated quantum speedup in kernel-based machine learning.

Scientists have built a compact physical qubit with built-in error correction, and now say it could be scaled into a 1,000-qubit machine that is small enough to fit inside a data center. They plan to release this machine in 2031.

Using the now-decommissioned Summit supercomputer, researchers at the Department of Energy's Oak Ridge National Laboratory ran the largest and most accurate molecular dynamics simulations yet of the interface between water and air during a chemical reaction. The simulations have uncovered how water controls such chemical reactions by dynamically coupling with the molecules involved in the process.

Part of the company's plan involves the new IBM Quantum Nighthawk processor, which is set to be released later this year.

The company has unveiled new innovations in quantum hardware and software that researchers hope will make quantum computing both error-proof and useful before the end of the decade.

Brain-inspired computers could boost AI efficiency—a tantalizing prospect as the industry's energy bills mount. Sandia has fired up a brain-like supercomputer that can simulate 180 million neurons.

Physicists are always searching for new theories to improve our understanding of the universe and resolve big unanswered questions.

Merging neutron stars are excellent targets for multi-messenger astronomy. This modern and still very young method of astrophysics coordinates observations of the various signals from one and the same astrophysical source. When two neutron stars collide, they emit gravitational waves, neutrinos and radiation across the entire electromagnetic spectrum. To detect them, researchers need to add gravitational wave detectors and neutrino telescopes to ordinary telescopes that capture light.

The new supercomputer shows the increasing desire of government labs to adopt more technologies from commercial artificial intelligence systems.

arXiv:2411.16025v2 Announce Type: replace Abstract: Graph Convolutional Networks (GCNs), particularly for large-scale graphs, are crucial across numerous domains. However, training distributed full-batch GCNs on large-scale graphs suffers from inefficient memory access patterns and high communication overhead. To address these challenges, we introduce an efficient and scalable distributed GCN training framework tailored for CPU-powered supercomputers. Our contributions are threefold: (1) we develop general and efficient aggregation operators designed for irregular memory access, (2) we propose a hierarchical aggregation scheme that reduces communication costs without altering the graph structure, and (3) we present a communication-aware quantization scheme to enhance performance. Experimental results demonstrate that the framework achieves a speedup of up to 6x compared with the SoTA implementations, and scales to 1000s of HPC-grade CPUs on the largest publicly available…
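
A minimal sketch of communication-aware quantization, assuming a simple symmetric int8 scheme (the paper's actual scheme may differ): messages shrink 4x relative to FP32 at the cost of a bounded rounding error.

```python
import numpy as np

def quantize(x):
    """Symmetric linear quantization: FP32 -> int8 plus one scale."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

msg = np.random.randn(1_000_000).astype(np.float32)  # features to exchange
q, s = quantize(msg)                 # 4x fewer bytes on the wire
err = np.abs(msg - dequantize(q, s)).max()
print(err, "<=", s / 2)              # rounding error bounded by scale/2
```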

China has launched the first cluster of satellites for a planned AI supercomputer array. The first-of-its-kind array will enable scientists to perform in-orbit data processing.

A groundbreaking new supercomputer model shows how magnetic fields shape the turbulent flow of charged particles in space.

arXiv:2505.14796v1 Announce Type: new Abstract: As supercomputers grow in size and complexity, power efficiency has become a critical challenge, particularly in understanding GPU power consumption within modern HPC workloads. This work addresses this challenge by presenting a data co-analysis approach using system data collected from the Polaris supercomputer at Argonne National Laboratory. We focus on GPU utilization and power demands, navigating the complexities of large-scale, heterogeneous datasets. Our approach, which incorporates data preprocessing, post-processing, and statistical methods, condenses the data volume by 94% while preserving essential insights. Through this analysis, we uncover key opportunities for power optimization, such as reducing high idle power costs, applying power strategies at the job-level, and aligning GPU power allocation with workload demands. Our findings provide actionable insights for energy-efficient computing and offer a practical, reproducible…
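
Job-level condensation of raw telemetry might look like the following pandas sketch; the column names and thresholds are hypothetical, but the groupby-and-aggregate pattern is the kind of post-processing that shrinks such datasets while keeping the idle-power and utilization signals.

```python
import pandas as pd

# Hypothetical per-sample telemetry: (job, GPU utilization, power draw).
df = pd.DataFrame({
    "job_id":   [1, 1, 1, 2, 2, 2],
    "gpu_util": [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    "power_w":  [95, 410, 395, 110, 96, 97],
})

# Collapse per-sample rows into job-level statistics.
summary = df.groupby("job_id").agg(
    mean_power=("power_w", "mean"),
    idle_frac=("gpu_util", lambda u: (u < 0.05).mean()),
)
print(summary)   # job 2 is mostly idle yet still drawing ~100 W
```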

Scientists have connected two quantum computers, paving the way for distributed quantum computing, quantum supercomputers and a quantum internet.

Exascale computing can process over a quintillion (10^18) operations every second — enabling supercomputers to perform complex simulations that were previously impossible. But how does it work?
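
Back-of-the-envelope arithmetic shows what that buys, using the 2.746 exaFLOPS peak figure cited elsewhere in this digest and an assumed 100 GFLOPS laptop:

```python
ops = 1e21                  # a campaign needing 10^21 floating-point ops
laptop_flops = 1e11         # assumed ~100 GFLOPS laptop
exa_flops = 2.746e18        # El Capitan's peak, per the item below

print(ops / laptop_flops / 3.15e7, "years on the laptop")   # ~317 years
print(ops / exa_flops / 60, "minutes at exascale peak")     # ~6 minutes
```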

The UK will soon get more detailed weather forecasts and better predictions of rain.

Analyzing massive datasets from nuclear physics experiments can take hours or days to process, but researchers are working to radically reduce that time to mere seconds using special software being developed at the Department of Energy's Lawrence Berkeley and Oak Ridge national laboratories.

arXiv:2505.05623v1 Announce Type: new Abstract: We characterize the GPU energy usage of two widely adopted exascale-ready applications representing two classes of particle and mesh solvers: (i) QMCPACK, a quantum Monte Carlo package, and (ii) AMReX-Castro, an adaptive mesh astrophysical code. We analyze power, temperature, utilization, and energy traces from double-/single (mixed)-precision benchmarks on NVIDIA's A100 and H100 and AMD's MI250X GPUs using queries in NVML and the ROCm SMI library, respectively. We explore application-specific metrics to provide insights on energy vs. performance trade-offs. Our results suggest that mixed-precision energy savings range between 6-25% on QMCPACK and 45% on AMReX-Castro. There are still gaps in the AMD tooling on Frontier GPUs that need to be understood, while NVML query resolutions between 1 ms and 1 s show little variability. Overall, application-level knowledge is crucial to define energy-cost/science-benefit opportunities for the…
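
On an NVIDIA system, the kind of NVML power query the paper describes can be reproduced with the pynvml bindings (requires an NVIDIA GPU and the nvidia-ml-py package; the 1 s sampling interval here is an arbitrary choice):

```python
import time
import pynvml   # pip install nvidia-ml-py; requires an NVIDIA GPU

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

for _ in range(5):
    mw = pynvml.nvmlDeviceGetPowerUsage(gpu)              # milliwatts
    util = pynvml.nvmlDeviceGetUtilizationRates(gpu).gpu  # percent
    print(f"{mw / 1000:.1f} W at {util}% utilization")
    time.sleep(1.0)

pynvml.nvmlShutdown()
```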

arXiv:2505.04802v1 Announce Type: new Abstract: Sparse observations and coarse-resolution climate models limit effective regional decision-making, underscoring the need for robust downscaling. However, existing AI methods struggle with generalization across variables and geographies and are constrained by the quadratic complexity of Vision Transformer (ViT) self-attention. We introduce ORBIT-2, a scalable foundation model for global, hyper-resolution climate downscaling. ORBIT-2 incorporates two key innovations: (1) Residual Slim ViT (Reslim), a lightweight architecture with residual learning and Bayesian regularization for efficient, robust prediction; and (2) TILES, a tile-wise sequence scaling algorithm that reduces self-attention complexity from quadratic to linear, enabling long-sequence processing and massive parallelism. ORBIT-2 scales to 10 billion parameters across 32,768 GPUs, achieving up to 1.8 ExaFLOPS sustained throughput and 92-98% strong scaling efficiency. It supports…
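
Why does tiling make self-attention linear? A toy version (my own illustration, far simpler than ORBIT-2's TILES algorithm) restricts attention to fixed-size tiles, so each token attends to a constant number of neighbours:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def tiled_attention(q, k, v, tile=256):
    """Attention within non-overlapping tiles: the score matrix is
    (tile x tile) per tile, so total cost is O(n * tile), not O(n^2)."""
    n, d = q.shape
    out = np.empty_like(v)
    for s in range(0, n, tile):
        sl = slice(s, min(s + tile, n))
        scores = q[sl] @ k[sl].T / np.sqrt(d)
        out[sl] = softmax(scores) @ v[sl]
    return out

n, d = 4096, 64
q, k, v = (np.random.rand(n, d) for _ in range(3))
print(tiled_attention(q, k, v).shape)        # (4096, 64)
```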

A quantum computer can solve optimization problems faster than classical supercomputers, a capability known as "quantum advantage" that was demonstrated by a USC researcher in a paper recently published in Physical Review Letters.

arXiv:2504.18658v1 Announce Type: new Abstract: We evaluate the current state of collective communication on GPU-based supercomputers for large language model (LLM) training at scale. Existing libraries such as RCCL and Cray-MPICH exhibit critical limitations on systems such as Frontier -- Cray-MPICH underutilizes network and compute resources, while RCCL suffers from severe scalability issues. To address these challenges, we introduce PCCL, a communication library with highly optimized implementations of all-gather and reduce-scatter operations tailored for distributed deep learning workloads. PCCL is designed to maximally utilize all available network and compute resources and to scale efficiently to thousands of GPUs. It achieves substantial performance improvements, delivering 6-33x speedups over RCCL and 28-70x over Cray-MPICH for all-gather on 2048 GCDs of Frontier. These gains translate directly to end-to-end performance: in large-scale GPT-3-style training, PCCL provides up to…
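
The two collectives PCCL optimizes have simple semantics, simulated below in NumPy for four "ranks"; note that reduce-scatter followed by all-gather is exactly an all-reduce, which is why these two primitives dominate distributed deep-learning communication:

```python
import numpy as np

P, n = 4, 16                                  # 4 ranks, gradient length 16
bufs = [np.random.rand(n) for _ in range(P)]  # each rank's local gradient

# reduce-scatter: element-wise sum across ranks; rank r keeps slice r.
total = np.sum(bufs, axis=0)
shard = n // P
rs = [total[r * shard:(r + 1) * shard] for r in range(P)]

# all-gather: every rank receives the concatenation of all shards.
ag = np.concatenate(rs)

assert np.allclose(ag, total)                 # the pair == one all-reduce
```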

arXiv:2504.16026v1 Announce Type: new Abstract: Frontier AI development relies on powerful AI supercomputers, yet analysis of these systems is limited. We create a dataset of 500 AI supercomputers from 2019 to 2025 and analyze key trends in performance, power needs, hardware cost, ownership, and global distribution. We find that the computational performance of AI supercomputers has doubled every nine months, while hardware acquisition cost and power needs both doubled every year. The leading system in March 2025, xAI's Colossus, used 200,000 AI chips, had a hardware cost of $7B, and required 300 MW of power, as much as 250,000 households. As AI supercomputers evolved from tools for science to industrial machines, companies rapidly expanded their share of total AI supercomputer performance, while the share of governments and academia diminished. Globally, the United States accounts for about 75% of total performance in our dataset, with China in second place at 15%. If the observed…

Spending fight with Congress could force NSF to pull plug on fastest university-based machine

Nvidia's Blackwell AI chips have started production in Phoenix, Arizona, at Taiwan Semiconductor plants.

arXiv:2504.03632v1 Announce Type: new Abstract: The Aurora supercomputer is an exascale-class system designed to tackle some of the most demanding computational workloads. Equipped with both High Bandwidth Memory (HBM) and DDR memory, it provides unique trade-offs in performance, latency, and capacity. This paper presents a comprehensive analysis of the memory systems on the Aurora supercomputer, with a focus on evaluating the trade-offs between HBM and DDR memory systems. We explore how different memory configurations, including memory modes (Flat and Cache) and clustering modes (Quad and SNC4), influence key system performance metrics such as memory bandwidth, latency, CPU-GPU PCIe bandwidth, and MPI communication bandwidth. Additionally, we examine the performance of three representative HPC applications -- HACC, QMCPACK, and BFS -- each illustrating the impact of memory configurations on performance. By using microbenchmarks and application-level analysis, we provide insights into…
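
A crude, single-node proxy for such bandwidth comparisons is a STREAM-triad-style measurement; the sketch below times c = a + 3b in NumPy (rough numbers only; on a real system you would pin the arrays to HBM or DDR, for example with numactl, to compare the two):

```python
import time
import numpy as np

n = 20_000_000                                # three ~160 MB FP64 arrays
a, b, c = np.random.rand(n), np.random.rand(n), np.empty(n)

def triad():                                  # STREAM triad: c = a + 3*b
    np.multiply(b, 3.0, out=c)
    np.add(c, a, out=c)

triad()                                       # warm-up, touch all pages
t0 = time.perf_counter()
triad()
dt = time.perf_counter() - t0
print(f"~{5 * n * 8 / dt / 1e9:.1f} GB/s")    # ~5 array passes of traffic
```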

A new device enables remote entanglement, allowing distant quantum processors to communicate with one another with reduced error rates.

The new DGX machines are portable but powerful enough to drive complex AI models and research, with processing capabilities previously only available in data centers.

Each day, a human adult loses on average 50 to 70 billion cells, which die from natural causes alone. New cells replace lost ones by the complex process of cell division, which relies on what scientists call molecular machines to transport chemical cargo to where it is needed for reactions that keep us alive.

arXiv:2503.22981v1 Announce Type: new Abstract: Many extreme-scale applications require the movement of large quantities of data to, from, and among leadership computing facilities, as well as other scientific facilities and the home institutions of facility users. These applications, particularly when leadership computing facilities are involved, can touch upon edge cases (e.g., terabyte files) that had not been a focus of previous Globus optimization work, which had emphasized rather the movement of many smaller (megabyte to gigabyte) files. We report here on how automated client-driven chunking can be used to accelerate both the movement of large files and the integrity checking operations that have proven to be essential for large data transfers. We present detailed performance studies that provide insights into the benefits of these modifications in a range of file transfer scenarios.
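
Client-driven chunking pairs naturally with per-chunk integrity checks: hash each chunk separately so a terabyte transfer can be verified, and repaired, piecewise. A minimal sketch of the idea (my own, not Globus's protocol):

```python
import hashlib
import io

def chunk_checksums(stream, chunk_bytes):
    """One SHA-256 digest per chunk, so huge files can be verified
    (and re-sent) piecewise rather than as a single unit."""
    sums = []
    while chunk := stream.read(chunk_bytes):
        sums.append(hashlib.sha256(chunk).hexdigest())
    return sums

# Demo on an in-memory 'file'; real transfers would use ~MB-GB chunks.
src = chunk_checksums(io.BytesIO(b"x" * 10_000), chunk_bytes=4096)
dst = chunk_checksums(io.BytesIO(b"x" * 10_000), chunk_bytes=4096)
resend = [i for i, (s, d) in enumerate(zip(src, dst)) if s != d]
print(resend)   # [] -> nothing to retransmit
```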

Dawn, the AI supercomputer from Cambridge, accelerates breakthroughs in climate science, medical diagnostics, and clean energy with advanced computing power.

arXiv:2503.21415v1 Announce Type: new Abstract: The proceedings of the Workshop on Scientific HPC in the pre-Exascale era (SHPC), held in Pisa, Italy, September 18, 2024, are part of the 3rd Italian Conference on Big Data and Data Science (ITADATA2024) proceedings (arXiv:2503.14937). The main objective of the SHPC workshop was to discuss how the current most critical questions in HPC emerge in astrophysics, cosmology, and other scientific contexts and experiments. In particular, the SHPC workshop focused on:
- Scientific (mainly astrophysical and medical) applications toward (pre-)Exascale computing
- Performance portability
- Green computing
- Machine learning
- Big Data management
- Programming on heterogeneous architectures
- Programming on accelerators
- I/O techniques

The Aardvark Weather machine learning algorithm is much faster than traditional systems and can work on a desktop computer.

While earlier weather-forecasting AIs have replaced some tasks done by traditional models, new research uses machine learning to replace the entire process, making it much faster

arXiv:2503.09917v1 Announce Type: new Abstract: MareNostrum5 is a pre-exascale supercomputer at the Barcelona Supercomputing Center (BSC), part of the EuroHPC Joint Undertaking. With a peak performance of 314 petaflops, MareNostrum5 features a hybrid architecture comprising Intel Sapphire Rapids CPUs, NVIDIA Hopper GPUs, and DDR5 and high-bandwidth memory (HBM), organized into four partitions optimized for diverse workloads. This document evaluates MareNostrum5 through micro-benchmarks (floating-point performance, memory bandwidth, interconnect throughput), HPC benchmarks (HPL and HPCG), and application studies using Alya, OpenFOAM, and IFS. It highlights MareNostrum5's scalability, efficiency, and energy performance, utilizing the EAR (Energy Aware Runtime) framework to assess power consumption and the effects of direct liquid cooling. Additionally, HBM and DDR5 configurations are compared to examine memory performance trade-offs. Designed to complement standard technical…

This new superconducting prototype quantum processor achieved benchmarking results that rival Google's new Willow QPU.

Sunburns and aging skin are obvious effects of exposure to harmful UV rays, tobacco smoke and other carcinogens. But the effects aren't just skin deep. Inside the body, DNA is literally being torn apart.

arXiv:2503.07953v1 Announce Type: new Abstract: Engineering, medicine, and the fundamental sciences broadly rely on flow simulations, making performant computational fluid dynamics solvers an open source software mainstay. A previous work made MFC 3.0 a published open-source solver with many features. MFC 5.0 is a marked update to MFC 3.0, including a broad set of well-established and novel physical models and numerical methods and the introduction of GPU and APU (or superchip) acceleration. We exhibit state-of-the-art performance and ideal scaling on the first two exascale supercomputers, OLCF Frontier and LLNL El Capitan. Combined with MFC's single-GPU/APU performance, MFC achieves exascale computation in practice. With these capabilities, MFC has evolved into a tool for conducting simulations that many engineering challenge problems hinge upon. New physical features include the immersed boundary method, N-fluid phase change, Euler-Euler and Euler-Lagrange sub-grid bubble…

arXiv:2503.04428v1 Announce Type: cross Abstract: Numerical simulations of multidimensional astrophysical fluids present considerable challenges. However, the development of exascale computing has significantly enhanced computational capabilities, motivating the development of new codes that can take full advantage of these resources. In this article, we introduce HERACLES++, a new hydrodynamics code with high portability, optimized for exascale machines with different architectures and running efficiently both on CPUs and GPUs. The code is Eulerian and employs a Godunov finite-volume method to solve the hydrodynamics equations, which ensures accuracy in capturing shocks and discontinuities. It includes different Riemann solvers, equations of state, and gravity solvers. It works in Cartesian and spherical coordinates, either in 1-D, 2-D, or 3-D, and uses passive scalars to handle gases with several species. The code accepts a user-supplied heating or cooling term to treat a variety of…
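
The Godunov finite-volume skeleton such codes build on is easiest to see in 1-D on linear advection; this sketch uses a Rusanov (local Lax-Friedrichs) flux, one of the simpler Riemann-solver choices (illustrative only; HERACLES++ itself solves full hydrodynamics in up to 3-D):

```python
import numpy as np

# 1-D linear advection u_t + a*u_x = 0, Godunov-type finite volumes.
a, nx, cfl = 1.0, 200, 0.5
dx = 1.0 / nx
x = np.linspace(0.0, 1.0, nx)
u = np.where(np.abs(x - 0.3) < 0.1, 1.0, 0.0)   # square pulse

def rusanov(uL, uR):
    """Rusanov (local Lax-Friedrichs) interface flux."""
    return 0.5 * a * (uL + uR) - 0.5 * abs(a) * (uR - uL)

dt = cfl * dx / abs(a)
for _ in range(100):
    F = rusanov(u, np.roll(u, -1))      # flux at interface i+1/2 (periodic)
    u -= dt / dx * (F - np.roll(F, 1))  # conservative update
print(u.sum() * dx)                     # total 'mass' conserved (~0.2)
```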

For decades, scientists assumed that only large ocean temperature patterns covering 200 kilometers (124 miles) or more could strongly influence storms. Now, by leveraging advances in computing power, a team of scientists from UC San Diego's Scripps Institution of Oceanography, NASA Jet Propulsion Laboratory and NASA Goddard Space Flight Center have discovered that small-scale ocean processes can have a much larger influence on storm development than previously thought.

Zuchongzhi-3, a superconducting quantum computing prototype with 105 qubits and 182 couplers, has made significant advancements in random quantum circuit sampling. This prototype was successfully developed by a research team from the University of Science and Technology of China (USTC).

The mysterious Oort cloud is the source of many of our solar system's comets, but astronomers still have no idea what it looks like. Now, new simulations may have given them a first glimpse.

By combining digital and analog quantum simulation into a new hybrid approach, scientists have already started to make fresh scientific discoveries using quantum computers.

Frontier, the second-fastest supercomputer in the world, modeled the observable universe using not just gravity but also dark matter and the movement of gas and plasma.

arXiv:2502.08145v1 Announce Type: new Abstract: Training and fine-tuning large language models (LLMs) with hundreds of billions to trillions of parameters requires tens of thousands of GPUs, and a highly scalable software stack. In this work, we present a novel four-dimensional hybrid parallel algorithm implemented in a highly scalable, portable, open-source framework called AxoNN. We describe several performance optimizations in AxoNN to improve matrix multiply kernel performance, overlap non-blocking collectives with computation, and performance modeling to choose performance optimal configurations. These have resulted in unprecedented scaling and peak flop/s (bf16) for training of GPT-style transformer models on Perlmutter (620.1 Petaflop/s), Frontier (1.381 Exaflop/s) and Alps (1.423 Exaflop/s). While the abilities of LLMs improve with the number of trainable parameters, so do privacy and copyright risks caused by memorization of training data, which can cause disclosure of…
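
One ingredient of such hybrid parallelism is tensor (intra-layer) parallelism; the NumPy toy below column-shards a layer's weight matrix across four simulated ranks (a sketch of the general idea, not AxoNN's four-dimensional scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((32, 512))              # activations (batch, d_in)
W = rng.random((512, 1024))            # one layer's full weight matrix

P = 4                                  # simulated tensor-parallel ranks
shards = np.split(W, P, axis=1)        # each rank owns d_out/P columns
partials = [X @ Ws for Ws in shards]   # independent local matmuls
Y = np.concatenate(partials, axis=1)   # 'all-gather' of the outputs

assert np.allclose(Y, X @ W)           # identical to the unsharded result
```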

Extreme fire seasons in recent years highlight the urgent need to better understand wildfires within the broader context of climate change. Under climate change, many drivers of wildfires are expected to change, such as the amount of carbon stored in vegetation, rainfall, and lightning strikes.

Enabled by supercomputing, University of Pretoria (UP) researchers have led an international team of astronomers that has provided deeper insight into the entire life cycle (birth, growth and death) of giant radio galaxies, which resemble "cosmic fountains"—jets of superheated gas that are ejected into near-empty space from their spinning supermassive black holes.

Japan's Fugaku supercomputer has gained an edge following the installation of the Reimei quantum computer.

In real life, mutants can arise when their DNA changes to give them an advantage over the rest of the population. A team from the University of Michigan has used simulations on the Pittsburgh Supercomputing Center's Neocortex system to find out why beneficial mutants rarely come to dominate real organisms.

In a milestone that brings quantum computing tangibly closer to large-scale practical use, scientists have demonstrated the first instance of distributed quantum computing. Using a photonic network interface, they successfully linked two separate quantum processors to form a single, fully connected quantum computer, paving the way to tackling computational challenges previously out of reach.

In a milestone that brings quantum computing tangibly closer to large-scale practical use, scientists at Oxford University Physics have demonstrated the first instance of distributed quantum computing.

arXiv:2411.10406v2 Announce Type: cross Abstract: In the span of four decades, quantum computation has evolved from an intellectual curiosity to a potentially realizable technology. Today, small-scale demonstrations have become possible for quantum algorithmic primitives on hundreds of physical qubits and proof-of-principle error-correction on a single logical qubit. Nevertheless, despite significant progress and excitement, the path toward a full-stack scalable technology is largely unknown. There are significant outstanding quantum hardware, fabrication, software architecture, and algorithmic challenges that are either unresolved or overlooked. These issues could seriously undermine the arrival of utility-scale quantum computers for the foreseeable future. Here, we provide a comprehensive review of these scaling challenges. We show how the road to scaling could be paved by adopting existing semiconductor technology to build much higher-quality qubits, employing system engineering…

arXiv:2501.11962v1 Announce Type: new Abstract: Molecular dynamics (MD)-based path sampling algorithms are a very important class of methods used to study the energetics and kinetics of rare (bio)molecular events. They sample the highly informative but highly unlikely reactive trajectories connecting different metastable states of complex (bio)molecular systems. The metadynamics of paths (MoP) method proposed by Mandelli, Hirshberg, and Parrinello [Phys. Rev. Lett. 125, 026001 (2020)] is based on the Onsager-Machlup path integral formalism. This provides an analytical expression for the probability of sampling stochastic trajectories of given duration. In practice, the method samples reactive paths via metadynamics simulations performed directly in the phase space of all possible trajectories. Its parallel implementation is in principle infinitely scalable, allowing arbitrarily long trajectories to be simulated. Paving the way for future applications to study the thermodynamics and…
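
For overdamped Langevin dynamics dx = f(x) dt + sqrt(2D) dW, the Onsager-Machlup formalism weights a discretized path by exp(-S), with S proportional to the time integral of (x_dot - f)^2 / 4D. The sketch below evaluates that discretized action, omitting the divergence correction term and using a hypothetical double-well force of my own choosing:

```python
import numpy as np

def om_action(path, force, dt, D):
    """Discretized Onsager-Machlup action; path weight ~ exp(-S).
    The divergence (Jacobian) correction term is omitted here."""
    v = np.diff(path) / dt              # finite-difference path velocity
    f = force(path[:-1])
    return np.sum((v - f) ** 2) * dt / (4.0 * D)

# Hypothetical double-well example: V(x) = (x^2 - 1)^2, f = -V'(x).
force = lambda x: -4.0 * x * (x ** 2 - 1.0)
t = np.linspace(0.0, 1.0, 1001)
path = -1.0 + 2.0 * t ** 2              # a trial path from x=-1 to x=+1
print(om_action(path, force, dt=t[1] - t[0], D=0.5))
```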

The world's fastest supercomputer 'El Capitan' can reach a peak performance of 2.746 exaFLOPS, making it the planet's third exascale computer.

Nvidia's new Project Digits mini PC offers a petaFLOP of power for local AI processing and data science.

arXiv:2501.01628v1 Announce Type: new Abstract: 3D visualization and rendering in HPC are very heterogenous applications, though fundamentally the tasks involved are well-defined and do not differ much from application to application. The Khronos Group's ANARI standard seeks to consolidate 3D rendering across sci-vis applications. This paper makes an effort to convey challenges of 3D rendering and visualization with ANARI in the context of HPC, where the data does not fit within a single node or GPU but must be distributed. It also provides a gentle introduction to parallel rendering concepts and challenges to practitioners from the field of HPC in general. Finally, we present a case study showcasing data parallel rendering on the new supercomputer RAMSES at the University of Cologne.

arXiv:2412.15518v1 Announce Type: new Abstract: Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics, multi-model simulations, necessitating precise physics resolution in localized areas across expansive domains. Today's supercomputers' extreme heterogeneity presents a significant challenge for dynamically adaptive codes, highlighting the importance of achieving performance portability at scale. Our research focuses on astrophysical simulations, particularly stellar mergers, to elucidate early universe dynamics. We present Octo-Tiger, leveraging Kokkos, HPX, and SIMD for portable performance at scale in complex, massively parallel adaptive multi-physics simulations. Octo-Tiger supports diverse processors, accelerators, and network backends. Experiments demonstrate exceptional scalability across several heterogeneous supercomputers including Perlmutter, Frontier, and Fugaku, encompassing major GPU architectures and x86, ARM, and RISC-V CPUs. Parallel…

Modeling how cars deform in a crash, how spacecraft respond to extreme environments, or how bridges resist stress could be made thousands of times faster thanks to new artificial intelligence that enables personal computers to solve massive math problems that generally require supercomputers.

Google's new 105-qubit 'Willow' quantum processor has surpassed a key error-correction threshold first proposed in 1995 — with errors now reducing exponentially as you scale up quantum machines.
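
The scaling behind that claim is the textbook surface-code relation p_logical ~ (p/p_th)^((d+1)/2): below threshold, each increase in code distance d multiplies the suppression. Plugging in assumed numbers:

```python
# Textbook surface-code scaling: p_logical ~ (p / p_th)^((d + 1) / 2).
p, p_th = 1e-3, 1e-2        # assumed physical error rate and threshold
for d in (3, 5, 7):
    print(d, (p / p_th) ** ((d + 1) / 2))
# -> 1e-2, 1e-3, 1e-4: each step up in distance d cuts the logical
#    error rate by another factor of 10 while p stays below threshold.
```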

The simulations will be used by astronomers to test the standard model of cosmology.

Scientists at the Department of Energy's Argonne National Laboratory have created the largest astrophysical simulation of the Universe ever. They used what was until recently the world's most powerful supercomputer, Frontier, to simulate the Universe at an unprecedented scale. The simulation's size corresponds to the largest surveys conducted by powerful telescopes and observatories.

The universe just got a whole lot bigger—or at least in the world of computer simulations, that is. In early November, researchers at the Department of Energy's Argonne National Laboratory used the fastest supercomputer on the planet to run the largest astrophysical simulation of the universe ever conducted.

One mystery in planetary science is a satisfying origin story for Mars's moons, Phobos and Deimos. Were they chunks of Mars blasted into space by a meteor impact? Were they captured asteroids from the belt? A new supercomputer simulation found that a reasonable explanation could come from a massive asteroid passing just close enough to Mars that it was torn into pieces. Over time, chunks and debris would have settled into a disk around Mars and clumped into moons.

A NASA study using a series of supercomputer simulations reveals a potential new solution to a longstanding Martian mystery: How did Mars get its moons? The first step, the findings say, may have involved the destruction of an asteroid.

Researchers have used machine learning and supercomputer simulations to investigate how tiny gold nanoparticles bind to blood proteins. The studies discovered that favorable nanoparticle-protein interactions can be predicted from machine learning models that are trained from atom-scale molecular dynamics simulations. The new methodology opens ways to simulate efficacy of gold nanoparticles as targeted drug delivery systems in precision nanomedicine.

The vast computational power of the El Capitan supercomputer at Lawrence Livermore National Laboratory in California will be used to support the US nuclear deterrent

arXiv:2411.10637v1 Announce Type: new Abstract: Exascale computers offer transformative capabilities to combine data-driven and learning-based approaches with traditional simulation applications to accelerate scientific discovery and insight. However, these software combinations and integrations are difficult to achieve due to the challenges of coordinating and deploying heterogeneous software components on diverse and massive platforms. We present the ExaWorks project, which addresses many of these challenges. We developed a workflow Software Development Toolkit (SDK), a curated collection of workflow technologies that can be composed and interoperated through a common interface, engineered following current best practices, and specifically designed to work on HPC platforms. ExaWorks also developed PSI/J, a job management abstraction API, to simplify the construction of portable software components and applications that can be used over various HPC schedulers. The PSI/J API is a…

Researchers in the Nanoscience Center at the University of Jyväskylä, Finland, have used machine learning and supercomputer simulations to investigate how tiny gold nanoparticles bind to blood proteins. The studies discovered that favorable nanoparticle-protein interactions can be predicted from machine learning models that are trained from atom-scale molecular dynamics simulations. The new methodology opens ways to simulate the efficacy of gold nanoparticles as targeted drug delivery systems in precision nanomedicine.

Inside cells, there exists an extensive system of canals known as the endoplasmic reticulum (ER), which consists of membrane-encased tubes that are partially broken down as needed, for instance in the case of a nutrient deficiency. As part of this process, bulges or protrusions form in the membrane, which then pinch off and are recycled by the cell. A study has examined this protrusion process using computer simulations. Its finding: certain structural motifs of proteins in the ER membrane play a central role in this process.

Residents say Mr. Musk’s data center for artificial intelligence is compounding their pollution burden and adding stress on the local electrical grid.

arXiv:2410.18126v1 Announce Type: new Abstract: In the rapidly evolving domain of high-performance computing (HPC), heterogeneous architectures such as the SX-Aurora TSUBASA (SX-AT) system architecture, which integrate diverse processor types, present both opportunities and challenges for optimizing resource utilization. This paper investigates workload interference within an SX-AT system, with a specific focus on resource contention between Vector Hosts (VHs) and Vector Engines (VEs). Through comprehensive empirical analysis, the study identifies key factors contributing to performance degradation, such as cache and memory bandwidth contention, when jobs with varying computational demands share resources. To address these issues, we develop a predictive model that leverages hardware performance counters (HCs) and machine learning (ML) algorithms to classify and predict workload interference. Our results demonstrate that the model accurately forecasts performance degradation, offering…

Nuclear fission—when the nucleus of an atom splits in two, releasing energy—may seem like a process that is fully understood. First discovered in 1939 and thoroughly studied ever since, fission is a constant factor in modern life, used in everything from nuclear medicine to power-generating nuclear reactors. However, it is a force of nature that still contains mysteries yet to be solved.
