Supercomputers
arXiv:2605.26388v1 Announce Type: cross Abstract: We present MARUT, a scalable multi-GPU computational fluid dynamics (CFD) framework designed for high-fidelity simulations of compressible flows spanning subsonic to hypersonic regimes, including chemically reacting nonequilibrium flows with finite-rate chemistry and adaptive mesh refinement (AMR). The framework addresses a central challenge in contemporary scientific computing: the development of numerically accurate and computationally scalable algorithms capable of resolving strongly nonlinear, multiscale flow physics on emerging heterogeneous supercomputing architectures. Built around a distributed-memory MPI-parallel infrastructure and implemented natively on NVIDIA GPUs, MARUT combines high-order spectral discontinuous Galerkin discretisations with strong-stability-preserving Runge-Kutta time integration to achieve low-dissipation and high-resolution representation of shocks, vortical structures and reactive
arXiv:2605.26384v1 Announce Type: new Abstract: At global scale, data-center electricity demand is growing faster than the grids that supply it, while system operators increasingly require large flexible loads that can adjust power within seconds to absorb variable wind and solar generation. For multi-megawatt AI/HPC facilities, the key unresolved question is practical and measurable: how quickly can the software stack translate a grid request into a real change in GPU power at the facility meter, where commitments are settled? We answer this on real hardware with GridPilot, a three-tier predictive controller operating across milliseconds, seconds, and hours, augmented by a deterministic safety-island bypass for fast response. On a three-GPU NVIDIA V100 testbed, GridPilot achieves a measured end-to-end trigger-to-target response of 97.2 ms, which is 6.9x faster than the 700 ms requirement of Nordic Fast Frequency Reserve. We further incorporate an instantaneous Power Usage
arXiv:2605.24896v1 Announce Type: cross Abstract: Seasonal forecasting of summer rainfall in East Asia remains a grand challenge, as predictability at 3 to 6 month lead times is constrained by the spring predictability barrier, weak large-scale signals, and localized nonlinear convective extremes. We address this challenge with CAPES, which integrates a kilometer-resolution coupled regional model with atmosphere, land, and ocean components and a data-driven AI seasonal forecasting system. At 15 km resolution, the fused workflow combines 174 numerical members from varying start times, physics schemes, and parameter perturbations with 1,600 AI members generated from initial and physical perturbations. Using the full LineShine system, CAPES completes ten annual 1,774-member hindcasts for 2016 to 2025 within 14.6 hours, improving the mean prediction score from ECMWF's 71.8 to 75.9 and delivering a major gain in operational forecasting capability. The 1-km configuration further enables
arXiv:2605.25655v1 Announce Type: new Abstract: Large language model (LLM) inference is limited by high computational cost and memory bandwidth demands, making deployment on heterogeneous many-core processors challenging. Taking the MT-3000 processor used in the Tianhe supercomputer as an example, its limited main-memory bandwidth and distributed memory hierarchy exemplify these bottlenecks, making it difficult to directly migrate existing GPU-based inference frameworks. To address this problem, we propose THInfer, a hardware-aware inference framework that maximizes data locality under bandwidth-constrained conditions through hardware-software co-design and parallel strategy optimization. THInfer incorporates three key techniques: (1) a high-performance operator library for the VLIW SIMD architecture, providing hand-optimized FP16 kernels that achieve up to 70 percent of the peak performance per cluster; (2) a density-driven computation graph fusion and unified kernel scheduling
Scientists used some of the most advanced plasma simulations ever created to uncover how the universe builds enormous magnetic fields out of turbulence. The discovery could reshape our understanding of stars, black holes, neutron star collisions, and dangerous solar eruptions.
Researchers used the world’s fastest supercomputer for open science to train an artificial intelligence model that captures magnetic
arXiv:2605.21874v1 Announce Type: new Abstract: The project described in this paper explores the informative sonification of data received in real time from a supercomputer. These data capture the current activities in all the nodes of the computer, therefore, their sonification functions as a form of continuous monitoring of the nodes' behavior and, by extension, of the system as a whole. Because such monitoring is theoretically unending, the resulting sonification must be musically capable of conveying information through sound in a way that remains both intelligible and engaging over long durations. Rather than imposing a predefined musical style onto the data, we sought to identify one which the data themselves could plausibly support. From a small set of candidates, we selected EDM because it is a family of genres whose structural and temporal characteristics align well with continuous, data-driven processes and long-term listening. Through this style-based approach, this
Scientists in Germany have pulled off a staggering computing feat by fully simulating a 50-qubit quantum computer for the first time ever using Europe’s new exascale supercomputer, JUPITER. The breakthrough shatters the previous 48-qubit record and highlights just how powerful next-generation supercomputers have become.
arXiv:2605.04333v1 Announce Type: new Abstract: Tail latency dominates the performance of synchronous pretraining jobs when running at very large scales. We describe a three-pronged approach: (1) a new RDMA-based transport protocol, MRC, that sprays across many paths and actively load-balances between them, eliminating the issue of flow collisions; (2) the use of multi-plane Clos topologies to get the benefits of high switch radix and redundancy, allowing training clusters of well over 100K GPUs to be built as two-tier topologies while increasing physical redundancy; and (3) the use of static source routing with SRv6 to give MRC the freedom to bypass failures by itself. We describe our experiences running MRC and static SRv6 routing in production in OpenAI and Microsoft's largest training clusters, where it has been used to train the latest frontier models. We demonstrate how MRC allows AI training jobs to ride out many network failures that previously would have interrupted training.
arXiv:2605.03561v1 Announce Type: new Abstract: As exascale systems reach unprecedented concurrency, traditional performance analysis tools struggle with the overhead of massive-scale telemetry. We present an accelerated infrastructure for the hpcanalysis framework that leverages a high-performance C++ API and GPU parallelism to enable high-throughput diagnostics. Our C++ API achieves a 9.69-second ingestion time for 100,000 MPI ranks on Aurora. Furthermore, our GPU-accelerated layer achieves up to 314x speedup over CPU-based processing when analyzing 100,000 execution traces. Finally, we implement a topology-aware workflow that maps logical performance outliers to physical Slingshot interconnect coordinates, localizing network congestion across 22 distinct racks on Aurora. We also demonstrate how the framework's advanced interface seamlessly integrates with external tools to provide sophisticated analytical models. We introduce a novel tri-dimensional performance model that
arXiv:2605.00426v1 Announce Type: new Abstract: Understanding HPC facilities users' behaviors and how computational resources are requested and utilized is not only crucial for the cluster productivity but also essential for designing and constructing future exascale HPC systems. This paper tackles Challenge 4, 'Analyzing Resource Utilization and User Behavior on Titan Supercomputer', of the 2021 Smoky Mountains Conference Data Challenge. Specifically, we dig deeper inside the records of Titan to discover patterns and extract relationships. This paper explores the workload distribution and usage patterns from resource manager system logs, GPU traces, and scientific areas information collected from the Titan supercomputer. Furthermore, we want to know how resource utilization and user behaviors change over time. Using data science methods, such as correlations, clustering, or neural networks, our findings allow us to investigate how projects, jobs, nodes, GPUs and memory are related.
arXiv:2604.22571v1 Announce Type: new Abstract: Large language models (LLMs) and agentic systems have recently demonstrated potential for automating scientific workflows, including atomistic simulations. However, their deployment in high-performance computing (HPC) environments remains limited by the lack of mechanisms ensuring correctness, reproducibility, and safe interaction with computational resources. Generated workflows suffer from inconsistencies, incorrect API usage, or invalid physical configurations - leading to failed or unreliable simulations. In this work, we introduce LARA-HPC, a validation-driven agentic framework to enable reliable workflow generation for atomistic modeling on HPC systems. Our approach is based on three key components: (i) a controlled execution layer that mediates all interactions with HPC resources; (ii) simulation-native validation through dry-run capabilities, enabling execution-level verification without incurring resource cost; and (iii) a
arXiv:2604.15380v1 Announce Type: cross Abstract: We present an exascale workflow for materials discovery using atomistic graph foundation models built on HydraGNN. We jointly train on 16 open first-principles datasets (544+ million structures covering 85+ elements) using a multi-task architecture with per-dataset heads and a scalable ADIOS2/DDStore data pipeline. On Frontier, we execute six large-scale DeepHyper hyperparameter optimization campaigns in FP64 and promote the top-performing message-passing models to sustained 2,048-node training, yielding a PaiNN-based lead model. The resulting model enables billion-scale screening, evaluating 1.1 billion atomistic structures in 50 seconds, compressing a workload that would require years of first-principles computation, and supports data-scarce fine-tuning across diverse downstream tasks. We quantify precision-performance tradeoffs (BF16/FP32/FP64), demonstrate transfer across twelve chemically diverse downstream tasks, and establish
Aluminum die-cast components are widely used in automotive and precision machinery applications due to their combination of low weight and structural strength. However, internal defects known as porosity—voids formed by entrapped air during casting—remain a persistent challenge. These defects are difficult to detect through external inspection and can compromise mechanical integrity and long-term reliability.
arXiv:2604.09517v1 Announce Type: new Abstract: Sustaining exascale performance in production requires engineering choices and operational practices that emerge only under real deployment constraints and demand coordination across system layers. This paper reports experience from three successive campaigns running HPL and HPL-MxP on Aurora, an Intel-based exascale system featuring the first large-scale deployment of Intel discrete GPUs, CPU-attached network interfaces, and the largest production Slingshot-11 interconnect. Aurora progressed from 0.585EF/s on 5,439 nodes to 1.01EF/s on 9,234 nodes in FP64 HPL, while HPL-MxP reached 11.64EF/s, an 11.5x speedup over FP64 enabled by mixed-precision arithmetic and Intel AMX acceleration. We identify and classify by role at production scale the system-level choices that sustained these results, including deterministic locality-aware resource mapping, explicit CPU-GPU pipelining, mixed-precision orchestration, and a hybrid P2P/collective
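The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (the variable and function names are ours, not the paper's) to derive the average FP64 rate per node for the two HPL runs and the ratio of HPL-MxP to FP64 HPL.

```python
# Figures taken from the abstract above; all names are illustrative only.
EF = 1e18  # one exaflop/s expressed in flop/s

runs = {
    "earlier": {"rate_ef": 0.585, "nodes": 5439},
    "later": {"rate_ef": 1.01, "nodes": 9234},
}
hpl_mxp_ef = 11.64


def per_node_tflops(rate_ef, nodes):
    """Average sustained FP64 rate per node, in teraflop/s."""
    return rate_ef * EF / nodes / 1e12


per_node = {name: per_node_tflops(**run) for name, run in runs.items()}
mxp_speedup = hpl_mxp_ef / runs["later"]["rate_ef"]

for name, tflops in per_node.items():
    print(f"{name} run: {tflops:.1f} TF/s per node")
print(f"HPL-MxP vs FP64 HPL: {mxp_speedup:.1f}x")
```

Under these numbers the per-node rate stays roughly constant between the two runs, which is consistent with the near-linear scaling the abstract implies, and the mixed-precision ratio matches the quoted 11.5x.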
arXiv:2604.06056v1 Announce Type: new Abstract: Modern exascale GPU- and APU-based systems provide multiple power and energy sensors, but differences in scope, update rate, timing, and filtering complicate the attribution of short-lived accelerator activity. This paper presents a methodology to characterize and correct these effects on Cray EX systems with AMD Instinct MI250X GPUs (Frontier) and MI300A APUs (Portage). Using controlled square-wave workloads, we quantify update intervals, delay, aliasing, and variability across up to 512 GPUs and 480 APUs with on-chip (rocm-smi/amd-smi) and off-chip Cray Power Management sensors. We reconstruct power from cumulative energy counters to achieve faster response times, validate it against on-chip, off-chip, and node-level sensors, and integrate the resulting streams into a Score-P/PAPI-based tool for time-aligned, phase-level attribution. Applied to rocHPL, rocHPL-MxP, and HPG-MxP, the method separates energy savings due to reduced runtime
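The idea of reconstructing power from a cumulative energy counter can be illustrated with a simple finite difference. The following is a minimal sketch, not the paper's method: the function name and the sample readings are hypothetical, and it ignores counter wraparound, sensor delay, and filtering, which the abstract identifies as the hard part.

```python
def power_from_energy(samples):
    """Estimate average power (W) over each interval between consecutive
    (timestamp_s, cumulative_energy_J) samples using finite differences."""
    powers = []
    for (t0, e0), (t1, e1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip repeated or out-of-order timestamps
        powers.append((t1, (e1 - e0) / dt))
    return powers


# Hypothetical readings: a square wave of 100 W for 1 s, then 300 W for 1 s.
readings = [(0.0, 0.0), (0.5, 50.0), (1.0, 100.0), (1.5, 250.0), (2.0, 400.0)]
estimates = power_from_energy(readings)
for timestamp, watts in estimates:
    print(f"t={timestamp:.1f}s  P={watts:.0f} W")
```

Because each estimate depends only on two adjacent counter readings, the response time is bounded by the counter's update interval rather than by any averaging window applied to a reported power value.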
arXiv:2603.27125v1 Announce Type: new Abstract: Supercomputers are complex, dynamic systems that serve thousands of users and are built with thousands of compute nodes. Due to the vast amounts of system and performance data needed to accurately capture their status, supercomputers require complex methods to monitor, maintain, and optimize. Data visualization is a powerful technique for overseeing these large streams of data in an easily interpretable way. The MIT Lincoln Laboratory Supercomputing Center (LLSC) enables effective monitoring through combining 3D gaming technology with compound data streams in the TX-Digital Twin, a 3D simulation of the supercomputer. The TX-Digital Twin offers both live and historical data, in visual and text formats, and tracks a multitude of revealing performance metrics. Recent increasing interest in GPU-accelerated computing has driven a need for monitoring and maintenance of GPU-accelerated resources in supercomputers. In this paper, we build on
arXiv:2603.26411v1 Announce Type: new Abstract: Predicting how protein mutations affect drug binding remains a major challenge, particularly when the mutations are distal from the binding site. In this study, we introduce a coupled simulation workflow that combines long-time-scale molecular dynamics (MD) with high-throughput quantum mechanical (QM) analysis to reveal the electronic structure signatures of mutation induced drug resistance in the HIV-1 protease. Our workflow leverages GPU-accelerated MD to generate conformational ensembles, and performs in-operando linear-scaling density functional theory (DFT) calculations on selected frames parallelized on a coupled partition of CPU nodes. This design enables efficient, massively parallel quantum analysis of protein-ligand complexes at atomic resolution. Using this approach, we investigate resistance to the antiviral Darunavir in a multi-mutant HIV-1 protease variant. By mapping the network of electronic interactions across the
A new study from the Italian Institute of Technology (IIT), in collaboration with Uppsala University (Sweden) and AstraZeneca, shows how computational chemistry and supercomputers can help scientists better understand the fundamental mechanisms of life, specifically those of human cells. This research was conducted by the Molecular Modeling and Drug Discovery Unit, led by Marco De Vivo at IIT in Genoa, and was published in the journal Proceedings of the National Academy of Sciences.
arXiv:2603.24508v1 Announce Type: new Abstract: Particle-in-Cell (PIC) Monte Carlo (MC) simulations are central to plasma physics but face increasing challenges on heterogeneous HPC systems due to excessive data movement, synchronization overheads, and inefficient utilization of multiple accelerators. In this work, we present a portable, multi-GPU hybrid MPI+OpenMP implementation of BIT1 that enables scalable execution on both Nvidia and AMD accelerators through OpenMP target tasks with explicit dependencies to overlap computation and communication across devices. Portability is achieved through persistent device-resident memory, an optimized contiguous one-dimensional data layout, and a transition from unified to pinned host memory to improve large data-transfer efficiency, together with GPU Direct Memory Access (DMA) and runtime interoperability for direct device-pointer access. Standardized and scalable I/O is provided using openPMD and ADIOS2, supporting high-performance file I/O,
Astronomers have finally cracked a decades-old mystery about red giant stars—how material from their deep interiors makes its way to the surface. Using cutting-edge supercomputer simulations, researchers discovered that stellar rotation plays a powerful role in mixing elements across a previously unexplained barrier inside the star.
arXiv:2603.19544v1 Announce Type: new Abstract: Artificial Intelligence for scientific applications increasingly requires training large models on data that cannot be centralized due to privacy constraints, data sovereignty, or the sheer volume of data generated. Federated learning (FL) addresses this by enabling collaborative training without centralizing raw data, but scientific applications demand model scales that require extensive computing resources, typically offered at High Performance Computing (HPC) facilities. Deploying FL experiments across HPC facilities introduces challenges beyond cloud or enterprise settings. We present a comprehensive cross-facility FL framework for heterogeneous HPC environments, built on the Advanced Privacy-Preserving Federated Learning (APPFL) framework with Globus Compute and Transfer orchestration, and evaluate it across four U.S. Department of Energy (DOE) leadership-class supercomputers. We demonstrate that FL experiments across HPC facilities
Carbon forms the graphite in pencils, the diamonds in jewelry and the molecules that make up every living thing. But under extreme conditions—like the heat and pressure of intense explosions—carbon can transform into exotic nanometer-sized structures called nanocarbons. These materials are often stronger than steel, lighter than plastic and adaptable for uses in medicine, energy and national security.
A neutron star merger is an extraordinary event. It features extremely powerful, chaotic magnetic fields that generate extremely energetic photons. Supercomputer simulations show that the extreme gamma-ray photons created in the mayhem can't even escape the chaos.
arXiv:2603.10970v1 Announce Type: new Abstract: Quantum computers have demonstrated utility in simulating quantum systems beyond brute-force classical approaches. As the community builds on these demonstrations to explore using quantum computing for applied research, algorithms and workflows have emerged that require leveraging both quantum computers and classical high-performance computing (HPC) systems to scale applications, especially in chemistry and materials, beyond what either system can simulate alone. Today, these disparate systems operate in isolation, forcing users to manually orchestrate workloads, coordinate job scheduling, and transfer data between systems -- a cumbersome process that hinders productivity and severely limits rapid algorithmic exploration. These challenges motivate the need for flexible and high-performance Quantum-Centric Supercomputing (QCSC) systems that integrate Quantum Processing Units (QPUs), Graphics Processing Units (GPUs), and Central Processing
Large protein machines in the body carry out many of the cell's most essential tasks, from energy production to the regulation of signal transmission. Although they can now be imaged in great detail using cryo-electron microscopy, it has long been difficult to understand how these complexes actually move and function. Researchers at Karolinska Institutet have now developed a computational method capable of simulating the movements of some of the cell's largest protein complexes.
The portable computing powerhouse is capable of running 120-billion-parameter LLMs, roughly three times larger than GPT-3, without needing to access the internet or the cloud.
Have you ever stopped to wonder how forecasters can predict the weather days in advance, or how scientists figure out how the climate might evolve under different policies?
Researchers at University of Victoria's Astronomy Research Centre (ARC) and the University of Minnesota study the changes in the chemical composition at the surface of red giant stars.
Advances in supercomputing have made solving a long‐standing astronomical conundrum possible: How can we explain the changes in the chemical composition at the surface of red giant stars as they evolve?
Researchers used a pair of powerful supercomputers to simulate the potential trajectories of 1 million satellites in a cislunar orbit between Earth and the moon. Less than 10% of these orbits remained stable throughout the simulations, but this is not as disastrous as it may sound.
In a long-running collaboration with GE Aerospace, researchers at the University of Melbourne in Australia have been steadily
arXiv:2602.13789v1 Announce Type: new Abstract: As cloud computing scales toward the Exascale regime ($10^5+$ nodes), the prevailing "Newtonian" orchestration paradigm -- exemplified by Kubernetes -- approaches fundamental physical limits. The centralized, deterministic scheduling model suffers from $O(N)$ latency scaling, "Head-of-Line" blocking, and thermodynamic blindness, rendering it incapable of managing the stochastic chaos of next-generation AI workloads. This paper proposes a paradigm shift from orchestration to Thermodynamic Governance. We model the compute cluster not as a static state machine, but as a Dissipative Structure far from equilibrium. We introduce TEG (Thermo-Economic Governor), a decentralized architecture that establishes a rigorous topological isomorphism between cluster resource contention and many-body physics. TEG replaces the global scheduler with Langevin Agents that execute Brownian motion on a Holographic Potential Field, reducing decision complexity
The National Science Foundation said management of the machine, used by researchers for forecasts, disaster warnings and pure science, would be transferred to a “third-party operator.”
Using the Frontier supercomputer at the Department of Energy's Oak Ridge National Laboratory, researchers from the Georgia Institute of Technology have performed the largest direct numerical simulation (DNS) of turbulence in three dimensions, attaining a record resolution of 35 trillion grid points. Tackling such a complex problem required the exascale (1 billion billion or more calculations per second) capabilities of Frontier, the world's most powerful supercomputer for open science.
A light has emerged at the end of the tunnel in the long pursuit of developing quantum computers, which are expected to radically reduce the time needed to perform some complex calculations from thousands of years down to a matter of hours.
arXiv:2601.17606v1 Announce Type: new Abstract: Performant all-to-all collective operations in MPI are critical to fast Fourier transforms, transposition, and machine learning applications. There are many existing implementations for all-to-all exchanges on emerging systems, with the achieved performance dependent on many factors, including message size, process count, architecture, and parallel system partition. This paper presents novel all-to-all algorithms for emerging many-core systems. Further, the paper presents a performance analysis against existing algorithms and system MPI, with novel algorithms achieving up to 3x speedup over system MPI at 32 nodes of state-of-the-art Sapphire Rapids systems.
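For readers unfamiliar with the collective itself, the sketch below simulates an all-to-all exchange in plain Python, using the textbook pairwise-exchange schedule in which each rank talks to exactly one partner per step. This is an illustration of what the operation does, not one of the novel algorithms from the paper; the function name is ours, no MPI library is involved, and the sketch assumes a power-of-two rank count.

```python
def alltoall_pairwise(send):
    """Simulate an all-to-all among P ranks (P must be a power of two).

    send[r][d] is the block that rank r wants to deliver to rank d.
    Returns recv, where recv[r][s] is the block rank r received from rank s.
    """
    p = len(send)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two rank count")
    recv = [[None] * p for _ in range(p)]
    for r in range(p):
        recv[r][r] = send[r][r]  # local copy, no communication needed
    for step in range(1, p):
        for r in range(p):
            partner = r ^ step  # XOR pairing: one distinct partner per step
            recv[r][partner] = send[partner][r]
    return recv


p = 4
send = [[f"{src}->{dst}" for dst in range(p)] for src in range(p)]
recv = alltoall_pairwise(send)
for rank, blocks in enumerate(recv):
    print(rank, blocks)
```

The schedule takes P - 1 communication steps, and every rank sends and receives one block per step. Real implementations choose between schedules like this one and alternatives depending on message size and process count, which are among the factors the abstract lists.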
From computers to smartphones, from smart appliances to the internet itself, the technology we use every day exists only thanks to decades of improvements in the semiconductor industry that have allowed engineers to keep miniaturizing transistors and fitting more and more of them onto integrated circuits, or microchips. This is the famous Moore's scaling law: the observation, rather than an actual law, that the number of transistors on an integrated circuit tends to double roughly every two years.
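The doubling rule described here corresponds to exponential growth, N(t) = N0 * 2^(t / T), with a doubling period T of about two years. A minimal sketch follows; the function name and the starting count of 1,000 transistors are hypothetical and chosen only to keep the numbers easy to read.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Project a transistor count forward, assuming it doubles once every
    `doubling_period_years` (an idealised reading of the observation)."""
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting point of 1,000 transistors.
after_10_years = projected_transistors(1_000, 10)
after_20_years = projected_transistors(1_000, 20)
print(f"after 10 years: {after_10_years:,.0f}")
print(f"after 20 years: {after_20_years:,.0f}")
```

Five doublings in a decade give a 32-fold increase and ten doublings in two decades give a 1,024-fold increase, which shows how quickly the trend compounds.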
Picture a Northern California vineyard, rows of grapevines bathed in morning fog, workers hand-thinning vines, exposing them to
The world’s most powerful supercomputers can now run simulations of billions of neurons, and researchers hope such models will offer unprecedented insights into how our brains work
A preliminary analysis suggests that industrially useful quantum computer designs come with a broad spectrum of energy footprints, including some larger than those of the most powerful existing supercomputers
arXiv:2512.21697v1 Announce Type: new Abstract: Modern heterogeneous high-performance computing (HPC) systems powered by advanced graphics processing unit (GPU) architectures enable accelerating computing with unprecedented performance and scalability. Here, we present a GPU-accelerated solver for the three-dimensional (3D) time-dependent Dirac equation optimized for distributed HPC systems. The solver named GaDE is designed to simulate the electron dynamics in atoms induced by electromagnetic fields in the relativistic regime. It combines MPI with CUDA/HIP to target both NVIDIA and AMD GPU architectures. We discuss our implementation strategies in which most of the computations are carried out on GPUs, taking advantage of the GPU-aware MPI feature to optimize communication performance. We evaluate GaDE on the pre-exascale supercomputer LUMI, powered by AMD MI250X GPUs and HPE's Slingshot interconnect. Single-GPU performance on NVIDIA A100, GH200, and AMD MI250X shows comparable
arXiv:2512.18883v1 Announce Type: cross Abstract: High Performance Computing (HPC) based simulations are crucial in Astrophysics and Cosmology (A&C), helping scientists investigate and understand complex astrophysical phenomena. Taking advantage of exascale computing capabilities is essential for these efforts. However, the unprecedented architectural complexity of exascale systems impacts legacy codes. The SPACE Centre of Excellence (CoE) aims to re-engineer key astrophysical codes to tackle new computational challenges by adopting innovative programming paradigms and software (SW) solutions. SPACE brings together scientists, code developers, HPC experts, hardware (HW) manufacturers, and SW developers. This collaboration enhances exascale A&C applications, promoting the use of exascale and post-exascale computing capabilities. Additionally, SPACE addresses high-performance data analysis for the massive data outputs from exascale simulations and modern observations, using machine
As Earth continues to warm, Australia faces some important decisions.
arXiv:2512.07401v1 Announce Type: new Abstract: Otus is a high-performance computing cluster that was launched in 2025 and is operated by the Paderborn Center for Parallel Computing (PC2) at Paderborn University in Germany. The system is part of the National High Performance Computing (NHR) initiative. Otus complements the previous supercomputer Noctua 2, offering approximately twice the computing power while retaining the three node types that were characteristic of Noctua 2: 1) CPU compute nodes with different memory capacities, 2) high-end GPU nodes, and 3) HPC-grade FPGA nodes. On the Top500 list, which ranks the 500 most powerful supercomputers in the world, Otus is in position 164 with the CPU partition and in position 255 with the GPU partition (June 2025). On the Green500 list, ranking the 500 most energy-efficient supercomputers in the world, Otus is in position 5 with the GPU partition (June 2025). This article provides a comprehensive overview of the system in terms of
Physicists have transformed a decades-old technique for simplifying quantum equations into a reusable, user-friendly "conversion table" that works on a laptop and returns results within hours.
arXiv:2512.03914v1 Announce Type: new Abstract: Efficient simulation of complex plasma dynamics is crucial for advancing fusion energy research. Particle-in-Cell (PIC) Monte Carlo (MC) simulations provide insights into plasma behavior, including turbulence and confinement, which are essential for optimizing fusion reactor performance. Transitioning to exascale simulations introduces significant challenges, with traditional file input/output (I/O) inefficiencies remaining a key bottleneck. This work advances BIT1, an electrostatic PIC MC code, by improving the particle mover with OpenMP task-based parallelism, integrating the openPMD streaming API, and enabling in-memory data streaming with ADIOS2's Sustainable Staging Transport (SST) engine to enhance I/O performance, computational efficiency, and system storage utilization. We employ profiling tools such as gprof, perf, IPM and Darshan, which provide insights into computation, communication, and I/O operations. We implement
New York aims to democratize access to hardware often limited to federal labs and Big Tech
A.I. has added urgency to the U.S. national laboratories that have been sites of cutting-edge scientific research, leading to deals with tech giants like Nvidia to speed up.
Researchers created scalable quantum circuits capable of simulating fundamental nuclear physics on more than 100 qubits. These circuits efficiently prepare complex initial states that classical computers cannot handle. The achievement demonstrates a new path toward simulating particle collisions and extreme forms of matter. It may ultimately illuminate long-standing cosmic mysteries.
Researchers have created one of the most detailed virtual mouse cortex simulations ever achieved by combining massive biological datasets with the extraordinary power of Japan’s Fugaku supercomputer. The digital brain behaves like a living system, complete with millions of neurons and tens of billions of synapses, giving scientists the ability to watch diseases like Alzheimer’s or epilepsy unfold step by step. The project opens a new path for studying brain function, tracking how damage spreads across neural circuits, and testing ideas that once required countless experiments on real tissue.
Cutting-edge simulations show that Enceladus’ plumes are losing 20–40% less mass than earlier estimates suggested. The new models provide sharper insights into subsurface conditions that future landers may one day probe directly.
A broad association of researchers from across Lawrence Berkeley National Laboratory (Berkeley Lab) and the University of California, Berkeley have collaborated to perform an unprecedented simulation of a quantum microchip, a key step forward in perfecting the chips required for this next-generation technology. The simulation used more than 7,000 NVIDIA GPUs on the Perlmutter supercomputer at the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy (DOE) user facility.
The Fugaku supercomputer built a highly detailed virtual mouse cortex with millions of neurons, enabling unprecedented simulations of brain function and disease and marking a major step toward full-brain digital models. The post Supercomputer-Driven Simulation Creates Near-Cellular Digital Mouse Cortex appeared first on GEN - Genetic Engineering and Biotechnology News.
arXiv:2511.11542v1 Announce Type: cross Abstract: Simulation of physical systems is essential in many scientific and engineering domains. Commonly used domain decomposition methods are unable to deliver a high simulation rate or high utilization in network computing environments. In particular, Exascale systems deliver only a small fraction of their peak performance for these workloads. This paper introduces the novel \algorithmpropernoun{} algorithm, designed to overcome these limitations. We apply this method and show simulations running in excess of 1.6 million time steps per second and simulations achieving 84 PFLOP/s. Our implementation can achieve 90% of peak performance in both single-node and clustered environments. We illustrate the method by applying the shallow-water equations to model a tsunami following an asteroid impact at 460 m resolution on a planetary scale, running on a cluster of 64 Cerebras CS-3 systems.
Aalto University researchers have developed a method to execute AI tensor operations using just one pass of light. By encoding data directly into light waves, they enable calculations to occur naturally and simultaneously. The approach works passively, without electronics, and could soon be integrated into photonic chips. If adopted, it promises dramatically faster and more energy-efficient AI systems.
arXiv:2511.10159v1 Announce Type: new Abstract: The demand for computing in our daily lives has led to the proliferation of datacenters that power many indispensable services. At the same time, computing has become essential to research in various scientific fields, which requires supercomputers with vast computing capabilities to produce results in a reasonable time. Comparing the scale and complexity of these systems to our day-to-day devices is like comparing a living organism to a cell. To make them work properly, we need state-of-the-art technology and engineering, not just raw resources. Interconnecting the different computer nodes that make up a whole is a delicate task, as the interconnect can become the bottleneck for the whole infrastructure. In this work, we explore two aspects of the network: how to prevent degradation under heavy use with congestion control, and how to save energy when idle with power management, as well as how the two may interact.
The dream of creating game-changing quantum computers—supermachines that encode information in single atoms rather than conventional bits—has been hampered by the formidable challenge known as quantum error correction.
arXiv:2511.05149v1 Announce Type: new Abstract: Over the past decade, supercomputers and data centers have evolved dramatically to cope with the increasing performance requirements of applications and services, such as scientific computing, generative AI, social networks or cloud services. This evolution has led these systems to incorporate high-speed networks using faster links, end nodes using multiple dedicated accelerators, and advancements in memory technologies to bridge the memory bottleneck. The interconnection network is a key element in these systems, and it must be thoroughly designed so that it is not the bottleneck of the entire system, bearing in mind the countless communication operations that current applications and services generate. Congestion is a serious threat that spoils the interconnection network performance, and its effects are even more dramatic when looking at the traffic dynamics and bottlenecks generated by the communication operations mentioned above. In
From Cassini’s awe-inspiring flybys to cutting-edge simulations, scientists are decoding the secrets of Enceladus’s geysers. Supercomputer models show the icy moon’s plumes lose less mass than expected, refining our understanding of its mysterious interior. These discoveries could shape future missions that may one day explore its subsurface ocean—and perhaps even detect life below the ice.
arXiv:2511.03359v1 Announce Type: cross Abstract: We have developed a new version of the high-performance Jülich universal quantum computer simulator (JUQCS-50) that leverages key features of the GH200 superchips as used in the JUPITER supercomputer, enabling simulations of a 50-qubit universal quantum computer for the first time. JUQCS-50 achieves this through three key innovations: (1) extending usable memory beyond GPU limits via high-bandwidth CPU-GPU interconnects and LPDDR5 memory; (2) adaptive data encoding to reduce memory footprint with acceptable trade-offs in precision and compute effort; and (3) an on-the-fly network traffic optimizer. These advances result in an 11.4-fold speedup over the previous 48-qubit record on the K computer.
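To give a sense of why memory is the limiting factor in the item above, here is a rough sketch of the footprint of a full state vector, which holds 2^n complex amplitudes for n qubits. The 16 bytes per amplitude (double-precision complex) is an assumption for illustration; the abstract does not state the encoding JUQCS-50 uses, and the function name is hypothetical.

```python
PIB = 2 ** 50  # bytes in one pebibyte


def state_vector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to store all 2**num_qubits complex amplitudes.

    bytes_per_amplitude=16 assumes double-precision complex numbers;
    this is an illustrative assumption, not a figure from the abstract.
    """
    return (2 ** num_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (48, 50):
        print(f"{n} qubits: {state_vector_bytes(n) / PIB:.0f} PiB")
```

Under that assumption, each added qubit doubles the footprint, so moving from 48 to 50 qubits quadruples it. A more compact encoding can be explored by passing a smaller `bytes_per_amplitude`.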
arXiv:2511.04677v1 Announce Type: new Abstract: The rapid growth of data-intensive applications such as generative AI, scientific simulations, and large-scale analytics is driving modern supercomputers and data centers toward increasingly heterogeneous and tightly integrated architectures. These systems combine powerful CPUs and accelerators with emerging high-bandwidth memory and storage technologies to reduce data movement and improve computational efficiency. However, as the number of accelerators per node increases, communication bottlenecks emerge both within and between nodes, particularly when network resources are shared among heterogeneous components.
arXiv:2511.00224v1 Announce Type: cross Abstract: Quantum computers must operate in concert with classical computers to deliver on the promise of quantum advantage for practical problems. To achieve that, it is important to understand how quantum and classical computing can interact together, and how one can characterize the scalability and efficiency of hybrid quantum-classical workflows. So far, early experiments with quantum-centric supercomputing workflows have been limited in scale and complexity. Here, we use a Heron quantum processor deployed on premises with the entire supercomputer Fugaku to perform the largest computation of electronic structure involving quantum and classical high-performance computing. We design a closed-loop workflow between the quantum processors and 152,064 classical nodes of Fugaku, to approximate the electronic structure of chemistry models beyond the reach of exact diagonalization, with accuracy comparable to some all-classical approximation methods.
The nuclear reactions that fuel the sun could soon be harnessed to generate electricity on Earth — with
arXiv:2510.24545v1 Announce Type: cross Abstract: Modern simulations and observations in Astronomy & Cosmology (A&C) produce massively large data volumes, posing significant challenges for storage, access and data analysis. A long-standing bottleneck in high-performance computing, especially now in the exascale era, has been the requirement to write these large datasets to disks, which limits the performance. A promising solution to this challenge is in-situ processing, where analysis and visualization are performed concurrently with the simulation itself, bypassing the storage of the simulation data. In this work, we present new results from an approach for in-situ processing based on Hecuba, a framework that provides a highly distributed database for streaming A&C simulation data directly into the visualization pipeline to make possible on-line visualization. By integrating Hecuba with the high-performance cosmological simulator ChaNGa, we enable real-time, in-situ visualization of
arXiv:2510.24175v1 Announce Type: new Abstract: Developing and redesigning astrophysical, cosmological, and space plasma numerical codes for existing and next-generation accelerators is critical for enabling large-scale simulations. To address these challenges, the SPACE Center of Excellence (SPACE-CoE) fosters collaboration between scientists, code developers, and high-performance computing experts to optimize applications for the exascale era. This paper presents our strategy and initial results on the Leonardo system at CINECA for three flagship codes, namely gPLUTO, OpenGadget3 and iPIC3D, using profiling tools to analyze performance on single and multiple nodes. Preliminary tests show all three codes scale efficiently, reaching 80% scalability up to 1,024 GPUs.
Scientists at the University of Glasgow have harnessed a powerful supercomputer, normally used by astronomers and physicists to study the universe, to develop a new machine learning model which can help translate the language of proteins.
Researchers from Google Quantum AI report that their quantum processor, Willow, ran an algorithm for a quantum computer that solved a complex physics problem thousands of times faster than the world's most powerful classical supercomputers. If verified, this would be one of the first demonstrations of practical quantum advantage, in which a quantum computer solves a real-world problem faster and more accurately than a classical computer.
arXiv:2510.19783v1 Announce Type: new Abstract: The increase in computation and storage has led to a significant growth in the scale of systems powering applications and services, raising concerns about sustainability and operational costs. In this paper, we explore power-saving techniques in high-performance computing (HPC) and datacenter networks, and their relation with performance degradation. From this premise, we propose leveraging Energy Efficient Ethernet (EEE), with the flexibility to extend to conventional Ethernet or upcoming Ethernet-derived interconnect versions of BXI and Omnipath. We analyze the PerfBound proposal, identifying possible improvements and modeling it into a simulation framework. Through different experiments, we examine its impact on performance and determine the most appropriate interconnect. We also study traffic patterns generated by selected HPC and machine learning applications to evaluate the behavior of power-saving techniques. From these
The new quantum computing algorithm, called "Quantum Echoes," is the first that can be independently verified by running it on another quantum computer.
A team at the University at Buffalo has made it possible to simulate complex quantum systems without needing a supercomputer. By expanding the truncated Wigner approximation, they’ve created an accessible, efficient way to model real-world quantum behavior. Their method translates dense equations into a ready-to-use format that runs on ordinary computers. It could transform how physicists explore quantum phenomena.
arXiv:2510.03557v1 Announce Type: cross Abstract: Resolving the most fundamental questions in cosmology requires simulations that match the scale, fidelity, and physical complexity demanded by next-generation sky surveys. To achieve the realism needed for this critical scientific partnership, detailed gas dynamics, along with a host of astrophysical effects, must be treated self-consistently with gravity for end-to-end modeling of structure formation. As an important step on this roadmap, exascale computing enables simulations that span survey-scale volumes while incorporating key subgrid processes that shape complex cosmic structures. We present results from CRK-HACC, a cosmological hydrodynamics code built for the extreme scalability requirements set by modern cosmological surveys. Using separation-of-scale techniques, GPU-resident tree solvers, in situ analysis pipelines, and multi-tiered I/O, CRK-HACC executed Frontier-E: a four trillion particle full-sky simulation, over an order
An international research collaboration has harnessed supercomputing power to better understand how massive slabs of ancient ocean floors are shaped as they sink hundreds of kilometers below Earth's surface.
Since the early 20th century, scientists have gathered compelling evidence that the universe is expanding at an accelerating rate. This acceleration is attributed to what is known as dark energy—a fundamental property of spacetime that has a repulsive effect on galaxies.
arXiv:2510.01170v1 Announce Type: cross Abstract: Exascale computing is transforming the field of materials science by enabling simulations of unprecedented scale and complexity. We present exa-AMD, an open-source, high-performance simulation code specifically designed for accelerated materials discovery on modern supercomputers. exa-AMD addresses the computational challenges inherent in large-scale materials discovery by employing task-based parallelization strategies and optimized data management tailored for high performance computers. The code features a modular design, supports both distributed and on-node parallelism, and is designed for flexibility and extensibility to accommodate a wide range of materials science applications. We detail the underlying algorithms and implementation, and provide comprehensive benchmark results demonstrating strong scaling across multiple high performance computing platforms. We provide two example applications, the design of Fe-Co-Zr and Na-B-C
To probe the mysteries of how galaxies evolve over time, scientists needed a supercomputer with out-of-this-world computational power.
Researchers have discovered a novel criterion for sorting particles in microfluidic channels, paving the way for advancements in disease diagnostics and liquid biopsies. Using the supercomputer "Fugaku," a joint team from the University of Osaka, Kansai University and Okayama University revealed that soft particles, like biological cells, exhibit unique focusing patterns compared to rigid particles.
Of the many roads leading to successful Artemis missions, one is paved with high-tech computing chips called superchips. Along the way, a partnership between NASA wind tunnel engineers, data visualization scientists, and software developers verified a quick, cost-effective solution to improve NASA's SLS (Space Launch System) rocket for the upcoming Artemis II mission. This will be the first crewed flight of the SLS rocket and Orion spacecraft, on an approximately 10-day journey around the moon.
Astronomers have long relied on supercomputers to simulate the immense structure of the Universe, but a new tool called Effort.jl is changing that. By mimicking the behavior of complex cosmological models, this emulator delivers results with the same accuracy — and sometimes even finer detail — in just minutes on a standard laptop. The breakthrough combines neural networks with clever use of physical knowledge, cutting computation time dramatically while preserving reliability.
arXiv:2509.13575v1 Announce Type: new Abstract: Deploying new supercomputers requires testing and evaluation via application codes. Portable, user-friendly tools enable evaluation, and the Multicomponent Flow Code (MFC), a computational fluid dynamics (CFD) code, addresses this need. MFC is adorned with a toolchain that automates input generation, compilation, batch job submission, regression testing, and benchmarking. The toolchain design enables users to evaluate compiler-hardware combinations for correctness and performance with limited software engineering experience. As with other PDE solvers, wall time per spatially discretized grid point serves as a figure of merit. We present MFC benchmarking results for five generations of NVIDIA GPUs, three generations of AMD GPUs, and various CPU architectures, utilizing Intel, Cray, NVIDIA, AMD, and GNU compilers. These tests have revealed compiler bugs and regressions on recent machines such as Frontier and El Capitan. MFC has benchmarked
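The figure of merit mentioned in the item above, wall time per grid point, can be sketched as a simple normalization. This is a minimal illustration and not MFC's own tooling: the function name, the additional normalization by time-step count, and the sample numbers are all assumptions.

```python
def grind_time_ns(wall_seconds: float, grid_points: int, time_steps: int) -> float:
    """Wall time per grid point per time step, in nanoseconds.

    Dividing by time_steps as well as grid_points is an assumption made
    here so that runs of different lengths can be compared.
    """
    if grid_points <= 0 or time_steps <= 0:
        raise ValueError("grid_points and time_steps must be positive")
    return wall_seconds / (grid_points * time_steps) * 1e9


if __name__ == "__main__":
    # Hypothetical run: 10 s of wall time, one million grid points, 100 steps.
    print(f"{grind_time_ns(10.0, 1_000_000, 100):.1f} ns per point per step")
```

A lower value means faster hardware or a better compiler for the same problem, which is what makes a single number like this convenient for comparing machines.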
Simulations still can’t predict exactly when an earthquake will happen, but with the incredible processing power of modern
Researchers from DTU and Amager and Hvidovre Hospital will have access to the Gefion supercomputer in a series
arXiv:2509.08207v1 Announce Type: new Abstract: Aurora is Argonne National Laboratory's pioneering Exascale supercomputer, designed to accelerate scientific discovery with cutting-edge architectural innovations. Key new technologies include the Intel(TM) Xeon(TM) CPU Max Series (code-named Sapphire Rapids) with support for High Bandwidth Memory (HBM), alongside the Intel(TM) Data Center GPU Max Series (code-named Ponte Vecchio) on each compute node. Aurora also integrates the Distributed Asynchronous Object Storage (DAOS), a novel exascale storage solution, and leverages Intel's oneAPI programming environment. This paper presents an in-depth exploration of Aurora's node architecture, the HPE Slingshot interconnect, the supporting software ecosystem, and DAOS. We provide insights into standard benchmark performance and application readiness efforts via Aurora's Early Science Program and the Exascale Computing Project.
arXiv:2508.20603v1 Announce Type: cross Abstract: Petabyte-scale data volumes are generated by observations and simulations in modern astronomy and astrophysics. Storage, access, and data analysis are significantly hampered by such data volumes and are leading to the development of a new generation of software tools. The Visualization Interface for the Virtual Observatory (VisIVO) has been designed, developed and maintained by INAF since 2005 to perform multi-dimensional data analysis and knowledge discovery in multivariate astrophysical datasets. Utilizing containerization and virtualization technologies, VisIVO has already been used to exploit distributed computing infrastructures including the European Open Science Cloud (EOSC). We intend to adapt VisIVO solutions for high performance visualization of data generated on the (pre-)Exascale systems by HPC applications in Astrophysics and Cosmology (A&C), including GADGET (GAlaxies with Dark matter and Gas) and PLUTO simulations,
Darwin Monkey or 'Wukong' features over 2 billion artificial neurons and more than 100 billion synapses — similar to the neural structure of a macaque.
arXiv:2508.19138v1 Announce Type: new Abstract: Designing nanoscale electronic devices such as the currently manufactured nanoribbon field-effect transistors (NRFETs) requires advanced modeling tools capturing all relevant quantum mechanical effects. State-of-the-art approaches combine the non-equilibrium Green's function (NEGF) formalism and density functional theory (DFT). However, as device dimensions do not exceed a few nanometers anymore, electrons are confined in ultra-small volumes, giving rise to strong electron-electron interactions. To account for these critical effects, DFT+NEGF solvers should be extended with the GW approximation, which massively increases their computational intensity. Here, we present the first implementation of the NEGF+GW scheme capable of handling NRFET geometries with dimensions comparable to experiments. This package, called QuaTrEx, makes use of a novel spatial domain decomposition scheme, can treat devices made of up to 84,480 atoms, scales very
Scientists are rethinking the universe’s deepest mysteries using numerical relativity, complex computer simulations of Einstein’s equations in extreme conditions. This method could help explore what happened before the Big Bang, test theories of cosmic inflation, investigate multiverse collisions, and even model cyclic universes that endlessly bounce through creation and destruction.
arXiv:2508.13523v1 Announce Type: cross Abstract: Since its inception in 1995, LAMMPS has grown to be a world-class molecular dynamics code, with thousands of users, over one million lines of code, and multi-scale simulation capabilities. We discuss how LAMMPS has adapted to the modern heterogeneous computing landscape by integrating the Kokkos performance portability library into the existing C++ code. We investigate performance portability of simple pairwise, many-body reactive, and machine-learned force-field interatomic potentials. We present results on GPUs across different vendors and generations, and analyze performance trends, probing FLOPS throughput, memory bandwidths, cache capabilities, and thread-atomic operation performance. Finally, we demonstrate strong scaling on all current US exascale machines (OLCF Frontier, ALCF Aurora, and NNSA El Capitan) for the three potentials.