- Headline feeds
- Topics
- Newsmakers
- Army, Pentagon, CIA, FBI Tech.
- Biohacking
- Bitcoin
- Chemical computer
- CyberSex
- Cyborgs
- Elon Musk, Tesla, SpaceX ...
- Energy storage
- Fintech
- Fusion
- Google and Alphabet
- IBM
- Immunotherapy
- Intel
- Laser
- Lockheed
- Molecular
- NASA, ESA
- Nobel
- Space Launch System (NASA)
- SpaceX
- Spy
- Supercomputers
- TechInvestorNews.com
Supercomputers
Astronomers have finally cracked a decades-old mystery about red giant stars—how material from their deep interiors makes its way to the surface. Using cutting-edge supercomputer simulations, researchers discovered that stellar rotation plays a powerful role in mixing elements across a previously unexplained barrier inside the star.
arXiv:2603.19544v1 Announce Type: new Abstract: Artificial Intelligence for scientific applications increasingly requires training large models on data that cannot be centralized due to privacy constraints, data sovereignty, or the sheer volume of data generated. Federated learning (FL) addresses this by enabling collaborative training without centralizing raw data, but scientific applications demand model scales that require extensive computing resources, typically offered at High Performance Computing (HPC) facilities. Deploying FL experiments across HPC facilities introduces challenges beyond cloud or enterprise settings. We present a comprehensive cross-facility FL framework for heterogeneous HPC environments, built on the Advanced Privacy-Preserving Federated Learning (APPFL) framework with Globus Compute and Transfer orchestration, and evaluate it across four U.S. Department of Energy (DOE) leadership-class supercomputers. We demonstrate that FL experiments across HPC facilities
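The core pattern the abstract describes, local training at each facility with only model weights crossing the boundary, can be sketched as federated averaging. All names below are illustrative; APPFL's real API is different, and a tiny linear model stands in for a large scientific one.

```python
# Minimal federated-averaging sketch: each "facility" trains on its own
# data and only model weights, never raw data, leave the site.
# Names are illustrative; APPFL's real API differs.

def local_update(weights, data, lr=0.1):
    """One pass of least-squares SGD on a facility's local data."""
    w = list(weights)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def fed_avg(global_w, facility_data, rounds=50):
    for _ in range(rounds):
        local_ws = [local_update(global_w, d) for d in facility_data]
        # The server only ever sees weights, which it averages.
        global_w = [sum(ws) / len(ws) for ws in zip(*local_ws)]
    return global_w

# Two facilities whose private data share the rule y = 2*x0 + 1*x1.
site_a = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0)]
site_b = [([1.0, 1.0], 3.0), ([2.0, 0.0], 4.0)]
w = fed_avg([0.0, 0.0], [site_a, site_b])
print([round(v, 2) for v in w])  # converges toward [2.0, 1.0]
```

In a cross-facility deployment, each `local_update` would be a batch job dispatched to a different supercomputer, which is where the orchestration layer earns its keep.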
Carbon forms the graphite in pencils, the diamonds in jewelry and the molecules that make up every living thing. But under extreme conditions—like the heat and pressure of intense explosions—carbon can transform into exotic nanometer-sized structures called nanocarbons. These materials are often stronger than steel, lighter than plastic and adaptable for uses in medicine, energy and national security.
A neutron star merger is an extraordinary event. It features extremely powerful, chaotic magnetic fields that generate extremely energetic photons. Supercomputer simulations show that the extreme gamma-ray photons created in the mayhem can't even escape the chaos.
arXiv:2603.10970v1 Announce Type: new Abstract: Quantum computers have demonstrated utility in simulating quantum systems beyond brute-force classical approaches. As the community builds on these demonstrations to explore using quantum computing for applied research, algorithms and workflows have emerged that require leveraging both quantum computers and classical high-performance computing (HPC) systems to scale applications, especially in chemistry and materials, beyond what either system can simulate alone. Today, these disparate systems operate in isolation, forcing users to manually orchestrate workloads, coordinate job scheduling, and transfer data between systems -- a cumbersome process that hinders productivity and severely limits rapid algorithmic exploration. These challenges motivate the need for flexible and high-performance Quantum-Centric Supercomputing (QCSC) systems that integrate Quantum Processing Units (QPUs), Graphics Processing Units (GPUs), and Central Processing
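The hybrid workflow shape the abstract motivates, a classical optimizer looping around quantum jobs, can be shown in miniature. The "QPU call" below is mocked with an analytic single-qubit expectation value; everything here is a toy stand-in, not the paper's system.

```python
# Toy quantum-classical loop: a classical optimizer repeatedly submits
# an energy evaluation that, in a real QCSC workflow, would run on the
# QPU while the optimizer runs on the classical HPC side. The "QPU job"
# is mocked with an analytic single-qubit expectation value.
import math

def qpu_energy(theta):
    # Stand-in for a QPU job: <Z> after an RY(theta) rotation is cos(theta).
    return math.cos(theta)

def minimize(theta=2.0, lr=0.4, iters=100, eps=1e-4):
    for _ in range(iters):
        # Finite-difference gradient: two more "QPU jobs" per iteration.
        grad = (qpu_energy(theta + eps) - qpu_energy(theta - eps)) / (2 * eps)
        theta -= lr * grad
    return theta, qpu_energy(theta)

theta, energy = minimize()
print(round(energy, 4))  # reaches the minimum of cos(theta), i.e. -1.0
```

The manual-orchestration pain the abstract describes is exactly this loop stretched across two schedulers: every gradient evaluation is a round trip between systems that today must be coordinated by hand.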
Using the Frontier supercomputer at the Department of Energy’s Oak Ridge National Laboratory, researchers from the Georgia Institute
Large protein machines in the body carry out many of the cell's most essential tasks, from energy production to the regulation of signal transmission. Although they can now be imaged in great detail using cryo-electron microscopy, it has long been difficult to understand how these complexes actually move and function. Researchers at Karolinska Institutet have now developed a computational method capable of simulating the movements of some of the cell's largest protein complexes.
The portable computing powerhouse is capable of running 120-billion-parameter LLMs, roughly two-thirds the size of the 175-billion-parameter GPT-3, without needing to access the internet or the cloud.
Have you ever stopped to wonder how forecasters can predict the weather days in advance, or how scientists figure out how the climate might evolve under different policies?
Researchers at the University of Victoria's Astronomy Research Centre (ARC) and the University of Minnesota study the changes in the chemical composition at the surface of red giant stars.
Advances in supercomputing have made solving a long‐standing astronomical conundrum possible: How can we explain the changes in the chemical composition at the surface of red giant stars as they evolve?
Researchers used a pair of powerful supercomputers to simulate the potential trajectories of 1 million satellites in a cislunar orbit between Earth and the moon. Less than 10% of these orbits remained stable throughout the simulations, but this is not as disastrous as it may sound.
In a long-running collaboration with GE Aerospace, researchers at the University of Melbourne in Australia have been steadily
arXiv:2602.13789v1 Announce Type: new Abstract: As cloud computing scales toward the Exascale regime ($10^5+$ nodes), the prevailing "Newtonian" orchestration paradigm -- exemplified by Kubernetes -- approaches fundamental physical limits. The centralized, deterministic scheduling model suffers from $O(N)$ latency scaling, "Head-of-Line" blocking, and thermodynamic blindness, rendering it incapable of managing the stochastic chaos of next-generation AI workloads. This paper proposes a paradigm shift from orchestration to Thermodynamic Governance. We model the compute cluster not as a static state machine, but as a Dissipative Structure far from equilibrium. We introduce TEG (Thermo-Economic Governor), a decentralized architecture that establishes a rigorous topological isomorphism between cluster resource contention and many-body physics. TEG replaces the global scheduler with Langevin Agents that execute Brownian motion on a Holographic Potential Field, reducing decision complexity
The National Science Foundation said management of the machine, used by researchers for forecasts, disaster warnings and pure science, would be transferred to a “third-party operator.”
Using the Frontier supercomputer at the Department of Energy's Oak Ridge National Laboratory, researchers from the Georgia Institute of Technology have performed the largest direct numerical simulation (DNS) of turbulence in three dimensions, attaining a record resolution of 35 trillion grid points. Tackling such a complex problem required the exascale (1 billion billion or more calculations per second) capabilities of Frontier, the world's most powerful supercomputer for open science.
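The scale of a 35-trillion-point grid is easy to make concrete with back-of-envelope arithmetic. The single-field assumption below is mine; real solvers carry several fields (velocity components, pressure, ...), so the total footprint is a multiple of this.

```python
# Rough scale of the record DNS run: one double-precision value per
# grid point on a 35-trillion-point grid. Real solvers store several
# such fields, so multiply accordingly.

def field_terabytes(points, bytes_per_value=8):
    return points * bytes_per_value / 1e12

print(field_terabytes(35e12))  # 280.0 TB for a single field
```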
A light has emerged at the end of the tunnel in the long pursuit of developing quantum computers, which are expected to radically reduce the time needed to perform some complex calculations from thousands of years down to a matter of hours.
arXiv:2601.17606v1 Announce Type: new Abstract: Performant all-to-all collective operations in MPI are critical to fast Fourier transforms, transposition, and machine learning applications. There are many existing implementations for all-to-all exchanges on emerging systems, with the achieved performance dependent on many factors, including message size, process count, architecture, and parallel system partition. This paper presents novel all-to-all algorithms for emerging many-core systems. Further, the paper presents a performance analysis against existing algorithms and system MPI, with novel algorithms achieving up to 3x speedup over system MPI at 32 nodes of state-of-the-art Sapphire Rapids systems.
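A classic baseline for all-to-all exchanges, against which novel algorithms like these are measured, is the pairwise (XOR) exchange schedule for a power-of-two process count: in step s, rank r trades exactly one block with partner r ^ s. The sketch below simulates the schedule in plain Python; it is the textbook pattern, not the paper's new algorithm.

```python
# Pairwise (XOR) exchange all-to-all: with P a power of two, step s
# pairs rank r with partner r ^ s, so after P-1 steps plus the local
# copy every rank holds one block from every other rank.

def alltoall_pairwise(send):
    P = len(send)                        # send[r][d]: block rank r sends to d
    recv = [[None] * P for _ in range(P)]
    for r in range(P):
        recv[r][r] = send[r][r]          # local block needs no communication
    for step in range(1, P):
        for r in range(P):
            partner = r ^ step           # symmetric pairing: partner ^ step == r
            recv[r][partner] = send[partner][r]
    return recv

P = 4
send = [[f"{src}->{dst}" for dst in range(P)] for src in range(P)]
recv = alltoall_pairwise(send)
print(recv[2])  # ['0->2', '1->2', '2->2', '3->2']
```

Because every step is a disjoint pairing, the pattern maps cleanly onto MPI point-to-point exchanges; the performance questions the paper studies arise from how such schedules interact with message size, node count, and topology.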
From computers to smartphones, from smart appliances to the internet itself, the technology we use every day only exists thanks to decades of improvements in the semiconductor industry that have allowed engineers to keep miniaturizing transistors and fitting more and more of them onto integrated circuits, or microchips. It's the famous Moore's scaling law, the observation—rather than an actual law—that the number of transistors on an integrated circuit tends to double roughly every two years.
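The "doubling roughly every two years" observation is easy to sanity-check numerically. The 1971 starting point (about 2,300 transistors on the Intel 4004) is a commonly cited figure, used here only for illustration.

```python
# Back-of-envelope Moore's law: transistor count doubling every two
# years from the Intel 4004's roughly 2,300 transistors in 1971.

def transistors(year, base_year=1971, base_count=2300, doubling_years=2):
    return base_count * 2 ** ((year - base_year) / doubling_years)

for y in (1971, 1991, 2021):
    # 2021 lands near 7.7e10, the right order of magnitude for the
    # largest chips shipping today.
    print(y, f"{transistors(y):,.0f}")
```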
Picture a Northern California vineyard, rows of grapevines bathed in morning fog, workers hand-thinning vines, exposing them to
The world’s most powerful supercomputers can now run simulations of billions of neurons, and researchers hope such models will offer unprecedented insights into how our brains work
A preliminary analysis suggests that industrially useful quantum computer designs come with a broad spectrum of energy footprints, including some larger than the most powerful existing supercomputers
arXiv:2512.21697v1 Announce Type: new Abstract: Modern heterogeneous high-performance computing (HPC) systems powered by advanced graphics processing unit (GPU) architectures enable accelerating computing with unprecedented performance and scalability. Here, we present a GPU-accelerated solver for the three-dimensional (3D) time-dependent Dirac equation optimized for distributed HPC systems. The solver named GaDE is designed to simulate the electron dynamics in atoms induced by electromagnetic fields in the relativistic regime. It combines MPI with CUDA/HIP to target both NVIDIA and AMD GPU architectures. We discuss our implementation strategies in which most of the computations are carried out on GPUs, taking advantage of the GPU-aware MPI feature to optimize communication performance. We evaluate GaDE on the pre-exascale supercomputer LUMI, powered by AMD MI250X GPUs and HPE's Slingshot interconnect. Single-GPU performance on NVIDIA A100, GH200, and AMD MI250X shows comparable
arXiv:2512.18883v1 Announce Type: cross Abstract: High Performance Computing (HPC) based simulations are crucial in Astrophysics and Cosmology (A&C), helping scientists investigate and understand complex astrophysical phenomena. Taking advantage of exascale computing capabilities is essential for these efforts. However, the unprecedented architectural complexity of exascale systems impacts legacy codes. The SPACE Centre of Excellence (CoE) aims to re-engineer key astrophysical codes to tackle new computational challenges by adopting innovative programming paradigms and software (SW) solutions. SPACE brings together scientists, code developers, HPC experts, hardware (HW) manufacturers, and SW developers. This collaboration enhances exascale A&C applications, promoting the use of exascale and post-exascale computing capabilities. Additionally, SPACE addresses high-performance data analysis for the massive data outputs from exascale simulations and modern observations, using machine
As Earth continues to warm, Australia faces some important decisions.
arXiv:2512.07401v1 Announce Type: new Abstract: Otus is a high-performance computing cluster that was launched in 2025 and is operated by the Paderborn Center for Parallel Computing (PC2) at Paderborn University in Germany. The system is part of the National High Performance Computing (NHR) initiative. Otus complements the previous supercomputer Noctua 2, offering approximately twice the computing power while retaining the three node types that were characteristic of Noctua 2: 1) CPU compute nodes with different memory capacities, 2) high-end GPU nodes, and 3) HPC-grade FPGA nodes. On the Top500 list, which ranks the 500 most powerful supercomputers in the world, Otus is in position 164 with the CPU partition and in position 255 with the GPU partition (June 2025). On the Green500 list, ranking the 500 most energy-efficient supercomputers in the world, Otus is in position 5 with the GPU partition (June 2025). This article provides a comprehensive overview of the system in terms of
Physicists have transformed a decades-old technique for simplifying quantum equations into a reusable, user-friendly "conversion table" that works on a laptop and returns results within hours.
arXiv:2512.03914v1 Announce Type: new Abstract: Efficient simulation of complex plasma dynamics is crucial for advancing fusion energy research. Particle-in-Cell (PIC) Monte Carlo (MC) simulations provide insights into plasma behavior, including turbulence and confinement, which are essential for optimizing fusion reactor performance. Transitioning to exascale simulations introduces significant challenges, with traditional file input/output (I/O) inefficiencies remaining a key bottleneck. This work advances BIT1, an electrostatic PIC MC code, by improving the particle mover with OpenMP task-based parallelism, integrating the openPMD streaming API, and enabling in-memory data streaming with ADIOS2's Sustainable Staging Transport (SST) engine to enhance I/O performance, computational efficiency, and system storage utilization. We employ profiling tools such as gprof, perf, IPM and Darshan, which provide insights into computation, communication, and I/O operations. We implement
New York aims to democratize access to hardware often limited to federal labs and Big Tech
A.I. has added urgency to the U.S. national laboratories that have been sites of cutting-edge scientific research, leading to deals with tech giants like Nvidia to speed up.
Researchers created scalable quantum circuits capable of simulating fundamental nuclear physics on more than 100 qubits. These circuits efficiently prepare complex initial states that classical computers cannot handle. The achievement demonstrates a new path toward simulating particle collisions and extreme forms of matter. It may ultimately illuminate long-standing cosmic mysteries.
Researchers have created one of the most detailed virtual mouse cortex simulations ever achieved by combining massive biological datasets with the extraordinary power of Japan’s Fugaku supercomputer. The digital brain behaves like a living system, complete with millions of neurons and tens of billions of synapses, giving scientists the ability to watch diseases like Alzheimer’s or epilepsy unfold step by step. The project opens a new path for studying brain function, tracking how damage spreads across neural circuits, and testing ideas that once required countless experiments on real tissue.
Cutting-edge simulations show that Enceladus’ plumes are losing 20–40% less mass than earlier estimates suggested. The new models provide sharper insights into subsurface conditions that future landers may one day probe directly.
A broad association of researchers from across Lawrence Berkeley National Laboratory (Berkeley Lab) and the University of California, Berkeley has collaborated to perform an unprecedented simulation of a quantum microchip, a key step forward in perfecting the chips required for this next-generation technology. The simulation used more than 7,000 NVIDIA GPUs on the Perlmutter supercomputer at the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy (DOE) user facility.
The Fugaku supercomputer built a highly detailed virtual mouse cortex with millions of neurons, enabling unprecedented simulations of brain function and disease and marking a major step toward full-brain digital models. (Source: GEN - Genetic Engineering and Biotechnology News.)
arXiv:2511.11542v1 Announce Type: cross Abstract: Simulation of physical systems is essential in many scientific and engineering domains. Commonly used domain decomposition methods are unable to deliver high simulation rate or high utilization in network computing environments. In particular, Exascale systems deliver only a small fraction of their peak performance for these workloads. This paper introduces the novel \algorithmpropernoun{} algorithm, designed to overcome these limitations. We apply this method and show simulations running in excess of 1.6 million time steps per second and simulations achieving 84 PFLOP/s. Our implementation can achieve 90\% of peak performance in both single-node and clustered environments. We illustrate the method by applying the shallow-water equations to model a tsunami following an asteroid impact at 460m-resolution on a planetary scale running on a cluster of 64 Cerebras CS-3 systems.
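The shallow-water equations used in the tsunami demonstration are a standard testbed, and a minimal version fits in a few lines. The 1D Lax-Friedrichs scheme below is purely illustrative; the wafer-scale implementation in the paper is a different beast entirely.

```python
# Minimal 1D shallow-water solver (Lax-Friedrichs, periodic domain).
# State: water height h and momentum hu; g is gravitational acceleration.
g = 9.81

def sw_step(h, hu, dx, dt):
    """One Lax-Friedrichs update of height h and momentum hu."""
    n = len(h)
    f_h = hu[:]                                              # mass flux
    f_hu = [hu[i] ** 2 / h[i] + 0.5 * g * h[i] ** 2 for i in range(n)]
    nh, nhu = h[:], hu[:]
    for i in range(n):
        l, r = (i - 1) % n, (i + 1) % n
        nh[i] = 0.5 * (h[l] + h[r]) - dt / (2 * dx) * (f_h[r] - f_h[l])
        nhu[i] = 0.5 * (hu[l] + hu[r]) - dt / (2 * dx) * (f_hu[r] - f_hu[l])
    return nh, nhu

# A small bump of water spreading outward; total mass is conserved.
h = [1.0] * 50
h[25] = 2.0
hu = [0.0] * 50
for _ in range(20):
    h, hu = sw_step(h, hu, dx=1.0, dt=0.05)
print(round(sum(h), 6))  # mass stays at 51.0
```

Each grid point only ever reads its two neighbors, which is exactly the locality that makes this class of solver attractive for dataflow hardware like the CS-3.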
Aalto University researchers have developed a method to execute AI tensor operations using just one pass of light. By encoding data directly into light waves, they enable calculations to occur naturally and simultaneously. The approach works passively, without electronics, and could soon be integrated into photonic chips. If adopted, it promises dramatically faster and more energy-efficient AI systems.
arXiv:2511.10159v1 Announce Type: new Abstract: The demand for computing in our daily lives has led to the proliferation of datacenters that power many indispensable services. At the same time, computing has become essential for research in various scientific fields, which requires supercomputers with vast computing capabilities to produce results in reasonable time. The scale and complexity of these systems, compared to our day-to-day devices, is like comparing a cell to a living organism. To make them work properly, we need state-of-the-art technology and engineering, not just raw resources. Interconnecting the different compute nodes that make up the whole is a delicate task, as the network can become the bottleneck for the entire infrastructure. In this work, we explore two aspects of the network: how to prevent degradation under heavy use with congestion control, how to save energy when idle with power management, and how the two may interact.
The dream of creating game-changing quantum computers—supermachines that encode information in single atoms rather than conventional bits—has been hampered by the formidable challenge known as quantum error correction.
arXiv:2511.05149v1 Announce Type: new Abstract: Over the past decade, supercomputers and data centers have evolved dramatically to cope with the increasing performance requirements of applications and services, such as scientific computing, generative AI, social networks or cloud services. This evolution has led these systems to incorporate high-speed networks using faster links, end nodes using multiple and dedicated accelerators, and advancements in memory technologies to bridge the memory bottleneck. The interconnection network is a key element in these systems and must be thoroughly designed so that it is not the bottleneck of the entire system, bearing in mind the countless communication operations that current applications and services generate. Congestion is a serious threat that spoils interconnection network performance, and its effects are even more dramatic when looking at the traffic dynamics and bottlenecks generated by the communication operations mentioned above. In
From Cassini’s awe-inspiring flybys to cutting-edge simulations, scientists are decoding the secrets of Enceladus’s geysers. Supercomputer models show the icy moon’s plumes lose less mass than expected, refining our understanding of its mysterious interior. These discoveries could shape future missions that may one day explore its subsurface ocean—and perhaps even detect life below the ice.
arXiv:2511.03359v1 Announce Type: cross Abstract: We have developed a new version of the high-performance J\"ulich universal quantum computer simulator (JUQCS-50) that leverages key features of the GH200 superchips as used in the JUPITER supercomputer, enabling simulations of a 50-qubit universal quantum computer for the first time. JUQCS-50 achieves this through three key innovations: (1) extending usable memory beyond GPU limits via high-bandwidth CPU-GPU interconnects and LPDDR5 memory; (2) adaptive data encoding to reduce memory footprint with acceptable trade-offs in precision and compute effort; and (3) an on-the-fly network traffic optimizer. These advances result in an 11.4-fold speedup over the previous 48-qubit record on the K computer.
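The memory wall JUQCS-50 pushes against is simple arithmetic: a full state vector for n qubits holds 2^n complex amplitudes, so memory doubles with every added qubit. The 16-bytes-per-amplitude figure assumes double-precision complex values; the paper's adaptive encoding trades some of that precision away.

```python
# Why 50 qubits is hard: a full state vector stores 2**n complex
# amplitudes at 16 bytes each in double precision, doubling per qubit.

def state_vector_bytes(n_qubits, bytes_per_amp=16):
    return (2 ** n_qubits) * bytes_per_amp

for n in (30, 40, 50):
    # 50 qubits: 16,777,216 GiB, i.e. 16 PiB of amplitudes.
    print(f"{n} qubits: {state_vector_bytes(n) / 2**30:,.0f} GiB")
```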
arXiv:2511.04677v1 Announce Type: new Abstract: The rapid growth of data-intensive applications such as generative AI, scientific simulations, and large-scale analytics is driving modern supercomputers and data centers toward increasingly heterogeneous and tightly integrated architectures. These systems combine powerful CPUs and accelerators with emerging high-bandwidth memory and storage technologies to reduce data movement and improve computational efficiency. However, as the number of accelerators per node increases, communication bottlenecks emerge both within and between nodes, particularly when network resources are shared among heterogeneous components.
arXiv:2511.00224v1 Announce Type: cross Abstract: Quantum computers must operate in concert with classical computers to deliver on the promise of quantum advantage for practical problems. To achieve that, it is important to understand how quantum and classical computing can interact together, and how one can characterize the scalability and efficiency of hybrid quantum-classical workflows. So far, early experiments with quantum-centric supercomputing workflows have been limited in scale and complexity. Here, we use a Heron quantum processor deployed on premises with the entire supercomputer Fugaku to perform the largest computation of electronic structure involving quantum and classical high-performance computing. We design a closed-loop workflow between the quantum processors and 152,064 classical nodes of Fugaku, to approximate the electronic structure of chemistry models beyond the reach of exact diagonalization, with accuracy comparable to some all-classical approximation methods.
The nuclear reactions that fuel the sun could soon be harnessed to generate electricity on Earth — with
arXiv:2510.24545v1 Announce Type: cross Abstract: Modern simulations and observations in Astronomy & Cosmology (A&C) produce massively large data volumes, posing significant challenges for storage, access and data analysis. A long-standing bottleneck in high-performance computing, especially now in the exascale era, has been the requirement to write these large datasets to disks, which limits performance. A promising solution to this challenge is in-situ processing, where analysis and visualization are performed concurrently with the simulation itself, bypassing the storage of the simulation data. In this work, we present new results from an approach for in-situ processing based on Hecuba, a framework that provides a highly distributed database for streaming A&C simulation data directly into the visualization pipeline, making on-line visualization possible. By integrating Hecuba with the high-performance cosmological simulator ChaNGa, we enable real-time, in-situ visualization of
arXiv:2510.24175v1 Announce Type: new Abstract: Developing and redesigning astrophysical, cosmological, and space plasma numerical codes for existing and next-generation accelerators is critical for enabling large-scale simulations. To address these challenges, the SPACE Center of Excellence (SPACE-CoE) fosters collaboration between scientists, code developers, and high-performance computing experts to optimize applications for the exascale era. This paper presents our strategy and initial results on the Leonardo system at CINECA for three flagship codes, namely gPLUTO, OpenGadget3 and iPIC3D, using profiling tools to analyze performance on single and multiple nodes. Preliminary tests show all three codes scale efficiently, reaching 80% scalability up to 1,024 GPUs.
Scientists at the University of Glasgow have harnessed a powerful supercomputer, normally used by astronomers and physicists to study the universe, to develop a new machine learning model which can help translate the language of proteins.
Researchers from Google Quantum AI report that their quantum processor, Willow, ran an algorithm for a quantum computer that solved a complex physics problem thousands of times faster than the world's most powerful classical supercomputers. If verified, this would be one of the first demonstrations of practical quantum advantage, in which a quantum computer solves a real-world problem faster and more accurately than a classical computer.
arXiv:2510.19783v1 Announce Type: new Abstract: The increase in computation and storage has led to a significant growth in the scale of systems powering applications and services, raising concerns about sustainability and operational costs. In this paper, we explore power-saving techniques in high-performance computing (HPC) and datacenter networks, and their relation with performance degradation. From this premise, we propose leveraging Energy Efficient Ethernet (EEE), with the flexibility to extend to conventional Ethernet or upcoming Ethernet-derived interconnect versions of BXI and Omnipath. We analyze the PerfBound proposal, identifying possible improvements and modeling it into a simulation framework. Through different experiments, we examine its impact on performance and determine the most appropriate interconnect. We also study traffic patterns generated by selected HPC and machine learning applications to evaluate the behavior of power-saving techniques. From these
The new quantum computing algorithm, called "Quantum Echoes," is the first that can be independently verified by running it on another quantum computer.
A team at the University at Buffalo has made it possible to simulate complex quantum systems without needing a supercomputer. By expanding the truncated Wigner approximation, they’ve created an accessible, efficient way to model real-world quantum behavior. Their method translates dense equations into a ready-to-use format that runs on ordinary computers. It could transform how physicists explore quantum phenomena.
arXiv:2510.03557v1 Announce Type: cross Abstract: Resolving the most fundamental questions in cosmology requires simulations that match the scale, fidelity, and physical complexity demanded by next-generation sky surveys. To achieve the realism needed for this critical scientific partnership, detailed gas dynamics, along with a host of astrophysical effects, must be treated self-consistently with gravity for end-to-end modeling of structure formation. As an important step on this roadmap, exascale computing enables simulations that span survey-scale volumes while incorporating key subgrid processes that shape complex cosmic structures. We present results from CRK-HACC, a cosmological hydrodynamics code built for the extreme scalability requirements set by modern cosmological surveys. Using separation-of-scale techniques, GPU-resident tree solvers, in situ analysis pipelines, and multi-tiered I/O, CRK-HACC executed Frontier-E: a four trillion particle full-sky simulation, over an order
An international research collaboration has harnessed supercomputing power to better understand how massive slabs of ancient ocean floors are shaped as they sink hundreds of kilometers below Earth's surface.
Since the early 20th century, scientists have gathered compelling evidence that the universe is expanding at an accelerating rate. This acceleration is attributed to what is known as dark energy—a fundamental property of spacetime that has a repulsive effect on galaxies.
arXiv:2510.01170v1 Announce Type: cross Abstract: Exascale computing is transforming the field of materials science by enabling simulations of unprecedented scale and complexity. We present exa-AMD, an open-source, high-performance simulation code specifically designed for accelerated materials discovery on modern supercomputers. exa-AMD addresses the computational challenges inherent in large-scale materials discovery by employing task-based parallelization strategies and optimized data management tailored for high performance computers. The code features a modular design, supports both distributed and on-node parallelism, and is designed for flexibility and extensibility to accommodate a wide range of materials science applications. We detail the underlying algorithms and implementation, and provide comprehensive benchmark results demonstrating strong scaling across multiple high performance computing platforms. We provide two example applications, the design of Fe-Co-Zr and Na-B-C
To probe the mysteries of how galaxies evolve over time, scientists needed a supercomputer with out-of-this-world computational power.
Researchers have discovered a novel criterion for sorting particles in microfluidic channels, paving the way for advancements in disease diagnostics and liquid biopsies. Using the supercomputer "Fugaku," a joint team from the University of Osaka, Kansai University and Okayama University revealed that soft particles, like biological cells, exhibit unique focusing patterns compared to rigid particles.
Of the many roads leading to successful Artemis missions, one is paved with high-tech computing chips called superchips. Along the way, a partnership between NASA wind tunnel engineers, data visualization scientists, and software developers verified a quick, cost-effective solution to improve NASA's SLS (Space Launch System) rocket for the upcoming Artemis II mission. This will be the first crewed flight of the SLS rocket and Orion spacecraft, on an approximately 10-day journey around the moon.
Astronomers have long relied on supercomputers to simulate the immense structure of the Universe, but a new tool called Effort.jl is changing that. By mimicking the behavior of complex cosmological models, this emulator delivers results with the same accuracy — and sometimes even finer detail — in just minutes on a standard laptop. The breakthrough combines neural networks with clever use of physical knowledge, cutting computation time dramatically while preserving reliability.
arXiv:2509.13575v1 Announce Type: new Abstract: Deploying new supercomputers requires testing and evaluation via application codes. Portable, user-friendly tools enable evaluation, and the Multicomponent Flow Code (MFC), a computational fluid dynamics (CFD) code, addresses this need. MFC is adorned with a toolchain that automates input generation, compilation, batch job submission, regression testing, and benchmarking. The toolchain design enables users to evaluate compiler-hardware combinations for correctness and performance with limited software engineering experience. As with other PDE solvers, wall time per spatially discretized grid point serves as a figure of merit. We present MFC benchmarking results for five generations of NVIDIA GPUs, three generations of AMD GPUs, and various CPU architectures, utilizing Intel, Cray, NVIDIA, AMD, and GNU compilers. These tests have revealed compiler bugs and regressions on recent machines such as Frontier and El Capitan. MFC has benchmarked
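The figure of merit the abstract names, wall time per spatially discretized grid point, is a simple normalization worth making concrete. The sketch below is illustrative only; the function name and the run numbers are made up, not MFC measurements.

```python
# Sketch of the PDE-solver figure of merit from the abstract:
# wall-clock seconds per grid point per time step (lower is better).
# Numbers are hypothetical, not MFC benchmark data.

def grind_time(wall_seconds, grid_points, time_steps=1):
    """Wall time spent per grid point per time step."""
    return wall_seconds / (grid_points * time_steps)

# Hypothetical run: 120 s of wall time for 100 steps on a 512^3 grid.
fom = grind_time(120.0, 512**3, time_steps=100)

# Comparing compiler-hardware combinations reduces to comparing this number.
slower = grind_time(150.0, 512**3, time_steps=100)
assert fom < slower
```

Because the metric is normalized by both grid size and step count, it lets runs of different resolutions and durations be compared on one axis, which is what makes it useful for sweeping compiler-hardware combinations.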
Nature is the foremost international weekly scientific journal in the world and the flagship journal for Nature Portfolio. It publishes the finest peer-reviewed research in all fields of science and technology on the basis of originality, importance, interdisciplinary interest, timeliness, accessibility, elegance and surprising conclusions. Nature publishes landmark papers, award-winning news, leading comment and expert opinion on important, topical scientific news and events, enabling readers to share the latest discoveries and advance the discussion across the global scientific community.
Researchers from DTU and Amager and Hvidovre Hospital will have access to the Gefion supercomputer in a series
arXiv:2509.08207v1 Announce Type: new Abstract: Aurora is Argonne National Laboratory's pioneering Exascale supercomputer, designed to accelerate scientific discovery with cutting-edge architectural innovations. Key new technologies include the Intel(TM) Xeon(TM) CPU Max Series (code-named Sapphire Rapids) with support for High Bandwidth Memory (HBM), alongside the Intel(TM) Data Center GPU Max Series (code-named Ponte Vecchio) on each compute node. Aurora also integrates the Distributed Asynchronous Object Storage (DAOS), a novel exascale storage solution, and leverages Intel's oneAPI programming environment. This paper presents an in-depth exploration of Aurora's node architecture, the HPE Slingshot interconnect, the supporting software ecosystem, and DAOS. We provide insights into standard benchmark performance and applications readiness efforts via Aurora's Early Science Program and the Exascale Computing Project.
arXiv:2508.20603v1 Announce Type: cross Abstract: Petabyte-scale data volumes are generated by observations and simulations in modern astronomy and astrophysics. Storage, access, and data analysis are significantly hampered by such data volumes and are leading to the development of a new generation of software tools. The Visualization Interface for the Virtual Observatory (VisIVO) has been designed, developed and maintained by INAF since 2005 to perform multi-dimensional data analysis and knowledge discovery in multivariate astrophysical datasets. Utilizing containerization and virtualization technologies, VisIVO has already been used to exploit distributed computing infrastructures including the European Open Science Cloud (EOSC). We intend to adapt VisIVO solutions for high performance visualization of data generated on the (pre-)Exascale systems by HPC applications in Astrophysics and Cosmology (A\&C), including GADGET (GAlaxies with Dark matter and Gas) and PLUTO simulations,
Darwin Monkey or 'Wukong' features over 2 billion artificial neurons and more than 100 billion synapses — similar to the neural structure of a macaque.
arXiv:2508.19138v1 Announce Type: new Abstract: Designing nanoscale electronic devices such as the currently manufactured nanoribbon field-effect transistors (NRFETs) requires advanced modeling tools capturing all relevant quantum mechanical effects. State-of-the-art approaches combine the non-equilibrium Green's function (NEGF) formalism and density functional theory (DFT). However, as device dimensions do not exceed a few nanometers anymore, electrons are confined in ultra-small volumes, giving rise to strong electron-electron interactions. To account for these critical effects, DFT+NEGF solvers should be extended with the GW approximation, which massively increases their computational intensity. Here, we present the first implementation of the NEGF+GW scheme capable of handling NRFET geometries with dimensions comparable to experiments. This package, called QuaTrEx, makes use of a novel spatial domain decomposition scheme, can treat devices made of up to 84,480 atoms, scales very
Scientists are rethinking the universe’s deepest mysteries using numerical relativity, complex computer simulations of Einstein’s equations in extreme conditions. This method could help explore what happened before the Big Bang, test theories of cosmic inflation, investigate multiverse collisions, and even model cyclic universes that endlessly bounce through creation and destruction.
arXiv:2508.13523v1 Announce Type: cross Abstract: Since its inception in 1995, LAMMPS has grown to be a world-class molecular dynamics code, with thousands of users, over one million lines of code, and multi-scale simulation capabilities. We discuss how LAMMPS has adapted to the modern heterogeneous computing landscape by integrating the Kokkos performance portability library into the existing C++ code. We investigate performance portability of simple pairwise, many-body reactive, and machine-learned force-field interatomic potentials. We present results on GPUs across different vendors and generations, and analyze performance trends, probing FLOPS throughput, memory bandwidths, cache capabilities, and thread-atomic operation performance. Finally, we demonstrate strong scaling on all current US exascale machines -- OLCF Frontier, and ALCF Aurora, and NNSA El Capitan -- for the three potentials.
Simulations still can't predict exactly when an earthquake will happen, but with the incredible processing power of modern exascale supercomputers, they can now predict how they will happen and how much damage they will likely cause.
Scientists at Lawrence Livermore National Laboratory (LLNL) have helped develop an advanced, real-time tsunami forecasting system—powered by El Capitan, the world's fastest supercomputer—that could dramatically improve early warning capabilities for coastal communities near earthquake zones.
arXiv:2508.06710v1 Announce Type: new Abstract: Computational Fluid Dynamics (CFD) simulations are often constrained by the memory-bound nature of sparse matrix-vector operations, which eventually limits performance on modern high-performance computing (HPC) systems. This work introduces a novel approach to increase arithmetic intensity in CFD by leveraging repeated matrix block structures. The method transforms the conventional sparse matrix-vector product (SpMV) into a sparse matrix-matrix product (SpMM), enabling simultaneous processing of multiple right-hand sides. This shifts the computation towards a more compute-bound regime by reusing matrix coefficients. Additionally, an inline mesh-refinement strategy is proposed: simulations initially run on a coarse mesh to establish a statistically steady flow, then refine to the target mesh. This reduces the wall-clock time to reach transition, leading to faster convergence with equivalent computational cost. The methodology is evaluated
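The SpMV-to-SpMM transformation the abstract describes can be shown in a toy CSR (compressed sparse row) kernel: applying the matrix to several right-hand sides at once loads each stored coefficient once and reuses it across columns, which is what raises arithmetic intensity. This is a minimal illustrative sketch, not the paper's implementation; all function names are made up here.

```python
# Toy CSR sparse matrix-vector (SpMV) vs. matrix-matrix (SpMM) product.
# In SpMM, each coefficient A_ij is loaded once and reused for every
# right-hand side -- the shift toward a compute-bound regime the
# abstract describes. Illustrative sketch only.

def spmv(indptr, indices, data, x):
    """y = A @ x for a CSR matrix: each stored coefficient is used once."""
    n = len(indptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

def spmm(indptr, indices, data, X):
    """Y = A @ X for m right-hand sides: A_ij is loaded once, reused m times."""
    n, m = len(indptr) - 1, len(X[0])
    Y = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            a, j = data[k], indices[k]   # load the coefficient once
            for c in range(m):           # reuse it for every column
                Y[i][c] += a * X[j][c]
    return Y

# 3x3 example: A = [[2,0,1],[0,3,0],[4,0,5]] in CSR form, two right-hand sides.
indptr, indices, data = [0, 2, 3, 5], [0, 2, 1, 0, 2], [2.0, 1.0, 3.0, 4.0, 5.0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# SpMM gives the same answer as column-by-column SpMV.
cols = [spmv(indptr, indices, data, [row[c] for row in X]) for c in (0, 1)]
assert spmm(indptr, indices, data, X) == [list(p) for p in zip(*cols)]
```

The inner reuse loop is the whole point: for m right-hand sides, the same memory traffic over the matrix supports m times the floating-point work, so a memory-bound SpMV becomes a better-balanced SpMM.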
Tesla has pulled the plug on Dojo, scrapping the AI supercomputer project that Elon Musk once called essential
Pete Bannon, who joined Tesla from Apple in 2016, is leaving the electric vehicle maker.
As Greenland's ice retreats, it's fueling tiny ocean organisms. To test why, scientists turned to a computer model from JPL and MIT that's been called a laboratory in itself.
arXiv:2507.20719v1 Announce Type: new Abstract: Our fully kinetic, implicit Particle-in-Cell (PIC) simulations of global magnetospheres on up to 32,768 of El Capitan's AMD Instinct MI300A Accelerated Processing Units (APUs) represent an unprecedented computational capability that addresses a fundamental challenge in space physics: resolving the multi-scale coupling between microscopic (electron-scale) and macroscopic (global-scale) dynamics in planetary magnetospheres. The implicit scheme of iPIC3D supports time steps and grid spacing that are up to 10 times larger than those of explicit methods, without sacrificing physical accuracy. This enables the simulation of magnetospheres while preserving fine-scale electron physics, which is critical for key processes such as magnetic reconnection and plasma turbulence. Our algorithmic and technological innovations include GPU-optimized kernels, particle control, and physics-aware data compression using Gaussian Mixture Models. With
Scientists at the University of Stuttgart's Institute of Aerodynamics and Gas Dynamics (IAG) have produced a novel dataset that will improve the development of turbulence models. With the help of the Hawk supercomputer at the High-Performance Computing Center Stuttgart (HLRS), investigators in the laboratory of Dr. Christoph Wenzel conducted a large-scale direct numerical simulation of a spatially evolving turbulent boundary layer.
arXiv:2507.16697v1 Announce Type: new Abstract: Turbulence plays a crucial role in multiphysics applications, including aerodynamics, fusion, and combustion. Accurately capturing turbulence's multiscale characteristics is essential for reliable predictions of multiphysics interactions, but remains a grand challenge even for exascale supercomputers and advanced deep learning models. The extreme-resolution data required to represent turbulence, ranging from billions to trillions of grid points, pose prohibitive computational costs for models based on architectures like vision transformers. To address this challenge, we introduce a multiscale hierarchical Turbulence Transformer that reduces sequence length from billions to a few millions and a novel RingX sequence parallelism approach that enables scalable long-context learning. We perform scaling and science runs on the Frontier supercomputer. Our approach demonstrates excellent performance up to 1.1 EFLOPS on 32,768 AMD GPUs, with a
The Isambard-AI supercomputer is made fully operational as the government unveils fresh AI plans.
arXiv:2507.11512v1 Announce Type: cross Abstract: Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by dense matrix operations have seen substantial speedups by utilizing low precision formats such as FP16. However, a majority of scientific simulation applications are memory bandwidth limited. Beyond preliminary studies, the practical gain from using mixed-precision algorithms on a given HPC system is largely unclear. The High Performance GMRES Mixed Precision (HPG-MxP) benchmark has been proposed to measure the useful performance of a HPC system on sparse matrix-based mixed-precision applications. In this work, we present a highly optimized implementation of the HPG-MxP benchmark for an exascale system and describe our algorithm enhancements. We show for the first time a speedup of 1.6x using a
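The pattern behind mixed-precision solvers like the one benchmarked here is iterative refinement: do the expensive solve in low precision, then recover full double-precision accuracy with residual corrections computed in FP64. The toy below uses a direct 2x2 solve as a stand-in for GMRES and emulates FP32 with a struct round-trip; it is a sketch of the idea, not the HPG-MxP code, and all names are illustrative.

```python
# Toy mixed-precision iterative refinement. The low-precision solve is
# cheap but inexact; FP64 residuals drive the answer back to full
# accuracy. Illustrative sketch, not the HPG-MxP benchmark code.
import struct

def fp32(x):
    """Round a Python float (FP64) to FP32 via a struct round-trip."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve2(A, b):
    """Direct 2x2 solve by Cramer's rule (stands in for a GMRES solve)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (b[1] * A[0][0] - b[0] * A[1][0]) / det]

def mixed_solve(A, b, iters=3):
    A32 = [[fp32(v) for v in row] for row in A]          # low-precision matrix
    x = [fp32(v) for v in solve2(A32, [fp32(v) for v in b])]
    for _ in range(iters):
        # Residual in full FP64 -- the step that restores accuracy.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
        d = solve2(A32, [fp32(v) for v in r])            # cheap correction
        x = [x[i] + d[i] for i in range(2)]
    return x

A = [[4.1, 1.2], [1.3, 3.7]]   # entries not exactly representable in FP32
b = [1.0, 2.0]
x = mixed_solve(A, b)
exact = solve2(A, b)
assert max(abs(x[i] - exact[i]) for i in range(2)) < 1e-10
```

Each refinement pass shrinks the error by roughly the FP32 rounding level, so a few cheap low-precision corrections reach FP64 accuracy; the open question the benchmark probes is how much of this wins anything on memory-bandwidth-limited sparse problems.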
arXiv:2507.09337v1 Announce Type: new Abstract: Heterogeneity is the prevalent trend in the rapidly evolving high-performance computing (HPC) landscape in both hardware and application software. The diversity in hardware platforms, currently comprising various accelerators and a future possibility of specializable chiplets, poses a significant challenge for scientific software developers aiming to harness optimal performance across different computing platforms while maintaining the quality of solutions when their applications are simultaneously growing more complex. Code synthesis and code generation can provide mechanisms to mitigate this challenge. We have developed a toolchain, ORCHA, which arises from the needs of a large multiphysics simulation software, Flash-X, which were not met by any of the existing solutions. ORCHA is composed of three stand-alone tools -- one to express high-level control flow and a map of what to execute where on the platform, a second one to express
Our Milky Way could have many more satellite galaxies than we've detected so far. They're just too faint to be seen.
Kenneth Merz, Ph.D., of Cleveland Clinic's Center for Computational Life Sciences and his team are exploring how quantum computers can work with supercomputers to better simulate molecular behavior.
Working in tandem, a quantum computer and a supercomputer modelled the behaviour of several molecules, paving the way for useful applications in chemistry and pharmaceutical research
Elon Musk's artificial intelligence startup attained an official permit to power its Memphis supercomputer facility using natural gas-burning turbines.
Researchers have used machine learning to dramatically speed up the processing time when simulating galaxy evolution coupled with supernova explosion. This approach could help us understand the origins of our own galaxy, particularly the elements essential for life in the Milky Way.
Researchers have successfully demonstrated quantum speedup in kernel-based machine learning.
Scientists have built a compact physical qubit with built-in error correction, and now say it could be scaled into a 1,000-qubit machine that is small enough to fit inside a data center. They plan to release this machine in 2031.