Benchmark Different Capacities for EDA Workloads on Microsoft HPC Storages
Overview

Semiconductor (or Electronic Design Automation [EDA]) companies prioritize reducing time to market (TTM), which depends on how quickly tasks such as chip design validation and pre-foundry work can be completed. Faster TTM also helps reduce EDA licensing costs, since finishing jobs sooner means the expensive licenses are tied up for less time. Storage performance is a crucial factor in achieving shorter TTM. As illustrated in the article "Benefits of using Azure NetApp Files for Electronic Design Automation (EDA)" (1*), with the Large Volume feature, which requires a minimum size of 50TB, a single Azure NetApp Files Large Volume can reach an I/O rate of up to 652,260 at 2 ms latency, and 826,379 at the performance edge (~7 ms).

Objective

In real-world production, EDA files, such as tools, libraries, temporary files, and output, are usually stored in different volumes with varying capacities. Not every EDA job needs extremely high I/O rates or throughput. Additionally, cost is a key consideration, since larger volumes are more expensive. The objective of this article is to share benchmark results for different storage volume sizes: 50TB, 100TB, and 500TB, all using the Large Volume feature. We also included a 32TB case, where the Large Volume feature isn't available on ANF, for comparison with Azure Managed Lustre File System (AMLFS), another Microsoft HPC storage solution. These benchmark results can help customers evaluate their real-world needs, considering factors like capacity, I/O rate, throughput, and cost.

Testing Method

EDA workloads are classified into two primary types, frontend and backend, each with distinct requirements for the underlying storage and compute infrastructure. Frontend workloads focus on logic design and the functional aspects of chip design and consist of thousands of short-duration parallel jobs with an I/O pattern characterized by frequent random reads and writes across millions of small files. Backend workloads focus on translating the logic design into the physical design for manufacturing and consist of hundreds of jobs involving sequential reads and writes of fewer, larger files. Choosing a storage solution that meets this unique mix of frontend and backend workload patterns is non-trivial. Both workload types are very demanding on storage: standard industry benchmarks indicate a high I/O profile that includes a substantial amount of NFS access, lookup, create, getattr, link and unlink operations, as well as small and large file read and write operations. This blog contains the output from performance testing with an industry-standard EDA benchmark. For this particular workload, the benchmark represents the I/O blend typical of a company running both frontend and backend EDA workloads in parallel.

Testing Environment

We used 10 E64dsv5 client VMs connecting to a single ANF or AMLFS volume, with the nconnect mount option (for ANF), to ensure we could generate enough load for the benchmark. The client VM tuning and configuration are the same as those specified in (1*).

ANF mount options: nocto,actimeo=600,hard,rsize=262144,wsize=262144,vers=3,tcp,noatime,nconnect=8
AMLFS mount: sudo mount -t lustre -o noatime,flock

All resources reside in the same VNET and, where possible, the same Proximity Placement Group to ensure low network latency; the full mount commands are sketched after Figure 1.

Figure 1. High level architecture of the testing environment
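For reference, here is a minimal sketch of the two client-side mount commands implied by the options above. The IP addresses, export path, filesystem name, and mount points are placeholders, and it assumes the NFS and Lustre client packages are already installed on the client VMs.

```
# Azure NetApp Files Large Volume over NFSv3 with nconnect
# (server IP and export path are placeholders)
sudo mkdir -p /mnt/eda
sudo mount -t nfs \
  -o nocto,actimeo=600,hard,rsize=262144,wsize=262144,vers=3,tcp,noatime,nconnect=8 \
  10.0.0.4:/eda-vol /mnt/eda

# Azure Managed Lustre (AMLFS) via the Lustre client
# (MGS address and filesystem name are placeholders)
sudo mkdir -p /mnt/amlfs
sudo mount -t lustre -o noatime,flock 10.0.1.4@tcp:/lustrefs /mnt/amlfs
```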
Benchmark Results

EDA jobs are highly latency sensitive. For today's more complex chip designs, 2 milliseconds of latency per EDA operation is generally seen as the ideal target, while the performance edge is around 7 milliseconds. We list the I/O rates achieved at both latency points for easier reference. Throughput (in MB/s) is also included, as it is essential for many backend tasks and the output phase. (Figure 2, Figure 3, Figure 4, and Table 1.)

For cases where the Large Volume feature is enabled, we observe the following: 100TB with the Ultra tier, and 500TB with the Standard, Premium, or Ultra tier, can reach over 640,000 I/O rate at 2 ms latency. This is consistent with the 652,260 stated in (1*). A 500TB Ultra volume can even reach 705,500 I/O rate at 2 ms latency. For workloads that do not require as much I/O, either 50TB with the Ultra tier or 100TB with the Premium tier can reach a 500,000 I/O rate. For an even smaller job, 50TB with the Premium tier reaches 255,000 at a lower cost. For scenarios where throughput is critical, 500TB with the Standard, Premium, or Ultra tier can all reach 10-12 GB/s of throughput.

Figure 2. Latency vs. I/O rate: Azure NetApp Files - one Large Volume
Figure 3. Achieved I/O rate at 2 ms latency & performance edge (~7 ms): Azure NetApp Files - one Large Volume
Figure 4. Achieved throughput (MB/s) at 2 ms latency & performance edge (~7 ms): Azure NetApp Files - one Large Volume
Table 1. Achieved I/O rate and throughput at both latency points: Azure NetApp Files - one Large Volume

For cases with less than 50TB of capacity, where the Large Volume feature is not available on ANF, we included Azure Managed Lustre File System (AMLFS) for comparison. With the same 32TB volume size, a regular ANF volume achieves about 90,000 I/O at 2 ms latency, while an AMLFS Ultra volume (500 MB/s/TiB) can reach roughly double that, around 195,000. This shows that AMLFS is the better choice for performance when the Large Volume feature isn't available on ANF. (Figure 5.)

Figure 5. Achieved I/O rate at 2 ms latency: ANF regular volume vs. AMLFS

Summary

This article shared benchmark results for different storage capacities needed for EDA workloads, including 50TB, 100TB, and 500TB volumes with the Large Volume feature enabled. It also compared a 32TB volume (where the Large Volume feature isn't available on ANF) to Azure Managed Lustre File System (AMLFS), another Microsoft HPC storage option. These results can help customers choose or design storage that best fits their needs by balancing capacity, I/O rate, throughput, and cost.

With the Large Volume feature, 100TB Ultra and 500TB Standard, Premium, or Ultra tiers can achieve over 640,000 I/O at 2 ms latency. For jobs that need less I/O, 50TB Ultra or 100TB Premium can reach 500,000, while 50TB Premium offers 255,000 at a lower cost. When throughput matters most, 500TB volumes across all tiers can deliver 10-12 GB/s. If you have a smaller job or can't use the Large Volume feature, Azure Managed Lustre File System (AMLFS) gives you better performance than a regular ANF volume.

A final reminder: this article primarily provides benchmark results to help semiconductor customers design their storage solutions, considering capacity, I/O rate, throughput, and cost. It does not address other important criteria, such as heterogeneous integration or legacy compliance, which also matter when selecting an appropriate storage solution.

References

Benefits of using Azure NetApp Files for Electronic Design Automation (EDA)
Learn more about Azure Managed Lustre
Introducing NVads V710 v5 series VMs

Cost-optimized AI inference, virtual workstations, and cloud gaming. AI inferencing and graphics-intensive applications continue to demand cost-effective, low-power, high-performance GPUs with more GPU memory and faster CPUs. Today we are thrilled to announce the General Availability of NVads V710 v5-series virtual machines (VMs) to expand our NV VM lineup to meet these very needs. As we mentioned in our previous preview announcement, customers running small-to-medium AI/ML inferencing workloads, Virtual Desktop Infrastructure (VDI), visualization, and cloud gaming workloads need NVads V710 v5-series VMs. Each VM is powered by an AMD Radeon™ Pro V710 GPU with up to 28 GB of GPU memory, which easily serves popular small open-source language models and handles the most demanding visualization scenarios. On top of this, the vCPU cores are backed by high-frequency 4th Generation AMD EPYC™ CPUs (3.9 GHz base and 4.3 GHz max frequency) for compute-intensive workloads that demand both CPU and GPU performance.

Right size the VM for your workload needs.

With NVads V710 v5 VMs, you only pay for the GPU/CPU compute you need. GPU partitioning capabilities enable customers to allocate fractions of the GPU, as small as 1/6th of a V710 GPU, according to their workload requirements. This flexibility is ideal for customers that need to support a variety of inferencing and graphical workloads efficiently without requiring a full GPU for each application. The series provides several options ranging from 1/6 of a GPU with 4 GB of memory, perfect for lightweight virtual desktop experiences, to a full V710 GPU with a massive 28 GB for graphics-intensive engineering applications or AI.

Out of the box performance with ROCm and Radeon PRO Graphics.

AMD Radeon PRO Graphics provides a seamless and reliable experience for visualization-focused workloads. The GPU and drivers are optimized or certified for all the major ISV solutions from vendors such as Adobe and Autodesk. They also support the latest ROCm releases and are designed to seamlessly integrate with popular machine learning frameworks like PyTorch, Triton, ONNX, and vLLM to serve small to medium language models.
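As an illustration only, here is a minimal sketch of serving a small open-source model with vLLM's OpenAI-compatible server on such a VM. It assumes a ROCm-enabled PyTorch/vLLM installation and a model that fits in the available GPU memory; the model name, port, and context length are placeholders, not a tested or recommended configuration.

```
# Minimal sketch: serve a small open-source model with vLLM's
# OpenAI-compatible server (assumes a ROCm build of PyTorch/vLLM is installed;
# model name, port, and context length are illustrative placeholders).
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --max-model-len 8192 \
  --port 8000

# Query it from another shell via the OpenAI-compatible REST API:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```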
Upgrade to get up to 2.5x boost.

Our own internal benchmarks of popular VDI, rendering, and visualization tests show NVads V710 v5 VMs are up to 2.5x faster when compared to NV v4 VMs. This means you can get more work done, faster, and have an overall better experience.

Customer Momentum

At Azure, we work closely with our partners and customers so they can take full advantage of these new VMs and accelerate their applications. Listen to what our partners at Dizzion, Cognata, and NAIO had to say.

"The new Azure NVads V710 instances, powered by AMD Radeon Pro V710 GPUs, offer exceptional performance and flexibility at competitive prices. Dizzion Desktop as a Service customers delivering CAD, BIM, edge AI, and other high-performance workloads have eagerly awaited this addition to the market." – Ruben Spruijt, Field CTO, Dizzion

"In our experience, the V710 delivers excellent performance across both CPU and GPU workloads, making it a highly capable platform for a wide range of use cases. It offers a robust and reliable software stack, particularly well-suited for OpenGL and machine learning applications." – Danny Atsmon, CEO, Cognata

"We've tested the V710 thoroughly across a range of AI workloads, and the performance has really impressed us. It's fast, stable, and scales well across different scenarios. It's become a reliable, cost-effective part of our stack, and we'll keep building on top of it as we expand our projects." – Dr.-Ing. Timo Sämann, Chief AI Scientist, NAIO

Product Details

vCPUs: 4th Generation AMD EPYC™ CPU. Configurations from 4 to 28 vCPUs (3.95 GHz base, 4.3 GHz max).
Memory: 16 GB to 160 GB
GPU: AMD Radeon PRO V710 GPU with 28 GB GDDR6 memory. 1/6, 1/3, 1/2, or full GPU.
Storage: Up to 1 TB temporary disk
Networking: Up to 80 Gbps Azure Accelerated Networking

NVads V710 v5 VMs, now available in 5 Azure regions.

We are happy to announce NVads V710 v5 VMs are now available in East US, North Central US, South Central US, West US, and West Europe. To learn more about our VMs and how to get started, please visit our documentation page.
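For readers who want to try the series, here is a minimal, hedged Azure CLI sketch. The resource group, VM name, image alias, and region are placeholders, and the --size value is an assumed SKU name for the full-GPU option, so confirm the exact size names available in your region before using it.

```
# Minimal sketch: create an NVads V710 v5 VM with the Azure CLI.
# Resource group, VM name, image alias, and region are placeholders;
# the --size value is an assumed SKU name, so verify it first with:
#   az vm list-sizes --location southcentralus
az group create --name v710-demo-rg --location southcentralus

az vm create \
  --resource-group v710-demo-rg \
  --name v710-infer-01 \
  --image Ubuntu2204 \
  --size Standard_NV24ads_V710_v5 \
  --admin-username azureuser \
  --generate-ssh-keys
```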
Azure NV V710 v5: Empowering Real-Time AI/ML Inferencing and Advanced Visualization in the Cloud

Today, we're excited to introduce the Azure NV V710 v5, the latest VM tailored for small-to-medium AI/ML inferencing, Virtual Desktop Infrastructure (VDI), visualization, and cloud gaming workloads.
Introducing the Azure NMads MA35D: Exclusive Media Processing Powerhouse in the Cloud

Azure is proud to announce the exclusive launch of the NMads MA35D Virtual Machine (VM), a groundbreaking addition to our portfolio specifically designed to redefine media processing and video transcoding in the cloud.
Performance & Scalability of HBv4 and HX-Series VMs with Genoa-X CPUs

Azure has announced the general availability of Azure HBv4-series and HX-series virtual machines (VMs) for high performance computing (HPC). This blog provides in-depth technical and performance information about these HPC-optimized VMs.