Jul 18, 2025
3 pandas Workflows That Slowed to a Crawl on Large Datasets—Until We Turned on GPUs
If you work with pandas, you’ve probably hit the wall. It’s that moment when your trusty workflow, so elegant on smaller datasets, grinds to a halt on a...
4 MIN READ

Jul 17, 2025
NVIDIA Canary‑Qwen‑2.5B: Open‑Source ASR/LLM for Superior Transcription and Summarization
Top‑ranked on the Hugging Face Open‑ASR leaderboard, the model is production‑ready.
1 MIN READ

Jul 17, 2025
Feature Engineering at Scale: Optimizing ML Models in Semiconductor Manufacturing with NVIDIA CUDA‑X Data Science
In our previous post, we introduced the setup of predictive modeling in chip manufacturing and operations, highlighting common challenges such as imbalanced...
6 MIN READ

Jul 16, 2025
CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design
GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and...
12 MIN READ

Jul 10, 2025
From Terabytes to Turnkey: AI-Powered Climate Models Go Mainstream
In the race to understand our planet’s changing climate, speed and accuracy are everything. But today’s most widely used climate simulators often struggle:...
7 MIN READ

Jul 10, 2025
InfiniBand Multilayered Security Protects Data Centers and AI Workloads
In today’s data-driven world, security isn't just a feature—it's the foundation. With the exponential growth of AI, HPC, and hyperscale cloud computing, the...
6 MIN READ

Jul 10, 2025
Accelerating Video Production and Customization with GliaCloud and NVIDIA Omniverse Libraries
The proliferation of generative AI video models, along with the new workflows these models have introduced, has significantly accelerated production efficiency...
4 MIN READ

Jul 09, 2025
Reinforcement Learning with NVIDIA NeMo-RL: Reproducing a DeepScaleR Recipe Using GRPO
Reinforcement learning (RL) is the backbone of interactive AI. It is fundamental for teaching agents to reason and learn from human preferences, enabling...
5 MIN READ

Jul 09, 2025
Delivering the Missing Building Blocks for NVIDIA CUDA Kernel Fusion in Python
C++ libraries like CUB and Thrust provide high-level building blocks that enable NVIDIA CUDA application and library developers to write speed-of-light code...
5 MIN READ

Jul 07, 2025
Think Smart and Ask an Encyclopedia-Sized Question: Multi-Million Token Real-Time Inference for 32X More Users
Modern AI applications increasingly rely on models that combine huge parameter counts with multi-million-token context windows. Whether it is AI agents...
8 MIN READ

Jul 07, 2025
NVIDIA cuQuantum Adds Dynamics Gradients, DMRG, and Simulation Speedup
NVIDIA cuQuantum is an SDK of optimized libraries and tools that accelerate quantum computing emulations at both the circuit and device level by orders of...
5 MIN READ

Jul 07, 2025
Turbocharging AI Factories with DPU-Accelerated Service Proxy for Kubernetes
As AI evolves to planning, research, and reasoning with agentic AI, workflows are becoming increasingly complex. To deploy agentic AI applications efficiently,...
6 MIN READ

Jul 07, 2025
LLM Inference Benchmarking: Performance Tuning with TensorRT-LLM
This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to benchmark LLM inference...
11 MIN READ

Jul 03, 2025
RAPIDS Adds GPU Polars Streaming, a Unified GNN API, and Zero-Code ML Speedups
RAPIDS, a suite of NVIDIA CUDA-X libraries for Python data science, released version 25.06, introducing exciting new features. These include a Polars GPU...
6 MIN READ

Jul 03, 2025
New Video: Build Self-Improving AI Agents with the NVIDIA Data Flywheel Blueprint
AI agents powered by large language models are transforming enterprise workflows, but high inference costs and latency can limit their scalability and user...
2 MIN READ

Jul 02, 2025
Advanced NVIDIA CUDA Kernel Optimization Techniques: Handwritten PTX
As accelerated computing continues to drive application performance in all areas of AI and scientific computing, there's a renewed interest in GPU optimization...
11 MIN READ