Jul 18, 2025
3 pandas Workflows That Slowed to a Crawl on Large Datasets—Until We Turned on GPUs
If you work with pandas, you’ve probably hit the wall. It’s that moment when your trusty workflow, so elegant on smaller datasets, grinds to a halt on a...
4 MIN READ

Jul 17, 2025
NVIDIA Canary‑Qwen‑2.5B: Open‑Source ASR/LLM for Superior Transcription and Summarization
Top‑ranked on the Hugging Face Open‑ASR leaderboard, the model is production‑ready.
1 MIN READ

Jul 17, 2025
Feature Engineering at Scale: Optimizing ML Models in Semiconductor Manufacturing with NVIDIA CUDA‑X Data Science
In our previous post, we introduced the setup of predictive modeling in chip manufacturing and operations, highlighting common challenges such as imbalanced...
6 MIN READ

Jul 16, 2025
CUTLASS 3.x: Orthogonal, Reusable, and Composable Abstractions for GEMM Kernel Design
GEMM optimization on GPUs is a modular problem. Performant implementations need to specify hyperparameters such as tile shapes, math and copy instructions, and...
12 MIN READ

Jul 10, 2025
From Terabytes to Turnkey: AI-Powered Climate Models Go Mainstream
In the race to understand our planet’s changing climate, speed and accuracy are everything. But today’s most widely used climate simulators often struggle:...
7 MIN READ

Jul 10, 2025
InfiniBand Multilayered Security Protects Data Centers and AI Workloads
In today’s data-driven world, security isn't just a feature—it's the foundation. With the exponential growth of AI, HPC, and hyperscale cloud computing, the...
6 MIN READ

Jul 10, 2025
Accelerating Video Production and Customization with GliaCloud and NVIDIA Omniverse Libraries
The proliferation of generative AI video models, along with the new workflows these models have introduced, has significantly accelerated production efficiency...
4 MIN READ

Jul 09, 2025
Reinforcement Learning with NVIDIA NeMo-RL: Reproducing a DeepScaleR Recipe Using GRPO
Reinforcement learning (RL) is the backbone of interactive AI. It is fundamental for teaching agents to reason and learn from human preferences, enabling...
5 MIN READ

Jul 09, 2025
Delivering the Missing Building Blocks for NVIDIA CUDA Kernel Fusion in Python
C++ libraries like CUB and Thrust provide high-level building blocks that enable NVIDIA CUDA application and library developers to write speed-of-light code...
5 MIN READ

Jul 07, 2025
Think Smart and Ask an Encyclopedia-Sized Question: Multi-Million Token Real-Time Inference for 32X More Users
Modern AI applications increasingly rely on models that combine huge parameter counts with multi-million-token context windows. Whether it is AI agents...
8 MIN READ

Jul 07, 2025
NVIDIA cuQuantum Adds Dynamics Gradients, DMRG, and Simulation Speedup
NVIDIA cuQuantum is an SDK of optimized libraries and tools that accelerate quantum computing emulations at both the circuit and device level by orders of...
5 MIN READ

Jul 07, 2025
Turbocharging AI Factories with DPU-Accelerated Service Proxy for Kubernetes
As AI evolves to planning, research, and reasoning with agentic AI, workflows are becoming increasingly complex. To deploy agentic AI applications efficiently,...
6 MIN READ

Jul 07, 2025
LLM Inference Benchmarking: Performance Tuning with TensorRT-LLM
This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to benchmark LLM inference...
11 MIN READ

Jul 03, 2025
RAPIDS Adds GPU Polars Streaming, a Unified GNN API, and Zero-Code ML Speedups
RAPIDS, a suite of NVIDIA CUDA-X libraries for Python data science, released version 25.06, introducing exciting new features. These include a Polars GPU...
6 MIN READ

Jul 03, 2025
New Video: Build Self-Improving AI Agents with the NVIDIA Data Flywheel Blueprint
AI agents powered by large language models are transforming enterprise workflows, but high inference costs and latency can limit their scalability and user...
2 MIN READ

Jul 02, 2025
Advanced NVIDIA CUDA Kernel Optimization Techniques: Handwritten PTX
As accelerated computing continues to drive application performance in all areas of AI and scientific computing, there's a renewed interest in GPU optimization...
11 MIN READ