From the course: Large Language Models on AWS: Building and Deploying Open-Source LLMs
Key concepts in llama.cpp walkthrough
- [Instructor] Let's talk through this Qwen2.5 Coder deployment pipeline, a comprehensive guide, from a high-level view. This guide is really interesting because it shows probably the most cutting-edge local AI coding assistant workflow you can use, because we use llama.cpp and we have full control of every single step. This process involves several stages, and each one plays a crucial role in making the model run efficiently on my specific hardware. First up, we have the Hugging Face model download stage. Hugging Face is a central repository for AI models; you can think of it as a GitHub for AI. It hosts thousands of models, including this Qwen2.5 Coder from Alibaba, and it provides access to a state-of-the-art coding assistant that's on par with many commercial models. The first piece of heavy lifting is downloading the model itself, and it's 32 gigabytes, so it's a huge, huge model. Then we go through and use the Hugging Face CLI to download it, and…
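As a minimal sketch of that download stage (not code from the course itself), here is what pulling the model with the huggingface_hub Python library could look like. The repo id Qwen/Qwen2.5-Coder-32B-Instruct and the local directory are assumptions for illustration; the video uses the Hugging Face CLI, which does the same thing from the shell.

```python
# Minimal sketch of the Hugging Face model download stage described above.
# Assumptions: the exact repo id and local target directory are illustrative;
# swap in the variant you actually intend to deploy.
from huggingface_hub import snapshot_download

# Downloads every file in the model repository (tens of gigabytes for the
# larger Qwen2.5 Coder variants) into a local directory that llama.cpp's
# conversion tooling can then read.
model_path = snapshot_download(
    repo_id="Qwen/Qwen2.5-Coder-32B-Instruct",  # assumed model variant
    local_dir="models/qwen2.5-coder",           # assumed target path
)
print(f"Model downloaded to: {model_path}")
```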
Contents
- Implications of Amdahl's law: A walkthrough (4m 5s)
- Compiling llama.cpp demo (4m 17s)
- GGUF file format (3m 18s)
- Python UV scripting (3m 55s)
- Python UV packaging overview (1m 59s)
- Key concepts in llama.cpp walkthrough (4m 37s)
- GGUF quantized llama.cpp end-to-end demo (4m 3s)
- Llama.cpp on AWS G5 demo (4m 20s)