<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Llama.cpp on Ante Kapetanovic</title><link>https://antekapetanovic.com/tags/llama.cpp/</link><description>Recent content in Llama.cpp on Ante Kapetanovic</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 18 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://antekapetanovic.com/tags/llama.cpp/index.xml" rel="self" type="application/rss+xml"/><item><title>Ollama vs. llama.cpp vs. MLX with Qwen3.5 35B on Apple Silicon</title><link>https://antekapetanovic.com/blog/qwen3.5-apple-silicon-benchmark/</link><pubDate>Wed, 18 Mar 2026 00:00:00 +0000</pubDate><guid>https://antekapetanovic.com/blog/qwen3.5-apple-silicon-benchmark/</guid><description>&lt;p&gt;Local LLMs are fast enough for real coding work now.
You can easily fit a very capable model on a machine with 32GB of RAM. On Apple Silicon with the right engine, you get 100+ tokens per second from a model like Qwen3.5 35B that can write, refactor, and debug production code. With only 3B active parameters, it offers reliable tool calling inside coding agents. Check out what the CTO of Hugging Face has to say about it (&lt;a href="https://www.linkedin.com/posts/julienchaumond_if-you-like-claude-codecodex-and-have-at-activity-7435342353147793408-Q3hB?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAACtgcOkBKUG5ONYK2d6W7FvFuEzi26FO5LU"&gt;post on LinkedIn&lt;/a&gt;).&lt;/p&gt;
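&lt;p&gt;A minimal sketch of how throughput numbers like these can be measured with llama.cpp's bundled &lt;code&gt;llama-bench&lt;/code&gt; tool; the GGUF filename below is a placeholder for whatever local quantization you actually have, not a real released file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Report prompt-processing (-p) and generation (-n) speed in tokens/s.
# The model path is hypothetical; point it at your own GGUF file.
llama-bench -m ./qwen3.5-35b-a3b-Q4_K_M.gguf -p 512 -n 128
&lt;/code&gt;&lt;/pre&gt;</description></item></channel></rss>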