<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Llama.cpp on Ante Kapetanovic</title><link>https://antekapetanovic.com/tags/llama.cpp/</link><description>Recent content in Llama.cpp on Ante Kapetanovic</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 18 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://antekapetanovic.com/tags/llama.cpp/index.xml" rel="self" type="application/rss+xml"/><item><title>Ollama vs. llama.cpp vs. MLX with Qwen3.5 35B on Apple Silicon</title><link>https://antekapetanovic.com/blog/qwen3.5-apple-silicon-benchmark/</link><pubDate>Wed, 18 Mar 2026 00:00:00 +0000</pubDate><guid>https://antekapetanovic.com/blog/qwen3.5-apple-silicon-benchmark/</guid><description>&lt;p&gt;Local LLMs are fast enough for real coding work now.
You can easily fit a very capable model on a machine with 32GB of RAM. On Apple Silicon with the right engine, you get 100+ tokens per second from a model like Qwen3.5 35B that can write, refactor, and debug production code. With only 3B active parameters, it offers reliable tool calling inside coding agents. Check out what the CTO of Hugging Face has to say about it (&lt;a href="https://www.linkedin.com/posts/julienchaumond_if-you-like-claude-codecodex-and-have-at-activity-7435342353147793408-Q3hB?utm_source=share&amp;amp;utm_medium=member_desktop&amp;amp;rcm=ACoAACtgcOkBKUG5ONYK2d6W7FvFuEzi26FO5LU"&gt;post on LinkedIn&lt;/a&gt;).&lt;/p&gt;
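&lt;p&gt;A minimal sketch of how throughput numbers like these can be measured with llama.cpp's bundled &lt;code&gt;llama-bench&lt;/code&gt; tool; the GGUF filename below is a placeholder for whatever local quantization you actually have, not a real released file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Report prompt-processing (-p) and generation (-n) speed in tokens/s.
# The model path is hypothetical; point it at your own GGUF file.
llama-bench -m ./qwen3.5-35b-a3b-Q4_K_M.gguf -p 512 -n 128
&lt;/code&gt;&lt;/pre&gt;</description></item></channel></rss>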