Conflicted about AI

I’m feeling extremely conflicted about AI. On one hand, I’ve been learning about and using AI recently, and I’ve also been pursuing creative work like audiobook narration. I’ve seen many, many creative people filled with rage about AI, and specifically about people trying to pass off AI-generated works as their own creative output. Even outside of this deception, AI offers “good enough” (highly debatable) versions of art, of music, of speech, and this directly impacts the ability of the people working in these fields to earn money to support themselves, to keep doing what they love and still pay the bills.

George Costanza Me: Was that wrong?

My previous post about AI used a picture that was AI-generated. Immediately following that post, I see demigod-from-the-proto-Internet Jamie Zawinski in my Mastodon feed boosting a post declaring that what I just did was super bad:

Before you add an AI-generated image to your blog post, have you considered saving even more time and just putting “YOU CAN STOP READING NOW” in 120 point text

Goddamn it. I was so pleased, using these cool new tools and coming up with a prompt that got ChatGPT’s Sora to give me roughly what I wanted for an image: a (O)llama relaxing at home, watching TV in the dark. If I hadn’t done this, it’s not like I would have commissioned a human artist or tried to draw something myself; I just would have published the post with no image at all and called it good enough.

Isn’t the net addition of an amusing picture to a blog post a good thing? But also, isn’t there something to the negative sentiment?

Also me: This is Not Good.

AI is disrupting the audiobook world. When I, as a person trying to break into audiobook narration, look to audition for self-published books on Amazon’s ACX platform, the primary way I know a human wrote a given fiction book is: no AI would write this poorly! But when it comes to nonfiction, it’s much harder to know. Do I need to fall back on the classic LLM tells, like bulleted lists and em-dashes, to guess? How do I feel about auditioning to narrate a book possibly generated by an LLM? Should it matter? Am I validating or <something>-washing the approach by giving the book an actual human-produced narration?

At least the book’s author wanted an actual human narration, instead of relying on AI for that too. A human narration is always going to sound better (for now), but when producers of AI-generated books are using a quantity-over-quality approach to flood the marketplace, will that matter? Or will consumers eventually come to accept AI narration as their ears get used to it, much the same way pop music fans are now used to autotuned vocals?

This blog post will eventually be used to train LLMs

LLMs are being trained on the entire accessible internet (sanitized). An LLM is a distillation of all its training data, and it does not refer back to the original sources when responding to a query. There is a vicious cycle in the making: the more we use LLMs, the less incentive there is to publish on the internet, since answers served by an LLM mean fewer visits from actual readers. The less that gets published, the less training data there is for future LLMs, unless their builders resort to synthetic training data.

I do not believe AGI is coming soon, or that scaling LLMs will result in another quantum leap in their capabilities. But even the current level of technology has only begun to disrupt the status quo. Even if it makes mistakes and hallucinates, it is still incredibly useful, but it will still be years before we understand where best to deploy it. Before then, I think we can expect that someone will try to use AI for every possible application, and some of these will fail in unforeseen and incredibly damaging ways.

So yeah. I’m extremely conflicted.

Local LLMs and Ollama

A Llama relaxing at home.

Lately, as part of my time-off learning, I’ve been diving into AI. (Gawd, I know, right?) Of course, LLMs have been such a big deal over the past few years that I feel like I’m a little late to the party, but I’m also NOT late to the party. While big proprietary tools like ChatGPT and Claude are no longer new, development is still happening rapidly! Most interesting to me, smaller models and “open source” models have improved dramatically, and we can now run our own private LLM instances that we can query for free and with no data leakage!

Step 1: LM Studio on a MacBook Air

Download LM Studio. Run it. Download the Qwen3 4B Thinking model. Load it. Chat away. How frickin’ easy is this?? I’m running an LLM and don’t need to worry about OpenAI harvesting my queries, or about paying for a subscription. Apple M-series Macs are actually decent at running LLMs: they have a unified CPU/GPU memory architecture, so a Mac with a lot of RAM can hold and run big models reasonably fast, where you’d otherwise need multiple high-end GPUs with lots of VRAM, costing even more money and power.

But on my 16GB M3 Air, after a bit of this, one does start to notice… it’s a little slow, innit? And, the answers, when asking about obscure things, are craaazy. My favorite is to ask about 80s sitcoms My Two Dads or Who’s the Boss. Qwen3 4B said My Two Dads is about a woman named Grace who marries her two fathers after they divorce. Uh, no.

But still, I am hooked!

Hmm, my pandemic-era gaming machine is sitting idle, I bet it could run some more models…

Step 2: LM Studio on Windows with an RTX 3090

Installing LM Studio on my Windows gaming machine and downloading some larger models that fit within its 3090’s 24GB of VRAM blows my hair back! Better results! And fast! I immediately start making plans to upgrade the system’s RAM and start exploring LM Studio’s server options. Clearly I want a fast and large (or at least medium-sized) LLM: for dumb questions, for smart questions, and for programming assistance, which seems to be the number one killer app for LLMs so far. My editor of choice is Emacs (which you can pry from my cold, dead hands). I jump through a number of hoops to get LM Studio listening on a port, the Windows firewall made aware, and the Emacs gptel package configured.
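For the curious, a quick sanity check of that server from another machine looks roughly like this. LM Studio speaks an OpenAI-compatible API (port 1234 is its default); the hostname and model name below are placeholders for whatever your setup actually uses, so treat it as a sketch:

    # Minimal sketch: query LM Studio's OpenAI-compatible server over the LAN.
    # Assumptions: the server is enabled and reachable at this hostname/port;
    # GET /v1/models lists the model ids actually available.
    import requests

    resp = requests.post(
        "http://gaming-pc:1234/v1/chat/completions",
        json={
            "model": "qwen3-4b-thinking",  # placeholder id; pick one from /v1/models
            "messages": [{"role": "user", "content": "Explain Emacs to a vi user, gently."}],
            "temperature": 0.7,
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])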

Eh, it works ok.

I’ve heard good things about Ollama, let’s try that.

Step 3: Switch to Ollama and Ellama

Ollama on Windows has a GUI, though a simpler one than LM Studio’s: it lets you download models and chat, but that’s about it. The main reasons to use Ollama are: it’s open source (MIT licensed), it seems to handle dynamic model loading pretty well, and its service API is a little richer than LM Studio’s OpenAI-compatible API.
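To give a flavor of that API, here’s a minimal sketch against Ollama’s native /api/chat endpoint (11434 is its default port; the hostname is again a placeholder). Besides the reply itself, the native response includes generation stats like token counts and timings:

    # Minimal sketch: call Ollama's native chat endpoint and peek at its stats.
    # "gaming-pc" is a placeholder hostname; gpt-oss:20b is one of the models I use below.
    import requests

    resp = requests.post(
        "http://gaming-pc:11434/api/chat",
        json={
            "model": "gpt-oss:20b",
            "messages": [{"role": "user", "content": "What was the sitcom My Two Dads about?"}],
            "stream": False,  # return one JSON object instead of a token stream
        },
        timeout=300,
    )
    data = resp.json()
    print(data["message"]["content"])
    # Generation stats from the native API (durations are reported in nanoseconds).
    print(data["eval_count"], "tokens in", data["eval_duration"] / 1e9, "seconds")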

Alas, trying to copy a 23GB model file from LM Studio into Ollama’s models directory doesn’t work; Ollama keeps models in its own content-addressed store, so you can’t just drop a GGUF file in. Looks like some painful re-downloading is in our future.

I ditched gptel and am now using the Ellama package, on my laptop running Emacs on Fedora. After some difficulty getting it to connect to the model server, it’s working. Pretty damn slick. It does a lot. A whole lot! I’m going to need to learn how to use all of this. I keep generating text into source-code buffers, and there are a lot of options to explore. Docs exist, but they’re of the “assume you already know what everything does” variety, which I still don’t. But, very exciting!

Aside: So Many Models!

Here in mid-2025 the models I’ve been using are gpt-oss:20b, deepseek-r1:32b, gemma3:27b, and llama4:scout. Played some with 120b and 70b versions but they’re so much slower I don’t have the patience. I am also using some proprietary LLMs (ChatGPT, Claude) and they are definitely better, but… I dunno, I just really like the idea of using a local LLM. Smaller models and distillations keep getting better. Maybe I missed the boat on LLMs initially, but I think I started playing with local LLMs at the perfect time.
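By the way, the Ollama server will happily tell you which models it has pulled, via its /api/tags endpoint (the same information as ollama list); the hostname is a placeholder as usual:

    # Minimal sketch: list the models available on the Ollama server.
    import requests

    for m in requests.get("http://gaming-pc:11434/api/tags", timeout=30).json()["models"]:
        print(m["name"], round(m["size"] / 1e9, 1), "GB")  # size is reported in bytes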

Really interesting to see where models are coming from. Big US tech firms, and also really good-for-their-size Chinese models. HmmmMMMmmmm.

Step 4: LLM Server RAM Upgrade

It had 32GB. I upgrade it to 128GB, the max the motherboard supports. That lets me run larger (e.g. 70B) models, although a bit more slowly, even with Ollama offloading as much of the model as it can into the 3090’s VRAM.

One default behavior of Ollama is that it unloads a model after a few minutes of inactivity. This hurts initial queries, since it has to read the 23GB file off storage and load it into VRAM before it can run the LLM. I want to disable this and keep the last model loaded indefinitely.
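If I’m reading the docs right, the knob for this is keep_alive: it can be passed per request (a duration string like “10m”, or -1 to never unload), or set server-wide with the OLLAMA_KEEP_ALIVE environment variable. A sketch of the per-request version, placeholder hostname and all:

    # Minimal sketch: preload a model and tell Ollama not to unload it.
    # An empty prompt to /api/generate just loads the model; keep_alive=-1 means never unload.
    import requests

    requests.post(
        "http://gaming-pc:11434/api/generate",
        json={
            "model": "gpt-oss:20b",
            "prompt": "",      # no generation, just load the model into (V)RAM
            "keep_alive": -1,  # keep it loaded indefinitely
        },
        timeout=300,
    )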

It would also be cool if Ollama could keep multiple models loaded in RAM, although maybe only one could use VRAM at a time. Still working on figuring this out.

Step 5: Browser Support

I install a Firefox plugin called “Ollama Client” on all my machines. (Does it really only have 30 users??? Is no one else using local LLMs yet??) It works well too, once I do some Firefox-specific magic to allow CORS or something.

Conclusion and Next Steps

So yeah, this is all pretty cool. I’ve been watching a lot of videos about LLMs; it’s really interesting to understand how the training and tuning phases work, all the data-scrubbing issues involved, and why LLMs hallucinate citations and other details. (BTW, AI owes a huge debt of gratitude to Wikipedia, Reddit, and GitHub!) AI is not scary; it’s just a tool. Even as a software engineer, I have no fear of being replaced by ChatGPT. It will just be my helper.

But, man! I sure wish GPUs with gobs of VRAM were cheaper! (Join the club, huh?)

Next Steps:

  1. Keep using and learning Ellama
  2. Investigate ollama-rs
  3. Investigate encrypted (i.e. HTTPS) connections to Ollama; unencrypted traffic freaks me out even on my home network.
    • Investigate auth, so I can use my LLM server from elsewhere for free
  4. Tool use/MCPs with Ollama server
  5. Multimodal (e.g. images)