The LLM Hype Train: A Pamphlet You Should Read With Your Manager
What if I told you ChatGPT is the end of software engineering? Would you believe it? Three years ago, OpenAI changed the game in the AI field with ChatGPT. ChatGPT is built on Large Language Models, LLMs for short. Since then, AI has finally made it into the hallways of the majority of companies. Even in Germany.
When AI Hit the Office
I believe that every extreme is bad. That includes the current LLM and agentic hype. There's always a trending topic in tech. It usually starts in academia, catches fire in startups, and soon gets glorified on LinkedIn and every other social media platform. That's not new. The same happened with LLMs. BUT, what is new is how accessible LLMs are, and with that, AI became "salonfähig" - presentable in polite company.
Suddenly, everyone can think of a killer use case for applying this technology and turning around a sinking ship. That's good in the sense that everyone is enabled to come up with ideas. But there's a dark side: people oversimplify what LLMs actually are.
Prompt + text in → Solution out.
So simple. So seductive. So bad for (ML) software engineers. All of a sudden, every regex becomes a prompt. Every problem is solvable - just ask an LLM.
The Illusion of Simplicity
Prompt in, solution out — but at what cost?
Let me quote what Pydantic says on their website as of 12 July 2025:
https://ai.pydantic.dev/logfire/ (Debugging & Monitoring)
> Applications that use LLMs have some challenges that are well known and understood: LLMs are slow, unreliable and expensive.
>
> These applications also have some challenges that most developers have encountered much less often: LLMs are fickle and non-deterministic. Subtle changes in a prompt can completely change a model's performance, and there's no EXPLAIN query you can run to understand why.
>
> Warning
>
> From a software engineer's point of view, you can think of LLMs as the worst database you've ever heard of, but worse.
>
> If LLMs weren't so bloody useful, we'd never touch them.
Before LLMs we had plain LMs - language models. I remember a tutorial at university in which we built a tweet bot for a very active politician on Twitter. That bot could generate one tweet after another. Same basic principle as the first Large Language Models, but much more limited in its general capabilities and much more like a parrot.
Why Autonomous Agents Often Fail in The Real World
Now, just when you thought the hype couldn't get any bigger, agents came along and knocked on your company's door. Agents are LLM-powered bots that autonomously execute tasks using well-defined APIs, typically via MCP (Model Context Protocol). The foundation is still a language model, but wrapped in orchestration logic that chains steps together. The chat interface is what makes it feel so magical. The LLM then decides, in a chain-of-thought-like flow, when and what information it needs to request to fulfill a task.
For example:
You ask your agent to book the cheapest flight from Berlin to Paris for next weekend. It looks up APIs, navigates websites, compares prices, reads content - and bam, your flight has been booked! Très bien.
Except … maybe you are going to Prague. On a Tuesday. In business class. It's bittersweet, because the technology is great, but errors accumulate. In May 2025, at PyCon DE & PyData 2025 in Wiesbaden, I saw a talk called "The Future of AI: Building the Most Impactful Technology Together" by Leandro von Werra, who works at Hugging Face, in which he explained exactly that issue. If an agent solves each of the five subtasks of that one big flight-booking task with 90% accuracy, you end up with 0.9 x 0.9 x 0.9 x 0.9 x 0.9 ≈ 0.59 - a 59% overall success rate. That's almost a Bernoulli experiment - a coin flip, except the coin flip would be cheaper in time and money. Or think of your travel budget as the coin.
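If you want to play with those numbers yourself, the compounding is a one-liner. A tiny sketch below; the 90% per-step accuracy and the step counts are just illustrative figures in the spirit of the talk, not measured values:

```python
# How per-step accuracy compounds over a multi-step agent task.
# Assumes independent steps and a task that fails if any single step fails.

def task_success_rate(per_step_accuracy: float, n_steps: int) -> float:
    return per_step_accuracy ** n_steps

for steps in (1, 3, 5, 10):
    rate = task_success_rate(0.9, steps)
    print(f"{steps:>2} steps at 90% each -> {rate:.0%} overall")

# Output:
#  1 steps at 90% each -> 90% overall
#  3 steps at 90% each -> 73% overall
#  5 steps at 90% each -> 59% overall
# 10 steps at 90% each -> 35% overall
```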
To be fair, APIs - that the agent uses via MCP - can introduce determinism, bringing much more joy to this rigged game. If the function call is stable and predictable, you regain some control. But the agent still decides what input to send — and that’s where the chaos may return.
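To make that division of labour concrete, here is a deliberately simplified sketch of the loop an agent runs. The tool names, the call_llm helper, and the message format are hypothetical placeholders, not a real MCP client or any particular framework:

```python
# Minimal sketch of an agent loop: the LLM repeatedly decides which tool to
# call next until it declares the task done. call_llm(), the tool names, and
# the message format are illustrative placeholders only.

def call_llm(messages: list[dict]) -> dict:
    """Placeholder for the actual model call. Returns either
    {"tool": "search_flights", "args": {...}} or {"final_answer": "..."}."""
    raise NotImplementedError

def search_flights(origin: str, destination: str, date: str) -> list[dict]:
    ...  # deterministic API call in a real system

def book_flight(flight_id: str) -> dict:
    ...  # deterministic API call in a real system

TOOLS = {"search_flights": search_flights, "book_flight": book_flight}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_llm(messages)          # the non-deterministic part
        if "final_answer" in decision:
            return decision["final_answer"]
        tool = TOOLS[decision["tool"]]         # the (hopefully) deterministic part
        result = tool(**decision["args"])      # ...fed with arguments the LLM chose
        messages.append({"role": "tool", "content": str(result)})
    return "Gave up: step budget exhausted."
```

The tool calls themselves are as deterministic as the APIs behind them; the arguments the LLM chooses to pass are not - which is exactly where the chaos can sneak back in.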
LLMs - The Swiss Army Knife of AI Models
Why generalist tools aren’t always the right choice
LLMs are impressive. But they are generalists (so far). Whenever I talk to someone about LLMs and their capabilities, I tell them that I see them as a Swiss Army knife: good at many things, but not specialists, and therefore excellent at only a few. Let's circle back three years. Before LLMs came along, I'd argue we had these major fields:
- Natural Language Processing
- Computer Vision
- Recommender Systems
- Tabular data
… plus fields that cut across all of those, independent of any single one: Explainable AI (XAI), on-device ML, federated learning, and generative AI - the list goes on.
… wait.
> Generative AI?!
Yes, that's right. We had this before. Not only does a Twitter bot count as generative AI; sampling from a learned distribution to generate sophisticated, close-to-real input data counts as generative AI too.
Today, this feels like the ancient way - parts that have been forgotten, buried as relics in many ML temples across the globe. I recently saw an explanatory poster in the coffee corner of the company I work for, which organizes AI terms hierarchically. It went approximately like this:
generative AI (subfield of) → Deep Learning (subfield of) → ML (subfield of) → AI.
That's misleading on two levels. First, generative AI is not restricted to Deep Learning. Yes, you could argue that DL is a subfield of ML and the chain is technically defensible, but it implicitly turns Deep Learning into a requirement for generative AI, which it isn't. Second, our ML/AI zoo is full of plenty of other beautiful technologies and fields. Don't reduce it all to gen AI. Educate holistically, and don't just serve people the sugar they already had anyway.
Coming back to the Swiss Army knife analogy: let's take a sentiment classification use case. I've seen this plenty of times before LLMs were a thing. What's the solution?
> “A supervised classifier” you say?
< “Good”, I reply, you learned much.
But I reckon we've all seen people throwing LLMs at this. It's overkill. Remember: they're slow, expensive, and not deterministic. Only do that for prototyping. If you think this is useful as a feature or product, then build a proper model for it. Compared to the Swiss Army knife, a proper classification model is more like a power drill - any "traditional" classifier is, in fact. Perfect for one task, but only for that one. You would certainly fail if you used it to hammer in a nail, but you wouldn't even try, because it's not the right tool and you know it. That's something most people haven't grasped yet when it comes to AI/ML and LLMs.
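For comparison, the "power drill" for a sentiment use case can be very small. A minimal sketch using scikit-learn, with a toy dataset standing in for whatever labelled data you would actually train on:

```python
# A small, fast, deterministic sentiment classifier: TF-IDF features plus
# logistic regression. Toy data for illustration only; in practice you'd
# train and evaluate on your own labelled examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I love this product, works perfectly",
    "Absolutely fantastic support, thank you",
    "Terrible experience, it broke after a day",
    "Worst purchase I have ever made",
]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["the support was fantastic"]))  # expected: ['positive']
print(model.predict(["it broke immediately"]))       # expected: ['negative']
```

Fast to train, cheap to run, and the same input yields the same output every time - exactly the properties the Swiss Army knife lacks here.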
Coding With LLMs: Why Thinking Still Matters
LLMs for coding are impressive, but keep in mind that they are trained on vast amounts of data of all kinds: genius and garbage. That means they are trained on code that is very high in quality, but also on code that is poor in quality.
Let's assume the two balance each other out; then we get an average software engineer (in reality, since there are fewer experts than juniors, it doesn't quite balance out). I've heard this many times, for example on The Real Python Podcast, episode 248, with Raymond Camden, who said it was a huge help to him with Python, since he was a novice in that field, but not so much with JavaScript, since he's an expert there. For me, it's the other way around: I learn a lot from LLMs in JS and get stuff done, but I believe that the Python code I mostly get is not ideal.
Even though I don't own a crystal ball to look into the future, I can imagine that the trend is going more towards specialized LLMs.
However, the problem with LLMs is that we already have better solutions for some specific use cases, and differentiating between "yes, that's a good LLM task" and "no, we use a traditional ML/DL approach" seems difficult. I find them bloody useful myself, but most often for creative tasks. I think first, before I ask the LLM. No vibe coding for me. Why would I let the LLM do the fun part? Software engineering is a craft, and I take the productivity boost I get from LLMs cheerfully, but I never forget that I need to think through problems, design, and architecture myself before I start playing ping-pong with my ideas against an LLM. If you don't train a muscle, it degrades - and that most likely happens with your coding skills too when you let your LLM do all the coding for you. (Besides the fact that LLMs still make a lot of errors anyway - and how would you even recognize an error if you don't have the expertise? Busted.)
We also see a lot of videos about software engineers being replaced by sophisticated AI software agents. On a bad day I might listen to that, but on any other day I see other jobs being replaced much earlier, if we were really talking about replacement. Everything that is mostly about organizing, managing, and decision-making is far easier to hand to LLMs. It's just that the image of software engineering being replaced is so much more powerful. For sure, as software engineers we need to adjust and use whatever boosts our productivity, but always keep in mind that you are your own greatest human capital. Who do you think will perform better in a software engineering job interview: the person who vibe-coded an app in seven days, or the software engineer who thoughtfully crafted the same app - with high test coverage, good code design, and a modular architecture - in four weeks?
Disruption → Drawbacks → Mitigation: Is That a Typical Tech Cycle?!
We've seen this before, and I believe it is somehow typical for new technology. There's a disruptive technology, and it comes with drawbacks. Once there is a breakthrough, we start using the technology and figure out ways to mitigate its drawbacks along the way. We've seen this with other technologies, too, for example:
- Automobiles: added to accident rates and air pollution. Later on, we got seatbelts, airbags, and emission standards.
- The internet itself: Privacy is a big pain these days, but at least we got GDPR.
Another bummer is the sheer energy consumption LLMs require. Did you know that generating an image with ChatGPT is roughly equivalent to fully charging your phone? We already knew that training an LLM is costly, but inference - that is, the process of computing an answer for you, whether text, image, or voice (multi-modal) - costs a lot of compute too. And it's ongoing.
Another challenge: most of our AI tools are made in the US. We in Europe shouldn't neglect this. First, this will become a huge privacy issue in the future, because LLMs like ChatGPT know a lot about you, your work, your inclinations, and so on. Second, we make ourselves dependent on the big vendors. This is an issue, in my opinion, for two reasons:
a) We rely on what they offer
b) We assume that this will always be available, but the first days of the trade dispute between Europe and the US showed that it can be risky to assume we can always rely on them.
Luckily, the European ecosystem is getting stronger, with companies such as Hugging Face, Mistral, Aleph Alpha, Stability.ai, DeepL, and Black Forest Labs.
Final Thoughts: Be a Thinker, Not Just a Prompt Engineer
I currently see two types of companies:
- Those still struggling to digitize their ecosystem
- Those at the forefront of innovation (or at least they think they are), applying the latest research
Even at big firms like SAP or SICK, there's a wide spectrum. An IT consultant from Freiburg I know tells me the same thing: you can't use AI if you haven't even digitized your workflows yet. So whenever you hear somebody talking about LLMs, ChatGPT, agents, and all the latest hypes, ask whether those technologies are really the right tool to get the job done. Chances are - you guessed it - mostly not. Have that discussion. Eventually, you will help build a better AI ecosystem within your company.
As I mentioned, LLMs are powerful. I use them. Often. I just recently reverse-engineered an API. Creativity. Boilerplate. Those are the things I aim for. But I still think before I prompt.
I also tried out the agentic mode in VS Code. Yes, it's impressive! It's a very powerful tool, particularly when it runs code and fixes its own mistakes and bugs. I've been trying it with a Svelte app. It was blazingly fast. It's ideal for prototyping. Although it's phenomenal, I haven't learnt much. I wouldn't be able to replicate the LLM's work. I believe that my Svelte coding skills are generally not as good as the LLM's. So, if I continue to use the LLM in agent mode, I need to make sure I maintain a certain level of expertise and keep developing myself. What would that look like?
And, in case you were wondering: I used LLMs to critique the structure and style of this article. That's it. I wrote it entirely myself, but I find it helpful to get feedback. I don't let them write my articles, because writing is fun. I enjoy it. It keeps me sharp. And it's how I keep the edge that no LLM can replicate: my own thinking.