2025-02-23

Winter 2024 / 2025

It's mid-May as I write this. Our garden is in full bloom and I have already had to cut the grass several times. So I guess the winter season is officially over, even though some of my more dedicated friends are still going ski touring. My skiing this season was basically all in service of helping the kids learn to ski. And learn they did. During our traditional week in the Kinderhotel Kröller in the Zillertal in Austria, both kids attended ski lessons. On the final race day Leonie placed second and Lukas third in their respective age groups. More importantly, both of them casually skied down a black slope with me, blabbering all the way down. From now on they should be unstoppable ;-P

My brothers Torsten and Richard visited for New Year's. We had a lot of chill family time and good conversations. The only downer was that Anke couldn't make it this time because she caught Covid just before they intended to leave for Switzerland. Let's try that again soon!

Leonie and I in Brunni.
Lukas, Anita, Leonie in Brunni.
Lukas and I.
Lake Lucerne under a sea of clouds.
Calories!
Leonie is taking piano lessons and was excited to have a proper musician in the house.
Torsten's birthday.
Traditionally celebrated with a "Grüner" - disgusting stuff that somehow stuck in our family ;-P
A fiendishly difficult "The Three ???" puzzle we rented from the library. Evoked lots of childhood memories for all of us. Took us several days to complete.
Hiking up the Albis to get above the clouds.
Snow!
Awesome rope-swing in awesome weather.
Flying!
Cheese Raclette. We are in Switzerland after all.
Happy New Year!
Traditional Lego session. Lots of important construction work going on.
Flumserberg with Götti (godfather) Christian.
Crowded slopes in terrible weather ;-P
Lenzerheide with Christian.
Flumserberg with our neighbor Yvonne. Lukas and Ava go to the forest Kindergarten together.
Austria.
Zillertal Arena.
On the chairlift with Lukas during his ski lessons.
Anita. She got a lot more comfortable on skis and we went to the very top together.
Lukas receiving his medal.
Leonie receiving her medal. Only a few hundredths of a second behind first place. Which was held by a much older kid, so she did remarkably well!
Evening tradition at the hotel bar. Compensating for all the day's skiing and swimming with a bunch of cocktails. Mustn't become too fit, right?
Trying the sledging right by the hotel.
I had to squeeze in as much skiing as I could while the kids were away taking their lessons ;-P

2025-01-19

Wildspitz (1206m), Höhronen (1229m)

Zürich had been under a layer of clouds for a while. A classic weather inversion situation with dense low clouds turning everything gray while you get beautiful clear blue skies up in the mountains. So I decided to escape into the sun for a few hours. Between family commitments I didn't have enough time for a big outing, and the high mountains have too much snow for comfortable hiking anyway. So I aimed for a modest range barely half an hour's drive away from home. It turned out to be the perfect choice. High enough to get above the clouds, low enough to stay below more severe snow.

15km, +555m, 3:30h. Parked the car at Biberbrugg and did a roundtrip up over the ridge and back via the Biber creek.

Slippery trail.

2025-01-13

On Prompt Engineering

I have a visceral reaction to people talking about "prompt engineering". This is an attempt to reflect on why this phrase elicits such a strong negative emotional response in me. There are a lot of things to criticize about the current AI hype and its impact on the web, the environment, the economy and society at large. That’s not my focus here. I want to instead evaluate what I understand good (software) engineering to mean and contrast that to the activity of writing prompts.

What are some properties you want a software system to exhibit? I’d argue the list should include at least:

  • Predictable. My mental model should be aligned with the software so that I can with reasonable certainty predict the outcome of calling any one function in the system. This is very much not the case for LLM prompts. Replacing a word with a synonym, fixing a spelling mistake or changing the order of statements can all have dramatic and unpredictable effects on the result.
  • Deterministic / Repeatable. The software should be deterministic. Providing the same inputs to the system should reliably produce the same outputs. LLMs have built-in randomness euphemistically referred to as “temperature”. Execute the same prompt multiple times, get a different result each time.
  • Inspectable / Debuggable. I should be able to pop open the hood, look at the code, and figure out what it is doing. If I experience bugs or unexpected outputs I should be able to trace back through the code to understand where and why the logic went wrong or differed from my expectations. None of that is true for LLMs. If I get an unexpected result all I can do is permute my prompt and hope for the best. I can’t inspect the inner workings of the network to figure out which part of the input was responsible for the deviation.
  • Composable. Software systems should be building blocks I can combine to build something bigger than the parts. LLMs are all or nothing. Their interface is effectively all of human language as input and output. Multi modal models go even further to include other modalities like images, video and audio. This is too wide and deep an interface. By its very design, an LLM is not only a monolith but also perfect spaghetti. Everything potentially affects everything else. There’s no isolation in the network, no re-use of just parts of it is possible.
  • Stable. Software should be stable across versions. LLMs change in random unpredictable ways from one release to the next. A prompt that provides good results now may become entirely useless with the next generation of the model. Numeric embeddings produced with one version are meaningless and incomparable to ones produced with a different version. Great fun to start over each time.
  • Testable. Software should be (unit-)testable with fairly high coverage. LLMs offer a single entry point for a huge set of functionality. The input and output space is enormous. It is utterly hopeless to achieve anything approaching good test coverage. All you can do is shine a flashlight in a vast dark ocean and hope that your tiny collection of training examples covers all cases relevant to your problem.
  • Efficient. Software should be resource efficient. The principle of least power applies. LLMs are massive resource hogs and you activate large fractions of them no matter how simple or complex your prompt is.
  • Fast. Most software counts QPS (queries per second) and latency in milliseconds. LLMs count QPM (queries per minute) and latency in seconds. Maybe fast enough for an interactive chat bot. Painfully slow for working with large data sets. Especially if conjuring up prompts is so random that responsible development involves a ton of repeated experimentation and testing on large validation sets.
  • Precise. Software interfaces should be precise. Human language is not that. It is inherently ambiguous, redundant and open to interpretation. There’s no way to ensure the LLM will choose any one particular interpretation of a statement.
  • Secure. Software should be robust against injection attacks, leaking data and other safety concerns. Given the properties listed so far, it seems highly doubtful that this can ever be guaranteed for prompting LLMs. How can we pretend to secure a system we don’t understand and can’t test comprehensively?
  • Useful. This is arguably the one point where LLMs make up for not ticking even a single one of the other boxes. They do solve problems where we don’t currently have any alternative approach for a solution.
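
To make the Deterministic / Repeatable point a bit more concrete, here is a minimal sketch of what temperature does during sampling. It is a toy and not how any particular model or API is implemented: the three candidate tokens, their scores and the helper names (softmax_with_temperature, sample_token) are all made up for illustration. The only thing it is meant to show is that the same input can yield different outputs once temperature is above zero.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature, rng):
    """Pick the next token: greedy at temperature 0, weighted random otherwise."""
    if temperature == 0:
        return tokens[logits.index(max(logits))]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical candidates and scores, invented for this example.
TOKENS = ["yes", "no", "maybe"]
LOGITS = [2.0, 1.5, 0.5]

rng = random.Random()  # deliberately unseeded, like a typical hosted model call

# Same "prompt" ten times: greedy always agrees with itself, sampling may not.
greedy = [sample_token(TOKENS, LOGITS, 0, rng) for _ in range(10)]
sampled = [sample_token(TOKENS, LOGITS, 1.0, rng) for _ in range(10)]
print("temperature 0:  ", greedy)
print("temperature 1.0:", sampled)
```

In this toy, temperature 0 is perfectly repeatable. As far as I can tell that does not fully carry over to real hosted models, where results can apparently still vary between runs even with temperature set to zero.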

Prompt engineering is hard for all the wrong reasons. Don’t take my word for it. Take a recent (December 2024) paper, “The Prompt Report: A Systematic Survey of Prompting Techniques”, which surveys more than 1500 other papers to compile a taxonomy of prompting techniques. Everyone is just blindly pushing and prodding the machinery, hoping to tickle it in just the right ways to provide useful results. Consider choice quotes like “exemplar order can cause accuracy to vary from sub-50% to 90%+” or “providing models with exemplars with incorrect labels may not negatively diminish performance. However, under certain settings, there is significant performance impact” or the summary “prompt engineering is fundamentally different from other ways of getting a computer to behave the way you want it to: these systems are being cajoled, not programmed, and, in addition to being quite sensitive to the specific LLM being used, they can be incredibly sensitive to specific details in prompts without there being any obvious reason those details should matter”. This is utterly ridiculous and shows our fundamental lack of understanding of cause and effect with these things.

Prompt engineering is performing alchemy in a chemistry lab. It is an insult to the chemists.

So yeah. Devising good prompts is hard and occasionally even useful. But the current practice consists primarily of cargo culting and blind experimentation. It is lacking the rigor and explicit trade-offs made in other engineering disciplines.

Gemini 1.5 Flash: "Draw a brightly colored cheerful cartoon robot with an idiotic grin on its face and bulging bloodshot eyes. It should swing a hammer and wrench in dangerous and clumsy ways."

2024-11-16

Brüschbüchel (1817m), Chruter (1881m)

Afroz suggested going hiking on the weekend. He had a nice trail in Valais in mind. I was up for spending time outside, but the proposed route would imply a four hour one way car commute, tilting the balance of commute to hike entirely in the wrong direction in my opinion. So we spontaneously compromised on a trail near Lake Klöntal. My idea was to link some smaller unremarkable mountains that I hadn't been to yet. The trick was choosing a southerly aspect for the hike and not going too high, both to avoid getting into snow. It worked out beautifully! We enjoyed great November weather. In the sun high above the perpetually shaded Klöntal valley lying still in hoar frost. We ticked two of my three planned summits. The third looked a bit too intimidating after we had just scrambled over some karst landscape and steep slippery grass. It's most definitely doable, but at the same time easy to get stuck in these conditions when you never know whether you'll encounter a patch of snow or ice in the wrong spot. So we watched the local chamois population show off their skills in difficult terrain and enjoyed each other's company and conversation on the way back to the car.

Smoke from a wood fire hanging low in the valley.
Sun!
Lake Klöntal with the mighty Glärnisch North face. That'd be a dream to climb... an intimidating prospect.
Frozen pond.
Afroz.
Sören.
Some balance required.
View towards the Wägitalersee.
Some scrambling required.
Steep slope cleared of trees. Maybe by an avalanche a long time ago?
Back into the frozen valley.