Working with LLMs is weird—you don’t always know what prompt will get the result you’re looking for.
Having worked with AI since OpenAI released the GPT-3 beta in 2020, I’ve used language models for everything from automating bank fraud detection to generating and testing thousands of variations of advertisements. For the past year-plus, I’ve written a monthly column for Every on the differences between what works for AIs and humans. I even wrote a book about prompt engineering, the delicate art of tweaking strings of words to get a model to do exactly what I want.
Lately, though, I’ve stopped writing prompts myself. Instead, I use DSPy, an automated prompt-optimization tool that is still relatively obscure, but powerful enough that it could soon do away with prompt engineers like myself. (Shopify CEO Tobi Lutke recently called DSPy “severely underhyped.”)
When DSPy started to gain traction, people breathlessly exclaimed that “prompt engineering is dead.” Rather than take it personally, I learned how to use it, and started sending my clients better prompts optimized by DSPy. I can still beat DSPy if I try hard enough, but if you don’t have the time or my five years of prompt engineering experience, you’re better off relying on DSPy.
Here’s how you do it: You define the inputs you want to give to the AI (the information it needs to do the task) and define a reward function, or evaluation metric (a way to measure how well it did the task). It’s like checking to see if the answer your prompt gave matches the right answer to a question. With that metric in hand, DSPy’s optimizers can tell whether they’re doing a good job as they automatically rewrite the prompt instructions for me, trying different strategies to get better outputs from the same inputs. It works across all the major language models, and I can swap the models in and out like Legos, without worrying about the specifics of how OpenAI works as compared to Google or Anthropic.
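To make that loop concrete, here is a toy sketch in plain Python. It is not DSPy’s API: the example questions, the candidate instructions, and the `fake_model` stand-in for an LLM call are all made up for illustration, and a real optimizer proposes and refines its candidates rather than choosing from a fixed list. The shape is the same, though: inputs, a metric, and a search for the instruction that scores best.

```python
from typing import Callable, List, Tuple

# Hypothetical labeled examples: (input question, right answer).
EXAMPLES: List[Tuple[str, str]] = [("2 + 2", "4"), ("3 + 5", "8"), ("10 - 7", "3")]


def exact_match(expected: str, predicted: str) -> float:
    """Evaluation metric: 1.0 if the output matches the right answer, else 0.0."""
    return 1.0 if expected.strip() == predicted.strip() else 0.0


def fake_model(instruction: str, question: str) -> str:
    """Stand-in for an LLM call (an assumption so the sketch runs offline).

    It answers with a bare number only when the instruction asks for one.
    """
    left, op, right = question.split()
    value = int(left) + int(right) if op == "+" else int(left) - int(right)
    if "only the number" in instruction:
        return str(value)
    return f"The answer is {value}."


def score(instruction: str, model: Callable[[str, str], str],
          examples: List[Tuple[str, str]]) -> float:
    """Average metric for one candidate instruction across all examples."""
    total = sum(exact_match(expected, model(instruction, question))
                for question, expected in examples)
    return total / len(examples)


def optimize(candidates: List[str], model: Callable[[str, str], str],
             examples: List[Tuple[str, str]]) -> str:
    """Keep whichever candidate instruction scores highest on the metric."""
    return max(candidates, key=lambda c: score(c, model, examples))


# Hypothetical candidate instructions to choose between.
CANDIDATES = [
    "Solve the problem.",
    "Solve the problem. Reply with only the number.",
]

best = optimize(CANDIDATES, fake_model, EXAMPLES)
print(best, score(best, fake_model, EXAMPLES))
```

Because the model is passed in as an argument, swapping in a different one changes nothing else in the loop, which is the Lego-like quality described above.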
Say you wanted to extract the right information from thousands of differently formatted invoices rather than entering them manually. You could write a prompt, run it on a few invoices, check which fields it got wrong, and keep manually adding rules to the prompt until it gets everything right. DSPy writes the instructions automatically, which, depending on the size of your task, could save you days’ worth of work.
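For the invoice case, the metric might be as simple as the share of fields extracted correctly. The sketch below is one way to write that in plain Python. The field names, the sample invoice, and the normalization rule are assumptions for illustration, not taken from any real project or from DSPy itself.

```python
from typing import Dict, List, Optional

# Hypothetical fields we want pulled out of every invoice.
FIELDS = ("vendor", "invoice_number", "total")


def normalize(value: Optional[str]) -> str:
    """Ignore case and surrounding whitespace when comparing field values."""
    return str(value).strip().lower() if value is not None else ""


def wrong_fields(expected: Dict[str, str], predicted: Dict[str, str]) -> List[str]:
    """List the fields whose extracted value does not match the right answer."""
    return [f for f in FIELDS
            if normalize(predicted.get(f)) != normalize(expected.get(f))]


def field_accuracy(expected: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Evaluation metric: fraction of fields extracted correctly (0.0 to 1.0)."""
    return 1 - len(wrong_fields(expected, predicted)) / len(FIELDS)


# Made-up example: the right answer and what a prompt extracted.
expected = {"vendor": "Acme Corp", "invoice_number": "INV-1042", "total": "1250.00"}
predicted = {"vendor": "acme corp ", "invoice_number": "INV-1042", "total": "1,250.00"}

print(wrong_fields(expected, predicted))
print(round(field_accuracy(expected, predicted), 2))
```

A function like `wrong_fields` automates the check you would otherwise do by eye, and `field_accuracy` turns it into a single number an optimizer can try to push up.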
