Code Is Cheap. Taste, Intent, and Eval Are Not.

One version of the current anxiety about AI and software is, I think, focused on the wrong thing. It goes like this: if AI can write code, what is left for developers to do?

The premise is understandable. Models can generate components, patch bugs, draft architecture diagrams, write tests, explain unfamiliar code, and assemble a plausible MVP from a few paragraphs of direction. The unit of software production that used to feel scarce — the implementation of a function, a page, a script, a migration — is becoming cheaper.

But this does not make developers less important. It changes where the value is.

The scarce skill is no longer just the ability to produce code. It is the ability to operate the full loop around code: knowing what good looks like, expressing what you want clearly enough that an AI system can act on it, and evaluating whether the result actually meets the standard.

I think of this as the core loop of the AI-native developer:

Taste → Intent → Eval

Each part matters. The loop breaks if any one of them is weak.

Taste

Taste is judgment.

AI can give you ten implementations. It cannot reliably tell you which one is simple in the right way, which one will age well, which one matches the product’s character, which one is merely impressive-looking, and which one quietly introduces complexity that the team will pay for later.

Low-taste development says: it works.

High-taste development keeps asking harder questions:

Is this over-designed?
Does this interaction feel natural?
Will this abstraction survive contact with the next feature?
Is this solving a real user problem, or just demonstrating capability?
Does this feel like a first-rate product, or like a competent demo?

Taste is not just visual taste. It is technical taste, product taste, interaction taste, business taste, and even writing taste. It is the accumulated ability to distinguish the merely functional from the genuinely good.

This matters more, not less, when AI enters the workflow. AI amplifies taste. A developer with strong taste can use AI to explore more possibilities, move faster through rough work, and hold the final output to a higher standard. A developer without taste can also move faster — but mostly toward a larger pile of average work.

That is the trap. AI makes mediocrity easier to produce at scale.

The question is not whether a model can generate something that looks complete. It usually can. The question is whether you can tell when completeness is hiding weakness.

Intent

Intent is the ability to turn a vague desire into a clear direction.

Developers used to spend most of their time communicating with programming languages. Now they increasingly communicate with a system of tools: codebases, agents, design surfaces, databases, product requirements, tests, metrics, and users.

In that world, the important act is not just writing instructions. It is making intention legible.

What are we trying to build? Why this and not something else? What should the system optimize for? What should it avoid? What is the first version allowed to ignore? What does “done” mean?

This is why “prompt engineering” has always felt like too narrow a phrase. The deeper skill is intent engineering. Not the trick of writing a clever prompt, but the discipline of converting ambiguity into executable shape.

Compare these two requests:

Build me an AI website.

And:

Build an AI idea management tool for independent developers. The core is not note-taking; it is turning scattered ideas into executable projects over time. The first version should include idea capture, automatic summarization, tags, a rough value score, and suggested next actions. The design should feel closer to Linear plus Notion plus ChatGPT than to a heavy knowledge base.

The second request is not better because it contains more words. It is better because it carries a product theory. It defines the user, the shape of the problem, the boundary of the first version, the desired feel, and the things the system should not become.

That is the work.
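
One way to make this concrete is to write a request down as structured data before it becomes a prompt. The sketch below is only an illustration: `IntentSpec`, its field names, and the `missing` and `to_prompt` helpers are hypothetical and not taken from any tool. It shows that the first request leaves most of the product theory empty, while the second fills it in.

```python
from dataclasses import dataclass, field


@dataclass
class IntentSpec:
    """Hypothetical container for the parts of a request that carry intent."""

    user: str
    problem: str
    first_version: list[str]
    feel: str
    non_goals: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Return the names of required fields that were left empty."""
        required = {
            "user": self.user,
            "problem": self.problem,
            "first_version": self.first_version,
            "feel": self.feel,
        }
        return [name for name, value in required.items() if not value]

    def to_prompt(self) -> str:
        """Render the spec as plain text that could be handed to an agent."""
        lines = [
            f"Build for: {self.user}",
            f"Core problem: {self.problem}",
            "First version includes: " + ", ".join(self.first_version),
            f"Desired feel: {self.feel}",
        ]
        if self.non_goals:
            lines.append("Do not become: " + ", ".join(self.non_goals))
        return "\n".join(lines)


# "Build me an AI website." says almost nothing beyond the artifact itself.
vague = IntentSpec(user="", problem="an AI website", first_version=[], feel="")

# The second request, field by field.
clear = IntentSpec(
    user="independent developers",
    problem="turning scattered ideas into executable projects over time",
    first_version=[
        "idea capture",
        "automatic summarization",
        "tags",
        "a rough value score",
        "suggested next actions",
    ],
    feel="closer to Linear plus Notion plus ChatGPT",
    non_goals=["a note-taking app", "a heavy knowledge base"],
)

print(vague.missing())
print(clear.missing())
print(clear.to_prompt())
```

The vague spec reports three empty fields and the clear one reports none. The point is not the data structure. A checklist on paper would do the same job, as long as it forces the same questions.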

In AI-native development, a lot of value moves upstream. The developer becomes responsible for turning chaos into intent:

A vague idea becomes a requirement.
A requirement becomes a spec.
A spec becomes a sequence of tasks.
A task becomes an agent instruction.
The agent output gets pulled back into the real goal.

The clearer the intent, the higher the ceiling of the AI output. The model is not a substitute for direction. It is a multiplier on direction.

Eval

Eval is the ability to tell whether the result is actually good.

This is the part that will become more important than most people expect. AI systems are very good at producing things that look plausible. The code reads well. The architecture diagram looks professional. The documentation sounds complete. The UI has the surface markers of polish. The test suite exists.

But plausible is not the same as correct. Complete is not the same as reliable. A demo that works once is not the same as a system that can survive production.

When AI writes more of the code, developers have to become better evaluators.

They need to ask:

Did the model misunderstand the requirement?
Did it take a shortcut that only works in the happy path?
Did it introduce a security problem?
Did it violate the existing architecture?
Did it add an abstraction because the abstraction looked sophisticated?
Did it generate tests that confirm its own assumptions instead of challenging them?
Did it solve the user problem, or just the prompt?
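
The happy-path and self-confirming-test questions are easiest to see in a small case. In this sketch, `average` stands in for generated code, and the rule that an empty list should yield 0.0 is an assumed requirement invented for the illustration.

```python
def average(values: list[float]) -> float:
    """Hypothetical generated helper: correct on the happy path only."""
    return sum(values) / len(values)


def happy_path_test() -> bool:
    """The kind of test that confirms the code's own assumptions."""
    return average([2.0, 4.0]) == 3.0


def counterexample_test() -> bool:
    """A test that challenges the code with input it did not plan for.

    Assumed requirement: an empty list yields 0.0 instead of crashing.
    """
    try:
        return average([]) == 0.0
    except ZeroDivisionError:
        return False


print(happy_path_test())      # True
print(counterexample_test())  # False
```

Both tests are trivial to write. Only the second one comes from asking what the code was never shown.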

Eval is not only testing. Testing is one form of eval, but the larger category includes product review, code review, user experience review, metric review, security review, and taste review.

For AI systems, eval also has to become architectural. A serious agent system should not be only a generation system. It should have a harness around it: checks, benchmarks, counterexamples, review artifacts, and feedback loops.

The loop should look less like this:

AI generates → human accepts

And more like this:

AI generates → AI checks → tools verify → human samples → metrics respond → the system improves

This is where I think a lot of the next wave of tooling will matter. Not better chat boxes. Better evaluation harnesses. Systems that can create benchmarks, compare against real examples, surface uncertainty, and know when a human needs to look.
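
A minimal sketch of that loop, under loud assumptions: the model-based check is faked with a plain string test, the tool check uses Python's built-in `compile` to catch syntax errors, and `Verdict`, `evaluate`, and `acceptance_rate` are hypothetical names. A real harness would be much larger. This only shows the shape.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional

# A check takes a candidate output and returns a problem description,
# or None if the candidate passes.
Check = Callable[[str], Optional[str]]


@dataclass
class Verdict:
    accepted: bool
    needs_human: bool
    reasons: list[str]


def no_bare_except(candidate: str) -> Optional[str]:
    """Stand-in for an AI reviewer (assumption: a real one would call a model)."""
    return "bare except hides failures" if "except:" in candidate else None


def compiles(candidate: str) -> Optional[str]:
    """Tool verification: does the candidate parse as Python at all?"""
    try:
        compile(candidate, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"syntax error: {exc.msg}"
    return None


def evaluate(
    candidate: str,
    model_checks: list[Check],
    tool_checks: list[Check],
    sample_rate: float,
    rng: random.Random,
) -> Verdict:
    """Run every check, then decide whether a human needs to look."""
    reasons = []
    for check in model_checks + tool_checks:
        problem = check(candidate)
        if problem is not None:
            reasons.append(problem)
    # Failing output always goes to a human; passing output is sampled.
    needs_human = bool(reasons) or rng.random() < sample_rate
    return Verdict(accepted=not reasons, needs_human=needs_human, reasons=reasons)


def acceptance_rate(verdicts: list[Verdict]) -> float:
    """The metric that responds: share of candidates that passed every check."""
    return sum(v.accepted for v in verdicts) / len(verdicts)


candidates = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b)\n    return a + b\n",  # missing colon
    "try:\n    risky()\nexcept:\n    pass\n",  # parses, but swallows errors
]
rng = random.Random(0)
verdicts = [
    evaluate(c, [no_bare_except], [compiles], sample_rate=0.2, rng=rng)
    for c in candidates
]

for v in verdicts:
    print(v.accepted, v.needs_human, v.reasons)
print(round(acceptance_rate(verdicts), 2))
```

The last step of the loop, the system improving, is deliberately left out. In this sketch it would mean feeding the acceptance rate and the recorded reasons back into the prompts or the checks.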

The future advantage is not simply: I can get AI to generate something.

It is:

I can tell whether what AI generated is actually good.

The Loop

Taste, intent, and eval are not separate skills. They form a loop.

Taste determines the standard.

Intent makes the standard actionable.

Eval determines whether the output met the standard.

If you have intent without taste, you can direct AI clearly toward a mediocre goal. If you have taste without intent, you may know what good looks like but fail to make it executable. If you have taste and intent without eval, you can generate impressive-looking work without knowing whether it is robust. If you have eval without taste, you can check correctness but miss excellence.

The strong AI-native developer is the person who can hold all three:

They know what is worth doing, can define it clearly, can use AI to move quickly, and can judge the result without being fooled by polish.

That is a different profile from the developer of the last era. Not completely different — writing code still matters, and deep technical understanding still matters — but the center of gravity shifts.

The developer becomes part product thinker, part systems designer, part AI operator, part workflow designer, part evaluator.

The ordinary question is:

How do I build this feature?

The better question is:

Is this feature worth building, what exactly should it become, how should AI help produce it, and how will I know whether it is good enough?

That is the difference.

Code is becoming cheap. Taste, intent, and eval are becoming expensive.
