Evaluating Work in the Age of AI: A Manager's Perspective
December 10, 2025
Why clarity matters more than ever, and how to bring AI-shaped work into performance conversations.
Performance reviews have always been tricky. They require us to compress months of work into neat narratives, to weigh outcomes against effort, to separate individual contribution from team dynamics. Now add AI to the mix.
When someone uses an LLM to draft code or write documentation, how do we evaluate that work? The output might be excellent. The process might have been efficient. But what exactly did the person contribute?
The Problem with Black Boxes
The challenge isn't that AI makes work easier—we've always celebrated tools that boost productivity. The challenge is visibility. When I review a pull request, I can see the code. When I read a design document, I can trace the reasoning. But when AI assists in creating these artifacts, the seams become invisible.
This creates anxiety on both sides. Employees wonder if they'll be credited for their judgment and direction, or dismissed for "just prompting." Managers worry they can't distinguish between thoughtful AI collaboration and thoughtless copy-paste.
Toward Clarity
I've been experimenting with a few approaches:
1. Focus on decisions, not deliverables. Instead of asking "what did you produce?" ask "what decisions did you make?" The choice to use AI is itself a decision. So is the choice of how to prompt, what to accept, what to revise, and what to reject entirely.
2. Normalize the conversation. Make AI use a regular topic in 1:1s. Not to police it, but to understand it. "How are you using AI tools this week?" opens space for people to share their experiments and struggles.
3. Evaluate the iteration, not just the output. Someone who critically refines AI output through multiple rounds is doing different work than someone who ships the first response. Ask to see the journey, not just the destination.
The Human in the Loop
What we're really evaluating hasn't changed: judgment, taste, persistence, collaboration, growth. AI shifts where these qualities manifest, but doesn't eliminate them.
The engineer who prompts well, reviews critically, and integrates thoughtfully is doing skilled work. Our evaluation frameworks need to recognize this—not by counting prompts, but by honoring the thinking that surrounds them.
This is part of an ongoing exploration of AI in the workplace. More thoughts in the garden.