Using item analysis to improve your tests — and your teaching

Item analysis answers a question most teachers never ask: which questions are actually doing their job? The data quietly transforms both your tests and your instruction.

After a test, most teachers look at the grade distribution. A few look at which questions the class struggled with. Almost none look systematically at which questions worked.

That's item analysis — and it's one of the highest-leverage analytical habits a teacher can develop.

What item analysis actually tells you

For each question on a test, you can ask:

  1. Difficulty — what percentage of students got it right?
  2. Discrimination — did high-scoring students get it right more often than low-scoring students?
  3. Distractor performance — for multiple choice, which wrong answers attracted students?

Together, these three lenses tell you a lot about both the question itself and the students answering it.

Difficulty: the simplest metric

If 95% of your students got a question right, the question didn't distinguish between students who learned the material and those who didn't: nearly everyone succeeded. That's fine for a confidence-builder at the start of a test, but it's not doing assessment work.

If 10% of your students got it right, the question is either:

  • Too hard (poorly written, ambiguous, out of scope)
  • Testing something you didn't teach effectively
  • Genuinely the hardest concept in the unit

Sweet spot for most items: 50–85% correct. That's the range where a question is informative — meaningfully separating students who understood from those who didn't.
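
If you can export responses as rows of right/wrong marks, difficulty is a few lines of arithmetic. Here's a minimal Python sketch; the responses layout (each student mapped to a list of 1/0 item scores) is a hypothetical stand-in for whatever your grade book exports:

    # Difficulty = percent of students who answered each item correctly.
    # The `responses` layout is hypothetical: student -> 1 (right) / 0 (wrong) per item.
    responses = {
        "ana":  [1, 1, 0, 1],
        "ben":  [1, 0, 0, 1],
        "cara": [1, 1, 1, 1],
        "dev":  [0, 0, 0, 1],
    }

    num_students = len(responses)
    num_items = len(next(iter(responses.values())))

    for item in range(num_items):
        correct = sum(scores[item] for scores in responses.values())
        difficulty = correct / num_students
        flag = "" if 0.50 <= difficulty <= 0.85 else "  <- outside the 50-85% sweet spot"
        print(f"Q{item + 1}: {difficulty:.0%} correct{flag}")

Run against this toy data, Q3 (25%) and Q4 (100%) get flagged: one likely too hard or mis-taught, one not doing assessment work.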

Discrimination: the diagnostic metric

Discrimination asks: did students who did well on the test overall get this question right more often than students who did poorly?

A high-discrimination question is one where top-scoring students reliably got it right and low-scoring students reliably got it wrong. This is what you want. The question distinguishes mastery.

A low-discrimination question (or worse, a negatively discriminating one) is a warning. If top students got it wrong and low students got it right, something is off:

  • The question might be poorly worded
  • The "correct" answer might actually be wrong
  • There might be an unintended interpretation

The negative discrimination red flag

Any question where low-scoring students outperformed top students on that specific item deserves a second look. It almost always means the question is broken. Check the wording, the answer key, or whether the question tests something different from what you intended.
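
A common way to put a number on this is the upper-lower discrimination index: rank students by total score, split them into a high group and a low group, and subtract the low group's percent correct from the high group's for each item. Here's a minimal Python sketch reusing the hypothetical responses layout from the difficulty example (with a small class, halves stand in for the conventional top and bottom 27%):

    # Upper-lower discrimination index:
    # D = (% correct in high group) - (% correct in low group).
    responses = {
        "ana":  [1, 1, 0, 1],
        "ben":  [1, 0, 0, 1],
        "cara": [1, 1, 1, 1],
        "dev":  [0, 0, 0, 1],
    }

    ranked = sorted(responses.values(), key=sum, reverse=True)  # best total score first
    half = len(ranked) // 2
    high, low = ranked[:half], ranked[half:]

    for item in range(len(ranked[0])):
        p_high = sum(s[item] for s in high) / len(high)
        p_low = sum(s[item] for s in low) / len(low)
        d = p_high - p_low
        note = "  <- negative: inspect this question" if d < 0 else ""
        print(f"Q{item + 1}: D = {d:+.2f}{note}")

A D near zero means the item did no separating; a negative D is the red flag described above.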

Distractor analysis: what wrong answers reveal

For multiple choice questions, look at which wrong answers students picked. If one distractor attracts a lot of students, ask why:

  • Is it a common misconception you need to address?
  • Is it a plausible-but-wrong reading of the question?
  • Is there a lurking ambiguity that makes two options defensible?

Distractor patterns are gold for diagnosing student thinking. If 30% of your class picked the same wrong answer, you've discovered something about how they're reasoning — and what to reteach.
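
The tally itself is trivial once you have the raw choices. A minimal Python sketch, with a made-up list of letter choices for a single question whose keyed answer is B:

    from collections import Counter

    # Hypothetical raw choices for one multiple-choice item; keyed answer is "B".
    choices = ["B", "C", "B", "C", "A", "B", "C", "D", "B", "C"]
    answer_key = "B"

    tally = Counter(choices)
    for option in sorted(tally):
        share = tally[option] / len(choices)
        label = " (correct)" if option == answer_key else ""
        warn = "  <- popular distractor to investigate" if option != answer_key and share >= 0.30 else ""
        print(f"{option}: {share:.0%}{label}{warn}")

Here C pulls 40% of the class, exactly the kind of pattern worth a reteach.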

Where item analysis turns up the most unexpected insights

1. Your best-loved question is underperforming. Teachers often have favorite questions they reuse year after year. Item analysis sometimes reveals these questions are too easy, too hard, or ambiguously worded. The data reveals what gut sense missed.

2. A unit you thought went well actually didn't. The overall grade distribution looks good, but item analysis reveals that the average is buoyed by easy items while the items actually testing the unit's key standards underperform. The class score hides the gap.

3. A lesson you spent a lot of time on didn't stick. You taught something thoroughly; the item tests it directly; only 30% of students got it right. That's a signal to re-examine the lesson, not just the test question.

4. Your answer key is wrong. It happens. Item analysis surfaces it fast. If a distractor is attracting 70% of students, either your students are systematically misreading the question — or your key is wrong.

Making it a habit

You don't need to do item analysis on every quiz. A reasonable cadence:

  • After every unit test: review the 3–5 lowest-discrimination questions (see the sketch after this list)
  • End of quarter: look for patterns across unit tests (always weak on this standard?)
  • Before reusing a test: any question with poor performance gets reviewed or revised
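
Pulling the weakest items is one sort. A minimal sketch, assuming you already have per-item discrimination values like the ones the earlier sketch prints (the numbers here are invented):

    # Hypothetical per-item discrimination values (question -> D index).
    discrimination = {"Q1": 0.45, "Q2": -0.10, "Q3": 0.05, "Q4": 0.62,
                      "Q5": 0.18, "Q6": 0.33, "Q7": 0.02}

    # Review queue: the five weakest discriminators, worst first.
    review_queue = sorted(discrimination, key=discrimination.get)[:5]
    print("Review these items:", review_queue)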

This is a 10-minute habit once you're practiced. The payoff compounds: better tests, better-informed teaching, fewer "I thought this was clear" moments.

In PaperScorer

Item analysis reports are generated automatically after every test. You see:

  • Per-item correct percentage
  • Discrimination index for each item
  • Distractor selection percentages for multiple choice
  • Per-standard mastery if you've tagged items to standards

You don't have to build the analysis — you just have to look at it.

Key takeaway

Your test is a tool. Item analysis tells you whether the tool is working. Three questions per review — difficulty, discrimination, distractors — and your assessments get sharper every time you run them.

Ready to try PaperScorer?

Create a free account and scan your first 100 test sheets at no cost. No credit card required.