Many modern microcontrollers have on-chip debug modules that can be accessed via JTAG. These modules let you stop a running program, examine registers and memory, single-step through the code, and so on.
The workings of the ATmega16's on-chip debug module do not seem to be completely documented, but Atmel has done tool developers one favour: they have a device called the AVR JTAG ICE which performs whatever undocumented magic is required to extract debug information from a target AVR. Better still, the software interface between the ICE and the PC is completely documented in an Atmel application note, AVR060.
Another interesting thing is that the JTAG ICE is nothing but an ATmega16 microcontroller running firmware that talks JTAG to a target microcontroller, and this firmware is available in binary form from Atmel. There is a project which aims to create a cheap JTAG ICE clone - Build your own JTAG ICE clone. If you have no time to build a PCB or do soldering, you can get your clone from here - but you can definitely build one at a quarter of the cost.
There is a GNU/Linux program called AVaRICE which not only implements the JTAG ICE communication protocol in C++ but also interfaces with gdb, so that you can debug AVR code. A few days back, I started writing some Python scripts to communicate with the ICE clone. Everything seemed to work well: I was able to stop the program running on the target AVR, single-step through the code and read the program counter. Everything, that is, except reading the CPU registers and RAM - two things without which a debugger is not very useful. The problem was finally tracked down to the PC not sending a `device descriptor' to the ICE. The `device descriptor' is 123 bytes of magic - give it to the ICE and everything works well!
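To make the descriptor step concrete, here is a rough Python sketch of how such a message might be framed and sent. Only the 123-byte length comes from the text above. The command byte (0xA0), the two-space end-of-message marker and the 'A' reply are assumptions based on my reading of AVR060 and AVaRICE, so check them against the application note before relying on them. The function names are made up for illustration, and `port' stands for any object with write() and read() methods, such as an opened serial port.

```python
# Sketch only: every protocol constant except DESCRIPTOR_LEN is an assumption
# to be verified against AVR060 / the AVaRICE sources.
DESCRIPTOR_LEN = 123                 # length stated in the text above
CMD_SET_DEVICE_DESCRIPTOR = 0xA0     # assumed command byte
END_OF_MESSAGE = b"\x20\x20"         # assumed terminator: two ASCII spaces
RESP_OK = b"A"                       # assumed "OK" reply from the ICE


def build_descriptor_frame(descriptor: bytes) -> bytes:
    """Wrap a device descriptor in the (assumed) command framing."""
    if len(descriptor) != DESCRIPTOR_LEN:
        raise ValueError(
            f"descriptor must be {DESCRIPTOR_LEN} bytes, got {len(descriptor)}"
        )
    return bytes([CMD_SET_DEVICE_DESCRIPTOR]) + descriptor + END_OF_MESSAGE


def send_descriptor(port, descriptor: bytes) -> bool:
    """Write the frame to `port` and report whether the ICE answered OK.

    `port` is hypothetical: any object with write(bytes) and read(n) works.
    """
    port.write(build_descriptor_frame(descriptor))
    return port.read(1) == RESP_OK
```

The real descriptor contents are specific to the target device and have to come from a trusted source such as AVaRICE's device tables. A call like send_descriptor(port, descriptor) would go right after the sign-on exchange and before any attempt to read registers or RAM.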