Recordable CD

Photo courtesy Ernest von Rosen, AMGmedia

CD Encoding Issues

If you have a CD-R drive and want to produce your own audio CDs or CD-ROMs, one of the great things you've got going in your favor is that software can handle all the details for you. You can say to your software, "Please store these songs on this CD," or "Please store these data files on this CD-ROM," and the software will do the rest. Because of this, you don't need to know anything about CD data formatting to create your own CDs. However, CD data formatting is complex and interesting, so let's go into it anyway.

To understand how data are stored on a CD, you need to understand all of the different conditions the designers of the data encoding methodology were trying to handle. Here is a fairly complete list:

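  • Because the laser tracks the spiral of data using the bumps, there cannot be extended stretches of the track with no bumps. To solve this problem, data is encoded using EFM (eight-to-fourteen modulation), in which each 8-bit byte is converted to a 14-bit code word. EFM code words are chosen so that any two 1s are separated by at least two and at most ten 0s, and a few merging bits between code words keep that rule intact across word boundaries. Since a 1 marks the edge of a bump, this guarantees that bump edges never stop occurring for long.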
  • Because the drive needs to be able to jump between songs, data must be encoded alongside the music telling the drive "where it is" on the disc. This problem is solved using what is known as subcode data. Subcode data can encode the absolute position of the laser on the disc and its relative position within the current track, and can also carry extras such as song titles.
  • Because the laser may misread a bump, there needs to be an error-correcting code to handle isolated bit errors. To solve this problem, extra parity data is added that lets the drive detect such errors and correct them (CDs use Reed-Solomon codes for this).
  • Because a scratch or a speck on the CD might cause a whole run of consecutive bytes to be misread (known as a burst error), the drive needs to be able to recover from such an event. This problem is solved by interleaving the data on the disc, so that bytes that belong together are stored non-sequentially, spread out along the track. When the drive de-interleaves the data for playback, a burst error is broken up into small, scattered errors, each of which the error-correcting code can fix.
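  • If a few bytes are misread in music, the worst thing that can happen is a little fuzz during playback. In stored data, however, a single wrong byte can corrupt a program or file. Therefore, CD-ROMs add another layer of error detection and correction on top of the one used for audio: a Mode 1 CD-ROM sector, for example, holds 2,048 bytes of user data in a 2,352-byte sector and uses much of the remainder for an extra error-detecting code and additional error-correcting parity.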
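The run-length rule behind EFM (any two 1s in a code word separated by at least two and at most ten 0s) can be checked with a short Python sketch. The names `obeys_run_length`, `D_MIN` and `K_MAX` are hypothetical, and the sketch does not reproduce the actual EFM byte-to-code-word table or the merging bits; it only tests the property those code words are chosen to satisfy.

```python
# Sketch: test 14-bit words against the EFM-style run-length rule
# (at least D_MIN and at most K_MAX zeros between 1s). Not the real EFM table.

D_MIN = 2   # minimum zeros between two 1s
K_MAX = 10  # maximum run of zeros allowed anywhere in the word


def obeys_run_length(word: int, width: int = 14) -> bool:
    """Return True if the width-bit word satisfies the (D_MIN, K_MAX) rule."""
    bits = format(word, f"0{width}b")
    ones = [i for i, b in enumerate(bits) if b == "1"]
    if not ones:
        # An all-zero word is one long zero run.
        return width <= K_MAX
    # Zero runs between consecutive 1s must be within [D_MIN, K_MAX].
    for a, b in zip(ones, ones[1:]):
        gap = b - a - 1
        if gap < D_MIN or gap > K_MAX:
            return False
    # Leading and trailing zero runs must not exceed K_MAX either.
    leading = ones[0]
    trailing = width - 1 - ones[-1]
    return leading <= K_MAX and trailing <= K_MAX


valid = [w for w in range(1 << 14) if obeys_run_length(w)]
print(len(valid))  # 267
```

Of the 16,384 possible 14-bit patterns, 267 pass the check, a few more than the 256 needed to give every byte value its own code word.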
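Subcode positions are expressed as minutes, seconds and frames. Below is a minimal Python sketch of converting such a position into a sector address and back, assuming the usual audio-CD convention of 75 frames (sectors) per second and a 150-sector (two-second) offset before logical address 0. The function names are hypothetical, and the sketch ignores how the subcode actually stores these numbers on the disc.

```python
# Sketch: convert a minutes:seconds:frames disc position to a logical
# block address (LBA) and back. Assumes 75 frames per second and the
# conventional 150-frame offset before LBA 0.

FRAMES_PER_SECOND = 75
LBA_OFFSET = 150


def msf_to_lba(minutes: int, seconds: int, frames: int) -> int:
    """Absolute MSF time -> logical block address."""
    return (minutes * 60 + seconds) * FRAMES_PER_SECOND + frames - LBA_OFFSET


def lba_to_msf(lba: int) -> tuple[int, int, int]:
    """Logical block address -> absolute MSF time."""
    total = lba + LBA_OFFSET
    minutes, rest = divmod(total, 60 * FRAMES_PER_SECOND)
    seconds, frames = divmod(rest, FRAMES_PER_SECOND)
    return minutes, seconds, frames


print(msf_to_lba(0, 2, 0))  # 0: the first addressable sector
print(lba_to_msf(4500))     # (1, 2, 0)
```

A track-relative time works the same way, just counted from the start of the track instead of the start of the disc.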
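To show how extra bits let a decoder both locate and fix a single flipped bit, here is a Hamming(7,4) code in Python. This is a deliberately simplified stand-in: CDs use Reed-Solomon codes, which correct whole erroneous bytes, not Hamming codes. The function names are hypothetical.

```python
# Simplified stand-in for CD error correction: a Hamming(7,4) code, which
# can locate and fix any single flipped bit in a 7-bit block.
# Real CDs use Reed-Solomon codes, not Hamming codes.


def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = [(nibble >> s) & 1 for s in (3, 2, 1, 0)]
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(bits: list[int]) -> int:
    """Correct up to one flipped bit, then return the 4 data bits."""
    b = list(bits)
    # Re-check each parity group; together the results spell the error position.
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s3 = b[3] ^ b[4] ^ b[5] ^ b[6]
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position; 0 means no error
    if error_pos:
        b[error_pos - 1] ^= 1
    return (b[2] << 3) | (b[4] << 2) | (b[5] << 1) | b[6]


sent = hamming74_encode(0b1011)
received = list(sent)
received[4] ^= 1  # simulate one misread bit
print(hamming74_decode(received) == 0b1011)  # True
```

The price is three extra bits for every four data bits; Reed-Solomon codes used on real discs get far more correction out of their redundancy.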
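The effect of interleaving on a burst error can be illustrated with a simple block interleaver in Python: data is written into a grid row by row and stored on the "disc" column by column. This is a simplification (the CD's actual scheme interleaves with delays spread across many frames), and the grid size and function names are hypothetical.

```python
# Sketch of block interleaving: write row by row, store column by column.
# A burst of consecutive errors on the disc lands in different rows once
# the data is de-interleaved. Not the CD's actual interleaving scheme.

ROWS, COLS = 4, 6  # hypothetical grid size for the demo


def interleave(data: list[int]) -> list[int]:
    """Row-major in, column-major out."""
    assert len(data) == ROWS * COLS
    return [data[r * COLS + c] for c in range(COLS) for r in range(ROWS)]


def deinterleave(stream: list[int]) -> list[int]:
    """Inverse of interleave()."""
    assert len(stream) == ROWS * COLS
    out = [0] * (ROWS * COLS)
    i = 0
    for c in range(COLS):
        for r in range(ROWS):
            out[r * COLS + c] = stream[i]
            i += 1
    return out


data = list(range(ROWS * COLS))
on_disc = interleave(data)
for i in range(8, 12):  # a scratch wipes out 4 consecutive symbols
    on_disc[i] = -1
recovered = deinterleave(on_disc)
errors_per_row = [recovered[r * COLS:(r + 1) * COLS].count(-1) for r in range(ROWS)]
print(errors_per_row)  # [1, 1, 1, 1]
```

Each row stands in for one block protected by the error-correcting code: after de-interleaving, the four-symbol burst costs each row only one error, which a single-error-correcting code could repair.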
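For data discs, the extra protection shows up in the sector layout. The Python sketch below adds up the fields of a CD-ROM Mode 1 sector, assuming the field sizes from the CD-ROM standard (ECMA-130); the dictionary keys are descriptive labels, not identifiers from any library. These bytes sit on top of the error correction and interleaving that every CD already applies.

```python
# Sketch: byte layout of a CD-ROM Mode 1 sector (field sizes per ECMA-130).
# Keys are descriptive labels only.

MODE1_LAYOUT = {
    "sync": 12,
    "header": 4,          # 3-byte address + 1-byte mode
    "user_data": 2048,
    "edc": 4,             # error-detection code (a 32-bit CRC)
    "reserved": 8,        # zero-filled
    "ecc_p_parity": 172,  # Reed-Solomon P parity
    "ecc_q_parity": 104,  # Reed-Solomon Q parity
}

sector_size = sum(MODE1_LAYOUT.values())
protection = (
    MODE1_LAYOUT["edc"] + MODE1_LAYOUT["ecc_p_parity"] + MODE1_LAYOUT["ecc_q_parity"]
)
print(sector_size)  # 2352
print(protection)   # 280
print(round(MODE1_LAYOUT["user_data"] / sector_size, 3))  # 0.871
```

Roughly 87% of each sector is user data, and 280 bytes go to the extra error detection and correction.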