John Gateley

YADB: A CD Metadata Database


YADB, a CD metadata database, is one of my favorite projects and a very early use of crowd intelligence. The name stands for Yet Another DataBase, a riff on YACC (Yet Another Compiler-Compiler).

Music CDs do not include any metadata: no artist name, no CD title, no track titles. All they have is the number of tracks and the length of each track. J. River built a nice media player, and implemented YADB to populate CD metadata when CDs are ripped. A server hosted a database that mapped the number of tracks and their lengths to the CD title, artist, and track titles. Users both looked up info and submitted their own versions.
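As a rough illustration of that mapping, here is a minimal Python sketch. It is not the actual YADB code: the names disc_key, DiscInfo, database, and lookup are hypothetical, and it assumes track lengths are measured in whole seconds and that a lookup requires an exact match on every length.

```python
from typing import Dict, NamedTuple, Optional, Sequence, Tuple

# Hypothetical key type: (number of tracks, length of each track in seconds).
DiscKey = Tuple[int, Tuple[int, ...]]


class DiscInfo(NamedTuple):
    """Metadata that the physical CD itself does not carry."""
    artist: str
    title: str
    track_titles: Tuple[str, ...]


def disc_key(track_lengths_seconds: Sequence[int]) -> DiscKey:
    """Build a lookup key from the only facts available on the disc."""
    lengths = tuple(track_lengths_seconds)
    return (len(lengths), lengths)


# Stand-in for the server-side database (assumption: a simple in-memory dict).
database: Dict[DiscKey, DiscInfo] = {}


def lookup(track_lengths_seconds: Sequence[int]) -> Optional[DiscInfo]:
    """Return the stored metadata for a disc, or None if it is unknown."""
    return database.get(disc_key(track_lengths_seconds))
```

A client would read the track lengths while ripping, call lookup with them, and fall back to asking the user for the details when the result is None.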

User-submitted info is notoriously inaccurate. For example, at the time I left the company, we had seen 187 different spellings or capitalizations of Pink Floyd. To improve the quality of the data, I implemented a voting system. All user submissions were kept forever, each with a count. If another user submitted matching data, the count was incremented; otherwise, a new entry with a count of 1 was created. When a user queried the data, the match with the highest count was returned. In the case of Pink Floyd, the proper spelling and capitalization overwhelmingly had the highest count.
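The voting scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the production implementation: VotingStore, submit, and query are hypothetical names, "matching" is taken to mean exact equality of the submitted data, and ties go to the earliest submission (an arbitrary choice made for this sketch).

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Hashable, Optional


class VotingStore:
    """Keeps every distinct submission per disc together with a vote count."""

    def __init__(self) -> None:
        # disc key -> Counter mapping each distinct submission to its count
        self._votes: DefaultDict[Hashable, Counter] = defaultdict(Counter)

    def submit(self, disc: Hashable, metadata: Hashable) -> None:
        """Record one submission.

        A matching entry has its count incremented; a new variant starts
        at a count of 1. Nothing is ever deleted.
        """
        self._votes[disc][metadata] += 1

    def query(self, disc: Hashable) -> Optional[Hashable]:
        """Return the submission with the highest count, or None if unknown."""
        counts = self._votes.get(disc)
        if not counts:
            return None
        # max() returns the first maximal item, so ties go to the variant
        # that was submitted earliest (assumption made for this sketch).
        best, _count = max(counts.items(), key=lambda item: item[1])
        return best


store = VotingStore()
disc = (2, (215, 187))  # hypothetical key: track count and lengths in seconds
for artist in ["Pink Floyd", "pink floyd", "Pink Floyd", "Pink Floid", "Pink Floyd"]:
    store.submit(disc, artist)

winner = store.query(disc)  # the variant submitted three times wins
```

Because misspellings tend to be scattered across many variants while correct submissions agree with each other, the correct entry accumulates votes fastest.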

YADB is still running today, though you must use J. River's Media Center to access it. The quality of the data continues to improve. The same voting algorithm is also applied to cover art, where it has been very successful as well.