Multiple artists and/or genres

655321 · October 13, 2024, 1:30pm

External lyrics usually come as .lrc or .txt files (UTF-8) that share the base name of the song they belong to; that shared name is the only link between them.

song.flac
song.lrc
song.txt
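
A minimal sketch of that base-name lookup, in case it helps. The helper names and the extension list are my own assumptions, not taken from any player, and the match is case-sensitive on case-sensitive file systems (so `song.LRC` would be missed):

```python
from pathlib import Path

# Assumed set of external lyrics extensions (hypothetical, extend as needed)
LYRICS_EXTENSIONS = (".lrc", ".txt")


def find_external_lyrics(song_path):
    """Return the existing lyrics files that share the song's base name."""
    song = Path(song_path)
    # song.flac -> song.lrc, song.txt
    candidates = (song.with_suffix(ext) for ext in LYRICS_EXTENSIONS)
    return [path for path in candidates if path.is_file()]


def read_lyrics(path):
    """Read a lyrics file as UTF-8, dropping a byte order mark if present."""
    return Path(path).read_text(encoding="utf-8-sig")
```

The list keeps the order of `LYRICS_EXTENSIONS`, so a caller that prefers synced lyrics can simply take the first hit.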

The information an .lrc file can contain (which I found hardly any player supports fully or correctly) is described here.
For what information lyrics can contain in general, sections 4.9 (unsynchronised lyrics) and 4.10 (synchronised lyrics) of the ID3v2.3.0 spec are useful.
The .lrc description isn’t entirely accurate (or adhered to in the wild) concerning the timestamps though: I found a couple of different timestamp formats that different software creates.

[mm:ss.xx]lyrics
[m:ss.xx]lyrics
[m:ss.xxx]lyrics
[mm:ss.xxx]lyrics

For each of these there’s also a variant with a space between the timestamp and the lyrics text.
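
As a rough illustration, a single pattern can accept all eight variants and turn the timestamp into milliseconds. This is only a sketch: `parse_synced_line` and `LINE_RE` are hypothetical names, and it assumes a two-digit fraction means hundredths of a second and a three-digit one means thousandths.

```python
import re

# [m:ss.xx], [mm:ss.xx], [m:ss.xxx], [mm:ss.xxx], each with optional spaces before the text
LINE_RE = re.compile(r"^\[(\d{1,2}):(\d{2})\.(\d{2,3})\] *(.*)$")


def parse_synced_line(line):
    """Return (milliseconds, text) for a synced line, or None for any other line."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    minutes, seconds, fraction, text = match.groups()
    # Assumption: "xx" is hundredths and "xxx" is thousandths, so pad to three digits
    millis = int(fraction.ljust(3, "0"))
    return (int(minutes) * 60 + int(seconds)) * 1000 + millis, text
```

Working in integer milliseconds sidesteps the formatting differences entirely; the preferred output style only matters when writing the file back.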

In my script I decided to optionally standardize all these variants before handling them further like this:

import itertools
import re
from datetime import datetime

# Conform timestamp style to [00:00.000]TEXT
def standardize_timestamps(lyrics):
    match_to_skip = r"^\[(\d{2}:\d{2}\.\d{3})\](?! )(.*)$" # if you want the final version to be [00:00.00] text, change "\d{3})\](?! )" to "\d{2})\] "
    match_to_alter = r"^\[(\d{1,2}:\d{2}\.\d{2,3})\] *(.*)$"
    lines = lyrics.split('\n')
    standardized_lyrics = []

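    # If one of the first five lines already has the target format, assume the whole file does and return it unchanged
    for line in itertools.islice(lines, 5):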
        skip = re.match(match_to_skip, line)
        if skip:
            return lyrics
    for line in lines:
        match = re.match(match_to_alter, line)
        if match:
            time_stamp = match.group(1)
            text = match.group(2)
            # Note: %M only accepts 00-59, so a timestamp with 60 or more minutes raises ValueError here
            time_obj = datetime.strptime(time_stamp, "%M:%S.%f")
            formatted_time = time_obj.strftime("%M:%S.%f")[:-3] # if you want to change the formatting to [00:00.00] text, change "[:-3]" to "[:-4]"
            standardized_lyrics.append(f"[{formatted_time}]{text}") # and also change "[{formatted_time}]{text}" to "[{formatted_time}] {text}"
        else:
            standardized_lyrics.append(line)
    return '\n'.join(standardized_lyrics)

I opted for [mm:ss.xxx]lyrics for the final output as it has the highest precision and no pointless space before the lyrics.
However, this only handles the timestamped lines (with or without text), not the special lines that specify the offset etc.

Since the ID tags are supposed to come before the actual lyrics, additional information can be fetched by matching only the first 10 or 20 lines (to avoid needlessly parsing countless lines). For the offset, for example:

^\[offset: *([+-]\d+)\]$
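
Wrapped into a function, the header scan with that pattern could look roughly like this. `find_offset` and `max_lines` are made-up names, and I'm assuming the value is in milliseconds, which is how the tag is commonly described. The function only returns the signed number; which direction a positive offset shifts the lyrics is left to the caller, since I wouldn't rely on every tool agreeing on that.

```python
import itertools
import re

OFFSET_RE = re.compile(r"^\[offset: *([+-]\d+)\]$")


def find_offset(lyrics, max_lines=20):
    """Return the signed offset found in the first max_lines lines, or 0 if there is none."""
    for line in itertools.islice(lyrics.splitlines(), max_lines):
        # strip() tolerates trailing whitespace and stray carriage returns
        match = OFFSET_RE.match(line.strip())
        if match:
            return int(match.group(1))
    return 0
```

Note that the pattern requires an explicit sign, so a line like `[offset:250]` would be ignored; making the sign optional (`[+-]?`) may be worth considering if such files turn up.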

I do not know how languages are handled in .lrc files; I’ve hardly ever come across .lrc files with any of these tags present. The majority consist only of timestamps and text. There’s a serious lack of documentation for external lyrics files.

Whether you delete or keep empty lines and timestamp-only lines in synced lyrics is a matter of preference: they give the lyrics a verse structure but lead to skipping over lines, which some might dislike.
Personally I prefer them to be displayed with empty lines, which looks like this in the file:

[00:18.047]Your deep lines, your edges, the way you curve
[00:25.727]You pass by crossroads with no return
[00:33.527]And come down in springtime to tell you the truth
[00:41.207]Your branches, your breakthroughs, you draw me into you
[00:48.405]
[00:52.847]Riverine, riverine, changing into something new
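
Since this is a matter of taste, it could be exposed as a switch. A small sketch under that assumption (`filter_verse_breaks` and `keep_breaks` are hypothetical names):

```python
import re

# A line that consists of nothing but a timestamp, e.g. [00:48.405]
TIMESTAMP_ONLY_RE = re.compile(r"^\[\d{1,2}:\d{2}\.\d{2,3}\]\s*$")


def filter_verse_breaks(lyrics, keep_breaks=True):
    """Optionally drop empty lines and timestamp-only lines from synced lyrics."""
    if keep_breaks:
        return lyrics
    kept = [
        line
        for line in lyrics.split("\n")
        if line.strip() and not TIMESTAMP_ONLY_RE.match(line)
    ]
    return "\n".join(kept)
```

With `keep_breaks=False`, the `[00:48.405]` line in the example above would be removed and the two verses would run together.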

In comparison, unsynced lyrics are even less documented and usually only contain the lyrics as plain text with a few tags at the start (for which I found no documented syntax). There might be a standard but I don’t know it.

Overall, I’d ensure that timestamps, lyrics text and the offset are correctly parsed and wait for people to start complaining about the other tags which will hopefully shine a light on the syntax variations in the wild.