Word-level time-aligned lyric support (Enhanced LRC file)

This took me way longer than expected, but it’s done for now (until someone inevitably finds a bug).
Anyone with test files and/or a current backup of your files: Feel free to try my script lyrict to test, embed or extract synced and unsynced lyrics from and to LYRICS, UNSYNCEDLYRICS, SYLT, USLT tags/frames and .lrc/.txt files. :wink:


Just adding my support for word-synced lyrics!


I need this feature too. Here are the enhanced LRC files I found.


Use the upload here.
@Tolriq I would like to see the word-synced lyrics too :laughing:
As these are also somewhat standardized:

Enhanced LRC file

Walaoke_extension


I cannot stress enough how much I want this feature. I've got a bunch of songs with syllable/ELRC lyrics, and being able to properly see them in a great music player like this would be the cherry on top.


Looks like Jellyfin will have API support for .elrc lyrics in the next release. I think with WhisperAI and similar models getting better and better at transcribing/aligning text word by word, .elrc lyrics will become more prominent as the manual labor of creating them has decreased significantly.


Is this enhanced lyrics feature really implemented? Or is there any plan for it in the near future?

I saw the tag as Implemented so asking out of curiosity.


I'm excited to know as well!! :smirking_face:


Has this feature actually been implemented? Even after updating to the latest version, it’s still like this:

Filtering was implemented, unless you are connected to a server that is supposed to serve them already cleaned up.

Open a proper issue with logs and details.

Do you think you will implement highlighting of single words in the future? Right now you only remove in-line time stamps, right?


Can someone in this thread, @OcraCore or others, provide some test files in the different formats for those?

Preferably some in English, or in a language that sounds the way it's written, to make testing easier :slight_smile:

I’ve uploaded the example elrc file (and the matching song) I created for a request for SubtitleEdit to add elrc support (that sadly went nowhere).
Description: 655321 elrc example file

@655321 how did you create the file, or where did you get it? There are a lot of unwanted spaces and no end times.

The link I provided details how I created them in SubtitleEdit + Notepad++.

I merely followed the style detailed on the Wikipedia page. Their example has spaces between timestamps and also before and after each word, and no end timestamps:

[00:00.00] <00:00.04> When <00:00.16> the <00:00.82> truth <00:01.29> is <00:01.63> found <00:03.09> to <00:03.37> be <00:05.92> lies 
[00:06.47] <00:07.67> And <00:07.94> all <00:08.36> the <00:08.63> joy <00:10.28> within <00:10.53> you <00:13.09> dies 
[00:13.34] <00:14.32> Don't <00:14.73> you <00:15.14> want <00:15.57> somebody <00:16.09> to <00:16.46> love

The German version of the same article only uses spaces after each word and omits the first word timestamp. It also has no end timestamps:

[00:00.88]Freude, <00:01.66>schöner <00:03.14>Götterfunken, <00:05.30>Tochter <00:06.47>aus <00:07.16>Elysium
[00:10.06]Wir betreten feuertrunken, <00:14.90>Himmlische, <00:16.67>dein Heiligtum.

Enhanced LRC Maker yields end timestamps, lacks the first word timestamps, also has spaces before and after words and curiously adds an empty line after each line:

[00:01.495] Please <00:03.112> don't <00:03.763> reproach <00:04.618> me <00:06.275> 

[00:06.885] For, <00:08.510> for <00:09.037> how <00:09.565> empty <00:11.715> 

[00:12.409] My <00:12.779> life <00:14.031> has <00:14.846> become <00:16.504> 

[00:17.480] I <00:17.846> don't <00:18.251> know <00:19.147> what <00:19.593> really <00:20.444> happened <00:22.709> 

[00:23.402] I <00:23.734> watched <00:24.140> your <00:25.080> disappointment <00:28.045> 

[00:28.615] At <00:28.901> being <00:29.470> misunderstood <00:33.328> 

[00:34.429] I <00:39.529> forgive <00:40.458> you <00:44.559> 

[00:48.375] Oh, <00:50.198> something <00:51.174> metal <00:53.403> 

[00:54.017] Tearing <00:56.238> my <00:56.769> stomach <00:57.588> out <00:59.0010> 

[00:59.117] If <00:59.562> you <01:00.006> think <01:00.534> ill <01:01.912> of <01:02.440> me <01:03.896> 

[01:05.071] Can <01:05.643> you, <01:07.068> can <01:07.554> you <01:08.449> forgive <01:09.054> me? <01:11.086> 

[01:11.770] Forgive <01:12.254> me <01:13.144> 

[01:13.590] Can <01:14.114> you, <01:16.499> can <01:16.905> you <01:17.792> forgive <01:18.767> me <01:21.074> 

[01:21.603] Too, <01:26.097> too? <01:29.095> 

[01:31.568] I <01:32.302> tried <01:33.927> to <01:34.294> learn <01:34.708> your <01:35.158> language <01:37.234> 

[01:37.689] But <01:38.137> fell <01:38.461> asleep <01:39.922> half-undressed <01:43.170> 

[01:43.538] Unrecognizable <01:48.494> to <01:48.898> myself <01:51.828> 

LRC Generator also has a word by word mode. It yields lyrics with 2 spaces after each word, no timestamp for the first word and no end timestamps, however it retains empty lines that were present in the source lyrics:

[ti:06 Broken Harp]
[ar:PJ Harvey]
[al:White Chalk]
[la:EN]
[re:LRCgenerator.com]
[ve:4.00]

[00:01.15]Please  <00:03.09>don't  <00:03.74>reproach  <00:04.59>me
[00:06.45]For,  <00:08.60>for  <00:09.08>how  <00:09.57>empty
[00:12.24]My  <00:12.72>life  <00:13.94>has  <00:14.88>become

[00:16.13]I  <00:17.83>don't  <00:18.24>know  <00:19.14>what  <00:19.66>really  <00:20.55>happened
[00:23.07]I  <00:23.48>watched  <00:23.88>your  <00:25.17>disappointment
[00:28.12]At  <00:28.49>being  <00:29.18>misunderstood
[00:34.36]I  <00:39.60>forgive  <00:40.42>you

[00:48.32]Oh,  <00:50.23>something  <00:51.21>metal
[00:53.99]Tearing  <00:56.18>my  <00:56.70>stomach  <00:57.63>out
[00:59.01]If  <00:59.57>you  <00:59.86>think  <01:00.42>ill  <01:01.93>of  <01:02.42>me

[01:05.14]Can  <01:05.63>you,  <01:07.09>can  <01:07.58>you  <01:08.48>forgive  <01:09.01>me?
[01:11.77]Forgive  <01:12.66>me
[01:13.67]Can  <01:14.12>you,  <01:16.47>can  <01:16.87>you  <01:17.77>forgive  <01:18.66>me
[01:21.13]Too,  <01:26.16>too?

[01:31.01]I  <01:32.93>tried  <01:33.94>to  <01:34.31>learn  <01:34.79>your  <01:35.20>language
[01:37.44]But  <01:37.93>fell  <01:38.33>asleep  <01:40.08>half-undressed
[01:43.56]Unrecognizable  <01:48.46>to  <01:48.82>myself

[01:50.71]--- www.LRCgenerator.com ---

So I guess there’s not really much of a standard way.
Do let me know if there are more tools that allow creating word-synced enhanced lrc files. I don’t think any of the tools I’ve found so far are particularly good/precise.
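
Since the tools above disagree on spaces, first-word timestamps and end timestamps, a reader has to be tolerant. Here is a minimal Python sketch of one way to do that, not how any particular player does it. The names `parse_elrc_line`, `LINE_RE` and `WORD_TS_RE` are made up for this example, and treating a trailing timestamp with no text after it as the end time is my own reading of the Enhanced LRC Maker output:

```python
import re

# [mm:ss.xx] at the start of a lyric line; metadata such as [ti:...] will not match.
LINE_RE = re.compile(r"^\[(\d+):(\d+(?:\.\d+)?)\](.*)$")
# <mm:ss.xx> in-line timestamp.
WORD_TS_RE = re.compile(r"<(\d+):(\d+(?:\.\d+)?)>")


def to_seconds(minutes: str, seconds: str) -> float:
    return int(minutes) * 60 + float(seconds)


def parse_elrc_line(line: str):
    """Return (line_start, [(start, text), ...], end_or_None), or None for non-lyric lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # metadata tag or empty line
    line_start = to_seconds(m.group(1), m.group(2))
    # Splitting on a pattern with two groups yields: text, mm, ss, text, mm, ss, text, ...
    parts = WORD_TS_RE.split(m.group(3))
    chunks = []
    lead = parts[0].strip()
    if lead:
        # No first-word timestamp: the first chunk starts at the line timestamp.
        chunks.append((line_start, lead))
    end = None
    for i in range(1, len(parts), 3):
        ts = to_seconds(parts[i], parts[i + 1])
        text = parts[i + 2].strip()
        if text:
            chunks.append((ts, text))
        elif i + 3 >= len(parts):
            end = ts  # assumption: a final timestamp without text is the end time
    return line_start, chunks, end
```

Fed the Wikipedia-style line, the German-style line or the Enhanced LRC Maker line, this returns the same shape of result, so the rest of a player would not need to care which tool wrote the file.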

Last word ending:
Personally I think there are two solid ways to denote the end of the last word.

  1. When the first word of a line has its own timestamp, the line timestamp marks the end of the last word of the previous line.
[00:01.219]<00:01.219>word1 <00:03.065>word2
[00:05.625]<00:06.768>word3 <00:08.567>word4

Here, word2 ends at 00:05.625 while word3 gets highlighted at 00:06.768.

  2. With an end timestamp and without a first word timestamp:
[00:01.219]word1 <00:03.065>word2 <00:05.625>
[00:06.768]word3 <00:08.567>word4
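
To make the two conventions concrete, here is a small Python sketch that works out when the last word of a line ends. `last_word_end` and the regex names are made up for this example. Treating a bare line timestamp such as `[00:16.564]` as an end marker is an extra assumption of mine, not part of the two options above:

```python
import re
from typing import Optional

LINE_TS = re.compile(r"^\[(\d+):(\d+(?:\.\d+)?)\]")
# A line whose first word carries its own <mm:ss.xx> timestamp.
FIRST_WORD_TS = re.compile(r"^\[\d+:\d+(?:\.\d+)?\]\s*<\d+:\d+(?:\.\d+)?>")
# A <mm:ss.xx> timestamp with nothing after it.
TRAILING_TS = re.compile(r"<(\d+):(\d+(?:\.\d+)?)>\s*$")
# A line that consists of a line timestamp only (assumed to act as an end marker).
BARE_LINE = re.compile(r"^\[\d+:\d+(?:\.\d+)?\]\s*$")


def seconds(minutes: str, secs: str) -> float:
    return int(minutes) * 60 + float(secs)


def last_word_end(line: str, next_line: Optional[str]) -> Optional[float]:
    """End time of the last word in `line`, or None if the file does not say."""
    line = line.strip()
    trailing = TRAILING_TS.search(line)
    if trailing:
        # Option 2: explicit end timestamp on the line itself.
        return seconds(*trailing.groups())
    if next_line:
        next_line = next_line.strip()
        if FIRST_WORD_TS.match(next_line) or BARE_LINE.match(next_line):
            # Option 1: the next line's [..] timestamp closes this line.
            return seconds(*LINE_TS.match(next_line).groups())
    return None
```

With either pair of example lines above, the function gives 5.625 seconds for word2. For files that use neither convention it returns None, and the player can only guess.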

Spaces:
While I agree that spaces before and after each word are wasteful, not having any makes the file really hard to read:

[00:01.219]<00:01.219>Please<00:03.065>don't<00:03.766>reproach<00:04.452>me
[00:06.768]<00:06.768>For,<00:08.567>for<00:09.034>how<00:09.434>empty
[00:12.189]<00:12.189>My<00:12.655>life<00:14.009>has<00:14.770>become
[00:16.564]

Having a space after each timestamp-word pair is a good compromise imho:

[00:01.219]<00:01.219>Please <00:03.065>don't <00:03.766>reproach <00:04.452>me
[00:06.768]<00:06.768>For, <00:08.567>for <00:09.034>how <00:09.434>empty
[00:12.189]<00:12.189>My <00:12.655>life <00:14.009>has <00:14.770>become
[00:16.564]

Do let me know if there are other variants I’ve not yet encountered.
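
For anyone who wants to convert existing files to the "one space after each timestamp-word pair" layout, here is a rough Python sketch. `normalize_spacing` is a made-up helper and the two regexes are just one way to do it. It assumes timestamps always look like `[mm:ss.xx]` or `<mm:ss.xx>` and leaves metadata lines alone:

```python
import re


def normalize_spacing(line: str) -> str:
    """Rewrite a lyric line as [ts]<ts>word <ts>word with single spaces."""
    line = line.strip()
    # Drop whitespace directly after any [..] or <..> timestamp.
    line = re.sub(r"([\[<]\d+:\d+\.\d+[\]>])\s+", r"\1", line)
    # Put exactly one space between a word and the <..> timestamp that follows it.
    line = re.sub(r"(?<=[^\s>\]])\s*(<\d+:\d+\.\d+>)", r" \1", line)
    return line


print(normalize_spacing("[00:01.15]Please  <00:03.09>don't  <00:04.59>me"))
```

Spaces inside a chunk (such as "dein Heiligtum." in the German example) are kept as they are, since only the whitespace next to timestamps is touched.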

Edit:
lrcmaker also yields syntax in line with the English wiki (I tested a different song):

[ar:Midlake & Madison Cunningham]
[al:unknown]
[au:Midlake & Madison Cunningham]
[length:3:41.18]
[by:lrcmaker.com]
[ti:Guardian]
[vi:]
[00:00.00] <00:18.60> I <00:18.87> don’t <00:19.25> know <00:19.69> what <00:19.90> it <00:20.08> is
[00:20.56] <00:20.83> To <00:21.02> put <00:21.22> my <00:21.55> mind <00:21.81> at <00:22.00> ease
[00:22.97] <00:23.18> Further <00:23.37> on <00:23.56> up <00:24.00> the <00:24.19> road
[00:24.39] <00:24.96> Many <00:25.18> miles <00:25.38> carried
[00:26.81] <00:27.02> For <00:27.21> the <00:27.40> fearless <00:27.74> in <00:28.62> kind
[00:29.32] <00:29.59> Are <00:29.81> entwined <00:30.01> and <00:30.54> the
[00:30.81] <00:31.57> Fearful <00:31.78> are <00:32.79> denied <00:33.86> of <00:34.05> what <00:34.24> wine <00:34.74> hands <00:35.04> thee
[00:35.55] <00:35.95> Gonna <00:36.19> feel <00:36.42> it <00:37.48> out
[00:37.99] <00:38.22> Gonna <00:38.42> feel <00:38.62> it <00:39.60> in
[00:40.11] <00:40.44> Who <00:40.68> alone’s <00:40.88> without
[00:42.17] <00:42.59> Who <00:42.86> atones <00:43.09> within
[00:44.52] <00:44.87> What’s <00:45.09> inside <00:45.86> of <00:46.15> a <00:46.56> line
[00:46.98] <00:47.37> Drawn <00:47.57> outside <00:47.77> of <00:48.23> thee
[00:49.14] <00:49.39> Anymore <00:49.61> than <00:50.75> what <00:50.98> time
[00:51.41] <00:51.66> Has <00:51.87> aligned <00:52.07> for
[00:52.93] <00:53.26> The <00:53.65> Guardians <00:54.53> of <00:55.01> an <00:55.31> Audience <00:57.00> above
[00:57.68] <00:58.35> An <01:01.68> Audience <01:02.16> for <01:04.00> the <01:04.57> Guardians 

Ok so complete mess :frowning:

There are many other extensions, to support the start and end of syllables, indicate the voice, and so on.

I hate lyrics.

I think that’s overkill. At that point you could just as well use a karaoke program that does only that. Supporting word timings is already fairly exotic for a music player.

What would you consider to be the ideal syntax for word timed enhanced lyrics (spaces? and if so, how many and where? first word time vs. end time)?

The spaces do not really matter; LRC files are not made for humans once you add the timestamps :wink:

I’ll add removal of the extra ones as it’s the most common case.

Word-only sync means that you can’t really animate the word, just highlight it, because you don’t know whether the next start is right at the end of the current word or comes after a pause. The same goes for the end.

So for word sync end times are useless.
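
Highlight-only word sync can be illustrated with a few lines of Python: the word to highlight is simply the last one whose start time is not after the playback position, so only start times are needed. This is a sketch of the idea, not the player's code, and `active_word` is a made-up name:

```python
from bisect import bisect_right
from typing import List, Optional


def active_word(starts: List[float], position: float) -> Optional[int]:
    """Index of the word to highlight at `position`, or None before the first word.

    `starts` must be sorted ascending (word start times in seconds).
    """
    i = bisect_right(starts, position) - 1
    return i if i >= 0 else None


starts = [1.219, 3.065, 3.766, 4.452]  # Please / don't / reproach / me
print(active_word(starts, 3.5))
```

Note that the last word stays highlighted until the next line takes over, which is exactly the "pause or not" ambiguity described above.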

For the syllable stuff and singer colors it’s actually fun.

01 Wake Me Up (Feat. Justice).lrc.txt (7.7 KB)

Sure, but that doesn’t mean that they have to be completely unreadable. Otherwise you could argue that the line breaks are also not needed (since you can just parse from one [mm:ss.xxx] timestamp to the next) and slam everything into a single line.

[00:10.408]v1:<00:10.408>All <00:10.608><00:10.608>I <00:10.835><00:10.835>have <00:11.259><00:11.259>is <00:12.476><00:12.643>my <00:13.008><00:13.008>leg<00:14.744><00:14.823>acy <00:16.479><00:16.479>

What does the v1 stand for?

I’m guessing the double timestamp at the end is “end of last word” + “end of line”?
Imho for 99% of lyrics that’s overkill and probably a nightmare to create (unless you use an AI tool to do it automatically, but I’ve had really mixed results with whisperai + stable-ts timing-wise).

I thought you were only going for word level highlighting, not full karaoke style animation within the words.

If you want to support both that’s a fun amount of regex. :grinning_face:

Is that syntax documented somewhere? I don’t think I’ve ever seen it in lrc/elrc files.

The v1, v2 and bg prefixes are the voices; there’s a Walaoke version too.

The specs are mostly maintained on Chinese GitHub repos, there’s a couple of updated versions.
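
As a rough illustration only: going by the sample line above, each syllable seems to be wrapped as `<start>text<end>`, with an optional `v1:`, `v2:` or `bg:` voice prefix after the line timestamp. The Python sketch below parses a line under that reading. It is inferred from the one example in this thread, not from the actual specs, and `parse_voice_line`, `HEAD_RE` and `SEG_RE` are made-up names:

```python
import re

# [mm:ss.xxx] followed by an optional voice prefix (v1:, v2:, bg:).
HEAD_RE = re.compile(r"^\[(\d+):(\d+\.\d+)\](?:(v\d+|bg):)?(.*)$")
# Assumed segment shape: <start>text<end>.
SEG_RE = re.compile(r"<(\d+):(\d+\.\d+)>([^<]+)<(\d+):(\d+\.\d+)>")


def secs(minutes: str, seconds: str) -> float:
    return int(minutes) * 60 + float(seconds)


def parse_voice_line(line: str):
    """Return (voice_or_None, [(start, end, text), ...]), or None for non-lyric lines."""
    m = HEAD_RE.match(line.strip())
    if m is None:
        return None
    voice, body = m.group(3), m.group(4)
    segments = [
        (secs(a, b), secs(c, d), text)
        for a, b, text, c, d in SEG_RE.findall(body)
    ]
    return voice, segments


sample = ("[00:10.408]v1:<00:10.408>All <00:10.608><00:10.608>I <00:10.835>"
          "<00:13.008>leg<00:14.744><00:14.823>acy <00:16.479><00:16.479>")
voice, segments = parse_voice_line(sample)
print(voice, [text for _, _, text in segments])
```

Trailing spaces are kept in the segment text on purpose: "leg" has none and "acy " has one, which is how the two syllables of one word can be told apart from separate words. The extra timestamp at the very end of the line is ignored here.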