Hello, I volunteer to help run tech at a college radio station. We have a large music library: about 12 terabytes of MP3s (64,000 artists, 165,000 albums, 1.4 million songs).
In terms of folder hierarchy, we have a "Library" folder that contains a folder for each artist (64,000 subfolders). Inside each artist folder is a folder for each of their albums (at most a dozen or two subfolders), and of course inside those album folders are the mp3s for each track in that album (at most a few dozen mp3s).
The problem I'm running into is that search doesn't work because I can't get VirtualDJ to scan all the ID3 tags. When I right-click the "Library" folder in VDJ and run Batch >> Add to DB, the status bar shows that it's trying to add to the database... but after a few minutes the message goes away and the process appears to have crashed (no new tracks are added to the DB). If I instead use the browser to expand "Library", right-click one of the artist names (say, Johnny Cash), and run Batch >> Add to DB, the status bar says "Add to DB" (or something similar) for a second and then "XX tags to read", where XX is a number that counts down as the tags are read. Searching for Johnny Cash afterwards works correctly.
So why doesn't "Add to DB" work on the main "Library" folder? Perhaps it's encountering a Unicode character in a filename that makes it choke? Or perhaps it's just the sheer number of folders or files? I wish I could turn on debugging and see what the problem is. If I could see a bad character, I could fix it. Or if I could see that it's choking on the "Library" folder's 64,000 subfolders (one per artist), maybe I could use a "sharding" technique: make subfolders like "A", "B", "C", etc., and put the artists starting with "A" in the "A" subfolder, reducing the maximum number of subfolders in any one folder to (optimistically) 1/26 of what it is now.
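If the goal is to see whether odd characters are involved, a small Python sketch can list every file or folder name under the library that contains lone surrogates (how Python represents name bytes it could not decode), control characters, or non-ASCII characters. These are heuristics chosen for illustration; nothing in this thread confirms that VirtualDJ chokes on any of them, and a non-ASCII name like "Björk" is usually perfectly valid.

```python
import os
import unicodedata
from typing import Iterator, Tuple


def suspicious_names(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, reason) for names under root that *might* trip up a scanner.

    The reasons are illustrative heuristics, not known VirtualDJ failure causes.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if any(0xD800 <= ord(ch) <= 0xDFFF for ch in name):
                # Lone surrogates: how Python represents undecodable name bytes
                yield path, "undecodable bytes"
            elif any(unicodedata.category(ch) == "Cc" for ch in name):
                yield path, "control character"
            elif not name.isascii():
                # Often perfectly valid (e.g. accented artist names)
                yield path, "non-ASCII"
```

Usage would be something like `for path, reason in suspicious_names(r"D:\Library"): print(reason, path)`, where the path is a placeholder for wherever the library actually lives.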
Does anyone have any advice? If I can't get search to work I'm going to be really letting down all the DJs at the station. Please help! And thank you,
Ian
Posted Tue 04 Apr 23 @ 9:45 pm
It may take 30 days just to scan. Just my opinion, but highlight a group (say, the A's), drag and drop it into the browser, and see if that works.
Posted Wed 05 Apr 23 @ 2:11 am
Thanks for the reply!
When I "add to db" for just one artist (say, with 50 tracks), I can see the status bar counting down the tracks as it scans the ID3 tags, at a rate of at least 10 per second. So our 1.7 million tracks should take about 48 hours, if my math is right. I'm fine waiting that long, but I think "add to db" is crashing, so it never gets to finish.
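The first post says 1.4 million songs and this one says 1.7 million, so here is a quick Python check of both figures at the observed rate of 10 tags per second. It assumes the rate stays constant across the whole library, which is an assumption; since the observed rate was "at least" 10 per second, these are upper bounds.

```python
def scan_hours(track_count: int, tracks_per_second: float) -> float:
    """Rough wall-clock estimate for reading tags at a constant rate."""
    return track_count / tracks_per_second / 3600


# Both track counts mentioned in this thread, at ~10 tracks/second
for tracks in (1_400_000, 1_700_000):
    print(f"{tracks:,} tracks: about {scan_hours(tracks, 10):.0f} hours")
# prints:
# 1,400,000 tracks: about 39 hours
# 1,700,000 tracks: about 47 hours
```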
You mentioned highlighting the A's and dragging and dropping them into the browser. Just to clarify, do you mean opening Windows Explorer and dragging the A's into the browser? Would I create a folder in the browser to hold them?
I just realized there is an SDK / API for VirtualDJ. Another option seems to be writing a program that opens the artist folders one at a time (using browser_gotofolder "/my_path/my_folder") and recurses them (using recurse_folder). It looks like recursing an artist folder adds all of its tracks to the database. I'm hoping someone has a better option, though, because this sounds like it would be much slower than a regular "add to db"; depending on how performant it is, the process could take months instead of days, and I wouldn't be able to use VDJ while it's running.
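A sketch of the scripting approach in Python: generate one command per artist folder using the two verbs named above. Assumptions, none confirmed here: that `&` chains the two actions in VDJscript, that the path is accepted in double quotes as in the example above, and that something else (a plugin, the SDK, or a custom button) sends each command and waits for its recurse to finish. That plumbing is not shown, and `artist_commands` is a hypothetical helper name.

```python
import os
from typing import List


def artist_commands(library_root: str) -> List[str]:
    """Build one command string per artist folder (hypothetical helper).

    Assumes browser_gotofolder and recurse_folder behave as described in the
    thread and that '&' chains them; sending each command to VirtualDJ and
    waiting for the recurse to finish is not shown.
    """
    with os.scandir(library_root) as entries:
        # Only real folders, sorted case-insensitively so progress is easy to follow
        artists = sorted(
            (e for e in entries if e.is_dir(follow_symlinks=False)),
            key=lambda e: e.name.lower(),
        )
    return [f'browser_gotofolder "{e.path}" & recurse_folder' for e in artists]
```

Sorting also makes it easy to resume from a given artist if a run is interrupted.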
Thanks to all for the advice,
Ian
Posted Wed 05 Apr 23 @ 2:21 pm
Currently there is a limit on both the number of subfolders and the number of files that recurse or "add to search db" will look for.
It looks like you did hit this limit in your case: VDJ stops looking for folders after 100,000 subfolders or 1,000,000 files.
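To check a folder against these limits before starting a long Add to DB, a Python sketch can count what is underneath it. Whether VirtualDJ counts every file or only audio files, and whether it follows symlinked folders, isn't stated here, so this counts every file and does not descend into symlinked folders. By the figures in the first post (64,000 artist folders plus 165,000 album folders, and 1.4 million tracks), "Library" would exceed both limits.

```python
import os
from typing import Tuple

# Limits as described in this thread for the current version; the developers
# said they would be raised a bit in the next update.
FOLDER_LIMIT = 100_000
FILE_LIMIT = 1_000_000


def count_tree(root: str) -> Tuple[int, int]:
    """Return (subfolders, files) below root; symlinked folders are counted but not entered."""
    folders = files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        files += len(filenames)
    return folders, files


def over_limits(root: str) -> bool:
    """True if a single scan of root would exceed either stated limit."""
    folders, files = count_tree(root)
    return folders > FOLDER_LIMIT or files > FILE_LIMIT
```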
You could indeed use a script to recurse each artist's folder separately. I think that should be possible at a rate much faster than 1 artist per second, so that could work in a matter of hours.
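For scale: at exactly one artist per second, the 64,000 artist folders mentioned in the first post work out to under a day, so any faster per-artist rate lands in the "matter of hours" range:

$$
\frac{64{,}000\ \text{artists}}{1\ \text{artist/s}} = 64{,}000\ \text{s} = \frac{64{,}000}{3600}\ \text{h} \approx 17.8\ \text{h}
$$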
Posted Wed 05 Apr 23 @ 7:28 pm
Hi Adion and thanks for the reply!
The 100,000 folder limit and 1,000,000 file limit you mentioned, I couldn't find any confirmation of that online. Are you one of the developers, or how did you discover those numbers?
thank you so much, this is incredibly helpful!
Ian
Posted Thu 06 Apr 23 @ 1:10 pm
Yes, I'm indeed one of the developers. We'll increase the limit a bit for the next update.
Posted Thu 06 Apr 23 @ 1:20 pm
Adion wrote :
Yes, I'm indeed one of the developers. We'll increase the limit a bit for the next update.
Could you also tell us the reason for the limits (i.e., are they processing limitations)? I only ask because they don't seem to come from the expected filesystem limitations.
Is there a limit for total number of songs manageable by a VirtualDJ database we should know of?
I think this suggests an organization structure that avoids these limits (no artist directory with more than 100,000 artists, no folder with more than 1 million files/folders).
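The "A", "B", "C" sharding idea from the first post can be planned as a dry run before anything is moved. This Python sketch uses a hypothetical scheme (ASCII first letter after stripping accents, "0-9" for a leading digit, "#" for everything else) and only returns (source, destination) pairs. Two caveats: sharding shrinks the number of folders directly inside "Library", but the total number of folders and files underneath it stays the same, so under the limits described in this thread each letter folder would presumably still need its own Add to DB run; and moving files may leave existing database entries pointing at the old paths.

```python
import os
import unicodedata
from typing import List, Tuple


def shard_key(artist: str) -> str:
    """Hypothetical scheme: 'A'-'Z' by first letter, '0-9' for digits, '#' otherwise."""
    name = artist.strip()
    if not name:
        return "#"
    first = unicodedata.normalize("NFKD", name)[0]  # 'É' decomposes to 'E' + accent
    if first.isascii() and first.isalpha():
        return first.upper()
    if "0" <= first <= "9":
        return "0-9"
    return "#"


def shard_plan(library_root: str) -> List[Tuple[str, str]]:
    """Dry run: return (current_path, new_path) pairs without moving anything.

    Not handled: an artist folder that is itself named like a shard (e.g. 'A').
    """
    plan = []
    with os.scandir(library_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dest = os.path.join(library_root, shard_key(entry.name), entry.name)
                plan.append((entry.path, dest))
    return sorted(plan)
```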
Posted Thu 06 Apr 23 @ 7:37 pm
It was more of a limit to prevent infinite recursion in case there was a symlink or similar pointing back at a folder up in the hierarchy.
Posted Thu 06 Apr 23 @ 8:10 pm
Adion wrote :
It was more of a limit to prevent infinite recursion in case there was a symlink or similar pointing back at a folder up in the hierarchy.
I understand. Is it possible to use some form of cycle detection to avoid such a limit, e.g. a graph representation that marks folders as visited (identifying folders by something like inode numbers on macOS)?
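The idea of marking visited folders by inode can be sketched in Python with `os.stat()`, whose `st_dev`/`st_ino` pair identifies a directory even when it is reached through a symlink (recent Python versions fill these in on Windows as well). This illustrates the general technique only, not how VirtualDJ scans; it does cost something, here one extra `os.stat()` call per folder.

```python
import os
from typing import Iterator, Set, Tuple


def walk_no_cycles(root: str) -> Iterator[str]:
    """Yield every directory reachable from root, following symlinks,
    but entering each physical directory at most once.

    os.stat() follows links, so a symlink pointing back up the tree resolves
    to a (st_dev, st_ino) pair that has already been visited.
    """
    visited: Set[Tuple[int, int]] = set()
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)
        except OSError:
            continue  # broken link or permission error
        key = (st.st_dev, st.st_ino)
        if key in visited:
            continue  # cycle (or the same folder via another link): skip it
        visited.add(key)
        yield path
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir():  # follows symlinks by default
                        stack.append(entry.path)
        except OSError:
            pass  # unreadable folder
```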
Posted Fri 07 Apr 23 @ 12:35 am
Depending on the type of link, it might or might not be possible, I think, but it would still be slower than a limit, for a case that rarely happens. (And after more than 10 years of VDJ 8, this is the first time someone seems to have hit the limit.)
Posted Fri 07 Apr 23 @ 4:20 am
Hi Adion,
I wrote a search engine for Windows file shares back in college (in the days before torrents), and I ran into exactly the problem you mentioned: a link (on a Linux computer) pointing to a parent folder, which unfortunately made those sorts of cycles possible. Now that Windows has symlink support, this can be an issue on Windows computers too. My solution was to limit the maximum recursion depth to something like 10 levels (if I remember correctly): the software would not recurse into any subdirectory deeper than that, and it turned out to be a reliable fix. It could be worth considering for VDJ, if the change doesn't require too much restructuring. I really appreciate being able to speak with you here.
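A depth cap like this is straightforward to sketch with Python's `os.walk()`, which does not track visited folders and will recurse forever around a symlink loop when `followlinks=True`; clearing `dirnames` in place stops the descent. The default of 10 levels mirrors the recollection above and is not a recommended value. For the layout in the first post (Library / artist / album) the real depth is only 2, so such a cap would not cut anything legitimate off.

```python
import os
from typing import Iterator


def walk_max_depth(root: str, max_depth: int = 10) -> Iterator[str]:
    """Yield directories under root, never descending more than max_depth levels.

    With followlinks=True a symlink loop is still entered, but it can only
    repeat until the depth cap is reached instead of forever.
    """
    for dirpath, dirnames, _filenames in os.walk(root, followlinks=True):
        yield dirpath
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirnames.clear()  # prune in place: os.walk will not enter these
```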
Ian
Posted Fri 07 Apr 23 @ 4:06 pm
To ICC
You mentioned highlighting A's and dragging and dropping in the browser. Just to clarify, do you mean open up windows explorer and drag and drop the A's into the browser? Like would I create a folder in the browser to hold the A's?
Yes, that's what I'm saying. I just highlighted all my audio and video, dragged and dropped it, and I'll see it in a day or two.
Posted Fri 07 Apr 23 @ 11:03 pm