This is more a note to myself than anything else. However, hopefully someone else will also find it useful.
When traversing FTP paths on GenBank, especially programmatically over the raw FTP protocol, do not be surprised if the directory you end up in is reported under a different path from the one you requested.
A simple example with a bacterial genome, using the command-line ftp client, shows this:
    (cbc) CIS2X1NFGTF1:test_download_genbank aragaven$ ftp ftp://ftp.ncbi.nlm.nih.gov/
    Trying 18.104.22.168...
    Connected to ftp.wip.ncbi.nlm.nih.gov.
    220- This warning banner provides privacy and security notices consistent with applicable federal laws, directives, and other federal guidance for accessing this Government system, which includes all devices/storage media attached to this system. This system is provided for Government-authorized use only. Unauthorized or improper use of this system is prohibited and may result in disciplinary action and/or civil and criminal penalties. At any time, and for any lawful Government purpose, the government may monitor, record, and audit your system usage and/or intercept, search and seize any communication or data transiting or stored on this system. Therefore, you have no reasonable expectation of privacy. Any communication or data transiting or stored on this system may be disclosed or used for any lawful Government purpose.
    220 FTP Server ready.
    331 Anonymous login ok, send your complete email address as your password
    230 Anonymous access granted, restrictions apply
    Remote system type is UNIX.
    Using binary mode to transfer files.
    200 Type set to I
    ftp> cd /genomes/refseq/bacteria/Acaryochloris_marina/latest_assembly_versions/GCF_000018105.1_ASM1810v1
    250 CWD command successful
    ftp> pwd
    Remote directory: /genomes/all/GCF/000/018/105/GCF_000018105.1_ASM1810v1
    ftp>
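While I was puzzled for a little bit, it does make sense for the server to use some sort of links, either hard or symbolic, to avoid duplicating data. In the session above, the directory under /genomes/refseq/ appears to be an alias for the one under /genomes/all/.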
This should definitely be borne in mind when querying the server programmatically. I do so here, storing the download paths both for later access in the program and for logging purposes: https://github.com/compbiocore/access_genbank
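One way to guard against this is to ask the server where you actually landed right after changing directory, and to store both paths. The following is only a minimal sketch using Python's standard ftplib, not the code from access_genbank. The function names resolve_remote_dir and log_download_dir are made up for illustration, and I am assuming anonymous login works as it did in the session above.

```python
from ftplib import FTP


def resolve_remote_dir(ftp, requested_path):
    """Change into requested_path and return (requested, resolved).

    `ftp` is any object with cwd() and pwd() methods, such as ftplib.FTP.
    The resolved path is whatever the server reports via PWD after the CWD.
    """
    ftp.cwd(requested_path)
    resolved_path = ftp.pwd()
    return requested_path, resolved_path


def log_download_dir(host, requested_path):
    """Hypothetical helper: connect anonymously and report both paths."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        requested, resolved = resolve_remote_dir(ftp, requested_path)
    if requested != resolved:
        # The server moved us, so record both paths rather than assuming
        # the requested one is what later commands will see.
        print(f"requested: {requested}")
        print(f"resolved:  {resolved}")
    return requested, resolved
```

To try it against the example above, call log_download_dir("ftp.ncbi.nlm.nih.gov", "/genomes/refseq/bacteria/Acaryochloris_marina/latest_assembly_versions/GCF_000018105.1_ASM1810v1"). Keeping both paths means the log still makes sense whichever one shows up later.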