PMC FTP Service
The PMC FTP Service may be used to download the source files for any article in the PMC Open Access Subset, to associate PMC articles with identifiers such as PubMed IDs, DOIs, Manuscript IDs, and ISSNs, and as a source for data mining.

Source files from the PMC Open Access Subset
This FTP service may be used to download the source files for any article in the PMC Open Access Subset. The source files for an article may include:
The URL to access the FTP site is ftp://ftp.ncbi.nlm.nih.gov/pub/pmc

All the source files for an article are packaged in a single .tar.gz file. The FTP site has a two-level-deep folder (directory) structure. Folder names are randomly generated, and the .tar.gz file for an article is randomly assigned to a second-level folder.

Finding Data in the File List (file_list.txt)
The file_list.txt file in the main FTP folder is an index of all the articles available from the site. The file is located at ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/file_list.txt. Each entry in the list consists of:

Obtaining DOIs and PubMed IDs for Articles in PMC
Use PMC-ids.csv.gz to associate PMC articles with a PMC ID, a PubMed ID, and the corresponding DOI. PMC-ids.csv.gz is a comma-separated file with the following fields:
If the information is not available, entries will contain an empty space.
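
As a rough illustration of working with PMC-ids.csv.gz, the sketch below reads the gzipped CSV and treats blank or whitespace-only values as missing, in line with the note above. It assumes the first row of the file is a header; the column names PMCID, PMID, and DOI are illustrative guesses and should be checked against the actual header. The function name iter_id_records is hypothetical.

```python
import csv
import gzip
import io


def iter_id_records(gz_source, wanted=("PMCID", "PMID", "DOI")):
    """Yield one dict per CSV row, restricted to the wanted columns.

    gz_source may be a path or a binary file object holding gzip data.
    Assumption: the first row is a header. The default column names are
    illustrative and may differ from the real file.
    """
    with gzip.open(gz_source, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.DictReader(text):
            record = {}
            for name in wanted:
                # Unavailable values may appear as an empty space; map them to None.
                value = (row.get(name) or "").strip()
                record[name] = value or None
            yield record
```

Called with a local copy of the file, for example iter_id_records("PMC-ids.csv.gz"), this yields records one at a time, so the whole file never has to be held in memory.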

XML for Data Mining via FTP
The file articles.tar.gz contains XML (and only XML) files for all PMC open access articles. It was created for users who need PMC XML for data mining and processing purposes but do not need PDFs, images, or supplementary data.

Suggested FTP Client Configuration
After a series of experiments using FTP clients with NCBI's FTP server, we've found that the configuration of FTP clients can seriously affect performance. NCBI recommends setting the TCP buffer size to 32Mb. For more information on FTP configuration, please see the US Department of Energy's Guide to Bulk Data Transfer over a WAN.

The contents of the PMC FTP site are normally updated once a day. If you have questions or comments about the PMC FTP site, please write to oai@ncbi.nlm.nih.gov.
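
For data mining with the articles.tar.gz archive described above, one possible approach is to read the archive as a stream rather than unpacking it to disk first. The sketch below uses Python's standard tarfile module. The function name iter_xml_members is hypothetical, and the file suffixes it filters on (.xml and .nxml) are assumptions about how the XML files are named inside the archive.

```python
import io
import tarfile


def iter_xml_members(fileobj, suffixes=(".xml", ".nxml")):
    """Yield (member_name, raw_bytes) for each XML file in a gzipped tar stream.

    Assumption: XML files can be recognized by the given suffixes.
    """
    # Mode "r|gz" reads the archive sequentially, so it also works on
    # non-seekable streams and never unpacks the whole archive at once.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            if not member.name.lower().endswith(suffixes):
                continue
            # In stream mode each member must be read before moving on.
            extracted = archive.extractfile(member)
            if extracted is not None:
                yield member.name, extracted.read()
```

With a locally downloaded copy, this could be used as iter_xml_members(open("articles.tar.gz", "rb")), passing each yielded byte string to an XML parser of choice.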