- 1 How to set up a local repository
- 2 To test local repository
- 3 Repository directory structure
- 4 Package download statistics
- 5 List all R packages from CRAN/Bioconductor
How to set up a local repository
Some of the information here focuses on Windows binary packages.
- CRAN specific: http://cran.r-project.org/mirror-howto.html
- Bioconductor specific: http://www.bioconductor.org/about/mirrors/mirror-how-to/
- How to Set Up a Custom CRAN-like Repository
Utilities such as install.packages can be pointed at any CRAN-style repository, and R users may want to set up their own. The ‘base’ of a repository is a URL such as http://www.omegahat.org/R/: this must be an URL scheme that download.packages supports (which also includes ‘ftp://’ and ‘file://’, but not on most systems ‘https://’). Under that base URL there should be directory trees for one or more of the following types of package distributions:
- "source": located at src/contrib and containing .tar.gz files. Other forms of compression can be used, e.g. .tar.bz2 or .tar.xz files.
- "win.binary": located at bin/windows/contrib/x.y for R versions x.y.z and containing .zip files for Windows.
- "mac.binary.leopard": located at bin/macosx/leopard/contrib/x.y for R versions x.y.z and containing .tgz files.
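As a minimal sketch, the layout above can be created as empty directories with a small shell function. The helper names r_minor and make_repo_skeleton are hypothetical and not part of any R tool. The function uses the x.y part of an R version x.y.z as the contrib subdirectory name, following the rule stated above.

```shell
# Hypothetical helper: derive the "x.y" contrib directory name from an R version "x.y.z".
r_minor() {
  echo "$1" | cut -d. -f1,2
}

# Hypothetical helper: create empty directory trees for the "source",
# "win.binary" and "mac.binary.leopard" package types under BASE_DIR.
# Usage: make_repo_skeleton BASE_DIR R_VERSION
make_repo_skeleton() {
  base=$1
  xy=$(r_minor "$2")
  mkdir -p "$base/src/contrib" \
           "$base/bin/windows/contrib/$xy" \
           "$base/bin/macosx/leopard/contrib/$xy"
}
```

For example, make_repo_skeleton ~/myrepo 2.15.2 would create ~/myrepo/src/contrib, ~/myrepo/bin/windows/contrib/2.15 and ~/myrepo/bin/macosx/leopard/contrib/2.15 (the ~/myrepo location is only an example).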
Each terminal directory must also contain a PACKAGES file. This can be a concatenation of the DESCRIPTION files of the packages separated by blank lines, but only a few of the fields are needed. The simplest way to set up such a file is to use function write_PACKAGES in the tools package, and its help explains which fields are needed. Optionally there can also be a PACKAGES.gz file, a gzip-compressed version of PACKAGES—as this will be downloaded in preference to PACKAGES it should be included for large repositories. (If you have a mis-configured server that does not report correctly non-existent files you will need PACKAGES.gz.)
To add your repository to the list offered by setRepositories(), see the help file for that function.
A repository can contain subdirectories, when the descriptions in the PACKAGES file of packages in subdirectories must include a line of the form
Path: path/to/subdirectory
Once again, write_PACKAGES is the simplest way to set this up.
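To make the format concrete, here is an illustrative sketch that writes a tiny PACKAGES index by hand: DCF stanzas separated by blank lines, plus a gzip-compressed PACKAGES.gz copy. The packages pkgA and pkgB, their versions, and the subdirectory extra are invented for illustration; pkgB carries a Path field because it is assumed to live in that subdirectory. For a real repository, write_PACKAGES remains the recommended tool.

```shell
# Hypothetical helper: write a hand-made PACKAGES index plus PACKAGES.gz in DIR.
# The package names, versions and the "extra" subdirectory are made up.
write_demo_index() {
  dir=$1
  mkdir -p "$dir"
  cat > "$dir/PACKAGES" <<'EOF'
Package: pkgA
Version: 1.0.0
Depends: R (>= 2.15.0)

Package: pkgB
Version: 0.2.1
Path: extra
EOF
  # -c writes to stdout, so the uncompressed PACKAGES is kept next to PACKAGES.gz
  gzip -9 -c "$dir/PACKAGES" > "$dir/PACKAGES.gz"
}
```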
Space requirements for mirroring a whole repository
- The whole of CRAN takes about 92 GB (check with rsync -avn cran.r-project.org::CRAN > ~/Downloads/cran).
- Bioconductor is big (> 64 GB for BioC 2.11). Check the size of what will be transferred with, e.g., rsync -avn bioconductor.org::2.11 > ~/Downloads/bioc and make sure you have enough room on your local disk before you start.
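A saved dry-run listing can be summarised without reading it line by line. The sketch below assumes the listing ends with rsync's usual summary line of the form "total size is N  speedup is ..." and converts N to decimal gigabytes (10^9 bytes). Newer rsync versions print N with comma separators, which the sketch strips. The function name dryrun_total_gb is hypothetical.

```shell
# Hypothetical helper: print the "total size is N" figure from a saved rsync
# dry-run listing, converted to decimal gigabytes. Commas used as thousands
# separators by newer rsync versions are removed before the conversion.
dryrun_total_gb() {
  sed -n 's/^total size is \([0-9,]*\).*/\1/p' "$1" | tr -d ',' |
    awk '{ printf "%.1f GB\n", $1 / 1e9 }'
}
```

For example, dryrun_total_gb ~/Downloads/bioc would print the size reported by the Bioconductor dry run above.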
On the other hand, if we only care about the Windows binary part, the space requirement is much smaller:
- CRAN: 2.7GB
- Bioconductor: 28GB.
- If a binary package was built on R 2.15.1, it cannot be installed on R 2.15.2, but the reverse works.
- Remember to pass the --delete option to rsync; otherwise old versions of packages are kept in the mirror and may be installed.
- The repository still needs a src directory. If it is missing, we get an error:
Warning: unable to access index for repository http://arraytools.no-ip.org/CRAN/src/contrib
Warning message:
package ‘glmnet’ is not available (for R version 2.15.2)
The error is reported by the available.packages() function.
To bypass the requirement for a src directory, I can use
install.packages("glmnet", contriburl = contrib.url(getOption('repos'), "win.binary"))
but there may be a problem when we use the biocLite() command.
I found a workaround. Since the error comes from the missing CRAN/src directory, we just need to make sure that the directory CRAN/src/contrib exists AND that either CRAN/src/contrib/PACKAGES or CRAN/src/contrib/PACKAGES.gz exists.
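The workaround condition can be checked from a shell before pointing R at the mirror. This is a sketch with a hypothetical function name, has_src_index; it only tests for the directory and the two index files, not whether their contents are valid.

```shell
# Hypothetical helper: succeed (exit status 0) only if REPO/src/contrib exists
# and contains PACKAGES or PACKAGES.gz, i.e. the workaround condition above.
has_src_index() {
  repo=$1
  [ -d "$repo/src/contrib" ] || return 1
  [ -f "$repo/src/contrib/PACKAGES" ] || [ -f "$repo/src/contrib/PACKAGES.gz" ]
}
```

For example: has_src_index ~/Rmirror/CRAN || echo 'src/contrib index missing'.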
To create a CRAN repository
Before creating a local repository, do a dry run first; you don't want to be surprised by how long it takes to mirror a directory.
Do the dry run with the -n option and redirect the output to a text file for examination.
rsync -avn cran.r-project.org::CRAN > crandryrun.txt
To mirror only part of a repository, create the target directories before running the rsync commands.
cd
mkdir -p ~/Rmirror/CRAN/bin/windows/contrib/2.15
rsync -rtlzv --delete cran.r-project.org::CRAN/bin/windows/contrib/2.15/ ~/Rmirror/CRAN/bin/windows/contrib/2.15
# The src directory is very large (~27GB) since it contains source code for each R version.
# We only need the files PACKAGES and PACKAGES.gz in CRAN/src/contrib, so the following line is commented out.
# rsync -rtlzv --delete cran.r-project.org::CRAN/src/ ~/Rmirror/CRAN/src/
mkdir -p ~/Rmirror/CRAN/src/contrib
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/PACKAGES.gz ~/Rmirror/CRAN/src/contrib/
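The commands above can be wrapped in a function parameterised by the mirror root and the R x.y version, so the same steps can be reused when a new R release comes out. This is a sketch, not a tested mirroring script: the function name mirror_cran_win and the RSYNC variable are hypothetical, and setting RSYNC=echo prints the rsync commands instead of running them.

```shell
# Hypothetical wrapper around the partial-mirror commands: Windows binaries for
# one R x.y release plus only the src/contrib index files.
# RSYNC defaults to rsync; RSYNC=echo previews the commands without transferring.
# Usage: mirror_cran_win MIRROR_ROOT R_XY
mirror_cran_win() {
  root=$1
  xy=$2
  rs=${RSYNC:-rsync}
  mkdir -p "$root/bin/windows/contrib/$xy" "$root/src/contrib"
  $rs -rtlzv --delete "cran.r-project.org::CRAN/bin/windows/contrib/$xy/" \
      "$root/bin/windows/contrib/$xy"
  for f in PACKAGES PACKAGES.gz; do
    $rs -rtlzv --delete "cran.r-project.org::CRAN/src/contrib/$f" "$root/src/contrib/"
  done
}
```

For example, RSYNC=echo mirror_cran_win ~/Rmirror/CRAN 2.15 previews the three transfers, and mirror_cran_win ~/Rmirror/CRAN 2.15 runs them.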
library(tools)
write_PACKAGES("~/Rmirror/CRAN/bin/windows/contrib/2.15", type="win.binary")
If we also want the src directory:
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/*.tar.gz ~/Rmirror/CRAN/src/contrib/
rsync -rtlzv --delete cran.r-project.org::CRAN/src/contrib/2.15.3 ~/Rmirror/CRAN/src/contrib/
We can use du -h to check the folder size.
For example (as of 1/7/2013),
$ du -k ~/Rmirror --max-depth=1 --exclude ".*" | sort -nr | cut -f2 | xargs -d '\n' du -sh
30G	/home/brb/Rmirror
28G	/home/brb/Rmirror/Bioc
2.7G	/home/brb/Rmirror/CRAN
To create a Bioconductor repository
rsync -avn bioconductor.org::2.11 > biocdryrun.txt
Then create the directories before running rsync.
cd
mkdir -p ~/Rmirror/Bioc
wget -N http://www.bioconductor.org/biocLite.R -P ~/Rmirror/Bioc
where -N turns on timestamping (the file is downloaded again only if the remote copy is newer or its size differs) and -P sets the output directory, not a file name.
Optionally, we can add the following in order to see the Bioconductor front page.
rsync -zrtlv --delete bioconductor.org::2.11/BiocViews.html ~/Rmirror/Bioc/packages/2.11/
rsync -zrtlv --delete bioconductor.org::2.11/index.html ~/Rmirror/Bioc/packages/2.11/
The software part (the bioc directory):
cd
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv --delete bioconductor.org::2.11/bioc/bin/windows/ ~/Rmirror/Bioc/packages/2.11/bioc/bin/windows
# Either rsync whole src directory or just essential files
# rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/ ~/Rmirror/Bioc/packages/2.11/bioc/src
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/bioc/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/bioc/src/contrib/
# Optionally the html part
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/html
rsync -zrtlv --delete bioconductor.org::2.11/bioc/html/ ~/Rmirror/Bioc/packages/2.11/bioc/html
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/vignettes/ ~/Rmirror/Bioc/packages/2.11/bioc/vignettes
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/news
rsync -zrtlv --delete bioconductor.org::2.11/bioc/news/ ~/Rmirror/Bioc/packages/2.11/bioc/news
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/licenses
rsync -zrtlv --delete bioconductor.org::2.11/bioc/licenses/ ~/Rmirror/Bioc/packages/2.11/bioc/licenses
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/manuals
rsync -zrtlv --delete bioconductor.org::2.11/bioc/manuals/ ~/Rmirror/Bioc/packages/2.11/bioc/manuals
mkdir -p ~/Rmirror/Bioc/packages/2.11/bioc/readmes
rsync -zrtlv --delete bioconductor.org::2.11/bioc/readmes/ ~/Rmirror/Bioc/packages/2.11/bioc/readmes
The annotation part (data/annotation directory):
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/bin/windows/ ~/Rmirror/Bioc/packages/2.11/data/annotation/bin/windows
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/annotation/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/annotation/src/contrib/
The experiment data part (data/experiment directory):
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/data/experiment/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/data/experiment/src/contrib/
The extra part (extra directory):
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
mkdir -p ~/Rmirror/Bioc/packages/2.11/extra/src/contrib
# Note that we are cheating by only downloading PACKAGES and PACKAGES.gz files
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/bin/windows/contrib/2.15/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/bin/windows/contrib/2.15/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
rsync -zrtlv --delete bioconductor.org::2.11/extra/src/contrib/PACKAGES.gz ~/Rmirror/Bioc/packages/2.11/extra/src/contrib/
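Since the experiment and extra parts repeat the same pattern, the index-only transfers can be generated by a loop. This sketch goes a step further than the commands above: it fetches only PACKAGES and PACKAGES.gz for all three parts, including data/annotation, whose Windows binaries the commands above copy in full. The function name sync_bioc_indexes and the BIOC_VER, R_XY and RSYNC variables are hypothetical; RSYNC=echo prints the commands instead of running them.

```shell
# Hypothetical loop over the Bioconductor repository parts: create the
# directories and sync only the PACKAGES / PACKAGES.gz index files.
# BIOC_VER and R_XY default to the versions used in this page (assumptions).
sync_bioc_indexes() {
  dest=$1                      # e.g. ~/Rmirror/Bioc/packages/2.11
  bioc=${BIOC_VER:-2.11}
  xy=${R_XY:-2.15}
  rs=${RSYNC:-rsync}
  for part in data/annotation data/experiment extra; do
    for sub in "bin/windows/contrib/$xy" src/contrib; do
      mkdir -p "$dest/$part/$sub"
      for f in PACKAGES PACKAGES.gz; do
        $rs -zrtlv --delete "bioconductor.org::$bioc/$part/$sub/$f" "$dest/$part/$sub/"
      done
    done
  done
}
```

For example, RSYNC=echo sync_bioc_indexes ~/Rmirror/Bioc/packages/2.11 lists the twelve transfers before anything is downloaded.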
Sync Bioconductor software packages
To keep a copy of the software package source code (bioc/src) only:
$ mkdir -p ~/bioc_release/bioc/
$ rsync -zrtlv --delete master.bioconductor.org::release/bioc/src ~/bioc_release/bioc/
$ du -h ~/bioc_release/bioc/
# 20GB, 1565 items, Bioc 3.7
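Note: -z compresses file data during the transfer, -r recurses into directories, -t preserves modification times, -l copies symbolic links as symbolic links, and -v is verbose. The options -zrtlv can be replaced by the more common -avz; -a is shorthand for -rlptgoD, so it additionally preserves permissions, groups, owners and device/special files.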
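du reports disk usage rather than a package count. As a sketch, the number of packages in a synced tree can be read from its PACKAGES index, which has one Package: field per package. The helper name count_packages is hypothetical; after the sync above, the index would be expected at ~/bioc_release/bioc/src/contrib/PACKAGES (an assumption based on the rsync destination).

```shell
# Hypothetical helper: count the packages listed in a PACKAGES index
# (one "Package:" field per stanza). Accepts PACKAGES or PACKAGES.gz.
count_packages() {
  case $1 in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | grep -c '^Package:'
}
```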
To get old versions of a package (after a Bioconductor version has been released), check the Archive folder.
Now we can create a cron job to do the sync. In my observation, Bioconductor has a daily update around 10:45 AM, so I set the time to 11:00 AM.
echo "00 11 * * * rsync -avz --delete master.bioconductor.org::release/bioc/src ~/bioc_release/bioc/" >> \
  ~/Documents/cronjob  # every day at 11:00 AM
crontab ~/Documents/cronjob
crontab -l
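Note that >> appends the line every time the command is run, and crontab FILE replaces the whole crontab with the contents of FILE, so re-running the setup produces duplicate entries. A minimal sketch of an idempotent alternative, using a hypothetical helper add_line_once:

```shell
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain it as a whole line (-x) of fixed text (-F), so repeated runs
# do not create duplicate cron entries.
add_line_once() {
  line=$1
  file=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, add_line_once "00 11 * * * rsync -avz --delete master.bioconductor.org::release/bioc/src $HOME/bioc_release/bioc/" ~/Documents/cronjob followed by crontab ~/Documents/cronjob can be run any number of times and leaves a single entry.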
To test local repository
su
ln -s /home/brb/Rmirror/CRAN /var/www/html/CRAN
ln -s /home/brb/Rmirror/Bioc /var/www/html/Bioc
ls -l /var/www/html
The soft link mode should be 777.
To test CRAN
Replace the host name arraytools.no-ip.org with the IP address 10.133.2.111 if necessary.
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages("glmnet")
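Before testing from R, it can help to confirm from a shell that the web server actually serves the index files. The sketch below assumes curl is available; the function names repo_index_urls and check_repo are hypothetical. It checks only the plain PACKAGES file in src/contrib and in the Windows binary directory, although R prefers PACKAGES.gz when it exists, as noted above.

```shell
# Hypothetical helper: print the src and Windows-binary PACKAGES URLs
# of a CRAN-style repository for R x.y (a trailing "/" on BASE is dropped).
repo_index_urls() {
  base=${1%/}
  xy=$2
  printf '%s\n' "$base/src/contrib/PACKAGES" "$base/bin/windows/contrib/$xy/PACKAGES"
}

# Hypothetical helper: report whether each index URL answers a HEAD request.
# -f makes HTTP errors such as 404 count as failures; -s silences progress.
check_repo() {
  repo_index_urls "$1" "$2" | while read -r url; do
    if curl -sfI "$url" >/dev/null; then
      echo "OK      $url"
    else
      echo "MISSING $url"
    fi
  done
}
```

For example, check_repo http://arraytools.no-ip.org/CRAN 2.15 should print two OK lines if the symbolic links and permissions are set up correctly.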
We can test whether the backup server is working by installing a package that has been removed from CRAN. For example, 'ForImp' was removed from CRAN on 11/8/2012, but I still have a local copy built on R 2.15.2 (rsync run on 11/6/2012).
r <- getOption("repos"); r["CRAN"] <- "http://cran.r-project.org"
r <- c(r, BRB='http://arraytools.no-ip.org/CRAN')
#                        CRAN                            CRANextra                                BRB
# "http://cran.r-project.org" "http://www.stats.ox.ac.uk/pub/RWin" "http://arraytools.no-ip.org/CRAN"
options(repos=r)
install.packages('ForImp')
Note that by default the CRAN mirror is selected interactively:
> getOption("repos")
                                CRAN                            CRANextra
                            "@CRAN@" "http://www.stats.ox.ac.uk/pub/RWin"
To test Bioconductor
# CRAN part:
r <- getOption("repos"); r["CRAN"] <- "http://arraytools.no-ip.org/CRAN"
options(repos=r)
# Bioconductor part:
options("BioC_mirror" = "http://arraytools.no-ip.org/Bioc")
source("http://bioconductor.org/biocLite.R")
# This source biocLite.R line can be placed either before or after the previous 2 lines
biocLite("aCGH")
If there is a connection problem, check the folder permissions.
chmod -R 755 ~/Rmirror/CRAN/bin
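To find what needs fixing rather than changing everything recursively, one can list entries that other users (such as the web server account) cannot read, or directories they cannot enter. A sketch with a hypothetical helper name, find_unreadable, assuming the web server does not run as the owner of the files:

```shell
# Hypothetical helper: list entries under DIR that are not world-readable,
# plus directories that are not world-executable (cannot be traversed).
find_unreadable() {
  find "$1" \( ! -perm -o=r \) -o \( -type d ! -perm -o=x \)
}
```

find_unreadable ~/Rmirror/CRAN prints nothing once the permissions are open enough.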
- Note that if a binary package was created for R 2.15.1, then it can be installed under R 2.15.1 but not R 2.15.2. The R console will show package xxx is not available (for R version 2.15.2).
- For binary installs, the function also checks for the availability of a source package on the same repository, and reports if the source package has a later version, or is available but no binary version is.
So, for example, if the mirror has no contents under the src directory, we need to run the following line for install.packages() to work:
options(install.packages.check.source = "no")
- If we only mirror the essential directories, biocLite() still runs successfully. However, the R console shows some warnings:
> biocLite("aCGH")
BioC_mirror: http://arraytools.no-ip.org/Bioc
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'aCGH'
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/src/contrib
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
trying URL 'http://arraytools.no-ip.org/Bioc/packages/2.11/bioc/bin/windows/contrib/2.15/aCGH_1.36.0.zip'
Content type 'application/zip' length 2431158 bytes (2.3 Mb)
opened URL
downloaded 2.3 Mb

package ‘aCGH’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\limingc\AppData\Local\Temp\Rtmp8IGGyG\downloaded_packages
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/data/experiment/bin/windows/contrib/2.15
Warning: unable to access index for repository http://arraytools.no-ip.org/Bioc/packages/2.11/extra/bin/windows/contrib/2.15
> library()
Repository directory structure
The information below is specific to R 2.15.2. There are linux and macosx subdirectories wherever there is a windows subdirectory.
bin/windows/contrib/2.15
src/contrib
src/contrib/2.15.2
src/contrib/Archive
web/checks
web/dcmeta
web/packages
web/views
The information below is specific to Bioc 2.11 (R 2.15). There are linux and macosx subdirectories wherever there is a windows subdirectory.
bioc/bin/windows/contrib/2.15
bioc/html
bioc/install
bioc/license
bioc/manuals
bioc/news
bioc/src
bioc/vignettes
data/annotation/bin/windows/contrib/2.15
data/annotation/html
data/annotation/licenses
data/annotation/manuals
data/annotation/src
data/annotation/vignettes
data/experiment/bin/windows/contrib/2.15
data/experiment/html
data/experiment/manuals
data/experiment/src/contrib
data/experiment/vignettes
extra/bin/windows/contrib
extra/html
extra/src
extra/vignettes
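To compare a local mirror against the layout listed above, a sketch like the following reports which expected subdirectories are missing. The helper name missing_dirs is hypothetical; it reads the expected relative paths from standard input, one per line.

```shell
# Hypothetical helper: read expected relative directory paths from stdin
# (one per line, blank lines skipped) and print those missing under ROOT.
missing_dirs() {
  root=$1
  while IFS= read -r d; do
    [ -n "$d" ] || continue
    [ -d "$root/$d" ] || printf '%s\n' "$d"
  done
}
```

For example, printf '%s\n' bioc/src/contrib data/experiment/src/contrib extra/src/contrib | missing_dirs ~/Rmirror/Bioc/packages/2.11 prints whichever of those paths does not exist yet.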