Extract files

From 太極

Painless file extraction on Linux


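# Use the first argument as the archive name, or prompt for one.
if [ $# -eq 0 ]; then
    echo -n "filename> "
    read -r filename
else
    filename=$1
fi

if [ ! -f "$filename" ]; then
    echo "No such file: $filename"
    exit 1
fi

# Choose the extraction command from the file extension.
# The more specific patterns (*.tar.gz, *.tar.bz2) must come before *.gz and *.bz2.
case "$filename" in
    *.tar)      tar xvf "$filename";;
    *.tar.bz2)  tar xvjf "$filename";;
    *.tbz)      tar xvjf "$filename";;
    *.tbz2)     tar xvjf "$filename";;
    *.tgz)      tar xvzf "$filename";;
    *.tar.gz)   tar xvzf "$filename";;
    *.gz)       gunzip -v "$filename";;
    *.bz2)      bunzip2 -v "$filename";;
    *.zip)      unzip "$filename";;      # note: unzip -v only lists the contents
    *.Z)        uncompress -v "$filename";;
    *)          echo "No extract option for $filename";;
esac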

Extract tar.gz or zip to a specified directory

tar xzvf XXXX.tar.gz -C DIRECTORY
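# Quoting a path that begins with ~ fails: the shell does not expand ~ inside
# single or double quotes, so tar looks for a directory literally named "~".
# Leave the path unquoted or use "$HOME/Downloads" instead.
# $ tar xzvf ~/Downloads/inSilicoDb_2.7.0.tar.gz -C "~/Downloads"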
# tar: ~/Downloads: Cannot open: No such file or directory
# tar: Error is not recoverable: exiting now
# $ tar xzvf ~/Downloads/inSilicoDb_2.7.0.tar.gz -C '~/Downloads'
# tar: ~/Downloads: Cannot open: No such file or directory
# tar: Error is not recoverable: exiting now

unzip XXX.zip -d DIRECTORY
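
The quoting errors shown above for tar come from tilde expansion, not from the quotes themselves. The following is a minimal sketch, assuming a scratch directory created with mktemp and made-up file names, showing that a variable such as $HOME still works inside double quotes, even when the target path contains a space:

```shell
# Build a small example archive in a scratch directory (all names are illustrative).
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest dir"
echo "hello" > "$work/src/hello.txt"
tar czf "$work/example.tar.gz" -C "$work/src" hello.txt

# Variables are expanded inside double quotes, so the target may contain spaces.
dest="$work/dest dir"
tar xzvf "$work/example.tar.gz" -C "$dest"

# For the home directory, write "$HOME/Downloads" rather than "~/Downloads".
ls "$dest"
```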

Extract gz file but keep the original gz file

gunzip -c x.txt.gz > x.txt

The -c option makes gunzip write the decompressed data to stdout, so the original .gz file is left untouched.
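
As a quick check of this behaviour, the sketch below compresses a scratch file and restores it with gunzip -c. The directory and file contents are placeholders chosen for illustration.

```shell
work=$(mktemp -d)
printf 'line 1\nline 2\n' > "$work/x.txt"

gzip "$work/x.txt"                            # replaces x.txt with x.txt.gz
gunzip -c "$work/x.txt.gz" > "$work/x.txt"    # decompress to stdout, redirect to a file

ls "$work"                                    # both x.txt and x.txt.gz are present
# With gzip 1.6 or later, "gunzip -k x.txt.gz" also keeps the original.
```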

Extract .xz file

xz -d archive.xz

Extract tar.xz file

The bottom line is that we don't need the 'z' option in the tar command for tar.xz files: 'z' selects gzip only and does not work for xz (the xz counterpart is 'J'). When extracting, GNU tar detects the compression format by itself, so the same command also works for tar.gz files. The 'f' option names the archive file. Recall that tar can both create and extract archives, so there is no default operation and one such as 'x' must always be given.

tar xf archive.tar.xz
tar xf archive.tar.gz
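
A minimal sketch of the difference between creating and extracting, assuming GNU tar and a scratch directory with made-up file names. The xz part runs only if the xz tool is installed.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/out"
echo "data" > "$work/src/a.txt"

# Create: the compression flag is required (z = gzip, J = xz).
tar czf "$work/a.tar.gz" -C "$work/src" a.txt

# Extract: plain "xf" is enough, GNU tar detects the format itself.
tar xf "$work/a.tar.gz" -C "$work/out"

# Same idea for xz, if it is available on this machine.
if command -v xz >/dev/null 2>&1; then
    tar cJf "$work/a.tar.xz" -C "$work/src" a.txt
    tar tf "$work/a.tar.xz"        # list the contents, again without J
fi
```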

Extract tar.bz2 file

tar -xjvf archive.tar.bz2  # compared with the tar.gz command, replace z with j

How To Extract and Decompress a .bz2/.tbz2 File

See this article from cyberciti.biz.

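bzip2 -d your-filename-here.bz2       # decompress; the .bz2 file is removed
# OR
bzip2 -d -v your-filename-here.bz2    # same, with verbose output
# OR
bzip2 -d -k your-filename-here.bz2    # keep the original .bz2 file
# OR
bunzip2 filename.bz2                  # equivalent to bzip2 -d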

10 Basic Encryption Terms Everyone Should Know and Understand


How to Encrypt and Decrypt Files and Directories Using Tar and OpenSSL


How to install and use 7zip file archiver

Compare zip, tar.xz, tar.gz, 7z

Ranked by compression ratio, from best to worst: 7z > tar.xz > tar.gz > zip.

For example, these are the archive sizes of qt-everywhere-opensource-src-5.5.0 from http://download.qt.io/official_releases/qt/5.5/5.5.0/single/

  • zip 540M
  • tar.xz 305M
  • tar.gz 436M
  • 7z 297M
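
To repeat this kind of comparison on your own data, a rough sketch along the following lines may help. It covers only tar.gz and tar.xz (zip and 7z are left out because those tools are often not installed), uses generated sample text as a stand-in for real files, and skips xz if it is missing. Results depend heavily on the input.

```shell
work=$(mktemp -d)
mkdir -p "$work/src"

# Generate repetitive sample data (a placeholder for a real source tree).
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i of some sample text" >> "$work/src/sample.txt"
    i=$((i + 1))
done

# Pack the same directory with gzip and, if available, xz.
tar czf "$work/sample.tar.gz" -C "$work" src
if command -v xz >/dev/null 2>&1; then
    tar cJf "$work/sample.tar.xz" -C "$work" src
fi

# Compare the resulting archive sizes.
ls -l "$work"/sample.tar.*
```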

Extract one file from tar.gz

Extract a file called etc/default/sysstat from config.tar.gz tarball:

$ tar -zxvf config.tar.gz etc/default/sysstat

Note that a new directory etc/default will be created under the current directory if it does not exist.
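
The member name given to tar must match the name stored in the archive, so it is worth listing the archive first. A minimal sketch, assuming a scratch copy of config.tar.gz built from made-up contents:

```shell
work=$(mktemp -d)
mkdir -p "$work/src/etc/default" "$work/out"
echo 'ENABLED="false"' > "$work/src/etc/default/sysstat"
echo "other" > "$work/src/etc/hosts"
tar czf "$work/config.tar.gz" -C "$work/src" etc

# List the member names first; the name passed to tar must match one of them.
tar tzf "$work/config.tar.gz"

# Extract only one member; its parent directories are created as needed.
tar xzf "$work/config.tar.gz" -C "$work/out" etc/default/sysstat
find "$work/out" -type f
```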

Wildcard based extracting

You can also extract those files that match a specific globbing pattern (wildcards). For example, to extract from cbz.tar all files that begin with pic, no matter their directory prefix, you could type:

$ tar -xf cbz.tar --wildcards --no-anchored 'pic*'

To extract all php files, enter:

$ tar -xf cbz.tar --wildcards --no-anchored '*.php'
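
A small self-contained sketch of the same idea, assuming GNU tar and a scratch cbz.tar built from empty placeholder files. The pattern is quoted so that the shell passes it to tar unchanged.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/img" "$work/src/web" "$work/out"
touch "$work/src/img/pic1.jpg" "$work/src/pic2.jpg" \
      "$work/src/web/index.php" "$work/src/notes.txt"
tar cf "$work/cbz.tar" -C "$work/src" img web pic2.jpg notes.txt

# --no-anchored lets the pattern match after any "/" in the member name,
# so both pic2.jpg and img/pic1.jpg are extracted.
tar -xf "$work/cbz.tar" -C "$work/out" --wildcards --no-anchored 'pic*'
find "$work/out" -type f
```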

Remove leading directory components on extraction with tar
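
GNU tar's --strip-components=N option drops the first N leading path components from each member name during extraction. A minimal sketch, assuming an archive whose files all sit under one top-level directory (project-1.0 is a made-up name):

```shell
work=$(mktemp -d)
mkdir -p "$work/src/project-1.0/docs" "$work/out"
echo "readme" > "$work/src/project-1.0/README"
echo "guide" > "$work/src/project-1.0/docs/guide.txt"
tar czf "$work/project.tar.gz" -C "$work/src" project-1.0

# Drop the leading "project-1.0/" so the files land directly in the target.
tar xzf "$work/project.tar.gz" -C "$work/out" --strip-components=1
find "$work/out" -type f
```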

AVFS and Archivemount

If we want to extract certain files from a tarball/archive, it is more efficient to use a virtual filesystem like AVFS. For a large archive, extracting even a single file from the top-level directory is terribly slow if we use the tar command directly.

Before we install the utilities, let's look at the package dependencies of AVFS and Archivemount.

$ apt-cache showpkg archivemount
Package: archivemount
0.8.1-1 (/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_binary-amd64_Packages)
 Description Language: 
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_binary-amd64_Packages
                  MD5: d6302be9f06a91afa32326ab175e2086
 Description Language: en
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_i18n_Translation-en
                  MD5: d6302be9f06a91afa32326ab175e2086

Reverse Depends: 
0.8.1-1 - libarchive13 (0 (null)) libc6 (2 2.4) libfuse2 (2 2.8.1) fuse (2 2.8.5-2) archivemount:i386 (0 (null)) 
0.8.1-1 - 
Reverse Provides: 
$ apt-cache showpkg avfs
Package: avfs
1.0.1-2 (/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_binary-amd64_Packages) (/var/lib/dpkg/status)
 Description Language: 
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_binary-amd64_Packages
                  MD5: bce08fbc36fd7b8e3c454f36f0daf699
 Description Language: en
                 File: /var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_trusty_universe_i18n_Translation-en
                  MD5: bce08fbc36fd7b8e3c454f36f0daf699

Reverse Depends: 
1.0.1-2 - libc6 (2 2.14) libfuse2 (2 2.8.1) fuse (0 (null)) unzip (0 (null)) zip (0 (null)) arj (0 (null)) lha (0 (null))
 zoo (0 (null)) rpm (0 (null)) p7zip (16 (null)) p7zip-full (0 (null)) cdparanoia (0 (null)) 
wget (0 (null)) avfs:i386 (0 (null)) 
1.0.1-2 - 
Reverse Provides:

Install it now.

sudo apt-get install avfs
mountavfs   # mount the AVFS virtual filesystem at ~/.avfs
# Assume MyFile.tar.gz exists in the current directory
ls ~/.avfs/$PWD/MyFile.tar.gz#       
# Alternatively, browse the content in Nautilus, but you need to add a trailing # character by hand to the path 
# (Ctrl-L to access the address bar).
cat ~/.avfs/$PWD/MyFile.tar.gz#/README
# another tarball
ls ~/.avfs/$PWD/MyFile2.tar.gz#       

For some reason, avfs sometimes does not work, possibly because the file is too large. In that case, Ubuntu's Archive Manager still works.

$ time ls ~/.avfs/$PWD/Homo_sapiens_UCSC_hg19.tar.gz#/
ls: cannot access /home/brb/.avfs//home/brb/Downloads/Homo_sapiens_UCSC_hg19.tar.gz#/nown	exact	1	SingleClassTriAllelic,InconsistentAlleles	2	1000GENOMES,SSMP,	2	A,T,	22.000000,2274.000: Input/output error
ls: cannot access /home/brb/.avfs//home/brb/Downloads/Homo_sapiens_UCSC_hg19.tar.gz#/chr12	25482890	rs544684287	G	A	0	.	molType=genomic;class=single
chr12	25482914	rs558575390	T	G	0	.	m: Input/output error

real	25m51.340s
user	0m0.000s
sys	0m0.003s
$ ls ~/.avfs/$PWD/annovar.latest.tar.gz#/

For archivemount, see Cool User File Systems: ArchiveMount

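mkdir -p mntDir                  # the mount point must exist
archivemount files.tgz mntDir    # mount the archive as a directory
ls mntDir                        # browse it like ordinary files
umount mntDir                    # or, without root: fusermount -u mntDir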