Recently needed to extract a small "chunk" from a collection of adjacent MrSid mosaics, each about 4Gb in size. Once again, GDAL came to the rescue, and saved much time and agony wile working with very large, compressed, and proprietary-format files. Two lessons learned:

  1. The GDAL VRT format can save a lot of time and effort by providing access to a collection of files without actually altering the originals.
  2. ArcGIS 9.x does not like BigTIFF files. When file sizes approach or exceed 4Gb, the HFA format is a nice alternative.

 Have patience, subsetting a chunk out of 5 adjacent MrSid files (4Gb each) took about 7 hours. Fun experiment: extract sub-chunks from each of the constituent sid files and distribute across CPU cores.

Example Code

# check projections: OK all are NAD_1983_UTM_Zone_11N
for x in *.sid; do gdalinfo $x | grep "^PROJCS\[" ; done

# BBOX for sub-image
n=4122720
s=4017900
w=322890
e=389430

# make VRT combining all MrSid images
gdalbuildvrt -srcnodata "0" merged.vrt *.sid

# extract sub-region from VRT
# note format: ulx uly lrx lry
# works in QGIS, or other GDAL-based based GIS applications
gdal_translate --config GDAL_CACHEMAX 640 \
-ot Byte -of GTiff -a_nodata 0 \
-projwin 322890 4122720 389430 4017900 \
-co "TFW=YES" -co "COMPRESS=LZW" -co "BIGTIFF=YES" \
merged.vrt seki_merged.tif

# arcmap 9.x hates biftiffs! -- use HFA
# this has to be created from the source dataset, not the .tif above
# seems to work fine in ArcGIS 9.3
gdal_translate --config GDAL_CACHEMAX 640 \
-ot Byte -of HFA -a_nodata 0 \
-projwin 322890 4122720 389430 4017900 \
merged.vrt seki_merged.img