Using Temporary Directories
This post (originally: a long email) was written in response to a friend’s
email about running out of space on the shared server’s /tmp/
directory.
TL;DR
- Create a temporary directory with your prefix (e.g. lobstr) in the system’s TMP directory:
  $ mktemp -d -t lobstr.XXXXXX
  /tmp/lobstr.je4Gax
- Put all of your program’s temporary files inside that directory.
- Delete the directory when the program exits.
Simple Usage (unix shell)
Create a temporary file, in the default TMP directory:
$ mktemp
/tmp/tmp.47i2SIE2lj
The created file is owned by the user, and is set to disallow access from other users:
$ ls -l /tmp/tmp.47i2SIE2lj
-rw------- 1 gordon gordon 0 Nov 18 11:45 tmp.47i2SIE2lj
Create a temporary directory, in the default TMP directory:
$ mktemp -d
/tmp/tmp.6BvgfZqKrk
$ ls -dl /tmp/tmp.6BvgfZqKrk
drwx------ 2 gordon gordon 4096 Nov 18 11:51 /tmp/tmp.6BvgfZqKrk
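The same ownership and permission behaviour can be checked from a script. The following is a minimal Python sketch, assuming a POSIX system; the variable names are only illustrative. It creates one temporary file and one temporary directory with the standard tempfile module and prints their permission bits.

```python
import os
import stat
import tempfile

# mkstemp() returns an open file descriptor and the path of the new file.
fd, path = tempfile.mkstemp(prefix="lobstr.")
os.close(fd)

# mkdtemp() returns the path of the new directory.
dirpath = tempfile.mkdtemp(prefix="lobstr.")

# Keep only the permission bits (e.g. 0o600) from the full st_mode value.
file_mode = stat.S_IMODE(os.stat(path).st_mode)
dir_mode = stat.S_IMODE(os.stat(dirpath).st_mode)

print("file:", path, oct(file_mode))
print("dir: ", dirpath, oct(dir_mode))

# Remove the two items created above.
os.remove(path)
os.rmdir(dirpath)
```

On a POSIX system the file should report 0o600 and the directory 0o700, matching the ls output above.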
Create a temporary file, with your program’s prefix (e.g. lobstr):
$ mktemp lobstr.XXXXXX
lobstr.sdaTH9
Use at least six X characters; they will be replaced by random characters. (GNU mktemp accepts as few as three, but six is the safer choice for portability.)
Notice that the file was created in the current directory (not in /tmp).
You might be tempted to specify a full path with your prefix:
$ mktemp /tmp/lobstr.XXXXXX ### BAD EXAMPLE - DO NOT USE
/tmp/lobstr.jLB4eX
But that is wrong. It assumes that /tmp is always the TMP directory, which is not the case. Instead, use the following:
# Create a temporary file:
$ mktemp -t lobstr.XXXXXX #### Good Example
/tmp/lobstr.k8h6g0
The result looks the same, but using -t tells mktemp to use the system’s temporary directory (more examples below).
To create a temporary directory with your prefix in the system’s TMP directory, combine -d and -t. For portability on Mac OS X, the -d must come first:
# Create a temporary directory:
$ mktemp -d -t lobstr.XXXXXX
/tmp/lobstr.eiDFtC
Why not hard-code /tmp/ for temporary files
On most Unix systems there is a directory named /tmp, which is meant to hold temporary files, and for everyday use it works fine.
But there are many cases where you (or the system administrator) want to store temporary files elsewhere.
Examples:
- On our server club.wi.mit.edu, /tmp/ is on the root disk, which is very small (less than 1.1GB free space left).
- On cluster computer systems (e.g. TAK), and other systems using SGE/LSF/Torque/etc., when you submit a new job and it starts running on a node, each user gets his/her own temporary directory, and it is not /tmp.
- On Windows (which we don’t care about, but still…), the temporary directory is obviously not /tmp/.
- For faster performance, it is sometimes better to use a temporary directory on a specific disk, or even completely in memory.
- On Amazon EC2, it is sometimes better to use the local-instance storage disk as the temporary directory, which has lots of storage and relatively fast I/O.
- On some systems, /tmp is an in-memory disk, which is very fast but tends to be small. Using a different (disk-based) temporary directory can help store large temporary files.
- Hard-coded paths are evil.
Changing the temporary directory
When properly using mktemp (and the appropriate modules in Perl/Python/C, see below), the location of the TMP directory can be set using the TMPDIR environment variable:
# Set the location of the TMP directory
# (e.g. in "~/.bashrc")
$ export TMPDIR=/data/gordon/tmp
# Create a temporary file in the new temp directory
$ mktemp -t lobstr.XXXXXX
/data/gordon/tmp/lobstr.vzmfDd
By setting TMPDIR in your .bashrc file, or manually before running a command, you can change the location of the temporary files to a directory with more storage space or faster disks.
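To see the same effect from inside a program, here is a hedged Python sketch. It assumes the custom directory already exists, so it creates a stand-in for something like /data/gordon/tmp first. Normally TMPDIR is set before the program starts; because this sketch changes it inside a running process, it also resets the cached value in the tempfile module.

```python
import os
import tempfile

# Stand-in for a directory such as /data/gordon/tmp (it must already exist).
custom = tempfile.mkdtemp(prefix="custom-tmp.")

old = os.environ.get("TMPDIR")
os.environ["TMPDIR"] = custom
# tempfile caches its choice; clearing the cache makes it re-read TMPDIR.
tempfile.tempdir = None

workdir = tempfile.mkdtemp(prefix="lobstr.")
print("TMPDIR  =", custom)
print("workdir =", workdir)

# Clean up and restore the previous environment.
os.rmdir(workdir)
os.rmdir(custom)
if old is None:
    del os.environ["TMPDIR"]
else:
    os.environ["TMPDIR"] = old
tempfile.tempdir = None
```

The new lobstr.* directory is created inside the custom directory, just as mktemp -t does in the shell example above.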
Shell examples
Create one temporary directory; the file names inside it can then be fixed. There will be no collisions if the same script is run multiple times in parallel, as each temp directory is unique.
#!/bin/sh
DIR=$(mktemp -d -t myproject.XXXXXX) || exit 1
echo "tmpdir = $DIR"
echo "filename = $DIR/step1.txt"
# Do one thing
my-program > "$DIR/step1.txt" || exit 1
# Do another thing
sort "$DIR/step1.txt" > "$DIR/step2.txt" || exit 1
# When the script is completed, delete the temp directory
# (or keep it for debugging/troubleshooting)
rm -r "$DIR"
Using the script:
# default tmp directory
$ sh example.sh
tmpdir = /tmp/myproject.L3BxdI
filename = /tmp/myproject.L3BxdI/step1.txt
# Custom tmp directory
$ export TMPDIR=/data/gordon/tmp/
$ sh example.sh
tmpdir = /data/gordon/tmp/myproject.K4Pijh
filename = /data/gordon/tmp/myproject.K4Pijh/step1.txt
Python
Use the tempfile module to create a temporary directory:
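import os
import tempfile
# Create a temporary directory (honours TMPDIR)
tmpdir = tempfile.mkdtemp(prefix='lobstr.')
# Build the name of a file inside that directory
filename = os.path.join(tmpdir, 'step1.txt')
# print() with a single string argument works in both Python 2 and 3
print("tmpdir = " + tmpdir)
print("filename = " + filename)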
Using the script:
## default tmp directory
$ python example.py
tmpdir = /tmp/lobstr.yuHqQV
filename = /tmp/lobstr.yuHqQV/step1.txt
## Custom tmp directory
$ export TMPDIR=/data/gordon/tmp/
$ python example.py
tmpdir = /data/gordon/tmp/lobstr.4MzWUG
filename = /data/gordon/tmp/lobstr.4MzWUG/step1.txt
Perl
Use the File::Temp module to create temporary directories:
use strict;
use warnings;
use File::Temp qw/tempdir/;
use File::Spec::Functions;
# Create a temporary directory (TMPDIR => 1 places it in the system's temp directory)
my $tmpdir = tempdir('lobstr.XXXXXX', TMPDIR => 1);
# A file in the above temp directory
my $filename = catfile($tmpdir, 'step1.txt');
print "tmpdir = $tmpdir\n";
print "filename = $filename\n";
Using the script:
## default tmp directory
$ perl example.pl
tmpdir = /tmp/lobstr.Ziqv9r
filename = /tmp/lobstr.Ziqv9r/step1.txt
## custom tmp directory
$ export TMPDIR=/data/gordon/tmp/
$ perl example.pl
tmpdir = /data/gordon/tmp/lobstr.teWY_P
filename = /data/gordon/tmp/lobstr.teWY_P/step1.txt
NOTE about Python/Perl with invalid TMPDIR
Python and Perl try to ‘help’ developers by hiding technical details from them. If the directory in TMPDIR is not accessible, they will try other directories without complaining or notifying the user (whereas sane programs will fail with an error message).
Examples:
mktemp fails if the directory doesn’t exist (which is a Good Thing):
$ TMPDIR=/foo/bar mktemp -t
mktemp: failed to create file via template ‘/foo/bar/tmp.XXXXXXXXXX’: No such file or directory
Python will ignore the failure and try other directories:
$ TMPDIR=/foo/bar python example.py
tmpdir = /tmp/lobstr.zNlf6K
filename = /tmp/lobstr.zNlf6K/step1.txt
Perl will ignore the failure too:
$ TMPDIR=/foo/bar perl example.pl
tmpdir = /tmp/lobstr.bwccSd
So always ensure your temp-directory is accessible.
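One way to get mktemp-like strictness in Python is to check TMPDIR yourself and pass it explicitly. The sketch below is only an illustration. The helper name strict_mkdtemp and the choice of RuntimeError are assumptions of this example, not part of the tempfile module.

```python
import os
import tempfile


def strict_mkdtemp(prefix):
    """Create a temporary directory, failing loudly if TMPDIR is unusable.

    Hypothetical helper: passing dir= explicitly stops tempfile from
    quietly falling back to another location.
    """
    tmpdir = os.environ.get("TMPDIR")
    if tmpdir is None:
        # TMPDIR is not set: let tempfile pick the system default.
        return tempfile.mkdtemp(prefix=prefix)
    if not os.path.isdir(tmpdir) or not os.access(tmpdir, os.W_OK | os.X_OK):
        raise RuntimeError("TMPDIR is not a writable directory: %s" % tmpdir)
    return tempfile.mkdtemp(prefix=prefix, dir=tmpdir)


if __name__ == "__main__":
    workdir = strict_mkdtemp("lobstr.")
    print("tmpdir = " + workdir)
    os.rmdir(workdir)
```

Run with TMPDIR=/foo/bar, this version stops with an error instead of silently writing to /tmp.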
C
To the best of my knowledge, there’s no standard (POSIX) way to automatically use TMPDIR. The following code should work on Unix systems:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>

int main(void)
{
    const char *dirtemplate = "lobstr.XXXXXX";
    const char *tmpdir = NULL;

    /* Get TMPDIR env variable or fall back to /tmp/ */
    tmpdir = getenv("TMPDIR");
    if (tmpdir == NULL)
        tmpdir = "/tmp";

    /* Construct a template using 'tmpdir' ("<tmpdir>/<dirtemplate>" + NUL) */
    size_t len = strlen(tmpdir) + 1
                 + strlen(dirtemplate) + 1;
    char *template = calloc(len, 1);
    if (template == NULL)
        err(1, "calloc(%zu,1) failed", len);
    strcpy(template, tmpdir);
    strcat(template, "/");
    strcat(template, dirtemplate);

    /* Create temporary directory (mkdtemp replaces the XXXXXX in place) */
    if (mkdtemp(template) == NULL)
        err(1, "mkdtemp(%s) failed", template);
    printf("tmpdir = %s\n", template);

    /* Construct a filename inside the temporary directory */
    const char *filetemplate = "step1.txt";
    len = strlen(template) + 1 + strlen(filetemplate) + 1;
    char *filename = calloc(len, 1);
    if (filename == NULL)
        err(1, "calloc(%zu,1) failed", len);
    strcpy(filename, template);
    strcat(filename, "/");
    strcat(filename, filetemplate);
    printf("filename = %s\n", filename);

    /* Release the buffers before exiting */
    free(filename);
    free(template);
    return 0;
}
Using the code:
## Compile the code
$ gcc -Wall -Wextra -o ex1 example.c
## Use with default directory
$ ./ex1
tmpdir = /tmp/lobstr.kFLKey
filename = /tmp/lobstr.kFLKey/step1.txt
## Use with custom directory
$ TMPDIR=/data/gordon/tmp ./ex1
tmpdir = /data/gordon/tmp/lobstr.KBAewp
filename = /data/gordon/tmp/lobstr.KBAewp/step1.txt
## Use with invalid directory
$ TMPDIR=/foo/bar/ ./ex1
ex1: mkdtemp(/foo/bar//lobstr.Fhrenm) failed: No such file or directory
Automatically deleting temporary directories
Using the shell’s trap command you can automatically delete the temporary directory, even if the shell script terminates with an error or is interrupted with CTRL+C:
#!/bin/sh
# Create temporary directory
DIR=$(mktemp -d -t test.XXXXXX) || exit 1
# When the script terminates, run the 'rm' command
trap "rm -rf '$DIR'" EXIT
# run your programs..
my-program 1
my-other-program --foo --bar
# When the script ends (even with CTRL+C or an error),
# the 'trap' command will be executed.
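NOTE: When debugging, you’ll probably want to comment out the trap command, to keep the temporary files available. Also note that running the EXIT trap after CTRL+C is shell-dependent: bash does it, but some /bin/sh implementations (such as dash) may need an explicit trap on INT and TERM as well.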
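For Python scripts, a similar effect can be had with tempfile.TemporaryDirectory (Python 3). This is a minimal sketch, with illustrative file names. The directory is removed when the with block ends, whether normally or through an exception such as the KeyboardInterrupt raised by CTRL+C. It cannot help if the process is killed outright (e.g. with SIGKILL).

```python
import os
import tempfile

# The directory and everything in it is removed when the "with" block ends.
with tempfile.TemporaryDirectory(prefix="lobstr.") as tmpdir:
    filename = os.path.join(tmpdir, "step1.txt")
    with open(filename, "w") as f:
        f.write("intermediate results\n")
    existed_inside = os.path.exists(filename)

print("tmpdir =", tmpdir)
print("existed inside the block:", existed_inside)
print("exists after the block:", os.path.exists(tmpdir))
```

As with the shell version, replace the with block by a plain tempfile.mkdtemp() call while debugging if you want to keep the files around.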
BAD THINGS to avoid
- Do not create multiple temporary files directly in /tmp/ (or whichever temporary directory is in use). Instead, create one temporary directory, and create files inside it.
- Do not hard-code “/tmp/” in temporary file names you create.
- Do not assume “/tmp” is the temporary directory.
Security Considerations
There are several intricate security considerations when using temporary files for critical processing (e.g. system files owned by root, or security-related operations). Those are not covered here.
Bonus round - sorting in memory
GNU Sort can sort very big files (much larger than available memory) by using temporary files. But sorting completely in-memory without temporary files is much faster (if you have enough RAM).
By pointing sort to an invalid temporary directory, you can ensure sort does not use the disk (but you’ll have to give it enough RAM to use).
Examples:
Default ‘sort’ - whether it uses temporary disk depends on the size of the input file, and the amount of RAM in the system:
$ sort INPUT > OUTPUT
When sorting from a pipe, sort can’t tell how much RAM to use (because the size of the input is unknown):
$ program | sort > OUTPUT
Using -S you can tell sort how much RAM to use:
# Use up to 10GB of RAM:
$ sort -S 10G INPUT > OUTPUT
By using an invalid temporary directory, you can force sort to use RAM (if the data fits) or simply fail:
$ sort -S 10G -T /no/such/dir INPUT > OUTPUT
Example: try to sort 76MB of data using only 10MB of RAM:
$ seq 10000000 | wc -c |numfmt --to=iec
76M
$ seq 10000000 | sort -S 10M -T /no/such/dir > /dev/null
sort: cannot create temporary file in ‘/no/such/dir’: No such file or directory
This indicates that you’ll need more RAM to sort the data in memory without temporary files.
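The same trick can be driven from a script. The following Python sketch (3.7 or later) assumes a sort command is on the PATH that accepts -S and -T as shown above, and that /no/such/dir really does not exist. The helper name sort_in_memory is made up for this example.

```python
import os
import subprocess


def sort_in_memory(text, memory="10M"):
    """Sort lines with the external sort command, without spilling to disk.

    Hypothetical helper. Raises subprocess.CalledProcessError if sort
    fails, e.g. because the data does not fit in the given buffer.
    """
    # LC_ALL=C gives plain byte-wise ordering, so results are reproducible.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["sort", "-S", memory, "-T", "/no/such/dir"],
        input=text,
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return result.stdout


print(sort_in_memory("pear\napple\nfig\n"), end="")
```

With input this small the data fits in the buffer and the sorted lines are printed. With input larger than the buffer, sort exits with an error and the helper raises instead of quietly using the disk.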