Best practices for data comparison and migration

This document describes our recommendations for managing data sets that are to be transferred to, or compared with, the contents of other locations within the JSC systems.

General advice

A small number of big files is better than a large number of small files

A few large files are easier to move than many small files. Tools such as rsync, cp, and scp all perform additional per-file work for every file they copy. Packing the data into a single tar file (even without compression) can significantly reduce transfer time.
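
A minimal sketch of that workflow (all paths and the host name are placeholders):

$ tar -cf dataset.tar dataset/            # pack the directory tree into one file
$ scp dataset.tar user@destination:/path/to/target/
# on the destination system:
$ cd /path/to/target && tar -xf dataset.tar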

Use data compression

  • pro: data reduction (lower quota usage and less data to be transferred)

  • con: CPU- and time-consuming

How much compression helps when moving data depends on the achievable data reduction, which in turn depends heavily on the structure of the data itself. Generally, the trade-off is network speed versus the time needed to compress. Within the same network, compression typically brings little speedup compared to simply moving fewer, larger files.

For example (packing 20 files totalling 100 GB):

# EXAMPLE 1: No Compression
$ time tar -cf random-only.tar random*.data
real    1m31.335s
user    0m1.890s
sys     1m22.597s

# EXAMPLE 2: gzip single thread (estimated time: 70+ min)
$ time tar -czvf random-gzip.tar.gz random*.data
random01.data
random02.data
...

# EXAMPLE 3: pigz using 20 processes
$ time tar --use-compress-program="pigz -k -p 20" -cf random-pigz-20.tar.gz random*.data
real    3m22.630s
user    60m24.506s
sys     3m33.297s
    

It remains to be tested whether EXAMPLE 3 compresses well enough to bring any speedup in the subsequent transfer. Long compression times should not be tolerated (hence only the estimated time for EXAMPLE 2): the longer a run takes, the harder it becomes to assess whether the data was packed correctly or a failure occurred. It is always advisable to run a few tests on data samples before compressing large data sets, and to check for parallel compression tools. When in doubt, focus on reducing the number of files as described above.

Note that rsync has the option -z or --compress to compress data during transmission. However, it does NOT reduce the number of transferred files.
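
For instance, a compressed transfer to a remote host could look like this (host and paths are placeholders):

$ rsync -az --info=progress2 results/ user@remote.example.org:/data/results/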

One last thought: compression might not help with data transfer, but it is quite useful for saving allocated space and thereby quota. If data will not be used for a while, or is to be archived, it is sensible to consider compressing it.

Data deduplication: Avoid transferring same data multiple times

Make sure that data is not already available on the destination. PIs should check what data is used by multiple project users and create shareable directories for all to access. This avoids transferring the data multiple times for each user and reduces the allocated quota.
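
A minimal sketch of setting up such a shared directory (the path and the group name are placeholders and depend on the project):

$ mkdir -p /p/project/myproject/shared/dataset
$ chgrp -R myproject /p/project/myproject/shared/dataset
$ chmod -R g+rX /p/project/myproject/shared/dataset   # group members may read and traverse
$ chmod g+s /p/project/myproject/shared/dataset       # new entries inherit the project group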

Checksum whenever possible

Consistently creating and saving checksums for each file can be very beneficial. The most common use case is checking data integrity, for example after a transfer. Another use case is avoiding transferring data multiple times: if the checksums still match, the data has not changed and does not need to be copied again.
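
A minimal sketch of creating checksums before a transfer and verifying them afterwards (paths are placeholders; the checksum file is assumed to be transferred along with the data):

# On the source system: one checksum per file, stored next to the data set
$ cd /path/to/dataset
$ find . -type f -print0 | xargs -0 sha256sum > ../dataset.sha256

# On the destination, after the transfer: verify every file against the list
$ cd /path/to/destination/dataset
$ sha256sum -c ../dataset.sha256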

Recommended data compare and transfer tools

rsync is preferred by many communities in most cases. The main reason is that it does not re-transfer data that has already been transferred before, and it allows resuming an interrupted transfer. The downside is its lack of parallelism, which can be mitigated by combining rsync with parallel or xargs (see the sketch below). On the HPC systems the mpifileutils tools are recommended, especially when transferring large data sets (they are discussed in the next sections).
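
As a rough sketch of the xargs approach (paths and the degree of parallelism are placeholders; files sitting directly in the top-level source directory would still need a separate rsync):

$ SRC=/path/to/src; DEST=/path/to/dest
$ mkdir -p "$DEST"
# Start one rsync per top-level subdirectory, at most 8 at a time
$ find "$SRC" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
      | xargs -n 1 -P 8 -I{} rsync -a "$SRC/{}/" "$DEST/{}/"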

rsync

rsync is the most commonly available utility for transferring and synchronizing files across single and multiple systems.

$ rsync --help
Usage: rsync [OPTION]... SRC [SRC]... DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST
  or   rsync [OPTION]... SRC [SRC]... rsync://[USER@]HOST[:PORT]/DEST
...

Here are some options to be aware of:

  • --info=FLAGS: fine-grained informational verbosity. Use --info=progress2 to see overall progress, or alternatively --progress.

  • -c, --checksum: skip based on checksum, not mod-time & size. This performs an actual comparison of the data, but costs more time.

  • -a, --archive: archive mode; equals -rlptgoD (no -H, -A, -X). Preserves file metadata and is recommended for most large transfers.

  • -u, --update: skip files that are newer on the receiver. Useful when merging the state of two changed directories.

  • -X, --xattrs: preserve extended attributes. Good to add to keep attributes.

  • --delete: delete extraneous files from destination dirs. Useful when the destination contains clutter. WARNING: this will delete files from the destination.

  • -z, --compress: compress file data during the transfer. Useful when the compression rate is high and the transfer rate slow (see general advice above).

  • --exclude=PATTERN: exclude files matching PATTERN. The more is excluded, the faster the transfer.

Note that a trailing / on the source means "the contents of" the directory, for example:

$ ls src
        test1  test2
$ rsync -a src/ dest
$ ls dest
        test1  test2

Meanwhile, without the trailing slash the directory itself is copied:

$ rsync -a src dest/
$ ls dest
        src

mpifileutils

MPI-based tools for basic tasks such as copying, removing, and comparing large data sets. See the mpifileutils documentation.

Utilities

  • dcp - Copy files.

  • drm - Remove files.

  • dchmod - Change owner, group, and permissions on files.

  • dcmp - Compare contents between directories or files.

  • dsync - Synchronize source and destination directories or files.

  • ddup - Find duplicate files.

The full list of utilities can be found here

The documentation offers the following overall tips (Adjusted for our use-case):

  • Run the tools within a job allocation. The sweet spot for most tools is about 2-4 nodes. It is possible to use more nodes, but we recommend experimenting with the performance. The tools report a healthy amount of information on their progress.

  • Use most of the CPU cores on each node by adjusting --ntasks-per-node, though leaving a few cores idle on each node for the file system client processes is recommended.

  • Most tools do not checkpoint their progress. Be sure to request sufficient time in your allocation to allow the job to complete. It could be necessary to start over from the beginning if a tool is interrupted.

  • It is not possible to pipe output of one tool to the input of another. However, the --input and --output file options are good approximations. You can use batch jobs for this purpose.

  • It is not easy to check the return codes of tools. Instead, inspect stdout and stderr output for errors.

Note that mpifileutils uses OpenMPI, so the corresponding modules need to be loaded first (see the sketch below).
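
A minimal sketch of an interactive run that loads the required modules and copies a directory (module names and job parameters are assumptions; check the modules available on your system, e.g. with module spider mpifileutils):

$ module load GCC OpenMPI mpifileutils
$ srun --nodes=2 --ntasks-per-node=64 --time=01:00:00 \
       dcp --preserve --progress 10 dummy_files $FASTDATA_project/user1/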

dcp

Parallel MPI application to recursively copy files and directories.

$ srun --time=00:01:00 --nodes=1 dcp --help
Usage: dcp [options] source target
       dcp [options] source ... target_dir

Options:
  -b, --bufsize <SIZE>     - IO buffer size in bytes (default 4MB)
  -k, --chunksize <SIZE>   - work size per task in bytes (default 4MB)
  -X, --xattrs <OPT>       - copy xattrs (none, all, non-lustre, libattr)
  -i, --input <file>       - read source list from file
  -L, --dereference        - copy original files instead of links
  -P, --no-dereference     - don't follow links in source
  -p, --preserve           - preserve permissions, ownership, timestamps (see also --xattrs)
  -s, --direct             - open files with O_DIRECT
  -S, --sparse             - create sparse files when possible
      --progress <N>       - print progress every N seconds
  -v, --verbose            - verbose output
  -q, --quiet              - quiet output
  -h, --help               - print usage
For more information see https://mpifileutils.readthedocs.io.

Here is an example of copying 100 GB spread over 10000 files in 10746 directories (nested 3 levels deep).

# Using regular cp
$ time srun --nodes=1 --ntasks=1 cp -r dummy_files/ $FASTDATA_project/user1/

real    7m47.048s
user    0m0.015s
sys     0m0.054s

# Using rsync
$ time srun --nodes=1 --ntasks=1 rsync -a dummy_files $FASTDATA_project/user1/

real    11m55.149s
user    0m0.012s
sys     0m0.085s

# Using dcp from mpifileutils
$ time srun --nodes=1 --ntasks=120 dcp --preserve --progress 1 dummy_files $FASTDATA_project/user1/
[2024-02-02T16:39:00] Preserving file attributes.
[2024-02-02T16:39:00] Walking /p/largedata/project/user1/dummy_files
[2024-02-02T16:39:00] Walked 20746 items in 0.379 secs (54743.090 items/sec) ...
[2024-02-02T16:39:00] Walked 20746 items in 0.381 seconds (54487.636 items/sec)
[2024-02-02T16:39:00] Copying to /p/fastdata/project/user1
[2024-02-02T16:39:00] Items: 20746
[2024-02-02T16:39:00]   Directories: 10746
[2024-02-02T16:39:00]   Files: 10000
[2024-02-02T16:39:00]   Links: 0
[2024-02-02T16:39:00] Data: 100.000 GiB (10.240 MiB per file)
[2024-02-02T16:39:00] Creating 10746 directories
[2024-02-02T16:39:17] Created 1982 directories (18%) in 17.243 secs (114.946 dirs/sec) 76 secs left ...
[2024-02-02T16:39:23] Created 6996 directories (65%) in 22.513 secs (310.758 dirs/sec) 12 secs left ...
[2024-02-02T16:39:23] Created 10746 directories (100%) in 22.513 secs (477.321 dirs/sec) done
[2024-02-02T16:39:23] Creating 10000 files.
[2024-02-02T16:39:27] Created 1715 items (17%) in 3.924 secs (437.052 items/sec) 19 secs left ...
[2024-02-02T16:39:29] Created 5298 items (53%) in 5.831 secs (908.574 items/sec) 5 secs left ...
[2024-02-02T16:39:30] Created 8259 items (83%) in 7.349 secs (1123.790 items/sec) 2 secs left ...
[2024-02-02T16:39:30] Created 10000 items (100%) in 7.349 secs (1360.641 items/sec) done
[2024-02-02T16:39:30] Copying data.
[2024-02-02T16:39:40] Copied 8.556 GiB (9%) in 10.392 secs (843.066 MiB/s) 111 secs left ...
[2024-02-02T16:39:45] Copied 43.881 GiB (44%) in 14.653 secs (2.995 GiB/s) 19 secs left ...
[2024-02-02T16:39:52] Copied 56.356 GiB (56%) in 22.380 secs (2.518 GiB/s) 17 secs left ...
[2024-02-02T16:39:58] Copied 82.182 GiB (82%) in 27.907 secs (2.945 GiB/s) 6 secs left ...
[2024-02-02T16:39:58] Copied 100.000 GiB (100%) in 27.907 secs (3.583 GiB/s) done
[2024-02-02T16:39:58] Copy data: 100.000 GiB (107374182400 bytes)
[2024-02-02T16:39:58] Copy rate: 3.583 GiB/s (107374182400 bytes in 27.908 seconds)
[2024-02-02T16:39:58] Syncing data to disk.
[2024-02-02T16:39:58] Sync completed in 0.017 seconds.
[2024-02-02T16:39:58] Setting ownership, permissions, and timestamps.
[2024-02-02T16:39:59] Updated 20746 items in 0.995 seconds (20852.120 items/sec)
[2024-02-02T16:39:59] Syncing directory updates to disk.
[2024-02-02T16:39:59] Sync completed in 0.016 seconds.
[2024-02-02T16:39:59] Started: Feb-02-2024,16:39:00
[2024-02-02T16:39:59] Completed: Feb-02-2024,16:39:59
[2024-02-02T16:39:59] Seconds: 58.803
[2024-02-02T16:39:59] Items: 20746
[2024-02-02T16:39:59]   Directories: 10746
[2024-02-02T16:39:59]   Files: 10000
[2024-02-02T16:39:59]   Links: 0
[2024-02-02T16:39:59] Data: 100.000 GiB (107374182400 bytes)
[2024-02-02T16:39:59] Rate: 1.701 GiB/s (107374182400 bytes in 58.803 seconds)

real    1m13.381s
user    0m0.013s
sys     0m0.080s

dcmp

Parallel MPI application to compare two files or to recursively compare files with the same relative paths within two different directories.

$ srun --nodes=1 --ntasks=1 dcmp --help

Usage: dcmp [options] source target

Options:
  -o, --output <EXPR:FILE>  - write list of entries matching EXPR to FILE
  -t, --text                - change output option to write in text format
  -b, --base                - enable base checks and normal output with --output
      --bufsize <SIZE>      - IO buffer size in bytes (default 4MB)
      --chunksize <SIZE>    - minimum work size per task in bytes (default 4MB)
  -s, --direct              - open files with O_DIRECT
      --progress <N>        - print progress every N seconds
  -v, --verbose             - verbose output
  -q, --quiet               - quiet output
  -l, --lite                - only compares file modification time and size
  -h, --help                - print usage

EXPR consists of one or more FIELD=STATE conditions, separated with '@' for AND or ',' for OR.
AND operators bind with higher precedence than OR.

Fields: EXIST,TYPE,SIZE,UID,GID,ATIME,MTIME,CTIME,PERM,ACL,CONTENT
States: DIFFER,COMMON
Additional States for EXIST: ONLY_SRC,ONLY_DEST

Example expressions:
- Entry exists in both source and target and type differs between the two
  EXIST=COMMON@TYPE=DIFFER

- Entry exists only in source, or types differ, or permissions differ, or mtimes differ
  EXIST=ONLY_SRC,TYPE=DIFFER,PERM=DIFFER,MTIME=DIFFER

By default, dcmp checks the following expressions and prints results to stdout:
  EXIST=COMMON
  EXIST=DIFFER
  EXIST=COMMON@TYPE=COMMON
  EXIST=COMMON@TYPE=DIFFER
  EXIST=COMMON@CONTENT=COMMON
  EXIST=COMMON@CONTENT=DIFFER
For more information see https://mpifileutils.readthedocs.io.

How to save the comparison results is best described in the documentation, but it is good to go through an example here.

Below is an example using dcmp to compare the source dummy_files/ with the destination $FASTDATA_project/user1/dummy_files/, where some files have been removed and a few edited. The example captures each of those changes in a separate output file.

To output the results we use the -o or --output option. It takes, first, the field to compare (e.g. CONTENT or EXIST); second, separated by an equals sign, the state under which a result is reported (DIFFER or COMMON for CONTENT, ONLY_SRC or ONLY_DEST for EXIST); and finally, separated by a colon, the file to log the output to.

This results in the form -o FIELD=STATE:path/to/file.

Note that in this example we have 4 conditions, each logged to its own file:

  • one when content is different -o CONTENT=DIFFER:dcmp_differ_file.out,

  • one when content is the same -o CONTENT=COMMON:dcmp_common_file.out,

  • one when the file exists in src only -o EXIST=ONLY_SRC:dcmp_onlysrc_file.out,

  • and finally one when the file exists in dest only -o EXIST=ONLY_DEST:dcmp_onlydest_file.out.

The last one should have no results as we have added no files to dest.

To combine results, we can use AND (@) and OR (,). For example, to log all the above conditions in one file, we can use -o CONTENT=DIFFER,CONTENT=COMMON,EXIST=ONLY_SRC,EXIST=ONLY_DEST:dcmp_allconditions.out.

Finally, we use the option --text to write the output files in text format. The default is a binary format that can be fed via the -i/--input option to other tools such as dcp or drm (a sketch follows the example below).

$ time srun --nodes=1 --ntasks=120 dcmp \
        -o CONTENT=DIFFER:dcmp_differ_file.out \
        -o CONTENT=COMMON:dcmp_common_file.out \
        -o EXIST=ONLY_SRC:dcmp_onlysrc_file.out \
        -o EXIST=ONLY_DEST:dcmp_onlydest_file.out \
        --text --progress 1 dummy_files/ $FASTDATA_project/user1/dummy_files/
[2024-02-02T17:15:10] Walking source path
[2024-02-02T17:15:10] Walking /p/largedata/project/user1/dummy_files
[2024-02-02T17:15:11] Walked 20746 items in 0.408 secs (50792.286 items/sec) ...
[2024-02-02T17:15:11] Walked 20746 items in 0.410 seconds (50581.193 items/sec)
[2024-02-02T17:15:11] Walking destination path
[2024-02-02T17:15:11] Walking /p/fastdata/project/user1/dummy_files
[2024-02-02T17:15:11] Walked 9403 items in 0.166 secs (56753.740 items/sec) ...
[2024-02-02T17:15:11] Walked 9403 items in 0.166 seconds (56683.154 items/sec)
[2024-02-02T17:15:11] Comparing items
[2024-02-02T17:15:11] Comparing file contents
[2024-02-02T17:15:13] Compared 6.755 GiB (7%) in 2.505 secs (2.696 GiB/s) 33 secs left ...
[2024-02-02T17:15:16] Compared 14.923 GiB (16%) in 4.799 secs (3.110 GiB/s) 26 secs left ...
[2024-02-02T17:15:18] Compared 24.874 GiB (26%) in 7.115 secs (3.496 GiB/s) 20 secs left ...
[2024-02-02T17:15:21] Compared 35.748 GiB (38%) in 10.065 secs (3.552 GiB/s) 17 secs left ...
[2024-02-02T17:15:24] Compared 50.331 GiB (53%) in 13.318 secs (3.779 GiB/s) 12 secs left ...
[2024-02-02T17:15:26] Compared 63.272 GiB (67%) in 15.423 secs (4.102 GiB/s) 8 secs left ...
[2024-02-02T17:15:28] Compared 73.847 GiB (78%) in 17.373 secs (4.251 GiB/s) 5 secs left ...
[2024-02-02T17:15:30] Compared 82.258 GiB (87%) in 18.774 secs (4.382 GiB/s) 3 secs left ...
[2024-02-02T17:15:31] Compared 88.299 GiB (93%) in 19.875 secs (4.443 GiB/s) 1 secs left ...
[2024-02-02T17:15:33] Compared 92.976 GiB (98%) in 21.838 secs (4.257 GiB/s) 0 secs left ...
[2024-02-02T17:15:33] Compared 94.824 GiB (100%) in 21.839 secs (4.342 GiB/s) done
[2024-02-02T17:15:33] Started   : Feb-02-2024, 17:15:11
[2024-02-02T17:15:33] Completed : Feb-02-2024, 17:15:33
[2024-02-02T17:15:33] Seconds   : 21.846
[2024-02-02T17:15:33] Items     : 20746
[2024-02-02T17:15:33] Item Rate : 20746 items in 21.845926 seconds (949.650729 items/sec)
[2024-02-02T17:15:33] Bytes read: 94.824 GiB (101816256512 bytes)
[2024-02-02T17:15:33] Byte Rate : 4.341 GiB/s (101816256512 bytes in 21.846 seconds)
[2024-02-02T17:15:33] Writing to output file: dcmp_differ_file.out
[2024-02-02T17:15:34] Wrote 1146 files in 1.018 seconds (1126.240 files/sec)
Number of items that have different contents: 573 (Src: 573 Dest: 573), dumped to "dcmp_differ_file.out"
[2024-02-02T17:15:34] Writing to output file: dcmp_common_file.out
[2024-02-02T17:15:34] Wrote 17660 files in 0.084 seconds (210193.624 files/sec)
Number of items that have the same content: 8830 (Src: 8830 Dest: 8830), dumped to "dcmp_common_file.out"
[2024-02-02T17:15:34] Writing to output file: dcmp_onlysrc_file.out
[2024-02-02T17:15:34] Wrote 11343 files in 0.081 seconds (139911.700 files/sec)
Number of items that exist only in source directory: N/A (Src: 11343 Dest: 0), dumped to "dcmp_onlysrc_file.out"
[2024-02-02T17:15:34] Writing to output file: dcmp_onlydest_file.out
[2024-02-02T17:15:34] Wrote 0 files in 0.039 seconds (0.000 files/sec)
Number of items that exist only in destination directory: 0 (Src: 0 Dest: 0), dumped to "dcmp_onlydest_file.out"

real    0m38.142s
user    0m0.017s
sys     0m0.082s
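
As noted above, leaving out --text writes the lists in a binary format that other tools can read back via --input. A rough sketch of that workflow (the file name is a placeholder; verify with --dryrun before deleting anything): write the entries that exist only in the destination to a binary list and hand it to drm to remove that clutter.

$ srun --nodes=1 --ntasks=120 dcmp -o EXIST=ONLY_DEST:only_dest.mfu \
        dummy_files/ $FASTDATA_project/user1/dummy_files/
$ srun --nodes=1 --ntasks=120 drm --dryrun --input only_dest.mfu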

drm

A tool for removing files recursively in parallel. drm behaves like rm -rf, but it is faster.

$ srun --nodes=1 --ntasks=1 drm --help

Usage: drm [options] <path> ...

Options:
  -i, --input   <file>   - read list from file
  -o, --output <file>    - write list to file in binary format
  -t, --text             - use with -o; write processed list to file in ascii format
  -l, --lite             - walk file system without stat
      --stat             - walk file system with stat
      --exclude <regex>  - exclude from command entries that match the regex
      --match   <regex>  - apply command only to entries that match the regex
      --name             - change regex to apply to entry name rather than full pathname
      --dryrun           - print out list of files that would be deleted
      --aggressive       - aggressive mode deletes files during the walk. You CANNOT use dryrun with this option.
  -T, --traceless        - remove child items without changing parent directory mtime
      --progress <N>     - print progress every N seconds
  -v, --verbose          - verbose output
  -q, --quiet            - quiet output
  -h, --help             - print usage

For more information see https://mpifileutils.readthedocs.io.

For example,

$ time srun --nodes=1 --ntasks=120 drm --progress 1 $FASTDATA_project/user1/dummy_files/

[2024-02-23T16:02:00] Walking /p/fastdata/project/user1/dummy_files
[2024-02-23T16:02:01] Walked 12763 items in 0.083 secs (153078.963 items/sec) ...
[2024-02-23T16:02:01] Walked 12763 items in 0.085 seconds (150320.623 items/sec)
[2024-02-23T16:02:01] Removing 12763 items
[2024-02-23T16:02:02] Removed 326 items (2.55%) in 1.882 secs (173.193 items/sec) 71 secs remaining ...
[2024-02-23T16:02:03] Removed 2208 items (17.30%) in 2.884 secs (765.598 items/sec) 13 secs remaining ...
[2024-02-23T16:02:04] Removed 4318 items (33.83%) in 3.792 secs (1138.805 items/sec) 7 secs remaining ...
[2024-02-23T16:02:08] Removed 6231 items (48.82%) in 7.196 secs (865.870 items/sec) 7 secs remaining ...
[2024-02-23T16:02:08] Removed 12720 items (99.66%) in 7.255 secs (1753.290 items/sec) 0 secs remaining ...
[2024-02-23T16:02:08] Removed 12763 items (100.00%) in 7.255 secs (1759.202 items/sec) done
[2024-02-23T16:02:08] Removed 12763 items in 7.258 seconds (1758.507 items/sec)

real    0m21.985s
user    0m0.017s
sys     0m0.063s

ddup

Parallel MPI application to report files under a directory tree having identical content.

$ srun --nodes=1 --ntasks=1 ddup --help

Usage: ddup <dir>

Options:
  -d, --debug <DEBUG>  - set verbosity, one of: fatal,err,warn,info,dbg
  -v, --verbose        - verbose output
  -q, --quiet          - quiet output
  -h, --help           - print usage

For more information see https://mpifileutils.readthedocs.io.

For example,

$ time srun --nodes=1 --ntasks=120 ddup -v $FASTDATA_project/user1/dummy_files | tee ddup_files.out
[2024-02-23T15:51:51] Walking /p/fastdata/project/user1/dummy_files
[2024-02-23T15:51:51] Walked 12763 items in 0.242 secs (52738.076 items/sec) ...
[2024-02-23T15:51:51] Walked 12763 items in 0.244 seconds (52354.221 items/sec)
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_4947 1027e873e77474379ad6579600898b57c8bc6a92b2f6b46579817001ab440e37
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_5196 ea26fd104cc729660985f54405541a90665cca859b7382a5136d273fc131ee65
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_5868 7fa936a8c7f45cb6205a2b9054049e49b48ca278a31e0641660527d496953264
/p/fastdata/project/user1/dummy_files/dir_level0_0/dummy_file_6940 45b08ecc134c410d9d5314114c22b0af81ef5d71bd1c129e209ba47bbfa0b0b3
/p/fastdata/project/user1/dummy_files/dir_level0_0/dummy_file_4947 1027e873e77474379ad6579600898b57c8bc6a92b2f6b46579817001ab440e37
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_5634 db87eea1e7141f875f2628787fba5ec73456323b520fbd5d259b67ddb5876df5
/p/fastdata/project/user1/dummy_files/dir_level0_0/dummy_file_6269 ee5bff0173961f7a122a1eaaa32f9795196b07e6f286beb506d642d0cc7d5180
...
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_6512 9112507bd3c47ae43aef1f61319f2ed9804a56a392dd32d1a0338a52aea486a7
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_5035 eddde3058b6c363cd7f2a05d417aab810e757c4fd2fbcd96b4c4c93ca9587e72
/p/fastdata/project/user1/dummy_files/dir_level0_0/dummy_file_5111 a7db3f350fc2a250d64038e6e8cc17855e15970b58f399d216210e8652672ef9
/p/fastdata/project/user1/dummy_files/dir_level0_0/dummy_file_6724 32130bfbd471e0018b93318bb1c6d393a422cec89ba28d89f57b1ccd52571037
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_6844 0cde5d52d33b244f8d0c940c7b97749aefe0440d0b0e8c389c210b95ec76bd61
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_5953 56593066fc075edd9cd26f3799c70f42ba9da07749293cd65c2ffa7c5a74686a
/p/fastdata/project/user1/dummy_files/dir_level0_0/dummy_file_5161 760b96e70ad3ca25a8683e4a7de3478956075273c56de1a493bfa7b785beb9c3
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_7051 10f81a11d73946c2493417858804b4c6abc2137236c8f55a1f2da3cbf2162fb2
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_6229 45e20091e6a282ec34ec5b4e8eb6cb75ca97a74a842e4efe2b2675724dc7ca5e
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_6913 da8711d1ec5f4c1c7ed393169f119c79b4e13860845c5b915510c6f27b75df69
/p/fastdata/project/user1/dummy_files/cp_dir_level0_0/dummy_file_5797 4ca81a63216876078f0cdc16343ea147d205c1e6d7e7abd6475a825308a5ca73

real    0m51.515s
user    0m0.019s
sys     0m0.143s