Sharing data on Swan

By default, all user directories on Swan are accessible only to the HCC user who owns them. If you want to share data stored on Swan with collaborators, there are several ways to do so, depending on your needs.

Each method presented here has its advantages and disadvantages. Some methods work for sharing data with other HCC users, and some work for sharing data with external collaborators who do not have HCC accounts.

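The common methods used to share data stored on Swan are:

  • Using group-level shared directory
  • Using user-level world-readable directory
  • Using POSIX Access Control Lists (ACL)
  • Using Globus shared collections

The sections below contain detailed information and setup steps for each.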


Standard Unix permissions

Each file on the cluster can have read (r), write (w) and execute (x) permissions for different access groupings. These access groupings are known as the user (u), group (g) and other (o) permission modes of the file.

The user permissions map to the UID (user identifier number) of the account that created the file. Similarly, the group permissions map to the GID (group identifier number) of the account that created the file; in most cases, this is the primary group of the user account. The other (o) permissions apply to all users that match neither of the prior two groupings. Put another way, your HCC account username maps to the UID, your HCC primary group maps to the GID (although this may depend on where the file is located or created with regard to supplementary group access), and other covers all users that are neither your HCC user account nor members of your group(s).

The meaning of the (x) permission depends on the file type. On a directory, (x) allows the grouping involved to search (traverse) that directory path; without the (x) bit, path access by that grouping results in permission denied errors. On a regular file, (x) marks the file as executable, meaning the system can run it (load the program image into memory and execute it), while files without (x) tend to be data files used for input or output.

Directory files start with a d in the permission listing, while files have a hyphen -. Next, the user (u), group (g) and other (o) permission modes follow, i.e., tuuugggooo, where t is the type of directory/file, uuu, ggg and ooo are permission placeholders for the prior mentioned (u), (g) and (o) permission groupings.

Directory permission mode example:

drwxr-x--x  # an example directory permission mode
||||||||||
tuuugggooo  # permission mode template
d           # directory type "d"
 rwx        # directory user (UID owner) has read (r) write (w) and access/execute (x) permissions to what is contained in the directory
    r-x     # directory group (GID owner) has only read (r) and access/execute (x) permissions to what is contained in the directory, but cannot create new entries
       --x  # all other users can try to search/access/execute (x) for content, if they know the absolute path under the directory already, but cannot read (r) to check/discover existing entries if they don't know of them, or write (w) to create new entries within the directory

File permission mode example:

-rw-r-----  # an example file permission mode
||||||||||
tuuugggooo  # permission mode template
-           # file type "-"
 rw-        # file user (UID owner) has read (r) write (w) but not execute (x) permissions to the file
    r--     # file group (GID owner) has only read (r) permissions
       ---  # all other users on the system have no access
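
The tuuugggooo template above can also be taken apart programmatically. The following is a minimal sketch, not an HCC-provided tool: describe_mode is a hypothetical helper name, and it only splits a 10-character mode string of the kind printed by ls -l into its type, user, group and other parts.

```shell
# Split a 10-character mode string (as printed by `ls -l`) into its parts.
# describe_mode is a hypothetical helper name used only for this illustration.
describe_mode() (
    mode="$1"
    case "$mode" in
        d*) kind="directory" ;;
        -*) kind="file" ;;
        *)  kind="other" ;;
    esac
    # Characters 2-4 are the user bits, 5-7 the group bits, 8-10 the other bits.
    user=$(printf '%s' "$mode" | cut -c2-4)
    group=$(printf '%s' "$mode" | cut -c5-7)
    other=$(printf '%s' "$mode" | cut -c8-10)
    printf 'type=%s user=%s group=%s other=%s\n' "$kind" "$user" "$group" "$other"
)

describe_mode "drwxr-x--x"   # prints: type=directory user=rwx group=r-x other=--x
describe_mode "-rw-r-----"   # prints: type=file user=rw- group=r-- other=---
```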

File and directory permissions can be set using the chmod command.
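
As a short illustration, the two example modes shown earlier can be produced with chmod in either symbolic or octal form. This is a sketch run in a throwaway directory; the names demo_dir, data.txt and results are placeholders chosen for this example and are not part of any HCC setup.

```shell
# Work in a throwaway directory so nothing real is modified.
demo_dir=$(mktemp -d)
touch "$demo_dir/data.txt"
mkdir "$demo_dir/results"

# Symbolic form: user read/write, group read-only, no access for others (-rw-r-----).
chmod u=rw,g=r,o= "$demo_dir/data.txt"

# Octal form of the directory example: rwx (7), r-x (5), --x (1) gives drwxr-x--x.
chmod 751 "$demo_dir/results"

# Show the resulting modes.
ls -ld "$demo_dir/data.txt" "$demo_dir/results"

# Remove the scratch directory when done:
#   rm -rf "$demo_dir"
```

The symbolic form is usually easier to read when changing a single grouping, while the octal form sets all three groupings at once.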

Using group-level shared directory

If you want to share data between group members, we can create a group-level read-only or read-write shared directory under /work/group/shared or /common/group/shared for this purpose.

Everyone who is part of the HCC group (whether with primary or supplementary access) can read data stored in the shared directory and, if the directory is read-write, write data to it. If you are interested in having a group-level shared directory, please email hcc-support@unl.edu for the setup.

When data is stored in the shared directory, permission errors may occasionally occur. In this case, the shared_fix.sh script, which HCC staff create when the group-level shared directory is set up, can be used to correct the permissions. The script should be run by the owner of the data that other group members have difficulty accessing. It ensures that the group (g) modes of the shared files match the owner's user (u) permissions, so group members have the same level of access.
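
The contents of shared_fix.sh are not reproduced here. As a rough illustration of the underlying idea only, and not of the script itself, chmod can copy a file's user (u) bits onto its group (g) bits with the g=u form. The scratch directory and file names below are placeholders for this sketch.

```shell
# Throwaway directory with two files whose group has no access yet.
scratch=$(mktemp -d)
touch "$scratch/script.sh" "$scratch/notes.txt"
chmod 700 "$scratch/script.sh"   # -rwx------
chmod 600 "$scratch/notes.txt"   # -rw-------

# g=u copies each file's user (u) bits to its group (g) bits.
chmod g=u "$scratch/script.sh" "$scratch/notes.txt"

# script.sh is now -rwxrwx--- and notes.txt is now -rw-rw----.
ls -l "$scratch"
```

On a real shared directory, prefer the shared_fix.sh script provided by HCC staff, since it is set up for that directory.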

Please note that both ${WORK} and ${COMMON} have their advantages and disadvantages, so please be aware of that when choosing where to store the group-level shared directory.

Pros:

  • Data in the shared directory can be easily accessed on the cluster and used as part of SLURM jobs.
  • The access for the shared directory can be read-only or read-write.
  • When multiple HCC users need access to the same data and scripts, storing the data in a group-level shared directory is recommended.

Cons:

  • Users need HCC accounts to access the shared directory from the cluster nodes.
  • Users need to be part of the HCC group with the shared directory in order to access data.
  • The permissions are set as discussed above in the Standard Unix Permissions section.

While the group-level shared directory can be created as read-only or read-write, please always make sure that the shared data has the correct permissions.

Using user-level world-readable directory

If you want to create a directory under your HCC account that is readable and accessible by everyone with an HCC account, whether or not they are part of the same HCC group, you can use the following commands:

cd ${WORK}
mkdir public
chmod go+x ${WORK}  # ensure directory search is possible to your ${WORK} for group (g) and other (o)
chmod u=rwx,go=rx public

Here, read (r), write (w) and execute (x) permissions are given to the user (u), and read (r) and execute (x) permissions are given for the group (g) and others (o). After the world-readable directory is created, you can share the absolute path to it with your collaborators that have HCC accounts.

Pros:

  • Data in the shared directory can be easily accessed on the cluster and used as part of SLURM jobs.
  • Easy way to share a single file.

Cons:

  • Users need HCC accounts to access the shared directory from the cluster nodes.
  • You should be careful when setting permissions this way: you can lock yourself out of your HCC account if the permissions are insufficient, and/or you can unintentionally give public access to your files.

Please note that when sharing a file, all the directories in the path to the file need to have the execute (x) bit set (so that their contents are accessible) and the read (r) bit set (so that their entries show up in listing queries, e.g., with the ls -l command). For example, if you want to share the directory /work/group/username/shared/, read (r) and execute (x) permissions should be given to /work, /work/group, /work/group/username and /work/group/username/shared to ensure both access to the files and the ability to list directory entries for the various path components.
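
One way to check every component of a path is to walk from the target up to the root and print each mode. The sketch below assumes an absolute path; show_path_modes is a hypothetical helper name, and a scratch tree stands in for /work/group/username/shared. Where it is installed, the namei -l command from util-linux prints similar information.

```shell
# Print the permission mode of the target and of every directory above it.
# show_path_modes is a hypothetical helper name used only for this sketch.
show_path_modes() (
    path="$1"
    while true; do
        # Keep only the 10-character mode string from the ls listing.
        printf '%s %s\n' "$(ls -ld "$path" | cut -c1-10)" "$path"
        if [ "$path" = "/" ] || [ "$path" = "." ]; then
            break
        fi
        path=$(dirname "$path")
    done
)

# Scratch tree standing in for /work/group/username/shared.
base=$(mktemp -d)
mkdir -p "$base/username/shared"
chmod 755 "$base/username/shared"   # listable and searchable by everyone
chmod 711 "$base/username"          # searchable, but not listable, by group and other
show_path_modes "$base/username/shared"
```

Any component that lacks (x) for the grouping in question blocks access to everything beneath it, regardless of the modes further down the path.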

Using POSIX Access Control Lists (ACL)

With the standard Unix/POSIX permissions the cluster uses, it is not possible to share data with only a single user, because only the user, group and other permission model is in effect.

However, this is possible with POSIX Access Control Lists (POSIX extended ACLs), which extend the standard POSIX model. This is a more involved setup that is recommended only for advanced users who have the need and are already well experienced with the standard model. We refer such users to the documentation of the getfacl and setfacl tools.

Please note that only the ${WORK} file-system on the cluster supports ACL, and sharing data in this way on ${COMMON} is not possible.

One can use ACLs on directories and files stored in ${WORK} with the getfacl and setfacl commands mentioned above. Similar to Unix/POSIX permissions, ACLs provide read (r), write (w) and execute (x) permissions for the user (u), group (g) and other (o). The user is your HCC account, the group is your HCC group (or the supplementary group for where the file is located), and the other is all the other users that are not part of your HCC group. An ACL can “extend” this mapping by allowing a per-user and/or per-group list of additional groupings that reside within the standard model’s “group permission” grouping. Put another way, the group rwx permissions mapping expands to multiple entries that only the previously mentioned tools can work with.

Things to remember:

  • Presently only the ${WORK} file-system supports POSIX extended ACLs, and sharing data in this way on ${COMMON} is not possible.
  • HCC staff cannot help with advanced permission modes, as the end user is ultimately responsible for these settings if they choose to add and use them.

To view the ACL setting for the file file.txt on ${WORK}, one can run:

$ getfacl file.txt
# file: file.txt
# owner: demo01
# group: demo
user::rw-
group::r--
other::r--

Running as user demo01, to share the file ${WORK}/shared/file.txt with a user demo02 and grant them read (r) and write (w) permissions, the setup steps are:

$ cd ${WORK}/shared  # the "shared" path must be set up by HCC staff, please see the section above on how to request access

# create an empty file
$ touch file.txt

# check the file permissions
$ ls -l file.txt
-rw-r--r-- 1 demo01 demo 0 Aug 22 16:25 file.txt

# view the ACL settings for the file
$ getfacl file.txt
# file: file.txt
# owner: demo01
# group: demo
user::rw-
group::r--
other::r--

# set/update the ACL settings for the file
$ setfacl -m user:demo02:rw file.txt

# check the file permissions
$ ls -l file.txt
-rw-rw-r--+ 1 demo01 demo 0 Aug 22 16:25 file.txt

Note the + character being added at the end of the permission mode line (-rw-rw-r--+). This indicates a directory or file that has extended ACL rules added to it.

# view the ACL settings for the file
$ getfacl file.txt
# file: file.txt
# owner: demo01
# group: demo
user::rw-
user:demo02:rw-
group::r--
mask::rw-
other::r--

Note that a user:demo02:rw- mapping was added in the ACL listing. This means the listed user account is granted the rw permission modes only if the “allow mask” line allows it, which in this case it does (mask::rw-).

Directories can also carry default ACL entries. The users and groups listed in these default entries are passed on, with the same defaults, to child directories and files that are created under the directory.

# create directory
$ mkdir test_dir

# check the directory permissions
$ ls -ld test_dir
drwxr-sr-x 2 demo01 demo 33280 Aug 22 16:36 test_dir

# set/update the ACL settings for the directory
# ensure all users can collaborate on newly created files, in this case demo01 and demo02 accounts are working together and expect to share files amongst themselves
$ setfacl -m default:user:demo01:rwx -m default:user:demo02:rwx test_dir/

# check the directory permissions
$ ls -ld test_dir
drwxr-sr-x+ 2 demo01 demo 33280 Aug 22 16:36 test_dir

Note the + character being added at the end of the permission mode line (drwxr-sr-x+). This indicates a directory or file that has extended ACL rules added to it.

# view the ACL settings for the directory
$ getfacl test_dir/
# file: test_dir/
# owner: demo01
# group: demo
# flags: -s-
user::rwx
group::r-x
other::r-x
default:user::rwx
default:user:demo01:rwx
default:user:demo02:rwx
default:group::r-x
default:mask::rwx
default:other::r-x

# create an empty file
$ touch test_dir/file.txt

# check the file permissions
$ ls -l test_dir/file.txt
-rw-rw-r--+ 1 demo01 demo 0 Aug 22 16:37 test_dir/file.txt

# view the ACL settings for the file
$ getfacl test_dir/file.txt
# file: test_dir/file.txt
# owner: demo01
# group: demo
user::rw-
user:demo01:rwx                 #effective:rw-
user:demo02:rwx                 #effective:rw-
group::r-x                      #effective:r--
mask::rw-
other::r--

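Note that the effective mode differs from the rule. This is because the touch command calls open() with octal permissions of 666 for the file (each digit is 4 for read (r) plus 2 for write (w); the 1 for execute (x) is missing), so the resulting mask is rw- and the execute (x) bit is masked out of each entry.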
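
The effective mode shown by getfacl is the bitwise AND of the entry and the mask. The sketch below reproduces that arithmetic for the listing above; effective_perm is a hypothetical helper name that takes two octal digits and does not call getfacl or setfacl.

```shell
# Compute the effective permission of an ACL entry under a given mask.
# Both arguments are octal digits: 4 = read (r), 2 = write (w), 1 = execute (x).
# effective_perm is a hypothetical helper name used only for this illustration.
effective_perm() (
    entry="$1"
    mask="$2"
    bits=$((entry & mask))
    r="-"; w="-"; x="-"
    if [ $((bits & 4)) -ne 0 ]; then r="r"; fi
    if [ $((bits & 2)) -ne 0 ]; then w="w"; fi
    if [ $((bits & 1)) -ne 0 ]; then x="x"; fi
    printf '%s%s%s\n' "$r" "$w" "$x"
)

effective_perm 7 6   # user:demo02:rwx under mask::rw-  prints rw-
effective_perm 5 6   # group::r-x under mask::rw-       prints r--
effective_perm 7 7   # user:demo02:rwx under mask::rwx  prints rwx
```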

# give the file execute permissions
$ chmod g+x test_dir/file.txt

# check the file permissions
$ ls -l test_dir/file.txt
-rw-rwxr--+ 1 demo01 demo 0 Aug 22 16:37 test_dir/file.txt

# view the ACL settings for the file
$ getfacl test_dir/file.txt
# file: test_dir/file.txt
# owner: demo01
# group: demo
user::rw-
user:demo01:rwx
user:demo02:rwx
group::r-x
mask::rwx
other::r--

Changing the group permission on the file updated the mask::rwx extended ACL entry to “allow” the execute (x) permission that was previously missing. Note well: even though the ls listing shows rwx for the group, actual GID group members would only have r-x access, because for a file with extended ACLs the group position in the ls listing shows the “allow” mask rather than the group entry.

With the setfacl commands above, the listed demo accounts are given read (r) write (w) or execute (x) access to the file file.txt by the ACL. Standard permission modes still apply and it is assumed that these demo accounts have sufficient directory search (x) permissions to reach the ${WORK}/shared path. These details may need to be given when HCC staff sets up the shared path if the user accounts are not members of the group involved at the path ${WORK} expands to.

Remove all extended ACL entries for file.txt:

setfacl -b file.txt

More examples on ACLs can be found here and the author of the Linux POSIX ACL implementation has an excellent document on the topic here.

Pros:

  • ACLs provide a safer and more flexible way to manage access to shared data than standard Unix/POSIX permissions.

Cons:

  • Using ACLs is an advanced approach and requires detailed understanding to get the details correct.
  • Users need HCC accounts to access the shared directory.

Please note that when sharing a file, all the directories in the path to the file need to have the execute (x) bit set (so that their contents are accessible) and the read (r) bit set (so that their entries show up in listing queries, e.g., with the ls -l command). For example, if you want to share the directory /work/group/username/shared/, read (r) and execute (x) permissions should be given to /work, /work/group, /work/group/username and /work/group/username/shared to ensure both access to the files and the ability to list directory entries for the various path components.

Please note that using ACLs is not straightforward, so please consider this approach only when the other approaches suggested here do not apply to you. If you have any questions about using ACLs, please email hcc-support@unl.edu.

Using Globus shared collections

If you want to share data with a single user, a specific custom group of users (independent of your HCC group), or with external collaborators (without HCC accounts), Linux user/group permissions do not provide that flexibility. In this case, Globus shared collections offer much more flexibility and control.

Once a shared Collection is created (e.g., /work/group/username/shared), you can perform “Add Permissions - Share With” multiple times, and each time you can select a different subdirectory from the created Collection, set different permissions and share it with different collaborators.

Some things to note:

  • you can share subdirectories on the same level under the Collection with different permissions for different users (e.g., /work/group/username/shared/shared1 can be Read/Write and /work/group/username/shared/shared2 can be Read-Only);
  • if you set the permissions of the subdirectory to Read/Write, all the directories within this subdirectory will have Read/Write permissions and you cannot override that (e.g., if /work/group/username/shared/shared1 is Read/Write then /work/group/username/shared/shared1/test will be Read/Write too);
  • if you set the permissions of the subdirectory to Read-Only, you can set Read/Write permissions to the directories within this subdirectory (e.g., if /work/group/username/shared/shared1 is Read-Only then /work/group/username/shared/shared1/test can be set to Read/Write if needed).

Pros:

  • Users don’t need HCC accounts to access shared data via Globus.
  • Users don’t need to be part of the same HCC group to share data via Globus.
  • Anyone with institutional and InCommon credentials can log in to Globus.
  • Globus shared collections offer much more flexibility and control - different data can be shared with different users.
  • The access for the shared data can be read-only or read-write.
  • All file access via Globus is proxied as the user that sets up the share, so all files are owned and accessed as this user.

Cons:

  • Data shared with Globus cannot be accessed directly on the cluster, and the data will need to be transferred to the cluster if it is used as part of a SLURM job. The exception is data shared from a cluster file-system, in which case the previously mentioned Unix permissions and/or ACLs may be needed to grant the HCC accounts the required permissions, which complicates the share.
  • Globus provides a web-based app and a CLI tool for the transfer.