Data Transfer Methods¶
Purpose¶
This page describes common methods used to move research data between systems.
It is a reference page, not a setup guide. Use it to understand the role, strengths, and limitations of each method.
For task-oriented guidance, see:
Main categories¶
Data transfer methods used in research workflows usually fall into three broad categories:
| Category | Typical methods | Best suited for |
|---|---|---|
| Managed transfer services | Globus | Large or important transfers between systems |
| Command-line transfer tools | scp, rsync, sftp |
Direct transfers where shell access is available |
| Sync and sharing platforms | Nextcloud | User-facing sharing and synchronisation |
These methods are not interchangeable. The appropriate method depends on scale, reliability requirements, environment, and collaborator access.
Managed transfer services¶
Globus¶
Globus is designed for reliable, high-volume transfer between connected systems and endpoints.
Characteristics¶
- built for large-scale data movement
- supports monitored and restartable transfers
- suitable for institution-to-institution workflows
- reduces the need to keep an interactive session open during long transfers
Best suited for¶
- large datasets
- repeated transfers between systems
- transfers that need better reliability than ad hoc command-line copying
- movement between storage systems and supported institutional endpoints
Limitations¶
- requires appropriate endpoint availability
- not itself a storage system
- may require service-specific setup or institutional support
Command-line transfer tools¶
SCP¶
scp copies files directly over SSH.
Characteristics¶
- simple and widely available
- useful for direct copy operations
- suited to smaller or straightforward transfers
Best suited for¶
- one-off transfers
- moving individual files or small groups of files
- environments where SSH access is already available
Limitations¶
- less suitable for very large or fragile transfers
- limited transfer resilience compared with managed services
- not ideal for repeated synchronisation workflows
rsync¶
rsync synchronises files between locations and transfers only changed content where possible.
Characteristics¶
- efficient for repeated transfers
- useful for keeping directories in sync
- commonly used over SSH
Best suited for¶
- updating an existing copy of a dataset
- repeated transfers where only some files change
- mirroring directory structures
Limitations¶
- requires more care than simple copy tools
- still depends on command-line access and correct usage
- not a storage system or collaboration platform
SFTP¶
SFTP provides file transfer over SSH using an interactive file transfer model.
Characteristics¶
- widely supported
- suitable for file-by-file movement
- often available in GUI clients as well as command-line tools
Best suited for¶
- simple transfer workflows
- users who prefer a file transfer interface over shell commands
- environments where SSH-based access is available
Limitations¶
- not optimised for very large-scale transfer workflows
- less efficient than
rsyncfor repeated synchronisation
Sync and sharing platforms¶
Nextcloud¶
Nextcloud provides file access, synchronisation, and sharing.
Characteristics¶
- web and desktop-client access
- synchronisation across devices
- sharing with internal or external collaborators
Best suited for¶
- collaborative document exchange
- smaller research files
- workflows that require user-friendly sharing
Limitations¶
- not a primary research storage system
- not suitable for high-performance workflows
- not appropriate for large dataset movement at scale
For more detail, see: - Nextcloud
Choosing between methods¶
| Need | Most suitable method |
|---|---|
| Move a very large dataset reliably between systems | Globus |
| Copy a small number of files over SSH | SCP |
| Keep two directories in sync over time | rsync |
| Share files with collaborators through a user-facing interface | Nextcloud |
Important distinction¶
Data transfer methods move data between systems.
They do not all provide:
- durable storage
- authoritative data management
- collaboration controls
- compute-local performance
Those functions are provided by storage and access services, not by transfer methods alone.