Run Large-Scale Analysis¶
What you are trying to do¶
You want to run analysis that exceeds the capacity of a local machine.
This typically involves:
- using shared compute infrastructure
- running jobs in a managed environment
- working with larger datasets or longer-running processes
Key decisions¶
Before starting, clarify:
What type of workload are you running?¶
- batch processing
- parallel or distributed workloads
- GPU-accelerated computation
How large is the data?¶
- data that fits locally
- large datasets requiring shared storage
- data generated during computation
How will the analysis run?¶
- short interactive runs
- long-running jobs
- repeated or automated workflows
Recommended path¶
1. Use an appropriate compute service¶
Start here:
2. Prepare your data¶
Ensure that:
- your data is available in the compute environment
- input and output locations are clearly defined
If data needs to be moved:
3. Run your analysis¶
Submit and manage your workload using the appropriate tools:
If needed:
4. Retrieve and manage results¶
After the analysis:
- retrieve output data
- store results in an appropriate location
- share results if required