Handling Large File Uploads
Best practices for reliably uploading, validating, and processing large files (video, high-res images, archives) with RunAsh.
Large uploads require special handling to be resilient, cost-efficient, and secure. This guide covers direct (signed) uploads, chunked & resumable uploads, server-side validation, streaming processing, and UX patterns.
- Direct uploads (presigned URLs) — client uploads straight to object storage (S3, GCS) using short-lived signed URLs. Minimizes server bandwidth.
- Chunked / Resumable uploads — break files into parts; resume after network interruptions. Useful for very large files or unreliable networks.
- Multipart uploads — object-store-native multipart (S3 multipart) for efficient parallel upload and server-side assembly.
- Streaming ingestion — stream processing pipelines that accept uploads and process them incrementally (transcoding, thumbnailing).
- Client requests a presigned upload URL from your backend (you validate auth & quotas).
- Your backend returns a short-lived signed URL + expected metadata (content-type, max-size, checksum optional).
- Client uploads directly to storage using the signed URL (PUT/POST).
- Storage sends a notification or your client calls your backend with the uploaded file info.
- Backend verifies object integrity (size/checksum), enqueues post-processing (transcode, virus-scan), and emits a webhook/event when done.
POST /api/uploads/presign
Body: { filename: "video.mp4", contentType: "video/mp4", size: 250000000 }
Response: { uploadUrl: "https://storage.example/...signed...", objectKey: "uploads/abc123.mp4", expiresIn: 300 }
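A minimal sketch of the presign endpoint behind that request, assuming S3-compatible storage, Express for routing, and the AWS SDK v3 (@aws-sdk/client-s3 plus @aws-sdk/s3-request-presigner); the bucket name, size cap, and route shape are illustrative placeholders, not a fixed RunAsh API:

import express from "express";
import { randomUUID } from "crypto";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const app = express();
app.use(express.json());

const s3 = new S3Client({ region: "us-east-1" });
const BUCKET = "my-upload-bucket";            // placeholder bucket
const MAX_SIZE = 5 * 1024 * 1024 * 1024;      // example 5 GB cap

app.post("/api/uploads/presign", async (req, res) => {
  const { filename, contentType, size } = req.body;

  // Enforce server-side limits before handing out a signed URL.
  if (!filename || !contentType || !size || size > MAX_SIZE) {
    return res.status(400).json({ error: "invalid upload request" });
  }

  // Key the object under a generated id so client-supplied names never collide.
  const objectKey = `uploads/${randomUUID()}-${filename}`;

  // Short-lived URL (5 minutes) scoped to a single PUT of this object.
  const uploadUrl = await getSignedUrl(
    s3,
    new PutObjectCommand({ Bucket: BUCKET, Key: objectKey, ContentType: contentType }),
    { expiresIn: 300 }
  );

  res.json({ uploadUrl, objectKey, expiresIn: 300 });
});

The client then uploads the file bytes directly to uploadUrl with a plain PUT, sending the same Content-Type it declared when requesting the URL.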
Chunked & resumable uploads
For shaky networks or very large files, use chunked uploads with a resumable protocol (e.g., tus, resumable.js, or custom token-based chunking). Benefits: resume after failures, parallel chunk uploads, and progress visibility.
Design notes
- Choose a chunk size (e.g., 4–8 MB) that balances throughput and latency.
- Maintain chunk sequence numbers and checksums for integrity.
- Track upload session state server-side or via a resumable token returned to the client.
Example (high-level)
1) POST /api/uploads/sessions -> returns uploadSessionId
2) Client PUT /uploads/{sessionId}/chunks/{chunkIndex} with chunk payload and checksum
3) After all chunks uploaded -> POST /api/uploads/{sessionId}/complete -> server assembles / invokes storage multipart complete
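A rough sketch of those three endpoints backed by S3 multipart upload, again assuming Express and the AWS SDK v3; the in-memory session map, bucket name, and route shapes are illustrative only:

import express from "express";
import { randomUUID } from "crypto";
import {
  S3Client, CreateMultipartUploadCommand,
  UploadPartCommand, CompleteMultipartUploadCommand,
} from "@aws-sdk/client-s3";

const app = express();
const s3 = new S3Client({ region: "us-east-1" });
const BUCKET = "my-upload-bucket";   // placeholder
const sessions = new Map<string, { key: string; uploadId: string; parts: { ETag: string; PartNumber: number }[] }>();

// 1) Create an upload session (maps to an S3 multipart upload).
app.post("/api/uploads/sessions", express.json(), async (req, res) => {
  const key = `uploads/${randomUUID()}-${req.body.filename}`;
  const { UploadId } = await s3.send(new CreateMultipartUploadCommand({ Bucket: BUCKET, Key: key }));
  const sessionId = randomUUID();
  sessions.set(sessionId, { key, uploadId: UploadId!, parts: [] });
  res.json({ uploadSessionId: sessionId });
});

// 2) Receive one chunk; idempotent per (sessionId, chunkIndex).
app.put("/uploads/:sessionId/chunks/:chunkIndex",
  express.raw({ type: "application/octet-stream", limit: "16mb" }),
  async (req, res) => {
    const session = sessions.get(req.params.sessionId);
    if (!session) return res.status(404).end();
    const partNumber = Number(req.params.chunkIndex) + 1;   // S3 part numbers are 1-based
    const { ETag } = await s3.send(new UploadPartCommand({
      Bucket: BUCKET, Key: session.key, UploadId: session.uploadId,
      PartNumber: partNumber, Body: req.body,
    }));
    session.parts[partNumber - 1] = { ETag: ETag!, PartNumber: partNumber };
    res.json({ ok: true });
  });

// 3) Assemble the object once every chunk has arrived.
app.post("/api/uploads/:sessionId/complete", async (req, res) => {
  const session = sessions.get(req.params.sessionId);
  if (!session) return res.status(404).end();
  await s3.send(new CompleteMultipartUploadCommand({
    Bucket: BUCKET, Key: session.key, UploadId: session.uploadId,
    MultipartUpload: { Parts: session.parts },
  }));
  res.json({ objectKey: session.key });
});

In a real deployment the session state would live in a database or cache rather than process memory, and the chunk handler would verify the client-supplied checksum before forwarding the part to storage.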
Security & validation
- Authenticate and authorize upload requests on your backend before issuing signed URLs or creating upload sessions.
- Enforce size limits, content-type restrictions, and per-user quotas server-side.
- Validate uploads after completion using size and checksum (e.g., SHA-256) to detect corruption (see the sketch after this list).
- Scan files for malware (virus/malware scanning) before processing or making content public.
- Use short-lived signed URLs and rotate signing keys regularly.
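A sketch of that post-upload size/checksum validation, assuming S3-compatible storage and the AWS SDK v3; expectedSize and expectedSha256 would come from the upload session or the client's original presign request:

import { createHash } from "crypto";
import { Readable } from "stream";
import { S3Client, HeadObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

async function verifyUpload(bucket: string, key: string, expectedSize: number, expectedSha256: string): Promise<boolean> {
  // Cheap check first: object size from metadata only.
  const head = await s3.send(new HeadObjectCommand({ Bucket: bucket, Key: key }));
  if (head.ContentLength !== expectedSize) return false;

  // Stream the object through a SHA-256 hash without buffering it all in memory.
  const obj = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  const hash = createHash("sha256");
  for await (const chunk of obj.Body as Readable) {
    hash.update(chunk);
  }
  return hash.digest("hex") === expectedSha256;
}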
After upload completes, offload heavy work to background workers:
- Transcoding / re-encoding (video -> multiple renditions)
- Thumbnail generation and waveform extraction (audio)
- Metadata extraction, content moderation, and product recognition
- Store processed outputs to separate locations and update records when ready
POST /webhooks/upload-complete
{
"objectKey": "uploads/abc123.mp4",
"size": 250000000,
"checksum": "sha256:..."
}
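A sketch of the handler behind that webhook, assuming Express and a BullMQ queue feeding the background workers; the queue name, job payload, and Redis connection details are illustrative:

import express from "express";
import { Queue } from "bullmq";

const app = express();
app.use(express.json());

// Background workers consume this queue and run transcoding, scanning, etc.
const postProcessQueue = new Queue("post-process", {
  connection: { host: "localhost", port: 6379 },   // placeholder Redis
});

app.post("/webhooks/upload-complete", async (req, res) => {
  const { objectKey, size, checksum } = req.body;
  if (!objectKey || !size || !checksum) {
    return res.status(400).json({ error: "missing fields" });
  }

  // Verify integrity first (see the validation sketch above), then enqueue heavy work.
  await postProcessQueue.add("transcode", { objectKey }, {
    attempts: 3,
    backoff: { type: "exponential", delay: 5000 },
  });

  res.status(202).json({ status: "processing" });
});

In practice you would also authenticate the webhook itself (shared secret or signature) before trusting its payload.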
Upload UX
- Show upload progress (per-chunk and overall). Users expect accurate progress bars for large files.
- Allow background uploads or provide clear messaging if the user navigates away.
- Provide resume options and retry status when connectivity is lost.
- Prefer optimistic UI: let the user continue while background processing (transcode) runs, but mark content as "processing".
Implement idempotent upload chunk endpoints (use sessionId + chunkIndex) so clients can safely retry failed chunk uploads. Use exponential backoff with jitter for transient errors and honor storage service retry guidance.
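A small sketch of the client-side retry loop using plain fetch with exponential backoff and full jitter; the chunk endpoint shape mirrors the session example above and is illustrative:

// Retry a single chunk PUT with exponential backoff plus jitter.
async function uploadChunkWithRetry(
  sessionId: string,
  chunkIndex: number,
  chunk: Blob,
  maxAttempts = 5,
): Promise<void> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    let res: Response | undefined;
    try {
      res = await fetch(`/uploads/${sessionId}/chunks/${chunkIndex}`, {
        method: "PUT",
        headers: { "Content-Type": "application/octet-stream" },
        body: chunk,
      });
    } catch (err) {
      lastError = err;                          // network error: retry with backoff
    }
    if (res?.ok) return;                        // success; the endpoint is idempotent, so retries are safe
    if (res && res.status >= 400 && res.status < 500) {
      throw new Error(`chunk ${chunkIndex} rejected: ${res.status}`);   // client error: give up immediately
    }
    if (res) lastError = new Error(`transient failure: ${res.status}`); // 5xx: retry
    // Exponential backoff (1s, 2s, 4s, ...) capped at 30s, with full jitter.
    const base = Math.min(1000 * 2 ** attempt, 30_000);
    await new Promise((r) => setTimeout(r, Math.random() * base));
  }
  throw lastError;
}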
- Track upload attempts, success rates, average upload times and bytes transferred per user.
- Monitor failed/resumed sessions to identify network hotspots or client bugs.
- Use lifecycle rules to delete incomplete multipart uploads and remove temp objects to avoid storage costs (see the sketch below).
- Consider server egress costs — prefer direct-to-storage uploads to reduce server bandwidth and cost.
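A sketch of that cleanup rule for S3, assuming the AWS SDK v3; the bucket name, prefixes, and retention windows are illustrative:

import { S3Client, PutBucketLifecycleConfigurationCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// Abort multipart uploads that never completed and expire temporary objects.
await s3.send(new PutBucketLifecycleConfigurationCommand({
  Bucket: "my-upload-bucket",                       // placeholder bucket
  LifecycleConfiguration: {
    Rules: [
      {
        ID: "abort-incomplete-multipart",
        Status: "Enabled",
        Filter: { Prefix: "uploads/" },
        AbortIncompleteMultipartUpload: { DaysAfterInitiation: 7 },
      },
      {
        ID: "expire-temp-objects",
        Status: "Enabled",
        Filter: { Prefix: "tmp/" },
        Expiration: { Days: 1 },
      },
    ],
  },
}));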
- Issue short-lived presigned URLs for direct uploads where possible.
- Support resumable chunked uploads for very large files or unreliable networks.
- Validate size & checksum after upload and scan for malware before processing.
- Enqueue heavy work (transcode, analysis) to background workers and notify via webhook/event when complete.
- Implement progress UI, retries with jitter, and clear UX for "processing" state.
- Clean up abandoned multipart uploads and temporary storage periodically.