Handling Large File Uploads

Best practices for reliably uploading, validating, and processing large files (video, high-res images, archives) with RunAsh.

Overview

Large uploads require special handling to be resilient, cost-efficient, and secure. This guide covers direct (signed) uploads, chunked & resumable uploads, server-side validation, streaming processing, and UX patterns.

Key approaches
  • Direct uploads (presigned URLs) — client uploads straight to object storage (S3, GCS) using short-lived signed URLs. Minimizes server bandwidth.
  • Chunked / Resumable uploads — break files into parts; resume after network interruptions. Useful for very large files or unreliable networks.
  • Multipart uploads — object-store-native multipart (S3 multipart) for efficient parallel upload and server-side assembly.
  • Streaming ingestion — stream processing pipelines that accept uploads and process them incrementally (transcoding, thumbnailing).
Recommended flow (presigned + webhook)
  1. Client requests a presigned upload URL from your backend (you validate auth & quotas).
  2. Your backend returns a short-lived signed URL plus expected metadata (content type, max size, optional checksum).
  3. Client uploads directly to storage using the signed URL (PUT/POST).
  4. Storage sends a notification or your client calls your backend with the uploaded file info.
  5. Backend verifies object integrity (size/checksum), enqueues post-processing (transcode, virus-scan), and emits a webhook/event when done.
Presigned URL (pseudo) — backend
POST /api/uploads/presign
Body: { filename: "video.mp4", contentType: "video/mp4", size: 250000000 }
Response: { uploadUrl: "https://storage.example/...signed...", objectKey: "uploads/abc123.mp4", expiresIn: 300 }
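Steps 4–5 of the flow above can be handled by a completion endpoint on your backend. A minimal sketch, assuming illustrative storage.headObject, db.uploads, and jobs.enqueue helpers (names are stand-ins, not any specific SDK):

app.post('/api/uploads/complete', authMiddleware, async (req, res) => {
  const { objectKey, size, checksum } = req.body;

  // Confirm the object really exists in storage and matches the expected size
  const head = await storage.headObject(objectKey); // illustrative helper (e.g., S3 HeadObject)
  if (!head || head.size !== size) {
    return res.status(400).json({ error: 'upload missing or size mismatch' });
  }

  // Record the upload and enqueue post-processing (virus scan, transcode, ...)
  const upload = await db.uploads.markUploaded(objectKey, { size, checksum }); // illustrative persistence call
  await jobs.enqueue('process-upload', { uploadId: upload.id, objectKey });

  res.json({ status: 'processing', uploadId: upload.id });
});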
Chunked & Resumable Uploads

For unreliable networks or very large files, use chunked uploads with a resumable protocol (e.g., tus, Resumable.js, or custom token-based chunking). Benefits: resuming after failures, parallel chunk uploads, and progress visibility.

Design notes

  • Choose a chunk size (e.g., 4–8 MB) that balances throughput and latency; if chunks map to S3 multipart parts, each part except the last must be at least 5 MB.
  • Maintain chunk sequence numbers and checksums for integrity.
  • Track upload session state server-side or via a resumable token returned to the client.

Example (high-level)

1) POST /api/uploads/sessions -> returns uploadSessionId
2) Client PUT /uploads/{sessionId}/chunks/{chunkIndex} with chunk payload and checksum
3) After all chunks uploaded -> POST /api/uploads/{sessionId}/complete -> server assembles / invokes storage multipart complete
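The chunk endpoint in step 2 might look like the sketch below, which verifies the chunk checksum before recording it. The sessions and storage.putChunk helpers and the x-chunk-checksum header name are illustrative, not a specific SDK:

const crypto = require('crypto');

app.put('/uploads/:sessionId/chunks/:chunkIndex',
  authMiddleware,
  express.raw({ type: 'application/octet-stream', limit: '16mb' }), // req.body becomes a Buffer
  async (req, res) => {
    const { sessionId, chunkIndex } = req.params;

    const session = await sessions.get(sessionId); // illustrative session store
    if (!session) return res.status(404).json({ error: 'unknown upload session' });

    // Reject corrupted chunks early by comparing against the client-supplied checksum
    const actual = crypto.createHash('sha256').update(req.body).digest('hex');
    if (actual !== req.headers['x-chunk-checksum']) {
      return res.status(400).json({ error: 'chunk checksum mismatch' });
    }

    // Idempotent: retrying the same chunk index simply overwrites the previous attempt
    await storage.putChunk(session.objectKey, Number(chunkIndex), req.body); // illustrative storage helper
    await sessions.markChunkReceived(sessionId, Number(chunkIndex));

    res.json({ received: Number(chunkIndex) });
  });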
Security & validation
  • Authenticate and authorize upload requests on your backend before issuing signed URLs or creating upload sessions.
  • Enforce size limits, content-type restrictions, and per-user quotas server-side.
  • Validate uploads after completion using size and checksum (e.g., SHA-256) to detect corruption; see the verification sketch after this list.
  • Scan files for malware before processing or making content public.
  • Use short-lived signed URLs and rotate signing keys regularly.
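For example, checksum validation can stream the stored object through a hash instead of loading it into memory. A sketch assuming an illustrative storage.getObjectStream helper that returns a readable stream:

const crypto = require('crypto');

async function verifyChecksum(objectKey, expectedSha256Hex) {
  const hash = crypto.createHash('sha256');
  const stream = await storage.getObjectStream(objectKey); // illustrative: readable stream of the object

  for await (const chunk of stream) {
    hash.update(chunk); // hash incrementally, so memory use stays constant
  }

  return hash.digest('hex') === expectedSha256Hex;
}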
Post-upload processing

After upload completes, offload heavy work to background workers:

  • Transcoding / re-encoding (video -> multiple renditions)
  • Thumbnail generation and waveform extraction (audio)
  • Metadata extraction, content moderation, and product recognition
  • Store processed outputs to separate locations and update records when ready
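One way to hand off this work is to fan out independent jobs to a queue consumed by background workers. A minimal sketch; jobs.enqueue and db.uploads.setStatus are illustrative stand-ins for whatever queue and datastore you use (BullMQ, SQS, Pub/Sub, etc.):

async function enqueuePostProcessing(upload) {
  // Each job is independent, so workers can run them in parallel
  await jobs.enqueue('virus-scan', { objectKey: upload.objectKey });
  await jobs.enqueue('transcode', { objectKey: upload.objectKey, renditions: ['1080p', '720p', '480p'] });
  await jobs.enqueue('thumbnails', { objectKey: upload.objectKey });

  // Workers update the record and emit the upload-complete webhook when everything finishes
  await db.uploads.setStatus(upload.id, 'processing');
}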
Post-upload webhook (pseudo)
POST /webhooks/upload-complete
{
  "objectKey": "uploads/abc123.mp4",
  "size": 250000000,
  "checksum": "sha256:..."
}
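A receiver for this webhook should acknowledge quickly and defer any heavy follow-up work. A sketch (db.media.markReady is an illustrative persistence call):

app.post('/webhooks/upload-complete', express.json(), async (req, res) => {
  const { objectKey, size, checksum } = req.body;

  // Mark the content as ready so the UI can move it out of the "processing" state
  await db.media.markReady(objectKey, { size, checksum }); // illustrative persistence call
  res.sendStatus(200);
});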
Progress & UX
  • Show upload progress (per-chunk and overall); users expect accurate progress bars for large files. See the progress sketch after this list.
  • Allow background uploads or provide clear messaging if the user navigates away.
  • Provide resume options and retry status when connectivity is lost.
  • Prefer optimistic UI: let the user continue while background processing (transcode) runs, but mark content as "processing".
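fetch does not currently expose upload progress in browsers, so accurate progress bars for presigned PUT uploads are typically built with XMLHttpRequest and its upload.onprogress event. A minimal sketch:

function uploadWithProgress(uploadUrl, file, onProgress) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('PUT', uploadUrl);
    xhr.setRequestHeader('Content-Type', file.type);

    // Fires periodically with the number of bytes sent so far
    xhr.upload.onprogress = (event) => {
      if (event.lengthComputable) onProgress(event.loaded / event.total); // fraction 0..1
    };

    xhr.onload = () => (xhr.status < 300 ? resolve() : reject(new Error(`upload failed: ${xhr.status}`)));
    xhr.onerror = () => reject(new Error('network error during upload'));
    xhr.send(file);
  });
}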
Retries, backoff & idempotency

Implement idempotent upload chunk endpoints (use sessionId + chunkIndex) so clients can safely retry failed chunk uploads. Use exponential backoff with jitter for transient errors and honor storage service retry guidance.
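A small retry helper along these lines works for both whole-file and per-chunk uploads; the attempt count and base delay are illustrative:

async function withRetries(fn, { attempts = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === attempts - 1) throw err; // out of retries, surface the error

      // Exponential backoff with full jitter: random wait in [0, baseDelay * 2^attempt)
      const delayMs = Math.random() * baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Safe to retry because the chunk endpoint is idempotent on sessionId + chunkIndex
await withRetries(() => uploadChunk(session.id, idx, chunk));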

Client & server example snippets
Upload using presigned URL (client)
// 1) Request presigned URL from backend: POST /api/uploads/presign -> { uploadUrl, objectKey }

// 2) Upload file with fetch
await fetch(uploadUrl, {
  method: 'PUT',
  headers: { 'Content-Type': file.type },
  body: file,
});

// 3) Notify backend: POST /api/uploads/complete -> { objectKey, size, checksum }
Chunked upload (client pseudo)
const CHUNK_SIZE = 4 * 1024 * 1024; // 4 MB

const session = await createUploadSession(file.name, file.size);

for (let offset = 0, idx = 0; offset < file.size; offset += CHUNK_SIZE, idx++) {
  const chunk = file.slice(offset, offset + CHUNK_SIZE);
  await uploadChunk(session.id, idx, chunk); // PUT /uploads/{sessionId}/chunks/{idx}
}

await completeUpload(session.id); // POST /api/uploads/{sessionId}/complete
Backend: simple presign handler (pseudo)
app.post('/api/uploads/presign', authMiddleware, async (req, res) => {
  // Validate file size, user quota, and content-type before issuing a URL
  const { filename, contentType, size } = req.body;
  if (size > MAX_UPLOAD_BYTES) {
    return res.status(400).json({ error: 'file too large' });
  }

  const userId = req.user.id; // set by authMiddleware
  const objectKey = `uploads/${userId}/${randomId()}-${filename}`;
  const uploadUrl = await storage.generatePresignedPutUrl(objectKey, {
    contentType,
    expiresIn: 300,
  });

  res.json({ uploadUrl, objectKey, expiresIn: 300 });
});
Monitoring & cost considerations
  • Track upload attempts, success rates, average upload times and bytes transferred per user.
  • Monitor failed/resumed sessions to identify network hotspots or client bugs.
  • Use lifecycle rules to delete incomplete multipart uploads and remove temp objects to avoid storage costs (an example rule follows this list).
  • Consider server egress costs — prefer direct-to-storage uploads to reduce server bandwidth and cost.
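For S3, incomplete multipart uploads can be cleaned up automatically with a lifecycle rule. An illustrative configuration (applied with aws s3api put-bucket-lifecycle-configuration) that aborts parts left incomplete for 7 days under the uploads/ prefix:

{
  "Rules": [
    {
      "ID": "abort-incomplete-multipart-uploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "uploads/" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}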
Checklist
  • Issue short-lived presigned URLs for direct uploads where possible.
  • Support resumable chunked uploads for very large files or unreliable networks.
  • Validate size & checksum after upload and scan for malware before processing.
  • Enqueue heavy work (transcode, analysis) to background workers and notify via webhook/event when complete.
  • Implement progress UI, retries with jitter, and clear UX for "processing" state.
  • Clean up abandoned multipart uploads and temporary storage periodically.
