Split large files, run parallel processing, and stream batch analysis. Use when sampling datasets, aggregating logs, or transforming bulk data.