The nightly archive workflow processes film scans from the scanner to packed format, ready for video rendering.
- /Volumes/rhogos/scans/NRNR-complete
- .pck format for efficient storage
- PACK_LOCATIONS and REVIEW tables

The video workflow runs on a separate machine (C231-V) and processes packed scans into viewable videos. It has two parallel pipelines: one for complete rolls, one for individual stories.

| Workflow | Repository | Runs On | Triggered By |
|---|---|---|---|
| Archive | workflow-archive-workspace | kairon (10.3.12.7) | Nightly cron |
| Video | workflow-video-workspace | C231-V (10.4.15.216) | HTTP from archive / nightly cron |
This shows the complete pipeline from film scanner to website.
For the curious: here's what happens when film scans become web videos.
- .pck storage format.

The entire process is automated and runs overnight. When you see a new video on the website, it went through all these steps while you were sleeping.
A film "roll" contains multiple news stories back-to-back. To make individual stories viewable, our team marks where each story begins and ends.
Some rolls were never released as part of Hearst's Volume/Issue/Story catalog system - they might be outtakes, test footage, or material that was filmed but never used. These rolls are still valuable historical content and are published as complete rolls on the public website, without story breakdown.
Rolls are encoded before story marking for two reasons:
This separation means a roll can appear on Beta within 24 hours of scanning, while the detailed story-by-story breakdown happens over the following days or weeks as staff have time to review.
Each roll can have audio synchronized with the video. Sound is captured, processed, and stored separately from the video frames.
A roll may have up to two sound files:
| Type | Filename | Source |
|---|---|---|
| Standard | NR######-S.flac | Direct capture from the NR element being scanned |
| Alternative (-c) | NR######-S-c.flac | Audio from a different source element |
When both exist, the -c version takes precedence and is used for video encoding. The -c file contains audio sourced from a different element than the one whose NR number it bears (e.g., audio from a better-quality print, or in some cases, assembled from multiple source elements).
- /sound directory (/shares/sound/final/NR017/)

Sound editors may need to correct audio files when:
To provide audio from an alternative source, the sound editor creates a file with the -c suffix (e.g., NR017269-S-c.flac). The workflow automatically uses this version instead of the standard capture.
| Pattern | Example | Meaning |
|---|---|---|
| NR######-S.flac | NR017269-S.flac | Standard audio for roll NR017269 |
| NR######-S-c.flac | NR017269-S-c.flac | Audio from alternative source (takes priority) |
| NR######-S2.flac | NR017269-S2.flac | Audio for second scan of same roll |
| NR######-S2-c.flac | NR017269-S2-c.flac | Alternative source audio for second scan |
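
The naming patterns and the -c precedence rule can be sketched in a few lines of Python. This is an illustration only, not the workflow's actual code: the function names are hypothetical, and extending the S2 pattern to higher scan numbers is an assumption.

```python
from typing import Iterable, Optional


def sound_filename(nr: str, scan: int = 1, alternative: bool = False) -> str:
    """Build a sound filename following the naming table.

    scan=1 gives "-S", scan=2 gives "-S2" (higher numbers are an assumption).
    """
    scan_part = "" if scan == 1 else str(scan)
    suffix = "-c" if alternative else ""
    return f"{nr}-S{scan_part}{suffix}.flac"


def pick_sound_file(nr: str, available: Iterable[str], scan: int = 1) -> Optional[str]:
    """Return the file to use for encoding, preferring the -c version."""
    names = set(available)
    # Try the alternative-source file first, then the standard capture.
    for alternative in (True, False):
        candidate = sound_filename(nr, scan, alternative)
        if candidate in names:
            return candidate
    return None
```

For example, `pick_sound_file("NR017269", ["NR017269-S.flac", "NR017269-S-c.flac"])` returns the -c file, and the function returns `None` when the roll has no sound file at all.
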
A question newcomers often ask: why a custom archive format instead of standard image formats? The answer involves both practical efficiency and a perhaps surprising quality observation.
Our Scanity produces 16-bit grayscale TIFFs at exactly 4300×3324 pixels per frame. Each roll contains thousands of frames—a 15-minute roll has roughly 21,600 frames. Storing these as individual TIFFs would be unwieldy (about 28 MB per frame uncompressed).
PACK files (.pck) store frames in a compact binary format:
Result: ~2.4 MB per frame instead of 28 MB—about 92% reduction.
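
The size figures can be checked with a short calculation. It assumes 24 frames per second (which is what 21,600 frames in 15 minutes implies), integer division of the frame dimensions by 3, and tightly packed 12-bit samples with no header overhead; the real file layout may differ slightly.

```python
SRC_WIDTH, SRC_HEIGHT = 4300, 3324   # Scanity frame size from the text
SRC_BITS, PACK_BITS = 16, 12         # bits per pixel before and after packing
DOWNSAMPLE = 3                       # 3x3 averaging reduces each axis by 3
FPS = 24                             # assumption: standard sound-film frame rate


def frame_bytes(width: int, height: int, bits: int) -> int:
    """Raw size of one grayscale frame in bytes, ignoring any headers."""
    return width * height * bits // 8


tiff_bytes = frame_bytes(SRC_WIDTH, SRC_HEIGHT, SRC_BITS)
pack_bytes = frame_bytes(SRC_WIDTH // DOWNSAMPLE, SRC_HEIGHT // DOWNSAMPLE, PACK_BITS)
reduction = 1 - pack_bytes / tiff_bytes
frames_per_15_min = 15 * 60 * FPS

print(f"TIFF frame: {tiff_bytes / 1e6:.1f} MB")
print(f"PACK frame: {pack_bytes / 1e6:.1f} MB")
print(f"Reduction:  {reduction:.0%}")
print(f"Frames in a 15-minute roll: {frames_per_15_min}")
```

Under these assumptions a source frame is about 28.6 MB and a packed frame about 2.4 MB, which matches the figures quoted in the text.
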
The 3×3 averaging method produces remarkably high quality with essentially no visible artifacts. This works because:
The reduction from 16-bit to 12-bit might seem like a compromise, but in practice:
After 3×3 averaging (which itself reduces noise), the discarded bits contain almost no useful information.
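
As a rough illustration of the two steps described here, the sketch below averages non-overlapping 3×3 blocks of a 16-bit frame and then drops the low 4 bits to get 12-bit values. It is not the packer's actual code: the function name, the truncating division, and the handling of leftover edge pixels are all assumptions.

```python
from typing import List

Frame = List[List[int]]  # rows of 16-bit grayscale samples


def downsample_3x3(frame: Frame) -> Frame:
    """Average each non-overlapping 3x3 block, then keep the top 12 bits."""
    # Ignore trailing rows/columns that do not fill a whole block (assumption).
    height = len(frame) // 3 * 3
    width = len(frame[0]) // 3 * 3 if frame else 0
    out: Frame = []
    for y in range(0, height, 3):
        row = []
        for x in range(0, width, 3):
            total = sum(frame[y + dy][x + dx] for dy in range(3) for dx in range(3))
            mean16 = total // 9      # 16-bit average of the nine samples
            row.append(mean16 >> 4)  # discard the 4 least significant bits
        out.append(row)
    return out
```

Averaging nine samples reduces random noise by roughly a factor of three, which is why the four discarded bits carry so little information afterwards.
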
Here's the perhaps surprising observation: we have not observed any significant difference between videos rendered from PACK files and those rendered from full 16-bit TIFFs. The 16-bit masters exist for archival completeness, but in practice the PACK proxies produce equivalent results.
This is why we confidently use PACK files as our rendering source rather than treating them purely as previews.
Another common question: why not use industry-standard grading software like DaVinci Resolve? The answer involves our specific requirements for archival work.
Our timing adjustments are specialized for grayscale archival work—all Hearst newsreels are black and white. The Review application provides these controls:
The Review app supports dedicated hardware controllers (Griffin Powermate, Shuttle Pro) for efficient frame-by-frame work—essential when timing thousands of frames per roll.
Our edit files are plain text, one line per adjustment:

    <negative>
    1000 shot (0, 0, 1431, 1107) exp (2.00,0,4095)
    1193 shot (175, 81, 1387, 971) exp (0.63,578,4095)
    1194 nudge (175,88)

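
Because the format is plain text, it is easy to read with a small script. The parser below is a hypothetical sketch, not the Review application's own code. It assumes an optional `<...>` header line followed by one line per frame number, each carrying one or more `name (values)` commands. It makes no claim about what the individual numbers mean.

```python
import re
from typing import Dict, List, Optional, Tuple

# A command looks like "shot (0, 0, 1431, 1107)" or "exp (2.00,0,4095)".
COMMAND = re.compile(r"([A-Za-z]+)\s*\(([^)]*)\)")

Edit = Tuple[int, Dict[str, Tuple[float, ...]]]


def parse_edit_line(line: str) -> Edit:
    """Split one line into its frame number and its commands."""
    parts = line.split(None, 1)
    rest = parts[1] if len(parts) > 1 else ""
    commands = {
        name: tuple(float(value) for value in args.split(","))
        for name, args in COMMAND.findall(rest)
    }
    return int(parts[0]), commands


def parse_edits(text: str) -> Tuple[Optional[str], List[Edit]]:
    """Parse a whole edit file into (header, list of edits)."""
    header = None
    edits: List[Edit] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            header = line[1:-1]  # e.g. "negative"
            continue
        edits.append(parse_edit_line(line))
    return header, edits


SAMPLE = """<negative>
1000 shot (0, 0, 1431, 1107) exp (2.00,0,4095)
1193 shot (175, 81, 1387, 971) exp (0.63,578,4095)
1194 nudge (175,88)
"""

header, edits = parse_edits(SAMPLE)
```

Keeping the parser this small is the point of the format: an edit list can be diffed, version-controlled, and replayed by a batch job without any interactive session.
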
This format enables:
| Resolve | Our System |
|---|---|
| Designed for color (RGB) | Optimized for grayscale (single channel) |
| Binary project files | Plain text edit lists |
| Manual timeline workflow | Frame-number based, automatable |
| Interactive grading session | Batch overnight processing |
| Proprietary format lock-in | Simple, archival-friendly format |
For a color film project with complex grading needs, Resolve would be the right choice. For grayscale archival work requiring reproducibility, version control, and automation, our text-based system is more appropriate.
When scanning film, the Scanity uses "film stock" presets to configure illumination levels. For Hearst newsreels, we use custom presets called NNews (negatives) and PNews (positives) with offsets from +05 to +50.
| Aspect | Details |
|---|---|
| Mechanism | Adjusts PrinterLightsMax green channel value |
| Formula (4K) | Green PLMax = 67 - offset (e.g., NNews+35 → green = 32) |
| Standard resolution | 4300×3324 (99.7% of scans) |
| Scale | ~12 points = 1 F-stop of exposure change |
| Range | +05 to +48/50 (~3.5 stops total) |
| Curves applied | None — all scans are linear (Characteristic = LIN) |
Operators are capturing linear, ungraded data. The NNews/PNews presets perform exposure compensation only — no color grading occurs during scanning. This means post-processing workflows must apply appropriate log/gamma curves during color correction.
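
The preset arithmetic from the table can be expressed as a small helper. The function names are hypothetical. The constants come straight from the table, and the formula applies to 4K scans only.

```python
from typing import Tuple

PLMAX_BASE_4K = 67     # from the table: Green PLMax = 67 - offset (4K scans)
POINTS_PER_STOP = 12   # from the table: ~12 points per F-stop (approximate)


def parse_preset(name: str) -> Tuple[str, int]:
    """Split a preset such as "NNews+35" into (film type, offset)."""
    stock, _, offset = name.partition("+")
    kinds = {"NNews": "negative", "PNews": "positive"}
    return kinds[stock], int(offset)


def green_plmax(offset: int) -> int:
    """Green PrinterLightsMax value for a given preset offset (4K formula)."""
    return PLMAX_BASE_4K - offset


def stops_between(offset_a: int, offset_b: int) -> float:
    """Approximate size of the exposure change between two offsets, in stops."""
    return abs(offset_b - offset_a) / POINTS_PER_STOP
```

For example, `green_plmax(35)` gives 32, matching the NNews+35 example in the table, and `stops_between(5, 47)` gives 3.5, in line with the stated range of roughly 3.5 stops.
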
| Location | Purpose | Path |
|---|---|---|
| rhogos | Working storage for active scans | /Volumes/rhogos/scans/ |
| ARK drives | Long-term archive (ARK-001 through ARK-xxx) | /Volumes/ARK-xxx/ |
| Barn | Packed scans ready for rendering | /Volumes/kairon/barn/ |
| Video | Rendered MP4 files | /Volumes/kairon/video/ |
| Table | Database | Purpose |
|---|---|---|
| SCAN_LOCATIONS | hearst_webapp | Tracks raw scan locations |
| PACK_LOCATIONS | hearst_webapp | Tracks packed scan locations |
| REVIEW | hearst | QC queue for new packs |
| SOUND | hearst_webapp | Audio file metadata |
If scans get stuck at any stage, they can be manually advanced:
- mv-complete-nrs manually
- scan-pack on the archive volume, then cat-scans on barn
- cat-scans on barn (updates both PACK_LOCATIONS and REVIEW)
- 10.4.15.216:8020

| Term | Meaning |
|---|---|
| NR | Newsreel - the identifier for each film roll (e.g., NR021621) |
| Roll | A complete reel of film, typically 10-20 minutes, containing multiple news stories |
| Story | An individual news segment within a roll, identified by Volume/Issue/Story numbers |
| Scan | The raw digital capture of a film roll from the Scanity scanner |
| Pack (.pck) | Custom binary format storing 12-bit, 3× downsampled frames (7 per file). 92% smaller than source TIFFs with no visible quality loss |
| ARK | Archive drive - offline storage for long-term preservation (ARK-001, ARK-047, etc.) |
| PARK | Parity ARK - a redundant copy of an ARK drive |
| Barn | The directory holding packed scans ready for video encoding (/Volumes/kairon/barn/) |
| rhogos | Network volume for working storage - where active scans are processed |
| kairon | Network volume hosting barn, video output, and other shared resources |
| Beta | The internal preview website where staff review and mark videos |
| HLS | HTTP Live Streaming - adaptive video format that adjusts quality to bandwidth |
| Edits | Text files containing crop, exposure, and timing instructions for each roll |
| FLAC | Lossless audio format used for archival sound (converted from WAV) |
| In/Out points | The start and end timecodes marking a story within a roll |
| -c suffix | Indicates audio sourced from a different element than the NR it's named after (takes priority over standard version) |
| AEO Light | Software for decoding audio from negative density optical tracks. Applies compensation for the gamma mismatch that occurs when reading negative tracks directly (they were designed for photochemical printing, not direct reading) |
| Optical track | The soundtrack printed along the edge of the film, read optically during playback or scanning |
| Composite print | A film print containing both picture and soundtrack on the same strip of film |
| Scanity | The film scanner (by Digital Film Technology) that captures 16-bit grayscale TIFFs at 4300×3324 per frame, plus optical audio |
| Timing | The process of adjusting exposure, contrast, and cropping for each shot. Named after the historical practice of controlling print exposure "times" in the lab |
| Gamma | The non-linear brightness curve applied to an image. Higher gamma brightens midtones; lower gamma darkens them. Our range is typically 0.5–3.0 |
Generated from workflow-archive-workspace/docs/workflow-flowchart.md