Film Scan Workflow Documentation

Archive Workflow

The nightly archive workflow takes raw film scans from the scanner and processes them into packed (.pck) format, ready for video rendering.

Archive Workflow Diagram

Stage Descriptions

1. Input

2. Preparation

3. Processing

4. Archive

5. Pack

6. Output


Video Workflow

The video workflow runs on a separate machine (C231-V) and processes packed scans into viewable videos. It has two parallel pipelines: one for complete rolls, one for individual stories.

Video Workflow Diagram

Workflow Boundaries

Workflow | Repository | Runs On | Triggered By
Archive | workflow-archive-workspace | kairon (10.3.12.7) | Nightly cron
Video | workflow-video-workspace | C231-V (10.4.15.216) | HTTP from archive / nightly cron

Combined Workflow (Simplified)

This shows the complete pipeline from film scanner to website.

Combined Workflow Diagram

How Videos Are Made

For the curious: here's what happens when film scans become web videos.

How Videos Are Made Diagram

The Process

  1. Unpack the frames: Each film roll is stored as thousands of compressed images (one per frame of film). The first step is unpacking these from our .pck storage format.
  2. Apply adjustments: Our team marks each roll with timing and grading instructions - where to crop the image, how to adjust exposure for faded film, etc. These adjustments are applied to each frame. (All Hearst newsreels are black and white.)
  3. Assemble into video: The processed frames are stitched together at 24 frames per second (the original film speed) to create smooth motion.
  4. Synchronize audio: The optical soundtrack captured during scanning (or a manually-corrected replacement) is synchronized with the video.
  5. Compress for delivery: Raw video would be enormous. We compress it using modern codecs (H.264) to reduce file sizes by 95%+ while preserving quality.
  6. Create multiple versions: the finished video is encoded at multiple quality levels for delivery (for example, as HLS adaptive streams that adjust to the viewer's bandwidth).

The entire process is automated and runs overnight. When you see a new video on the website, it went through all these steps while you were sleeping.
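
To make steps 3-5 concrete, here is a minimal sketch of assembling numbered frames into an H.264 video at 24 fps with the soundtrack muxed in. The actual encoder, file layout, and settings used by the pipeline are not documented here, so treat ffmpeg, the frame naming pattern, and every parameter below as illustrative assumptions.

  # Illustrative only: the production encoder and its settings are not specified
  # in this document; ffmpeg, the paths, and the parameters are all assumptions.
  import subprocess

  def assemble_roll(frames_dir: str, audio_flac: str, out_mp4: str) -> None:
      """Stitch numbered frames into a 24 fps H.264 video with synced audio."""
      subprocess.run([
          "ffmpeg",
          "-framerate", "24",                     # original film speed
          "-i", f"{frames_dir}/frame_%06d.tiff",  # hypothetical frame naming
          "-i", audio_flac,                       # optical soundtrack (FLAC)
          "-c:v", "libx264", "-crf", "18",        # H.264 at high quality
          "-pix_fmt", "yuv420p",                  # broad player compatibility
          "-c:a", "aac",                          # web-friendly audio codec
          "-shortest",                            # stop at the shorter stream
          out_mp4,
      ], check=True)

  assemble_roll("NR017269/frames", "NR017269-S.flac", "NR017269.mp4")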


How Stories Are Created

A film "roll" contains multiple news stories back-to-back. To make individual stories viewable, our team marks where each story begins and ends.

How Stories Are Created Diagram

The Process

  1. View the roll: Staff watch the complete roll video on the Beta website.
  2. Mark story boundaries: Using the website's editing interface, they mark the "in" point (where a story starts) and "out" point (where it ends). A typical roll might have 5-15 separate news stories.
  3. Save to database: The timecodes are stored in the database, linked to Hearst's historical catalog system (Volume, Issue, Story numbers).
  4. Overnight encoding: The story workflow reads these in/out points and extracts each story segment from the roll, encoding it as a separate video file.
  5. Publish: each story is published as its own video, linked to its Volume/Issue/Story catalog entry on the website.
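
A minimal sketch of step 4, the overnight extraction: the stored in/out points are converted to seconds at 24 fps and the segment is cut from the roll video. The timecode format, function names, and encoder invocation are hypothetical; only the 24 fps rate and the in/out-point idea come from this document.

  # Hypothetical sketch of the overnight story extraction. The real workflow's
  # database schema, timecode format, and encoder invocation are not documented
  # here; only the 24 fps rate and the in/out-point concept come from this page.
  import subprocess

  FPS = 24  # original film speed used for roll videos

  def timecode_to_seconds(tc: str) -> float:
      """Convert an assumed HH:MM:SS:FF timecode into seconds at 24 fps."""
      hh, mm, ss, ff = (int(x) for x in tc.split(":"))
      return hh * 3600 + mm * 60 + ss + ff / FPS

  def extract_story(roll_mp4: str, in_tc: str, out_tc: str, story_mp4: str) -> None:
      """Cut one marked story (in point to out point) out of the full roll video."""
      start = timecode_to_seconds(in_tc)
      duration = timecode_to_seconds(out_tc) - start
      subprocess.run([
          "ffmpeg", "-i", roll_mp4,
          "-ss", f"{start:.3f}", "-t", f"{duration:.3f}",  # the marked segment
          "-c:v", "libx264", "-c:a", "aac",                # re-encode the clip
          story_mp4,
      ], check=True)

  # e.g. one story on roll NR017269, named here by its Volume/Issue/Story numbers
  extract_story("NR017269.mp4", "00:03:12:00", "00:05:47:12", "V1-I23-S04.mp4")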

Not All Rolls Have Stories

Some rolls were never released as part of Hearst's Volume/Issue/Story catalog system - they might be outtakes, test footage, or material that was filmed but never used. These rolls are still valuable historical content and are published as complete rolls on the public website, without story breakdown.

Why Encode Rolls First?

Rolls are encoded before story marking for two reasons: staff need the complete roll video on Beta before they can mark story boundaries at all, and newly scanned material becomes viewable without waiting on manual review.

This separation means a roll can appear on Beta within 24 hours of scanning, while the detailed story-by-story breakdown happens over the following days or weeks as staff have time to review.


How Sound Files Work

Each roll can have audio synchronized with the video. Sound is captured, processed, and stored separately from the video frames.

Sound Capture Methods

Audio comes from the film's optical track. The Scanity records it as WAV files during scanning, and negative density optical tracks can also be decoded with AEO Light, which compensates for the gamma mismatch of reading such tracks directly (see the Glossary).

Sound File Types

A roll may have up to two sound files:

Type | Filename | Source
Standard | NR######-S.flac | Direct capture from the NR element being scanned
Alternative (-c) | NR######-S-c.flac | Audio from a different source element

When both exist, the -c version takes precedence and is used for video encoding. The -c file contains audio sourced from a different element than the one whose NR number it bears (e.g., audio from a better-quality print, or in some cases, assembled from multiple source elements).

Sound Processing Pipeline

  1. Capture: Scanity records WAV files during scanning, stored in the scan's /sound directory
  2. Normalize: Audio levels are normalized to consistent volume
  3. Convert: WAV is converted to FLAC format (lossless compression)
  4. Store: FLAC files are copied to the sound server, organized by NR prefix (e.g., /shares/sound/final/NR017/)
  5. Sync: During video encoding, the audio is synchronized with the video frames
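
As an illustration of steps 2-3, the sketch below normalizes and converts a capture to FLAC. The pipeline's actual tools and loudness targets are not specified in this document, so ffmpeg's loudnorm filter and the paths are assumptions.

  # Illustrative sketch of steps 2-3 (normalize, convert to FLAC). The pipeline's
  # actual tools and loudness targets are not documented here.
  import subprocess

  def normalize_and_convert(wav_path: str, flac_path: str) -> None:
      """Normalize audio levels and convert a captured WAV to lossless FLAC."""
      subprocess.run([
          "ffmpeg", "-i", wav_path,
          "-af", "loudnorm",        # one possible loudness-normalization filter
          "-c:a", "flac",           # lossless compression for archival storage
          flac_path,
      ], check=True)

  # e.g. a capture from the scan's /sound directory, bound for the sound server
  normalize_and_convert("sound/NR017269-S.wav", "NR017269-S.flac")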

Sound Editor Workflow

Sound editors may need to step in when the audio captured from the scanned element is unusable, or when a better-quality source element is available.

To provide audio from an alternative source, the sound editor creates a file with the -c suffix (e.g., NR017269-S-c.flac). The workflow automatically uses this version instead of the standard capture.

File Naming Convention

Pattern | Example | Meaning
NR######-S.flac | NR017269-S.flac | Standard audio for roll NR017269
NR######-S-c.flac | NR017269-S-c.flac | Audio from alternative source (takes priority)
NR######-S2.flac | NR017269-S2.flac | Audio for second scan of same roll
NR######-S2-c.flac | NR017269-S2-c.flac | Alternative source audio for second scan
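
The precedence rule ("-c wins when both exist") reduces to a few lines of selection logic. A sketch, assuming a flat per-prefix directory such as /shares/sound/final/NR017/; the function name and layout are hypothetical.

  # Hypothetical sketch of the "-c takes precedence" rule described above.
  from pathlib import Path
  from typing import Optional

  def pick_sound_file(sound_dir: Path, nr: str, scan: int = 1) -> Optional[Path]:
      """Return the FLAC to use for a roll, preferring the alternative-source file."""
      suffix = "-S" if scan == 1 else f"-S{scan}"       # -S, -S2, ...
      corrected = sound_dir / f"{nr}{suffix}-c.flac"    # alternative source, wins
      standard = sound_dir / f"{nr}{suffix}.flac"       # direct capture
      if corrected.exists():
          return corrected
      if standard.exists():
          return standard
      return None                                       # roll has no audio

  print(pick_sound_file(Path("/shares/sound/final/NR017"), "NR017269"))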

Why We Use PACK Files

A question newcomers often ask: why a custom archive format instead of standard image formats? The answer involves both practical efficiency and a perhaps surprising quality observation.

The Format

Our Scanity produces 16-bit grayscale TIFFs at exactly 4300×3324 pixels per frame. Each roll contains thousands of frames—a 15-minute roll has roughly 21,600 frames. Storing these as individual TIFFs would be unwieldy (about 28 MB per frame uncompressed).

PACK files (.pck) store frames in a compact binary format: each frame is downsampled by 3×3 pixel averaging, requantized from 16-bit to 12-bit, and the frames are grouped seven to a file.

Result: ~2.4 MB per frame instead of 28 MB—about 92% reduction.
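
A quick back-of-the-envelope check of those figures (ignoring any container overhead, which isn't documented here):

  # Back-of-the-envelope check of the per-frame sizes quoted above.
  WIDTH, HEIGHT = 4300, 3324                     # Scanity frame size in pixels

  tiff_bytes = WIDTH * HEIGHT * 2                # 16-bit (2 bytes) per pixel
  packed_pixels = (WIDTH // 3) * (HEIGHT // 3)   # after 3x3 averaging
  pack_bytes = packed_pixels * 12 // 8           # 12 bits (1.5 bytes) per pixel

  print(f"TIFF:  {tiff_bytes / 1e6:.1f} MB")     # ~28.6 MB
  print(f"PACK:  {pack_bytes / 1e6:.1f} MB")     # ~2.4 MB
  print(f"Saved: {1 - pack_bytes / tiff_bytes:.0%}")  # ~92%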

Why 3×3 Averaging Works So Well

The 3×3 averaging method produces remarkably high quality with essentially no visible artifacts; averaging neighboring pixels also acts as noise reduction, which helps the downsampled frames hold up.

Why 12-bit is Sufficient

The reduction from 16-bit to 12-bit might seem like a compromise, but in practice, after 3×3 averaging (which itself reduces noise), the discarded low-order bits contain almost no useful information.
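
To make the two reductions concrete, here is a minimal NumPy sketch of 3×3 block averaging followed by requantization to 12 bits. The actual .pck writer's rounding and byte layout are not documented here; this only illustrates the idea.

  # Minimal sketch of the two reductions: 3x3 averaging, then 16-bit -> 12-bit.
  # The real .pck writer's rounding and byte layout are not documented here.
  import numpy as np

  def downsample_3x3(frame16: np.ndarray) -> np.ndarray:
      """Average each 3x3 block of a 16-bit grayscale frame (also reduces noise)."""
      h, w = frame16.shape
      h, w = h - h % 3, w - w % 3                   # trim to a multiple of 3
      blocks = frame16[:h, :w].reshape(h // 3, 3, w // 3, 3)
      return blocks.mean(axis=(1, 3))               # one value per 3x3 block

  def to_12bit(frame: np.ndarray) -> np.ndarray:
      """Drop the four least-significant bits: 0..65535 becomes 0..4095."""
      return (frame / 16.0).astype(np.uint16)

  frame = np.random.randint(0, 65536, (3324, 4300), dtype=np.uint16)  # fake scan
  packed = to_12bit(downsample_3x3(frame))
  print(packed.shape, int(packed.max()))            # (1108, 1433) and <= 4095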

Proxies That Match Masters

Here's the perhaps surprising observation: we have not observed significant differences between videos rendered from PACK files and videos rendered from the full 16-bit TIFFs. The 16-bit masters exist for archival completeness, but in practice the PACK proxies produce equivalent results.

This is why we confidently use PACK files as our rendering source rather than treating them purely as previews.


Why Custom Timing Tools (Not DaVinci Resolve)

Another common question: why not use industry-standard grading software like DaVinci Resolve? The answer involves our specific requirements for archival work.

What "Timing" Means Here

Our timing adjustments are specialized for grayscale archival work—all Hearst newsreels are black and white. The Review application provides these controls:

Exposure

Per-shot exposure adjustments, including gamma (typically 0.5–3.0), used to compensate for faded film.

Crop and Position

Per-shot crop rectangles and position nudges (the shot and nudge entries in the edit files).

Hardware Support

The Review app supports dedicated hardware controllers (Griffin PowerMate, Contour ShuttlePro) for efficient frame-by-frame work, which is essential when timing thousands of frames per roll.

Versionable Edit Lists

Our edit files are plain text, one line per adjustment:

<negative>
1000 shot (0, 0, 1431, 1107) exp (2.00,0,4095)
1193 shot (175, 81, 1387, 971) exp (0.63,578,4095)
1194 nudge (175,88)

This format enables reproducible renders, meaningful diffs and history in version control, and fully automated batch processing.
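
For illustration, a small parser for the format shown above. Only the line syntax comes from the example; the meaning of the individual numbers (the shot rectangle, the exp triple, the nudge offsets) is not spelled out in this document, so the parser treats them as opaque tuples.

  # Illustrative parser for the plain-text edit list shown above. Only the line
  # syntax comes from the example; field semantics are left as opaque tuples.
  import re
  from dataclasses import dataclass, field

  @dataclass
  class Adjustment:
      frame: int
      ops: dict = field(default_factory=dict)   # e.g. {"shot": (...), "exp": (...)}

  def parse_edit_file(text: str) -> list[Adjustment]:
      """Parse a '<negative>' header plus 'FRAME op (args) [op (args) ...]' lines."""
      adjustments = []
      for raw in text.splitlines():
          line = raw.strip()
          if not line or line.startswith("<"):   # skip header lines like <negative>
              continue
          frame_str, rest = line.split(None, 1)
          adj = Adjustment(frame=int(frame_str))
          for op, args in re.findall(r"(shot|exp|nudge)\s*\(([^)]*)\)", rest):
              adj.ops[op] = tuple(float(a) for a in args.split(","))
          adjustments.append(adj)
      return adjustments

  sample = "<negative>\n1000 shot (0, 0, 1431, 1107) exp (2.00,0,4095)\n1194 nudge (175,88)"
  for adj in parse_edit_file(sample):
      print(adj.frame, adj.ops)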

Why Not Resolve?

Resolve | Our System
Designed for color (RGB) | Optimized for grayscale (single channel)
Binary project files | Plain text edit lists
Manual timeline workflow | Frame-number based, automatable
Interactive grading session | Batch overnight processing
Proprietary format lock-in | Simple, archival-friendly format

For a color film project with complex grading needs, Resolve would be the right choice. For grayscale archival work requiring reproducibility, version control, and automation, our text-based system is more appropriate.

The Workflow

  1. Review application: Staff adjust timing with live preview using our macOS tool
  2. Save edits: Timing decisions stored as text files
  3. Version control: Edits committed to Git
  4. Overnight encoding: Automated pipeline reads edit files, renders videos
  5. Re-render on change: Edit file updates trigger automatic re-encoding

Film Stock Presets (NNews/PNews)

When scanning film, the Scanity uses "film stock" presets to configure illumination levels. For Hearst newsreels, we use custom presets called NNews (negatives) and PNews (positives) with offsets from +05 to +50.

Quick Summary

Aspect | Details
Mechanism | Adjusts PrinterLightsMax green channel value
Formula (4K) | Green PLMax = 67 - offset (e.g., NNews+35 → green = 32)
Standard resolution | 4300×3324 (99.7% of scans)
Scale | ~12 points = 1 F-stop of exposure change
Range | +05 to +48/50 (~3.5 stops total)
Curves applied | None — all scans are linear (Characteristic = LIN)
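
The Formula and Scale rows translate directly into a couple of helpers; a sketch assuming the 4K formula and the ~12-points-per-stop figure from the table above:

  # Helpers derived from the table above (4K formula, ~12 offset points per stop).
  def green_plmax(offset: int) -> int:
      """Green PrinterLights max value for an NNews/PNews offset (4K formula)."""
      return 67 - offset

  def stops_between(offset_a: int, offset_b: int) -> float:
      """Approximate exposure difference, at ~12 offset points per F-stop."""
      return (offset_b - offset_a) / 12.0

  print(green_plmax(35))        # NNews+35 -> 32
  print(stops_between(5, 50))   # +05 to +50 -> 3.75, i.e. the ~3.5-stop range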

Key Finding

Operators are capturing linear, ungraded data. The NNews/PNews presets perform exposure compensation only — no color grading occurs during scanning. This means post-processing workflows must apply appropriate log/gamma curves during color correction.
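
As a minimal illustration of what applying a gamma curve to this linear data looks like, the sketch below maps normalized 12-bit values through a simple power-law curve; the actual log/gamma curves used downstream are not specified in this document.

  # Minimal illustration of applying a gamma curve to linear 12-bit scan data.
  # The actual curves used in post-processing are not specified here.
  import numpy as np

  def apply_gamma(linear12: np.ndarray, gamma: float) -> np.ndarray:
      """Map linear 12-bit values (0..4095) through a power-law gamma curve."""
      normalized = linear12.astype(np.float64) / 4095.0
      graded = normalized ** (1.0 / gamma)       # gamma > 1 lifts the midtones
      return np.round(graded * 4095.0).astype(np.uint16)

  linear = np.arange(0, 4096, 1024, dtype=np.uint16)   # a few linear sample values
  print(apply_gamma(linear, 2.0))                      # midtones lifted vs. input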

Note: If a scan appears too dark or washed out in the final video, the issue is in the edit file timing, not the film stock preset. The preset choice is recorded in the scan metadata for reference but doesn't need to be "corrected" after scanning.

Key Storage Locations

Location | Purpose | Path
rhogos | Working storage for active scans | /Volumes/rhogos/scans/
ARK drives | Long-term archive (ARK-001 through ARK-xxx) | /Volumes/ARK-xxx/
Barn | Packed scans ready for rendering | /Volumes/kairon/barn/
Video | Rendered MP4 files | /Volumes/kairon/video/

Database Tables Updated

Table | Database | Purpose
SCAN_LOCATIONS | hearst_webapp | Tracks raw scan locations
PACK_LOCATIONS | hearst_webapp | Tracks packed scan locations
REVIEW | hearst | QC queue for new packs
SOUND | hearst_webapp | Audio file metadata

Troubleshooting: Manual Intervention

If scans get stuck at any stage, they can be manually advanced:

  1. Stuck in NR: Run mv-complete-nrs manually
  2. Archived but not packed: Run scan-pack on the archive volume, then cat-scans on barn
  3. Packed but not in REVIEW: Run cat-scans on barn (updates both PACK_LOCATIONS and REVIEW)
  4. Not rendering: Check render service at 10.4.15.216:8020
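
For step 4, a quick reachability check against the documented address can rule out basic network problems; this only verifies that the TCP port is open (the render service's API itself is not described here).

  # Reachability check for the render service at 10.4.15.216:8020. This only
  # verifies the TCP port is open; the service's API is not documented here.
  import socket

  def render_service_up(host: str = "10.4.15.216", port: int = 8020) -> bool:
      """Return True if a TCP connection to the render service succeeds."""
      try:
          with socket.create_connection((host, port), timeout=3):
              return True
      except OSError:
          return False

  print("render service reachable:", render_service_up())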

Glossary

Term | Meaning
NR | Newsreel - the identifier for each film roll (e.g., NR021621)
Roll | A complete reel of film, typically 10-20 minutes, containing multiple news stories
Story | An individual news segment within a roll, identified by Volume/Issue/Story numbers
Scan | The raw digital capture of a film roll from the Scanity scanner
Pack (.pck) | Custom binary format storing 12-bit, 3× downsampled frames (7 per file). 92% smaller than source TIFFs with no visible quality loss
ARK | Archive drive - offline storage for long-term preservation (ARK-001, ARK-047, etc.)
PARK | Parity ARK - a redundant copy of an ARK drive
Barn | The directory holding packed scans ready for video encoding (/Volumes/kairon/barn/)
rhogos | Network volume for working storage - where active scans are processed
kairon | Network volume hosting barn, video output, and other shared resources
Beta | The internal preview website where staff review and mark videos
HLS | HTTP Live Streaming - adaptive video format that adjusts quality to bandwidth
Edits | Text files containing crop, exposure, and timing instructions for each roll
FLAC | Lossless audio format used for archival sound (converted from WAV)
In/Out points | The start and end timecodes marking a story within a roll
-c suffix | Indicates audio sourced from a different element than the NR it's named after (takes priority over standard version)
AEO Light | Software for decoding audio from negative density optical tracks. Applies compensation for the gamma mismatch that occurs when reading negative tracks directly (they were designed for photochemical printing, not direct reading)
Optical track | The soundtrack printed along the edge of the film, read optically during playback or scanning
Composite print | A film print containing both picture and soundtrack on the same strip of film
Scanity | The film scanner (by Digital Film Technology) that captures 16-bit grayscale TIFFs at 4300×3324 per frame, plus optical audio
Timing | The process of adjusting exposure, contrast, and cropping for each shot. Named after the historical practice of controlling print exposure "times" in the lab
Gamma | The non-linear brightness curve applied to an image. Higher gamma brightens midtones; lower gamma darkens them. Our range is typically 0.5–3.0

Generated from workflow-archive-workspace/docs/workflow-flowchart.md