Film Scan Workflow Documentation

Archive Workflow

The nightly archive workflow takes raw film scans from the scanner and processes them into packed (.pck) format, ready for video rendering.

Archive Workflow Diagram

Stage Descriptions

1. Input

2. Preparation

3. Processing

4. Archive

5. Pack

6. Output


Video Workflow

The video workflow runs on a separate machine (C231-V) and processes packed scans into viewable videos. It has two parallel pipelines: one for complete rolls, one for individual stories.

Video Workflow Diagram

Workflow Boundaries

Workflow | Repository | Runs On | Triggered By
Archive | workflow-archive-workspace | kairon (10.3.12.7) | Nightly cron
Video | workflow-video-workspace | C231-V (10.4.15.216) | HTTP from archive / nightly cron

Combined Workflow (Simplified)

This shows the complete pipeline from film scanner to website.

Combined Workflow Diagram

How Videos Are Made

For the curious: here's what happens when film scans become web videos.

How Videos Are Made Diagram

The Process

  1. Unpack the frames: Each film roll is stored as thousands of compressed images (one per frame of film). The first step is unpacking these from our .pck storage format.
  2. Apply adjustments: Our team marks each roll with timing and grading instructions - where to crop the image, how to adjust exposure for faded film, etc. These adjustments are applied to each frame. (All Hearst newsreels are black and white.)
  3. Assemble into video: The processed frames are stitched together at 24 frames per second (the original film speed) to create smooth motion.
  4. Synchronize audio: The optical soundtrack captured during scanning (or a manually-corrected replacement) is synchronized with the video.
  5. Compress for delivery: Raw video would be enormous. We compress it using modern codecs (H.264) to reduce file sizes by 95%+ while preserving quality.
  6. Create multiple versions: the finished video is encoded at multiple quality levels for delivery (for example, as HLS adaptive streams that adjust to the viewer's bandwidth).

The entire process is automated and runs overnight. When you see a new video on the website, it went through all these steps while you were sleeping.
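
To make steps 3-5 concrete, here is a minimal sketch of assembling numbered frames into an H.264 video at 24 fps with the soundtrack muxed in. The actual encoder, file layout, and settings used by the pipeline are not documented here, so treat ffmpeg, the frame naming pattern, and every parameter below as illustrative assumptions.

  # Illustrative only: the production encoder and its settings are not specified
  # in this document; ffmpeg, the paths, and the parameters are all assumptions.
  import subprocess

  def assemble_roll(frames_dir: str, audio_flac: str, out_mp4: str) -> None:
      """Stitch numbered frames into a 24 fps H.264 video with synced audio."""
      subprocess.run([
          "ffmpeg",
          "-framerate", "24",                     # original film speed
          "-i", f"{frames_dir}/frame_%06d.tiff",  # hypothetical frame naming
          "-i", audio_flac,                       # optical soundtrack (FLAC)
          "-c:v", "libx264", "-crf", "18",        # H.264 at high quality
          "-pix_fmt", "yuv420p",                  # broad player compatibility
          "-c:a", "aac",                          # web-friendly audio codec
          "-shortest",                            # stop at the shorter stream
          out_mp4,
      ], check=True)

  assemble_roll("NR017269/frames", "NR017269-S.flac", "NR017269.mp4")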


How Stories Are Created

A film "roll" contains multiple news stories back-to-back. To make individual stories viewable, our team marks where each story begins and ends.

How Stories Are Created Diagram

The Process

  1. View the roll: Staff watch the complete roll video on the Beta website.
  2. Mark story boundaries: Using the website's editing interface, they mark the "in" point (where a story starts) and "out" point (where it ends). A typical roll might have 5-15 separate news stories.
  3. Save to database: The timecodes are stored in the database, linked to Hearst's historical catalog system (Volume, Issue, Story numbers).
  4. Overnight encoding: The story workflow reads these in/out points and extracts each story segment from the roll, encoding it as a separate video file.
  5. Publish: each story is published as its own video, linked to its Volume/Issue/Story catalog entry on the website.
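
A minimal sketch of step 4, the overnight extraction: the stored in/out points are converted to seconds at 24 fps and the segment is cut from the roll video. The timecode format, function names, and encoder invocation are hypothetical; only the 24 fps rate and the in/out-point idea come from this document.

  # Hypothetical sketch of the overnight story extraction. The real workflow's
  # database schema, timecode format, and encoder invocation are not documented
  # here; only the 24 fps rate and the in/out-point concept come from this page.
  import subprocess

  FPS = 24  # original film speed used for roll videos

  def timecode_to_seconds(tc: str) -> float:
      """Convert an assumed HH:MM:SS:FF timecode into seconds at 24 fps."""
      hh, mm, ss, ff = (int(x) for x in tc.split(":"))
      return hh * 3600 + mm * 60 + ss + ff / FPS

  def extract_story(roll_mp4: str, in_tc: str, out_tc: str, story_mp4: str) -> None:
      """Cut one marked story (in point to out point) out of the full roll video."""
      start = timecode_to_seconds(in_tc)
      duration = timecode_to_seconds(out_tc) - start
      subprocess.run([
          "ffmpeg", "-i", roll_mp4,
          "-ss", f"{start:.3f}", "-t", f"{duration:.3f}",  # the marked segment
          "-c:v", "libx264", "-c:a", "aac",                # re-encode the clip
          story_mp4,
      ], check=True)

  # e.g. one story on roll NR017269, named here by its Volume/Issue/Story numbers
  extract_story("NR017269.mp4", "00:03:12:00", "00:05:47:12", "V1-I23-S04.mp4")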

Not All Rolls Have Stories

Some rolls were never released as part of Hearst's Volume/Issue/Story catalog system - they might be outtakes, test footage, or material that was filmed but never used. These rolls are still valuable historical content and are published as complete rolls on the public website, without story breakdown.

Why Encode Rolls First?

Rolls are encoded before story marking for two reasons: staff need the complete roll video on Beta before they can mark story boundaries at all, and newly scanned material becomes viewable without waiting on manual review.

This separation means a roll can appear on Beta within 24 hours of scanning, while the detailed story-by-story breakdown happens over the following days or weeks as staff have time to review.


How Sound Files Work

Each roll can have audio synchronized with the video. Sound is captured, processed, and stored separately from the video frames.

Sound Capture Methods

Audio comes from the film's optical track. The Scanity records it as WAV files during scanning, and negative density optical tracks can also be decoded with AEO Light, which compensates for the gamma mismatch of reading such tracks directly (see the Glossary).

Sound File Types

A roll may have up to two sound files:

Type | Filename | Source
Standard | NR######-S.flac | Direct capture from the NR element being scanned
Alternative (-c) | NR######-S-c.flac | Audio from a different source element

When both exist, the -c version takes precedence and is used for video encoding. The -c file contains audio sourced from a different element than the one whose NR number it bears (e.g., audio from a better-quality print, or in some cases, assembled from multiple source elements).

Sound Processing Pipeline

  1. Capture: Scanity records WAV files during scanning, stored in the scan's /sound directory
  2. Normalize: Audio levels are normalized to consistent volume
  3. Convert: WAV is converted to FLAC format (lossless compression)
  4. Store: FLAC files are copied to the sound server, organized by NR prefix (e.g., /shares/sound/final/NR017/)
  5. Sync: During video encoding, the audio is synchronized with the video frames
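
As an illustration of steps 2-3, the sketch below normalizes and converts a capture to FLAC. The pipeline's actual tools and loudness targets are not specified in this document, so ffmpeg's loudnorm filter and the paths are assumptions.

  # Illustrative sketch of steps 2-3 (normalize, convert to FLAC). The pipeline's
  # actual tools and loudness targets are not documented here.
  import subprocess

  def normalize_and_convert(wav_path: str, flac_path: str) -> None:
      """Normalize audio levels and convert a captured WAV to lossless FLAC."""
      subprocess.run([
          "ffmpeg", "-i", wav_path,
          "-af", "loudnorm",        # one possible loudness-normalization filter
          "-c:a", "flac",           # lossless compression for archival storage
          flac_path,
      ], check=True)

  # e.g. a capture from the scan's /sound directory, bound for the sound server
  normalize_and_convert("sound/NR017269-S.wav", "NR017269-S.flac")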

Sound Editor Workflow

Sound editors may need to step in when the audio captured from the scanned element is unusable, or when a better-quality source element is available.

To provide audio from an alternative source, the sound editor creates a file with the -c suffix (e.g., NR017269-S-c.flac). The workflow automatically uses this version instead of the standard capture.

File Naming Convention

Pattern | Example | Meaning
NR######-S.flac | NR017269-S.flac | Standard audio for roll NR017269
NR######-S-c.flac | NR017269-S-c.flac | Audio from alternative source (takes priority)
NR######-S2.flac | NR017269-S2.flac | Audio for second scan of same roll
NR######-S2-c.flac | NR017269-S2-c.flac | Alternative source audio for second scan
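
The precedence rule ("-c wins when both exist") reduces to a few lines of selection logic. A sketch, assuming a flat per-prefix directory such as /shares/sound/final/NR017/; the function name and layout are hypothetical.

  # Hypothetical sketch of the "-c takes precedence" rule described above.
  from pathlib import Path
  from typing import Optional

  def pick_sound_file(sound_dir: Path, nr: str, scan: int = 1) -> Optional[Path]:
      """Return the FLAC to use for a roll, preferring the alternative-source file."""
      suffix = "-S" if scan == 1 else f"-S{scan}"       # -S, -S2, ...
      corrected = sound_dir / f"{nr}{suffix}-c.flac"    # alternative source, wins
      standard = sound_dir / f"{nr}{suffix}.flac"       # direct capture
      if corrected.exists():
          return corrected
      if standard.exists():
          return standard
      return None                                       # roll has no audio

  print(pick_sound_file(Path("/shares/sound/final/NR017"), "NR017269"))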

Why We Use PACK Files

A question newcomers often ask: why a custom archive format instead of standard image formats? The answer involves both practical efficiency and a perhaps surprising quality observation.

The Format

Our Scanity produces 16-bit grayscale TIFFs at exactly 4300×3324 pixels per frame. Each roll contains thousands of frames—a 15-minute roll has roughly 21,600 frames. Storing these as individual TIFFs would be unwieldy (about 28 MB per frame uncompressed).

PACK files (.pck) store frames in a compact binary format: each frame is downsampled by 3×3 pixel averaging, requantized from 16-bit to 12-bit, and the frames are grouped seven to a file.

Result: ~2.4 MB per frame instead of 28 MB—about 92% reduction.
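
A quick back-of-the-envelope check of those figures (ignoring any container overhead, which isn't documented here):

  # Back-of-the-envelope check of the per-frame sizes quoted above.
  WIDTH, HEIGHT = 4300, 3324                     # Scanity frame size in pixels

  tiff_bytes = WIDTH * HEIGHT * 2                # 16-bit (2 bytes) per pixel
  packed_pixels = (WIDTH // 3) * (HEIGHT // 3)   # after 3x3 averaging
  pack_bytes = packed_pixels * 12 // 8           # 12 bits (1.5 bytes) per pixel

  print(f"TIFF:  {tiff_bytes / 1e6:.1f} MB")     # ~28.6 MB
  print(f"PACK:  {pack_bytes / 1e6:.1f} MB")     # ~2.4 MB
  print(f"Saved: {1 - pack_bytes / tiff_bytes:.0%}")  # ~92%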

Why 3×3 Averaging Works So Well

The 3×3 averaging method produces remarkably high quality with essentially no visible artifacts; averaging neighboring pixels also acts as noise reduction, which helps the downsampled frames hold up.

Why 12-bit is Sufficient

The reduction from 16-bit to 12-bit might seem like a compromise, but in practice, after 3×3 averaging (which itself reduces noise), the discarded low-order bits contain almost no useful information.
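
To make the two reductions concrete, here is a minimal NumPy sketch of 3×3 block averaging followed by requantization to 12 bits. The actual .pck writer's rounding and byte layout are not documented here; this only illustrates the idea.

  # Minimal sketch of the two reductions: 3x3 averaging, then 16-bit -> 12-bit.
  # The real .pck writer's rounding and byte layout are not documented here.
  import numpy as np

  def downsample_3x3(frame16: np.ndarray) -> np.ndarray:
      """Average each 3x3 block of a 16-bit grayscale frame (also reduces noise)."""
      h, w = frame16.shape
      h, w = h - h % 3, w - w % 3                   # trim to a multiple of 3
      blocks = frame16[:h, :w].reshape(h // 3, 3, w // 3, 3)
      return blocks.mean(axis=(1, 3))               # one value per 3x3 block

  def to_12bit(frame: np.ndarray) -> np.ndarray:
      """Drop the four least-significant bits: 0..65535 becomes 0..4095."""
      return (frame / 16.0).astype(np.uint16)

  frame = np.random.randint(0, 65536, (3324, 4300), dtype=np.uint16)  # fake scan
  packed = to_12bit(downsample_3x3(frame))
  print(packed.shape, int(packed.max()))            # (1108, 1433) and <= 4095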

Proxies That Match Masters

Here's the perhaps surprising observation: we have not observed significant differences between videos rendered from PACK files and videos rendered from the full 16-bit TIFFs. The 16-bit masters exist for archival completeness, but in practice the PACK proxies produce equivalent results.

This is why we confidently use PACK files as our rendering source rather than treating them purely as previews.


Why Custom Timing Tools (Not DaVinci Resolve)

Another common question: why not use industry-standard grading software like DaVinci Resolve? The answer involves our specific requirements for archival work.

What "Timing" Means Here

Our timing adjustments are specialized for grayscale archival work—all Hearst newsreels are black and white. The Review application provides these controls:

Exposure

Per-shot exposure adjustments, including gamma (typically 0.5–3.0), used to compensate for faded film.

Crop and Position

Per-shot crop rectangles and position nudges (the shot and nudge entries in the edit files).

Hardware Support

The Review app supports dedicated hardware controllers (Griffin PowerMate, Contour ShuttlePro) for efficient frame-by-frame work, which is essential when timing thousands of frames per roll.

Versionable Edit Lists

Our edit files are plain text, one line per adjustment:

<negative>
1000 shot (0, 0, 1431, 1107) exp (2.00,0,4095)
1193 shot (175, 81, 1387, 971) exp (0.63,578,4095)
1194 nudge (175,88)

This format enables reproducible renders, meaningful diffs and history in version control, and fully automated batch processing.
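
For illustration, a small parser for the format shown above. Only the line syntax comes from the example; the meaning of the individual numbers (the shot rectangle, the exp triple, the nudge offsets) is not spelled out in this document, so the parser treats them as opaque tuples.

  # Illustrative parser for the plain-text edit list shown above. Only the line
  # syntax comes from the example; field semantics are left as opaque tuples.
  import re
  from dataclasses import dataclass, field

  @dataclass
  class Adjustment:
      frame: int
      ops: dict = field(default_factory=dict)   # e.g. {"shot": (...), "exp": (...)}

  def parse_edit_file(text: str) -> list[Adjustment]:
      """Parse a '<negative>' header plus 'FRAME op (args) [op (args) ...]' lines."""
      adjustments = []
      for raw in text.splitlines():
          line = raw.strip()
          if not line or line.startswith("<"):   # skip header lines like <negative>
              continue
          frame_str, rest = line.split(None, 1)
          adj = Adjustment(frame=int(frame_str))
          for op, args in re.findall(r"(shot|exp|nudge)\s*\(([^)]*)\)", rest):
              adj.ops[op] = tuple(float(a) for a in args.split(","))
          adjustments.append(adj)
      return adjustments

  sample = "<negative>\n1000 shot (0, 0, 1431, 1107) exp (2.00,0,4095)\n1194 nudge (175,88)"
  for adj in parse_edit_file(sample):
      print(adj.frame, adj.ops)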

Why Not Resolve?

Resolve | Our System
Designed for color (RGB) | Optimized for grayscale (single channel)
Binary project files | Plain text edit lists
Manual timeline workflow | Frame-number based, automatable
Interactive grading session | Batch overnight processing
Proprietary format lock-in | Simple, archival-friendly format

For a color film project with complex grading needs, Resolve would be the right choice. For grayscale archival work requiring reproducibility, version control, and automation, our text-based system is more appropriate.

The Workflow

  1. Review application: Staff adjust timing with live preview using our macOS tool
  2. Save edits: Timing decisions stored as text files
  3. Version control: Edits committed to Git
  4. Overnight encoding: Automated pipeline reads edit files, renders videos
  5. Re-render on change: Edit file updates trigger automatic re-encoding

Film Stock Presets (NNews/PNews)

When scanning film, the Scanity uses "film stock" presets to configure illumination levels. For Hearst newsreels, we use custom presets called NNews (negatives) and PNews (positives) with offsets from +05 to +50.

Quick Summary

Aspect | Details
Mechanism | Adjusts PrinterLightsMax green channel value
Formula (4K) | Green PLMax = 67 - offset (e.g., NNews+35 → green = 32)
Standard resolution | 4300×3324 (99.7% of scans)
Scale | ~12 points = 1 F-stop of exposure change
Range | +05 to +48/50 (~3.5 stops total)
Curves applied | None — all scans are linear (Characteristic = LIN)
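
The Formula and Scale rows translate directly into a couple of helpers; a sketch assuming the 4K formula and the ~12-points-per-stop figure from the table above:

  # Helpers derived from the table above (4K formula, ~12 offset points per stop).
  def green_plmax(offset: int) -> int:
      """Green PrinterLights max value for an NNews/PNews offset (4K formula)."""
      return 67 - offset

  def stops_between(offset_a: int, offset_b: int) -> float:
      """Approximate exposure difference, at ~12 offset points per F-stop."""
      return (offset_b - offset_a) / 12.0

  print(green_plmax(35))        # NNews+35 -> 32
  print(stops_between(5, 50))   # +05 to +50 -> 3.75, i.e. the ~3.5-stop range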

Key Finding

Operators are capturing linear, ungraded data. The NNews/PNews presets perform exposure compensation only — no color grading occurs during scanning. This means post-processing workflows must apply appropriate log/gamma curves during color correction.
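
As a minimal illustration of what applying a gamma curve to this linear data looks like, the sketch below maps normalized 12-bit values through a simple power-law curve; the actual log/gamma curves used downstream are not specified in this document.

  # Minimal illustration of applying a gamma curve to linear 12-bit scan data.
  # The actual curves used in post-processing are not specified here.
  import numpy as np

  def apply_gamma(linear12: np.ndarray, gamma: float) -> np.ndarray:
      """Map linear 12-bit values (0..4095) through a power-law gamma curve."""
      normalized = linear12.astype(np.float64) / 4095.0
      graded = normalized ** (1.0 / gamma)       # gamma > 1 lifts the midtones
      return np.round(graded * 4095.0).astype(np.uint16)

  linear = np.arange(0, 4096, 1024, dtype=np.uint16)   # a few linear sample values
  print(apply_gamma(linear, 2.0))                      # midtones lifted vs. input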

Note: If a scan appears too dark or washed out in the final video, the issue is in the edit file timing, not the film stock preset. The preset choice is recorded in the scan metadata for reference but doesn't need to be "corrected" after scanning.

Key Storage Locations

Location | Purpose | Path
rhogos | Working storage for active scans | /Volumes/rhogos/scans/
ARK drives | Long-term archive (ARK-001 through ARK-xxx) | /Volumes/ARK-xxx/
Barn | Packed scans ready for rendering | /Volumes/kairon/barn/
Video | Rendered MP4 files | /Volumes/kairon/video/

Database Tables Updated

Table | Database | Purpose
SCAN_LOCATIONS | hearst_webapp | Tracks raw scan locations
PACK_LOCATIONS | hearst_webapp | Tracks packed scan locations
REVIEW | hearst | QC queue for new packs
SOUND | hearst_webapp | Audio file metadata

Troubleshooting: Manual Intervention

If scans get stuck at any stage, they can be manually advanced:

  1. Stuck in NR: Run mv-complete-nrs manually
  2. Archived but not packed: Run scan-pack on the archive volume, then cat-scans on barn
  3. Packed but not in REVIEW: Run cat-scans on barn (updates both PACK_LOCATIONS and REVIEW)
  4. Not rendering: Check render service at 10.4.15.216:8020
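
For step 4, a quick reachability check against the documented address can rule out basic network problems; this only verifies that the TCP port is open (the render service's API itself is not described here).

  # Reachability check for the render service at 10.4.15.216:8020. This only
  # verifies the TCP port is open; the service's API is not documented here.
  import socket

  def render_service_up(host: str = "10.4.15.216", port: int = 8020) -> bool:
      """Return True if a TCP connection to the render service succeeds."""
      try:
          with socket.create_connection((host, port), timeout=3):
              return True
      except OSError:
          return False

  print("render service reachable:", render_service_up())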

Glossary

Term | Meaning
NR | Newsreel - the identifier for each film roll (e.g., NR021621)
Roll | A complete reel of film, typically 10-20 minutes, containing multiple news stories
Story | An individual news segment within a roll, identified by Volume/Issue/Story numbers
Scan | The raw digital capture of a film roll from the Scanity scanner
Pack (.pck) | Custom binary format storing 12-bit, 3× downsampled frames (7 per file). 92% smaller than source TIFFs with no visible quality loss
ARK | Archive drive - offline storage for long-term preservation (ARK-001, ARK-047, etc.)
PARK | Parity ARK - a redundant copy of an ARK drive
Barn | The directory holding packed scans ready for video encoding (/Volumes/kairon/barn/)
rhogos | Network volume for working storage - where active scans are processed
kairon | Network volume hosting barn, video output, and other shared resources
Beta | The internal preview website where staff review and mark videos
HLS | HTTP Live Streaming - adaptive video format that adjusts quality to bandwidth
Edits | Text files containing crop, exposure, and timing instructions for each roll
FLAC | Lossless audio format used for archival sound (converted from WAV)
In/Out points | The start and end timecodes marking a story within a roll
-c suffix | Indicates audio sourced from a different element than the NR it's named after (takes priority over standard version)
AEO Light | Software for decoding audio from negative density optical tracks. Applies compensation for the gamma mismatch that occurs when reading negative tracks directly (they were designed for photochemical printing, not direct reading)
Optical track | The soundtrack printed along the edge of the film, read optically during playback or scanning
Composite print | A film print containing both picture and soundtrack on the same strip of film
Scanity | The film scanner (by Digital Film Technology) that captures 16-bit grayscale TIFFs at 4300×3324 per frame, plus optical audio
Timing | The process of adjusting exposure, contrast, and cropping for each shot. Named after the historical practice of controlling print exposure "times" in the lab
Gamma | The non-linear brightness curve applied to an image. Higher gamma brightens midtones; lower gamma darkens them. Our range is typically 0.5–3.0

Generated from workflow-archive-workspace/docs/workflow-flowchart.md