====== Building an Air Gap Server ======

**Purpose:** Secure long-term backup storage isolated from network threats\\
**Key Requirements:** Full disk encryption, physical security, validated data transfer\\
**Target Environment:** FreeBSD with ZFS and GELI encryption\\
**Status:** Powered off when not actively receiving updates

===== Overview =====

An **Air Gap Server** is a server that operates without network connectivity to protect critical backup data from remote attacks. A modified approach allows temporary network access for system updates while maintaining security boundaries.

The primary purpose is to store long-term backups with a significantly reduced attack surface. Physical isolation combined with encryption provides defense in depth against:

  * Remote network attacks (ransomware, unauthorized access)
  * Physical theft or unauthorized access
  * Compromised source systems

**Critical:** If the server must be stored in an unsecured location, full disk encryption is **mandatory**, not optional.
===== Core Security Principles =====

  * **Physical Isolation** — Geographic separation from primary servers with controlled access
  * **Defense in Depth** — Multiple security layers protect data at rest and in transit
  * **Data Validation** — Verify integrity of all data transfers and scripts
  * **Automated Reporting** — Track all operations for audit and monitoring
  * **Powered Off by Default** — Server only active during updates or maintenance

===== Implementation Guidelines =====

==== Physical Security and Access Control ====

**Ideal Configuration:**

  * Store in secure facility requiring authenticated access (e.g., Network Operations Center)
  * Geographic separation from primary production servers
  * Documented access procedures and audit logs

**Fallback for Insecure Locations:**

When secure facilities are unavailable:

  * **Mandatory:** Full disk encryption (''GELI'' or equivalent)
  * **Mandatory:** Documented key management procedures
  * **Recommended:** Physical locks, tamper-evident seals
  * **Recommended:** Motion detection or access logging

Store encryption keys in a different physical location than the server. Consider splitting keys across multiple secure locations.

==== Encryption Strategy ====

This implementation uses FreeBSD with ''GELI'' disk encryption backing a ''ZFS'' filesystem.

**At Rest Protection:**

  * Full disk encryption using ''GELI'' (minimum requirement)
  * All data pools encrypted with strong passphrases or key files
  * Keys never stored on the server itself

**Split-Key Architecture:**

For enhanced security, consider using split-key encryption where the final encryption key is derived from combining two separate key components.
This enhances security because the combined GELI key itself can be kept only in secure off-site storage: it cannot be reconstructed without both components.

  * **Two-Operator Model:** Each key component held by a different operator
    * Requires both operators present to unlock encrypted data
    * Maximum security: no single person can access data alone
    * Higher operational overhead
  * **Operator + Automated Model:** One key with operator, one on server
    * Operator key: Physically carried by trusted operator
    * Server key: Stored on automated script (never on target server)
    * Keys combined via XOR or similar operation at decrypt time
    * Balances security with automation needs

**In Transit Protection:**

  * Transport media (external drives) fully encrypted
  * Delta data encrypted before writing to transport media
  * Encryption keys validated at both source and destination

**Example GELI Setup:**

<code bash>
# Generate a random key file (512 bytes = 4096 bits of key material)
openssl rand 512 > /secure/path/geli.key
chmod 400 /secure/path/geli.key

# Initialize GELI on the disk using only the key file:
#   -P  do not use a passphrase
#   -s  sector size in bytes (not a key size)
#   -l  key length in bits
geli init -P -s 4096 -l 256 -K /secure/path/geli.key /dev/ada0

# Attach encrypted device (-p: do not prompt for a passphrase)
geli attach -p -k /secure/path/geli.key /dev/ada0

# Create ZFS pool on encrypted device
zpool create backup /dev/ada0.eli
</code>

The ''-s 4096'' option sets the GELI sector size to 4096 bytes; it is not the key length. The key length is set with ''-l'' (256 bits here), and the 512-byte key file is key material used to unlock the volume, not a 4096-bit cipher key. Store the key file securely and back it up to a separate location. ''-P'' on ''geli init'' and ''-p'' on ''geli attach'' mean "no passphrase"; omit them to require a passphrase in addition to the key file.
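The split-key combination described in this section can be sketched in plain ''sh''. This is only an illustration under stated assumptions: the function name ''xor_hex'' and the sample component values are hypothetical, both components are assumed to be hex strings of equal length, and nothing here is taken from the actual deployment scripts.

```sh
#!/bin/sh
# Sketch only: combine two key components with XOR.
# xor_hex is a hypothetical helper; the inputs are assumed to be
# equal-length hex strings (two hex digits per byte).

xor_hex() {
    a=$1 b=$2 out=""
    # Refuse mismatched or odd-length inputs rather than silently truncating.
    if [ "${#a}" -ne "${#b}" ] || [ $(( ${#a} % 2 )) -ne 0 ]; then
        echo "xor_hex: inputs must be equal-length hex strings" >&2
        return 1
    fi
    while [ -n "$a" ]; do
        # Peel one byte (two hex digits) off the front of each string.
        rest_a=${a#??}; byte_a=${a%"$rest_a"}
        rest_b=${b#??}; byte_b=${b%"$rest_b"}
        out="$out$(printf '%02x' $(( 0x$byte_a ^ 0x$byte_b )))"
        a=$rest_a b=$rest_b
    done
    printf '%s\n' "$out"
}

# Made-up component values for illustration (not real keys).
server_hex=00ff10a5
operator_hex=0f0f10ff
xor_hex "$server_hex" "$operator_hex"   # prints 0ff0005a
```

Because XOR is its own inverse, applying the same component twice returns the original value, which is why neither component alone reveals the combined key. A binary key file can be turned into a hex string with ''od -An -tx1 file | tr -d ' \n''', and the combined hex string would still need converting back to binary before being handed to ''geli attach -k''.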
==== Data Transfer Validation ====

**Transport Media Requirements:**

  * Large capacity drives (match expected delta sizes)
  * Encrypted filesystem (''GELI'', ''LUKS'', or BitLocker) or encryption of individual files in transit
  * Labeled with GPT labels for automated mounting

**Delta Monitoring:**

Monitor transfer sizes to detect anomalies:

  * Establish baseline delta sizes for normal operations
  * Alert on deltas exceeding 150-200% of baseline
  * **Large deltas may indicate ransomware on source system**

**Data Integrity Verification:**

<code bash>
# On source: write the stream to the transport drive and record its checksum
zfs send pool/dataset@snapshot | tee /mnt/transport/delta.zfs | sha256 > /mnt/transport/delta.zfs.sha256

# On air gap server: recompute and compare against the recorded checksum
[ "$(sha256 -q /mnt/transport/delta.zfs)" = "$(cat /mnt/transport/delta.zfs.sha256)" ] \
    || { echo "Checksum mismatch - aborting" >&2; exit 1; }
</code>

**Data Validation:**

  * All data encrypted with symmetric key at source
  * Decryption failure automatically rejects the data
  * Failed decryption indicates corruption or tampering
  * Process terminates on any decryption failure

==== Script Validation and Maintenance ====

Air gap servers require special consideration for maintenance since they lack network access for updates.
**Validated Script Execution:**

Scripts may be deployed to perform maintenance tasks:

  * ''ZFS'' scrubs and pool health checks
  * Snapshot cleanup and rotation
  * SMART disk monitoring
  * System updates (if temporarily networked)

**Script Deployment Process:**

  - Scripts stored on source server and version controlled
  - Scripts encrypted with symmetric key before transfer
  - Air gap server must successfully decrypt before execution
  - Decryption failure prevents script execution and terminates process
  - Scripts run automatically during replication operations

**Example Script Encryption/Decryption:**

<code bash>
# On source server: encrypt script
# (-pbkdf2 requires OpenSSL 1.1.1 or later; use the same options on both sides)
openssl enc -aes-256-cbc -pbkdf2 -salt -in cleanup_script.sh \
    -out cleanup_script.sh.enc -pass file:/secure/transport.key

# On air gap server: decrypt, then execute only if decryption succeeded
if openssl enc -aes-256-cbc -pbkdf2 -d -in cleanup_script.sh.enc \
    -out cleanup_script.sh -pass file:/secure/transport.key; then
    sh cleanup_script.sh
else
    echo "Decryption failed - aborting" >&2
    exit 1
fi
</code>

**Security through decryption:** Scripts that cannot be decrypted with the correct symmetric key are rejected. Any decryption failure terminates the entire process to prevent execution of potentially tampered scripts.
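AES in CBC mode does not authenticate the ciphertext, so a successful decrypt is not by itself proof that a script is unmodified. One possible hardening, sketched here and not part of the documented workflow, is to add an HMAC over the encrypted file and check it before decrypting. The function names and the separate MAC key file are hypothetical, and ''-pbkdf2'' assumes OpenSSL 1.1.1 or later.

```sh
#!/bin/sh
# Sketch only: encrypt-then-MAC around the openssl enc step.
# All function names and the second (MAC) key file are assumptions.

# hmac_of KEYFILE FILE: print the hex HMAC-SHA256 of FILE.
hmac_of() {
    # The key is passed on the command line, so it is briefly visible in
    # the process list; tolerable only on a single-operator host.
    openssl dgst -sha256 -hmac "$(cat "$1")" < "$2" | awk '{print $NF}'
}

# seal_script ENCKEY MACKEY SCRIPT: write SCRIPT.enc and SCRIPT.enc.hmac.
seal_script() {
    openssl enc -aes-256-cbc -pbkdf2 -salt -in "$3" -out "$3.enc" \
        -pass "file:$1" || return 1
    hmac_of "$2" "$3.enc" > "$3.enc.hmac"
}

# open_script ENCKEY MACKEY ENCFILE OUT: verify the tag, then decrypt.
open_script() {
    expected=$(cat "$3.hmac") || return 1
    actual=$(hmac_of "$2" "$3") || return 1
    if [ -z "$actual" ] || [ "$actual" != "$expected" ]; then
        echo "HMAC mismatch - refusing to decrypt $3" >&2
        return 1
    fi
    openssl enc -aes-256-cbc -pbkdf2 -d -in "$3" -out "$4" -pass "file:$1"
}
```

On the source this would be called as ''seal_script /secure/transport.key /secure/transport_mac.key cleanup_script.sh'', and on the air gap server as ''open_script /secure/transport.key /secure/transport_mac.key cleanup_script.sh.enc cleanup_script.sh'' before running the script; the MAC key path is a placeholder. Checking the tag first means a modified file is rejected before any of it is decrypted or executed.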
==== Reporting and Audit Trail ====

**Reporting Challenges:**

  * Air gap servers cannot send email reports
  * No network access for remote monitoring
  * Reports must be physically retrieved

**Solution — Report Drive:**

  * Dedicated removable media for reports (USB drive, small HDD)
  * Reports written to report drive after each operation
  * Administrator retrieves and processes reports manually

**Report Contents:**

  * Timestamp of operation
  * Data volumes transferred (size, snapshot names)
  * Success/failure status of each operation
  * Disk health (SMART status, ZFS pool health)
  * Script execution results
  * Any errors or warnings

**Example Report Structure:**

<code>
=== Air Gap Backup Report ===
Date: 2026-01-18 03:00:00
Operation: Incremental Backup
Source: production.example.com
Target: airgap-backup01

Datasets Processed:
  - pool/data: 45.2 GB transferred
    Latest: pool/data@2026-01-18_02:00:00
  - pool/databases: 12.8 GB transferred
    Latest: pool/databases@2026-01-18_02:00:00

Pool Health: ONLINE
Disk Status: All disks PASSED SMART checks

Maintenance Scripts Executed:
  - snapshot_cleanup.sh: SUCCESS (removed 3 old snapshots)
  - zfs_scrub.sh: SUCCESS (no errors found)

System Shutdown: 2026-01-18 03:45:00
Next Expected Update: 2026-01-25
</code>

==== Power Management ====

**Default State: Powered Off**

The air gap server should remain powered off except during:

  * Scheduled data imports
  * Manual maintenance operations
  * Security audits

**Benefits of Power-Off Strategy:**

  * Encrypted drives are locked (keys in memory are cleared)
  * Eliminates risk of remote exploitation during off time
  * Reduces hardware wear and power consumption
  * Limits window of opportunity for physical attacks

**Automated Shutdown:**

The final script in the maintenance chain should power off the system:

<code bash>
#!/bin/sh
# Final maintenance script - shutdown system

# Verify all operations completed successfully
if [ -f /var/run/backup_complete ]; then
    # Write final report
    echo "Backup completed successfully at $(date)" >> /mnt/report/status.log

    # Sync all filesystem buffers
    sync

    # Unmount transport media
    umount /mnt/transport
    umount /mnt/report

    # Power off system
    shutdown -p now
else
    echo "ERROR: Backup did not complete. Manual intervention required." >> /mnt/report/error.log
    # Do NOT shut down - leave powered on for troubleshooting
fi
</code>

**Do not** configure automatic shutdown if backups fail. A powered-on system indicates problems requiring manual investigation.

===== Example Workflow =====

A typical weekly backup cycle:

**Day 1 (Monday) — Source Server:**

  - Automated script takes ZFS snapshots of all datasets
  - Calculates incremental changes since last backup
  - Encrypts delta data to transport drive with symmetric key
  - Encrypts maintenance scripts with same symmetric key
  - Operator notified that transport drive is ready

**Day 2 (Tuesday) — Physical Transport:**

  - Operator removes transport drive from source server
  - Drive physically transported to air gap location
  - Transport logged in access control system

**Day 3 (Wednesday) — Air Gap Server:**

  - Operator inserts transport drive and powers on server
  - Server boots, mounts transport drive
  - Automated script begins:
    * Attempts to decrypt delta files with symmetric key
    * Validates delta sizes against baseline
    * Imports ZFS datasets (decryption happens during import)
    * Attempts to decrypt and run maintenance scripts
    * Any decryption failure terminates the entire process
    * Generates report to report drive
    * Powers off system (only if all operations succeed)
  - Operator retrieves report drive for later review

**Day 4 (Thursday) — Report Processing:**

  - Operator reviews reports from air gap server
  - Verifies all backups completed successfully
  - Archives reports for audit trail
  - Updates monitoring dashboard

**Day 8 (Next Monday):**

  - Process repeats with fresh delta data

===== Pre-Implementation Checklist =====

  * [ ] Physical Security
    * [ ] Secure location identified and documented
    * [ ] Access procedures established
    * [ ] Key storage locations determined
  * [ ] Hardware
    * [ ] Air gap server procured and tested
    * [ ] Transport drives procured (minimum 2 for rotation)
    * [ ] Report drive procured
    * [ ] All drives labeled appropriately
  * [ ] Encryption
    * [ ] GELI encryption configured and tested
    * [ ] Encryption keys generated and stored securely
    * [ ] Key recovery procedures documented
    * [ ] Transport drives encrypted
  * [ ] Software
    * [ ] FreeBSD installed and hardened
    * [ ] ZFS pools created and tested
    * [ ] Replication scripts developed and tested
    * [ ] Maintenance scripts developed and tested
    * [ ] Symmetric transport keys generated and deployed
  * [ ] Procedures
    * [ ] Backup schedule documented
    * [ ] Transport procedures documented
    * [ ] Report review procedures documented
    * [ ] Key rotation schedule established
    * [ ] Disaster recovery plan created
  * [ ] Testing
    * [ ] Full backup cycle tested end-to-end
    * [ ] Recovery procedures tested
    * [ ] Failure scenarios tested
    * [ ] Report generation verified
    * [ ] Automated shutdown verified

===== Security Considerations =====

**Threat Model:**

This design protects against:

  * ✓ Remote network attacks (ransomware, unauthorized access)
  * ✓ Compromised source systems
  * ✓ Physical theft (with encryption)
  * ✓ Unauthorized physical access (with encryption)

This design does NOT fully protect against:

  * ✗ Sophisticated attackers with physical access and unlimited time
  * ✗ Compromised encryption keys
  * ✗ Attacks on the transport process itself
  * ✗ Insider threats with authorized access

**Best Practices:**

  * Rotate encryption keys annually
  * Test recovery procedures quarterly
  * Review audit logs monthly
  * Update maintenance scripts as needed
  * Keep offline backups of critical configuration

===== Troubleshooting =====

**Common Issues:**

^ Problem ^ Symptom ^ Solution ^
| Transport drive not mounting | Server unable to find ''/dev/gpt/label'' | Verify GPT label, check dmesg for device detection |
| Decryption fails | OpenSSL reports bad decrypt error | Verify correct symmetric key in use, check file integrity, investigate potential tampering or corruption |
| Large delta size | Delta exceeds baseline by 200%+ | **Do not import** — investigate source system for compromise or legitimate growth |
| Server won't shut down | Remains powered on after backup | Check ''/var/run/backup_complete'' flag, review error logs on report drive |
| ZFS pool won't import | Import command fails | Verify encryption key and that the ''.eli'' device is attached, list importable pools with ''zpool import''; use recovery mode (''zpool import -F'') only as a last resort |

===== Real-World Implementation =====

==== Client Requirements ====

A production deployment required the following specifications:

^ Requirement ^ Implementation ^
| Replication Schedule | Monthly updates from in-house backup server to air gap server |
| Transport Media | 3× 1.9TB SSD drives in rotation |
| Drive Rotation | One at source, one at target, one in transit — minimizes site visits |
| Security Model | Multi-layer encryption with split-key architecture |
| Location | Air gap server in unsecured location (mandatory encryption) |
| Automation | Fully automated with maintenance script execution |

==== Security Architecture ====

**Encryption Layers:**

  - **At Rest (Target):** ''GELI'' full disk encryption on air gap server
  - **In Transit:** Symmetric key encryption for all data on transport drives
  - **Maintenance Scripts:** Encrypted with same symmetric key
  - **Split-Key Design:** Target ''GELI'' key derived from:
    * Server-resident key component (stored locally)
    * Operator-carried key component (physical transport)
    * Combined via XOR bitwise operation at decrypt time
    * Target GELI key stored securely to facilitate key rotation and recovery

**Split-key advantage:** Neither component alone can decrypt the air gap server. Compromise of a single key component (server or operator) does not expose data.

==== Implementation Scripts ====

Custom automation scripts handle the complete workflow.
Source code is available via Subversion:

**Repository URL:** ''http://svn.dailydata.net/svn/zfs_utils/trunk''\\
**Sub-project:** ''sneakernet''

**Export the project:**

<code bash>
mkdir -p /usr/local/opt
svn export http://svn.dailydata.net/svn/zfs_utils/trunk /usr/local/opt/zfs_utils
</code>

**Source Server Workflow:**

  - Auto-detect operating mode (source vs. target)
  - Mount transport drive using GPT label detection
  - Verify transport drive processed by target (check status file)
  - Securely erase previous data from transport drive
  - Calculate incremental ZFS replication stream
  - Encrypt and write replication data to transport drive
  - Record latest snapshots sent (update status file)
  - Encrypt and write maintenance scripts to transport drive
  - Unmount transport drive
  - Email completion report to administrators

**Target Server Workflow:**

  - Mount transport drive
  - Detect operator-provided secure key (USB/separate media)
  - Combine server key with operator key (XOR operation)
  - Unlock ''GELI'' encrypted disks using combined key
  - Import ZFS pool
  - Save current snapshot list to state file (enable rollback if needed)
  - Decrypt and import replication streams from transport
  - Collect system statistics (pool health, disk status, capacity)
  - Decrypt and execute maintenance scripts
  - Generate detailed report and write to report drive
  - Unmount all media
  - Power off system

<code bash>
# Example: simplified detection logic.
# sneakernet does this automatically, so this step is not necessary;
# it only shows the logic used.
HOSTNAME=$(hostname -s)

if [ "$HOSTNAME" = "backup-source" ]; then
    # Source mode
    /usr/local/sbin/sneakernet --mode=source
elif [ "$HOSTNAME" = "airgap-target" ]; then
    # Target mode
    /usr/local/sbin/sneakernet --mode=target
else
    echo "ERROR: Unknown host" >&2
    exit 1
fi
</code>

==== Three-Drive Rotation Strategy ====

The three-drive rotation minimizes operational overhead:

**Normal Operation Cycle:**

^ Month ^ Drive A ^ Drive B ^ Drive C ^ Action Required ^
| 1 | At Source (ready) | At Target | In Transit to Target | Operator: Deliver Drive C to target |
| 2 | At Source (ready) | In Transit to Source | At Target (ready) | Operator: Collect Drive B from target |
| 3 | In Transit to Target | At Source (ready) | At Target | Operator: Deliver Drive A to target |
| 4 | At Target | At Source (ready) | In Transit to Source | Operator: Collect Drive C from target |

**Benefits:**

  * Each site visit handles both delivery and pickup
  * No waiting time for drive processing
  * Reduced frequency of site access (security benefit)
  * Built-in offline backup (data exists on multiple drives)

==== Key Management Strategy ====

**Symmetric Transport Key:**

  * Unique key per deployment
  * Stored on both source and target servers
  * Used to encrypt data and scripts on transport drives

**Split GELI Key:**

  * Server component: Stored on target server (never leaves facility)
  * Operator component: Carried by trusted operator (never stored at target)
  * Combined at runtime via XOR: ''final_key = server_key ⊕ operator_key''

**Key Rotation Procedures:**

If a transport drive is compromised:

<code bash>
# Generate new symmetric key
openssl rand 32 | xxd -p | tr -d '\n' > /secure/path/new_transport.key

# Deploy as maintenance script on next run
# Old data on compromised drive remains encrypted with old key
</code>

If the operator key is compromised, retrieve the GELI key from secure storage in hex format, then run the following commands:

<code bash>
# Generate a new operator key and a hex copy of it
openssl rand 32 > operator.key
xxd -p -c 999 operator.key > operator.key.hex

# Retrieve server.key from secure storage and convert it to hex the same way
# (server.key.hex). The Perl one-liner below XORs two equal-length hex strings
# and prints the result in hex; use xxd -r -p to convert back to binary.
# This is not tested.
perl -e '
    my ($x, $y) = @ARGV;
    # XOR one byte (two hex digits) at a time
    print join("", map {
        sprintf("%02x", hex(substr($x, $_ * 2, 2)) ^ hex(substr($y, $_ * 2, 2)))
    } 0 .. (length($x) / 2 - 1)), "\n";
' "$(cat operator.key.hex)" "$(cat server.key.hex)"

# Operator must use new key on next visit after updating the key on the air gap server
# If server.key is ever lost, must
</code>

**Key rotation can be automated** through maintenance scripts. New keys are deployed during normal replication cycles without requiring emergency site visits.

==== Operational Benefits ====

This implementation balances security with operational efficiency:

**Security Advantages:**

  * No single point of key compromise
  * Lost transport drive: data remains encrypted
  * Lost operator key: server data still protected
  * Automated key rotation capability
  * Audit trail via detailed reports

**Operational Advantages:**

  * Minimal site visits (monthly vs. weekly)
  * No waiting time for processing
  * Fully automated operation (no manual commands)
  * Email reports from source (connected)
  * Physical reports from target (air-gapped)
  * Automated maintenance without network access

===== References =====

  * [[https://docs.freebsd.org/en/books/handbook/disks/#disks-encrypting-geli|FreeBSD GELI Encryption Documentation]]
  * [[https://docs.freebsd.org/en/books/handbook/zfs/|FreeBSD ZFS Administration Guide]]
  * [[https://www.openssl.org/docs/|OpenSSL Documentation]] — For symmetric encryption operations
  * [[https://www.nist.gov/publications/guide-storage-encryption-technologies-end-user-devices|NIST Storage Encryption Guide]]

===== Related Documentation =====

  * [[zfs_replication_scripts|ZFS Replication Scripts]] — Automated snapshot and transfer scripts
  * [[key_management_procedures|Encryption Key Management]] — Key generation, storage, and rotation
  * [[incident_response_airgap|Air Gap Incident Response Plan]] — What to do if compromise suspected
  * [[disaster_recovery_procedures|Disaster Recovery Procedures]] — Restoring from air gap backups