Log Rotation Strategies

Log rotation is the unglamorous job that keeps a busy web server alive: it caps the size of active log files, compresses what is finished, hands the web server a fresh file to write to, and deletes what has aged out — all without dropping a single request or breaking the pipeline that reads those logs. Get it wrong and you get one of two failures, both bad for SEO. Either the disk fills and the server starts returning 5xx to Googlebot, or rotation truncates a file while the server still holds the handle and you silently lose hours of crawl data. This guide builds a production-ready rotation workflow that avoids both, and ties the file mechanics to the log retention policies that decide how many rotated files to keep.

The core decision is how you hand the server its new file: the safe create plus postrotate reload path, or the riskier copytruncate path for daemons that cannot reopen their own log handle. We will configure both, show the exact failure each one prevents, verify rotation actually fired, and give a troubleshooting runbook for permission drift, dead inodes, and overlapping I/O.

  • Choose create/postrotate versus copytruncate with eyes open
  • Configure size- and time-triggered rotation that survives traffic spikes
  • Verify a rotation cycle and confirm the shipper followed the new inode
  • Recover from the permission, inode, and I/O failures rotation causes

The Rotation Lifecycle and the create-vs-copytruncate Fork

Every rotation cycle is the same five beats: the active log reaches a trigger (size or time), the file is rotated (renamed or copied aside), the old data is compressed, the server is told to start writing fresh, and files past the retention count are deleted. The one beat that carries real risk is "tell the server to start writing fresh," because that is where you either cleanly swap inodes or race the running daemon.

The diagram below shows the lifecycle and the fork at that step. The left branch, create + postrotate, renames the file (the server keeps writing to the now-renamed inode for a few milliseconds), then a USR1 signal or reload makes the server reopen and write to a brand-new file — no data lost, because nothing was truncated. The right branch, copytruncate, copies the file's contents aside and then truncates the original in place; any request logged in the gap between copy and truncate is lost. Use the left branch for Nginx and Apache, which both reopen on signal; reserve the right branch for daemons that cannot. The format details that determine how big each line is — and therefore how fast you hit a size trigger — are covered in Apache vs Nginx log formats.

[Figure: Rotation lifecycle and the safe-vs-risky fork. The active access.log reaches a trigger (size or daily), then forks. Left branch, create + postrotate: rename the old file, then USR1 / reload makes the server reopen a new file; no entries lost. Right branch, copytruncate: copy the contents aside, then truncate the original in place; entries written in the gap can be lost. Prefer the left branch for Nginx/Apache; use copytruncate only when the daemon cannot reopen its log.]
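The left branch can be rehearsed without a web server. The sketch below is a simulation only: file descriptor 3 stands in for the server's open log handle, and the function name and temporary paths are hypothetical. It shows why a rename loses nothing: the handle keeps pointing at the renamed inode until it is reopened.

```shell
# Simulation only: fd 3 plays the role of the web server's log handle.
demo_rename_reopen() (
    dir=$(mktemp -d)
    log="$dir/access.log"

    exec 3>>"$log"        # "server" opens its log in append mode
    echo "req 1" >&3
    mv "$log" "$log.1"    # rotation renames the file; the inode is unchanged
    echo "req 2" >&3      # still written, now into the renamed file
    exec 3>&-             # reload/USR1 equivalent: drop the old handle...
    exec 3>>"$log"        # ...and open a fresh file at the original path
    echo "req 3" >&3
    exec 3>&-

    # Count lines in each file; every request is accounted for.
    echo "rotated=$(grep -c '' "$log.1") active=$(grep -c '' "$log")"
    rm -rf "$dir"
)

demo_rename_reopen   # prints: rotated=2 active=1
```

All three requests survive: two in the rotated file, one in the new active file.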

Prerequisites

Before deploying a rotation config, confirm these are in place:

  • Root or sudo on the log host and the ability to edit /etc/logrotate.d/ and manage systemd timers or cron.
  • logrotate 3.14+ (logrotate --version) for dependable dateext and maxage.
  • A staging or low-traffic host to test postrotate signal lines before they run against production at 3 a.m.
  • Knowledge of your log shipper's rotation behavior. Filebeat and Vector track files by inode and pick up the new file at the original path; a naive tail -f stays attached to the old, renamed inode. This determines whether rotation breaks your pipeline.
  • A retention decision already made, since rotate and maxage counts come from your log retention policies, not from this file.

Rotation Architecture & Sizing Parameters

Rotation triggers come in two kinds and you want both. A time trigger (daily) gives predictable, evenly named files that are easy to correlate with crawl reports; a size trigger is the safety net that catches a flash sale or a viral spike before the disk fills. In logrotate, the size directive that works alongside a time trigger is maxsize (maxsize 500M); plain size replaces the time trigger instead of supplementing it. Used alone, each fails: size-only rotation produces oddly timed files that are hard to line up with a crawl-rate-by-hour analysis, and time-only rotation can let a single day's file grow to tens of gigabytes during a traffic event.

Step 1: Measure real daily growth before choosing thresholds. Do not guess; one combined-format line is roughly 150–200 bytes, but your actual mix of bots and humans determines the real number.

du -sh /var/log/nginx/access.log
echo "Lines today: $(wc -l < /var/log/nginx/access.log)"

Expected Output:

1.8G    /var/log/nginx/access.log
Lines today: 9847213
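Those two numbers are enough for a rough sizing check. The following is a minimal sketch: the helper names are hypothetical, the byte count is 1.8 GiB written out in full, and the 4x multiplier is an illustrative spike, not a measurement.

```shell
# Hypothetical helpers; the figures restate the example measurement above.
bytes_per_line() {
    # $1: file size in bytes, $2: line count (integer division)
    echo $(( $1 / $2 ))
}

cap_crossings() {
    # $1: bytes written in a day, $2: size cap in bytes
    echo $(( $1 / $2 ))
}

day_bytes=1932735283                  # about 1.8 GiB
cap_bytes=$(( 500 * 1024 * 1024 ))    # a 500M cap

bytes_per_line "$day_bytes" 9847213               # prints 196
cap_crossings $(( day_bytes * 4 )) "$cap_bytes"   # prints 14 for a 4x day
```

The crossing count is an upper bound on rotations, since logrotate can only react to size at the moments it is actually invoked.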

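Step 2: Pick triggers from the measurement. If a normal day is 1.8 GB, pair daily with a size cap so that a quiet day rotates once on schedule and a 4x spike rotates early on size instead of filling the disk. Two details decide whether that works. First, the directive that combines with a time trigger is maxsize; plain size overrides daily and rotates on size alone. Second, logrotate evaluates triggers only when it is invoked, so a size cap can fire between daily rotations only if logrotate runs more often than once a day (hourly, for example), and each same-day rotation then needs a distinct file name, for instance a dateformat that includes the hour. The table maps the common directives to what they do and when to reach for each.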

Directive | What it does | Use when
daily / weekly | Time-based trigger | Predictable cleanup, easy correlation with reports
maxsize 500M | Rotate when the file exceeds the size, even before the time interval elapses (plain size ignores the time trigger) | Safety net for unpredictable traffic spikes
rotate N | Keep N rotated files, then delete | Enforcing the retention count from your policy
maxage N | Delete rotated files older than N days | Hard age ceiling regardless of count
compress + delaycompress | gzip, but skip the newest rotation | Keep last rotation readable for shipping
dateext | Name files by date, not .1, .2 | Stable names for archival and audits

Production Warning: Never rely on daily alone during flash sales or viral events. A size trigger is the only thing standing between an unexpected 10x traffic day and a full disk that starts serving 5xx to crawlers. Set maxsize even if daily is your primary trigger, and run logrotate often enough (hourly, for example) that the cap is actually checked during the spike.

Core Configuration Implementation

Deploy the rotation rule with safe signal handling. The safe pattern is create (logrotate makes the new file with explicit ownership) plus a postrotate reload so the web server reopens its handle. Avoid copytruncate unless a daemon genuinely cannot reopen its log.

1. The standard safe configuration. Create /etc/logrotate.d/web-access:

/var/log/nginx/access.log /var/log/apache2/access.log {
    daily
    rotate 14
    maxsize 500M
    maxage 30
    missingok
    notifempty
    compress
    delaycompress
    create 0640 www-data adm
    dateext
    dateformat -%Y%m%d
    sharedscripts
    postrotate
        systemctl reload nginx   > /dev/null 2>&1 || true
        systemctl reload apache2 > /dev/null 2>&1 || true
    endscript
}

Expected Output: validate the syntax with a dry run.

sudo logrotate -d /etc/logrotate.d/web-access
reading config file /etc/logrotate.d/web-access
rotating pattern: /var/log/nginx/access.log ... after 1 days (14 rotations)
considering log /var/log/nginx/access.log

Safety Note: sharedscripts makes the postrotate block run once per cycle no matter how many files matched, preventing a double reload race on multi-service hosts. The || true on each reload means a host that runs only Nginx will not abort the whole rotation because the Apache reload "failed." With sharedscripts, a postrotate script that exits non-zero makes logrotate skip the remaining actions for every log in the block (compression and cleanup of old files among them), so the disk keeps filling; always make these lines non-fatal.

2. The copytruncate variant, for daemons that cannot reopen. Some application servers and older daemons hold their log handle and never reopen on signal. For those only, use copytruncate and accept the small race window.

/var/log/app/worker.log {
    daily
    rotate 7
    copytruncate
    compress
    delaycompress
    missingok
    notifempty
}

Explanation: copytruncate copies the file aside then truncates the original, so the daemon keeps writing to the same inode without a reload. Any line written in the microsecond gap between copy and truncate is lost — acceptable for low-rate worker logs, not for high-rate access logs.

Production Warning: Do not use copytruncate on a high-throughput access log. On a server logging thousands of lines per second, the copy-then-truncate gap reliably drops entries, corrupting exactly the crawl data you rotate logs to preserve. For Nginx and Apache, always use create + postrotate reload instead.
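The race is easy to reproduce by hand. This sketch is a simulation, not logrotate itself: fd 3 stands in for the daemon's handle, the function name and paths are hypothetical, and the "gap" write is placed deliberately between the copy and the truncate.

```shell
# Simulation only: reproduces the copy-then-truncate gap deterministically.
demo_copytruncate_gap() (
    dir=$(mktemp -d)
    log="$dir/worker.log"

    exec 3>>"$log"        # daemon handle, opened in append mode
    echo "line 1" >&3
    echo "line 2" >&3
    cp "$log" "$log.1"    # copytruncate step 1: copy the contents aside
    echo "line 3" >&3     # written in the gap, after the copy finished
    : > "$log"            # step 2: truncate in place (same inode)
    echo "line 4" >&3     # daemon keeps writing without reopening
    exec 3>&-

    echo "copied=$(grep -c '' "$log.1") active=$(grep -c '' "$log") written=4"
    rm -rf "$dir"
)

demo_copytruncate_gap   # prints: copied=2 active=1 written=4
```

Four lines were written but only three survive; "line 3" exists in neither file. On a real host the gap is short, but the loss scales with the write rate.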

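3. Schedule with a systemd timer for precise, off-peak execution. Replace ad-hoc cron with a timer so rotation runs at a known quiet hour. A timer activates the service unit of the same name by default, so logrotate-custom.timer needs a matching logrotate-custom.service that runs logrotate.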

# /etc/systemd/system/logrotate-custom.timer
[Unit]
Description=Run logrotate daily at 03:15

[Timer]
OnCalendar=*-*-* 03:15:00
AccuracySec=1min
Persistent=true

[Install]
WantedBy=timers.target
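The timer above triggers a service of the same name, which this guide does not show. A minimal sketch of one follows, written by a small helper so it can be tried in a scratch directory first. The helper name, the unit description, and the /usr/sbin/logrotate path are assumptions; confirm the binary path with command -v logrotate. Also check whether your distribution's packaged logrotate job already reads /etc/logrotate.d/, in which case this rule runs there too.

```shell
# Assumption: logrotate is installed at /usr/sbin/logrotate.
write_logrotate_service() {
    # $1: directory to write into (/etc/systemd/system on a real host)
    cat > "$1/logrotate-custom.service" <<'EOF'
[Unit]
Description=Run logrotate for the web access logs

[Service]
Type=oneshot
ExecStart=/usr/sbin/logrotate /etc/logrotate.d/web-access
EOF
}

# Usage sketch (needs root on a real host):
#   write_logrotate_service /etc/systemd/system
```

Type=oneshot suits a job that runs to completion and exits, which is how logrotate behaves.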

Expected Output: enable and confirm the next trigger.

sudo systemctl daemon-reload
sudo systemctl enable --now logrotate-custom.timer
sudo systemctl status logrotate-custom.timer
Active: active (waiting) since ...; Trigger: ... 03:15:00 UTC

Production Warning: Overlapping rotation with a backup or heavy analytics job creates severe I/O contention that degrades TTFB and can make crawlers back off. Confirm your backup window does not intersect 03:15, and watch iostat -x 1 during the first few cycles. Compression of multi-gigabyte files is CPU- and I/O-heavy; processing those files is where a tool like the one in parsing 10GB logs with Python and pandas efficiently earns its place, because it streams the compressed files instead of loading them whole.

Verification, Monitoring & Troubleshooting

Rotation that "looks configured" is not rotation that ran. Verify the cycle end to end and keep a runbook for the three failures rotation actually causes in production.

Step 1: Run a verbose dry run, then confirm the state file advanced. The dry run (-d) shows what would happen and changes neither the logs nor the state file; the state file's timestamp proves the last real rotation happened.

sudo logrotate -dv /etc/logrotate.d/web-access
grep "access.log" /var/lib/logrotate/status

Expected Output:

"/var/log/nginx/access.log" 2026-6-19-3:15:0
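A plain grep for access.log matches every path that contains that string, including the Apache log. For scripted checks, an exact match on the quoted path is safer. The helper below is a hypothetical sketch that assumes the state file layout shown above (quoted path, a space, then the timestamp) and log paths without spaces.

```shell
# Hypothetical helper: exact-match lookup in the logrotate state file.
last_rotation() {
    # $1: state file, $2: log path; prints the recorded timestamp, or nothing
    awk -v want="\"$2\"" '$1 == want { print $2 }' "$1"
}

# Usage sketch:
#   last_rotation /var/lib/logrotate/status /var/log/nginx/access.log
```

An empty result means logrotate has never recorded that path, which usually points at a config that was never loaded.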

Failure mode 1: Permission drift. After rotation the new file has the wrong owner or mode, so analytics tools or shippers lose read access.

sudo ls -l /var/log/nginx/access.log*

Detection: the new access.log is not 0640 www-data adm. Fix: set ownership explicitly in the create 0640 www-data adm directive rather than relying on the daemon's umask, then force one rotation to confirm.
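The same check can be scripted so it runs after every rotation rather than when someone remembers to look. This sketch assumes GNU coreutils stat (the -c format flag); the helper names are hypothetical.

```shell
# Assumes GNU stat; on BSD/macOS the format flag differs.
log_perms() {
    # $1: file; prints "mode owner group", e.g. "640 www-data adm"
    stat -c '%a %U %G' "$1"
}

perms_match() {
    # $1: file, $2: expected "mode owner group"; succeeds only on an exact match
    [ "$(log_perms "$1")" = "$2" ]
}

# Usage sketch:
#   perms_match /var/log/nginx/access.log "640 www-data adm" || echo "permission drift"
```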

Failure mode 2: Inode mismatch, shipper reading a dead file. Rotation succeeded but the shipper kept the old handle, so new lines never reach the pipeline.

sudo lsof +D /var/log/nginx/ | grep -i deleted

Detection: a (deleted) file held open by your shipper. Fix: ensure the shipper tracks files by inode and picks up the new file at the original path (Filebeat and Vector do); for manual tailing use tail -F (capital F), which reopens by name on rotation. Confirm the postrotate reload actually fired in the next check.
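As a complement to lsof, you can confirm that rotation really produced a new inode at the active path, which is the precondition for any shipper to follow it. A minimal sketch with hypothetical helper names:

```shell
# Hypothetical helpers: compare inode numbers of the active and rotated files.
inode_of() {
    # $1: file; prints its inode number (first field of `ls -i`)
    ls -i "$1" | awk '{ print $1 }'
}

rotation_swapped_inode() {
    # $1: active log, $2: most recent rotated file; succeeds when inodes differ
    [ "$(inode_of "$1")" != "$(inode_of "$2")" ]
}

# Usage sketch (the rotated file name depends on your dateformat):
#   rotation_swapped_inode /var/log/nginx/access.log "$rotated_file" || echo "same inode"
```

If both paths report the same inode, the rotation did not create a new file and the shipper has nothing new to follow.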

Failure mode 3: Rotation silently skipped. The timer or a postrotate failure stopped the cycle and the disk is filling.

sudo grep logrotate /var/log/syslog | tail -5
df -h /var/log; df -i /var/log

Detection: no recent success line, or space/inode use climbing. Fix: run sudo logrotate -f /etc/logrotate.d/web-access to recover immediately, then fix the root cause — most often a non-zero postrotate exit aborting the run, which is why the reload lines end in || true. How many files this leaves behind is governed by your retention numbers; reconcile against log retention policies and the deeper cold-tier design in log storage and archival best practices.
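Because a skipped rotation shows up first as climbing disk use, a numeric check is easier to alert on than reading df output by eye. This is a sketch under assumptions: the helper names and the 85 percent threshold in the usage line are illustrative, not recommendations from this guide.

```shell
# Hypothetical helpers built on POSIX `df -P` output.
disk_used_pct() {
    # $1: path; prints the used percentage of its filesystem as an integer
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

disk_over() {
    # $1: path, $2: threshold percent; succeeds when usage is at or above it
    [ "$(disk_used_pct "$1")" -ge "$2" ]
}

# Usage sketch:
#   disk_over /var/log 85 && echo "check whether rotation is running"
```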

Common Mistakes

  • Using copytruncate on a high-throughput access log. The copy-then-truncate gap drops entries under load, corrupting crawl analysis. Fix: use create + postrotate reload for Nginx/Apache; reserve copytruncate for low-rate daemons that cannot reopen their handle.
  • A postrotate line that can exit non-zero. One failing reload aborts the whole rotation, so the disk fills while you think rotation works. Fix: end reload commands with || true and test the exact line on staging.
  • Overlapping rotation with backups. Concurrent I/O spikes raise TTFB and make crawlers back off. Fix: stagger schedules with the systemd timer and verify with iostat -x 1.
  • Setting rotate too low. Aggressive deletion erases historical crawl trends and can breach a retention mandate. Fix: derive rotate/maxage from your retention policy, not a round number.
  • Relying on daily with no size safety net. A traffic spike fills the disk before the next scheduled rotation. Fix: always pair a time trigger with maxsize 500M (or a value sized to your measured daily growth), and run logrotate more than once a day so the cap can fire.

Frequently Asked Questions

How does log rotation impact search engine crawl budget?
Indirectly but materially. Poorly managed rotation lets disk fill or causes I/O contention during compression, which raises server response times; crawlers respond to slow or failing responses by reducing their request rate, so effective crawl budget shrinks. Smooth, off-peak rotation keeps response times flat and the crawl rate steady.

Should I rotate logs based on size or time?
Use both. A daily trigger gives predictable, easily correlated files; a maxsize 500M trigger is the safety net for spikes (plain size would override daily rather than complement it). Size-only rotation produces awkwardly timed files that are hard to align with crawl reports, and time-only rotation can produce an unmanageably large single-day file during a traffic event. The combination handles both the steady state and the surprise.

Can log rotation break real-time analytics pipelines?
Yes, if the shipper does not handle the file-descriptor change. With create mode plus a postrotate reload, the web server reopens the new file cleanly; the shipper must then pick up the new file at the original path (Filebeat and Vector track files by inode and do this) or use tail -F so it reopens by name on rotation. A naive tail -f keeps reading the old, now-rotated inode and silently stops ingesting new lines.

How do I verify that rotation executed successfully?
Check /var/lib/logrotate/status for an updated timestamp on the file, run logrotate -dv for a dry run that shows the planned actions, and grep /var/log/syslog for logrotate success or error lines. Confirm the new active file has the expected 0640 www-data adm permissions and that no shipper is holding a (deleted) inode.

Part of the Server Log Fundamentals & Compliance series.