Setting Up a GoAccess Real-Time Dashboard on Ubuntu
A GoAccess real-time dashboard turns a raw Nginx or Apache access log into a live HTML view that updates as requests arrive, so you can watch crawl activity, status-code spikes, and bot traffic the moment they happen instead of after a batch report. The catch is that a careless setup either silently parses nothing — empty graphs — or exposes a WebSocket port to the public internet. This guide deploys a persistent, hardened GoAccess WebSocket dashboard on Ubuntu 22.04/24.04 that survives log rotation and service crashes.
The objective is a systemd-managed daemon that ingests the combined log format correctly, streams live updates over a WebSocket bound to localhost, and isolates search-engine crawler traffic so you can read crawl budget signals directly. It builds on the broader log parsing workflows and CLI toolchains and pairs naturally with the Node.js and GoAccess integration patterns for alerting on top of the same stream.
Diagnosis: Empty Graphs and Port Conflicts
The two failure modes you hit on day one are an all-zero dashboard (the log format string does not match your log) and a refused WebSocket connection (something already owns the port). Confirm both before you write a service unit.
First, look at a real line so you can match the format exactly rather than guessing:
head -n 1 /var/log/nginx/access.log
Expected Output:
66.249.66.1 - - [15/Mar/2024:10:12:00 +0000] "GET /products HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
That +0000 timezone field and the bracketed timestamp are exactly where naive format strings break. Note the shape now; you will mirror it in log-format below.
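Next, confirm the default WebSocket port 7890 is free. If another process already holds it, GoAccess cannot start its WebSocket server, so the page loads but never receives live frames. The WebSocket is TCP, and sudo lets ss show the owning process for any user:
sudo ss -tlnp | grep ':7890'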
Expected Output:
(no output)
No output means the port is free. If a row prints, terminate the conflicting process or pick another port in the config. You can pair this quick check with the broader sweep techniques in the CLI one-liners for quick audits guide.
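A bare grep 7890 can also match unrelated rows, such as a port like 17890. A slightly stricter sketch, assuming iproute2's ss with -H (no header line) so that the fourth column is the local address:port; port_listed is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# port_listed PORT: read `ss -tlnH` output on stdin and succeed if any
# listening socket's local address (column 4) ends in ":PORT".
# Matching the whole ":PORT" suffix avoids false hits such as :17890.
port_listed() {
  local port=$1
  awk -v suffix=":$port" '
    {
      n = length($4)
      if (n >= length(suffix) && substr($4, n - length(suffix) + 1) == suffix) found = 1
    }
    END { exit !found }'
}
```

Usage: sudo ss -tlnpH | port_listed 7890 && echo "port 7890 is taken". The -p column comes after the local address, so column 4 is unaffected.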
Concept: Why Real-Time Mode Needs a Persistent Process
Static GoAccess runs once and exits. Real-time mode (real-time-html) keeps the process alive, follows the log as new lines are appended, and pushes updates to connected browsers over a WebSocket. That changes two things. First, the process must be supervised: if it dies, the dashboard freezes at the last frame with no visible error. Second, the WebSocket endpoint is a network listener, and anyone who can reach it can read your traffic data. The safe architecture is a supervised daemon bound to 127.0.0.1 and fronted by an authenticated reverse proxy, never a raw port on 0.0.0.0.
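Conceptually, following a log means remembering how far you have read and, on each pass, reading only the bytes appended since then. The bash sketch below illustrates that idea only; it is not GoAccess's implementation, and read_new_lines and its state file are hypothetical names. Its single rotation check is that the file became smaller than the saved offset.

```bash
#!/usr/bin/env bash
# read_new_lines FILE STATE: print whatever was appended to FILE since the
# previous call, keeping the byte offset in STATE between calls.
# If FILE is now smaller than the saved offset (rotated or truncated),
# reading restarts from the beginning of the new file.
read_new_lines() {
  local file=$1 state=$2 offset=0 size
  [[ -s $state ]] && offset=$(<"$state")
  size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
  (( size < offset )) && offset=0
  # Print only bytes offset+1 .. size, ignoring anything appended meanwhile
  tail -c +"$((offset + 1))" "$file" | head -c "$((size - offset))"
  printf '%s\n' "$size" > "$state"
}
```

Polling it in a loop (while sleep 1; do read_new_lines /var/log/nginx/access.log /tmp/access.offset; done) behaves like a crude tail. A rotated file that has already grown past the old offset slips through this size check, which is why a more careful follower also compares the inode or the first bytes of the file.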
The parsing side hinges on three directives working together: date-format, time-format, and log-format. GoAccess matches each line against the log-format template; a line that does not match appears in no panel (it only counts toward the failed-requests total), so a wrong template produces empty graphs rather than a per-line error. The %^ token discards a field, which is how you skip the timezone offset and the identity/auth-user fields that the combined format reserves but rarely populates.
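To see what the template keeps and what %^ throws away, here is a rough bash regex approximation of the combined shape, checked against the sample line from Diagnosis. It is an illustrative sketch, not GoAccess's parser; combined_re and matches_combined are hypothetical names. The ident/auth-user pair and the +0000 offset are matched but not captured, mirroring the two %^ tokens.

```bash
#!/usr/bin/env bash
# Approximates: %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
# Groups: 1=host 2=date 3=time 4=request 5=status 6=bytes 7=referer 8=agent.
# The ident/auth-user fields and the timezone offset are matched but not
# captured, which is what the %^ tokens do.
combined_re='^([^ ]+) [^ ]+ [^ ]+ \[([0-9]{2}/[A-Za-z]{3}/[0-9]{4}):([0-9]{2}:[0-9]{2}:[0-9]{2}) [+-][0-9]{4}] "([^"]*)" ([0-9]{3}) ([0-9]+|-) "([^"]*)" "([^"]*)"$'

# matches_combined LINE: print the kept fields and succeed if LINE has the shape.
matches_combined() {
  if [[ $1 =~ $combined_re ]]; then
    printf 'host=%s date=%s time=%s status=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[5]}"
    return 0
  fi
  return 1
}
```

Run against the Diagnosis sample, matches_combined "$(head -n 1 /var/log/nginx/access.log)" prints host=66.249.66.1 date=15/Mar/2024 time=10:12:00 status=200. A log with ISO 8601 timestamps fails the match, which is the same mismatch that leaves GoAccess graphs empty.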
Step-by-Step: Deploy the Dashboard
Step 1: Install GoAccess from the official Ubuntu repository. The distro package is recent enough for real-time HTML on 22.04 and 24.04.
sudo apt-get update && sudo apt-get install -y goaccess
goaccess --version | head -n 1
Expected Output (the version number depends on your Ubuntu release):
GoAccess - 1.6.5
Step 2: Write a strict parsing config. Create /etc/goaccess/goaccess.conf (the Ubuntu package may already ship a default file at this path, so back it up first). The log-format below matches the standard Nginx combined format, and addr keeps the WebSocket server on localhost only.
time-format %H:%M:%S
date-format %d/%b/%Y
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
real-time-html true
# Where the browser connects (client side)
ws-url ws://127.0.0.1:7890
# Where the WebSocket server listens (server side)
addr 127.0.0.1
port 7890
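Explanation: date-format and time-format decode the 15/Mar/2024:10:12:00 part of the stamp. The first %^ skips the ident and auth-user fields (the two - characters before the bracket), and the %^ inside the brackets discards the +0000 offset. addr 127.0.0.1 is the single most important hardening line: it is the address the WebSocket server binds to, and without it GoAccess listens on 0.0.0.0. ws-url only tells the browser where to connect, so ws://127.0.0.1:7890 works when the browser runs on the server itself or reaches it through an SSH tunnel; for viewers coming through a reverse proxy, set it to the proxied wss:// address instead.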
Production Warning: Never leave addr unset or set it to 0.0.0.0 or a public IP. An exposed GoAccess WebSocket port serves your full traffic stream (IPs, URLs, user agents) to anyone who connects, with no authentication. Always bind to 127.0.0.1 and expose the dashboard only through an authenticated HTTPS reverse proxy (covered in Verification).
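A quick guard for that warning, as a sketch: check the addr line of goaccess.conf before starting the service. conf_binds_localhost is a hypothetical helper; it reads only the config file (using the last uncommented addr line), so an --addr passed on the command line is not checked.

```bash
#!/usr/bin/env bash
# conf_binds_localhost CONF: succeed only if CONF sets "addr" to a loopback
# address. A missing addr line fails too, because GoAccess then binds to
# 0.0.0.0. Commented-out lines ("#addr ...") do not match $1 == "addr".
conf_binds_localhost() {
  local conf=$1 addr
  addr=$(awk '$1 == "addr" { a = $2 } END { print a }' "$conf") || return 1
  case $addr in
    127.*|::1) return 0 ;;
    *) echo "goaccess addr is '${addr:-unset, i.e. 0.0.0.0}'" >&2; return 1 ;;
  esac
}
```

Calling conf_binds_localhost /etc/goaccess/goaccess.conf || exit 1 in a provisioning script stops a deploy before an exposed listener ever starts.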
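Step 3: Validate the format against your log before daemonizing. The config above enables real-time mode, which never exits, so skip it with --no-global-config, pass the same three formats on the command line, and write a one-off JSON report (GoAccess picks the output format from the file extension):
goaccess /var/log/nginx/access.log --no-global-config \
  --date-format='%d/%b/%Y' --time-format='%H:%M:%S' \
  --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' \
  -o /tmp/test-report.json
grep -o '"valid_requests": *[0-9]*' /tmp/test-report.json
Expected Output (any non-zero count; spacing may differ):
"valid_requests": <number of parsed lines>
A count of 0, or GoAccess stopping with a format-error message, means the log-format does not match your log; recheck the timestamp shape from Diagnosis before continuing. A successful match is also the point to confirm that status codes parse, since the status panels covered in Understanding HTTP Status Codes in Server Logs depend on them.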
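To turn the Step 3 check into a hard gate in a provisioning script, write the validation report as JSON and fail when its valid_requests counter is zero. This sketch assumes the report's general section carries a numeric "valid_requests" field (it reads the first occurrence); require_valid_requests is a hypothetical name.

```bash
#!/usr/bin/env bash
# require_valid_requests REPORT: fail unless a GoAccess JSON report shows at
# least one parsed line. Works with compact or pretty-printed JSON.
require_valid_requests() {
  local report=$1 count
  count=$(grep -o '"valid_requests": *[0-9]*' "$report" | head -n 1 | grep -o '[0-9]*$')
  if [[ -z $count || $count -eq 0 ]]; then
    echo "no valid requests in $report: check log-format, date-format and time-format" >&2
    return 1
  fi
  echo "valid_requests=$count"
}
```

After the batch run, require_valid_requests /tmp/test-report.json || exit 1 keeps a mismatched format from ever reaching the systemd unit.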
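Step 4: Create a supervised systemd service. Write /etc/systemd/system/goaccess-realtime.service so the daemon restarts automatically after any exit and runs as the unprivileged www-data user. Check first that www-data can read the log (sudo -u www-data head -n 1 /var/log/nginx/access.log). Because /var/www/html is owned by root on a stock install, also create the output file for www-data: sudo install -o www-data -g www-data -m 0640 /dev/null /var/www/html/report.html.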
[Unit]
Description=GoAccess Real-Time Log Dashboard
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/goaccess /var/log/nginx/access.log \
--config-file=/etc/goaccess/goaccess.conf \
-o /var/www/html/report.html
Restart=always
RestartSec=5
User=www-data
[Install]
WantedBy=multi-user.target
Step 5: Enable and start the service. daemon-reload picks up the new unit; enable --now starts it and sets it to boot automatically.
sudo systemctl daemon-reload
sudo systemctl enable --now goaccess-realtime
systemctl is-active goaccess-realtime
Expected Output:
active
Once live, you can bridge these metrics into the Node.js and GoAccess integration layer for alert thresholds, or correlate the hourly view against a full measurement of crawl rate by hour from your logs.
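For a quick hourly crawl-rate number outside the dashboard, a small awk sketch can bucket Googlebot hits by the hour part of the timestamp. It assumes the combined format (timestamp in the fourth whitespace-separated field) and, like grep -i googlebot, matches the string anywhere in the line, so clients that only claim to be Googlebot are counted too. googlebot_hits_per_hour is a hypothetical name.

```bash
#!/usr/bin/env bash
# googlebot_hits_per_hour: read combined-format lines on stdin and print
# "DD/Mon/YYYY:HH COUNT" for each hour with Googlebot requests.
# Matching is a case-insensitive search for "googlebot" anywhere in the line.
googlebot_hits_per_hour() {
  awk 'tolower($0) ~ /googlebot/ {
         # $4 looks like "[15/Mar/2024:10:12:00"; keep "15/Mar/2024:10"
         hits[substr($4, 2, 14)]++
       }
       END { for (h in hits) print h, hits[h] }' | LC_ALL=C sort
}
```

Usage: googlebot_hits_per_hour < /var/log/nginx/access.log. The text sort is chronological only within a single month, which matches a daily-rotated log.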
Edge-Case Handling
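Log rotation can leave the dashboard stale. When logrotate renames access.log, Nginx must be told to reopen its log, and the dashboard should pick up the fresh file. Ubuntu's nginx package already ships /etc/logrotate.d/nginx with a stanza for /var/log/nginx/*.log, so extend that stanza's postrotate block instead of adding a second stanza for access.log; logrotate reports a duplicate entry when two stanzas match the same file. Keep any other lines your file already has (such as the prerotate block); the relevant parts look like this:
/var/log/nginx/*.log {
    daily
    rotate 14
    missingok
    notifempty
    compress
    delaycompress
    create 0640 www-data adm
    sharedscripts
    postrotate
        # Stock Ubuntu action: signal nginx to reopen its log files
        invoke-rc.d nginx rotate >/dev/null 2>&1
        # Restart the dashboard (only if running) so it follows the new access.log
        systemctl try-restart goaccess-realtime.service >/dev/null 2>&1 || true
    endscript
}
invoke-rc.d nginx rotate sends Nginx its USR1 reopen signal; USR1 is an Nginx convention, not a GoAccess one, so nothing is sent to the dashboard's PID directly. systemctl try-restart restarts the unit only if it is already running, and Restart=always in the unit covers any other exit. A restart clears GoAccess's in-memory counters unless you enable its on-disk persistence (persist, restore and a db-path that www-data can write, set in goaccess.conf). The delaycompress flag keeps the previous file uncompressed for one more cycle. Dry-run the change with sudo logrotate -d /etc/logrotate.d/nginx. For deeper rotation tuning, see the log rotation strategies cluster.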
Crawler noise drowns the panels. A heavily crawled site fills the dashboard with bot hits that obscure human traffic. To read crawl budget in isolation, pre-filter the stream rather than parsing everything:
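grep -i 'googlebot' /var/log/nginx/access.log | \
  goaccess --no-global-config --date-format='%d/%b/%Y' --time-format='%H:%M:%S' \
  --log-format='%h %^[%d:%t %^] "%r" %s %b "%R" "%u"' \
  -o /tmp/googlebot.html -
This produces a one-off, Googlebot-only batch report. It deliberately skips the real-time config: loading it would try to start a second WebSocket server on port 7890, which the daemon already holds. Keep the output out of the public web root, and remember that user-agent matching also counts clients that merely claim to be Googlebot. For the opposite view, GoAccess's --ignore-crawlers option drops recognized bots so the panels show human traffic. Pre-filtering also cuts CPU on high-traffic hosts.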
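Status codes are the other half of the crawl-budget picture: a growing share of 404 or 301 responses served to Googlebot can point to wasted crawl. A sketch that tallies them straight from the log, assuming the standard combined layout where the status is the ninth whitespace-separated field (true when the request line is METHOD PATH PROTOCOL and the auth-user field has no spaces); googlebot_status_counts is a hypothetical name.

```bash
#!/usr/bin/env bash
# googlebot_status_counts: read combined-format lines on stdin and print
# "STATUS COUNT" for requests whose line mentions googlebot (any case).
# Field 9 is the status in a standard combined line.
googlebot_status_counts() {
  awk 'tolower($0) ~ /googlebot/ { codes[$9]++ }
       END { for (c in codes) print c, codes[c] }' | sort -n
}
```

Usage: googlebot_status_counts < /var/log/nginx/access.log gives the same breakdown as the status panel of the Googlebot-only report, without generating HTML.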
Verification: Confirm the Live Stream and Secure Access
Confirm the WebSocket handshake completes and the dashboard receives live frames. Generate a request, then watch for the update:
curl -s -o /dev/null -w "%{http_code}\n" http://localhost/
Expected Output:
200
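To confirm the WebSocket handshake from the shell rather than a browser, you can send a minimal upgrade request with curl and check for a 101 Switching Protocols status line. This is a sketch: ws_handshake_ok and check_goaccess_ws are hypothetical helpers, the Sec-WebSocket-Key is the sample value from RFC 6455, and it assumes GoAccess listens on 127.0.0.1:7890 as configured in Step 2.

```bash
#!/usr/bin/env bash
# ws_handshake_ok: read an HTTP response on stdin and succeed if its status
# line reports 101 Switching Protocols (a completed WebSocket upgrade).
ws_handshake_ok() {
  local status_line re='^HTTP/1\.1 101'
  IFS= read -r status_line || return 1
  [[ $status_line =~ $re ]]
}

# check_goaccess_ws [PORT]: send an upgrade request to the local GoAccess
# WebSocket server. --max-time stops curl once the open-ended stream starts.
check_goaccess_ws() {
  curl -s -i -N --max-time 3 \
    -H 'Connection: Upgrade' \
    -H 'Upgrade: websocket' \
    -H 'Sec-WebSocket-Version: 13' \
    -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==' \
    "http://127.0.0.1:${1:-7890}/" | ws_handshake_ok
}
```

Usage: check_goaccess_ws && echo "WebSocket handshake OK". A failure here while systemctl reports the unit as active usually means a different port or addr in the config.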
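Open the report in a browser, open developer tools, and filter the Network tab by WS. A single WebSocket connection to the address in ws-url should show inbound frames carrying JSON payloads shortly after new requests hit the log. The GoAccess WebSocket server does not serve the HTML report itself, so exposing the dashboard safely takes two locations in the HTTPS server block whose root is /var/www/html: one requires a login for the generated report, the other proxies the WebSocket to the localhost-only GoAccess port:
# Require a login for the generated report (served from /var/www/html)
location = /report.html {
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
# Proxy the live WebSocket feed to the localhost-only GoAccess server
location /goaccess/ {
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://127.0.0.1:7890/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 1h;
}
Explanation: Nginx terminates TLS and enforces HTTP basic auth on both paths while the GoAccess port stays on localhost. The report needs protecting as much as the socket, because the generated HTML embeds a snapshot of the full dataset. For remote viewers, set ws-url to the proxied address (for example wss://your-domain/goaccess/, where your-domain is a placeholder) and confirm that your GoAccess version accepts a path in ws-url. proxy_read_timeout stops Nginx from closing a quiet WebSocket after its 60-second default.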
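The auth_basic_user_file in the Nginx block points at /etc/nginx/.htpasswd, which has to exist before any login succeeds. If apache2-utils' htpasswd is not installed, a sketch using openssl can produce an entry in the APR1-MD5 format that Nginx accepts; htpasswd_line is a hypothetical helper, and it reads the password from stdin so it does not appear in the process list.

```bash
#!/usr/bin/env bash
# htpasswd_line USER: read one password line on stdin and print a
# "user:hash" entry using openssl's APR1-MD5 scheme, which nginx's
# auth_basic_user_file accepts.
htpasswd_line() {
  local user=$1 hash
  hash=$(openssl passwd -apr1 -stdin) || return 1
  printf '%s:%s\n' "$user" "$hash"
}
```

To add a user, read the password with read -rs pw, then run printf '%s\n' "$pw" | htpasswd_line admin | sudo tee -a /etc/nginx/.htpasswd > /dev/null, followed by sudo nginx -t && sudo systemctl reload nginx.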
Common Mistakes
- Binding the WebSocket to a public address. Leaving addr unset or setting it to 0.0.0.0 exposes the raw traffic stream with no authentication. Always set addr 127.0.0.1 and reverse-proxy with auth.
- A log-format that silently does not match. GoAccess reports no per-line error for lines it cannot parse, so a wrong timestamp or quoting produces empty graphs. Validate in batch mode (Step 3) before daemonizing, and mirror the exact line shape from the Diagnosis output.
- No rotation hook. Without a postrotate step that reopens the Nginx log and restarts the dashboard, the live view can stall or go stale after rotation. Add both to the existing /etc/logrotate.d/nginx stanza.
Frequently Asked Questions
How do I restrict GoAccess WebSocket access to internal users only?
Set addr 127.0.0.1 so the WebSocket server listens only on the loopback interface (GoAccess binds to 0.0.0.0 otherwise), then expose the dashboard exclusively through an Nginx reverse proxy that enforces HTTPS and auth_basic on both the report and the WebSocket path. ws-url only controls where the browser connects; the GoAccess port itself should never be reachable from outside the host.
Does real-time mode hurt server performance during traffic spikes?
Usually the impact is modest. In real-time mode GoAccess parses only the lines appended since its previous pass rather than re-reading the whole file, so the work grows with the request rate; measure the overhead on your own host (for example with top or pidstat) during a spike. Pre-filtering with grep reduces the work further for targeted reports.
How can I isolate Googlebot traffic for crawl budget analysis?
Pipe the log through grep -i googlebot before feeding GoAccess for a targeted view, or use --crawlers-only to restrict a report to recognized crawlers in general, Googlebot included. The isolated view makes crawl-rate and status-code signals legible without human traffic mixed in.
Related Guides
- CLI One-Liners for Quick Audits — fast shell checks to validate logs before piping them into GoAccess.
- Grafana Loki for SEO Log Aggregation — a queryable alternative when you outgrow a single-host dashboard.
- Understanding HTTP Status Codes in Server Logs — read the status panels GoAccess surfaces correctly.
- Measuring Crawl Rate by Hour from Server Logs — turn the live stream into an hourly crawl-budget metric.
Part of the Node.js and GoAccess Integration series.