Supported Log Formats — Parse Nginx, Apache, AWS, Kubernetes Logs with SQL
Look, parsing logs shouldn't feel like decoding ancient hieroglyphics. Our auto-detection engine supports 15+ common formats out of the box—plus custom regex for the weird stuff your team invented.
Why Log Format Matters
Every log file is basically a stream of unstructured text. Parsing means translating that chaos into a SQL table with proper columns, data types, and timestamps. Get the format wrong? Your status column contains garbage. Your time-series queries return nonsense. You waste hours debugging regex.
Auto-detection saves you from this hell. We scan the first 64 KB of your file, pattern-match against 15+ format signatures, and return a confidence score. If confidence clears our 80% threshold, we auto-apply the parser. If not, you get a dropdown to choose manually.
Standard formats (Nginx, Apache, AWS S3) have been battle-tested across millions of log files. Custom formats? That's a week of your life you'll never get back—trial and error with regex until 3am. We'd rather you spend that time actually analyzing your logs.
How Auto-Detection Works
When you drop a log file into LogAnalytics, here's what happens behind the scenes:
- Sample the file: We read the first 64 KB (~200-500 lines for typical logs). This is fast—even on 5GB files, sampling takes ~50ms.
- Pattern matching: Each format has a regex signature. We test all 15+ signatures against your sample and count matches. Example: If 198 out of 200 lines match Nginx combined format, confidence = 99%.
- Confidence scoring: We use a threshold of 80%. Below that, we show you the top 3 candidates and let you pick. Above 80%? We auto-apply.
- Manual override: Always available. Click "Override Format" if auto-detection guesses wrong (happens ~6% of the time on edge cases like custom application logs).
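The detection pass above can be sketched in a few lines. This is a simplified illustration, not the production code: the signature patterns, format names, and helper shape are our own, and the real signatures are more involved.

```javascript
// Hypothetical, heavily simplified signatures for two common formats.
const SIGNATURES = {
  nginx_combined: /^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \d+ "[^"]*" "[^"]*"$/,
  docker_json: /^\{.*"log"\s*:.*\}$/,
};

function detectFormat(sampleText) {
  const lines = sampleText.split("\n").filter((l) => l.trim().length > 0);
  // Score every signature against the sample and rank by match ratio.
  const scores = Object.entries(SIGNATURES).map(([name, re]) => {
    const matches = lines.filter((l) => re.test(l)).length;
    return { name, confidence: matches / lines.length };
  });
  scores.sort((a, b) => b.confidence - a.confidence);
  const best = scores[0];
  // Below the 80% threshold, return top candidates for manual selection.
  return best.confidence >= 0.8
    ? { mode: "auto", format: best.name, confidence: best.confidence }
    : { mode: "manual", candidates: scores.slice(0, 3) };
}
```

Note how a mixed-format file naturally drags every signature's match ratio down, which is exactly why it tends to land in the manual-selection path.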
Pro tip: If you're mixing log formats in one file (don't do this), auto-detection will pick whichever format has the most lines. Split your files first or use manual override.
Regex compilation cost: Once a format is selected, we compile the regex pattern once and reuse it for all rows. This is also why JSON logs parse 2-3× faster than regex-based formats: JSON parsing is native in JavaScript, while regex goes through DuckDB's RE2 engine, which adds overhead.
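The compile-once point is easy to demonstrate with a toy scan. The pattern and helper below are our own illustration, not the product's code:

```javascript
// Hoisting the pattern out of the loop means V8 parses it exactly once.
// Constructing `new RegExp(...)` inside the loop would re-parse it per line.
const STATUS_BYTES_RE = / (\d{3}) \d+ /; // illustrative: status + bytes fields

function countServerErrors(lines) {
  let errors = 0;
  for (const line of lines) {
    const m = STATUS_BYTES_RE.exec(line); // reused, no per-line compilation
    if (m && Number(m[1]) >= 500) errors++;
  }
  return errors;
}
```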
Format Categories
Web Server Logs
The bread and butter of log analytics. Web server logs tell you who's hitting your site, which endpoints are slow, and where your 502 errors are coming from.
Nginx (Combined & Access)
The default format for the world's most popular reverse proxy. Includes request method, path, status, response time, referer, and user-agent.
Example: 172.16.0.1 - - [01/Jan/2024:12:00:00 +0000] "GET /api/users HTTP/1.1" 200 1234 "-" "curl/7.68"
✓ Auto-detected 98% of the time
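A regex along these lines extracts the fields from the example above. The group names are our own naming, and the production parser may differ:

```javascript
// Illustrative pattern for the Nginx combined format shown above.
const NGINX_COMBINED =
  /^(?<remote_addr>\S+) \S+ (?<remote_user>\S+) \[(?<time_local>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<protocol>[^"]+)" (?<status>\d{3}) (?<bytes>\d+) "(?<referer>[^"]*)" "(?<user_agent>[^"]*)"$/;

function parseNginxLine(line) {
  const m = NGINX_COMBINED.exec(line);
  if (!m) return null;
  // Cast numeric columns so downstream SQL gets proper types.
  return {
    ...m.groups,
    status: Number(m.groups.status),
    bytes: Number(m.groups.bytes),
  };
}
```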
Apache (Combined & Common)
The OG web server format. Combined log includes referer and user-agent. Common log format (CLF) is the minimal version.
Example: 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
✓ Auto-detected 96% of the time
IIS (W3C Extended)
Windows web server format. Tab-separated values with customizable fields. Header row defines columns.
Fields: date, time, s-ip, cs-method, cs-uri-stem, cs-uri-query, s-port, cs-username, c-ip, cs(User-Agent), sc-status, sc-substatus, sc-win32-status, time-taken
✓ Auto-detected 92% of the time
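Because W3C extended logs declare their columns in a `#Fields:` directive, a reader can build the schema from the header itself. A minimal sketch (helper name is ours; we split on any whitespace so both tab- and space-separated variants work):

```javascript
function parseW3C(text) {
  let fields = [];
  const rows = [];
  for (const line of text.split("\n")) {
    if (line.startsWith("#Fields:")) {
      // The directive defines the column names for subsequent rows.
      fields = line.slice("#Fields:".length).trim().split(/\s+/);
    } else if (line.startsWith("#") || line.trim() === "") {
      continue; // other directives (#Version, #Date) and blank lines
    } else {
      const values = line.trim().split(/\s+/);
      rows.push(Object.fromEntries(fields.map((f, i) => [f, values[i]])));
    }
  }
  return rows;
}
```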
Use Cases:
- Flash sale traffic analysis: SELECT DATE_TRUNC('minute', timestamp), COUNT(*) FROM logs GROUP BY 1
- Bot detection: WHERE user_agent LIKE '%bot%' OR user_agent LIKE '%crawler%'
- CDN cache hit ratios: SELECT cache_status, COUNT(*) FROM logs GROUP BY 1
Cloud Platform Logs
Cloud providers love proprietary formats. AWS S3 access logs look nothing like GCP HTTP Load Balancer logs. We support the major ones so you don't have to write custom parsers.
AWS S3 Access Logs
Space-delimited format tracking every GET, PUT, DELETE on your S3 buckets. Great for cost analysis and security auditing.
Fields include: bucket owner, bucket, time, remote IP, requester, request ID, operation, key, HTTP status, error code, bytes sent, total time, turn-around time
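The tricky part of this format is that it is space-delimited, but the timestamp is wrapped in [brackets] and some fields (request URI, user agent) in "quotes", so a naive split breaks. A tokenizer sketch that keeps those groupings intact (our own illustration):

```javascript
function tokenizeS3(line) {
  const tokens = [];
  // Match a [bracketed] group, a "quoted" group, or a bare token.
  const re = /\[[^\]]*\]|"[^"]*"|\S+/g;
  let m;
  while ((m = re.exec(line)) !== null) {
    // Strip the wrapping brackets/quotes before emitting the value.
    tokens.push(m[0].replace(/^[\["]|[\]"]$/g, ""));
  }
  return tokens;
}
```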
CloudFront Standard Logs
Tab-separated CloudFront access logs. Includes edge location, viewer location, cache behavior, and origin response time.
Use for debugging cache misses and analyzing global traffic patterns.
GCP HTTP Load Balancer
JSON-structured text export from Google Cloud. Backend latency broken down by zone. Ideal for multi-region performance tuning.
Use Cases:
- S3 cost optimization: SELECT key, SUM(bytes_sent) FROM s3_logs GROUP BY key ORDER BY 2 DESC LIMIT 100
- Security (AccessDenied events): SELECT remote_ip, key FROM s3_logs WHERE error_code = 'AccessDenied'
- CloudFront cache behavior: SELECT x_edge_result_type, COUNT(*) FROM cf_logs GROUP BY 1
Container & Orchestration Logs
If you're running containers, you're drowning in logs. Docker json-file driver, Kubernetes pod logs, Ingress Nginx—each has its own format. We parse them all so you can correlate OOMKilled events with actual resource usage.
Docker JSON Logs
Default json-file logging driver output. One JSON object per line with log, stream (stdout/stderr), and time fields.
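This is the format where parsing is essentially free: one JSON.parse per line plus a timestamp conversion. A sketch (output field names are our choice):

```javascript
function parseDockerLine(line) {
  const { log, stream, time } = JSON.parse(line);
  return {
    message: log.trimEnd(), // Docker appends the trailing newline to `log`
    stream,                 // "stdout" or "stderr"
    timestamp: new Date(time),
  };
}
```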
Kubernetes Ingress Nginx
Nginx running as Kubernetes Ingress controller. Adds request_id for distributed tracing and upstream_addr showing which pod handled the request.
Kubernetes JSON Pod Logs
Runtime container logs with pod name, namespace, and container name metadata. Essential for multi-tenant cluster debugging.
Use Cases:
- OOMKilled debugging: SELECT pod, COUNT(*) FROM k8s_logs WHERE message LIKE '%OOMKilled%' GROUP BY pod
- Request tracing across pods: SELECT * FROM ingress WHERE request_id = 'abc-123'
- Deployment correlation: SELECT timestamp, status FROM logs WHERE timestamp BETWEEN deploy_start AND deploy_end
Database Logs
Database logs are where you go when queries are slow, connections are maxed out, or you've got a deadlock. PostgreSQL and MySQL both have configurable log formats, so we support the most common patterns.
PostgreSQL (Server & Error Logs)
Configurable log_line_prefix means your format might vary. We support the most common: %t %u %d %p (timestamp, user, database, PID).
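Given that prefix, a parsing sketch looks like the following. The group names and the trailing "LEVEL:  message" split are our assumptions; if your log_line_prefix differs, the pattern has to change with it:

```javascript
// Matches '%t %u %d %p' (timestamp, user, database, PID) followed by
// the conventional severity and message, e.g. "ERROR:  deadlock detected".
const PG_LINE =
  /^(?<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)? \S+) (?<user>\S+) (?<db>\S+) (?<pid>\d+) (?<level>[A-Z]+):\s+(?<message>.*)$/;

function parsePgLine(line) {
  const m = PG_LINE.exec(line);
  return m ? { ...m.groups, pid: Number(m.groups.pid) } : null;
}
```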
MySQL General Query Log
Captures every statement sent to the server. Use for audit trails or debugging slow queries that don't hit the slow query log threshold.
Use Cases:
- Deadlock detection: SELECT * FROM pg_logs WHERE message LIKE '%deadlock detected%'
- Slow query analysis: SELECT query, duration FROM pg_logs WHERE duration > 1000 ORDER BY duration DESC
- Connection pool tuning: SELECT COUNT(*) FROM pg_logs WHERE message LIKE '%too many connections%'
Mobile & PaaS Logs
Mobile app logs and Platform-as-a-Service logs tend to be free-form, but there are a few standards.
Android Logcat
Android's logging system. Format: date time PID-TID/package priority/tag: message
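A sketch matching the layout shown above (date time PID-TID/package priority/tag: message). The group names are ours, and real logcat output varies by the `-v` format flag, so treat this as illustrative:

```javascript
// Priority is one of V/D/I/W/E/F (verbose through fatal).
const LOGCAT =
  /^(?<date>\d{2}-\d{2}) (?<time>\d{2}:\d{2}:\d{2}\.\d{3}) (?<pid>\d+)-(?<tid>\d+)\/(?<pkg>\S+) (?<priority>[VDIWEF])\/(?<tag>[^:]+): (?<message>.*)$/;

function parseLogcat(line) {
  const m = LOGCAT.exec(line);
  return m
    ? { ...m.groups, pid: Number(m.groups.pid), tid: Number(m.groups.tid) }
    : null;
}
```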
Heroku Router Logs
Key-value pairs for routing metadata. Includes dyno name, request latency, status, and Heroku-specific error codes (H12, H10, etc.).
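Key-value formats like this one need only a small tokenizer that respects quoted values. A sketch (field names in the sample follow Heroku's router conventions; the helper is ours):

```javascript
function parseRouterLine(line) {
  const out = {};
  // key=value, where value is either "quoted" or a bare token.
  const re = /(\w+)=("([^"]*)"|\S+)/g;
  let m;
  while ((m = re.exec(line)) !== null) {
    // Prefer the unquoted inner capture when the value was quoted.
    out[m[1]] = m[3] !== undefined ? m[3] : m[2];
  }
  return out;
}
```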
Choosing the Right Format
Here's a decision tree for when auto-detection isn't confident:
- Know your log source: Nginx? Docker? S3? Start by identifying where the log came from.
- Check the sample snippet: Look at the first 10 lines. Do you see JSON objects? Space-delimited values? Key-value pairs?
- Verify column mappings: After selecting a format, check the preview table. Do the column names make sense? Is status actually HTTP status codes, or is it parsing garbage?
- Test with auto-detect first: Even if you know the format, let auto-detection run. It might catch edge cases (e.g., Nginx with a custom log_format).
- Manual override if needed: Confidence score below 80%? Pick from the dropdown.
- Custom regex for proprietary formats: If your team invented a custom log format (why?!), you'll need custom regex. See the Docs page for syntax.
Pro tip: If you're consistently getting wrong auto-detections for a specific format, open a GitHub issue with sample lines. We'll tune the signature regex and ship an update within 2 weeks.
Performance by Format
Not all formats parse at the same speed. JSON is native in JavaScript, so it's blazing fast. Regex-based parsing requires DuckDB's RE2 engine, which adds overhead. Here's what you can expect:
| Format Type | Parse Speed (100MB) | Why |
|---|---|---|
| JSON (Docker, K8s) | 1-2 seconds | Native JSON.parse() in V8 engine |
| Fixed fields (Nginx, Apache) | 2-3 seconds | Regex compiled once, reused for all rows |
| CSV/TSV | 2.5-3.5 seconds | DuckDB CSV reader optimized for bulk parsing |
| Variable fields (AWS S3) | 4-5 seconds | More complex regex with optional fields |
| Custom regex | 7-9 seconds | Regex complexity + lack of optimization |
Benchmarks from DuckDB WASM:
- DuckDB v1.1.3 (latest): Ranked #1 on ClickBench "hot runs" with a score of 9.599/10 (October 2024)
- CSV parsing throughput: 1.96 GB/s on the Pollock benchmark
- Performance improvement 2021-2024: DuckDB queries got 14× faster over three years
Bottom line: If you have a choice, use JSON logs. If you're stuck with Apache CLF, don't worry—DuckDB handles it fine. Custom regex should be a last resort.
Format Request Process
Need a format we don't support? Here's how to request it (or contribute it yourself):
- Open a GitHub issue: Go to github.com/7and1/loganalytics/issues and create a new issue titled "Format Request: [Your Format Name]"
- Provide 10-20 sample log lines: Paste real logs (sanitize sensitive data). We need to see variation—don't just copy the same line 10 times.
- Describe expected schema: What columns should we extract? What data types? (e.g., timestamp TIMESTAMP, user_id VARCHAR, action VARCHAR)
- We'll review within 2 weeks: If it's a common format (e.g., Datadog Agent logs), we'll prioritize. Obscure internal formats take longer.
- Community PRs welcome: LogAnalytics is MIT licensed. Fork the repo, add your format to data/formats.json, and submit a PR. We'll merge if tests pass.
Average turnaround: 2-4 weeks for standard formats. Custom enterprise formats may take longer if regex is complex. We're a small team—please be patient!
Compressed Log Support
Production logs are usually gzipped to save disk space. Here's what we support:
Gzip (.gz)
Supported via browser DecompressionStream API. Just drop your .gz file—we'll decompress on the fly. Adds ~10-20% overhead to parse time.
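The on-the-fly decompression described above boils down to piping the file's stream through DecompressionStream. A sketch under the assumption that you have the file as a Blob (error handling omitted; requires a browser or Node 18+):

```javascript
async function gunzipToText(blob) {
  // Stream the gzipped bytes through the platform's native inflater.
  const stream = blob.stream().pipeThrough(new DecompressionStream("gzip"));
  return await new Response(stream).text();
}
```

Because this is streaming, memory stays bounded by the chunk size rather than the decompressed file size.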
Bzip2 (.bz2)
Not yet supported. Browsers don't have native bzip2 decompression. We'd need to ship a WASM decompressor, adding ~500KB to bundle size. If you need this, upvote the GitHub issue.
Zip (.zip)
Partial support. If your zip contains a single log file, we'll extract and parse it. Multiple files in one zip? Extract locally first with unzip file.zip, then upload each file individually.
Raw (.log, .txt)
Fastest option. Uncompressed logs parse 10-20% faster than gzipped. If you're analyzing the same log repeatedly, decompress once and keep the raw file.
Pro tip: Keep logs uncompressed for analysis, compress for archival. A 1GB gzipped log becomes 3-4GB uncompressed—fine for temporary analysis, but you wouldn't want to store 10TB of uncompressed logs.
Browse All Formats
Click any format below to see its regex pattern, DuckDB schema, sample queries, and related error guides.
Nginx Access Log
Default combined log format shipped with Nginx, ideal for traffic and latency triage.
Apache Combined Log
The de facto Apache HTTP Server log format, with referer and user-agent fields.
AWS S3 Access Log
Server access logs for S3 buckets. Analyze costs, traffic sources, and error rates.
Amazon CloudFront Standard Log
Edge delivery log containing cache behaviors, edge response time, and viewer IPs.
Docker JSON Log
Default container log driver output (json-file) used by Docker Engine.
Kubernetes Ingress Nginx
Ingress-Nginx controller log with request identifiers and upstream timings.
MySQL General Query Log
Full statement log capturing every query hitting a MySQL instance, useful for auditing.
PostgreSQL Server Log
Configurable log_line_prefix layout with severity, pid, and connection metadata.
Windows IIS W3C Log
W3C extended format produced by Internet Information Services web servers.
GCP HTTP Load Balancer Log
Structured text export for Google Cloud HTTP(S) Load Balancers (classic logging).
PostgreSQL Error Log
Classic PostgreSQL server log with PID, user, database, severity, and free-form message fields.
CloudFront Access Log
Amazon CloudFront standard log with header comments and dozens of edge metrics.
Kubernetes JSON Log
Raw JSON entries emitted by container runtimes with embedded pod metadata.
Android Logcat
Classic logcat lines with PID/TID, priority, tag, and free-form message for mobile debugging.
Heroku Router Log
Router lines showing dyno routing, latency, bytes, and request metadata for Heroku apps.
Sources & Further Reading
- DuckDB CSV Parsing Documentation - Official DuckDB guide to CSV/TSV ingestion
- DuckDB Regex Functions (RE2 Syntax) - Pattern matching reference for custom formats
- ClickBench OLAP Benchmarks - Where DuckDB ranks #1 on "hot runs" (Oct 2024)
- DuckDB Pollock CSV Benchmark - 1.96 GB/s parsing throughput
- DuckDB Performance Improvements 2021-2024 - 14× speed increase over three years
- Apache HTTP Server Log Files Documentation - Official Apache Common Log Format (CLF) spec