High I/O Wait - Disk Bottleneck Diagnosis

Identify and fix high iowait on Linux servers: find the responsible process, then tune MySQL, Redis, and the I/O scheduler.

iowait is the percentage of time the CPU sat idle while at least one disk I/O request was outstanding. Sustained iowait above roughly 20-30% usually indicates a disk bottleneck.
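To make the definition concrete, the sketch below derives the iowait percentage from two samples of the aggregate "cpu" line in /proc/stat, which is where tools such as top and vmstat get the figure. The function name iowait_pct is made up for this example, and the field layout (user, nice, system, idle, iowait, irq, softirq, steal) is assumed to match current kernels.

```shell
# iowait_pct PREV_LINE CUR_LINE
# Prints the iowait share (percent) between two samples of the "cpu" line.
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0
      for (i = 2; i <= 9; i++) total += $i   # user..steal; guest time is already part of user
      t[NR] = total; w[NR] = $6 }            # $6 is the iowait counter
    END { d = t[2] - t[1]
          if (d > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / d
          else print "0.0" }'
}

# Usage: sample /proc/stat twice, one second apart
if [ -r /proc/stat ]; then
  prev=$(grep '^cpu ' /proc/stat)
  sleep 1
  cur=$(grep '^cpu ' /proc/stat)
  iowait_pct "$prev" "$cur"
fi
```

A longer interval between the two samples smooths out short bursts.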

Confirm High iowait

# top: check the "wa" value in the %Cpu(s) line
top

# vmstat: 10 samples at 1-second intervals (wa column); the first row is the average since boot
vmstat 1 10

# iostat: detailed per-disk statistics, 5 reports at 1-second intervals
iostat -x 1 5

Key iostat columns:

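%util: share of time the device was busy (near 100% suggests saturation, although SSDs and NVMe drives that serve requests in parallel can show 100% with headroom left)
r_await, w_await (a single await column in older sysstat): average time per request in ms, including time spent queued (should be < 10ms for SSD)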
r/s, w/s: read/write operations per second
rkB/s, wkB/s: throughput in KB/s
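As a rough way to apply the %util column, the sketch below filters iostat output for devices at or above a threshold. The helper name busy_disks and the default of 90 are choices made for this example; the column is looked up by its header name because its position differs between sysstat versions.

```shell
# busy_disks [THRESHOLD]
# Reads `iostat -x` output on stdin and prints "device %util" for busy devices.
busy_disks() {
  awk -v limit="${1:-90}" '
    $1 == "Device" || $1 == "Device:" {        # header row: locate the %util column
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && $col + 0 >= limit { print $1, $col }
  '
}

# Usage (-d: device report only, -y: skip the since-boot report):
# iostat -dxy 1 5 | busy_disks 90
```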

Find the Responsible Process

# iotop: real-time I/O per process (install if missing)
apt-get install -y iotop
iotop -o          # show only processes doing I/O

# pidstat: I/O stats per process
apt-get install -y sysstat
pidstat -d 1 5

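# Rank processes by cumulative bytes read, using /proc
# (run as root to see other users' processes)
for io in /proc/[0-9]*/io; do
  pid=${io#/proc/}; pid=${pid%/io}
  reads=$(awk '/^read_bytes/{print $2}' "$io" 2>/dev/null)
  [ -n "$reads" ] || continue          # process exited or file not readable
  comm=$(cat "/proc/$pid/comm" 2>/dev/null)
  echo "$reads $comm $pid"
done | sort -rn | head -10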
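The counters in /proc/<pid>/io are cumulative since each process started, so a long-running daemon can top that ranking while doing little I/O right now. One way around this, sketched below, is to take two snapshots and rank by the difference. The function names io_snapshot and io_delta and the temp file paths are made up for this example.

```shell
# io_snapshot: print "PID read_bytes write_bytes" for every readable /proc/<pid>/io
io_snapshot() {
  for io in /proc/[0-9]*/io; do
    pid=${io#/proc/}; pid=${pid%/io}
    awk -v pid="$pid" '
      /^read_bytes/  { r = $2 }
      /^write_bytes/ { w = $2 }
      END { if (NR) printf "%s %.0f %.0f\n", pid, r, w }' "$io" 2>/dev/null
  done
}

# io_delta BEFORE_FILE AFTER_FILE: print "bytes_moved PID", busiest first
io_delta() {
  awk 'NR == FNR { before[$1] = $2 + $3; next }
       ($1 in before) { d = $2 + $3 - before[$1]
                        if (d > 0) printf "%.0f %s\n", d, $1 }' "$1" "$2" | sort -rn
}

# Usage:
# io_snapshot > /tmp/io.before; sleep 5; io_snapshot > /tmp/io.after
# io_delta /tmp/io.before /tmp/io.after | head -10
```

Processes that start or exit between the two snapshots are ignored, which is usually acceptable for a quick diagnosis.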

Common Causes and Fixes

1. Runaway Backup or Cron Job

# Identify backup processes (-x matches the exact process name, so "cp" does not also match unrelated commands)
pgrep -ax 'tar|rsync|mysqldump|gzip|cp'

# Deprioritize I/O of a running backup
# (I/O priority classes only take effect with a scheduler that honors them, such as BFQ)
ionice -c 3 -p <PID>       # idle class: only runs when disk is free

# Run future backups with low I/O priority
ionice -c 3 nice -n 19 rsync -a /data /backup

2. MySQL Full Table Scans / No Indexes

-- Show running queries
SHOW PROCESSLIST;

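-- Inspect InnoDB I/O activity (pending reads and writes, buffer pool hit rate)
SHOW ENGINE INNODB STATUS\G

-- Enable the slow query log; also log queries that use no index (full scans)
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
SET GLOBAL long_query_time = 1;
SET GLOBAL log_queries_not_using_indexes = ON;
SET GLOBAL slow_query_log = ON;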

Increase buffer pool to reduce disk reads:

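# /etc/mysql/mysql.conf.d/mysqld.cnf
innodb_buffer_pool_size = 2G        # example value; aim for ~70% of RAM on a dedicated database server
innodb_io_capacity = 2000           # IOPS budget for background flushing; match what the disk can sustain
innodb_flush_method = O_DIRECT      # bypass the OS page cache for InnoDB data files (avoids double buffering)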
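The ~70% guideline can be turned into a concrete number from /proc/meminfo. The sketch below is only a starting point: the helper name suggest_buffer_pool is made up, and it assumes the host is dedicated to MySQL, so on a shared server a lower percentage is safer.

```shell
# suggest_buffer_pool [PERCENT]
# Reads /proc/meminfo-style text on stdin and prints a size such as "5734M".
suggest_buffer_pool() {
  awk -v pct="${1:-70}" '/^MemTotal:/ { printf "%dM\n", $2 * pct / 100 / 1024 }'
}

# Usage:
# suggest_buffer_pool < /proc/meminfo        # 70% of total RAM
# suggest_buffer_pool 50 < /proc/meminfo     # more conservative on a shared host
```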

3. Redis AOF Sync Too Aggressive

# /etc/redis/redis.conf

# Change from "always" (fsync on every write) to "everysec"
# (fsync once per second; a crash can lose up to about one second of writes)
appendfsync everysec

# Or disable AOF if durability isn't critical
appendonly no

4. Nginx / App Writing Excessive Logs

# Check log growth rate
watch -n 1 'ls -lh /var/log/nginx/'

# Temporarily disable access log
# In nginx.conf:
# access_log off;

# Rotate logs immediately
logrotate -f /etc/logrotate.d/nginx
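Watching ls output shows that a log is growing but not how fast. A small sketch that measures the growth rate in bytes per second follows; the function name log_growth is made up for this example, and the access.log path in the usage line is the common default rather than something stated above.

```shell
# log_growth FILE [SECONDS]
# Prints the average number of bytes appended per second over the interval.
log_growth() {
  file=$1; secs=${2:-5}
  before=$(wc -c < "$file") || return 1
  sleep "$secs"
  after=$(wc -c < "$file") || return 1
  echo $(( (after - before) / secs ))
}

# Usage:
# log_growth /var/log/nginx/access.log 10
```

A negative result means the file was rotated or truncated during the interval.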

5. Swap Thrashing (low RAM)

# Check swap activity
vmstat 1 | awk '{print $7, $8}'   # si=swap in, so=swap out

# If si/so are non-zero continuously, add RAM or reduce memory usage
free -h
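To judge whether si/so are non-zero "continuously" rather than in a single burst, the samples can be counted. The sketch below finds the si and so columns by header name; the function name swap_samples is made up for this example.

```shell
# swap_samples
# Reads `vmstat` output on stdin and reports how many samples had swap traffic.
swap_samples() {
  awk '
    $1 == "r" {                          # column header row
      for (i = 1; i <= NF; i++) {
        if ($i == "si") si = i
        if ($i == "so") so = i
      }
      next
    }
    si && $1 ~ /^[0-9]+$/ { n++; if ($si + $so > 0) busy++ }
    END { printf "%d of %d samples swapping\n", busy, n }
  '
}

# Usage (the first data row is the average since boot):
# vmstat 1 30 | swap_samples
```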

6. Disk Hardware Issues

# Check SMART status
apt-get install -y smartmontools
smartctl -a /dev/sda

# Look for Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
# A non-zero (and especially a growing) raw value is a strong sign the disk is failing
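The attribute check can be scripted. The sketch below assumes the ATA attribute table printed by smartctl -A, where the raw value is the tenth column; NVMe drives report a different set of fields, so it does not apply to them. The function name smart_warnings is made up for this example.

```shell
# smart_warnings
# Reads `smartctl -A` output on stdin and prints watched attributes whose raw value is non-zero.
smart_warnings() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
       $10 + 0 > 0 { print $2 "=" $10 }'
}

# Usage:
# smartctl -A /dev/sda | smart_warnings
```

No output means none of the three attributes has a non-zero raw value.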

I/O Scheduler Tuning

# Check current scheduler
cat /sys/block/sda/queue/scheduler

# For NVMe SSDs: use "none" or "mq-deadline"
echo "none" > /sys/block/nvme0n1/queue/scheduler

# For SATA SSD: use "mq-deadline"
echo "mq-deadline" > /sys/block/sda/queue/scheduler

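# Make persistent (udev rule; DEVTYPE=="disk" skips partitions, which have no scheduler attribute)
echo 'ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="nvme*", ENV{DEVTYPE}=="disk", ATTR{queue/scheduler}="none"' \
  > /etc/udev/rules.d/60-scheduler.rules
udevadm control --reload && udevadm trigger --subsystem-match=block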
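To review every disk at once, the active scheduler (the name shown in square brackets) can be pulled out of each sysfs file. A minimal sketch follows; the helper name active_scheduler is made up for this example.

```shell
# active_scheduler: read a queue/scheduler line on stdin, print the bracketed (active) entry
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# List every block device with its active scheduler
for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  dev=${f#/sys/block/}; dev=${dev%%/*}
  printf '%s %s\n' "$dev" "$(active_scheduler < "$f")"
done
```

Devices whose file has no bracketed entry are listed with an empty scheduler field.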

Limit I/O with cgroups (systemd)

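# /etc/systemd/system/myapp.service.d/io.conf
# (requires cgroup v2; systemd does not accept trailing comments on a setting line)
[Service]
# IOWeight: 1-10000, default 100
IOWeight=50
IOReadBandwidthMax=/dev/sda 50M
IOWriteBandwidthMax=/dev/sda 50M

# Apply the drop-in
systemctl daemon-reload && systemctl restart myapp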

On cloud VPS environments, high iowait can also reflect saturation of network storage (Ceph, NFS-backed volumes) rather than a local disk problem. Check with your provider if disk metrics look fine but iowait is high.
