High I/O Wait - Disk Bottleneck Diagnosis
Identify and fix high iowait on Linux servers: find the responsible process and tune MySQL, Redis, and I/O schedulers.
iowait is the percentage of time the CPU sat idle while waiting for outstanding disk I/O to complete. Sustained high iowait (consistently above 20-30%) usually indicates a disk bottleneck.
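To make the definition concrete, iowait can be estimated by hand from the aggregate cpu line in /proc/stat, where the fifth value counts iowait ticks. The following is a minimal sketch rather than a replacement for top or vmstat. The function names read_cpu_line and iowait_pct are illustrative, and the code assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# Sketch: estimate iowait % between two samples of the "cpu" line in /proc/stat.
# Function names are illustrative, not part of any standard tool.

# Print the aggregate "cpu" line from /proc/stat.
read_cpu_line() {
  grep '^cpu ' /proc/stat
}

# iowait_pct "<first cpu line>" "<second cpu line>"
# Prints the share of elapsed CPU ticks that were accounted as iowait.
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user through steal
      t[NR] = total
      w[NR] = $6                                        # field 6 = iowait ticks
    }
    END {
      dt = t[2] - t[1]
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (w[2] - w[1]) / dt
    }'
}

# Usage (5-second window):
#   a=$(read_cpu_line); sleep 5; b=$(read_cpu_line)
#   echo "iowait: $(iowait_pct "$a" "$b")%"
```

A longer window smooths out short bursts, which matters because the threshold above is about sustained iowait rather than momentary spikes.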
Confirm High iowait
# top: look at the "wa" value in the %Cpu(s) line
top
# vmstat: 10 samples at 1-second intervals (wa column)
vmstat 1 10
# iostat - detailed per-disk statistics
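iostat -x 1 5
Key iostat columns:
%util: disk utilization (near 100% means the device is saturated, although NVMe drives and RAID arrays that serve requests in parallel may still have headroom)
r_await, w_await (a single await column on older sysstat versions): average time per read/write request in ms, including queue time (should be < 10 ms for SSD)
r/s, w/s: read/write operations per second
rkB/s, wkB/s: throughput in KB/s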
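Because the column layout of iostat -x differs between sysstat versions, a filter that finds %util by header name is more robust than a fixed column number. The sketch below assumes the header row starts with "Device" (or "Device:" on older versions). The function name flag_busy_disks and the threshold of 90 are illustrative choices.

```shell
# flag_busy_disks THRESHOLD
# Reads `iostat -x` output on stdin and prints "<device> <%util>" for every
# device line whose %util is at or above THRESHOLD.
flag_busy_disks() {
  awk -v limit="$1" '
    $1 == "Device" || $1 == "Device:" {
      # header row: locate the %util column by name
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    # data rows: skip short lines (blank lines, avg-cpu blocks)
    col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
  '
}

# Usage:
#   LC_ALL=C iostat -x 1 5 | flag_busy_disks 90
```

The first report iostat prints covers averages since boot, so a device that appears only in later reports is busy now rather than historically.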
Find the Responsible Process
# iotop: real-time I/O per process (install if missing)
apt-get install -y iotop
iotop -o # show only processes doing I/O
# pidstat: I/O stats per process
apt-get install -y sysstat
pidstat -d 1 5
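# Find processes with the most cumulative read I/O in /proc (run as root to see all processes)
for io in /proc/[0-9]*/io; do
  reads=$(awk '/^read_bytes/{print $2}' "$io" 2>/dev/null)
  [ -n "$reads" ] || continue   # skip unreadable or already-exited processes
  comm=$(cat "${io%/io}/comm" 2>/dev/null)
  echo "$reads $comm ${io%/io}"
done | sort -rn | head -10
Common Causes and Fixes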
1. Runaway Backup or Cron Job
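# Identify backup processes (-w matches whole words, so "cp" does not match "tcpdump")
ps aux | grep -wE "tar|rsync|mysqldump|gzip|cp" | grep -v grep
# Deprioritize I/O of a running backup
# Note: I/O classes only take effect with schedulers that honor I/O priorities (e.g. bfq)
ionice -c 3 -p <PID> # idle class: only gets disk time when no other process needs it
# Run future backups with low I/O priority
ionice -c 3 nice -n 19 rsync -a /data /backup
2. MySQL Full Table Scans / No Indexes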
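-- Show running queries
SHOW FULL PROCESSLIST;
-- Inspect InnoDB I/O, buffer pool, and pending read/write activity
SHOW ENGINE INNODB STATUS\G
-- Enable the slow query log
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
SET GLOBAL long_query_time = 1;
SET GLOBAL slow_query_log = ON;
-- Also log queries that use no index (candidate full table scans)
SET GLOBAL log_queries_not_using_indexes = ON;
Increase the buffer pool to reduce disk reads: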
# /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
innodb_buffer_pool_size = 2G    # ~70% of RAM on a dedicated database server
innodb_io_capacity = 2000       # IOPS your disk can sustain
innodb_flush_method = O_DIRECT  # bypass OS cache for InnoDB
3. Redis AOF Sync Too Aggressive
# /etc/redis/redis.conf
# Change from "always" (every write) to "everysec"
appendfsync everysec
# Or disable AOF if durability isn't critical
appendonly no
4. Nginx / App Writing Excessive Logs
# Check log growth rate
watch -n 1 'ls -lh /var/log/nginx/'
# Temporarily disable access log
# In nginx.conf:
# access_log off;
# Rotate logs immediately
logrotate -f /etc/logrotate.d/nginx
5. Swap Thrashing (low RAM)
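Swap activity can also be measured from the kernel's cumulative pswpin and pswpout counters in /proc/vmstat, which is convenient in scripts and alerts. This is a minimal sketch. The function names swap_counter and swap_rate are illustrative, and the counts are in pages, not bytes.

```shell
# swap_counter NAME [FILE]
# Prints a cumulative counter (pswpin or pswpout) from FILE, default /proc/vmstat.
swap_counter() {
  awk -v key="$1" '$1 == key { print $2 }' "${2:-/proc/vmstat}"
}

# swap_rate SECONDS
# Prints how many pages were swapped in and out during the interval.
swap_rate() {
  in1=$(swap_counter pswpin);  out1=$(swap_counter pswpout)
  sleep "$1"
  in2=$(swap_counter pswpin);  out2=$(swap_counter pswpout)
  echo "swapped in: $((in2 - in1)) pages, swapped out: $((out2 - out1)) pages"
}

# Usage:
#   swap_rate 10
```

Deltas that stay above zero across several consecutive intervals point to thrashing. A single non-zero interval is usually harmless.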
# Check swap activity
vmstat 1 | awk '{print $7, $8}' # si=swap in, so=swap out
# If si/so are non-zero continuously, add RAM or reduce memory usage
free -h
6. Disk Hardware Issues
# Check SMART status
apt-get install -y smartmontools
smartctl -a /dev/sda
# Look for Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
# A non-zero or growing raw value for any of these suggests the disk is failing
I/O Scheduler Tuning
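Each block device reports its available schedulers in /sys/block/<device>/queue/scheduler, with the active one in square brackets. The sketch below surveys all devices at once before anything is changed. The function names active_scheduler and list_schedulers are illustrative.

```shell
# active_scheduler "<contents of a queue/scheduler file>"
# Prints the bracketed (active) entry, or the input unchanged if nothing is bracketed.
active_scheduler() {
  case "$1" in
    *\[*\]*) printf '%s\n' "$1" | sed 's/.*\[\([^]]*\)\].*/\1/' ;;
    *)       printf '%s\n' "$1" ;;
  esac
}

# list_schedulers
# Prints "<device> <active scheduler>" for every block device that exposes one.
list_schedulers() {
  for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
  done
}

# Usage:
#   list_schedulers
```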
# Check current scheduler
cat /sys/block/sda/queue/scheduler
# For NVMe SSDs: use "none" or "mq-deadline"
echo "none" > /sys/block/nvme0n1/queue/scheduler
# For SATA SSD: use "mq-deadline"
echo "mq-deadline" > /sys/block/sda/queue/scheduler
# Make persistent (udev rule)
echo 'ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/scheduler}="none"' \
  > /etc/udev/rules.d/60-scheduler.rules
Limit I/O with cgroups (systemd)
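The systemd IOWeight and IO*BandwidthMax settings rely on the unified cgroup v2 hierarchy, so it is worth checking which hierarchy the host uses first. The sketch below maps the filesystem type of /sys/fs/cgroup to a label. The function name cgroup_version is illustrative, and treating "tmpfs" as v1 or hybrid is an assumption based on common distribution setups.

```shell
# cgroup_version FSTYPE
# Maps the filesystem type mounted at /sys/fs/cgroup to a cgroup version label.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2" ;;            # unified hierarchy
    tmpfs)     echo "v1 or hybrid" ;;  # assumption: legacy or hybrid layout
    *)         echo "unknown" ;;
  esac
}

# Usage:
#   cgroup_version "$(stat -fc %T /sys/fs/cgroup)"
```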
# /etc/systemd/system/myapp.service.d/io.conf
[Service]
# IOWeight range: 1-10000, default 100
IOWeight=50
IOReadBandwidthMax=/dev/sda 50M
IOWriteBandwidthMax=/dev/sda 50M
# Apply the drop-in
systemctl daemon-reload && systemctl restart myapp
On cloud VPS environments, iowait can also reflect saturation of network storage (Ceph, NFS-backed volumes) rather than local disk issues. Check with your provider if disk metrics look fine but iowait is high.
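One quick check that can be done from inside the server is whether any network filesystems are mounted at all. The sketch below reads /proc/mounts. The function name network_mounts and the list of filesystem types are illustrative assumptions. It only sees filesystems mounted inside the guest, so storage that the provider attaches as a virtual block device will not show up here.

```shell
# network_mounts [MOUNTS_FILE]
# Prints "<fstype> <mountpoint>" for network filesystems listed in MOUNTS_FILE
# (default /proc/mounts). The type list is an assumption; extend it as needed.
network_mounts() {
  awk '$3 ~ /^(nfs|nfs4|ceph|cifs)$/ { print $3, $2 }' "${1:-/proc/mounts}"
}

# Usage:
#   network_mounts
```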