strace and lsof - Process Debugging Tools
Debug stuck processes, trace system calls, find open files/ports, and analyze file descriptors with strace and lsof.
When to Use These Tools
strace: Diagnose what a process is doing at the system call level
- Process appears stuck or frozen
- Application shows unexpected syscall overhead
- Debugging permission issues
- Understanding why a process is slow
lsof: Find what a process has open (files, sockets, pipes)
- Determine which process is using a port
- Find open file handles
- Recover disk space from deleted but still-open files
- Diagnose which files a process accesses
strace: System Call Tracing
Basic Usage
Trace a new command:
strace ls -la /home
Attach to a running process:
strace -p 1234
Common Options
Trace specific syscall categories:
# Network syscalls
strace -e trace=network curl example.com
# File operations
strace -e trace=file ls /tmp
# Open, read, write operations
strace -e trace=open,read,write,openat cat /etc/passwd
# Connection-related syscalls
strace -e trace=network,connect,bind nginx -s reload
Count syscalls by type and time:
# Summary table showing call count, time, and errors
strace -c ls -la /home
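# Follow child processes (threads, forks)
# Note: this traces systemctl itself; the service is started by systemd, not as a child of systemctl
strace -f -c systemctl start nginx
Full syscall output with timing: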
# Trace all syscalls and print unabbreviated argument structures
strace -e trace=all -v command
# Include timestamps for each syscall
strace -t ls /tmp
# Microsecond precision
strace -tt curl example.com
# Show relative time between calls
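# (strace writes to stderr, so send stderr down the pipe and discard cat's own output)
strace -r cat /var/log/syslog 2>&1 >/dev/null | head -20
Practical: Debug Nginx Worker Stuck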
# Find Nginx worker PID
ps aux | grep nginx
# Attach strace to the worker
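# (desc adds descriptor syscalls such as read, write, and epoll_wait)
strace -p 2345 -e trace=network,file,desc
# Output shows the syscall the worker is currently blocked in
# If it is blocked on read/write, check the preceding calls to identify the file or socket
Capture System Call Output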
Write to file for later analysis:
# Output to file
strace -o /tmp/trace.log -f command
# Verbose output with full structs
strace -e trace=all -v -o /tmp/trace.txt command
# Parse the trace file
grep "open\|read\|write" /tmp/trace.log
lsof: List Open Files
Basic Usage
List all open files on the system:
lsof
List files opened by a specific process:
lsof -p 1234
lsof -p 1234,5678 # Multiple PIDs
Find What's Using a Port
Critical for troubleshooting port conflicts:
# What's listening on port 80?
lsof -i :80
# TCP only
lsof -i TCP:443
# UDP
lsof -i UDP:53
# All activity on a port (LISTEN, ESTABLISHED, etc.)
lsof -i :22 -n
Output fields explained:
COMMAND  PID   USER  FD  TYPE  DEVICE  SIZE/OFF  NODE  NAME
nginx    1234  root  6u  IPv4  12345   0t0       TCP   *:80 (LISTEN)
- FD: File descriptor (r=read, w=write, u=read/write, -=unknown)
- TYPE: Socket, file, directory, etc.
- NAME: Remote IP:port or local file path
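The column layout above can be picked apart with awk. The following is a minimal sketch, assuming the default whitespace-separated layout shown above; the helper name lsof_listeners is hypothetical, and it reads fields from the end of the line so that an extra column near the front (some lsof builds add a TID column) does not break it.

```shell
# Hypothetical helper: print "PID COMMAND ADDRESS" for every listening socket
# found in default-format lsof output read from stdin.
lsof_listeners() {
  # Skip the header row; the state is the last field, the address precedes it.
  awk 'NR > 1 && $NF == "(LISTEN)" { print $2, $1, $(NF-1) }'
}

# Typical use (requires lsof; run as root to see every process):
#   lsof -nP -iTCP -sTCP:LISTEN | lsof_listeners
```

Fed the example row above, this would print the PID, the command name, and the *:80 address on one line.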
Find Files Opened by User
# All files opened by www-data user
lsof -u www-data
# Exclude root
lsof -u ^root
# Multiple users
lsof -u www-data,mysql
Find Deleted Files Still Holding Disk Space
A common cause of "disk full" when no large files are visible:
# Find deleted files still open by any process
lsof | grep deleted
# Example output:
# python 2345 user 10w REG 10,20 104857600 /tmp/logfile.txt (deleted)
# Kill the process to free space
kill 2345
Specific user (www-data used as an example):
lsof -u www-data | grep deleted
Network Connections
Monitor active network connections:
# All network connections
lsof -i
# Listen sockets only
lsof -i -sTCP:LISTEN
# Established connections
lsof -i -sTCP:ESTABLISHED
# IPv4 only
lsof -i 4
# IPv6 only
lsof -i 6
File and Directory Monitoring
# What files is a process accessing?
lsof -p 1234
# What process is using /var/log/apache2/access.log?
lsof /var/log/apache2/access.log
# Which processes have /home itself open? (Use +D /home to include everything beneath it.)
lsof /home
# Current working directory only
lsof -p 1234 -a -d cwd
Useful Filter Combinations
# Process using a specific file
lsof /tmp/lockfile.pid
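# Files opened over NFS
lsof -N
# Connections to or from a specific host
lsof -i @192.168.1.100
# All IPv4 connections to remote port 443 (-P keeps port numbers numeric so the grep can match)
lsof -nP -i 4TCP -sTCP:ESTABLISHED | grep -e '->.*:443 '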
# Memory-mapped files
lsof -d mem
ltrace: Library Call Tracing
Complement to strace, traces calls to shared libraries:
# Trace library calls
ltrace ./myapp
# Demangle C++ symbol names
ltrace -C ./myapp
# Show time spent inside each call
ltrace -T ./myapp
# Count library calls
ltrace -c ./myapp
# Trace specific functions (ltrace 0.7+ joins names with "+"; older releases used commas)
ltrace -e malloc+free ./myapp
/proc Filesystem: Deep Inspection
For processes already running, examine /proc/PID/ directly:
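As a quick first look before opening the individual files, a few entries identify what a PID actually is. This is a minimal sketch, assuming a Linux /proc; the helper name proc_summary is hypothetical, and it defaults to the current shell so it can be tried without picking a PID.

```shell
# Hypothetical helper: print identifying /proc entries for a PID
# (defaults to the shell running this script).
proc_summary() {
  pid="${1:-$$}"
  # cmdline is NUL-separated; turn the NULs into spaces for display.
  printf 'cmdline: %s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
  # exe and cwd are symlinks to the binary and the working directory.
  printf 'exe:     %s\n' "$(readlink "/proc/$pid/exe")"
  printf 'cwd:     %s\n' "$(readlink "/proc/$pid/cwd")"
}

proc_summary
```

Reading another user's process this way generally requires root, because the exe and cwd links are not readable otherwise.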
/proc/PID/fd - File Descriptors
# List all open file descriptors for PID 1234
ls -l /proc/1234/fd/
# 0=stdin, 1=stdout, 2=stderr, 3+=custom
# Example:
# lrwx------ 1 root root 64 Mar 29 10:00 0 -> /dev/pts/1
# l-wx------ 1 root root 64 Mar 29 10:00 1 -> /var/log/app.log
# l-wx------ 1 root root 64 Mar 29 10:00 2 -> /var/log/app.log
# lrwx------ 1 root root 64 Mar 29 10:00 3 -> socket:[12345]
# Check file descriptor limits
grep 'Max open files' /proc/1234/limits
/proc/PID/maps - Memory Map
See what libraries and memory regions are loaded:
cat /proc/1234/maps
# Output shows:
# 7f1234560000-7f1234600000 r-xp 00000000 08:01 1234567 /lib64/libc.so.6
# start-end                 perms offset  dev   inode   pathname
/proc/PID/status - Process Details
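Summary of process state, memory usage, and file descriptor table size:
cat /proc/1234/status
# Key fields:
# VmPeak: Peak virtual memory size
# VmRSS: Resident set size
# FDSize: Number of file descriptor slots currently allocated (not the count of open descriptors)
# VmLck: Locked memory
Real-World Scenarios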
Scenario 1: Find What's Holding a Port
A service won't start because port 8080 is in use:
# Identify the process
lsof -i :8080
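# Stop the listener gracefully (-sTCP:LISTEN avoids also killing clients connected to the port)
kill $(lsof -t -i TCP:8080 -sTCP:LISTEN)
# Force kill only if it ignores SIGTERM
kill -9 $(lsof -t -i TCP:8080 -sTCP:LISTEN)
Scenario 2: MySQL Connection Leak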
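MySQL is running out of connections but nothing obvious shows in netstat:
# List all MySQL connections (pgrep -d, joins multiple PIDs with commas, as lsof -p expects)
lsof -nP -a -p "$(pgrep -d, mysql)" -iTCP
# Count by state (the state is the last field of each row)
lsof -nP -a -p "$(pgrep -d, mysql)" -iTCP | awk 'NR > 1 {print $NF}' | sort | uniq -c
# Find connections stuck in CLOSE_WAIT (closed by the peer but not yet by MySQL)
lsof -nP -a -p "$(pgrep -d, mysql)" -iTCP -sTCP:CLOSE_WAIT
Scenario 3: Debug Slow PHP-FPM Process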
A worker is consuming CPU:
# Find the PID
ps aux | grep php-fpm
# Trace system calls
strace -p 2345 -e trace=file,network
# See what files it's accessing
lsof -p 2345 | grep -E "\.php|\.so"
# Check file descriptor count
ls -1 /proc/2345/fd | wc -l
Scenario 4: Recover Disk Space
Disk full but no large files visible:
# Find deleted files still open
lsof | grep deleted | head -20
# Check total size in bytes (a file held open by several descriptors is counted once per descriptor)
lsof | grep deleted | awk '{print $7}' | paste -sd+ - | bc
# Identify the process
lsof | grep deleted | awk '{print $2}' | sort | uniq
# Restart the service
systemctl restart service-name
Performance Note: strace and lsof have overhead. For high-traffic processes, use them briefly for diagnosis, not continuous monitoring. Use watch or the systemd journal for long-term observation.
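One way to keep a tracing session short is to put a time limit on it. The sketch below assumes GNU coreutils timeout and strace are installed; the helper name trace_briefly and the 10-second default are hypothetical choices, not part of either tool.

```shell
# Hypothetical helper: attach strace to a PID for a bounded number of seconds,
# then print the syscall summary table.
trace_briefly() {
  pid="$1"
  seconds="${2:-10}"
  # timeout sends SIGINT (like Ctrl-C), so strace detaches from the process
  # and prints its -c summary instead of being killed outright.
  timeout -s INT "$seconds" strace -c -f -p "$pid"
}

# Usage: trace_briefly 2345 5
```

timeout exits with status 124 when the limit is reached, which is the expected outcome here and not an error in the trace itself.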
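File Descriptor Exhaustion: Compare the limit in /proc/PID/limits with the number of entries in /proc/PID/fd; FDSize in /proc/PID/status only reports allocated table slots, not open descriptors. Default limits are often 1024 or 65536. Increase with ulimit -n or systemd LimitNOFILE.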
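The comparison described above can be scripted directly against /proc. This is a minimal sketch, assuming a Linux /proc; the helper name fd_usage and its output format are hypothetical.

```shell
# Hypothetical helper: report open descriptors against the soft limit for a PID
# (defaults to the shell running this script).
fd_usage() {
  pid="${1:-$$}"
  # Each entry in /proc/PID/fd is one open descriptor. When inspecting the
  # current shell, the count may include a pipe used by this very command.
  open=$(ls "/proc/$pid/fd" | wc -l)
  # The "Max open files" row lists: soft limit, hard limit, units.
  soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "pid=$pid open=$open soft_limit=$soft"
}

fd_usage
```

A process whose open count stays close to its soft limit is a candidate for a descriptor leak or for a higher LimitNOFILE setting.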