strace and lsof - Process Debugging Tools

Debug stuck processes, trace system calls, find open files/ports, and analyze file descriptors with strace and lsof.

When to Use These Tools

strace: Diagnose what a process is doing at the system call level

  • Process appears stuck or frozen
  • Application making an unexpectedly large number of system calls
  • Debugging permission issues
  • Understanding why a process is slow

lsof: Find what a process has open (files, sockets, pipes)

  • Determine which process is using a port
  • Find open file handles
  • Recover disk space from deleted but still-open files
  • Diagnose which files a process accesses

strace: System Call Tracing

Basic Usage

Trace a new command:

strace ls -la /home

Attach to a running process:

strace -p 1234

Common Options

Trace specific syscall categories:

# Network syscalls
strace -e trace=network curl example.com

# File operations
strace -e trace=file ls /tmp

# Open, read, write operations
strace -e trace=open,read,write,openat cat /etc/passwd

# Connection-related syscalls
strace -e trace=network,connect,bind nginx -s reload

Count syscalls by type and time:

# Summary table showing call count, time, and errors
strace -c ls -la /home

# Follow child processes and threads so the summary covers the whole process tree
strace -f -c sh -c 'ls /tmp | wc -l'

Full syscall output with timing:

# Verbose mode: print structures and environment variables unabbreviated
strace -e trace=all -v command

# Include timestamps for each syscall
strace -t ls /tmp

# Microsecond precision
strace -tt curl example.com

# Show relative time between calls (strace writes to stderr, so send that into the pipe)
strace -r cat /var/log/syslog 2>&1 >/dev/null | head -20
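Timestamps show when a call started, not how long it took. strace also has a -T option (not used in the examples above) that appends the time spent inside each call as <seconds> at the end of the line. The following is a minimal sketch for picking slow calls out of such a trace. The function name slow_calls, the 0.1 second threshold, and the sample lines are illustrative assumptions.

```shell
# slow_calls THRESHOLD FILE
# Print trace lines whose duration (the trailing <seconds> that strace -T
# appends) is at least THRESHOLD seconds.
slow_calls() {
    awk -v min="$1" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # Drop the angle brackets to get the duration as a number.
            dur = substr($0, RSTART + 1, RLENGTH - 2)
            if (dur + 0 >= min + 0) print
        }
    ' "$2"
}

# Hypothetical sample of `strace -T` output, for demonstration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3 <0.000021>
read(3, "root:x:0:0:root:/root:/bin/bash\n", 4096) = 32 <0.000009>
connect(4, {sa_family=AF_INET, sin_port=htons(3306)}, 16) = 0 <0.251340>
EOF

slow_calls 0.1 "$sample"   # prints only the connect line
rm -f "$sample"
```

Against a real process, the input would come from something like strace -T -o /tmp/trace.log -p 1234, followed by slow_calls 0.1 /tmp/trace.log.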

Practical: Debug Nginx Worker Stuck

# Find Nginx worker PID
ps aux | grep nginx

# Attach strace to the worker
strace -p 2345 -e trace=network,file

# A stuck worker usually shows one unfinished call (for example epoll_wait or read) that never returns
# If it is blocked on read/write, note the file descriptor number and look it up with lsof -p 2345

Capture System Call Output

Write to file for later analysis:

# Output to file
strace -o /tmp/trace.log -f command

# Verbose output with full structs
strace -e trace=all -v -o /tmp/trace.txt command

# Parse the trace file
grep "open\|read\|write" /tmp/trace.log
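Beyond grepping for individual calls, a saved trace can be tallied by syscall name to see where the process spends its calls. This is a rough sketch that assumes the file was written with -o (so each line starts with the syscall name, optionally preceded by a PID when -f is used). The function name count_syscalls and the sample lines are made up for illustration.

```shell
# count_syscalls FILE
# Tally syscall names in a trace written with `strace -o FILE [-f] ...`.
count_syscalls() {
    # Strip an optional leading PID, keep the name before "(", then count.
    # Signal lines (---), exit lines (+++) and "<... resumed>" lines are skipped.
    sed -n -E 's/^([0-9]+ +)?([a-z_0-9]+)\(.*/\2/p' "$1" |
        sort | uniq -c | sort -rn
}

# Hypothetical trace excerpt, for demonstration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1234 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
1234 read(3, "root:x:0:0:root:/root:/bin/bash\n", 4096) = 32
1234 read(3, "", 4096) = 0
1234 close(3) = 0
1234 +++ exited with 0 +++
EOF

count_syscalls "$sample"
rm -f "$sample"
```

Unlike strace -c, this works on a trace that has already been captured, so the same log can serve both the summary and the line-by-line inspection.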

lsof: List Open Files

Basic Usage

List all open files on the system:

lsof

List files opened by a specific process:

lsof -p 1234
lsof -p 1234,5678  # Multiple PIDs

Find What's Using a Port

Critical for troubleshooting port conflicts:

# What's using port 80 (listeners and active connections)?
lsof -i :80

# TCP only
lsof -i TCP:443

# UDP
lsof -i UDP:53

# All activity on a port (LISTEN, ESTABLISHED, etc.); -n skips hostname lookups
lsof -i :22 -n

Output fields explained:

COMMAND    PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
nginx    1234   root    6u  IPv4  12345      0t0  TCP *:80 (LISTEN)
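  • FD: File descriptor number followed by the access mode (r=read, w=write, u=read/write), or a label such as cwd, txt, or mem
  • TYPE: Kind of file (REG=regular file, DIR=directory, IPv4/IPv6=socket, FIFO=pipe, etc.)
  • NAME: Local file path, or for sockets the local address:port (followed by ->remote address:port for connections)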
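The column layout above is meant for people and is awkward to parse in scripts. lsof also offers a field output mode (-F) that prints one field per line, each prefixed by a letter: p for PID, c for command, f for file descriptor, n for name. The sketch below turns that stream back into tab-separated rows. The function name lsof_fields_to_table is an assumption, and the sample input is hand-written to mimic lsof -F pcfn rather than captured from a real system.

```shell
# lsof_fields_to_table
# Read `lsof -F pcfn` output on stdin and print PID, command, FD and name
# as one tab-separated row per open file.
lsof_fields_to_table() {
    awk '
        /^p/ { pid = substr($0, 2) }   # new process record
        /^c/ { cmd = substr($0, 2) }   # command name of that process
        /^f/ { fd  = substr($0, 2) }   # new file record within the process
        /^n/ { printf "%s\t%s\t%s\t%s\n", pid, cmd, fd, substr($0, 2) }
    '
}

# Hand-written sample in -F pcfn format, for demonstration only.
printf '%s\n' p1234 cnginx f6 'n*:80' f7 'n*:443' | lsof_fields_to_table
```

On a live system the input would come from a command such as lsof -F pcfn -i :80 piped into the function.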

Find Files Opened by User

# All files opened by www-data user
lsof -u www-data

# Exclude root
lsof -u ^root

# Multiple users
lsof -u www-data,mysql

Find Deleted Files Still Holding Disk Space

A common cause of "disk full" when no large files are visible:

# Find deleted files still open by any process
lsof | grep deleted

# Example output:
# python   2345  user   10w  REG 10,20  104857600 /tmp/logfile.txt (deleted)

# Kill the process to free space
kill 2345

Specific user:

lsof -u www-data | grep deleted

Network Connections

Monitor active network connections:

# All network connections
lsof -i

# Listen sockets only
lsof -i -sTCP:LISTEN

# Established connections
lsof -i -sTCP:ESTABLISHED

# IPv4 only
lsof -i 4

# IPv6 only
lsof -i 6

File and Directory Monitoring

# What files is a process accessing?
lsof -p 1234

# What process is using /var/log/apache2/access.log?
lsof /var/log/apache2/access.log

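# What has the /home directory itself open (if /home is a mount point, any open file on that filesystem)?
lsof /home

# Open files anywhere under /home (recursive, can be slow)
lsof +D /home

# Current working directory of a process only
lsof -p 1234 -a -d cwd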

Useful Filter Combinations

# Process using a specific file
lsof /tmp/lockfile.pid

# Connections to or from a specific host
lsof -i @192.168.1.100

# Open files on NFS mounts
lsof -N

# Established IPv4 TCP connections involving port 443 (-n -P keep hosts and ports numeric)
lsof -nP -i4TCP:443 -sTCP:ESTABLISHED

# Memory-mapped files
lsof -d mem

ltrace: Library Call Tracing

Complement to strace, traces calls to shared libraries:

# Trace library calls
ltrace ./myapp

# Demangle C++ symbol names
ltrace -C ./myapp

# Count library calls
ltrace -c ./myapp

# Trace specific functions (older ltrace releases take a comma-separated list instead: -e malloc,free)
ltrace -e malloc+free ./myapp

/proc Filesystem: Deep Inspection

For processes already running, examine /proc/PID/ directly:

/proc/PID/fd - File Descriptors

# List all open file descriptors for PID 1234
ls -l /proc/1234/fd/

# 0=stdin, 1=stdout, 2=stderr, 3+=custom
# Example:
# lrwx------ 1 root root 64 Mar 29 10:00 0 -> /dev/pts/1
# l-wx------ 1 root root 64 Mar 29 10:00 1 -> /var/log/app.log
# l-wx------ 1 root root 64 Mar 29 10:00 2 -> /var/log/app.log
# lrwx------ 1 root root 64 Mar 29 10:00 3 -> socket:[12345]

# Check file descriptor limits
cat /proc/1234/limits | grep files
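When a process holds many descriptors, a breakdown by kind (sockets, pipes, regular files) is often more telling than the raw listing. The sketch below classifies the symlink targets in a descriptor directory. The function name fd_types and its category labels are illustrative choices, not part of any standard tool.

```shell
# fd_types DIR
# Classify every symlink in DIR (for example /proc/1234/fd) by its target
# and print a count per kind.
fd_types() {
    for link in "$1"/*; do
        # Skip entries that vanish between listing and reading.
        target=$(readlink "$link") || continue
        case "$target" in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;   # epoll, eventfd, timerfd, ...
            /*)           echo file ;;         # includes "(deleted)" files
            *)            echo other ;;
        esac
    done | sort | uniq -c
}

# Inspect the current shell as a harmless demonstration.
fd_types "/proc/$$/fd"
```

A count of sockets or pipes that keeps growing between runs is a common sign of a descriptor leak.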

/proc/PID/maps - Memory Map

See what libraries and memory regions are loaded:

cat /proc/1234/maps

# Output shows:
# 7f1234560000-7f1234600000 r-xp 00000000 08:01 1234567 /lib64/libc.so.6
# base-end perms offset dev ino filename
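Because each library appears once per mapped segment, the raw maps output is repetitive. A short sketch for reducing it to the distinct shared libraries follows. It assumes paths without spaces (the path is taken from the sixth column), and the function name mapped_libs and the sample file are hypothetical.

```shell
# mapped_libs FILE
# Print the distinct shared libraries listed in a maps file
# (for example /proc/1234/maps).
mapped_libs() {
    # Column 6 is the backing file; keep names ending in .so or .so.N[.N...]
    awk '$6 ~ /\.so(\.[0-9]+)*$/ { print $6 }' "$1" | sort -u
}

# Hypothetical maps excerpt, for demonstration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
55d0c0000000-55d0c0001000 r-xp 00000000 08:01 111 /usr/bin/app
7f1234560000-7f1234600000 r-xp 00000000 08:01 1234567 /lib64/libc.so.6
7f1234600000-7f1234601000 rw-p 000a0000 08:01 1234567 /lib64/libc.so.6
7ffd00000000-7ffd00021000 rw-p 00000000 00:00 0 [stack]
EOF

mapped_libs "$sample"   # prints /lib64/libc.so.6
rm -f "$sample"
```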

/proc/PID/status - Process Details

Summary of process state, memory usage, and scheduling details:

cat /proc/1234/status

# Key fields:
# VmPeak: Peak memory
# VmRSS: Resident set size
# FDSize: Number of file descriptor slots currently allocated (not the number of open descriptors)
# VmLck: Locked memory
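For scripts or repeated checks it helps to pull a single field out of the status file instead of reading the whole thing. The following is a small sketch. The function name status_field is an assumption, and the sample file only imitates the "Key:<tab>value" layout of /proc/PID/status.

```shell
# status_field KEY FILE
# Print the value of one field from a status file (for example /proc/1234/status).
status_field() {
    # Fields look like "VmRSS:<tab>    5120 kB"; split on the colon and
    # any whitespace that follows it.
    awk -F':[ \t]*' -v key="$1" '$1 == key { print $2 }' "$2"
}

# Hypothetical status excerpt, for demonstration only.
sample=$(mktemp)
printf 'Name:\tnginx\nVmPeak:\t   20480 kB\nVmRSS:\t    5120 kB\nFDSize:\t64\n' > "$sample"

status_field VmRSS "$sample"    # prints 5120 kB
status_field FDSize "$sample"   # prints 64
rm -f "$sample"
```

For a live process the call would be status_field VmRSS /proc/1234/status.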

Real-World Scenarios

Scenario 1: Find What's Holding a Port

A service won't start because port 8080 is in use:

# Identify the process
lsof -i :8080

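# Ask it to exit (SIGTERM); -sTCP:LISTEN matches the listener only, not clients connected to the port
kill $(lsof -t -iTCP:8080 -sTCP:LISTEN)

# Last resort if it ignores SIGTERM
kill -9 $(lsof -t -iTCP:8080 -sTCP:LISTEN)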
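One way to combine the two kill commands is to send SIGTERM first and escalate to SIGKILL only if the process is still alive after a grace period. The sketch below shows that pattern. The function name stop_gracefully and the default 5 second timeout are assumptions chosen for illustration.

```shell
# stop_gracefully PID [TIMEOUT_SECONDS]
# Send SIGTERM, wait up to TIMEOUT_SECONDS for the process to exit,
# then fall back to SIGKILL.
stop_gracefully() {
    local pid=$1 timeout=${2:-5}

    # If the signal cannot be delivered, the process is already gone.
    kill "$pid" 2>/dev/null || return 0

    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    # Still alive after the grace period: force it.
    kill -0 "$pid" 2>/dev/null || return 0
    kill -9 "$pid" 2>/dev/null
}

# Demonstration against a throwaway background process.
sleep 60 &
stop_gracefully "$!" 3
```

In the port scenario the PID would come from lsof -t, for example stop_gracefully "$(lsof -t -iTCP:8080 -sTCP:LISTEN)" when exactly one process is listening.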

Scenario 2: MySQL Connection Leak

MySQL running out of connections but nothing obvious in netstat:

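# List all MySQL TCP connections (-a ANDs the filters; pgrep -d, joins multiple PIDs with commas)
lsof -a -p "$(pgrep -d, mysql)" -iTCP -n -P

# Count by state (the state is the last column, e.g. "(ESTABLISHED)")
lsof -a -p "$(pgrep -d, mysql)" -iTCP -n -P | awk 'NR>1 {print $NF}' | sort | uniq -c

# Find connections stuck in CLOSE_WAIT (the peer has closed, the application has not)
lsof -a -p "$(pgrep -d, mysql)" -iTCP -n -P | grep CLOSE_WAIT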

Scenario 3: Debug Slow PHP-FPM Process

A worker is consuming CPU:

# Find the PID
ps aux | grep php-fpm

# Trace system calls
strace -p 2345 -e trace=file,network

# See what files it's accessing
lsof -p 2345 | grep -E "\.php|\.so"

# Check file descriptor count
ls -1 /proc/2345/fd | wc -l

Scenario 4: Recover Disk Space

Disk full but no large files visible:

# Find deleted files still open
lsof | grep deleted | head -20

# Total size in bytes (SIZE/OFF is column 7 in the default layout)
lsof | grep deleted | awk '{print $7}' | paste -sd+ - | bc

# Identify the process
lsof | grep deleted | awk '{print $2}' | sort | uniq

# Restart the service
systemctl restart service-name
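The size and process steps can be merged so that each PID is shown with the total space it is holding, which makes it obvious which service to restart first. This is a sketch only. It assumes the default lsof column layout (PID in column 2, SIZE/OFF in column 7) and command names without spaces. The function name deleted_by_pid and the sample lines are hypothetical.

```shell
# deleted_by_pid
# Read lsof output on stdin and print PID, command and total bytes held in
# deleted-but-open files, largest first.
deleted_by_pid() {
    awk '
        /\(deleted\)$/ { bytes[$2] += $7; name[$2] = $1 }
        END { for (p in bytes) printf "%s\t%s\t%.0f\n", p, name[p], bytes[p] }
    ' | sort -t "$(printf '\t')" -k3,3nr
}

# Hand-written sample in lsof's default layout, for demonstration only.
printf '%s\n' \
  'python 2345 user 10w REG 8,1 104857600 131090 /tmp/logfile.txt (deleted)' \
  'python 2345 user 11w REG 8,1 52428800 131091 /tmp/debug.txt (deleted)' \
  'nginx 999 root 5w REG 8,1 4096 131092 /var/log/nginx/old.log (deleted)' |
    deleted_by_pid
```

On a live system the input would be lsof itself: lsof | deleted_by_pid.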

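Performance Note: strace slows the traced process noticeably because every system call is intercepted, and a full lsof scan is expensive on busy hosts. For high-traffic processes, use them briefly for diagnosis, not continuous monitoring. For long-term observation, use periodic sampling with watch or the systemd journal.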

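File Descriptor Exhaustion: Compare the number of entries in /proc/PID/fd with "Max open files" in /proc/PID/limits. FDSize in /proc/PID/status reports allocated slots, not descriptors in use. Default limits are often 1024 or 65536. Increase them with ulimit -n for a shell session or LimitNOFILE= in the systemd unit.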
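A quick way to see how close a process is to its limit is to count its descriptors and compare that with the soft limit. The sketch below does this for any descriptor directory and limits file. The function name fd_headroom and the sample data are assumptions, and the sample limits file only imitates the layout of /proc/PID/limits.

```shell
# fd_headroom FD_DIR LIMITS_FILE
# Print "open/soft-limit (percent used)" for a process, given its fd
# directory and its limits file.
fd_headroom() {
    local open soft
    open=$(ls -1 "$1" | wc -l | tr -d ' ')
    # The soft limit is the fourth whitespace-separated field of the row.
    soft=$(awk '/^Max open files/ { print $4 }' "$2")
    printf '%s/%s (%s%%)\n' "$open" "$soft" "$((100 * open / soft))"
}

# Hypothetical stand-ins for /proc/PID/fd and /proc/PID/limits.
fds=$(mktemp -d)
touch "$fds/0" "$fds/1" "$fds/2"
limits=$(mktemp)
cat > "$limits" <<'EOF'
Limit                     Soft Limit           Hard Limit           Units
Max open files            1024                 524288               files
EOF

fd_headroom "$fds" "$limits"   # prints 3/1024 (0%)
rm -rf "$fds" "$limits"
```

For a live process the call would be fd_headroom /proc/1234/fd /proc/1234/limits, run as the process owner or root so the fd directory is readable.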