Author Archive: Lee Latham

November 11, 2013

Sysadmin Tips and Tricks - Using the ‘for’ Loop in Bash

Ever have a bunch of files to rename or a large set of files to move to different directories? Ever find yourself copy/pasting nearly identical commands a few hundred times to get a job done? A system administrator's life is full of tedious tasks that can be eliminated or simplified with the proper tools. That's right ... Those tedious tasks don't have to be executed manually! I'd like to introduce you to one of the simplest tools to automate time-consuming repetitive processes in Bash — the for loop.

Whether you have been programming for a few weeks or a few decades, you should be able to quickly pick up on how the for loop works and what it can do for you. To get started, let's take a look at a few simple examples of what the for loop looks like. For these exercises, it's always best to use a temporary directory while you're learning and practicing for loops. The command is very powerful, and we wouldn't want you to damage your system while you're still learning.

Here is our temporary directory:

rasto@lmlatham:~/temp$ ls -la
total 8
drwxr-xr-x 2 rasto rasto 4096 Oct 23 15:54 .
drwxr-xr-x 34 rasto rasto 4096 Oct 23 16:00 ..
rasto@lmlatham:~/temp$

We want to fill the directory with files, so let's use the for loop:

rasto@lmlatham:~/temp$ for cats_are_cool in {a..z}; do touch $cats_are_cool; done;
rasto@lmlatham:~/temp$

Note: This should be typed all in one line.

Here's the result:

rasto@lmlatham:~/temp$ ls -l
total 0
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 a
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 b
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 c
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 d
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 e
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 f
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 g
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 h
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 i
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 j
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 k
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 l
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 m
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 n
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 o
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 p
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 q
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 r
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 s
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 t
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 u
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 v
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 w
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 x
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 y
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 z
rasto@lmlatham:~/temp$

How did that simple command populate the directory with all of the letters in the alphabet? Let's break it down.

for cats_are_cool in {a..z}

The for is the command we are running, which is built into the Bash shell. cats_are_cool is a variable we are declaring. The specific name of the variable can be whatever you want it to be. Traditionally people often use f, but the variable we're using is a little more fun. Hereafter, our variable will be referred to as $cats_are_cool (or $f if you used the more boring "f" variable). Aside: You may be familiar with declaring a variable without the $ sign, and then using the $sign to invoke it when declaring environment variables.

When our command is executed, the variable we declared in {a..z}, will assume each of the values of a to z. Next, we use the semicolon to indicate we are done with the first phase of our for loop. The next part starts with do, which say for each of a–z, do <some thing>. In this case, we are creating files by touching them via touch $cats_are_cool. The first time through the loop, the command creates a, the second time through b and so forth. We complete that command with a semicolon, then we declare we are finished with the loop with "done".

This might be a great time to experiment with the command above, making small changes, if you wish. Let's do a little more. I just realized that I made a mistake. I meant to give the files a .txt extension. This is how we'd make that happen:

for dogs_are_ok_too in {a..z}; do mv $dogs_are_ok_too $dogs_are_ok_too.txt; done;
Note: It would be perfectly okay to re-use $cats_are_cool here. The variables are not persistent between executions.

As you can see, I updated the command so that a would be renamed a.txt, b would be renamed b.txt and so forth. Why would I want to do that manually, 26 times? If we check our directory, we see that everything was completed in that single command:

rasto@lmlatham:~/temp$ ls -l
total 0
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 a.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 b.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 c.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 d.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 e.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 f.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 g.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 h.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 i.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 j.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 k.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 l.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 m.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 n.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 o.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 p.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 q.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 r.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 s.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 t.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 u.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 v.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 w.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 x.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 y.txt
-rw-rw-r-- 1 rasto rasto 0 Oct 23 16:13 z.txt
rasto@lmlatham:~/temp$

Now we have files, but we don't want them to be empty. Let's put some text in them:

for f in `ls`; do cat /etc/passwd > $f; done

Note the backticks around ls. In Bash, backticks mean, "execute this and return the results," so it's like you executed ls and fed the results to the for loop! Next, cat /etc/passwd is redirecting the results to $f, in filenames a.txt, b.txt, etc. Still with me?

So now I've got a bunch of files with copies of /etc/passwd in them. What if I never wanted files for a, g, or h? First, I'd get a list of just the files I want to get rid of:

rasto@lmlatham:~/temp$ ls | egrep 'a|g|h'
a.txt
g.txt
h.txt

Then I could plug that command into the for loop (using backticks again) and do the removal of those files:

for f in `ls | egrep 'a|g|h'`; do rm $f; done

I know these examples don't seem very complex, but they give you a great first-look at the kind of functionality made possible by the for loop in Bash. Give it a whirl. Once you start smartly incorporating it in your day-to-day operations, you'll save yourself massive amounts of time ... Especially when you come across thousands or tens of thousands of very similar tasks.

Don't do work a computer should do!

-Lee

September 16, 2013

Sysadmin Tips and Tricks - Using strace to Monitor System Calls

Linux admins often encounter rogue processes that die without explanation, go haywire without any meaningful log data or fail in other interesting ways without providing useful information that can help troubleshoot the problem. Have you ever wished you could just see what the program is trying to do behind the scenes? Well, you can — strace (system trace) is very often the answer. It is included with most distros' package managers, and the syntax should be pretty much identical on any Linux platform.

First, let's get rid of a misconception: strace is not a "debugger," and it isn't a programmer's tool. It's a system administrator's tool for monitoring system calls and signals. It doesn't involve any sophisticated configurations, and you don't have to learn any new commands ... In fact, the most common uses of strace involve the bash commands you learned the early on:

  • read
  • write
  • open
  • close
  • stat
  • fork
  • execute (execve)
  • chmod
  • chown

 

You simply "attach" strace to the process, and it will display all the system calls and signals resulting from that process. Instead of executing the command's built-in logic, strace just makes the process's normal calls to the system and returns the results of the command with any errors it encountered. And that's where the magic lies.

Let's look an example to show that behavior in action. First, become root — you'll need to be root for strace to function properly. Second, make a simple text file called 'test.txt' with these two lines in it:

# cat test.txt
Hi I'm a text file
there are only these two lines in me.

Now, let's execute the cat again via strace:

$ strace cat test.txt 
execve("/bin/cat", ["cat", "test.txt"], [/* 22 vars */]) = 0
brk(0)  = 0x9b7b000
uname({sys="Linux", node="ip-208-109-127-49.ip.secureserver.net", ...}) = 0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=30671, ...}) = 0
mmap2(NULL, 30671, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f35000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000_\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1594552, ...}) = 0
mmap2(NULL, 1320356, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7df2000
mmap2(0xb7f2f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13c) = 0xb7f2f000
mmap2(0xb7f32000, 9636, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f32000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7df1000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7df0000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7df1b80, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7f2f000, 8192, PROT_READ) = 0
mprotect(0xb7f57000, 4096, PROT_READ) = 0
munmap(0xb7f35000, 30671) = 0
brk(0)  = 0x9b7b000
brk(0x9b9c000) = 0x9b9c000
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
open("test.txt", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=57, ...}) = 0
read(3, "Hi I'm a text file\nthere are onl"..., 4096) = 57
write(1, "Hi I'm a text file\nthere are onl"..., 57Hi I’m a text file
there are only these two lines in me.
) = 57
read(3, "", 4096) = 0
close(3) = 0
close(1) = 0
exit_group(0) = ?

Now that return may look really arcane, but if you study it a little bit, you'll see that it includes lots of information that even an ordinary admin can easily understand. The first line returned includes the execve system call where we'd execute /bin/cat with the parameter of test.txt. After that, you'll see the cat binary attempt to open some system libraries, and the brk and mmap2 calls to allocate memory. That stuff isn't usually particularly useful in the context we're working in here, but it's important to understand what's going on. What we're most interested in are often open calls:

open("test.txt", O_RDONLY|O_LARGEFILE) = 3

It looks like when we run cat test.txt, it will be opening "test.txt", doesn't it? In this situation, that information is not very surprising, but imagine if you are in a situation were you don't know what files a given file is trying to open ... strace immediately makes life easier. In this particular example, you'll see that "= 3" at the end, which is a temporary sort of "handle" for this particular file within the strace output. If you see a "read" call with '3' as the first parameter after this, you know it's reading from that file:

read(3, "Hi I'm a text file\nthere are onl"..., 4096) = 57

Pretty interesting, huh? strace defaults to just showing the first 32 or so characters in a read, but it also lets us know that there are 57 characters (including special characters) in the file! After the text is read into memory, we see it writing it to the screen, and delivering the actual output of the text file. Now that's a relatively simplified example, but it helps us understand what's going on behind the scenes.

Real World Example: Finding Log Files

Let's look at a real world example where we'll use strace for a specific purpose: You can't figure out where your Apache logs are being written, and you're too lazy to read the config file (or perhaps you can't find it). Wouldn't it be nice to follow everything Apache is doing when it starts up, including opening all its log files? Well you can:

strace -Ff -o output.txt -e open /etc/init.d/httpd restart

We are executing strace and telling it to follow all forks (-Ff), but this time we'll output to a file (-o output.txt) and only look for 'open' system calls to keep some of the chaff out of the output (-e open), and execute '/etc/init.d/httpd restart'. This will create a file called "output.txt" which we can use to find references to our log files:

#cat output.txt | grep log
[pid 13595] open("/etc/httpd/modules/mod_log_config.so", O_RDONLY) = 4
[pid 13595] open("/etc/httpd/modules/mod_logio.so", O_RDONLY) = 4
[pid 13595] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 10
[pid 13595] open("/etc/httpd/logs/ssl_error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 11
[pid 13595] open("/etc/httpd/logs/access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 12
[pid 13595] open("/etc/httpd/logs/cm4msaa7.com", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 13
[pid 13595] open("/etc/httpd/logs/ssl_access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 14
[pid 13595] open("/etc/httpd/logs/ssl_request_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 15
[pid 13595] open("/etc/httpd/modules/mod_log_config.so", O_RDONLY) = 9
[pid 13595] open("/etc/httpd/modules/mod_logio.so", O_RDONLY) = 9
[pid 13596] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 10
[pid 13596] open("/etc/httpd/logs/ssl_error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 11
open("/etc/httpd/logs/access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 12
open("/etc/httpd/logs/cm4msaa7.com", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 13
open("/etc/httpd/logs/ssl_access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 14
open("/etc/httpd/logs/ssl_request_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 15

The log files jump out at you don't they? Because we know that Apache will want to open its log files when it starts, all we have to do is we follow all the system calls it makes when it starts, and we'll find all of those files. Easy, right?

Real World Example: Locating Errors and Failures

Another valuable use of strace involves looking for errors. If a program fails when it makes a system call, you'll want to be able pinpoint any errors that might have caused that failure as you troubleshoot. In all cases where a system call fails, strace will return a line with "= -1" in the output, followed by an explanation. Note: The space before -1 is very important, and you'll see why in a moment.

For this example, let's say Apache isn't starting for some reason, and the logs aren't telling ua anything about why. Let's run strace:

strace -Ff -o output.txt -e open /etc/init.d/httpd start

Apache will attempt to restart, and when it fails, we can grep our output.txt for '= -1' to see any system calls that failed:

$ cat output.txt | grep '= -1'
[pid 13748] open("/etc/selinux/config", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libnsl.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/etc/gai.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)

With experience, you'll come to understand which errors matter and which ones don't. Most often, the last error is the most significant. The first few lines show the program trying different libraries to see if they are available, so they don't really matter to us in our pursuit of what's going wrong with our Apache restart, so we scan down and find that the last line:

[pid 13748] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)

After a little investigation on that file, I see that some maniac as set the immutable attribute:

lsattr /etc/httpd/logs/error_log
----i-------- /etc/httpd/logs/error_log

Our error couldn't be found in the log file because Apache couldn't open it! You can imagine how long it might take to figure out this particular problem without strace, but with this useful tool, the cause can be found in minutes.

Go and Try It!

All major Linux distros have strace available — just type strace at the command line for the basic usage. If the command is not found, install it via your distribution's package manager. Get in there and try it yourself!

For a fun first exercise, bring up a text editor in one terminal, then strace the editor process in another with the -p flag (strace -p <process_id>) since we want to look at an already-running process. When you go back and type in the text editor, the system calls will be shown in strace as you type ... You see what's happening in real time!

-Lee

Subscribe to Author Archive: %