Posts Tagged 'Commands'

October 16, 2013

Tips and Tricks: Troubleshooting Email Issues

Working in support, one of the most common issues we troubleshoot is a customer's ability to receive email. Depending on email server, this can be a headache and a half to figure out, but more often than not, we're able to fix the problem with one of only a few simple solutions. Because the SoftLayer Blog audience loves technical tips and tricks, I thought I'd share a few easy steps that make pinpointing the root cause of email issues much easier.

Before you gear up to go into battle, check the that server is not out of disk space on /var and that it is not in a read only state. That precursory step may seem silly, but Occam's Razor often holds true in technical troubleshooting. Once you verify that those two common problems aren't causing your email problems, the next step is to determine whether the email issues are server-wide or isolated to one mail account/domain. To do that, the first thing you need to do is make sure that the IMAP and POP services are responding.

Check IMAP and POP Services

The universal approach to checking IMAP and POP services is to use telnet:

telnet <serverip> 110
telnet <serverip> 143

If either of those commands fail, you're able to pinpoint which service to check on your server.

For most variants of Linux, you can check both services with a single command: netstat -plan|egrep -i "110|143". The resulting output will show if the services are listening and which process is doing the listening. In Windows, you can run a similar command from a command prompt: netstat -anb|find "LISTEN"| findstr "110 143".

If the ports are listening, and you're able to connect to them over telnet, your next stop should be your server's error logs.

Check Error Logs

You want to look for any mail errors that might clue you into the root cause of your email issues. In Linux, you can check /var/log/maillog, and in Windows, you can filter eventvwr.msc for mail only. If there are errors, a simple search will highlight them quickly.

If there are no errors, it's time to dig into the mail queue directly.

Check the Mail Queue

Depending on the mail server you use, the commands here are going to vary. Here are a few examples of how we'd investigate the most common mail servers we encounter:

QMail

Display the mail queue: /var/qmail/bin/qmail-qread
Display the number of messages in the queue: /var/qmail/bin/qmail-qstat
Reference article: Gaining Control Over the QMail Queue

Sendmail

Display the mail queue: sendmail -bp or mailq
Display the number of messages in the queue: mailq –OmaxQueueRunSize=1
Reference article: Quick Sendmail Cheatsheet

Exim

Display the mail queue: exim -bp
Display the number of messages in the queue: exim -bpc
Reference article: Exim cheatsheet

MailEnable

MailEnable users can can check to see that messages are moving by opening the mail directory:
Program Files\MailEnable\Queues\SMTP\Inbound\Messages
Reference article: How to diagnose inbound message delivery delays

With these commands, you can filter through the email queues to see whether any of them are for the users or domains you're having problems with. If nothing obvious presents itself at that point, it's time for some active testing.

Active Testing

Send an email to your mailserver from an external mailserver (anything will do as long as it's not on the same server). Watch for logging of the email as it's delivered:
tail -f maillog
On busy mailservers you might add |grep youremailid or simply look for a new message in the directory where the email will be stored.

The your primary goal in troubleshooting your email issues in this way is to isolate the root cause of your problem so that you can fix it more quickly. SoftLayer customers have direct access to our support team to help you through this process, but it's always nice to keep a quick reference like this in your back pocket to be able to pinpoint the problem yourself.

-Bill

September 16, 2013

Sysadmin Tips and Tricks - Using strace to Monitor System Calls

Linux admins often encounter rogue processes that die without explanation, go haywire without any meaningful log data or fail in other interesting ways without providing useful information that can help troubleshoot the problem. Have you ever wished you could just see what the program is trying to do behind the scenes? Well, you can — strace (system trace) is very often the answer. It is included with most distros' package managers, and the syntax should be pretty much identical on any Linux platform.

First, let's get rid of a misconception: strace is not a "debugger," and it isn't a programmer's tool. It's a system administrator's tool for monitoring system calls and signals. It doesn't involve any sophisticated configurations, and you don't have to learn any new commands ... In fact, the most common uses of strace involve the bash commands you learned the early on:

  • read
  • write
  • open
  • close
  • stat
  • fork
  • execute (execve)
  • chmod
  • chown

 

You simply "attach" strace to the process, and it will display all the system calls and signals resulting from that process. Instead of executing the command's built-in logic, strace just makes the process's normal calls to the system and returns the results of the command with any errors it encountered. And that's where the magic lies.

Let's look an example to show that behavior in action. First, become root — you'll need to be root for strace to function properly. Second, make a simple text file called 'test.txt' with these two lines in it:

# cat test.txt
Hi I'm a text file
there are only these two lines in me.

Now, let's execute the cat again via strace:

$ strace cat test.txt 
execve("/bin/cat", ["cat", "test.txt"], [/* 22 vars */]) = 0
brk(0)  = 0x9b7b000
uname({sys="Linux", node="ip-208-109-127-49.ip.secureserver.net", ...}) = 0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=30671, ...}) = 0
mmap2(NULL, 30671, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f35000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000_\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1594552, ...}) = 0
mmap2(NULL, 1320356, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7df2000
mmap2(0xb7f2f000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x13c) = 0xb7f2f000
mmap2(0xb7f32000, 9636, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f32000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7df1000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7df0000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7df1b80, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7f2f000, 8192, PROT_READ) = 0
mprotect(0xb7f57000, 4096, PROT_READ) = 0
munmap(0xb7f35000, 30671) = 0
brk(0)  = 0x9b7b000
brk(0x9b9c000) = 0x9b9c000
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
open("test.txt", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=57, ...}) = 0
read(3, "Hi I'm a text file\nthere are onl"..., 4096) = 57
write(1, "Hi I'm a text file\nthere are onl"..., 57Hi I’m a text file
there are only these two lines in me.
) = 57
read(3, "", 4096) = 0
close(3) = 0
close(1) = 0
exit_group(0) = ?

Now that return may look really arcane, but if you study it a little bit, you'll see that it includes lots of information that even an ordinary admin can easily understand. The first line returned includes the execve system call where we'd execute /bin/cat with the parameter of test.txt. After that, you'll see the cat binary attempt to open some system libraries, and the brk and mmap2 calls to allocate memory. That stuff isn't usually particularly useful in the context we're working in here, but it's important to understand what's going on. What we're most interested in are often open calls:

open("test.txt", O_RDONLY|O_LARGEFILE) = 3

It looks like when we run cat test.txt, it will be opening "test.txt", doesn't it? In this situation, that information is not very surprising, but imagine if you are in a situation were you don't know what files a given file is trying to open ... strace immediately makes life easier. In this particular example, you'll see that "= 3" at the end, which is a temporary sort of "handle" for this particular file within the strace output. If you see a "read" call with '3' as the first parameter after this, you know it's reading from that file:

read(3, "Hi I'm a text file\nthere are onl"..., 4096) = 57

Pretty interesting, huh? strace defaults to just showing the first 32 or so characters in a read, but it also lets us know that there are 57 characters (including special characters) in the file! After the text is read into memory, we see it writing it to the screen, and delivering the actual output of the text file. Now that's a relatively simplified example, but it helps us understand what's going on behind the scenes.

Real World Example: Finding Log Files

Let's look at a real world example where we'll use strace for a specific purpose: You can't figure out where your Apache logs are being written, and you're too lazy to read the config file (or perhaps you can't find it). Wouldn't it be nice to follow everything Apache is doing when it starts up, including opening all its log files? Well you can:

strace -Ff -o output.txt -e open /etc/init.d/httpd restart

We are executing strace and telling it to follow all forks (-Ff), but this time we'll output to a file (-o output.txt) and only look for 'open' system calls to keep some of the chaff out of the output (-e open), and execute '/etc/init.d/httpd restart'. This will create a file called "output.txt" which we can use to find references to our log files:

#cat output.txt | grep log
[pid 13595] open("/etc/httpd/modules/mod_log_config.so", O_RDONLY) = 4
[pid 13595] open("/etc/httpd/modules/mod_logio.so", O_RDONLY) = 4
[pid 13595] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 10
[pid 13595] open("/etc/httpd/logs/ssl_error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 11
[pid 13595] open("/etc/httpd/logs/access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 12
[pid 13595] open("/etc/httpd/logs/cm4msaa7.com", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 13
[pid 13595] open("/etc/httpd/logs/ssl_access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 14
[pid 13595] open("/etc/httpd/logs/ssl_request_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 15
[pid 13595] open("/etc/httpd/modules/mod_log_config.so", O_RDONLY) = 9
[pid 13595] open("/etc/httpd/modules/mod_logio.so", O_RDONLY) = 9
[pid 13596] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 10
[pid 13596] open("/etc/httpd/logs/ssl_error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 11
open("/etc/httpd/logs/access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 12
open("/etc/httpd/logs/cm4msaa7.com", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 13
open("/etc/httpd/logs/ssl_access_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 14
open("/etc/httpd/logs/ssl_request_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = 15

The log files jump out at you don't they? Because we know that Apache will want to open its log files when it starts, all we have to do is we follow all the system calls it makes when it starts, and we'll find all of those files. Easy, right?

Real World Example: Locating Errors and Failures

Another valuable use of strace involves looking for errors. If a program fails when it makes a system call, you'll want to be able pinpoint any errors that might have caused that failure as you troubleshoot. In all cases where a system call fails, strace will return a line with "= -1" in the output, followed by an explanation. Note: The space before -1 is very important, and you'll see why in a moment.

For this example, let's say Apache isn't starting for some reason, and the logs aren't telling ua anything about why. Let's run strace:

strace -Ff -o output.txt -e open /etc/init.d/httpd start

Apache will attempt to restart, and when it fails, we can grep our output.txt for '= -1' to see any system calls that failed:

$ cat output.txt | grep '= -1'
[pid 13748] open("/etc/selinux/config", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/tls/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/i686/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/sse2/libperl.so", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libnsl.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/etc/gai.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 13748] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)

With experience, you'll come to understand which errors matter and which ones don't. Most often, the last error is the most significant. The first few lines show the program trying different libraries to see if they are available, so they don't really matter to us in our pursuit of what's going wrong with our Apache restart, so we scan down and find that the last line:

[pid 13748] open("/etc/httpd/logs/error_log", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)

After a little investigation on that file, I see that some maniac as set the immutable attribute:

lsattr /etc/httpd/logs/error_log
----i-------- /etc/httpd/logs/error_log

Our error couldn't be found in the log file because Apache couldn't open it! You can imagine how long it might take to figure out this particular problem without strace, but with this useful tool, the cause can be found in minutes.

Go and Try It!

All major Linux distros have strace available — just type strace at the command line for the basic usage. If the command is not found, install it via your distribution's package manager. Get in there and try it yourself!

For a fun first exercise, bring up a text editor in one terminal, then strace the editor process in another with the -p flag (strace -p <process_id>) since we want to look at an already-running process. When you go back and type in the text editor, the system calls will be shown in strace as you type ... You see what's happening in real time!

-Lee

October 4, 2011

An Introduction to Redis

I recently had the opportunity to get re-acquainted with Redis while evaluating solutions for a project on the Product Innovation team here at SoftLayer. I'd actually played with it a couple of times before, but this time it "clicked." Or my brain broke. Either way, I see a lot of potential for Redis now.

No one product is a perfect fit for all of your data storage needs, of course. There are such fundamental tradeoffs to be made in designing storage architectures that you should be immediately suspicious of any product that claims to fit every need.

The best solutions tend to be products that actually embrace these tradeoffs. Redis, for instance, has sacrificed a small amount of data durability in exchange for being awesome.

What is it?

Redis is a key/value store, but describing it that way is sort of like calling a helicopter a "vehicle." It's a technically correct description, but it leaves out some important stuff.

You can think of it like a sophisticated older brother of Memcached. It presents a flat keyspace, and you can set those keys to string values. Another feature of Memcached is the ability to perform remote atomic operations, like "incr" and "append." These are really handy, because you have the ability to modify remote data without fetching, and you have an assurance that you're the only one performing that operation at that instant.

Redis takes this concept of remote commands on data and goes completely nuts with it. The database is aware of data structures like hashes, lists and sets in addition to simple string values. You can sort, union, intersect, slice and dice to your heart's content without fetching any data. Redis is a data structure server. You can treat it like remote memory, and this has an awesome immediate benefit for a programmer: your code and brain are already optimized for these data types.

But it's not just about making storage simpler. It's fast, too. Crazy fast. If you make intelligent use of its data structures, it's possible to serve a lot of traffic from relatively modest hardware. Redis 2.4 can easily handle ~50k list appends a second on my notebook. With batching, it can append 2 million items to a list on a remote host in about 1.28 seconds.

It allows the remote, atomic and performant manipulation of data structures. It took me a little while to realize exactly how useful that is.

What's wrong with it?

Nothing. Move along.

OK, it's a little short on durability. Redis uses memory as its primary store and periodically flushes to disk. A common configuration is to do so every second.

That sounds pretty reasonable. If a server goes down, you could lose a second of data. Keep in mind, however, how many operations Redis can perform in a second. If you're in a high-volume environment, that could be a lot of data. It's not for your financial transactions.

It also supports relatively limited availability options. Currently, it only supports master/slave replication. Clustering support is planned for an upcoming release. It's looking pretty powerful, but it will take some real-world testing to know its performance impact.

These challenges should be taken into consideration, and it's probably clear if you're in a situation where the current tradeoffs aren't a good fit.

In my experience, a lot of developers seriously overestimate the consequences of their application losing small amounts of data. Also consider whether or not the chance of losing a second (or less) of data genuinely represents a bigger threat to your application than any other compromises you might have made.

More Information
You can check out the slightly aging docs or browse the impressively simple source. There are probably already bindings for your language of choice as well.

-Tim

August 15, 2011

UNIX Sysadmin Boot Camp: bash

Welcome back to UNIX Sysadmin Boot Camp. You've had a few days to get some reps in accessing your server via SSH, so it's about time we add some weight to your exercise by teaching you some of the tools you will be using regularly to manage your server.

As we mentioned earlier in this series, customers with control panels from cPanel and Parallels might be tempted to rely solely on those graphical interfaces. They are much more user-friendly in terms of performing routine server administration tasks, but at some point, you might need to get down and dirty on the command line. It's almost inevitable. This is where you'll use bash commands.

Here are some of the top 10 essential commands you should get to know and remember in bash. Click any of the commands to go to its official "manual" page.

  1. man – This command provides a manual of other bash commands. Want more info on a command? Type man commandname, and you'll get more information about "commandname" than you probably wanted to know. It's extremely useful if you need a quick reference for a command, and it's often much more detailed and readable than a simple --help or --h extension.
  2. ls – This command lets you list results. I showed you an example of this above, but the amount of options that are available to you with this command are worth looking into. Using the "manual" command above, run man ls and check out the possibilities. For example, if you're in /etc, running ls -l /etc will get you a slightly more detailed list. My most commonly used list command is ls -hal. Pop quiz for you (where you can test your man skills): What does the -hal mean?
  3. cd – This command lets you change directories. Want to go to /etc/? cd /etc/ will take you there. Want to jump back a directory? cd .. does the trick.
  4. mv – This command enables you to move files and folders. The syntax is mv originalpath/to/file newpath/to/file. Simple! There are more options that you can check out with the man command.
  5. rm – This command enables you to remove a file or directory. In the same vein as the mv command, this is one of those basic commands that you just have to know. By running rm filename, you remove the "filename" file.
  6. cp – This command enables you to copy files from one place to another. Want to make a backup of a file before editing it? Run cp origfile.bla origfile.bak, and you have a backup in case your edit of origfile.bla goes horrendously wrong and makes babies cry. The syntax is simply: cp /source /destination. As with the above commands, check out the manual by running man cp for more options.
  7. tar – On its own, tar is a command to group a bunch of files together, uncompressed. These files can then be compressed into .gzip format. The command can be used for creating or extracting, so it may be a good idea to familiarize yourself with the parameters, as you may find yourself using it quite often. For a GUI equivalent, think 7-zip or WinRAR for Windows.
  8. wget – I love the simplicity of this little command. It enables you to "get" or download a target file. Yes, there are options, but all you need is a direct link to a file, and you just pull one of these: wget urlhere. Bam! That file starts downloading. Doesn't matter what kind of file it is, it's downloaded.
  9. top – This handy little binary will give you a live view of memory and CPU usage currently affecting your machine, and is useful for finding out where you need to optimize. It can also help you pinpoint what processes may be causing a slowdown or a load issue.
  10. chmod – This little sucker is vital to make your server both secure and usable, particularly when you're going to be serving for the public like you would with a web server. Combine good usage of permission and iptables, and you have a locked down server

When you understand how to use these tools, you can start to monitor and track what's actually happening on your server. The more you know about your server, the more effective and efficient you can make it. In our next installment, we'll touch on some of the most common server logs and what you can do with the information they provide.

Did I miss any of your "essential" bash commands in my top 10 list? Leave a comment below with your favorites along with a quick explanation of what they do.

-Ryan

Subscribe to commands