How to become a UNIX guru in fifteen minutes

Richard M. Stallman, Linus Torvalds, and Donald E. Knuth engage in a discussion on whose impact on the computerized world was the greatest.
Stallman: "God told me I have programmed the best editor in the world!" Torvalds: "Well, God told *me* that I have programmed the best operating system in the world!" Knuth: "Wait, wait – I never said that."

[from rec.humor.funny]

UNIX is an extremely popular platform for deploying server software partly because of its security and stability, but also because it has a rich set of command line and scripting tools. Programmers use these tools for manipulating the file system, processing log files, and generally automating as much as possible.

If you want to be a serious server developer, you will need a certain facility with a number of UNIX tools, about 15 of them. You will start to see similarities among them, particularly regular expressions, and soon you will feel very comfortable. By combining the simple commands, you can build very powerful tools very quickly, much faster than you could build the equivalent functionality in C or Java, for example.

Everything is a stream

The first thing you need to know is that UNIX is based upon the idea of a stream. Everything is a stream, or appears to be. Device drivers look like streams, terminals look like streams, processes communicate via streams, etc… The input and output of a program are streams that you can redirect into a device, a file, or another program.

Here is an example device, the null device, that lets you throw output away. For example, you might want to run a program but ignore the output.

ls > /dev/null # ignore output of ls

where "# ignore output of ls" is a comment.
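Redirection works the same way for ordinary files. As a small sketch (the scratch file held in out_file and the missing path are made up for this illustration), the following writes to a file, appends to it, and throws an error message away:

```shell
# Scratch file for the demonstration (hypothetical; mktemp picks the name).
out_file=$(mktemp)

echo "hello" > "$out_file"     # > sends stdout to a file, replacing its contents
echo "world" >> "$out_file"    # >> appends instead of replacing

# 2> redirects stderr, so the complaint about the missing path is discarded.
ls /no/such/path/for-this-demo 2> /dev/null || true

# < feeds a file to stdin; wc -l counts the lines it reads.
line_count=$(wc -l < "$out_file" | tr -d ' ')
echo "$line_count"
```

The tr -d ' ' at the end only strips the padding that some wc implementations print in front of the number.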

Most of the commands covered in this lecture process stdin and send results to stdout. In this manner, you can incrementally process a data stream by hooking the output of one tool to the input of another via a pipe. For example, the following piped sequence prints the number of files in the current directory modified in August.

ls -l | grep ' 08-' | wc -l

Imagine how long it would take you to write the equivalent C or Java program. You can become an extremely productive UNIX programmer if you learn to combine the simple command-line tools. Even when programming on a PC, I use MKS's UNIX shell and command library to make it look like a UNIX box. Worth the cash.
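If you want to try a pipe without depending on what happens to be in your own directory, here is a minimal sketch. The scratch directory and the three file names are invented for the example:

```shell
# Build a throwaway directory with known contents (names are hypothetical).
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/todo.txt" "$workdir/build.log"

# ls prints one name per line when its output goes to a pipe,
# grep keeps the names ending in .txt, and wc -l counts what is left.
txt_count=$(ls "$workdir" | grep '\.txt$' | wc -l | tr -d ' ')
echo "$txt_count"
```

Each stage only knows about its own stdin and stdout, which is why you can swap grep for another filter without touching the rest.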

It is one thing to show a man that he is in an error, and another to put him in possession of truth.

— John Locke

Getting help

If you need to know about a command, ask for the "man" page. For example, to find out about the ls command, type

man ls
LS(1)       User Commands       LS(1)

NAME
     ls - list directory contents

SYNOPSIS
     ls [OPTION]... [FILE]...

DESCRIPTION
     List information about the FILEs (the current directory
     by default). Sort entries alphabetically if none
     of -cftuvSUX nor --sort.
...

You will get a summary of the command and any arguments.

If you cannot remember the command's name, try apropos, which finds commands and library routines related to a given word. For example, to find the manual pages that mention git as a whole word, type

apropos '\bgit\b'
gitrepository [] (5)  - layout - Git Repository Layout
gittutorial []   (7)  - A tutorial introduction to git
gitworkflows []  (7)  - An overview of recommended workflows with git

Special Directories and files

A shortcut for my home directory, /home/pracinf/wbzyl, is ~wbzyl.

When you are using the shell, there is the notion of a current directory. The dot '.' character is shorthand for the current directory and '..' is shorthand for the directory above the current one. So to access file test in the current directory, ./test is the same as plain test. If test is in the directory above, use ../test.

/ is the root directory; there is no drive specification in UNIX.

The .bash_profile file is very important as it is how your shell session is initialized including your ever-important PATH environment variable. My bash shell initialization file is ~wbzyl/.bash_profile and has set up code like the following:

PATH=$PATH:$HOME/bin
export PATH

The export means that the assignment to PATH is visible to all child processes (that is, visible to all programs you run from the shell).
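You can watch export at work with a throwaway variable. This is only a sketch; DEMO_VAR is a made-up name, and sh -c stands in for "any program you run from the shell":

```shell
# DEMO_VAR is a hypothetical variable used only for this demonstration.
unset DEMO_VAR
DEMO_VAR=local-only

# Not exported yet: the child shell does not see it.
child_sees_before=$(sh -c 'printf "%s" "${DEMO_VAR:-unset}"')

# After export, every child process inherits the variable.
export DEMO_VAR
child_sees_after=$(sh -c 'printf "%s" "${DEMO_VAR:-unset}"')

echo "before export: $child_sees_before"
echo "after export:  $child_sees_after"
```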

Basic commands

cd

Changing a directory is done with cd dir where dir can be "." or ".." to move to current directory (do nothing) or go up a directory.

ls

Display files in a directory with ls. The -l option is used to display details of the files:

total 1234
drwxr-xr-x  5 wbzyl pracinf      4096 08-04 00:08 public_git
drwx--x--x  2 wbzyl pracinf      4096 08-02 20:20 public_html
drwxr-xr-x  7 wbzyl pracinf      4096 08-02 10:10 rpm
drwx--x--x  2 wbzyl pracinf      4096 08-02 10:00 tmp
...

"pracinf" is wbzyl's group.

If you want to see hidden files (those starting with "."), use -a.

Combinations are possible: use ls -la to see details of all files including hidden ones.

Displaying file contents

There are 4 useful ways to display the contents or portions of a file. The first is the very commonly used command cat. For example, to display my email box, type:

cat /var/mail/wbzyl

If a file is really big, you will probably want to use more, which spits the file out in screen-size chunks.

more /var/mail/wbzyl

If you only want to see the first few lines of a file or the last few lines use head and tail.

head /var/mail/wbzyl
tail /var/mail/wbzyl

You can specify a number as an argument to get a specific number of lines:

head -30 /var/mail/wbzyl

The most useful incantation of tail prints the last few lines of a file and then waits, printing new lines as they are appended to the file. This is great for watching a log file:

tail -f /var/mail/wbzyl

If you need to know how many characters, words, or lines are in a file, use wc:

 wc /var/mail/wbzyl
 100    1000   40000 /var/mail/wbzyl

Where the numbers are, in order, lines, words, then characters. For clarity, you can use wc -l to print just the number of lines.
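To experiment with head, tail, and wc without touching a real mailbox, you can generate a file with known contents. A minimal sketch, assuming the seq utility is available (it is on most Linux and BSD systems); the -n N spelling used here is the standard form of the head -30 shorthand:

```shell
# A scratch file holding the numbers 1..100, one per line.
sample=$(mktemp)
seq 1 100 > "$sample"

first_three=$(head -n 3 "$sample")   # first 3 lines
last_two=$(tail -n 2 "$sample")      # last 2 lines
line_count=$(wc -l < "$sample" | tr -d ' ')

echo "$first_three"
echo "$last_two"
echo "$line_count"
```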

pushd, popd

Instead of cd you can use pushd to save the current dir and then automatically cd to the specified directory. For example,

pwd
/home/pracinf/wbzyl
pushd /tmp
/tmp ~
pwd
/tmp
popd
~
pwd
/home/pracinf/wbzyl

top (htop)

To watch a dynamic display of the processes on your box in action, use top or htop.

ps

To print out (wide display) all processes running on a box, use

ps auxww  # double w

chmod

To change the privileges of a file or directory, use chmod. The privileges are written as 3 octal digits with 3 bits per digit: rwxrwxrwx, where the first digit is for the file owner, the second for the group, and the third for everybody else. 644 is a common value for files; it means 110100100 or

rw-r--r--

When you do ls -l you will see these bits.

755 is a common word value for directories:

rwxr-xr-x

where directories need to be executable for cd to be able to enter that dir. 755 is a shorthand for the more readable argument

u=rwx,go=rx

u is user, g is group, o is other.

Use chmod -R for recursively applying to all the dirs below the argument as well.
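The octal and symbolic spellings can be compared on a scratch file. This is a small sketch; the file is created by mktemp just for the illustration, and cut -c1-10 keeps only the type and permission characters that ls -l prints first:

```shell
# Scratch file for the demonstration.
f=$(mktemp)

chmod 644 "$f"
mode_octal=$(ls -l "$f" | cut -c1-10)

# Symbolic form of 755: user gets rwx, group and other get r-x.
chmod u=rwx,go=rx "$f"
mode_symbolic=$(ls -l "$f" | cut -c1-10)

echo "$mode_octal"
echo "$mode_symbolic"
```

The leading '-' in the output is the file type (a regular file); a directory would show 'd' there.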

Searching streams

One of the most useful tools available on UNIX and the one you may use the most is grep. This tool matches regular expressions (which includes simple words) and prints matching lines to stdout.

The simplest incantation looks for a particular character sequence in a set of files. Here is an example that looks for lines beginning with 'Return-Path:' in the IMAP mails in the current directory.

grep '^Return-Path:' *,{S,RS,ST,FRST}

You may find the dot '.' regular expression useful. It matches any single character but is typically combined with the star, which matches zero or more of the preceding item. Be careful to enclose the expression in single quotes so the command-line expansion doesn't modify the argument. The following example looks for references to any forum page in a server log file:

sudo egrep '/forum/.*' /var/log/httpd/access_log

or equivalently:

sudo cat /var/log/httpd/access_log | grep '/forum/.*'

The second form is useful when you want to process a collection of files as a single stream as in:

sudo cat /var/log/httpd/*_log | grep '/forum/.*'

If you need to look for a string at the beginning of a line, use caret '^':

grep '^153\.19\.7\.230 ' /var/log/httpd/access_log

This finds all lines in the access log that begin with IP address 153.19.7.230. The dots are escaped because an unescaped '.' matches any character.

If you would like to invert the pattern matching to find lines that do not match a pattern, use -v. Here is an example that finds references to non-image GETs in a log file:

sudo cat /var/log/httpd/access_log | grep -v '/images'

Now imagine that you have an http log file and you would like to filter out page requests made by nonhuman spiders. If you have a file called spider.ips, you can find all nonspider page views via:

cat /var/log/httpd/access_log | grep -v -f /tmp/spider.ips

Finally, to ignore the case of the input stream, use -i.
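Here is a sketch that exercises these options on a tiny invented log, so you can check the counts by eye. The log lines, addresses, and the spider list are all made up; -c (print the number of matching lines) and -F (treat the patterns as fixed strings, so the dots in an IP address stay literal) are standard grep options that the text above does not cover:

```shell
# Hypothetical four-line log and a one-line spider list.
log=$(mktemp)
spiders=$(mktemp)
printf '%s\n' \
    '10.0.0.1 GET /forum/index.html' \
    '10.0.0.2 GET /images/logo.png' \
    '10.0.0.3 GET /forum/thread/7' \
    '10.0.0.1 GET /About.html' > "$log"
printf '10.0.0.3\n' > "$spiders"

forum_hits=$(grep -c '/forum/' "$log")                 # lines mentioning the forum
non_image=$(grep -v -c '/images' "$log")               # lines that are not image requests
human_hits=$(grep -v -F -f "$spiders" "$log" | wc -l | tr -d ' ')  # drop spider IPs
about_hits=$(grep -i -c 'about' "$log")                # case-insensitive match

echo "$forum_hits $non_image $human_hits $about_hits"
```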

Transforming text streams

Morphing a text stream is a fundamental UNIX operation.

tr

For manipulating whitespace, you will find tr very useful.

If you have columns of data separated by spaces and you would like the columns to collapse so there is a single column of data, tell tr to replace space with newline tr ' ' '\n'. Consider input file falski.txt:

ala ma kota
ola ma psa

To get all those words in a single column, use

cat falski.txt | tr ' ' '\n'

If you would like to collapse all sequences of spaces into one single space, use tr -s ' '.

To convert a PC file to UNIX, you have to get rid of the '\r' characters. Use tr -d '\r'.
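The three tr uses above can be tried on inline input, with printf standing in for the files. A minimal sketch (the sample strings are invented, apart from the two lines of falski.txt):

```shell
# One word per line: every space becomes a newline.
words=$(printf 'ala ma kota\nola ma psa\n' | tr ' ' '\n')

# -s squeezes each run of spaces down to a single space.
squeezed=$(printf 'a   b    c\n' | tr -s ' ')

# -d deletes characters; here the carriage returns of a PC text file.
unix_text=$(printf 'line1\r\nline2\r\n' | tr -d '\r')

echo "$words"
echo "$squeezed"
echo "$unix_text"
```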

sed

If dropping or translating single characters is not enough, you can use sed (stream editor) to replace or delete text chunks matched by regular expressions. For example, to replace all references to text TT by kbd in the file ug.rdiscount, use

cat ug.rdiscount | sed -r 's/TT/kbd/g'

If there are multiple references to TT on a single line, the g suffix ("global") makes sed replace all of them; without it, only the first occurrence on each line is replaced:

... |  sed -r 's/TT/kbd/g'

If you would like to replace references to view.html with index.html, use

... | sed 's/view\.html/index.html/'

If you want any .pas file converted to .p, you must match the file name with a regular expression and refer to it via \1:

... | sed 's/\(.*\)\.pas$/\1.p/'

The \(...\) grouping collects text that you can refer to with \1.

If you want to kill everything from the ',' character to end of line, use the end-of-line marker $:

... | sed 's/,.*$//' # kill from comma to end of line
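The substitutions above can be checked on one-line inputs. This is just a sketch with invented sample strings; the dot before pas is escaped and anchored with $ so that only a real .pas suffix matches:

```shell
# Without g only the first TT on the line is replaced.
first_only=$(printf 'TT and TT\n' | sed 's/TT/kbd/')

# With g every TT on the line is replaced.
all=$(printf 'TT and TT\n' | sed 's/TT/kbd/g')

# \(...\) captures the base name, \1 puts it back in the replacement.
renamed=$(printf 'prog.pas\n' | sed 's/\(.*\)\.pas$/\1.p/')

# Delete everything from the first comma to the end of the line.
trimmed=$(printf 'name,rest of line\n' | sed 's/,.*$//')

echo "$first_only"
echo "$all"
echo "$renamed"
echo "$trimmed"
```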

Tarballs

Note: The name comes from a similar word, hairball (stuff that cats throw up), I'm pretty sure.

To collect a bunch of files and directories together, use tar. To tar up and gzip your entire home directory and put the gzipped tarball into /tmp, do this

cd ~/.. # go one dir above dir you want to tar.gz; why?
tar zcvf /tmp/wbzyl.backup.tar.gz wbzyl

By convention, use .tar.gz as the extension. To ungzip and untar this file use

cd ~/tmp
tar zxvf /tmp/wbzyl.backup.tar.gz

tar untars things in the current directory!

After running the untar, you will find a new directory, ~/tmp/wbzyl, that is a copy of your home directory. Note that the way you tar things up dictates the directory structure when untarred. The fact that I mentioned wbzyl in the tar creation means that I'll have that dir when untarred. In contrast, the following will also make a copy of my home directory, but without a wbzyl root dir:

cd ~wbzyl
tar cvf /tmp/wbzyl.backup.tar *

It is a good idea to tar things up with a root directory so that when you untar you don't generate a million files in the current directory. To see what's in a tarball, use

tar ztvf /tmp/wbzyl.backup.tar.gz
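The whole round trip (create with a root directory, list, extract somewhere else) can be rehearsed on a scratch tree before you try it on your home directory. A minimal sketch; the directory and file names are invented for the illustration:

```shell
# Scratch area with a small tree: project/docs/readme.txt (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/project/docs"
echo hi > "$work/project/docs/readme.txt"

# Create the tarball from one directory above, so "project" is the root dir.
( cd "$work" && tar czf "$work/project.tar.gz" project )

# List the contents without extracting.
listing=$(tar tzf "$work/project.tar.gz")

# Extract into another directory; tar unpacks into the current directory.
mkdir "$work/restore"
( cd "$work/restore" && tar xzf "$work/project.tar.gz" )
restored=$(cat "$work/restore/project/docs/readme.txt")

echo "$listing"
echo "$restored"
```

The parentheses run each cd in a subshell, so your own current directory is left alone.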

If you have a big file to compress, use gzip:

gzip bigfile

After execution, your file will have been renamed bigfile.gz. To uncompress, use

gzip -d bigfile.gz

To display a text file that is currently gzip'd, use zcat:

zcat bigfile.gz
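A quick way to convince yourself of the renaming is to run the cycle on a scratch file. In this sketch, gzip -dc (decompress to stdout) is used in place of zcat, because on some systems zcat expects .Z files; the file name bigfile lives in a temporary directory made up for the example:

```shell
# Scratch directory with a small "big" file.
d=$(mktemp -d)
printf 'hello gzip\n' > "$d/bigfile"

gzip "$d/bigfile"                    # bigfile is replaced by bigfile.gz
viewed=$(gzip -dc "$d/bigfile.gz")   # print the contents, keep the .gz
gzip -d "$d/bigfile.gz"              # bigfile.gz is replaced by bigfile

echo "$viewed"
ls "$d"
```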

Copying files between computers

rsync

When you need to have a directory on one machine mirrored on another machine, use rsync. It compares all the files in a directory subtree and copies over any that have changed to the mirrored directory on the other machine. For example, here is how you could "pull" all css screencasts files from renia.local to the box from which you execute the rsync command:

hostname -f
wlodek.local
rsync -raz -e ssh -v wlodek@renia.local:screencasts/css-tricks.com \
    ~/backup/
ls ~/backup/css-tricks.com/
  VideoCast-57-css3.m4v
  ...

rsync will overwrite or truncate files on the receiving side so that both copies stay the same. This is bad if you erase or damage a file by mistake: the next sync will wipe out your good backup copy. Change the options above to -rabz to tell rsync to keep a copy of any existing file before it overwrites it. You can also choose the suffix that is appended to the name of each such copy:

rsync -rabz -e ssh -v --suffix .rsync_`date '+%Y%m%d'` \
    wlodek@renia.local:screencasts/css-tricks.com ~/backup/

where date '+%Y%m%d' (in reverse single quotes) means "execute this date command".
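Command substitution can be tried on its own, without rsync. The sketch below uses the $(...) spelling, which does the same job as the backquotes and is easier to nest; the variable names stamp and suffix are made up for the illustration:

```shell
# Run date and capture what it prints, e.g. a string like 20100804.
stamp=$(date '+%Y%m%d')

# Build the kind of suffix passed to rsync --suffix in the text above.
suffix=".rsync_$stamp"

echo "$suffix"
```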

To exclude certain patterns from the sync, use --exclude:

rsync -rabz --exclude=tmp/ --suffix .rsync_`date '+%Y%m%d'` \
    -e ssh -v wbzyl@renia.local:screencasts/css-tricks.com ~/backup/

scp

To copy a file or directory manually, use scp:

scp lecture.html wbzyl@sigma.ug.edu.pl:public_html/

Just like cp, use -r to copy a directory recursively.

Other programs

find

Most GUIs for Linux or PCs have a search facility, but from the command-line you can use find. To find all files with suffix .c starting in directory ~/projects, use:

find ~/projects -name '*.c'

The default "action" is to -print.

You can specify a globbing pattern to match. For example, to look under your home directory for any xml files, use:

find ~ -name '*.xml' -print

Note the use of the single quotes to prevent command-line expansion--you want the '*' to go to the find command.

You can execute a command for every file or directory found that matches a name. For example, to delete all xml files, do this:

find ~ -name '*.xml' -exec rm {} \;

where "{}" stands for "current file that matches". The end of the command must be terminated with ';' but because of the command-line expansion, you'll need to escape the ';'.
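Since -exec rm is unforgiving, it is worth rehearsing on a scratch tree first. A minimal sketch with invented file names; running the same find without -exec shows what would be deleted:

```shell
# Scratch tree: two xml files (one nested) and one file that must survive.
root=$(mktemp -d)
mkdir -p "$root/sub"
touch "$root/a.xml" "$root/sub/b.xml" "$root/keep.txt"

# Dry run: just count the matches (the default action is -print).
xml_before=$(find "$root" -name '*.xml' | wc -l | tr -d ' ')

# Real run: rm is executed once per match, {} is the matched path.
find "$root" -name '*.xml' -exec rm {} \;

xml_after=$(find "$root" -name '*.xml' | wc -l | tr -d ' ')
remaining=$(find "$root" -type f | wc -l | tr -d ' ')

echo "$xml_before $xml_after $remaining"
```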

You can also specify time information in your query. Here is a shell script that uses find to delete backup directories older than 14 days.

#!/bin/sh
BACKUP_DIR="$HOME/backup"
# number of days to keep backups
AGE=14 # days
# POSIX arithmetic expansion; the older $[ ... ] form is a bash-only relic
AGE_MINS=$(( AGE * 60 * 24 ))
# delete directories whose status last changed more than AGE_MINS minutes ago
find "$BACKUP_DIR"/* -cmin +"$AGE_MINS" -type d -exec rm -rf {} \;

fuser

If you want to know who is using a port such as HTTP (80), use fuser. You must be root to use this:

sudo /sbin/fuser -n tcp 80
80/tcp:              1812 1884 1892 1898

The output indicates the list of processes associated with that port.

whereis

Sometimes you want to use a command but it's not in your PATH and you can't remember where it is. Use whereis to look in standard unix locations for the command.

whereis fuser
fuser: /sbin/fuser /usr/man/man1/fuser.1 /usr/man/man1/fuser.1.gz
whereis ls
ls: /bin/ls /usr/man/man1/ls.1 /usr/man/man1/ls.1.gz

whereis also shows man pages.

which, type

Sometimes you might be executing the wrong version of a command and you want to know which version of the command your PATH indicates should be run. Use which to ask:

which ls
alias ls='ls --color=tty'
    /bin/ls
type ls
ls is aliased to `ls --color=auto'

If nothing is found in your path, you'll see:

which fuser
/usr/bin/which: no fuser in (/usr/local/bin:/usr/X11R6/bin)
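In scripts, where the output format of which varies from system to system, the standard command -v builtin is a common substitute. This is a sketch of that alternative, not something which or type require; the second command name is deliberately made up so that the lookup fails:

```shell
# Prints the path (or alias/builtin name) the shell would run for ls.
ls_path=$(command -v ls)

# command -v prints nothing and returns non-zero when nothing is found.
missing=$(command -v no-such-command-for-this-demo || echo "not found")

echo "$ls_path"
echo "$missing"
```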

kill, pkill

To send a signal to a process, use kill. Typically you'll want to just say kill pid where pid can be found from ps or top (see above).

Use kill -9 pid when you can't get the process to die; this means kill it with "extreme prejudice".
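You can practice on a harmless process of your own. In this sketch a background sleep plays the victim; $! holds the pid of the most recent background job, and a process ended by a signal reports an exit status of 128 plus the signal number (15 for the default SIGTERM):

```shell
# Start a long-running background process and remember its pid.
sleep 60 &
pid=$!

# Plain kill sends SIGTERM, the polite request to terminate.
kill "$pid"

# wait collects the exit status of the terminated job.
status=0
wait "$pid" 2> /dev/null || status=$?

echo "$status"
```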

tracepath, traceroute

If you are having trouble getting to a site, use tracepath or traceroute to watch the sequence of hops used to get to a site:

/usr/sbin/tracepath sigma.ug.edu.pl
1:  wlodek.local                                          0.391ms pmtu 1500
    ...
5:  do.gda-ar3.z.gda-r2.tpnet.pl (213.25.12.95)           10.854ms asymm  6
6:  TP-tp-edu-gw.task.gda.pl (153.19.0.2)                 42.373ms asymm 13
7:  tp-jra10ge.task.gda.pl (153.19.252.33)                34.889ms asymm 12
8:  area1-ug1-swr.task.gda.pl (153.19.254.228)            34.365ms asymm 12
9:  sigma.ug.edu.pl (153.19.7.230)                        36.318ms reached
    Resume: pmtu 1500 hops 9 back 52

sudo /bin/traceroute sigma.ug.edu.pl
traceroute to sigma.ug.edu.pl (153.19.7.230), 30 hops max, 60 byte packets
1  wlodek.local  1.708 ms  2.529 ms  3.330 ms
   ...
5  do.gda-ar3.z.gda-r2.tpnet.pl (213.25.12.95)  21.905 ms  22.052 ms  30.636 ms
6  TP-tp-edu-gw.task.gda.pl (153.19.0.2)  42.546 ms  33.290 ms  33.401 ms
7  tp-jra10ge.task.gda.pl (153.19.252.33)  38.472 ms  38.876 ms  39.371 ms
8  area1-ug1-swr.task.gda.pl (153.19.254.228)  41.536 ms  41.720 ms  41.916 ms
9  sigma.ug.edu.pl (153.19.7.230)  39.618 ms  39.921 ms  40.236 ms

What is my IP address?

/sbin/ifconfig

or

/sbin/ifconfig eth0

Under the eth0 interface, you'll see the inet addr:

eth0      Link encap:Ethernet  HWaddr 00:0E:35:95:80:44
          inet addr:192.188.100.1  Bcast:192.188.100.255  Mask:255.255.255.0
          inet6 addr: fe00::40a:32ca:c950:1234/64 Scope:Link

Useful pipelines: generating a histogram

A histogram is a set of (count, value) pairs indicating how often each value occurs. The basic operation is to sort, then count how many times each value occurs in a row, and then reverse sort numerically so that the value with the highest count is at the top of the report.

... | sort |uniq -c|sort -r -n

Note that sort sorts on the whole line, but the first column is obviously significant just as the first letter in someone's last name significantly positions their name in a sorted list.

uniq -c collapses all repeated sequences of values but prints the number of occurrences in front of the value. Consider first a pipeline that extracts the requested paths from an access log, sorts them, and removes the duplicates:

cut -d ' ' -f 7  access_log | egrep '^/sp/' | \
sort | \
uniq
/sp/
/sp/exercises
/sp/git
/sp/images/alan_kay.jpg
/sp/images/alan_perlis.jpg
/sp/images/albert_einstein.jpg
...

Now add -c to uniq:

cut -d ' ' -f 7  access_log | egrep '^/sp/' | \
sort | \
uniq -c
   22 /sp/
    6 /sp/exercises
    2 /sp/git
    1 /sp/images/alan_kay.jpg
    2 /sp/images/alan_perlis.jpg
    4 /sp/images/albert_einstein.jpg
    ...

Now all you have to do is reverse sort the lines according to the first column numerically.

cut -d ' ' -f 7  access_log | egrep '^/sp/' | \
sort | \
uniq -c | \
sort -r -n
   51 /sp/stylesheets/uv.css
   51 /sp/stylesheets/sp.css
   51 /sp/stylesheets/screen.css
   51 /sp/stylesheets/print.css
   50 /sp/stylesheets/icons/external.png
   ...
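To see the whole pipeline produce numbers you can verify by hand, here is a sketch that runs it on a tiny generated log. The seven requests, the address, and the timestamp are invented; the line layout follows the common access-log format, in which the requested path is the 7th space-separated field:

```shell
# Generate a hypothetical access log with a known mix of paths.
log=$(mktemp)
for path in /sp/ /sp/git /sp/ /other /sp/exercises /sp/ /sp/git; do
    printf '127.0.0.1 - - [10/Aug/2010:12:00:00 +0200] "GET %s HTTP/1.1" 200 512\n' "$path"
done > "$log"

# Field 7 is the path; keep only /sp/ pages, then sort, count, reverse sort.
histogram=$(cut -d ' ' -f 7 "$log" | grep -E '^/sp/' | sort | uniq -c | sort -r -n)

# Strip the padding uniq -c puts in front of the count on the top line.
top_line=$(printf '%s\n' "$histogram" | head -n 1 | sed 's/^ *//')

echo "$histogram"
```

The first sort matters: uniq only merges adjacent duplicates, so unsorted input would give several partial counts for the same path.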