Linux Shell Programming

Here we have a free and complete book about Shell

The thirst for "free knowledge" is welcome.

Júlio Neves

Pub Talk - Part II



'Waiter, get me a pint and don't worry about my lad over here, he's finally getting to meet a real operating system and he's got a lot to learn!'

'So, my friend, did you get any of what I've said so far?'

'Well, I can get what you mean, but I actually don't see what the point of it is.'

'Take it easy, pal! We're just beginning... What I've said so far is a taste of what lies ahead. As soon as we start developing structured programs, you'll see how useful those tools can be. After learning that, you'll see how easy it is to reach the top shellves. Now, tell me: how do you like the grep family?'

'Pardon me! I don't know any grep family.'

'Sure, sure... grep is an acronym for "global regular expression print" - although legend has it that the name grep comes from ed (a text editor that is vim's grandpa), in which the search command was g/regular expression/p, or g/re/p.'

'Well, this grep command takes regular expressions and matches them against the lines of an "input". By the way, there is this guy - Aurélio Marinho Jargas - who maintains a webpage with all the hints, clues and even tutorials you could want about regular expressions (regexp). If you feel like learning to program in Shell, Perl, Python, etc., you'd better see what he's got!'

The great grep

'Waiter, this time I'll try a caipirinha - the Brazilian national drink!'

'So, I told you that grep matches regular expressions to the lines of an "input". But what are those "inputs"? Well, there are different ways of defining those inputs. Let's see!'

Searching a file:

$ grep mary /etc/passwd

Searching more than one file:

$ grep grep *.sh

Searching the output of a command:

$ who | grep pelegrino

Considering the 1st example - which is the simplest one - I searched for occurrences of the word mary in any position of the lines of /etc/passwd. If I wanted to search for it as a login name - in other words, only at the beginning of the records of that file - I should execute:

$ grep '^mary' /etc/passwd

'Hold on, hold on... what's that caret (circumflex ^) and those apostrophes for?'

'The caret (^), as you'd know if you had read the articles on regular expressions I told you about, constrains the matches to the beginning of the lines, and the apostrophes (') tell the shell not to interpret the circumflex, so that it reaches grep untouched as part of the expression to be searched for.'

The 2nd example will list all the lines of all the files with the extension .sh that contain the word grep. Since I use this extension for my Shell scripts, what I've done is look for a good grep example in all my scripts.

And look!! grep accepts as input the output of another command, as long as the two are connected by a pipe symbol (|) - this is very common in shell and it speeds up the execution of commands enormously, since the second command reads the output of the first as if it were a file.

So, looking at the 3rd example, the command who lists the users who are logged in to the same machine as you are (remember: Linux is a multiuser system) and the command grep checks whether the user pelegrino is working or not.
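The three kinds of input can be tried safely on throwaway data. What follows is only a sketch: the scratch directory, the files users.txt, a.sh and b.sh, and their contents are all invented for the demonstration and do not come from a real system.

```shell
#!/bin/bash
# Hypothetical sample data in a scratch directory (all names are made up).
workdir=$(mktemp -d)
printf 'mary:x:1001\nrosemary:x:1002\njohn:x:1003\n' > "$workdir/users.txt"
printf 'grep foo bar\n' > "$workdir/a.sh"
printf 'echo hello\n'   > "$workdir/b.sh"

# 1) One file: every line containing "mary", in any position.
in_file=$(grep mary "$workdir/users.txt")

# 2) Several files: grep prefixes each match with the name of the file.
in_files=$(cd "$workdir" && grep grep *.sh)

# 3) Output of another command, read through a pipe.
in_pipe=$(printf 'pelegrino tty1\nmary tty2\n' | grep pelegrino)

# Anchored with ^, only the line that starts with "mary" is left.
anchored=$(grep '^mary' "$workdir/users.txt")

printf '%s\n' "$in_file" "$in_files" "$in_pipe" "$anchored"
rm -rf "$workdir"
```

Note how the unanchored search also catches rosemary, which is exactly why the caret matters when looking for a login name.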

The grep family

You know, the command grep is widely known, because it is frequently used, but what most people don't know is that there are three commands in the grep family. They are:

  • grep
  • egrep
  • fgrep

Their main features are:

  • grep
    Can be given simple regular expressions or plain strings, but when you don't need regular expressions it is better to run fgrep (it is faster);
  • egrep ('e' standing for extended)
    Is a very powerful tool that uses extended regular expressions. It is usually the slowest brother of the grep family, so it is best kept for the cases in which you need a regular expression that grep does not accept;
  • fgrep ('f' standing for fast, or fixed-string)
    As its own name points out, it is the fast brother of the family. It runs fast (about 30% faster than grep and 50% faster than egrep), but it does not allow the use of regular expressions.

Attention: the considerations above on speed are valid for the Unix grep family. On Linux, grep is the fastest of the three, because the other two (fgrep and egrep) are shell scripts that execute grep.

And I must say: I don't like that solution.
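On GNU systems the three behaviours are also available as options of grep itself: grep -E for extended expressions and grep -F for fixed strings (recent GNU grep releases warn that egrep and fgrep are obsolescent). The sketch below uses two invented sample lines to show how differently the same kind of pattern is treated.

```shell
#!/bin/bash
# Invented sample input: one line with a literal "a.c" and one line
# in which the dot would have to match some other character.
sample=$(printf 'a.c\nabc')

# Basic regular expression: "." matches any character, so both lines match.
basic=$(printf '%s\n' "$sample" | grep 'a.c')

# Fixed string (what fgrep does): "." is just a dot, only "a.c" matches.
fixed=$(printf '%s\n' "$sample" | grep -F 'a.c')

# Extended regular expression (what egrep does): alternation with |.
extended=$(printf '%s\n' "$sample" | grep -E '^(abc|xyz)$')

printf 'basic:\n%s\nfixed:\n%s\nextended:\n%s\n' "$basic" "$fixed" "$extended"
```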

'Now that you know the differences among the three, tell me: what do you think about the examples I gave before the explanation?'

'I thought fgrep would solve your problem a lot faster than grep.'

'Perfect!! I see you got what I said! Let's see some other examples to make their differences even clearer.'

  • Examples

I know that there is a text talking about Linux, but I'm not quite sure whether the word Linux is written with a capital L or with a small one. What should I do?

There are two options in that case:

$ egrep '(Linux|linux)' arquivo.txt

or

$ grep '[Ll]inux' arquivo.txt

In the first case, the extended regular expression (Linux|linux) uses the parentheses to group the options and the pipe (|) as a logical "or", which means that you are searching for Linux or linux. The apostrophes keep the shell from interpreting the parentheses, the pipe and the brackets itself.

In the second case, on the other hand, the regular expression [Ll]inux means that you are searching for a word that starts with L or l followed by inux. Since this expression is simpler, grep itself can handle it, so I think it is the more advisable one (remember: egrep is slower).
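A quick way to convince yourself that both expressions select the same lines is to run them against a small sample. In this sketch the temporary file and its four lines are invented, and grep -E is used as a stand-in for egrep.

```shell
#!/bin/bash
# Hypothetical text file standing in for arquivo.txt.
file=$(mktemp)
printf 'I run Linux at home\nlinux is a kernel\nLINUX in capitals\nUnix came first\n' > "$file"

# Extended expression: group with ( ) and choose with |.
# The quotes keep the shell from treating ( ) and | as its own syntax.
with_egrep=$(grep -E '(Linux|linux)' "$file")

# Basic expression: a bracket list that matches either L or l.
with_grep=$(grep '[Ll]inux' "$file")

printf '%s\n' "$with_grep"
rm -f "$file"
```

Neither form matches LINUX in capitals; both only cover the first letter.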

Another example. If you want to list the subdirectories of a directory, you should run:

$ ls -l | grep '^d'
drwxr-xr-x 3 root root 4096 Dec 18 2000 doc
drwxr-xr-x 11 root root 4096 Jul 13 18:58 freeciv
drwxr-xr-x 3 root root 4096 Oct 17 2000 gimp
drwxr-xr-x 3 root root 4096 Aug 8 2000 gnome
drwxr-xr-x 2 root root 4096 Aug 8 2000 idl
drwxrwxr-x 14 root root 4096 Jul 13 18:58 locale
drwxrwxr-x 12 root root 4096 Jan 14 2000 lyx
drwxrwxr-x 3 root root 4096 Jan 17 2000 pixmaps
drwxr-xr-x 3 root root 4096 Jul 2 20:30 scribus
drwxrwxr-x 3 root root 4096 Jan 17 2000 sounds
drwxr-xr-x 3 root root 4096 Dec 18 2000 xine

As you can see above, the circumflex (^) limits the search to the first position of each line of the long listing produced by ls. The apostrophes tell the shell not to 'understand' the circumflex (^).

Let's take another example. You know that the first four positions of the output of an ls -l command for an ordinary file (not a directory, nor a link, nor anything else...) should be:

  Position           1st    2nd    3rd    4th
  Possible values     -      r      w      x
                             -      -      s (suid)
                                           -

Thus, in order to find out which files in a directory are executable, you should run:

$ ls -la | egrep '^-..(x|s)'
-rwxr-xr-x 1 root root 2875 Jun 18 19:38 rc
-rwxr-xr-x 1 root root 857 Aug 9 22:03 rc.local
-rwxr-xr-x 1 root root 18453 Jul 6 17:28 rc.sysinit

Once again the caret (^) limits the search to the beginning of each line; hence, the listed occurrences are the ones that start with a -, followed by any character (the full stop - a dot - in a regular expression denotes any character), once again followed by any character, followed by an x or an s.

The same result would be found with the command:

$ ls -la | grep '^-..[xs]'

and the search would be faster.
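Since the result of ls depends on the directory you happen to be in, the same filters can be checked against a fixed listing. The lines below are invented, in the format of a long ls listing, and the awk step that keeps only the file names is an extra added for this sketch.

```shell
#!/bin/bash
# Invented lines in the format of a long ls listing, so that the result
# does not depend on the contents of any real directory.
listing='drwxr-xr-x 3 root root 4096 Dec 18 2000 doc
-rwxr-xr-x 1 root root 2875 Jun 18 19:38 rc
-rw-r--r-- 1 root root 120 Jun 18 19:38 notes
-rwsr-xr-x 1 root root 9999 Jun 18 19:38 tool
lrwxrwxrwx 1 root root 2 Jun 18 19:38 link -> rc'

# Directories: lines whose first character is d.
dirs=$(printf '%s\n' "$listing" | grep '^d')

# Ordinary files the owner may execute: -, two characters, then x or s.
execs=$(printf '%s\n' "$listing" | grep '^-..[xs]')

# Only the file names (last field) of the executables.
exec_names=$(printf '%s\n' "$execs" | awk '{ print $NF }')
printf '%s\n' "$exec_names"
```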

Building a CD Library

'Let me use a nice and didactic example: the process of building a CD Library. Keep in mind that software to organize data CDs (including those you get when you buy magazines, those you burn for yourself, etc.) can be developed in just the same way as software to organize audio CDs.'

'Hold on a sec. Where do I get the CD data from?'

'First I'll show you how your software can receive data from whoever is running it; afterwards I'll show you how to read data from the screen or from a file.'

Passing Parameters

'In our case, the layout of a music file will be:'

    name of the album^artist1~name of song1:...:artistN~name of songN

As you can see above, a circumflex (^) separates the name of the album from the rest of the record, which is made up of one group per song holding its artist and its name. Within each group, a tilde (~) separates the artist from the name of the song, and a colon (:) separates one group from the next.
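To see why those three separators are convenient, here is a small sketch that takes one record apart using nothing but the shell's own parameter expansion. The record is invented, and the variable names are just labels chosen for this illustration.

```shell
#!/bin/bash
# Invented record following the layout above.
record='album 2^Musician3~Music3:Musician4~Music4'

album=${record%%^*}        # text before the first ^
songs=${record#*^}         # everything after it: the artist~song groups

first_group=${songs%%:*}   # first group, up to the first colon
first_artist=${first_group%%"~"*}
first_song=${first_group#*"~"}

# One group per colon-separated piece.
group_count=$(printf '%s\n' "$songs" | tr ':' '\n' | wc -l | tr -d ' ')

echo "album=$album artist=$first_artist song=$first_song groups=$group_count"
```

The tilde is quoted inside the expansions so that the shell can never mistake it for a home directory abbreviation.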

The program I intend to develop is called musinc, and it will add records to my music file. I will pass the content of each album as a parameter whenever I run the program, this way:

$ musinc "album^musician~music:musician~music:..."

That way, the program musinc will receive the data of each album as if it were a variable. The only difference between a received parameter and a variable is that the former gets a numerical name (I know it sounds strange... what I mean is that its name is a single digit), such as $1, $2, $3, ..., $9. Let's make a test:

$ cat teste
#!/bin/bash
# Program to test parameter passing
echo "1st parm -> $1"
echo "2nd parm -> $2"
echo "3rd parm -> $3"

Let's run it now:

$ teste informing parameters to test
bash: teste: cannot execute

OOPS, there is a detail I've forgotten: we have to make the file executable before running it:

$ chmod 755 teste
$ teste informing parameters to test
1st parm -> informing
2nd parm -> parameters
3rd parm -> to

Interestingly, the last word, test, was not shown by our program. That is because the program only looks at the first three parameters. Let's execute it another way:

$ teste "informing parameters" to test
1st parm -> informing parameters
2nd parm -> to
3rd parm -> test

Thanks to the quotation marks, the shell did not treat the blank space between the first two words as a separator, so it took them as a single parameter.
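The same experiment fits in a single file if a function plays the role of the script, since inside a function $1, $2, $3 and $# refer to the function's own arguments. The helper show_parms is hypothetical and exists only for this sketch.

```shell
#!/bin/bash
# show_parms is a hypothetical helper: it reports how many arguments it
# received and what the first three of them were.
show_parms() {
    echo "count=$# first=$1 second=$2 third=$3"
}

# Four separate words: four parameters, the last one is never shown.
unquoted=$(show_parms informing parameters to test)

# The quotation marks glue the first two words into a single parameter.
quoted=$(show_parms "informing parameters" to test)

echo "$unquoted"
echo "$quoted"
```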

Parametric Hints

Since we are talking about parameters, let me give you some hints:

Meaning of the main variables

  Variable   Meaning
  $0         Name of the program
  $#         Number of parameters passed
  $*         Set of all parameters (similar to $@)

  • Examples
Let's change the program teste so that it uses the variables we have just seen:

$ cat teste
#!/bin/bash
# Program to test parameter passing (2nd version)
echo The program $0 received $# parameters
echo "1st parm -> $1"
echo "2nd parm -> $2"
echo "3rd parm -> $3"
echo All in a single \"shot\": $*

Note that I put a backslash before each quotation mark, in order to tell the shell not to interpret it. Let's run the program.

$ teste informing parameters to test
The program teste received 4 parameters
1st parm -> informing
2nd parm -> parameters
3rd parm -> to
All in a single "shot": informing parameters to test

As I've said before, the parameters are numbered from 1 to 9, but that does not mean that it is not possible to use more than 9 parameters. Let's test it:

  • Example:
$ cat teste
#!/bin/bash
# Program to test parameter passing (3rd version)
echo The program $0 received $# parameters
echo "11th parm -> $11"
shift
echo "2nd parm -> $1"
shift 2
echo "4th parm -> $1"

Let's run it now:

$ teste informing parameters to test
The program teste received 4 parameters
11th parm -> informing1
2nd parm -> parameters
4th parm -> test

There are two remarkable points about this script:

  1. In order to show that the parameter names range from $1 to $9, I wrote an echo $11, and what happened? It was interpreted as a $1 followed by the character 1, and the result was informing1;
  2. The command shift, whose syntax is shift n (in which n can be any whole number up to the number of parameters, 1 being the default), discards the first n parameters, so that the old parameter number n+1 becomes $1.
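Both points can be watched in a short sketch. It also shows something the script above does not use: in bash and other POSIX shells, braces as in ${11} reach parameters beyond the ninth without any shift. The function demo and its eleven one-letter arguments are invented for the illustration.

```shell
#!/bin/bash
# demo is a hypothetical function; its arguments behave like the
# parameters of a script.
demo() {
    glued="$11"        # $1 followed by the character 1
    eleventh="${11}"   # the eleventh parameter itself

    shift              # discards the first parameter
    after_one="$1"     # the former 2nd parameter
    shift 2            # discards two more
    after_three="$1"   # the former 4th parameter
    left=$#            # parameters still available
}

demo a b c d e f g h i j k
echo "glued=$glued eleventh=$eleventh"
echo "after_one=$after_one after_three=$after_three left=$left"
```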

Well, now that you know a little bit more about passing parameters, let's return to our CD Library and write the script that adds CDs to the data file called musics. It is a very simple script (as simple as everything else in Shell), and I'll list it so that you can see:

  • Examples:
$ cat musinc
#!/bin/bash
# Registers CDs (version 1)
#
echo $1 >> musics

The script is simple and functional: it just appends the received parameter to the end of the file musics. Let's add 3 albums and see if it works (to keep things simple, I'll suppose each album contains just 2 songs):

$ musinc "album 3^Musician5~Music5:Musician6~Music6"
$ musinc "album 1^Musician1~Music1:Musician2~Music2"
$ musinc "album 2^Musician3~Music3:Musician4~Music4"

Listing the content of musics:

$ cat musics
album 3^Musician5~Music5:Musician6~Music6
album 1^Musician1~Music1:Musician2~Music2
album 2^Musician3~Music3:Musician4~Music4

It is not as functional as it was supposed to be... it could be a lot better. The albums are out of order, which makes searching harder. Let's change the script and test it again:

$ cat musinc
#!/bin/bash
# Registers CDs (version 2)
#
echo $1 >> musics
sort musics -o musics

Adding another one:

$ musinc "album 4^Musician7~Music7:Musician8~Music8"

Now let's see what happened to the musics file:

$ cat musics
album 1^Musician1~Music1:Musician2~Music2
album 2^Musician3~Music3:Musician4~Music4
album 3^Musician5~Music5:Musician6~Music6
album 4^Musician7~Music7:Musician8~Music8

I simply added a line that sorts the file musics after each album is appended, sending the output back to the same file (that's what the option -o is for).
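Here is the append-then-sort idea as a sketch that can be run without touching any real data. It works inside a scratch directory and wraps the two commands in a function named after the script; writing the option before the file name, as in sort -o musics musics, is the conventional order and an assumption of this sketch rather than the author's spelling.

```shell
#!/bin/bash
# Scratch directory, so that no real "musics" file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# musinc written as a function for a self-contained demo:
# append the record, then sort the file in place.
musinc() {
    echo "$1" >> musics
    sort -o musics musics   # -o lets sort write over its own input
}

musinc "album 3^Musician5~Music5:Musician6~Music6"
musinc "album 1^Musician1~Music1:Musician2~Music2"
musinc "album 2^Musician3~Music3:Musician4~Music4"

first_line=$(head -n 1 musics)
line_count=$(wc -l < musics | tr -d ' ')
cat musics

cd / && rm -rf "$workdir"
```

A plain redirection such as sort musics > musics would not do the job, because the shell empties the file before sort gets a chance to read it.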

WOW! Now it is nice and almost functional. But careful, and don't panic! That is not the final version. The next version of the program will be a lot better and friendlier! We'll develop it as soon as we learn how to read data from the screen and how to format the input.

  • Examples

Listing with the cat command is not good enough, so let's make a program called muslist that lists the album whose name is given as a parameter:

$ cat muslist
#!/bin/bash
# Search for CDs (version 1)
#
grep $1 musics

Let's run it looking for album 2. As we have previously seen, when passing the string album 2, it is necessary to keep the shell from splitting it (otherwise it would read two parameters). Let's try the following:

$ muslist "album 2"
grep: can't open 2
musics: album 1^Musician1~Music1:Musician2~Music2
musics: album 2^Musician3~Music3:Musician4~Music4
musics: album 3^Musician5~Music5:Musician6~Music6
musics: album 4^Musician7~Music7:Musician8~Music8

'What a mess!! Where is the mistake? I put the parameter between quotation marks so that the shell would not split it into two...'

'Yep, but pay attention to how grep is being run:

    grep $1 musics

Even though you put album 2 between quotation marks, those marks are gone by the time the script runs; when the shell expands the unquoted $1 inside the script, it splits the value into two arguments. So, the line that actually gets executed is:

    grep album 2 musics

As the grep syntax is:

    grep <string> [file1, file2, ..., filen]

grep understood that it was supposed to look for the string album in the files 2 and musics. Since there is no file named 2, an error occurred. Moreover, since the word album is found in every record of musics, all the records were listed.

Tip: use quotation marks whenever there is a blank space or a <TAB> in the string that grep will search for. That keeps the words after the blank space or <TAB> from being interpreted as file names.
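The splitting itself can be observed without grep at all. In this sketch count_args is a hypothetical helper that only reports how many arguments reached it, and the default value of IFS is assumed.

```shell
#!/bin/bash
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() {
    echo $#
}

pattern="album 2"

# Unquoted: the shell splits the value at the blank, giving two arguments.
unquoted=$(count_args $pattern)

# Quoted: the value travels as one argument, blank included.
quoted=$(count_args "$pattern")

echo "unquoted=$unquoted quoted=$quoted"
```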

On the other hand, it is better to ignore the case of the letters in the search. The following program solves both problems at the same time:

$ cat muslist
#!/bin/bash
# Search for CDs (version 2)
#
grep -i "$1" musics

In that case, the option -i tells grep to ignore the case of the letters. The other point is that the parameter $1 was put between quotation marks so that grep receives the string as a single argument.

$ muslist "album 2"
album 2^Musician3~Music3:Musician4~Music4

Note too that grep locates the string in any position of the record, so this way we can search by album, song, singer or even by part of any of them. As soon as we get started with conditional commands, we'll write a new version of muslist that asks us in which of the fields the search should be performed.'

'Hold on, pal! Having to put the name between quotation marks is not really a friendly way of doing that...'

'You are right! Let me show you another way, then:

$ cat muslist
#!/bin/bash
# Search for CDs (version 3)
#
grep -i "$*" musics
$ muslist album 2
album 2^Musician3~Music3:Musician4~Music4

The variable $* stands for all the parameters, and in that program it will be replaced by the string album 2 (following the previous example), so it will do what you wanted it to.
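To see what the quoted $* really hands over to grep, the sketch below compares it with the quoted $@, which keeps every parameter separate. The data file, the helper count_args and the function compare are invented for the demonstration; muslist is written as a function so that everything fits in one file.

```shell
#!/bin/bash
# Invented data file standing in for musics.
db=$(mktemp)
printf '%s\n' \
    'album 1^Musician1~Music1:Musician2~Music2' \
    'album 2^Musician3~Music3:Musician4~Music4' > "$db"

# count_args is a hypothetical helper that reports how many arguments it got.
count_args() { echo $#; }

# Function form of muslist (version 3) for a self-contained demo.
muslist() {
    grep -i "$*" "$db"
}

# compare shows how many arguments each quoted form produces.
compare() {
    star=$(count_args "$*")   # the parameters joined by blanks: 1 argument
    at=$(count_args "$@")     # each parameter kept separate: 2 arguments
}

found=$(muslist Album 2)
compare album 2
echo "found=$found"
echo "star=$star at=$at"
rm -f "$db"
```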

You should have realized by now that the question in Shell is not whether it can do something, but what the best way of doing it is (as you've seen, the range of options is huge!).'

'But what if I have to remove a CD? I once forgot a CD of mine in the sun and when I looked at it again... it was lost. What if that happened again?'

'Well, let's make another script called musexc, in order to solve that kind of problem.'

Before developing it, I'd like to introduce you to a very useful option of the grep family. Meet the option -v. This option lists every input record except the ones matched by the command. Let's see the example:

$ grep -v "album 2" musics
album 1^Musician1~Music1:Musician2~Music2
album 3^Musician5~Music5:Musician6~Music6
album 4^Musician7~Music7:Musician8~Music8

As I've mentioned, the grep in the example listed all the records except the ones that refer to album 2, because those are the ones that match the argument of the command. Now we are ready to develop the script that will remove the lost CD from your CD Library. It looks like this:

$ cat musexc
#!/bin/bash
# Delete CDs from Library (version 1)
#
grep -v "$1" musics > /tmp/mus$$
mv -f /tmp/mus$$ musics

The first line sends the content of the file musics to /tmp/mus$$, leaving out the records that match grep's search. Afterwards, the script moves (or renames, if you prefer this word) /tmp/mus$$ to musics.

I used the file /tmp/mus$$ as a work copy because, as I've mentioned previously, $$ holds the PID (Process IDentification) of the running script; that way, when other people edit the file musics at the same time, each one gets a different work copy, which avoids overwriting somebody else's file.
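The removal can be rehearsed on invented data before it is trusted with a real library. In this sketch musexc is written as a function, and mktemp is swapped in for /tmp/mus$$ as one possible variation: it asks the system for a uniquely named work file instead of building the name from the PID. Both the sample records and that substitution are assumptions of the sketch, not part of the script above.

```shell
#!/bin/bash
# Scratch directory with an invented musics file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
    'album 1^Musician1~Music1:Musician2~Music2' \
    'album 2^Musician3~Music3:Musician4~Music4' \
    'album 3^Musician5~Music5:Musician6~Music6' > musics

# Function form of musexc for the demo; the work copy comes from mktemp.
musexc() {
    tmp=$(mktemp) || return 1
    grep -v "$1" musics > "$tmp"   # keep every record that does NOT match
    mv -f "$tmp" musics            # replace the original with the work copy
}

musexc "album 2"
remaining=$(cat musics)
echo "$remaining"

cd / && rm -rf "$workdir"
```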

'And that's it?'

'Yeah, man! Well, those programs we've made are quite basic, because we still lack knowledge about some tools. But, while I have another pint, you can practice using the examples, and I promise you will develop a nice control system for your CDs.

Next time we meet, I'll show you how conditional commands work and we'll improve those scripts.'

'That's it for now... but before:

Waiter, another round for me and my pal, please!'

Creative Commons license - Attribution and Non-Commercial (CC) 2009 By Visitors of Júlio Neves's Pub.
All content of this page may be used under the terms of the Creative Commons License: Atribuição-UsoNãoComercial-PermanênciaDaLicença.