Pub Talk - Part I
Dialog overheard between a Linuxer and a "Mouseoholic"
- Who's Bash?
- Bash is the newest son of the Shell Family
- Hey dude! You're gonna drive me crazy. I had one doubt... Now I have two!
- No, man. You've been crazy for a long time. Ever since you chose that operating system, you've had to reboot ten times a day and you have no idea what's happening inside your computer. But never mind. I'm going to explain what a Shell is and what the Shell family is, and in the end you'll exclaim: "Holy God of Shell! Why didn't I choose Linux before?"
The Linux Environment
- To understand the Shell and how it works, first I'll show you how the layers of the Linux environment fit together. Take a look at the diagram:
View of the Shell in relation to the Linux kernel
In this diagram, the Hardware Layer is at the center, made up of the physical components of your computer. Surrounding it is the Linux Kernel Layer, the core of the system. This layer communicates with the hardware, managing and controlling it. The kernel sends programs and commands to the central processing unit (CPU) for execution. Enclosing all of this is the Shell. It gets its name because it is a wrapper between the User and the Operating System (OS). All User interaction with the OS is managed by the Shell.
The Shell Environment
Well... to get to the Linux core (the ambition of every application), you have to pass through the Shell's filtering. Let's understand how it works, to make the most of the numerous tools the Shell gives us.
Linux is, by definition, a multi-user operating system (we can never forget this), and to grant access to specific users and deny it to others there is a file called /etc/passwd. This file plays the role of "host": it contains the information that controls the login of every user of the system. The last field of each user record in /etc/passwd tells the system which Shell the user gets at login.
Remember I said the last field of /etc/passwd tells the system the user's default Shell at login? This means that if this field contains prog, the user will land on the prog program's screen when logging in, and when prog finishes, the system will log the user out. Imagine how much security we can implement with this simple tool.
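As a rough illustration of that last field, the sketch below splits a made-up passwd-style record on its colons and picks out the seventh field. The record (the user jneves, its IDs and paths) is hypothetical and not taken from any real system.

```shell
# Hypothetical /etc/passwd-style record; fields are separated by colons.
record='jneves:x:1000:1000:Example User:/home/jneves:/bin/bash'

# $(...) captures the output of a command (the text covers it later on).
# The seventh (last) field names the program started at login.
login_shell=$(printf '%s\n' "$record" | cut -d: -f7)
echo "Login shell: $login_shell"
```

On a real system, the same cut options can be pointed at /etc/passwd itself to list the Shell of every user.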
Remember I mentioned the Shell, the family, the brothers? Let's sort that out: the Shell, the idea of a layer wrapping the operating system itself, is the generic name for the children of that idea that were born over the years Unix has existed. Nowadays there are many Shell flavors, among them sh (Bourne Shell), ksh (Korn Shell), Bash (Bourne Again Shell) and csh (C Shell).
A Quick Look at the Main Shell Flavors
Bourne Shell (sh)
Developed by Stephen Bourne at AT&T Bell Labs (where Unix was also developed), this was for many years the default Shell of the Unix operating system. It is also called the Standard Shell, because for years it was the only one, and it is still the most widely used today, since it has been ported to every Unix environment and Linux distribution.
Korn Shell (ksh)
Developed by David Korn, of Bell Labs, it is a superset of sh: it has all the facilities of sh and adds many others. Its full compatibility with sh attracts many users and Shell programmers to this environment.
Bourne Again Shell (Bash)
This is the most modern Shell (except for Bash 2), and its number of followers keeps growing around the world, either because it is the default Linux Shell or because of its wide variety of commands, which also incorporates many C Shell commands.
C Shell (csh)
Developed by Bill Joy at the University of California, Berkeley, it is the most widely used Shell on BSD and Xenix. Its command structure is very similar to that of the C language. Its biggest sin was to ignore compatibility with sh and go its own way.
There are other Shells, but we will only talk about the first three, treating them generically as "the Shell" and pointing out the particular characteristics of each one where they differ.
Explaining Shell Work
The Shell is the first program you get when you log in to Linux. It takes care of many things so that the kernel isn't burdened with repetitive tasks, leaving it free for nobler work. Since every user has a Shell of their own standing between them and Linux, it is the Shell that interprets the commands typed, checks their syntax and passes them on, clean, for execution.
- Yo! Does this business of interpreting commands have anything to do with an interpreter?
- Yes, it does. In fact, the Shell is an interpreter that comes with a powerful language of high-level commands, allowing loops, decision structures and the storage of values in variables, as I'll show you.
Let me explain the main tasks the Shell performs, in the order it performs them. Pay attention to this order, because it is fundamental to understanding the rest of our talk.
Examination of the Command Line
In this examination, the Shell identifies the special (reserved) characters that have meaning for interpretation of the line, and checks if the passed line is an assignment or a command.
Assignment
If the Shell finds two fields separated by an equals sign (=) with no blank spaces around it, it identifies the sequence as an assignment.
$ ls linux
linux
In this example, the Shell identified ls as a program and linux as a parameter passed to the ls program.
$ value=1000
In this case, since there are no blank spaces (and note that the blank space is one of those reserved characters), the Shell identified an assignment and stored 1000 in the variable value.
Never do:
$ value = 1000
bash: value: command not found
Bash found the word value set off by blank spaces and "guessed" that you were trying to run a program called value, passing it two parameters: = and 1000.
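The difference can be sketched in a few lines. This assumes no program called value exists on your system; the error message from the failed attempt is thrown away so that only our own note appears.

```shell
# No blanks around "=": the Shell sees an assignment.
value=1000
echo "value holds: $value"

# With blanks, the Shell looks for a program called "value" and hands it
# the parameters "=" and "1000". Assumption: no such program is installed.
if ! value = 1000 2> /dev/null; then
    echo "the Shell tried to run a program named value"
fi
```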
Command
When a line is typed at the Linux prompt, it is split into pieces separated by blank spaces: the first piece is the name of the program, whose existence will be checked; after it come, in this order, options/parameters, redirections and variables. When the program exists, the Shell verifies the permissions of the files involved (including the program itself) and raises an error if you aren't allowed to run the job.
Redirection Resolution
After identifying the components of the command line you typed, the Shell moves on to resolving redirections. One of the Shell's strengths is something we call redirection, which can apply to input (stdin), output (stdout) or errors (stderr), as I'll explain soon.
Variable Substitutions
At this point, the Shell checks whether the variables (parameters starting with $) found in the command are defined and replaces them with their current values.
Metacharacters Substitutions
If any metacharacter (*, ? or []) is found on the command line, it is replaced at this point by its possible values.
Suppose the only file in your current directory whose name starts with T is a directory named ThisIsAVeryHugeNameForADirectoryButIsMyDirectoryName. You can do:
$ cd T*
Since up to this point it is the Shell that is working on your command line and the cd command (program) hasn't run yet, the Shell turns T* into ThisIsAVeryHugeNameForADirectoryButIsMyDirectoryName and cd is executed successfully.
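To watch the substitution happen, here is a small sketch that recreates the situation in a scratch directory (created with mktemp, an assumption of this example) so that T* has exactly one match.

```shell
# Work in a scratch directory so the pattern has exactly one match.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir ThisIsAVeryHugeNameForADirectoryButIsMyDirectoryName

# echo shows what the Shell turns T* into before any command runs.
expanded=$(echo T*)
echo "T* became: $expanded"

# cd receives the full name, never the asterisk.
cd T* || exit 1
landed=$(basename "$PWD")
echo "Now in: $landed"
```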
Sending Command Line to the Kernel
With the previous jobs done, the Shell assembles the command line, now with all the substitutions made, and calls the kernel to execute it in a new Shell (a child Shell), which gets a process number called the PID (Process IDentification). The Shell then stays inactive, taking a little nap, while the program runs. Once that process finishes (and the child Shell with it), the Shell takes control again and, by showing the prompt, announces that it is ready for new commands.
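A quick way to see that a child really gets a PID of its own is to compare the two numbers. This is only a sketch: it starts a child sh by hand instead of relying on an ordinary command, and the variable names are made up for the example.

```shell
# $$ expands to the PID of the Shell reading these lines.
parent_pid=$$

# The single quotes keep our Shell from expanding $$, so the child sh
# expands it and reports its own PID.
child_pid=$(sh -c 'echo $$')

echo "parent Shell PID: $parent_pid"
echo "child Shell PID:  $child_pid"
```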
Decrypting the Rosetta Stone
To get rid of that feeling you have when you look at a Shell script, that it's alphabet soup or hieroglyphs, I'll show you the main special characters so you can walk like Jean-François Champollion (do a little research on Google to find out who this man was) deciphering the Rosetta Stone.
Escape Characters
That's right. When we don't want the Shell to interpret a special character, we must "hide" it from the Shell. This can be done in several ways:
Single Quotes (')
When the Shell sees a string of characters between single quotes, it strips the quotes and does not interpret the content.
$ ls linux*
linuxmagazine
$ ls 'linux*'
ls: linux*: No such file or directory
In the first case, the Shell "expanded" the asterisk (*) and found the file linuxmagazine to list. In the second case, the single quotes inhibited the Shell's interpretation and we got the answer that there is no file called linux*. In other words, the asterisk (*) was expanded in the first case, but was treated as a literal asterisk (*) character in the second.
Backslash (\ )
Just like single quotes, the backslash (\) inhibits interpretation, but only of the character that follows it.
Imagine you accidentally created a file named * (asterisk), which some Unix flavors allow, and want to remove it. If you do:
$ rm *
You will make a big mess, because rm will erase all the files in the current directory. The best way to do this is:
$ rm \*
This way, the Shell does not interpret the asterisk and therefore does not expand it.
Try the following scientific experiment:
$ cd /etc
$ echo '*'
$ echo \*
$ echo *
Did you see the differences? Then I don't need to explain anything more.
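If /etc is too noisy, the same experiment can be repeated somewhere quieter. The sketch below assumes a scratch directory containing just two files, a and b, so the result of each echo is easy to predict.

```shell
# A scratch directory with two known files.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch a b

quoted=$(echo '*')     # single quotes hide the asterisk
escaped=$(echo \*)     # the backslash hides it too
bare=$(echo *)         # unprotected: the Shell expands it to the file names

echo "single quotes: $quoted"
echo "backslash:     $escaped"
echo "bare asterisk: $bare"
```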
Double Quotes (" )
Much like single quotes, except when the string between double quotes contains a dollar sign ($), a backquote (`) or a backslash (\), which are still interpreted. Don't stress about it: I haven't given examples of double quotes because you don't yet know the dollar sign ($) or the backquote (`). From here on we'll see these special characters many times. The important thing is to understand what each one means.
Redirection Characters
Most commands have an input and an output, and can generate errors. The input is called Standard Input, or stdin, and by default it is the terminal keyboard. The output of a command is called Standard Output, or stdout, and by default it is the terminal screen. The error messages a command may generate, called Standard Error or stderr, are also sent to the terminal screen by default. Let's now see how to change this state of affairs. We'll build a "parrot program". Do the following:
$ cat
cat is an instruction that lists the contents of a given file to standard output (stdout). If no input is specified, it waits for data from standard input (stdin). Since I didn't specify the input, it is waiting for it from the keyboard (standard input), and since I didn't specify the output either, whatever I type goes to the screen (standard output), which gives us, as promised, a "parrot program". Try it!
Redirecting Standard Output
To specify the output of a program we can use > (greater than) or >> (greater than, greater than), followed by the name of the file to which we want to send the output.
Let's turn the "parrot program" into a "text editor" (how pretentious, huh?).
$ cat > Arq
cat still has no input specified, so it waits for typed data, but its output is redirected to the file Arq. This way, everything typed goes into Arq, which means we have built the shortest and worst text editor on the planet.
If I do it again:
$ cat > Arq
The data in Arq will be lost, because before redirecting, the Shell creates an empty Arq file. To add information at the end of the file, I should have done:
$ cat >> Arq
As I already told you, the Shell resolves the whole line and only then sends the command off for execution. So if you redirect the output of a command to the very file it reads, the Shell first empties the file and then sends the command for execution. And just like that, you've lost your dear file.
From this we can see that >> is used to append data to the end of a file.
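Here is a minimal sketch of the three behaviours just described, using echo instead of typing into cat and a temporary file standing in for Arq (both are choices made for this example only).

```shell
# A temporary file plays the role of Arq.
Arq=$(mktemp)

echo "first line"  > "$Arq"    # > empties the file, then writes
echo "second line" > "$Arq"    # emptied again: "first line" is gone
echo "third line" >> "$Arq"    # >> adds to the end instead
content=$(cat "$Arq")
echo "$content"

# The trap: the Shell empties Arq before sort ever gets to read it.
sort "$Arq" > "$Arq"
[ -s "$Arq" ] || echo "Arq is now empty"
```

Sending the output to a different file and renaming it afterwards avoids the trap.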
Redirecting Standard Error Output
Just as the Shell by default receives data from the keyboard and sends output to the screen, errors also go to the screen unless you specify another destination. To redirect errors, use 2> error_file. Note that there is no blank space between the number 2 and the greater-than sign (>).
Don't confuse >> with 2>. The first appends data to the end of a file; the second redirects standard error (stderr) to the specified file. This is important!
Suppose that during the execution of a script you may or may not (depending on the path the program takes) have created a file named /tmp/IsThisExisting$$. To clean the trash off your disk, at the end of the script you could put a line like:
rm /tmp/IsThisExisting$$
If the file doesn't exist, an error message will be sent to the screen. To prevent this, you should do:
rm /tmp/IsThisExisting$$ 2> /dev/null
About the example we just saw, I have two tips:
TIP # 1
$$ holds the PID, that is, the Process IDentification. Since Linux is a multi-user system, it is always a good idea to append $$ to the names of files that will be used by many people, to avoid ownership problems: if you named your file just IsThisExisting, the first user (its creator) would be its owner, and everyone else would get a permission error when trying to write to it.
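Put together, the idea looks roughly like the sketch below. It assumes /tmp is writable; the file name is the one from the example above, and the echo messages are only there to show what happened.

```shell
# $$ makes the name unique to the Shell process running these lines.
tmpfile="/tmp/IsThisExisting$$"
echo "some data" > "$tmpfile"

# The file exists: it is removed without a word.
rm "$tmpfile" 2> /dev/null

# The file is gone: rm fails, but its complaint goes to /dev/null.
rm "$tmpfile" 2> /dev/null || echo "nothing left to remove, and no error on screen"
```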
To try out Standard Error redirection at the Shell prompt, here is an example. Do:
$ ls donotexist
ls: donotexist: No such file or directory
$ ls donotexist 2> errorfile
$ cat errorfile
ls: donotexist: No such file or directory
In this case, when we ran ls on donotexist we got an error message. After redirecting standard error to errorfile and running the same command, we got only the Shell prompt back. When we listed errorfile, we saw the error message stored in it. Try the same test yourself.
TIP # 2
- What the hell is /dev/null?
- In Unix there is a ghost file called /dev/null. Everything sent to this file disappears. It's like a black hole. In my example, since I wasn't interested in keeping a possible error message from the rm command, I just redirected it to this file.
It is worth noting that these redirection characters can be combined. That is, if in the previous example we did:
$ ls donotexist 2>> errorfile
the error message from ls would be appended to the end of errorfile.
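The two forms can be compared side by side. In this sketch the missing file is looked up inside a fresh scratch directory so it is guaranteed not to exist, and "|| true" is added only so that a strict script does not stop on the failure we expect; both are assumptions of the example.

```shell
# A fresh scratch directory: nothing called donotexist can be in it.
scratch=$(mktemp -d)
errorfile="$scratch/errorfile"

# First failure: 2> creates errorfile and stores the message in it.
ls "$scratch/donotexist" 2> "$errorfile" || true

# Second failure: 2>> appends another message after the first.
ls "$scratch/donotexist" 2>> "$errorfile" || true

# Count the non-empty lines collected so far.
messages=$(grep -c . "$errorfile")
echo "messages stored: $messages"
```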
Redirecting Standard Input
To redirect Standard Input we use < (less than).
- And what is this used for, you'll ask me.
- You'll understand soon enough. Let me show you an example.
Suppose that you want to send an e-mail to your boss. For the boss, we always go the extra mile, right? So, instead of typing the e-mail at the prompt, where correcting an earlier sentence is impossible, you write the message to a file and, after ten checks without finding any error, you decide to send it, and do:
$ mail boss < filewithmailtotheboss
Your boss will receive the text contained in filewithmailtotheboss.
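Since sending real mail is awkward to try at home, here is the same mechanism sketched with wc, which simply counts what arrives on its standard input. The temporary file and its contents are made up for the illustration.

```shell
# A message file, written ahead of time.
message=$(mktemp)
printf 'Dear boss,\nthe report is ready.\n' > "$message"

# wc reads the file through stdin, so it never learns the file's name.
lines=$(wc -l < "$message")
echo "lines in the message:" $lines
```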
Another rather crazy kind of redirection the Shell allows is called a here document. It is represented by << (less than, less than) and tells the Shell that the command's scope begins on the next line and ends at a line containing only the label that follows the << sign.
See the following script, with an ftp routine:
ftp -ivn remotehost <<endftp
user $USER $PASSWD
binary
get remotefile
endftp
This little piece of code has lots of interesting details:
- The options I passed to ftp (-ivn) make it list everything that is happening (-v for verbose), not ask whether you really want to transfer each file (-i for interactive) and, last but not least, not prompt for user and password (-n), because these will be supplied by the user instruction;
- When I used << endftp, I was saying: "Listen to me, Shell. Don't mess with anything from here until you find the label endftp. You wouldn't understand any of it anyway, since these are ftp-specific instructions". If that were all, it would be simple, but looking at the example we can see two variables ($USER and $PASSWD), which the Shell will resolve before the redirection. The great advantage of this kind of construction is precisely that it allows commands to be interpreted inside the scope of the here document, which contradicts what I just said. I'll explain how this works soon. We can't do it now, because you don't yet know all the tools;
- The user command is an ftp command and is used to pass the user and password that were read in a previous routine and stored in our two variables: $USER and $PASSWD;
- binary is another ftp instruction, used to indicate that the transfer of remotefile will be done in binary mode, that is, the file's contents will not be interpreted to find out whether they are ASCII, EBCDIC, etc.;
- get remotefile tells ftp to download this file from remotehost to our local host. If we wanted to send the file, we would use the put command.
A very common error when using labels (like endftp in the previous example) is caused by blank spaces before or after them. Watch out for this, because this kind of error tends to kick the programmer's backside until it is found. Remember: a good label gets an entire line to itself.
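To experiment with here documents without an ftp server, cat can stand in for ftp. In this sketch the variable LOGIN and its value are hypothetical; the point is that the Shell replaces the variable before cat sees the text, and that the closing label sits alone on its line.

```shell
LOGIN=guest            # hypothetical value, for illustration only
out=$(mktemp)

# Everything up to the line holding only "endlabel" becomes cat's input.
cat > "$out" <<endlabel
user $LOGIN
binary
endlabel

result=$(cat "$out")
echo "$result"
```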
- All right... all right... I know I was rambling and wandered off into ftp commands, straying from our main subject, but it's always good to learn, and it's very rare to find people who love to teach...
Redirecting Commands (pipes)
The redirections we have talked about so far always referred to files: they sent things to a file, took things from a file, or simulated local files. What we'll see now redirects the output of one command to the input of another.
This is very useful and makes lots of things easier. Its name is the pipe, and it acts as a pipe between two commands, or in other words, it pipes information from one command to the other. It is represented by a vertical bar (|).
$ ls | wc -l
21
The ls command passed the file list to the wc command, which, when given the -l option, counts the lines it receives. This way we can tell how many files there are in our directory (21 in this case).
$ cat /etc/passwd | sort | lp
This command line sends the contents of the /etc/passwd file to the input of the sort command, which sorts it and sends it on to lp, our print spooler.
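Both pipelines can be tried without a printer. The sketch below assumes a scratch directory holding three empty files, and swaps lp for head -n 1 (which keeps only the first line) so the result can be seen on the screen.

```shell
# Three known files in a scratch directory.
scratch=$(mktemp -d)
touch "$scratch/arq1" "$scratch/arq2" "$scratch/arq3"

# ls writes one name per line into the pipe; wc -l counts the lines.
count=$(ls "$scratch" | wc -l)
echo "files:" $count

# Pipes chain: printf feeds sort, and sort feeds head.
first=$(printf 'pear\napple\nfig\n' | sort | head -n 1)
echo "first after sorting: $first"
```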
Environment Characters
When you want to give priority to an expression, you put it between parentheses, right? Thanks to arithmetic, that feels natural. But in the Shell what really gives priority to an expression is the backquote (`), not the parentheses. I'll give you some examples of the backquote in use so you can understand it better.
I want to know how many users are logged in to my computer. I can do:
$ who | wc -l
The who command sends the list of connected users to wc -l, which counts how many lines it received and shows the answer on the terminal. But instead of a lone number on the screen, what I want is for it to appear in the middle of a sentence.
To send sentences to the screen I use the echo command. So let's see how it goes:
$ echo "There is who | wc -l connected users"
There is who | wc -l connected users
What? Look at that! It didn't work. Indeed it didn't, and not because of the quotation marks I used, but because the who | wc -l command has to be executed before echo. To solve this, I need to give priority to that second command using backquotes, like this:
$ echo "There is `who | wc -l ` connected users"
There is       8 connected users
To remove the blank spaces that wc -l produced before the 8, we just need to remove the quotation marks. Like this:
$ echo There is `who | wc -l ` connected users
There is 8 connected users
As I said, quotation marks protect everything inside them from Shell interpretation. Since a single blank space is enough of a separator for the Shell, once the quotation marks are removed the extra spaces are collapsed into one.
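Not every wc pads its answer with blanks, so this sketch fakes the padding with printf to make the effect visible on any system. The variable names are invented for the example.

```shell
# printf stands in for a wc -l that pads its answer: three blanks, then 8.
count=`printf '%4d' 8`

# Inside double quotes the blanks survive; without them the Shell sees
# them as plain separators and echo joins the words with single spaces.
quoted=`echo "There is $count connected users"`
unquoted=`echo There is $count connected users`

echo "[$quoted]"
echo "[$unquoted]"
```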
Before we talk about parentheses, let me say a quick word about the semicolon (;). In the Shell you normally give one command per line. To put several commands on the same line, we need to separate them with semicolons (;). So:
$ pwd ; cd /etc; pwd; cd -; pwd
/home/mydir
/etc
/home/mydir
/home/mydir
In this example, I printed the name of the current directory with pwd, changed to the /etc directory, printed the directory name again and finally went back to the previous directory (cd -), printing its name once more. The path appears twice at the end because cd - itself prints the directory it switches to, and then pwd prints it again. Note that I wrote the semicolon (;) in every possible way, to show that it doesn't matter whether there are blank spaces before or after this character.
Finally, let's look at parentheses. Take a look at the following case, similar to the previous example:
$ (pwd ; cd /etc ; pwd;)
/home/mydir
/etc
$ pwd
/home/mydir
- What the heck? I was in /home/mydir, changed to /etc, checked with pwd that I really was in that directory and, when the command group finished, I found myself in /home/mydir, as if I had never left!
- Oh crap. It's a kind of magic!
- Are you crazy, man? Of course not! The interesting thing about parentheses is that they call a new Shell to execute the commands inside them. So we really did go to the /etc directory, but when all the commands in the parentheses had run, the new Shell that was in /etc died and we came back to the previous Shell, whose current directory was still /home/mydir. Do other tests using cd and ls to fix the concepts.
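The same test fits in a script. In this sketch the variables start and after are made up for the example; they simply record where the parent Shell is before and after the parentheses.

```shell
# Remember where the parent Shell is.
start=$PWD

# The parentheses start a child Shell; the cd happens only in there.
(cd / ; pwd)

# Back in the parent, nothing has moved.
after=$PWD
echo "parent Shell is still in: $after"
```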
Now that we know these concepts, take a look at the following example:
$ mail support << END
> Hi support, today at `date "+%H:%M"`
> we had that problem again,
> the one I reported by
> phone. As you asked,
> here is a file list from
> the directory:
> `ls -l`
> Best regards.
> END
Now we finally have the knowledge to show what we said about the here document. The commands between backquotes (`) have priority, so the Shell executes them before the mail instruction. When support receives the e-mail, they will see that the date and ls commands were executed before mail, so they get a snapshot of the environment at the moment the e-mail was sent.
The default primary Shell prompt, as we have seen, is the dollar sign ($), but the Shell also has the concept of a secondary prompt, or command continuation prompt, which is shown on the screen when we press Enter and the instruction hasn't finished yet. This prompt is a greater-than sign (>), which we see at the beginning of the second and following lines.
To wrap up and muddle everything, I need to say that there is a newer, more modern construction used to give priority to command execution, just like backquotes: constructions of the form $(cmd), where cmd is one or more commands that will be executed with priority in their context.
So backquotes and $(cmd) constructions serve the same purpose, but for those who work with multi-platform operating systems I recommend backquotes, since $(cmd) has not been ported to all Shell flavors. Here in the pub, I'll use both, without distinction.
Let's look again at the backquote example from this new point of view:
$ echo There is $(who | wc -l) connected users
There is 8 connected users
Take a look at this case:
$ Arqs=ls
$ echo $Arqs
ls
In this example, I made an assignment (=) and ran an instruction. What I wanted was for the variable $Arqs to receive the output of the ls command. Since the instructions of a script are interpreted from top to bottom and from left to right, the assignment was done before ls could be executed. To get what we want, the execution of the command must be given priority over the assignment, and this can be done in either of the following ways:
$ Arqs=`ls`
or:
$ Arqs=$(ls)
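The three cases can be compared directly. This sketch uses echo captured in place of ls so the result does not depend on which files happen to be in your directory; the variable names are invented for the example.

```shell
# Plain assignment: the variable receives the two letters "ls", nothing runs.
literal=ls

# Both constructions run the command first and assign its output.
with_backquotes=`echo captured`
with_dollar=$(echo captured)

echo "literal:         $literal"
echo "with backquotes: $with_backquotes"
echo "with \$(cmd):     $with_dollar"
```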
To finish this topic, let's see one last example. Say I want to put into the variable $Arqs the long listing (ls -l) of all files whose names start with arq followed by a single character (?). I should do:
$ Arqs=$(ls -l arq?)
or:
$ Arqs=`ls -l arq?`
But, look at this:
$ echo $Arqs
-rw-r--r-- 1 jneves jneves 19 May 24 19:41 arq1 -rw-r--r-- 1 jneves jneves 23 May 24 19:43 arq2 -rw-r--r-- 1 jneves jneves 1866 Jan 22 2003 arql
- Wow! It's all jumbled!
- As I told you, man, if you let the Shell "see" the blank spaces, whenever there are several blanks in a row they are replaced by a single one. Line breaks count as blanks too, which is why everything ended up on one line. To get a neat listing, we need to protect the variable from Shell interpretation, like this:
$ echo "$Arqs"
-rw-r--r-- 1 jneves jneves 19 May 24 19:41 arq1
-rw-r--r-- 1 jneves jneves 23 May 24 19:43 arq2
-rw-r--r-- 1 jneves jneves 1866 Jan 22 2003 arql
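The same effect can be reproduced without any files on disk. In this sketch a three-line value is built with printf (the names are borrowed from the listing above) and then echoed with and without the protection of double quotes.

```shell
# A value holding three lines, as a listing would.
listing=$(printf 'arq1\narq2\narql')

# Unprotected: line breaks are treated as blanks and everything is joined.
flat=$(echo $listing)

# Protected by double quotes: the line breaks survive.
kept=$(echo "$listing")

echo "$flat"
echo "$kept"
```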
- Look, pal, go practice these examples because, when we meet again, I'll explain a series of typical Shell programming instructions. Bye! Oh, just one little thing I was forgetting: in the Shell, the hash (#) is used to start a comment.
$ exit # Ask for the check