Monday, May 10, 2010

PROGRAMMING LANGUAGES


The earlier first section was Binary numbers. Some people said that I ought to reduce the math in this tutorial and I felt that perhaps I should start off differently. And so I've altered the sequence of sections in this first unit.
Recap of important computer concepts:
When we talk of computers, we refer to ‘hardware’ and ‘software’. All the physical components like monitor, keyboard, mouse, modem, printer etc. fall under the hardware category. We need computers to do something useful for us but computers aren’t smart enough to know what we want them to do (at least not yet!). We need to explicitly tell the computer our requirements in the form of instructions. A set of instructions for doing something useful forms a ‘program’. And we refer to these programs as software (i.e. unlike hardware, software is something that we can’t touch and feel). 
Major hardware parts in a computer: 
1.       Input and Output devices: Computer can receive input from input devices (like a keyboard, mouse or a scanner). Output devices are used by the computer to send out information to the user; example: monitor, printer, plotter etc.
2.       Arithmetic and Logical Unit (ALU): As the name implies this unit is responsible for all arithmetic calculations. It is part of the central processing unit.
3.       Memory: When our computer wants to work on some information, it will require some place where it can store data; where it can store the instructions; where it can store intermediate results of operations etc. Memory serves this purpose and there are different types of memory (for different requirements):
            Primary memory: This memory can be quickly accessed by the computer (in technical jargon we say that this memory has low access time; i.e. time taken to access data in primary memory is less). Generally, instructions and data needed by the computer immediately for processing are placed in primary memory. Data in primary memory is not permanent. Each time we restart the computer, the memory would get refreshed.
            Secondary memory/ storage: This is the place where we store all our data. It’s a long-term storage device (like a hard-disk). Access times are higher but secondary memory is cheaper than primary memory.  

CPU (Central Processing Unit):
            We’ve seen computer parts to get input, store information, display output and also to perform calculations. But there needs to be someone at a higher level to control the individual units; someone to decide when to capture the input, when to send output information to the monitor etc. The CPU takes up this responsibility and is termed the ‘brain’ of the computer. The CPU is also called the processor (a microprocessor chip).

The language computers understand:
We have a lot of languages in this world but all computers can understand only binary language. The vocabulary is very simple and small; it consists of only 2 things: 0 and 1. This is all that a computer understands. (Humans are comfortable with the decimal system - consisting of the numbers 0 to 9). We’ll look at the binary system later in this chapter.  
Remember: A computer finally needs everything in binary language (instructions and data).
Programming Languages:
Programs are written to tell the computer to do something useful for us. It might be as simple a task as adding two numbers or as complex as transferring data between 2 computers in a network.
There are several reasons why we need programs. Imagine searching through a stack of papers to search for some telephone bill. Imagine a company maintaining its accounts (expenses and income on a day-to-day basis) - if this was done using the traditional file and paper method, someone would have to keep entering all the details in sheets of paper and then stack them into files. If, in future, someone wants to find out when a particular product was sold they would have to manually read through each and every paper in each and every file. Computers can be programmed to perform such tasks and they will do it faster. Basically, the computer is very good in performing repetitive tasks. One point to note: man created computers and man programs computers to perform certain tasks. A program is a set of instructions that can be executed by the computer. The computer will obediently follow whatever the programmer instructs it to do (as long as it understands the instructions). The downside to this is that the computer does not have any special intelligence; it is only as intelligent as it is programmed to be. For instance, a robot can be programmed to walk. If it is not programmed to detect obstacles, the robot will simply bang into obstacles because it does not have the intelligence to recognize and avoid obstacles. If the programmer provides such provisions, then the robot will be able to handle such scenarios.
With many software programs already existing, you may wonder why do we need more software; why not just buy and use what already exists? Software is usually not custom designed for the requirements of a particular company and you may not even have software for your own special requirement. For example, dentists never used to have any software specifically for their use but now you can see a range of dental software (the dentist can maintain details of each of his patients’ tooth in the computer). Programs needn’t be as complicated as dental software; you may want to have a little software to calculate your CGPA or to calculate your rank in class. To perform such tasks a programmer has to write a program (or a set of instructions). These instructions are written in a specific programming language and the programmer can chose to write the instruction in any one of the available languages.
Evolution of Programming Languages:
Any programming language can be categorized into one of the following categories:
·         High Level
·         Middle Level
·         Low Level (machine and assembly languages)
Middle level languages are sometimes clubbed together under the category of high-level languages.

Machine Level Languages:
A computer has a processor and if we want the computer to do something then we need to direct instructions at the processor. The problem is that computers can understand only binary language (i.e. the language which comprises of only 1s and 0s). All instructions to the processor should be in binary and so a programmer can write programs using a series of 1s and 0s. Every processor will understand only some instructions (this depends on how the processor was designed). An instruction to increment a variable might be:
110110100011101
Imagine having 100 such instructions in a program! The advantage is that no conversion (or translation) is needed because the computer can understand the 1s and 0s. This is called machine level language and this was used at the time computers came into existence.
The drawbacks are quite clear. By using machine language writing programs is very tedious and also prone to error (even if one bit is wrong, the entire program functionality could get altered). Trying to identify the error will also be a very tedious job (imagine reading a series of 1’s and 0’s trying to locate the mistake).
Assembly Languages:
Since it is very difficult for us to write instructions in binary language, assembly languages were developed. Instead of using a set of 1s and 0s for a particular instruction, the programmer could use some short abbreviations (called ‘mnemonics’) to write the program (for example: ADD is a mnemonic to add two numbers). Every processor has an instruction set that describes the various commands that the processor can understand. A normal instruction set will have instructions like JMP (jump), ADD (for addition) and so on. A programmer will write the program using these instructions and an assembler will convert the mnemonics into binary form so that the processor can understand it. The problem is that assembly level languages are specific to each processor. For example the 8085 (the simplest microprocessor) has an instruction set different from the 8086 microprocessor. Thus you would have to rewrite the program for each microprocessor. Another problem with assembly level programming is that the programmer needs to know details about the processor’s architecture such as the internal registers where values can be stored, how many internal registers are available for use etc.
Low level languages are very close to the hardware but difficult for writing programs.
High Level Languages:
Writing larger programs in assembly language is quite difficult and hence, high-level languages were developed (like BASIC). These languages were not close to the actual computer hardware but were very close to the English language. They tried to simplify the process of programming. In these languages the programmer needn’t worry about internal registers and processor architecture; instructions could be typed in almost normal English. The English-like instructions had to be converted to machine language using a compiler. One simple example of an English like command in COBOL is:
ADD incentive TO basic GIVING salary.
The instruction is quite self-explanatory (Add ‘incentive’ and ‘basic’ and store the result in ‘salary’). COBOL is also a high-level language.
                As programs grew larger (for example: maintaining a record of all employees, accounts maintenance etc.), even high level languages had certain drawbacks. These languages are also referred to as unstructured form of programming. The reason they are unstructured is because when a large program is written, it becomes very difficult to analyze the program at a later date (by someone else or even by the original programmer himself). There are many reasons for this; one is the use of statements like GOTO in high-level languages. In any program some tasks will either be performed repetitively or the programmer might want the program to execute a different set of instructions if some condition is satisfied. For example in a division program, in case the denominator is 0 then the program should not divide the two numbers; instead it should display an error message. Only if the denominator is not 0 should the program divide the two numbers. This kind of programming is referred to as program flow control. In a larger program there could be various other possibilities and all this was dealt with by using the GOTO statement. If you’ve used BASIC you will be aware that for every instruction that you type, you have to specify a line number (example 10,20,30 and so on). GOTO will be followed by some line number (indicating the instruction that has to be executed next. Ex: 1020 GOTO 50 means that now you want the program flow to go to line number 50).
                The problem with this is that in a large program it will become very difficult for anyone to understand the program logic. Imagine reading through 100 lines and then finding one GOTO statement to the 200th line. Then on the 205th line there could be another GOTO statement directing you to the 150th line. This should make it clear as to what is meant by the term ‘unstructured’. This is a major problem with high level languages. Another fact is that high level languages are very far away from the actual hardware. Examples of high-level languages are BASIC, Fortran (Formula Translation), COBOL (Common Business Oriented Language) and LIST (List processing).
Thus to take advantage of high level and low level languages, middle level languages like C and C++ were developed. They were easy to use and tried to retain the advantages of low level languages (i.e. being closer to the architecture).