Saturday, May 22, 2010

Introduction To Programming - What is Code?

In the most technical sense, computer code (or source code) is a series of instructions that can be translated, through a process that varies from trivial to complicated*, into machine "language".

While technically correct, the previous definition is pretty boring. It is also very far from what actual code is like. It certainly doesn't help us understand what it's for and why it works the way it works - which I think are the more interesting questions.

Before I talk more about what real world code is like and some of its more interesting technical aspects, I should say that there are many different programming languages and many things that can be considered computer code, from HTML, the structural language that describes the page you are reading, to the assembly language that is used to write processor instructions.

It's hard to cover the huge variety of what can be considered "code" in a single, broad definition, so instead I will try to explain how code is used in some of the common programming languages, along with the whys, hows and whats of this code.

Why do we need it?

Computer code is a tool we use to describe ideas, concepts and processes in a kind of functional way that can be converted into machine language.

We need this tool because there is a dividing line between what can be easily and efficiently done in hardware - the physical components of a computer, primarily in the processor - and what can be efficiently done in software. Code begins pretty much where the hardware leaves off.

Processors are good at things like adding one number to another, moving bits from one place to another, multiplying, subtracting and going through these instructions in sequence. They are also good at making simple logical decisions like "if this number is zero, move to the next instruction; if it's not zero, jump 5 instructions".
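To make the "jump" idea a little more concrete, here is a rough sketch in Java that imitates a processor stepping through instructions. The ToyProcessor class and its three instruction names (ADD, JUMP_IF_NOT_ZERO, HALT) are made up for this illustration; real processors have their own, much larger instruction sets.

```java
// A toy model of a processor: one register and a list of simple instructions.
// The instruction names are hypothetical and exist only for this illustration.
class ToyProcessor {
    // Runs the program and returns the final value of the single register.
    static int run(String[] ops, int[] values, int start) {
        int register = start;
        int pc = 0; // index of the instruction currently being executed
        while (pc < ops.length) {
            switch (ops[pc]) {
                case "ADD":
                    register += values[pc];
                    pc++;
                    break;
                case "JUMP_IF_NOT_ZERO":
                    // Zero: move to the next instruction.
                    // Not zero: jump by the given number of instructions.
                    pc = (register == 0) ? pc + 1 : pc + values[pc];
                    break;
                default: // "HALT" or anything unknown stops the program
                    return register;
            }
        }
        return register;
    }

    public static void main(String[] args) {
        // Count down from 3: subtract 1, then jump back one step while not zero.
        String[] ops = {"ADD", "JUMP_IF_NOT_ZERO", "HALT"};
        int[] values = {-1, -1, 0};
        System.out.println(run(ops, values, 3)); // prints 0
    }
}
```

Even this tiny program needs a jump just to repeat one subtraction, which hints at how much bookkeeping is involved at this level.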

Machine language consists of these basic instructions. A language called Assembly sits directly "above" machine code and allows giving instructions in a way that translates almost directly into machine instructions. It is rarely used for complicated things and is considered one of the toughest and most respected specializations in computer programming. Its proximity to the actual hardware gives the programmer the most control of any language over what goes on in the processor, which can allow for some of the most efficient code that can be written.

This control comes at a cost - expressing complex ideas is very hard, writing even the simplest programs requires a lot of thought and training, and even with training, reading Assembly is a tedious job. A good knowledge of Assembly is one of the essential tools for hackers.

Assembly is rarely used in software development. Languages like Java, C and C++ are far more popular, as they allow describing more complicated structures and expressing complicated ideas in simpler ways than Assembly.

Two lines of code in Java, like these, which count from 0 to 99 and print each number:

for (int i=0; i<100; i++)
    System.out.println("i = "+i);

Can translate into hundreds of instructions in Assembly once the work of printing is included.

When programmers speak of computer code, they rarely talk about Assembly.
The thing is, using other, more "high level" languages like Java or C++ requires some kind of translation into machine code, as this is the only thing that can run on the physical hardware. This is why a process called "compilation" was introduced, which translates higher level languages into machine code. This abstraction allows us to create far more complicated software in exchange for giving up control of the specific instructions given to the hardware.

Even higher level and more specialized languages exist that are further still from the processor.
What all programming languages have in common is that they exchange some level of control over what really happens on the machine for clarity and ease of use.

The building blocks

We discussed the why, so now let's talk about what code is made out of.

The goal of any higher level programming language is to provide the syntax and semantics needed to express complicated ideas in a coherent, manageable and extendable way.

Some concepts are shared between most languages:
variables, conditionals, loops, procedures (or methods/functions) and mathematical operations. Another important common concept is the Comment. Slightly less common are classes.

To better understand what code is let's look at the building blocks.

Variables - variables are a way of giving a name to some of the computer's memory in a way we can easily use in the code.
For example, we can define:

int myVariable = 100;

Now we can refer to "myVariable" in other places in the code. We could change its value, add to it, subtract from it or compare it to other values.
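As a small sketch of those operations, assuming a made-up class called VariableDemo and a made-up method called adjust:

```java
class VariableDemo {
    // Takes a starting value, adds to it, subtracts from it and returns the result.
    static int adjust(int start) {
        int myVariable = start;        // give a name to a piece of memory
        myVariable = myVariable + 5;   // add to it
        myVariable = myVariable - 10;  // subtract from it
        return myVariable;
    }

    public static void main(String[] args) {
        int result = adjust(100);
        System.out.println(result);        // prints 95
        System.out.println(result < 100);  // compare it to another value: prints true
    }
}
```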

Conditionals - conditionals are a way of asking for one thing or another to happen, depending on some logical condition.
For example this rule:

if (myVariable<100)
    System.out.println("My variable is less than 100");
else
    System.out.println("My variable is equal to or larger than 100");

States that the line "My variable is less than 100" should be printed if the value within "myVariable" is less than 100, and the line "My variable is equal to or larger than 100" otherwise.
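Here is the same rule as a small program that can actually be run. The class name ConditionalDemo and the method name describe are my own choices for this sketch; the method returns the message instead of printing it so that the result is easy to check.

```java
class ConditionalDemo {
    // Picks one of two messages depending on the value passed in.
    static String describe(int myVariable) {
        if (myVariable < 100) {
            return "My variable is less than 100";
        } else {
            return "My variable is equal to or larger than 100";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(42));   // takes the first branch
        System.out.println(describe(100));  // takes the "else" branch
    }
}
```

Note the "else": without it, the second message would be printed no matter what the value was.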

Loops - loops are a tool we use when we want to perform repeated operations. One of the useful properties of computers is that they can do certain things many times and very fast. Loops are a way to take advantage of this property. A loop generally consists of one or more instructions we want to perform and some condition that will make the loop stop.
For example:

for (int i=0; i<100; i++)
    System.out.println("i = "+i);

This code says something like this:
"Start the variable "i" at zero. For as long as "i" is smaller than 100, perform the next instruction and then increase "i" by one."
Put another way - do the next line 100 times.
This next line says: write to the screen the text "i = " followed by the value of i.
The output would look like:
i = 0
i = 1
...(and so on, up to i = 99)
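One way to see what the "for" line is doing is to write the same loop with "while", where the starting value, the stopping condition and the step each get their own line. This is only a sketch; LoopDemo and its method names are made up, and the text is collected into a string rather than printed directly so the two versions can be compared.

```java
class LoopDemo {
    // The compact form: start, condition and step all sit on one line.
    static String countWithFor(int limit) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < limit; i++) {
            out.append("i = ").append(i).append("\n");
        }
        return out.toString();
    }

    // The same loop spelled out step by step.
    static String countWithWhile(int limit) {
        StringBuilder out = new StringBuilder();
        int i = 0;                  // 1. the starting value
        while (i < limit) {         // 2. the condition, checked before every pass
            out.append("i = ").append(i).append("\n");
            i++;                    // 3. the step, taken after every pass
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(countWithFor(3));
        System.out.println(countWithFor(100).equals(countWithWhile(100)));
    }
}
```

Both methods produce exactly the same lines of text; "for" is simply the shorter way of writing it.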

Procedures/Functions/Methods - Procedures are one of the mechanisms that allow us to create new tools and break apart complicated logic into smaller, reusable and more manageable parts.

It is an essential mechanism when we want to collaborate with other programmers and logically separate different parts of our program.

One example of a function is the "System.out.println("some text");" call that I was using in the other examples.
In Java, this call hides all the complicated logic required to tell the computer to print out the text behind a very simple, easy to use line of code.

Because someone else did the work of writing this method for us, we don't need to worry about how exactly it was done or how to do it - we only need to know what it's supposed to do and how to use it, which is far simpler than writing it in the first place.
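We can write our own procedures in the same spirit. In this sketch the names ProcedureDemo, square and describeSquare are invented for the example; the point is that describeSquare uses square without caring how it works inside.

```java
class ProcedureDemo {
    // A small reusable tool: multiplies a number by itself.
    static int square(int number) {
        return number * number;
    }

    // Builds on square() without needing to know how it does its job.
    static String describeSquare(int number) {
        return number + " squared is " + square(number);
    }

    public static void main(String[] args) {
        System.out.println(describeSquare(2));   // prints "2 squared is 4"
        System.out.println(describeSquare(12));  // prints "12 squared is 144"
    }
}
```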

The Comment - a comment is just what it sounds like: a remark about the code that was written.
Code files are no different than other text files - that is, they are simply text. The compiler takes our text and converts it into machine code. By marking certain lines as comments we can tell the compiler to ignore these lines. This allows us to communicate with the reader and provide more insight into the code.

Comments are a vital component of programming languages and are one of the few tools that are dedicated to communicating our intentions to other programmers.

A comment in code looks very much like other text, but it contains a marker that tells the compiler to ignore it. In Java, two slashes mark the rest of the line as a comment:
//This is a comment, it can contain any text we want and many special characters and we can use it
//To help other programmers understand the code we are writing
//For example
//The following code computes 2 * 2 and sets the result in myVariable
//Assign the value 2 to myVariable
int myVariable = 2;
//Multiply myVariable by itself, and set it to myVariable
myVariable = myVariable*myVariable;

Classes - These are more complicated elements and harder to explain without more technical background. In general terms, classes are a way of creating useful metaphors and using them in code. They are the foundation of what is called "Object Oriented Programming", which has been the dominant paradigm in software development for many years.
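Without going into the details, here is a very small sketch of the idea. Counter is a made-up class that plays the role of a hand-held tally counter: it keeps a number to itself and only lets the rest of the code increase it or read it.

```java
class Counter {
    private int count = 0;   // each Counter object keeps its own number

    void increment() {
        count++;
    }

    int getCount() {
        return count;
    }

    public static void main(String[] args) {
        // Two separate counters made from the same class.
        Counter visitors = new Counter();
        Counter errors = new Counter();
        visitors.increment();
        visitors.increment();
        errors.increment();
        System.out.println("visitors = " + visitors.getCount()); // prints "visitors = 2"
        System.out.println("errors = " + errors.getCount());     // prints "errors = 1"
    }
}
```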

What is code?

Computer code and its components serve a dual function:
The first is to help us describe our ideas in terms that allow us and other programmers to read and understand the ideas behind the code - the purpose of what we want to achieve with the code.
The second is that the code must translate into machine code that can perform the function for which it was written.

This duality is important to understand - all working code essentially complies with the second requirement: it will run and perform its function. The first and more important requirement - to provide a coherent picture of what the writer of the code intended - is an often overlooked** and vitally important role of code. This requirement to be clear and coherent is central to the job of a programmer. It is the hardest to achieve and is rarely taught.

The compiler imposes certain rules. For example, it requires that statements be terminated with a semicolon. It requires that a parenthesis that was opened must also be closed. It also requires the use of specific "keywords" to express specific ideas. A conditional must always take a certain form (as demonstrated above). Many compilers are "case sensitive", which means that MyVariable is a different variable than myVariable simply because one uses a different capitalization on the "m".
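A quick sketch of case sensitivity in Java (CaseDemo and sumOfBoth are names invented for this example; giving two variables names this similar is a bad idea in real code):

```java
class CaseDemo {
    static int sumOfBoth() {
        int myVariable = 1;
        int MyVariable = 2;  // a different variable: the capital "M" matters
        return myVariable + MyVariable;
    }

    public static void main(String[] args) {
        System.out.println(sumOfBoth()); // prints 3
    }
}
```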

Despite all of these constraints, the compiler doesn't particularly care about the "shape" of things or about the names of things. That is - it doesn't care what names you use, as long as you follow the rules of syntax.
In this sense a compiler is like a strangely strict teacher who will accept any answer as long as it is spelled and punctuated correctly.
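To illustrate, the two methods in this sketch do exactly the same thing as far as the compiler is concerned. The names (NamingDemo, f, rectangleArea) are made up; only one of them tells a human reader what is going on.

```java
class NamingDemo {
    // Perfectly valid, but the names say nothing about the intention.
    static int f(int a, int b) {
        return a * b;
    }

    // The same logic with names that explain the idea.
    static int rectangleArea(int width, int height) {
        return width * height;
    }

    public static void main(String[] args) {
        System.out.println(f(3, 4));              // prints 12
        System.out.println(rectangleArea(3, 4));  // prints 12
    }
}
```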

The ability to name things, comment on them and use more coherent and reusable structures in code exists so that programmers can impart meaning to the code that would not exist otherwise. This meaning exists only in the minds of the programmers who read the code - the computer does not "understand" the intentions of the programmer. It does not even "see" the original code as written by the programmer.

The vast majority of the lines of code written, and much of the effort in writing them, is dedicated to conveying an idea. Of course, it's important that they actually perform the function for which they were written, but it's at least as important, and often far more important, that they do so in a way that describes ideas simply, coherently and on some rare occasions - beautifully.

* - The complexity of the translation depends primarily on the language, compiler and hardware involved.

** - When I was in high school my teacher would often complain that my code was completely incoherent. I would argue that it works. Only after trying, in my second year of high school, to re-read some code I wrote in the previous year did I really understand why writing well formatted, well commented and coherent code is so vitally important. Writing incoherent code means that to use it or change it you must first spend a long time understanding it.
