IA Algorithms Enrichment 2017

Functions

So far, you have learned how to create variables, write text and data to the screen, receive input from the user, make decisions based on data held in variables, and loop code based on a condition. In truth, this is the majority of what you need to write any program you want in C++ (the other useful thing to have is the ability to make arrays, which we will discuss shortly) -- these features are what make C++ Turing complete, or capable of solving any problem a computer can solve. The problem is that writing programs more advanced than the ones you have seen so far is incredibly difficult without higher-level programming language features (i.e. those that let us create our own constructs). The first such construct we will look at is a function.

A function in programming is not too different from a function that you might use in algebra. It takes in an input and spits out an output by doing something to that input. However, functions in programming can do a lot more than what we typically use them for in math. To elaborate:

  • A function can take any number of inputs, or no inputs at all
  • The inputs of a function can be of any data type (including string!), and they need not all be the same
  • A function can "return" (output) any data type (including string!), or return nothing at all

In programming, we typically use functions to avoid writing repetitive code, as well as to organize our code. If we find ourselves writing the same code (or very similar code) over and over again, we begin setting ourselves up for failure: if one of those instances of that code fails, we have to spend lots of time trying to find it and debug it; if all of the instances fail, we have to spend time debugging and fixing every single one of them, and it can be hard to distinguish one instance from another. Functions are abstractions, or programming language structures that allow us to wrap code in a common interface, allowing quick reuse of existing code. As such, when you use a function to implement repetitive code and later find that the function is wrong or needs to be improved, you fix or improve that one function, and all other code that uses it is fixed or improved without you doing anything else. Being able to understand and write code using such abstractions is highly valued when writing large software projects.

A quick note

Thus far, almost all of the example code that you have seen has been complete -- it has all the appropriate #include statements, has using namespace std at the top, defines a function main and has return 0 at the end. As we learn more programming constructs, writing out the full program will become excessively long at points, so instead, many of the code snippets here will be fragments of the full source, which demonstrate only the code we're discussing. There will still be full-length example programs where they are necessary to demonstrate a concept, but it is expected that you will be able to understand where the example code belongs in the overall source file.

Function basics

As it turns out, you have already seen what a function looks like:

int main () {
	//some code...
	
	return 0;
}

main is an example of a function. Let's examine a little more closely what all of this means:

int main ()

Here, we are defining a function called main that outputs (returns) an int and takes no input parameters (the empty parentheses). This description is known as a function's signature, or the aspects of a function that uniquely identify it: name, return type, and parameters. The reason that we always define this function is because it is your program's entry point -- the operating system starts your program by calling its main function.

OK, let's look at the other function-related line in the code:

return 0;

This is main's return statement, which is how main tells the computer what its "output" is (or, using math terms, what main "evaluates to"). The 0 is main's return value, which is the value that main will evaluate to. The type of a function's return value must match its return type (in this case, int, as in int main) or be convertible to it; otherwise the code will not compile (this is under the same principle that you cannot store a value of an incompatible data type in a variable).

Functions neither need to have the int return type, nor return the value 0 (0 is actually just a message to the operating system to tell it if the program had any execution errors). You can return any data type, such as double, bool, string, or even one of your own, when we learn abstract data types.

Generally speaking, a function looks like this:

returnType functionName (inputParameterList) {
	//code
	return value;
}

Let's expand the idea of a function just by starting with a regular algebra function: f(x) = x^2. Let's implement this as a function in C++:

double f (double x) {
	return x * x;
}

Now, we've changed some things from before: instead of returning an int, we now return a double; and instead of taking no parameters, we now have one: x, which is a double. Inside the function, x is just a regular variable with type double, and we can use it just like we would any other variable. Note that we did not have to declare it in the function body -- because we gave the variable a type and a name in the parameter list, the compiler knows all it needs to make a regular variable from that alone.

For those wondering: We wrote x * x because ^ is not exponentiation in C++ (it's actually bitwise XOR). There is a function in the cmath header file (#include <cmath>) called pow that computes arbitrary powers, but multiplying a number by itself is simpler and is generally at least as fast at runtime as calling pow(x, 2).

So, now we have written a function for f, but how do we use it? Let's go back to our main:

int main () {
	double a;
	
	cout << "Please input a number: ";
	cin >> a;
	
	cout << a << "^2 = " << f(a) << endl;
	
	return 0;
}

Please input a number: 5
5^2 = 25

This is called a function call. A function call is exactly like you would expect -- the name of the function, followed by the inputs you are passing to it in parentheses. When execution reaches this point, the program calls that function with its input x set to the value of a; f then executes its body until its return statement is reached. When f returns, control returns to main, where the function call evaluates to the return value of f (in this case, 25). cout sees the value 25, and thus prints 25 to the screen.

Now, let's do a little more with functions, something more akin to what you might do in a program:

string guessMyNumber (int a, int b) {
	string msg;
	int sum = a + b;
	
	if (sum >= 52 || sum <= 32)
		msg = "That is very far from my number.";
	else
		msg = "That is close to my number!";
		
	return msg;
}

int main () {
	int num1, num2;
	string result;
	
	cout << "Input two integers to be summed: ";
	cin >> num1 >> num2;
	
	result = guessMyNumber(num1, num2);
	
	cout << result << endl;
	return 0;
}

Input two integers to be summed: 648 8964
That is very far from my number.

Note that we now have multiple input parameters -- we separate these by inserting a comma between each parameter. When you call a function with multiple parameters, order matters -- so, in this example, num1 is passed to guessMyNumber as a, and num2 is passed as b. If we instead wrote guessMyNumber(num2, num1), then a would have the same value as num2, and b would have the same value as num1. If we only called guessMyNumber with one input (say, guessMyNumber(num1)), then the compiler would error out when you try to compile, as guessMyNumber expects two inputs. The same thing would happen if you called it with three or more inputs.

Just like we do in main, we can use any language constructs we have learned so far -- variables, if-else statements, loops -- in other functions as well. In guessMyNumber, we return a string that indicates whether the user input two numbers that sum close to 42.

Note that, even though guessMyNumber is defined before main, main still executes first (recall that main is the entry point for your program). The program asks the user for two integers, then passes them to guessMyNumber, which sums the integers, uses the result to choose a message, and stores that message in a string named msg, which is then returned to main. main then stores the message in a variable named result, which is subsequently printed to the screen.

All of the functions we have seen so far have a return statement as the last statement in the function body. However, this isn't a requirement -- a return can appear anywhere in the function body, so long as the following requirements are met:

  • A return statement is always executed
  • All return statements in your code return variables or values that are appropriate for the return type.

As such, in order to meet these requirements (namely the first), if you have a return statement in the middle of your function body, you are going to need multiple return statements in your source. This pattern is often used to detect invalid inputs and quit the function early and gracefully before the function does anything with those inputs. Let's see an example:

/* Converts a lowercase letter to uppercase.
   The standard <cctype> header provides a similar function, toupper */
char toUpper (char letter) {
	//If the letter is not lowercase, then just return the input
	if (!(letter >= 'a' && letter <= 'z')) return letter;
	
	/* ASCII -- how computers represent letters -- gives lowercase letters a
	   higher value than uppercase letters, so we need to subtract the difference
	   between the two. You can see the mapping here:
	   https://www.cs.cmu.edu/~pattis/15-1XX/common/handouts/ascii.html
	   Internally, C++ treats all characters just as regular numbers
	   (mapped according to the table above), so we can use math to figure out
	   the right difference. */
	return letter - ('a' - 'A');
}

Now, note that if letter is not a lowercase letter, this function will simply return the input letter, but more importantly, it will not execute the second return statement at the end of the function -- control will return to the calling function, which will resume execution.

Prototyping

In the guessMyNumber example above, you may have noticed that guessMyNumber was defined before main. Why is that?

C++ compilers only know about things that appear in the source before the line they are currently processing. If you defined guessMyNumber after main without doing anything else, the program would not compile -- when the compiler reaches the call in main, it has not yet seen guessMyNumber, so as far as it knows, the function doesn't exist.

That said, we usually want to make main the first actual code in our file, since this is where execution begins, and we don't want to go digging for main in a file full of functions. How do we fix this problem?

The answer: you prototype your functions. Prototyping basically tells the compiler that a function with a particular signature does exist, but it does not actually define the function (that is done later). Let's rewrite the above example, but with a function prototype:

//This is the prototype
string guessMyNumber (int a, int b);

int main () {
	int num1, num2;
	string result;
	
	cout << "Input two integers to be summed: ";
	cin >> num1 >> num2;
	
	result = guessMyNumber(num1, num2);
	
	cout << result << endl;
	return 0;
}

//This is the definition
string guessMyNumber (int a, int b) {
	string msg;
	int sum = a + b;
	
	if (sum >= 52 || sum <= 32)
		msg = "That is very far from my number.";
	else
		msg = "That is close to my number!";
		
	return msg;
}

Now, you'll notice that the prototype and the first line of the function definition are almost identical -- they both have the same format, listing the function's return type, name, and parameters, except the prototype ends with a semicolon (instead of an opening curly brace). This has allowed us to move the definition of guessMyNumber after main.

You can prototype and define any number of functions in the same file; it is recommended that you use the following format when writing source code with functions:

//any #includes here
using namespace std; //if necessary

//All of your function prototypes go here. Do not prototype main.

int main () {
	//code...
	
	return 0;
}

//All of the prototyped functions are defined here.

Why not prototype main? The entire purpose of prototyping was to make main the first function that is defined in the source file. You could prototype main, but it's pointless -- main is defined before anything else, so there is no dependency issue here.

Activity

Write the following programs with one or more function(s). Be sure to prototype any functions other than main that you write.

  • Ask the user for a double input x, and output the result of the following piecewise function:
    • If x < 0, f(x) = 0
    • If x >= 0 && x < 5, f(x) = 2x
    • If x >= 5 && x < 10, f(x) = -x^2 + 10x - 15
    • If x >= 10, f(x) = 0
  • Pythagorean theorem: Ask the user for two doubles, and output the length of the hypotenuse of the right triangle formed by using those inputs as leg lengths.
    • The Pythagorean theorem states that a right triangle with leg lengths a, b and hypotenuse length c satisfies the equation a^2 + b^2 = c^2.
    • To compute a square root, put #include <cmath> at the top of your source file and use the sqrt() function.

void functions

So, we have seen that we can write functions to abstract code and return some value that the code is designed to produce. But what if we just want to abstract away code for the sake of having the abstraction, not caring about the return value at all? Enter the void function.

void printName (string name) {
	cout << "Hello, " << name << '!' << endl;
}

In the above example, you can see two things that are different from before:

  • The return type is void.
  • There is no return statement.

The change in the return type should come as no surprise, but what about the return statement? Well, because we don't need to return anything, we can omit it; the compiler assumes that the function will stop executing when it reaches the end of the function body. In a sense, void functions still "return" to their caller (the function that called them), but they do not evaluate to any particular value after being called.

Because void functions don't return anything, none of the following calls to printName would work:

string myName = "John Doe";
int a = printName(myName);	//assigning nothing into an int!
cout << printName(myName) << endl;	//printing out nothing!

But what if we wanted to have an early return? Like we saw with toUpper, we might have some input (or combination of inputs) that is "invalid" and prevents the void function from executing properly, so we will want to stop its execution early. As it turns out, just because the function is void does not mean that it can't have a return statement; rather, the return statement can't have a return value. So, you might have a function that looks like this:

/* **NOTE**: Ideally, this function should return the average, after which the
             caller would print it, but for the sake of example we are printing
             the average here */
//Take a sum and a count and print the average
void printAvg (double sum, int n) {
	//Can't divide by 0! So just exit
	if (n == 0) return;
	
	cout << "Average: " << (sum/n) << endl;
}

Activity

Ask the user for an int, and output the following triangle:

#
##
###
####
...
######

where the triangle's height and the length of the last line are both the number that the user input. Be sure to write a void function that prints this triangle for a given integer input.

Passing values

So, we have been able to create and call functions using inputs, but how exactly are those inputs passed to the function? Or, we could reframe it as such: what happens if we were to change the parameters in a function? Would we see the change in the caller?

Let's find out:

#include <iostream>
using namespace std;

void doSomething (int a);

int main () {
	int x = 5;
	
	doSomething(x);
	
	cout << x << endl;
	
	return 0;
}

void doSomething (int a) {
	a = 10;
}

5

So, changing the value of the input (a) inside of doSomething had no effect on the value of x in main. This is because function parameters are passed by value by default: the computer copies the value of the input parameter to a new variable in the new function, which then operates on that copy instead of the original. So, in the above example, you could change a any number of times in any number of ways, but x in main will still be 5 when doSomething returns.

But what if we wanted to change x from doSomething? To do this, we tell the compiler to pass by reference; or, instead of copying the value of the variable into a new variable, simply make a new name for the original variable and use that in the function. Let's change our example so it passes by reference:

#include <iostream>
using namespace std;

//Notice the change both here and in the function definition
void doSomething (int &a);

int main () {
	int x = 5;
	
	doSomething(x);
	
	cout << x << endl;
	
	return 0;
}

void doSomething (int &a) {
	a = 10;
}

10

The change is subtle: we added a & before the variable name in the parameter list in both the function prototype and the function definition. Note that the reference does not need to have the same name as the original variable.

Now, we see that making the change to a in doSomething affects the variable x in main. This is called a side effect.

constness

Let's take a brief break from functions and talk about variables (keep passing by reference fresh in your mind, though, we will need it shortly). Up until now, your variable declarations have been pretty simple and straightforward:

int x;
string myStr;

The above code snippet creates two variables that you can assign a value to, read from, and reassign as much as you would like. However, there may be a time when you don't want your variables to change -- you still want them to hold a value, but that value should never change. As it turns out, there are a lot of modifiers that you can tack on to your variable declarations that modify the behavior of those variables. One such modifier, const, tells the compiler that the value of the variable should never change once set:

#include <iostream>
using namespace std;

int main () {
	const int MY_NUM = 42;
	int i;
	
	cout << "Please input a number: ";
	cin >> i;
	
	cout << MY_NUM + i << endl;
	
	return 0;
}

Please input a number: 7
49

So far, nothing out of the ordinary. Let's try modifying MY_NUM:

#include <iostream>
using namespace std;

int main () {
	const int MY_NUM = 42;
	int i;
	
	cout << "Please input a number: ";
	cin >> i;
	
	++MY_NUM;	//attempting to modify MY_NUM
	
	cout << MY_NUM + i << endl;
	
	return 0;
}

const.cpp:11:4: error: increment of read-only variable `MY_NUM'

As you can see, the compiler now errors out when we try to modify MY_NUM.

Why would we want to keep ourselves from modifying a variable? Well, the general answer to this question is that there are lots of cases where we do not want to modify a variable -- most especially when dealing with functions. It is very easy to modify a variable -- even if by accident. Adding const to a variable makes the compiler enforce this restriction for us, to ensure that no modification takes place.

When would this be the case? Let's see.

string is hiding a secret

For the sake of argument, let's make our own print function. It will just use cout internally, but it will take a single string as an input. I give you the following two implementations of the print function:

#include <iostream>
using namespace std;

void printValue (string str);
void printRef (string &str);

int main () {
	string myStr;
	
	cout << "Input a string: ";
	cin >> myStr;
	
	/* We comment one call out so only one print function is called, but
	   we can quickly switch between the two when needed */
	for (int i = 0; i < (1 << 20); ++i) {	//1 << 20 == 2^20
		printValue(myStr);
		//printRef(myStr);
	}
	
	return 0;
}

void printValue (string str) {
	cout << str << endl;
}

void printRef (string &str) {
	cout << str << endl;
}

Ostensibly, both functions do the same thing: they print the string that is passed to the function, followed by a newline, and then return. The only difference is that printValue takes str by value, and printRef takes str by reference. Note how the for loop calls the chosen function 2^20 (1,048,576) times.

I have generated a massive input for myStr -- 32768 (2^15) characters. We will perform the following experiment: run this program as shown above with the massive input and record how long it took; then comment out the call to printValue and uncomment the call to printRef, so that we now call printRef instead of printValue; then run the program on the exact same input and record the time. What do you think will happen?

printValue took 2.154 seconds; printRef took 0.511 seconds

Why such a big difference? Well, the problem lies with how string is implemented under the hood. A string is implemented as a container around a sequence of characters (we will discuss this sequence in more depth when we discuss arrays). While a single char is quite small (in fact, it is tied for the smallest data type in C++, the other being bool), a very long sequence of them can be quite large (in this specific case, about 33 kilobytes). Because we passed by value and not by reference, we were copying those 33 kilobytes to the new function -- 2^20 times. However, when we passed by reference, we weren't making any copies of the string -- instead, we just made new names for the same variable; so, the computer had to do much less work here.

We wouldn't notice this behavior with any of the other data types we have seen thus far -- the numeric data types all have a fixed size, and the largest of them (double) is typically still only 8 bytes, which can be copied by a computer relatively quickly. strings, on the other hand, grow with the size of their contents, and tend to be larger to begin with, so the performance hit for copying a string is much greater.

But remember -- references allow us to modify the original variable. But printRef isn't supposed to do this -- all it should do is print the value, not modify it. This is where const comes into play. We can use const in combination with the reference to tell the compiler that we want to pass by reference, but not modify the variable:

void printRef (const string &str) {
	cout << str << endl;
}

Now, if we tried to make any modifications to str in the code, the compiler would tell us. Now we have the benefits of passing by reference with the guarantee that str will not be modified.

Generally speaking, you use this pattern when dealing with large data types (we will see more of these when we discuss abstract data types), but not with small data types. This is because the computer still has to copy something under the hood to support the reference (typically 4 bytes on a 32-bit system and 8 bytes on a 64-bit system), so simply copying small data such as an int or a double has the same (or better) performance. However, if you do need a function to modify the original input variables, then you should pass by (non-const) reference.