IA Algorithms Enrichment 2017
Activities > Nov 11 > Arrays

Arrays

In all of the algorithm demos so far, we have looked at algorithms that deal with lists of things -- usually numbers, but they can be applied to lists of different data types as well -- however, until now, we have been unable to actually store such a list in our programs -- we have only been able to declare individual variables that store one piece of data. We could simply declare a bunch of variables and give them all different names, but this is a pain to set up and keep track of.

This is where arrays come in. Arrays are special variables that hold a sequence of variables under one variable name. You can make an array of any valid data type.

In this page:

Array concepts

In order to understand how an array works, let's first look at how arrays are stored in the computer's memory (RAM). You can think of computer memory as a long row of boxes, each with its own unique address. You can use that address to pull data out of those boxes.

Suppose we have the following variable declarations:

int a;
double b;
char c;
string d;

In memory, these variables are stored something like this:

Example memory layout

As you can see, each variable is laid out in memory in the order it was declared, starting at an arbitrary address (100 in this case). If we were to declare more variables, they would be added starting at address 104 (in this example), or before address 100 if we declared them before a.

Now, if we wanted to store a list or array of items, ideally, we would just make one variable refer to multiple boxes in memory. And, in fact, this is basically how arrays work under the hood:

Example array memory layout

Above, we have an array variable named a that consists of four "boxes" in memory, at addresses 100, 101, 102, and 103. Each of these boxes stores an int independently of the others.

But how do we refer to each of these ints? Before, we just used the name of the variable to retrieve its value, but since one variable now refers to four distinct values, this will no longer work. So, in addition to using the varaible name, we also have to specify the value's index, or position in the array:

Array index example

Here, the index is the offset from the first element in the array. Note how this makes the index of the first element of the array 0. Be sure to remember this; it is a common mistake for beginners to start indexing an array from 1.

So, the general idea is this: We need to create an array variable with a preset size (in this example, 4), initialize it (if necessary), and then use it by indexing into the array.

Creating and using an array

Let's see how to do this in code:

//declare a 4-element array
int a [4];

//initialize the first two ints
a[0] = 1;
a[1] = 1;

//Do some math with our array
a[2] = a[1] + a[0];
a[3] = a[2] + a[1];

cout << "First four numbers of the Fibonacci sequence:" << endl;
cout << a[0] << endl;
cout << a[1] << endl;
cout << a[2] << endl;
cout << a[3] << endl;

First four numbers of the Fibonacci sequence:
1
1
2
3

This is fairly straightforward -- it stores 1 into the first two elements, and then computes the next two Fibonacci numbers.

One thing to note here is the initial declaration is the size argument -- we told the compiler to make an int array of size 4, although the highest index is 3 for this array -- this is because the array declaration expects the total size of the array, not the highest index. Another thing to note here is that the size argument must be computable at compile time. In other words, we cannot use a variable to set this size -- it must be a literal integer value.

However, what if we wanted to do this with the first twenty numbers of the Fibonacci sequence, instead of the first four? Doing it this way would be very time consuming and it would make the source look bloated. Unlike the size argument in the declaration, however, we can use a variable to index into an array that has already been created. We can use this fact in tandem with a for loop to visit each index in the array, in order:

//declare a 20-element array
int fib [20];

//Initialize the first two elements
a[0] = 1;
a[1] = 1;

/* Iterate and calculate the Fibonacci sequence through 20
   We start at 2 since we have hard-coded values for indices 0 and 1 */
for (int i = 2; i < 20; ++i)
	a[i] = a[i - 1] + a[i - 2];

cout << "Fibonacci sequence through the 20th number:" << endl;

//We use the same tactic as above, but starting from 0 so as to iterate through the whole array
for (int i = 0; i < 20; ++i)
	cout << a[i] << endl;

Fibonacci sequence through the 20th number:
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765

As you can see above, we set the for loop's control variable such that on each iteration, we use it as the index for the array. We can also use addition and subtraction to get nearby elements in the array.

range-for

C++11 introduced another way of looping through an array called the range-for loop. Instead of creating a control variable and indexing into the array by hand, you simply tell the compiler that you want to iterate over every element in the array; the "control variable" in a range-for loop becomes the element in the array at a particular index. Here's how you use it:

//declare a 20-element array
int fib [20];

//Initialize the first two elements
a[0] = 1;
a[1] = 1;

//Note that we cannot change this loop to be a range-for loop
for (int i = 2; i < 20; ++i)
	a[i] = a[i - 1] + a[i - 2];

cout << "Fibonacci sequence through the 20th number:" << endl;

//Here's the range-for loop:
for (int val : a)
	cout << val << endl;

This syntax reads as "for val in a, do ...". Effectively, the compiler will visit every element of a in order and copy that value to val for a single loop iteration.

Note how we couldn't change the first for loop to be a range-for loop, because it starts in the middle of the array, rather than the beginning. When using the range-for syntax, the compiler assumes that you are iterating through the whole array. Another thing you must consider when writing a range-for loop is that val is a copy -- any changes you make to val will not carry over to the corresponding value in the array. If you do need this functionality, you can use something remarkably similar to pass-by-reference:

int nums [20];

for (int &num : nums)
	num = 0;
	
//Now, nums will be initialized with all 0s

The concept is the same here: a range-for loop works much like a function call, so we get a reference to each array element if we want to modify it. Otherwise, if the array is of a small data type, then we simply get each element by value; if the array is of a large data type, then we get a const reference to each element.

Going out of bounds

So far, we have seen well-formed examples of how to use arrays properly. But what if we (intentionally or unintentionally) use them improperly. In other words, what happens when we index out of bounds?

int nums [20];

//NOTE! Index 20 is out of bounds, yet i will take on values in [0, 20]
for (int i = 0; i <= 20; ++i) {
	nums[i] = i;
	cout << i << endl;
}

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
???

Firstly, one curious thing: it compiles! The compiler doesn't notice that we're indexing out of the array's bounds (in fact, we could explicitly use an out-of-bounds index and it would still compile).

But the big takeaway here is that we cannot predictably say what will happen when i reaches a value of 20. What happens here is called undefined behavior, which is fairly descriptive of what happens: basically anything. Your program could crash, your computer could blue screen/panic, or, worst of all, it could even get the right answer. We simply cannot say that any of those things will happen, only that they might.

The important thing to take away here is that the computer won't check that you are in bounds for you; you are going to have to make sure that your code never goes out of bounds. The most common error is not setting up the for loop properly, typically by using the <= operator instead of < in the loop condition. My advice here is to always use the range-for loop syntax whenever possible, since this takes away the difficulty of figuring out the proper loop condition.

Using arrays with functions

So, we see that arrays are themselves variables, but we can't simply pass them to a function like we would any other variable. That is because arrays have a slightly different data type: they are array of int/double/etc. as opposed to a regular int/ double/etc. When we declare an array, we specify the "array-of" type by providing the array length. However, we couldn't do this again in a function call, because the array has already been allocated and been assigned a length!

Because of this, array parameters to a function have a slightly different syntax:

void someFunction (int myArray[], ...)

// ** OR **

void someFunction (int *myArray, ...)

Both of the above function signatures are identical. Note that the first form does NOT have a length argument between the square brackets (or anything, for that matter). As mentioned before, it is infeasible for the compiler to guarantee that an array input to a function will always have a specified length; so all functions that take an array as an argument are required to take arrays of arbitrary length.

The other interesting thing to note here is that, unlike a normal variable, when you pass an array to a function, you are actually getting the original array (through a mechanism very similar to pass-by-reference). Because of this, you need to be careful if you decide to modify the array -- you could end up unintentionally changing the original array. If this is a concern, be sure to add const before the declaration to ensure you aren't modifying the array:

void someFunction (const int *myArray)

Note: the second format introduced here takes advantage of the fact that arrays often convert themselves into pointers to their first element. In truth, the first format is interpreted by the compiler as the second format, so these two are truly identical. This is not a technicality that you need to worry about right now -- either way you choose, you can still use the array like you would in the function in which you declared it.

However, because we can't specify the length of the array as a part of the parameter itself, how do we know how long the array is? Especially considering that our function must be able to handle an array of any possible length, this functionality is useless to us unless we can use the array properly. To avoid this, you should (almost) always include another parameter that specifies the length:

void someFunction (int *myArray, int size) {
//...
}

By doing so, you require whoever is calling your function (it may not be you!) to provide a value for the size of the array, so your function will now always know what the length of the array is. If someone passes a bad value for the size of the array, this is not something you need to worry about; it is common to see libraries (especially the standard library) fall back to undefined behavior if the user provides an invalid input, so it is OK for you to do the same.

There is one caveat, though: range-for loops will not work on the array in this function. This is because the C++ compiler uses the underlying array type to determine how many iterations it has to make and write the loop control logic for you. When you pass an array to a function, it actually gets converted to a pointer (which we will discuss later), which has no intrinsic length value, so the compiler no longer knows how to write the range-for control logic.

Activity

Write a program that computes a modified version of the Fibonacci sequence, which is as follows:

mi = mi-1 + mi-3; m0 = m1 = m2 = 1

In other words, each element in the sequence is the sum of the preceding element and the element 3 positions beforehand. The first 3 numbers in the sequence are all 1.

vector: the resizeable array

The one unfortunate thing about C++ arrays is that they are of constant size: once you've declared them, you cannot change their size. This puts a natural limit on their practicality: there are a lot of problems that you might be presented with that give you a list of arbitrary length, and you need to be able to store these lists without being given the length ahead of time. How would we handle these?

The answer lies in an object provided by the C++ standard library: vector. A vector is essentially a resizeable array. To use a vector, though, we will need to add an include statement at the top of our source file:

#include <vector>

As with iostream, you will need to have the using namespace std; statement after your #include directives. Once you've included the vector header, you can declare a vector variable like so:

#include <vector>
using namespace std;

int main () {
	vector<int> myInts;
	
	//...
	
	return 0;
}

Note how the declaration syntax is quite different from that of a native array: our data type is vector<int>, and there is no initial size. You can read this data type as "vector of int". vectors are much like strings in that they run some initialization steps when they are first created. As declared above, the vector would create a list of length 0. This might seem useless at first, but we will see in a moment that this is actually helpful for building lists of arbitrary length.

You can create a vector with a nonzero initial size, like so:

vector<int> myInts (5);

This will create a vector with an initial size of 5. You can use a vector much like you would a regular array; the indexing syntax is the same:

vector<int> myInts (5);

myInts[0] = 1;
myInts[1] = 1;

for (int i = 2; i < myInts.size(); ++i) {
	myInts[i] = myInts[i - 1] + myInts[i - 2];
}

Notice the myInts.size() in the for loop condition. Because vectors are resizable, they also have a way to determine how many elements they contain at any time; this is what the size() function of vector is for; it will return the current size of the vector. Note that size() is associated with the vector; you will need to write vectorName.size() to get the size of a vector named vectorName. (We will talk more about what this syntax means when we talk about abstract data types).

But the problem here is that if we try to index out of bounds, we still end up with undefined behavior! So, we need another way to tell the vector that it needs to resize:

vector<int> myInts (5);

myInts[0] = 1;
myInts[1] = 1;

for (int i = 2; i < myInts.size(); ++i) {
	myInts[i] = myInts[i - 1] + myInts[i - 2];
}

//Resize the vector so it has 20 elements
myInts.resize(20);

//myInts now has 20 elements
for (int i = 5; i < myInts.size(); ++i) {
	myInts[i] = myInts[i - 1] + myInts[i - 2];
}

So, we have a function called resize() that allows us to change the size of the vector to. Like the size() function, resize() is associated with the vector that it is operating on, so you will need to prefix the function name and a period to the function call in order for it to work.

But this method is still suboptimal -- what if we do not know the ultimate size of the vector? Let's return to the declaration that created an empty vector. There is another function called push_back that increases the size of the vector and adds its argument to the end:

vector<int> myInts;

//myInts.size() == 0

myInts.push_back(1);

//myInts.size() == 1

myInts.push_back(1);

//myInts.size() == 2

for (int i = 2; i < 20; ++i) {
	myInts.push_back(myInts[i - 1] + myInts[i - 2]);
}

//myInts.size() == 20

//You can use range-for with a vector too
for (int val : myInts) {
	cout << val << end;
}

1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765

So, now we are building the vector element-by-element: we add one element to it at a time, increasing its size by one each time we call push_back. In the first for loop, i will always be the current size of the vector, so indices i - 1 and i - 2 will always be valid, so undefined behavior will not occur.

The trick here is that if you use push_back, you should always start out with an empty vector. The reason is that push_back adds its values to the end of the current vector, so if you start with a vector with a nonzero size, you will start adding elements after the end of the initial ending index, thus having uninitialized values at the beginning of your vector. Be sure that if you are using push_back, you always start with a vector of size 0.

Using vectors with functions

Although vectors are conceptually similar to arrays, due to differences in how they are implemented in C++, the syntax you have to use in certain situations is different. The best example of this is passing a vector in a function call. Unlike arrays, to pass a vector, you can use essentially the same syntax as you would when you declare a vector:

//Function to determine if a vector is sorted
bool isSorted (vector<int> vec) {
	for (int i = 0; i < (vec.size() - 1); ++i) {
		if (!(vec[i] < vec[i + 1])) return false;
	}
	
	return true;
}

Unlike an array, vectors actually passed by value (copied into new vector variables), so passing them as above can actually be rather slow. Like a string, your code will be much faster if you pass by const-reference:

//Function to determine if a vector is sorted
bool isSorted (const vector<int> &vec) {
	for (int i = 0; i < (vec.size() - 1); ++i) {
		if (!(vec[i] < vec[i + 1])) return false;
	}
	
	return true;
}

As before, if you need to modify a vector, then remove the const and simply pass it by reference.

The added benefit of using vectors with functions is that the range-for loop will always work, even if the vector was passed into the function:

int vectorSum (const vector<int> &vec) {
	int sum = 0;
	
	for (int el : vec) sum += el;
	
	return sum;
}

Activity

Given a arbitrary-length list of positive integers terminated by a negative integer, find the mode. If there is more than one mode, just output one of the modes. Do not consider the terminal negative integer in the calculation.