Arrays
In all of the algorithm demos so far, we have looked at algorithms that deal with lists of things -- usually numbers, but they can be applied to lists of different data types as well -- however, until now, we have been unable to actually store such a list in our programs -- we have only been able to declare individual variables that store one piece of data. We could simply declare a bunch of variables and give them all different names, but this is a pain to set up and keep track of.
This is where arrays come in. Arrays are special variables that hold a sequence of variables under one variable name. You can make an array of any valid data type.
In this page:
Array concepts
In order to understand how an array works, let's first look at how arrays are stored in the computer's memory (RAM). You can think of computer memory as a long row of boxes, each with its own unique address. You can use that address to pull data out of those boxes.
Suppose we have the following variable declarations:
int a; double b; char c; string d;
In memory, these variables are stored something like this:
As you can see, each variable is laid out in memory in the order it was declared, starting at an arbitrary address (100
in this case).
If we were to declare more variables, they would be added starting at address 104
(in this example), or before address 100
if we declared them before a
.
Now, if we wanted to store a list or array of items, ideally, we would just make one variable refer to multiple boxes in memory. And, in fact, this is basically how arrays work under the hood:
Above, we have an array variable named a
that consists of four "boxes" in memory, at addresses 100
, 101
,
102
, and 103
. Each of these boxes stores an int
independently of the others.
But how do we refer to each of these int
s? Before, we just used the name of the variable to retrieve its value, but since one variable now refers
to four distinct values, this will no longer work. So, in addition to using the varaible name, we also have to specify the value's index,
or position in the array:
Here, the index is the offset from the first element in the array.
Note how this makes the index of the first element of the array 0
.
Be sure to remember this; it is a common mistake for beginners to start indexing an array from 1
.
So, the general idea is this: We need to create an array variable with a preset size (in this example, 4
),
initialize it (if necessary), and then use it by indexing into the array.
Creating and using an array
Let's see how to do this in code:
//declare a 4-element array int a [4]; //initialize the first two ints a[0] = 1; a[1] = 1; //Do some math with our array a[2] = a[1] + a[0]; a[3] = a[2] + a[1]; cout << "First four numbers of the Fibonacci sequence:" << endl; cout << a[0] << endl; cout << a[1] << endl; cout << a[2] << endl; cout << a[3] << endl;
First four numbers of the Fibonacci sequence:
1
1
2
3
This is fairly straightforward -- it stores 1
into the first two elements, and then computes the next two Fibonacci numbers.
One thing to note here is the initial declaration is the size argument -- we told the compiler to make an int
array of size 4,
although the highest index is 3
for this array -- this is because the array declaration expects the total size of the array, not
the highest index. Another thing to note here is that the size argument must be computable at compile time.
In other words, we cannot use a variable to set this size -- it must be a literal integer value.
However, what if we wanted to do this with the first twenty numbers of the Fibonacci sequence,
instead of the first four? Doing it this way would be very time consuming and it would make the source look bloated.
Unlike the size argument in the declaration, however, we can use a variable to index into an array
that has already been created. We can use this fact in tandem with a for
loop to visit each index in the array, in order:
//declare a 20-element array int fib [20]; //Initialize the first two elements a[0] = 1; a[1] = 1; /* Iterate and calculate the Fibonacci sequence through 20 We start at 2 since we have hard-coded values for indices 0 and 1 */ for (int i = 2; i < 20; ++i) a[i] = a[i - 1] + a[i - 2]; cout << "Fibonacci sequence through the 20th number:" << endl; //We use the same tactic as above, but starting from 0 so as to iterate through the whole array for (int i = 0; i < 20; ++i) cout << a[i] << endl;
Fibonacci sequence through the 20th number:
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
As you can see above, we set the for
loop's control variable such that on each iteration,
we use it as the index for the array. We can also use addition and subtraction to get nearby elements
in the array.
range-for
C++11 introduced another way of looping through an array called the range-for
loop.
Instead of creating a control variable and indexing into the array by hand, you simply tell the compiler that you want
to iterate over every element in the array; the "control variable" in a range-for loop becomes the element in the array
at a particular index. Here's how you use it:
//declare a 20-element array int fib [20]; //Initialize the first two elements a[0] = 1; a[1] = 1; //Note that we cannot change this loop to be a range-for loop for (int i = 2; i < 20; ++i) a[i] = a[i - 1] + a[i - 2]; cout << "Fibonacci sequence through the 20th number:" << endl; //Here's the range-for loop: for (int val : a) cout << val << endl;
This syntax reads as "for val in a, do ...".
Effectively, the compiler will visit every element of a
in order
and copy that value to val
for a single loop iteration.
Note how we couldn't change the first for
loop to be a range-for
loop, because it starts
in the middle of the array, rather than the beginning. When using the range-for
syntax, the compiler
assumes that you are iterating through the whole array. Another thing you must consider when writing a range-for
loop is that val
is a copy -- any changes you make to val
will not
carry over to the corresponding value in the array. If you do need this functionality, you can use something remarkably similar to
pass-by-reference:
int nums [20]; for (int &num : nums) num = 0; //Now, nums will be initialized with all 0s
The concept is the same here: a range-for
loop works much like a function call, so we get a reference to each array element if we want to modify it.
Otherwise, if the array is of a small data type, then we simply get each element by value; if the array is of a large data type, then we get a const
reference to each element.
Going out of bounds
So far, we have seen well-formed examples of how to use arrays properly. But what if we (intentionally or unintentionally) use them improperly. In other words, what happens when we index out of bounds?
int nums [20]; //NOTE! Index 20 is out of bounds, yet i will take on values in [0, 20] for (int i = 0; i <= 20; ++i) { nums[i] = i; cout << i << endl; }
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
???
Firstly, one curious thing: it compiles! The compiler doesn't notice that we're indexing out of the array's bounds (in fact, we could explicitly use an out-of-bounds index and it would still compile).
But the big takeaway here is that we cannot predictably say what will happen when i
reaches a value of 20
.
What happens here is called undefined behavior, which is fairly descriptive of what happens: basically anything.
Your program could crash, your computer could blue screen/panic, or, worst of all, it could even get the right answer.
We simply cannot say that any of those things will happen, only that they might.
The important thing to take away here is that the computer won't check that you are in bounds for you; you are going to have to make sure that your code
never goes out of bounds. The most common error is not setting up the for
loop properly, typically by using the <=
operator
instead of <
in the loop condition. My advice here is to always use the range-for
loop syntax whenever possible,
since this takes away the difficulty of figuring out the proper loop condition.
Using arrays with functions
So, we see that arrays are themselves variables, but we can't simply pass them to a function like we would any other variable. That is because arrays have a slightly
different data type: they are array of int
/double
/etc. as opposed to a regular int
/
double
/etc. When we declare an array, we specify the "array-of" type by providing the array length. However, we couldn't do this again in a function call,
because the array has already been allocated and been assigned a length!
Because of this, array parameters to a function have a slightly different syntax:
void someFunction (int myArray[], ...) // ** OR ** void someFunction (int *myArray, ...)
Both of the above function signatures are identical. Note that the first form does NOT have a length argument between the square brackets (or anything, for that matter). As mentioned before, it is infeasible for the compiler to guarantee that an array input to a function will always have a specified length; so all functions that take an array as an argument are required to take arrays of arbitrary length.
The other interesting thing to note here is that, unlike a normal variable, when you pass an array to a function, you are actually getting the original array
(through a mechanism very similar to pass-by-reference). Because of this, you need to be careful if you decide to modify the array -- you could end up unintentionally changing
the original array. If this is a concern, be sure to add const
before the declaration to ensure you aren't modifying the array:
void someFunction (const int *myArray)
Note: the second format introduced here takes advantage of the fact that arrays often convert themselves into pointers to their first element. In truth, the first format is interpreted by the compiler as the second format, so these two are truly identical. This is not a technicality that you need to worry about right now -- either way you choose, you can still use the array like you would in the function in which you declared it.
However, because we can't specify the length of the array as a part of the parameter itself, how do we know how long the array is? Especially considering that our function must be able to handle an array of any possible length, this functionality is useless to us unless we can use the array properly. To avoid this, you should (almost) always include another parameter that specifies the length:
void someFunction (int *myArray, int size) { //... }
By doing so, you require whoever is calling your function (it may not be you!) to provide a value for the size of the array, so your function will now always know what the length of the array is. If someone passes a bad value for the size of the array, this is not something you need to worry about; it is common to see libraries (especially the standard library) fall back to undefined behavior if the user provides an invalid input, so it is OK for you to do the same.
There is one caveat, though: range-for
loops will not work on the array in this function. This is because the C++ compiler
uses the underlying array type to determine how many iterations it has to make and write the loop control logic for you. When you pass an array to a function,
it actually gets converted to a pointer (which we will discuss later), which has no intrinsic length value, so the compiler no
longer knows how to write the range-for
control logic.
Activity
Write a program that computes a modified version of the Fibonacci sequence, which is as follows:
mi = mi-1 + mi-3; m0 = m1 = m2 = 1
In other words, each element in the sequence is the sum of the preceding element and the element 3 positions beforehand. The first 3 numbers in the sequence are all 1.
vector
: the resizeable array
The one unfortunate thing about C++ arrays is that they are of constant size: once you've declared them, you cannot change their size. This puts a natural limit on their practicality: there are a lot of problems that you might be presented with that give you a list of arbitrary length, and you need to be able to store these lists without being given the length ahead of time. How would we handle these?
The answer lies in an object provided by the C++ standard library: vector
. A vector
is essentially a resizeable array.
To use a vector, though, we will need to add an include
statement at the top of our source file:
#include <vector>
As with iostream
, you will need to have the using namespace std;
statement after your #include
directives.
Once you've included the vector
header, you can declare a vector
variable like so:
#include <vector> using namespace std; int main () { vector<int> myInts; //... return 0; }
Note how the declaration syntax is quite different from that of a native array: our data type is vector<int>
, and there is no initial size.
You can read this data type as "vector
of int
". vector
s are much like string
s in that they run some
initialization steps when they are first created. As declared above, the vector
would create a list of length 0
.
This might seem useless at first, but we will see in a moment that this is actually helpful for building lists of arbitrary length.
You can create a vector
with a nonzero initial size, like so:
vector<int> myInts (5);
This will create a vector
with an initial size of 5. You can use a vector
much like you would a regular array; the indexing syntax is the same:
vector<int> myInts (5); myInts[0] = 1; myInts[1] = 1; for (int i = 2; i < myInts.size(); ++i) { myInts[i] = myInts[i - 1] + myInts[i - 2]; }
Notice the myInts.size()
in the for
loop condition. Because vector
s are resizable, they also have a way to determine how many
elements they contain at any time; this is what the size()
function of vector
is for; it will return the current size of the vector
.
Note that size()
is associated with the vector; you will need to write vectorName.size()
to get the size of a vector
named vectorName
.
(We will talk more about what this syntax means when we talk about abstract data types).
But the problem here is that if we try to index out of bounds, we still end up with undefined behavior! So, we need another way to tell the vector
that it needs to resize:
vector<int> myInts (5); myInts[0] = 1; myInts[1] = 1; for (int i = 2; i < myInts.size(); ++i) { myInts[i] = myInts[i - 1] + myInts[i - 2]; } //Resize the vector so it has 20 elements myInts.resize(20); //myInts now has 20 elements for (int i = 5; i < myInts.size(); ++i) { myInts[i] = myInts[i - 1] + myInts[i - 2]; }
So, we have a function called resize()
that allows us to change the size of the vector
to.
Like the size()
function, resize()
is associated with the vector
that it is operating on,
so you will need to prefix the function name and a period to the function call in order for it to work.
But this method is still suboptimal -- what if we do not know the ultimate size of the vector
? Let's return to the declaration that created an empty vector
.
There is another function called push_back
that increases the size of the vector
and adds its argument to the end:
vector<int> myInts; //myInts.size() == 0 myInts.push_back(1); //myInts.size() == 1 myInts.push_back(1); //myInts.size() == 2 for (int i = 2; i < 20; ++i) { myInts.push_back(myInts[i - 1] + myInts[i - 2]); } //myInts.size() == 20 //You can use range-for with a vector too for (int val : myInts) { cout << val << end; }
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
So, now we are building the vector element-by-element: we add one element to it at a time, increasing its size by one each time we call push_back
.
In the first for
loop, i
will always be the current size of the vector
, so indices i - 1
and i - 2
will always be valid, so undefined behavior will not occur.
The trick here is that if you use push_back
, you should always start out with an empty vector
. The reason is that push_back
adds its values to the end of the current vector
, so if you start with a vector
with a nonzero size, you will start adding elements
after the end of the initial ending index, thus having uninitialized values at the beginning of your vector
. Be sure that if you are using push_back
,
you always start with a vector
of size 0.
Using vector
s with functions
Although vector
s are conceptually similar to arrays, due to differences in how they are implemented in C++, the syntax you have to use in certain situations is
different. The best example of this is passing a vector
in a function call. Unlike arrays, to pass a vector
,
you can use essentially the same syntax as you would when you declare a vector
:
//Function to determine if a vector is sorted bool isSorted (vector<int> vec) { for (int i = 0; i < (vec.size() - 1); ++i) { if (!(vec[i] < vec[i + 1])) return false; } return true; }
Unlike an array, vectors
actually passed by value (copied into new vector
variables), so passing them as above can actually be rather slow.
Like a string
, your code will be much faster if you pass by const
-reference:
//Function to determine if a vector is sorted bool isSorted (const vector<int> &vec) { for (int i = 0; i < (vec.size() - 1); ++i) { if (!(vec[i] < vec[i + 1])) return false; } return true; }
As before, if you need to modify a vector
, then remove the const
and simply pass it by reference.
The added benefit of using vector
s with functions is that the range-for
loop will always work, even if the vector was
passed into the function:
int vectorSum (const vector<int> &vec) { int sum = 0; for (int el : vec) sum += el; return sum; }
Activity
Given a arbitrary-length list of positive integers terminated by a negative integer, find the mode. If there is more than one mode, just output one of the modes. Do not consider the terminal negative integer in the calculation.