A Programmer’s approach of looking at Array vs. Linked List
In general, the array is considered a data structure for which size is fixed at the compile time, and array memory is allocated either from the Data section (e.g. global array) or Stack section (e.g. local array).
Similarly, a linked list is considered a data structure for which size is not fixed and memory is allocated from the Heap section (e.g. using malloc(), etc.) as and when needed. In this sense, the array is taken as a static data structure (residing in Data or Stack section) while the linked list is taken as a dynamic data structure (residing in the Heap section). Memory representation of the array and the linked list can be visualized as follows:
An array of 4 elements (integer type) have been initialized with 1, 2, 3, and 4. Suppose, these elements are allocated at memory addresses 0x100, 0x104, 0x108 and 0x10C respectively.
[(1)] [(2)] [(3)] [(4)] 0x100 0x104 0x108 0x10C
A linked list with 4 nodes where each node has an integer as data and these data are initialized with 1, 2, 3, and 4. Suppose, these nodes are allocated via malloc() and memory allocated for them is 0x200, 0x308, 0x404 and 0x20B respectively.
[(1), 0x308] [(2),0x404] [(3),0x20B] [(4),NULL] 0x200 0x308 0x404 0x20B
Anyone with even little understanding of array and linked-list might not be interested in the above explanation. I mean, it is well known that the array elements are allocated memory in sequence i.e. contiguous memory while nodes of a linked list are non-contiguous in memory. Though it sounds trivial yet this is the most important difference between an array and a linked list. It should be noted that due to this contiguous versus non-contiguous memory, array and linked list are different. This difference is what makes array vs. linked list! In the following sections, we will try to explore this very idea further.
Since elements of an array are contiguous in memory, we can access any element randomly using an index e.g. intArr will directly access the fourth element of the array. (For newbies, array indexing starts from 0 and that’s why the fourth element is indexed with 3). Also, due to contiguous memory for successive elements in the array, no extra information is needed to be stored in individual elements i.e. no overhead of metadata in arrays. Contrary to this, linked list nodes are non-contiguous in memory. It means that we need some mechanism to traverse or access linked list nodes. To achieve this, each node stores the location of the next node and this forms the basis of the link from one node to the next node. Therefore, it’s called a Linked list. Though storing the location of the next node is overhead in the linked list but it’s required. Typically, we see the linked list node declaration as follows:
So array elements are contiguous in memory and therefore do not require any metadata. And linked list nodes are non-contiguous in memory thereby requiring metadata in the form of the location of the next node. Apart from this difference, we can see that the array could have several unused elements because memory has already been allocated. But the linked list will have only the required no. of data items. All the above information about the array and the linked list has been mentioned in several textbooks though in different ways.
What if we need to allocate array memory from the Heap section (i.e. at run time) and linked list memory from Data/Stack section. First of all, is it possible? Before that, one might ask why would someone need to do this? Now, I hope that the remaining article would make you rethink the idea of array vs. linked-list 🙂
Now consider the case when we need to store certain data in an array (because the array has the property of random access due to contiguous memory) but we don’t know the total size apriori. One possibility is to allocate memory of this array from Heap at run time. For example, as follows:
/*At run-time, suppose we know the required size for an integer array (e.g. input size from the user). Say, the array size is stored in the variable arrSize. Allocate this array from Heap as follows*/
Though the memory of this array is allocated from Heap, the elements can still be accessed via the index mechanism e.g. dynArr[i]. Based on the programming problem, we have combined one benefit of the array (i.e. random access of elements) and one benefit of the linked list (i.e. delaying the memory allocation till run time and allocating memory from Heap). Another advantage of having this type of dynamic array is that this method of allocating array from Heap at run time could reduce code size (of course, it depends on certain other factors e.g. program format, etc.)
Now consider the case when we need to store data in a linked list (because no. of nodes in a linked list would be equal to actual data items stored i.e. no extra space like an array) but we aren’t allowed to get this memory from Heap again and again for each node. This might look hypothetical situation to some folks but it’s not a very uncommon requirement in embedded systems. Basically, in several embedded programs, allocating memory via malloc(), etc. isn’t allowed due to multiple reasons. One obvious reason is performance i.e. allocating memory via malloc() is costly in terms of time complexity because your embedded program is required to be deterministic most of the time. Another reason could be module-specific memory management i.e. each module in the embedded system may manage its memory. In short, if we need to perform our memory management, instead of relying on the system-provided APIs of malloc() and free(), we might choose the linked list which is simulated using an array. I hope that you got some idea of why we might need to simulate the linked list using an array. Now, let us first see how this can be done. Suppose, the type of a node in the linked list (i.e. the underlying array) is declared as follows:
If we initialize this linked list (which is actually an array), it would look as follows in memory:
[(0),-1] [(0),-1] [(0),-1] [(0),-1] [(0),-1] 0x500 0x508 0x510 0x518 0x520
The important thing to notice is that all the nodes of the linked list are contiguous in memory (each one occupying 8 bytes) and the next index of each node is set to -1. This (i.e. -1) is done to denote that each node of the linked list is empty as of now. This linked list is denoted by head index 0.
Now, if this linked list is updated with four elements of data parts 4, 3, 2, and 1 successively, it would look as follows in memory. This linked list can be viewed as 0x500 -> 0x508 -> 0x510 -> 0x518.
[(1),1] [(2),2] [(3),3] [(4),-2] [(0),-1] 0x500 0x508 0x510 0x518 0x520
The important thing to notice is next index of the last node (i.e. fourth node) is set to -2. This (i.e. -2) is done to denote the end of the linked list. Also, the head node of the linked list is index 0. This concept of simulating linked lists using an array would look more interesting if we delete say the second node from the above-linked list. In that case, the linked list will look as follows in memory:
[(1),2] [(0),-1] [(3),3] [(4),-2] [(0),-1] 0x500 0x508 0x510 0x518 0x520
The resultant linked list is 0x500 -> 0x510 -> 0x518. Here, it should be noted that even though we have deleted the second node from our linked list, the memory for this node is still there because the underlying array is still there. But the next index of the first node now points to the third node (for which index is 2).
Hopefully, the above examples would have given some idea that for the simulated linked list, we need to write our API similar to malloc() and free() which would be used to insert and delete a node. Now, this is what’s called own memory management. Let us see how this can be done algorithmically.
There are multiple ways to do so. If we take the simplistic approach of creating a linked list using an array, we can use the following logic. For inserting a node, traverse the underlying array and find a node whose next index is -1. It means that this node is empty. Use this node as a new node. Update the data part in this new node and set the next index of this node to the current head node (i.e. head index) of the linked list. Finally, make the index of this new node as the head index of the linked list. To visualize it, let us take an example. Suppose the linked list is as follows where head Index is 0 i.e. linked list is 0x500 -> 0x508 -> 0x518 -> 0x520
[(1),1] [(2),3] [(0),-1] [(4),4] [(5),-2] 0x500 0x508 0x510 0x518 0x520
After inserting a new node with data 8, the linked list would look as follows with head index as 2.
[(1),1] [(2),3] [(8),0] [(4),4] [(5),-2] 0x500 0x508 0x510 0x518 0x520
So the linked list nodes would be at addresses 0x510 -> 0x500 -> 0x508 -> 0x518 -> 0x520
For deleting a node, we need to set the next index of the node as -1 so that the node is marked as the empty node. But, before doing so, we need to make sure that the next index of the previous node is updated correctly to the index of the next node of this node to be deleted. We can see that we have done our memory management for creating a linked list out of the array memory. But, this is one way of inserting and deleting nodes in this linked list. It can be easily noticed that finding an empty node is not so efficient in terms of time complexity. We’re searching the complete array linearly to find an empty node.
Let us see if we can optimize it further. We can maintain a linked list of empty nodes as well in the same array. In that case, the linked list would be denoted by two indexes – one index would be for the linked list which has the actual data values i.e. nodes that have been inserted so far, and the other indexes for a linked list of empty nodes. By doing so, whenever, we need to insert a new node in the existing linked list, we can quickly find an empty node. Let us take an example:
[(4),2] [(0),3] [(5),5] [(0),-1] [(0),1] [(9),-1] 0x500 0x508 0x510 0x518 0x520 0x528
The above-linked list which is represented using two indexes (0 and 5) has two linked lists: one for actual values and another for empty nodes. The linked list with actual values has nodes at address 0x500 -> 0x510 -> 0x528 while the linked list with empty nodes has nodes at addresses 0x520 -> 0x508 -> 0x518. It can be seen that finding an empty node (i.e. writing our API similar to malloc()) should be relatively faster now because we can quickly find a free node. In real-world embedded programs, a fixed chunk of memory (normally called memory pool) is allocated using malloc() only once by a module. And then the management of this memory pool (which is an array) is done by that module itself using the techniques mentioned earlier. Sometimes, there are multiple memory pools each one having different sizes of a node. Of course, there are several other aspects of our memory management but we’ll leave them here. But it’s worth mentioning that there are several methods by which the insertion (which requires our memory allocation) and deletion (which requires our memory freeing) can be improved further.
If we look carefully, it can be noticed that the Heap section of memory is a big array of bytes that is being managed by the underlying operating system (OS). And OS is providing this memory management service to programmers via malloc(), free(), etc. Aha !!
The important takeaways from this article can be summed as follows:
A) Array means contiguous memory. It can exist in any memory section be it Data or Stack or Heap.
B) Linked List means non-contiguous linked memory. It can exist in any memory section be it Heap or Data or Stack.
C) As a programmer, looking at a data structure from a memory perspective could provide us with better insight into choosing a particular data structure or even designing a new data structure. For example, we might create an array of linked lists, etc.
Please write comments if you find anything incorrect, or if you want to share more information about the topic discussed above