πŸ—οΈData Structures & Algorithms

Big O

Big O allows us to measure speed and efficiency. Different data structures are optimized for different scenarios. To measure this, we use time complexity (Big O notation).

Here are four common Big O notations.

  • O(1) Constant time: This is the fastest an algorithm can operate. It means the operation takes the same amount of time no matter how much data there is.

  • O(n) Linear time: As the amount of data increases, the time it takes grows at the same rate. Basically, a foreach loop.

  • O(nΒ²) Quadratic time: This happens when every single item in a dataset needs to interact with every other item. This is very slow. For example, if you have 100 items in a list, there are 10,000 operations (100 Γ— 100).

  • O(log n) Logarithmic time: This is the second fastest of these. The goal is to reduce the amount of work needed by cutting the problem size down at each step. Instead of checking every item, the algorithm repeatedly divides the data into smaller chunks. For example, you can look at the middle of a sorted list, instantly ignore half of the remaining data, and repeat. As the data grows, the time increases very slowly.
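To make the O(log n) idea concrete, here is a minimal binary search sketch in Python (the function name and sample list are just illustrative):

```python
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # look at the middle element
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1          # ignore the left half
        else:
            hi = mid - 1          # ignore the right half
    return -1

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
```

Each pass through the loop halves the remaining search range, which is exactly why doubling the data adds only one more step.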

The Array

An array is a collection of items of the same variable type that are stored at contiguous memory locations. It is one of the most popular and simple data structures used in programming. Because each element has a specific index, retrieving a value is extremely fast and you can go directly to its position without searching.

array = [ 1, 2, 3 ]
print(array[1]) # will print the value "2"

Accessing or replacing any element of an array by index is an O(1) operation: there is no need to search for the element. If you want to add an element beyond the array's size, it's an O(n) operation because you need to copy each element into a new, larger array one by one. The more elements in the array, the more time it takes. If you remove an element that is not at the end of the array, it's an O(n) operation because you need to shift down all subsequent elements to maintain the array's contiguous structure. But if you remove the last element of an array, it's an O(1) operation because there is no shifting.
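These costs can be sketched with a Python list, which behaves like a dynamic array (the values are just illustrative):

```python
array = [10, 20, 30, 40]

array[2] = 99      # O(1): jump straight to index 2, no searching
array.append(50)   # amortized O(1): add at the end (occasional resize/copy)
array.pop()        # O(1): remove the last element, nothing shifts
array.pop(0)       # O(n): every remaining element shifts down one slot

print(array)       # [20, 99, 40]
```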

Linked List

A linked list is a fundamental data structure in computer science. In a linked list, each element is stored in a separate node, and each node contains a value and a pointer to the next node. This makes the list more flexible than an array: it mainly allows efficient insertion and deletion operations. However, accessing a specific element is slower because, unlike with an array, you have to traverse the list from the beginning to find it; in terms of complexity, that's an O(n) operation. If you want to add or remove a node and you already have a reference to the node around the insertion or removal point, it's an O(1) operation. If you have to search for it first, it's an O(n) operation because you have to traverse the list. Here's a simple example in C.

struct Node {
  // The stored data; it can be any type
  int data;
  
  // Pointer to the next node
  struct Node* next;
};
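The same idea can be sketched in Python, showing why insertion is O(1) when you already hold a reference to the neighboring node, while finding a value is O(n) (the class and variable names are just illustrative):

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None           # pointer to the next node

# Build the list 1 -> 2 -> 3 by hand
head = Node(1)
head.next = Node(2)
head.next.next = Node(3)

# O(1) insertion: we already have a reference to the node before the spot
new_node = Node(99)
new_node.next = head.next
head.next = new_node               # list is now 1 -> 99 -> 2 -> 3

# O(n) traversal: to find a value we must walk from the head
values = []
node = head
while node is not None:
    values.append(node.data)
    node = node.next
print(values)  # [1, 99, 2, 3]
```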

There are several types of linked lists.

  • Singly Linked List

  • Doubly Linked List

  • Circular Linked List

Coming soon...

The Stack

The stack is a simple linear data structure that follows a particular order in which the operations are performed. The order may be LIFO (Last In First Out) or FILO (First In Last Out). LIFO implies that the element that is inserted last comes out first, and FILO implies that the element that is inserted first comes out last. Pushing or popping an element is an O(1) operation because you just add or remove the element at the top of the stack. Searching through a stack is an O(n) operation because you cannot access an element directly.

mov rax, 10     ; load value 10
push rax        ; push 10 onto the stack

mov rax, 20     ; load value 20
push rax        ; push 20 onto the stack

pop rbx         ; rbx gets 20 (last pushed)
pop rcx         ; rcx gets 10 (first pushed)
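The same LIFO behavior can be sketched in Python, using a list where `append` pushes onto the top and `pop` removes from the top:

```python
stack = []
stack.append(10)      # push 10
stack.append(20)      # push 20

top = stack.pop()     # 20: the last element pushed comes out first
second = stack.pop()  # 10: the first element pushed comes out last
print(top, second)    # 20 10
```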

Queue

Unlike the stack, the queue follows the principle of First In, First Out (FIFO): the first element added to the queue is the first one to be removed. The queue is stored in an ordered manner but, unlike an array, elements can only be added at the back and removed from the front. For the Big O complexity of a queue, just like with a stack, you cannot access elements directly using an index. If you want to search for an element for one reason or another, you have to go through the whole queue to find it, so it's an O(n) operation. Accessing the first element is an O(1) operation. Insertion and deletion are O(1) operations because you just add at the back and remove from the front.

queue = [ 1, 2, 3 ]
queue.append(4)   # enqueue: add 4 at the back
queue.pop(0)      # dequeue: remove 1 from the front
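One caveat: on a plain Python list, `pop(0)` actually shifts every remaining element and is O(n). For a queue with true O(1) operations at both ends, the standard library's `collections.deque` is the usual choice:

```python
from collections import deque

queue = deque([1, 2, 3])
queue.append(4)          # O(1): enqueue at the back
first = queue.popleft()  # O(1): dequeue from the front, no shifting
print(first)             # 1
print(list(queue))       # [2, 3, 4]
```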

Heap

A heap is a complete binary tree data structure that satisfies the heap property: in a min heap, every node's value is less than or equal to its children's values (in a max heap, the order is reversed). Heaps are usually used to implement priority queues, where the smallest (or largest) element is always at the root of the tree. Accessing the root element is an O(1) operation, while finding any other element within the heap is O(n). Inserting or removing an element is an O(log n) operation: the element is first added to the heap, and then, because of the heap property of the parent-child relationship, it bubbles up or down to its right position. There are two types of heaps.

Min Heap

The Min Heap, where every parent is smaller than its children.

Max Heap

The Max Heap, where every parent is larger than its children.
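Python's standard library ships a min heap in the `heapq` module; a short sketch of the operations described above (the sample values are just illustrative):

```python
import heapq

heap = [5, 1, 8, 3]
heapq.heapify(heap)        # O(n): rearrange the list into a valid min heap
heapq.heappush(heap, 0)    # O(log n): the new element bubbles up

print(heap[0])             # 0: the smallest element is always at the root
smallest = heapq.heappop(heap)  # O(log n): remove the root, re-heapify
print(smallest)            # 0
```

For max-heap behavior with `heapq`, a common trick is to push negated values and negate them again when popping.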

Hashmap

A hashmap is a data structure that stores data in key-value pairs, enabling efficient retrieval, insertion, and deletion operations. It uses a hash function to convert a key into an index within an underlying array, which determines where the value is stored and allows average-case constant time complexity, O(1), for these operations. A hash collision happens when two different keys produce the same index in the hashmap. Since both values cannot occupy the same slot, the hashmap uses a collision-resolution strategy to store them correctly. The two most common methods are chaining, where multiple items are stored in a small list at the same index, and open addressing, where the hashmap searches for the next available space in the array. These techniques keep operations efficient even when collisions occur. Still, collisions can produce an O(n) operation in the worst case, when everything lands in the same list and you have to search through it. The same logic applies to searching, inserting, and deleting an element.
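A toy sketch of chaining in Python (the bucket count, function names, and keys are just illustrative; real hashmaps also resize as they fill up):

```python
# Toy hashmap with chaining: each bucket is a small list of (key, value) pairs
BUCKETS = 8
table = [[] for _ in range(BUCKETS)]

def bucket_index(key):
    return hash(key) % BUCKETS        # the hash function maps a key to a slot

def put(key, value):
    bucket = table[bucket_index(key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:                  # key already present: overwrite
            bucket[i] = (key, value)
            return
    bucket.append((key, value))       # empty slot or collision: chain it

def get(key):
    for k, v in table[bucket_index(key)]:  # scan only the one bucket
        if k == key:
            return v
    return None

put("apple", 1)
put("banana", 2)
print(get("apple"))   # 1
```

With a good hash function the buckets stay short, so `put` and `get` scan only a few pairs on average; the O(n) worst case is every key hashing into one bucket.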

Set

Coming soon...
