You have a million users. Need to find one instantly by username. You have a navigation system. Need to find shortest route through thousands of intersections. You have an undo feature. Need to track every action, go back through them, restore any previous state.
Simple arrays and lists work for small datasets. But they become inefficient for complex operations:
Abstract Data Structures (ADS) solve these problems. They organize data in specific ways optimized for specific operations. Need fast insertion/deletion? Linked lists. Need fast searching? Binary search trees or hash maps. Need unique values only? Sets.
Understanding these structures changes everything. You start to see which structure fits which problem. You choose the right tool for each task. You build efficient systems that scale.
This isn't just theoretical computer science. These structures power every major application—from databases to compilers to operating systems to search engines.
Let's explore how to build efficient data organizations.
Definition: A linked list is a linear data structure where elements (nodes) are stored non-contiguously in memory, with each node containing data and a reference (link) to the next node.
Problem with arrays: Arrays store elements in contiguous memory. To insert in middle:
Time: O(n)
Linked lists solve this:
Definition: A singly linked list is a linked list where each node contains data and a reference to the next node, allowing traversal in only one direction.
class Node: def __init__(self, data): self.data = data self.next = None # Reference to next node
class LinkedList: def __init__(self): self.head = None # First node def is_empty(self): return self.head is None def insert_at_beginning(self, data): new_node = Node(data) new_node.next = self.head self.head = new_node def insert_at_end(self, data): new_node = Node(data) if self.is_empty(): self.head = new_node return # Traverse to end current = self.head while current.next is not None: current = current.next current.next = new_node def delete(self, data): if self.is_empty(): return # Deleting head if self.head.data == data: self.head = self.head.next return # Find node to delete current = self.head while current.next is not None: if current.next.data == data: current.next = current.next.next return current = current.next def search(self, data): current = self.head while current is not None: if current.data == data: return True current = current.next return False def display(self): elements = [] current = self.head while current is not None: elements.append(str(current.data)) current = current.next print(" -> ".join(elements))
ll = LinkedList() ll.insert_at_end(10) ll.insert_at_end(20) ll.insert_at_end(30) ll.display() # 10 -> 20 -> 30 ll.insert_at_beginning(5) ll.display() # 5 -> 10 -> 20 -> 30 ll.delete(20) ll.display() # 5 -> 10 -> 30 print(ll.search(10)) # True print(ll.search(99)) # False
| Operation | Time Complexity |
|---|---|
| Insert at beginning | O(1) |
| Insert at end | O(n) |
| Delete | O(n) |
| Search | O(n) |
| Access by index | O(n) |
Definition: A doubly linked list is a linked list where each node contains data and references to both the next and previous nodes, allowing bidirectional traversal.
class DoublyNode: def __init__(self, data): self.data = data self.next = None self.prev = None # Reference to previous node
class DoublyLinkedList: def __init__(self): self.head = None self.tail = None # Reference to last node def insert_at_beginning(self, data): new_node = DoublyNode(data) if self.head is None: self.head = self.tail = new_node return new_node.next = self.head self.head.prev = new_node self.head = new_node def insert_at_end(self, data): new_node = DoublyNode(data) if self.tail is None: self.head = self.tail = new_node return new_node.prev = self.tail self.tail.next = new_node self.tail = new_node def delete(self, data): current = self.head while current is not None: if current.data == data: # Update previous node if current.prev: current.prev.next = current.next else: self.head = current.next # Update next node if current.next: current.next.prev = current.prev else: self.tail = current.prev return current = current.next def display_forward(self): elements = [] current = self.head while current is not None: elements.append(str(current.data)) current = current.next print(" <-> ".join(elements)) def display_backward(self): elements = [] current = self.tail while current is not None: elements.append(str(current.data)) current = current.prev print(" <-> ".join(elements))
dll = DoublyLinkedList() dll.insert_at_end(10) dll.insert_at_end(20) dll.insert_at_end(30) dll.display_forward() # 10 <-> 20 <-> 30 dll.display_backward() # 30 <-> 20 <-> 10
Definition: A circular linked list is a linked list where the last node points back to the first node instead of None, forming a circle. Can be singly or doubly circular.
class CircularLinkedList: def __init__(self): self.head = None def insert_at_end(self, data): new_node = Node(data) if self.head is None: self.head = new_node new_node.next = self.head # Points to itself return current = self.head while current.next != self.head: current = current.next current.next = new_node new_node.next = self.head # Circle back def display(self): if self.head is None: return elements = [] current = self.head while True: elements.append(str(current.data)) current = current.next if current == self.head: break print(" -> ".join(elements) + " -> (back to head)")
Definition: A binary search tree is a tree data structure where each node has at most two children, with all values in the left subtree less than the node's value and all values in the right subtree greater than the node's value.
class TreeNode: def __init__(self, data): self.data = data self.left = None # Left child self.right = None # Right child class BinarySearchTree: def __init__(self): self.root = None def insert(self, data): if self.root is None: self.root = TreeNode(data) else: self._insert_recursive(self.root, data) def _insert_recursive(self, node, data): if data < node.data: if node.left is None: node.left = TreeNode(data) else: self._insert_recursive(node.left, data) else: if node.right is None: node.right = TreeNode(data) else: self._insert_recursive(node.right, data) def search(self, data): return self._search_recursive(self.root, data) def _search_recursive(self, node, data): if node is None: return False if data == node.data: return True elif data < node.data: return self._search_recursive(node.left, data) else: return self._search_recursive(node.right, data) def inorder_traversal(self, node, result=[]): if node: self.inorder_traversal(node.left, result) result.append(node.data) self.inorder_traversal(node.right, result) return result
Insert: 50, 30, 70, 20, 40, 60, 80
bst = BinarySearchTree() values = [50, 30, 70, 20, 40, 60, 80] for val in values: bst.insert(val) print(bst.search(40)) # True print(bst.search(99)) # False sorted_values = bst.inorder_traversal(bst.root, []) print(sorted_values) # [20, 30, 40, 50, 60, 70, 80]
Produces sorted sequence
Good for copying tree
Good for deleting tree
Inorder: 1, 2, 3, 4, 5, 6, 7 (sorted)
Preorder: 4, 2, 1, 3, 6, 5, 7
Postorder: 1, 3, 2, 5, 7, 6, 4
| Operation | Average | Worst Case |
|---|---|---|
| Search | O(log n) | O(n) |
| Insert | O(log n) | O(n) |
| Delete | O(log n) | O(n) |
When tree becomes unbalanced (like a linked list). Insert in order: 1, 2, 3, 4, 5
Solution: Self-balancing trees (AVL, Red-Black trees) maintain O(log n)
Definition: A hash map (hash table, dictionary) is a data structure that maps keys to values using a hash function to compute an index into an array of buckets.
Hash function: Converts key to array index
Example:
Key: "Alice"
Hash function: sum of ASCII values % array_size
"Alice" → (65+108+105+99+101) % 10 = 478 % 10 = 8
Store at index 8
class HashMap: def __init__(self, size=10): self.size = size self.table = [[] for _ in range(size)] # Array of lists def _hash(self, key): return hash(key) % self.size def put(self, key, value): index = self._hash(key) # Check if key exists, update value for i, (k, v) in enumerate(self.table[index]): if k == key: self.table[index][i] = (key, value) return # Key doesn't exist, add new self.table[index].append((key, value)) def get(self, key): index = self._hash(key) for k, v in self.table[index]: if k == key: return v raise KeyError(f"Key '{key}' not found") def remove(self, key): index = self._hash(key) for i, (k, v) in enumerate(self.table[index]): if k == key: del self.table[index][i] return raise KeyError(f"Key '{key}' not found") def contains(self, key): index = self._hash(key) return any(k == key for k, v in self.table[index])
phonebook = HashMap() phonebook.put("Alice", "555-1234") phonebook.put("Bob", "555-5678") phonebook.put("Carol", "555-9012") print(phonebook.get("Alice")) # 555-1234 print(phonebook.get("Bob")) # 555-5678 phonebook.put("Alice", "555-0000") # Update print(phonebook.get("Alice")) # 555-0000 phonebook.remove("Bob") print(phonebook.contains("Bob")) # False
Collision: Two keys hash to same index
Each bucket is a list
Index 0: []
Index 1: [("Alice","555-1234"), ("David","555-3333")]
Index 2: [("Bob","555-5678")]
Find next available slot
| Case | Insert | Search | Delete |
|---|---|---|---|
| Average | O(1) | O(1) | O(1) |
| Worst (all collisions) | O(n) | O(n) | O(n) |
In practice: With good hash function and load factor management, operations are O(1).
Python's built-in dictionaries are hash maps:
# Create dictionary student = { "name": "Alice", "id": "S12345", "major": "CS", "gpa": 3.8 } # Access print(student["name"]) # Alice # Modify student["gpa"] = 3.9 # Add student["year"] = 3 # Check existence if "major" in student: print(student["major"]) # Remove del student["year"] # Iterate for key, value in student.items(): print(f"{key}: {value}")
Definition: A set is a data structure that stores unique elements with no particular order, typically implemented using hash tables for fast membership testing.
class Set: def __init__(self): self.elements = {} # Use hash map def add(self, element): self.elements[element] = True def remove(self, element): if element in self.elements: del self.elements[element] def contains(self, element): return element in self.elements def size(self): return len(self.elements) def union(self, other_set): result = Set() for element in self.elements: result.add(element) for element in other_set.elements: result.add(element) return result def intersection(self, other_set): result = Set() for element in self.elements: if other_set.contains(element): result.add(element) return result def difference(self, other_set): result = Set() for element in self.elements: if not other_set.contains(element): result.add(element) return result
set_a = Set() set_a.add(1) set_a.add(2) set_a.add(3) set_b = Set() set_b.add(2) set_b.add(3) set_b.add(4) union = set_a.union(set_b) # {1, 2, 3, 4} intersection = set_a.intersection(set_b) # {2, 3} difference = set_a.difference(set_b) # {1}
# Create set fruits = {"apple", "banana", "orange"} # Add fruits.add("grape") # Remove fruits.remove("banana") # Check membership if "apple" in fruits: print("Has apples") # Set operations set1 = {1, 2, 3, 4} set2 = {3, 4, 5, 6} print(set1 | set2) # Union: {1, 2, 3, 4, 5, 6} print(set1 & set2) # Intersection: {3, 4} print(set1 - set2) # Difference: {1, 2} print(set1 ^ set2) # Symmetric difference: {1, 2, 5, 6}
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4] unique = list(set(numbers)) # [1, 2, 3, 4]
valid_users = {"alice", "bob", "carol"}
if username in valid_users:
grant_access()
course1_students = {"alice", "bob", "carol"}
course2_students = {"bob", "david", "alice"}
both_courses = course1_students & course2_students # {"alice", "bob"}
| Structure | Search | Insert | Delete | Notes |
|---|---|---|---|---|
| Array | O(n) | O(n) | O(n) | O(1) if at end |
| Linked List | O(n) | O(1)* | O(1)* | *With reference |
| BST (balanced) | O(log n) | O(log n) | O(log n) | Must stay balanced |
| Hash Map | O(1)† | O(1)† | O(1)† | †Average case |
| Set | O(1)† | O(1)† | O(1)† | †Average case |
You started wondering how to build efficient data organizations. Now you understand.
Linked lists organize data with references instead of contiguous memory— singly for one-direction traversal, doubly for bidirectional, circular for loops.
Binary search trees maintain sorted hierarchies where left children are smaller and right children are larger, enabling O(log n) operations when balanced.
Hash maps use hash functions to map keys to array indices, providing O(1) average-case operations through clever indexing and collision resolution.
Sets store unique elements using hash tables, supporting fast membership testing and mathematical set operations like union and intersection.
Each structure optimizes for specific operations. Choose based on your needs—fast searching, efficient insertion, ordered traversal, unique values.
Every major system uses these structures. Databases use B-trees (similar to BST). Compilers use hash maps for symbol tables. Operating systems use linked lists for process scheduling. Search engines use sophisticated tree structures.
Understanding abstract data structures changes how you think about efficiency. You're no longer limited to basic arrays. You can choose the right tool for each problem, building systems that scale.