Databases

Welcome to MindMentor!

Databases

Computer Science

Where Does All Your Data Actually Live?

You log into Netflix. It remembers what you watched. Suggests new shows based on your preferences. Picks up exactly where you left off.

You check your bank account. Every transaction is there. Going back years. Perfectly accurate. Never lost.

You search for a product on Amazon. Millions of items. Thousands match your query. Results appear in milliseconds.

How?

Data. Massive amounts of data. Organized. Searchable. Reliable.

But it's not just stored randomly in files somewhere. It's in databases—carefully designed systems that store, organize, and retrieve information efficiently.

Understanding databases changes everything. You start to see why some apps are fast and others slow. Why good database design matters. Why companies pay database administrators well. Why data integrity is crucial.

This isn't just theory for computer scientists. This is the foundation of every modern application—from social media to banking to healthcare to e-commerce.

Let's explore how data is really organized.

Database Schema

What Is a Schema?

Definition: A database schema is the structure that defines how data is organized in a database, including tables, fields, relationships, constraints, and data types.

Think of a schema as a blueprint. Before you build a house, you need blueprints showing where rooms go, how they connect, what materials to use.

Before you build a database, you need a schema showing what data you'll store, how it's organized, how pieces relate to each other.

Components of a Schema

  • Tables: Store related data in rows and columns.
  • Fields (Columns): Define what type of information each column holds.
  • Records (Rows): Individual entries in the table.
  • Data types: What kind of data each field holds (numbers, text, dates).
  • Constraints: Rules about what data is allowed.
  • Relationships: How tables connect to each other.

Example Schema: School Database

Students table:

Field Type Constraint
StudentID integer primary key
FirstName text
LastName text
DateOfBirth date
Email text

Courses table:

Field Type Constraint
CourseID integer primary key
CourseName text
Credits integer
Department text

Enrollments table:

Field Type Constraint
EnrollmentID integer primary key
StudentID integer foreign key
CourseID integer foreign key
Grade text
Semester text

This schema shows three tables and how they relate. Students can enroll in courses. The Enrollments table connects them.

Relational Databases

What Makes a Database "Relational"?

Definition: A relational database is a database that organizes data into tables (relations) with rows and columns, where relationships between tables are established through keys.

Invented by Edgar F. Codd in 1970. Changed everything about data storage.

Tables (Relations)

Each table stores one type of entity. Students table stores students. Courses table stores courses. Not: Students_and_Courses_and_Everything_Else table. This separation is crucial. We'll see why when we discuss normalization.

Primary Keys

Definition: A primary key is a field (or combination of fields) that uniquely identifies each record in a table.

Rules:

  • Must be unique (no two records have same value)
  • Cannot be NULL (must have a value)
  • Should never change

Example: In the Students table, StudentID is the primary key. Every student has a unique StudentID. No two students share an ID. If StudentID = 12345, there's exactly one student with that ID.

Why not use names? Two students might have the same name. Names can change (marriage). IDs don't change.

Foreign Keys

Definition: A foreign key is a field in one table that refers to the primary key of another table, creating a relationship between the tables.

Example: The Enrollments table has StudentID as a foreign key. It references the primary key StudentID in the Students table. This says: "This enrollment belongs to this specific student."

Referential integrity: The database ensures foreign keys always point to valid records. You can't enroll a student with ID 99999 if no such student exists.

Relationships

One-to-One (1:1): One record in Table A relates to exactly one record in Table B. Example: Each employee has one passport. Each passport belongs to one employee.

One-to-Many (1:M): One record in Table A relates to many records in Table B. Example: One teacher teaches many courses. Each course has one teacher.

Many-to-Many (M:N): Many records in Table A relate to many records in Table B. Example: Students enroll in many courses. Courses have many students. Implementation: Requires a junction table (like Enrollments). Can't directly represent M:N in relational databases.

Entity-Relationship Diagrams (ERD)

Visualizing Database Design

Definition: An Entity-Relationship Diagram is a visual representation of entities (tables), their attributes (fields), and the relationships between them.

Before writing code or creating tables, design the database visually.

ERD Components

Entities: Represented as rectangles. These become tables.

Student

Attributes: Represented as ovals connected to entities. These become fields.

Student
StudentID
FirstName
LastName

Relationships: Represented as diamonds connecting entities.

Student
Enrolls
Course

Cardinality: Shows relationship type (1:1, 1:M, M:N).

Crow's foot notation (common):

  • One: Single line — represents exactly one record
  • Many: Crow's foot (three lines) — represents multiple records

Example ERD — School System

Teacher
1
Teaches
M
Course
M
Enrolls
M
Student

This shows:

  • One teacher teaches many courses (1:M)
  • One course has one teacher (M:1)
  • Students enroll in many courses (M:N)
  • Courses have many students (M:N)

Normalization: 1NF, 2NF, 3NF

Why Normalize?

Definition: Normalization is the process of organizing database tables to minimize redundancy and dependency, improving data integrity and reducing anomalies.

Problems with bad design:

  • Update anomalies: Change data in one place, must change it everywhere.
  • Insertion anomalies: Can't add data without adding unrelated data.
  • Deletion anomalies: Delete something, accidentally delete other information.

Example of bad design — Students_and_Courses table:

StudentID StudentName Course1 Course2 Course3
1 Alice Math Physics NULL
2 Bob Math Biology Chemistry

Problems:

  • What if a student takes 4 courses? Need a Course4 column?
  • Lots of NULL values (wasted space)
  • Hard to query "all students in Math"
  • If Alice changes name, update one row. But what about duplicates elsewhere?

First Normal Form (1NF)

Definition: A table is in 1NF if each field contains only atomic (indivisible) values and each record is unique.

Rules:

  • Each cell contains a single value (no lists, no sets)
  • Each record is unique (has primary key)
  • No repeating groups (like Course1, Course2, Course3)

Bad (not 1NF):

StudentID StudentName Courses
1 Alice Math, Physics
2 Bob Math, Bio, Chem

"Courses" column contains multiple values. Violates 1NF.

Good (1NF):

StudentID StudentName Course
1 Alice Math
1 Alice Physics
2 Bob Math
2 Bob Biology
2 Bob Chemistry

Each cell has a single value. Each row is a unique combination of student and course. Problem: Still redundant. "Alice" appears twice. If she changes name, update multiple rows.

Second Normal Form (2NF)

Definition: A table is in 2NF if it's in 1NF and all non-key attributes are fully dependent on the entire primary key (no partial dependencies).

Only relevant for composite primary keys (keys made of multiple fields).

Example with composite key — Composite primary key: (StudentID, CourseID)

Bad (1NF but not 2NF):

StudentID CourseID StudentName CourseName Grade
1 101 Alice Math A
1 102 Alice Physics B
2 101 Bob Math A

Problems:

  • StudentName depends only on StudentID (not the full key)
  • CourseName depends only on CourseID (not the full key)
  • Only Grade depends on both StudentID and CourseID

This is partial dependency—some attributes depend on part of the key, not the whole key.

Good (2NF) — Split into three tables:

Students:

StudentID StudentName
1 Alice
2 Bob

Courses:

CourseID CourseName
101 Math
102 Physics

Enrollments:

StudentID CourseID Grade
1 101 A
1 102 B
2 101 A

Now all non-key attributes fully depend on the entire primary key. Problem: Still possible redundancy if non-key attributes depend on other non-key attributes.

Third Normal Form (3NF)

Definition: A table is in 3NF if it's in 2NF and no non-key attribute depends on another non-key attribute (no transitive dependencies).

Bad (2NF but not 3NF):

StudentID StudentName AdvisorID AdvisorName
1 Alice 501 Dr. Smith
2 Bob 501 Dr. Smith
3 Carol 502 Dr. Jones

Primary key: StudentID. Problem: AdvisorName depends on AdvisorID (not on StudentID directly). This is transitive dependency: StudentID → AdvisorID → AdvisorName. If Dr. Smith changes name, must update multiple rows.

Good (3NF) — Split into two tables:

Students:

StudentID StudentName AdvisorID
1 Alice 501
2 Bob 501
3 Carol 502

Advisors:

AdvisorID AdvisorName
501 Dr. Smith
502 Dr. Jones

Now if Dr. Smith changes name, update one row in the Advisors table. Students table doesn't change.

Benefits of Normalization:

  • Eliminates redundancy: Data stored once
  • Improves integrity: Changes in one place
  • Easier updates: Modify single record
  • Saves storage: No duplicate data
  • Prevents anomalies: Consistent database

When NOT to Normalize: Sometimes denormalization (intentionally breaking normalization rules) improves performance. Example: Data warehouses often denormalize for faster queries. Reading from one table is faster than joining many tables. Trade-off: Storage space and update complexity vs. query speed.

SQL: Structured Query Language

The Language of Databases

Definition: SQL (Structured Query Language) is a standardized programming language used to manage and manipulate relational databases.

SQL lets you: create tables, insert data, update data, delete data, and query data.

SELECT: Retrieving Data

Basic syntax:

SELECT column1, column2
FROM table_name
WHERE condition;

Get all students:

SELECT * FROM Students;

Get specific columns:

SELECT FirstName, LastName FROM Students;

Filter with WHERE:

SELECT * FROM Students
WHERE LastName = 'Smith';

Multiple conditions:

SELECT * FROM Students
WHERE LastName = 'Smith' AND FirstName = 'John';

Pattern matching with LIKE:

SELECT * FROM Students
WHERE Email LIKE '%@gmail.com';

Sorting with ORDER BY:

SELECT * FROM Students
ORDER BY LastName ASC, FirstName ASC;

Limiting results:

SELECT * FROM Students
LIMIT 10;

INSERT: Adding Data

Syntax:

INSERT INTO table_name (column1, column2)
VALUES (value1, value2);

Example:

INSERT INTO Students (StudentID, FirstName, LastName, Email)
VALUES (1, 'Alice', 'Johnson', 'alice@email.com');

Multiple rows:

INSERT INTO Students (StudentID, FirstName, LastName)
VALUES
  (2, 'Bob', 'Smith'),
  (3, 'Carol', 'Williams'),
  (4, 'David', 'Brown');

UPDATE: Modifying Data

Syntax:

UPDATE table_name
SET column1 = value1, column2 = value2
WHERE condition;

Example:

UPDATE Students
SET Email = 'new.email@gmail.com'
WHERE StudentID = 1;
⚠️ WARNING: Always use a WHERE clause! Without it, you update ALL rows:

-- DANGEROUS! Updates every student
UPDATE Students SET Email = 'same@email.com';

DELETE: Removing Data

Syntax:

DELETE FROM table_name
WHERE condition;

Example:

DELETE FROM Students
WHERE StudentID = 1;
⚠️ WARNING: Without WHERE, deletes ALL rows:

-- DANGEROUS! Deletes all students
DELETE FROM Students;

SQL Joins

Definition: A join is an SQL operation that combines rows from two or more tables based on a related column.

Why needed? Data is normalized across multiple tables. Joins bring it back together.

INNER JOIN

Definition: Returns only rows where there's a match in both tables.

SELECT columns
FROM table1
INNER JOIN table2
ON table1.column = table2.column;

Example — Students and their enrollments:

SELECT Students.FirstName, Students.LastName, Courses.CourseName
FROM Students
INNER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
INNER JOIN Courses     ON Enrollments.CourseID = Courses.CourseID;

Result: Only students who are enrolled in courses.

LEFT JOIN (LEFT OUTER JOIN)

Definition: Returns all rows from the left table and matching rows from the right table. If no match, NULL for right table columns.

SELECT Students.FirstName, Courses.CourseName
FROM Students
LEFT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
LEFT JOIN Courses     ON Enrollments.CourseID = Courses.CourseID;

Result: All students. For students with no enrollments, CourseName shows NULL.

RIGHT JOIN (RIGHT OUTER JOIN)

Definition: Returns all rows from the right table and matching rows from the left table. Less commonly used (can rewrite as LEFT JOIN by swapping tables).

FULL OUTER JOIN

Definition: Returns all rows from both tables. If no match, NULL for missing columns.

SELECT Students.FirstName, Courses.CourseName
FROM Students
FULL OUTER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
FULL OUTER JOIN Courses     ON Enrollments.CourseID = Courses.CourseID;

Aggregate Functions

Definition: Aggregate functions perform calculations on multiple rows and return a single value.

COUNT

Counts number of rows.

SELECT COUNT(*) FROM Students;
-- Returns total number of students

SELECT COUNT(*) FROM Students
WHERE LastName = 'Smith';
-- Returns number of students with last name Smith

SUM

Adds up values.

SELECT SUM(Credits) FROM Enrollments
WHERE StudentID = 1;
-- Total credits for student 1

AVG

Calculates average.

SELECT AVG(Grade) FROM Enrollments
WHERE CourseID = 101;
-- Average grade in course 101

MIN and MAX

Find minimum and maximum values.

SELECT MIN(DateOfBirth) FROM Students;
-- Oldest student (earliest birth date)

SELECT MAX(DateOfBirth) FROM Students;
-- Youngest student (latest birth date)

GROUP BY

Groups rows that have the same values.

-- Count students in each course:
SELECT CourseID, COUNT(*) AS NumStudents
FROM Enrollments
GROUP BY CourseID;

-- Average grade per course:
SELECT Courses.CourseName, AVG(Enrollments.Grade) AS AvgGrade
FROM Enrollments
INNER JOIN Courses ON Enrollments.CourseID = Courses.CourseID
GROUP BY Courses.CourseName;

HAVING

Filters groups (WHERE filters rows before grouping; HAVING filters groups after).

-- Courses with more than 10 students:
SELECT CourseID, COUNT(*) AS NumStudents
FROM Enrollments
GROUP BY CourseID
HAVING COUNT(*) > 10;

Views

Definition: A view is a virtual table based on a SQL query that can be queried like a regular table but doesn't store data itself.

Why use views?

  • Simplify complex queries
  • Provide security (limit access to certain columns)
  • Present data differently without changing tables

Creating a View

CREATE VIEW StudentCourses AS
SELECT Students.FirstName, Students.LastName, Courses.CourseName
FROM Students
INNER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
INNER JOIN Courses     ON Enrollments.CourseID = Courses.CourseID;

Using the View

SELECT * FROM StudentCourses
WHERE LastName = 'Smith';

Much simpler than writing the full join query every time!

Transactions

Definition: A transaction is a sequence of database operations treated as a single unit of work that either completes entirely or has no effect at all.

ACID Properties

  • Atomicity: All operations succeed or all fail. No partial completion.
  • Consistency: Database remains in valid state before and after transaction.
  • Isolation: Concurrent transactions don't interfere with each other.
  • Durability: Once committed, changes are permanent (even if system crashes).

Example: Bank Transfer

Transfer $100 from Account A to Account B.

BEGIN TRANSACTION;

UPDATE Accounts
SET Balance = Balance - 100
WHERE AccountID = 'A';

UPDATE Accounts
SET Balance = Balance + 100
WHERE AccountID = 'B';

COMMIT;

If anything goes wrong (power failure, error in second UPDATE), the entire transaction rolls back. Account A doesn't lose $100 without Account B gaining it.

Commands

  • BEGIN TRANSACTION — Start transaction
  • COMMIT — Make changes permanent
  • ROLLBACK — Undo all changes in transaction

Example with Error Handling

BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 'A';

-- Check if sufficient funds
IF (SELECT Balance FROM Accounts WHERE AccountID = 'A') < 0 THEN
    ROLLBACK;
    -- Transaction cancelled
ELSE
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 'B';
    COMMIT;
    -- Transaction successful
END IF;

Data Warehousing

What Is a Data Warehouse?

Definition: A data warehouse is a large, centralized repository of integrated data from multiple sources, optimized for analysis and reporting rather than transaction processing.

Operational Databases (OLTP)

Online Transaction Processing. Handle day-to-day operations.

  • Optimized for INSERT, UPDATE, DELETE
  • Current data
  • Normalized for data integrity
  • Example: Bank's transaction system

Data Warehouses (OLAP)

Online Analytical Processing. Handle analysis and reporting.

  • Optimized for complex queries (SELECT)
  • Historical data (often years)
  • Denormalized for query performance
  • Example: Bank's business intelligence system

Data Warehouse Architecture

  • Data sources: Various operational databases, external sources
  • ETL process (Extract, Transform, Load):
    • Extract: Pull data from sources
    • Transform: Clean, standardize, aggregate data
    • Load: Insert into data warehouse
  • Data warehouse: Central repository
  • Data marts: Subsets of warehouse for specific departments (sales, finance, marketing)
  • BI tools: Business intelligence software for analysis, visualization, reporting

Star Schema

Definition: A star schema is a data warehouse design where a central fact table connects to multiple dimension tables, forming a star-like structure.

  • Fact table: Contains measurable data (sales, revenue, quantities)
  • Dimension tables: Contain descriptive attributes (products, customers, time)

Example: Retail Sales Warehouse

Fact table — Sales:

SaleID ProductID CustomerID StoreID DateID Quantity Revenue
… fact rows …

Dimension tables:

ProductID ProductName Category Price
Product dimension
CustomerID Name City Country
Customer dimension
StoreID StoreName City Region
Store dimension
DateID Date Month Quarter Year
Date dimension

Query example — Total sales by category per month:

SELECT Product.Category, Date.Month, SUM(Sales.Revenue)
FROM Sales
JOIN Product ON Sales.ProductID = Product.ProductID
JOIN Date    ON Sales.DateID    = Date.DateID
GROUP BY Product.Category, Date.Month;

Why Denormalize in Warehouses? Joins are expensive. Complex analytics query many tables. Denormalizing—storing redundant data—makes queries faster. Trade-off: More storage space, harder to update vs. much faster queries. For warehouses: Fast queries matter more. Data changes infrequently (loaded in batches). Storage is cheap.

Putting It All Together

You started wondering where your data actually lives.

Now you understand.

Database schemas define structure—tables, fields, relationships, constraints. The blueprint before you build.

Relational databases organize data into tables connected by keys—primary keys uniquely identify records, foreign keys create relationships.

ERD diagrams visualize database design before implementation, showing entities, attributes, and relationships clearly.

Normalization eliminates redundancy and anomalies—1NF removes repeating groups, 2NF eliminates partial dependencies, 3NF removes transitive dependencies.

SQL is the language—SELECT retrieves data, INSERT adds it, UPDATE modifies it, DELETE removes it.

Joins combine tables—INNER JOIN for matches, LEFT JOIN for all from left table, joins reconstruct normalized data.

Aggregate functions summarize data—COUNT, SUM, AVG, MIN, MAX with GROUP BY and HAVING.

Views simplify complex queries, providing virtual tables without storing duplicate data.

Transactions ensure data integrity with ACID properties—all operations succeed or all fail, maintaining consistency.

Data warehouses support analytics, using denormalized star schemas for fast complex queries on historical data.

Every time you shop online, check your bank account, or search for information, databases work behind the scenes. Organized. Reliable. Fast.

Understanding databases changes how you think about data. You're no longer just a user. You understand the architecture beneath.