MindMentor – Computer Science – Databases – Schema, Relational Models, ERD, Normalization, SQL & Data Warehousing

Where Does All Your Data Actually Live?

You log into Netflix. It remembers what you watched. Suggests new shows based on your preferences. Picks up exactly where you left off.

You check your bank account. Every transaction is there. Going back years. Perfectly accurate. Never lost.

You search for a product on Amazon. Millions of items. Thousands match your query. Results appear in milliseconds.

How?

Data. Massive amounts of data. Organized. Searchable. Reliable.

But it's not just stored randomly in files somewhere. It's in databases—carefully designed systems that store, organize, and retrieve information efficiently.

Understanding databases changes everything. You start to see why some apps are fast and others slow. Why good database design matters. Why companies pay database administrators well. Why data integrity is crucial.

This isn't just theory for computer scientists. This is the foundation of every modern application—from social media to banking to healthcare to e-commerce.

Let's explore how data is really organized.

Database Schema

What Is a Schema?

Definition: A database schema is the structure that defines how data is organized in a database, including tables, fields, relationships, constraints, and data types.

Think of a schema as a blueprint. Before you build a house, you need blueprints showing where rooms go, how they connect, what materials to use.

Before you build a database, you need a schema showing what data you'll store, how it's organized, how pieces relate to each other.

Components of a Schema

Tables: Store related data in rows and columns.
Fields (Columns): Define what type of information each column holds.
Records (Rows): Individual entries in the table.
Data types: What kind of data each field holds (numbers, text, dates).
Constraints: Rules about what data is allowed.
Relationships: How tables connect to each other.

Example Schema: School Database

Students table:

Field	Type	Constraint
StudentID	integer	primary key
FirstName	text
LastName	text
DateOfBirth	date
Email	text

Courses table:

Field	Type	Constraint
CourseID	integer	primary key
CourseName	text
Credits	integer
Department	text

Enrollments table:

Field	Type	Constraint
EnrollmentID	integer	primary key
StudentID	integer	foreign key
CourseID	integer	foreign key
Grade	text
Semester	text

This schema shows three tables and how they relate. Students can enroll in courses. The Enrollments table connects them.

Relational Databases

What Makes a Database "Relational"?

Definition: A relational database is a database that organizes data into tables (relations) with rows and columns, where relationships between tables are established through keys.

Invented by Edgar F. Codd in 1970. Changed everything about data storage.

Tables (Relations)

Each table stores one type of entity. Students table stores students. Courses table stores courses. Not: Students_and_Courses_and_Everything_Else table. This separation is crucial. We'll see why when we discuss normalization.

Primary Keys

Definition: A primary key is a field (or combination of fields) that uniquely identifies each record in a table.

Rules:

Must be unique (no two records have same value)
Cannot be NULL (must have a value)
Should never change

Example: In the Students table, StudentID is the primary key. Every student has a unique StudentID. No two students share an ID. If StudentID = 12345, there's exactly one student with that ID.

Why not use names? Two students might have the same name. Names can change (marriage). IDs don't change.

Foreign Keys

Definition: A foreign key is a field in one table that refers to the primary key of another table, creating a relationship between the tables.

Example: The Enrollments table has StudentID as a foreign key. It references the primary key StudentID in the Students table. This says: "This enrollment belongs to this specific student."

Referential integrity: The database ensures foreign keys always point to valid records. You can't enroll a student with ID 99999 if no such student exists.

Relationships

One-to-One (1:1): One record in Table A relates to exactly one record in Table B. Example: Each employee has one passport. Each passport belongs to one employee.

One-to-Many (1:M): One record in Table A relates to many records in Table B. Example: One teacher teaches many courses. Each course has one teacher.

Many-to-Many (M:N): Many records in Table A relate to many records in Table B. Example: Students enroll in many courses. Courses have many students. Implementation: Requires a junction table (like Enrollments). Can't directly represent M:N in relational databases.

Entity-Relationship Diagrams (ERD)

Visualizing Database Design

Definition: An Entity-Relationship Diagram is a visual representation of entities (tables), their attributes (fields), and the relationships between them.

Before writing code or creating tables, design the database visually.

ERD Components

Entities: Represented as rectangles. These become tables.

Student

Attributes: Represented as ovals connected to entities. These become fields.

Student

StudentID

FirstName

LastName

Relationships: Represented as diamonds connecting entities.

Student

Enrolls

Course

Cardinality: Shows relationship type (1:1, 1:M, M:N).

Crow's foot notation (common):

One: Single line — represents exactly one record
Many: Crow's foot (three lines) — represents multiple records

Example ERD — School System

Teacher

1

Teaches

M

Course

M

Enrolls

M

Student

This shows:

One teacher teaches many courses (1:M)
One course has one teacher (M:1)
Students enroll in many courses (M:N)
Courses have many students (M:N)

Normalization: 1NF, 2NF, 3NF

Why Normalize?

Definition: Normalization is the process of organizing database tables to minimize redundancy and dependency, improving data integrity and reducing anomalies.

Problems with bad design:

Update anomalies: Change data in one place, must change it everywhere.
Insertion anomalies: Can't add data without adding unrelated data.
Deletion anomalies: Delete something, accidentally delete other information.

Example of bad design — Students_and_Courses table:

StudentID	StudentName	Course1	Course2	Course3
1	Alice	Math	Physics	NULL
2	Bob	Math	Biology	Chemistry

Problems:

What if a student takes 4 courses? Need a Course4 column?
Lots of NULL values (wasted space)
Hard to query "all students in Math"
If Alice changes name, update one row. But what about duplicates elsewhere?

First Normal Form (1NF)

Definition: A table is in 1NF if each field contains only atomic (indivisible) values and each record is unique.

Rules:

Each cell contains a single value (no lists, no sets)
Each record is unique (has primary key)
No repeating groups (like Course1, Course2, Course3)

Bad (not 1NF):

StudentID	StudentName	Courses
1	Alice	Math, Physics
2	Bob	Math, Bio, Chem

"Courses" column contains multiple values. Violates 1NF.

Good (1NF):

StudentID	StudentName	Course
1	Alice	Math
1	Alice	Physics
2	Bob	Math
2	Bob	Biology
2	Bob	Chemistry

Each cell has a single value. Each row is a unique combination of student and course. Problem: Still redundant. "Alice" appears twice. If she changes name, update multiple rows.

Second Normal Form (2NF)

Definition: A table is in 2NF if it's in 1NF and all non-key attributes are fully dependent on the entire primary key (no partial dependencies).

Only relevant for composite primary keys (keys made of multiple fields).

Example with composite key — Composite primary key: (StudentID, CourseID)

Bad (1NF but not 2NF):

StudentID	CourseID	StudentName	CourseName	Grade
1	101	Alice	Math	A
1	102	Alice	Physics	B
2	101	Bob	Math	A

Problems:

StudentName depends only on StudentID (not the full key)
CourseName depends only on CourseID (not the full key)
Only Grade depends on both StudentID and CourseID

This is partial dependency—some attributes depend on part of the key, not the whole key.

Good (2NF) — Split into three tables:

Students:

StudentID	StudentName
1	Alice
2	Bob

Courses:

CourseID	CourseName
101	Math
102	Physics

Enrollments:

StudentID	CourseID	Grade
1	101	A
1	102	B
2	101	A

Now all non-key attributes fully depend on the entire primary key. Problem: Still possible redundancy if non-key attributes depend on other non-key attributes.

Third Normal Form (3NF)

Definition: A table is in 3NF if it's in 2NF and no non-key attribute depends on another non-key attribute (no transitive dependencies).

Bad (2NF but not 3NF):

StudentID	StudentName	AdvisorID	AdvisorName
1	Alice	501	Dr. Smith
2	Bob	501	Dr. Smith
3	Carol	502	Dr. Jones

Primary key: StudentID. Problem: AdvisorName depends on AdvisorID (not on StudentID directly). This is transitive dependency: StudentID → AdvisorID → AdvisorName. If Dr. Smith changes name, must update multiple rows.

Good (3NF) — Split into two tables:

Students:

StudentID	StudentName	AdvisorID
1	Alice	501
2	Bob	501
3	Carol	502

Advisors:

AdvisorID	AdvisorName
501	Dr. Smith
502	Dr. Jones

Now if Dr. Smith changes name, update one row in the Advisors table. Students table doesn't change.

Benefits of Normalization:

Eliminates redundancy: Data stored once
Improves integrity: Changes in one place
Easier updates: Modify single record
Saves storage: No duplicate data
Prevents anomalies: Consistent database

When NOT to Normalize: Sometimes denormalization (intentionally breaking normalization rules) improves performance. Example: Data warehouses often denormalize for faster queries. Reading from one table is faster than joining many tables. Trade-off: Storage space and update complexity vs. query speed.

SQL: Structured Query Language

The Language of Databases

Definition: SQL (Structured Query Language) is a standardized programming language used to manage and manipulate relational databases.

SQL lets you: create tables, insert data, update data, delete data, and query data.

SELECT: Retrieving Data

Basic syntax:

SELECT column1, column2
FROM table_name
WHERE condition;

Get all students:

SELECT * FROM Students;

Get specific columns:

SELECT FirstName, LastName FROM Students;

Filter with WHERE:

SELECT * FROM Students
WHERE LastName = 'Smith';

Multiple conditions:

SELECT * FROM Students
WHERE LastName = 'Smith' AND FirstName = 'John';

Pattern matching with LIKE:

SELECT * FROM Students
WHERE Email LIKE '%@gmail.com';

Sorting with ORDER BY:

SELECT * FROM Students
ORDER BY LastName ASC, FirstName ASC;

Limiting results:

SELECT * FROM Students
LIMIT 10;

INSERT: Adding Data

Syntax:

INSERT INTO table_name (column1, column2)
VALUES (value1, value2);

Example:

INSERT INTO Students (StudentID, FirstName, LastName, Email)
VALUES (1, 'Alice', 'Johnson', 'alice@email.com');

Multiple rows:

INSERT INTO Students (StudentID, FirstName, LastName)
VALUES
  (2, 'Bob', 'Smith'),
  (3, 'Carol', 'Williams'),
  (4, 'David', 'Brown');

UPDATE: Modifying Data

Syntax:

UPDATE table_name
SET column1 = value1, column2 = value2
WHERE condition;

Example:

UPDATE Students
SET Email = 'new.email@gmail.com'
WHERE StudentID = 1;

⚠️ WARNING: Always use a WHERE clause! Without it, you update ALL rows:

-- DANGEROUS! Updates every student
UPDATE Students SET Email = 'same@email.com';

DELETE: Removing Data

Syntax:

DELETE FROM table_name
WHERE condition;

Example:

DELETE FROM Students
WHERE StudentID = 1;

⚠️ WARNING: Without WHERE, deletes ALL rows:

-- DANGEROUS! Deletes all students
DELETE FROM Students;

SQL Joins

Definition: A join is an SQL operation that combines rows from two or more tables based on a related column.

Why needed? Data is normalized across multiple tables. Joins bring it back together.

INNER JOIN

Definition: Returns only rows where there's a match in both tables.

SELECT columns
FROM table1
INNER JOIN table2
ON table1.column = table2.column;

Example — Students and their enrollments:

SELECT Students.FirstName, Students.LastName, Courses.CourseName
FROM Students
INNER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
INNER JOIN Courses     ON Enrollments.CourseID = Courses.CourseID;

Result: Only students who are enrolled in courses.

LEFT JOIN (LEFT OUTER JOIN)

Definition: Returns all rows from the left table and matching rows from the right table. If no match, NULL for right table columns.

SELECT Students.FirstName, Courses.CourseName
FROM Students
LEFT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
LEFT JOIN Courses     ON Enrollments.CourseID = Courses.CourseID;

Result: All students. For students with no enrollments, CourseName shows NULL.

RIGHT JOIN (RIGHT OUTER JOIN)

Definition: Returns all rows from the right table and matching rows from the left table. Less commonly used (can rewrite as LEFT JOIN by swapping tables).

FULL OUTER JOIN

Definition: Returns all rows from both tables. If no match, NULL for missing columns.

SELECT Students.FirstName, Courses.CourseName
FROM Students
FULL OUTER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
FULL OUTER JOIN Courses     ON Enrollments.CourseID = Courses.CourseID;

Aggregate Functions

Definition: Aggregate functions perform calculations on multiple rows and return a single value.

COUNT

Counts number of rows.

SELECT COUNT(*) FROM Students;
-- Returns total number of students

SELECT COUNT(*) FROM Students
WHERE LastName = 'Smith';
-- Returns number of students with last name Smith

SUM

Adds up values.

SELECT SUM(Credits) FROM Enrollments
WHERE StudentID = 1;
-- Total credits for student 1

AVG

Calculates average.

SELECT AVG(Grade) FROM Enrollments
WHERE CourseID = 101;
-- Average grade in course 101

MIN and MAX

Find minimum and maximum values.

SELECT MIN(DateOfBirth) FROM Students;
-- Oldest student (earliest birth date)

SELECT MAX(DateOfBirth) FROM Students;
-- Youngest student (latest birth date)

GROUP BY

Groups rows that have the same values.

-- Count students in each course:
SELECT CourseID, COUNT(*) AS NumStudents
FROM Enrollments
GROUP BY CourseID;

-- Average grade per course:
SELECT Courses.CourseName, AVG(Enrollments.Grade) AS AvgGrade
FROM Enrollments
INNER JOIN Courses ON Enrollments.CourseID = Courses.CourseID
GROUP BY Courses.CourseName;

HAVING

Filters groups (WHERE filters rows before grouping; HAVING filters groups after).

-- Courses with more than 10 students:
SELECT CourseID, COUNT(*) AS NumStudents
FROM Enrollments
GROUP BY CourseID
HAVING COUNT(*) > 10;

Views

Definition: A view is a virtual table based on a SQL query that can be queried like a regular table but doesn't store data itself.

Why use views?

Simplify complex queries
Provide security (limit access to certain columns)
Present data differently without changing tables

Creating a View

CREATE VIEW StudentCourses AS
SELECT Students.FirstName, Students.LastName, Courses.CourseName
FROM Students
INNER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
INNER JOIN Courses     ON Enrollments.CourseID = Courses.CourseID;

Using the View

SELECT * FROM StudentCourses
WHERE LastName = 'Smith';

Much simpler than writing the full join query every time!

Transactions

Definition: A transaction is a sequence of database operations treated as a single unit of work that either completes entirely or has no effect at all.

ACID Properties

Atomicity: All operations succeed or all fail. No partial completion.
Consistency: Database remains in valid state before and after transaction.
Isolation: Concurrent transactions don't interfere with each other.
Durability: Once committed, changes are permanent (even if system crashes).

Example: Bank Transfer

Transfer $100 from Account A to Account B.

BEGIN TRANSACTION;

UPDATE Accounts
SET Balance = Balance - 100
WHERE AccountID = 'A';

UPDATE Accounts
SET Balance = Balance + 100
WHERE AccountID = 'B';

COMMIT;

If anything goes wrong (power failure, error in second UPDATE), the entire transaction rolls back. Account A doesn't lose $100 without Account B gaining it.

Commands

BEGIN TRANSACTION — Start transaction
COMMIT — Make changes permanent
ROLLBACK — Undo all changes in transaction

Example with Error Handling

BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 'A';

-- Check if sufficient funds
IF (SELECT Balance FROM Accounts WHERE AccountID = 'A') < 0 THEN
    ROLLBACK;
    -- Transaction cancelled
ELSE
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 'B';
    COMMIT;
    -- Transaction successful
END IF;

Data Warehousing

What Is a Data Warehouse?

Definition: A data warehouse is a large, centralized repository of integrated data from multiple sources, optimized for analysis and reporting rather than transaction processing.

Operational Databases (OLTP)

Online Transaction Processing. Handle day-to-day operations.

Optimized for INSERT, UPDATE, DELETE
Current data
Normalized for data integrity
Example: Bank's transaction system

Data Warehouses (OLAP)

Online Analytical Processing. Handle analysis and reporting.

Optimized for complex queries (SELECT)
Historical data (often years)
Denormalized for query performance
Example: Bank's business intelligence system

Data Warehouse Architecture

Data sources: Various operational databases, external sources
ETL process (Extract, Transform, Load):
- Extract: Pull data from sources
- Transform: Clean, standardize, aggregate data
- Load: Insert into data warehouse
Data warehouse: Central repository
Data marts: Subsets of warehouse for specific departments (sales, finance, marketing)
BI tools: Business intelligence software for analysis, visualization, reporting

Star Schema

Definition: A star schema is a data warehouse design where a central fact table connects to multiple dimension tables, forming a star-like structure.

Fact table: Contains measurable data (sales, revenue, quantities)
Dimension tables: Contain descriptive attributes (products, customers, time)

Example: Retail Sales Warehouse

Fact table — Sales:

SaleID	ProductID	CustomerID	StoreID	DateID	Quantity	Revenue
… fact rows …

Dimension tables:

ProductID	ProductName	Category	Price
Product dimension

CustomerID	Name	City	Country
Customer dimension

StoreID	StoreName	City	Region
Store dimension

DateID	Date	Month	Quarter	Year
Date dimension

Query example — Total sales by category per month:

SELECT Product.Category, Date.Month, SUM(Sales.Revenue)
FROM Sales
JOIN Product ON Sales.ProductID = Product.ProductID
JOIN Date    ON Sales.DateID    = Date.DateID
GROUP BY Product.Category, Date.Month;

Why Denormalize in Warehouses? Joins are expensive. Complex analytics query many tables. Denormalizing—storing redundant data—makes queries faster. Trade-off: More storage space, harder to update vs. much faster queries. For warehouses: Fast queries matter more. Data changes infrequently (loaded in batches). Storage is cheap.

Putting It All Together

You started wondering where your data actually lives.

Now you understand.

Database schemas define structure—tables, fields, relationships, constraints. The blueprint before you build.

Relational databases organize data into tables connected by keys—primary keys uniquely identify records, foreign keys create relationships.

ERD diagrams visualize database design before implementation, showing entities, attributes, and relationships clearly.

Normalization eliminates redundancy and anomalies—1NF removes repeating groups, 2NF eliminates partial dependencies, 3NF removes transitive dependencies.

SQL is the language—SELECT retrieves data, INSERT adds it, UPDATE modifies it, DELETE removes it.

Joins combine tables—INNER JOIN for matches, LEFT JOIN for all from left table, joins reconstruct normalized data.

Aggregate functions summarize data—COUNT, SUM, AVG, MIN, MAX with GROUP BY and HAVING.

Views simplify complex queries, providing virtual tables without storing duplicate data.

Transactions ensure data integrity with ACID properties—all operations succeed or all fail, maintaining consistency.

Data warehouses support analytics, using denormalized star schemas for fast complex queries on historical data.

Every time you shop online, check your bank account, or search for information, databases work behind the scenes. Organized. Reliable. Fast.

Understanding databases changes how you think about data. You're no longer just a user. You understand the architecture beneath.