Most Frequently asked sql Interview Questions (2024)
Question: What is SQL?
Answer:
SQL (Structured Query Language) is a standardized programming language used for managing and manipulating relational databases. It is primarily used to interact with databases by querying, updating, inserting, and deleting data. SQL is essential for tasks related to database management and is widely used in various industries for backend development, data analysis, and more.
Key Features of SQL:
- Database Management: SQL is used to create, manage, and administer relational databases.
- Data Manipulation: SQL allows you to query data (retrieving specific information), update existing data, insert new data, and delete data.
- Data Definition: It is used to define the structure of data in a database, including creating tables, defining columns, data types, and establishing relationships between tables.
- Data Integrity and Security: SQL can enforce rules to ensure data integrity (such as constraints) and control access to the database using user roles and permissions.
Common SQL Commands:
-
SELECT: Retrieves data from one or more tables.
SELECT * FROM Employees WHERE Age > 30;
This command retrieves all columns from the “Employees” table where the age is greater than 30.
-
INSERT: Adds new data to a table.
INSERT INTO Employees (Name, Age, Department) VALUES ('John Doe', 28, 'Sales');
-
UPDATE: Modifies existing data in a table.
UPDATE Employees SET Age = 29 WHERE Name = 'John Doe';
-
DELETE: Removes data from a table.
DELETE FROM Employees WHERE Name = 'John Doe';
-
CREATE: Defines new tables, databases, views, indexes, etc.
CREATE TABLE Employees ( ID INT PRIMARY KEY, Name VARCHAR(100), Age INT, Department VARCHAR(50) );
-
ALTER: Modifies the structure of an existing table (e.g., adding/removing columns).
ALTER TABLE Employees ADD Salary DECIMAL(10, 2);
-
DROP: Deletes a table or database.
DROP TABLE Employees;
-
JOIN: Combines rows from two or more tables based on a related column.
SELECT Employees.Name, Departments.DepartmentName FROM Employees JOIN Departments ON Employees.DepartmentID = Departments.ID;
-
WHERE: Filters the records that meet certain conditions.
SELECT * FROM Employees WHERE Age > 30;
-
ORDER BY: Sorts the result set based on one or more columns.
SELECT * FROM Employees ORDER BY Age DESC;
-
GROUP BY: Groups records that have the same values in specified columns, often used with aggregate functions like
COUNT
,SUM
,AVG
, etc.SELECT Department, COUNT(*) AS NumberOfEmployees FROM Employees GROUP BY Department;
Types of SQL:
-
DML (Data Manipulation Language):
SELECT
: Fetch data from tables.INSERT
: Add new data.UPDATE
: Modify existing data.DELETE
: Remove data.
-
DDL (Data Definition Language):
CREATE
: Create new database objects like tables, views, etc.ALTER
: Modify existing database objects.DROP
: Delete database objects.
-
DCL (Data Control Language):
GRANT
: Provide privileges to users.REVOKE
: Remove privileges from users.
-
TCL (Transaction Control Language):
COMMIT
: Save the transaction.ROLLBACK
: Undo the transaction.SAVEPOINT
: Set a savepoint within a transaction.SET TRANSACTION
: Configure the transaction behavior.
SQL and Relational Databases:
SQL is designed for relational databases, which store data in tables (rows and columns). Tables can be related to each other via keys (primary keys, foreign keys), and SQL allows you to define these relationships.
Example of SQL Query:
Suppose you have a Customers
table:
CustomerID | Name | City | Age |
---|---|---|---|
1 | John | New York | 28 |
2 | Alice | Los Angeles | 32 |
3 | Bob | Chicago | 40 |
You can use the following SQL query to retrieve all customers aged 30 or above:
SELECT Name, City FROM Customers WHERE Age >= 30;
Result:
Name | City |
---|---|
Alice | Los Angeles |
Bob | Chicago |
Conclusion:
SQL is a powerful and essential language for managing relational databases. It helps you store, retrieve, update, and manipulate data efficiently. Understanding SQL is fundamental for anyone working with databases, whether for backend development, data analysis, or working with cloud-based database systems.
Question: What is the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN?
Answer:
In SQL, JOIN operations are used to combine rows from two or more tables based on a related column between them. The four most commonly used types of joins are INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN. Here’s a breakdown of each:
1. INNER JOIN:
Definition:
The INNER JOIN
returns only the rows that have matching values in both tables. If there is no match, the row is excluded from the result set.
Behavior:
- Combines rows from both tables where there is a match on the specified condition.
- If there is no match, that row is not included in the result.
Example:
Consider the following tables:
-
Employees table:
ID Name DepartmentID 1 John 101 2 Alice 102 3 Bob 103 -
Departments table:
DepartmentID DepartmentName 101 HR 102 IT 104 Marketing
Query:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name | DepartmentName |
---|---|
John | HR |
Alice | IT |
Explanation: Only rows where a matching DepartmentID
exists in both tables are returned. Bob
does not appear because there is no matching department for DepartmentID
103 in the Departments table.
2. LEFT JOIN (or LEFT OUTER JOIN):
Definition:
The LEFT JOIN
returns all rows from the left table (the first table in the query), and the matched rows from the right table (second table). If there is no match, the result will contain NULL
for columns from the right table.
Behavior:
- All rows from the left table are included in the result.
- If a row in the left table has no corresponding match in the right table, the right table’s columns will be
NULL
.
Example:
Using the same tables:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
LEFT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name | DepartmentName |
---|---|
John | HR |
Alice | IT |
Bob | NULL |
Explanation: All employees are listed, including Bob
who does not have a department, so the DepartmentName
is NULL
.
3. RIGHT JOIN (or RIGHT OUTER JOIN):
Definition:
The RIGHT JOIN
returns all rows from the right table (the second table in the query), and the matched rows from the left table (the first table). If there is no match, the result will contain NULL
for columns from the left table.
Behavior:
- All rows from the right table are included in the result.
- If a row in the right table has no corresponding match in the left table, the left table’s columns will be
NULL
.
Example:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
RIGHT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name | DepartmentName |
---|---|
John | HR |
Alice | IT |
NULL | Marketing |
Explanation: All departments are listed, including Marketing
, which has no employee assigned to it. The Name
for Marketing
is NULL
.
4. FULL JOIN (or FULL OUTER JOIN):
Definition:
The FULL JOIN
(or FULL OUTER JOIN
) returns all rows when there is a match in either the left or the right table. It returns NULL
on the side where there is no match.
Behavior:
- Returns all rows from both tables.
- If there is no match between the tables,
NULL
values are returned for the columns of the table without a match.
Example:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
FULL JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name | DepartmentName |
---|---|
John | HR |
Alice | IT |
Bob | NULL |
NULL | Marketing |
Explanation: The result includes all employees and all departments. Since Bob
does not belong to any department, the DepartmentName
for his row is NULL
. Similarly, the Marketing
department has no employees, so the Name
is NULL
.
Summary of Differences:
Type | Rows from Left Table | Rows from Right Table | Matching Rows | Non-Matching Rows |
---|---|---|---|---|
INNER JOIN | Only matching rows | Only matching rows | Yes | No |
LEFT JOIN | All rows from left | Matching rows or NULL | Yes | No |
RIGHT JOIN | Matching rows or NULL | All rows from right | Yes | No |
FULL JOIN | All rows from left | All rows from right | Yes | Yes (NULLs for no match) |
Conclusion:
- INNER JOIN: Only the matching rows from both tables.
- LEFT JOIN: All rows from the left table, with matching rows from the right table;
NULL
for non-matching right table rows. - RIGHT JOIN: All rows from the right table, with matching rows from the left table;
NULL
for non-matching left table rows. - FULL JOIN: All rows from both tables, with
NULL
for non-matching rows.
Each type of join serves different purposes depending on the desired result from the relationship between the tables.
Question: Write a SQL query to find the Nth highest salary from a table.
Answer:
To find the Nth highest salary from a table, you can use the following approaches, depending on the SQL dialect you’re working with. Below is a common approach using a subquery and the LIMIT
/OFFSET
clause (for MySQL) or ROW_NUMBER()
function (for SQL Server, PostgreSQL, or Oracle).
1. Using LIMIT
and OFFSET
(for MySQL)
Assuming you have a table called Employees
with columns id
and salary
, and you want to find the Nth highest salary:
SELECT DISTINCT salary
FROM Employees
ORDER BY salary DESC
LIMIT 1 OFFSET N-1;
Explanation:
ORDER BY salary DESC
: Sorts the salaries in descending order (highest to lowest).LIMIT 1 OFFSET N-1
: Fetches the Nth salary, whereN-1
is the number of rows to skip. TheLIMIT 1
ensures only one result is returned.
For example, to find the 3rd highest salary, you would use OFFSET 2
.
2. Using ROW_NUMBER()
(for SQL Server, PostgreSQL, Oracle)
If you’re using a database that supports window functions (like SQL Server, PostgreSQL, Oracle), you can use the ROW_NUMBER()
function to assign a rank to each row and then filter out the row corresponding to the Nth highest salary.
WITH RankedSalaries AS (
SELECT salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank
FROM Employees
)
SELECT salary
FROM RankedSalaries
WHERE rank = N;
Explanation:
ROW_NUMBER() OVER (ORDER BY salary DESC)
: Assigns a unique number to each salary based on its ranking (highest to lowest).WITH RankedSalaries AS (...)
: Creates a common table expression (CTE) that calculates the rank of each salary.WHERE rank = N
: Filters out the row where the rank is equal to N.
For example, to find the 3rd highest salary, you would use WHERE rank = 3
.
3. Using DENSE_RANK()
or RANK()
(for SQL Server, PostgreSQL, Oracle)
If you want to handle cases where there are ties (e.g., two employees having the same salary), you can use DENSE_RANK()
or RANK()
instead of ROW_NUMBER()
.
WITH RankedSalaries AS (
SELECT salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
FROM Employees
)
SELECT salary
FROM RankedSalaries
WHERE rank = N;
Explanation:
DENSE_RANK() OVER (ORDER BY salary DESC)
: This assigns ranks in such a way that if two employees have the same salary, they will receive the same rank (i.e., no gaps between ranks).
4. Using Subquery (for all SQL databases)
A more general approach that works across various SQL databases, including MySQL, SQL Server, and others:
SELECT MAX(salary) AS NthHighestSalary
FROM Employees
WHERE salary NOT IN (
SELECT salary
FROM Employees
ORDER BY salary DESC
LIMIT N-1
);
Explanation:
- The subquery returns the top
N-1
salaries from theEmployees
table, and the outer query selects the maximum salary from the remaining salaries (which corresponds to the Nth highest salary).
Example Query for Finding the 3rd Highest Salary:
For MySQL:
SELECT DISTINCT salary
FROM Employees
ORDER BY salary DESC
LIMIT 1 OFFSET 2;
For SQL Server, PostgreSQL, Oracle (using ROW_NUMBER()
):
WITH RankedSalaries AS (
SELECT salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank
FROM Employees
)
SELECT salary
FROM RankedSalaries
WHERE rank = 3;
For Handling Ties with DENSE_RANK()
(SQL Server, PostgreSQL, Oracle):
WITH RankedSalaries AS (
SELECT salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
FROM Employees
)
SELECT salary
FROM RankedSalaries
WHERE rank = 3;
Conclusion:
- MySQL: Use
LIMIT
andOFFSET
. - SQL Server, PostgreSQL, Oracle: Use
ROW_NUMBER()
orDENSE_RANK()
to handle ranking. - General approach: Use a subquery with
NOT IN
to exclude the firstN-1
salaries and get the maximum from the remaining ones.
This should work for most use cases when you need to find the Nth highest salary from a table.
Question: What are subqueries in SQL?
Answer:
A subquery in SQL is a query nested inside another query. It can be used to perform operations that would require multiple steps in a single query, allowing for more complex queries and logic. Subqueries can return a single value, a list of values, or a table of results, and they are often used with comparison operators (such as =
, <
, >
, IN
, EXISTS
, etc.) or in the FROM
, SELECT
, WHERE
, or HAVING
clauses.
Types of Subqueries:
-
Single-Row Subquery:
- Returns a single value (a single row and column).
- Commonly used with comparison operators like
=
,>
,<
,>=
,<=
, or<>
.
Example:
SELECT employee_name, salary FROM employees WHERE salary = (SELECT MAX(salary) FROM employees);
Explanation: The inner query (
SELECT MAX(salary) FROM employees
) returns the highest salary, and the outer query retrieves the employee with that salary. -
Multiple-Row Subquery:
- Returns multiple rows but a single column.
- Commonly used with operators like
IN
orNOT IN
.
Example:
SELECT employee_name, department FROM employees WHERE department IN (SELECT department FROM employees WHERE salary > 50000);
Explanation: The inner query returns a list of departments where employees have a salary greater than 50,000, and the outer query selects employees who belong to those departments.
-
Multiple-Column Subquery:
- Returns multiple rows and multiple columns.
- Can be used with operators like
=
,IN
, etc., when comparing multiple values.
Example:
SELECT employee_name, department, salary FROM employees WHERE (department, salary) IN (SELECT department, salary FROM employees WHERE experience > 5);
Explanation: The inner query returns a set of
(department, salary)
pairs where employees have more than 5 years of experience, and the outer query retrieves the employees with the same department and salary. -
Correlated Subquery:
- A correlated subquery is a subquery that refers to columns in the outer query. Unlike a regular subquery, which is executed once, a correlated subquery is evaluated once for each row processed by the outer query.
Example:
SELECT employee_name, salary FROM employees outer WHERE salary > (SELECT AVG(salary) FROM employees inner WHERE outer.department = inner.department);
Explanation: The inner query is correlated with the outer query by the
department
column. For each employee, the average salary for their department is calculated, and only employees with a salary greater than that average are selected.
Subquery Placement:
-
Subquery in the
SELECT
Clause:- Can be used to retrieve additional computed columns for each row.
Example:
SELECT employee_name, (SELECT department_name FROM departments WHERE departments.department_id = employees.department_id) AS department_name FROM employees;
Explanation: This subquery returns the department name for each employee by matching the
department_id
with thedepartments
table. -
Subquery in the
FROM
Clause:- Treats the subquery as a derived table (or inline view), which can be joined with other tables or used in further queries.
Example:
SELECT dept.department_name, emp.employee_name FROM (SELECT department_id, department_name FROM departments) AS dept JOIN employees AS emp ON dept.department_id = emp.department_id;
Explanation: The subquery in the
FROM
clause acts as a derived table, providing a list of departments that is then joined with theemployees
table. -
Subquery in the
WHERE
Clause:- Commonly used to filter results based on the results of another query.
Example:
SELECT employee_name, salary FROM employees WHERE department_id = (SELECT department_id FROM departments WHERE department_name = 'Sales');
Explanation: The inner query returns the
department_id
for the ‘Sales’ department, which is then used to filter the employees in that department.
Important Notes about Subqueries:
-
Efficiency:
- Subqueries can sometimes result in performance issues if not written efficiently, especially with correlated subqueries, as the inner query is executed multiple times for each row of the outer query.
-
Nesting Subqueries:
- Subqueries can be nested within other subqueries. However, deep nesting can make queries harder to maintain and can impact performance.
-
Existence of Data:
- The
EXISTS
andNOT EXISTS
operators are often used with subqueries to check for the existence of rows returned by the inner query.
Example:
SELECT employee_name FROM employees e WHERE EXISTS (SELECT 1 FROM projects p WHERE p.employee_id = e.employee_id);
Explanation: The query returns employees who are associated with at least one project.
- The
-
Subquery with
IN
:- The
IN
operator is used when the subquery returns a list of values to compare against.
Example:
SELECT employee_name FROM employees WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York');
Explanation: This query returns employees working in departments located in New York.
- The
Conclusion:
Subqueries are a powerful SQL feature that allows for more complex queries. They help in filtering data, performing calculations, and creating temporary result sets without the need for multiple queries. There are different types of subqueries, and understanding when and how to use them can significantly improve the flexibility and readability of your SQL queries.
Question: What is the difference between DELETE and TRUNCATE?
Answer:
Both DELETE and TRUNCATE are used to remove data from a table in SQL, but they differ in how they operate and the effect they have on the table structure, performance, and transaction logging.
1. DELETE:
- Definition: The
DELETE
statement is used to delete specific rows from a table based on a condition. - Syntax:
DELETE FROM table_name WHERE condition;
- Scope: You can specify a
WHERE
clause to delete specific rows. If noWHERE
clause is provided, all rows will be deleted. - Transaction Logging: The
DELETE
operation is fully logged in the transaction log, meaning each row deleted is logged individually. - Performance: Since each row deletion is logged,
DELETE
can be slower compared toTRUNCATE
, especially for large tables. - Triggers:
DELETE
can activate triggers (e.g.,AFTER DELETE
triggers). - Rollback: Since
DELETE
is part of a transaction, it can be rolled back if wrapped in a transaction. - Table Structure: It does not affect the structure of the table. The table structure, indexes, and constraints remain unchanged.
- Example:
DELETE FROM employees WHERE department = 'Sales';
2. TRUNCATE:
- Definition: The
TRUNCATE
statement is used to remove all rows from a table, but unlikeDELETE
, it does not log individual row deletions. - Syntax:
TRUNCATE TABLE table_name;
- Scope:
TRUNCATE
removes all rows in the table and cannot be used with aWHERE
clause. It is a bulk operation, and all rows are deleted without any condition. - Transaction Logging:
TRUNCATE
is a minimal logging operation. It logs the deallocation of the data pages used by the table rather than logging each row deletion. - Performance:
TRUNCATE
is typically faster thanDELETE
because it does not log each row individually and does not fire triggers. It is especially faster for large tables. - Triggers:
TRUNCATE
does not activateDELETE
triggers because it does not operate on individual rows. - Rollback: Like
DELETE
,TRUNCATE
can be rolled back if it is part of a transaction. However, it is often faster thanDELETE
due to reduced logging. - Table Structure: It does not affect the structure of the table. The table, its structure, and constraints remain intact. However, depending on the database,
TRUNCATE
may reset the identity column (if present). - Example:
TRUNCATE TABLE employees;
Key Differences:
Feature | DELETE | TRUNCATE |
---|---|---|
Rows Affected | Can delete specific rows based on a condition. | Deletes all rows from the table. |
Where Clause | Can use a WHERE clause to filter rows. | Cannot use a WHERE clause. |
Transaction Logging | Fully logs each row deletion. | Minimally logs deallocation of data pages. |
Performance | Slower, especially for large tables. | Faster for large tables, due to minimal logging. |
Triggers | Fires triggers (e.g., AFTER DELETE triggers). | Does not fire triggers. |
Rollback | Can be rolled back within a transaction. | Can be rolled back within a transaction, but faster. |
Table Structure | Does not affect the structure (indexes, constraints). | Does not affect the structure but may reset the identity column (depending on DB). |
Locking | May lock rows or the entire table. | Locks the entire table. |
Summary:
- DELETE is used when you want to selectively delete rows based on conditions and requires more transaction log space due to individual row deletions.
- TRUNCATE is a more efficient, bulk operation used to remove all rows from a table quickly, but it is limited to clearing the entire table and does not support conditions or row-based operations.
Question: What is normalization and denormalization in SQL?
Answer:
1. Normalization:
Normalization is the process of organizing data in a relational database to reduce redundancy and dependency by dividing large tables into smaller ones. The goal of normalization is to ensure that the database structure is efficient, with each piece of data being stored only once. This process involves applying a set of rules called normal forms to a database schema. Each “normal form” builds on the previous one, and the further you normalize, the more complex the structure becomes but also the more efficient it is in terms of storage and data integrity.
Key Objectives of Normalization:
- Eliminate redundancy: Avoid duplicate data in the database.
- Ensure data integrity: Enforce logical relationships and dependencies between different pieces of data.
- Reduce anomalies: Prevent anomalies (insertion, update, and deletion anomalies) that could occur when data is stored in a less structured format.
Normal Forms:
- 1st Normal Form (1NF): Ensures that each column contains atomic (indivisible) values, and each record is unique.
- 2nd Normal Form (2NF): Meets 1NF and also eliminates partial dependency, ensuring that every non-key column depends on the whole primary key.
- 3rd Normal Form (3NF): Meets 2NF and removes transitive dependency, i.e., non-key columns depend only on the primary key.
- Boyce-Codd Normal Form (BCNF): A stricter version of 3NF, removing any dependencies where a non-prime attribute is functionally dependent on another non-prime attribute.
Benefits of Normalization:
- Reduced redundancy (avoiding storing the same data in multiple places).
- Improved data integrity (less chance for inconsistent data).
- Better storage utilization (smaller tables and more efficient data storage).
- Avoidance of update, insert, and delete anomalies.
Example of Normalization:
Suppose you have a table of Orders
with redundant customer data.
OrderID | CustomerName | CustomerAddress | ProductName | Quantity |
---|---|---|---|---|
1 | John Doe | 123 Elm St | Widget A | 10 |
2 | Jane Smith | 456 Oak St | Widget B | 5 |
3 | John Doe | 123 Elm St | Widget C | 7 |
After normalization, you might split it into two tables:
Customers Table:
CustomerID | CustomerName | CustomerAddress |
---|---|---|
1 | John Doe | 123 Elm St |
2 | Jane Smith | 456 Oak St |
Orders Table:
OrderID | CustomerID | ProductName | Quantity |
---|---|---|---|
1 | 1 | Widget A | 10 |
2 | 2 | Widget B | 5 |
3 | 1 | Widget C | 7 |
2. Denormalization:
Denormalization is the process of deliberately introducing redundancy into a database by combining normalized tables or copying data across multiple places. This is done to improve read performance and simplify complex queries, especially in reporting or analytical applications where speed is more important than minimizing redundancy.
While normalization reduces redundancy and improves data integrity, denormalization can reduce the need for complex joins in queries, thereby improving query performance in certain cases. However, denormalization also comes with trade-offs, such as the potential for data anomalies and more complex data updates.
Reasons for Denormalization:
- Performance improvement: Denormalization can speed up read-heavy operations by reducing the need for multiple joins, especially when large amounts of data are being accessed frequently.
- Simplified queries: It can simplify complex queries or aggregations by including all required data in one place.
- Optimized for reporting/analytics: Denormalized data is often used in data warehouses where read performance is prioritized over data integrity.
Trade-offs of Denormalization:
- Redundancy: The same data might be stored in multiple places, which can lead to data inconsistencies.
- Update anomalies: If data is duplicated, updates might need to be made in multiple places, leading to potential inconsistencies.
- Higher storage costs: Since data is repeated, more storage space is needed.
Example of Denormalization:
For the previous example, denormalizing the Orders
and Customers
table into a single Orders
table:
OrderID | CustomerName | CustomerAddress | ProductName | Quantity |
---|---|---|---|---|
1 | John Doe | 123 Elm St | Widget A | 10 |
2 | Jane Smith | 456 Oak St | Widget B | 5 |
3 | John Doe | 123 Elm St | Widget C | 7 |
Here, you’ve combined customer data with order data, reducing the need for joins but introducing redundancy.
Key Differences Between Normalization and Denormalization:
Aspect | Normalization | Denormalization |
---|---|---|
Purpose | Reduces redundancy and ensures data integrity. | Improves query performance and simplifies data retrieval. |
Impact on Data | Avoids data redundancy. | Introduces data redundancy. |
Complexity | Increases complexity of queries (joins required). | Reduces complexity of queries (fewer joins). |
Data Integrity | Ensures high data integrity. | May lead to data inconsistencies and anomalies. |
Performance | Slower performance for read operations (joins). | Faster performance for read-heavy queries. |
Update Anomalies | Avoids update anomalies. | May lead to update, insert, and delete anomalies. |
Storage | More efficient in terms of storage (less duplication). | May use more storage due to redundancy. |
Use Case | OLTP (Online Transaction Processing) systems where data consistency is key. | OLAP (Online Analytical Processing) systems, reporting, and data warehouses. |
Summary:
- Normalization is about organizing data to minimize redundancy and avoid anomalies, while Denormalization involves intentionally adding redundancy to optimize read performance.
- Both have their place depending on the use case. Normalization is typically used in transactional systems, while denormalization is more common in systems focused on query performance, such as data warehouses.
Question: What is a primary key and a foreign key in SQL?
Answer:
1. Primary Key:
A primary key is a column or a set of columns in a database table that uniquely identifies each row in that table. The primary key ensures that no two rows in a table have the same value for the primary key column(s). A primary key must have the following properties:
- Uniqueness: Each value in the primary key column must be unique. No two rows can have the same primary key value.
- Non-null: The primary key column cannot contain
NULL
values. Every row must have a valid value for the primary key. - Single and Composite: A primary key can consist of a single column or multiple columns. A composite primary key is created when more than one column is used together to form a unique identifier.
- Indexing: Most databases automatically create an index on the primary key, which helps in optimizing search and retrieval operations.
Example of a Primary Key:
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY, -- EmployeeID is the primary key
FirstName VARCHAR(50),
LastName VARCHAR(50),
Department VARCHAR(50)
);
Here, the EmployeeID
column is the primary key for the Employees
table, ensuring that each EmployeeID
value is unique and non-null.
2. Foreign Key:
A foreign key is a column or a set of columns in one table that is used to establish a link between the data in two tables. A foreign key in one table points to the primary key in another table, creating a relationship between the two tables. The foreign key ensures referential integrity by enforcing that every value in the foreign key column(s) must either be NULL
or match a value in the referenced table’s primary key or unique column.
- Referential Integrity: The foreign key enforces referential integrity by ensuring that the value in the foreign key column corresponds to a valid row in the referenced (parent) table.
- Relationship: A foreign key defines a one-to-many or many-to-one relationship between tables. For example, one employee can belong to one department, but one department can have many employees.
Example of a Foreign Key:
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DepartmentID INT,
FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);
In this example, the DepartmentID
column in the Employees
table is a foreign key that references the DepartmentID
column in the Departments
table. This ensures that every employee is associated with a valid department that exists in the Departments
table.
Key Differences Between Primary Key and Foreign Key:
Aspect | Primary Key | Foreign Key |
---|---|---|
Purpose | Uniquely identifies each record in the table. | Establishes a relationship between two tables. |
Uniqueness | Values must be unique across the table. | Values may not be unique and can be repeated across rows. |
Nullability | Cannot contain NULL values. | Can contain NULL values, unless explicitly restricted. |
References | Points to data within the same table. | Points to the primary key of another table (parent table). |
Indexing | Automatically indexed by the database. | May or may not be indexed by the database. |
Relation Type | Defines no relationships, but uniquely identifies records. | Defines a relationship (usually one-to-many) between tables. |
Enforcement of Integrity | Enforces uniqueness and non-null constraint. | Enforces referential integrity between tables. |
Example of a Relationship Between Primary and Foreign Keys:
Let’s say we have two tables: Departments
and Employees
. The Departments
table stores information about each department, and the Employees
table stores information about employees. Each employee belongs to one department.
Departments Table (Primary Key: DepartmentID
):
CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY, -- Primary Key
DepartmentName VARCHAR(50)
);
Employees Table (Foreign Key: DepartmentID
):
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY, -- Primary Key
EmployeeName VARCHAR(50),
DepartmentID INT, -- Foreign Key
FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID) -- Foreign Key
);
In this example:
DepartmentID
in theDepartments
table is a primary key that uniquely identifies each department.DepartmentID
in theEmployees
table is a foreign key that references theDepartmentID
in theDepartments
table, creating a relationship between the two tables. This ensures that every employee is assigned to a valid department.
Question: What is the concept of GROUP BY in SQL?
Answer:
The GROUP BY
clause in SQL is used to group rows that have the same values in specified columns into summary rows, like “total”, “average”, or “count”. This clause is often used in conjunction with aggregate functions such as COUNT()
, SUM()
, AVG()
, MAX()
, and MIN()
to perform calculations on each group of rows.
The GROUP BY
statement is particularly useful when you want to organize and summarize your data based on certain columns and perform aggregate calculations on each group.
Key Points:
- Grouping Rows: The
GROUP BY
clause groups rows that have the same values into summary rows. - Aggregate Functions: Typically used with aggregate functions like
COUNT()
,SUM()
,AVG()
, etc., to perform calculations on each group. - Column Selection: When using
GROUP BY
, you must include in theSELECT
clause either the grouped columns or apply aggregate functions to other columns. - HAVING Clause: Often used in conjunction with
GROUP BY
to filter groups based on aggregate values, similar to how theWHERE
clause filters rows.
Syntax:
SELECT column1, column2, aggregate_function(column3)
FROM table_name
GROUP BY column1, column2;
Example 1: Basic Usage of GROUP BY
Let’s say we have a Sales
table that contains information about sales transactions:
SaleID | Product | Amount | SaleDate |
---|---|---|---|
1 | ProductA | 100 | 2024-01-01 |
2 | ProductB | 150 | 2024-01-01 |
3 | ProductA | 200 | 2024-01-02 |
4 | ProductB | 100 | 2024-01-02 |
5 | ProductA | 300 | 2024-01-02 |
To calculate the total sales amount for each product, you can use the GROUP BY
clause like this:
SELECT Product, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Product;
Output:
Product | TotalSales |
---|---|
ProductA | 600 |
ProductB | 250 |
Explanation:
- The
GROUP BY Product
statement groups the rows by theProduct
column. - The
SUM(Amount)
function calculates the total sales amount for each product.
Example 2: GROUP BY with HAVING
If you want to find products with total sales greater than a certain value, you can use the HAVING
clause to filter the groups after aggregation.
SELECT Product, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Product
HAVING SUM(Amount) > 300;
Output:
Product | TotalSales |
---|---|
ProductA | 600 |
Explanation:
- The
HAVING
clause filters out products where the total sales amount is less than or equal to 300, leaving onlyProductA
with total sales greater than 300.
Rules and Considerations:
-
Columns in SELECT: In the
SELECT
clause, you can only include columns that are either part of theGROUP BY
clause or are used with aggregate functions. For example, you cannot select a column that isn’t part of the grouping unless it’s aggregated. -
Order of Clauses: The
GROUP BY
clause is placed after theWHERE
clause (if any) and before theORDER BY
clause (if any).
Example with Multiple Columns:
You can also group by multiple columns. For instance, if you want to calculate the total sales amount for each product on each date:
SELECT Product, SaleDate, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Product, SaleDate;
Output:
Product | SaleDate | TotalSales |
---|---|---|
ProductA | 2024-01-01 | 100 |
ProductB | 2024-01-01 | 150 |
ProductA | 2024-01-02 | 500 |
ProductB | 2024-01-02 | 200 |
Key Takeaways:
GROUP BY
is used to group rows based on one or more columns.- It works with aggregate functions to summarize data.
HAVING
is used to filter the groups created byGROUP BY
.- You can group by multiple columns to create more granular groupings.
Question: What are indexes in SQL?
Answer:
An index in SQL is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional space and time spent on updates, inserts, and deletes. Indexes are used to quickly locate and access data without having to scan the entire table. They are particularly useful for improving query performance, especially when dealing with large datasets.
Key Points:
- Purpose: The primary purpose of an index is to enhance the speed of queries, particularly those involving
SELECT
statements withWHERE
,ORDER BY
, andJOIN
clauses. - Types: There are different types of indexes such as unique indexes, composite indexes, and full-text indexes.
- Structure: An index is often implemented using a B-tree or hashing, depending on the database system. In a B-tree, data is arranged in a hierarchical manner that allows quick searching.
Syntax for Creating an Index:
CREATE INDEX index_name
ON table_name (column1, column2, ...);
Types of Indexes:
-
Single-Column Index:
- Indexes created on a single column.
- Example: Index on the
name
column of aUsers
table.
CREATE INDEX idx_name ON Users (name);
-
Composite Index:
- Indexes created on multiple columns.
- Useful when queries use multiple columns in the
WHERE
clause.
CREATE INDEX idx_name_date ON Orders (name, order_date);
-
Unique Index:
- Ensures that the indexed columns have unique values.
- Often used for primary and unique keys.
CREATE UNIQUE INDEX idx_email ON Users (email);
-
Full-Text Index:
- Used for indexing large text fields for full-text searches (commonly used with
TEXT
orVARCHAR
fields). - Not supported by all database systems but often used in search engines.
CREATE FULLTEXT INDEX idx_fulltext ON Articles (content);
- Used for indexing large text fields for full-text searches (commonly used with
-
Primary Key and Foreign Key Indexes:
- Automatically created when defining a primary or foreign key on a table.
- A primary key automatically creates a unique index on the primary key column(s), and a foreign key often creates an index to speed up lookups in the parent table.
How Indexes Work:
- When you create an index, the database creates an auxiliary data structure (usually a B-tree or hash table) to store pointers to the rows in the table.
- The database uses this index to quickly locate the rows that match the query conditions, avoiding a full table scan.
- For example, if you have an index on the
name
column in theUsers
table, and you search for a specific name, the database uses the index to locate the relevant rows quickly.
Example: Index Usage
Consider the following Employees
table:
EmployeeID | FirstName | LastName | Age | Department |
---|---|---|---|---|
1 | John | Doe | 30 | HR |
2 | Jane | Smith | 25 | IT |
3 | Mary | Johnson | 35 | HR |
4 | James | Brown | 40 | IT |
If you frequently search by LastName
and Department
, creating an index on those columns would speed up queries.
CREATE INDEX idx_lastname_department ON Employees (LastName, Department);
Now, when querying:
SELECT * FROM Employees WHERE LastName = 'Doe' AND Department = 'HR';
The database can use the index to find the relevant row much faster than scanning the entire table.
Advantages of Using Indexes:
- Faster Data Retrieval: Indexes significantly improve the speed of data retrieval queries, especially on large tables.
- Efficient Sorting: When querying with
ORDER BY
, indexes can help the database to quickly sort the data without performing a full sort. - Better Performance for Joins: Indexes can improve the performance of
JOIN
operations by speeding up matching rows between tables.
Disadvantages of Using Indexes:
- Increased Storage Space: Indexes consume additional disk space because they store duplicate data (pointers) to the rows.
- Slower Writes (INSERT, UPDATE, DELETE): Every time a row is inserted, updated, or deleted, the corresponding indexes must be updated, which can slow down write operations.
- Complexity in Maintenance: As the table grows and changes, indexes may need to be rebuilt or reorganized to maintain performance.
Common Use Cases:
- Primary Key Constraints: Automatically indexed.
- Foreign Key Constraints: Foreign keys typically benefit from indexing for faster lookups and integrity checks.
- Frequent Query Filters: Columns frequently used in
WHERE
clauses benefit from indexing. - Frequent Sorting: Columns used in
ORDER BY
clauses can be indexed for better performance.
Example: Dropping an Index
To remove an index from a table, you can use the DROP INDEX
command.
DROP INDEX idx_lastname_department ON Employees;
Key Takeaways:
- Indexes improve query performance by allowing the database to quickly locate rows based on specific column values.
- B-tree and hash are common data structures used for indexing.
- Indexes can be single-column or composite (multiple columns).
- While indexes enhance read performance, they may slow down write operations and consume additional storage space.
Question: What is the difference between CHAR and VARCHAR in SQL?
Answer:
CHAR and VARCHAR are both data types used to store character strings in SQL. While they may seem similar, there are key differences between them, mainly in how they store data and how much space they consume.
Key Differences:
1. Fixed vs Variable Length:
- CHAR:
- Fixed-length data type. The storage size is fixed according to the specified length, even if the actual data is shorter.
- For example, if you define a column as
CHAR(10)
and store the string"Hello"
, it will still use 10 bytes of storage, padding the remaining 5 characters with spaces.
- VARCHAR:
- Variable-length data type. It stores only the actual length of the string and does not pad with extra spaces.
- For example, if you define a column as
VARCHAR(10)
and store the string"Hello"
, it will only use 5 bytes of storage (the length of"Hello"
).
2. Storage Efficiency:
- CHAR:
- May waste storage space because it always uses the defined length.
- Useful for storing strings that are always of a consistent length, like country codes (
CHAR(2)
for “US”, “IN”, etc.).
- VARCHAR:
- More storage-efficient because it only uses the space needed for the string, plus a small amount of overhead to store the length of the string.
- Ideal for strings of varying lengths, such as names, addresses, or descriptions.
3. Performance:
- CHAR:
- Because of its fixed size, CHAR can be faster for retrieving fixed-length strings. However, it can be slower when storing variable-length strings due to the padding.
- VARCHAR:
- VARCHAR can be slightly slower for retrieval, as the database needs to handle variable-length data, but it’s generally more efficient in terms of storage and handling strings of varying lengths.
4. Padding Behavior:
- CHAR:
- Automatically pads the string with spaces if it’s shorter than the defined length.
- For example, if
CHAR(5)
contains"Hi"
, the database stores it as"Hi "
(with 3 trailing spaces).
- VARCHAR:
- Does not add padding. It stores only the actual data, so
"Hi"
will be stored as"Hi"
with no trailing spaces.
- Does not add padding. It stores only the actual data, so
5. Use Cases:
- CHAR:
- Best used for columns that always store data of the same length, such as:
- Fixed-length codes (e.g., country codes, postal codes, product codes).
- Binary data (if you’re working with fixed-length binary strings).
- Best used for columns that always store data of the same length, such as:
- VARCHAR:
- Ideal for columns where the length of the data can vary:
- Names, email addresses, descriptions, addresses, etc.
- Ideal for columns where the length of the data can vary:
Example:
-
Using CHAR:
CREATE TABLE Users ( user_id INT PRIMARY KEY, country_code CHAR(2) );
- If you store
"US"
in thecountry_code
column, it takes up 2 bytes. But if you store"IN"
, it will still take up exactly 2 bytes, with no padding.
- If you store
-
Using VARCHAR:
CREATE TABLE Users ( user_id INT PRIMARY KEY, email VARCHAR(255) );
- If you store
"[email protected]"
in theemail
column, it will only take up the exact length of the string (16 bytes
in this case).
- If you store
Summary Table:
Feature | CHAR | VARCHAR |
---|---|---|
Length | Fixed | Variable |
Storage | Always uses the specified length | Uses only the actual length + overhead |
Padding | Pads with spaces if shorter | No padding, stores only actual data |
Efficiency | Wastes space for shorter strings | More efficient for variable-length strings |
Use Cases | Fixed-length data (e.g., country codes) | Variable-length data (e.g., names, email addresses) |
Key Takeaways:
- Use CHAR when the length of the data is constant and known.
- Use VARCHAR when the length of the data varies, as it is more efficient in terms of storage.
Question: How do you handle NULL values in SQL?
Answer:
In SQL, NULL represents the absence of a value or an unknown value. It is different from an empty string or zero; NULL simply means that the value is missing or undefined. Handling NULL values correctly is crucial to ensure data integrity and avoid unexpected behavior in queries. Here are several ways to handle NULL values in SQL:
1. Checking for NULL values:
- IS NULL: Used to check if a value is NULL.
- IS NOT NULL: Used to check if a value is not NULL.
SELECT * FROM Employees WHERE salary IS NULL;
SELECT * FROM Employees WHERE salary IS NOT NULL;
- Example: The query
SELECT * FROM Employees WHERE salary IS NULL;
retrieves all rows where the salary is NULL (i.e., not yet assigned).
2. Handling NULL in SELECT queries:
-
COALESCE(): This function returns the first non-NULL value in a list of expressions. It is useful for substituting a NULL with a default value.
SELECT COALESCE(salary, 0) FROM Employees;
- In this example, if
salary
is NULL, it will return0
instead.
- In this example, if
-
IFNULL() (MySQL) / NVL() (Oracle): Similar to
COALESCE()
, these functions replace NULL values with a specified value. Different databases have different implementations:- In MySQL:
IFNULL(expression, replacement_value)
- In Oracle:
NVL(expression, replacement_value)
SELECT IFNULL(salary, 0) FROM Employees; -- MySQL
- In MySQL:
-
NULLIF(): This function compares two expressions. If they are equal, it returns NULL; otherwise, it returns the first expression.
SELECT NULLIF(salary, 0) FROM Employees;
- If
salary
is 0, the result will beNULL
; otherwise, it will return the actual salary.
- If
3. NULL in Aggregates:
-
When performing aggregate functions like
SUM()
,AVG()
,COUNT()
,MIN()
, andMAX()
, NULL values are ignored. For example, theSUM()
function does not count NULL values, but it will still calculate the sum of non-NULL values.SELECT SUM(salary) FROM Employees;
- If there are NULL values in the
salary
column, they will be ignored, and only non-NULL salaries will be summed up.
- If there are NULL values in the
4. Using NULL in INSERT statements:
-
You can insert a
NULL
value into a column if the column allowsNULL
values.INSERT INTO Employees (name, salary) VALUES ('John Doe', NULL);
- In this example, the
salary
is inserted asNULL
, assuming that the column allows NULLs.
- In this example, the
5. Filtering and Handling NULL values in WHERE clause:
-
NULL values cannot be compared using
=
or!=
. Instead, you must useIS NULL
orIS NOT NULL
.SELECT * FROM Employees WHERE salary = NULL; -- Incorrect, won't work! SELECT * FROM Employees WHERE salary IS NULL; -- Correct
-
To handle NULLs when filtering, you can use
IS NULL
orIS NOT NULL
in theWHERE
clause, as shown above.
6. Default Values for NULL:
-
You can set default values for columns to avoid inserting NULL values when no data is provided. This is typically done when creating or altering the table schema.
CREATE TABLE Employees ( id INT, name VARCHAR(50), salary DECIMAL(10, 2) DEFAULT 0.00 );
- In this case, if no
salary
is provided during an insert, it will default to0.00
instead of NULL.
- In this case, if no
7. Replacing NULL values in an UPDATE statement:
-
If you want to update NULL values to something else, you can use the
UPDATE
statement with theIS NULL
condition.UPDATE Employees SET salary = 0 WHERE salary IS NULL;
- This query sets all NULL
salary
values to0
.
- This query sets all NULL
8. Handling NULLs in JOINs:
-
In JOIN operations, NULL values are treated specially:
- INNER JOIN: NULL values from either table are excluded.
- LEFT JOIN (or RIGHT JOIN): If a NULL value is in the left table, NULLs will appear in the right table columns (and vice versa).
SELECT E.name, D.department_name FROM Employees E LEFT JOIN Departments D ON E.department_id = D.department_id;
- If an employee doesn’t belong to any department, their
department_name
will be NULL in the result.
9. NULL in SQL Constraints:
-
Columns can be defined to allow
NULL
or not. If you want to prevent NULL values, you can define a column asNOT NULL
.CREATE TABLE Employees ( id INT NOT NULL, name VARCHAR(50) NOT NULL, salary DECIMAL(10, 2) NULL );
- The
salary
column can accept NULL values, whileid
andname
cannot be NULL.
- The
Key Takeaways:
- NULL represents an unknown or missing value and cannot be treated like other values (e.g., zero or empty string).
- Use IS NULL or IS NOT NULL to check for NULL values.
- Functions like COALESCE(), IFNULL(), and NVL() help handle NULLs by substituting default values.
- Aggregate functions ignore NULL values, but you can use COUNT() to include NULL values by counting rows.
- When inserting, updating, or selecting, carefully handle NULLs to avoid errors and unexpected results.
Question: Write a SQL query to find employees who have not received a salary in the last month.
Answer:
To find employees who have not received a salary in the last month, we need to compare the salary_date
(or similar date column) with the current date and filter out employees who received their last salary payment in the past month. Here’s a SQL query to achieve this:
Assuming the following table structure:
- Employees table with columns:
employee_id
,name
- Salaries table with columns:
employee_id
,salary_amount
,salary_date
SELECT E.employee_id, E.name
FROM Employees E
LEFT JOIN Salaries S ON E.employee_id = S.employee_id
WHERE S.salary_date < CURDATE() - INTERVAL 1 MONTH OR S.salary_date IS NULL;
Explanation:
- LEFT JOIN: We use a
LEFT JOIN
to ensure that even employees who don’t have any salary records are included in the result. - S.salary_date < CURDATE() - INTERVAL 1 MONTH: This condition filters out employees who received their last salary more than a month ago.
- S.salary_date IS NULL: This condition includes employees who haven’t received any salary at all (i.e., they don’t have any entries in the
Salaries
table).
This query will return all employees who have not received a salary in the last month or have never received a salary.
Question: Explain the difference between HAVING and WHERE in SQL.
Answer:
Both HAVING
and WHERE
are used to filter records in SQL, but they are used in different contexts and serve different purposes. Here’s a detailed explanation of their differences:
-
WHERE Clause:
- Purpose: The
WHERE
clause is used to filter records before any grouping is done (i.e., it operates on individual rows in the table). - Used with: It can be used with
SELECT
,INSERT
,UPDATE
, andDELETE
statements. - Applies to: Non-aggregated columns (columns that are not the result of an aggregate function like
COUNT()
,SUM()
, etc.). - Execution order: It is executed before the
GROUP BY
clause (if present).
Example:
SELECT name, salary FROM Employees WHERE salary > 50000;
- This query filters employees who have a salary greater than 50,000 before any grouping or aggregation happens.
- Purpose: The
-
HAVING Clause:
- Purpose: The
HAVING
clause is used to filter records after the data has been grouped (i.e., it operates on aggregated data, or the result ofGROUP BY
). - Used with: It is used with
GROUP BY
and is often used in conjunction with aggregate functions likeSUM()
,COUNT()
,AVG()
, etc. - Applies to: Aggregated data, meaning it filters groups of rows after the grouping has been done.
- Execution order: It is executed after the
GROUP BY
clause.
Example:
SELECT department, AVG(salary) FROM Employees GROUP BY department HAVING AVG(salary) > 50000;
- This query first groups employees by their department and then filters the groups where the average salary is greater than 50,000.
- Purpose: The
Key Differences:
Aspect | WHERE | HAVING |
---|---|---|
Purpose | Filters rows before grouping | Filters groups after aggregation |
Used With | Individual rows (non-aggregated columns) | Aggregated data (using functions like COUNT() , SUM() ) |
Can Be Used In | SELECT , INSERT , UPDATE , DELETE | SELECT with GROUP BY |
Execution Order | Executes before GROUP BY | Executes after GROUP BY |
Example Showing the Difference:
Using WHERE
:
SELECT department, salary
FROM Employees
WHERE salary > 50000;
This query filters employees whose salary is greater than 50,000 before any grouping occurs.
Using HAVING
with GROUP BY
:
SELECT department, AVG(salary)
FROM Employees
GROUP BY department
HAVING AVG(salary) > 50000;
This query first groups employees by department and then filters the departments where the average salary is greater than 50,000. HAVING
is used to filter the aggregated result.
Conclusion:
- Use
WHERE
to filter individual rows based on column values before aggregation. - Use
HAVING
to filter results after aggregation, typically when usingGROUP BY
and aggregate functions.
Question: What is a view in SQL?
Answer:
A view in SQL is a virtual table that represents the result of a stored query. It does not store data itself but rather provides a way to simplify complex queries by encapsulating them into a reusable structure. Views allow you to present data from one or more tables in a particular format, and they are treated like regular tables in SQL queries.
Key characteristics of a view:
-
Virtual Table: A view is a virtual table that consists of a SELECT query. It doesn’t physically store data but retrieves data from underlying tables each time it’s accessed.
-
Read-Only or Updatable: Views can be either read-only or updatable, depending on how the view is defined and the database’s support. Simple views based on a single table are often updatable, but complex views involving joins, aggregation, or DISTINCT may be read-only.
-
Encapsulation: Views can encapsulate complex logic and simplify queries for end-users or other applications. They help abstract away the complexity of underlying data structures and simplify query writing.
-
Security: Views can be used to limit access to certain columns or rows of a table, improving security. For example, you can create a view that only exposes certain columns, restricting access to sensitive data.
-
Dynamic: The data in a view is dynamic because it is generated based on the underlying tables. If the data in the tables changes, the view reflects the changes immediately when accessed.
Syntax for Creating a View:
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example:
Suppose we have a table Employees
:
EmployeeID | Name | Department | Salary |
---|---|---|---|
1 | John | HR | 50000 |
2 | Sarah | IT | 60000 |
3 | David | HR | 55000 |
Now, if we want to create a view that shows only the employees in the HR department:
CREATE VIEW HR_Employees AS
SELECT EmployeeID, Name, Salary
FROM Employees
WHERE Department = 'HR';
The view HR_Employees
will now present:
EmployeeID | Name | Salary |
---|---|---|
1 | John | 50000 |
3 | David | 55000 |
This view can be queried just like a regular table:
SELECT * FROM HR_Employees;
Advantages of Using Views:
-
Simplified Queries: Views abstract complex queries and make it easier to retrieve the required data without repeatedly writing complex SQL statements.
-
Data Security: By using views, you can restrict access to specific columns or rows of a table, providing an extra layer of security.
-
Data Integrity: Views help maintain consistency by ensuring users work with the same, pre-defined logic for querying data, rather than writing custom queries that might vary across different applications.
Types of Views:
- Simple Views: A view based on a single table without complex joins or aggregations.
- Complex Views: A view based on multiple tables using joins, groupings, or aggregate functions.
Modifying Data Through Views:
While views are primarily used for querying data, in certain cases, they can be used to update underlying data. However, there are restrictions:
- Updatable Views: These are views based on a single table where you can perform
INSERT
,UPDATE
, orDELETE
operations. - Non-Updatable Views: Views based on complex queries like joins, aggregates, or grouped data may not allow direct updates.
Dropping a View:
To remove a view, you can use the DROP VIEW
statement:
DROP VIEW view_name;
Conclusion:
- A view is a virtual table in SQL that is defined by a query and does not store data itself.
- It simplifies complex queries, improves security, and can abstract away complex table structures.
- Views can be used to improve maintainability and readability of SQL code by encapsulating repetitive or complex logic.
Question: What is the purpose of DISTINCT in SQL?
Answer:
The DISTINCT keyword in SQL is used to remove duplicate records from the result set of a query. It ensures that the rows returned by the query are unique, meaning that no two rows will be identical.
When you use DISTINCT
, SQL will compare the values of all selected columns in the query result and eliminate any duplicate rows, returning only unique combinations of values.
Key points about DISTINCT:
- Eliminates Duplicates:
DISTINCT
ensures that only unique (non-duplicate) rows are returned in the query result. - Applies to All Columns in the SELECT Clause:
DISTINCT
considers all the columns in theSELECT
clause when determining uniqueness. It will return unique combinations of values across all selected columns. - Can Be Used with Multiple Columns: You can use
DISTINCT
on multiple columns to find distinct combinations of values across those columns.
Syntax:
SELECT DISTINCT column1, column2, ...
FROM table_name
WHERE condition;
Example 1: Using DISTINCT with a Single Column
Suppose you have a table Employees
:
EmployeeID | Name | Department |
---|---|---|
1 | John | HR |
2 | Sarah | IT |
3 | David | HR |
4 | Michael | IT |
If you want to retrieve a list of unique departments, you can use DISTINCT
:
SELECT DISTINCT Department
FROM Employees;
Result:
Department |
---|
HR |
IT |
Here, even though “HR” and “IT” appear multiple times in the table, DISTINCT
ensures that they are listed only once in the result.
Example 2: Using DISTINCT with Multiple Columns
Suppose you have a table Sales
:
Product | Region | SalesAmount |
---|---|---|
TV | North | 1000 |
TV | South | 1200 |
TV | North | 1000 |
Phone | North | 800 |
Phone | South | 900 |
If you want to retrieve unique combinations of Product
and Region
(ignoring duplicates), you can do:
SELECT DISTINCT Product, Region
FROM Sales;
Result:
Product | Region |
---|---|
TV | North |
TV | South |
Phone | North |
Phone | South |
In this case, DISTINCT
eliminates the duplicate row (TV
in North
with SalesAmount
1000), returning only unique combinations of Product
and Region
.
When to Use DISTINCT:
- Removing Duplicates: When you need to ensure that the result contains only unique values (e.g., a list of distinct categories or unique combinations of columns).
- Aggregating Data: In conjunction with aggregation functions like
COUNT()
,SUM()
,AVG()
, etc., to get unique values before performing calculations.
Performance Considerations:
- Using
DISTINCT
can be resource-intensive, especially on large tables, because it requires the database to check all rows to remove duplicates. - If performance is critical and you know that the data is already unique or you can filter duplicates through other means (like using
GROUP BY
), then avoid usingDISTINCT
.
Conclusion:
- The DISTINCT keyword in SQL is used to eliminate duplicate rows from the result set, ensuring that only unique records are returned.
- It applies to all selected columns and can be used with one or more columns to find distinct combinations.
Question: Write a SQL query to find the top 3 highest paid employees.
Answer:
To find the top 3 highest paid employees, you can use the ORDER BY
clause to sort the employees based on their salary in descending order and then limit the results to the top 3 using the LIMIT
(in MySQL, PostgreSQL) or TOP
(in SQL Server) clause.
SQL Query (for MySQL, PostgreSQL, SQLite, etc.):
SELECT EmployeeID, Name, Salary
FROM Employees
ORDER BY Salary DESC
LIMIT 3;
Explanation:
- SELECT EmployeeID, Name, Salary: This selects the columns for the employee ID, name, and salary.
- FROM Employees: Specifies the table from which the data will be retrieved.
- ORDER BY Salary DESC: Sorts the result by the
Salary
column in descending order (highest salary first). - LIMIT 3: Limits the result to the top 3 rows (i.e., the 3 highest-paid employees).
SQL Query (for SQL Server):
SQL Server uses TOP
instead of LIMIT
to restrict the number of rows returned:
SELECT TOP 3 EmployeeID, Name, Salary
FROM Employees
ORDER BY Salary DESC;
Example Data in Employees Table:
EmployeeID | Name | Salary |
---|---|---|
1 | John | 5000 |
2 | Sarah | 6000 |
3 | David | 7000 |
4 | Michael | 8000 |
5 | Emma | 9000 |
Result (Top 3 Highest Paid Employees):
EmployeeID | Name | Salary |
---|---|---|
5 | Emma | 9000 |
4 | Michael | 8000 |
3 | David | 7000 |
This query will return the top 3 highest paid employees based on their salaries.
Question: What are INNER JOIN and OUTER JOIN in SQL?
Answer:
INNER JOIN and OUTER JOIN are both used in SQL to combine rows from two or more tables based on a related column between them. However, they differ in how they handle matching and non-matching rows.
1. INNER JOIN:
An INNER JOIN returns only the rows where there is a match in both tables based on the specified condition.
- If a row from the left table does not have a matching row in the right table, it will not appear in the result set.
- It is the most common type of join.
Syntax:
SELECT columns
FROM table1
INNER JOIN table2 ON table1.column = table2.column;
Example:
Consider two tables, Employees and Departments:
Employees Table:
EmployeeID | Name | DepartmentID |
---|---|---|
1 | John | 101 |
2 | Sarah | 102 |
3 | David | 103 |
4 | Michael | NULL |
Departments Table:
DepartmentID | DepartmentName |
---|---|
101 | HR |
102 | IT |
103 | Finance |
Query:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
INNER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name | DepartmentName |
---|---|
John | HR |
Sarah | IT |
David | Finance |
- Michael does not appear because his
DepartmentID
isNULL
, and no matching department exists for him.
2. OUTER JOIN:
An OUTER JOIN returns all rows from at least one table and the matching rows from the other table. If there is no match, the result will contain NULL
values for the columns from the table that does not have a match.
There are three types of OUTER JOIN:
- LEFT OUTER JOIN (or simply LEFT JOIN)
- RIGHT OUTER JOIN (or simply RIGHT JOIN)
- FULL OUTER JOIN
2.1 LEFT OUTER JOIN (LEFT JOIN):
A LEFT OUTER JOIN returns all the rows from the left table (table1) and the matching rows from the right table (table2). If there is no match in the right table, the result will contain NULL
values for columns from the right table.
Syntax:
SELECT columns
FROM table1
LEFT OUTER JOIN table2 ON table1.column = table2.column;
Example:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
LEFT OUTER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name | DepartmentName |
---|---|
John | HR |
Sarah | IT |
David | Finance |
Michael | NULL |
- Michael is included, but since there is no matching
DepartmentID
, theDepartmentName
isNULL
.
2.2 RIGHT OUTER JOIN (RIGHT JOIN):
A RIGHT OUTER JOIN returns all the rows from the right table (table2) and the matching rows from the left table (table1). If there is no match in the left table, the result will contain NULL
values for columns from the left table.
Syntax:
SELECT columns
FROM table1
RIGHT OUTER JOIN table2 ON table1.column = table2.column;
Example:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
RIGHT OUTER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name | DepartmentName |
---|---|
John | HR |
Sarah | IT |
David | Finance |
NULL | Marketing |
- If there are departments without matching employees, those departments are still included in the result with
NULL
for the employee name.
2.3 FULL OUTER JOIN:
A FULL OUTER JOIN returns all rows when there is a match in one of the tables. If there is no match, the result will contain NULL
values for the columns of the table without a match.
Syntax:
SELECT columns
FROM table1
FULL OUTER JOIN table2 ON table1.column = table2.column;
Example:
SELECT Employees.Name, Departments.DepartmentName
FROM Employees
FULL OUTER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;
Result:
Name | DepartmentName |
---|---|
John | HR |
Sarah | IT |
David | Finance |
Michael | NULL |
NULL | Marketing |
- In this example, Michael has no department, and Marketing has no employees, so
NULL
is shown for the unmatched values.
Summary of Differences:
Join Type | Returns | Example |
---|---|---|
INNER JOIN | Only matching rows from both tables | John, Sarah, David |
LEFT JOIN | All rows from the left table and matching from the right table | John, Sarah, David, Michael |
RIGHT JOIN | All rows from the right table and matching from the left table | John, Sarah, David, Marketing |
FULL OUTER JOIN | All rows from both tables, with NULL for non-matching rows | John, Sarah, David, Michael, Marketing |
Key Takeaways:
-
INNER JOIN: Returns only the rows where there is a match in both tables.
-
OUTER JOIN: Returns all rows from one or both tables, filling in
NULL
where there is no match.- LEFT JOIN: All rows from the left table and matching rows from the right.
- RIGHT JOIN: All rows from the right table and matching rows from the left.
- FULL OUTER JOIN: All rows from both tables, with
NULL
for unmatched rows.
Question: What is the
SELECT INTO
statement in SQL?
Answer:
The SELECT INTO
statement in SQL is used to select data from one table and insert it into a new table in a single operation. This is especially useful when you want to create a new table based on the structure and data of an existing table.
The SELECT INTO
statement creates a new table in the database and populates it with the selected data. The new table will have the same column definitions (data types) as the columns in the SELECT
query.
Syntax:
SELECT column1, column2, ...
INTO new_table
FROM existing_table
WHERE condition;
column1, column2, ...
: The columns you want to select from the source table.new_table
: The name of the new table that will be created.existing_table
: The table from which you are selecting the data.condition
: An optional condition to filter the rows you want to select (using theWHERE
clause).
Key Points:
- Creates a new table: The new table (
new_table
) does not have to exist before executing theSELECT INTO
statement. If it does not exist, it is created automatically. - Data types are copied: The new table will have the same data types as the columns of the
SELECT
query from the existing table. - No indexes or constraints: The
SELECT INTO
statement does not copy indexes, primary keys, or foreign keys from the original table. Only the table structure and data are copied. - Works in SQL Server:
SELECT INTO
is commonly used in SQL Server, but some databases like MySQL or PostgreSQL may use a different syntax (such asCREATE TABLE AS SELECT
).
Example 1: Basic SELECT INTO
to create a new table
Suppose you have a table Employees
, and you want to create a new table EmployeeBackup
with the same structure and data:
Employees Table:
EmployeeID | Name | Department |
---|---|---|
1 | John | HR |
2 | Sarah | IT |
3 | David | Finance |
Query:
SELECT EmployeeID, Name, Department
INTO EmployeeBackup
FROM Employees;
Result:
- A new table called
EmployeeBackup
is created with the same structure and data as theEmployees
table.
EmployeeBackup Table:
EmployeeID | Name | Department |
---|---|---|
1 | John | HR |
2 | Sarah | IT |
3 | David | Finance |
Example 2: SELECT INTO
with a WHERE
condition
You can also use a WHERE
clause to filter the rows you want to copy into the new table.
Query:
SELECT EmployeeID, Name, Department
INTO ITDepartmentEmployees
FROM Employees
WHERE Department = 'IT';
Result:
- A new table
ITDepartmentEmployees
is created, but it will only contain the employees whose department isIT
.
ITDepartmentEmployees Table:
EmployeeID | Name | Department |
---|---|---|
2 | Sarah | IT |
Limitations:
- Does not copy constraints: The
SELECT INTO
statement does not copy any primary keys, foreign keys, or indexes from the original table. These need to be added manually if necessary. - Cannot be used with existing tables: The
SELECT INTO
statement creates a new table, so it cannot be used to insert data into an already existing table. For inserting into an existing table, you would use theINSERT INTO
statement instead.
Conclusion:
- The
SELECT INTO
statement in SQL is useful for creating new tables and inserting data into them based on the result of a query. - It is widely used in scenarios where you need to back up data, create temporary tables, or clone a subset of data from one table into another.
Question: What is the ROW_NUMBER()
function in SQL?
Answer:
The ROW_NUMBER()
function in SQL is a window function that assigns a unique sequential integer to rows within a result set. The numbering starts at 1 for the first row and increments by 1 for each subsequent row in the specified order. It is typically used to assign a “rank” or “row number” to rows in a result set based on a particular order.
Key Points:
- Window Function:
ROW_NUMBER()
is often used in the context of a window, which means that it can operate over a subset of data (called a “window”) defined by thePARTITION BY
clause. If no partitioning is needed, it can be used over the entire result set. - Order: The
ROW_NUMBER()
function requires anORDER BY
clause to determine the sequence in which the row numbers are assigned. - Uniqueness: Each row gets a unique row number within the specified partition (or the entire result set if no partition is defined).
Syntax:
SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS row_num
FROM table_name;
PARTITION BY column1
: (Optional) Divides the result set into partitions to which theROW_NUMBER()
function is applied. If omitted, the entire result set is treated as a single partition.ORDER BY column2
: Defines the order in which the row numbers are assigned.row_num
: The alias for the column that holds the row numbers.
Example 1: Simple ROW_NUMBER()
with no partition
Query:
SELECT EmployeeID, Name, Department,
ROW_NUMBER() OVER (ORDER BY EmployeeID) AS row_num
FROM Employees;
Result:
EmployeeID | Name | Department | row_num |
---|---|---|---|
1 | John | HR | 1 |
2 | Sarah | IT | 2 |
3 | David | Finance | 3 |
4 | Mike | HR | 4 |
- In this example, the rows are ordered by
EmployeeID
, and a unique sequential row number is assigned to each row.
Example 2: Using ROW_NUMBER()
with PARTITION BY
Query:
SELECT EmployeeID, Name, Department,
ROW_NUMBER() OVER (PARTITION BY Department ORDER BY EmployeeID) AS row_num
FROM Employees;
Result:
EmployeeID | Name | Department | row_num |
---|---|---|---|
1 | John | HR | 1 |
4 | Mike | HR | 2 |
2 | Sarah | IT | 1 |
3 | David | Finance | 1 |
- In this example, the
ROW_NUMBER()
function is applied to each department separately (due toPARTITION BY Department
). - The row number is assigned to each row within each department, and the order within each department is determined by the
EmployeeID
.
Use Cases:
-
Pagination:
ROW_NUMBER()
is commonly used for paginating query results. For example, if you want to retrieve a specific “page” of records, you can useROW_NUMBER()
to assign row numbers and filter the rows for the desired page.Example for pagination:
SELECT * FROM ( SELECT EmployeeID, Name, Department, ROW_NUMBER() OVER (ORDER BY EmployeeID) AS row_num FROM Employees ) AS numbered_employees WHERE row_num BETWEEN 11 AND 20;
-
Ranking: It can be used to assign a sequential rank to rows, especially in cases where you need to identify the “top N” rows or a specific rank of rows in a dataset.
Example for ranking:
SELECT EmployeeID, Name, Salary, ROW_NUMBER() OVER (ORDER BY Salary DESC) AS rank FROM Employees;
-
Identifying duplicates:
ROW_NUMBER()
can help identify duplicates by partitioning the data and ordering it to check for repeat values.
Limitations:
- Not suitable for ties: Unlike
RANK()
orDENSE_RANK()
,ROW_NUMBER()
does not handle ties. Each row gets a unique number, even if rows have the same values. - Performance: Depending on the size of the dataset and the complexity of the query, the
ROW_NUMBER()
function can impact query performance, especially when partitioning and ordering over large datasets.
Conclusion:
The ROW_NUMBER()
function is a versatile SQL function useful for creating unique sequential numbers for rows within a result set. It is particularly useful for tasks like pagination, ranking, or partitioning data for analysis.
Question: How do you fetch only odd or even rows from a table in SQL?
Answer:
In SQL, you can fetch odd or even rows by utilizing the ROW_NUMBER()
window function. This function assigns a unique sequential integer to rows within a result set, which can then be used to filter for odd or even rows.
Here’s how you can approach it:
Steps:
- Assign Row Numbers: Use the
ROW_NUMBER()
function to assign a sequential number to each row in the result set. - Filter Rows: After assigning row numbers, use the modulus operator (
%
) to filter for odd or even rows.
Example 1: Fetching Odd Rows (1st, 3rd, 5th, etc.)
Query:
WITH NumberedRows AS (
SELECT EmployeeID, Name, Department,
ROW_NUMBER() OVER (ORDER BY EmployeeID) AS row_num
FROM Employees
)
SELECT EmployeeID, Name, Department
FROM NumberedRows
WHERE row_num % 2 <> 0;
Explanation:
ROW_NUMBER()
assigns a unique sequential number to each row ordered byEmployeeID
.WHERE row_num % 2 <> 0
filters out the even-numbered rows and keeps only the odd-numbered ones (i.e., rows with odd row numbers).
Example 2: Fetching Even Rows (2nd, 4th, 6th, etc.)
Query:
WITH NumberedRows AS (
SELECT EmployeeID, Name, Department,
ROW_NUMBER() OVER (ORDER BY EmployeeID) AS row_num
FROM Employees
)
SELECT EmployeeID, Name, Department
FROM NumberedRows
WHERE row_num % 2 = 0;
Explanation:
ROW_NUMBER()
again assigns a sequential number to each row.WHERE row_num % 2 = 0
filters out the odd-numbered rows and keeps only the even-numbered ones (i.e., rows with even row numbers).
Notes:
- The
ROW_NUMBER()
function is useful when you need to work with a specific order, such as ordering by a primary key (EmployeeID
in this example). You can change theORDER BY
clause inside theROW_NUMBER()
function depending on how you want to order the rows. - These queries can be modified to handle partitions (e.g.,
PARTITION BY Department
in case you want to fetch odd or even rows per department).
Conclusion:
By using the ROW_NUMBER()
function with the modulus operator (%
), you can easily filter and fetch either odd or even rows from a table. This approach provides flexibility for more complex queries where you may want to work with specific rows based on their order.
Read More
If you can’t get enough from this article, Aihirely has plenty more related information, such as sql interview questions, sql interview experiences, and details about various sql job positions. Click here to check it out.
Tags
- SQL
- SQL interview questions
- SQL joins
- SQL queries
- INNER JOIN
- LEFT JOIN
- RIGHT JOIN
- FULL JOIN
- Subqueries
- SQL normalization
- SQL denormalization
- Primary key
- Foreign key
- GROUP BY
- HAVING vs WHERE
- SQL views
- SQL indexes
- DENSE RANK
- ROW NUMBER
- SQL DISTINCT
- SQL SELECT INTO
- CHAR vs VARCHAR
- NULL values in SQL
- SQL performance optimization
- SQL aggregate functions
- SQL database design
- SQL DELETE vs TRUNCATE
- SQL DISTINCT
- SQL query optimization