Most Frequently asked sql Interview Questions (2024)

author image Hirely
at 25 Dec, 2024

Question: What is SQL?

Answer:

SQL (Structured Query Language) is a standardized programming language used for managing and manipulating relational databases. It is primarily used to interact with databases by querying, updating, inserting, and deleting data. SQL is essential for tasks related to database management and is widely used in various industries for backend development, data analysis, and more.

Key Features of SQL:

  • Database Management: SQL is used to create, manage, and administer relational databases.
  • Data Manipulation: SQL allows you to query data (retrieving specific information), update existing data, insert new data, and delete data.
  • Data Definition: It is used to define the structure of data in a database, including creating tables, defining columns, data types, and establishing relationships between tables.
  • Data Integrity and Security: SQL can enforce rules to ensure data integrity (such as constraints) and control access to the database using user roles and permissions.

Common SQL Commands:

  1. SELECT: Retrieves data from one or more tables.

    SELECT * FROM Employees WHERE Age > 30;

    This command retrieves all columns from the “Employees” table where the age is greater than 30.

  2. INSERT: Adds new data to a table.

    INSERT INTO Employees (Name, Age, Department)
    VALUES ('John Doe', 28, 'Sales');
  3. UPDATE: Modifies existing data in a table.

    UPDATE Employees SET Age = 29 WHERE Name = 'John Doe';
  4. DELETE: Removes data from a table.

    DELETE FROM Employees WHERE Name = 'John Doe';
  5. CREATE: Defines new tables, databases, views, indexes, etc.

    CREATE TABLE Employees (
        ID INT PRIMARY KEY,
        Name VARCHAR(100),
        Age INT,
        Department VARCHAR(50)
    );
  6. ALTER: Modifies the structure of an existing table (e.g., adding/removing columns).

    ALTER TABLE Employees ADD Salary DECIMAL(10, 2);
  7. DROP: Deletes a table or database.

    DROP TABLE Employees;
  8. JOIN: Combines rows from two or more tables based on a related column.

    SELECT Employees.Name, Departments.DepartmentName
    FROM Employees
    JOIN Departments ON Employees.DepartmentID = Departments.ID;
  9. WHERE: Filters the records that meet certain conditions.

    SELECT * FROM Employees WHERE Age > 30;
  10. ORDER BY: Sorts the result set based on one or more columns.

    SELECT * FROM Employees ORDER BY Age DESC;
  11. GROUP BY: Groups records that have the same values in specified columns, often used with aggregate functions like COUNT, SUM, AVG, etc.

    SELECT Department, COUNT(*) AS NumberOfEmployees
    FROM Employees
    GROUP BY Department;

Types of SQL:

  1. DML (Data Manipulation Language):

    • SELECT: Fetch data from tables.
    • INSERT: Add new data.
    • UPDATE: Modify existing data.
    • DELETE: Remove data.
  2. DDL (Data Definition Language):

    • CREATE: Create new database objects like tables, views, etc.
    • ALTER: Modify existing database objects.
    • DROP: Delete database objects.
  3. DCL (Data Control Language):

    • GRANT: Provide privileges to users.
    • REVOKE: Remove privileges from users.
  4. TCL (Transaction Control Language):

    • COMMIT: Save the transaction.
    • ROLLBACK: Undo the transaction.
    • SAVEPOINT: Set a savepoint within a transaction.
    • SET TRANSACTION: Configure the transaction behavior.

SQL and Relational Databases:

SQL is designed for relational databases, which store data in tables (rows and columns). Tables can be related to each other via keys (primary keys, foreign keys), and SQL allows you to define these relationships.

Example of SQL Query:

Suppose you have a Customers table:

CustomerIDNameCityAge
1JohnNew York28
2AliceLos Angeles32
3BobChicago40

You can use the following SQL query to retrieve all customers aged 30 or above:

SELECT Name, City FROM Customers WHERE Age >= 30;

Result:

NameCity
AliceLos Angeles
BobChicago

Conclusion:

SQL is a powerful and essential language for managing relational databases. It helps you store, retrieve, update, and manipulate data efficiently. Understanding SQL is fundamental for anyone working with databases, whether for backend development, data analysis, or working with cloud-based database systems.

Question: What is the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN?

Answer:

In SQL, JOIN operations are used to combine rows from two or more tables based on a related column between them. The four most commonly used types of joins are INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN. Here’s a breakdown of each:


1. INNER JOIN:

Definition: The INNER JOIN returns only the rows that have matching values in both tables. If there is no match, the row is excluded from the result set.

Behavior:

  • Combines rows from both tables where there is a match on the specified condition.
  • If there is no match, that row is not included in the result.

Example:

Consider the following tables:

  • Employees table:

    IDNameDepartmentID
    1John101
    2Alice102
    3Bob103
  • Departments table:

    DepartmentIDDepartmentName
    101HR
    102IT
    104Marketing

Query:

SELECT Employees.Name, Departments.DepartmentName
FROM Employees
INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

Result:

NameDepartmentName
JohnHR
AliceIT

Explanation: Only rows where a matching DepartmentID exists in both tables are returned. Bob does not appear because there is no matching department for DepartmentID 103 in the Departments table.


2. LEFT JOIN (or LEFT OUTER JOIN):

Definition: The LEFT JOIN returns all rows from the left table (the first table in the query), and the matched rows from the right table (second table). If there is no match, the result will contain NULL for columns from the right table.

Behavior:

  • All rows from the left table are included in the result.
  • If a row in the left table has no corresponding match in the right table, the right table’s columns will be NULL.

Example:

Using the same tables:

SELECT Employees.Name, Departments.DepartmentName
FROM Employees
LEFT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

Result:

NameDepartmentName
JohnHR
AliceIT
BobNULL

Explanation: All employees are listed, including Bob who does not have a department, so the DepartmentName is NULL.


3. RIGHT JOIN (or RIGHT OUTER JOIN):

Definition: The RIGHT JOIN returns all rows from the right table (the second table in the query), and the matched rows from the left table (the first table). If there is no match, the result will contain NULL for columns from the left table.

Behavior:

  • All rows from the right table are included in the result.
  • If a row in the right table has no corresponding match in the left table, the left table’s columns will be NULL.

Example:

SELECT Employees.Name, Departments.DepartmentName
FROM Employees
RIGHT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

Result:

NameDepartmentName
JohnHR
AliceIT
NULLMarketing

Explanation: All departments are listed, including Marketing, which has no employee assigned to it. The Name for Marketing is NULL.


4. FULL JOIN (or FULL OUTER JOIN):

Definition: The FULL JOIN (or FULL OUTER JOIN) returns all rows when there is a match in either the left or the right table. It returns NULL on the side where there is no match.

Behavior:

  • Returns all rows from both tables.
  • If there is no match between the tables, NULL values are returned for the columns of the table without a match.

Example:

SELECT Employees.Name, Departments.DepartmentName
FROM Employees
FULL JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

Result:

NameDepartmentName
JohnHR
AliceIT
BobNULL
NULLMarketing

Explanation: The result includes all employees and all departments. Since Bob does not belong to any department, the DepartmentName for his row is NULL. Similarly, the Marketing department has no employees, so the Name is NULL.


Summary of Differences:

TypeRows from Left TableRows from Right TableMatching RowsNon-Matching Rows
INNER JOINOnly matching rowsOnly matching rowsYesNo
LEFT JOINAll rows from leftMatching rows or NULLYesNo
RIGHT JOINMatching rows or NULLAll rows from rightYesNo
FULL JOINAll rows from leftAll rows from rightYesYes (NULLs for no match)

Conclusion:

  • INNER JOIN: Only the matching rows from both tables.
  • LEFT JOIN: All rows from the left table, with matching rows from the right table; NULL for non-matching right table rows.
  • RIGHT JOIN: All rows from the right table, with matching rows from the left table; NULL for non-matching left table rows.
  • FULL JOIN: All rows from both tables, with NULL for non-matching rows.

Each type of join serves different purposes depending on the desired result from the relationship between the tables.

Question: Write a SQL query to find the Nth highest salary from a table.

Answer:

To find the Nth highest salary from a table, you can use the following approaches, depending on the SQL dialect you’re working with. Below is a common approach using a subquery and the LIMIT/OFFSET clause (for MySQL) or ROW_NUMBER() function (for SQL Server, PostgreSQL, or Oracle).


1. Using LIMIT and OFFSET (for MySQL)

Assuming you have a table called Employees with columns id and salary, and you want to find the Nth highest salary:

SELECT DISTINCT salary
FROM Employees
ORDER BY salary DESC
LIMIT 1 OFFSET N-1;

Explanation:

  • ORDER BY salary DESC: Sorts the salaries in descending order (highest to lowest).
  • LIMIT 1 OFFSET N-1: Fetches the Nth salary, where N-1 is the number of rows to skip. The LIMIT 1 ensures only one result is returned.

For example, to find the 3rd highest salary, you would use OFFSET 2.


2. Using ROW_NUMBER() (for SQL Server, PostgreSQL, Oracle)

If you’re using a database that supports window functions (like SQL Server, PostgreSQL, Oracle), you can use the ROW_NUMBER() function to assign a rank to each row and then filter out the row corresponding to the Nth highest salary.

WITH RankedSalaries AS (
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank
    FROM Employees
)
SELECT salary
FROM RankedSalaries
WHERE rank = N;

Explanation:

  • ROW_NUMBER() OVER (ORDER BY salary DESC): Assigns a unique number to each salary based on its ranking (highest to lowest).
  • WITH RankedSalaries AS (...): Creates a common table expression (CTE) that calculates the rank of each salary.
  • WHERE rank = N: Filters out the row where the rank is equal to N.

For example, to find the 3rd highest salary, you would use WHERE rank = 3.


3. Using DENSE_RANK() or RANK() (for SQL Server, PostgreSQL, Oracle)

If you want to handle cases where there are ties (e.g., two employees having the same salary), you can use DENSE_RANK() or RANK() instead of ROW_NUMBER().

WITH RankedSalaries AS (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
    FROM Employees
)
SELECT salary
FROM RankedSalaries
WHERE rank = N;

Explanation:

  • DENSE_RANK() OVER (ORDER BY salary DESC): This assigns ranks in such a way that if two employees have the same salary, they will receive the same rank (i.e., no gaps between ranks).

4. Using Subquery (for all SQL databases)

A more general approach that works across various SQL databases, including MySQL, SQL Server, and others:

SELECT MAX(salary) AS NthHighestSalary
FROM Employees
WHERE salary NOT IN (
    SELECT salary
    FROM Employees
    ORDER BY salary DESC
    LIMIT N-1
);

Explanation:

  • The subquery returns the top N-1 salaries from the Employees table, and the outer query selects the maximum salary from the remaining salaries (which corresponds to the Nth highest salary).

Example Query for Finding the 3rd Highest Salary:

For MySQL:

SELECT DISTINCT salary
FROM Employees
ORDER BY salary DESC
LIMIT 1 OFFSET 2;

For SQL Server, PostgreSQL, Oracle (using ROW_NUMBER()):

WITH RankedSalaries AS (
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank
    FROM Employees
)
SELECT salary
FROM RankedSalaries
WHERE rank = 3;

For Handling Ties with DENSE_RANK() (SQL Server, PostgreSQL, Oracle):

WITH RankedSalaries AS (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
    FROM Employees
)
SELECT salary
FROM RankedSalaries
WHERE rank = 3;

Conclusion:

  • MySQL: Use LIMIT and OFFSET.
  • SQL Server, PostgreSQL, Oracle: Use ROW_NUMBER() or DENSE_RANK() to handle ranking.
  • General approach: Use a subquery with NOT IN to exclude the first N-1 salaries and get the maximum from the remaining ones.

This should work for most use cases when you need to find the Nth highest salary from a table.

Question: What are subqueries in SQL?

Answer:

A subquery in SQL is a query nested inside another query. It can be used to perform operations that would require multiple steps in a single query, allowing for more complex queries and logic. Subqueries can return a single value, a list of values, or a table of results, and they are often used with comparison operators (such as =, <, >, IN, EXISTS, etc.) or in the FROM, SELECT, WHERE, or HAVING clauses.


Types of Subqueries:

  1. Single-Row Subquery:

    • Returns a single value (a single row and column).
    • Commonly used with comparison operators like =, >, <, >=, <=, or <>.

    Example:

    SELECT employee_name, salary
    FROM employees
    WHERE salary = (SELECT MAX(salary) FROM employees);

    Explanation: The inner query (SELECT MAX(salary) FROM employees) returns the highest salary, and the outer query retrieves the employee with that salary.

  2. Multiple-Row Subquery:

    • Returns multiple rows but a single column.
    • Commonly used with operators like IN or NOT IN.

    Example:

    SELECT employee_name, department
    FROM employees
    WHERE department IN (SELECT department FROM employees WHERE salary > 50000);

    Explanation: The inner query returns a list of departments where employees have a salary greater than 50,000, and the outer query selects employees who belong to those departments.

  3. Multiple-Column Subquery:

    • Returns multiple rows and multiple columns.
    • Can be used with operators like =, IN, etc., when comparing multiple values.

    Example:

    SELECT employee_name, department, salary
    FROM employees
    WHERE (department, salary) IN (SELECT department, salary FROM employees WHERE experience > 5);

    Explanation: The inner query returns a set of (department, salary) pairs where employees have more than 5 years of experience, and the outer query retrieves the employees with the same department and salary.

  4. Correlated Subquery:

    • A correlated subquery is a subquery that refers to columns in the outer query. Unlike a regular subquery, which is executed once, a correlated subquery is evaluated once for each row processed by the outer query.

    Example:

    SELECT employee_name, salary
    FROM employees outer
    WHERE salary > (SELECT AVG(salary) FROM employees inner WHERE outer.department = inner.department);

    Explanation: The inner query is correlated with the outer query by the department column. For each employee, the average salary for their department is calculated, and only employees with a salary greater than that average are selected.


Subquery Placement:

  1. Subquery in the SELECT Clause:

    • Can be used to retrieve additional computed columns for each row.

    Example:

    SELECT employee_name, 
           (SELECT department_name FROM departments WHERE departments.department_id = employees.department_id) AS department_name
    FROM employees;

    Explanation: This subquery returns the department name for each employee by matching the department_id with the departments table.

  2. Subquery in the FROM Clause:

    • Treats the subquery as a derived table (or inline view), which can be joined with other tables or used in further queries.

    Example:

    SELECT dept.department_name, emp.employee_name
    FROM (SELECT department_id, department_name FROM departments) AS dept
    JOIN employees AS emp ON dept.department_id = emp.department_id;

    Explanation: The subquery in the FROM clause acts as a derived table, providing a list of departments that is then joined with the employees table.

  3. Subquery in the WHERE Clause:

    • Commonly used to filter results based on the results of another query.

    Example:

    SELECT employee_name, salary
    FROM employees
    WHERE department_id = (SELECT department_id FROM departments WHERE department_name = 'Sales');

    Explanation: The inner query returns the department_id for the ‘Sales’ department, which is then used to filter the employees in that department.


Important Notes about Subqueries:

  1. Efficiency:

    • Subqueries can sometimes result in performance issues if not written efficiently, especially with correlated subqueries, as the inner query is executed multiple times for each row of the outer query.
  2. Nesting Subqueries:

    • Subqueries can be nested within other subqueries. However, deep nesting can make queries harder to maintain and can impact performance.
  3. Existence of Data:

    • The EXISTS and NOT EXISTS operators are often used with subqueries to check for the existence of rows returned by the inner query.

    Example:

    SELECT employee_name
    FROM employees e
    WHERE EXISTS (SELECT 1 FROM projects p WHERE p.employee_id = e.employee_id);

    Explanation: The query returns employees who are associated with at least one project.

  4. Subquery with IN:

    • The IN operator is used when the subquery returns a list of values to compare against.

    Example:

    SELECT employee_name
    FROM employees
    WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York');

    Explanation: This query returns employees working in departments located in New York.


Conclusion:

Subqueries are a powerful SQL feature that allows for more complex queries. They help in filtering data, performing calculations, and creating temporary result sets without the need for multiple queries. There are different types of subqueries, and understanding when and how to use them can significantly improve the flexibility and readability of your SQL queries.

Question: What is the difference between DELETE and TRUNCATE?

Answer:

Both DELETE and TRUNCATE are used to remove data from a table in SQL, but they differ in how they operate and the effect they have on the table structure, performance, and transaction logging.

1. DELETE:

  • Definition: The DELETE statement is used to delete specific rows from a table based on a condition.
  • Syntax:
    DELETE FROM table_name WHERE condition;
  • Scope: You can specify a WHERE clause to delete specific rows. If no WHERE clause is provided, all rows will be deleted.
  • Transaction Logging: The DELETE operation is fully logged in the transaction log, meaning each row deleted is logged individually.
  • Performance: Since each row deletion is logged, DELETE can be slower compared to TRUNCATE, especially for large tables.
  • Triggers: DELETE can activate triggers (e.g., AFTER DELETE triggers).
  • Rollback: Since DELETE is part of a transaction, it can be rolled back if wrapped in a transaction.
  • Table Structure: It does not affect the structure of the table. The table structure, indexes, and constraints remain unchanged.
  • Example:
    DELETE FROM employees WHERE department = 'Sales';

2. TRUNCATE:

  • Definition: The TRUNCATE statement is used to remove all rows from a table, but unlike DELETE, it does not log individual row deletions.
  • Syntax:
    TRUNCATE TABLE table_name;
  • Scope: TRUNCATE removes all rows in the table and cannot be used with a WHERE clause. It is a bulk operation, and all rows are deleted without any condition.
  • Transaction Logging: TRUNCATE is a minimal logging operation. It logs the deallocation of the data pages used by the table rather than logging each row deletion.
  • Performance: TRUNCATE is typically faster than DELETE because it does not log each row individually and does not fire triggers. It is especially faster for large tables.
  • Triggers: TRUNCATE does not activate DELETE triggers because it does not operate on individual rows.
  • Rollback: Like DELETE, TRUNCATE can be rolled back if it is part of a transaction. However, it is often faster than DELETE due to reduced logging.
  • Table Structure: It does not affect the structure of the table. The table, its structure, and constraints remain intact. However, depending on the database, TRUNCATE may reset the identity column (if present).
  • Example:
    TRUNCATE TABLE employees;

Key Differences:

FeatureDELETETRUNCATE
Rows AffectedCan delete specific rows based on a condition.Deletes all rows from the table.
Where ClauseCan use a WHERE clause to filter rows.Cannot use a WHERE clause.
Transaction LoggingFully logs each row deletion.Minimally logs deallocation of data pages.
PerformanceSlower, especially for large tables.Faster for large tables, due to minimal logging.
TriggersFires triggers (e.g., AFTER DELETE triggers).Does not fire triggers.
RollbackCan be rolled back within a transaction.Can be rolled back within a transaction, but faster.
Table StructureDoes not affect the structure (indexes, constraints).Does not affect the structure but may reset the identity column (depending on DB).
LockingMay lock rows or the entire table.Locks the entire table.

Summary:

  • DELETE is used when you want to selectively delete rows based on conditions and requires more transaction log space due to individual row deletions.
  • TRUNCATE is a more efficient, bulk operation used to remove all rows from a table quickly, but it is limited to clearing the entire table and does not support conditions or row-based operations.

Question: What is normalization and denormalization in SQL?

Answer:

1. Normalization:

Normalization is the process of organizing data in a relational database to reduce redundancy and dependency by dividing large tables into smaller ones. The goal of normalization is to ensure that the database structure is efficient, with each piece of data being stored only once. This process involves applying a set of rules called normal forms to a database schema. Each “normal form” builds on the previous one, and the further you normalize, the more complex the structure becomes but also the more efficient it is in terms of storage and data integrity.

Key Objectives of Normalization:

  • Eliminate redundancy: Avoid duplicate data in the database.
  • Ensure data integrity: Enforce logical relationships and dependencies between different pieces of data.
  • Reduce anomalies: Prevent anomalies (insertion, update, and deletion anomalies) that could occur when data is stored in a less structured format.

Normal Forms:

  • 1st Normal Form (1NF): Ensures that each column contains atomic (indivisible) values, and each record is unique.
  • 2nd Normal Form (2NF): Meets 1NF and also eliminates partial dependency, ensuring that every non-key column depends on the whole primary key.
  • 3rd Normal Form (3NF): Meets 2NF and removes transitive dependency, i.e., non-key columns depend only on the primary key.
  • Boyce-Codd Normal Form (BCNF): A stricter version of 3NF, removing any dependencies where a non-prime attribute is functionally dependent on another non-prime attribute.

Benefits of Normalization:

  • Reduced redundancy (avoiding storing the same data in multiple places).
  • Improved data integrity (less chance for inconsistent data).
  • Better storage utilization (smaller tables and more efficient data storage).
  • Avoidance of update, insert, and delete anomalies.

Example of Normalization: Suppose you have a table of Orders with redundant customer data.

OrderIDCustomerNameCustomerAddressProductNameQuantity
1John Doe123 Elm StWidget A10
2Jane Smith456 Oak StWidget B5
3John Doe123 Elm StWidget C7

After normalization, you might split it into two tables:

Customers Table:

CustomerIDCustomerNameCustomerAddress
1John Doe123 Elm St
2Jane Smith456 Oak St

Orders Table:

OrderIDCustomerIDProductNameQuantity
11Widget A10
22Widget B5
31Widget C7

2. Denormalization:

Denormalization is the process of deliberately introducing redundancy into a database by combining normalized tables or copying data across multiple places. This is done to improve read performance and simplify complex queries, especially in reporting or analytical applications where speed is more important than minimizing redundancy.

While normalization reduces redundancy and improves data integrity, denormalization can reduce the need for complex joins in queries, thereby improving query performance in certain cases. However, denormalization also comes with trade-offs, such as the potential for data anomalies and more complex data updates.

Reasons for Denormalization:

  • Performance improvement: Denormalization can speed up read-heavy operations by reducing the need for multiple joins, especially when large amounts of data are being accessed frequently.
  • Simplified queries: It can simplify complex queries or aggregations by including all required data in one place.
  • Optimized for reporting/analytics: Denormalized data is often used in data warehouses where read performance is prioritized over data integrity.

Trade-offs of Denormalization:

  • Redundancy: The same data might be stored in multiple places, which can lead to data inconsistencies.
  • Update anomalies: If data is duplicated, updates might need to be made in multiple places, leading to potential inconsistencies.
  • Higher storage costs: Since data is repeated, more storage space is needed.

Example of Denormalization: For the previous example, denormalizing the Orders and Customers table into a single Orders table:

OrderIDCustomerNameCustomerAddressProductNameQuantity
1John Doe123 Elm StWidget A10
2Jane Smith456 Oak StWidget B5
3John Doe123 Elm StWidget C7

Here, you’ve combined customer data with order data, reducing the need for joins but introducing redundancy.


Key Differences Between Normalization and Denormalization:

AspectNormalizationDenormalization
PurposeReduces redundancy and ensures data integrity.Improves query performance and simplifies data retrieval.
Impact on DataAvoids data redundancy.Introduces data redundancy.
ComplexityIncreases complexity of queries (joins required).Reduces complexity of queries (fewer joins).
Data IntegrityEnsures high data integrity.May lead to data inconsistencies and anomalies.
PerformanceSlower performance for read operations (joins).Faster performance for read-heavy queries.
Update AnomaliesAvoids update anomalies.May lead to update, insert, and delete anomalies.
StorageMore efficient in terms of storage (less duplication).May use more storage due to redundancy.
Use CaseOLTP (Online Transaction Processing) systems where data consistency is key.OLAP (Online Analytical Processing) systems, reporting, and data warehouses.

Summary:

  • Normalization is about organizing data to minimize redundancy and avoid anomalies, while Denormalization involves intentionally adding redundancy to optimize read performance.
  • Both have their place depending on the use case. Normalization is typically used in transactional systems, while denormalization is more common in systems focused on query performance, such as data warehouses.

Question: What is a primary key and a foreign key in SQL?

Answer:

1. Primary Key:

A primary key is a column or a set of columns in a database table that uniquely identifies each row in that table. The primary key ensures that no two rows in a table have the same value for the primary key column(s). A primary key must have the following properties:

  • Uniqueness: Each value in the primary key column must be unique. No two rows can have the same primary key value.
  • Non-null: The primary key column cannot contain NULL values. Every row must have a valid value for the primary key.
  • Single and Composite: A primary key can consist of a single column or multiple columns. A composite primary key is created when more than one column is used together to form a unique identifier.
  • Indexing: Most databases automatically create an index on the primary key, which helps in optimizing search and retrieval operations.

Example of a Primary Key:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,       -- EmployeeID is the primary key
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Department VARCHAR(50)
);

Here, the EmployeeID column is the primary key for the Employees table, ensuring that each EmployeeID value is unique and non-null.

2. Foreign Key:

A foreign key is a column or a set of columns in one table that is used to establish a link between the data in two tables. A foreign key in one table points to the primary key in another table, creating a relationship between the two tables. The foreign key ensures referential integrity by enforcing that every value in the foreign key column(s) must either be NULL or match a value in the referenced table’s primary key or unique column.

  • Referential Integrity: The foreign key enforces referential integrity by ensuring that the value in the foreign key column corresponds to a valid row in the referenced (parent) table.
  • Relationship: A foreign key defines a one-to-many or many-to-one relationship between tables. For example, one employee can belong to one department, but one department can have many employees.

Example of a Foreign Key:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    DepartmentID INT,
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)
);

In this example, the DepartmentID column in the Employees table is a foreign key that references the DepartmentID column in the Departments table. This ensures that every employee is associated with a valid department that exists in the Departments table.

Key Differences Between Primary Key and Foreign Key:

AspectPrimary KeyForeign Key
PurposeUniquely identifies each record in the table.Establishes a relationship between two tables.
UniquenessValues must be unique across the table.Values may not be unique and can be repeated across rows.
NullabilityCannot contain NULL values.Can contain NULL values, unless explicitly restricted.
ReferencesPoints to data within the same table.Points to the primary key of another table (parent table).
IndexingAutomatically indexed by the database.May or may not be indexed by the database.
Relation TypeDefines no relationships, but uniquely identifies records.Defines a relationship (usually one-to-many) between tables.
Enforcement of IntegrityEnforces uniqueness and non-null constraint.Enforces referential integrity between tables.

Example of a Relationship Between Primary and Foreign Keys:

Let’s say we have two tables: Departments and Employees. The Departments table stores information about each department, and the Employees table stores information about employees. Each employee belongs to one department.

Departments Table (Primary Key: DepartmentID):

CREATE TABLE Departments (
    DepartmentID INT PRIMARY KEY,     -- Primary Key
    DepartmentName VARCHAR(50)
);

Employees Table (Foreign Key: DepartmentID):

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,       -- Primary Key
    EmployeeName VARCHAR(50),
    DepartmentID INT,                 -- Foreign Key
    FOREIGN KEY (DepartmentID) REFERENCES Departments(DepartmentID)  -- Foreign Key
);

In this example:

  • DepartmentID in the Departments table is a primary key that uniquely identifies each department.
  • DepartmentID in the Employees table is a foreign key that references the DepartmentID in the Departments table, creating a relationship between the two tables. This ensures that every employee is assigned to a valid department.

Question: What is the concept of GROUP BY in SQL?

Answer:

The GROUP BY clause in SQL is used to group rows that have the same values in specified columns into summary rows, like “total”, “average”, or “count”. This clause is often used in conjunction with aggregate functions such as COUNT(), SUM(), AVG(), MAX(), and MIN() to perform calculations on each group of rows.

The GROUP BY statement is particularly useful when you want to organize and summarize your data based on certain columns and perform aggregate calculations on each group.

Key Points:

  • Grouping Rows: The GROUP BY clause groups rows that have the same values into summary rows.
  • Aggregate Functions: Typically used with aggregate functions like COUNT(), SUM(), AVG(), etc., to perform calculations on each group.
  • Column Selection: When using GROUP BY, you must include in the SELECT clause either the grouped columns or apply aggregate functions to other columns.
  • HAVING Clause: Often used in conjunction with GROUP BY to filter groups based on aggregate values, similar to how the WHERE clause filters rows.

Syntax:

SELECT column1, column2, aggregate_function(column3)
FROM table_name
GROUP BY column1, column2;

Example 1: Basic Usage of GROUP BY

Let’s say we have a Sales table that contains information about sales transactions:

SaleIDProductAmountSaleDate
1ProductA1002024-01-01
2ProductB1502024-01-01
3ProductA2002024-01-02
4ProductB1002024-01-02
5ProductA3002024-01-02

To calculate the total sales amount for each product, you can use the GROUP BY clause like this:

SELECT Product, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Product;

Output:

ProductTotalSales
ProductA600
ProductB250

Explanation:

  • The GROUP BY Product statement groups the rows by the Product column.
  • The SUM(Amount) function calculates the total sales amount for each product.

Example 2: GROUP BY with HAVING

If you want to find products with total sales greater than a certain value, you can use the HAVING clause to filter the groups after aggregation.

SELECT Product, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Product
HAVING SUM(Amount) > 300;

Output:

ProductTotalSales
ProductA600

Explanation:

  • The HAVING clause filters out products where the total sales amount is less than or equal to 300, leaving only ProductA with total sales greater than 300.

Rules and Considerations:

  • Columns in SELECT: In the SELECT clause, you can only include columns that are either part of the GROUP BY clause or are used with aggregate functions. For example, you cannot select a column that isn’t part of the grouping unless it’s aggregated.

  • Order of Clauses: The GROUP BY clause is placed after the WHERE clause (if any) and before the ORDER BY clause (if any).

Example with Multiple Columns:

You can also group by multiple columns. For instance, if you want to calculate the total sales amount for each product on each date:

SELECT Product, SaleDate, SUM(Amount) AS TotalSales
FROM Sales
GROUP BY Product, SaleDate;

Output:

ProductSaleDateTotalSales
ProductA2024-01-01100
ProductB2024-01-01150
ProductA2024-01-02500
ProductB2024-01-02200

Key Takeaways:

  • GROUP BY is used to group rows based on one or more columns.
  • It works with aggregate functions to summarize data.
  • HAVING is used to filter the groups created by GROUP BY.
  • You can group by multiple columns to create more granular groupings.

Question: What are indexes in SQL?

Answer:

An index in SQL is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional space and time spent on updates, inserts, and deletes. Indexes are used to quickly locate and access data without having to scan the entire table. They are particularly useful for improving query performance, especially when dealing with large datasets.

Key Points:

  • Purpose: The primary purpose of an index is to enhance the speed of queries, particularly those involving SELECT statements with WHERE, ORDER BY, and JOIN clauses.
  • Types: There are different types of indexes such as unique indexes, composite indexes, and full-text indexes.
  • Structure: An index is often implemented using a B-tree or hashing, depending on the database system. In a B-tree, data is arranged in a hierarchical manner that allows quick searching.

Syntax for Creating an Index:

CREATE INDEX index_name
ON table_name (column1, column2, ...);

Types of Indexes:

  1. Single-Column Index:

    • Indexes created on a single column.
    • Example: Index on the name column of a Users table.
    CREATE INDEX idx_name ON Users (name);
  2. Composite Index:

    • Indexes created on multiple columns.
    • Useful when queries use multiple columns in the WHERE clause.
    CREATE INDEX idx_name_date ON Orders (name, order_date);
  3. Unique Index:

    • Ensures that the indexed columns have unique values.
    • Often used for primary and unique keys.
    CREATE UNIQUE INDEX idx_email ON Users (email);
  4. Full-Text Index:

    • Used for indexing large text fields for full-text searches (commonly used with TEXT or VARCHAR fields).
    • Not supported by all database systems but often used in search engines.
    CREATE FULLTEXT INDEX idx_fulltext ON Articles (content);
  5. Primary Key and Foreign Key Indexes:

    • Automatically created when defining a primary or foreign key on a table.
    • A primary key automatically creates a unique index on the primary key column(s), and a foreign key often creates an index to speed up lookups in the parent table.

How Indexes Work:

  • When you create an index, the database creates an auxiliary data structure (usually a B-tree or hash table) to store pointers to the rows in the table.
  • The database uses this index to quickly locate the rows that match the query conditions, avoiding a full table scan.
  • For example, if you have an index on the name column in the Users table, and you search for a specific name, the database uses the index to locate the relevant rows quickly.

Example: Index Usage

Consider the following Employees table:

EmployeeIDFirstNameLastNameAgeDepartment
1JohnDoe30HR
2JaneSmith25IT
3MaryJohnson35HR
4JamesBrown40IT

If you frequently search by LastName and Department, creating an index on those columns would speed up queries.

CREATE INDEX idx_lastname_department ON Employees (LastName, Department);

Now, when querying:

SELECT * FROM Employees WHERE LastName = 'Doe' AND Department = 'HR';

The database can use the index to find the relevant row much faster than scanning the entire table.

Advantages of Using Indexes:

  1. Faster Data Retrieval: Indexes significantly improve the speed of data retrieval queries, especially on large tables.
  2. Efficient Sorting: When querying with ORDER BY, indexes can help the database to quickly sort the data without performing a full sort.
  3. Better Performance for Joins: Indexes can improve the performance of JOIN operations by speeding up matching rows between tables.

Disadvantages of Using Indexes:

  1. Increased Storage Space: Indexes consume additional disk space because they store duplicate data (pointers) to the rows.
  2. Slower Writes (INSERT, UPDATE, DELETE): Every time a row is inserted, updated, or deleted, the corresponding indexes must be updated, which can slow down write operations.
  3. Complexity in Maintenance: As the table grows and changes, indexes may need to be rebuilt or reorganized to maintain performance.

Common Use Cases:

  • Primary Key Constraints: Automatically indexed.
  • Foreign Key Constraints: Foreign keys typically benefit from indexing for faster lookups and integrity checks.
  • Frequent Query Filters: Columns frequently used in WHERE clauses benefit from indexing.
  • Frequent Sorting: Columns used in ORDER BY clauses can be indexed for better performance.

Example: Dropping an Index

To remove an index from a table, you can use the DROP INDEX command.

DROP INDEX idx_lastname_department ON Employees;

Key Takeaways:

  • Indexes improve query performance by allowing the database to quickly locate rows based on specific column values.
  • B-tree and hash are common data structures used for indexing.
  • Indexes can be single-column or composite (multiple columns).
  • While indexes enhance read performance, they may slow down write operations and consume additional storage space.

Question: What is the difference between CHAR and VARCHAR in SQL?

Answer:

CHAR and VARCHAR are both data types used to store character strings in SQL. While they may seem similar, there are key differences between them, mainly in how they store data and how much space they consume.

Key Differences:

1. Fixed vs Variable Length:

  • CHAR:
    • Fixed-length data type. The storage size is fixed according to the specified length, even if the actual data is shorter.
    • For example, if you define a column as CHAR(10) and store the string "Hello", it will still use 10 bytes of storage, padding the remaining 5 characters with spaces.
  • VARCHAR:
    • Variable-length data type. It stores only the actual length of the string and does not pad with extra spaces.
    • For example, if you define a column as VARCHAR(10) and store the string "Hello", it will only use 5 bytes of storage (the length of "Hello").

2. Storage Efficiency:

  • CHAR:
    • May waste storage space because it always uses the defined length.
    • Useful for storing strings that are always of a consistent length, like country codes (CHAR(2) for “US”, “IN”, etc.).
  • VARCHAR:
    • More storage-efficient because it only uses the space needed for the string, plus a small amount of overhead to store the length of the string.
    • Ideal for strings of varying lengths, such as names, addresses, or descriptions.

3. Performance:

  • CHAR:
    • Because of its fixed size, CHAR can be faster for retrieving fixed-length strings. However, it can be slower when storing variable-length strings due to the padding.
  • VARCHAR:
    • VARCHAR can be slightly slower for retrieval, as the database needs to handle variable-length data, but it’s generally more efficient in terms of storage and handling strings of varying lengths.

4. Padding Behavior:

  • CHAR:
    • Automatically pads the string with spaces if it’s shorter than the defined length.
    • For example, if CHAR(5) contains "Hi", the database stores it as "Hi " (with 3 trailing spaces).
  • VARCHAR:
    • Does not add padding. It stores only the actual data, so "Hi" will be stored as "Hi" with no trailing spaces.

5. Use Cases:

  • CHAR:
    • Best used for columns that always store data of the same length, such as:
      • Fixed-length codes (e.g., country codes, postal codes, product codes).
      • Binary data (if you’re working with fixed-length binary strings).
  • VARCHAR:
    • Ideal for columns where the length of the data can vary:
      • Names, email addresses, descriptions, addresses, etc.

Example:

  1. Using CHAR:

    CREATE TABLE Users (
      user_id INT PRIMARY KEY,
      country_code CHAR(2)
    );
    • If you store "US" in the country_code column, it takes up 2 bytes. But if you store "IN", it will still take up exactly 2 bytes, with no padding.
  2. Using VARCHAR:

    CREATE TABLE Users (
      user_id INT PRIMARY KEY,
      email VARCHAR(255)
    );
    • If you store "[email protected]" in the email column, it will only take up the exact length of the string (16 bytes in this case).

Summary Table:

FeatureCHARVARCHAR
LengthFixedVariable
StorageAlways uses the specified lengthUses only the actual length + overhead
PaddingPads with spaces if shorterNo padding, stores only actual data
EfficiencyWastes space for shorter stringsMore efficient for variable-length strings
Use CasesFixed-length data (e.g., country codes)Variable-length data (e.g., names, email addresses)

Key Takeaways:

  • Use CHAR when the length of the data is constant and known.
  • Use VARCHAR when the length of the data varies, as it is more efficient in terms of storage.

Question: How do you handle NULL values in SQL?

Answer:

In SQL, NULL represents the absence of a value or an unknown value. It is different from an empty string or zero; NULL simply means that the value is missing or undefined. Handling NULL values correctly is crucial to ensure data integrity and avoid unexpected behavior in queries. Here are several ways to handle NULL values in SQL:

1. Checking for NULL values:

  • IS NULL: Used to check if a value is NULL.
  • IS NOT NULL: Used to check if a value is not NULL.
SELECT * FROM Employees WHERE salary IS NULL;
SELECT * FROM Employees WHERE salary IS NOT NULL;
  • Example: The query SELECT * FROM Employees WHERE salary IS NULL; retrieves all rows where the salary is NULL (i.e., not yet assigned).

2. Handling NULL in SELECT queries:

  • COALESCE(): This function returns the first non-NULL value in a list of expressions. It is useful for substituting a NULL with a default value.

    SELECT COALESCE(salary, 0) FROM Employees;
    • In this example, if salary is NULL, it will return 0 instead.
  • IFNULL() (MySQL) / NVL() (Oracle): Similar to COALESCE(), these functions replace NULL values with a specified value. Different databases have different implementations:

    • In MySQL: IFNULL(expression, replacement_value)
    • In Oracle: NVL(expression, replacement_value)
    SELECT IFNULL(salary, 0) FROM Employees;  -- MySQL
  • NULLIF(): This function compares two expressions. If they are equal, it returns NULL; otherwise, it returns the first expression.

    SELECT NULLIF(salary, 0) FROM Employees;
    • If salary is 0, the result will be NULL; otherwise, it will return the actual salary.

3. NULL in Aggregates:

  • When performing aggregate functions like SUM(), AVG(), COUNT(), MIN(), and MAX(), NULL values are ignored. For example, the SUM() function does not count NULL values, but it will still calculate the sum of non-NULL values.

    SELECT SUM(salary) FROM Employees;
    • If there are NULL values in the salary column, they will be ignored, and only non-NULL salaries will be summed up.

4. Using NULL in INSERT statements:

  • You can insert a NULL value into a column if the column allows NULL values.

    INSERT INTO Employees (name, salary) VALUES ('John Doe', NULL);
    • In this example, the salary is inserted as NULL, assuming that the column allows NULLs.

5. Filtering and Handling NULL values in WHERE clause:

  • NULL values cannot be compared using = or !=. Instead, you must use IS NULL or IS NOT NULL.

    SELECT * FROM Employees WHERE salary = NULL;  -- Incorrect, won't work!
    SELECT * FROM Employees WHERE salary IS NULL; -- Correct
  • To handle NULLs when filtering, you can use IS NULL or IS NOT NULL in the WHERE clause, as shown above.

6. Default Values for NULL:

  • You can set default values for columns to avoid inserting NULL values when no data is provided. This is typically done when creating or altering the table schema.

    CREATE TABLE Employees (
        id INT,
        name VARCHAR(50),
        salary DECIMAL(10, 2) DEFAULT 0.00
    );
    • In this case, if no salary is provided during an insert, it will default to 0.00 instead of NULL.

7. Replacing NULL values in an UPDATE statement:

  • If you want to update NULL values to something else, you can use the UPDATE statement with the IS NULL condition.

    UPDATE Employees SET salary = 0 WHERE salary IS NULL;
    • This query sets all NULL salary values to 0.

8. Handling NULLs in JOINs:

  • In JOIN operations, NULL values are treated specially:

    • INNER JOIN: NULL values from either table are excluded.
    • LEFT JOIN (or RIGHT JOIN): If a NULL value is in the left table, NULLs will appear in the right table columns (and vice versa).
    SELECT E.name, D.department_name
    FROM Employees E
    LEFT JOIN Departments D ON E.department_id = D.department_id;
    • If an employee doesn’t belong to any department, their department_name will be NULL in the result.

9. NULL in SQL Constraints:

  • Columns can be defined to allow NULL or not. If you want to prevent NULL values, you can define a column as NOT NULL.

    CREATE TABLE Employees (
        id INT NOT NULL,
        name VARCHAR(50) NOT NULL,
        salary DECIMAL(10, 2) NULL
    );
    • The salary column can accept NULL values, while id and name cannot be NULL.

Key Takeaways:

  • NULL represents an unknown or missing value and cannot be treated like other values (e.g., zero or empty string).
  • Use IS NULL or IS NOT NULL to check for NULL values.
  • Functions like COALESCE(), IFNULL(), and NVL() help handle NULLs by substituting default values.
  • Aggregate functions ignore NULL values, but you can use COUNT() to include NULL values by counting rows.
  • When inserting, updating, or selecting, carefully handle NULLs to avoid errors and unexpected results.

Question: Write a SQL query to find employees who have not received a salary in the last month.

Answer:

To find employees who have not received a salary in the last month, we need to compare the salary_date (or similar date column) with the current date and filter out employees who received their last salary payment in the past month. Here’s a SQL query to achieve this:

Assuming the following table structure:

  • Employees table with columns: employee_id, name
  • Salaries table with columns: employee_id, salary_amount, salary_date
SELECT E.employee_id, E.name
FROM Employees E
LEFT JOIN Salaries S ON E.employee_id = S.employee_id
WHERE S.salary_date < CURDATE() - INTERVAL 1 MONTH OR S.salary_date IS NULL;

Explanation:

  • LEFT JOIN: We use a LEFT JOIN to ensure that even employees who don’t have any salary records are included in the result.
  • S.salary_date < CURDATE() - INTERVAL 1 MONTH: This condition filters out employees who received their last salary more than a month ago.
  • S.salary_date IS NULL: This condition includes employees who haven’t received any salary at all (i.e., they don’t have any entries in the Salaries table).

This query will return all employees who have not received a salary in the last month or have never received a salary.

Question: Explain the difference between HAVING and WHERE in SQL.

Answer:

Both HAVING and WHERE are used to filter records in SQL, but they are used in different contexts and serve different purposes. Here’s a detailed explanation of their differences:

  1. WHERE Clause:

    • Purpose: The WHERE clause is used to filter records before any grouping is done (i.e., it operates on individual rows in the table).
    • Used with: It can be used with SELECT, INSERT, UPDATE, and DELETE statements.
    • Applies to: Non-aggregated columns (columns that are not the result of an aggregate function like COUNT(), SUM(), etc.).
    • Execution order: It is executed before the GROUP BY clause (if present).

    Example:

    SELECT name, salary
    FROM Employees
    WHERE salary > 50000;
    • This query filters employees who have a salary greater than 50,000 before any grouping or aggregation happens.
  2. HAVING Clause:

    • Purpose: The HAVING clause is used to filter records after the data has been grouped (i.e., it operates on aggregated data, or the result of GROUP BY).
    • Used with: It is used with GROUP BY and is often used in conjunction with aggregate functions like SUM(), COUNT(), AVG(), etc.
    • Applies to: Aggregated data, meaning it filters groups of rows after the grouping has been done.
    • Execution order: It is executed after the GROUP BY clause.

    Example:

    SELECT department, AVG(salary)
    FROM Employees
    GROUP BY department
    HAVING AVG(salary) > 50000;
    • This query first groups employees by their department and then filters the groups where the average salary is greater than 50,000.

Key Differences:

AspectWHEREHAVING
PurposeFilters rows before groupingFilters groups after aggregation
Used WithIndividual rows (non-aggregated columns)Aggregated data (using functions like COUNT(), SUM())
Can Be Used InSELECT, INSERT, UPDATE, DELETESELECT with GROUP BY
Execution OrderExecutes before GROUP BYExecutes after GROUP BY

Example Showing the Difference:

Using WHERE:

SELECT department, salary
FROM Employees
WHERE salary > 50000;

This query filters employees whose salary is greater than 50,000 before any grouping occurs.

Using HAVING with GROUP BY:

SELECT department, AVG(salary)
FROM Employees
GROUP BY department
HAVING AVG(salary) > 50000;

This query first groups employees by department and then filters the departments where the average salary is greater than 50,000. HAVING is used to filter the aggregated result.

Conclusion:

  • Use WHERE to filter individual rows based on column values before aggregation.
  • Use HAVING to filter results after aggregation, typically when using GROUP BY and aggregate functions.

Question: What is a view in SQL?

Answer:

A view in SQL is a virtual table that represents the result of a stored query. It does not store data itself but rather provides a way to simplify complex queries by encapsulating them into a reusable structure. Views allow you to present data from one or more tables in a particular format, and they are treated like regular tables in SQL queries.

Key characteristics of a view:

  1. Virtual Table: A view is a virtual table that consists of a SELECT query. It doesn’t physically store data but retrieves data from underlying tables each time it’s accessed.

  2. Read-Only or Updatable: Views can be either read-only or updatable, depending on how the view is defined and the database’s support. Simple views based on a single table are often updatable, but complex views involving joins, aggregation, or DISTINCT may be read-only.

  3. Encapsulation: Views can encapsulate complex logic and simplify queries for end-users or other applications. They help abstract away the complexity of underlying data structures and simplify query writing.

  4. Security: Views can be used to limit access to certain columns or rows of a table, improving security. For example, you can create a view that only exposes certain columns, restricting access to sensitive data.

  5. Dynamic: The data in a view is dynamic because it is generated based on the underlying tables. If the data in the tables changes, the view reflects the changes immediately when accessed.

Syntax for Creating a View:

CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Example:

Suppose we have a table Employees:

EmployeeIDNameDepartmentSalary
1JohnHR50000
2SarahIT60000
3DavidHR55000

Now, if we want to create a view that shows only the employees in the HR department:

CREATE VIEW HR_Employees AS
SELECT EmployeeID, Name, Salary
FROM Employees
WHERE Department = 'HR';

The view HR_Employees will now present:

EmployeeIDNameSalary
1John50000
3David55000

This view can be queried just like a regular table:

SELECT * FROM HR_Employees;

Advantages of Using Views:

  1. Simplified Queries: Views abstract complex queries and make it easier to retrieve the required data without repeatedly writing complex SQL statements.

  2. Data Security: By using views, you can restrict access to specific columns or rows of a table, providing an extra layer of security.

  3. Data Integrity: Views help maintain consistency by ensuring users work with the same, pre-defined logic for querying data, rather than writing custom queries that might vary across different applications.

Types of Views:

  1. Simple Views: A view based on a single table without complex joins or aggregations.
  2. Complex Views: A view based on multiple tables using joins, groupings, or aggregate functions.

Modifying Data Through Views:

While views are primarily used for querying data, in certain cases, they can be used to update underlying data. However, there are restrictions:

  • Updatable Views: These are views based on a single table where you can perform INSERT, UPDATE, or DELETE operations.
  • Non-Updatable Views: Views based on complex queries like joins, aggregates, or grouped data may not allow direct updates.

Dropping a View:

To remove a view, you can use the DROP VIEW statement:

DROP VIEW view_name;

Conclusion:

  • A view is a virtual table in SQL that is defined by a query and does not store data itself.
  • It simplifies complex queries, improves security, and can abstract away complex table structures.
  • Views can be used to improve maintainability and readability of SQL code by encapsulating repetitive or complex logic.

Question: What is the purpose of DISTINCT in SQL?

Answer:

The DISTINCT keyword in SQL is used to remove duplicate records from the result set of a query. It ensures that the rows returned by the query are unique, meaning that no two rows will be identical.

When you use DISTINCT, SQL will compare the values of all selected columns in the query result and eliminate any duplicate rows, returning only unique combinations of values.

Key points about DISTINCT:

  1. Eliminates Duplicates: DISTINCT ensures that only unique (non-duplicate) rows are returned in the query result.
  2. Applies to All Columns in the SELECT Clause: DISTINCT considers all the columns in the SELECT clause when determining uniqueness. It will return unique combinations of values across all selected columns.
  3. Can Be Used with Multiple Columns: You can use DISTINCT on multiple columns to find distinct combinations of values across those columns.

Syntax:

SELECT DISTINCT column1, column2, ...
FROM table_name
WHERE condition;

Example 1: Using DISTINCT with a Single Column

Suppose you have a table Employees:

EmployeeIDNameDepartment
1JohnHR
2SarahIT
3DavidHR
4MichaelIT

If you want to retrieve a list of unique departments, you can use DISTINCT:

SELECT DISTINCT Department
FROM Employees;

Result:

Department
HR
IT

Here, even though “HR” and “IT” appear multiple times in the table, DISTINCT ensures that they are listed only once in the result.

Example 2: Using DISTINCT with Multiple Columns

Suppose you have a table Sales:

ProductRegionSalesAmount
TVNorth1000
TVSouth1200
TVNorth1000
PhoneNorth800
PhoneSouth900

If you want to retrieve unique combinations of Product and Region (ignoring duplicates), you can do:

SELECT DISTINCT Product, Region
FROM Sales;

Result:

ProductRegion
TVNorth
TVSouth
PhoneNorth
PhoneSouth

In this case, DISTINCT eliminates the duplicate row (TV in North with SalesAmount 1000), returning only unique combinations of Product and Region.

When to Use DISTINCT:

  1. Removing Duplicates: When you need to ensure that the result contains only unique values (e.g., a list of distinct categories or unique combinations of columns).
  2. Aggregating Data: In conjunction with aggregation functions like COUNT(), SUM(), AVG(), etc., to get unique values before performing calculations.

Performance Considerations:

  • Using DISTINCT can be resource-intensive, especially on large tables, because it requires the database to check all rows to remove duplicates.
  • If performance is critical and you know that the data is already unique or you can filter duplicates through other means (like using GROUP BY), then avoid using DISTINCT.

Conclusion:

  • The DISTINCT keyword in SQL is used to eliminate duplicate rows from the result set, ensuring that only unique records are returned.
  • It applies to all selected columns and can be used with one or more columns to find distinct combinations.

Question: Write a SQL query to find the top 3 highest paid employees.

Answer:

To find the top 3 highest paid employees, you can use the ORDER BY clause to sort the employees based on their salary in descending order and then limit the results to the top 3 using the LIMIT (in MySQL, PostgreSQL) or TOP (in SQL Server) clause.

SQL Query (for MySQL, PostgreSQL, SQLite, etc.):

SELECT EmployeeID, Name, Salary
FROM Employees
ORDER BY Salary DESC
LIMIT 3;

Explanation:

  1. SELECT EmployeeID, Name, Salary: This selects the columns for the employee ID, name, and salary.
  2. FROM Employees: Specifies the table from which the data will be retrieved.
  3. ORDER BY Salary DESC: Sorts the result by the Salary column in descending order (highest salary first).
  4. LIMIT 3: Limits the result to the top 3 rows (i.e., the 3 highest-paid employees).

SQL Query (for SQL Server):

SQL Server uses TOP instead of LIMIT to restrict the number of rows returned:

SELECT TOP 3 EmployeeID, Name, Salary
FROM Employees
ORDER BY Salary DESC;

Example Data in Employees Table:

EmployeeIDNameSalary
1John5000
2Sarah6000
3David7000
4Michael8000
5Emma9000

Result (Top 3 Highest Paid Employees):

EmployeeIDNameSalary
5Emma9000
4Michael8000
3David7000

This query will return the top 3 highest paid employees based on their salaries.

Question: What are INNER JOIN and OUTER JOIN in SQL?

Answer:

INNER JOIN and OUTER JOIN are both used in SQL to combine rows from two or more tables based on a related column between them. However, they differ in how they handle matching and non-matching rows.


1. INNER JOIN:

An INNER JOIN returns only the rows where there is a match in both tables based on the specified condition.

  • If a row from the left table does not have a matching row in the right table, it will not appear in the result set.
  • It is the most common type of join.

Syntax:

SELECT columns
FROM table1
INNER JOIN table2 ON table1.column = table2.column;

Example:

Consider two tables, Employees and Departments:

Employees Table:

EmployeeIDNameDepartmentID
1John101
2Sarah102
3David103
4MichaelNULL

Departments Table:

DepartmentIDDepartmentName
101HR
102IT
103Finance

Query:

SELECT Employees.Name, Departments.DepartmentName
FROM Employees
INNER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;

Result:

NameDepartmentName
JohnHR
SarahIT
DavidFinance
  • Michael does not appear because his DepartmentID is NULL, and no matching department exists for him.

2. OUTER JOIN:

An OUTER JOIN returns all rows from at least one table and the matching rows from the other table. If there is no match, the result will contain NULL values for the columns from the table that does not have a match.

There are three types of OUTER JOIN:

  • LEFT OUTER JOIN (or simply LEFT JOIN)
  • RIGHT OUTER JOIN (or simply RIGHT JOIN)
  • FULL OUTER JOIN

2.1 LEFT OUTER JOIN (LEFT JOIN):

A LEFT OUTER JOIN returns all the rows from the left table (table1) and the matching rows from the right table (table2). If there is no match in the right table, the result will contain NULL values for columns from the right table.

Syntax:

SELECT columns
FROM table1
LEFT OUTER JOIN table2 ON table1.column = table2.column;

Example:

SELECT Employees.Name, Departments.DepartmentName
FROM Employees
LEFT OUTER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;

Result:

NameDepartmentName
JohnHR
SarahIT
DavidFinance
MichaelNULL
  • Michael is included, but since there is no matching DepartmentID, the DepartmentName is NULL.

2.2 RIGHT OUTER JOIN (RIGHT JOIN):

A RIGHT OUTER JOIN returns all the rows from the right table (table2) and the matching rows from the left table (table1). If there is no match in the left table, the result will contain NULL values for columns from the left table.

Syntax:

SELECT columns
FROM table1
RIGHT OUTER JOIN table2 ON table1.column = table2.column;

Example:

SELECT Employees.Name, Departments.DepartmentName
FROM Employees
RIGHT OUTER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;

Result:

NameDepartmentName
JohnHR
SarahIT
DavidFinance
NULLMarketing
  • If there are departments without matching employees, those departments are still included in the result with NULL for the employee name.

2.3 FULL OUTER JOIN:

A FULL OUTER JOIN returns all rows when there is a match in one of the tables. If there is no match, the result will contain NULL values for the columns of the table without a match.

Syntax:

SELECT columns
FROM table1
FULL OUTER JOIN table2 ON table1.column = table2.column;

Example:

SELECT Employees.Name, Departments.DepartmentName
FROM Employees
FULL OUTER JOIN Departments
ON Employees.DepartmentID = Departments.DepartmentID;

Result:

NameDepartmentName
JohnHR
SarahIT
DavidFinance
MichaelNULL
NULLMarketing
  • In this example, Michael has no department, and Marketing has no employees, so NULL is shown for the unmatched values.

Summary of Differences:

Join TypeReturnsExample
INNER JOINOnly matching rows from both tablesJohn, Sarah, David
LEFT JOINAll rows from the left table and matching from the right tableJohn, Sarah, David, Michael
RIGHT JOINAll rows from the right table and matching from the left tableJohn, Sarah, David, Marketing
FULL OUTER JOINAll rows from both tables, with NULL for non-matching rowsJohn, Sarah, David, Michael, Marketing

Key Takeaways:

  • INNER JOIN: Returns only the rows where there is a match in both tables.

  • OUTER JOIN: Returns all rows from one or both tables, filling in NULL where there is no match.

    • LEFT JOIN: All rows from the left table and matching rows from the right.
    • RIGHT JOIN: All rows from the right table and matching rows from the left.
    • FULL OUTER JOIN: All rows from both tables, with NULL for unmatched rows.

    Question: What is the SELECT INTO statement in SQL?

Answer:

The SELECT INTO statement in SQL is used to select data from one table and insert it into a new table in a single operation. This is especially useful when you want to create a new table based on the structure and data of an existing table.

The SELECT INTO statement creates a new table in the database and populates it with the selected data. The new table will have the same column definitions (data types) as the columns in the SELECT query.


Syntax:

SELECT column1, column2, ...
INTO new_table
FROM existing_table
WHERE condition;
  • column1, column2, ...: The columns you want to select from the source table.
  • new_table: The name of the new table that will be created.
  • existing_table: The table from which you are selecting the data.
  • condition: An optional condition to filter the rows you want to select (using the WHERE clause).

Key Points:

  1. Creates a new table: The new table (new_table) does not have to exist before executing the SELECT INTO statement. If it does not exist, it is created automatically.
  2. Data types are copied: The new table will have the same data types as the columns of the SELECT query from the existing table.
  3. No indexes or constraints: The SELECT INTO statement does not copy indexes, primary keys, or foreign keys from the original table. Only the table structure and data are copied.
  4. Works in SQL Server: SELECT INTO is commonly used in SQL Server, but some databases like MySQL or PostgreSQL may use a different syntax (such as CREATE TABLE AS SELECT).

Example 1: Basic SELECT INTO to create a new table

Suppose you have a table Employees, and you want to create a new table EmployeeBackup with the same structure and data:

Employees Table:

EmployeeIDNameDepartment
1JohnHR
2SarahIT
3DavidFinance

Query:

SELECT EmployeeID, Name, Department
INTO EmployeeBackup
FROM Employees;

Result:

  • A new table called EmployeeBackup is created with the same structure and data as the Employees table.

EmployeeBackup Table:

EmployeeIDNameDepartment
1JohnHR
2SarahIT
3DavidFinance

Example 2: SELECT INTO with a WHERE condition

You can also use a WHERE clause to filter the rows you want to copy into the new table.

Query:

SELECT EmployeeID, Name, Department
INTO ITDepartmentEmployees
FROM Employees
WHERE Department = 'IT';

Result:

  • A new table ITDepartmentEmployees is created, but it will only contain the employees whose department is IT.

ITDepartmentEmployees Table:

EmployeeIDNameDepartment
2SarahIT

Limitations:

  • Does not copy constraints: The SELECT INTO statement does not copy any primary keys, foreign keys, or indexes from the original table. These need to be added manually if necessary.
  • Cannot be used with existing tables: The SELECT INTO statement creates a new table, so it cannot be used to insert data into an already existing table. For inserting into an existing table, you would use the INSERT INTO statement instead.

Conclusion:

  • The SELECT INTO statement in SQL is useful for creating new tables and inserting data into them based on the result of a query.
  • It is widely used in scenarios where you need to back up data, create temporary tables, or clone a subset of data from one table into another.

Question: What is the ROW_NUMBER() function in SQL?

Answer:

The ROW_NUMBER() function in SQL is a window function that assigns a unique sequential integer to rows within a result set. The numbering starts at 1 for the first row and increments by 1 for each subsequent row in the specified order. It is typically used to assign a “rank” or “row number” to rows in a result set based on a particular order.

Key Points:

  1. Window Function: ROW_NUMBER() is often used in the context of a window, which means that it can operate over a subset of data (called a “window”) defined by the PARTITION BY clause. If no partitioning is needed, it can be used over the entire result set.
  2. Order: The ROW_NUMBER() function requires an ORDER BY clause to determine the sequence in which the row numbers are assigned.
  3. Uniqueness: Each row gets a unique row number within the specified partition (or the entire result set if no partition is defined).

Syntax:

SELECT ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS row_num
FROM table_name;
  • PARTITION BY column1: (Optional) Divides the result set into partitions to which the ROW_NUMBER() function is applied. If omitted, the entire result set is treated as a single partition.
  • ORDER BY column2: Defines the order in which the row numbers are assigned.
  • row_num: The alias for the column that holds the row numbers.

Example 1: Simple ROW_NUMBER() with no partition

Query:

SELECT EmployeeID, Name, Department,
       ROW_NUMBER() OVER (ORDER BY EmployeeID) AS row_num
FROM Employees;

Result:

EmployeeIDNameDepartmentrow_num
1JohnHR1
2SarahIT2
3DavidFinance3
4MikeHR4
  • In this example, the rows are ordered by EmployeeID, and a unique sequential row number is assigned to each row.

Example 2: Using ROW_NUMBER() with PARTITION BY

Query:

SELECT EmployeeID, Name, Department,
       ROW_NUMBER() OVER (PARTITION BY Department ORDER BY EmployeeID) AS row_num
FROM Employees;

Result:

EmployeeIDNameDepartmentrow_num
1JohnHR1
4MikeHR2
2SarahIT1
3DavidFinance1
  • In this example, the ROW_NUMBER() function is applied to each department separately (due to PARTITION BY Department).
  • The row number is assigned to each row within each department, and the order within each department is determined by the EmployeeID.

Use Cases:

  1. Pagination: ROW_NUMBER() is commonly used for paginating query results. For example, if you want to retrieve a specific “page” of records, you can use ROW_NUMBER() to assign row numbers and filter the rows for the desired page.

    Example for pagination:

    SELECT * 
    FROM (
        SELECT EmployeeID, Name, Department,
               ROW_NUMBER() OVER (ORDER BY EmployeeID) AS row_num
        FROM Employees
    ) AS numbered_employees
    WHERE row_num BETWEEN 11 AND 20;
  2. Ranking: It can be used to assign a sequential rank to rows, especially in cases where you need to identify the “top N” rows or a specific rank of rows in a dataset.

    Example for ranking:

    SELECT EmployeeID, Name, Salary,
           ROW_NUMBER() OVER (ORDER BY Salary DESC) AS rank
    FROM Employees;
  3. Identifying duplicates: ROW_NUMBER() can help identify duplicates by partitioning the data and ordering it to check for repeat values.


Limitations:

  • Not suitable for ties: Unlike RANK() or DENSE_RANK(), ROW_NUMBER() does not handle ties. Each row gets a unique number, even if rows have the same values.
  • Performance: Depending on the size of the dataset and the complexity of the query, the ROW_NUMBER() function can impact query performance, especially when partitioning and ordering over large datasets.

Conclusion:

The ROW_NUMBER() function is a versatile SQL function useful for creating unique sequential numbers for rows within a result set. It is particularly useful for tasks like pagination, ranking, or partitioning data for analysis.

Question: How do you fetch only odd or even rows from a table in SQL?

Answer:

In SQL, you can fetch odd or even rows by utilizing the ROW_NUMBER() window function. This function assigns a unique sequential integer to rows within a result set, which can then be used to filter for odd or even rows.

Here’s how you can approach it:

Steps:

  1. Assign Row Numbers: Use the ROW_NUMBER() function to assign a sequential number to each row in the result set.
  2. Filter Rows: After assigning row numbers, use the modulus operator (%) to filter for odd or even rows.

Example 1: Fetching Odd Rows (1st, 3rd, 5th, etc.)

Query:

WITH NumberedRows AS (
    SELECT EmployeeID, Name, Department,
           ROW_NUMBER() OVER (ORDER BY EmployeeID) AS row_num
    FROM Employees
)
SELECT EmployeeID, Name, Department
FROM NumberedRows
WHERE row_num % 2 <> 0;

Explanation:

  • ROW_NUMBER() assigns a unique sequential number to each row ordered by EmployeeID.
  • WHERE row_num % 2 <> 0 filters out the even-numbered rows and keeps only the odd-numbered ones (i.e., rows with odd row numbers).

Example 2: Fetching Even Rows (2nd, 4th, 6th, etc.)

Query:

WITH NumberedRows AS (
    SELECT EmployeeID, Name, Department,
           ROW_NUMBER() OVER (ORDER BY EmployeeID) AS row_num
    FROM Employees
)
SELECT EmployeeID, Name, Department
FROM NumberedRows
WHERE row_num % 2 = 0;

Explanation:

  • ROW_NUMBER() again assigns a sequential number to each row.
  • WHERE row_num % 2 = 0 filters out the odd-numbered rows and keeps only the even-numbered ones (i.e., rows with even row numbers).

Notes:

  • The ROW_NUMBER() function is useful when you need to work with a specific order, such as ordering by a primary key (EmployeeID in this example). You can change the ORDER BY clause inside the ROW_NUMBER() function depending on how you want to order the rows.
  • These queries can be modified to handle partitions (e.g., PARTITION BY Department in case you want to fetch odd or even rows per department).

Conclusion:

By using the ROW_NUMBER() function with the modulus operator (%), you can easily filter and fetch either odd or even rows from a table. This approach provides flexibility for more complex queries where you may want to work with specific rows based on their order.

Read More

If you can’t get enough from this article, Aihirely has plenty more related information, such as sql interview questions, sql interview experiences, and details about various sql job positions. Click here to check it out.

Trace Job opportunities

Hirely, your exclusive interview companion, empowers your competence and facilitates your interviews.

Get Started Now