Top R Interview Questions and Answers for 2025

By Hirely, 08 Jan, 2025

Question: What is the difference between a matrix and a data frame in R?

Answer:

Both matrices and data frames are used to store data in R, but they have distinct characteristics and are used for different purposes. Here’s a breakdown of the differences:


1. Structure

  • Matrix:

    • A matrix is a two-dimensional array with rows and columns in which every element has the same data type (numeric, character, logical, etc.).
    • A matrix is created using the matrix() function.

    Example:

    mat <- matrix(1:9, nrow = 3, ncol = 3)
    print(mat)

    This creates a 3x3 matrix of the integers 1 to 9, filled column by column (R's default).
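
The fill order is easy to get wrong, so here is a minimal sketch contrasting the default column-wise fill with byrow = TRUE. The variable names m_col and m_row are illustrative only.

```r
# Default: values fill the matrix column by column
m_col <- matrix(1:6, nrow = 2)

# byrow = TRUE: values fill the matrix row by row
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)

print(m_col)
print(m_row)

dim(m_col)      # number of rows and columns
typeof(m_col)   # the single storage type shared by all elements
```

Both matrices hold the same six values; only their arrangement differs.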

  • Data Frame:

    • A data frame is a two-dimensional, table-like structure in which each column can hold a different type (numeric, character, factor, etc.), although all values within a single column share one type.
    • Data frames are the usual way to store datasets in R and are created using the data.frame() function.

    Example:

    df <- data.frame(
      Name = c("John", "Alice", "Bob"),
      Age = c(25, 30, 22),
      Score = c(90.5, 85.3, 78.9)
    )
    print(df)

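    This creates a data frame whose columns have different types: Name is character (in R 4.0 and later; earlier versions converted strings to factors by default), while Age and Score are numeric.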


2. Homogeneity of Data

  • Matrix:

    • All elements in a matrix must be of the same data type. If you attempt to mix data types (for example, numeric and character), R will automatically coerce all elements into the most general type (e.g., converting all to character type).

    Example:

    mat <- matrix(c(1, "a", 3, 4), nrow = 2, ncol = 2)
    print(mat)

    Output:

         [,1] [,2]
    [1,] "1"  "3" 
    [2,] "a"  "4" 

    All of the numeric values (1, 3, and 4) are converted to character strings because one element of the matrix, "a", is a character.
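
The same coercion applies to other type combinations. The sketch below, using illustrative variable names, shows the usual order of coercion (logical, then integer, then double, then character) as reported by typeof():

```r
# Mixing types in c() coerces everything to the most general type present
t1 <- typeof(c(TRUE, 2L))    # logical + integer
t2 <- typeof(c(2L, 3.5))     # integer + double
t3 <- typeof(c(3.5, "a"))    # double + character

print(c(t1, t2, t3))

# The same rule applies when the vector is turned into a matrix:
# TRUE becomes the number 1 because the other elements are doubles
m <- matrix(c(TRUE, 2, 3.5, 4), nrow = 2)
print(m)
typeof(m)
```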

  • Data Frame:

    • Each column in a data frame can contain different types of data (e.g., numeric, character, factor), making data frames more flexible than matrices when dealing with real-world data.

    Example:

    df <- data.frame(
      ID = 1:3,
      Name = c("Alice", "Bob", "Charlie"),
      Age = c(25, 30, 22)
    )
    print(df)

    Output:

      ID    Name Age
    1  1   Alice  25
    2  2     Bob  30
    3  3 Charlie  22

    Here, the columns have different types: ID is integer, Name is character, and Age is numeric (double).
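
One way to confirm the type of each column is to apply typeof() to every column with sapply(). This is a small sketch; stringsAsFactors = FALSE is set explicitly as an assumption, so that the result does not depend on the R version.

```r
df <- data.frame(
  ID = 1:3,
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 22),
  stringsAsFactors = FALSE  # keep strings as character in any R version
)

# A data frame is a list of columns, so sapply() visits one column at a time
col_types <- sapply(df, typeof)
print(col_types)
```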


3. Usage

  • Matrix:

    • Typically used when you need to perform matrix operations such as linear algebra (matrix multiplication, inverse, etc.).
    • It is a mathematical object that is well-suited for mathematical computations where all data is of the same type.

    Example (Matrix multiplication):

    mat1 <- matrix(1:4, nrow = 2, ncol = 2)
    mat2 <- matrix(5:8, nrow = 2, ncol = 2)
    result <- mat1 %*% mat2  # Matrix multiplication
    print(result)
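
As a rough sketch of the other operations mentioned above, the block below contrasts matrix multiplication (%*%) with element-wise multiplication (*) and computes an inverse with solve(). The variable names are illustrative.

```r
mat1 <- matrix(1:4, nrow = 2, ncol = 2)
mat2 <- matrix(5:8, nrow = 2, ncol = 2)

product     <- mat1 %*% mat2   # matrix multiplication (rows times columns)
elementwise <- mat1 * mat2     # multiplies matching positions only
transposed  <- t(mat1)         # swaps rows and columns
inverse     <- solve(mat1)     # matrix inverse (mat1 must be invertible)

print(product)
print(elementwise)
print(mat1 %*% inverse)        # approximately the 2x2 identity matrix
```

Confusing * with %*% is a common mistake, since both run without error on matrices of the same shape.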

  • Data Frame:

    • Primarily used for storing and manipulating data in tabular form.
    • Ideal for use in data analysis, where different types of data (e.g., numeric, categorical) are often mixed in the same dataset.
    • Data frames are also the most common structure used for importing and working with datasets in R.

    Example (Working with data frames):

    df <- data.frame(
      Name = c("John", "Alice", "Bob"),
      Age = c(25, 30, 22),
      Score = c(90.5, 85.3, 78.9)
    )
    summary(df)
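
Beyond summary(), typical manipulation steps include filtering rows, adding a derived column, and sorting. The sketch below uses base R only; the Passed column and the threshold of 80 are made up for illustration.

```r
df <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(25, 30, 22),
  Score = c(90.5, 85.3, 78.9),
  stringsAsFactors = FALSE
)

older      <- df[df$Age > 23, ]        # keep rows where Age exceeds 23
df$Passed  <- df$Score >= 80           # add a logical column (hypothetical cutoff)
sorted     <- df[order(-df$Score), ]   # sort by Score, highest first
mean_score <- mean(df$Score)           # summarise a single column

print(older)
print(sorted)
print(mean_score)
```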

4. Indexing and Accessing Data

  • Matrix:

    • Indexing in a matrix is done using two indices: one for the row and one for the column.

    Example:

    mat <- matrix(1:9, nrow = 3, ncol = 3)
    mat[2, 3]  # Access the element at row 2, column 3

  • Data Frame:

    • Data frames can be accessed similarly using indexing, but you can also reference columns by name.

    Example:

    df <- data.frame(
      Name = c("John", "Alice", "Bob"),
      Age = c(25, 30, 22)
    )
    df[1, 2]  # Access the element at row 1, column 2 (Age)
    df$Name   # Access the "Name" column by name
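
A few more indexing forms are worth knowing. This sketch shows whole-row and whole-column selection on a matrix, and the difference between [[ ]] and [ ] on a data frame; all variable names are illustrative.

```r
mat <- matrix(1:9, nrow = 3, ncol = 3)
row2 <- mat[2, ]                 # entire second row, returned as a vector
col3 <- mat[, 3, drop = FALSE]   # third column, kept as a 3x1 matrix

df <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(25, 30, 22),
  stringsAsFactors = FALSE
)
age_vec   <- df[["Age"]]                     # column as a plain vector
age_df    <- df["Age"]                       # column as a one-column data frame
alice_age <- df[df$Name == "Alice", "Age"]   # filter rows by a condition

print(row2)
print(dim(col3))
print(alice_age)
```

Without drop = FALSE, selecting a single row or column of a matrix returns a plain vector and loses the matrix dimensions.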

5. Efficiency

  • Matrix:

    • A matrix is stored as a single atomic vector with a dim attribute, so operations on large datasets of one type (especially numeric linear algebra) tend to be faster and carry less overhead.

  • Data Frame:

    • A data frame is stored as a list of column vectors plus row names, which adds some memory and speed overhead for purely numeric work. That overhead is the price of allowing a different type in each column.
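
The difference in storage can be inspected directly. The sketch below only checks how each object is stored; actual memory and timing differences depend on the data and the R version, so no specific sizes are claimed here.

```r
mat <- matrix(as.numeric(1:6), nrow = 3)
df  <- as.data.frame(mat)

is.atomic(mat)   # a matrix is one atomic vector with a dim attribute
is.list(df)      # a data frame is a list of columns

length(mat)      # counts elements
length(df)       # counts columns

# Sizes vary by platform and R version, so they are printed rather than assumed
print(object.size(mat))
print(object.size(df))
```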

6. Summary of Differences:

| Feature | Matrix | Data Frame |
| --- | --- | --- |
| Data Type | Homogeneous (all elements must be the same type) | Heterogeneous (each column can have different types) |
| Structure | 2D array with rows and columns | 2D table with rows and columns |
| Use Case | Mathematical operations, matrix algebra | Storing and analyzing data with mixed data types |
| Indexing | Two-dimensional indexing (row, column) | Two-dimensional or column-based indexing (with names) |
| Data Handling | Efficient for numerical data | Flexible for real-world data (numeric, character, factors) |
| Operations | Suited for mathematical operations like matrix multiplication | Suited for data manipulation and analysis |

Summary:

  • A matrix is used when you need to store and manipulate data of the same type (e.g., numeric data) and perform mathematical operations.
  • A data frame is used when you need to work with tabular data that may include different types (numeric, character, factor), making it more suitable for data analysis and statistical operations.

Matrices are ideal for mathematical computations, while data frames are ideal for data analysis, as they allow the storage of diverse data types in a structured format.
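
In practice you often convert between the two structures. The following sketch uses as.data.frame() and as.matrix(); the column names a, b, id, and label are invented for the example. Note that converting a data frame with mixed types produces a character matrix.

```r
# Matrix to data frame: each matrix column becomes a data frame column
mat <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
df_from_mat <- as.data.frame(mat)
print(names(df_from_mat))

# Data frame to matrix: mixed column types are coerced to a single type
mixed <- data.frame(id = 1:2, label = c("x", "y"), stringsAsFactors = FALSE)
m_chr <- as.matrix(mixed)          # becomes a character matrix
m_num <- as.matrix(mixed["id"])    # numeric-only columns stay numeric

typeof(m_chr)
typeof(m_num)
```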

Question: What is the tapply() function in R?

Answer:

The tapply() function in R applies a function to subsets of a vector, where the subsets are defined by the values of a factor or other grouping variable. It belongs to the same family as apply(), lapply(), and sapply(), but it operates on groups of values rather than on rows, columns, or list elements.


Syntax:

tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

Arguments:

  • X: A vector (usually numeric) to which the function will be applied.
  • INDEX: A factor, or a list of factors, each the same length as X, that defines the subsets of X.
  • FUN: The function to be applied to each subset of data.
  • ...: Additional arguments passed to the function FUN.
  • default: The value used for factor combinations that contain no data (NA by default; available in recent versions of R).
  • simplify: If TRUE (default) and FUN returns a single value per group, the result is simplified to a vector or array. If FALSE, the result is returned as a list.
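
To make the simplify argument concrete, here is a small sketch using made-up scores. It also shows that when FUN returns more than one value per group (range() returns two), the result is a list even with the default simplify = TRUE.

```r
scores <- c(85, 92, 78, 88, 95, 77)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))

# Default: one value per group, simplified to a named numeric array
means <- tapply(scores, gender, mean)

# simplify = FALSE: the same values, returned as a list
means_list <- tapply(scores, gender, mean, simplify = FALSE)

# FUN returns two values per group, so the result cannot be simplified
ranges <- tapply(scores, gender, range)

print(means)
print(means_list[["Male"]])
print(ranges[["Female"]])
```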

How does tapply() work?

  • Grouping: It groups the vector X based on the factor(s) in INDEX.
  • Function application: It then applies the function FUN to each subset of data.
  • Return: It returns the result in a simplified form (unless simplify = FALSE, in which case a list is returned).
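
The three steps above can be reproduced by hand with split() and sapply(). This is an illustrative equivalent, not a description of how tapply() is implemented internally.

```r
scores <- c(85, 92, 78, 88, 95, 77)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))

# Step 1: group the vector by the factor (gives a list with one entry per level)
groups <- split(scores, gender)

# Steps 2 and 3: apply the function to each group and simplify the result
manual <- sapply(groups, mean)

# The one-line tapply() call gives the same values
direct <- tapply(scores, gender, mean)

print(groups)
print(manual)
print(direct)
```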

Example 1: Basic Usage of tapply()

Suppose you have a vector of numbers representing scores, and a factor representing two different groups (e.g., male and female).

# Data
scores <- c(85, 92, 78, 88, 95, 77)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))

# Applying tapply to calculate the mean score for each gender
result <- tapply(scores, gender, mean)

print(result)

Output:

  Female     Male 
85.66667 86.00000 

In this example:

  • scores is the numeric vector.
  • gender is the factor that defines the grouping.
  • The function mean is applied to each subset (Male and Female), and the mean score is calculated for each group.
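
Extra arguments for FUN are passed through the ... argument. The sketch below assumes a variant of the data with one missing score (the NA is introduced here for illustration) and passes na.rm = TRUE on to mean().

```r
# Hypothetical variant of the data with one missing score
scores <- c(85, NA, 78, 88, 95, 77)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))

# Without na.rm, any group containing an NA has an NA mean
with_na <- tapply(scores, gender, mean)

# na.rm = TRUE is forwarded to mean() through the ... argument
without_na <- tapply(scores, gender, mean, na.rm = TRUE)

print(with_na)
print(without_na)
```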

Example 2: Using tapply() with Multiple Factors

You can also use tapply() with multiple grouping factors. For example, given a second factor for age group, you can compute a statistic for every combination of the two factors by passing them together as a list.

# Data
scores <- c(85, 92, 78, 88, 95, 77)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))
age_group <- factor(c("Adult", "Adult", "Teen", "Teen", "Adult", "Teen"))

# Applying tapply to calculate mean score for each combination of Gender and Age Group
result <- tapply(scores, list(gender, age_group), mean)

print(result)

Output:

       Adult Teen
Female    92 82.5
Male      90 78.0

In this example:

  • scores is the numeric vector.
  • gender and age_group are the factors that define the groups.
  • The mean score is computed for each combination of gender and age_group.
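
Two details are worth a quick sketch. Naming the list elements labels the dimensions of the result, and a combination of factor levels with no data is filled with NA. The subset of the first three observations is used here only to create such an empty combination.

```r
scores <- c(85, 92, 78, 88, 95, 77)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))
age_group <- factor(c("Adult", "Adult", "Teen", "Teen", "Adult", "Teen"))

# Naming the list elements labels the rows and columns of the result
result_named <- tapply(scores, list(gender = gender, age_group = age_group), mean)
print(result_named)

# Individual cells can be looked up by level name
result_named["Female", "Teen"]

# With only the first three observations there is no Female/Teen data,
# so that cell is filled with NA
partial <- tapply(scores[1:3], list(gender[1:3], age_group[1:3]), mean)
print(partial)
```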

Example 3: Using a Different Function with tapply()

tapply() accepts any function, whether built-in or user-defined. For instance, you might want to calculate the sum of scores for each gender:

# Data
scores <- c(85, 92, 78, 88, 95, 77)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))

# Applying tapply to calculate the sum of scores for each gender
result <- tapply(scores, gender, sum)

print(result)

Output:

Female   Male 
   257    258 

Here, we used sum as the function to apply, so the sum of scores for each gender is calculated.
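
A user-defined function works the same way. As a minimal sketch, the anonymous function below computes the spread (maximum minus minimum) of the scores in each group; the name spread is illustrative.

```r
scores <- c(85, 92, 78, 88, 95, 77)
gender <- factor(c("Male", "Female", "Male", "Female", "Male", "Female"))

# x receives the scores of one group at a time
spread <- tapply(scores, gender, function(x) max(x) - min(x))

print(spread)
```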


Summary of tapply() Usage:

  • tapply() is used to apply a function to subsets of data, grouped by a factor (or multiple factors).
  • It simplifies operations like calculating the mean, sum, or other statistical functions for each group in the data.
  • It returns the result in a simplified format, or as a list if simplify = FALSE.

Common Uses:

  • Calculating aggregate statistics (mean, sum, etc.) by group.
  • Grouping data by categorical variables.
  • Applying custom functions to grouped data.
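
When the data lives in a data frame, tapply() can be called on its columns using with(). The sketch below also shows base R's aggregate() for comparison, which returns a data frame instead of a named array. The column names score and gender are assumptions made for this example.

```r
df <- data.frame(
  score = c(85, 92, 78, 88, 95, 77),
  gender = factor(c("Male", "Female", "Male", "Female", "Male", "Female"))
)

# tapply() on data frame columns: returns a named array
by_group <- with(df, tapply(score, gender, mean))

# aggregate() computes the same group means but returns a data frame
agg <- aggregate(score ~ gender, data = df, FUN = mean)

print(by_group)
print(agg)
```

Which form is more convenient depends on whether the next step expects a vector or a data frame.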
