Popular R Interview Questions

By Hirely, 08 Jan, 2025

Question: What is a data frame in R?

Answer:

A data frame in R is one of the most commonly used data structures for storing and working with data. It is a two-dimensional, tabular structure that stores data in rows and columns, where each column can hold a different type of data (for example numeric, character, or logical). Internally, a data frame is a list of equal-length vectors, one vector per column. It is similar to a spreadsheet or a SQL table, and many of R's data manipulation and modeling functions accept data frames directly.

Key Characteristics of a Data Frame:

  1. Two-Dimensional Structure:

    • A data frame consists of rows and columns, where:
      • Rows represent individual records or observations.
      • Columns represent variables or features.
  2. Heterogeneous Data Types:

    • Each column can contain different data types (e.g., one column might contain numeric values, another might contain character strings, etc.).
    • This makes data frames versatile for handling real-world datasets, where variables of different types need to be stored together.
  3. Column Names:

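    • Each column in a data frame has a name (a label) that is used to refer to it. Column names should be unique: by default, data.frame() uses check.names = TRUE, which adjusts duplicate or syntactically invalid names so that they are unique and valid.
    • names() or colnames() returns the column names as a character vector.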
  4. Row Names and Indexing:

    • Row names: Every data frame has row names; unless you supply them, they are simply the row numbers "1", "2", "3", and so on.
    • Indexing: Rows and columns can be accessed by numeric index, and also by name (column names or row names).
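A quick way to see these characteristics in practice is to inspect a small data frame. The sketch below uses a hypothetical data frame `df` (the column names and values are made up for illustration). It checks that a data frame is stored as a list of column vectors, that each column keeps its own type, what the default row names look like, and how data.frame() adjusts a duplicated column name.

```r
# Hypothetical example data frame with columns of three different types
df <- data.frame(
  id     = 1:3,                  # integer column
  score  = c(9.5, 7.0, 8.25),    # double column
  passed = c(TRUE, FALSE, TRUE)  # logical column
)

typeof(df)         # "list": each column is one element of the list
class(df)          # "data.frame"
sapply(df, class)  # per-column types: "integer" "numeric" "logical"
dim(df)            # 3 rows, 3 columns
rownames(df)       # default row names: "1" "2" "3"

# Duplicate names are made unique by default (check.names = TRUE)
dup <- data.frame(x = 1, x = 2)
names(dup)         # "x" "x.1"
```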

How to Create a Data Frame in R:

You can create a data frame in R using the data.frame() function.

# Example: Creating a simple data frame
data <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(25, 30, 22),
  Gender = c("Male", "Female", "Male")
)

# View the data frame
print(data)

This creates a data frame with 3 columns: Name, Age, and Gender, and 3 rows.

Output:

   Name Age Gender
1  John  25   Male
2 Alice  30 Female
3   Bob  22   Male

Accessing Data in a Data Frame:

  1. Accessing Columns:

    • You can access columns by name or by index.
    data$Age  # Access by column name
    data[["Age"]]  # Alternative way to access by column name
    data[, 2]  # Access by column index (2nd column)
  2. Accessing Rows:

    • You can access specific rows using indices.
    data[1, ]  # Access the first row
    data[2, ]  # Access the second row
  3. Accessing Specific Cells:

    • You can access a specific cell using both row and column indices.
    data[1, 2]  # Access the value in the first row, second column
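A detail worth knowing is what each indexing form returns. The sketch below rebuilds the `data` example from above so it runs on its own, and assumes R 4.0 or later (where character columns are not converted to factors). It shows that extracting a single column returns a plain vector, that `drop = FALSE` keeps the result as a data frame, and that selecting a row returns a one-row data frame.

```r
# Recreate the example data frame so this snippet is self-contained
data <- data.frame(
  Name   = c("John", "Alice", "Bob"),
  Age    = c(25, 30, 22),
  Gender = c("Male", "Female", "Male")
)

age_vec <- data[, 2]                # a numeric vector: 25 30 22
age_df  <- data[, 2, drop = FALSE]  # still a data frame, with one column
row1    <- data[1, ]                # a one-row data frame
cell    <- data[1, 2]               # a single value: 25

# $, [[ ]] and [ , ] all return the same underlying column vector
identical(data$Age, data[["Age"]])  # TRUE
identical(data$Age, age_vec)        # TRUE

is.data.frame(age_df)               # TRUE
is.data.frame(row1)                 # TRUE
```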

Manipulating Data in a Data Frame:

  1. Adding a New Column:

    data$Country <- c("USA", "Canada", "UK")  # Adding a new column
  2. Subsetting Rows Based on Conditions:

    # Select rows where Age is greater than 25
    subset_data <- data[data$Age > 25, ]
  3. Sorting:

    # Sort data by Age (ascending)
    sorted_data <- data[order(data$Age), ]
  4. Removing Columns:

    data$Country <- NULL  # Removes the 'Country' column
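The same operations can also be written with base R's subset() function, which evaluates the condition inside the data frame, so column names need no `data$` prefix. This is a minimal sketch on the example data, assuming R 4.0 or later; the `Country` values are the same illustrative ones used above.

```r
data <- data.frame(
  Name   = c("John", "Alice", "Bob"),
  Age    = c(25, 30, 22),
  Gender = c("Male", "Female", "Male")
)

data$Country <- c("USA", "Canada", "UK")  # add a column

# Rows with Age > 25, keeping only two columns
older <- subset(data, Age > 25, select = c(Name, Age))

# Sort by Age, oldest first
by_age_desc <- data[order(data$Age, decreasing = TRUE), ]

data$Country <- NULL                      # drop the column again

older$Name        # "Alice"
by_age_desc$Name  # "Alice" "John" "Bob"
names(data)       # "Name" "Age" "Gender"
```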

Advantages of Data Frames:

  • Flexibility: They can handle mixed data types in different columns, making them useful for a variety of data analysis tasks.
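  • Data Handling: R has a rich set of functions for manipulating data frames, such as subset(), merge(), aggregate(), and apply(), which makes them a powerful tool for data wrangling. Note that apply() first converts a data frame to a matrix (coercing all columns to one type), so lapply() or sapply() is usually the better choice for column-wise operations.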
  • Compatibility: Base R reads and writes CSV files with read.csv() and write.csv(), and add-on packages such as readxl (Excel files) and DBI (databases) extend this to other external sources.
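A short, illustrative sketch of these functions working together. The `sales` and `managers` tables are hypothetical, and the CSV round trip writes to a temporary file created with tempfile(). It assumes R 4.0 or later, so text columns read back as character rather than factor.

```r
# Hypothetical sales data used only for illustration
sales <- data.frame(
  region = c("North", "South", "North", "South"),
  amount = c(100, 250, 50, 150)
)
managers <- data.frame(
  region  = c("North", "South"),
  manager = c("Ana", "Raj")
)

# aggregate(): total amount per region (North 150, South 400)
totals <- aggregate(amount ~ region, data = sales, FUN = sum)

# merge(): join the totals with the managers table on the shared 'region' column
report <- merge(totals, managers, by = "region")

# sapply() works column by column and keeps each column's own type,
# whereas apply() would first coerce the data frame to a matrix
col_types <- sapply(report, class)

# CSV round trip through a temporary file
path <- tempfile(fileext = ".csv")
write.csv(report, path, row.names = FALSE)
report_back <- read.csv(path)
unlink(path)  # clean up the temporary file
```

Note that read.csv() infers column types from the file, so whole-number amounts come back as integers even though they were doubles when written.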

Comparison with Other R Data Structures:

  • Vectors: A vector is a one-dimensional sequence of elements that all share a single type. Unlike a data frame, a vector cannot hold different types in different positions; combining types with c() coerces every element to a common type.
  • Matrices: A matrix is two-dimensional like a data frame, but all of its elements must share one data type, so it cannot hold heterogeneous columns.
  • Lists: A list in R can hold data of different types, including vectors, matrices, and even data frames. However, unlike a data frame, the elements of a list are not organized in a tabular format. In fact, a data frame is a special kind of list whose elements are equal-length columns.
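The differences above can be checked directly in the console. This sketch uses small made-up values: it shows the type coercion that happens in vectors and matrices, and that a list (including a data frame) keeps each element's own type.

```r
# Vectors: mixing types with c() silently coerces to a common type
v <- c(1, "a", TRUE)
typeof(v)           # "character": all elements became strings

# Matrices: converting a mixed-type data frame gives one common type
df <- data.frame(n = 1:2, s = c("a", "b"))
m <- as.matrix(df)
typeof(m)           # "character"

# Lists: any mix of types is allowed
l <- list(1, "a", TRUE)
sapply(l, typeof)   # "double" "character" "logical"

# A data frame is itself a list of equal-length columns
is.list(df)         # TRUE
```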

Conclusion:

A data frame in R is an essential and highly flexible structure for working with data. It stores heterogeneous data types side by side and is used throughout data manipulation, statistical analysis, and visualization, so understanding how to work with data frames is fundamental to data analysis in R.

Question: What are the different data types in R?

Answer:

R, being a high-level statistical programming language, offers a variety of data types that help in organizing and manipulating data effectively. They fall into three groups: atomic data types, structured data types built from them, and special values and objects that mark missing, empty, or non-finite data. Here’s a detailed overview of the most common ones:


1. Atomic Data Types

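Atomic data types are the simplest kinds of data in R: numeric (double), integer, complex, character, logical, and raw. They are the building blocks of more complex data structures like vectors, matrices, and data frames. R has no separate scalar type, so a single value such as 25.5 is simply a vector of length one.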

(a) Numeric

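  • Definition: Numeric values are real numbers stored as double-precision floating-point values ("double"). A whole-number literal such as 42 is still a double; it only becomes an integer when written with the L suffix.
  • Examples:
    x <- 25.5  # Numeric (floating point)
    y <- 42    # Numeric (stored as a double, not an integer)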
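You can confirm how R stores these values with typeof(). The sketch below also shows a consequence of floating-point storage: some decimal sums are not exact, so numeric comparisons are safer with all.equal(), which uses a small tolerance.

```r
x <- 25.5
y <- 42

typeof(x)       # "double"
typeof(y)       # "double": no L suffix, so not an integer
is.integer(y)   # FALSE
is.numeric(y)   # TRUE

# Floating-point arithmetic is not exact, so compare with a tolerance
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE
```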

(b) Integer

  • Definition: Integer values are whole numbers without a decimal point.
  • Examples:
    x <- 25L  # Integer (Note the 'L' suffix)
    y <- -42L
  • Note: In R, integers are denoted by appending an “L” to the number.
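A few quick checks show how integers behave in arithmetic. Adding two integers keeps the integer type, but the `/` operator always returns a double, so integer division needs `%/%`.

```r
x <- 25L
typeof(x)             # "integer"
class(5L + 2L)        # "integer": integer + integer stays integer
class(5L / 2L)        # "numeric": / always returns a double (2.5)
5L %/% 2L             # 2, integer division
as.integer(3.9)       # 3, truncation toward zero
.Machine$integer.max  # 2147483647, the largest representable integer
```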

(c) Complex

  • Definition: Complex numbers are numbers that have a real and an imaginary part.
  • Examples:
    z <- 2 + 3i  # Complex number (real part = 2, imaginary part = 3)
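Base R provides helper functions for the parts of a complex number. This sketch also shows why taking the square root of a negative number requires converting it to complex first: a plain double such as -1 has no real square root.

```r
z <- 2 + 3i
typeof(z)    # "complex"
Re(z)        # 2, the real part
Im(z)        # 3, the imaginary part
Mod(3 + 4i)  # 5, the modulus sqrt(3^2 + 4^2)
Conj(z)      # 2-3i, the complex conjugate

# sqrt(-1) on a double gives NaN (with a warning); convert to complex first
sqrt(as.complex(-1))  # 0+1i
```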

(d) Character

  • Definition: Character data types are used to store textual data or strings. In R, text is enclosed in either double quotes (" ") or single quotes (' ').
  • Examples:
    name <- "John"
    message <- 'Hello, World!'
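Character values work with base R's string functions. A few common ones are sketched below using the same example strings; paste() separates its arguments with a space by default, while paste0() adds no separator.

```r
name <- "John"
message <- 'Hello, World!'

typeof(name)                    # "character"
nchar(message)                  # 13 characters
toupper(name)                   # "JOHN"
paste("Hi", name)               # "Hi John"
paste0(name, "!")               # "John!"
sprintf("%s is %d", name, 25L)  # "John is 25"
substr(message, 1, 5)           # "Hello"
```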

(e) Logical

  • Definition: Logical values represent TRUE or FALSE. These are often used in logical conditions and decision-making processes.
  • Examples:
    is_active <- TRUE
    is_valid <- FALSE
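Logical values are usually produced by comparisons, and TRUE and FALSE count as 1 and 0 in arithmetic. That makes sum() and mean() handy for counting matches. The ages below are illustrative.

```r
is_active <- TRUE
is_valid  <- FALSE

is_active & is_valid  # FALSE (element-wise AND)
is_active | is_valid  # TRUE  (element-wise OR)
!is_valid             # TRUE

# Comparisons produce logical vectors
ages <- c(25, 30, 22)
over_24 <- ages > 24  # TRUE TRUE FALSE
sum(over_24)          # 2: number of ages above 24
mean(over_24)         # proportion above 24, about 0.667
```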

(f) Raw

  • Definition: The raw data type represents raw bytes (useful in binary data handling). Raw values are typically used for low-level operations and are less commonly used in typical data analysis.
  • Examples:
    x <- as.raw(25)
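Raw values print in hexadecimal, which can be surprising at first. This sketch converts back and forth between raw bytes, integers, and text.

```r
x <- as.raw(25)
x               # 19: raw values print in hexadecimal (25 = 0x19)
typeof(x)       # "raw"
as.integer(x)   # 25

bytes <- charToRaw("Hi")  # 48 69: the codes of "H" and "i" in hex
rawToChar(bytes)          # "Hi"
```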

2. Structured Data Types

These are more complex data structures that allow you to combine atomic data types.

(a) Vectors

  • Definition: A vector is an ordered collection of elements of the same data type (numeric, character, logical, etc.). It is the most basic data structure in R.
  • Examples:
    nums <- c(1, 2, 3, 4)  # Numeric vector
    names <- c("Alice", "Bob", "Charlie")  # Character vector
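Most operations on vectors are vectorized, meaning they apply element by element without an explicit loop. The sketch below also shows 1-based indexing, logical indexing, and recycling: when one vector is shorter, R repeats it to match the longer one.

```r
nums <- c(1, 2, 3, 4)

nums * 2        # 2 4 6 8: applied to every element
nums + c(10, 20)  # 11 22 13 24: the shorter vector is recycled
length(nums)    # 4
nums[2:3]       # 2 3: indexing starts at 1
nums[nums > 2]  # 3 4: logical indexing
```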

(b) Lists

  • Definition: A list is an ordered collection of elements, but unlike vectors, the elements can be of different data types (numeric, character, logical, etc.). Lists can hold other complex structures like vectors, matrices, or even other lists.
  • Examples:
    my_list <- list(1, "Hello", TRUE, c(1, 2, 3))
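The key distinction when working with lists is between `[[ ]]`, which extracts an element, and `[ ]`, which returns a smaller list. Elements can also be named and accessed with `$`. The `person` list below is a hypothetical example.

```r
my_list <- list(1, "Hello", TRUE, c(1, 2, 3))

length(my_list)    # 4 elements
my_list[[2]]       # "Hello": [[ ]] extracts the element itself
my_list[2]         # a list of length 1 containing "Hello"
class(my_list[2])  # "list"

# Elements can be named and accessed with $
person <- list(name = "Alice", age = 30, scores = c(90, 85))
person$scores[1]       # 90
person$city <- "Paris" # add a new element
names(person)          # "name" "age" "scores" "city"
```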

(c) Matrices

  • Definition: A matrix is a two-dimensional array where all elements must be of the same data type. It is a vector with a dim attribute that arranges its elements into rows and columns; by default, the elements are filled column by column.
  • Examples:
    mat <- matrix(1:6, nrow=2, ncol=3)  # 2 rows and 3 columns
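The column-by-column fill order is easiest to see by printing the matrix. This sketch also shows element access, transposition with t(), matrix multiplication with %*%, and the byrow argument for filling row by row.

```r
mat <- matrix(1:6, nrow = 2, ncol = 3)
mat
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

dim(mat)        # 2 3
mat[1, ]        # 1 3 5: values fill column by column
mat[2, 3]       # 6
t(mat)          # the 3 x 2 transpose
mat %*% t(mat)  # 2 x 2 matrix product

byrow_mat <- matrix(1:6, nrow = 2, byrow = TRUE)
byrow_mat[1, ]  # 1 2 3: filled row by row instead
```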

(d) Data Frames

  • Definition: A data frame is a two-dimensional structure that is similar to a matrix, but it allows each column to contain different data types (numeric, character, etc.). It is one of the most commonly used structures in R for handling tabular data.
  • Examples:
    df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

(e) Factors

  • Definition: A factor represents a categorical variable that takes on a limited number of distinct values, called levels. Internally, a factor is an integer vector of codes plus a levels attribute; by default, the levels are sorted alphabetically.
  • Examples:
    gender <- factor(c("Male", "Female", "Male"))
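Inspecting a factor shows its levels and the integer codes behind them. For ordinal data, the level order can be set explicitly with ordered = TRUE, which makes comparisons meaningful. The `sizes` factor below is a hypothetical example.

```r
gender <- factor(c("Male", "Female", "Male"))

levels(gender)      # "Female" "Male": sorted alphabetically by default
as.integer(gender)  # 2 1 2: the underlying integer codes
table(gender)       # counts per level: Female 1, Male 2
nlevels(gender)     # 2

# Set the level order explicitly for ordinal data
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes[1] < sizes[2]  # TRUE: "small" comes before "large"
```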

3. Special Data Types

(a) NULL

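  • Definition: NULL represents the absence of any value or object; it has length 0. It is often returned by functions that have nothing to return, and assigning NULL removes list elements or data frame columns. Unlike NA, it does not mark a missing value inside a vector.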
  • Examples:
    x <- NULL
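The contrast between NULL and NA shows up in length() and in how each behaves when combined with other values. This sketch uses small illustrative values.

```r
x <- NULL
is.null(x)      # TRUE
length(x)       # 0

c(1, NULL, 2)   # 1 2: NULL disappears when combined

lst <- list(a = 1, b = 2)
lst$b <- NULL   # assigning NULL removes the element
names(lst)      # "a"

length(NA)      # 1: unlike NULL, NA occupies a position
```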

(b) NA (Not Available)

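  • Definition: NA is a placeholder for a missing value in a vector or data frame; it keeps the position of the missing element. Most computations involving NA return NA unless you explicitly remove missing values, for example with na.rm = TRUE.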
  • Examples:
    age <- c(25, NA, 30)
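Handling NA mostly comes down to is.na() and the na.rm argument. Note that comparing with `== NA` never works, because any comparison with a missing value is itself missing.

```r
age <- c(25, NA, 30)

is.na(age)               # FALSE TRUE FALSE
mean(age)                # NA: missing values propagate
mean(age, na.rm = TRUE)  # 27.5, computed from the non-missing values
sum(is.na(age))          # 1 missing value
age == NA                # NA NA NA: use is.na() to test for NA
```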

(c) NaN (Not a Number)

  • Definition: NaN is a special value that represents an undefined or unrepresentable number, such as the result of 0/0.
  • Examples:
    x <- 0/0  # Result is NaN
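NaN and NA are related but not identical: is.na() treats NaN as missing, while is.nan() is true only for NaN. Another operation that yields NaN without a warning is subtracting infinity from itself.

```r
x <- 0/0
is.nan(x)    # TRUE
is.na(x)     # TRUE: NaN also counts as missing
is.nan(NA)   # FALSE: NA is not NaN
Inf - Inf    # NaN
```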

(d) Inf (Infinity)

  • Definition: Inf represents positive infinity, and -Inf represents negative infinity. They result from operations such as 1/0, or when a result exceeds the largest representable double.
  • Examples:
    positive_inf <- Inf
    negative_inf <- -Inf
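The sketch below shows typical ways infinite values arise and how to test for them. is.finite() returns FALSE for Inf, -Inf, NA, and NaN, which makes it a convenient filter before numeric work.

```r
1 / 0                          # Inf
-1 / 0                         # -Inf
10^400                         # Inf: too large for a double
is.infinite(c(1, Inf, -Inf))   # FALSE TRUE TRUE
is.finite(c(1, Inf, NA))       # TRUE FALSE FALSE
Inf > 1e308                    # TRUE
```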

Summary of R Data Types:

| Data Type | Description | Example |
| --- | --- | --- |
| Numeric | Real numbers stored as double-precision floating point | 25.5, 42, 3.14 |
| Integer | Whole numbers, written with an L suffix | 25L, -42L |
| Complex | Numbers with real and imaginary parts | 2 + 3i |
| Character | Text or string data | "Hello", 'World' |
| Logical | Boolean values (TRUE/FALSE) | TRUE, FALSE |
| Raw | Raw binary data (rarely used) | as.raw(25) |
| Vector | Ordered collection of elements of the same type | c(1, 2, 3), c('a', 'b', 'c') |
| List | Ordered collection of elements of different types | list(1, 'apple', TRUE) |
| Matrix | Two-dimensional array of a single data type | matrix(1:6, nrow = 2, ncol = 3) |
| Data Frame | Tabular structure whose columns can have different types | data.frame(Name = c("Alice", "Bob"), Age = c(25, 30)) |
| Factor | Categorical data with levels | factor(c('Male', 'Female', 'Male')) |
| NULL | Absence of a value or object | NULL |
| NA | Missing value | NA |
| NaN | Undefined numerical result | NaN |
| Inf | Positive or negative infinity | Inf, -Inf |

Conclusion:

R provides a wide range of atomic and structured data types, which are essential for performing various types of data manipulation, analysis, and modeling. Understanding these data types is crucial for working efficiently in R and performing accurate data analysis tasks.
