R Interview Questions and Answers (New)
Question: What are R’s built-in statistical functions?
Answer:
R is a statistical computing and data analysis language that offers a wide variety of functions for statistical analysis. These functions cover common tasks such as summarizing data, performing hypothesis tests, and fitting models. Here is a list of commonly used statistical functions in R, categorized by their primary use.
1. Descriptive Statistics
These functions are used to summarize or describe the main features of a dataset.
- mean(): Computes the arithmetic mean (average) of a numeric vector.
    mean(x)  # x is a numeric vector
- median(): Computes the median of a numeric vector.
    median(x)
- sd(): Computes the standard deviation of a numeric vector.
    sd(x)
- var(): Computes the variance of a numeric vector.
    var(x)
- summary(): Provides a summary of the main statistics (min, 1st quartile, median, mean, 3rd quartile, max) for a dataset or vector.
    summary(x)
- quantile(): Computes the quantiles (e.g., 25th, 50th, and 75th percentiles) of a numeric vector.
    quantile(x)
- range(): Computes the minimum and maximum values of a vector.
    range(x)
- IQR(): Computes the interquartile range (Q3 - Q1).
    IQR(x)
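The functions above can be tried together on one small vector. This is a minimal sketch; the vector x holds made-up exam scores chosen only for illustration.

```r
# Hypothetical sample: exam scores for nine students
x <- c(52, 61, 67, 70, 74, 78, 83, 88, 95)

mean(x)      # arithmetic mean
median(x)    # middle value of the sorted data
sd(x)        # sample standard deviation (n - 1 denominator)
var(x)       # sample variance, equal to sd(x)^2
range(x)     # smallest and largest value
quantile(x)  # 0%, 25%, 50%, 75%, and 100% by default
IQR(x)       # 75th percentile minus 25th percentile
summary(x)   # min, quartiles, mean, and max in one call
```

Note that sd() and var() return the sample versions, so they divide by n - 1 rather than n.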
2. Probability Distributions
R provides functions for working with common probability distributions (e.g., Normal, Binomial, Poisson).
- Normal Distribution:
  - dnorm(): Probability density function (PDF) for a normal distribution.
  - pnorm(): Cumulative distribution function (CDF) for a normal distribution.
  - qnorm(): Quantile function (inverse CDF) for a normal distribution.
  - rnorm(): Generates random numbers from a normal distribution.
    dnorm(x, mean = 0, sd = 1)  # PDF
    pnorm(q, mean = 0, sd = 1)  # CDF
    qnorm(p, mean = 0, sd = 1)  # Inverse CDF
    rnorm(n, mean = 0, sd = 1)  # Generate random numbers
- Binomial Distribution:
  - dbinom(): Probability mass function (PMF) for the binomial distribution.
  - pbinom(): CDF for the binomial distribution.
  - qbinom(): Quantile function for the binomial distribution.
  - rbinom(): Generates random numbers from a binomial distribution.
    dbinom(x, size, prob)  # PMF
    pbinom(q, size, prob)  # CDF
    qbinom(p, size, prob)  # Quantile function
    rbinom(n, size, prob)  # Random numbers
- Poisson Distribution:
  - dpois(): PMF for the Poisson distribution.
  - ppois(): CDF for the Poisson distribution.
  - qpois(): Quantile function for the Poisson distribution.
  - rpois(): Generates random numbers from a Poisson distribution.
    dpois(x, lambda)  # PMF
    ppois(q, lambda)  # CDF
    qpois(p, lambda)  # Quantile function
    rpois(n, lambda)  # Random numbers
- Other Distributions: Every distribution follows the same d/p/q/r prefix convention (density, cumulative probability, quantile, random generation). Functions for other distributions include dunif(), pexp(), dgamma(), dt(), dbeta(), etc.
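To make the d/p/q/r naming pattern concrete, here is a short sketch with specific numbers. The parameter values (10 coin flips, a rate of 3, and so on) are arbitrary choices for illustration.

```r
# Normal: density, cumulative probability, and quantile
dnorm(0)                           # height of the standard normal density at 0
pnorm(1.96)                        # P(Z <= 1.96), about 0.975
qnorm(0.975)                       # inverse of the line above, about 1.96

# Binomial: 10 fair coin flips
dbinom(5, size = 10, prob = 0.5)   # P(exactly 5 heads) = 252/1024
pbinom(5, size = 10, prob = 0.5)   # P(at most 5 heads) = 638/1024

# Poisson with an average of 3 events per interval
dpois(0, lambda = 3)               # P(no events) = exp(-3)
ppois(2, lambda = 3)               # P(at most 2 events)

# Random draws are reproducible once the seed is fixed
set.seed(42)
draws <- rnorm(5, mean = 100, sd = 15)
length(draws)                      # 5
```

The p and q functions are inverses of each other, so qnorm(pnorm(1.96)) returns 1.96 up to floating-point error.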
3. Hypothesis Testing
R provides a set of functions for hypothesis testing, including tests for means, variances, and proportions.
- t.test(): Performs a t-test to compare the means of two samples, or a sample mean against a hypothesized value. The two-sample form defaults to Welch’s test, which does not assume equal variances.
    t.test(x, y)       # Two-sample t-test
    t.test(x, mu = 0)  # One-sample t-test
- aov(): Fits an analysis of variance (ANOVA) model to compare means across multiple groups.
    aov(formula, data)
- chisq.test(): Performs a chi-squared test for independence or goodness of fit.
    chisq.test(x, y)  # Test for independence (x and y are factors)
    chisq.test(x)     # Goodness of fit (x is a vector of counts)
- cor.test(): Tests for correlation between two variables.
    cor.test(x, y)  # Pearson correlation test (the default method)
- wilcox.test(): With two samples, performs the Wilcoxon rank-sum (Mann-Whitney) test, a non-parametric alternative to the two-sample t-test.
    wilcox.test(x, y)
- fisher.test(): Performs Fisher’s exact test of independence on a contingency table. It is useful when cell counts are too small for the chi-squared approximation.
    fisher.test(x)  # x is a contingency table
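The tests above can be sketched on mtcars, a data set that ships with R. The choice of data set and of variables is an assumption made for illustration; any data of the same shape would do.

```r
# Welch two-sample t-test: mpg for automatic (am = 0) vs manual (am = 1) cars
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value

# One-way ANOVA: does mean mpg differ across cylinder counts?
fit <- aov(mpg ~ factor(cyl), data = mtcars)
summary(fit)

# Contingency table of transmission type by cylinder count (2 rows, 3 columns)
tab <- table(mtcars$am, mtcars$cyl)

# Chi-squared test of independence; R may warn that expected counts are small
cs <- chisq.test(tab)
cs$parameter   # degrees of freedom: (2 - 1) * (3 - 1) = 2

# Fisher's exact test on the same table avoids the large-sample approximation
ft <- fisher.test(tab)

# Pearson correlation test between fuel economy and weight
ct <- cor.test(mtcars$mpg, mtcars$wt)
ct$estimate    # sample correlation, negative for these two variables
```

Each test returns an object of class "htest", so the p-value, test statistic, and estimate can be pulled out by name instead of being read off the printed output.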
4. Linear and Non-linear Regression
R provides several functions for fitting linear and non-linear models.
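- lm(): Fits a linear regression model.
    model <- lm(formula, data = data)
- glm(): Fits a generalized linear model (e.g., logistic regression with family = binomial).
    model <- glm(formula, family = binomial, data = data)
- nls(): Fits a non-linear least squares model. Starting values for the parameters are normally supplied through the start argument.
    model <- nls(formula, data = data, start = start_values)  # start_values is a named list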
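A minimal sketch of all three fitting functions follows. The linear and logistic models use the built-in mtcars data; the non-linear model uses simulated exponential-decay data, and the names sim, time, signal, a, and b are hypothetical choices for this example.

```r
# Linear regression: fuel economy as a function of weight
lin <- lm(mpg ~ wt, data = mtcars)
coef(lin)

# Logistic regression: probability of a manual transmission given weight
logit <- glm(am ~ wt, family = binomial, data = mtcars)
coef(logit)

# Non-linear least squares on simulated data: signal = a * exp(-b * time) + noise
set.seed(1)
sim <- data.frame(time = 1:20)
sim$signal <- 10 * exp(-0.3 * sim$time) + rnorm(20, sd = 0.1)

# start supplies initial guesses for the parameters a and b
decay <- nls(signal ~ a * exp(-b * time), data = sim,
             start = list(a = 8, b = 0.2))
coef(decay)   # estimates should land near the true values a = 10, b = 0.3
```

nls() is sensitive to its starting values; guesses far from the truth can stop the fit from converging, which is why start is given explicitly here.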
5. Model Evaluation and Diagnostics
These functions allow you to assess and diagnose model fit.
- anova(): Performs analysis of variance for model comparison.
    anova(model1, model2)
- residuals(): Extracts residuals from a model.
    residuals(model)
- fitted(): Extracts fitted values from a model.
    fitted(model)
- confint(): Computes confidence intervals for model parameters.
    confint(model)
- predict(): Makes predictions from a fitted model.
    predict(model, newdata)
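The diagnostic functions above can be sketched on two nested linear models fitted to the built-in mtcars data. The model formulas and the hypothetical new_car row are assumptions made for illustration.

```r
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)

# F-test comparing the nested models: does adding hp improve the fit?
cmp <- anova(small, large)
cmp

head(residuals(large))   # observed minus fitted values
head(fitted(large))      # model predictions for the training rows
confint(large)           # 95% confidence intervals by default

# Predict mpg for a hypothetical car: wt = 3.0 (3,000 lb) and 150 hp
new_car <- data.frame(wt = 3.0, hp = 150)
pred <- predict(large, newdata = new_car, interval = "confidence")
pred                     # columns: fit, lwr, upr
```

The columns of newdata must carry the same names as the predictors in the model formula; otherwise predict() cannot match them.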
6. Time Series Analysis
R has several functions specifically designed for time series analysis.
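- ts(): Creates a time series object.
    ts(data, frequency = 12, start = c(2020, 1))  # monthly data starting January 2020
- acf(): Computes and plots the autocorrelation function.
    acf(ts_data)
- pacf(): Computes and plots the partial autocorrelation function.
    pacf(ts_data)
- auto.arima(): Selects and fits an ARIMA model to a time series automatically. This function is not built in; it comes from the forecast package. The built-in alternative is arima(), which requires the model order to be specified.
    library(forecast)
    auto.arima(ts_data)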
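Here is a sketch of a time series workflow that needs no add-on package. It uses arima() from the stats package with a hand-picked model order in place of auto.arima(), and the built-in AirPassengers series; the order shown is an assumption for illustration, not a recommendation.

```r
# AirPassengers ships with R: monthly airline passenger totals, 1949 to 1960
frequency(AirPassengers)   # 12 observations per year
start(AirPassengers)       # 1949 1

# Build a ts object from a plain vector: 24 simulated monthly values from Jan 2020
set.seed(7)
sim_ts <- ts(cumsum(rnorm(24)), frequency = 12, start = c(2020, 1))
end(sim_ts)                # 2021 12

# plot = FALSE returns the correlations without drawing a chart
ac <- acf(AirPassengers, plot = FALSE)
pc <- pacf(AirPassengers, plot = FALSE)

# Seasonal ARIMA on the log scale; the order is chosen by hand here
fit <- arima(log(AirPassengers), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the next 12 months (still on the log scale)
fc <- predict(fit, n.ahead = 12)$pred
exp(fc)                    # back-transform to passenger counts
```

auto.arima() automates the order selection that is done manually in the arima() call above.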
7. Multivariate Analysis
R also provides functions for multivariate analysis.
- prcomp(): Performs principal component analysis (PCA).
    prcomp(data)
- kmeans(): Performs k-means clustering.
    kmeans(data, centers = 3)
- hclust(): Performs hierarchical clustering.
    hclust(dist(data))
- manova(): Performs multivariate analysis of variance.
    manova(formula)
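The multivariate functions above can be sketched on the built-in iris data. Standardizing the columns and asking for three clusters are choices made for this example, not requirements of the functions.

```r
num <- iris[, 1:4]                  # the four numeric measurements

# PCA on standardized columns
pca <- prcomp(num, scale. = TRUE)
summary(pca)                        # variance explained per component

# k-means with three clusters; nstart reruns from several random starts
set.seed(123)
km <- kmeans(scale(num), centers = 3, nstart = 25)
table(km$cluster, iris$Species)     # compare clusters with the known species

# Hierarchical clustering on Euclidean distances, cut into three groups
hc <- hclust(dist(scale(num)))
groups <- cutree(hc, k = 3)

# MANOVA: do the two sepal measurements differ jointly across species?
mv <- manova(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
summary(mv)
```

kmeans() starts from random centers, so set.seed() is needed for repeatable cluster assignments.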
8. Bayesian Statistics
Base R has no dedicated functions for Bayesian inference; Bayesian analysis relies on add-on packages such as rjags, rstan, and brms.
- ttestBF(): Computes a Bayes factor for a one-sample or two-sample t-test. It comes from the BayesFactor package; base R does not include a bayes.test() function.
    library(BayesFactor)
    ttestBF(x = x, y = y)
9. Random Number Generation
R also provides functions to generate random numbers for various probability distributions.
- runif(): Generates random numbers from a uniform distribution.
    runif(n, min = 0, max = 1)
- rnorm(): Generates random numbers from a normal distribution.
    rnorm(n, mean = 0, sd = 1)
- rbinom(): Generates random numbers from a binomial distribution.
    rbinom(n, size, prob)
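Random draws change on every run unless the generator is seeded. The sketch below shows the generators above together with set.seed(); the seed value 2024 and the distribution parameters are arbitrary.

```r
set.seed(2024)                          # fix the generator state
u <- runif(5, min = 0, max = 1)         # five uniform draws on [0, 1]
z <- rnorm(5, mean = 0, sd = 1)         # five standard normal draws
b <- rbinom(5, size = 10, prob = 0.3)   # five counts of successes out of 10

# Resetting the same seed reproduces the same sequence
set.seed(2024)
u_again <- runif(5)
identical(u, u_again)                   # TRUE
```

Seeding matters in interviews and in practice alike: simulations, bootstrap samples, and train/test splits are only repeatable when the seed is set first.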
Summary
R provides a comprehensive suite of built-in statistical functions for a wide range of tasks:
- Descriptive statistics: mean(), median(), sd(), summary(), quantile()
- Probability distributions: dnorm(), dbinom(), dpois(), etc.
- Hypothesis tests: t.test(), chisq.test(), aov(), cor.test()
- Regression analysis: lm(), glm(), nls()
- Model diagnostics: anova(), residuals(), fitted(), confint()
- Time series: acf(), pacf(), auto.arima() (from the forecast package)
- Multivariate analysis: prcomp(), kmeans(), hclust()
With the exception of auto.arima(), these functions ship with a standard R installation in the base and stats packages, and they are central to data analysis and statistical modeling in R.
Question: What is the tidyr package in R?
Answer:
The tidyr package in R is part of the “tidyverse” suite of packages, which is designed to help clean and organize data for analysis. It provides a set of functions to reshape and tidy data into a standard format, making it easier to manipulate, analyze, and visualize. The goal of tidyr is to transform data into a “tidy” format, where each variable is a column, each observation is a row, and each type of observational unit forms a table.
Some key functions in tidyr include:
- gather(): Converts wide-format data into long-format data by collapsing multiple columns into key-value pairs. It still works but has been superseded by pivot_longer() since tidyr 1.0.0.
- spread(): Converts long-format data into wide-format data by spreading a key-value pair across multiple columns. It has been superseded by pivot_wider().
- separate(): Splits a single column into multiple columns based on a delimiter (e.g., splitting a date column into year, month, and day).
- unite(): Combines multiple columns into a single column.
- drop_na(): Removes rows with missing values (NA).
- replace_na(): Replaces NA values with specified replacements.
By using tidyr, you can clean and structure your dataset to make it ready for further analysis or visualization, aligning with the tidy data principles.
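The reshaping and cleaning steps can be sketched on a tiny table. This example assumes the tidyr package is installed, and it uses pivot_longer() and pivot_wider(), the newer tidyr equivalents of gather() and spread(). The data frame wide and its column names are hypothetical.

```r
library(tidyr)

# Hypothetical wide table: one row per student, one column per term
wide <- data.frame(
  student = c("Ana", "Ben"),
  term_1  = c(80, NA),
  term_2  = c(85, 90)
)

# Wide to long: one row per student-term pair
long <- pivot_longer(wide, cols = c(term_1, term_2),
                     names_to = "term", values_to = "score")

# Split "term_1" into the pieces "term" and "1" at the underscore
parts <- separate(long, term, into = c("label", "number"), sep = "_")

# Glue the two pieces back together into a single column
joined <- unite(parts, "term", label, number, sep = "_")

dropped <- drop_na(long)                       # removes Ben's missing term_1 row
filled  <- replace_na(long, list(score = 0))   # or fills the gap with 0

# Long back to wide
wide_again <- pivot_wider(long, names_from = term, values_from = score)
```

Whether to drop or fill missing values depends on the analysis; replacing NA with 0 changes averages, so it should be a deliberate choice.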