Data Science & Analytics Examples

Natural language prompts for data analysis, statistics, and visualization.

Exploratory Data Analysis

Compute summary statistics and identify outliers

Create a correlation heatmap for all numeric features

Analyze missing data patterns and imputation strategies

Generate pairwise scatter plots colored by category

Identify skewness and recommend transformations

Analyze distributions with histograms and kernel density estimates

Compute feature importance using mutual information

Create a comprehensive EDA report for this dataset

Identify duplicate records and anomalies

Analyze time-based patterns and seasonality
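
As a rough illustration of what the first prompt in this group might produce, here is a minimal sketch that uses only the Python standard library. The function name `summarize`, the 1.5 × IQR outlier rule, and the sample values are assumptions chosen for the example, not part of the original prompts.

```python
import statistics


def summarize(values):
    """Return basic summary statistics and IQR-based outliers for a list of numbers."""
    # Quartiles with the "inclusive" method (treats the data as the full population range).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": median,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        # Points outside the 1.5 * IQR fences are flagged, not removed.
        "outliers": [v for v in values if v < lower or v > upper],
    }


# Hypothetical sample data, not taken from any real dataset.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]
report = summarize(sample)
print(report)
```

The IQR fence is only one possible outlier rule; z-scores or a domain-specific threshold may suit other data better.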

Statistical Testing

Compare two groups using t-test with effect size

Perform ANOVA with post-hoc Tukey tests

Test for normality using multiple methods

Compute chi-square test for categorical association

Perform non-parametric Mann-Whitney U test

Calculate required sample size for desired power

Perform multiple testing correction (Bonferroni, FDR)

Test for homogeneity of variance (Levene's test)

Compute confidence intervals using bootstrap

Perform equivalence testing (TOST)
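
The bootstrap prompt above could be answered along these lines. This is a sketch of the percentile bootstrap only, assuming standard-library Python; `bootstrap_ci`, its defaults, and the sample values are hypothetical names and numbers introduced for illustration.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.fmean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical measurements, not real data.
sample = [2.1, 2.5, 2.8, 3.0, 3.2, 3.9, 4.1, 4.4]
low, high = bootstrap_ci(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Other bootstrap variants (for example bias-corrected intervals) exist; the percentile method is shown because it is the shortest to write down.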

Regression Analysis

Fit a multiple regression and interpret coefficients

Check regression assumptions (linearity, homoscedasticity, normality)

Handle multicollinearity using VIF and regularization

Fit a logistic regression for binary classification

Implement polynomial regression and detect overfitting

Fit a Poisson regression for count data

Perform stepwise feature selection

Implement quantile regression for robust estimation

Fit a mixed effects model for hierarchical data

Implement Bayesian regression with prior specification

Clustering & Segmentation

Cluster customers using K-means with optimal K selection

Perform hierarchical clustering with dendrogram

Implement DBSCAN for density-based clustering

Reduce dimensionality with PCA before clustering

Cluster using Gaussian Mixture Models

Perform fuzzy C-means clustering

Analyze cluster stability and silhouette scores

Implement spectral clustering for complex shapes

Segment time series using dynamic time warping

Create customer personas from cluster analysis

Classification

Train a random forest and analyze feature importance

Handle imbalanced classes using SMOTE

Implement gradient boosting (XGBoost, LightGBM)

Perform cross-validation with stratification

Tune hyperparameters using grid search or Bayesian optimization

Build a voting ensemble of multiple classifiers

Analyze learning curves and bias-variance tradeoff

Implement cost-sensitive classification

Create calibrated probability predictions

Generate ROC curves and compare AUC scores
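
For the ROC prompt above, the AUC can be sketched without any plotting library by using its rank interpretation: the probability that a randomly chosen positive scores higher than a randomly chosen negative. The function name `roc_auc` and the toy labels and scores are assumptions for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A positive outranking a negative counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores, invented for illustration.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of samples, so it suits small checks; a sort-based computation would be preferable for large datasets.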

Dimensionality Reduction

Apply PCA and explain variance ratios

Implement t-SNE for visualization

Use UMAP for nonlinear dimension reduction

Apply factor analysis for latent variables

Implement independent component analysis (ICA)

Use autoencoders for nonlinear dimensionality reduction

Apply non-negative matrix factorization (NMF)

Implement linear discriminant analysis (LDA)

Use random projections for large datasets

Analyze the intrinsic dimensionality of data

Time Series Analysis

Decompose a time series into trend, seasonal, and residual components

Forecast using exponential smoothing methods

Fit SARIMA model with seasonal components

Detect change points in a time series

Analyze cross-correlation between two series

Implement Prophet for business forecasting

Detect anomalies in time series data

Compute rolling statistics and Bollinger bands

Forecast multiple related time series together

Analyze frequency components with spectral analysis
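
The rolling statistics prompt above might lead to something like the following sketch. It assumes standard-library Python and the population standard deviation, which is a common convention for Bollinger bands, though some tools use the sample version. The name `bollinger_bands` and the price list are illustrative.

```python
import statistics


def bollinger_bands(series, window=20, num_std=2.0):
    """Return (lower, middle, upper) for each full rolling window of `series`."""
    bands = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        mid = statistics.fmean(chunk)   # rolling mean
        sd = statistics.pstdev(chunk)   # rolling population standard deviation
        bands.append((mid - num_std * sd, mid, mid + num_std * sd))
    return bands


# Hypothetical prices, not market data.
prices = [1, 2, 3, 4, 5]
bands = bollinger_bands(prices, window=3)
for lower, mid, upper in bands:
    print(f"{lower:.3f} {mid:.3f} {upper:.3f}")
```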

Natural Language Processing

Preprocess text (tokenize, stem, remove stopwords)

Create TF-IDF features for text classification

Perform sentiment analysis on customer reviews

Extract named entities from text

Build a topic model using latent Dirichlet allocation (LDA)

Compute text similarity using various metrics

Classify documents into categories

Extract keywords and key phrases

Perform text summarization

Analyze word co-occurrence patterns

Recommendation Systems

Build a collaborative filtering recommender

Implement content-based recommendations

Create a hybrid recommendation system

Handle the cold start problem

Compute user or item similarity using various metrics

Implement matrix factorization (SVD, ALS)

Build a deep learning recommender

Evaluate recommendations (precision, recall, NDCG)

Implement session-based recommendations

Handle implicit feedback data
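
One of the similarity metrics that the prompts in this group could call for is cosine similarity between rating vectors. The sketch below assumes standard-library Python; `cosine_similarity`, `most_similar`, and the small ratings table are hypothetical and exist only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def most_similar(target, ratings):
    """Name of the user whose rating vector is closest to `target`'s."""
    others = (name for name in ratings if name != target)
    return max(others, key=lambda name: cosine_similarity(ratings[target], ratings[name]))


# Invented ratings: one row per user, one column per item (0 = not rated).
ratings = {
    "ana": [5, 4, 0, 1],
    "ben": [4, 5, 0, 0],
    "cho": [0, 1, 5, 4],
}
print(most_similar("ana", ratings))
```

Treating unrated items as zeros is a simplification; mean-centering or restricting to co-rated items are common alternatives.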

A/B Testing

Design an A/B test with proper sample size

Analyze A/B test results with statistical significance

Implement sequential testing with early stopping

Handle multiple testing in experiments

Analyze heterogeneous treatment effects

Implement Bayesian A/B testing

Detect sample ratio mismatch

Analyze novelty and primacy effects

Implement multi-armed bandit for optimization

Compare practical significance with statistical significance
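
The sample ratio mismatch prompt above can be illustrated with a chi-square goodness-of-fit test on the two group sizes. This sketch assumes standard-library Python and a two-arm test, so the statistic has one degree of freedom and its p-value can be written with `math.erfc`. The function name `srm_check` and the counts are made up for the example.

```python
import math


def srm_check(control_n, treatment_n, expected_control_share=0.5):
    """Return (chi-square statistic, p-value) for a two-arm sample ratio check."""
    total = control_n + treatment_n
    exp_control = total * expected_control_share
    exp_treatment = total - exp_control
    chi2 = ((control_n - exp_control) ** 2 / exp_control
            + (treatment_n - exp_treatment) ** 2 / exp_treatment)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical assignment counts for a planned 50/50 split.
chi2, p_value = srm_check(5000, 5300)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

A small p-value here suggests the split differs from the design, which usually points to an assignment or logging problem rather than a treatment effect.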

Geospatial Analysis

Compute distances between geographic coordinates

Perform spatial clustering of points

Create choropleth maps from data

Analyze point patterns for clustering

Compute spatial autocorrelation (Moran's I)

Interpolate values using kriging

Perform network analysis on road graphs

Create heatmaps from geographic data

Analyze movement trajectories

Compute service areas and accessibility
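
For the first prompt in this group, great-circle distance is a common starting point. The sketch below uses the haversine formula with standard-library Python. It assumes a spherical Earth with a mean radius of about 6371 km, so results are approximations; `haversine_km` is an illustrative name.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-centre coordinates for Paris and London.
distance = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
print(f"{distance:.1f} km")
```

Where sub-kilometre accuracy matters, an ellipsoidal method would be a better fit than this spherical formula.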

Survival Analysis

Fit a Kaplan-Meier survival curve

Compare survival curves using log-rank test

Fit a Cox proportional hazards model

Handle time-varying covariates

Compute hazard ratios and confidence intervals

Implement parametric survival models

Analyze competing risks

Handle interval censored data

Compute restricted mean survival time

Implement machine learning survival models
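
The Kaplan-Meier prompt at the top of this group can be sketched directly from the product-limit definition. This version assumes standard-library Python and right-censored data only. The function name `kaplan_meier` and the toy durations are invented for illustration.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time with at least one event.

    `observed[i]` is truthy if subject i had the event at `durations[i]`,
    and falsy if the subject was right-censored at that time.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        leaving = sum(1 for d in durations if d == t)  # events plus censored at t
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data: follow-up times and event indicators (1 = event, 0 = censored).
durations = [1, 2, 2, 3, 4]
observed = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, observed)
for time, prob in curve:
    print(time, round(prob, 3))
```

Subjects censored at an event time are counted as still at risk at that time, which follows the usual convention.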

Causal Inference

Implement propensity score matching

Estimate average treatment effect

Apply instrumental variables method

Implement regression discontinuity design

Analyze using difference-in-differences

Build a causal graph and identify confounders

Implement doubly robust estimation

Estimate heterogeneous treatment effects

Apply synthetic control method

Implement causal forests for CATE
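
As a small illustration of the difference-in-differences prompt above, the two-group, two-period estimate is just a difference of mean changes. The sketch assumes standard-library Python and the parallel-trends assumption, and it gives a point estimate only, with no standard error. `diff_in_diff` and the outcome values are hypothetical.

```python
import statistics


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences point estimate from group outcomes."""
    treated_change = statistics.fmean(treated_post) - statistics.fmean(treated_pre)
    control_change = statistics.fmean(control_post) - statistics.fmean(control_pre)
    # The control group's change stands in for what the treated group
    # would have experienced without treatment (parallel trends).
    return treated_change - control_change


# Invented outcomes for illustration.
effect = diff_in_diff(
    treated_pre=[10, 12], treated_post=[15, 17],
    control_pre=[9, 11], control_post=[11, 13],
)
print(effect)
```

In practice the same estimate is often obtained from a regression with a group-by-period interaction term, which also yields standard errors.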