Data Science & Analytics Examples
Natural language prompts for data analysis, statistics, and visualization.
Exploratory Data Analysis
Compute summary statistics and identify outliers
Create a correlation heatmap for all numeric features
Analyze missing data patterns and imputation strategies
Generate pairwise scatter plots colored by category
Identify skewness and recommend transformations
Analyze distributions with histograms and kernel density estimates
Compute feature importance using mutual information
Create a comprehensive EDA report for this dataset
Identify duplicate records and anomalies
Analyze time-based patterns and seasonality
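As a rough illustration of what the first prompt in this section might produce, here is a minimal sketch using only the Python standard library. The `summarize` helper, the sample values, and the 1.5 × IQR outlier rule are assumptions chosen for the example; other outlier rules are equally valid.

```python
import statistics

def summarize(values):
    """Return basic summary statistics plus outliers flagged by the 1.5 * IQR rule."""
    # Quartiles via linear interpolation over the observed range
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.fmean(values),
        "median": median,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in values if v < low or v > high],
    }

data = [10, 12, 11, 13, 12, 11, 95]  # hypothetical sample
summary = summarize(data)
print(summary)
```

On this sample the only value outside the IQR fences is 95.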
Statistical Testing
Compare two groups using t-test with effect size
Perform ANOVA with post-hoc Tukey tests
Test for normality using multiple methods
Compute chi-square test for categorical association
Perform non-parametric Mann-Whitney U test
Calculate required sample size for desired power
Perform multiple testing correction (Bonferroni, FDR)
Test for homogeneity of variance (Levene's test)
Compute confidence intervals using bootstrap
Perform equivalence testing (TOST)
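The multiple testing prompt above can be sketched in a few lines. This is a minimal illustration, assuming a plain list of p-values; the function names and the example p-values are hypothetical, and the FDR method shown is the Benjamini-Hochberg step-up procedure.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            reject[idx] = True
    return reject

p_values = [0.001, 0.012, 0.025, 0.045, 0.20]  # hypothetical test results
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

With these inputs Bonferroni rejects only the smallest p-value, while Benjamini-Hochberg rejects the three smallest, which shows why FDR control is usually the less conservative choice.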
Regression Analysis
Fit a multiple regression and interpret coefficients
Check regression assumptions (linearity, homoscedasticity, normality)
Handle multicollinearity using VIF and regularization
Fit a logistic regression for binary classification
Implement polynomial regression and detect overfitting
Fit a Poisson regression for count data
Perform stepwise feature selection
Implement quantile regression for robust estimation
Fit a mixed effects model for hierarchical data
Implement Bayesian regression with prior specification
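To give a feel for the regression prompts, the sketch below fits ordinary least squares for the simplest case of a single predictor and reports R². It is a simplification: a true multiple regression needs a matrix solver or a statistics library. The `fit_line` helper and the data are assumptions for illustration.

```python
from statistics import fmean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, with R^2."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 = 1 - SS_res / SS_tot
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # hypothetical data lying exactly on y = 1 + 2x
slope, intercept, r2 = fit_line(x, y)
print(slope, intercept, r2)
```

The slope is read as the expected change in y per unit change in x, which is the interpretation the first prompt in this section asks for.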
Clustering & Segmentation
Cluster customers using K-means with optimal K selection
Perform hierarchical clustering with dendrogram
Implement DBSCAN for density-based clustering
Reduce dimensionality with PCA before clustering
Cluster using Gaussian Mixture Models
Perform fuzzy C-means clustering
Analyze cluster stability and silhouette scores
Implement spectral clustering for complex shapes
Segment time series using dynamic time warping
Create customer personas from cluster analysis
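The K-means prompt at the top of this section could be answered along these lines. This is a minimal sketch of Lloyd's algorithm, assuming the caller supplies the initial centroids; it does not select K, and the points and starting centroids are made up for the example.

```python
import math

def assign(points, centroids):
    """Index of the nearest centroid for each point (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centroids."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # Move the centroid to the coordinate-wise mean of its members
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assign(points, centroids), centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]  # hypothetical 2-D data
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels, centroids)
```

Choosing K would typically wrap this in a loop over candidate values and compare a score such as inertia or silhouette.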
Classification
Train a random forest and analyze feature importance
Handle imbalanced classes using SMOTE
Implement gradient boosting (XGBoost, LightGBM)
Perform cross-validation with stratification
Tune hyperparameters using grid search or Bayesian optimization
Build a voting ensemble of multiple classifiers
Analyze learning curves and bias-variance tradeoff
Implement cost-sensitive classification
Create calibrated probability predictions
Generate ROC curves and compare AUC scores
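For the AUC comparison prompt above, one way to compute the score without plotting is the rank interpretation of AUC: the probability that a randomly chosen positive scores higher than a randomly chosen negative. The sketch below assumes binary labels coded 0/1; the two score lists are hypothetical model outputs.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted probabilities
model_b = [0.2, 0.3, 0.6, 0.9]
auc_a = roc_auc(y_true, model_a)
auc_b = roc_auc(y_true, model_b)
print(auc_a, auc_b)
```

This pairwise version is quadratic in the number of samples, so it suits small checks rather than large evaluation sets.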
Dimensionality Reduction
Apply PCA and explain variance ratios
Implement t-SNE for visualization
Use UMAP for nonlinear dimension reduction
Apply factor analysis for latent variables
Implement independent component analysis (ICA)
Use autoencoders for nonlinear dimensionality reduction
Apply non-negative matrix factorization (NMF)
Implement linear discriminant analysis (LDA)
Use random projections for large datasets
Analyze the intrinsic dimensionality of data
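As a small worked case for the PCA prompt, the sketch below computes explained variance ratios for exactly two features, where the eigenvalues of the 2 × 2 covariance matrix have a closed form. The two-feature restriction, the function name, and the data are assumptions made to keep the example dependency-free; real datasets would use a linear algebra library.

```python
import math
from statistics import fmean

def explained_variance_ratio_2d(xs, ys):
    """Explained variance ratios of the two principal components of (xs, ys)."""
    n = len(xs)
    mx, my = fmean(xs), fmean(ys)
    # Entries of the sample covariance matrix [[a, b], [b, c]]
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix: (a + c) / 2 +/- sqrt((a - c)^2 + 4 b^2) / 2
    mean_var = (a + c) / 2
    half_gap = math.sqrt((a - c) ** 2 + 4 * b * b) / 2
    eigenvalues = (mean_var + half_gap, mean_var - half_gap)
    total = a + c  # trace equals the sum of eigenvalues
    return tuple(ev / total for ev in eigenvalues)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # hypothetical, perfectly collinear with xs
ratios = explained_variance_ratio_2d(xs, ys)
print(ratios)
```

Because the two features here are perfectly collinear, the first component carries all of the variance.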
Time Series Analysis
Decompose time series into trend, seasonal, and residual
Forecast using exponential smoothing methods
Fit SARIMA model with seasonal components
Detect change points in a time series
Analyze cross-correlation between two series
Implement Prophet for business forecasting
Detect anomalies in time series data
Compute rolling statistics and Bollinger bands
Forecast multiple related time series together
Analyze frequency components with spectral analysis
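The rolling statistics prompt above can be illustrated with a short sketch. It assumes the conventional definition of Bollinger bands (rolling mean plus and minus a multiple of the rolling population standard deviation); the window of 3 is chosen only to fit the tiny hypothetical price series, where 20 periods is the more common setting.

```python
from statistics import fmean, pstdev

def bollinger_bands(series, window=3, width=2.0):
    """Return (middle, upper, lower) for every full window over the series."""
    bands = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        mid = fmean(chunk)   # rolling mean
        sd = pstdev(chunk)   # rolling population standard deviation
        bands.append((mid, mid + width * sd, mid - width * sd))
    return bands

prices = [10, 11, 12, 13, 12, 11]  # hypothetical closing prices
bands = bollinger_bands(prices)
for mid, upper, lower in bands:
    print(round(mid, 3), round(upper, 3), round(lower, 3))
```

The first `window - 1` observations have no band, so the output is shorter than the input.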
Natural Language Processing
Preprocess text (tokenize, stem, remove stopwords)
Create TF-IDF features for text classification
Perform sentiment analysis on customer reviews
Extract named entities from text
Build a topic model using latent Dirichlet allocation (LDA)
Compute text similarity using various metrics
Classify documents into categories
Extract keywords and key phrases
Perform text summarization
Analyze word co-occurrence patterns
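To make the TF-IDF and text similarity prompts concrete, here is a minimal sketch that builds sparse TF-IDF vectors and compares them with cosine similarity. It assumes whitespace tokenization and an unsmoothed idf of log(N / df); libraries often apply smoothing and normalization, so their numbers will differ. The documents are invented for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two documents share several terms and score above zero, while the third shares none with the first and scores zero.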
Recommendation Systems
Build a collaborative filtering recommender
Implement content-based recommendations
Create a hybrid recommendation system
Handle the cold start problem
Compute user and item similarity using various metrics
Implement matrix factorization (SVD, ALS)
Build a deep learning recommender
Evaluate recommendations (precision, recall, NDCG)
Implement session-based recommendations
Handle implicit feedback data
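The evaluation prompt in this section names precision, recall, and NDCG. A minimal sketch of the top-k versions with binary relevance follows; the function names, item IDs, and relevance set are hypothetical, and a non-empty relevance set is assumed.

```python
import math

def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a ranked list and a non-empty set of relevant items."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k, hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """NDCG@k with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    # Ideal ranking places every relevant item (up to k) at the top
    ideal_hits = min(k, len(relevant))
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

recommended = ["a", "b", "c", "d"]  # hypothetical ranked recommendations
relevant = {"a", "c", "e"}          # hypothetical ground-truth items
precision, recall = precision_recall_at_k(recommended, relevant, k=3)
ndcg = ndcg_at_k(recommended, relevant, k=3)
print(precision, recall, ndcg)
```

Here two of the top three recommendations are relevant, so precision@3 and recall@3 are both 2/3, while NDCG also penalizes the relevant item "c" for sitting in third place rather than second.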
A/B Testing
Design an A/B test with proper sample size
Analyze A/B test results with statistical significance
Implement sequential testing with early stopping
Handle multiple testing in experiments
Analyze heterogeneous treatment effects
Implement Bayesian A/B testing
Detect sample ratio mismatch
Analyze novelty and primacy effects
Implement multi-armed bandit for optimization
Compare practical significance with statistical significance
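The sample ratio mismatch prompt above can be sketched as a chi-square goodness-of-fit test on the two group sizes. With one degree of freedom the p-value has a closed form via the complementary error function, so no statistics library is needed. The function name, the counts, and the 50/50 expected split are assumptions for illustration.

```python
import math

def srm_check(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square statistic and p-value for observed vs expected group sizes."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # For 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2_balanced, p_balanced = srm_check(5000, 5000)  # hypothetical counts
chi2_skewed, p_skewed = srm_check(5000, 5300)
print(p_balanced, p_skewed)
```

A very small p-value suggests the assignment mechanism itself may be broken, in which case the experiment results should be treated with caution until the cause is found.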
Geospatial Analysis
Compute distances between geographic coordinates
Perform spatial clustering of points
Create choropleth maps from data
Analyze point patterns for clustering
Compute spatial autocorrelation (Moran's I)
Interpolate values using kriging
Perform network analysis on road graphs
Create heatmaps from geographic data
Analyze movement trajectories
Compute service areas and accessibility
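For the first prompt in this section, a common choice is the haversine formula for great-circle distance. The sketch below assumes a spherical Earth with a mean radius of 6371 km, which is an approximation; the coordinates are approximate city-centre values used only as an example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate coordinates for Paris and London
paris_london = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(paris_london, 1), round(quarter_equator, 1))
```

For survey-grade accuracy an ellipsoidal model would be needed, but haversine is usually adequate for clustering and rough distance features.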
Survival Analysis
Fit a Kaplan-Meier survival curve
Compare survival curves using log-rank test
Fit a Cox proportional hazards model
Handle time-varying covariates
Compute hazard ratios and confidence intervals
Implement parametric survival models
Analyze competing risks
Handle interval censored data
Compute restricted mean survival time
Implement machine learning survival models
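The Kaplan-Meier prompt at the top of this section can be illustrated with the product-limit estimator written out directly. This is a minimal sketch, assuming right-censored data where an event flag of 1 means the event was observed and 0 means censored; the durations are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct observed event time."""
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Observed events exactly at time t (censored subjects are excluded)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

durations = [2, 3, 3, 5, 8]  # hypothetical follow-up times
events = [1, 1, 0, 1, 0]     # 1 = event observed, 0 = censored
curve = kaplan_meier(durations, events)
print(curve)
```

Censored subjects still count toward the at-risk set up to their censoring time, which is what distinguishes this estimator from a naive proportion of survivors.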
Causal Inference
Implement propensity score matching
Estimate average treatment effect
Apply instrumental variables method
Implement regression discontinuity design
Estimate treatment effects using difference-in-differences
Build a causal graph and identify confounders
Implement doubly robust estimation
Estimate heterogeneous treatment effects
Apply synthetic control method
Implement causal forests for CATE
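As a simple worked case for the difference-in-differences prompt in this section, the sketch below computes the basic two-group, two-period estimate from group means. It assumes the parallel trends assumption holds and gives no standard error; the function name and outcome values are invented for the example.

```python
from statistics import fmean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate from group means."""
    treated_change = fmean(treated_post) - fmean(treated_pre)
    control_change = fmean(control_post) - fmean(control_pre)
    # The control group's change stands in for the treated group's counterfactual trend
    return treated_change - control_change

effect = diff_in_diff(
    treated_pre=[10, 12, 11],   # hypothetical outcomes
    treated_post=[15, 17, 16],
    control_pre=[9, 10, 11],
    control_post=[11, 12, 13],
)
print(effect)
```

In this made-up data the treated group rises by 5 and the control group by 2, so the estimated effect is 3. In practice the same estimate is usually obtained from a regression with an interaction term, which also yields uncertainty estimates.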