A complete social network mining and engagement prediction pipeline built on a cleaned Twitter/X timeline dataset. Combines sentiment analysis, temporal behavioral analysis, and four predictive models (XGBoost, linear regression, Prophet time-series forecasting, and a PyTorch deep learning model), with the linear regression reaching R²=0.852 on engagement prediction.
Built end-to-end Twitter analytics pipeline: data cleaning → feature engineering → predictive modeling → visualization on a real tweet dataset.
Engineered features including sine and cosine cyclic encodings of the posting hour, per-user likes_mean and retweet_mean, and day-of-week patterns.
Achieved R²=0.852 with a statsmodels OLS linear regression, with tweet length, likes_mean, and retweet_mean as the strongest predictors.
Trained XGBoost Regressor (MSE=472.48, R²=0.7766) and a PyTorch deep learning model for engagement score prediction.
Applied Prophet time-series forecasting to model hourly engagement fluctuations and identify optimal posting windows.
Generated 10+ visualizations: tweet frequency by hour, engagement heatmap, correlation matrix, top users by activity, likes vs retweets scatter.
Co-developed with Harshita Guduru as a Big Data Analytics coursework submission at Lawrence Technological University.
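The feature engineering listed above (cyclic hour encoding plus per-user likes_mean and retweet_mean) can be sketched roughly as follows. This is a minimal, standard-library illustration, not the project's actual code: the record keys `user`, `hour`, `likes`, and `retweets`, the helper names `cyclic_hour` and `add_features`, and the sample data are all assumptions made for the example.

```python
import math
from collections import defaultdict


def cyclic_hour(hour):
    """Map an hour in [0, 24) to (sin, cos) so 23:00 and 00:00 end up close together."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)


def add_features(tweets):
    """Return copies of the tweet records with cyclic hour and per-user mean features.

    The input record layout is hypothetical; the real dataset schema may differ.
    """
    # Accumulate [likes sum, retweets sum, tweet count] for each user.
    totals = defaultdict(lambda: [0, 0, 0])
    for t in tweets:
        acc = totals[t["user"]]
        acc[0] += t["likes"]
        acc[1] += t["retweets"]
        acc[2] += 1

    enriched = []
    for t in tweets:
        likes_sum, retweets_sum, count = totals[t["user"]]
        hour_sin, hour_cos = cyclic_hour(t["hour"])
        enriched.append({
            **t,
            "hour_sin": hour_sin,
            "hour_cos": hour_cos,
            "likes_mean": likes_sum / count,
            "retweet_mean": retweets_sum / count,
        })
    return enriched


# Made-up sample records, for illustration only.
sample = [
    {"user": "a", "hour": 23, "likes": 10, "retweets": 2},
    {"user": "a", "hour": 0, "likes": 30, "retweets": 4},
    {"user": "b", "hour": 12, "likes": 5, "retweets": 1},
]
features = add_features(sample)
```

Encoding the hour as a point on the unit circle keeps late-night and early-morning posts adjacent in feature space, which a raw 0 to 23 integer would not.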
Coursework project for Big Data Analytics at Lawrence Technological University (co-developed with Harshita Guduru). Goal: build a complete ML pipeline to predict tweet engagement from user behavioral data.
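For readers unfamiliar with the metrics reported for the models, MSE and R² can be computed from held-out targets and predictions as in the sketch below. The function names and the toy numbers are illustrative assumptions, not values from the project; in practice, library implementations such as those in scikit-learn would normally be used.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average squared gap between target and prediction."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Toy engagement scores (made up for illustration).
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 9]

print(mse(y_true, y_pred))       # 0.5
print(r2_score(y_true, y_pred))  # 0.9
```

An R² near 1 means the model explains most of the variance in the engagement score, while MSE is expressed in squared units of that score, so the two are not directly comparable across differently scaled targets.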
Tweet frequency by hour, average engagement by hour, top 10 users by tweet count, correlation heatmap, total engagement by hour, likes vs retweets scatter, engagement score by tweet length