Event-driven AWS pipeline ingesting real-time energy and weather data from Fingrid and Meteo APIs into partitioned Parquet storage on S3. A Dockerised Lambda runs ARIMA forecasting on each ingestion, predicting 2 hours ahead for 5 energy sources. Fully automated with a GitHub Actions CI/CD pipeline covering unit tests, Docker builds and Lambda deployments. Visualised on a live JavaScript/Chart.js dashboard via a REST API backed by Amazon Athena. Download the PowerBI Full Grid Analytics Report (PDF).
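As a rough illustration of what "partitioned Parquet storage" could look like, the sketch below builds a Hive-style S3 object key from a dataset name and a timestamp. The partition scheme (`dataset=/year=/month=/day=`), the `raw` prefix, the file naming, and the function name `partition_key` are all assumptions for illustration; the project description does not specify its layout.

```python
from datetime import datetime, timezone


def partition_key(dataset: str, ts: datetime, prefix: str = "raw") -> str:
    """Build a Hive-style object key (hypothetical layout, not the project's actual one)."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    # Normalise to UTC so one instant always maps to the same partition.
    ts = ts.astimezone(timezone.utc)
    return (
        f"{prefix}/dataset={dataset}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{dataset}_{ts:%Y%m%dT%H%M%SZ}.parquet"
    )


# "wind" is a placeholder dataset name.
key = partition_key("wind", datetime(2024, 5, 17, 12, 30, tzinfo=timezone.utc))
print(key)
```

Keys of this shape let a query engine such as Athena prune partitions by date, provided the table is declared with matching partition columns.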
Finnish regional economic indicators were predicted using a variety of statistical models. The research project was carried out in collaboration with OP Group as part of Aalto University's Data Science Project course.
A full A/B test analysis of 90,000+ mobile game players, evaluating whether moving a progression gate improved retention. Methods used: chi-squared testing, bootstrap resampling, and player segmentation.
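The two tests named above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the function names and the retention counts in the example are hypothetical, the chi-squared test is the plain Pearson statistic on a 2x2 table without continuity correction, and the bootstrap uses a percentile interval.

```python
import math
import random


def chi2_2x2(retained_a, total_a, retained_b, total_b):
    """Pearson chi-squared statistic and p-value for a 2x2 retention table (1 d.o.f.)."""
    observed = [
        [retained_a, total_a - retained_a],
        [retained_b, total_b - retained_b],
    ]
    grand = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [retained_a + retained_b, grand - retained_a - retained_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def bootstrap_diff(retained_a, total_a, retained_b, total_b, n_boot=1000, seed=0):
    """95% percentile interval for retention(B) - retention(A)."""
    rng = random.Random(seed)
    p_a, p_b = retained_a / total_a, retained_b / total_b
    diffs = []
    for _ in range(n_boot):
        # Resampling 0/1 outcomes with replacement is equivalent to drawing
        # total_x Bernoulli trials at the observed retention rate.
        boot_a = sum(rng.random() < p_a for _ in range(total_a)) / total_a
        boot_b = sum(rng.random() < p_b for _ in range(total_b)) / total_b
        diffs.append(boot_b - boot_a)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Hypothetical counts: 60/100 retained in group A, 40/100 in group B.
stat, p_value = chi2_2x2(60, 100, 40, 100)
low, high = bootstrap_diff(60, 100, 40, 100, n_boot=500, seed=1)
print(f"chi2={stat:.2f}, p={p_value:.4f}, 95% CI for B-A: [{low:.3f}, {high:.3f}]")
```

Reporting the bootstrap interval next to the p-value shows the likely size of the retention difference, not only whether it is statistically detectable.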
Benchmarked five machine learning and deep learning approaches from Logistic Regression and LinearSVC to CNN and transformer models (mBERT, XLM-RoBERTa) for cross-lingual toxicity classification across English, German and Finnish text. Evaluated the tradeoff between computational efficiency and classification performance in a low-resource multilingual setting.
Multiple Correspondence Analysis was used to determine the relationship between engineering students' stress levels and various lifestyle and social factors.
Machine learning classification methods were trained on a dataset of dried beans to predict a bean's species from its physical characteristics.
Bayesian data analysis was carried out on a set of real-world insurance data to model a vehicle's insurance premium using factors such as vehicle type, number of drivers, and number of previous complaints.
A lightweight PDF editor that runs entirely in the browser. Supports deleting, cropping, merging, and rearranging pages, all client-side using PDF.js and Canvas APIs.