Traditional credit analysis treats each loan as independent, but guarantee chains, circular guarantees, and ownership concentration create correlated exposure that relational models cannot express. This project models a 500-client portfolio as a Neo4j knowledge graph, processed with PySpark on Databricks and scored with calibrated LightGBM, to surface structural risk patterns that SQL keeps hidden.
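A circular guarantee is simply a cycle in a directed guarantor-to-borrower graph. Here is a minimal pure-Python sketch of the idea; the project itself would express this as a Cypher query over Neo4j, and the edge list below is made up for illustration:

```python
def find_cycles(edges):
    """Return the set of clients that sit on a guarantee cycle."""
    graph = {}
    for guarantor, borrower in edges:
        graph.setdefault(guarantor, []).append(borrower)

    on_cycle = set()

    def dfs(node, path):
        if node in path:                      # back-edge: cycle found
            on_cycle.update(path[path.index(node):])
            return
        for nxt in graph.get(node, []):
            dfs(nxt, path + [node])

    for start in graph:
        dfs(start, [])
    return on_cycle

# A -> B -> C -> A is circular; D guarantees A but is not on the cycle.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
print(sorted(find_cycles(edges)))  # ['A', 'B', 'C']
```

A row-per-loan table can store these same edges, but expressing "find all cycles" in SQL requires recursive CTEs that degrade quickly; in a graph model it is a first-class query.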
Interpreting LISF and CUSF means navigating articles that cross-reference each other across laws, and a Ctrl+F can't tell the article defining technical reserves from one that mentions them in passing. AI makes it possible to absorb that entire volume without losing a single detail. This agent uses RAG to index every article individually with a cross-reference graph, eliminating citation hallucinations and ensuring the model only reasons over real legal text. The result is an assistant that amplifies the actuary's memory without replacing their judgment.
How classifying 5.1M Major Medical Expenses claims into three hospitalization levels changes the way you price a risk the industry treats as one. A UNAM team project that became a complete pricing system.
R Shiny dashboard with 140,000 synthetic policies calibrated to the Mexican market. Two-part GLM pricing engine, IBNR reserves via Chain Ladder and Bornhuetter-Ferguson, Monte Carlo stress testing with VaR/TVaR, and Mahalanobis-based fraud detection. 17 modules, bslib architecture, deployed on Cloud Run.
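To give a flavor of the fraud module, here is a hedged sketch of Mahalanobis-based outlier flagging on simulated data. The feature count, the threshold of 4, and the function name are illustrative, not the dashboard's actual configuration:

```python
import numpy as np

def mahalanobis_flags(X, threshold):
    """Return a boolean mask of rows whose Mahalanobis distance exceeds threshold."""
    mu = X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared distances
    return np.sqrt(d2) > threshold

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))     # 500 claims, 3 numeric features
X[0] = [8.0, 8.0, 8.0]            # plant one obvious outlier
flags = mahalanobis_flags(X, threshold=4.0)
print(flags[0])  # True
```

Unlike per-feature z-scores, the Mahalanobis distance accounts for correlation between features, so a claim that is unremarkable on each axis but implausible as a combination still gets flagged.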
From VaR and Monte Carlo simulation to deep hedging and graph neural networks for systemic contagion. A complete financial risk curriculum with 192 tests and full LaTeX documentation.
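The curriculum's starting point, Monte Carlo VaR/TVaR, fits in a few lines. A minimal sketch with assumed portfolio parameters (not the curriculum's code):

```python
import random
import statistics

random.seed(42)
n_sims, alpha = 100_000, 0.99
portfolio_value, mu, sigma = 1_000_000, 0.0005, 0.02  # illustrative daily return model

# Simulate one-day losses (positive = money lost) and sort ascending.
losses = sorted(-portfolio_value * random.gauss(mu, sigma) for _ in range(n_sims))
var_index = int(alpha * n_sims)
var_99 = losses[var_index]                     # 99% Value-at-Risk
tvar_99 = statistics.mean(losses[var_index:])  # expected loss beyond VaR

print(f"VaR 99%:  {var_99:,.0f}")
print(f"TVaR 99%: {tvar_99:,.0f}")
```

TVaR is always at least VaR: it averages the tail that VaR merely bounds, which is why regulators and the later chapters of the curriculum prefer it as a coherent risk measure.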
The operating cycle of a Mexican insurer is fragmented across spreadsheets that don't talk to each other. This library unifies pricing, reserves, reinsurance, and regulatory compliance for life, property, health, and pensions under a single framework with Pydantic domain validation and Decimal precision. The result is a modular base that enables building more complex actuarial systems without rewriting core logic from scratch.
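To illustrate the validate-at-the-boundary idea, here is a stdlib-only sketch using dataclasses and Decimal; the library itself uses Pydantic, and field names like `sum_assured` are hypothetical:

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

@dataclass(frozen=True)
class Policy:
    sum_assured: Decimal
    annual_premium: Decimal

    def __post_init__(self):
        # Domain validation: reject bad amounts before any calculation runs.
        if self.sum_assured <= 0 or self.annual_premium <= 0:
            raise ValueError("amounts must be positive")

    def monthly_premium(self) -> Decimal:
        # Decimal keeps cents exact where binary float would drift.
        return (self.annual_premium / 12).quantize(CENT, rounding=ROUND_HALF_UP)

p = Policy(sum_assured=Decimal("500000"), annual_premium=Decimal("12345.67"))
print(p.monthly_premium())  # 1028.81
```

The payoff is that every downstream module (reserves, reinsurance, compliance) can assume its inputs are already valid and exact, instead of re-checking spreadsheet exports.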
6 GCP projects that demonstrate how data engineering transforms actuarial work: a dimensional claims warehouse in BigQuery, orchestration with Dagster and Cloud Run, streaming ingestion with Pub/Sub and Apache Beam, infrastructure as code with Terraform, and pricing with a Tweedie GLM. The entire platform runs for under $10/month; conventional architectures cost $1,000+.
A deep dive into building production-grade SQL analytics on real airline data, migrating to BigQuery via Python ETL, and the honest trade-offs between both systems: real timing, real costs, and real query plans.
An R Shiny app that calculates Mexican retirement pensions under all three active IMSS regimes. Implements the Article 167 salary bracket table for Ley 73, the tiered DOF 2020 reform contribution rates for Ley 97, and the Fondo de Pensiones para el Bienestar supplement (2024). Includes AFORE projection under three return scenarios, sensitivity analysis, and downloadable PDF report. 126 unit tests, Docker and Cloud Run deployment.
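The AFORE projection reduces to compounding monthly contributions. A simplified sketch with illustrative rates, not the app's DOF 2020 contribution tables or its actual scenario assumptions:

```python
def project_afore(salary_monthly, contribution_rate, years, annual_return):
    """Accumulate monthly contributions with monthly compounding."""
    balance = 0.0
    monthly_rate = (1 + annual_return) ** (1 / 12) - 1
    for _ in range(years * 12):
        balance = balance * (1 + monthly_rate) + salary_monthly * contribution_rate
    return balance

# Three hypothetical real-return scenarios for a 15,000 MXN monthly salary.
for scenario, r in [("pessimistic", 0.01), ("base", 0.03), ("optimistic", 0.05)]:
    balance = project_afore(salary_monthly=15_000, contribution_rate=0.065,
                            years=25, annual_return=r)
    print(f"{scenario:>11}: {balance:,.0f}")
```

Running the same accumulation under several return assumptions is exactly what makes the scenario comparison and sensitivity analysis in the app possible.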
SIMA centralizes actuarial techniques for pricing life insurance: it takes raw mortality data from INEGI/CONAPO, graduates it with methods like Whittaker-Henderson and Lee-Carter to obtain curves that respect human biology, and projects forward to calculate premiums, reserves, and capital requirements under LISF. Everything exposed as an API, allowing it to connect with other systems, automate sensitivity analysis, and meet CNSF requirements. Open source and built to expand into other lines of business.
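Whittaker-Henderson graduation balances fidelity to the crude rates against smoothness of the curve by solving the normal equations (W + hK'K)v = Wu, where K takes finite differences. A minimal numpy sketch with made-up crude rates and an assumed smoothing parameter h, not SIMA's implementation:

```python
import numpy as np

def whittaker_henderson(u, w, h, order=2):
    """Graduate crude rates u with weights w and smoothness penalty h."""
    n = len(u)
    K = np.diff(np.eye(n), n=order, axis=0)  # order-th difference operator
    W = np.diag(w)
    # Solve (W + h K'K) v = W u  -- the Whittaker-Henderson normal equations.
    return np.linalg.solve(W + h * K.T @ K, W @ u)

u = np.array([0.010, 0.013, 0.011, 0.016, 0.015, 0.020, 0.019, 0.025])  # crude rates
w = np.ones_like(u)                                                     # equal weights
v = whittaker_henderson(u, w, h=10.0)
print(np.round(v, 4))
```

As h grows the penalty on second differences dominates and v approaches a straight line; as h shrinks v reproduces the crude rates, so choosing h is the actuarial judgment the graduation step encodes.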
Frequency-severity pricing models on freMTPL2: Poisson GLM vs XGBoost vs LightGBM with SHAP explainability, fairness audits, and a cross-border analysis of what European ML pricing techniques mean for Mexico's 70% uninsured auto market.
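The frequency side of such a comparison is a Poisson GLM with log link and exposure offset. A hedged numpy sketch fitted by Newton's method on simulated data; the real project works on freMTPL2 with full feature engineering:

```python
import numpy as np

def fit_poisson_glm(X, y, exposure, n_iter=25):
    """Return coefficients beta for E[y] = exposure * exp(X @ beta)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = exposure * np.exp(X @ beta)
        grad = X.T @ (y - mu)            # score of the Poisson log-likelihood
        hess = X.T @ (mu[:, None] * X)   # Fisher information
        beta += np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(1)
n = 5_000
X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])  # intercept + binary rating factor
exposure = rng.uniform(0.5, 1.0, n)                       # policy-years in force
true_beta = np.array([-2.0, 0.7])
y = rng.poisson(exposure * np.exp(X @ true_beta))
beta_hat = fit_poisson_glm(X, y, exposure)
print(np.round(beta_hat, 2))  # close to [-2.0, 0.7]
```

The exposure offset is what makes the coefficients interpretable as claim frequencies per policy-year, which is also the baseline the gradient-boosted models have to beat on out-of-sample deviance.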
From business question to actionable insight. 7 data analysis projects covering e-commerce, insurance, finance, A/B testing, executive KPIs, and operational efficiency. SQL, Python, Streamlit, Next.js, and Power BI.
Actuarial reserve analysis using Chain Ladder and Bornhuetter-Ferguson methods on NAIC Schedule P regulatory data. Interactive dashboard with loss triangles, IBNR estimates, and combined ratios across 6 lines of business.
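Chain Ladder itself is compact: volume-weighted development factors project each accident year to ultimate, and IBNR is the gap between ultimate and the latest diagonal. A minimal sketch on a made-up cumulative triangle (values are illustrative, not Schedule P data):

```python
def chain_ladder(triangle):
    """Project each row of a cumulative triangle to ultimate; return (ultimates, factors)."""
    n = len(triangle)
    factors = []
    for j in range(n - 1):
        # Volume-weighted development factor for period j -> j+1.
        rows = [r for r in triangle if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    ultimates = []
    for row in triangle:
        ult = row[-1]
        for f in factors[len(row) - 1:]:  # apply remaining development
            ult *= f
        ultimates.append(ult)
    return ultimates, factors

triangle = [
    [1000, 1800, 2100, 2200],   # oldest accident year: fully developed
    [1100, 1900, 2300],
    [1200, 2100],
    [1300],                     # latest accident year: one period observed
]
ultimates, factors = chain_ladder(triangle)
ibnr = sum(ultimates) - sum(row[-1] for row in triangle)
print([round(u) for u in ultimates], round(ibnr))
```

Bornhuetter-Ferguson blends these same factors with an a priori loss ratio, which stabilizes the estimate for the youngest, least-developed accident years.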
I wanted to understand what really happens inside a language model. I built one from the first matrix multiplication, trained it on all 7 volumes of Proust, and what taught me the most wasn't the architecture; it was realizing that everything is just numbers.