Jesus Villota Miranda

PhD Candidate in Economics

CEMFI · Madrid, Spain

About

I am a PhD Candidate in Economics at CEMFI, working under the supervision of Enrique Sentana and Dante Amengual. My research focuses on the intersection of machine learning and finance. Prior to my PhD, I completed an MRes in Economics & Finance at CEMFI with a specialization in Econometrics & Data Science, and a BA in Business Administration at the University of La Rioja with a specialization in Finance.

Research Interests

  • Machine Learning in Finance
  • Asset Pricing
  • Financial Econometrics
  • Market Microstructure
  • Large Language Models

Research & Publications

Peer-Reviewed Publications

Simple economics of vaccination: public policies and incentives

Jesús Villota-Miranda & R. Rodríguez-Ibeas

International Journal of Health Economics and Management, 24(2), 155-172 (2024)

This paper focuses on the economics of vaccination using a game-theoretic model combined with an epidemiological SIR model that reproduces the infection dynamics of a generic disease. We characterize the equilibrium individual vaccination rate and show that it is below the rate compatible with herd immunity due to externalities that individuals do not internalize. We analyze three public policies: informational campaigns to reduce vaccination disutility, monetary payments to vaccinated individuals, and measures to increase the disutility of non-vaccination. We find that the optimal public policy should consist only of informational campaigns if they are sufficiently effective, or a combination of informational campaigns and monetary incentives otherwise. Surprisingly, vaccine passports or other restrictions on the non-vaccinated are not desirable.

Working Papers

Is Pairs Trading a Thing of the Past?

Jesus Villota

Available at SSRN (2025)

The profitability of traditional pairs trading, a market-neutral strategy based on identifying and exploiting temporary price divergences between two historically related stocks, has significantly eroded over the past two decades. This paper argues that the decline is not due to a failure of the underlying principle of relative-value arbitrage, but rather to the restrictive 1-to-1 nature of the conventional methodology. We propose a generalization of pairs trading by replacing the single partner stock with a replicating portfolio constructed as a linear combination of multiple securities. Our approach utilizes LASSO to create a parsimonious and tradable portfolio that collectively mimics the price behavior of a target asset, thereby creating a more robust and flexible substitute. Our empirical analysis, conducted on U.S. equity data from 1962 to the present, demonstrates that while the profitability of the classic approach has decayed, our generalized strategy consistently delivers significant excess returns, particularly in the post-2000 period.

Presented at:

  • XXXII Finance Forum - Asociación Española de Finanzas (AEFIN) - Pamplona (Spain), July 2025
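
The replicating-portfolio idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the synthetic series, the penalty value, and the function names `soft_threshold` and `lasso_weights` are assumptions made for illustration, and the LASSO is solved with a plain coordinate-descent loop so that the example needs only the standard library.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_weights(target, candidates, penalty, sweeps=200):
    """Coordinate descent for (1/2n)*||target - X w||^2 + penalty*||w||_1.

    `candidates` is a list of columns, one per candidate security.
    Returns the weights and the spread (target minus replicating portfolio).
    """
    n = len(target)
    weights = [0.0] * len(candidates)
    residual = list(target)  # target minus current portfolio value
    for _ in range(sweeps):
        for j, column in enumerate(candidates):
            scale = sum(x * x for x in column) / n
            if scale == 0.0:
                continue
            # Covariance of column j with the residual, adding back its own contribution.
            rho = sum(x * (r + weights[j] * x) for x, r in zip(column, residual)) / n
            new_weight = soft_threshold(rho, penalty) / scale
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * x for r, x in zip(residual, column)]
                weights[j] = new_weight
    return weights, residual


# Synthetic data (assumption): the target loads on the first two candidates only.
rng = random.Random(0)
n_obs = 200
candidates = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)] for _ in range(3)]
target = [0.6 * a + 0.4 * b for a, b in zip(candidates[0], candidates[1])]

weights, spread = lasso_weights(target, candidates, penalty=0.01)
print([round(w, 3) for w in weights])
```

In this toy setup the L1 penalty drops the irrelevant third candidate and slightly shrinks the other two weights, which is the sense in which the resulting portfolio is parsimonious. The returned `spread` is the series a relative-value rule would monitor for divergence. How the penalty is chosen and how trades are triggered are not specified here.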

Structured Data with LLMs Done Right: A Practical Guide to Text Classification, Information Retrieval and Generation with LLMs

Jesus Villota

Available at SSRN (2025)

Economists increasingly use large language models to extract structured data from text, yet the standard approach of prompting a model to "return JSON" is unreliable: format adherence ranges from 36% to 85% depending on the model, corrupting a substantial share of observations in any large-scale pipeline. We show that structured output schemas, which constrain the model's token generation to follow a researcher-defined JSON Schema, eliminate formatting failures entirely while leaving the model's analytical decisions unconstrained. We organize the methodology around three tasks that cover most text-as-data applications in economics: classification (e.g., labeling central bank statements as hawkish or dovish), information retrieval (e.g., extracting loan terms from credit agreements), and structured generation (e.g., producing standardized policy briefs from legislation). For each task, we provide schema design templates, implementation code compatible with major LLM providers, and worked examples from finance and economics research. With strict schema enforcement, output format adherence reaches 100%, making LLM-based text extraction as structurally reliable as any other step in an empirical data pipeline.

Presented at:

  • New Methods BoE Seminar Series - Bank of England - Online, November 2025
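
To make the notion of format adherence in the abstract above concrete, the sketch below checks raw model replies against a small schema for the hawkish/dovish example. Everything in it is illustrative: the schema, its field names, the sample replies, and the helper `conforms` are assumptions, and the checker covers only the handful of JSON Schema keywords used here. It does not perform constrained generation; it only shows the kind of failure that schema enforcement is meant to rule out.

```python
import json

# Hypothetical schema for a hawkish/dovish classification task.
STANCE_SCHEMA = {
    "type": "object",
    "properties": {
        "stance": {"type": "string", "enum": ["hawkish", "dovish"]},
        "rationale": {"type": "string"},
    },
    "required": ["stance", "rationale"],
    "additionalProperties": False,
}


def conforms(raw_reply, schema):
    """Return True if raw_reply is JSON matching this flat object schema.

    Handles only the keywords used above; not a general JSON Schema validator.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    properties = schema["properties"]
    if any(key not in data for key in schema["required"]):
        return False
    if not schema.get("additionalProperties", True) and set(data) - set(properties):
        return False
    for key, value in data.items():
        rule = properties.get(key, {})
        if rule.get("type") == "string" and not isinstance(value, str):
            return False
        if "enum" in rule and value not in rule["enum"]:
            return False
    return True


# Made-up replies of the kind a "return JSON" prompt can produce.
replies = [
    'Sure! Here is the JSON: {"stance": "hawkish"}',                 # prose wrapper
    '{"stance": "Hawkish", "rationale": "Signals further hikes."}',  # label outside the enum
    '{"stance": "hawkish", "rationale": "Signals further hikes."}',  # conforming
]
adherence = [conforms(reply, STANCE_SCHEMA) for reply in replies]
print(adherence)
```

Only the last reply passes. With schema-constrained generation the first two failure modes should not occur, because the reply is forced to follow the schema while the choice of label is left to the model.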

WhaleStreetBets

Diego Amaya, Pedro A. García-Ares, and Jesus Villota

Working Paper (2025)

In Progress

Beginning in 2020, options markets experienced a sharp increase in exceptionally large trades (colloquially termed "whales") despite decades of market microstructure theory predicting that sophisticated investors fragment orders to hide information. We examine this phenomenon using tick-level options trade and quote data covering all U.S. equities and ETFs from 2014 through 2025. We develop three complementary whale identification approaches based on absolute size, relative size, and open interest ratios. We show that whale trades generate immediate price impact in both the option itself and the underlying security, consistent with demand-based price pressure from market maker delta hedging. Beyond these immediate effects, we construct a daily long-short portfolio based on aggregated whale delta exposure and find that whale activity predicts stock returns over a one-day horizon. The strategy generates annualized alphas of 6-12 percent that remain statistically significant after controlling for standard asset pricing factors. These findings support a signaling and herding channel in which whale trades serve as focal points for retail attention, generating momentum that persists beyond the immediate mechanical hedging effects.
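
The three identification approaches named in the abstract above (absolute size, relative size, and open interest ratio) can be sketched as simple flags on a single trade. The cutoffs, field names, and the `OptionTrade` record are hypothetical and chosen only to show how the three criteria differ; they are not the thresholds or definitions used in the paper.

```python
from dataclasses import dataclass


@dataclass
class OptionTrade:
    contracts: int        # trade size in contracts
    typical_size: float   # assumed benchmark, e.g. a trailing median trade size
    open_interest: int    # assumed to be the prior day's open interest


# Hypothetical cutoffs, for illustration only.
ABSOLUTE_MIN = 1000        # contracts
RELATIVE_MULTIPLE = 50.0   # multiples of the typical trade size
OI_RATIO_MIN = 0.25        # fraction of open interest


def whale_flags(trade):
    """Flag a trade under each of the three criteria."""
    return {
        "absolute": trade.contracts >= ABSOLUTE_MIN,
        "relative": trade.typical_size > 0
        and trade.contracts >= RELATIVE_MULTIPLE * trade.typical_size,
        "open_interest": trade.open_interest > 0
        and trade.contracts >= OI_RATIO_MIN * trade.open_interest,
    }


example = OptionTrade(contracts=2000, typical_size=10.0, open_interest=50000)
flags = whale_flags(example)
print(flags)
```

The example trade is large in absolute terms and relative to typical activity, but small relative to open interest, which illustrates why the three criteria are complementary rather than interchangeable.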

Predicting Market Reactions to News: An LLM-Based Approach Using Spanish Business Articles

Jesus Villota

Available at SSRN (2024)

Markets do not always efficiently incorporate news, particularly when information is complex or ambiguous. Traditional text analysis methods fail to capture the economic structure of information and its firm-specific implications. This paper proposes a novel methodology that guides LLMs to systematically identify and classify firm-specific economic shocks in news articles according to their type, magnitude, and direction. This economically informed classification allows for a more nuanced understanding of how markets process complex information. Using a simple trading strategy, we demonstrate that our LLM-based classification significantly outperforms a benchmark based on clustering vector embeddings, generating consistent out-of-sample profits while maintaining transparent and durable trading signals. The results suggest that LLMs, when properly guided by economic frameworks, can effectively identify persistent patterns in how markets react to different types of firm-specific news.

Presented at:

  • New Methods BoE Seminar Series - Bank of England - Online, November 2025
  • XXXII Finance Forum - Asociación Española de Finanzas (AEFIN) - Pamplona (Spain), July 2025
  • BSE Summer Forum, Machine Learning in Economics - UAB - Barcelona (Spain), June 2025
  • São Paulo School of Advanced Science on High Dimensional Modelling - FGV EESP - São Paulo (Brazil), April 2025
  • 3rd Contemporary Issues in Financial Markets and Banking - Nottingham Trent University - Online, January 2025
  • LIDERA Seminar Series - University of La Rioja (Econ department) - La Rioja (Spain), November 2024
  • Mirian Andrés Seminar - University of La Rioja (Maths department) - La Rioja (Spain), November 2024
  • Generative AI in Finance - John Molson School of Business, Concordia University - Montreal (Canada), October 2024
  • CEMFI Banking & Finance Seminar Series - CEMFI - Madrid (Spain), October 2024

Teaching

Teaching Assistant

CEMFI · Madrid, Spain

  • CEMFI Master in Economics & Finance
    • Time Series Econometrics (TA for Enrique Sentana, January-March 2025, January-March 2026)
    • Applied Macroeconometrics (TA for Galo Nuño & Florens Odendahl, April-June 2025)
    • Data Science for Economics (TA for Chris Rauh, October-December 2025)
  • CEMFI Summer School
    • Data Science for Economics: Mastering Unstructured Data (TA for Chris Rauh, August 2024, August 2025)
    • Using Textual Data in Empirical Monetary Economics (TA for Michael McMahon, September 2025)
  • Diploma in Banking Supervision [for Banco de España]
    • Python Programming (Sole instructor, June 2025)
  • Postgraduate Program in Central Banking [for Banco de España]
    • Financial Markets and Institutions (TA for Vicente Bermejo, September-December 2025)
  • Advanced Training School [for CNMV]
    • Methods for Time Series (TA for Enrique Sentana, November 2025)

Curriculum Vitae

Download my full CV to learn more about my academic background and experience.

View CV (PDF)

Skills

jesus@portfolio ~
~ $ cat skills.json
{
  "programming_languages": [
    "Python", "R", "MATLAB", "Julia"
  ],
  "software_tools": [
    "LaTeX", "Quarto", "Git"
  ],
  "spoken_languages": [
    "Spanish (Native)", "English (C2)", "French (B2)"
  ]
}

Other

  • Journal of Econometrics
  • REFC – Spanish Journal of Finance and Accounting
  • XXXII Finance Forum - Asociación Española de Finanzas (AEFIN) - Pamplona (Spain), July 2025
  • BSE Summer Forum, Machine Learning in Economics - UAB - Barcelona (Spain), June 2025
  • 3rd Contemporary Issues in Financial Markets and Banking - Nottingham Trent University - Online, January 2025
  • Generative AI in Finance - John Molson School of Business, Concordia University - Montreal (Canada), October 2024
  • CEMFI Banking & Finance Seminar Series - CEMFI - Madrid (Spain), October 2024
  • XXXI Finance Forum - Asociación Española de Finanzas (AEFIN) - Tenerife (Spain), July 2024
  • New Methods BoE Seminar Series - Bank of England - Online, November 2025
  • LIDERA Seminar Series - University of La Rioja (Econ department) - La Rioja (Spain), November 2024
  • Mirian Andrés Seminar - University of La Rioja (Maths department) - La Rioja (Spain), November 2024
  • Econometrics of Micro and Macro Interactions (August 2025, CEMFI Summer School)
  • Using Textual Data in Empirical Monetary Economics (August 2025, CEMFI Summer School)
  • SoFiE Summer School: Machine Learning in Finance (August 2025, Yale School of Management)
  • São Paulo School of Advanced Science on High Dimensional Modelling (April 2025, FGV EESP)
  • Machine Learning in Finance (September 2024, CEMFI Summer School)
  • Data Science for Economics (September 2024, CEMFI Summer School)
  • DSGE and Time-Series Models for Macroeconomic and Policy Analysis (August 2024, CEMFI Summer School)
  • Best Third Year paper award (September 2025, CEMFI)
  • Extraordinary Award to the Best Student (May 2023, U. La Rioja)
  • Award for Academic Achievement (April 2023, CCAA La Rioja)
  • Social Council Award of the University of La Rioja, 15th Edition, Student category (February 2023, U. La Rioja)
  • Award to the Best Academic Record (November 2022, U. La Rioja)
  • Award to the Best Bachelor's Thesis (November 2022, U. La Rioja)
  • 3rd Prize at XII G9 National University Debate League (March 2022, G9 University Group)
  • 2nd Prize at XI G9 National University Debate League (April 2019, G9 University Group)
  • 2nd Best EBAU Grade of La Rioja, out of 1201 students (June 2018)
  • 1st Prize in Piano at the XXVIII Certamen de Interpretación Musical Fermín Gurbindo (May 2013)
  • 2nd Prize in Piano at the XXVII Certamen de Interpretación Musical Fermín Gurbindo (May 2012)