Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence
Publisher
Universidad Torcuato Di Tella
Abstract
This thesis presents the design and implementation of a data-intensive framework to estimate CEO overconfidence by integrating multiple data sources and analytical methods. Drawing from traditional sentiment-based approaches and recent advances in generative artificial intelligence (GenAI), the project develops a dynamic and scalable index to classify CEO behavior based on press articles, structured financial disclosures, and contextual analysis. The research contributes both conceptually and practically by replicating existing overconfidence indicators—such as the Conf(Press) index—and extending them through natural language processing techniques that account for nuance, context, and industry-specific factors. The methodology combines structured data acquisition from sources like ProQuest’s TDM Studio, EBSCO, and The New York Times with sentiment scoring (VADER), keyword-based classifiers, and a GenAI-powered prompt framework. By applying these techniques to over 7,000 curated CEO-related articles, the thesis constructs a CEO Overconfidence Index that enables comparative analysis across sectors, particularly between innovation-driven and traditional industries. The resulting data product captures how overconfidence varies over time and in response to events, revealing both the limitations of static keyword methods and the added value of contextual AI models. Ultimately, this work contributes to the field of behavioral corporate finance by offering a novel pipeline to estimate executive psychological traits from textual data. It also provides a governance-relevant tool for investors, analysts, and policymakers to identify behavioral risk factors in leadership. While GenAI adds adaptability and interpretive depth, the thesis emphasizes that its primary value lies in the integration of classic and emerging methods into a unified, sector-aware overconfidence framework.
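The keyword-based step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the term lists, the function names `classify_article` and `press_indicator`, and the rule for handling negation are assumptions made here for illustration, loosely following press-based overconfidence measures that compare "confident" mentions with "cautious" ones.

```python
import re
from collections import Counter

# Hypothetical term lists; the thesis's actual keywords are not given in the abstract.
CONFIDENT_TERMS = ("confident", "confidence", "optimistic", "optimism")
CAUTIOUS_TERMS = ("reliable", "cautious", "conservative", "practical", "frugal", "steady")
# Assumption: "not confident" / "not optimistic" count toward the cautious side.
NEGATED = re.compile(r"\bnot\s+(confident|optimistic)\b")


def classify_article(text: str) -> str:
    """Label one article 'confident', 'cautious', or 'neutral' from keyword counts."""
    lowered = text.lower()
    negated = len(NEGATED.findall(lowered))
    words = Counter(re.findall(r"[a-z]+", lowered))
    # Negated hits are removed from the confident count and added to the cautious one.
    confident = sum(words[t] for t in CONFIDENT_TERMS) - negated
    cautious = sum(words[t] for t in CAUTIOUS_TERMS) + negated
    if confident > cautious:
        return "confident"
    if cautious > confident:
        return "cautious"
    return "neutral"


def press_indicator(articles: list[str]) -> int:
    """Return 1 if 'confident' articles outnumber 'cautious' ones for a CEO, else 0."""
    labels = Counter(classify_article(a) for a in articles)
    return int(labels["confident"] > labels["cautious"])


if __name__ == "__main__":
    sample = [
        "The CEO is confident and optimistic about the merger.",
        "Analysts call her cautious and conservative.",
        "Management expressed optimism.",
    ]
    print(press_indicator(sample))
```

A static rule like this ignores context beyond a simple negation check, which is the kind of limitation the abstract attributes to keyword methods and the motivation it gives for adding sentiment scoring and GenAI-based classification.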
Keywords
Artificial intelligence, Innovation, Data analysis, Decision making
Citation
Juárez, L. (2025) “Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence”. [Master's thesis. Universidad Torcuato Di Tella]. Repositorio Digital Universidad Torcuato Di Tella
https://repositorio.utdt.edu/handle/20.500.13098/13737
