Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence
| dc.contributor.advisor | Penas, Fabiana | |
| dc.contributor.advisor | Miranda Bront, Juan José | |
| dc.contributor.author | Juárez, Lucio Ignacio | |
| dc.date.accessioned | 2025-10-21T22:22:07Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | This thesis presents the design and implementation of a data-intensive framework to estimate CEO overconfidence by integrating multiple data sources and analytical methods. Drawing from traditional sentiment-based approaches and recent advances in generative artificial intelligence (GenAI), the project develops a dynamic and scalable index to classify CEO behavior based on press articles, structured financial disclosures, and contextual analysis. The research contributes both conceptually and practically by replicating existing overconfidence indicators—such as the Conf(Press) index—and extending them through natural language processing techniques that account for nuance, context, and industry-specific factors. The methodology combines structured data acquisition from sources like ProQuest’s TDM Studio, EBSCO, and The New York Times with sentiment scoring (VADER), keyword-based classifiers, and a GenAI-powered prompt framework. By applying these techniques to over 7,000 curated CEO-related articles, the thesis constructs a CEO Overconfidence Index that enables comparative analysis across sectors, particularly between innovation-driven and traditional industries. The resulting data product captures how overconfidence varies over time and in response to events, revealing both the limitations of static keyword methods and the added value of contextual AI models. Ultimately, this work contributes to the field of behavioral corporate finance by offering a novel pipeline to estimate executive psychological traits from textual data. It also provides a governance-relevant tool for investors, analysts, and policymakers to identify behavioral risk factors in leadership. While GenAI adds adaptability and interpretive depth, the thesis emphasizes that its primary value lies in the integration of classic and emerging methods into a unified, sector-aware overconfidence framework. | |
| dc.description.bibliographicCitation | Juárez, L. (2025) “Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence”. [Master's thesis, Universidad Torcuato Di Tella]. Repositorio Digital Universidad Torcuato Di Tella https://repositorio.utdt.edu/handle/20.500.13098/13737 | |
| dc.format.extent | 57 p. | |
| dc.format.medium | application/pdf | |
| dc.identifier.uri | https://repositorio.utdt.edu/handle/20.500.13098/13737 | |
| dc.language | eng | |
| dc.publisher | Universidad Torcuato Di Tella | |
| dc.relation.ispartof | Tesis y Trabajos Finales de la Universidad Torcuato Di Tella | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.rights.license | https://creativecommons.org/licenses/by-nc-sa/4.0/deed.es | |
| dc.subject | Inteligencia artificial | |
| dc.subject | Innovación | |
| dc.subject | Análisis de datos | |
| dc.subject | Toma de decisiones | |
| dc.subject | Artificial intelligence | |
| dc.subject | Innovation | |
| dc.subject | Data analysis | |
| dc.subject | Decision making | |
| dc.title | Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence | |
| dc.type | info:eu-repo/semantics/masterThesis | |
| dc.type.version | info:eu-repo/semantics/acceptedVersion | |
| organization.identifier.ror | https://ror.org/04sxme922 |
