Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence

dc.contributor.advisor: Penas, Fabiana
dc.contributor.advisor: Miranda Bront, Juan José
dc.contributor.author: Juárez, Lucio Ignacio
dc.date.accessioned: 2025-10-21T22:22:07Z
dc.date.issued: 2025
dc.description.abstract: This thesis presents the design and implementation of a data-intensive framework to estimate CEO overconfidence by integrating multiple data sources and analytical methods. Drawing from traditional sentiment-based approaches and recent advances in generative artificial intelligence (GenAI), the project develops a dynamic and scalable index to classify CEO behavior based on press articles, structured financial disclosures, and contextual analysis. The research contributes both conceptually and practically by replicating existing overconfidence indicators, such as the Conf(Press) index, and extending them through natural language processing techniques that account for nuance, context, and industry-specific factors. The methodology combines structured data acquisition from sources like ProQuest’s TDM Studio, EBSCO, and The New York Times with sentiment scoring (VADER), keyword-based classifiers, and a GenAI-powered prompt framework. By applying these techniques to over 7,000 curated CEO-related articles, the thesis constructs a CEO Overconfidence Index that enables comparative analysis across sectors, particularly between innovation-driven and traditional industries. The resulting data product captures how overconfidence varies over time and in response to events, revealing both the limitations of static keyword methods and the added value of contextual AI models. Ultimately, this work contributes to the field of behavioral corporate finance by offering a novel pipeline to estimate executive psychological traits from textual data. It also provides a governance-relevant tool for investors, analysts, and policymakers to identify behavioral risk factors in leadership. While GenAI adds adaptability and interpretive depth, the thesis emphasizes that its primary value lies in the integration of classic and emerging methods into a unified, sector-aware overconfidence framework.
dc.description.bibliographicCitation: Juárez, L. (2025) “Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence”. [Tesis de maestría. Universidad Torcuato Di Tella]. Repositorio Digital Universidad Torcuato Di Tella https://repositorio.utdt.edu/handle/20.500.13098/13737
dc.format.extent: 57 p.
dc.format.medium: application/pdf
dc.identifier.uri: https://repositorio.utdt.edu/handle/20.500.13098/13737
dc.language: eng
dc.publisher: Universidad Torcuato Di Tella
dc.relation.ispartof: Tesis y Trabajos Finales de la Universidad Torcuato Di Tella
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.license: https://creativecommons.org/licenses/by-nc-sa/4.0/deed.es
dc.subject: Inteligencia artificial
dc.subject: Innovación
dc.subject: Análisis de datos
dc.subject: Toma de decisiones
dc.subject: Artificial intelligence
dc.subject: Innovation
dc.subject: Data analysis
dc.subject: Decision making
dc.title: Integrating a Pipeline of Diverse Data Sources to estimate CEO Overconfidence
dc.type: info:eu-repo/semantics/masterThesis
dc.type.version: info:eu-repo/semantics/acceptedVersion
organization.identifier.ror: https://ror.org/04sxme922

Files

Original bundle

Name: MiM_Juarez_2025.pdf
Size: 1.86 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission