RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations

dc.contributor.author: Pérez Bianchi, Paula
dc.contributor.author: Anselmo, Sol
dc.contributor.author: Vásquez Currié, Malena
dc.contributor.author: Medel, Jimena
dc.contributor.author: Uelf, Estefanía
dc.contributor.author: Dos Santos, Alicia
dc.contributor.author: Buosi, Noemí
dc.contributor.author: Vargas, Rosana
dc.contributor.author: Reves Szemere, Juliana
dc.contributor.author: Volcovinsky, Bruno
dc.contributor.author: Massaroli, Hugo
dc.contributor.author: Andrade, Manuel
dc.contributor.author: Monastra, Alejandro
dc.contributor.author: Iarussi, Emmanuel
dc.contributor.author: Siless, Viviana
dc.contributor.author: Bruno, Luciana
dc.date.accessioned: 2026-02-04T20:15:54Z
dc.date.issued: 2025-12-09
dc.description.abstract: The Pap smear remains the primary screening test for cervical cancer in many low-resource regions, yet publicly available image datasets largely feature liquid-based preparations. We introduce RIVA, a high-resolution collection of 959 conventional-smear images (1024 × 1024 px) scanned at 40× magnification, sourced from 115 patients. To ensure label quality, each image was annotated by up to four independent medical professionals, with 42% of the images reviewed by all four, resulting in 26,158 annotations based on the Bethesda classification. Annotations provide the coordinates of nuclei and the classification labels assigned by up to four annotators. The dataset includes 15,949 unique cells across five (pre)cancerous types (SCC, HSIL, ASCH, LSIL, ASCUS) and three non-lesion categories (NILM, ENDO, INFL). These four-expert annotations not only give RIVA a consensus-driven ground truth for robust AI training but also enable inter-annotator consistency analysis: agreement rates reach 94% for lesion vs. non-lesion and 74% across the full eight-category Bethesda scheme.
dc.description.bibliographicCitation: Pérez Bianchi, P., Anselmo, S., Vásquez Currié, M. et al. RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations. Sci Data 12, 1991 (2025). https://doi.org/10.1038/s41597-025-06280-2
dc.format.extent: 7 p.
dc.format.medium: application/pdf
dc.identifier.uri: https://doi.org/10.1038/s41597-025-06280-2
dc.identifier.uri: https://repositorio.utdt.edu/handle/20.500.13098/14046
dc.language: eng
dc.relation.ispartof: Scientific Data (e-ISSN: 2052-4463)
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.license: https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
dc.subject: Medicina Preventiva
dc.subject: Tecnología médica
dc.subject: Preventive Medicine
dc.subject: Medical technology
dc.subject.keyword: Cancer Screening
dc.subject.keyword: Image Processing
dc.subject.keyword: Pathology
dc.title: RIVA: An Image Dataset of Conventional Pap Smear Cytology with Multiple Independent Annotations
dc.type: info:eu-repo/semantics/workingPaper
dc.type.version: info:eu-repo/semantics/publishedVersion
organization.identifier.ror: https://ror.org/04sxme922

Files

Original bundle

Name: Scientific Data_Andrade, Iarussi; Siless_2025.pdf
Size: 2.54 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission