dc.date.accessioned 2024-12-18T11:54:04Z
dc.date.available 2024-12-18T11:54:04Z
dc.date.issued 2024-11-18
dc.identifier.uri http://sedici.unlp.edu.ar/handle/10915/175050
dc.description.abstract This article extends various automatic text analysis tasks from previous works by applying natural language processing techniques to a corpus of Latin texts from the 1st century BC and 1st century AD. The motivation behind this work is to delve into and understand a historical literary trend revolving around the themes of love, spanning from antiquity through to the medieval period. The analyzed authors include Gaius Valerius Catullus, Albius Tibullus, and Sextus Propertius, representing the literary movement of the neoterics, and Publius Vergilius Maro and Marcus Annaeus Lucanus, epic poets with distinct styles, serving as control samples. Unlike previous works, various corrections were added to the preprocessing tasks, including improved word tokenization with enclitics and handling of orthographic variances. For the clustering tasks, the K-Means method and the Silhouette Score were used to determine the optimal cluster sizes. Using these optimal clusters as labels, decision trees were trained for each range of n-grams, aiming to identify features with the highest Information Gain and Information Gain Ratio. The trees were trained based on the criterion of Entropy, and calculations of Feature Importance were performed. In this study, we focused on detailing the classification results and features extracted by the decision trees, based on the best Silhouette scores obtained and the Information Gain. We examined whether the words or parts of words with classificatory potential identified in the process matched the findings from previous exploratory tasks performed using other techniques. en
dc.language es es
dc.subject Augustan love poets es
dc.subject Document Clustering es
dc.subject K Means es
dc.subject Silhouette Coefficient es
dc.subject Decision Trees es
dc.subject Feature Importance es
dc.subject Information Gain Ratio es
dc.title Clustering Tasks and Decision Trees with Augustan Love Poets: Cohesion and Separation in Feature Importance Extraction en
dc.type Article es
sedici.identifier.issn 1613-0073 es
sedici.creator.person Nusch, Carlos Javier es
sedici.creator.person Del Rio Riande, María Gimena es
sedici.creator.person Cagnina, Leticia es
sedici.creator.person Errecalde, Marcelo Luis es
sedici.creator.person Antonelli, Rubén Leandro es
sedici.subject.materias Computer Science es
sedici.subject.materias Humanities es
sedici.description.fulltext true es
mods.originInfo.place Dirección PREBI-SEDICI es
sedici.subtype Article es
sedici.rights.license Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
sedici.rights.uri http://creativecommons.org/licenses/by-nc-sa/4.0/
sedici.relation.event Computational Humanities Research Conference (Aarhus, 2024) es
sedici.relation.journalTitle CEUR Workshop Proceedings es
sedici.relation.journalVolumeAndIssue vol. 3834 es
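
The abstract mentions ranking features by Information Gain and Information Gain Ratio, with cluster assignments used as class labels. The following is a minimal sketch of those two measures for a single presence/absence feature. It is not the authors' implementation: the function names and the toy data (`clusters`, `has_ngram_a`, `has_ngram_b`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting `labels` on `feature_values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the label subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature_values, labels) / split_info


# Hypothetical toy data: cluster labels for six text chunks, and whether
# two made-up n-grams occur (1) or not (0) in each chunk.
clusters = [0, 0, 0, 1, 1, 1]
has_ngram_a = [1, 1, 1, 0, 0, 0]  # separates the two clusters perfectly
has_ngram_b = [1, 0, 1, 0, 1, 0]  # almost unrelated to the clusters

for name, feature in [("a", has_ngram_a), ("b", has_ngram_b)]:
    print(name, information_gain(feature, clusters), gain_ratio(feature, clusters))
```

In this toy case the first feature reaches the maximum gain of 1 bit, while the second scores close to zero, which is the kind of contrast a feature ranking relies on.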


Except where explicitly stated otherwise, this item is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.