This study introduces an explainable Artificial Intelligence (XAI) framework that couples legal-domain NLP with Structural Topic Modeling (STM) and WordNet semantic graphs to rigorously analyze over 1,900 GDPR enforcement decision summaries from a public dataset. Our methodology focuses on demonstrating the pipeline's validity with respect to manual analyses by inspecting the results of four well-known research questions: (1) cross-country fine distribution disparities (automated metadata extraction); (2) the violation severity-fine amount relationship (keyness and semantic analysis); (3) structural text patterns (network analysis and STM); and (4) prevalent enforcement triggers (topic prevalence modeling). The pipeline's validity is underscored by its ability to replicate key findings from previous manual analyses while enabling a more nuanced exploration of GDPR enforcement trends. Our results confirm significant disparities in enforcement across EU member states and reveal that monetary penalties do not consistently correlate with violation severity. Specifically, serious infringements, particularly those involving video surveillance, frequently result in low-value fines, especially when committed by individuals or smaller entities; a substantial proportion of severe violations is thus attributable to smaller actors. Methodologically, the framework's ability to quickly replicate such well-known patterns, together with its transparency and reproducibility, establishes its potential as a scalable tool for transparent and explainable GDPR enforcement analytics.
Explainable AI
XAI
Data protection
Privacy
GDPR fines
Topic modeling
Semantic analysis
NLP
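As an illustrative sketch of the keyness analysis mentioned in research question (2), a log-likelihood (Dunning's G2) keyness score can be computed per term between a target and a reference corpus. This is a minimal stand-in, not the study's actual pipeline, and all term counts below are hypothetical:

```python
import math
from collections import Counter

def keyness_llr(target_counts, reference_counts):
    """Log-likelihood ratio (Dunning's G2) keyness of each term in a
    target corpus versus a reference corpus; higher means more
    distinctive of the target."""
    n1 = sum(target_counts.values())
    n2 = sum(reference_counts.values())
    scores = {}
    for term, a in target_counts.items():
        b = reference_counts.get(term, 0)
        # expected frequencies under the null hypothesis of no difference
        e1 = n1 * (a + b) / (n1 + n2)
        e2 = n2 * (a + b) / (n1 + n2)
        g2 = 2 * (a * math.log(a / e1) + (b * math.log(b / e2) if b else 0))
        scores[term] = g2
    return scores

# Hypothetical counts: "severe violation" summaries vs. the rest
target = Counter({"surveillance": 40, "fine": 30, "consent": 5})
reference = Counter({"surveillance": 10, "fine": 35, "consent": 25})
scores = keyness_llr(target, reference)
print(max(scores, key=scores.get))  # "surveillance" is most distinctive
```

Terms overrepresented in the target subcorpus surface with high G2 scores, which is how keyness methods link vocabulary such as "surveillance" to a class of decisions.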
This work presents a model for analyzing the study data available in LMS platforms, specifically designed to detect potential critical issues as a functional indicator of whether training objectives will be achieved and the course completed. The system illustrates how statistical indicators and predictive measures can be an effective tool for the early identification of possible critical issues in training outcomes, as well as of design and organizational inconsistencies that may undermine the effectiveness of the training system made available. Our work explains how adopting a data analysis model in training environments provides the tutoring system with adequate information on potential critical issues, enabling targeted interventions on participants to prevent the risk of training ineffectiveness. At the same time, the model assesses the overall quality of the courses offered through a data exploration perspective that starts from the learning experience and leverages the data already present in LMS platforms.
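A minimal sketch of the kind of statistical early-warning indicator described above, assuming nothing about the actual LMS data model: flag learners whose aggregate activity falls well below the cohort mean (all names and counts hypothetical):

```python
from statistics import mean, pstdev

def flag_at_risk(activity, z_threshold=-1.0):
    """Flag learners whose total LMS activity is far below the cohort
    mean (z-score under the threshold) as candidates for tutor
    intervention."""
    totals = {s: sum(weeks) for s, weeks in activity.items()}
    mu = mean(totals.values())
    sigma = pstdev(totals.values()) or 1.0  # avoid division by zero
    return sorted(s for s, t in totals.items() if (t - mu) / sigma < z_threshold)

# Hypothetical weekly interaction counts per learner
activity = {
    "s01": [12, 10, 11, 9],
    "s02": [11, 12, 10, 12],
    "s03": [2, 1, 0, 1],   # persistently low activity
    "s04": [10, 9, 12, 11],
}
print(flag_at_risk(activity))  # ['s03']
```

A production indicator would of course combine several signals (logins, submissions, forum activity, assessment scores), but the shape of the computation is the same: compare each participant against the cohort and surface outliers early.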
The analysis of medical discourse entails examining the modalities, far from unbiased, by which hypotheses and results are laid out in the dissemination of findings in scientific publications, placing different emphases on the background, relevance, robustness, and assumptions that the audience is expected to take for granted. While this concept is extensively studied in socio-anthropology, it remains generally overlooked within the scientific community conducting the research. Yet, analyzing the discourse is crucial for several reasons: to frame policies that take into account an appropriately broad range of medical opportunities, to avoid overlooking promising but less-travelled paths, to grasp different types of representations of diseases, therapies, patients, and other stakeholders, and to understand how these very terms are conditioned by time, culture, and so on. While socio-anthropologists traditionally use manual curation methods, automated approaches such as topic modeling offer a complementary way to explore the vast and ever-growing body of medical literature. In this work, we propose a complementary analysis of the medical discourse surrounding the therapies offered for rheumatoid arthritis, using topic modeling and large language model-based emotion and sentiment analysis.
medical discourse; large language models; topic modeling; rheumatoid arthritis; disease modifying anti-rheumatic drug; physical therapies; vagus nerve stimulation.
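To make the sentiment component concrete, here is a deliberately simplistic lexicon-based scorer, a toy stand-in for the LLM-based emotion and sentiment analysis the abstract describes (the lexicons and sentences are invented for illustration):

```python
# Hypothetical micro-lexicons for clinical discourse; a real analysis
# would use an LLM or a validated sentiment lexicon instead.
POSITIVE = {"improvement", "remission", "effective", "safe", "benefit"}
NEGATIVE = {"pain", "flare", "adverse", "toxicity", "failure"}

def sentiment(text):
    """Classify a sentence by counting positive vs. negative cue words."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("patients reported sustained remission and clear benefit"))
# -> positive
```

The point of the sketch is only the interface: each passage of the corpus is mapped to a polarity label, which can then be aggregated per therapy or per topic.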
Topic Modelling (TM) is a widely adopted generative approach used to infer the thematic organization of text corpora. When document-level covariate information is available, so-called Structural Topic Modelling (STM) is the state-of-the-art approach to embed this information in the topic mining algorithm. Usually, TM algorithms rely on unigrams as the basic text generation unit, whereas the quality and intelligibility of the identified topics would significantly benefit from the detection and use of topical phrasemes. Following on from previous research, in this paper we propose the first iterative algorithm to extend STM with n-grams, and we test our solution on textual data collected from four well-known Tor drug marketplaces. Significantly, we employ an STM-guided n-gram selection process, so that topic-specific phrasemes can be identified regardless of their global relevance in the corpus. Our experiments show that enriching the dictionary with selected n-grams improves the usability of STM, allowing the discovery of key information hidden in an apparently "mono-thematic" dataset.
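The phraseme-detection step can be illustrated with a count-weighted PMI collocation scorer, a simplified stand-in for the STM-guided selection described above (the token stream below is invented; real candidates would be scored per topic, not globally):

```python
import math
from collections import Counter

def score_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by count-weighted pointwise mutual
    information; recurring fixed phrases outrank rare coincidences.
    High-scoring pairs are candidate phrasemes for the dictionary."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue  # drop one-off pairs
        scores[(w1, w2)] = c * math.log((c * n) / (unigrams[w1] * unigrams[w2]))
    return scores

# Hypothetical marketplace-style token stream
tokens = ("free shipping guaranteed buy now free shipping worldwide "
          "buy now free shipping").split()
scores = score_bigrams(tokens)
best = max(scores, key=scores.get)
print(best)  # ('free', 'shipping') recurs as a fixed phrase
```

Once selected, such pairs would be merged into single dictionary entries (e.g. `free_shipping`) before the next STM iteration, which is the spirit of the iterative enrichment the paper proposes.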
In this paper we introduce the Mathematical Desk for Italian Industry, a project on applied and industrial mathematics developed by a team of researchers from the Italian National Research Council in collaboration with two major Italian associations for applied mathematics, SIMAI and AIRO. The aim of this paper is to clarify the motivations for this project and to present an overview of the activities, context, and organization of the Mathematical Desk, whose mission is to build a concrete bridge of common interests between the Italian scientific community of applied mathematics and the world of Italian enterprise. Some final considerations on the strategy for the future development of the Mathematical Desk project complete the paper.