Decoding dataset access patterns in high-energy physics through embedding-driven density clustering
Article
Citation information for this article was obtained from Scopus.
The article was published in a journal indexed in Web of Science and/or Scopus.
Date of the last search for the article in external sources: 23 January 2026.
Abstract: High-energy physics (HEP) research relies on the analysis of large datasets representing various physical processes. In most cases, datasets are accessed in groups rather than as individual files, because they are often interrelated and collectively contribute to data production and analysis. However, the relevance of these dataset groups changes over time and requires continuous assessment to optimize the use of storage and computational resources. Modeling dataset access patterns as time series makes it possible to apply clustering algorithms to detect groups of datasets that are accessed together. In this study, we explore the application of embedding-driven density-based clustering techniques to HEP dataset access patterns collected over multiple years. Our results demonstrate the advantages of embedding-driven density clustering for identifying meaningful dataset categories, providing actionable insights for intelligent data management in large-scale scientific computing environments.
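
The general idea in the abstract (access time series, then an embedding, then density-based clustering) can be illustrated with a minimal sketch. The abstract does not name the embedding model or the clustering algorithm, so everything below is an assumption made for illustration: a PCA projection stands in for the embedding, a small hand-written DBSCAN stands in for the density clustering, the weekly access counts are synthetic, and the names `embed_pca`, `dbscan`, `bump`, `eps`, and `min_samples` are hypothetical rather than taken from the article.

```python
import numpy as np


def embed_pca(series, n_components=2):
    """Stand-in embedding (assumption): z-score each series, then project
    onto the top principal components obtained via SVD."""
    x = np.asarray(series, dtype=float)
    # Normalize per dataset so clustering reflects the shape of the access
    # pattern rather than its absolute volume.
    x = (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + 1e-12)
    x = x - x.mean(axis=0, keepdims=True)  # center columns for PCA
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue  # not a core point; stays noise unless reached later
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # core or border point joins the cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # expand only through core points
        cluster += 1
    return labels


# Synthetic weekly access counts (hypothetical data): five datasets accessed
# mostly around week 10, five around week 40, and one lone dataset at week 25.
rng = np.random.default_rng(0)
weeks = np.arange(52)


def bump(center):
    return 100.0 * np.exp(-((weeks - center) ** 2) / (2 * 4.0 ** 2))


series = np.array(
    [bump(10) + rng.normal(0, 1, 52) for _ in range(5)]
    + [bump(40) + rng.normal(0, 1, 52) for _ in range(5)]
    + [bump(25) + rng.normal(0, 1, 52)]
)

labels = dbscan(embed_pca(series), eps=1.0, min_samples=3)
print(labels)
```

In this toy setup the two groups of co-accessed datasets should come out as separate clusters and the lone dataset as noise. On real access logs, the embedding method and the density parameters would need to be chosen and validated against the data.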