Number of the records: 1  

Understanding the evolution of conditions data access through Frontier for the ATLAS Experiment

    SYSNO: ASEP0522514
    Document Type: C - Proceedings Paper (int. conf.)
    R&D Document Type: Conference Paper
    Title: Understanding the evolution of conditions data access through Frontier for the ATLAS Experiment
    Author(s): Svatoš, Michal (FZU-D)
    De Salvo, A. (IT)
    Dewhurst, A. (GB)
    Vamvakopoulos, E. (FR)
    Bahilo Lozano, J. (CH)
    Ozturk, N. (US)
    Sánchez, J. (ES)
    Dykstra, D. (US)
    Number of authors: 8
    Article number: 03020
    Source Title: EPJ Web of Conferences, 214. - Les Ulis : EDP Sciences, 2019 / Forti A. ; Betev L. ; Litmaath M. ; Smirnova O. ; Hristov P. - ISSN 2100-014X
    Pages: 1-8
    Number of pages: 8
    Publication form: Online - E
    Action: International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018) /23./
    Event date: 09.07.2018 - 13.07.2018
    Event location: Sofia
    Country: BG - Bulgaria
    Event type: WRD
    Language: eng - English
    Country: FR - France
    Keywords: ATLAS ; LHC Run 2 ; Squid ; experiment
    Subject RIV: JD - Computer Applications, Robotics
    OECD category: Automation and control systems
    R&D Projects: LM2015058 GA MŠMT - Ministry of Education, Youth and Sports (MEYS)
    EF16_013/0001404 GA MŠMT - Ministry of Education, Youth and Sports (MEYS)
    Institutional support: FZU-D - RVO:68378271
    DOI: 10.1051/epjconf/201921403020
    Annotation: All ATLAS computing sites use Squid web proxies to cache the data, greatly reducing the load on the Frontier servers and the databases. One feature of the Frontier client is that, in the event of failure, it retries with different services. While this allows transient errors and scheduled maintenance to be handled transparently, it does open the system up to cascading failures if the load is high enough. Throughout LHC Run 2 there has been an ever-increasing demand on the Frontier service, and there have been multiple incidents where parts of the service failed due to high load. A significant improvement in the monitoring of the Frontier service was therefore required. The monitoring was needed both to identify problematic tasks, which could then be killed or throttled, and to identify failing site services, as the consequences of a cascading failure are much more severe.
    Workplace: Institute of Physics
    Contact: Kristina Potocká, potocka@fzu.cz, Tel.: 220 318 579
    Year of Publishing: 2020
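
The annotation says that the Frontier client retries with different services after a failure, and that this can lead to cascading failures under high load. The following is a minimal Python sketch of that general retry-and-failover pattern, not the Frontier client's actual code or API. The names `fetch_with_failover`, `make_capacity_limited_send`, `AllServersFailed` and the service names `squid-a` and `squid-b` are hypothetical, and the fixed per-service capacity is a deliberately crude stand-in for real overload behaviour.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class AllServersFailed(Exception):
    """Raised when every service in the failover list has been tried."""


def fetch_with_failover(
    query: str,
    servers: Sequence[str],
    send: Callable[[str, str], bytes],
) -> Tuple[bytes, List[str]]:
    """Try each service in order and return the first successful reply.

    Also returns the list of services that were contacted, so that a
    monitoring layer could see how far a request had to fail over.
    """
    tried: List[str] = []
    for server in servers:
        tried.append(server)
        try:
            return send(server, query), tried
        except ConnectionError:
            # A failure here is hidden from the caller: the request simply
            # moves on to the next service, adding to that service's load.
            continue
    raise AllServersFailed(f"no service answered {query!r}; tried {tried}")


def make_capacity_limited_send(
    capacity: Dict[str, int],
) -> Tuple[Callable[[str, str], bytes], Dict[str, int]]:
    """Build a fake transport (assumption: illustration only).

    Each service accepts at most `capacity[name]` requests in total and
    refuses every request after that, imitating a service that has
    collapsed under load.
    """
    load: Dict[str, int] = {name: 0 for name in capacity}

    def send(server: str, query: str) -> bytes:
        load[server] += 1
        if load[server] > capacity[server]:
            raise ConnectionError(f"{server} overloaded")
        return f"{server}:{query}".encode()

    return send, load


if __name__ == "__main__":
    # squid-a is already down, so all of its traffic lands on squid-b.
    send, load = make_capacity_limited_send({"squid-a": 0, "squid-b": 2})
    for i in range(3):
        try:
            print(fetch_with_failover(f"q{i}", ["squid-a", "squid-b"], send))
        except AllServersFailed as exc:
            print("cascade:", exc)
    print(load)
```

In this toy setup the failed first service pushes its whole load onto the second, which then exceeds its own capacity. Returning the list of contacted services alongside the reply is one way such failovers could be made visible to monitoring.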