Number of records: 1

Distributed data processing in High Energy Physics

  1.
    SYSNO: ASEP0504676
    Document Type: D - Thesis
    R&D Document Type: The record was not marked in the RIV
    Title: Distributed data processing in High Energy Physics
    Author(s): Makatun, Dzmitry (UJF-V)
    Number of authors: 1
    Issue data: CTU, 2018
    Number of pages: 196
    Publication form: Print - P
    Language: eng - English
    Country: CZ - Czech Republic
    Keywords: distributed computing; large scale computing; grid; data intensive applications; load balancing; job scheduling; planning; network flow; data production; big data
    Subject RIV: BG - Nuclear, Atomic and Molecular Physics, Colliders
    OECD category: Nuclear physics
    Institutional support: UJF-V - RVO:61389005
    Annotation: In the era of big data, the scale of computations and the amount of allocated resources continue to grow rapidly. Large organizations operate computing facilities consisting of tens of thousands of machines and process petabytes of data. Considerable effort has recently been devoted to optimizing the design of such computer clusters, their resource management, and the corresponding computing models, including data access and job scheduling. Scientific computing (e.g. High Energy and Nuclear Physics (HENP), astrophysics, geophysics, genome studies) is at the forefront of big-data advancement. Due to the scale of the computations, these fields rely on the aggregated resources of many computational facilities distributed across the globe. Those facilities are owned by different institutions and include grid, cloud and other opportunistic resources. Orchestrating massive computations in such a heterogeneous and dynamic infrastructure remains challenging and offers many opportunities for optimization. One of the essential types of computation in HENP is distributed data production, where petabytes of raw files from a single source have to be processed once (per production campaign) using thousands of CPUs at distant locations, and the output has to be transferred back to that source. Similar workflows can be found in other distributed data-intensive applications. The distribution of data over a large system does not necessarily match the distribution of storage, network and CPU capacity; bottlenecks may therefore appear and lead to increased latency and degraded performance. The problems of job scheduling, network stream scheduling and data placement are interdependent, but combined into a single optimization problem they become computationally intractable in the general case. (An illustrative network-flow sketch follows this record.)
    Workplace: Nuclear Physics Institute
    Contact: Markéta Sommerová, sommerova@ujf.cas.cz, Tel.: 266 173 228
    Year of Publishing: 2020
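
The record's keywords name network flow as a planning tool for distributed data production. As a minimal, purely illustrative sketch (not the thesis's actual planner), one time step of such a plan can be cast as a maximum-flow problem: raw data flows from central storage over bandwidth-limited links to remote sites whose CPU farms have bounded processing rates. All site names and capacities below are hypothetical, and Python with the networkx library is assumed.

    # Illustrative sketch only: one planning step of distributed data
    # production modeled as maximum flow. Site names, link bandwidths
    # and CPU throughputs are hypothetical numbers for the example.
    import networkx as nx

    G = nx.DiGraph()

    # Central storage holding the raw files is the flow source.
    # Edges to remote sites carry network bandwidth (GB per hour).
    G.add_edge("storage", "site_A", capacity=40)
    G.add_edge("storage", "site_B", capacity=25)

    # Each site's CPU farm is an edge whose capacity is the amount
    # of data the site can process per hour.
    G.add_edge("site_A", "processed", capacity=30)
    G.add_edge("site_B", "processed", capacity=35)

    # The max-flow value is the aggregate processing rate achievable;
    # the flow dict gives the per-site transfer plan for this step.
    rate, flow = nx.maximum_flow(G, "storage", "processed")
    print(f"achievable processing rate: {rate} GB/h")
    print(flow["storage"])

In this toy instance the achievable rate is 55 GB/h: site_A is limited by its CPUs (30) and site_B by its network link (25), so the planner would ship 30 GB/h to site_A and 25 GB/h to site_B. Mismatches between data, network and CPU capacity of the kind the annotation describes show up here as saturated edges.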