Keyword: distributed
Paper Title Other Keywords Page
TUPHA011 A New Distributed Control System for the Consolidation of the CERN Tertiary Infrastructures ion, controls, interface, monitoring 390
  • L. Scibile, C. Martel, P. Villeton Pachot
    CERN, Geneva, Switzerland
  The operation of the CERN tertiary infrastructures is carried out via a series of control systems distributed over the CERN sites. The scope comprises 260 buildings, 2 large heating plants with a 27 km heating network and 200 radiator circuits, 500 air handling units, 52 chillers, 300 split systems, 3000 electric boards and 100k light points. As part of the infrastructure consolidation, CERN is migrating and extending the old control systems, dating back to the 1970s, 80s and 90s, to a new simplified yet innovative distributed control system aimed at minimizing the programming and implementation effort, standardizing equipment and methods, and reducing lifecycle costs. This new methodology allows rapid development and simplified integration of the newly controlled infrastructure processes. It is based on open-standards PLC technology that interfaces easily to a large range of proprietary systems. Local and remote operation and monitoring are carried out seamlessly with Web HMIs that can be accessed via PC, touchpad or mobile device. This paper reports on the progress and future challenges of this new control system.
Poster TUPHA011 [1.662 MB]
TUPHA013 Accelerator Fault Tracking at CERN ion, operation, controls, target 397
  • C. Roderick, L. Burdzanowski, D. Martin Anido, S. Pade, P. Wilk
    CERN, Geneva, Switzerland
  CERN's Accelerator Fault Tracking (AFT) system aims to help answer questions such as: "Why are we not doing physics when we should be?" and "What can we do to increase machine availability?" Faults have been tracked for many years, using numerous, diverse, distributed and unrelated systems. As a result, and despite a lot of effort, it has been difficult to get a clear and consistent overview of what is going on, where the problems are, how long they last and what their impact is. This is particularly true for the LHC, where faults may induce long recovery times after being fixed. The AFT project was launched in February 2014 as a collaboration between the Controls and Operations groups, with stakeholders from the LHC Availability Working Group (AWG). The AFT system has been used successfully in LHC operation since 2015, attracting considerable attention and a growing user community. In 2017 its scope was extended to cover the entire Injector Complex. This paper describes the AFT system and the way it is used in terms of architecture, features, user communities, workflows and added value for the organisation.
Poster TUPHA013 [3.835 MB]
THPHA036 Multi-Criteria Partitioning on Distributed File Systems for Efficient Accelerator Data Analysis and Performance Optimization ion, operation, data-analysis, framework 1436
  • S. Boychenko, M.A. Galilée, J.C. Garnier, M. Zerlauth
    CERN, Geneva, Switzerland
  • M. Zenha-Rela
    University of Coimbra, Coimbra, Portugal
  Since the introduction of the map-reduce paradigm, relational databases have been increasingly replaced by more efficient and scalable architectures, in particular in environments where a single query may process terabytes or even petabytes of data. The same tendency is observed at CERN, where the archiving systems for operational accelerator data are already working well beyond their initially provisioned capacity. Most modern data analysis frameworks are not optimized for heterogeneous workloads such as those that arise in the dynamic environment of one of the world's largest accelerator complexes. This contribution presents Mixed Partitioning Scheme Replication (MPSR), a solution that outperforms conventional distributed processing configurations for almost the entire phase-space of data analysis use cases and performance optimization challenges arising during the commissioning and operational phases of an accelerator. We present the results of a statistical analysis as well as benchmarks of the implemented prototype, which allow us to define the characteristics of the proposed approach and to confirm the expected performance gains.
Poster THPHA036 [0.280 MB]
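The core idea behind MPSR as described in the abstract can be sketched briefly: the same data is replicated once per partitioning criterion, so a query is routed to the replica whose layout matches its access pattern. The sketch below is illustrative only (the record fields, criteria and routing are invented, not taken from the CERN prototype):

```python
# Illustrative sketch of mixed-partitioning replication: each replica holds the
# same records, partitioned by a different key, so a per-device query reads a
# single partition instead of scanning everything. All names are hypothetical.
from collections import defaultdict

def build_replicas(records, criteria):
    """Partition the same records once per criterion (e.g. by device, by hour)."""
    replicas = {}
    for name, key_fn in criteria.items():
        partitions = defaultdict(list)
        for rec in records:
            partitions[key_fn(rec)].append(rec)
        replicas[name] = dict(partitions)
    return replicas

records = [
    {"device": "magnet.1", "hour": 0, "value": 1.2},
    {"device": "magnet.1", "hour": 1, "value": 1.3},
    {"device": "rf.7", "hour": 0, "value": 0.4},
]
criteria = {
    "by_device": lambda r: r["device"],  # serves per-device history queries
    "by_hour": lambda r: r["hour"],      # serves time-window scans
}
replicas = build_replicas(records, criteria)
# A per-device query touches exactly one partition of the matching replica:
print(len(replicas["by_device"]["magnet.1"]))  # 2
```

The cost of this scheme is storage (one full copy per criterion), which is what the statistical analysis mentioned in the abstract would have to weigh against the query-time gains.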
THPHA043 Lightflow - a Lightweight, Distributed Workflow System ion, synchrotron, EPICS, experiment 1457
  • A. Moll, R. Clarken, P. Martin, S.T. Mudie
    SLSA-ANSTO, Clayton, Australia
  The Australian Synchrotron, located in Clayton, Melbourne, is one of Australia's most important pieces of research infrastructure. After more than 10 years of operation, the beamlines at the Australian Synchrotron are well established and the demand for automation of research tasks is growing. Such tasks routinely involve the reduction of TB-scale data, online (real-time) analysis of the recorded data to guide experiments, and fully automated data management workflows. To meet these demands, a generic, distributed workflow system was developed, based on well-established Python libraries and tools. The individual tasks of a workflow are arranged in a directed acyclic graph, and one or more such graphs form a workflow. Workers consume the tasks, allowing the processing of a workflow to scale horizontally. Data can flow between tasks, and a variety of specialised tasks are available. Lightflow has been released as open source on the Australian Synchrotron GitHub page.
Poster THPHA043 [0.582 MB]
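The DAG execution model described in the Lightflow abstract can be sketched with Python's standard library. This is not the Lightflow API; the task names and graph are invented, and real workers would run ready tasks in parallel rather than in a loop:

```python
# Minimal sketch of DAG-based workflow execution: a task becomes "ready" only
# once all of its upstream dependencies are done, which is what lets
# independent branches be consumed by parallel workers.
from graphlib import TopologicalSorter

def run_workflow(dag, tasks):
    """Execute tasks in dependency order; a real system hands ready tasks to workers."""
    ts = TopologicalSorter(dag)
    ts.prepare()
    results = []
    while ts.is_active():
        for name in ts.get_ready():  # these tasks could run concurrently
            results.append(tasks[name]())
            ts.done(name)
    return results

# Reduce raw data first, then analyse and archive it independently.
dag = {"analyse": {"reduce"}, "archive": {"reduce"}}
tasks = {
    "reduce": lambda: "reduced",
    "analyse": lambda: "analysed",
    "archive": lambda: "archived",
}
print(run_workflow(dag, tasks))
```

Because "analyse" and "archive" depend only on "reduce", they become ready together, which is the horizontal-scaling opportunity the abstract refers to.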
THPHA045 Packaging and High Availability for Distributed Control Systems ion, software, controls, framework 1465
  • M.A. Araya, L. Pizarro, H.H. von Brand
    UTFSM, Valparaíso, Chile
  Funding: Centro Científico Tecnológico de Valparaíso (CONICYT FB-0821) Advanced Center for Electrical and Electronic Engineering (CONICYT FB-0008)
The ALMA Common Software (ACS) is a distributed framework used for the control of astronomical observatories, which is built and deployed using roughly the same tools that were available at its design stage. Because of its shallow and rigid dependency management, the strong modularity principle of the framework cannot be exploited for packaging, installation and deployment. Moreover, life-cycle control of its components does not comply with standardized system-based mechanisms. These problems are shared by other instrument-based distributed systems. The high-availability requirements of modern projects, such as the Cherenkov Telescope Array, consequently tend to be implemented as new software features rather than with off-the-shelf, well-tested platform-based technologies. We present a general solution for high availability based strongly on system services and proper packaging. We use RPM packaging, oVirt and Docker as the infrastructure managers, Pacemaker as the software resource orchestrator, and Systemd for life-cycle process control. A prototype for ACS was developed to handle its services and containers.
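The abstract's approach of delegating life-cycle control to system services can be illustrated with a hypothetical systemd unit file. The service name, binary path and options below are invented for illustration and are not part of ACS:

```ini
# Hypothetical unit file: run a container manager as a supervised system
# service, so start/stop/restart ordering and crash recovery come from
# systemd instead of custom framework code.
[Unit]
Description=Example ACS-style container service (illustrative)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/acs-container --name example
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

A resource orchestrator such as Pacemaker can then treat the unit as a managed resource, which is the platform-based high availability the abstract argues for.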
THPHA109 Improving the Safety and Protective Automatic Actions of the CMS Electromagnetic Calorimeter Detector Control System ion, controls, detector, software 1639
  • R.J. Jiménez Estupinan, D.R.S. Di Calafiori, G. Dissertori, L. Djambazov, W. Lustermann, S. Zelepoukine
    ETH, Zurich, Switzerland
  • P. Adzic, P. Cirkovic, D. Jovanovic, P. Milenovic
    University of Belgrade, Belgrade, Serbia
  • S. Zelepoukine
    UW-Madison/PD, Madison, Wisconsin, USA
  The CMS ECAL Detector Control System (DCS) features several monitoring mechanisms that react and perform automatic actions based on pre-defined action matrices. The DCS is capable of early detection of anomalies inside the ECAL and in its off-detector support systems, triggering automatic actions to mitigate the impact of these events and prevent them from escalating to the safety system. The treatment of such events by the DCS allows for a faster recovery process, a better understanding of how issues develop, and, in most cases, actions with finer granularity than the safety system. This paper presents the details of the DCS automatic action mechanisms, as well as their evolution based on several years of CMS ECAL operations.
Poster THPHA109 [1.333 MB]
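The action-matrix concept from the CMS ECAL abstract can be sketched as a simple condition-to-action lookup. The condition names and actions below are invented for illustration and do not reflect the actual ECAL DCS configuration:

```python
# Illustrative sketch of a pre-defined action matrix: each detected condition
# maps to an automatic protective action, so the control system can react
# before the event escalates to the hardware safety system.
ACTION_MATRIX = {
    "cooling_flow_low": "switch_off_channel",
    "temperature_high": "reduce_bias_voltage",
    "humidity_high": "raise_alarm",
}

def react(condition):
    """Return the automatic action for a monitored condition, if one is defined."""
    return ACTION_MATRIX.get(condition, "notify_operator")

print(react("cooling_flow_low"))  # switch_off_channel
print(react("unknown_event"))     # notify_operator
```

The advantage over a hardwired safety system, as the abstract notes, is granularity: the matrix can target a single channel rather than cutting power to a whole partition.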