Keyword: data-analysis
Paper Title Other Keywords Page
TUCPA01 Data Analysis Support in Karabo at European XFEL ion, experiment, FEL, controls 245
  • H. Fangohr, M. Beg, V. Bondar, D. Boukhelef, S. Brockhauser, C. Danilevski, W. Ehsan, S.G. Esenov, G. Flucke, G. Giovanetti, D. Goeries, S. Hauf, B.C. Heisen, D.G. Hickin, D. Khakhulin, A. Klimovskaia, M. Kuster, P.M. Lang, L.G. Maia, L. Mekinda, T. Michelat, A. Parenti, G. Previtali, H. Santos, A. Silenzi, J. Sztuk-Dambietz, J. Szuba, M. Teichmann, K. Weger, J. Wiggins, K. Wrona, C. Xu
    XFEL.EU, Schenefeld, Germany
  • S. Aplin, A. Barty, M. Kuhn, V. Mariani
    CFEL, Hamburg, Germany
  • T. Kluyver
    University of Southampton, Southampton, United Kingdom
  We describe the data analysis structure that is integrated into the Karabo framework [1] to support scientific experiments and data analysis at European XFEL GmbH. The photon science experiments have a range of data analysis requirements, including online (i.e. near real-time during the actual measurement) and offline data analysis. The Karabo data analysis framework supports automatic execution of data analysis for routine tasks, complex experiment protocols that feed data analysis results back into instrument control, and integration of external applications. The online data analysis is carried out using distributed and accelerator hardware (such as GPUs) where required to balance load and achieve near real-time data analysis throughput. Analysis routines provided by Karabo are implemented in C++ and Python, and make use of established scientific libraries. The XFEL control and analysis software team collaborates with users to integrate experiment-specific analysis codes, protocols and requirements into this framework, and to make it available for the experiments and subsequent offline data analysis.
[1] Heisen et al. (2013) "Karabo: An Integrated Software Framework Combining Control, Data Management, and Scientific Computing Tasks". Proc. of 14th ICALEPCS 2013, Melbourne, Australia (p. FRCOAAB02)
Slides TUCPA01 [10.507 MB]
THPHA036 Multi-Criteria Partitioning on Distributed File Systems for Efficient Accelerator Data Analysis and Performance Optimization ion, operation, distributed, framework 1436
  • S. Boychenko, M.A. Galilée, J.C. Garnier, M. Zerlauth
    CERN, Geneva, Switzerland
  • M. Zenha-Rela
    University of Coimbra, Coimbra, Portugal
  Since the introduction of the map-reduce paradigm, relational databases are increasingly being replaced by more efficient and scalable architectures, in particular in environments where a single query processes TBytes or even PBytes of data. The same tendency is observed at CERN, where data archiving systems for operational accelerator data are already working well beyond their initially provisioned capacity. Most modern data analysis frameworks are not optimized for heterogeneous workloads such as those that arise in the dynamic environment of one of the world's largest accelerator complexes. This contribution presents Mixed Partitioning Scheme Replication (MPSR) as a solution that will outperform conventional distributed processing environment configurations for almost the entire phase-space of data analysis use cases and performance optimization challenges as they arise during the commissioning and operational phases of an accelerator. We will present results of a statistical analysis as well as the benchmarking of the implemented prototype, which allow us to define the characteristics of the proposed approach and to confirm the expected performance gains.
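  The abstract does not spell out how MPSR works. As a rough illustration only, the sketch below assumes one reading of the name: each replica of the data set is partitioned by a different criterion, and a query is routed to the replica whose partitioning matches its filter. The record fields, `partition_by` and `query` are hypothetical and are not taken from the paper or from any CERN system.

```python
from collections import defaultdict

# Hypothetical archive records; field names are assumptions for illustration.
records = [
    {"device": "magnet_1", "day": "2017-10-01", "value": 1.0},
    {"device": "magnet_1", "day": "2017-10-02", "value": 2.0},
    {"device": "magnet_2", "day": "2017-10-01", "value": 3.0},
    {"device": "magnet_2", "day": "2017-10-02", "value": 4.0},
]


def partition_by(rows, key):
    """Group rows into partitions keyed by the value of one attribute."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


# Assumption: one full replica per partitioning criterion.
replicas = {key: partition_by(records, key) for key in ("device", "day")}


def query(replicas, **filters):
    """Answer an equality query, preferring a replica partitioned on a filtered
    attribute so that only one partition has to be scanned.

    Returns the matching rows and the number of rows that were scanned.
    """
    for key, parts in replicas.items():
        if key in filters:
            candidates = parts.get(filters[key], [])
            break
    else:
        # No replica matches the filter: fall back to scanning one replica fully.
        first = next(iter(replicas.values()))
        candidates = [row for part in first.values() for row in part]
    hits = [r for r in candidates if all(r[k] == v for k, v in filters.items())]
    return hits, len(candidates)


by_device, scanned_device = query(replicas, device="magnet_1")
by_day, scanned_day = query(replicas, day="2017-10-02")
by_value, scanned_value = query(replicas, value=3.0)
```

  In this toy setup, queries on `device` or `day` each scan two rows instead of four, while a query on `value` falls back to a full scan. The price is storage: every criterion needs its own complete copy of the data.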
Poster THPHA036 [0.280 MB]
THPHA186 Parallel Execution of Sequential Data Analysis ion, GUI, GPU, controls 1877
  • J.F.J. Murari, K. Klementiev
    MAX IV Laboratory, Lund University, Lund, Sweden
  The Parallel Execution of Sequential Data Analysis (ParSeq) software has been developed to work on large data sets of thousands of spectra of a thousand points each. The main goal of this tool is to perform spectroscopy analysis without delays on the large amount of data that will be generated at the Balder beamline at MAX IV *. ParSeq was developed using Python and PyQt and can be operated via scripts or a graphical user interface (GUI). The pipeline consists of nodes and transforms. Each node generally has a common group of components: data manager (which also serves as legend), data combiner, metadata viewer, transform dialog, help panel and, as the main element, a plot window (from the silx library **). The transforms connect nodes, applying the respective parameters to the active data. It is also possible to create cross-data linear combinations (e.g. averaging, RMS or PCA) and propagate them downstream. Calculations will be done with parallel execution on GPU. The GUI is very flexible and user-friendly, containing splitters, dock widgets, colormaps and undo/redo options. These features are missing in other analysis platforms, which justifies the creation of ParSeq.
* Klementiev, K., et al. "The BALDER Beamline at the MAX IV Laboratory" Journal of Physics: Conference Series. IOP Publishing, 2016
** Scientific Library for eXperimentalists -
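  The node-and-transform pipeline described in the abstract can be pictured with a minimal sketch such as the one below. It is not ParSeq's actual API: the `Node` and `Transform` classes and the `scale` and `average` helpers are hypothetical names chosen for illustration, and the sketch runs sequentially in plain Python rather than in parallel on a GPU.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Spectrum = List[float]


@dataclass
class Node:
    """Holds the named spectra available at one stage of the pipeline."""
    name: str
    data: Dict[str, Spectrum] = field(default_factory=dict)


@dataclass
class Transform:
    """Connects two nodes and applies a function, with parameters, to each spectrum."""
    source: Node
    target: Node
    func: Callable[..., Spectrum]
    params: Dict[str, float] = field(default_factory=dict)

    def run(self) -> None:
        # Every spectrum in the source node is transformed into the target node.
        for label, spectrum in self.source.data.items():
            self.target.data[label] = self.func(spectrum, **self.params)


def scale(spectrum: Spectrum, factor: float = 1.0) -> Spectrum:
    """Example transform function: multiply every point by a constant."""
    return [factor * y for y in spectrum]


def average(node: Node, labels: List[str], new_label: str) -> None:
    """Cross-data combination: point-wise mean of the selected spectra,
    stored in the same node so that later transforms propagate it downstream."""
    rows = [node.data[label] for label in labels]
    node.data[new_label] = [sum(col) / len(col) for col in zip(*rows)]


raw = Node("raw", {"a": [1.0, 2.0, 3.0], "b": [3.0, 4.0, 5.0]})
scaled = Node("scaled")

average(raw, ["a", "b"], "mean_ab")                  # combine spectra in the first node
Transform(raw, scaled, scale, {"factor": 2.0}).run()  # propagate everything downstream
```

  After the transform runs, the `scaled` node holds the scaled versions of both input spectra and of their average, which mirrors the idea of propagating a cross-data combination downstream.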
Poster THPHA186 [0.407 MB]