Big data analysis with skeletons on SOFA

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

  • Kenneth Skovhede
  • Brian Vinter

This paper explores how a skeleton-based approach can be used to perform big data analysis. We introduce a restricted storage system based on blocks with a fixed maximum size. The storage design removes the residual data problem commonly found in storage systems, and enables processing on individual blocks. We then introduce a stream-oriented query system that can be used on top of the distributed storage system. The query system is built on a limited number of core operations. Each of them performs a specified function, such as filtering elements, but they are skeleton operations: the programmer needs to fill in how to perform the operation. The operations are designed to allow splitting across the blocks in the storage system, giving concurrent execution while maintaining a completely sequential program description. To assist in understanding the data flow, we also introduce a graphical representation for each of the methods, enabling a visual expression of an algorithm. To evaluate the query system, we implement a number of classic big data queries, showing both how they are written in code and how they can be visualized with the graphical representation.
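The idea of skeleton operations over fixed-size blocks can be illustrated with a minimal Python sketch. It is not the paper's API: every name here (`MAX_BLOCK_SIZE`, `to_blocks`, `filter_skeleton`, `map_skeleton`, `run_query`) is hypothetical, and a thread pool stands in for the distributed storage system. The sketch only shows how a query written as a sequential list of stages might be applied to each block independently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

# Hypothetical block limit; the abstract only says blocks have a fixed maximum size.
MAX_BLOCK_SIZE = 4

Block = List[int]
Stage = Callable[[Block], Block]


def to_blocks(items: Iterable[int], max_size: int = MAX_BLOCK_SIZE) -> List[Block]:
    """Split the input into blocks holding at most max_size elements each."""
    data = list(items)
    return [data[i:i + max_size] for i in range(0, len(data), max_size)]


def filter_skeleton(predicate: Callable[[int], bool]) -> Stage:
    """Filter skeleton: the programmer supplies only the predicate."""
    def run(block: Block) -> Block:
        return [x for x in block if predicate(x)]
    return run


def map_skeleton(fn: Callable[[int], int]) -> Stage:
    """Map skeleton: the programmer supplies only the per-element function."""
    def run(block: Block) -> Block:
        return [fn(x) for x in block]
    return run


def run_query(blocks: List[Block], stages: List[Stage]) -> List[int]:
    """Apply the stages, in order, to every block; blocks run concurrently."""
    def process(block: Block) -> Block:
        for stage in stages:
            block = stage(block)
        return block

    with ThreadPoolExecutor() as pool:
        # pool.map yields results in input order, so block order is preserved.
        results = list(pool.map(process, blocks))
    return [x for block in results for x in block]


if __name__ == "__main__":
    blocks = to_blocks(range(10))
    query = [filter_skeleton(lambda x: x % 2 == 0), map_skeleton(lambda x: x * x)]
    print(run_query(blocks, query))
```

The query itself reads as a plain sequence of stages, while the concurrency is confined to `run_query`; this mirrors, in a simplified form, the separation between program description and execution that the abstract describes.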

Original language: English
Title of host publication: Communicating Process Architectures 2017 and 2018, WoTUG-39 and WoTUG-40 - Proceedings of CPA 2017 (WoTUG-39) and Proceedings of CPA 2018 (WoTUG-40)
Editors: Jan Baekgaard Pedersen, Kevin Chalmers, Jan F. Broenink, Brian Vinter, Kevin Vella, Peter H. Welch, Marc L. Smith, Kenneth Skovhede
Number of pages: 13
Publisher: IMIA and IOS Press
Publication date: 2019
Pages: 5-17
ISBN (Electronic): 9781614999485
Publication status: Published - 2019
Event: 39th WoTUG Conference on Communicating Process Architectures, CPA 2017 and 40th WoTUG Conference on Communicating Process Architectures, CPA 2018 - Dresden, Germany
Duration: 19 Aug 2018 to 22 Aug 2018

Conference

Conference: 39th WoTUG Conference on Communicating Process Architectures, CPA 2017 and 40th WoTUG Conference on Communicating Process Architectures, CPA 2018
Country: Germany
City: Dresden
Period: 19/08/2018 to 22/08/2018
Series: Concurrent Systems Engineering Series
Volume: 70
ISSN: 1383-7575

    Research areas

  • Big data, CSP, Process oriented programming, SODA, SOFA

ID: 241091096