Dataflows in Bioinformatics

Research Group on Theoretical Computer Science, Hasselt University, Belgium

* Description

High-throughput technologies in bioinformatics, such as mass spectrometry in proteomics or microarrays in functional genomics, produce large amounts of data that require automated analysis. In-silico experiments, modeled as workflows, can provide this automation.

We believe that workflows in bioinformatics are mainly data-centered, that is, the data-flow aspect of the workflow is more important than the control flow. Hence we refer to such data-centered workflows as dataflows.

We propose two formal models for representing dataflows:
  • the Nested Relational Calculus (NRC), a query language over complex objects that is simple yet sufficient to represent most dataflows and that provides a textual notation;
  • the graphical dataflow language DFL, based on Petri nets combined with NRC operators.
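
To give a feel for the textual, NRC-style view of a dataflow, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the formal calculus from our papers: lists stand in for NRC collections, the helper names (singleton, for_each, peaks_above) are hypothetical, and the spectra data is made up for the example.

```python
# Illustrative sketch only: a few NRC-style operators over nested
# collections, with Python lists standing in for NRC collections.
# All names and data below are hypothetical.
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def singleton(x: A) -> List[A]:
    """Build a collection holding exactly one element."""
    return [x]


def for_each(xs: List[A], f: Callable[[A], List[B]]) -> List[B]:
    """NRC-style 'for x in xs return f(x)': the union of all f(x)."""
    out: List[B] = []
    for x in xs:
        out.extend(f(x))
    return out


# A nested (complex) object: each sample carries a collection of peaks.
Sample = Tuple[str, List[float]]
spectra: List[Sample] = [
    ("s1", [101.2, 455.9]),
    ("s2", [88.0]),
    ("s3", []),
]


def peaks_above(threshold: float, samples: List[Sample]) -> List[Tuple[str, float]]:
    """Flatten samples into (sample id, peak) pairs, keeping large peaks."""
    return for_each(
        samples,
        lambda s: for_each(
            s[1],
            lambda p: singleton((s[0], p)) if p > threshold else [],
        ),
    )


result = peaks_above(100.0, spectra)
```

The whole dataflow is a single nested expression built from iteration, singleton construction, and a conditional, which is the sense in which a small calculus over complex objects can serve as a textual notation for data-centered workflows.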

Moreover, we propose a formal conceptual data model for dataflow repositories, i.e., databases containing dataflows and their different runs. Our model carefully formalizes features such as complex data manipulation, external service calls, subdataflows, and the provenance of output values.

* People involved

External Partners

* Past events

Workshop on Dataflows in Bioinformatics, March 16, 2006, Hasselt University, Belgium

* Publications

  • "A formal model of dataflow repositories." (pdf)
    Jan Hidders, Natalia Kwasnikowska, Jacek Sroka, Jerzy Tyszkiewicz, and Jan Van den Bussche
    accepted for the 4th Int. Workshop on Data Integration in Life Sciences (DILS), June 27-29, Philadelphia, PA, USA
  • "XQTav: an XQuery processor for Taverna environment." (pdf)
    Jacek Sroka, Grzegorz Kaczor, Jerzy Tyszkiewicz, and Andrzej M. Kierzek
    Bioinformatics 2006 22(10):1280-1281
  • "Petri net + nested relational calculus = dataflow." (pdf)
    Jan Hidders, Natalia Kwasnikowska, Jacek Sroka, Jerzy Tyszkiewicz, and Jan Van den Bussche
    In proceedings of the 13th Int. Conf. on Cooperative Information Systems (CoopIS), November 2005, Agia Napa, Cyprus
    LNCS vol. 3760/2005, pp. 220-237
  • "NRC as a formal model for expressing bioinformatics workflows." (abstract - poster)
    Anna Gambin, Jan Hidders, Natalia Kwasnikowska, Slawomir Lasota, Jacek Sroka, Jerzy Tyszkiewicz, and Jan Van den Bussche
    Poster at the 13th Int. Conf. on Intelligent Systems for Molecular Biology (ISMB), June 2005, Detroit, MI, USA
  • "Well-constructed Workflows in Bioinformatics." (pdf)
    Anna Gambin, Jan Hidders, Natalia Kwasnikowska, Slawomir Lasota, Jacek Sroka, Jerzy Tyszkiewicz, and Jan Van den Bussche
    Workshop on Database Issues in Biological Databases (DBiBD), January 2005, Edinburgh, UK