Dataflows in Bioinformatics
Research Group on Theoretical Computer Science,
Hasselt University, Belgium
Description
High-throughput technologies in bioinformatics, such as mass
spectrometry in proteomics or microarrays in functional genomics,
produce large amounts of data that require automated analysis.
In-silico experiments, modeled as workflows, can provide such support.
We believe that workflows in bioinformatics are mainly data-centered,
that is, the data-flow aspect of the workflow is more important than
the control flow. Hence we refer to such data-centered workflows as dataflows.
We propose two formal models for representing dataflows:
- the Nested Relational Calculus (NRC), a query language over complex objects that provides a textual notation and is simple yet sufficient to represent most dataflows;
- the graphical dataflow language DFL, based on Petri nets combined with NRC operators.
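To give a rough feel for the kind of operations a calculus over nested collections offers, the following Python sketch mimics a few NRC-style constructs on immutable sets. It is only an illustration under our own assumptions: the function names (`singleton`, `union`, `flatten`, `for_each`, `select`) and the sample data are hypothetical, and the sketch does not reproduce the formal syntax or semantics of NRC or DFL.

```python
from typing import Callable, FrozenSet, Hashable

# Hypothetical helper names; they only mimic NRC-style set operations.
EMPTY: FrozenSet = frozenset()


def singleton(x: Hashable) -> FrozenSet:
    """Build the one-element set {x}."""
    return frozenset([x])


def union(a: FrozenSet, b: FrozenSet) -> FrozenSet:
    """Set union of two collections."""
    return a | b


def flatten(sets: FrozenSet) -> FrozenSet:
    """Turn a set of sets into a single set."""
    return frozenset(x for s in sets for x in s)


def for_each(s: FrozenSet, f: Callable) -> FrozenSet:
    """Iterate over s and collect f(x) for every element x."""
    return frozenset(f(x) for x in s)


def select(s: FrozenSet, pred: Callable) -> FrozenSet:
    """Selection derived from the operations above:
    map each element to {x} or {} and flatten the result."""
    return flatten(for_each(s, lambda x: singleton(x) if pred(x) else EMPTY))


# Made-up nested input: (sample id, set of peak intensities).
samples = frozenset({
    ("s1", frozenset({10, 55, 80})),
    ("s2", frozenset({55, 90})),
})

# For every sample keep the peaks above 50, then merge them into one set.
strong_peaks = flatten(
    for_each(samples, lambda s: select(s[1], lambda p: p > 50))
)
print(sorted(strong_peaks))
```

The point of the sketch is that a nested input (a set of samples, each carrying its own set of values) is processed by composing a handful of small operators, and that a familiar operation such as selection can be derived from them rather than added as a primitive.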
Moreover, we propose a formal conceptual data model for
dataflow repositories, i.e., databases containing dataflows and
their different runs. Our model includes careful formalisations of such features
as complex data manipulation, external service calls, subdataflows, and
the provenance of output values.
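As an informal illustration of what a dataflow repository might store, the Python sketch below records dataflows together with their runs and answers a very coarse provenance question. All class, field, and service names (`Repository`, `Run`, `ServiceCall`, `"align"`, `"blast"`) are hypothetical choices made for this example; the formal model is considerably more fine-grained, in particular regarding subdataflows and the provenance of individual output values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ServiceCall:
    """One call to an external service made during a run (hypothetical shape)."""
    service: str
    argument: Any
    result: Any


@dataclass
class Run:
    """One execution of a dataflow: its inputs, its output, and the calls it made."""
    inputs: Dict[str, Any]
    output: Any
    calls: List[ServiceCall] = field(default_factory=list)


@dataclass
class Dataflow:
    """A named dataflow together with all of its recorded runs."""
    name: str
    runs: List[Run] = field(default_factory=list)


@dataclass
class Repository:
    """A toy repository: dataflows indexed by name."""
    dataflows: Dict[str, Dataflow] = field(default_factory=dict)

    def record_run(self, name: str, run: Run) -> None:
        # Create the dataflow entry on first use, then append the run.
        self.dataflows.setdefault(name, Dataflow(name)).runs.append(run)

    def provenance(self, name: str, output: Any) -> List[Tuple[Dict[str, Any], List[ServiceCall]]]:
        # Run-level provenance only: the inputs and service calls of
        # every stored run of `name` that produced `output`.
        return [
            (run.inputs, run.calls)
            for run in self.dataflows[name].runs
            if run.output == output
        ]


repo = Repository()
repo.record_run("align", Run({"seq": "ACGT"}, "hit-1",
                             [ServiceCall("blast", "ACGT", "hit-1")]))
repo.record_run("align", Run({"seq": "TTTT"}, "hit-2"))

prov = repo.provenance("align", "hit-1")
print(prov[0][0], [c.service for c in prov[0][1]])
```

Keeping runs attached to the dataflow that produced them is what makes such a provenance query possible at all; a real repository would track provenance per output value rather than per run.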
People involved
External Partners
- University of Antwerp, Belgium
- University of Warsaw, Poland
Past events
Workshop on Dataflows in Bioinformatics, March 16, 2006, Hasselt University, Belgium
Publications
- "A formal model of dataflow repositories." (pdf)
  Jan Hidders, Natalia Kwasnikowska, Jacek Sroka, Jerzy Tyszkiewicz, and Jan Van den Bussche
  Accepted for the 4th Int. Workshop on Data Integration in Life Sciences (DILS), June 27-29, Philadelphia, PA, USA
- "XQTav: an XQuery processor for Taverna environment." (pdf)
  Jacek Sroka, Grzegorz Kaczor, Jerzy Tyszkiewicz, and Andrzej M. Kierzek
  Bioinformatics 2006 22(10):1280-1281
- "Petri net + nested relational calculus = dataflow." (pdf)
  Jan Hidders, Natalia Kwasnikowska, Jacek Sroka, Jerzy Tyszkiewicz, and Jan Van den Bussche
  In Proceedings of the 13th Int. Conf. on Cooperative Information Systems (CoopIS), November 2005, Agia Napa, Cyprus
  LNCS vol. 3760/2005, pp. 220-237
- "NRC as a formal model for expressing bioinformatics workflows." (abstract - poster)
  Anna Gambin, Jan Hidders, Natalia Kwasnikowska, Slawomir Lasota, Jacek Sroka, Jerzy Tyszkiewicz, and Jan Van den Bussche
  Poster at the 13th Int. Conf. on Intelligent Systems for Molecular Biology (ISMB), June 2005, Detroit, MI, USA
- "Well-constructed Workflows in Bioinformatics." (pdf)
  Anna Gambin, Jan Hidders, Natalia Kwasnikowska, Slawomir Lasota, Jacek Sroka, Jerzy Tyszkiewicz, and Jan Van den Bussche
  Workshop on Database Issues in Biological Databases (DBiBD), January 2005, Edinburgh, UK