Data and Compute Resources

Ideas

An important question for open science is how best to host data. Do we rely on centralized, curated databases? Do we provide common interfaces and rely on groups to host their own materials? How do we make sure our pipelines can be run by anyone who wishes to reproduce our results?

These questions are closely related to those raised by complete ecosystems, in which data hosting and compute are integrated with software systems. A different approach is to rely on separate services for these needs. As always, the solution will likely be a mixture of these approaches. This page lists data hosting and compute resources that fall into the latter category.

"geometry of needs and challenges in publishing data" (Twitter)

Resources

Distributed systems

Dat
Academic Torrents
- P2P data hosting on BitTorrent for scientists

Centralized

figshare
- Centralized open-access hosting for data and manuscripts (with or without peer review)
Data Dryad
Amazon EC2 and S3
- Cloud hosting for compute or data access
Dataverse
XSEDE
- HPC resources for scientists (apply for compute time)