Semi-Synthetic Benchmark
This organization hosts a large, semantically rich semi-synthetic dataset modeled after the electronic medical record of a large medical institution. Starting from the highly diverse data.gov data repository and applying a multivariate data augmentation strategy, we can generate arbitrarily large semi-synthetic datasets for testing new algorithms and computational platforms. The data is available in both relational (SQL) and semantic (RDF) forms.