Portable pseudo-random reference sequences with Mersenne Twister using GNU Octave

de Rigo, D. (2012). Portable pseudo-random reference sequences with Mersenne Twister using GNU Octave. Mastrave project technical report. FigShare Digital Science. doi: 10.6084/m9.figshare.94593

Mastrave project technical report

Daniele de Rigo

Abstract: Computationally intensive numerical tasks, such as those involving statistical resampling, evolutionary techniques or Monte Carlo based applications, are known to require robust algorithms for generating large sequences of pseudo-random numbers (PRN). While several languages, libraries and computing environments offer suitable PRN generators, the underlying algorithms and parametrization differ widely. Replicating a given PRN sequence therefore generally forces researchers to use a very specific language or computing environment, while also paying attention to its version, possible critical dependencies, or even the operating system and computer architecture.

Although awareness of the benefits of reproducible research is growing rapidly, the very definition of “reproducibility” for PRN-based applications may lead to diverging interpretations and expectations. Where the cardinality of the PRN sequences needed to process the data is relatively moderate, the paradigm of reproducible research can in principle be applied not only to algorithms, free software, data and metadata (classic reproducible research, CRR), but also to the pseudo-random sequences themselves (deep reproducible research, DRR). This would make reproducible not only the “typical” scientific results, “except for PRN-related statistical fluctuations”, but also the exact results published by a research team, which other scientists could then reproduce independently. Sensitivity analysis with different PRN sequences would of course remain possible, as even classic reproducible research should easily allow.

However, finding reference sequences of pseudo-random numbers suitable for enabling such deep reproducibility may be surprisingly difficult. Here, sequences eligible to be used as a reference dataset of uniformly distributed pseudo-random numbers are presented. The dataset has been generated using the Mersenne Twister with a period of 2^19937-1, as implemented in GNU Octave (version 3.6.1), with the Mastrave modelling library. The sequences are available in plain text format and also in the MATLAB version 7 format, which is portable across both the GNU Octave and MATLAB computing environments. The plain text format uses a fixed number of characters for each PRN, so that random access to sparse PRNs can be done in constant time without loading a whole file. This straightforward solution is language neutral, which gives the reference PRN dataset wide and immediate portability, irrespective of the language, libraries or computing environment chosen by users.
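
The constant-time access that a fixed-width plain text layout allows can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the record width of 20 bytes (19 characters plus a newline), the one-number-per-line layout, and the helper names `write_fixed_width` and `read_nth` are hypothetical and are not taken from the dataset, whose actual layout should be checked before adapting the idea. The demo values come from Python's own `random` module, not from the reference sequences.

```python
import os
import random
import tempfile

# Assumed record layout (hypothetical, not taken from the dataset):
# 19 characters for the number, plus one newline character.
RECORD_WIDTH = 20


def write_fixed_width(path, values):
    """Write one value in [0, 1) per line, each record exactly RECORD_WIDTH bytes."""
    with open(path, "wb") as f:
        for v in values:
            record = f"{v:.17f}\n"  # "0." + 17 digits + newline = 20 characters
            assert len(record) == RECORD_WIDTH
            f.write(record.encode("ascii"))


def read_nth(path, n):
    """Return the n-th value (0-based) by seeking straight to its byte offset."""
    with open(path, "rb") as f:
        f.seek(n * RECORD_WIDTH)  # offset depends only on n, not on the file size
        return float(f.read(RECORD_WIDTH).decode("ascii"))


if __name__ == "__main__":
    rng = random.Random(12345)  # stand-in values, not the reference PRNs
    demo_values = [rng.random() for _ in range(10_000)]
    demo_path = os.path.join(tempfile.mkdtemp(), "prn_demo.txt")
    write_fixed_width(demo_path, demo_values)
    # Read a few sparse entries without loading the whole file.
    for index in (0, 4_999, 9_999):
        print(index, read_nth(demo_path, index))
```

Because every record has the same length, the byte offset of the n-th number is simply n times the record width, so any language with file seeking can retrieve sparse entries in the same way.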

Additional Info

Field: Value
Source: http://dx.doi.org/10.6084/m9.figshare.94593
Author: Daniele de Rigo
Last Updated: October 10, 2013, 23:19 (UTC)
Created: March 20, 2013, 16:51 (UTC)