At Arnold Ventures, we are committed to funding research that meets the most rigorous standards of quality and transparency. We believe in the value of reproducibility, we believe our data investments should be accessible to as many scholars and researchers as possible, and we believe that commitment to transparency builds public trust in science.

These Guidelines establish what is expected of Arnold’s research grantees and consultants, subject to modification for a researcher’s particular circumstances. We have a core set of transparency policies — open data, open code, pre-registration of research projects, and open access to research articles. We ask that grantees fulfill these requirements by using the Open Science Framework platform (OSF), an open-source collaborative platform that facilitates greater transparency about the entire research workflow (including the ability to store data, code, and articles). We also ask that prospective grantees provide full details about their proposed methods, as discussed in this document.

I. Data and other materials

Subject to privacy and other legal exceptions, all data and data-related materials (such as survey instruments) created in whole or in part with Arnold funding should be made publicly and permanently available to the maximum extent that is legally permitted and logistically possible.

Grantees should upload the data and materials, along with a codebook (if applicable) that would enable other researchers to understand the data and how it is structured. Grantees can store data directly on OSF, or can store data in another repository (such as one from Nature’s list of “Recommended Data Repositories”). In the latter case, grantees should make sure that the OSF page for a research project includes a link to the data repository in question.
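As a rough illustration of what a machine-generated starting point for a codebook might look like, the sketch below reads a CSV dataset and lists each variable with a guessed type, a count of missing values, and a description. It is only a minimal sketch under stated assumptions: the function names (`infer_type`, `build_codebook`), the toy dataset, and the descriptions are all hypothetical, and a real codebook would normally also document units, value labels, and collection procedures.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type label from the non-empty string values of a column."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "string"


def build_codebook(csv_text, descriptions):
    """Return one codebook entry (a dict) per column of a CSV dataset."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        codebook.append({
            "variable": name,
            "type": infer_type(values),
            "n_missing": sum(1 for v in values if v == ""),
            # Descriptions must come from the research team; flag any gaps.
            "description": descriptions.get(name, "TODO: describe"),
        })
    return codebook


# Hypothetical toy dataset and descriptions, for illustration only.
SAMPLE = "site_id,age,score\n1,34,0.5\n2,,0.75\n3,29,\n"
DESCRIPTIONS = {
    "site_id": "Study site identifier",
    "age": "Age in years at enrollment",
}

if __name__ == "__main__":
    for entry in build_codebook(SAMPLE, DESCRIPTIONS):
        print(entry)
```

Generating the skeleton from the data itself helps keep the codebook in step with the uploaded file, while the human-written descriptions carry the meaning that cannot be inferred automatically.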

Barring the exceptions noted below, this policy applies to all data and data-related materials, not just a limited subset that forms the basis for a published article, and the data should be uploaded in both its raw and processed formats. All of these data and data-related products should conform as much as possible to the FAIR principles (findable, accessible, interoperable, and reusable).

All of that said, many datasets and data-related materials are private and confidential because of federal law, state law, local law, IRB requirements, or agreements with a jurisdiction or agency that provided data access. In some cases, the data in question cannot be released to anyone under any circumstances. In other cases, data might be available only on a highly restricted basis to a third-party researcher who signs a non-disclosure agreement. In still other cases, the grantee may need to remove personally identifiable information in order to create a public-use dataset. As to descriptive research or a descriptive component of a larger causal study, much of the underlying data may be difficult to anonymize or to share at all, as it may consist of recordings, interviewer notes, raw transcripts, etc.

In all events, researchers should discuss with us before grant approval whether it would be possible to share data and, if so, under what terms and conditions and, if justified, with what dedicated funding. Moreover, even when data cannot be shared, researchers should at a minimum release the data request parameters for each source of data (e.g., a government agency), so that someone else can independently replicate a data request.

II. Code

Researchers should produce well-annotated code scripts to process, clean, and analyze data, and the final version of these scripts should be made publicly available in a permanent fashion. [1]

Ideally, the code scripts should enable another researcher to take the original raw dataset(s), clean and merge them, and re-run the original analysis. At a minimum, grantees should share the analytic code used to create the main result of any Arnold-funded study.
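To make the idea of an end-to-end script concrete, the sketch below runs from raw inputs through cleaning, merging, and a simple analysis in one documented pass. It is a minimal sketch, not a prescribed structure: the inline datasets, the column names, and the difference-in-means analysis are all hypothetical stand-ins for whatever a particular study actually uses.

```python
import csv
import io
from statistics import mean

# Hypothetical raw inputs; in practice these would be the raw data files.
RAW_PARTICIPANTS = "id,treated\n1,1\n2,0\n3,1\n4,0\n"
RAW_OUTCOMES = "id,outcome\n1,10\n2,6\n3,14\n4,\n5,9\n"


def load(csv_text):
    """Read raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean_outcomes(rows):
    """Drop rows with a missing outcome and convert outcomes to numbers."""
    return {r["id"]: float(r["outcome"]) for r in rows if r["outcome"] != ""}


def merge(participants, outcomes):
    """Keep only participants that have a cleaned outcome (inner join on id)."""
    return [
        {"id": p["id"], "treated": int(p["treated"]), "outcome": outcomes[p["id"]]}
        for p in participants
        if p["id"] in outcomes
    ]


def difference_in_means(rows):
    """Mean outcome among treated units minus mean outcome among controls."""
    treated = [r["outcome"] for r in rows if r["treated"] == 1]
    control = [r["outcome"] for r in rows if r["treated"] == 0]
    return mean(treated) - mean(control)


def run_pipeline():
    """Raw data -> cleaned -> merged -> main estimate, in one reproducible call."""
    participants = load(RAW_PARTICIPANTS)
    outcomes = clean_outcomes(load(RAW_OUTCOMES))
    merged = merge(participants, outcomes)
    return difference_in_means(merged)


if __name__ == "__main__":
    print("Estimated difference in means:", run_pipeline())
```

Keeping each stage in its own commented function lets another researcher see exactly which records were dropped or joined before the main result was computed.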

III. Preregistration

Any empirical study that involves statistical inference [2] should be preregistered before the start of intervention or data collection.

For reasons explained in this article, preregistration can improve the reliability and inferential validity of research. Preregistration should occur via the study’s OSF page. Studies can be registered elsewhere as well (e.g., SocialScienceRegistry.org), but OSF allows for open-ended preregistration of study materials and pre-analysis plans, giving research teams greater flexibility to share information about study methods and adjustments made during the course of a study.

The key information to include in a pre-analysis plan depends to a large extent on the particular study, and we ask researchers to touch base with us prior to drafting their plan to jointly identify appropriate items to include. A separate guide explains the information that we typically look for in a pre-analysis plan. When the research project is concluded, any report or article should conform as much as possible to the preregistered design and analysis, with deviations being identified and explained.

In many cases (particularly studies other than experiments), it may be infeasible to preregister all of the modeling decisions that will be made along the way, although the basic modeling framework should be articulated clearly and justified, with caveats as to future decisions that depend on as-yet unknown circumstances. As a further means of improving reliability, researchers should engage in thorough robustness and sensitivity testing as to their modeling assumptions and choices, and, if possible, analyze data while blinded in an appropriate way.
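One simple form of sensitivity testing is to re-run the same estimate under several defensible analytic choices and report all of the results side by side. The sketch below illustrates that idea only; the records, the cutoff of 30, and the three specifications are hypothetical, and a real study would choose specifications suited to its own design.

```python
from statistics import mean

# Hypothetical (treated, outcome) records, including one extreme outcome.
RECORDS = [(1, 10.0), (1, 14.0), (1, 40.0), (0, 6.0), (0, 8.0), (0, 7.0)]


def diff_in_means(records):
    """Mean outcome among treated units minus mean outcome among controls."""
    treated = [y for t, y in records if t == 1]
    control = [y for t, y in records if t == 0]
    return mean(treated) - mean(control)


def drop_above(records, cap):
    """Exclude observations whose outcome exceeds the cap."""
    return [(t, y) for t, y in records if y <= cap]


def cap_at(records, cap):
    """Keep every observation but limit outcomes to at most the cap."""
    return [(t, min(y, cap)) for t, y in records]


# Each specification is one alternative handling of extreme values.
SPECIFICATIONS = {
    "all observations": lambda r: r,
    "drop outcomes above 30": lambda r: drop_above(r, 30.0),
    "cap outcomes at 30": lambda r: cap_at(r, 30.0),
}


def sensitivity_table(records):
    """Return the estimate under every specification, keyed by its name."""
    return {
        name: diff_in_means(transform(records))
        for name, transform in SPECIFICATIONS.items()
    }


if __name__ == "__main__":
    for name, estimate in sensitivity_table(RECORDS).items():
        print(f"{name}: {estimate:.2f}")
```

Reporting every specification, rather than only the most favorable one, shows readers how much the conclusion depends on a single modeling choice.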

IV. Articles/Results

All research results from Arnold Ventures-funded research (including articles, reports, etc.) must be openly and publicly available for free.

What to post. In the event of a published article, this requirement may be satisfied by posting the final published version, the accepted author manuscript, or a near-final working paper or preprint. It is the responsibility of the grantee to retain sufficient rights to post articles as required by this policy. In the event of a research finding that is not formally published, the grantee should nonetheless write up the finding and post it on OSF as a working paper or preprint.


V. Persistent digital identifiers

To facilitate discovery and reuse, the overall OSF page for an Arnold-funded research project should be assigned a persistent digital identifier such as a DOI (Digital Object Identifier) or ARK (Archival Resource Key).

That identifier should then be cited in all publications resulting from the research project, so that the scholarly community can readily find the project’s related materials on OSF.


[1] As with data, grantees can store code directly on OSF or elsewhere such as GitHub. In the latter case, the OSF page should link to the code on GitHub.

[2] This phrase most obviously applies to any study on the causal impact of a policy or program, but can also apply to descriptive studies that attempt to draw inferences about, say, the prevalence or rate of a problem.