Research Commons

Connect. Collaborate. Contribute.

Author: Amanda Rinehart (Data Management Librarian, University Libraries)

Where is all the data?

One of the most common questions I get from aspiring data scientists is where they can find all the data that everyone seems to be talking about. The good news is that there is a truly amazing amount of Open Data out there! Not entirely sure what Open Data is? According to the Open Data Handbook, “Open data is data that can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.” To put the phrase in context, the Open Data Institute kindly provides us with this visual:

The Data Spectrum

Since the Open Data movement only started around 2005, it’s still a rapidly changing area. However, governments around the world have thrown a great deal of support behind the concept, and with the federal penchant for sharing research outputs, it seems to be here to stay.

The best place to find research data? has one of the most comprehensive lists of data repositories in the world. is the culmination of separate long-time efforts at the Purdue Libraries and the German Research Foundation and in 2015, it was moved under the London-based DataCite organization. Focused primarily on research data, they review each repository to see if it meets standards, and then code the repository with handy icons. These icons will specify if the data is open, restricted, or closed, what licensing terms may apply, any policies, and how the repository makes its data persistent, unique and citable. If you have a budding data repository, you can nominate it for acceptance.

Still not finding what you want? Try out the new Open Data Button! Similar to the Open Access Button, which helps users locate alternative, free versions of articles that are behind paywalls, the Open Data Button is aimed at empowering YOU to request specific data sets. Downloadable to Chrome (Firefox coming soon, they promise), the app will auto-request data sets behind any publications you find. If the data is made available, it will be hosted at the Open Science Framework, and future requests won’t bother the author; they’ll just re-route to the data.

So if you are looking for data sets to analyze, here are a few resources to get you started.  Have a favorite source for open data? Willing to share? Send your comments to and look for a future post of compiled responses.

“If you are really bad The President will call your dean and shame you”

Yes, the title is an actual answer to the question of “What if I can’t do what I originally promised to do in my DMP [data management plan]?” While it is followed by the phrase “Just kidding”, it goes on to state that “awardees who do not fulfill the intent of their DMPs may have continuing funds withheld and this may be considered in the evaluation of future proposals”. So are they kidding about that too? That was my question in a recent blog post entitled “Does the Data Management Plan Matter?”

We’ve recently gotten some answers to that question. For those of you just starting to explore this topic, the Data Management Plan (DMP) that I’m referring to is the one that many federal funding agencies have started to require as part of their grant application process. The following four items seem to indicate that the DMP is pretty important:

1) The NEH just released all of the Data Management Plans from successful grant applications to the Digital Humanities Start-Up, Digital Humanities Implementation, and/or Digging into Data grant programs. This comes in response to a number of Freedom of Information Act (FOIA) requests. They note that as more resources to researchers have become available, “the quality and the importance of the plans has generally increased”.

2) Data librarians have observed more DMP Requests For Information from grant review panels, so it’s not much of a surprise that we’re starting to hear about grant rejections due to inadequate DMPs. As Michael Jackson, NSF Antarctic Research and Logistics Integration Program Manager, noted at a recent conference “If you don’t put data into a repository per your data plan, you don’t get funded again…The other way of compliance is, as I mentioned, is the peer review process where your, your, your research colleagues will actually, you know, look at how you are doing things, they know who in the community is and isn’t sharing data freely, and they will make sure through the comment process of the proposal that it’s, that you are called out, or that you are also given kudos if you are particularly good at collaborating” (about 45 minutes into the video).

3) In another effort to support researchers in their compliance with DMP expectations, the NSF just announced more than $5 million towards the creation of Big Data Regional Hubs. Our hub covers 12 states and will be coordinated by the University of Illinois at Urbana-Champaign.

4) Do you find all this talk about the data sharing component of a DMP stressful? You are not alone! In fact, we have a new phrase to describe this: data tension, or the “Human tension and/or stress related to the sharing or release of data resulting from concerns about: (a) unknowns about users, uses, and what users will learn from the data before the data producers themselves learn it; (b) what users will learn from the data; (c) data quality; (d) data traceability (or lack thereof); (e) potential requests for additional documentation and metadata; (f) potential questions concerning methodology used to produce the data; (g) lack of resources to support data sharing; (h) governance; (i) social or political interests and impact; (j) data ownership; (k) the desire to ‘hold back’ data to give researchers the time to publish articles based on those data; and/or (l) perceived risk of data misuse or misinterpretation.”

If you are writing a Data Management Plan, and would like some assistance, try out the, or contact OSU’s Data Management Services Librarian, Amanda Rinehart, at

If you are interested in learning more about these changing expectations and OSU resources, please attend:

Federal Public Access Plans: Information for Researchers (Panel Discussion)

Who: OSU faculty, staff, postdocs, and graduates

When: Thursday, November 19, 1:00-2:30pm

Where: Thompson Library, Room 165


Wondering how federal agency public access requirements will impact your work as an Ohio State researcher? Curious about the services offered across the university to assist researchers in meeting these new expectations? Join a panel of experts to learn more about how to ensure compliance with new and existing public access policies and who can help. These topics and your questions will be discussed by the following panel:

Sandra Enimil, Head of the Copyright Resources Center, University Libraries

Karla Gengler-Nowak, Grants & Contracts Administrator, College of Optometry

Aimee Nielsen-Link, Director, Health Sciences Office, Office of Sponsored Programs

Amanda Rinehart, Data Management Services Librarian, University Libraries

Maureen Walsh, Institutional Repository Services Librarian, University Libraries

Does the Data Management Plan Matter?

Does the Data Management Plan matter? This is a question that has haunted many a researcher as they write up their grant proposal. As several agencies have jumped on the data management plan bandwagon in October of this year (details here), more researchers will wonder: Is anyone reading this data management plan, and if they are, what criteria are they using to evaluate it? The data management plan was popularized in 2011, when the National Science Foundation made it a requirement for every grant proposal. Researchers’ reactions were less than enthusiastic (as illustrated by “My Data Management Plan – A Satire”). While there is plenty of good guidance for writing a data management plan, a quick tour of Twitter shows that data management plans are still a source of concern:



Since comments on data management plans (or DMPs for short) typically only get shared with the principal investigator, it’s difficult to find out if and how they are affecting grant success. Some of us are getting permission to share individual experiences when we have the opportunity (such as this one about an additional Request for Information), but it is hard to know if these individual experiences are reflective of community trends.

However, recent efforts by researchers from five institutions (Oregon State University, University of Oregon, University of Michigan, Georgia Institute of Technology, and Pennsylvania State University) have created a standardized rubric and begun to analyze their DMPs. Preliminary results from this project include:

  • Initial testing of the rubric revealed that of 21 DMPs, 10 were judged sufficient, nine were suspect, and two lacked information about how data was to be shared; the top five modes of data sharing included websites, journal articles or supplements, an institutional repository, on-request sharing, and other data repositories (Westra, 2015).
  • Addressing policies on reuse, redistribution or creation of derivatives is a significant challenge for researchers (Whitmire, 2015).
  • A review of 50 DMPs revealed the following areas that commonly need improvement: metadata and metadata standards, data sharing being perceived restrictively as only publication of research results in a journal, confusion between archiving and storage, and lack of awareness of the library’s support for research data management (Hswe & Parham, 2015).

For the latest updates on this effort, check out the D.A.R.T. Project (Data Management Plans as a Research Tool). When the NSF debuted its DMP requirement, it stated that the ‘communities of interest’ would evolve standards for research data management. Well, without a view of the community as a whole, it’s a bit difficult to know what standards are emerging. If you know of other efforts to capture research data management activities, evaluate DMPs, or just have a story to share, contact Because the more we understand about what our communities are doing, the better we can answer the question of how much the data management plan matters.


  • Westra, Brian (2015). Applying a rubric to data management plans to investigate data sharing. FORCE2015, Research Communication and E-Scholarship Conference, Oxford, England UK, January 11-13, 2015. Retrieved from:
  • Whitmire, Amanda L. (2015). Using data management plans as a research tool: an introduction to the DART Project. Invited talk, NISO Virtual Conference, Scientific Data Management. Retrieved from:
  • Hswe, Patricia & Parham, Susan Wells. (2015). Toward an Improved Understanding of Research Data Management Needs: Designing and Using a Rubric to Analyze Data Management Plans. International Conference on Open Repositories, June 2015. Retrieved from:

Funding Opportunities for Data Science Research

Do you wish you were more involved in the ‘data movement’?

Ohio State’s Translational Data Analytics group has just released a Request for Applications (RFA) focused on trans-disciplinary data teams! The application deadline is October 14, 2015 and the funding level is $10,000-25,000. The research should “address fundamental issues related to the use of analytic techniques to help target instructional, curricular, and support resources that support achievement of specific learning goals.”

To find out more, visit or contact:

Interested in federal big data funding opportunities? Jill Morris, from Ohio State’s Institute for Population Research, has a great overview of federal big data initiatives.

Federal Public Access Initiatives Update

In February of 2013, the Office of Science and Technology Policy issued a memo requiring all federal funding agencies with more than $100 million in research funds to design a Public Access Plan. Since then, several plans have been released. What does this mean for you? That largely depends on which agencies award you funding and whether the various policies and laws allow you to share your research products.  However, there are some emerging trends.

First, published outputs (such as journal articles) are almost always required to be shared publicly no later than one year after publication, and the repository for most funding agencies is PubMed Central. Exceptions are: the Department of Energy is using a system called PAGES, the National Science Foundation is using the NSF Public Access Repository (NSF-PAR), the USDA developed PubAg, and the Department of Defense has the Defense Technical Information Center. For dataset sharing, the trend is to require a Data Management Plan. However, the agencies are rarely specifying where the data is to be shared. The chart below is a brief overview of known OSTP responses. Information annotated with an ‘A‘ applies to articles, while information annotated with a ‘D‘ applies to data. Have questions about these new policies? Check out the resources listed at the end of the post.

Funding Agency | When policies go into effect | Data Management Plan required with funding proposal? | Timeline for making articles public | Recommended article repository | Timeline for making data public
AHRQ | Feb. 2015 (A), Oct. 2015 (D) | Yes | No later than 12 months from publication | PubMed Central | At article publication
ASPR | Oct. 2015 (A, D) | Currently proposed | No later than 12 months from publication | PubMed Central | Within 30 months of data collection
CDC | Jul. 2013 (A), Oct. 2015 (D) | Yes | No later than 12 months from publication | PubMed Central (peer-reviewed) and CDC Stacks (all), Submission via NIHMS | With the article and/or within 30 months of data collection
DOD | Estimate of fiscal year 2015 | Yes | No later than 12 months from publication | Defense Technical Information Center | Within a reasonable time frame
DOE | Oct. 2014 (A), Oct. 2014 (D – Office of Science), Oct. 2015 (D – Other offices) | Yes | No later than 12 months from publication | PAGES to index, Article hosting choices: 1st) publisher, 2nd) local repository, 3rd) OSTI | At article publication
FDA | Oct. 2015 (A, D) | Proposed, but details not yet available | No later than 12 months from publication | PubMed Central | At article publication
IES | 2012 | Not yet | No later than 12 months from publication | ERIC | None yet
NASA | Oct. 2015 (A, D) | Yes | No later than 12 months from publication | NASA-branded portal of PubMed Central | At article publication
NIH | Dec. 2015 (A, D) | If over $500,000 in direct costs | No later than 12 months from publication | PubMed Central | At article publication
NIST | Oct. 2015 (A, D) | Yes | No later than 12 months from publication | PubMed Central | No later than 12 months from publication of article
NOAA | Jan. 2016 (A, D) | Yes | No later than 12 months from publication | NOAA Institutional Repository | Not more than one year from data collection
NSF | Jan. 2016 (A, D) | Yes | No later than 12 months from publication | NSF-PAR | Currently exploring
USAID | Oct. 2014 (D) | No | Unknown | Unknown | Within 12 months if publication or patent is pending, Must submit to the Development Data Library (DDL)
USDA | Jan. 2016 (A) | Proposed, but details not yet available | No later than 12 months from publication | PubAg | TBD
USGS | Feb. 2015 (D) | Recommended, but not submitted with proposals | Unknown | Unknown | No information yet
VA | Feb. 2015 (A) | No | No later than 12 months from publication in a journal | PubMed Central | None yet
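
If you like to keep track of these deadlines in a script, the timelines in the chart above boil down to simple date arithmetic. The following is only a rough sketch: the function name `add_months` and the example dates are made up for illustration, and each agency's policy may count its window differently, so always check the agency's own guidance.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Hypothetical helper for illustration only; it is not taken from any
    agency's policy or tooling.
    """
    total = start.month - 1 + months
    year = start.year + total // 12
    month = total % 12 + 1
    # Clamp the day for shorter months (for example, Feb 29 becomes Feb 28).
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Assumed example: an article published on January 15, 2016, under a
# "no later than 12 months from publication" policy.
article_deadline = add_months(date(2016, 1, 15), 12)

# Assumed example: data collection finished on October 1, 2015, under a
# "within 30 months of data collection" policy.
data_deadline = add_months(date(2015, 10, 1), 30)

print("Article must be public by:", article_deadline.isoformat())
print("Data must be public by:", data_deadline.isoformat())
```

Treat the printed dates as a planning aid, not as a statement of what any agency actually requires.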


Have questions about these new policies? Check out these resources:

Situation: Where to go:
I need a Data Management Plan or data repository or
I need to publicly share my articles and other publications
As an editor, I need to make sure my journal is author-friendly regarding these new requirements
As an author, I want to make sure my publishing agreements allow me to be compliant with these new requirements
I need help submitting my article to PubMed Central


Update (May 31, 2016): The text and table in this post were updated to include information about the NSF Public Access Repository (NSF-PAR).
