Meetings of the Group

The RCN group "Facilitating Open Exchange of Data and Information" has met on the following dates:

8 August 2012 (planned)


2 July 2012: Meeting 2 (by Webex)

Present were: Jeff de La Beaujardiere, Paul DiGiacomo, James Gallagher, Takeshi Kawano, Fred Maltz, John Orcutt, Jay Pearlman, Lisa Raymond, Iain Shepherd, Amy Stout, Christoph Waldmann and Sandy Williams.

The agenda for the meeting was as follows:

  • Presentation by Jeff de La Beaujardiere - Challenges and experience with Open Data
  • Presentation by James Gallagher - Challenges and outcomes in building OPeNDAP for data discovery and access
  • Brief Presentation by Iain Shepherd on EC activities and initiatives.
  • Call for contributions to the task team web site - Issues each of us thinks are most important to the WG discussions and open data.
  • Next meeting (Aug 8 at 14:00 UTC)
  • AOB (none)

John Orcutt started the meeting with a brief overview of the directions for his task team. For OOI, he is working on the complexities of sharing data and is currently on a panel. Networks have been dealt with in the literature. Technology is one level: it is heterogeneous, with OOI developing one system while others already exist. The next level up is data: how do you have data streams talk to one another? The next level is people and culture. The next is political: differences in policy between, for example, the US and the EC.

Jeff de La Beaujardiere was introduced as the first presenter. His presentation is available (PRESENTATION HERE). Jeff is the NOAA Data Management Architect and previously was the system architect at NOAA IOOS. Jeff began his presentation with comments on open data. Open data is to be discoverable through catalogs and preserved indefinitely in data archives for present and future users. The structure for addressing these issues in NOAA is the Environmental Data Management Committee (EDMC), which links to GIS, DMIT and DAARWG, but especially to NOSC and the CIO Council. Satellite data, buoys, fisheries and similar sources are included. The EDMC Procedural Directives cover archiving data, data documentation, data access and discovery, data citation, and data sharing by NOAA grantees. Recipients of NOAA funding must share their data within 24 months, and this must be stated in the proposal. Unique identifiers allow data to be tracked and referenced. The Data Management Framework has principles of full open access, preservation, information quality and ease of use, and standards for interoperability and other areas. Jeff showed a mockup of a data management monitoring tool called the DM Dashboard. It has facility tracking that shows the number of records, the existence and completeness of metadata, and other key attributes. In the NOAA Data Discovery Vision, discovery in the future will be through external catalogs. Other NOAA activities related to Open Data include links to US, the National GeoPlatform and the Open Government Initiative, as examples. There is participation in GEO and GEOSS. Jeff mentioned that Congress has asked NOAA why, if its data is so valuable, it is not charging for it. The response to the Congressional language regarding cost recovery is that there should be charges when a project is unique and requires special development, but not in general.

10:23 James asked what kind of catalogs are going to be used. Jeff said that the NCDC, NODC and NGDC have a copy of the ESRI portal. UAF uses THREDDS and IOOS is migrating to the ESRI GEOPortal. The major issues are: (1) support and provision of metadata; (2) shared hosting; and (3) citation of data, perhaps through a digital object identifier (DOI) system. Competing catalogs vary in quality and in the nature of the metadata that they hold. No particular protocol is blessed. THREDDS and CSW are widely different. Finding data from researchers who are not affiliated with major archiving centers is another issue that has not been adequately addressed. Motivation for participation could come through the introduction of digital object identifiers. This will require a significant cultural shift before use becomes widespread. The steps could be increasing the visibility of data, celebrating what is released and making release a core part of the NOAA culture.

Christoph asked where the priorities lie. The first is metadata and data documentation: making sure we have good information about the data. The second is making data accessible online, and those services are key. After that comes building a catalog of catalogs to provide broad access.

Iain asked what percentage of data is controlled by NOAA. Perhaps some tens of percent, including satellite and in situ data. DoD, NASA and others also hold a lot. Data from Fisheries is diverse and smaller. Maybe 1/3 of US data by volume is NOAA's.

Jay asked about the long tail of data from data owners that are small, with diverse and widespread interests. As mentioned earlier, Jeff sees this as a training issue and a cultural challenge that is being recognized and will be addressed.

10:32 James Gallagher presented a discussion on open data and its challenges (PRESENTATION HERE). Some of the experience in this area comes from working with OPeNDAP and DODS. Data come in many forms; they have specific file formats, metadata content, and encoding standards (such as ISO 19115) intended to promote interoperability. But these standards reduce the amount of data added to any given system and build walls between systems. For example, NASA uses HDF while NOAA uses NetCDF. They are similar but not exactly the same.

Another issue we need to address is online data. For many years, people used the internet (and tapes, CDs, etc.) to get access to data that they then stored locally for use. With the large volumes of data available now and the benefits of keeping data up to date, there is an opportunity to use data online along with processing and analysis tools. Acceptance of a new paradigm is hard for people. The technology can advance quickly but work habits less so. Possession: move from ‘having’ to ‘knowing where’. Once users have answered the question “where?”, they ask whether the data will still be there when they want it again. Data systems must support both having and getting, so delivered data must remain accessible to users over the long term.

Completely open systems create problems where agencies want control: logins to track access and limits on anonymous users. There are legitimate needs for authentication to identify users, particularly to justify the resources required for running the data service. In addition, priority may be reserved for users with critical needs. This has not been an issue in the US, but has been in Europe/Asia; perhaps in the US soon. Access control is orthogonal to data access.

Integration into Known Tools: Open data must be accessible by/in/with existing analysis tools. This is an important step forward in achieving web-based information generation. The dominant science tools are MATLAB and IDL. Both companies creating these have been open to new formats and data models. However, the tool builders must know the return on investment (ROI) for their activities, and this may run counter to a completely open system. Toolmakers also need large audiences for their products, and this need has caused some to move from science research to commercial applications as target audiences. James noted that he had ignored commercial contributors, which may have limited the penetration of his initiatives.

What are the attributes of an open system that are important? System builders must make it easy. ‘Open’ changes assumptions about provenance (it becomes more important and more complex), tools and examples trump standards documents, and system builders will seek large audiences. In addition, standards documents are complex and need expertise for proper implementation. Smaller research groups do not have this expertise and this is a barrier to adoption.

A question was raised: to what extent will tool builders' need for a return on investment inhibit open source? James replied that there are free tools, but if you don’t consider MATLAB and other expensive systems you are limited. MATLAB gets its money from engineering, not science.

Priorities for task team consideration: The greatest changes can come from metadata standards or the lack thereof. OPeNDAP has a required level of metadata that is extremely low. Developing a standard and distributing it across a wide range of people is daunting.

Sandy Williams suggested a meeting of his standards team this week. The standards group agreed to meet July 5 at 6:00 PM European time, 12:00 noon EDT, and 9:00 AM PDT.

Iain Shepherd gave a brief overview of EC activities. He is with the Maritime Organization in the European Commission in Brussels. New standards are concerned with interoperability. The cost of ocean observation is 400,000, and 52 organizations have been brought together in EMODnet for several disciplines. There is a new call for the 2012 Work Programme: thematic assembly groups. Themes are bathymetry at 1/8’ resolution and another task (not recorded). Iain agreed to consider a more detailed presentation for the next telecon.

All presentations will be available on the Working Group web site. Jay asked for recommendations of speakers at the meetings. Jay thanked the speakers for their presentations today.

Next meeting is August 8 at 14:00 UTC.

Meeting adjourned at 15:00 UTC.

6 June 2012: Meeting 1 (by Webex)

Summary Report:

Attending: Paul DiGiacomo, James Gallagher, Takeshi Kawano, Fred Maltz, Mike McCann, John Orcutt, Jay Pearlman, Benoit Pirenne, Peter Pissierssens, Lisa Raymond, Iain Shepherd, Christoph Waldmann, Sandy Williams

Jay Pearlman provided an introduction to the meeting. He reviewed the agenda for the meeting:

  • Review of Open Data Working Group Terms of Reference – John Orcutt
  • Charter for Task Teams – Peter Pissierssens
  • Schedule for Task Teams – Jay Pearlman
  • Background of RCN and role of Working Group in RCN – Jay Pearlman
  • Web site and infrastructure support for Working Group – Peter Pissierssens
  • Any Other Business

Jay welcomed new members participating in the Task Teams and emphasized the importance and opportunities for the participants.

John Orcutt reviewed the Terms of Reference. Many new technologies are emerging and will impact ocean observations and information. An example is the cabled observatories with data from microseconds to centuries and kilometer scales. Also, we can now use classes of sensors that take little power and others that are emerging from sensors previously only available in the laboratory. Open data is an issue here because not only the observing scientist will get the data. The Working Group needs to deal with standards, open data, data access models, how data are stored, published data and data citation. The reward system needs to credit scientists for publishing data and provide consistent referencing. The trend in Europe and the US is towards open data, but that doesn't make it happen, and we need to recommend to the RCN the options for a sustainable process. The recommendations of the three task groups are due in September 2012; in October the Open Data Working Group will respond, and in December it will report to the whole RCN.

Peter Pissierssens addressed the scope and key issues for the task teams. The topics are data and information standards, data access models (data policy), and data publishing/data citation as a way to create incentives for research. For the Standards Task Team, a key question is: how well has the international ocean research community done in terms of achieving interoperability? What steps should be taken, and by whom, to improve interoperability? For data access, what are the restrictions and how do these impact research? This should address all disciplines in the ocean community. For data publishing, can data publishing/data citation offer a solution, and how would it best be implemented?

Jay Pearlman talked about the schedule. The activities of each Task Team should include a review of the literature, comment on the status, and identification of the remaining outstanding issues. From this base, options for addressing the issues should be defined and recommendations among the options should be offered in the report. Each task team should produce a report by the end of September for review by the entire RCN. Task teams could have experts make presentations at their WebEx meetings, and presentations should be documented on the web site. Peter has set this up and will help the teams with inserting material into the site. One advantage of using the site is that overlaps between the Task Teams will be seen and synergies can be taken advantage of. Meetings are not intended to be closed. Meetings are by email and conference call. Each team needs to have a leader and a scribe. Teams are requested to provide, by the middle of July, an outline of the report and a schedule to get it done. For our schedule over the next four months, meetings are planned for July 2, August 8, September 13 and October 23. All meetings are at 14:00 UTC. Additional meetings of the task teams will be arranged by team members.

Jay provided a background on the Ocean Research Coordination Network (RCN). The RCN was created by NSF to stimulate cross-domain collaborations across ocean research areas. It isn't only physical oceanography that is addressed, but also the biological and chemical aspects of the ocean environment. In addition to open data, sustainability is an issue, meaning keeping a flow of observation resources and data with quality assurance. Outreach is included in the RCN. An objective for late this year is looking at educational courses available remotely. Also, governance is not an easy issue. The RCN meets once a year and the meeting for 2012 will be on December 2 in San Francisco, just before AGU. New working groups will start, with capacity building next, sometime in the late fall or winter. The terms of reference of the RCN and descriptions will be made available at the WG web site by Peter, and Iain will put up an RCN site.

James Gallagher asked if there was an overlap with EarthCube. Jay responded: NSF started EarthCube in the fall and will have the second meeting next week. EarthCube is an infrastructure and science community cyberinformation system. The RCN scope is narrower and our working group will finish before EarthCube gets very far. IOOS and others are also working on these problems. We want to add information and intelligence to what has been done.

Iain Shepherd commented that we are addressing more than science in ocean observations. There are monitoring data routinely collected in Europe and elsewhere. Sources include fisheries, water quality and oil platform measurements of the local environment.

Paul DiGiacomo spoke about the IOOS call for white papers this fall for the IOOS Summit. A white paper is about 5 pages. At this point, it is premature to write a paper as the Task Teams are just starting. Paul will make a placeholder. There is also a Blue Planet GEOSS meeting in Brazil in November. We will look at options for reporting the Working Group results, but need to have a full RCN review of recommendations before formal release of outcomes.

The Working Group web site was discussed by Peter Pissierssens. There is space for material to be posted; Peter can post it for us, or we can be given the password to do it ourselves. There is also a Basecamp collaboration environment available to the teams. In the writeboard section of Basecamp, documents can be written by multiple writers and you can track changes. You can identify milestones.

James Gallagher asked what we are supposed to use the web site for.

Jay asked if there is a list server associated with this so things can be sent to the team. The answer was yes.

A schedule is needed by July 2. WebEx or toll-free calls can be set up by Peter or Jay.