Estonian Information Society Yearbook 2011/2012. Karin Kastehein
main activities and discontinue competing with the private sector. However, presenting data in re-usable form requires expenditure. According to a European Commission study, public sector investments of 1.4 billion euros would increase Europe’s GDP by 140 billion euros. In other words, every cent invested would add about a euro to GDP.
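The ratio behind the last claim can be checked with a short Python calculation. This is only a sketch: the two amounts are the ones quoted from the European Commission study, while the constant and function names are illustrative. Exact fractions are used so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction

# Figures quoted from the European Commission study (in euros).
INVESTMENT_EUR = 1_400_000_000       # 1.4 billion euros of public sector investment
GDP_GAIN_EUR = 140_000_000_000       # 140 billion euros of additional GDP


def gain_per_euro(investment: int, gain: int) -> Fraction:
    """Return the GDP gain (in euros) attributable to each euro invested."""
    return Fraction(gain, investment)


def gain_per_cent(investment: int, gain: int) -> Fraction:
    """Return the GDP gain (in euros) attributable to each cent invested."""
    # One cent is 1/100 of a euro.
    return gain_per_euro(investment, gain) / 100


print(gain_per_euro(INVESTMENT_EUR, GDP_GAIN_EUR))  # 100
print(gain_per_cent(INVESTMENT_EUR, GDP_GAIN_EUR))  # 1
```

A hundredfold return per euro is the same statement as one euro of GDP per cent invested.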
Open data: a rediscovered gold mine
Data re-use is not a new topic; discussion of it began in the late 1950s. During the Soviet era, the Estonian Information Institute re-used one million records of reports a year on magnetic media3. The first instance of register re-use after Estonia regained its independence was the technological solution for the State Gazette developed in 1995-1996. The Gazette was prepared for publication on paper with the WordPerfect office software; the WordPerfect files were then converted into SGML (of which XML is a derivative), digitally signed and made available for free public use via an FTP server. The most active re-users of the legal acts were the Government Office’s document management system, the State Gazette’s online database, the IBS search system and EstLex. In recent years, many countries have recognized that opening up data can stimulate the economy and have launched extensive projects that support data re-use:
• The European Commission’s policy on open data4
• Study commissioned by the Commission: “Towards a pan EU data portal – data.gov.eu”5
• Principles of open data in Great Britain6
• W3C principles of open government data7
• Recommendations of the Open Government Data development group8
• US and UK recommendations to the OECD on open data policy9
• OKFN Open Data Handbook (legal, social and technical aspects)
Many countries, regions and local governments have created re-use frameworks and websites that make re-use easier:
• European open data directory http://publicdata.eu
• US open data directory http://data.gov
• UK open data directory http://data.gov.uk
• Australian open data directory http://data.gov.au
• Canadian open data directory http://data.gc.ca
• Kenyan open data directory http://opendata.go.ke
• Norwegian open data directory http://data.norge.no
• Dutch open data directory http://data.overheid.nl
• New Zealand open data directory http://data.govt.nz
• Italian open data directory http://data.gov.it
• French open data directory http://data.gouv.fr
• Swedish open data directory (initiative of an individual) http://www.opengov.se
• Philadelphia area open data directory http://opendataphilly.org
• Helsinki Region Infoshare http://www.hri.fi/en
• CKAN-based open data repository http://thedatahub.org
Although opening data for re-use entails additional expenditure, politicians have recognized its strong potential impact on national economies and have begun investing actively in creating and developing open data. On his first day in office, US President Barack Obama signed a memorandum on transparency and open government, under which the public sector opened its data for re-use. By autumn 2011, the US open data directory contained 390,000 datasets.
What is open data?
Open data and datasets. Data published for re-use is called open data. The term covers machine-readable data that is freely available to everyone via websites and is not subject to patents or to restrictions on use or distribution. Unless legislation prescribes a fee for obtaining the data, open data can be obtained free of charge and without access restrictions.
Data in formats that can be opened and modified with freeware applications is also well suited for re-use.
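The criteria above can be expressed as a simple checklist. The following Python sketch is hypothetical: the `DatasetDescription` record, its field names and the `OPEN_FORMATS` set are assumptions made for illustration, not part of any official catalogue schema. Because the text treats freeware-friendly formats as desirable rather than mandatory, the format check is reported only as advisory.

```python
from dataclasses import dataclass

# Hypothetical list of formats that freeware applications can open and modify.
OPEN_FORMATS = {"csv", "json", "xml", "txt"}


@dataclass
class DatasetDescription:
    """Hypothetical metadata record for a published dataset."""
    name: str
    file_format: str              # e.g. "csv" or "pdf"
    machine_readable: bool        # can software process the contents directly?
    available_online: bool        # freely downloadable from a public website
    use_restricted: bool          # patents or limits on use or distribution
    fee_eur: float = 0.0          # amount charged for obtaining the data
    fee_set_by_law: bool = False  # does legislation prescribe the fee?


def open_data_problems(ds: DatasetDescription) -> list[str]:
    """Return the open data criteria that ds fails (empty list if none)."""
    problems = []
    if not ds.machine_readable:
        problems.append("not machine-readable")
    if not ds.available_online:
        problems.append("not freely available online")
    if ds.use_restricted:
        problems.append("restricted by patents or use/distribution terms")
    if ds.fee_eur > 0 and not ds.fee_set_by_law:
        problems.append("fee charged without a legal basis")
    # Open formats are recommended rather than required, so flag only as advisory.
    if ds.file_format.lower() not in OPEN_FORMATS:
        problems.append(f"advisory: {ds.file_format!r} is not an open format")
    return problems


budget = DatasetDescription("budget-2011", "csv", True, True, False)
scanned = DatasetDescription("contracts", "pdf", False, True, False, fee_eur=5.0)
print(open_data_problems(budget))   # []
print(open_data_problems(scanned))
```

A fee is accepted only when `fee_set_by_law` is true, mirroring the rule that open data is free of charge unless legislation prescribes otherwise.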
The Public Information Act10 obliges government departments to release their unrestricted information to the public via their websites, document registers and databases. The public sector is also obliged to release information in response to requests for information. Here, open data means information that is published proactively in open formats; as a rule, no request for information needs to be submitted to download open data.
Publishing data generated by public authorities serves several important objectives, the most concrete of which is to serve individuals, companies and the third sector, who may simply want to view existing data or use it in software development to create added value in some field.
All data generated by government departments and local governments is to be made public if its public use is not expressly prohibited and it contains data other than personal data. Where data consists of both personal data and other data, only the non-personal part is made public.
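Separating personal data from the publishable part of a record can be sketched as a field filter. In this Python example the field names, including the `PERSONAL_FIELDS` set, are hypothetical; in practice, deciding which fields constitute personal data is a legal assessment that has to be made for each dataset.

```python
from typing import Iterable

# Hypothetical field names treated as personal data in this example.
PERSONAL_FIELDS = frozenset({"applicant_name", "personal_id_code", "email"})


def publishable_part(record: dict, personal_fields=PERSONAL_FIELDS) -> dict:
    """Return a copy of record with every personal-data field removed."""
    return {key: value for key, value in record.items() if key not in personal_fields}


def prepare_for_publication(records: Iterable[dict]) -> list[dict]:
    """Strip personal data from each record and skip records left empty."""
    published = []
    for record in records:
        public = publishable_part(record)
        if public:  # a record made up only of personal data is not published
            published.append(public)
    return published


permits = [
    {"permit_no": "P-17", "municipality": "Tartu", "applicant_name": "A. Applicant"},
    {"applicant_name": "B. Applicant", "email": "b@example.org"},
]
print(prepare_for_publication(permits))
# [{'permit_no': 'P-17', 'municipality': 'Tartu'}]
```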
In the context of open data, a body of data that forms an integral whole is called a dataset. Examples include collections of contract or regulation texts, collections of correspondence metadata, budget and statistics files, databases converted to an open format and open network services that serve data from registers. An individual agreement or regulation is not reasonably treated as a dataset on its own, whereas an individual database is. For some datasets, it is sufficient for the user to have access to the data (for reading or copying), while for others there may be strong interest in re-using the data. Below, the fields are listed in ascending order of re-use value, based on the OECD’s 2006 analysis:
• culture (libraries, archives, museums, broadcasting)
• politics (press releases, strategies, green books)
• education (lectures, textbooks and study materials)
• science and research (research at universities, institutes and public sector)
• legal information (court, legal acts, patents, trademarks, rights and obligations)
• nature (biological, ecological, geological and geophysical information, information on energy resources)
• agriculture, forestry, fishing
• tourism, accommodation and entertainment
• traffic, transport
• social information (statistics, demographics, health, education)
• business and the economy
• meteorology, environmental information
• spatial data
Technically, a published dataset may be a collection of human-readable text files (such as legislation or regulations, official notices or contracts) or machine-readable data (such as a database exported to CSV or XML files, or a web service through which all the data can be searched and downloaded in JSON or XML format).
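The machine-readable forms mentioned above can be illustrated by exporting one small table to CSV, XML and JSON with Python’s standard library. This is a sketch under stated assumptions: the records are dictionaries that share the same keys, the keys are valid XML element names, and the element names `dataset` and `record` are hypothetical choices rather than a prescribed schema.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_csv(records: list[dict]) -> str:
    """Serialize records (dicts sharing the same keys) as CSV with a header row."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records: list[dict], root_tag: str = "dataset", row_tag: str = "record") -> str:
    """Serialize records as XML: one row element per record, one child per field."""
    root = ET.Element(root_tag)
    for record in records:
        row = ET.SubElement(root, row_tag)
        for key, value in record.items():
            ET.SubElement(row, key).text = str(value)  # key must be a valid XML name
    return ET.tostring(root, encoding="unicode")


def to_json(records: list[dict]) -> str:
    """Serialize records as a JSON array of objects, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False)


budget = [
    {"year": 2011, "item": "roads", "amount_eur": 1000},
    {"year": 2011, "item": "schools", "amount_eur": 2500},
]
print(to_csv(budget))
print(to_xml(budget))
print(to_json(budget))
```

The same records thus yield three files that a re-user can open with freely available tools: CSV for spreadsheets, and XML or JSON for programs and web services.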
The user must be able to do the following:
• browse and search public datasets for a dataset of interest;
• download a dataset, either as a whole or in parts via the search system offered by the service, without having to negotiate rights or obtain passwords (in exceptional cases, a fee may be charged for downloading a dataset);
• continue to use the database freely, with the right to download it to one’s own computer and
3 Uuno Vallner. Retrospektiivsed otsisüsteemid (Retrospective search systems). Tallinn: Estonian Information Institute, 1985 (in Estonian).
4 http://ec.europa.eu/information_society/policy/psi/index_en.htm
5 http://ec.europa.eu/information_society/policy/psi/docs/pdfs/towards_an_eu psi_portals_v4_final.pdf
6 http://data.gov.uk/opendataconsultation
7 http://www.w3.org/TR/gov-data
8 http://www.opengovdata.org
9 https://usoecd.cms.getusinfo.com/data.html
10 https://www.riigiteataja.ee/akt/122032011010?leiaKehtiv (in Estonian)