Open data reuse in Spain. Update 2024


Alberto Abella, FIWARE Foundation (Berlin, Germany)
Marta Ortiz de Urbina Criado, Universidad Rey Juan Carlos (Madrid, Spain)
Carmen de Pablos Heredero, Universidad Rey Juan Carlos (Madrid, Spain)
Diego García Luna, Universidad Politécnica de Madrid (Madrid, Spain)

With the technical support of the Department of Studies and Knowledge Management at Cotec

ISBN: 978-84-92933-11-2

INDEX

EXECUTIVE SUMMARY
REPORT PRESENTATION
Introduction by Cotec
Introduction by DesideDatum
1. INTRODUCTION
1.1. The importance of open data
1.2. Open data in Europe
1.3. Open data in Spain
1.4. An outstanding autonomous approach: Open Administration Consortium of Catalonia
1.5. The data reuse model
1.6. Objective of the report
2. METHODOLOGY
2.1. Methodology for studying portals that publish data
2.1.1. Simplified maturity model for portals that publish data
2.2. Methodology for studying published datasets
2.3. Methodology for studying the reuse of published data
2.3.1. Data federation
2.3.2. Consolidation of the General State Administration portals
3. DIAGNOSIS
3.1. Diagnosis of the portals that publish data
3.1.1. Updating of data and availability of APIs
3.1.2. Data management system
3.1.3. Developed services portal
3.1.4. Maturity of portals according to methodology
3.2. Diagnosis of published datasets
3.2.1. Distribution by degree of maturity of the portals that publish datasets
3.2.2. Categorisation by reuse licence
3.2.3. Categorisation by data model
3.2.4. Categorisation by the technical standard used
3.2.5. Categorisation by access mechanisms needed to access data
3.2.6. Categorisation by geographical content
3.2.7. Categorisation by update frequency
3.2.8. Categorisation by dissemination
3.2.9. Categorisation by reputation
3.2.10. Categorisation by global reuse
3.3. Diagnosis of data reuse
3.3.1. Analysis of knowledge regarding entities that reuse published data
3.3.2. Analysis of data reuse by sector
3.3.3. Analysis of data reuse by territorial scope
3.3.4. Analysis of the types of innovation due to the reuse of open data
3.3.5. Analysis of access registration availability
3.3.6. Analysis of activities promoting the use of open data
3.4. Diagnosis of generated services
3.4.1. Analysis of service themes
3.4.2. Analysis of sustainability and service business models
3.4.3. Analysis of service authors
3.4.4. Analysis of other service characteristics
3.4.5. Analysis of the value creation of services
4. QUALITATIVE ASSESSMENT OF INNOVATION SERVICES
4.1. Types of business models identified
4.2. Analysis of services by business model
5. SWOT
6. CONCLUSIONS
6.1. Conclusions regarding portals
6.2. Conclusions regarding data
6.3. Conclusions regarding portal managers
6.4. Conclusions regarding generated services
7. RECOMMENDATIONS
8. FUTURE LINES OF WORK
BIBLIOGRAPHY
ANNEXES
Annex 1. Questionnaire addressed to data reusers
Annex 2. Ranking of portals according to reputation
Annex 3. Responses to the three most common uses that reusers give their portal’s data
Annex 4. Responses to the three most used datasets on their portal

INDEX OF FIGURES

Figure 1: The evolution of Open Data maturity dimensions in Europe
Figure 2: Overall open data maturity scores from the 2022 assessment
Figure 3: Groups of countries in terms of the Open Data maturity index (2022)
Figure 4: Average maturity level of the EU-27 in each of the four dimensions
Figure 5: Distribution of data portal maturity in Spain 2023
Figure 6: Distribution of datasets by maturity of publishing portals 2023
Figure 7: Distribution of datasets by usage licence 2023 (MELODA 5)
Figure 8: Distribution of datasets by data model 2023 (MELODA 5)
Figure 9: Distribution of datasets by storage standard used 2023 (MELODA 5)
Figure 10: Distribution of datasets by access mechanism used 2023 (MELODA 5)
Figure 11: Distribution of datasets by geographical content of information 2023 (MELODA 5)
Figure 12: Distribution of datasets by update frequency 2023 (MELODA 5)
Figure 13: Distribution of datasets by dissemination 2023 (MELODA 5)
Figure 14: Distribution of datasets by reputation 2023 (MELODA 5)
Figure 15: Distribution of sampled datasets by categories 2023 (MELODA 5)
Figure 16: NTI-RISP application themes in 2023
Figure 17: Distribution of service authors 2023
Figure 18: Business models of services 2023
Figure 19: Business models of services 2021

INDEX OF TABLES

Table 1: Maturity dimensions of data portals in Europe
Table 2: Dimensions and levels of MELODA 5
Table 3: Ease of reusability rating ranges in MELODA 5
Table 4: Metrics for analysing the maturity degree of data portals
Table 5: Open data portals by Autonomous Community 2023
Table 6: Knowledge of the types of data reusers
Table 7: Reusers by sector (I)
Table 8: Reusers by sector (II)
Table 9: Reusers by sector (III)
Table 10: Scope of action of open data reusers
Table 11: Types of innovation by reuse of open data
Table 12: Availability of data access logs
Table 13: Activities promoting the use of open data
Table 14: NTI-RISP application themes in 2023
Table 15: Value creation through data reuse
Table 16: SWOT analysis: weaknesses and threats
Table 17: SWOT analysis: strengths and opportunities
Table 18: Ranking of portals by reputation level according to MELODA 5. Levels 3 and 2
Table 19: Ranking of portals by reputation level according to MELODA 5. Level 1
Table 20: Comparison of the three most common uses of datasets
Table 21: Comparison of the three most commonly used datasets

EXECUTIVE SUMMARY

The reuse of open data helps to generate social and economic value. In addition, it allows for the creation of new companies that, with limited resources of their own, carry out business models based on the development of products and services enriched with value-added information.*

*Abella, Ortiz-de-Urbina-Criado and De-Pablos-Heredero (2014)

OPEN DATA REUSE IV

This report is the fourth study on the reuse of open data. It presents the current status in 2023 and the progress made since the first report in 2017, and on that basis develops recommendations and future lines of work that help to generate businesses and services for society.

To this end, the open data portals in Spain in 2023 have been identified, and a sample of their datasets and of the services based on them has been analysed. In addition, a questionnaire was sent to those responsible for the portals in order to analyse some of the characteristics of the open data and its potential for reuse. Specifically, a diagnosis has been made of what portal managers know about the reuse of their data, the type of innovation it can promote, the activities that promote its use, the services generated and the value created through data reuse. The latest version of the MELODA 5 metric has also been applied to analyse the degree of reusability of the open data published in Spanish open data portals.

These diagnoses have supported an analysis of opportunities and threats, as well as of strengths and weaknesses, from which reflections have been drawn that can help to shape future public data management policies. The study yields the following reflections on the data reuse ecosystem in Spain:

• Statistical broadening. Statistical data sources have increased their share (19.78 %) with respect to 2021 (14.58 %) and are mainly responsible for the increase in data produced in recent years.
• Domestic consumption. As in previous reports, the biggest consumers of published data are the public administrations themselves (51.70 % in 2023, compared with 64 % who reused it frequently or habitually in 2021).
• Lack of standardisation and models for published data. In 2023, 69.90 % of the data published (as opposed to 80 % in 2021) does not include information on its structure or use standardised data models.
• Infrequent updating of data. Despite certain improvements, this aspect remains a challenge in 2023: 78.30 % of the open data published in 2023 (compared to 92 % in 2021) has an update period longer than one month, while the percentage of data published in real time is around 0 %.
• Non-geolocated data. This aspect worsened in 2023: in 2021, 50 % of the data published contained no geographical information at all, and in 2023 the figure was 63.60 %.
• Lack of maintenance of open data services. 35 % of the open data-based services listed in the portals are inactive or no longer exist.
• Reputation ranking of open data publishers. As in the 2021 report, a portal reputation ranking has been carried out, although work is underway to develop more appropriate methodologies for measuring reputation (Ortiz-de-Urbina-Criado, Abella and De-Pablos-Heredero, 2023).
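The 2021 and 2023 figures quoted in this summary can be compared as percentage-point changes. The short Python sketch below is only an illustration: the indicator labels and function names are assumptions introduced here, and only the percentages come from the summary. Note that the 2021 figure for public administrations refers to those reusing data frequently or habitually, so that comparison is approximate.

```python
# Headline indicators from the executive summary as (2021, 2023) percentages.
# The labels are shorthand introduced for this sketch, not the report's wording.
INDICATORS = {
    "Public administrations as main consumers": (64.0, 51.70),
    "Data without structure information or standard models": (80.0, 69.90),
    "Data with an update period longer than one month": (92.0, 78.30),
    "Data with no geographical information": (50.0, 63.60),
    "Statistical data sources": (14.58, 19.78),
}


def percentage_point_change(value_2021: float, value_2023: float) -> float:
    """Return the change from 2021 to 2023 in percentage points."""
    return round(value_2023 - value_2021, 2)


def summarise(indicators: dict) -> dict:
    """Map each indicator label to its percentage-point change."""
    return {
        name: percentage_point_change(v2021, v2023)
        for name, (v2021, v2023) in indicators.items()
    }


if __name__ == "__main__":
    for name, change in summarise(INDICATORS).items():
        print(f"{name}: {change:+.2f} percentage points")
```

A negative value means the share fell between the two editions. Whether a fall is an improvement depends on the indicator: a lower share of infrequently updated data is good news, while the rise in non-geolocated data is not.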


PRESENTATION

Introduction by Cotec

The report on the reuse of open data emerged as a one-off project in Cotec’s first PIA call, the mechanism through which the foundation selects alliances for the development of knowledge in the field of innovation. Back then, in 2017, we already sensed that a proper data culture in public administrations was essential to advance digital rights and, at the same time, to develop a knowledge economy that can compete on an international level without forgetting our principles and values.

Now, after four editions, the report is fully consolidated among the periodically published products that make up the Cotec Report Observatory, and we must congratulate ourselves for having chosen this project from among more than a thousand candidates. But, above all, we must congratulate the authors and collaborating entities for their ability to show, edition after edition, the increasing value of open and shared data. This report shows us the long road ahead, while recording and analysing the timid but hopeful steps we have been taking.

Introduction by DesideDatum

DesideDatum Data Company SL, better known by its brand name DesideDatum (https://desidedatum.com), is the most widely recognised Spanish company in the field of open data. In fact, it is the only company in the world that is able to offer services in the three main global technologies for open data: Tyler-Socrata (https://www.tylertech.com/products/data-insights), OpenDataSoft (https://www.opendatasoft.com) and CKAN (https://ckan.org). In addition, DesideDatum specialises in consulting and implementation projects in the main data-related fields:

• Open data
• Data governance and management
• Data analytics
• Data visualisation
• Transparency and accountability based on data

Currently (September 2023), it has more than 50 ongoing projects in many Spanish public administrations and companies, where innovation and the search for the value of data are always the main objectives. Additionally, DesideDatum has always championed initiatives related to the opening up of public data. That is why, once again, we fully support this study. For DesideDatum, opening up public data is synonymous with sharing and multiplying the value and quality of public data for all members of society. It is about empowering society, reviving the economy and even improving highly data-driven services (such as artificial intelligence).

1. INTRODUCTION

1.1 THE IMPORTANCE OF OPEN DATA

Open data is data that can be freely used, reused and redistributed by anyone, subject, at most, to requirements of attribution and of sharing under the same terms (Open Data Handbook, 2023). The value of open data lies in its reuse. For open data to be reused, it must meet certain quality requirements (Hrustek, Furjan and Pihir, 2021). Zuiderwijk, Pirannejad and Susha (2021) likewise highlight, among other aspects, the importance of data quality for reuse. In this sense, Abella, Ortiz-de-Urbina-Criado and De-Pablos-Heredero (2022) define pretender open data portals (PODP) as portals that publish data but do not allow professional reuse of the data they store. For a data portal to be considered suitable for the reuse of its data, it must meet the following requirements:

1. Have an update mechanism that allows the delivery of real-time information on data updates.
2. Have a data management system (DMS) to provide automated access to data capture and publication.
3. Have an API for publishing data with a mechanism that allows it to be reused professionally.

Concern surrounding the creation of these types of portals, together with the scarcity of open data in key sectors such as health care, has led these same authors to develop a reputation index for open data portals (Ortiz-de-Urbina-Criado, Abella and De-Pablos-Heredero, 2023) based on three dimensions: whether the portal is known, whether it is known for something specific, and how it is valued by its users.

1.2 OPEN DATA IN EUROPE

There is growing concern in Europe surrounding the quality of open data (Gao, Janssen and Zhang, 2023). The European Data Portal (https://data.europa.eu/) offers access to open data from any European country and promotes data publication practices at national, regional and international levels. Since 2015, comparative data on the evolution and use of open data in European countries has been presented annually. The results of these comparisons are shown in the open data dashboard, a very practical tool for comparing the levels of open data maturity in the Member States of the European Union. Carsaniga, Lincklaen Arriëns, Dogger, Van Assen and Cecconi (2022) compare best open data management practices in Europe and highlight the cases of France, Ukraine, Poland, Ireland, Cyprus, Estonia, Spain and Italy as references that set good practice trends. More recently, in August 2023, the European report on value creation in the public sector through the use of open data (Osimo and Pizzamiglio, 2023) highlighted good practices in public sector data reuse in France, Estonia and Flanders (Belgium).

The methodology, initially developed by Cecconi and Radu (2018), has been improved over time and currently analyses the level of maturity of open data in different countries by taking the following dimensions into consideration:


1. Open data policy

This dimension focuses on the existing open data policies and strategies in participating European countries. It analyses the national governance models and the measures, including those at regional and local level, applied to carry out these policies and strategies. To this end, the dimension is based on the same three indicators as in previous years: the policy framework, open data governance and open data implementation. On 21 December 2022, Commission Implementing Regulation (EU) 2023/138 was adopted. This regulation defines six categories of high-value datasets: geospatial, Earth observation and environment, meteorological, statistics, companies and company ownership, and mobility. The regulation establishes that public sector bodies in Member States must make this data available for reuse, free of charge.

2. Impact of open data

The second dimension analyses the willingness, readiness and capability of European countries to measure both the reuse and the impact of open data. Firstly, the dimension investigates how well countries are prepared to measure the level of reuse and impact of open data within their territory. This reflects the first indicator, strategic awareness, which was also used in previous editions of the study. Secondly, the emphasis is on whether countries measure open data reuse and, if so, with what methods. Lastly, the dimension focuses on collecting data on the impact created within the four impact areas considered in previous open data maturity assessments, namely government (formerly political), society, the environment and the economy.

3. Open data portal

This dimension focuses on the analysis of the national open data portal. It analyses in depth the advanced features and functions that provide a good user experience. Additionally, the dimension assesses the extent to which portal administrators use web analytics tools to better understand the needs and behaviour of their users and update the portal’s features in line with the information obtained from these analyses. This dimension also examines the coverage of open data in different domains, as well as the approach and measures established to ensure the portal’s sustainability.

4. Open data quality

This dimension focuses on the measures taken by portal managers to ensure the systematic collection of metadata from sources throughout the country, as well as the updating of available metadata and, whenever possible, of the actual data. Compliance with the DCAT-AP metadata standard, currently published in version 2.1.1, is monitored, as well as the quality of the implementation of the published data. Quality assessment elements are provided for portal managers and policy makers, such as the use of open formats and licences, whether the data is machine-readable and of high quality, and whether it is suitable for a linked data approach.
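The monitoring of DCAT-AP compliance mentioned under the open data quality dimension can be illustrated with a minimal Python sketch. This is not an official validator. The dictionary keys (`title`, `description`, `distributions`, `access_url`) are hypothetical plain-Python stand-ins for metadata properties, and the lists of required properties are an assumption that should be checked against the DCAT-AP version a portal actually targets.

```python
# Simplified, hypothetical compliance check inspired by DCAT-AP.
# ASSUMPTION: the required-property lists below are illustrative only and
# must be verified against the DCAT-AP specification before real use.
REQUIRED_DATASET_PROPERTIES = ("title", "description")
REQUIRED_DISTRIBUTION_PROPERTIES = ("access_url",)


def missing_properties(record: dict, required: tuple) -> list:
    """Return the required keys that are absent or empty in a metadata record."""
    return [key for key in required if not record.get(key)]


def check_dataset(dataset: dict) -> dict:
    """Report which assumed mandatory properties a dataset record lacks."""
    report = {
        "dataset": missing_properties(dataset, REQUIRED_DATASET_PROPERTIES),
        "distributions": [],
    }
    for index, distribution in enumerate(dataset.get("distributions", [])):
        gaps = missing_properties(distribution, REQUIRED_DISTRIBUTION_PROPERTIES)
        if gaps:
            report["distributions"].append((index, gaps))
    report["compliant"] = not report["dataset"] and not report["distributions"]
    return report


def compliance_rate(datasets: list) -> float:
    """Return the share (0 to 1) of records that pass the simplified check."""
    if not datasets:
        return 0.0
    return sum(check_dataset(d)["compliant"] for d in datasets) / len(datasets)


# Made-up sample records, used only to exercise the functions above.
SAMPLE = [
    {
        "title": "Air quality measurements",
        "description": "Hourly readings from monitoring stations.",
        "distributions": [{"access_url": "https://example.org/air.csv"}],
    },
    {"title": "Bus stops", "distributions": [{"access_url": ""}]},
]

if __name__ == "__main__":
    print(f"Compliance rate: {compliance_rate(SAMPLE):.0%}")
```

A real assessment would more likely validate the RDF metadata against the published DCAT-AP shapes; the dictionary-based check here only conveys the idea of counting records that carry the required properties.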

Next, Table 1 shows the measurements used for each of the dimensions:


Table 1. Maturity dimensions of data portals in Europe

DIMENSION | MEASURE
Open data policy | Regulatory framework; Open data governance; Open data implementation
Impact of open data | Strategic awareness; Measuring reuse; Impact created
Open data portal | Portal features; Data provisioning; Portal sustainability
Quality of the open data portal | Update; Control measures; DCAT-AP compliance; Implementation quality and linked data

Source: Own authorship

According to the latest open data maturity report (Carsaniga et al., 2022), it can be stated that (Figure 1):

• EU Member States are preparing for the regulation on the implementation of high-value datasets. Although the regulation had not yet been adopted at the time of the assessment, that year’s assessment provides an overview of the level of readiness of EU Member States to meet the requirements in the four dimensions of open data maturity. 96 % of EU Member States are working on identifying data in high-value data domains that should be prioritised for publication, especially the statistics, geospatial, Earth observation and environment, and meteorological categories. 85 % of the 27 EU Member States are already preparing to monitor and measure the level of reuse of high-value datasets, and all of them intend to promote or are already promoting high-value datasets on their portals. Finally, 63 % of EU countries are preparing to ensure interoperability of high-value datasets with available datasets from other countries.
• Measuring the impact of open data is a priority for EU Member States, but also a major challenge. In 2022, the impact dimension experienced the largest decrease of all the dimensions, falling from 78 % in 2021 to 71 % in 2022. This drop of seven percentage points is in line with the methodological restructuring of the dimension, which also makes it difficult to compare the 2022 indicators directly with those of previous years. The result should therefore not be read as a decrease in the level of maturity of EU countries. The fact that these countries continue to score highly on the strategic awareness indicator (which was also used in the 2021 assessment) demonstrates that the EU-27 remains very interested in understanding open data reuse and value creation, as noted in the trends in the previous year’s assessment. Rather, the decrease in the impact dimension provides a more accurate picture of the difficulty EU countries have in distinguishing and evaluating open data reuse and the resulting impact. While they remain quite advanced in tracking and measuring reuse (the EU average is 75 %, the same as the previous year), collecting data on the impact created, especially from an economic perspective, seems to be more difficult for these countries.
• In a post-pandemic world, European countries face both new and old common challenges. From year to year, EU Member States have been recovering from the pandemic, for example, by leveraging open data for the development of statistics, dashboards and alert applications. In 2022, the Russian attack against Ukraine and the consequences of this conflict for the European economy and the energy market laid the foundations for new socioeconomic challenges across Europe. Ukraine has reported that the war has had a significant impact on its work on open data, especially as Ukraine’s internet resources (in particular those that are state-owned) have been temporarily unavailable. Other countries in Europe also used the potential of open data to respond to the consequences of the war in Ukraine. For example, some countries have reported using open data to monitor the level of energy use or to facilitate the integration of Ukrainian refugees into their labour markets.

Figure 2 presents the overall open data maturity scores for each of the 35 countries participating in the 2022 assessment, according to the Open Data Maturity Report. The level of open data maturity has been improving, and the majority (18) of the EU-27 Member States are above the EU-27 average.


Figure 1. The development of open data maturity dimensions in Europe

[Chart: maturity score (0 % to 100 %) by year, 2015 to 2022, for the Policy, Impact, Portal and Quality dimensions and the EU27 average.]

Source: Carsaniga et al. (2022: 7).

Figure 2. Overall open data maturity scores from the 2022 assessment

[Chart: overall maturity score (0 % to 100 %) per country. Recoverable labels: 79 %, 75 %, Average EU27, Average EU27+.]

Source: Carsaniga et al. (2022: 9).


Figure 3 shows the groupings obtained according to this maturity index. Countries are grouped from lowest to highest in the index into four categories: beginners, followers, fast-trackers and trendsetters. The chart shows that:

• The maturity of European countries is concentrated at the upper end of the spectrum (above 65 %).
• The trendsetters group is made up of the eight top performing countries: France, Ukraine, Poland, Ireland, Cyprus, Estonia, Spain and Italy.
• The five countries in the fast-trackers group have very similar scores, concentrated within a range of three percentage points (88 % to 91 %).

Figure 4 presents the average level of maturity of the EU-27 in each of the four dimensions, compared with the figures from the previous year, leading to the following conclusions:

• All figures show a slight decrease or the same score as last year. The dimensions whose scores fell are policy and impact, which went through several changes in methodology in 2021.
• As in 2021 and 2020, policy is the most mature dimension, with a score of 86 %.
• The impact dimension shows a decrease of seven percentage points, which reflects the updated methodology and questions aimed at more accurately measuring the progress of the different countries.

Figure 3. Groups of countries in terms of the open data maturity index (2022)

[Chart: the 35 participating countries placed along a maturity scale from 10 % to 100 % and grouped as beginners, followers, fast-trackers and trendsetters.]

Source: Carsaniga et al. (2022: 9).


Figure 4. Average maturity level of the EU-27 in each of the four dimensions

DIMENSION | 2021 | 2022
Policy | 87 % | 86 %
Impact | 78 % | 71 %
Portal | 83 % | 83 %
Quality | 77 % | 77 %

Source: Carsaniga et al. (2022: 10).

• The portal dimension has remained stable since last year and is the second most mature.
• The quality dimension shows limited improvement and is the third most mature.

Additionally, European reports (Carsaniga et al., 2020; Osimo and Pizzamiglio, 2023) highlight other trends that represent good opportunities for the development and improvement of open data management in the European context:

• Human resources and skills. Several countries highlight the lack of human resources allocated to open data and the absence of adequate data literacy skills among public officials (Carsaniga et al., 2020).
• Availability of financial resources. This challenge refers, for example, to securing a recurring budget for specific datasets (high-value datasets), as well as to not having a planned budget (Carsaniga et al., 2020).
• Coordination issues. The EU-27 often reports difficulties in enabling smooth governance of data management at all levels of government (Carsaniga et al., 2020).
• Commitment to the subject of open data. Encouraging different actors to provide and use open data is a widespread challenge throughout the EU (Carsaniga et al., 2020).
• Support for open data publishing. More legal, technical and financial support is needed when it comes to publishing high-quality open data (Carsaniga et al., 2020).
• Awareness and communication. Any action must also include examples of data reuse by the public sector. Collecting and communicating these examples and use cases greatly helps to understand the importance of the public sector’s role as a data reuser (Osimo and Pizzamiglio, 2023).
• Policy and regulation. It would be beneficial to align the European Commission’s regulation improvement activities and work plans with open data publishing activities, in order to better explore internal data needs. Furthermore, it would be useful to facilitate a similar alignment and analysis of data needs for all European public administrations, for example by providing examples, best practices and methodologies on how to determine data needs for policy and regulatory purposes (Osimo and Pizzamiglio, 2023).
• Monitoring. Existing monitoring activities, such as surveys, should be reviewed to ensure that they cover the reuse of data by the public sector. It would be useful to create a user group, based on the existing broad community, that could be used to conduct new surveys (Osimo and Pizzamiglio, 2023).
• Data administrators. The role of data administrators remains fundamental in promoting reuse. Therefore, examples, best practices and methodologies on the role of data administrators should be included in supporting activities, not specifically for public sector reusers, but in general (Osimo and Pizzamiglio, 2023).

The characteristics of Spanish open data portals go beyond simply allowing users to find available datasets. There is a common focus on interaction between data publishers and reusers through discussion forums, data-specific feedback systems and rating systems. Through examples, portals showcase valuable cases of open data reuse (Carsaniga et al., 2020). The best practices related to each of the open data maturity dimensions explored in this report can help countries in Europe and beyond to find inspiration, learn and improve their own practices. Spain is one of the countries that sets the trend in terms of open data management (Carsaniga et al., 2020).


OPEN DATA REUSE IV

1.3 OPEN DATA IN SPAIN

The latest edition of the Data Economy Report in the infomediary sector, carried out by ASEDIE, includes results from an analysis of 542 companies that have business models based on data, and shows that the basis of decision-making depends more than ever on information and data and that, sometimes, we are not even aware of this act of digitalisation (ASEDIE, 2023). Advances in artificial intelligence, as well as the Internet of Things, are realities that are evolving at an ever-increasing rate and are causing a transformation in the economic system. Data, its management and its analysis have become the necessary element for business progress, which makes the infomediary sector one of the most influential in our economy (ASEDIE, 2023).

This same report highlights that there has been an increase of 12.1 % in the infomediary sector, compared to a national GDP growth of 7.6 % (ASEDIE, 2023). The increase in both the digitalisation of processes and the attention companies give to data quality are recognised as factors that improve expectations for data reuse. So-called “data culture” is progressively growing in Spain, which also sets a trend in open data management in the EU (ASEDIE, 2023), as described in the previous summary.

In Spain, infomediary companies are more active in some regions than others. The sector is represented in all the autonomous communities of the Spanish territory and in the autonomous city of Melilla. The Community of Madrid, with 39 %, is the autonomous community with the most infomediary companies, followed by Catalonia, Andalusia and the Valencian Community, with weights of 13, 11 and 9 % respectively. The rest of the autonomous communities make up the remaining 28 % of infomediary companies (ASEDIE, 2023).

The four sub-sectors with the most impact represent 75 % of the total employees in the sector, and Geographical Information stands out with 30 % of the total. Behind it are, with a similar percentage, Financial, Technical Consulting and Market Research. The rest represent 25 % of the market, all below 10 % (ASEDIE, 2023).

Of the total infomediary companies, 68 % have existed for less than 20 years: 32 % for between 11 and 20 years, and 36 % for less than 10. 64 % of them were created more than 10 years ago. The average age of companies is 16 years. In the last year, 40 companies have been created. Publishers is the only subsector where the majority of companies have been active for more than 20 years. At the other extreme are Tourism and Meteorological, of which 100 and 85 % of companies, respectively, are less than 20 years old. The rest of the sub-sectors have between 60 and 80 % of companies that are less than 20 years old (ASEDIE, 2023).

The number of companies detected within the infomediary sector has grown almost 60 % since the beginning of this report in 2013. As of December 2021, the number of infomediary companies identified in Spain is 710. The number of employees of the 542 companies for which employee data is available amounts to 22,663, and the net profit of the 506 companies for which results data is available amounts to 181,707,060 euros (ASEDIE, 2023). The autonomous communities that have grown the most since then (proportionally) are Cantabria (400 %), Murcia (300 %) and Extremadura (300 %); and in number, Madrid (83), Andalusia (54) and the Valencian Community (45). The only autonomous community that has decreased is Catalonia, and the only one in which no company has been detected is the autonomous city of Ceuta (ASEDIE, 2023).


74 % of respondents from the academic sector, 71 % from the private sector and 73 % from the public sector indicated that they are aware of European regulations on the six categories of high-value public datasets.

Respondents from the public sector, who are both providers and reusers of public information, highlight that the most significant obstacles they encounter when reusing information are (ASEDIE, 2023):
• The information provided in the data is not homogeneous (41.9 %).
• Datasets are not available in all autonomous communities or in all city councils (41 %).
• Lack of data updates (38.1 %).

Regarding obstacles when reusing data, both the academic and private sectors agree that the main obstacles are (ASEDIE, 2023):
• Lack of data updates.
• Lack of availability.
• Difficulty in accessing it.

Data federation allows an open dataset to be redistributed, in whole or in part, from its original domain or from another domain. It is a way to collect external sources of data in domains that are usually most actively visited. In this sense, the 2021 report detected a significant increase in federated data as a way to avoid these obstacles (Abella, Ortiz de Urbina Criado, De Pablos Heredero and García Luna, 2021) that has continued to this day.

Regarding the impact of the usefulness of open data, 88 % of academic respondents who have knowledge of high-value datasets believe that they are useful for their institution. 96 % of companies surveyed who have knowledge of high-value datasets believe they are truly useful for their business. 77 % of public sector respondents who have knowledge of high-value datasets have indicated that their agency is responsible for at least one of them. 72 % of public sector respondents who have indicated that their agency is responsible for one of the high-value datasets have indicated that they will publish the data within the established time frame of 16 months (ASEDIE, 2023).

In terms of academic and business impact, 54 % of private sector respondents who have knowledge of the six categories of high-value data have indicated that up to this point they have paid for some of the data collected in these categories, while 93 % of them have stated that free access to data will have a positive impact on their budget. Regarding the academic sector, only 13 % of respondents who have knowledge of high-value data usually pay for its use. However, all respondents affirm that free access to data will make it easier to carry out research projects (ASEDIE, 2023).

95 % of respondents state that it would be beneficial to have a list or compendium of existing regulations that directly affect the access to, publication and reuse of public sector data. 65 % of those surveyed who request a list or compendium of regulations state that it would help to advance the training and informing of those involved in the data ecosystem, and 59 % mention that it would help to make the implementation of the different regulations clearer and easier (ASEDIE, 2023).

63 % of respondents affirm that they use data daily or at least once a week. The most in-demand data is statistical, information on the public sector and geospatial. 61 % of respondents affirm that they use the data published on the data.gob.es portal. 66 % of respondents say they use data published on other Spanish portals. 77 % of respondents who use these portals have indicated that they access the portal of the National Institute of Statistics, and 52 % access the portal of the National Centre for Geographical Information. 67 % of respondents indicate that they use data published at a regional level. Although the use is similar in all autonomous communities, it is worth highlighting the use of the portals of the Community of Madrid, the Junta de Andalucía and the Generalitat de Catalunya (ASEDIE, 2023).

The creation of open data portals does not imply that the data they publish is ready for professional reuse. Organisations that offer open data portals must consider that one of the values that data provides lies in its capacity for reuse, so they must try to define and create open portals whose characteristics allow the adequate reuse of data (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero, 2022). The political interest in implementing open government projects has produced some confusion and ambiguity (Gil-García, Gasco-Hernández and Parto, 2020). Specifically in the case of Spain, we have identified a high number of what we call pretender open data portals (PODP) (Abella et al., 2022), that is, open data portals that contain data that is not suitable for reuse. Cetina (2021) refers to the need to work on “purposeful data” in order to make it useful.

Ortiz-de-Urbina-Criado, Abella and De-Pablos-Heredero (2023) analyse the reputation of open data portals, considering it the collective recognition of the capacity demonstrated by the portal to systematically offer reusable open data and allow the creation of value based on it. The authors base their analysis on the three dimensions proposed by Lange, Lee and Dai (2011) to measure reputation:
• The degree to which “it is known” (dissemination and knowledge of the data portal).
• The degree to which it is “known for something” (for example, for its level of maturity, its datasets, the services developed by its data or by its innovation).
• Its generalised favorability (the opinion of the reusing agents in the data ecosystem).

Having reliable metrics that allow measuring the quality of the data hosted by open data portals, with regard to its reuse, is of great importance. The reputation of portals can boost their continuous improvement. In this report, an evaluation of open data portals in Spain is carried out, applying the latest version of the MELODA metric (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero, 2019).


1.4 AN OUTSTANDING AUTONOMOUS APPROACH: OPEN ADMINISTRATION OF CATALONIA

The Open Administration of Catalonia Consortium (AOC) seeks to guarantee that all citizens of Catalonia enjoy quality digital public services, regardless of their municipality of residence and the capabilities and resources of the public bodies they engage with. The great challenge we have is that more than 90 % of public entities are of a reduced or very reduced size and do not have the resources to comply with very demanding standards, which are the same for a large administration or a small-town council.

Regarding the practice of open government and good governance, the AOC offers local administrations the following common services:
• Transparency portal
• Open data platform
• Whistleblower channel
• Citizen participation platform
• Institutional integrity self-assessment guide

The results of the Catalonia model of open government are the following:
• Approximately 1,230 local entities use AOC services.
• 85 % of citizens say they are satisfied or very satisfied.
• The degree of compliance with the transparency law in Catalonia is 50 % higher than in the rest of Spain, according to data from Infoparticipa (Universitat Autònoma de Barcelona).
• Savings of 5 million euros per year are generated in administrative transparency tasks.

1.5 THE DATA REUSE MODEL

MELODA is a metric for assessing the quality of open data that enables users to qualify information and evaluate its degree of reuse (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero, 2014). MELODA 4 was the version used in the 2017 (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero, 2017) and 2019 (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero, Vidal-Cabo and Ferrer-Sapena, 2019) reports. In its current version, MELODA 5 evaluates two additional dimensions and features a modification to the levels and calculations (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero, 2019), as can be seen in Table 2.


Table 2. Dimensions and levels of MELODA 5

Dimensions (max. 61 points) and their levels

Licence (max. 6 points)
1: private use
2: non-commercial reuse
3: commercial or unrestricted reuse

Access to information (max. 6 points)
1: access to dataset via website or URL single parameters
2: single access to the website with parameters referring to individual data
3: API or query language

Technical standard (max. 6 points)
1: closed reusable standard or open non-reusable standard
2: open reusable standard
3: open standard, with individual metadata

Standardisation level (max. 10 points)
1: own standardisation model
2: own standardisation model or published ad hoc (coordination)
3: local standardisation
4: global standardisation

Geolocated content (max. 6 points)
1: without geographical information
2: simple or complex text field
3: with coordinates or complete geographical information

Data update rate (max. 15 points)
1: above one month
2: monthly: with update periods between 1 month and 1 day
3: daily: with update periods between 1 day and 1 hour
4: every hour: with update periods from 1 hour to 1 minute
5: in seconds: update period less than 1 minute

Dissemination (max. 6 points)
1: non-systematic communication/dissemination
2: available resources on updates (e.g. social media feeds)
3: proactive dissemination / push dissemination (automated information at certain times)

Reputation (max. 6 points)
1: no information about the reputation of the data source
2: statistics or reports are published based on the opinions of users
3: rankings or indicators based on the reputation of the data source

Source: Abella, Ortiz de Urbina Criado & De Pablos Heredero (2019: 6).
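As an illustration of how the "Data update rate" levels in Table 2 could be assigned automatically, the following minimal sketch maps an update period to a level from 1 to 5. It is not part of MELODA itself: the function name `update_rate_level` is hypothetical, one month is assumed to be 30 days, and the handling of exact boundary values (for example, a period of exactly one day) is an assumption, since the table does not specify it.

```python
from datetime import timedelta

# Assumption: "one month" is treated as 30 days; Table 2 does not define it.
ONE_MONTH = timedelta(days=30)
ONE_DAY = timedelta(days=1)
ONE_HOUR = timedelta(hours=1)
ONE_MINUTE = timedelta(minutes=1)


def update_rate_level(period: timedelta) -> int:
    """Return the 'Data update rate' level (1 to 5) for an update period.

    Boundary handling is an assumption: a period exactly equal to a
    threshold is assigned to the faster (higher) level, except that
    exactly one minute stays in level 4 ("from 1 hour to 1 minute").
    """
    if period <= timedelta(0):
        raise ValueError("update period must be positive")
    if period > ONE_MONTH:
        return 1  # above one month
    if period > ONE_DAY:
        return 2  # monthly: between 1 month and 1 day
    if period > ONE_HOUR:
        return 3  # daily: between 1 day and 1 hour
    if period >= ONE_MINUTE:
        return 4  # every hour: from 1 hour to 1 minute
    return 5      # in seconds: less than 1 minute
```

For example, a dataset refreshed every week would be placed at level 2 and a sensor feed refreshed every ten seconds at level 5.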


In version 5 of MELODA, some of the ideas proposed by experts have been examined and the assessment of each level has been revised. In this version, each level is assigned the value it has (1, 2, 3, 4 and 5). To evaluate the degree of reuse of each dataset, two measurements are used: 1) the sum of the scores obtained in each dimension, and 2) for each dimension a descriptive analysis of the frequency of each level is carried out. The first measurement will provide a ranking of datasets according to their degree of reusability; while the second allows us to have a more detailed image for each dataset and identify which dimensions need to be improved (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero, 2019: 6).
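The two measurements described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The dimension keys and function names are hypothetical. The points awarded per level (1, 3, 6, 10, 15) are an assumption inferred from the per-dimension maxima in Table 2 (6, 10 and 15 points for dimensions with 3, 4 and 5 levels) and from the overall range of 8 to 61 points; the report does not list them explicitly, so they should be checked against the MELODA 5 publication before any real use.

```python
from collections import Counter

# Hypothetical keys for the eight MELODA 5 dimensions of Table 2.
DIMENSIONS = (
    "licence", "access", "technical_standard", "standardisation",
    "geolocation", "update_rate", "dissemination", "reputation",
)

# ASSUMPTION: points per level, inferred from the maxima in Table 2.
LEVEL_POINTS = {1: 1, 2: 3, 3: 6, 4: 10, 5: 15}


def dataset_score(levels, level_points=LEVEL_POINTS):
    """Measurement 1: sum of the points obtained in each dimension."""
    return sum(level_points[levels[dimension]] for dimension in DIMENSIONS)


def level_frequencies(datasets):
    """Measurement 2: for each dimension, how often each level occurs."""
    frequencies = {dimension: Counter() for dimension in DIMENSIONS}
    for levels in datasets:
        for dimension in DIMENSIONS:
            frequencies[dimension][levels[dimension]] += 1
    return frequencies


def rank_datasets(datasets_by_name):
    """Dataset names ordered from most to least reusable by total score."""
    return sorted(
        datasets_by_name,
        key=lambda name: dataset_score(datasets_by_name[name]),
        reverse=True,
    )
```

Each dataset is represented as a dictionary from dimension key to the level reached. `rank_datasets` gives the ranking by degree of reusability, while `level_frequencies` shows which dimensions concentrate low levels and therefore need improvement.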

Additionally, to classify the datasets based on MELODA 5, three categories of degree of reuse have been created (Table 3): from 8 to 23 points (the lower end is the sum of category 1 of the 8 dimensions) is inadequate; from 24 to 47 points (the lower end is the sum of category 2 of the 8 dimensions) is basic; and from 48 to 61 points (the lower end is the sum of category 3 of the 8 dimensions) is advanced.

Table 3. Ease of reusability rating ranges in MELODA 5

MELODA 5 ranges      8-23          24-47     48-61
MELODA 5 category    Inadequate    Basic     Advanced

Source: Own authorship.
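The ranges in Table 3 translate directly into a small classification function. The sketch below is only an illustration; the function name `meloda5_category` is hypothetical, and rejecting totals outside 8 to 61 is a design choice made here, not something the report prescribes.

```python
def meloda5_category(total_points: int) -> str:
    """Map a MELODA 5 total score to its Table 3 category."""
    if not 8 <= total_points <= 61:
        # Totals outside the range of Table 3 indicate a scoring error.
        raise ValueError("MELODA 5 totals range from 8 to 61 points")
    if total_points <= 23:
        return "Inadequate"
    if total_points <= 47:
        return "Basic"
    return "Advanced"
```

For instance, a dataset with a total of 30 points falls in the "Basic" category.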


1.6 OBJECTIVE OF THE REPORT

This report conducts a study on data reuse in Spain with the aim of presenting the current status of research and identifying guidelines and recommendations that help promote the use of data and generate business. It follows on from three previous reports carried out in 2017 (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero, 2017), 2019 (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero, Vidal-Cabo and Ferrer-Sapena, 2019) and 2021 (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero and García-Luna, 2021 in Spanish; 2022 in English) and analyses the changes that have occurred in recent years.

To this end, open data portals in Spain have been identified in order to analyse a sample of the datasets they publish and some of the services generated. In addition, a survey was conducted with those responsible for the portals in order to analyse some of the characteristics and activities in relation to their open data. Specifically, a diagnosis was made of the portals’ knowledge of data reusers, the type of innovation that can be made with the published data, the activities to promote the use of data, the services generated, the creation of value from the reuse of data and the reputation of the portals. The latest version of the MELODA 5 metric has also been applied to analyse the degree of reuse of open data published on Spanish open data portals.

All these analyses have allowed, by means of a SWOT matrix, a diagnosis of the opportunities and threats and of the strengths and weaknesses, from which a series of reflections have been included that can help to build future data management policies for the encouragement of business creation.


2. METHODOLOGY


2.1 METHODOLOGY FOR STUDYING PORTALS THAT PUBLISH DATA

345 data portals have been identified from the following sources:
• Previous 2021 report on the state of open data in Spain (Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero and García-Luna, 2021; 2022)
• Data from datos.gob.es via its list of initiatives (note 5)
• Open Administration of Catalonia Consortium
• Complementary research by the team conducting the report

Once these sources had been consolidated, the availability of each portal was validated one by one. In 33 cases it was detected that the portal was either unavailable (e.g. error 404) or did not publish data.

In line with Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero (2017), the following values were identified for each portal:
• Availability of mechanisms for publishing data updates.
• Availability of a catalogue of resources, number of available datasets and whether the catalogue is downloadable.
• Existence of direct data connection mechanisms (API) or query language (e.g. SPARQL).
• Availability of a portal where services or applications are identified based on the portal data and number of identified services.
• Use of a specific data publication and reuse tool: DMS (note 6).
• Number of published datasets.
• Which of the published datasets are original to the portal and which are federated from other portals.
• Identification of the autonomous community of the entity that publishes the portal or if it is part of a nationwide entity.
• The mechanisms for contacting the portal manager (e.g. e-mail or web form).

2.1.1 Simplified maturity model for portals that publish data

Following the same methodology as Abella, Ortiz-de-Urbina-Criado, De-Pablos-Heredero (2017), a simplified maturity model has been defined based on the model in the pan-European data portal initiative by Carrara, Nieuwenhuis and Vollers (2016), introducing the following elements for consideration:
• The population of datasets exceeds 30 items.
• The availability of a feed (RSS or equivalent) with data updates.
• The availability of an application programming interface (API) to allow automated access to data by external users.
• The use of a data management system (DMS). For the purpose of this task, the following tools have been considered: CKAN, Socrata, DKAN, OpenDataSoft, ESRI Open Data and AOC.

5. https://datos.gob.es/es/iniciativas. 6. Data Management System.
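The four elements of the simplified maturity model in section 2.1.1 can be expressed as simple checks on the values recorded for each portal. The sketch below is only an illustration: the `Portal` record, its field names and the function `maturity_checks` are hypothetical, and because the text does not state how the four elements are combined into a single maturity level, the function only reports which elements are met.

```python
from dataclasses import dataclass
from typing import Optional

# DMS tools considered in section 2.1.1.
RECOGNISED_DMS = {
    "CKAN", "Socrata", "DKAN", "OpenDataSoft", "ESRI Open Data", "AOC",
}


@dataclass
class Portal:
    """Hypothetical record of the values collected for one portal."""
    name: str
    dataset_count: int
    has_update_feed: bool      # RSS or equivalent feed with data updates
    has_api: bool              # API for automated access by external users
    dms: Optional[str] = None  # data management system, if identified


def maturity_checks(portal: Portal) -> dict:
    """Report which elements of the simplified maturity model are met."""
    return {
        "more_than_30_datasets": portal.dataset_count > 30,
        "update_feed": portal.has_update_feed,
        "api": portal.has_api,
        "recognised_dms": portal.dms in RECOGNISED_DMS,
    }
```

A portal with 31 datasets, an RSS feed, no API and CKAN as its DMS would meet three of the four elements.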
