DASHBOARDS FOR INPUT-EVALUATION OF POLICY PROGRAMS: LESSONS LEARNED FROM AN ANTWERP DASHBOARD FOR GARDEN STREETS

There is an ever-growing trend towards evidence-based policies grounded in data-driven program evaluation research. To facilitate such evaluation research, electronic dashboards are increasingly used to translate big and unstructured data sources into low-level summary visualizations that lay policy-makers can understand. In this paper, we report on the dashboard development process for an input evaluation of new garden streets in the city of Antwerp. During this process, several lessons were learned. First, developers should start from a clearly defined policy question and analysis unit in order to optimize the development process. Second, different types of key performance indicators exist, and these should be well defined in advance so that appropriate data can be collected. Third, a dashboard should not be restricted to purely objective data analyses but may also include features that facilitate subjective evaluation guided by the assumptions and beliefs of the dashboard user. These lessons helped us to make the dashboard requirements of Antwerp more concrete. Likewise, they may help other developers of policy-supporting dashboards to optimize their development processes.


INTRODUCTION
The ubiquitous introduction of new information and communication technologies (ICT) over the last decades has brought promising new tools that facilitate evidence-based policy making (Head, 2008, Janssen, Helbig, 2018, Ruppert et al., 2013). Such policy making starts from empirical, data-driven evaluation research about the context, the need, the impact and the effectiveness of different policy programs (Rossi et al., 2018, van Veenstra, Kotterink, 2017, Khan, Rahman, 2017, Jann, Wegrich, 2017, Stufflebeam, 2012). One tool that is increasingly used for such program evaluation is the electronic dashboard (Sarikaya et al., 2018, Bartlett, Tkacz, 2017, Kohlhammer et al., 2012).
A dashboard is defined as "a visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen so the information can be monitored at a glance" (Few, Edge, 2007, p.1). Put differently, dashboards allow big sources of complex and unstructured information to be translated into user-friendly, clear, understandable, efficient and low-level visualizations (Lin et al., 2018, Brath, Peters, 2004). As a result, they enable lay policy makers such as managers, politicians or citizens to carry out evidence-based and data-driven evaluations of different policy programs without particular expertise in data science or scientific research methods (Janssen, Helbig, 2018, van Veenstra, Kotterink, 2017, Matheus et al., 2018, Höchtl et al., 2016, Kohlhammer et al., 2012).
However, evaluation research can take place at different phases along the regular policy cycle (Stufflebeam, 2012, Rossi et al., 2018; see Figure 1). First, in the agenda-setting phase, context evaluation refers to collecting evidence about societal needs, problems, assets, opportunities and resources to support the need for political action and to attract the attention of important stakeholders. Second, in the decision-making phase, input evaluation refers to collecting evidence about the expected efficiency, feasibility and public support of competing policy programs so that the optimal program can be selected (Höchtl et al., 2016). Third, in the implementation phase, process evaluation refers to collecting evidence to ensure that a concrete program is implemented properly and that no unintended side effects arise (Khan, Rahman, 2017). Last, during the impact assessment phase, product evaluation refers to collecting evidence about whether the selected policy program tackled the policy problem at stake, to what extent it met the targeted needs, whether it outperformed competing policy programs and whether it avoided unintended outcomes. Because different types of evaluation questions arise at different stages of the policy-making process, each stage also requires a different type of dashboard. It is therefore important to clearly define the main goals of a program evaluation task well in advance and to meticulously translate these goals into appropriate data collection strategies and dashboard design instructions (Sarikaya et al., 2018, Yigitbasioglu, Velcu, 2012, Bartlett, Tkacz, 2017).
Within the current literature, dashboards for context and process evaluation have already been described through different applications. Context evaluation dashboards mainly include general data overviews describing the context of policy problems. Examples of such dashboards are the numerous city dashboards, such as the Dublin Dashboard or the London Dashboard (Bartlett, Tkacz, 2017). These dashboards cover a wide range of information about their respective cities, including economic measures, environmental information and cultural activities. Process evaluation dashboards, in turn, focus on real-time information to follow up on the implementation of specific policy programs. An example of such a dashboard is the Centro de Operacoes Prefeitura do Rio in Rio de Janeiro (Kitchin et al., 2015). This dashboard supports real-time monitoring of traffic and public transport, municipal and utility services, emergency services and weather feeds, and provides information that enables quick actions and decisions in day-to-day city operations.
Dashboards for input and product evaluation, in contrast, focus on a comparison of competing policy program proposals and on an evaluation of a fully implemented program, respectively. The development of such dashboards is less well described in the existing literature. This paper aims to fill this gap by reporting on our experiences with the construction of an input evaluation dashboard for the city of Antwerp. This dashboard should help the city administration, politicians and citizens choose the locations of new garden streets in order to fight problems caused by climate change. To our knowledge, no publications exist that explicitly discuss the steps for developing such a dashboard for input evaluation.
The paper is structured as follows. In the next section, we introduce the CUTLER project, which includes the construction of the Antwerp garden street dashboard as one of its use cases. The third section briefly describes the methodology adopted in the dashboard development process. The fourth section discusses our experiences with the dashboard development process itself and provides initial guidelines for such a process. We conclude the paper with some points of discussion for future research and development.

THE ANTWERP CUTLER CASE
CUTLER (Coastal Urban developmenT through the LEnses of Resiliency) is a research project funded by the EU Research and Innovation program. Its objective is to shift the practice of policy-making by intuition towards a practice of policy-making by data-driven empirical evaluation research. In order to do so, it aims to establish development processes for city dashboards showing evidence about economic, environmental and social consequences of policy programs. The evidence is meant to be used within decision making processes by informing, advising, monitoring, evaluating and revising decisions made by urban planners and policy makers. The project involves academic, governmental as well as private project partners.
The CUTLER project includes four pilot cities, each with its own particular policy questions. One of these cities is the city of Antwerp in Belgium, which faces several challenges due to climate change, such as increasing periods of heavy rainfall, increasing periods of heat waves, longer periods of drought and falling groundwater levels. To address these challenges, the city is working on a strategic Urban Water Plan with an integrated water management strategy in order to protect the city against the effects of future floods. Within the CUTLER project, the city aims to bring together data from different sources and to visualize these data in dashboards in order to help policy makers define new evidence-based urban development programs.
One part of the Urban Water Plan is the construction of garden streets at different locations in the city. Garden streets are streets where the amount of paved surface is reduced and the amount of green space and vegetation is increased (see Figure 2). Such streets are expected to lower the risk of sewer system saturation and flooding. Additionally, a garden street lowers the risk of heat stress because of the reduced amount of pavement and the increased volume of trees and greenery. It is also expected to strengthen economic and social resilience through a reduced risk of damage and an increased appeal to residents. Nonetheless, the budget for garden streets is limited, while their effectiveness may vary strongly between locations. As a result, the city faces the difficult task of choosing a limited number of locations where streets will be transformed into garden streets. Because the implementation of a garden street in each individual street of the city can be seen as a separate possible policy program, the city requires an input evaluation of the effectiveness, feasibility and desirability of these hypothetical garden streets in order to find the optimal locations.

METHODS
In order to derive dashboard construction guidelines in line with the policy questions and program evaluation goals of the city of Antwerp, we followed a grounded theory approach (Strauss, Corbin, 1997) based on findings derived from the CUTLER project. Relevant information was primarily retrieved from end-user research through both qualitative evaluation workshops and quantitative personal questionnaires. First, co-creation sessions were organized with different stakeholders, including professional civil servants and project leaders as well as regular citizens. The input from these workshops was used to develop a first crude version of a general dashboard for program evaluation. Next, training workshops were organized in which end-users were taught how to use the dashboards based on at least two predefined usage scenarios. After some weeks, reflection workshops were organized in which users discussed their experiences with the dashboards regarding usability and usefulness. Additionally, these end-users were invited to evaluate the dashboards through two questionnaires. One questionnaire included general questions about the entire dashboard, while the other included questions about each particular widget in the dashboards.
The research findings of the end-user workshops and questionnaires are extensively described in project deliverables (see https://www.cutler-h2020.eu/deliverables/). In addition, several technical project partners produced a set of technical deliverables. All these deliverables were synthesized through content analysis and used as input for the creation of a new wireframe for the Antwerp dashboard (see Figure 3). This wireframe will eventually result in a dashboard for the input evaluation of the locations of new garden streets. In this paper, we discuss the development process of this wireframe.

The Policy Question
In order to efficiently develop a dashboard for policy evaluation research, the central policy question should be clearly defined up front. Indeed, the first problem that arose while constructing the Antwerp dashboard in collaboration with technical partners was the lack of a clear policy question. At the start of the CUTLER project, the Antwerp use case merely focused on the broad theme of climate adaptation policies without further elaboration. Over the course of the project, discussions between the city administration, politicians and professionals gradually resulted in the concrete policy question at stake, namely 'where to install new garden streets?' Meanwhile, however, the technical project partners had already started collecting, analysing and visualizing data within dashboards. Because of the lack of a clear policy question, many of these developments were later considered barely useful by the targeted end-users.
Policy questions that require input evaluation can be recognized by the presence of concrete policy program proposals and by the pending need to select one of these programs for implementation. The city of Antwerp clearly raises such a question, because it wants to select the optimal policy program among different alternatives, namely the optimal location among different candidate locations for new garden streets. Once a policy question is clearly identified as a question for input evaluation, the required evaluation research methodology should become apparent, as illustrated below.

The Analysis Unit
Because the policy question was not yet well defined in the Antwerp case, the second problem we encountered was the lack of a clearly defined analysis unit. Yet a clear definition of the analysis unit is crucial for the construction of a dashboard for input evaluation. After all, input evaluation boils down to a direct comparison between competing policy programs so that policy makers can easily select the optimal program for implementation. Put differently, input evaluation requires analyses in which the analysis units are defined by the different programs under consideration. Within the Antwerp case, the analysis units of the input evaluation dashboard were the streets of the city. Indeed, the main goal of the input evaluation was to select the street that best qualifies to become a garden street.
Input evaluation first of all requires information that is available at the level of the analysis unit. In the Antwerp case, this means that information should be collected that can be analysed at the street level. Nonetheless, the initial rudimentary CUTLER dashboard for Antwerp included many widgets that did not start from streets as analysis units. As a result, these widgets did not contribute to the input evaluation goal and prevented users from making informed decisions about where to install new garden streets. For example, the dashboard included several widgets showing measures of precipitation, sewer water levels and groundwater levels from a limited number of sensors installed across the city (see Figure 4). Even though these sensors provide context for the evaluation research, their limited number hardly allowed users to extrapolate the data to specific streets. Consequently, users could hardly use these data to compare potential garden streets across different locations within the city.
In sum, a clearly defined analysis unit helps dashboard developers determine which data should be collected and how these data should be shown in dashboard visualizations. Other evaluation research questions may require different analysis units. For example, the city of Antwerp might also have required an input evaluation of different types of climate adaptation programs next to garden streets, such as green roofs, smart wells or smart fountains. In that situation, the analysis unit would have had to be defined differently, and different forms of data would have had to be collected and visualized.
It should also be noted that a general definition of the analysis unit does not necessarily imply a concrete determination of the particular analysis units. For example, the city of Antwerp does not request a dashboard that compares a fixed set of streets, but an interactive web application that allows dashboard users to select streets themselves. More specifically, the user should be able to draw polygons for different areas across the city that might be transformed into garden streets (see Figure 5). Each polygon subsequently defines the total surface of the area, which is used for an automatic calculation of costs and benefits. Put differently, even though it is clear that the dashboard will be used to compare different streets, it is still up to the dashboard user to select specific streets for evaluation and comparison.
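The surface calculation from a drawn polygon can be sketched as follows. This is a minimal illustration rather than the CUTLER implementation: it assumes that the vertices of the drawn polygon are available as (x, y) coordinates in metres in a projected coordinate system, and it applies the standard shoelace formula. The function name `polygon_area_m2` and the example coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_area_m2(vertices: Sequence[Point]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula.

    Assumes (x, y) vertices in metres in a projected coordinate system.
    The polygon is closed implicitly: the last vertex connects to the first.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Signed cross product of consecutive vertices.
        twice_area += x1 * y2 - x2 * y1
    # abs() makes the result independent of clockwise/counter-clockwise order.
    return abs(twice_area) / 2.0


# Hypothetical street section drawn by a user: 120 m long and 12 m wide.
street = [(0.0, 0.0), (120.0, 0.0), (120.0, 12.0), (0.0, 12.0)]
print(polygon_area_m2(street))  # 1440.0
```

A production dashboard would more likely delegate this to a GIS library and a properly chosen map projection; the sketch only shows that the surface, and hence the cost and benefit inputs, follow directly from the user's polygon.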

Ordinal Key Performance Indicators
Once the analysis unit of the input evaluation dashboard is well defined, the next step is to define the key performance indicators (KPIs). Such a KPI is a unidimensional indicator that allows the competing policy programs to be ranked from most to least preferable according to their expected performance. The top-ranked program may subsequently be chosen for implementation. Within the Antwerp case, the KPI was defined as the financial profit of each garden street over 25 years.
Figure 4. The initial Antwerp dashboard included several widgets showing measures of precipitation, water levels in sewers and ground water levels from sensors at particular locations across the city. Even though these widgets create context for the input evaluation goals of the city, their limited number hardly allowed for comparing potential garden streets.
Here, too, it is crucial to provide a clear algorithm that links the different analysis units to the performance indicator. In the Antwerp case, the KPI was estimated by subtracting the simulated expected costs of constructing a garden street at a specific location from the simulated expected benefits of that garden street. The direct construction costs per square meter were calculated for three different types of garden streets: a garden street without extra water retention below surface (type 1), a garden street with extra water retention below surface (type 2) and a garden street with water retention below surface and innovative solutions above surface (type 3). As a result, the total construction cost of each type of garden street can be derived for each proposed street in the dashboard, based on the surface of the drawn polygon. Likewise, the city modelled the expected direct financial benefit per square meter over 25 years for each type of garden street within the different hydrological catchments of the city (see Figure 6). These expected direct financial benefits are modelled based on the history of flooding risks and of the direct and indirect damage caused by flooding.
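The KPI calculation described above reduces to a subtraction of two surface-scaled quantities. The sketch below assumes that both the construction cost and the expected 25-year benefit scale linearly with the polygon surface; all unit values, catchment names and identifiers (`CONSTRUCTION_COST_PER_M2`, `BENEFIT_PER_M2_25Y`, `profit_25y`) are hypothetical placeholders, not the figures modelled by the city.

```python
from dataclasses import dataclass

# Hypothetical construction cost in EUR per square metre for garden street types 1-3.
CONSTRUCTION_COST_PER_M2 = {1: 150.0, 2: 220.0, 3: 300.0}

# Hypothetical expected direct benefit in EUR per square metre over 25 years,
# keyed by (hydrological catchment, garden street type).
BENEFIT_PER_M2_25Y = {
    ("catchment_A", 1): 180.0, ("catchment_A", 2): 260.0, ("catchment_A", 3): 320.0,
    ("catchment_B", 1): 120.0, ("catchment_B", 2): 200.0, ("catchment_B", 3): 290.0,
}


@dataclass
class ProposedStreet:
    name: str
    catchment: str   # hydrological catchment in which the drawn polygon lies
    area_m2: float   # surface of the drawn polygon


def profit_25y(street: ProposedStreet, street_type: int) -> float:
    """KPI: expected 25-year benefit minus construction cost for one garden street type."""
    cost = CONSTRUCTION_COST_PER_M2[street_type] * street.area_m2
    benefit = BENEFIT_PER_M2_25Y[(street.catchment, street_type)] * street.area_m2
    return benefit - cost


s = ProposedStreet("Street X", "catchment_A", 1440.0)
for t in (1, 2, 3):
    print(f"type {t}: {profit_25y(s, t):.0f} EUR")
best_type = max((1, 2, 3), key=lambda t: profit_25y(s, t))
print("most profitable type:", best_type)  # 2
```

With these placeholder numbers, type 2 yields the highest profit for this street; with the city's real unit values the ranking could of course differ.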
Note that the relative benefits are not provided at the street level but at the level of hydrological catchments. However, because each proposed garden street lies within a single hydrological catchment, this information can be extrapolated to the street level. Nonetheless, this choice means that streets falling within the same hydrological catchment cannot be compared on anything other than their surface, because the difference in benefits and profit then depends solely on the total surface of the proposed streets. This was a pragmatic trade-off that had to be made, because the city lacks precise information about flooding risks and damage at the street level. Nevertheless, streets in different hydrological catchments can be compared, which still makes the dashboard useful.
Figure 6. The direct benefits of proposed garden streets can be derived from the modelled financial benefit for each type of garden street within the different hydrological catchments of the city.
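Extrapolating catchment-level benefits to a proposed street requires deciding which catchment the street falls in. One simple way to do this, sketched below, is to test a representative point of the street polygon against each catchment boundary with a ray-casting point-in-polygon test. Using the mean of the polygon vertices as the representative point is a simplification that is safe for convex street polygons; the catchment boundaries and all function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: p is inside if a ray to the right crosses the boundary an odd number of times."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_catchment(street: Sequence[Point],
                     catchments: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Name of the catchment containing the street's vertex mean, or None if none does."""
    # Simplification: the vertex mean lies inside convex street polygons.
    cx = sum(x for x, _ in street) / len(street)
    cy = sum(y for _, y in street) / len(street)
    for name, boundary in catchments.items():
        if point_in_polygon((cx, cy), boundary):
            return name
    return None


# Hypothetical catchment boundaries in metres.
catchments = {
    "catchment_A": [(0, 0), (500, 0), (500, 500), (0, 500)],
    "catchment_B": [(500, 0), (1000, 0), (1000, 500), (500, 500)],
}
street = [(100, 100), (220, 100), (220, 112), (100, 112)]
print(assign_catchment(street, catchments))  # catchment_A
```

Once a street is assigned to a catchment, it inherits that catchment's benefit per square meter, so two streets in the same catchment can only differ through their surface, which is exactly the limitation noted above.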
Also note that the algorithm for calculating the KPI is rather static in the Antwerp dashboard. Apart from the location and the surface of the proposed garden streets, the user cannot adapt the predictive model to his or her own assumptions. For example, the benefits are predicted under the assumption of an average lifespan of 25 years for garden streets. The user might want to adapt this assumption based on his or her own experience so that streets can be compared under shorter or longer lifespans. Likewise, in line with his or her convictions, the user might want to change the relative weights given to flooding risk and flooding damage in the calculation of the benefits. Increasing the weight of flooding risk would favour streets that minimize the future risk of flooding, while increasing the weight of flooding damage would favour streets that minimize the expected future damage. Such interactive features will be considered in a second round of dashboard development.
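The interactive features considered for the second development round could take the form of user-adjustable parameters. The sketch below is purely hypothetical: it assumes that the benefit accrues at a constant rate per square meter per year (so that it scales linearly with the lifespan) and that it can be split into a flood-risk component and a flood-damage component combined as a weighted sum. Neither assumption is taken from the Antwerp model, and all names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class UserAssumptions:
    lifespan_years: float = 25.0  # default lifespan used in the static Antwerp model
    weight_risk: float = 1.0      # hypothetical weight on the flood-risk component
    weight_damage: float = 1.0    # hypothetical weight on the flood-damage component


def adjustable_benefit(area_m2: float, risk_eur_m2_yr: float, damage_eur_m2_yr: float,
                       a: UserAssumptions) -> float:
    """Benefit over the chosen lifespan, assuming a constant yearly benefit per square metre."""
    if a.lifespan_years <= 0:
        raise ValueError("lifespan must be positive")
    per_m2_yr = a.weight_risk * risk_eur_m2_yr + a.weight_damage * damage_eur_m2_yr
    return area_m2 * a.lifespan_years * per_m2_yr


def adjustable_profit(area_m2: float, cost_eur_m2: float, risk_eur_m2_yr: float,
                      damage_eur_m2_yr: float, a: UserAssumptions) -> float:
    """KPI: adjusted benefit minus the (lifespan-independent) construction cost."""
    return adjustable_benefit(area_m2, risk_eur_m2_yr, damage_eur_m2_yr, a) - cost_eur_m2 * area_m2


# Hypothetical street of 1000 m2 with a construction cost of 200 EUR/m2.
scenarios = [
    ("default", UserAssumptions()),
    ("15-year lifespan", UserAssumptions(lifespan_years=15.0)),
    ("double damage weight", UserAssumptions(weight_damage=2.0)),
]
for label, a in scenarios:
    print(label, adjustable_profit(1000.0, 200.0, 4.0, 6.0, a))
```

With these placeholder numbers, the same street turns from profitable (50000 EUR) to loss-making (-50000 EUR) when the assumed lifespan drops from 25 to 15 years, which illustrates the kind of sensitivity users might want to explore.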
Once the KPI is defined, choices should also be made about the way the KPI is implemented in dashboard features. The optimal visualization of a KPI for input evaluation is probably a data table in which all policy options are listed in rows and can be sorted by the KPI (see Figure 3). Such a table can also contain additional information about the analysis units, such as the variables used to calculate the KPI. Of course, this table can be supplemented by other dashboard visuals such as maps or graphs. For example, the Antwerp dashboard also provides a map with markers for the proposed streets and overlay polygons for the estimated benefits per hydrological catchment (see Figure 7).
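The ranked data table can be sketched as a simple sort on the KPI column. The rows, column names and values below are hypothetical, and a real dashboard would render the table in a web front end rather than print it.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]

# Hypothetical proposals; profit_eur would come from the KPI calculation.
rows: List[Row] = [
    {"street": "Street A", "catchment": "catchment_A", "area_m2": 1440.0, "profit_eur": 43200.0},
    {"street": "Street B", "catchment": "catchment_B", "area_m2": 900.0, "profit_eur": -12000.0},
    {"street": "Street C", "catchment": "catchment_B", "area_m2": 2000.0, "profit_eur": 51000.0},
]


def ranked_table(rows: List[Row], key: str = "profit_eur") -> List[Row]:
    """Rows sorted from highest to lowest value in the chosen column."""
    return sorted(rows, key=lambda r: r[key], reverse=True)


def print_table(rows: List[Row]) -> None:
    """Plain-text rendering of the table; context columns travel along with the KPI."""
    print(f"{'Rank':<6}{'Street':<10}{'Catchment':<13}{'Area (m2)':>10}{'Profit (EUR)':>14}")
    for rank, r in enumerate(rows, start=1):
        print(f"{rank:<6}{r['street']:<10}{r['catchment']:<13}"
              f"{r['area_m2']:>10.0f}{r['profit_eur']:>14.0f}")


print_table(ranked_table(rows))
```

Keeping the sort key as a parameter lets the same table be re-sorted on any column the user clicks, while the default order follows the KPI.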

Binary Key Performance Indicators
The expected profit of garden streets is an ordinal KPI because it allows the different proposed policy programs to be ordered from least to most preferred. Nonetheless, KPIs can also be defined in a binary way. By this we refer to indicators that discriminate between policy programs that are eligible for implementation and programs that are not, without ordering them any further. Put differently, binary KPIs immediately rule out certain program options from further investigation.
Within the Antwerp dashboard, the first binary KPI concerns the possible impact of garden streets on mobility and traffic management. After all, the city will not install garden streets on principal roads and important thoroughfares. As a result, the dashboard also needs to show information about the traffic function of streets in order to classify streets as eligible or ineligible to become a garden street because of traffic constraints (see Figure 8).
Figure 8. The Antwerp dashboard should include information about the mobility function of streets because only local streets are eligible for transformation to garden streets.
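The traffic-based binary KPI amounts to a filter rather than a ranking. In the sketch below, only the rule that local streets are eligible is taken from the text; the other road classes and all names are hypothetical labels for principal roads and other non-local streets.

```python
from enum import Enum
from typing import Dict, List


class TrafficFunction(Enum):
    LOCAL = "local"            # local streets: eligible according to the text
    COLLECTOR = "collector"    # hypothetical intermediate road class
    ARTERIAL = "arterial"      # hypothetical label for principal roads and thoroughfares


def traffic_eligible(function: TrafficFunction) -> bool:
    """Binary KPI: only local streets may become garden streets."""
    return function is TrafficFunction.LOCAL


def eligible_streets(streets: Dict[str, TrafficFunction]) -> List[str]:
    # Keep the input order: a binary KPI filters but implies no ranking.
    return [name for name, function in streets.items() if traffic_eligible(function)]


streets = {
    "Street A": TrafficFunction.LOCAL,
    "Street B": TrafficFunction.ARTERIAL,
    "Street C": TrafficFunction.COLLECTOR,
}
eligible = eligible_streets(streets)
print(eligible)  # ['Street A']
```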
Next, the city also prefers to prioritize garden streets in areas with high risks of flooding, flooding damage and heat stress and with low amounts of green space. On top of that, the city wants to investigate the indirect impact of garden street proposals on social interactions, health or common welfare. For example, the installation of garden streets may affect housing prices and thereby lead to gentrification, which should be minimized. However, all these requirements are difficult to translate into one single unidimensional KPI. As a consequence, it was decided that the dashboard user him- or herself should be able to flag the feasibility of each individual proposed garden street based on his or her own subjective evaluation of the criteria above. Such a flag acts as a second binary KPI.
The flagging of garden streets by the dashboard user illustrates that policy decisions always include a subjective component. Indeed, even though policy-supporting dashboards are primarily developed to make policy-making processes more objective, evidence-based and data-driven, final decisions made by policy makers will always rely to some extent on subjective interpretations (Höchtl et al., 2016, Potancok, 2019). Moreover, the collection, analysis and presentation of data also strongly depend on subjective choices made by dashboard developers (Kitchin et al., 2015). For that reason, input evaluation dashboards should not aim for completely objective and automated policy decision-making but may include interactive tools allowing users to change input parameters and to refine analysis results according to their own beliefs and assumptions (Ruppert et al., 2013). It is up to the dashboard developer to find the optimal balance between objective information and subjective user input.
Binary KPIs can be integrated into input evaluation dashboards by including flagging features in the data table. For example, selection boxes will be added to the Antwerp dashboard to mark streets that are considered interesting places for new garden streets (see Figure 3). However, because such decisions rely on the subjective evaluations of the users themselves, the city also asked for an additional functionality that requires users to enter a short justification of their choices.
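The user flag and its mandatory justification can be sketched as a small register that rejects flags without an argument and combines them with the traffic-based eligibility to produce a shortlist. This is an illustration under assumptions: the class and method names are hypothetical, and requiring a justification for every recorded flag (selected or not) is only one possible reading of the city's request.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StreetFlag:
    street: str
    selected: bool       # True if the user marks the street as an interesting location
    justification: str   # short argument the user must provide

    def __post_init__(self) -> None:
        # Reject flags that come without a (non-blank) justification.
        if not self.justification.strip():
            raise ValueError(f"a justification is required for {self.street!r}")


class FlagRegister:
    """Hypothetical store of user flags, keyed by street name."""

    def __init__(self) -> None:
        self._flags: Dict[str, StreetFlag] = {}

    def set_flag(self, street: str, selected: bool, justification: str) -> None:
        # StreetFlag raises ValueError before anything is stored if the justification is blank.
        self._flags[street] = StreetFlag(street, selected, justification)

    def shortlist(self, traffic_ok: Dict[str, bool]) -> List[str]:
        """Streets that pass both binary KPIs: traffic eligibility and the user flag."""
        return [name for name, flag in self._flags.items()
                if flag.selected and traffic_ok.get(name, False)]


reg = FlagRegister()
reg.set_flag("Street A", True, "High heat stress and little green space nearby.")
reg.set_flag("Street C", False, "Recently renovated; low priority for now.")
traffic_ok = {"Street A": True, "Street B": False, "Street C": True}
print(reg.shortlist(traffic_ok))  # ['Street A']

try:
    reg.set_flag("Street B", True, "   ")
except ValueError as err:
    print("rejected:", err)
```

Storing the justification alongside the flag keeps the subjective part of the decision traceable, so that other users can later see why a street was shortlisted.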

Context Information
Next to the KPIs, a dashboard for input evaluation can also include additional context information. Such information is not directly used in the assessment of the different policy programs but may help users make better-informed decisions overall. Indeed, a regular policy cycle is not a sequential process but rather an iterative loop between different phases (Ruppert et al., 2013, Jann, Wegrich, 2017). Input evaluation in the decision phase may therefore also provide new insights for decisions at other phases, and vice versa (Jann, Wegrich, 2017). Accordingly, a dashboard for input evaluation may also include features that are useful at other phases of the policy cycle.
In the Antwerp dashboard, for example, the city also requested the inclusion of real-time information from different sensors across the city. These sensors include rain gauges, flow rate sensors in the sewer system and groundwater sensors. Combined with information about historical flood events, these sensors provide information about the overall water balance and may inform the user about the total number of garden streets that should be installed, without indicating the exact locations of these garden streets. This is crucial information for establishing program priorities within the broader scope of the Water Management Plan. Indeed, this information may help the user to make decisions about the implementation of other policy programs next to garden streets. Additionally, the sensors will be used to assess the impact of implemented garden streets later on, during process and product evaluation within the implementation and impact-assessment phases of the policy cycle.

DISCUSSION AND CONCLUSIONS
The development of the Antwerp dashboard within the CUTLER project was a process of trial and error. During this process, several lessons were learned, which can be used as guidelines for other input evaluation dashboards.
First, the development of a dashboard supporting policy evaluation should start from a well-defined policy question (Bartlett, Tkacz, 2017, Few, Edge, 2007).
Second, an efficient development process for input evaluation dashboards starts from a clear definition of the analysis unit. For an input evaluation, this boils down to knowing clearly which policy programs should be compared. Nonetheless, it may remain up to the dashboard user to determine the exact analysis units to be shown and analysed in the dashboard.
Third, once the analysis unit is defined, clear key performance indicators (KPIs) should be specified that allow policy makers to choose among the policy programs. These KPIs, in turn, determine which data should be collected, analysed and shown in the dashboard. KPIs can be ordinal or binary. Ordinal KPIs rank programs from most to least preferred, while binary KPIs merely discriminate between programs that are eligible and not eligible for implementation. Nonetheless, the dashboard developer and user should always bear in mind that a policy process may also include a subjective component. This subjective component can be facilitated by interactive features in the dashboard that allow the user to change model parameters or program selection criteria based on his or her personal assumptions and beliefs.
Fourth, an input evaluation dashboard may also include context information that is not directly used for the input evaluation research. Because the policy cycle is a cyclic process iterating between different types of evaluation research, such context information may be useful at other phases within the cycle.
All four lessons helped us to make the requirements for the Antwerp dashboard more concrete. They may therefore serve as a first broad set of guidelines for developing dashboards for the input evaluation of policy programs. Nevertheless, the Antwerp CUTLER dashboard provides only one example of an input evaluation dashboard. For that reason, the guidelines formulated in this paper are probably limited and should only be considered a starting point for other input evaluation dashboard development processes. Other researchers and developers may come up with additional or alternative guidelines based on their own experiences in order to make dashboard development processes even more efficient. Also, the guidelines still need to be confirmed by a proper overall user evaluation study, which is planned at the end of the CUTLER project.
Additionally, the guidelines were developed using a rather ad hoc methodological process. The dashboard was mainly constructed through open discussions with domain experts and evaluated using a general evaluation procedure. Future research may focus on developing sound methodological processes for collecting stakeholders' expectations in a structured way from the start of a project and for translating these expectations into efficient dashboard construction guidelines.
To conclude, next to dashboards for input evaluation, the literature also still shows gaps regarding dashboards for context, process or product evaluation, which arise during the agenda-setting, implementation and impact-assessment phases respectively. Future research may also focus on dashboard development processes for such types of evaluation research. Further, it should be noted that different versions of the policy cycle theory have been formulated in the literature, including phases other than agenda-setting, decision, implementation and evaluation (Ruppert et al., 2013). Such alternative policy cycles may also inspire policy evaluation dashboard developers to come up with new guidelines.