Insular policy feedback: Why reforming metrics alone may not be enough

Metric-driven research systems persist because governance structures often matter more than metrics themselves.

Calls to reform research assessment are gaining momentum. Initiatives such as the Coalition for Advancing Research Assessment (CoARA), the Declaration on Research Assessment (DORA), the Leiden Manifesto, and broader responsible metrics movements all point to the same concern: current evaluation systems rely too heavily on simplified indicators that fail to capture the complexity of research and, in some cases, lead to unintended or adverse effects.

Yet a key question remains underexplored: Why do these metric-driven systems persist, even when their limitations are widely recognised? Our study addresses this question through the Lithuanian case.

“When a measure becomes a target, it ceases to be a good measure.” This adage, known as Goodhart’s Law, lies at the heart of our recent paper in Science and Public Policy, in which we examine Lithuania’s performance-based funding system to understand why metric-heavy evaluation persists despite its well-documented drawbacks.

Focusing on the period 2005–2022, we introduce the concept of insular policy feedback to capture how research evaluation is shaped not only by metrics, but by their interaction with governance structures and strategic behaviour. Based on policy documents, bibliometric data, and 57 semi-structured interviews, we show how a concentrated scientific elite—operating simultaneously as researchers, evaluators, and policymakers—shapes evaluation criteria to its own advantage, generating predictable cycles of gaming and reactive countermeasures.

From evaluation tool to system driver

Performance-based funding systems are typically introduced to improve research quality and accountability by linking funding to measurable outputs. In Lithuania, as in many European countries, this meant prioritising publications in journals indexed in so-called international databases, such as Web of Science (WoS) or Scopus.

At the level of policy design, this appears coherent. But once implemented, indicators do not remain neutral tools: they become targets for optimisation. Universities align internal incentives with evaluation criteria, and researchers adapt their publication strategies accordingly.

In Lithuania, the push for WoS-based evaluation began in the mid-2000s, but from 2009 onwards, a number of domestic university journals obtained WoS indexing. The inclusion of these journals allowed institutions to increase their “international” outputs under national evaluation criteria even though the outputs remained domestically produced. Policymakers responded by introducing stricter quantitative metrics. As a result, evaluation criteria began to reconfigure the very practices they were meant to measure. From the outset, the system has not merely evaluated research quality but actively shaped how researchers and institutions respond to incentives.

Policy reaction and the limits of control

Faced with these developments, policymakers introduced increasingly complex corrective measures, including stricter bibliometric thresholds and, eventually, lists of “suspended journals” whose outputs would not count towards funding.

These interventions did not resolve the issue; instead, they generated further adaptation strategies. Institutions recalibrated by, for example, introducing financial incentives tied to publication in higher-ranked (e.g., Q1 and Q2) foreign journals, while researchers shifted towards new publication venues. Rather than stabilising, the system continued to evolve through a sequence of policy interventions and strategic responses.

Such evolution highlights a central tension in research evaluation: attempts to control behaviour through metrics often produce new forms of behaviour that require further control.

Who shapes the system?

A critical feature of the Lithuanian case is the role of scientific elites. In a relatively small research system, senior scholars often operate simultaneously as active researchers, institutional leaders, evaluators, and policymakers. This concentration of roles creates a tightly coupled governance structure in which evaluation criteria are defined, applied, and navigated by overlapping groups of actors. Such arrangements blur the boundaries between rule-making and rule-following. As a result, policy changes do not simply act upon the system: they are absorbed and reshaped from within. This helps explain why reforms that focus only on indicators often have limited effects.

Why metrics persist: The role of justification

To understand why metric-based systems persist, we need to look beyond incentives and examine the underlying logic of policymaking. Research funding decisions must be justified: policies must appear transparent, fair, and defensible, especially when public resources are at stake.

Metrics fulfil this function by transforming complex and uncertain phenomena into quantifiable outputs. In doing so, however, they also compress uncertainty, ambiguity, and disagreement into standardised forms of “evidence.” This has two major consequences: first, it enables governance at scale; second, it narrows what counts as “valuable” knowledge. What cannot be easily quantified risks being marginalised, not necessarily because it lacks value, but because it resists aggregation.

Beyond better metrics

Current reform initiatives often focus on improving indicators: diversifying metrics, incorporating qualitative elements, or reducing reliance on journal-based measures. These are important steps.

However, our findings suggest that metrics alone are not the core issue. Equally important are the questions of who defines the criteria, how decisions are made and justified, and how power is distributed among actors. Without addressing these questions, new indicators may simply reproduce existing dynamics in different forms.

A system that adapts… but to what?

The Lithuanian case is sometimes held up as evidence that research systems can adapt to policy pressures. That is true, but the more important question is what they are adapting to. Adaptation in Lithuania is highly internal to the system.

When the same people design the rules and play by them, the system responds to policy changes almost immediately, and in ways that serve existing interests rather than broader reform goals. In larger, more differentiated systems, the distance between policymakers and researchers provides at least some buffer against this dynamic; in Lithuania, that buffer is largely absent. As a result, researchers and institutions have become particularly effective at responding to policy-defined journal lists, indicators, thresholds, and rules, often more so than to broader goals of research quality. Instead of furthering those goals, this responsiveness may reinforce a cycle in which metrics shape behaviour, behaviour reshapes policy, and policy redefines metrics.

Understanding this recursive dynamic is essential for moving forward. If current reform efforts aim to create more responsible and context-sensitive evaluation systems, attention must shift from metrics alone to the structures that give them meaning and power. Without this shift, new indicators may not change the system. They may simply give it new ways to reproduce itself.

Header image by Saulius Žiūra, used with the author’s permission.
DOI: 10.59350/gj6sc-jrd27
