4  Methods

This chapter presents some literature on computational text analysis and its implementation in this thesis. Text is in many ways the language of power. Producing it is a central part of governance, and the plans and knowledge produced are often used to justify political decisions. Until recently, analyzing these texts at scale has required a considerable time commitment in manual coding or other manual processes. The proliferation of digital policy documents presents an opportunity to research beyond individual policy statements and to better understand how an issue is understood and implemented across the world.

The first section explains the recent advances in computational text analysis and the use cases these studies have. The second section goes through the implementation of the structural topic model (STM) used to analyze the NAPs.

By systematically identifying what countries talk about and testing how category membership influences these discussions, the analysis can supplement closer readings of specific plans or ethnographic work. It also provides a quicker way of bridging the gap between critical theory and traditional policy analysis, as the tools, when already developed, are quick to use.

4.1 Computational Text Analysis

This section explores how computational text analysis, specifically structural topic modeling, can systematically examine discourse patterns in the National Adaptation Plans. It explains how topic modeling works with critical discourse analysis to reveal both what is said and what remains unsaid in policy documents.

Computational text analysis is a growing field dedicated to extracting insights from very large amounts of text. As the NAPs are all very long policy documents, the approach is well suited to them.

Traditional qualitative analysis of such a corpus would require months of manual coding and then potentially miss patterns that only become visible at scale. Yet this corpus contains critical insights into how climate adaptation is understood, framed, and operationalized globally. The challenge is to systematically explore these patterns without predetermined hypotheses, allowing the corpus to reveal its own structures before interpreting them through competing theoretical lenses.

Topic modeling, running algorithms that find which words are most likely to occur together in a text, offers a systematic approach to exploratory analysis. Topic models enable researchers to analyze larger corpora, might reduce researcher bias, make analysis reproducible, and systematize more text than a single person could process (Jacobs and Tschötschel 2019). Rather than replacing critical interpretation, topic modeling removes uninteresting details and noise from large numbers of texts to enable comparison across key themes.

Critical discourse analysis attempts to understand how meaning is created and shared through speech and text (Jacobs and Tschötschel 2019; Mullet 2018). It is especially concerned with the most “important”, dominating or hegemonic discourse, as that is an expression of power. Concepts only gain meaning in relation to other concepts, and this meaning emerges in the relationship between speaker and listener. Establishing who speaks, who listens, and who decides in a corpus is discussed below (see Chapter 6). Crucially, discourse analysis recognizes that texts do not merely express reality but create their own. When countries describe their vulnerability and adaptation needs, they are not merely reporting facts as much as they are constructing themselves as particular kinds of subjects requiring particular kinds of interventions (see Chapter 3).

Topic models are well suited for critical discourse analysis, because both assume that words only gain meaning in context and can have multiple meanings within the same corpus (Jacobs and Tschötschel 2019, 473). A term like “resilience” might signify community strength in one context and market integration in another. “Participation” could indicate genuine power-sharing or ritualistic consultation. The method doesn’t resolve these ambiguities but makes them visible for interpretation. This aligns with an understanding of critical discourse analysis as examining how discourse creates, maintains, and legitimizes social inequality, not through explicit statements but through the patterns of what can and cannot be said (Mullet 2018; Escobar 1995).

While traditional discourse analysis must analyze changes in discourse through specific events to establish hegemonic patterns, topic modeling can analyze discourse as it exists across an entire corpus without requiring a triggering event (Jacobs and Tschötschel 2019, 477).

A simple way of finding the dominating discourse is by looking for possible convergence. Recent research has looked at convergence in urban climate governance discourse (Westman, Castán Broto, and Huang 2023), albeit not computationally. They found that despite the large influx of new actors in the discourse, something they assumed should make the discourse more diverse, the discourse actually became more homogeneous.

The structural topic model (STM) extends basic topic modeling by incorporating metadata to test how document characteristics influence topic prevalence (Roberts, Stewart, and Airoldi 2016). This proves particularly relevant for the NAPs corpus, where metadata about income level, geographic vulnerability, regional grouping, and submission date might systematically influence how adaptation is discussed.

Structural topic modeling has proven valuable for analyzing political discourse in various contexts. Curry and Fix (2019) used STM to examine how American state high court judges use Twitter, discovering that judges behave differently from typical politicians: they express fewer opinions and share more personal content, but increase political messaging during election years (Curry and Fix 2019, 388). Genovese (2015) applied STM to Vatican communications, exploring how religious authorities engage in international politics. Using just a two-topic model distinguishing political from religious themes, they found that the Vatican strategically times different types of communications in response to world events (Genovese 2015, 2).

As topic modeling is a way to explore the latent topics in a corpus, the model should be tuned to the research question at hand (Jacobs and Tschötschel 2019). The NAPs are specific policy documents about the same general theme (climate governance) in the same genre (national plans). Thus, they need a different pre-processing than a corpus of tweets (Curry and Fix 2019) or papal political communication (Genovese 2015). A central trade-off for the analysis is how to handle acronyms and national names. For texts within the same genre addressing similar topics the natural number of topics tends to be lower (Jacobs and Tschötschel 2019, 474). Whether this limitation reflects genuine convergence around universal challenges or artificial constraint through institutional frameworks remains open for interpretation.

An alternative approach would have been to use sentence transformers or other embedding models that convert text segments into high-dimensional vectors, then apply clustering algorithms to identify thematic groups, similar to how large language models process text. Models like BERT transform sentences, paragraphs, or documents into dense vectors, strings of numbers like [0.23, -1.45, 0.67…] that represent positions in an abstract semantic space learned during training. Unlike topic models, which produce interpretable probability distributions (“this document is 30% about topic 1, 20% about topic 2”), these embedding vectors are opaque. Documents cluster together based on mathematical distance, but the individual dimensions are difficult to analyze.

The sentence transformer approach offers some advantages. It can handle multiple languages without translation, an important point, as only 47 of the 64 NAPs are in English. Sentence transformers might also capture more subtle relationships. However, these embeddings share the “black box” problem of other AI models, such as large language models (LLMs): it becomes impossible to trace how language dominates discourse, to identify which specific terms create convergence, or to connect patterns to concepts for substantive discussion. Choosing topic modeling instead, however, limits the analysis to English-language plans and excludes many countries.

4.2 Structural Topic Modeling

This section details the technical process of analyzing the NAPs, from document processing through model estimation to the statistical tests. It explains the choices made in preparing texts, setting parameters, and testing whether observed patterns reflect genuine constraints or random variation.

The data collection began by scraping the UNFCCC NAP Central website using the R package rvest (Wickham 2024). The website presents the NAPs in an HTML table format, which required systematic extraction of country names, submission dates, and PDF download links. The scraping function first retrieved the webpage content, and then parsed it into a data frame. The extraction process handled several complexities: some countries had multiple submissions, dates needed parsing from various formats into standardized ISO format, and PDF links required validation to ensure they pointed to accessible documents.
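The scraping itself was done in R with rvest, but the date-standardization step it describes can be sketched language-agnostically. The following Python illustration uses a candidate-format list that is my own assumption, not the thesis's actual list:

```python
from datetime import datetime

def to_iso(date_str):
    """Normalize a date string to ISO 8601. The candidate formats are
    illustrative assumptions, not the thesis's actual list."""
    for fmt in ("%d %B %Y", "%d/%m/%Y", "%Y-%m-%d", "%B %Y"):
        try:
            return datetime.strptime(date_str.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable dates would be logged for manual review

print(to_iso("17 August 2021"))  # 2021-08-17
```

Returning a sentinel value rather than raising keeps the pipeline running, so failed extractions can be collected and reviewed manually, as described above.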

The scraped data was separated into two structures. First, a tokens dataset containing document IDs and PDF links for the text extraction pipeline. Second, a metadata dataset containing document IDs, country names, and submission dates for the analysis. Error handling was implemented to manage connection timeouts and missing data, with failed extractions logged for manual review. The final scraped dataset contained all the English language plans with complete metadata. This automated approach ensured reproducibility while maintaining flexibility to handle the irregularities common in web-based document repositories.

The preprocessing pipeline began by standardizing the 47 English-language NAPs into comparable analytical units. Initial processing retained documents containing 2 030 660 tokens representing a vocabulary of 114 512 unique terms. Standard text preprocessing steps (lowercasing, punctuation removal, and number removal) were applied uniformly (Roberts, Stewart, and Tingley 2019). Stemming, removing word endings so that fewer distinct forms remain (e.g., “adapt,” “adaptation,” “adapting” → “adapt”), reduced the vocabulary while largely preserving meaning. This choice prioritized thematic coherence over lexical precision, accepting that some meaning distinctions would be lost (e.g., “adaptation” versus “adaptive” collapsed to “adapt”) in exchange for more robust topic identification.
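The stemming was done by the R pipeline's built-in stemmer; as a rough illustration of the idea, a toy suffix stripper in Python might look like this (the suffix list is a simplification, not the real Snowball rules):

```python
def toy_stem(word):
    """Crude suffix stripping for illustration only; the actual
    pipeline uses a proper Snowball-style stemmer."""
    for suffix in ("ations", "ation", "ing", "ive", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print({w: toy_stem(w) for w in ["adapt", "adaptation", "adapting"]})
# {'adapt': 'adapt', 'adaptation': 'adapt', 'adapting': 'adapt'}
```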

Despite these steps, early models were heavily influenced by noise, such as “——-” and other artifacts of the PDF extraction. This was solved by first setting the minimum word length at three characters and then excluding words that appear in fewer than 10% of the texts (23) or in more than 80% (179).
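The logic of this frequency-based pruning can be sketched as follows; this Python function illustrates the idea, not the thesis's actual R code:

```python
from collections import Counter

def prune_vocab(docs, min_docfreq, max_docfreq, min_len=3):
    """Keep terms of at least min_len characters appearing in between
    min_docfreq and max_docfreq documents (inclusive)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count documents, not occurrences
    return {t for t, n in df.items()
            if len(t) >= min_len and min_docfreq <= n <= max_docfreq}

docs = [["adapt", "plan", "x"], ["adapt", "risk"], ["adapt", "plan"]]
print(prune_vocab(docs, min_docfreq=2, max_docfreq=2))  # {'plan'}
```

In the toy example, “x” is dropped for being too short, “risk” for appearing in too few documents, and “adapt” for appearing in too many, exactly the kinds of terms (artifacts, rarities, and ubiquitous filler) the thresholds target.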

The early topics were also heavily influenced by single countries. This was solved by defining stopwords, words that do not carry semantic value for the analysis, based on country names, since the plans frequently refer to their own country. These geographic stopwords were generated using the countrycode package to extract country names and add demonyms, preventing the emergence of topics defined purely by national identifiers (Arel-Bundock, Enevoldsen, and Yetman 2018). This removed 305 geographic terms that would otherwise have dominated topic distributions given the documents’ frequent self-references.
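The filtering step itself is straightforward; a Python sketch, with a hypothetical two-country stopword subset standing in for the 305 terms generated via countrycode:

```python
# Hypothetical two-country subset standing in for the ~305 terms the
# thesis generates with the countrycode R package.
geo_stopwords = {"albania", "albanian", "fiji", "fijian"}

def drop_geo_terms(tokens, stopwords=geo_stopwords):
    """Remove country names and demonyms so topics are not defined
    purely by national identifiers."""
    return [t for t in tokens if t.lower() not in stopwords]

print(drop_geo_terms(["Albania", "plans", "coastal"]))  # ['plans', 'coastal']
```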

Because the documents differ greatly in length, the larger documents dominated the topic distribution, contributing to the topics being very national. The segmentation algorithm targeted approximately 200 segments across the corpus, calculating optimal segment length dynamically based on total words. Documents shorter than 50 words remained intact, while longer documents were split at approximately 10 154-word intervals, creating 223 analytical segments that retain document metadata while enabling a fairer topic allocation across countries regardless of their plans’ length.
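One plausible reconstruction of the segmentation arithmetic, in Python (parameter names are my own, and the thesis's R implementation may differ in details such as rounding):

```python
def segment_plan(words, corpus_total_words, target_segments=200,
                 min_doc_words=50):
    """Split one document's word list into roughly equal chunks. The
    interval is set globally as corpus words / target segments (about
    2 030 660 / 200, i.e. roughly 10 000 words, in the thesis)."""
    interval = max(1, corpus_total_words // target_segments)
    if len(words) < min_doc_words:
        return [words]  # short documents stay intact
    return [words[i:i + interval] for i in range(0, len(words), interval)]

# A 120-word "document" in a 2 000-word "corpus" yields 12 ten-word segments.
print(len(segment_plan(["w"] * 120, 2000)))  # 12
```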

To enable the statistical testing, the document metadata took a different path than the text data. I matched the country names on the UNFCCC website to the standard World Bank names, so that these entries could later be matched to the other datasets needed for the metadata. The UN classifications for Landlocked Developing Countries (LLDC) and Small Island Developing States (SIDS) were obtained by scraping the SIDS and LLDC lists from the UN website, ensuring the data is always up to date. For the income and region data, the World Bank statistics package was used to add the standardized region and income level (Piburn 2020). These were then matched back to the country names. The complete metadata can be seen in Table 4.1.

id       name     region                 income               iso3c  year  time    geo
nap_001  Albania  Europe & Central Asia  Upper middle income  ALB    2021  Middle  Other
nap_002  …

Table 4.1: An example of the metadata collected for a country

The metadata was then matched with the segmented text, and both were fed into the stm package’s function for preparing a Document Frequency Matrix (dfm). This is simply a data structure that stores, for each segment, the metadata and a count of how many times each word appears in that segment (Roberts, Stewart, and Tingley 2019).
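A dense Python sketch of such a structure (real implementations, including stm's, store this sparsely for efficiency):

```python
from collections import Counter

def build_dfm(segments, vocab):
    """Dense sketch of a document-frequency matrix: one row per
    segment, one column per (sorted) vocabulary term, cells holding
    raw word counts."""
    terms = sorted(vocab)
    term_set = set(terms)
    rows = []
    for seg in segments:
        counts = Counter(t for t in seg if t in term_set)
        rows.append([counts[t] for t in terms])
    return terms, rows

terms, rows = build_dfm([["adapt", "adapt", "plan"], ["risk"]],
                        {"adapt", "plan", "risk"})
print(terms, rows)  # ['adapt', 'plan', 'risk'] [[2, 1, 0], [0, 0, 1]]
```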

A central reason for choosing the STM approach is that it makes it possible to pass these metadata variables to the model through a prevalence formula: ~ global_category + income_level + region + geography + time_period. This tells the model to also estimate the relationship between these variables and topic prevalence, enabling us to test the hypotheses (Roberts, Stewart, and Tingley 2019).

Category Subcategories
Income Low income, Lower-middle, Upper-middle and High income
Region Europe & Central Asia, Latin America & Caribbean, South Asia, Sub-Saharan Africa, East Asia & Pacific, Middle East, North Africa, Afghanistan & Pakistan
Geography SIDS, LLDC and Other
Time Early (-2019), Middle (2020-2022), Late (2023-)
Table 4.2: The categories and subcategories used in the STM analysis.

The last step before running the model is to decide how many topics the model should output. After running models with a wide range of topic numbers, 8 was chosen, as it produced high-quality and distinct topics while keeping the number of topics to analyze and present in Chapter 5 as low as possible. This might have traded off some granularity (Roberts, Stewart, and Tingley 2019), although a larger number of topics would have needed to be grouped to be analytically useful, making the extra topics redundant.

Model estimation employed variational expectation-maximization with 61 iterations, using spectral initialization for stability across runs (Roberts, Stewart, and Tingley 2019). After the model had been run on the 223 segments, the results were aggregated back to document level. Here, I averaged the topic proportions across the segments for each document, obtaining a proportion for every topic, per plan.
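The segment-to-document aggregation amounts to a group-wise mean; a minimal Python illustration:

```python
def aggregate_to_documents(segment_topics, segment_docs):
    """Average segment-level topic proportions per parent document."""
    sums, counts = {}, {}
    for doc_id, props in zip(segment_docs, segment_topics):
        acc = sums.setdefault(doc_id, [0.0] * len(props))
        for k, p in enumerate(props):
            acc[k] += p
        counts[doc_id] = counts.get(doc_id, 0) + 1
    return {d: [v / counts[d] for v in acc] for d, acc in sums.items()}

doc_props = aggregate_to_documents(
    [[0.8, 0.2], [0.4, 0.6], [0.5, 0.5]],
    ["nap_001", "nap_001", "nap_002"])
print(doc_props["nap_002"])  # [0.5, 0.5]
```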

To better understand the topics, I implemented two more metrics. First, for all 8 topics, I reviewed the top 5 terms identified by the FREX (frequency-exclusivity) metric, which finds terms that are both common within a topic and distinctive across topics (Roberts, Stewart, and Airoldi 2016). The most distinctive term was picked as the name of each topic. Then, to get a better understanding of the topic distribution, all topics with more than 7% proportion in a document were counted.

For the group testing, the calculation proceeded in four steps. First, for each category, the mean topic distribution is extracted across all documents in that group, creating a characteristic discourse profile. Second, the top 3 topics by proportion are identified for each group. Third, the proportions of these topics are summed. Fourth, this raw score is normalized against a uniform baseline where discourse distributes equally across all topics, producing a 0-1 scale where 0 represents perfect distribution and 1 represents complete concentration. This normalization ensures that values have consistent meaning regardless of the number of topics in the model, with a value of 0.5 representing 50% more concentration than expected under uniform distribution.
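One plausible reading of this normalization, sketched in Python (the exact formula used in the thesis may differ):

```python
def concentration_score(group_mean_props, top_n=3):
    """Normalized top-N concentration: 0 under a uniform distribution
    across topics, 1 when all discourse sits in the top N topics."""
    k = len(group_mean_props)
    raw = sum(sorted(group_mean_props, reverse=True)[:top_n])
    baseline = top_n / k  # expected top-N share if topics were uniform
    return (raw - baseline) / (1 - baseline)

print(round(concentration_score([1 / 8] * 8), 3))  # 0.0
```

With 8 topics, the uniform baseline for the top 3 is 3/8 = 0.375, so a group whose top 3 topics hold, say, 60% of its discourse would score (0.6 − 0.375) / 0.625 = 0.36.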

To establish whether these observed patterns carry any weight beyond random variation, I use the built-in functionality in STM. Here, STM runs a series of regressions to estimate the effect that group membership has on the topic proportions in the groups (Roberts, Stewart, and Airoldi 2016).

An important methodological consideration involves the aggregation of subcategory results to category-level estimates. To facilitate interpretation and comparison across the three categorization schemes, effect sizes from individual subcategories (e.g., each income level) were averaged to create category-level measures (e.g., overall income effect).

This approach treats all subcategories equally regardless of sample size or internal variance, prioritizing interpretative clarity over statistical precision. While this simplification may mask heterogeneity within categories, it enables direct comparison of whether income, region, or geography most strongly influences adaptation discourse patterns.

The analysis tests whether category membership significantly predicts the prevalence of that category’s dominant topics, with effect sizes indicating the strength of group-level constraints on discourse. High effect sizes suggest limited epistemological autonomy. Countries within such groups converge on similar discourse patterns, while low effect sizes indicate greater flexibility for countries to pursue diverse adaptation framing within their group.