Recent work in the study of multiple networks has provided new insights into the dynamics of competition between complex systems [1] and the importance of individual nodes [2,3] in each sub-network. To study the importance of a National Innovation System in the global economy, we represent the global patent citation network as a multiplexed network of individual national economies, and model the technological spillovers due to innovation originating in a specific national economy as a random walk between patent classes. We impose a loss term [4] or discount rate [5] on the walk whose value depends on whether the information is propagating within the originating country or circulating outside of it (diagrammed in Figure 1). The loss term acts as a bias against border crossing, where information that benefits a foreign economy is assigned an elevated probability of becoming “lost” and incapable of benefiting the domestic economy in the future. We define an expression for the centrality of domestic patent classes similar to PageRank [2,4] that varies continuously from a global measure of centrality (where there is no bias against information passing into a foreign economy) to a domestic measure (which is maximally biased against border crossings) through a single model parameter. Using this approach, we find that pharmaceutical patents are increasingly central for each domestic economy.

Figure 1(A) shows a schematic diagram of a multiplexed citation network, where the citation networks of national technological classes are interdependent (dashed lines). Figure 1(B) shows a schematic random walk on the network in (A). From the domestic perspective, information sharing across a border is less beneficial than sharing within the domestic economy, so in our model the teleportation probability increases when information moves out of the domestic sub-network and into a foreign network.
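The biased walk described above can be illustrated with a minimal sketch. The exact form of the loss term is not given here, so the parameterization below is an assumption: a step that lands on a domestic node survives with probability `keep`, a step that lands on a foreign node survives with probability `keep * (1 - bias)`, and any lost walker restarts uniformly on the domestic nodes. The function name, the parameters `keep` and `bias`, and the dictionary layout are all hypothetical. Every link target is assumed to appear as a key of `links`.

```python
def border_biased_pagerank(links, domestic, keep=0.85, bias=0.5,
                           tol=1e-12, max_iter=10_000):
    """PageRank-like power iteration with a penalty on border crossing.

    links    : dict mapping node -> {target: weight} of outgoing links
    domestic : set of nodes that belong to the originating country
    keep     : survival probability of a step landing on a domestic node (assumed)
    bias     : extra loss, in [0, 1], for a step landing on a foreign node (assumed)
    """
    nodes = list(links)
    survive = {n: keep if n in domestic else keep * (1.0 - bias) for n in nodes}
    # Lost walkers restart uniformly on the domestic nodes.
    restart = {n: (1.0 / len(domestic) if n in domestic else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        moved = {n: 0.0 for n in nodes}
        for source, targets in links.items():
            total = sum(targets.values())
            for target, weight in targets.items():
                moved[target] += survive[target] * rank[source] * weight / total
        # Mass that did not survive the step (or sat on a node without
        # outgoing links) is "lost" and re-enters at the domestic nodes.
        lost = 1.0 - sum(moved.values())
        new_rank = {n: moved[n] + lost * restart[n] for n in nodes}
        if sum(abs(new_rank[n] - rank[n]) for n in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank
```

In this sketch, `bias=0` applies the same loss everywhere, while `bias=1` removes every walker that crosses into a foreign network, so foreign nodes carry no weight. Intermediate values interpolate between the two, in the spirit of the single model parameter described in the text.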

Table 1. Centrality of bio-pharmaceutical patents

Using our measure of centrality, we determine the most central classes for a variety of individual nations in different years and at differing levels of technological aggregation. We find that globally central classes typically differ from domestically central classes, and show that these differences reflect meaningful structural features of the global network and of single national systems of innovation.

Our measure of centrality is typically a better predictor of a sector's yearly value added than other common measures over a wide range of model parameters, and it is also positively correlated with yearly R&D investment. That is, more central sectors in the network tend to produce a higher economic return, and investment in a sector can increase its centrality. We show that the optimal value added for each sector occurs at an intermediate value of the model parameter, indicating that neither a fully open economy (one which ignores the competition with other nations) nor a fully closed economy (one which ignores the benefits of international collaboration) produces the highest return.

We aim to understand the influence of geographical diversity and technological heterogeneity on biopharma innovation. We focus on triadic patent families: groups of patents that are filed in the three major offices (USPTO, EPO, JPO), share a priority number, and cover the same intellectual property, as defined by the OECD. Each family has its technological sector defined by the International Patent Classification (IPC) codes of its constituent patents and its geographical location defined by the country or countries of its inventor(s). We define biopharma families as those that have at least one Pharma 4-digit classifier or at least one Biotech 7-digit classifier. We use the high-resolution inventor localization in the OECD Regpat database for EPO patents and those filed with the WIPO, and the Dataverse patent data for patents filed with the USPTO.

Here we focus on the structure of the patent citation network in 1990-1999 and 2000-2009, with particular attention to the impact of geographical region (aggregated at the level of the US or the EU) and technological grouping (defined more precisely below) within the classification of biopharma patents. Biopharma patents account for between 10% and 20% of all patents filed in the global patent system (see Figure 2), and are therefore of great significance in the innovation network. We also see in Tables 1 and 2 that biopharma patents tend to be more collaborative (they list more inventors than an average patent without regard to class) and involve more technological areas (as measured by the number of IPCs for the associated patents).

Figure 2: The yearly fraction of biopharma patents out of all patents filed worldwide.

Table 2: Summary statistics for all patents worldwide in the ’90s and ’00s

In order to visualize the citation networks, we restrict ourselves to the 300 most-cited biopharma patent families (see Figure 3).

Figure 3: The citation network between patent families in the ’90s. Nodes represent patent families, with the size proportional to the number of citations that family receives. Node color reflects the family’s classification (indicated above). Edges represent citations that occurred between 1990 and 1999.

The citation networks are too complex and dense to easily visualize in the general case. In the visualization, circles (nodes) represent patent families, with area proportional to the number of citations received by that family in the ’90s, and lines (edges) represent citations between biopharma patent families. The colors represent four very coarse-grained classifications of biopharma patents: Strongly Pharma (80% or more of the IPCs are pharma), Strongly Biotech (two or more IPCs are biotech), Pharma & Biotech (both are true), and Weakly Related (neither is true). Figure 3 shows the structure of the top 300 families in the ’90s, along with brief descriptions of the most highly cited families. The labels of the most highly cited families are provided for clarity, but their specific wording was chosen by inspection. Table 3 indicates the allocation of the innovation of the US and EU economies across the four coarse-grained categories. Strongly Pharma and Weakly Related are the most common in the top 300 (indicating that these categories tend to produce important innovations more readily than the Strongly Biotech and Pharma & Biotech categories). However, in the overall network, Strongly Pharma is heavily over-represented for both the US and EU: a far greater fraction of families are Strongly Pharma than would be expected from the distribution of highly cited families.
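The four coarse-grained categories can be written as a short rule. The sketch below is only an illustration: the prefix tuples `PHARMA_PREFIXES` and `BIOTECH_PREFIXES` are hypothetical placeholders, not the classifier lists used in the study, and the function counts IPC codes exactly as they are passed in (whether duplicate codes within a family should be merged first is left open here).

```python
# Hypothetical placeholder prefixes; the actual Pharma and Biotech
# classifier lists are not reproduced in this text.
PHARMA_PREFIXES = ("A61K",)
BIOTECH_PREFIXES = ("C12N 15",)


def classify_family(ipc_codes,
                    pharma_prefixes=PHARMA_PREFIXES,
                    biotech_prefixes=BIOTECH_PREFIXES):
    """Assign a patent family to one of the four coarse-grained categories."""
    codes = list(ipc_codes)
    if not codes:
        raise ValueError("a family needs at least one IPC code")
    pharma_count = sum(1 for c in codes if c.startswith(pharma_prefixes))
    biotech_count = sum(1 for c in codes if c.startswith(biotech_prefixes))
    # "80% or more of the IPCs are pharma", compared in integers to
    # avoid floating-point rounding at the threshold.
    strongly_pharma = 10 * pharma_count >= 8 * len(codes)
    # "two or more IPCs are biotech"
    strongly_biotech = biotech_count >= 2
    if strongly_pharma and strongly_biotech:
        return "Pharma & Biotech"
    if strongly_pharma:
        return "Strongly Pharma"
    if strongly_biotech:
        return "Strongly Biotech"
    return "Weakly Related"
```

For instance, under these placeholder prefixes a family with four `A61K` codes and one unrelated code sits exactly at the 80% threshold and is labeled Strongly Pharma.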

Figure 4 shows the top 300 cited families in the ’00s, which form a radically different structure. There is a clear division into four groups, with the indicated groups A and B representing the majority of the Strongly Pharma and Pharma & Biotech families respectively. Weakly Related and Biotech patents are more diffuse, but form the majority of groups C and D respectively. It is interesting to note that, unlike in the ’90s, there are very few citations between groups A (defined by Strongly Pharma) and B (defined by Strongly Biotech). Visually contrasting the network topology in the ’90s (Figure 3) with that in the ’00s (Figure 4) readily indicates a large-scale reorganization of the biopharma research landscape into distinct fields over the span of ~10-20 years.

The distribution of the categories of each of the patents also changes with time, with a general increase in Pharma & Biotech. While this is significant in the US, it is most pronounced in the EU where the fraction of all other categories is reduced and produces a 17% increase in the fraction of Pharma & Biotech top-300 families. The overall composition of patent families does not reflect this difference in the distribution of important patent families (particularly for the EU), indicating an inefficient allocation of innovation resources.

Figure 4: The citation network between patent families in the ’00s. Nodes represent patent families and edges the citations

Based on the visualizations of the citation network in Figures 3 and 4, we expect that some patent families are more peripheral to the citation network than others. In particular, biopharma patent families that are deeply integrated within the citation network will be cited by other biopharma patents that are also deeply integrated in the citation network. This structure is revealed by examining the k-cores of the network, which are diagrammed in Figure 5. Put simply, a k-core is the largest set of patent families in which every family is cited by at least k other families that also belong to the set.
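A k-core of this kind can be found by repeated pruning: drop every family that receives fewer than k citations from the families still in the set, and repeat until nothing changes. The following is a minimal sketch of that idea, assuming citations are given as (citing, cited) pairs; the function name is hypothetical and the loop favors clarity over speed.

```python
def citation_k_core(citations, k):
    """Return the families in the k-core of a citation network.

    citations : iterable of (citing, cited) pairs
    k         : minimum number of citations each member must receive
                from other members of the core
    """
    edges = list(citations)
    core = {node for edge in edges for node in edge}
    while True:
        received = {node: 0 for node in core}
        for citing, cited in edges:
            # Only citations between two surviving families count.
            if citing in core and cited in core:
                received[cited] += 1
        dropped = {node for node, count in received.items() if count < k}
        if not dropped:
            return core
        core -= dropped
```

Note that removing one family can lower the citation count of another, which is why a single pass over the citation counts is not enough.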

Figure 5: Schematic of the k-cores grouping. All nodes in a k-core receive at least k citations from other families in the core.

In Figures 6 and 7, we show the distribution of location or classification of the families that are found in cores with large k (those that receive many citations, and thus are of greater importance). We express the propensity of different regions or classes to be found in large k-cores (those with k>2) in terms of the deviation relative to the standard deviation under a random rewire model. The US is far more likely to be found in a high core than the EU, with US/EU collaborations mitigating this advantage altogether. In Table 4, we clarify why the US is so much more likely to be found in a more densely connected core: US patents tend to have more backwards and forwards citations, and are thus more deeply integrated into the network. Pharma & Biotech is the most likely patent group to be found in large cores, and tends to be more deeply integrated as time progresses.
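The comparison against a random rewiring model can be sketched as a z-score: the observed count for a region or class minus the mean count over rewired networks, divided by the standard deviation over those networks. The details of the rewiring model are not specified here, so the version below is an assumption: it shuffles the cited endpoints across all citations, which keeps the number of citations each family makes and receives but can create self-citations and duplicate citations. Both function names are hypothetical.

```python
import random
import statistics


def rewire(citations, rng):
    """Shuffle cited endpoints across citations (assumed null model).

    Preserves how many citations each family makes and receives;
    self-citations and duplicate citations are not filtered out.
    """
    edges = list(citations)
    citing = [a for a, _ in edges]
    cited = [b for _, b in edges]
    rng.shuffle(cited)
    return list(zip(citing, cited))


def z_score(observed, null_samples):
    """Deviation of an observed count from the null mean, in null standard deviations."""
    samples = list(null_samples)
    mean = statistics.fmean(samples)
    spread = statistics.pstdev(samples)
    if spread == 0:
        raise ValueError("null samples have zero spread")
    return (observed - mean) / spread
```

In use, one would count how many families of a given region fall in cores with k≥3 in the real network, repeat the count on many outputs of `rewire(citations, random.Random(seed))`, and pass both to `z_score`.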

Figure 6: The probability of seeing regions in cores with k≥3, relative to the variance expected by random chance.

Figure 7: The probability of seeing the various groupings in cores with k≥3, relative to the variance expected by random chance.

Patent families can also be meaningfully divided into three categories to better understand their importance, schematically diagrammed in Figure 8. Originator families are heavily cited while including relatively few backwards citations, indicating they are a source of truly new innovation. Developer families cite other families heavily while being rarely cited themselves, indicating they implement the innovations of other families in highly specialized ways. The remaining families form a core with a roughly equal share of forward and backwards citations.

Figure 8: Schematic of transversal families. Developers receive few citations, while originators provide few citations.

Developers have many backward citations but few forward citations

  • 32k families
  • 0.5 fwd. cites per family
  • 5.1 back. cites per family

Originators have many forward citations but few backward citations

  • 32k families
  • 5.1 fwd. cites per family
  • 0.1 back. cites per family

Core patents have a mix of forward and backwards citations

  • 37k families
  • 4.4 fwd. cites per family
  • 4.8 back. cites per family
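The three categories above can be illustrated with a simple rule on the share of forward citations. The cutoffs that separate originators, developers, and core families are not stated in this text, so the thresholds `low` and `high` below are hypothetical, as is the function name and the handling of families with no citations at all.

```python
def family_role(forward, backward, low=0.25, high=0.75):
    """Label a family by its balance of forward and backward citations.

    forward  : number of citations the family receives
    backward : number of citations the family makes
    low/high : hypothetical cutoffs on the forward share
    """
    total = forward + backward
    if total == 0:
        return "unconnected"  # assumed label for families outside the network
    forward_share = forward / total
    if forward_share >= high:
        return "originator"   # heavily cited, cites little
    if forward_share <= low:
        return "developer"    # cites heavily, rarely cited
    return "core"             # roughly balanced
```

With these placeholder cutoffs, a family with the average profile listed for originators (5.1 forward, 0.1 backward) is labeled an originator, and one with the average core profile (4.4 forward, 4.8 backward) is labeled core.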

Tables 5 and 6 compare the number of originator, developer, or core families associated with locations or categories to the number expected by random chance. We find that US patents tend to be found in the core, consistent with the fact that they are more deeply integrated in the citation network due to their larger overall number of citations. EU patents tend to be originators of information: they are not deeply integrated in the citation network, but tend to provide more information to the pharma citation network. Non-US/EU patents tend to be developers: inventors outside of both the US and EU tend to produce more specialized patents than those in the US or the EU. Pharma or Pharma & Biotech patents are rarely found in the originator category, which tends to be dominated by Weakly Related families. This is likely because Weakly Related patents are more likely to cite non-pharma patents, and thus act as a bridge for information flow between biopharma and other industrial sectors.

Table 7: The probability of seeing regions as originators, developers, or core families, relative to the variance expected by random chance.

Table 8: The probability of seeing the various groupings as originators, developers, or core families, relative to the variance expected by random chance.