Wikipedia and Digital Currencies: Interplay Between Collective Attention and Market Performance

The production and consumption of information about Bitcoin and other digital-, or “crypto-”, currencies have grown together with their market capitalization. However, a systematic investigation of the relationship between online attention and market dynamics, across multiple digital currencies, is still lacking. Here, we quantify the interplay between the attention towards digital currencies in Wikipedia and their market performance. We consider the entire edit history of currency-related pages and their view history from July 2015. First, we quantify the evolution of the cryptocurrency presence in Wikipedia by analyzing the editorial activity and the network of co-edited pages. We find that a small community of tightly connected editors is responsible for most of the production of information about cryptocurrencies in Wikipedia. Then, we show that a simple trading strategy informed by Wikipedia views performs better, in terms of returns on investment, than baseline strategies for most of the covered period. Our results contribute to the recent literature on the interplay between online information and investment markets, and we anticipate they will be of interest to researchers as well as investors.


Introduction
The cryptocurrency market grew super-exponentially for more than two years until January 2018, before suffering significant losses in the subsequent months [1]. This growth has been both a consequence and a driver of the attention the market has progressively attracted from an ever larger public. In this paper, we quantify the evolution of the production and consumption of information concerning the cryptocurrency market, as well as its interplay with market behavior. Capitalizing on recent results showing that Wikipedia can be used as a proxy for the overall attention on the web [2], our analysis relies on data from the popular online encyclopaedia.
The first peer to peer currency system, Bitcoin, was created in 2009 as a realization of Satoshi Ethereum, and Monero.
The connection between Bitcoin prices and online social signals has enabled the development of successful trading strategies [21,28,29]. In [28], the authors used a deep learning algorithm and data from Wikipedia, Google search trends, a Bitcoin forum [36] and a cryptocurrency news website [40] to anticipate Bitcoin prices.
Research focusing on the nature of community discussions and the activity of contributors is very limited. In [41], the authors analyzed data from the forum "bitcointalk" [36] and showed that there are two clear groups of contributors: Investors, who are driving the market hype, and technology enthusiasts, who are interested in the advancement of the cryptocurrency system.

Data collection and preparation
Wikipedia data were collected through the Wikipedia API [42] and include the daily number of views and the page edit history of the 38 cryptocurrencies with a page on Wikipedia (see Supplementary materials, S1).
Page-view data range from July 1st, 2015 until January 23rd, 2019, since earlier data are not accessible through the API. The full editing history, on the other hand, is accessible through the API and includes the content of each edit, the editors, the time of creation and the comments to the edits. Repetitive tasks to maintain pages are often carried out by automated tools known as "bots". Wikipedia requires bots to have separate accounts and names which include the word "BOT", in order to make their edits identifiable.
We excluded all edits from bots from our analysis.
We classified edits into two categories, namely edits with new content and maintenance edits. Maintenance edits aim to keep page content consensual by restoring a more accurate older version (reverts) and by fighting malicious edits (vandalism). We identified reverts by selecting edit comments containing the word "rv" or "revert" [43], and by using an MD5 hashing scheme [44] to identify identical versions of a page: we created an MD5 hash for every edit, and we identified edits sharing the same hash with a previous edit as reverts. Reverts made specifically to fight vandalism were identified by selecting edits labeled as "vandalism" in their associated comment [43]. We considered as new content all edits that were classified neither as vandalism nor as reverts.
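The classification just described can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `classify_edits`, the dictionary keys `content` and `comment`, and the exact matching rules (whole word "rv", words starting with "revert", substring "vandalism") are assumptions made for illustration.

```python
import hashlib
import re


def classify_edits(revisions):
    """Label each revision as 'vandalism', 'revert' or 'new_content'.

    `revisions` is a chronologically ordered list of dicts with the
    hypothetical keys 'content' (page text after the edit) and 'comment'.
    """
    seen_hashes = set()  # MD5 digests of all earlier page versions
    labels = []
    for rev in revisions:
        comment = (rev.get("comment") or "").lower()
        content = rev.get("content") or ""
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()
        words = re.findall(r"[a-z]+", comment)
        # A revert either restores an earlier version (same hash) or is
        # announced in the comment ("rv", "revert", "reverted", ...).
        is_revert = (
            digest in seen_hashes
            or "rv" in words
            or any(w.startswith("revert") for w in words)
        )
        if "vandalism" in comment:
            labels.append("vandalism")
        elif is_revert:
            labels.append("revert")
        else:
            labels.append("new_content")
        seen_hashes.add(digest)
    return labels


# Toy edit history (invented for illustration).
toy_history = [
    {"content": "A", "comment": "create page"},
    {"content": "B", "comment": "add info"},
    {"content": "A", "comment": ""},
    {"content": "C", "comment": "rv unsourced claim"},
    {"content": "D", "comment": "Reverted vandalism by X"},
]
toy_labels = classify_edits(toy_history)
```

In this sketch, an edit whose comment mentions vandalism is always labeled as vandalism, even if it would also qualify as a revert; how the two labels are counted together is a design choice the text leaves open.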
We also collected data on the activity of the most active editors in other Wikipedia pages. To retrieve this data, we used Xtool [45], a web tool providing general statistics on the editors and their most edited pages.
Market data include daily price, exchange volume and market capitalization of cryptocurrencies, and were collected from the 'Coinmarketcap' website [4]. The price of a cryptocurrency represents its exchange rate (with USD or Bitcoin, typically), which is determined by the market supply and demand dynamics.
The exchange volume is the total trading volume across exchange markets. The market capitalization is calculated as the product of a cryptocurrency's circulating supply (the number of coins available to users) and its price. The market share is the market capitalization of a cryptocurrency normalized by the total market capitalization of the market. Price and market capitalization data are available only since April 28th, 2013, while volume data are available since December 27th, 2013.
The Wikipedia-based investment strategy we implement in this paper can be applied only to cryptocurrencies that can be traded on margin. We compiled a list of 16 such cryptocurrencies from active exchange platforms including Poloniex and Bitfinex (see Supplementary materials, S2). Note that these are also the most widely traded currencies [4]. In our analysis, we consider that cryptocurrencies can be traded once their trading volume exceeds 100,000 USD. We excluded days on which the reported volume did not lie within 2 standard deviations of the average trading volume; such values are likely due to how market exchanges report their exchange volumes [46].
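As a rough illustration of the two volume filters, the Python sketch below keeps the days that follow the first crossing of the 100,000 USD threshold and drops days whose volume lies more than 2 standard deviations from the mean. The function name `tradable_days`, the reading of "once" as "from the first day above the threshold", and the use of the population standard deviation over the tradable period are all assumptions, not details given in the text.

```python
from statistics import mean, pstdev


def tradable_days(volumes, min_volume=100_000.0, n_std=2.0):
    """Return the indices of the days kept for the analysis.

    `volumes` is a list of daily trading volumes in USD. Days before the
    first volume above `min_volume` are discarded; afterwards, days whose
    volume is further than `n_std` standard deviations from the mean of
    the tradable period are excluded as reporting outliers.
    """
    start = next((i for i, v in enumerate(volumes) if v > min_volume), None)
    if start is None:
        return []  # the currency never became tradable
    window = volumes[start:]
    mu, sigma = mean(window), pstdev(window)
    return [start + i for i, v in enumerate(window)
            if abs(v - mu) <= n_std * sigma]


# Toy volumes (invented): one day below the threshold, one extreme spike.
toy_volumes = [50_000, 150_000, 160_000, 155_000, 152_000,
               158_000, 151_000, 5_000_000, 157_000, 153_000]
kept_days = tradable_days(toy_volumes)
```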

Wikipedia pages and market properties
In this section, we investigate the connection between the attention towards cryptocurrencies registered on Wikipedia and the evolving properties of the market. Wikipedia is the 5th most visited website on the Internet [47], attractive to a non-expert audience seeking compact and non-technical information.
Previous work has shown that Wikipedia traffic can help predict stock market prices [16]. The number of cryptocurrency pages on Wikipedia has grown together with their overall market capitalisation. In August 2005, Ripple became the first cryptocurrency with a page. At that point, it was not identified as a cryptocurrency, but as the idea of a monetary system relying on trust. Bitcoin appeared only in March 2009, followed by 36 other currencies (see Figure 1).
The number of views received daily by a Wikipedia page is a good proxy for the overall attention on the web [2]. We find that the number of views to cryptocurrency pages has overall increased from 2015 until January 2018 (see Figure 2). In 2016, the 23 cryptocurrency pages were viewed ∼ 4 · 10^6 times, while in 2017 the 34 cryptocurrency pages received ∼ 16 · 10^6 views. In 2018, the sudden drop in cryptocurrency prices impacted the number of views: the total number of views received by the 38 cryptocurrency pages in 2018 was ∼ 9 · 10^6.
A second aspect characterizing the evolution in time of Wikipedia pages is their edit history. We find that, on average, pages are edited more than in the past. The 23 cryptocurrency Wikipedia pages were edited ∼ 2 · 10^3 times in total in 2016, while the 38 pages were edited ∼ 5 · 10^3 times in 2018 (see Figure 2). In 2016, Bitcoin was the most viewed cryptocurrency page, with shares of views and edits of ∼ 74% and ∼ 37%, respectively, over all other cryptocurrency pages. However, these numbers dropped to ∼ 46% and ∼ 16% in 2018. The fraction of editors active on Bitcoin's page over all other cryptocurrency pages also dropped, from ∼ 34% in 2016 to 10% in 2018. On the other hand, the fraction of views to the 5 most visited pages compared to all other cryptocurrencies grew from ∼ 20% in 2016 to ∼ 27% in 2018.
Interestingly, Bitcoin's share of the total market capitalization declined during the same period [1], suggesting a possible connection between the properties of the market and the evolution of attention for cryptocurrencies. We find that the daily number of Wikipedia page views and the price of Bitcoin are positively correlated (Pearson correlation ρ = 0.42, p < 10^−49, see Figure 3-A), corroborating the hypothesis of a link between attention on Wikipedia and properties of the market. We further test this hypothesis considering all cryptocurrencies (see Figure 3-B) and focusing on other market properties.
We find that there is a positive correlation between the average share of views and (i) the average price (Spearman correlation ρ = 0.37, p = 0.02), (ii) the average share of volume (Spearman correlation ρ = 0.71, p < 10^−7), and (iii) the average market share (Spearman correlation ρ = 0.71, p < 10^−6) of a cryptocurrency. Moreover, these correlations are robust in time (see Figure A2).
We also find that the edit history of a currency is connected to the evolution of the market properties (see Figure 3-C). We observe a positive correlation between the average fraction of edits and (i) the average price of a given currency (Spearman correlation ρ = 0.36, p = 0.02), (ii) the average share of exchange volume for a given currency (Spearman correlation ρ = 0.63, p < 10^−5) and (iii) its market share (Spearman correlation ρ = 0.68, p < 10^−5). These correlations are robust in time (see Figure A2).
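For readers unfamiliar with the Spearman coefficient used above, the following self-contained Python sketch computes it as the Pearson correlation of the ranks, with tied values receiving their average rank. The helper names and the toy numbers are illustrative assumptions; the sketch does not compute p-values and is not the authors' implementation.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (invented): average share of views vs. average market share.
view_share = [0.60, 0.15, 0.10, 0.08, 0.07]
market_share = [0.55, 0.20, 0.05, 0.12, 0.08]
rho = spearman(view_share, market_share)
```

Because only ranks enter the computation, the coefficient captures any monotonic relation between attention and market properties, not only a linear one.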

Evolution of cryptocurrency pages
Frequency of edits and editor diversity are considered reliable indicators of the quality of information included in a Wikipedia page [48]. Cryptocurrency pages differ with respect to their edit history.
The nature of edits changes over the life of a Wikipedia page. While at the beginning editors focus largely on new content, as the page ages more effort is dedicated to fighting vandalism and misinformation (maintenance work) [43,49]. We quantify maintenance work by looking at "reverts", edits that restore a previous version of the page, and at the number of edits reporting vandalism. We find that reverts constitute 18.2% of all edits and that, on average, they constitute (15.4 ± 4.3)% of the contributions to a cryptocurrency page. The fraction of reverts is stable in time (see Figure 5).
Interestingly, this growth does not characterise all pages on Wikipedia. For example, in [52], the authors show that the number of editors of medical articles has been decreasing.
The editing activity is heterogeneously distributed, as we find by ranking the editors according to their number of edits (see Figure 6-A). In fact, the relation between the rank of an editor, r, and the fraction of edits can be described by a power law, P(r) ∼ r^−β, with β = 1.01. This result is in line with what is generally observed in Wikipedia [53], and is consistent across time, with β between 0.84 and 0.95 (see Appendix A.5). In particular, the most active editor alone is responsible for ∼ 10% of the edits (see Appendix A.7 for more details on the most active editor), and only ∼ 9.6% of the editors (596) have edited at least 2 pages (Figure 6-C). This group is responsible for 50% of the total number of edits for all cryptocurrency Wikipedia pages.
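One simple way to estimate a rank exponent β of this kind is an ordinary least-squares fit of log(fraction of edits) against log(rank). The Python sketch below shows that approach under stated assumptions: the function name `fit_rank_exponent` is hypothetical, and the text does not say which fitting procedure the authors used, so this is only one possible estimator (and log-log regression is known to be a rough one).

```python
from math import log


def fit_rank_exponent(edit_counts):
    """Estimate beta in P(r) ~ r^(-beta) by least squares in log-log space.

    `edit_counts` holds the number of edits of each editor, in any order.
    Editors are ranked by decreasing number of edits, and the fraction of
    edits at rank r plays the role of P(r).
    """
    counts = sorted((c for c in edit_counts if c > 0), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two editors with edits")
    total = sum(counts)
    xs = [log(r) for r in range(1, len(counts) + 1)]
    ys = [log(c / total) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # the slope is -beta


# Synthetic counts that follow an exact 1/r law (invented data).
toy_counts = [1000.0 / r for r in range(1, 51)]
toy_beta = fit_rank_exponent(toy_counts)
```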
Then, we study the evolution of editors' activity in time. We classify editors into four groups based on their total number of edits at the end of the study, in January 2019 (see Figure 7): contributors who made at least 500 edits (6 editors, responsible for 23% of edits), contributors who made 100 to 500 edits (23 editors, responsible for 15% of edits), contributors who made 20 to 100 edits (142 editors, responsible for 19% of the edits), and editors who made fewer than 20 edits (97% of editors, responsible for 43% of the edits). We find that the higher the cumulative activity of a group, the more recently its members started editing the pages (see Figure 7), in contrast to what is generally observed on Wikipedia [54,55]. Note that the group of most active contributors started editing in August 2012, 3 years after the creation of Bitcoin's page.
Furthermore, Figure 8 shows that editors with the largest number of edits are responsible for the most extensive contributions in terms of number of edited words. Some of their edits, however, may be for maintenance. By ranking editors in descending order according to their total number of edits across the entire period of study, we find that, for the top 10 contributors, maintenance edits amount to 20% of their edits. On average, ∼ 18% of the edits made by the top 250 editors are maintenance work (see Figure 9-A). This value is consistent among different rank groups. Finally, top-ranked editors tend to contribute to more than one page (see Figure 9-B), on average ∼ 4 pages.
Electronic copy available at: https://ssrn.com/abstract=3346632
To understand the general interests and the specialisation of the top editors of cryptocurrency Wikipedia pages, we focus on the subset of 6 editors that have contributed at least 500 edits each. We studied their interests in detail by considering their contributions over the entire Wikipedia.
Our results show that the main interests of these editors are cryptocurrencies and blockchain (see Figure 10). Results are consistent when we extend the analysis to the top 29 editors, who are responsible for 37% of the edits. Top editors also contribute to pages unrelated to cryptocurrencies; however, these pages are less homogeneous and cover several different interests, such as genetically modified food, musicians and motor companies.
A giant component (see Figure 11) emerges in the network, implying each node is connected to all other nodes, when we analyse its evolution under large time windows (∼ years). Instead, if weekly time windows are considered, we find that the network is disconnected (see Figure 12).

An investment strategy based on Wikipedia attention
The demonstrated connection between the properties of the cryptocurrency market and traffic on Wikipedia suggests the latter could help inform a successful investment strategy. We investigate this possibility by testing a Wikipedia-based strategy similar to the one proposed in [16,17] for stock market investments.
For a given page and a given day t, the Wikipedia investment strategy relies on the difference ∆n(t) = v(t) − v(t − 1) between the number of page views v(t) at day t and the number of views v(t − 1) at day t − 1. According to the strategy, if ∆n(t) > 0, the investor sells the asset at time t + 1 (at price p(t + 1)) and then buys it back at time t + 2 (at price p(t + 2)). This trading position is formally known as a short position. On the other hand, if ∆n(t) ≤ 0, the investor buys at time t + 1 (at price p(t + 1)) and sells at time t + 2 (at price p(t + 2)), which is known as a long position. The intuition behind the strategy is that if attention and information gathering have been rising, prices will drop, and vice versa [16,56]. We consider Wikipedia views rather than edits, since the latter do not vary on a daily basis (the average time between edits is 10.12 days). Considering a longer period would overlook the cryptocurrencies' price volatility [57].
We also consider two baseline strategies. The first is based on the price difference ∆p(t) = p(t) − p(t − 1) rather than the page views difference ∆n(t) [33]. In all other aspects, it is identical to the Wikipedia-based strategy. This allows us to test which indicator (price or Wikipedia page views) has better predictive capabilities under the same conditions. The rationale behind the first baseline strategy is that if the price has been rising, a drop will follow, and vice versa. As a second baseline, we choose a random strategy where, at every time t, one chooses either to buy or to sell an asset with 50% probability [16].
The performance of the different strategies is assessed by computing the cumulative return R, defined as the sum of the log-returns obtained under the proposed strategies. When ∆n(t) > 0, the log-return is computed as log(p(t + 1)) − log(p(t + 2)), while, in the opposite case, the log-return is log(p(t + 2)) − log(p(t + 1)). We use log-returns because they simplify the calculation for short and long positions and because they add up across periods, which suits multi-period returns [58].
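The definitions above translate into a few lines of Python. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, both series are assumed to be daily lists of equal length, and fees are ignored. Passing the page views as the signal gives the Wikipedia-based strategy; passing the prices themselves gives the price baseline.

```python
import random
from math import log


def cumulative_return(signal, prices):
    """Cumulative log-return R of the difference-based strategy.

    For each day t, delta = signal[t] - signal[t-1]. If delta > 0 the
    position between t+1 and t+2 is short, otherwise it is long.
    """
    total = 0.0
    for t in range(1, len(prices) - 2):
        delta = signal[t] - signal[t - 1]
        long_return = log(prices[t + 2]) - log(prices[t + 1])
        # A short position earns the negative of the long log-return.
        total += -long_return if delta > 0 else long_return
    return total


def random_cumulative_return(prices, rng):
    """Random baseline: long or short with 50% probability each day."""
    total = 0.0
    for t in range(1, len(prices) - 2):
        long_return = log(prices[t + 2]) - log(prices[t + 1])
        total += long_return if rng.random() < 0.5 else -long_return
    return total


# Toy series (invented for illustration).
toy_views = [10, 12, 11, 15, 14]
toy_prices = [100.0, 100.0, 110.0, 99.0, 108.9]

r_wikipedia = cumulative_return(toy_views, toy_prices)
r_price = cumulative_return(toy_prices, toy_prices)
r_random = random_cumulative_return(toy_prices, random.Random(0))
```

Repeating `random_cumulative_return` with independent random generators yields a distribution of baseline returns against which the other two strategies can be compared.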
We test the Wikipedia-based strategy against the baselines for the 17 cryptocurrencies that have
A closer inspection shows that there are consistent differences between cryptocurrencies with respect to the average return (see Figure 14), with some even yielding overall negative returns. The Wikipedia-based strategy yields a positive cumulative return of ∼ 300% for Ethereum Classic, but for other currencies, including Ripple and Ethereum, investing based on Wikipedia leads to negative returns.
The observed differences could potentially be explained by the correlation between changes in daily price and in Wikipedia views. Instead, we observe that, although the Wikipedia-based strategy works well for Bitcoin but not for Dash, for both currencies there is a positive correlation between daily change in price and Wikipedia views, of 0.1 and 0.18 respectively (see Figure A1). However, our proposed strategy does not simply map to buying a cryptocurrency when its Wikipedia page views increase. In order to gain positive returns using our proposed strategy, an increase in the number of views at time t should be followed by an increase in price on the next day, t + 1, and a decrease in price on the day after, t + 2. Positive returns will also occur in the case of a decrease in the number of views at time t, if it is followed by a decrease in price at time t + 1 and an increase in price at time t + 2.
[Figure caption fragment] Data are displayed using a kernel density estimate, with a Gaussian kernel and bandwidth calculated using Silverman's rule of thumb. Data for the random strategy are obtained from 1000 independent realizations. All results are shown for investments between July 2015 and January 2019, for all cryptocurrencies which can be traded on margin, combined.
Finally, we investigate the role of the start and end times of the investment period (see Figure 15).
We find that, for most of the choices, the Wikipedia-based strategy has a higher cumulative return than the random strategy. It outperforms the price baseline for the majority of the periods ending before January 2018. The change after January 2018 can be attributed to the unexpected turn the market took at that point, which caused more than 400 billion dollars of losses.

Conclusion and discussion
In this paper, we have investigated the interplay between the production and consumption of information about digital currencies in Wikipedia and their market performance. We have shown that, over time, there is a positive correlation between the market performance of a cryptocurrency, as measured by its price, volume, and market share, and the attention people pay to the corresponding Wikipedia page, measured by the number of page views and the number of page edits. This result suggests that the production and consumption of information in Wikipedia is relevant for investment purposes.
We have analyzed the edit history of cryptocurrency pages in Wikipedia. We have shown that contributions to cryptocurrency pages are bursty in time, with periods of high activity followed by calmer ones. We have found that cryptocurrency pages have experienced a higher fraction of revert edits (18%) than other pages, suggesting they have been subject to lively debates around their contents. Also, we have found that the number of cryptocurrency pages editors has increased in the
We have shown that the information in Wikipedia is, to a large extent, provided by cryptocurrency and technology enthusiasts. In fact, we have found that editors who are very active on cryptocurrency pages focus their editing activity almost exclusively on cryptocurrencies and blockchain.
We have found that the community of cryptocurrency editors is tight: On average, each page is connected to 37 other pages through an average of 7 editors and active contributors tend to edit many pages. New cryptocurrency pages are typically created by new editors, but then also edited by more experienced ones. For this reason, we find that older pages have higher degree in the co-editing network.
Finally, we have proposed a trading strategy relying on Wikipedia page views and found it yields significant returns compared to baseline strategies, further demonstrating the relevance of Wikipedia for cryptocurrency survival in the market. It is important to mention, however, that our strategy neglects the role played by fees, which could significantly decrease profits in real scenarios. Also, the strategy has not been successful since January 2018, when the cryptocurrency market started suffering major losses.
Characterizing the production and consumption of information around cryptocurrencies is key to understanding the market dynamics and informing investment decisions [60]. Although our study was limited to the analysis of Wikipedia data, other sources of information, including traditional news outlets, Twitter, Reddit or bitcointalk, could reveal important information about the cryptocurrency market dynamics.

Data Availability Statement
The datasets generated and analyzed for this study, along with the code to regenerate the figures, can be found in [59].

A Appendix

A.1 List of cryptocurrencies
We consider for this study all cryptocurrencies with a Wikipedia page. In Table A0, we present some of their characteristics. Using the Wikipedia API, we retrieve data about the views and edits of each page. For the page views we use the API call: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/user/wiki_page/daily/start_date/end_date, where wiki_page is the cryptocurrency page name and start_date and end_date are the requested dates. To retrieve the edit history, we used the following call: https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&rvprop=timestamp%7Cuser%7Ccomment%7Ccontent&rvlimit=500&titles=wiki_page.
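The two calls can be assembled programmatically. The Python sketch below only builds the URLs and sends no request; the function names are hypothetical, and the YYYYMMDD date format used in the example is an assumption about how the start and end dates are written.

```python
from urllib.parse import quote, urlencode

PAGEVIEWS_BASE = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/"
                  "per-article/en.wikipedia.org/all-access/user")
REVISIONS_BASE = "https://en.wikipedia.org/w/api.php"


def pageviews_url(wiki_page, start_date, end_date):
    """URL for the daily user page views of one English Wikipedia page."""
    # Page titles use underscores for spaces and must be percent-encoded.
    title = quote(wiki_page.replace(" ", "_"), safe="")
    return f"{PAGEVIEWS_BASE}/{title}/daily/{start_date}/{end_date}"


def revisions_url(wiki_page):
    """URL for the revisions (timestamp, user, comment, content) of a page."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "rvprop": "timestamp|user|comment|content",  # "|" is sent as %7C
        "rvlimit": 500,
        "titles": wiki_page,
    }
    return REVISIONS_BASE + "?" + urlencode(params)


views_call = pageviews_url("Bitcoin", "20150701", "20190123")
edits_call = revisions_url("Bitcoin")
```

Since each revisions call is limited to 500 revisions by `rvlimit`, pages with longer histories would presumably need several calls.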

A.2 Exchanges with margin trading support
Here, we provide the list of exchanges supporting margin trading. Margin trading is essential for our proposed investment strategy, since it allows an investor to sell a cryptocurrency that they do not yet own.

A.3 Correlations between Wikipedia page views and market properties
The number of Wikipedia page views and the properties of the market are overall correlated. In Figure A1, we show the correlations between Wikipedia page views, trading volume and price for the cryptocurrencies considered. We show the Spearman correlation between a cryptocurrency's average share of page views and its market performance, measured by its average market share (ρ_vm), average trading volume share (ρ_vv) and average price (ρ_vp), across time (see Figure A2A). We show that the positive correlation between these quantities is consistent over time, with 0.65 ≤ ρ_vm ≤ 0.79, 0.61 ≤ ρ_vv ≤ 0.83, and 0.32 ≤ ρ_vp ≤ 0.51.
In Figure A2

A.4 Literature review
Several studies have focused on Wikipedia pages and editors' activity. In Table A2, we present a summary of their findings and a comparison with our results on cryptocurrency Wikipedia pages.

A.5 Robustness of the findings
The uneven distribution of edits across editors was depicted in Figure 6. Here, we show that this result is consistent in time (see Figure A3-A). We also test our results against saving mistakes by editors [55].
This often occurs when an editor mistakenly saves an incomplete edit, producing multiple edits within a very short time. We solve this issue by excluding from the analysis edits made by the same editor on the same page within less than an hour of the previous one, as in [55]. In Figure A3-B, we show that our results are robust to this change and well described by a power-law distribution, P(r) ∼ r^−β, with exponent β = 0.62. We also study the contributions of top editors to all Wikipedia pages. For each editor with at least 100 edits in cryptocurrency pages, we collect data about the top 10 Wikipedia pages they contributed to. These include pages outside the 38 cryptocurrency pages. For this task, we use a web tool [45], which provides the number of edits contributed by each editor to a given page. Figure A4 shows that editors are mostly interested in cryptocurrencies and technology related pages. Compared to the set of editors with more than 500 edits (see Figure 10), the set of pages edited is more diverse.
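The one-hour rule for saving mistakes could be implemented along the following lines. This Python sketch rests on assumptions: the function name `drop_rapid_edits` and the `(editor, page, timestamp)` tuple layout are hypothetical, and "the previous one" is read as the previous edit by the same editor on the same page, whether or not that edit was itself kept.

```python
from datetime import datetime, timedelta


def drop_rapid_edits(edits, window=timedelta(hours=1)):
    """Remove edits made within `window` of the same editor's previous edit.

    `edits` is a list of (editor, page, timestamp) tuples. The comparison
    is always made with the immediately preceding edit by that editor on
    that page, so a chain of quick saves collapses to its first edit.
    """
    last_seen = {}  # (editor, page) -> timestamp of the latest edit
    kept = []
    for editor, page, ts in sorted(edits, key=lambda e: e[2]):
        previous = last_seen.get((editor, page))
        if previous is None or ts - previous >= window:
            kept.append((editor, page, ts))
        last_seen[(editor, page)] = ts
    return kept


# Toy edits (invented): editor "A" saves three times in quick succession.
day = datetime(2018, 1, 15)
toy_edits = [
    ("A", "Bitcoin", day.replace(hour=10, minute=0)),
    ("B", "Bitcoin", day.replace(hour=10, minute=10)),
    ("A", "Ethereum", day.replace(hour=10, minute=20)),
    ("A", "Bitcoin", day.replace(hour=10, minute=30)),
    ("A", "Bitcoin", day.replace(hour=11, minute=15)),
    ("A", "Bitcoin", day.replace(hour=13, minute=0)),
]
kept_edits = drop_rapid_edits(toy_edits)
```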

A.6 New pages
Figure A5 shows, for each of the years considered, the fraction of edits made to new pages and the fraction of editors contributing to new pages. On average, ∼ 18% of editors contribute to the newly created pages within a given year, while only ∼ 10% of the edits are made to new pages.

A.7 The most active editor
Here, we provide information on the editor with the highest number of edits in cryptocurrency pages (10% of the edits). Table A5 shows the editor's general editing patterns in the entire English Wikipedia. Table A5 shows the top pages edited by the top editor.

A.8 Editing network
To characterize the co-editing activity in cryptocurrency Wikipedia pages, we constructed a weighted undirected network. A node represents a Wikipedia page and an edge exists between two nodes if they have at least one editor in common. Weights on edges represent the number of editors in common. We look at the evolution of the network across time and identify the most central pages according to the degree centrality. Figure A6 shows the number of weeks each cryptocurrency appeared in the top 5 ranks when cryptocurrencies are ranked according to their degree centrality in descending order. Figure A7 shows the correlation between the age of a cryptocurrency page and its weighted degree (ρ = 0.40, p = 0.015).
Figure A6: Ranking in degree centrality. Number of weeks a cryptocurrency occupied one of the top 5 ranks based on degree centrality in the co-editing Wikipedia pages network.
Figure A7: Correlation between page age and network strength. Page age in weeks vs its weighted degree in the editing network. Each point represents a node (page). Pearson correlation ρ = 0.40, p = 0.015. The solid line represents a fit a + bw where b = 0.28 ± 0.10.
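The network construction described in this section can be sketched in plain Python as follows. The function names, the dictionary-based representation and the toy editor sets are illustrative assumptions; a graph library could equally be used.

```python
from itertools import combinations


def coediting_network(page_editors):
    """Build the weighted undirected co-editing network.

    `page_editors` maps each page name to the set of its editors. The
    result maps a pair of pages (a, b), with a < b, to the number of
    editors they share; pairs with no common editor have no edge.
    """
    edges = {}
    for a, b in combinations(sorted(page_editors), 2):
        shared = len(page_editors[a] & page_editors[b])
        if shared > 0:
            edges[(a, b)] = shared
    return edges


def degree_and_strength(edges):
    """Degree (number of neighbours) and weighted degree of every node."""
    degree, strength = {}, {}
    for (a, b), weight in edges.items():
        for node in (a, b):
            degree[node] = degree.get(node, 0) + 1
            strength[node] = strength.get(node, 0) + weight
    return degree, strength


# Toy editor sets (invented for illustration).
toy_pages = {
    "Bitcoin": {"u1", "u2", "u3"},
    "Ethereum": {"u2", "u3"},
    "Monero": {"u3"},
    "Dash": {"u9"},
}
toy_edges = coediting_network(toy_pages)
toy_degree, toy_strength = degree_and_strength(toy_edges)
```

Restricting `page_editors` to the edits made within a given time window (for example one week) before calling `coediting_network` gives a snapshot of the network for that window.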