16. 蒙特卡罗模拟

images/baba_sim_price.png

蒙特卡罗模拟只是通过重复生成随机数来估计固定参数的一种方法。有关更多详细信息,请访问 A Zero Math Introduction to Markov Chain Monte Carlo Methods .

蒙特卡罗模拟是一种用于了解风险和不确定性对财务、项目管理、成本和其他预测模型的影响的技术。蒙特卡罗模拟器可以帮助人们可视化大部分或所有的潜在结果,从而更好地了解决策的风险。有关更多详细信息,请访问 The house always wins .

16.1. 模拟赌场赢

我们假设玩家约翰有49%的机会赢得比赛,下注每局5美元。

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

start_m =100
wager = 5
bets = 100
trials = 1000

trans = np.vectorize(lambda t: -wager if t <=0.51 else wager)

fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(1,1,1)

end_m = []

for i in range(trials):
    money = reduce(lambda c, x: c + [c[-1] + x], trans(np.random.random(bets)), [start_m])
    end_m.append(money[-1])
    plt.plot(money)

plt.ylabel('Player Money in $')
plt.xlabel('Number of bets')
plt.title(("John starts the game with $ %.2f and ends with $ %.2f")%(start_m,sum(end_m)/len(end_m)))
plt.show()
images/casino_5_100.png
images/casino_100_1000.png

16.2. 模拟随机行走

16.2.1. 获取历史股价

  1. 影响数据。如果你需要这个代码,你可以和我联系。

stock.tail(4)

+----------+----------+----------+----------+----------+----------+--------+
|      Date|      Open|      High|       Low|     Close| Adj Close|  Volume|
+----------+----------+----------+----------+----------+----------+--------+
|2018-12-07|155.399994|158.050003|151.729996|153.059998|153.059998|17447900|
|2018-12-10|150.389999|152.809998|147.479996|151.429993|151.429993|15525500|
|2018-12-11|155.259995|156.240005|150.899994|151.830002|151.830002|13651900|
|2018-12-12|155.240005|156.169998|151.429993|     151.5|     151.5|16597900|
+----------+----------+----------+----------+----------+----------+--------+
  1. 转换 str 键入日期结束日期类型

stock['Date'] = pd.to_datetime(stock['Date'])
  1. 数据可视化

# Plot everything by leveraging the very powerful matplotlib package
width = 10
height = 6
data = stock
fig = plt.figure(figsize=(width, height))
ax = fig.add_subplot(1,1,1)
ax.plot(data.Date, data.Close, label='Close')
ax.plot(data.Date, data.High, label='High')
# ax.plot(data.Date, data.Low, label='Low')
ax.set_xlabel('Date')
ax.set_ylabel('price ($)')
ax.legend()
ax.set_title('Stock price: ' + ticker, y=1.01)
#plt.xticks(rotation=70)
plt.show()
# Plot everything by leveraging the very powerful matplotlib package
fig = plt.figure(figsize=(width, height))
ax = fig.add_subplot(1,1,1)
ax.plot(data.Date, data.Volume, label='Volume')
#ax.plot(data.Date, data.High, label='High')
# ax.plot(data.Date, data.Low, label='Low')
ax.set_xlabel('Date')
ax.set_ylabel('Volume')
ax.legend()
ax.set_title('Stock volume: ' + ticker, y=1.01)
#plt.xticks(rotation=70)
plt.show()
images/baba_history.png

历史股价

images/baba_history_vol.png

历史存量

16.2.2. 计算复合年增长率

复合年增长率(CAGR)公式对投资分析非常有用。它也可以被称为年化回报率或年百分比收益率或有效年率,这取决于方程式的代数形式。许多投资,如股票,其回报率可能变化很大。CAGR公式允许你计算一个“平滑”的回报率,你可以用它来与其他投资进行比较。公式定义为(更多详细信息可在 CAGR Calculator and Formula

System Message: WARNING/2 (TEXT CAGR=LEFT(FRAC TEXT END VALUE TEXT起始值RIGHT)^ FRAC 365 TEXT DAYS-1)

latex exited with error [stdout] This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016/Debian) (preloaded format=latex) restricted \write18 enabled. entering extended mode (./math.tex LaTeX2e <2017/01/01> patch level 3 Babel <3.9r> and hyphenation patterns for 12 language(s) loaded. (/usr/share/texlive/texmf-dist/tex/latex/base/article.cls Document Class: article 2014/09/29 v1.4h Standard LaTeX document class (/usr/share/texlive/texmf-dist/tex/latex/base/size12.clo)) (/usr/share/texlive/texmf-dist/tex/latex/base/inputenc.sty (/usr/share/texlive/texmf-dist/tex/latex/ucs/utf8x.def)) (/usr/share/texlive/texmf-dist/tex/latex/ucs/ucs.sty (/usr/share/texlive/texmf-dist/tex/latex/ucs/data/uni-global.def)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amstext.sty (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsgen.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsbsy.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsopn.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amscls/amsthm.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amssymb.sty (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amsfonts.sty)) (/usr/share/texlive/texmf-dist/tex/latex/anyfontsize/anyfontsize.sty) (/usr/share/texlive/texmf-dist/tex/latex/tools/bm.sty) (/usr/share/texlive/texmf-dist/tex/latex/mathtools/mathtools.sty (/usr/share/texlive/texmf-dist/tex/latex/graphics/keyval.sty) (/usr/share/texlive/texmf-dist/tex/latex/tools/calc.sty) (/usr/share/texlive/texmf-dist/tex/latex/mathtools/mhsetup.sty)) ! LaTeX Error: File `dsfont.sty' not found. Type X to quit or <RETURN> to proceed, or enter new name. (Default extension: sty) Enter file name: ! Emergency stop. <read *> l.16 \def \Z{\mathbb{Z}}^^M No pages of output. Transcript written on math.log.

days =  (stock.Date.iloc[-1] - stock.Date.iloc[0]).days
cagr = ((((stock['Adj Close'].iloc[-1]) / stock['Adj Close'].iloc[0])) ** (365.0/days)) - 1
print ('CAGR =',str(round(cagr,4)*100)+"%")
mu = cagr

16.2.3. 计算年波动率

股票的波动性是指其价格在一段时间内的变化。例如,一只股票可能有大幅上下摆动的趋势,而另一只股票可能以更稳定、不那么动荡的方式移动。这两支股票可能在一天结束时以相同的价格结束,但它们到达这一点的路径可能会变化很大。首先,我们创建一系列百分比回报,并计算回报的年度波动率年度波动率年度波动率。为了以年化的方式呈现这种波动性,我们只需要将每日标准差乘以252的平方根。假设一年有252个交易日。有关更多详细信息,请访问 How to Calculate Annualized Volatility .

stock['Returns'] = stock['Adj Close'].pct_change()
vol = stock['Returns'].std()*np.sqrt(252)

16.2.4. 创建每日收益矩阵

  1. 使用随机正态分布创建每日收益矩阵,生成由来自均匀分布U(0.0,1.0)的I.I.D.样本组成的RDD矩阵。

S = stock['Adj Close'].iloc[-1] #starting stock price (i.e. last available real stock price)
T = 5 #Number of trading days
mu = cagr #Return
vol = vol #Volatility
trials = 10000

mat = RandomRDDs.normalVectorRDD(sc, trials, T, seed=1)
  1. 将生成的RDD中的分布从u(0.0,1.0)转换为u(a,b),使用randomrdds.uniformrdd(sc,n,p,seed).map(lambda v:a+(b-a)*v)

a = mu/T
b = vol/math.sqrt(T)
v = mat.map(lambda x: a +  (b - a)* x)
  1. 将RDD MStrix转换为数据帧

df = v.map(lambda x: [round(i,6)+1 for i in x]).toDF()
df.show(5)
+--------+--------+--------+--------+--------+
|      _1|      _2|      _3|      _4|      _5|
+--------+--------+--------+--------+--------+
|0.935234|1.162894| 1.07972|1.238257|1.066136|
|0.878456|1.045922|0.990071|1.045552|0.854516|
|1.186472|0.944777|0.742247|0.940023|1.220934|
|0.872928|1.030882|1.248644|1.114262|1.063762|
| 1.09742|1.188537|1.137283|1.162548|1.024612|
+--------+--------+--------+--------+--------+
only showing top 5 rows
from pyspark.sql.functions import lit
S = stock['Adj Close'].iloc[-1]
price = df.withColumn('init_price' ,lit(S))
price.show(5)

+--------+--------+--------+--------+--------+----------+
|      _1|      _2|      _3|      _4|      _5|init_price|
+--------+--------+--------+--------+--------+----------+
|0.935234|1.162894| 1.07972|1.238257|1.066136|     151.5|
|0.878456|1.045922|0.990071|1.045552|0.854516|     151.5|
|1.186472|0.944777|0.742247|0.940023|1.220934|     151.5|
|0.872928|1.030882|1.248644|1.114262|1.063762|     151.5|
| 1.09742|1.188537|1.137283|1.162548|1.024612|     151.5|
+--------+--------+--------+--------+--------+----------+
only showing top 5 rows
price = price.withColumn('day_0', col('init_price'))
price.show(5)
+--------+--------+--------+--------+--------+----------+-----+
|      _1|      _2|      _3|      _4|      _5|init_price|day_0|
+--------+--------+--------+--------+--------+----------+-----+
|0.935234|1.162894| 1.07972|1.238257|1.066136|     151.5|151.5|
|0.878456|1.045922|0.990071|1.045552|0.854516|     151.5|151.5|
|1.186472|0.944777|0.742247|0.940023|1.220934|     151.5|151.5|
|0.872928|1.030882|1.248644|1.114262|1.063762|     151.5|151.5|
| 1.09742|1.188537|1.137283|1.162548|1.024612|     151.5|151.5|
+--------+--------+--------+--------+--------+----------+-----+
only showing top 5 rows

16.2.5. 蒙特卡罗模拟

from pyspark.sql.functions import round
for name in price.columns[:-2]:
    price = price.withColumn('day'+name, round(col(name)*col('init_price'),2))
    price = price.withColumn('init_price',col('day'+name))
price.show(5)

+--------+--------+--------+--------+--------+----------+-----+------+------+------+------+------+
|      _1|      _2|      _3|      _4|      _5|init_price|day_0| day_1| day_2| day_3| day_4| day_5|
+--------+--------+--------+--------+--------+----------+-----+------+------+------+------+------+
|0.935234|1.162894| 1.07972|1.238257|1.066136|    234.87|151.5|141.69|164.77|177.91| 220.3|234.87|
|0.878456|1.045922|0.990071|1.045552|0.854516|    123.14|151.5|133.09| 139.2|137.82| 144.1|123.14|
|1.186472|0.944777|0.742247|0.940023|1.220934|    144.67|151.5|179.75|169.82|126.05|118.49|144.67|
|0.872928|1.030882|1.248644|1.114262|1.063762|    201.77|151.5|132.25|136.33|170.23|189.68|201.77|
| 1.09742|1.188537|1.137283|1.162548|1.024612|     267.7|151.5|166.26|197.61|224.74|261.27| 267.7|
+--------+--------+--------+--------+--------+----------+-----+------+------+------+------+------+
only showing top 5 rows

16.2.6. 总结

selected_col = [name for name in price.columns if 'day_' in name]

simulated = price.select(selected_col)
simulated.describe().show()
+-------+----------+------------------+------------------+------------------+------------------+------------------+
|summary|2018-12-12|        2018-12-13|        2018-12-14|        2018-12-17|        2018-12-18|        2018-12-19|
+-------+----------+------------------+------------------+------------------+------------------+------------------+
|  count|   10000.0|           10000.0|           10000.0|           10000.0|           10000.0|           10000.0|
|   mean|     151.5|155.11643700000002|        158.489058|162.23713200000003|        166.049375|        170.006525|
|    std|       0.0|18.313783237787845|26.460919262517276| 33.37780495150803|39.369101074463416|45.148120695490846|
|    min|     151.5|              88.2|             74.54|             65.87|             68.21|             58.25|
|    25%|     151.5|           142.485|            140.15|            138.72|           138.365|            137.33|
|    50%|     151.5|            154.97|           157.175|            159.82|            162.59|165.04500000000002|
|    75%|     151.5|           167.445|175.48499999999999|          182.8625|           189.725|           196.975|
|    max|     151.5|            227.48|            275.94|            319.17|            353.59|            403.68|
+-------+----------+------------------+------------------+------------------+------------------+------------------+
data_plt = simulated.toPandas()
days = pd.date_range(stock['Date'].iloc[-1], periods= T+1,freq='B').date

width = 10
height = 6
fig = plt.figure(figsize=(width, height))
ax = fig.add_subplot(1,1,1)

days = pd.date_range(stock['Date'].iloc[-1], periods= T+1,freq='B').date

for i in range(trials):
    plt.plot(days, data_plt.iloc[i])
ax.set_xlabel('Date')
ax.set_ylabel('price ($)')
ax.set_title('Simulated Stock price: ' + ticker, y=1.01)
plt.show()
images/baba_sim_price_demo.png

16.2.7. 一年期股价模拟

images/baba_sim_price.png

模拟股票价格

images/baba_sim_dis1.png

模拟股价分布