16. 蒙特卡罗模拟¶

蒙特卡罗模拟只是通过重复生成随机数来估计固定参数的一种方法。有关更多详细信息,请访问 A Zero Math Introduction to Markov Chain Monte Carlo Methods .
蒙特卡罗模拟是一种用于了解风险和不确定性对财务、项目管理、成本和其他预测模型的影响的技术。蒙特卡罗模拟器可以帮助人们可视化大部分或所有的潜在结果,从而更好地了解决策的风险。有关更多详细信息,请访问 The house always wins .
16.1. 模拟赌场赢¶
我们假设玩家约翰有49%的机会赢得比赛,下注每局5美元。
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
start_m =100
wager = 5
bets = 100
trials = 1000
trans = np.vectorize(lambda t: -wager if t <=0.51 else wager)
fig = plt.figure(figsize=(10, 6))
ax = fig.add_subplot(1,1,1)
end_m = []
for i in range(trials):
money = reduce(lambda c, x: c + [c[-1] + x], trans(np.random.random(bets)), [start_m])
end_m.append(money[-1])
plt.plot(money)
plt.ylabel('Player Money in $')
plt.xlabel('Number of bets')
plt.title(("John starts the game with $ %.2f and ends with $ %.2f")%(start_m,sum(end_m)/len(end_m)))
plt.show()


16.2. 模拟随机行走¶
16.2.1. 获取历史股价¶
影响数据。如果你需要这个代码,你可以和我联系。
stock.tail(4)
+----------+----------+----------+----------+----------+----------+--------+
| Date| Open| High| Low| Close| Adj Close| Volume|
+----------+----------+----------+----------+----------+----------+--------+
|2018-12-07|155.399994|158.050003|151.729996|153.059998|153.059998|17447900|
|2018-12-10|150.389999|152.809998|147.479996|151.429993|151.429993|15525500|
|2018-12-11|155.259995|156.240005|150.899994|151.830002|151.830002|13651900|
|2018-12-12|155.240005|156.169998|151.429993| 151.5| 151.5|16597900|
+----------+----------+----------+----------+----------+----------+--------+
转换
str
键入日期结束日期类型
stock['Date'] = pd.to_datetime(stock['Date'])
数据可视化
# Plot everything by leveraging the very powerful matplotlib package
width = 10
height = 6
data = stock
fig = plt.figure(figsize=(width, height))
ax = fig.add_subplot(1,1,1)
ax.plot(data.Date, data.Close, label='Close')
ax.plot(data.Date, data.High, label='High')
# ax.plot(data.Date, data.Low, label='Low')
ax.set_xlabel('Date')
ax.set_ylabel('price ($)')
ax.legend()
ax.set_title('Stock price: ' + ticker, y=1.01)
#plt.xticks(rotation=70)
plt.show()
# Plot everything by leveraging the very powerful matplotlib package
fig = plt.figure(figsize=(width, height))
ax = fig.add_subplot(1,1,1)
ax.plot(data.Date, data.Volume, label='Volume')
#ax.plot(data.Date, data.High, label='High')
# ax.plot(data.Date, data.Low, label='Low')
ax.set_xlabel('Date')
ax.set_ylabel('Volume')
ax.legend()
ax.set_title('Stock volume: ' + ticker, y=1.01)
#plt.xticks(rotation=70)
plt.show()

历史股价¶

历史存量¶
16.2.2. 计算复合年增长率¶
复合年增长率(CAGR)公式对投资分析非常有用。它也可以被称为年化回报率或年百分比收益率或有效年率,这取决于方程式的代数形式。许多投资,如股票,其回报率可能变化很大。CAGR公式允许你计算一个“平滑”的回报率,你可以用它来与其他投资进行比较。公式定义为(更多详细信息可在 CAGR Calculator and Formula )
days = (stock.Date.iloc[-1] - stock.Date.iloc[0]).days
cagr = ((((stock['Adj Close'].iloc[-1]) / stock['Adj Close'].iloc[0])) ** (365.0/days)) - 1
print ('CAGR =',str(round(cagr,4)*100)+"%")
mu = cagr
16.2.3. 计算年波动率¶
股票的波动性是指其价格在一段时间内的变化。例如,一只股票可能有大幅上下摆动的趋势,而另一只股票可能以更稳定、不那么动荡的方式移动。这两支股票可能在一天结束时以相同的价格结束,但它们到达这一点的路径可能会变化很大。首先,我们创建一系列百分比回报,并计算回报的年度波动率年度波动率年度波动率。为了以年化的方式呈现这种波动性,我们只需要将每日标准差乘以252的平方根。假设一年有252个交易日。有关更多详细信息,请访问 How to Calculate Annualized Volatility .
stock['Returns'] = stock['Adj Close'].pct_change()
vol = stock['Returns'].std()*np.sqrt(252)
16.2.4. 创建每日收益矩阵¶
使用随机正态分布创建每日收益矩阵,生成由来自均匀分布U(0.0,1.0)的I.I.D.样本组成的RDD矩阵。
S = stock['Adj Close'].iloc[-1] #starting stock price (i.e. last available real stock price)
T = 5 #Number of trading days
mu = cagr #Return
vol = vol #Volatility
trials = 10000
mat = RandomRDDs.normalVectorRDD(sc, trials, T, seed=1)
将生成的RDD中的分布从u(0.0,1.0)转换为u(a,b),使用randomrdds.uniformrdd(sc,n,p,seed).map(lambda v:a+(b-a)*v)
a = mu/T
b = vol/math.sqrt(T)
v = mat.map(lambda x: a + (b - a)* x)
将RDD MStrix转换为数据帧
df = v.map(lambda x: [round(i,6)+1 for i in x]).toDF()
df.show(5)
+--------+--------+--------+--------+--------+
| _1| _2| _3| _4| _5|
+--------+--------+--------+--------+--------+
|0.935234|1.162894| 1.07972|1.238257|1.066136|
|0.878456|1.045922|0.990071|1.045552|0.854516|
|1.186472|0.944777|0.742247|0.940023|1.220934|
|0.872928|1.030882|1.248644|1.114262|1.063762|
| 1.09742|1.188537|1.137283|1.162548|1.024612|
+--------+--------+--------+--------+--------+
only showing top 5 rows
from pyspark.sql.functions import lit
S = stock['Adj Close'].iloc[-1]
price = df.withColumn('init_price' ,lit(S))
price.show(5)
+--------+--------+--------+--------+--------+----------+
| _1| _2| _3| _4| _5|init_price|
+--------+--------+--------+--------+--------+----------+
|0.935234|1.162894| 1.07972|1.238257|1.066136| 151.5|
|0.878456|1.045922|0.990071|1.045552|0.854516| 151.5|
|1.186472|0.944777|0.742247|0.940023|1.220934| 151.5|
|0.872928|1.030882|1.248644|1.114262|1.063762| 151.5|
| 1.09742|1.188537|1.137283|1.162548|1.024612| 151.5|
+--------+--------+--------+--------+--------+----------+
only showing top 5 rows
price = price.withColumn('day_0', col('init_price'))
price.show(5)
+--------+--------+--------+--------+--------+----------+-----+
| _1| _2| _3| _4| _5|init_price|day_0|
+--------+--------+--------+--------+--------+----------+-----+
|0.935234|1.162894| 1.07972|1.238257|1.066136| 151.5|151.5|
|0.878456|1.045922|0.990071|1.045552|0.854516| 151.5|151.5|
|1.186472|0.944777|0.742247|0.940023|1.220934| 151.5|151.5|
|0.872928|1.030882|1.248644|1.114262|1.063762| 151.5|151.5|
| 1.09742|1.188537|1.137283|1.162548|1.024612| 151.5|151.5|
+--------+--------+--------+--------+--------+----------+-----+
only showing top 5 rows
16.2.5. 蒙特卡罗模拟¶
from pyspark.sql.functions import round
for name in price.columns[:-2]:
price = price.withColumn('day'+name, round(col(name)*col('init_price'),2))
price = price.withColumn('init_price',col('day'+name))
price.show(5)
+--------+--------+--------+--------+--------+----------+-----+------+------+------+------+------+
| _1| _2| _3| _4| _5|init_price|day_0| day_1| day_2| day_3| day_4| day_5|
+--------+--------+--------+--------+--------+----------+-----+------+------+------+------+------+
|0.935234|1.162894| 1.07972|1.238257|1.066136| 234.87|151.5|141.69|164.77|177.91| 220.3|234.87|
|0.878456|1.045922|0.990071|1.045552|0.854516| 123.14|151.5|133.09| 139.2|137.82| 144.1|123.14|
|1.186472|0.944777|0.742247|0.940023|1.220934| 144.67|151.5|179.75|169.82|126.05|118.49|144.67|
|0.872928|1.030882|1.248644|1.114262|1.063762| 201.77|151.5|132.25|136.33|170.23|189.68|201.77|
| 1.09742|1.188537|1.137283|1.162548|1.024612| 267.7|151.5|166.26|197.61|224.74|261.27| 267.7|
+--------+--------+--------+--------+--------+----------+-----+------+------+------+------+------+
only showing top 5 rows
16.2.6. 总结¶
selected_col = [name for name in price.columns if 'day_' in name]
simulated = price.select(selected_col)
simulated.describe().show()
+-------+----------+------------------+------------------+------------------+------------------+------------------+
|summary|2018-12-12| 2018-12-13| 2018-12-14| 2018-12-17| 2018-12-18| 2018-12-19|
+-------+----------+------------------+------------------+------------------+------------------+------------------+
| count| 10000.0| 10000.0| 10000.0| 10000.0| 10000.0| 10000.0|
| mean| 151.5|155.11643700000002| 158.489058|162.23713200000003| 166.049375| 170.006525|
| std| 0.0|18.313783237787845|26.460919262517276| 33.37780495150803|39.369101074463416|45.148120695490846|
| min| 151.5| 88.2| 74.54| 65.87| 68.21| 58.25|
| 25%| 151.5| 142.485| 140.15| 138.72| 138.365| 137.33|
| 50%| 151.5| 154.97| 157.175| 159.82| 162.59|165.04500000000002|
| 75%| 151.5| 167.445|175.48499999999999| 182.8625| 189.725| 196.975|
| max| 151.5| 227.48| 275.94| 319.17| 353.59| 403.68|
+-------+----------+------------------+------------------+------------------+------------------+------------------+
data_plt = simulated.toPandas()
days = pd.date_range(stock['Date'].iloc[-1], periods= T+1,freq='B').date
width = 10
height = 6
fig = plt.figure(figsize=(width, height))
ax = fig.add_subplot(1,1,1)
days = pd.date_range(stock['Date'].iloc[-1], periods= T+1,freq='B').date
for i in range(trials):
plt.plot(days, data_plt.iloc[i])
ax.set_xlabel('Date')
ax.set_ylabel('price ($)')
ax.set_title('Simulated Stock price: ' + ticker, y=1.01)
plt.show()
