Roget
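Build a directed graph of 1022 categories and 5075 cross-references as defined in the 1879 version of Roget's Thesaurus. This example is described in Section 1.2 of
Donald E. Knuth, "The Stanford GraphBase: A Platform for Combinatorial Computing", ACM Press, New York, 1993. http://www-cs-faculty.stanford.edu/~knuth/sgb.html
Note that one of the 5075 cross-references is a self loop, yet it is included in the graph built here because the standard networkx DiGraph
class allows self loops (cf. 400pungency:400 401 403 405).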
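The self-loop behavior can be checked on a tiny graph. The sketch below is illustrative only: it builds a DiGraph from the single entry quoted above (category 400 and its four cross-references) rather than from the full data file, and uses the networkx helpers `nx.selfloop_edges` and `nx.number_of_selfloops` to show that the edge from 400 to itself is stored like any other edge.

```python
import networkx as nx

G = nx.DiGraph()

# Cross-references taken from the quoted entry "400pungency:400 401 403 405".
head = "400"
for tail in "400 401 403 405".split():
    G.add_edge(head, tail)

self_loops = list(nx.selfloop_edges(G))

print(G.number_of_edges())        # 4, the self loop is counted
print(self_loops)                 # [('400', '400')]
print(nx.number_of_selfloops(G))  # 1
```

If self loops were unwanted, `G.remove_edges_from(nx.selfloop_edges(G))` would be one way to drop them; the example on this page keeps them.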
The data file can be found at:

Out:
skipping self loop 400 400
Loaded roget_dat.txt containing 1022 categories.
DiGraph with 1022 nodes and 5075 edges
21 connected components
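The component count in the last output line is computed on an undirected copy of the graph. For a directed graph this equals the number of weakly connected components, which networkx can compute without the conversion. The sketch below illustrates this on a made-up four-node graph (the node names and edges are not from the Roget data) and contrasts it with the strongly connected count, which respects edge direction.

```python
import networkx as nx

# Toy directed graph, invented for illustration: a -> b <- c, plus a self loop on d.
G = nx.DiGraph()
G.add_edges_from([("a", "b"), ("c", "b"), ("d", "d")])

UG = G.to_undirected()
n_undirected = nx.number_connected_components(UG)
n_weak = nx.number_weakly_connected_components(G)
n_strong = nx.number_strongly_connected_components(G)

print(n_undirected, n_weak, n_strong)  # 2 2 4
```

Here {a, b, c} and {d} are the two weak components, while no two distinct nodes can reach each other in both directions, so every node forms its own strong component.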
import gzip
import re
import sys

import matplotlib.pyplot as plt
import networkx as nx


def roget_graph():
    """Return the thesaurus graph from the roget.dat example in
    the Stanford Graph Base.
    """
    # open file roget_dat.txt.gz
    fh = gzip.open("roget_dat.txt.gz", "r")

    G = nx.DiGraph()

    for line in fh.readlines():
        line = line.decode()
        if line.startswith("*"):  # skip comments
            continue
        if line.startswith(" "):  # this is a continuation line, append
            line = oldline + line
        if line.endswith("\\\n"):  # continuation line, buffer, goto next
            oldline = line.strip("\\\n")
            continue

        (headname, tails) = line.split(":")

        # head
        numfind = re.compile(r"^\d+")  # re to find the number of this word
        head = numfind.findall(headname)[0]  # get the number

        G.add_node(head)

        for tail in tails.split():
            if head == tail:
                # Despite the wording of this message, the self loop is
                # still added by the add_edge call below.
                print("skipping self loop", head, tail, file=sys.stderr)
            G.add_edge(head, tail)

    return G


G = roget_graph()
print("Loaded roget_dat.txt containing 1022 categories.")
print(G)
UG = G.to_undirected()
print(nx.number_connected_components(UG), "connected components")

options = {
    "node_color": "black",
    "node_size": 1,
    "edge_color": "gray",
    "linewidths": 0,
    "width": 0.1,
}
nx.draw_circular(UG, **options)
plt.show()
Total running time of the script: (0 minutes 0.197 seconds)
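The line handling inside roget_graph() can be tried without the compressed data file. The sketch below is one possible refactoring, not part of the original script: `parse_roget_lines` is a hypothetical helper that applies the same comment, continuation, and head/tail rules to any iterable of text lines. The first sample entry is the one quoted in the description; the second entry (`401example`) is invented solely to exercise the backslash-continuation branch.

```python
import re


def parse_roget_lines(lines):
    """Yield (head, tails) pairs from lines laid out like the roget data.

    Hypothetical helper for illustration; it mirrors the loop in roget_graph().
    """
    numfind = re.compile(r"^\d+")  # leading category number
    pending = ""  # text buffered from a line that ended with a backslash
    for line in lines:
        if line.startswith("*"):  # comment line
            continue
        if line.startswith(" "):  # continuation: prepend the buffered text
            line = pending + line
        if line.endswith("\\\n"):  # entry continues on the next line
            pending = line.strip("\\\n")
            continue
        headname, tails = line.split(":")
        head = numfind.findall(headname)[0]
        yield head, tails.split()


sample = [
    "* a comment line\n",
    "400pungency:400 401 403 405\n",  # entry quoted in the description
    "401example:400 402\\\n",         # invented entry, continued below
    " 403\n",
]

edges = [(head, tail) for head, tails in parse_roget_lines(sample) for tail in tails]
print(len(edges))  # 7
print(edges[0])    # ('400', '400')
```

Feeding the resulting pairs to `G.add_edge` would reproduce the graph-building step, and keeping the parsing separate from file access makes the format rules easier to test in isolation.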