19. My PySpark Package

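Wrapping your own code into a Python package is straightforward. I collected some functions that I use frequently in my daily work into one package, which you can download and install from My PySpark Package. The hierarchy and directory structure of the package are shown below.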

19.1. Hierarchical Structure

|-- build
|   |-- bdist.linux-x86_64
|   |-- lib.linux-x86_64-2.7
|       |-- PySparkTools
|           |-- __init__.py
|           |-- Manipulation
|           |   |-- DataManipulation.py
|           |   |-- __init__.py
|           |-- Visualization
|               |-- __init__.py
|               |-- PyPlots.py
|-- dist
|   |-- PySParkTools-1.0-py2.7.egg
|-- __init__.py
|-- PySparkTools
|   |-- __init__.py
|   |-- Manipulation
|   |   |-- DataManipulation.py
|   |   |-- __init__.py
|   |-- Visualization
|       |-- __init__.py
|       |-- PyPlots.py
|       |-- PyPlots.pyc
|-- PySParkTools.egg-info
|   |-- dependency_links.txt
|   |-- PKG-INFO
|   |-- requires.txt
|   |-- SOURCES.txt
|   |-- top_level.txt
|-- README.md
|-- requirements.txt
|-- setup.py
|-- test
    |-- spark-warehouse
    |-- test1.py
    |-- test2.py

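From the hierarchy above, you will notice that each package directory (PySparkTools, Manipulation and Visualization) contains an __init__.py file. You need this file in every such directory: it marks the directory as a regular Python package, and find_packages() in setup.py relies on it to discover which packages to install.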
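The role of __init__.py can be illustrated with a small, standard-library-only sketch. It imitates, in simplified form, the rule used by setuptools.find_packages(): a directory counts as a package only if it contains an __init__.py, and the search does not descend into directories that are not packages. The helper find_regular_packages and the throwaway directory layout are hypothetical (built to mirror part of the tree above); this is not the setuptools implementation.

```python
import os
import tempfile


def find_regular_packages(where):
    """Return dotted names of package directories under `where`.

    Hypothetical, simplified imitation of setuptools.find_packages():
    a directory is a package only if it holds an __init__.py, and
    non-package directories are neither reported nor searched further.
    """
    packages = []
    for root, dirs, files in os.walk(where):
        subdirs = list(dirs)
        dirs.clear()  # we decide below which subdirectories os.walk may enter
        for name in subdirs:
            full_path = os.path.join(root, name)
            # A dotted name cannot be a package name; no __init__.py means no package.
            if "." in name or not os.path.isfile(os.path.join(full_path, "__init__.py")):
                continue  # skip it and everything below it
            rel_path = os.path.relpath(full_path, where)
            packages.append(rel_path.replace(os.path.sep, "."))
            dirs.append(name)  # keep descending only into packages
    return sorted(packages)


def touch(path):
    """Create an empty file, making parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


# Hypothetical layout mirroring part of the tree above.
with tempfile.TemporaryDirectory() as repo:
    touch(os.path.join(repo, "PySparkTools", "__init__.py"))
    touch(os.path.join(repo, "PySparkTools", "Manipulation", "__init__.py"))
    touch(os.path.join(repo, "PySparkTools", "Manipulation", "DataManipulation.py"))
    touch(os.path.join(repo, "PySparkTools", "Visualization", "__init__.py"))
    touch(os.path.join(repo, "PySparkTools", "Visualization", "PyPlots.py"))
    touch(os.path.join(repo, "test", "test1.py"))  # no __init__.py: not a package
    found = find_regular_packages(repo)

print(found)
# ['PySparkTools', 'PySparkTools.Manipulation', 'PySparkTools.Visualization']
```

Deleting, say, Visualization/__init__.py from this layout would drop PySparkTools.Visualization from the result, which is why each package directory in the tree carries its own __init__.py.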

19.2. Set Up

from setuptools import setup, find_packages

# Use README.md as the long description; fall back to an empty string.
try:
    with open("README.md") as f:
        long_description = f.read()
except IOError:
    long_description = ""

# Read the dependencies from requirements.txt, one per non-blank line.
try:
    with open("requirements.txt") as f:
        requirements = [x.strip() for x in f.read().splitlines() if x.strip()]
except IOError:
    requirements = []

setup(name='PySParkTools',
      version='1.0',
      description='Python Spark Tools',
      author='Wenqiang Feng',
      author_email='von198@gmail.com',
      url='https://github.com/runawayhorse001/PySparkTools',
      # find_packages() collects the package directories, i.e. those with an __init__.py
      packages=find_packages(),
      install_requires=requirements,
      long_description=long_description
      )
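The list comprehension in setup.py keeps every non-blank line of requirements.txt. If you also keep comments or pip-only options (for example -r other.txt, which is not a package requirement) in that file, a slightly more defensive parser helps. This is a minimal sketch under that assumption; parse_requirements and the sample file content are hypothetical, not part of the package.

```python
def parse_requirements(text):
    """Turn requirements.txt content into a list for install_requires.

    Hypothetical helper: keeps one requirement per line and drops blank
    lines, full-line comments, trailing " #" comments and pip options
    such as "-r other.txt".
    """
    requirements = []
    for line in text.splitlines():
        line = line.split(" #", 1)[0].strip()  # drop a trailing " # ..." comment
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank line, full-line comment or pip option
        requirements.append(line)
    return requirements


# Hypothetical requirements.txt content.
sample = """
# core dependencies
pandas
matplotlib>=2.0  # plotting

-r dev-requirements.txt
"""

print(parse_requirements(sample))  # ['pandas', 'matplotlib>=2.0']
```

In setup.py this would replace the list comprehension, i.e. requirements = parse_requirements(f.read()).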

19.3. ReadMe

# PySparkTools

This is my PySpark Tools package. To clone, install and test it, use the following commands:

- clone

```{bash}
git clone git@github.com:runawayhorse001/PySparkTools.git
```
- install

```{bash}
cd PySparkTools
pip install -r requirements.txt
python setup.py install
```

- test

```{bash}
cd test  # run from inside the PySparkTools directory cloned above
python test1.py
```
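Before running the test scripts, a quick import check can confirm that the installation step put the packages on the Python path. This is a minimal sketch, assuming the package names shown in the tree above; the helper missing_packages and the script itself are hypothetical, not part of the repository. importlib.util.find_spec locates a top-level module without importing it, but for a dotted name such as PySparkTools.Manipulation it does import the parent package first.

```python
import importlib.util

# Package names taken from the directory tree above (assumed; adjust if yours differ).
PACKAGES = [
    "PySparkTools",
    "PySparkTools.Manipulation",
    "PySparkTools.Visualization",
]


def missing_packages(names):
    """Return the subset of `names` that cannot be located on sys.path."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:  # the parent package itself is missing
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_packages(PACKAGES)
    if problems:
        print("Not installed:", ", ".join(problems))
    else:
        print("All PySparkTools packages are importable.")
```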