External libraries

Dr Andy Evans

External libraries

  • A very complete list can be found at PyPI, the Python Package Index:
    https://pypi.python.org/pypi
  • To install, use pip, which comes with Python:
    pip install package
  • or download, unzip, and run the installer directly from the directory:
    python setup.py install
  • If you have Python 2 and Python 3 installed, use pip3 (though not with Anaconda) or make sure the right version is first in your PATH.
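
With more than one Python installed, it can be unclear which interpreter a package ended up in. A minimal sketch, using only the standard library, that reports the running interpreter and checks whether a package can be imported from it. The helper name is_installed is hypothetical, not part of pip or Python:

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the running interpreter can find the named top-level package."""
    # Hypothetical helper for illustration; find_spec returns None when nothing is found.
    return importlib.util.find_spec(package_name) is not None


# The interpreter actually running this script, and its major/minor version.
print(sys.executable)
print(sys.version_info[:2])

# 'json' ships with Python, so it should always be found.
print(is_installed("json"))
```

Running pip as "python -m pip install package" ties the install to whichever interpreter "python" resolves to, which avoids most PATH confusion.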

Numpy

Numpy data

  • Perhaps the nicest thing about numpy is its handling of complicated 2D datasets. It has its own array types which overload the indexing operators. Note how the example below differs from the standard [1d][2d] notation:
    import numpy
    data = numpy.int_([
    [1,2,3,4,5],
    [10,20,30,40,50],
    [100,200,300,400,500]
    ])

    print(data[0,0]) # 1
    print(data[1:3,1:3]) # [[20 30][200 300]]
    On a standard list, data[1:3][1:3] wouldn't work, because the second slice applies to the list of rows again rather than to the columns; at best data[1:3][0][1:3] would give you [20, 30].
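
For comparison, a short sketch of the same selection on a plain nested list, with no numpy involved. It shows what chained slices actually return and one way (a list comprehension, chosen here for illustration) to recover the 2x2 block:

```python
data = [
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
]

# Chained slices both act on the outer list of rows:
# data[1:3] keeps rows 1 and 2, then [1:3] keeps only item 1 of that result.
rows_only = data[1:3][1:3]
print(rows_only)  # [[100, 200, 300, 400, 500]]

# To get the 2x2 block, each selected row has to be sliced separately.
block = [row[1:3] for row in data[1:3]]
print(block)  # [[20, 30], [200, 300]]
```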

Numpy operations

  • You can additionally do maths on the arrays, including matrix manipulation.
    import numpy
    data = numpy.int_([
    [1,2,3,4,5],
    [10,20,30,40,50],
    [100,200,300,400,500]
    ])
    print(data[1:3,1:3] - 10) # [[10 20],[190 290]]
    print(numpy.transpose(data[1:3,1:3])) # [[20 200],[30 300]]
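
A few more operations on the same 2x2 block, as a rough sketch. It assumes numpy is installed, and the variable names are illustrative only. The main point is the difference between element-wise arithmetic and true matrix multiplication:

```python
import numpy

block = numpy.array([
    [20, 30],
    [200, 300],
])

# Arithmetic operators work element by element.
doubled = block * 2

# The @ operator performs matrix multiplication; here the block times its transpose.
product = block @ numpy.transpose(block)

# Reductions can run along an axis: 0 = down the columns, 1 = along the rows.
column_sums = block.sum(axis=0)
row_sums = block.sum(axis=1)

print(doubled)
print(product)
print(column_sums, row_sums)
```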

Pandas

Pandas data

  • Pandas data centres on DataFrames, 2D arrays with additional abilities to name and use rows and columns.
    import pandas
    df = pandas.DataFrame(
    data, # numpy array from before.
    index=['i','ii','iii'], columns=['A','B','C','D','E'])
    print(df['A']) # Column 'A', indexed by row name.
    print(df.mean(0)['A']) # Mean down column 'A'.
    print(df.mean(1)['i']) # Mean along row 'i'.
  • Prints (the dtype shows as int64 on some platforms):
    i 1
    ii 10
    iii 100
    Name: A, dtype: int32
    37.0
    3.0
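
The row and column names can also be used for selection. A minimal sketch, assuming numpy and pandas are installed and rebuilding the same DataFrame so the block runs on its own. The variable names are illustrative:

```python
import numpy
import pandas

data = numpy.array([
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
])
df = pandas.DataFrame(
    data,
    index=['i', 'ii', 'iii'],
    columns=['A', 'B', 'C', 'D', 'E'],
)

# .loc selects by label: row 'ii', column 'C'.
cell = df.loc['ii', 'C']

# Label slices include both end points, unlike positional slices.
block = df.loc['ii':'iii', 'B':'C']

# axis=0 averages down each column; axis=1 averages along each row.
column_means = df.mean(axis=0)
row_means = df.mean(axis=1)

print(cell)
print(block)
```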

scikit-learn

  • http://scikit-learn.org/
  • Scientific analysis and machine learning. Founded on Numpy data formats.

Beautiful Soup

Tweepy

NLTK

  • http://www.nltk.org/
  • Natural Language Toolkit.
  • Parse text and analyse everything from Parts Of Speech to positivity or negativity of statements (sentiment analysis).

Celery