Python:freeware tool for scientific data analysis

Python® is a dynamic object-oriented programming language. In this page, I explain simple applications of python in scientific computations. Python runs on Linux/Unix, Mac OS X, OS/2, and Windows. Python is distributed under an OSI-approved open source license that makes it free to use, even for commercial products. For a general introduction to python, you may read the online book Dive into Python.



Packages for Data Analysis:

Python is a high-level programming language, freely available for the scientific community. This motivated people to use it as an alternative for other data reduction programs. For instance, The ALMA Common Software (ACS) or EVLA projects plan to use python-based routines as their main analysis tool. It is for this that they invented CASA, which replaced the late AIPS package. The Space Telescope Science Institute (STSCI) also supports python extensively. They provide the main package for handling the FITS file, the Pyfits. They also presented Pyraf, A Python-based command language for IRAF. Moreover, they significantly contributed to the array manipulation routines, namely numarray, which was the back bone for the new NumPy module. The European Space Agency (ESA) also joined the game and supported some python-based programs beside ALMA, like AstroAsciiData. NASA develops an ASTRON-like library for Python called AstroLib, like the IDL ASTRON library provided by NASA. In short, high performance and feasibility of Python made it a favorite package among scientific data analysis tools.



First things first !

You have to install Python and a couple of necessary libraries on your computer. If you use Linux, most of them are accessible thought your package manager. There are three packages which are especially required for the astronomical data analysis: 1) Pyfits, provided by STSCT/NASA, to read/write the FITS file, 2) NumPy and SciPy for working with numeric arrays and mathematical operation, and 3) Matplotlib, a collection of MatLab routines to plot a variety of data. Some extra packages, relevant for astronomical data reduction, are explained below.

------------------------------------------------------------------------

1) PyFITS

PyFITS provides an interface to FITS formatted files under the Python scripting language. It is useful both for interactive data analysis and for writing analysis scripts in Python using FITS files as either input or output. PyFITS is a development project of the Science Software Branch at the Space Telescope Science Institute. The manual provides a tutorial on how to use PyFITS with FITS images and FITS tables, along with an extensive description of all the methods (currently) available for working with FITS files.

------------------------------------------------------------------------

2) Numpy

The fundamental package needed for scientific computing with Python is called NumPy. This package contains:

a powerful N-dimensional array object

sophisticated (broadcasting) functions

basic linear algebra functions

basic Fourier transforms

sophisticated random number capabilities

tools for integrating Fortran code.

Documentation for NumPy or documentation as PDF

if you got used to IDL, see NumPy for IDL users page.


An interesting tutorial:

This is intended to show how Python can be used to interactively analyze astronomical data much in the same way IDL can. This tutorial has been written by Perry Greenfield and Robert Jedrzejewski to illustrate how one can use Python to do interactive data analysis in astronomy (in much the same style as is now popular with IDL). The focus is initially in showing someone how to get going quickly in using Python to do interactive tasks, and only later on teaching details of how to program in Python.

2') SciPy

SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization. Together, they run on all popular operating systems, are quick to install, and are free of charge. NumPy and SciPy are easy to use, but powerful enough to be depended upon by some of the world's leading scientists and engineers...

Besides its main package which provides numerous scientific modules like array manipulation, integration, interpolation, linear algebra, FFT, special functions, etc, you can include topical packages depending on your special tasks. For a complete list of the topical software within SciPy, see Topical Packages.

------------------------------------------------------------------------

3) Matplotlib

Matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hard copy formats and interactive environments across platforms. Matplotlib can be used in python scripts, the python and ipython shell (ala MatLab or Mathematica), web application servers, and six graphical user interface toolkits.

Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc, with just a few lines of code. (PDF document, Tutorial)

Some examples of Matplotlib....

------------------------------------------------------------------------

Additional packages:


AstroLib

This page is intended to serve as a means for coordinating ideas about how astronomical utility libraries should be built for Python analogous to the ASTRON library for IDL. STScI would like to help develop an ASTRON-like library for Python. We have some resources to devote to it now and are starting work on building such tools. (for reference; here is the link to the documentation for the IDL "ASTRON library").

------------------------------------------------------------------------

AstroAsciiData

This is a Python module to handle ASCII tables. Features:

Imports all reasonably well-formed ASCII tables

Column-first access

Easy creation and manipulation of tables, columns, rows and attached comments

Retains formatting of data values

Column sorting

Interchangeable comment character, column delimiter and null value

Exports data to: ASCII , FITS table , HTML table format , and LaTeX table format.

------------------------------------------------------------------------

Wavelets

PyWavelets is a Python module for computing forward and inverse 1D and 2D Discrete Wavelet Transform, Stationary Wavelet Transform and Wavelet Packets decomposition and reconstruction.

------------------------------------------------------------------------

DISLIN

DISLIN is a high-level plotting library for displaying data as curves, polar plots, bar graphs, pie charts, 3D-color plots, surfaces, contours and maps... It was provided by Max Planck Institute for Solar System Research, Lindau, Germany.

------------------------------------------------------------------------

PyRAF

PyRAF is an environment that combines the power and flexibility of Python with the ability to use IRAF and STSDAS tasks...

------------------------------------------------------------------------

PyANA

In Python, ANA files can be read in using PyAna. This library is based on the code used for the IDL DLM library and provides more or less similar functionality. It has been tested and should work without too much problems. The most recent version can be obtained from Pyana @ Github

------------------------------------------------------------------------

CASA

It is the suite of C++ application libraries for the reduction and analysis of radio astronomical data (derived from the former AIPS++ package)

and much more

For example, you can use unix commands in your python command line if you include unix.py and put it into your $PYTHONPATH, insert this line into ~/.pythonrc or execute manually:

from unix import *

Now, you can use commands like ls in your python shell.

You can also connect to your C++ and Fortran subroutines by F2py and weave.



Examples

You can start using Python either in its shell script or by running codes. Note that like C++, you have to include the libraries you need in each code at the first few lines.

Simple unix commands within python:

>>> import shutil                                      #a python built-in utility to copy files and directory trees
>>> shutil.copy2('file1.fits','file2.fits')         # equal to "cp -p source destination"

>>> from unix import *
>>>ls                                                        # or ll, pwd, cd, less, cat, edit...



Get system time:
>>> import time
>>> time.localtime()
(2008, 11, 6, 21, 58, 18, 3, 311, 0)          # year, month, day, hour, minute, second, day in week, day in year, ...
>>> time.time()
1226005588.412873                               # return the time as a floating point number expressed in seconds

to simply measure the execution time (profiler):

>>> import time
>>> t1 = time.time()
>>> ...........

>>> elapsed = time.time() - t1

or having a pause (wait):
>>> time.sleep(x)                                   # x is waiting time in seconds

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

to run a bash script within python, e.g., making a movie 
out of a set of images, 

>>> import os, sys

>>>

>>> os.system("mencoder 'mf://img*.png' -mf type=png:fps=10 \\

-ovc lavc -lavcopts vcodec=wmv2 -oac copy -o animation.mpg")



>>> # os.system execute the command (a string) in a subshell

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

to access help for each command: 

>>> help(command_name)



- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
I usually import the following Libs:

>>> import pyfits
>>> from pylab import *
>>> from numpy import *

To plot the function y=exp(x) in the range [0., 1.], one can do the following:

>>> from pylab import *
>>> a = arange(0,1,.02)
>>> b = exp(a)
>>> plot(a,b,'r—')   # 'r' means with red color and '-' means with dashed linestyle
>>> show()

At this point, you have an interactive window. You can simple resize it, drag it, save the figure
as image (e.g., PNG) or postscript(eps).

conversion between string and float data types:

>>> a = 2.50000074675 # float to string
>>> print str(a)
2.50000074675

>>> b = ' -2.60300 ' # string to float
>>> print float(b)
-2.603
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
to plot something with errorbars :

>>> from pylab import *
>>>
>>> t = arange(0.1, 4, 0.1)
>>> s = exp(-t)
>>> e = 0.1*abs(randn(len(s)))
>>> f = 0.1*abs(randn(len(s)))
>>> g = 2*e
>>> h = 2*f
>>>
>>> figure()

>>> errorbar(t, s, e, fmt='o')             # vertical symmetric

>>> show()

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
to plot something with legend :

>>> from pylab import *
>>> import sys
>>>
>>> N = 100
>>> x = arange(N) * .02 * pi
>>> plot(x, sin(x), 'o', label='y=sin(x)')
>>> legend()
>>> show()

or you can use  a complicated LaTex command:

>>> legend([r"$\sqrt{x^2}$"])  # using Matplotlib internal LaTex engine

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

to read/write an ASCII file:
Python includes a built-in file type. Files can be opened by using the file type's constructor:

>>> f = file('test.txt', 'r')

This means f is open for reading. The first argument is the filename and
the second parameter is the mode, which can be 'r', 'w', or 'rw', among some others.
The most common way to read from a file is simply to iterate over the lines of the file:

>>>f = open('test.txt', 'r')
>>>for line in f:
.....        print line
>>>f.close()

A file object implicitly contains a marker to represent the current position.
If the file marker should be moved back to the beginning, 
one can either close the file object and reopen it or just move the marker back to the beginning with:

>>>f.seek(0)



An alternative is to use ASCIIDATA module to read ASCII TABLES:
>>> import asciidata
>>> a= asciidata.open('your_file.txt')
>>> print a[1][2]                                 # it reads the second character of the third line
-3.9

read/write numeric array using Scipy:
>>> from numpy import *
>>> from scipy.io import write_array
>>> ......
>>> write_array("myfile.txt", array)

see this link for more examples.


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
to read a FITS file and its header

>>> import pyfits
>>> a = pyfits.open(your_file)
>>> a.info()                                # to get some information about the data structure
 
>>> data = a[0].data.copy()        #in this way, you read the whole data in the first file.

>>> a = pyfits.open(your_file) 
>>> data = a[0].section[n:m, :, :] # in this way, you can read m images, starting at image n
or
>>> map = a[0].section[n, :, :]     # in this way, you can read only image n 

>>> print 'image size is ', map.shape           # to get array size     
>>> print 'data type is', map.dtype.name        # to get array data type

>>> header = pyfits.getheader(your_file)
>>> print ' lambda=', header['WAVELNTH'], ' exptime=', header['EXPTIME']

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

statistical properties of an array
>>> import numpy
>>> mu1= 0.0    # mean value of the normal distribution
>>> std1 = 1.0    # standard deviation of the normal distribution
>>> x = numpy.random.normal(mu1, std1, (100,100))  # a 100X100 array
>>> x.mean()
0.0081032799328860788
>>> x.std()
0.99726626000076857

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

to fit a polynomial to the data
>>> import numpy
>>> a = arange(10)*1.0
>>> b = 2.*power(a,2) - 3.65*a + 1.25 # create an arbitrary polynomial
>>> x = polyfit(a,b,2) # we have to fit a second order polynomial in this case
>>> print x
[ 2. -3.65 1.25]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

to create a histogram 
>>> from pylab import *
>>> rcParams['text.fontname'] = 'cmr18'  # to change the font size
>>> mu, sigma = 10.0, 1.5
>>> x = mu + sigma*randn(100,100)
>>> n, bins, patches = hist(x, 100, normed=1) # the histogram of the data
>>> xlabel('value')        
>>> ylabel('probability')
>>> title(r'$ \mu=10.0,\/ \sigma=1.5$')
>>> show()

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

to create a map with colorbar : assume that your array is x,
>>> imshow(x,cmap=cm.jet)
>>> colorbar()
add a contour to your map
>>> levels=[-5.,5.]
>>> contour(x,levels,linewidths=1.5)
>>> show()

to see which color tables are available:

>>> help(matplotlib.cm)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

to create a map and overlay contour of another map on it:

>>> extent=(0,12,0, 12)    #dimension of the map
>>> figure()
>>> im = imshow(map1, interpolation='nearest', origin='lower',cmap=cm.gray,extent=extent)
>>> v = axis()
>>> cs = colorbar(orientation='vertical', ticks=(.8, .9, 1., 1.1, 1.2))  # colorbar if you want
>>> levels = (-3., -1., 1., 3.)   # levels for the contour of map2 that will be overlayed
>>> # one can use contourf to have filled contours
>>> cset = contour(map2, levels, origin='lower',cmap=cm.jet,linewidths=1.9,extent=extent)
>>> for c in cset.collections:
                c.set_linestyle('solid')        # solid, dashed, ...
>>> savefig('map.eps', dpi=150)     #  ps, eps, Jpeg, PNG,....

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

to create a random number : use built-in random module,


>>> random.random()        # Random float x, 0.0 <= x < 1.0

0.37444887175646646

>>> random.uniform(1, 10)  # Random float x, 1.0 <= x < 10.0

1.1800146073117523

>>> random.randint(1, 10)  # Integer from 1 to 10, endpoints included

7

>>> random.randrange(0, 101, 2)  # Even integer from 0 to 100

26

>>> random.choice('abcdefghij')  # Choose a random element

'c'

to create a array of random number,

>>> random.uniform(0,1,(10,10))         # Uniform: 10X10 array
>>> random.standard_normal((10,))    # Normal distribution, 1D array with 10 elements

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

to perform Fourier spectrogram analysis:


>>> from pylab import *
>>>
>>> dt = 0.0005
>>> t = arange(0.0, 20.0, dt)
>>> s1 = sin(2*pi*100*t)
>>> s2 = 2*sin(2*pi*400*t)
>>>
>>> # create a transient "chirp"
>>> mask = where(logical_and(t>10, t<12), 1.0, 0.0)
>>> s2 = s2 * mask
>>>
>>> nse = 0.01*randn(len(t))   # add some noise
>>>
>>> x = s1 + s2 + nse    # the signal
>>> NFFT = 1024            # the length of the windowing segments
>>> Fs = int(1.0/dt)         # the sampling frequency
>>>
>>> # Pxx is the segments x freqs array of instantaneous power, freqs is
... # the frequency vector, bins are the centers of the time bins in which
... # the power is computed, and im is the matplotlib.image.AxesImage
... # instance
...
>>> ax1 = subplot(211)
>>> plot(t, x)
>>> subplot(212, sharex=ax1)
>>> Pxx, freqs, bins, im = specgram(x, NFFT=NFFT, Fs=Fs, noverlap=900)
>>> show()

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

the for loop statement:


>>> for i in arange(1,5):
...      print i
... else:
...      print 'The For Loop Is Over.'
...
1
2
3
4
The For Loop Is Over.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

the while loop statement:


>>> i=1
>>> while i<=4:
...      print i, "x 8 =", i*8
...      i = i+1
...
1 x 8 = 8
2 x 8 = 16
3 x 8 = 24
4 x 8 = 32

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

the if statement:


>>> i=2
>>> if (i<3):
...     print i
...
2

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

to replace values using the where command:


>>> a=pyfits.open('da.fits')
>>> da=a[0].data.copy()
>>> da[where(da > .5)] = 1.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

have a look at the wavelets:


>>> import pywt
>>> print pywt.families()
['haar', 'db', 'sym', 'coif', 'bior', 'rbio', 'dmey']

or wavelet coefficients and reconstruction:

>>> import pywt
>>> coeffs = pywt.wavedec([1.1,2.2,3.3,4.4,5.5,6.6,7.7,8.8], 'db2', level=2)
>>> print pywt.waverec(coeffs, 'db2')
[ 1.1 2.2 3.3 4.4 5.5 6.6 7.7 8.8]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
There are lots of other examples you can have a look to get some ideas, 1, 2, 3, ...

see also Python based scientific analysis cookbook for a short introduction.


Many thanks to Frederic Paletou, LATT(Toulouse).

last update: October 15, 2015