Friday, January 6, 2012

R or Python as a statistical workbench

There is a nice discussion about choosing R or Python as your statistical workbench. It also points to many good Python resources.

I have collected some of them here:
########################################
(1)

  • NumPy/SciPy You probably know about these already. But let me point out the Cookbook, where you can read about many statistical facilities already available, and the Example List, which is a great reference for functions (including data manipulation and other operations). Another handy reference is John Cook's Distributions in SciPy.
  • pandas This is a really nice library for working with statistical data -- tabular data, time series, panel data. Includes many built-in functions for data summaries, grouping/aggregation, pivoting. Also has a statistics/econometrics library.
  • larry Labeled array that plays nicely with NumPy. Provides statistical functions not present in NumPy and is good for data manipulation.
  • python-statlib A fairly recent effort which combined a number of scattered statistics libraries. Useful for basic and descriptive statistics if you're not using NumPy or pandas.
  • statsmodels Statistical modeling: Linear models, GLMs, among others.
  • scikits Statistical and scientific computing packages -- notably smoothing, optimization and machine learning.
  • PyMC For your Bayesian/MCMC/hierarchical modeling needs. Highly recommended.
  • PyMix Mixture models.
    If speed becomes a problem, consider Theano -- used with good success by the deep learning people.
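As a rough illustration of the data summaries, grouping/aggregation and pivoting mentioned for NumPy and pandas above, here is a minimal sketch. The column names and the toy data are made up for this example and do not come from the original discussion.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two groups measured in two years.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "a", "b"],
    "year": [2010, 2011, 2010, 2011, 2011, 2010],
    "value": [1.0, 2.0, 3.0, 4.0, 4.0, 5.0],
})

# Data summary with NumPy: mean and sample standard deviation.
values = df["value"].to_numpy()
overall_mean = values.mean()
overall_std = values.std(ddof=1)

# Grouping/aggregation: mean of "value" within each group.
group_means = df.groupby("group")["value"].mean()

# Pivoting: groups as rows, years as columns, mean value in each cell.
pivot = df.pivot_table(index="group", columns="year",
                       values="value", aggfunc="mean")

print(group_means)
print(pivot)
```

The same pattern should carry over to real data read from a file; only the column names passed to `groupby` and `pivot_table` would change.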
    ########################################
    (2)

    • matplotlib for beautiful, publication quality graphics.
    • IPython for an enhanced interactive Python console. Importantly, IPython provides a powerful framework for interactive parallel computing in Python.
    • Cython for easily writing C extensions in Python. This package lets you take a chunk of computationally intensive Python code and easily convert it to a C extension. You'll then be able to load the C extension like any other Python module, but the code will run very fast since it is in C.
    • PyIMSL Studio for a collection of hundreds of mathematical and statistical algorithms that are thoroughly documented and supported. You can call the exact same algorithms from Python and C, with nearly the same API, and you'll get the same results. Full disclosure: I work on this product, but I also use it a lot.
    • xlrd for reading in Excel files easily.
    If you want a more MATLAB-like interactive IDE/console, check out Spyder, or the PyDev plugin for Eclipse.
    ########################################
    (3)

    I know there's also rpy, where Python can call R functions. This can be useful, but if you're "just" doing statistics then I would use R.
    ########################################
    (4)

    The following StackOverflow discussions might be useful

    Other useful packages specifically for data structures include:
    • pydataframe replicates a data.frame and can be used with rpy. Allows you to use R-like filtering and operations.
    • pyTables Uses the fast HDF5 format underneath; has been around for ages.
    • h5py Also HDF5, but specifically aimed at interoperating with NumPy.
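To give an idea of what "interoperating with NumPy" can look like with h5py, here is a small sketch that writes an array to an HDF5 file and reads it back. The file name and dataset name are hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import h5py
import numpy as np

# Toy array: 3 rows, 4 columns, values 0..11.
data = np.arange(12, dtype=np.float64).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.h5")  # hypothetical file name

    # Write the array as a named dataset.
    with h5py.File(path, "w") as f:
        f.create_dataset("measurements", data=data)

    # Read it back; slicing a dataset returns a NumPy array.
    with h5py.File(path, "r") as f:
        restored = f["measurements"][:]
        # Only the first two columns are selected here.
        column_means = f["measurements"][:, :2].mean(axis=0)

print(restored.shape, column_means)
```

Slicing the dataset instead of loading it whole is the part that tends to matter for data sets that do not fit comfortably in memory.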
    ########################################
    (5)

    What you are looking for is called Sage: http://www.sagemath.org/
    It is an excellent online interface to a well-built combination of Python tools for mathematics.
    ########################################
    (6)

    I like the http://enthought.com/ Python distribution. It's commercial, yet free for academic purposes and, as far as I know, completely open-source. As I'm working with a lot of students, before using Enthought it was sometimes troublesome for them to install numpy, scipy, ipython, etc. Enthought provides an installer for Windows, Linux and Mac.
    Two other packages worth mentioning:
    1. ipython (comes already with Enthought), a great advanced shell. A good intro is on ShowMeDo: http://showmedo.com/videotutorials/series?name=PythonIPythonSeries
    2. nltk - the natural language toolkit http://www.nltk.org/ A great package in case you want to do some statistics/machine learning on any corpus.
