PyTables

PyTables 2.0b2 "a wise old owl's milestone"

PyTables 2.0 is a milestone release whose main improvement is that NumPy is now at the core.
Optimizations and code refactoring have made most operations faster, in some cases by up to 2x.

Milestone information

Project:: PyTables

Series:: trunk

Version:: 2.0b2

Code name:: a wise old owl's milestone

Released:: 2007-04-10

Registrant:: Dieter Hering

Release registered:: 2007-04-10

Active:: No. Drivers cannot target bugs and blueprints to this milestone.


Release notes

===========================
Announcing PyTables 2.0b2
===========================

PyTables is a library for managing hierarchical datasets, designed to
cope efficiently with extremely large amounts of data, with support for
full 64-bit file addressing. PyTables runs on top of the HDF5 library
and the NumPy package to achieve maximum throughput and convenient use.

The PyTables development team is happy to announce the public
availability of the second *beta* version of PyTables 2.0. This will
hopefully be the last beta version of the 2.0 series, so we need your
feedback now if you want your issues solved before 2.0 final is
released.

You can download a source package of version 2.0b2, with generated
PDF and HTML docs, as well as binaries for Windows, from
http://www.pytables.org/download/preliminary/

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.0b2
Please keep in mind that some sections of the manual may be obsolete
(especially the "Optimization tips" chapter). The other chapters should
be fairly up to date, although still somewhat in flux.

If you want to know in more detail what has changed in this version,
have a look at ``RELEASE_NOTES.txt``. You can find an HTML version of
this document at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.0b2

If you are a user of PyTables 1.x, it is probably worth your while to
look at the ``MIGRATING_TO_2.x.txt`` file, where you will find
directions on how to migrate your existing PyTables 1.x apps to the 2.0
version. You can find an HTML version of this document at
http://www.pytables.org/moin/ReleaseNotes/Migrating_To_2.x

Keep reading for an overview of the most prominent improvements in
the PyTables 2.0 series.

New features of PyTables 2.0
============================

- NumPy is finally at the core! That means that PyTables no longer
  needs numarray in order to operate, although it continues to be
  supported (as well as Numeric). This also means that you should be
  able to run PyTables in scenarios combining Python 2.5 and 64-bit
  platforms (these are a source of problems with numarray/Numeric
  because they don't support this combination as of this writing).

- Most of the operations in PyTables have experienced noticeable
  speed-ups (sometimes up to 2x, as in regular Python table
  selections). This is a consequence of both using NumPy internally and
  a considerable effort in refactoring and optimizing the new code.

- Combined conditions are finally supported for in-kernel selections.
  So, now it is possible to perform complex selections like::

      result = [ row['var3'] for row in
                 table.where('(var2 < 20) | (var1 == "sas")') ]

  or::

      complex_cond = '((%s <= col5) & (col2 <= %s)) ' \
                     '| (sqrt(col1 + 3.1*col2 + col3*col4) > 3)'
      result = [ row['var3'] for row in
                 table.where(complex_cond % (inf, sup)) ]

  and run them at full C-speed (or perhaps more, due to the cache-tuned
  computing kernel of Numexpr, which has been integrated into PyTables).

- Now, it is possible to get fields of the ``Row`` iterator by
  specifying their position, or even ranges of positions (extended
  slicing is supported). For example, you can do::

      result = [ row[4] for row in table # fetch field #4
                 if row[1] < 20 ]
      result = [ row[:] for row in table # fetch all fields
                 if row['var2'] < 20 ]
      result = [ row[1::2] for row in # fetch odd fields
                 table.iterrows(2, 3000, 3) ]

  in addition to the classical::

      result = [row['var3'] for row in table.where('var2 < 20')]

- ``Row`` has received a new method called ``fetch_all_fields()`` in
  order to easily retrieve all the fields of a row in situations like::

      [row.fetch_all_fields() for row in table.where('column1 < 0.3')]

  The difference between ``row[:]`` and ``row.fetch_all_fields()`` is
  that the former will return all the fields as a tuple, while the
  latter will return the fields in a NumPy void type and should be
  faster. Choose whatever fits better to your needs.

- Now, all data that is read from disk is converted, if necessary, to
  the native byteorder of the hosting machine (before, this only
  happened with ``Table`` objects). This should help to accelerate
  applications that have to do computations with data generated on
  platforms whose byteorder differs from that of the user's machine.

- The modification of values in ``*Array`` objects (through
  ``__setitem__``) no longer makes a copy of the value when the shape of
  the value passed is the same as that of the slice to be overwritten.
  This results in considerable memory savings when you are modifying
  disk objects with big array values.

- All the leaf constructors have received a new parameter called
  ``byteorder`` that lets the user specify the byteorder of their data
  *on disk*. This makes it possible to create datasets in byteorders
  other than the native one of the platform.

- Native HDF5 datasets with ``H5T_ARRAY`` datatypes are fully supported
  for reading now.

- The test suites for the different packages are installed now, so you
  don't need a copy of the PyTables sources to run the tests. Besides,
  you can run the test suite from the Python console by using::

      >>> tables.tests()
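
The parentheses around each comparison in the combined conditions shown
earlier matter. Assuming condition strings are parsed with Python's own
operator precedence (where ``|`` and ``&`` bind more tightly than
comparisons), leaving them out changes the meaning of the condition.
The following sketch uses plain Python values instead of table columns;
the sample ``rows`` and the helper ``selected()`` are made up for
illustration and are not part of PyTables.

```python
# Plain-Python sketch; no PyTables required. The sample rows are invented.
rows = [
    {"var1": "sas", "var2": 25, "var3": 1.5},
    {"var1": "foo", "var2": 10, "var3": 2.5},
    {"var1": "bar", "var2": 30, "var3": 3.5},
]


def selected(rows):
    """Return var3 for rows matching (var2 < 20) | (var1 == "sas")."""
    return [r["var3"] for r in rows
            if (r["var2"] < 20) | (r["var1"] == "sas")]


result = selected(rows)

# Without parentheses, `|` is applied before the comparisons:
#   a < 20 | b == 5   is read as   a < (20 | b) == 5
a, b = 1, 5
with_parens = (a < 20) | (b == 5)   # two comparisons combined with "or"
without_parens = a < 20 | b == 5    # chained comparison against (20 | b)
```

Here ``with_parens`` and ``without_parens`` end up with different truth
values, which is why each sub-condition is wrapped in its own pair of
parentheses.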
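
The positional access to ``Row`` fields described earlier follows the
usual Python indexing and extended-slicing rules. As a rough
illustration that does not need PyTables, the sketch below uses plain
tuples in place of rows and a list in place of a table (both are
stand-ins invented for this example), and assumes that
``iterrows(start, stop, step)`` visits rows the way a slice with the
same arguments would.

```python
# Plain tuples stand in for table rows; the field values are invented.
table = [(i, i * 2, "name%d" % i, i / 10.0, i % 3) for i in range(10)]

# Fetch a single field by position, filtering on another position.
field4 = [row[4] for row in table if row[1] < 6]

# Fetch every field of a row as a tuple.
all_fields = [row[:] for row in table[:2]]

# Fetch the fields at odd positions (1 and 3) from every third row,
# starting at row 2 and stopping before row 9.
odd_fields = [row[1::2] for row in table[2:9:3]]
```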
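
To make the byteorder conversion on read more concrete, here is a
minimal sketch of what converting stored data to the native byteorder
means, using only the standard ``struct`` module. It is not how
PyTables implements the conversion; the helper ``to_native()`` and the
sample values are hypothetical.

```python
import struct
import sys


def to_native(raw, stored_byteorder, count):
    """Re-encode `count` 32-bit integers from the stored byte order
    ("big" or "little") into the byte order of the running machine."""
    prefix = ">" if stored_byteorder == "big" else "<"
    values = struct.unpack("%s%di" % (prefix, count), raw)
    return struct.pack("=%di" % count, *values)


# Bytes as they might have been written on a big-endian machine.
on_disk = struct.pack(">3i", 1, 2, 3)
native = to_native(on_disk, "big", 3)

# The bytes change only if the host byte order differs from the stored one.
swapped = native != on_disk
```

Once the data is in native order, later computations can use it
directly without swapping bytes on every access.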

Changelog


- A very exhaustive overhaul of the User's Manual is in progress.
  Chapters 1 (Introduction), 2 (Installation) and 3 (Tutorials) have
  been completed (and hopefully the lines of code are easier to copy
  and paste now), while chapter 4 (API Reference) has been done up to
  (and including) the ``Table`` class. During this tedious (but
  critical for a library) work, we have tried hard to synchronize the
  text in the User's Guide with the text that appears in the
  docstrings.

- Removed the ``recursive`` argument in ``Group._f_walkNodes()``. Using
  it with a false value was redundant with ``Group._f_iterNodes()``.
  Fixes ticket #42.

- Removed the ``coords`` argument from ``Table.read()``. It was
  undocumented and redundant with ``Table.readCoordinates()``. Fixes
  ticket #41.

- Fixed the signature of ``Group.__iter__()`` (by removing its
  parameters).

- Added new ``Table.coldescrs`` and ``Table.description._v_itemsize``
  attributes.

- Added a couple of new attributes for leaves:

  - ``nrowsinbuf``: the number of rows that fit in the internal buffers.
  - ``chunkshape``: the chunk size for chunked datasets.

- Fixed setuptools so that making an egg out of the PyTables 2 package
  is possible now.

- Added a new ``tables.restrict_flavors()`` function that restricts the
  available flavors to a given set. This can be useful, for example, if
  you want to force PyTables to get NumPy data out of an old,
  numarray-flavored PyTables file even if the numarray package is
  installed.

- Fixed a bug which caused filters of unavailable compression libraries
  to be loaded as using the default Zlib library, after issuing a
  warning. Added a new ``FiltersWarning`` and a ``Filters.copy()``.
