Accessing HDF5 files with PDL::IO::HDF5

We have been thinking about using HDF5 files to store gene expression tables. Compared with flat files, they offer constant-time access to individual elements and slices. Compared with ordinary in-memory PDLs, they are not limited by a 32-bit architecture, so larger tables can be stored. Compared with MySQL tables, they avoid MySQL’s limits on the number of columns per table. So they look like a good solution. My colleagues are working on an R interface, so I wanted to try the existing Perl interface and see whether it can actually achieve the following design goals… Continue reading →
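Before writing any Perl, the slice access described above can be tried from the shell with `h5dump`, the command-line dumper that ships with the HDF5 tools. This is a sketch under assumptions, not the PDL::IO::HDF5 interface the post evaluates: the file name and the dataset path `/expression` are hypothetical, and it assumes a two-dimensional dataset with genes as rows and samples as columns. `-s` gives the starting offset and `-c` the number of elements to read along each dimension.

```shell
#!/usr/bin/env bash
# Sketch: read a rectangular slice of a 2-D HDF5 dataset with h5dump.
# Assumes the HDF5 command-line tools are installed; the dataset path
# passed in (e.g. /expression) is hypothetical.

# read_hdf5_slice FILE DATASET START COUNT
#   START and COUNT are comma-separated, one value per dimension,
#   e.g. START=5,0 COUNT=1,10 -> row 5, first 10 columns.
read_hdf5_slice() {
    local file="$1" dataset="$2" start="$3" count="$4"
    if ! command -v h5dump >/dev/null 2>&1; then
        echo "h5dump not found; install the HDF5 command-line tools" >&2
        return 127
    fi
    h5dump -d "$dataset" -s "$start" -c "$count" "$file"
}
```

For example, `read_hdf5_slice expression.h5 /expression 5,0 1,10` would print the first ten values of row 5 without dumping the rest of the table.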

Storing variable-length sequences in a relational database

It is easy to store fixed-length sequences such as ordered pairs or triples in a relational database: define one column for each element of the sequence, and each row of the table then holds one tuple. When the number of elements varies, as with the exons of a transcript, it is more complicated. What are the options for storing this kind of data, and how do they compare in storage requirements and in access time for different tasks? Continue reading →
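Two common options can be sketched with the `sqlite3` command-line shell: a child table with one row per exon plus a rank column to preserve order, or a single text column holding the whole sequence as a delimited string. This is an illustrative sketch, not the comparison from the full post; the table names, column names and coordinates are made up. Both queries count exons per transcript.

```shell
#!/usr/bin/env bash
# Sketch: two ways to store a variable number of exons per transcript.
# Requires the sqlite3 command-line shell; all names and coordinates
# below are hypothetical.

make_demo_db() {
    sqlite3 "$1" <<'SQL'
-- Option 1: one row per exon; exon_rank preserves the order.
CREATE TABLE exon (
    transcript_id TEXT    NOT NULL,
    exon_rank     INTEGER NOT NULL,
    start_pos     INTEGER NOT NULL,
    end_pos       INTEGER NOT NULL,
    PRIMARY KEY (transcript_id, exon_rank)
);
INSERT INTO exon VALUES ('tx1', 1, 100, 200), ('tx1', 2, 300, 350),
                        ('tx2', 1, 500, 800);

-- Option 2: one row per transcript; exons packed as "start-end,start-end".
CREATE TABLE transcript_packed (
    transcript_id TEXT PRIMARY KEY,
    exons         TEXT NOT NULL
);
INSERT INTO transcript_packed VALUES ('tx1', '100-200,300-350'),
                                     ('tx2', '500-800');
SQL
}

# Exon counts from option 1: plain SQL aggregation.
exon_counts_rows() {
    sqlite3 "$1" "SELECT transcript_id, COUNT(*) FROM exon
                  GROUP BY transcript_id ORDER BY transcript_id;"
}

# Exon counts from option 2: the string must be parsed (commas + 1).
exon_counts_packed() {
    sqlite3 "$1" "SELECT transcript_id,
                         length(exons) - length(replace(exons, ',', '')) + 1
                  FROM transcript_packed ORDER BY transcript_id;"
}
```

The row-per-exon layout can be indexed and queried directly with SQL, at the cost of one row per exon; the packed layout needs only one row per transcript but pushes parsing into the query or the application.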

Fixing MISO error codes

I am writing some wrapper scripts to run MISO. However, when given bad arguments, MISO still exits with status code 0, so a wrapper cannot use the exit status to check whether the run succeeded. Continue reading →
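Until MISO itself reports failures through its exit status, a wrapper can add its own check. A minimal sketch, assuming you know a file the run should produce: treat the run as failed if the exit status is nonzero or if that file is missing or empty. The function name and the choice of output file are hypothetical, and this is a wrapper-side workaround, not the fix described in the full post.

```shell
#!/usr/bin/env bash
# Sketch: don't trust the exit status alone, because MISO can exit 0
# on bad arguments. Also require an expected output file (hypothetical:
# whatever your MISO invocation normally writes) to exist and be non-empty.

# run_and_check EXPECTED_OUTPUT CMD [ARGS...]
run_and_check() {
    local expected="$1"
    shift
    local status=0
    "$@" || status=$?          # safe under `set -e`
    if [ "$status" -ne 0 ]; then
        echo "command failed with exit status $status: $*" >&2
        return "$status"
    fi
    if [ ! -s "$expected" ]; then
        echo "command exited 0 but did not produce $expected: $*" >&2
        return 1
    fi
    return 0
}
```

Call it with the file you expect first, followed by your usual MISO command line; the wrapper then fails loudly whenever the run produced nothing, whatever status MISO returned.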

Installing Perl modules without root access

I’ve been working on Odyssey, the Harvard FAS compute cluster. It uses LSF for job management. It is a beautiful system, but I am still learning it.

For the last few years I’ve been working on a workstation, so I’ve always had root access. Installing Perl modules from CPAN was a cinch:

sudo cpan -i Module::Name

However, as a cluster user, I have no root access. Continue reading →
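One standard way around this is to install modules into a directory you own and tell Perl’s build tools and `perl` itself where that directory is. The sketch below sets the same environment variables that the local::lib module manages; `~/perl5` is a conventional location, not a requirement, and the function name is hypothetical. With these set (and provided CPAN.pm is not itself configured to call sudo), `cpan -i Module::Name` should install into your own directory.

```shell
#!/usr/bin/env bash
# Sketch: point Perl module installs at a user-owned directory, mirroring
# what local::lib sets up. The default prefix ~/perl5 is just a convention.

# use_local_perl_lib [PREFIX]
use_local_perl_lib() {
    local base="${1:-$HOME/perl5}"
    # ExtUtils::MakeMaker (Makefile.PL) reads PERL_MM_OPT
    export PERL_MM_OPT="INSTALL_BASE=$base"
    # Module::Build (Build.PL) reads PERL_MB_OPT
    export PERL_MB_OPT="--install_base $base"
    # Let perl find the modules installed under the prefix
    export PERL5LIB="$base/lib/perl5${PERL5LIB:+:$PERL5LIB}"
    # Scripts installed by modules land in $base/bin
    export PATH="$base/bin:$PATH"
}
```

Defining and calling this from `~/.bashrc` makes the setting persistent. If local::lib is already available, `eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"` configures the equivalent environment.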

Running MISO on a cluster

I’m trying to get Yarden Katz’s program MISO running on the Harvard FAS computing cluster. It is not easy to satisfy all of its dependencies.

Some of the dependencies, including pygsl, scipy and numpy, are already available through add-on modules. The problem is that they are not all the right versions, and I get errors such as the following:
ValueError: can't handle version -1 of numpy.dtype pickle
ImportError: cannot import name defaultdict
ValueError: numpy.dtype does not appear to be the correct type object

I was able to solve these problems by… Continue reading →
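Errors like these usually mean the interpreter or the modules actually being loaded are not the ones you expect. For example, `collections.defaultdict` was added in Python 2.5, so `cannot import name defaultdict` suggests an older Python is first on your PATH, and the numpy.dtype errors are typical of packages built against, or data pickled with, a different numpy version. Below is a quick diagnostic sketch, not the fix from the full post: it reports which python runs, its version, and the version and location of each dependency it imports. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show which Python interpreter and which copies of MISO's
# dependencies are actually picked up. Works with Python 2 or 3.

# report_python_env [PYTHON]
report_python_env() {
    local py="${1:-python}"
    # Which interpreter is first on PATH?
    command -v "$py" || { echo "$py not found on PATH" >&2; return 1; }
    "$py" -c 'import sys; print(sys.version)'
    # collections.defaultdict first appeared in Python 2.5
    "$py" -c 'import sys; sys.exit(sys.version_info < (2, 5))' \
        || echo "warning: $py is older than 2.5 (no collections.defaultdict)"
    for mod in numpy scipy pygsl; do
        # Version (if the module exposes one) and the file it was loaded from
        "$py" -c "import $mod; print('%s %s %s' % ('$mod', getattr($mod, '__version__', '?'), $mod.__file__))" 2>/dev/null \
            || echo "$mod: not importable with $py"
    done
}
```

Running it once with the cluster’s default `python` and once with the interpreter you intend MISO to use quickly shows whether two different numpy installations are being mixed.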

First post

Welcome to the Friedman Lab blog. Here I will explore ideas about bioinformatics.