##########################################################################
# _README_.txt:  explanatory text file for the Simcal1 module.           #
# Copyright (c) 2004 Daniel Dvorkin.                                     #
# Distributed under the Artistic License, WITHOUT "optional clause 8"    #
# (see http://www.opensource.org/licenses/artistic-license.php).         #
# All rights not granted in this license are reserved by the author.     #
##########################################################################

The Simcal1 module, as described in this document, is an application for importing, preprocessing, and analyzing a specific set of microarray data gathered in Dr. Valerie Fadok's lab at National Jewish Medical and Research Center in summer 2002.  In this version, it is neither designed for nor suitable for other data.  However, the fundamental algorithms implemented here should be applicable to other data, and anyone who wants to reuse them is encouraged to do so, within the terms of the Artistic License, however s/he sees fit.  The module also includes a number of general-purpose utilities that may be useful in a wide range of Python, MySQL, and/or PHP applications.

In order to run, this module requires the following software to be installed on the host computer.  All of it is available at no charge under a variety of open-source licenses:

Python -- http://www.python.org

Numerical Python -- http://www.numpy.org (note that this version was developed using the Numeric module, rather than the newer Numarray; future versions will probably use Numarray instead)

MySQL -- http://www.mysql.com

MySQLdb -- the MySQL interface package for Python, at http://sourceforge.net/projects/mysql-python/

PHP -- http://www.php.net

Gnuplot -- http://www.gnuplot.info

No specific versions of these packages are recommended; when in doubt, use the latest version.  At a minimum, use Python 2.3, Numeric 22.0, MySQLdb 0.9.2, MySQL 4.0, PHP 4.3, and Gnuplot 3.7.  Newer versions of all of these are available.
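
If you want to check an installation against the minimums listed above, a small helper like the following can compare dotted version strings.  This is a hypothetical sketch and not part of Simcal1; the names "MINIMUM_VERSIONS", "parse_version", and "meets_minimum" are illustrative, and obtaining each installed version string (for example from "mysql --version" or "php -v") is left to you.

```python
# Minimum versions taken from this README: package -> (major, minor[, patch]).
MINIMUM_VERSIONS = {
    "Python": (2, 3),
    "Numeric": (22, 0),
    "MySQLdb": (0, 9, 2),
    "MySQL": (4, 0),
    "PHP": (4, 3),
    "Gnuplot": (3, 7),
}


def parse_version(text):
    """Turn a dotted version string such as '4.0.18' into a tuple of ints.

    Only the leading digits of each component are kept, so a suffix such as
    '18-standard' becomes 18; parsing stops at the first non-numeric component.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(package, installed):
    """Return True if the installed version string satisfies the README minimum."""
    # Tuple comparison: (4, 0, 18) >= (4, 0) is True because the prefix matches.
    return parse_version(installed) >= MINIMUM_VERSIONS[package]
```

For example, "meets_minimum('MySQLdb', '0.9.1')" is False, while "meets_minimum('MySQL', '4.0.18')" is True.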

Development was done on an Apple iBook with a 1.2 GHz G4 CPU and 1.25 GB RAM, running OS X 10.3.  The program should run without modification on any Mac running OS X, or any PC running Unix (Linux, BSD, etc.), with equivalent or better specifications.  It may or may not run unmodified on other Unixes (SPARC Solaris, HP/UX, AIX, etc.), Windows, or other platforms.  However, if any modifications are necessary for any modern platform, they should be minor.  If you want to try to make it run on a VAX ... well, that's up to you.

Simcal1 is a command-line Python application which should be run from within its own directory.  The following files and directories should be included:

SQL (directory) -- this contains MySQL schemata for the various tables used by the program.  The names should be self-explanatory.

TextFiles (directory) -- this contains the raw Affymetrix microarray data, formatted for import by the functions in DataImport.  This directory MUST be located in the Simcal1 directory in order for data import to proceed successfully.
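
Because data import depends on TextFiles sitting inside the Simcal1 directory, it can help to resolve and check that location before any import starts.  The sketch below is hypothetical: "text_files_dir" and "list_data_files" are illustrative names, not functions from DataImport.  It also yields the full pathnames that DoStuff.py needs.

```python
import os


def text_files_dir(simcal_dir):
    """Return the absolute path of the TextFiles directory inside simcal_dir.

    Raises IOError (an alias of OSError in Python 3) if the directory is
    missing, so a misplaced TextFiles directory is caught up front.
    """
    path = os.path.abspath(os.path.join(simcal_dir, "TextFiles"))
    if not os.path.isdir(path):
        raise IOError("TextFiles directory not found at %s" % path)
    return path


def list_data_files(simcal_dir):
    """List full pathnames of the regular files in TextFiles, sorted by name."""
    base = text_files_dir(simcal_dir)
    paths = [os.path.join(base, name) for name in sorted(os.listdir(base))]
    # Skip subdirectories and anything else that is not a plain file.
    return [p for p in paths if os.path.isfile(p)]
```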

Simcal1.py -- the master include file for all other Python files.  This file defines a few constants and brings together the other files in the directory so they know about each other.  To use any of the utilities in the module from an interactive session, start Python from within the Simcal1 directory and then at the prompt type "import Simcal1" or "from Simcal1 import *", and everything should Just Work(tm).  Database information is set here, and must be set appropriately for the Python portion of the application to work.
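
The actual names of the database settings in Simcal1.py are not given in this README, so the constants below are hypothetical placeholders.  The helper only packages them as the keyword arguments ("host", "user", "passwd", "db") that MySQLdb.connect() accepts.

```python
# Hypothetical database settings; the real names in Simcal1.py may differ.
DB_HOST = "localhost"
DB_USER = "simcal"
DB_PASSWORD = "change-me"
DB_NAME = "simcal1"


def connection_args():
    """Collect the settings above as keyword arguments for MySQLdb.connect()."""
    return {
        "host": DB_HOST,
        "user": DB_USER,
        "passwd": DB_PASSWORD,
        "db": DB_NAME,
    }
```

From an interactive session started in the Simcal1 directory, "MySQLdb.connect(**connection_args())" would then open a connection with these settings.  Keeping all four values in one place means only this file needs editing when the database moves.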

DoStuff.py -- basically just a script that runs the analysis sequence, including data import, preprocessing, and clustering and linking.  In its unmodified form, this file does everything all at once, which takes about an hour to execute, so be careful.  This file must contain the full pathname for the files in TextFiles for import to succeed!  It is strongly recommended that this file, and indeed all the program files, be run optimized; that is, type "python -OO DoStuff.py" rather than just "python DoStuff.py".  Using full optimization will lead to a 10-20% speed improvement.
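
A driver in the spirit of DoStuff.py can run the stages in order and build the full TextFiles pathname instead of having it typed in by hand.  This is a minimal sketch under stated assumptions: the script is started from within the Simcal1 directory, "run_pipeline" and the placeholder steps are hypothetical, and the "sys.flags" check needs Python 2.6 or later (newer than the 2.3 minimum above).

```python
import os
import sys

# Absolute path to TextFiles, resolved against the current directory; this
# assumes the script is started from within the Simcal1 directory.
TEXT_FILES = os.path.abspath("TextFiles")


def run_pipeline(steps, log=None):
    """Run (name, function) pairs in order and return the names that finished.

    An exception raised by any step propagates, so later steps never run.
    """
    finished = []
    for name, func in steps:
        if log is not None:
            log("starting %s" % name)
        func()
        finished.append(name)
    return finished


if __name__ == "__main__":
    # sys.flags.optimize is 2 under "python -OO", 1 under "-O", else 0.
    if sys.flags.optimize < 2:
        sys.stderr.write("consider running this script with python -OO\n")
    # Placeholder steps; a real driver would call the DataImport and
    # ClusteringControl functions here, passing TEXT_FILES where needed.
    steps = [
        ("data import", lambda: None),
        ("preprocessing", lambda: None),
        ("clustering and linking", lambda: None),
    ]
    run_pipeline(steps, log=lambda msg: sys.stdout.write(msg + "\n"))
```

Because a failing step raises instead of returning a status, an hour-long run stops at the first problem rather than continuing on bad intermediate data.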

DbUtils.py and CalcUtils.py -- these define general-purpose database access and calculation (respectively) utilities which the program needs.  It is likely that in future versions of the program, they will be turned into full-fledged, independent packages which will install in the local Python site-packages directory.
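
As an illustration of the kind of helper each file might hold, here is one hypothetical example of each; neither function is taken from DbUtils.py or CalcUtils.py.  "build_insert" produces a parameterized INSERT using the "%s" placeholders of MySQLdb's "format" paramstyle, and "pearson" computes a Pearson correlation, a common similarity measure for expression profiles (the README does not say which measure Simcal1 uses).

```python
import math


def build_insert(table, row):
    """Build (sql, params) for a parameterized INSERT via cursor.execute().

    Each value becomes a %s placeholder and is escaped by the driver rather
    than pasted into the SQL.  Table and column names cannot be passed as
    parameters, so they must come from trusted code.
    """
    columns = sorted(row)
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), ", ".join(["%s"] * len(columns)))
    params = tuple(row[c] for c in columns)
    return sql, params


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx = sum(x) / float(n)
    my = sum(y) / float(n)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

With an open MySQLdb cursor, "cursor.execute(*build_insert('genes', row))" would insert one row; the table name "genes" is only an example.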

Various other .py files -- these define the data structures, variables, and functions needed to run the program.  ClusteringControl defines functions called by DoStuff, in the natural order of execution.  DataImport is a collection of functions for importing data from TextFiles (see above), which are also called from DoStuff.  DbEntities defines classes for data entity (gene and cluster) objects, which contain numerous useful methods.
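
To give a feel for what gene and cluster entity objects can look like, the sketch below defines two hypothetical classes; their attributes and methods are illustrative and are not the actual contents of DbEntities.  The "merge" method joins two clusters, the basic step of agglomerative (hierarchical) clustering; whether Simcal1's "linking" works this way is not stated in this README.

```python
class Gene(object):
    """A probe set and its expression profile across samples (hypothetical)."""

    def __init__(self, gene_id, profile):
        self.gene_id = gene_id
        self.profile = list(profile)


class Cluster(object):
    """A non-empty group of genes (hypothetical)."""

    def __init__(self, cluster_id, genes):
        if not genes:
            raise ValueError("a cluster needs at least one gene")
        self.cluster_id = cluster_id
        self.genes = list(genes)

    def centroid(self):
        """Element-wise mean of the member genes' profiles."""
        n = float(len(self.genes))
        # zip(*...) groups the i-th value of every profile together.
        return [sum(values) / n for values in zip(*[g.profile for g in self.genes])]

    def merge(self, other, new_id):
        """Return a new cluster containing the genes of both clusters."""
        return Cluster(new_id, self.genes + other.genes)
```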

PHPFiles (directory) -- contains the Web application for visualizing the data.  These files must reside on the same server as the MySQL database.  Database information, which must be set appropriately for the PHP portion of the application to work, is located in PHPFiles/main_inc.php.