SDDPACK
Software for the Semidiscrete Decomposition





Software Copyright and License

SDDPACK: Software for the Semidiscrete Decomposition.
Copyright © 1999 Tamara G. Kolda and Dianne P. O'Leary.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.


Authors

Tamara G. Kolda
Computational Sciences and Mathematics Research Department
Sandia National Labs
Livermore, CA 94551
Tamara.Kolda@na-net.ornl.gov
http://www.epm.ornl.gov/~kolda

Dianne P. O'Leary
Department of Computer Science
University of Maryland
College Park, MD 20742
oleary@cs.umd.edu
http://www.cs.umd.edu/users/oleary


Support

Kolda's research supported by the Applied Mathematical Sciences Research Program of Office of Energy Research in the U.S. Department of Energy, under contract DE-AC05-96OR22464 with Lockheed Martin Energy Research Corporation; the National Physical Sciences Consortium in conjunction with the University of Maryland and the National Security Agency; and the IDA Center for Computing Sciences.

O'Leary's research supported by the National Science Foundation under Grant CCR-97-32022 and by the Departement Informatik, ETH Zurich, Switzerland.


Bugs

Reports of bugs should be sent to Tamara.Kolda@na-net.ornl.gov.



Description

The semidiscrete decomposition (SDD) approximates a matrix as a weighted sum of outer products formed by vectors with entries constrained to be in the set {-1, 0, 1}.

It is useful for image compression and for latent semantic indexing (LSI) in information retrieval.

The primary advantage of the SDD over other types of matrix approximations such as the truncated singular value decomposition (SVD) is that it typically provides a more accurate approximation for far less storage.

This package provides software in Matlab and in C to compute the SDD.


Obtaining the Software

The software is available for download at http://www.cs.umd.edu/users/oleary/SDDPACK in the file SDDPACK.tgz.


Installation

After downloading the file SDDPACK.tgz, type "gunzip -c sdd.tgz | tar xvf -". This creates a directory called SDDPACK containing the following:

SDDPACK/README.html
SDDPACK/Makefile
SDDPACK/gpl.txt
SDDPACK/include/sdd.h
SDDPACK/include/qsortopt.h
SDDPACK/src/sdd.c
SDDPACK/src/decomp.c
SDDPACK/src/convertmtx.c
SDDPACK/src/convertsdd.c
SDDPACK/matlab/mtxread.m
SDDPACK/matlab/mtxwrite.m
SDDPACK/matlab/sdd.m
SDDPACK/matlab/sddweight.m
SDDPACK/matlab/sddtensor.m
SDDPACK/matlab/sddfun.m
SDDPACK/matlab/sddweightfun.m
SDDPACK/testdata/test1.mtx
SDDPACK/testdata/test1.gif
SDDPACK/doc/sddpack.ps

Documentation

Most documentation to run the software is given in this file. Also, each file contains comments with information on the code. General information about the SDD and its variants is provided in the included paper:

Tamara G. Kolda and Dianne P. O'Leary, Computation and Uses of the Semidiscrete Matrix Decomposition, Computer Science Department Report CS-TR-4012 Institute for Advanced Computer Studies Report UMIACS-TR-99-22, University of Maryland, April 1999.



SDD Matlab Code

Matlab codes are included in the directory SDDPACK/matab. To use these codes, be sure to be in the SDDPACK/matlab directory (or add this directory to your Matlab path).

We have included a small 5 x 5 test problem in testdata/test1.mtx. The matlab routine mtxread, included in SDDPACK, reads the matrix file.

>> A = mtxread('../testdata/test1.mtx');
>> full(A)

ans =

    0.5828    0.2259    0.2091    0.5678    0.4154
    0.4235    0.5798    0.3798    0.7942    0.3050
    0.5155    0.7604    0.7833    0.0592    0.8744
    0.3340    0.5298    0.6808    0.6029    0.0150
    0.4329    0.6405    0.4611    0.0503    0.7680

Using the routine sdd, included in SDDPACK, we can compute the 10-term SDD using the default settings as follows:

>> [D, X, Y] = sdd(A, 10)

D =

    0.4797
    0.3262
    0.1939
    0.0847
    0.1017
    0.0588
    0.0387
    0.0324
    0.0551
    0.0237


X =

     1     0    -1    -1     1     1    -1     0     0     1
     1     1     0     1    -1     1    -1    -1     0    -1
     1    -1     1     1     0    -1     1    -1     0    -1
     1     1     1    -1    -1    -1     0     1     0    -1
     1    -1     0     0     0     1     1     0     1     0


Y =

     1     0     0     0     1     0     0    -1    -1     0
     1     0     1     1     0     1     1     0     1     0
     1     0     1     0     0    -1     1    -1     0     1
     1     1    -1     1     0    -1    -1     1     0     0
     1    -1     0     1     0     0    -1    -1     0     0

We have also included a utility called sddfun that compares the SVD to the SDD with different starting criteria. We show the text below; the graphic is in SDDPACK/testdata/test1.gif.

>> sddfun(A)

Computing SVD

Rank of matrix: 5

Computing SDD with option 1
Computing SDD with option 2
Computing SDD with option 3
Computing SDD with option 4

Method Color Line Dot
------ ----- ---- ---
SVD      g    -    x
SDD-1    b    -    o
SDD-2    m    :    ^
SDD-3    r    -.   +
SDD-4    c    --   *

Method  Rel Resid  Inn Its  Density
------  ---------  -------  -------
SDD-1     11.92      3.20    66.00
SDD-2     11.85      2.60    72.00
SDD-3     13.95      2.80    84.00
SDD-4     11.92      2.80    66.00


Weighted SDD Matlab Code

The routine sddweight, included with SDDPACK, computes the weighted SDD of a given matrix. Using the same A from the previous example, and defining W as given below, we can compute the 5-term weighted SDD as follows:
>> A = full(mtxread('../testdata/test1.mtx'));
>> W = ones(5) - eye(5);
>> [D, X, Y] = sddweight(A, W, 5)

D =

    0.7833
    0.6029
    0.7680
    0.5828
    0.5798


X =

     0     0     0     1     0
     0     0     0     0     1
     1     0     0     0     0
     0     1     0     0     0
     0     0     1     0     0


Y =

     0     0     0     1     0
     0     0     0     0     1
     1     0     0     0     0
     0     1     0     0     0
     0     0     1     0     0
SDDPACK also contains sddweightfun, similar to sddfun described above. Sample output is given below. This also produces a figure, but we do not include it here.
>> sddweightfun(A,W)

Rank of matrix: 5

Computing SDDWEIGHT with option 1
Computing SDDWEIGHT with option 2
Computing SDDWEIGHT with option 3
Computing SDDWEIGHT with option 4

  Method    Color Line Dot
----------- ----- ---- ---
SDDWEIGHT-1   b    -    o
SDDWEIGHT-2   m    :    ^
SDDWEIGHT-3   r    -.   +
SDDWEIGHT-4   c    --   *

  Method     Rel Resid  Inn Its  Density
-----------  ---------  -------  -------
SDDWEIGHT-1    13.11      3.10    52.00
SDDWEIGHT-2    13.77      2.80    52.00
SDDWEIGHT-3     9.85      3.40    76.00
SDDWEIGHT-4    10.03      3.40    66.00

Tensor SDD Matlab Code

The tensor SDD can be computed with the Matlab code sddtensor, included with SDDPACK. An example of its usage is given below.
>> rand('state', 0);
>> A = rand(3,4,2);
>> [D, X] = sddtensor(A, 5)

D =

    0.5712
    0.3364
    0.2278
    0.2379
    0.1797


X = 

    [3x5 double]    [4x5 double]    [2x5 double]

>> X{1}

ans =

     1    -1     1     0     0
     1     1    -1    -1     1
     1    -1    -1     1     0

>> X{2}

ans =

     1     0     1     1     1
     1     0    -1     0     1
     1     1     0     1    -1
     1     1     0     0     0

>> X{3}

ans =

     1     0     1     1     1
     1     1     1    -1     1



Installing the SDD C Code

Change directories into SDDPACK. To compile the code, check the Makefile. You should check the location of the remove program and set the flags CC, CFLAGS, and LIB for your system. Several examples have been included. Once that is fixed, type "make" at the command prompt.

The flag -DQSORTOPT is optional. It provides an faster qsort that the default on most machines.

Special Note: If you do not have a 32- or 64-bit architecture, than you must be sure to include the following flag in the CFLAGS definition, e.g., -DIDXSHIFT=7 if 128-bit architecture.


Using the C Code

Run the code from the directory SDDPACK. A sample is given below. Results may vary slightly from machine to machine.
> decomp -k 10 testdata/test1.mtx testdata/test1.sdd
*** output from decomp ***
matrix file        : testdata/test1.mtx
SDD file           : testdata/test1.sdd
existing SDD file  : 
terms              : 10
accuracy           : 0.000000
tolerance          : 0.010000
max inner its      : 100
y init choice      : 1
input type         : text

  1   2 1.40882e+00 5.75170e+00  3    3
  2   4 5.57350e-01 8.51472e-01  3    6
  3   7 2.18833e-01 3.38517e-01  3    9
  4  10 1.32799e-01 8.60342e-02  2   11
  5  11 1.01791e-01 3.10079e-02  2   13
  6  12 4.99871e-02 5.18036e-02  4   17
  7  17 2.59716e-02 2.40155e-02  2   19
  8  19 1.33357e-02 1.26359e-02  3   22
  9  22 7.27355e-03 6.06216e-03  2   24
 10  23 5.02049e-03 2.25306e-03  2   26

      -- SDD information --
final residual norm         : 7.0855e-02
final relative residual norm: 0.026
total outer iterations      : 10
average inner iterations    : 2.600
average init iterations     : 2.300

> more testdata/test1.sdd
%% Semidiscrete Decomposition (SDD)
%% Matrix: testdata/test1.mtx Terms: 10 Accr: 0.00e+00 Tol: 1.00e-02 InnIts: 100 Init: 1
10 5 5
4.7965389490127563476562500e-01
3.2624220848083496093750000e-01
1.9394071400165557861328125e-01
8.4672987461090087890625000e-02
1.0166592150926589965820312e-01
5.8767084032297134399414062e-02
3.8742337375879287719726562e-02
3.2449815422296524047851562e-02
5.5055230855941772460937500e-02
2.3733202368021011352539062e-02
  1  1  1  1  1
  0  1 -1  1 -1
 -1  0  1  1  0
 -1  1  1 -1  0
  1 -1  0 -1  0
  1  1 -1 -1  1
 -1 -1  1  0  1
  0 -1 -1  1  0
  0  0  0  0  1
  1 -1 -1 -1  0
  1  1  1  1  1
  0  0  0  1 -1
  0  1  1 -1  0
  0  1  0  1  1
  1  0  0  0  0
  0  1 -1 -1  0
  0  1  1 -1 -1
 -1  0 -1  1 -1
 -1  1  0  0  0
  0  0  1  0  0

Note that these results match what we got with the matlab version of SDD on the same matrix.

By default, the SDD performs i/o in text mode. However, to get the advantage that the SDD has in terms of size, it is best to use binary i/o. We have included conversion programs for the matrices and SDD's. Their usage is shown below.

>  convertmtx testdata/test1.mtx testdata/test1.mtx.dat
*** output from convertmtx ***
Input File  : testdata/test1.mtx
Output File : testdata/test1.mtx.dat
> decomp -k 10 -b testdata/test1.mtx.dat testdata/test1.sdd.dat
*** output from decomp ***
matrix file        : testdata/test1.mtx.dat
SDD file           : testdata/test1.sdd.dat
existing SDD file  : 
terms              : 10
accuracy           : 0.000000
tolerance          : 0.010000
max inner its      : 100
y init choice      : 1
input type         : binary

  1   2 1.40882e+00 5.75170e+00  3    3
  2   4 5.57350e-01 8.51472e-01  3    6
  3   7 2.18833e-01 3.38517e-01  3    9
  4  10 1.32799e-01 8.60342e-02  2   11
  5  11 1.01791e-01 3.10079e-02  2   13
  6  12 4.99871e-02 5.18036e-02  4   17
  7  17 2.59716e-02 2.40155e-02  2   19
  8  19 1.33357e-02 1.26359e-02  3   22
  9  22 7.27355e-03 6.06216e-03  2   24
 10  23 5.02049e-03 2.25306e-03  2   26

      -- SDD information --
final residual norm         : 7.0855e-02
final relative residual norm: 0.026
total outer iterations      : 10
average inner iterations    : 2.600
average init iterations     : 2.300

> convertsdd -b testdata/test1.sdd.dat test1.sdd
*** output from convertsdd ***
Input File  : testdata/test1.sdd.dat
Output File : test1.sdd

> more testdata/test1.sdd
%% Semidiscrete Decomposition (SDD)
%% Matrix: testdata/test1.mtx Terms: 10 Accr: 0.00e+00 Tol: 1.00e-02 InnIts: 100 Init: 1
10 5 5
4.7965389490127563476562500e-01
3.2624220848083496093750000e-01
1.9394071400165557861328125e-01
8.4672987461090087890625000e-02
1.0166592150926589965820312e-01
5.8767084032297134399414062e-02
3.8742337375879287719726562e-02
3.2449815422296524047851562e-02
5.5055230855941772460937500e-02
2.3733202368021011352539062e-02
  1  1  1  1  1
  0  1 -1  1 -1
 -1  0  1  1  0
 -1  1  1 -1  0
  1 -1  0 -1  0
  1  1 -1 -1  1
 -1 -1  1  0  1
  0 -1 -1  1  0
  0  0  0  0  1
  1 -1 -1 -1  0
  1  1  1  1  1
  0  0  0  1 -1
  0  1  1 -1  0
  0  1  0  1  1
  1  0  0  0  0
  0  1 -1 -1  0
  0  1  1 -1 -1
 -1  0 -1  1 -1
 -1  1  0  0  0
  0  0  1  0  0

Even this example demonstrates how much room can be saved using binary mode rather than text: The file test1.sdd requires 772 bytes of storage, while test1.sdd.dat requires only 236 bytes.



MatrixMarket Test Problems

We used four MatrixMarket matrices in the examples in our paper. These are included in the tar file mtxfiles.tgz. Type "gunzip -c mtxfiles.tgz | tar xvf -" to unzip them.



Copyright © 1999 Tamara G. Kolda and Dianne P. O'Leary
Comments? Tamara.Kolda@na-net.ornl.gov
Last modified: Tue Apr 6 11:23:57 EDT 1999