
Each GSEA-supported file is an ASCII text file with a specific format, as described below. To create and edit GSEA files, use Excel or a text editor. Be aware that Excel's auto-formatting can introduce errors in gene names, as described in Zeeberg et al. 2004.

To create a tab-delimited text file in Excel: select File>Save As, enter the file name in quotes to preserve the file extension (for example, "p53.gct"), and select "Text (Tab delimited) (*.txt)" as the file type. Excel displays a message warning you that your file may contain features that are not compatible with this format and asks if you want to keep the workbook in this format. When Excel asks if you want to save your changes to this file, select No (you have already saved the file).

When creating files for GSEA, do not use hyphens (-) in the file names. Due to restrictions imposed by certain Java libraries used by GSEA, the GSEA command line cannot accept file names that contain hyphens.

1.4 TXT: Text file format for expression dataset (*.txt)
2.1 CLS: Categorical (e.g. tumor vs normal) class file format (*.cls)
2.2 CLS: Continuous (e.g. time-series or gene profile) file format (*.cls)
3.1 GMX: Gene MatriX file format (*.gmx)
3.2 GMT: Gene Matrix Transposed file format (*.gmt)
3.4 XML: Molecular signature database file format (msigdb_*.xml)
5.1 RNK: Ranked list file format (*.rnk)

Note: The GCT & RES expression formats supported by GSEA are identical to those supported by GenePattern.

GCT: Gene Cluster Text file format (*.gct)

The GCT format is a tab-delimited file format that describes an expression dataset.

The first line contains the version string and is always the same for this file format. Therefore, the first line must be as follows:
#1.2

The second line contains numbers indicating the size of the data table that is contained in the remainder of the file. Note that the name and description columns are not included in the number of data columns.
Line format: (# of data rows) (tab) (# of data columns)

The third line contains a list of identifiers for the samples associated with each of the columns in the remainder of the file.
Line format: Name (tab) Description (tab) (sample 1 name) (tab) (sample 2 name) (tab) ... (sample N name)
Example: Name Description DLBC1_1 DLBC2_1 ...

The remainder of the data file contains data for each of the genes. There is one row for each gene and one column for each of the samples. The number of rows and columns should agree with the number of rows and columns specified on line 2. Each row contains a name, a description, and an intensity value for each sample. Names and descriptions can contain spaces, but may not be empty. If no description is available, enter a text string such as NA or NULL. To specify a missing intensity value, leave the field empty.
Line format: (gene name) (tab) (gene description) (tab) (col 1 data) (tab) (col 2 data) (tab) ... (col N data)
Example: AFFX-BioB-5_at AFFX-BioB-5_at (endogenous control) -104 -152 -158 ... -44

Example file: P53_hgu95av2.gct

RES: ExpRESsion (with P and A calls) file format (*.res)

The RES file format is a tab-delimited file format that describes an expression dataset. The main difference between the RES and GCT file formats is that the RES file format contains labels for each gene's absent (A) versus present (P) calls, as generated by Affymetrix's GeneChip software.

The first line contains a list of labels identifying the samples associated with each of the columns in the remainder of the file. Two tabs (\t\t) separate the sample identifier labels because each sample contains two data values (an expression value and a present/marginal/absent call).
Line format: Description (tab) Accession (tab) (sample 1 name) (tab) (tab) (sample 2 name) (tab) (tab) ... (sample N name)
Example: Description Accession DLBC1_1 DLBC2_1 ...

The second line contains a list of sample descriptions. Currently, GSEA ignores these descriptions.
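The GCT layout described in this document can be checked with a short script before loading a file into GSEA. The following is a minimal Python sketch, not part of GSEA: the function name parse_gct is hypothetical, the version string is assumed to be "#1.2", and empty intensity fields are returned as None to represent missing values.

```python
def parse_gct(text):
    """Parse GCT-formatted text into (sample_names, rows).

    Each row is a tuple (name, description, values); a missing
    intensity (empty field) is returned as None.
    """
    lines = text.splitlines()

    # Line 1: version string (assumed here to be "#1.2").
    if lines[0].strip() != "#1.2":
        raise ValueError("unexpected version line: %r" % lines[0])

    # Line 2: (# of data rows) (tab) (# of data columns).
    n_rows, n_cols = (int(x) for x in lines[1].split("\t")[:2])

    # Line 3: Name, Description, then one identifier per sample.
    samples = lines[2].split("\t")[2:]
    if len(samples) != n_cols:
        raise ValueError("line 2 declares %d columns, header has %d"
                         % (n_cols, len(samples)))

    rows = []
    for line in lines[3:]:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError("row has no description column: %r" % line)
        name, description = fields[0], fields[1]
        # Names and descriptions may contain spaces but may not be empty.
        if not name or not description:
            raise ValueError("empty name or description: %r" % line)
        values = [float(v) if v.strip() else None for v in fields[2:]]
        if len(values) != n_cols:
            raise ValueError("row %r has %d values, expected %d"
                             % (name, len(values), n_cols))
        rows.append((name, description, values))

    # The row count must agree with the number declared on line 2.
    if len(rows) != n_rows:
        raise ValueError("line 2 declares %d rows, found %d"
                         % (n_rows, len(rows)))
    return samples, rows
```

Lines are split on tabs only, so names and descriptions that contain spaces stay intact, and a trailing empty field is kept as a missing value rather than dropped.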

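The double-tab convention in the first line of a RES file can be illustrated in a few lines of Python. This is only a sketch under stated assumptions: the helper names res_header and res_sample_names are hypothetical, and the line is assumed to start with the literal labels Description and Accession, as in the line format given in this document.

```python
def res_header(sample_names):
    """Build the first line of a RES file from a list of sample names."""
    # Each sample spans two columns (expression value and P/M/A call),
    # so consecutive sample labels are separated by two tabs.
    return "Description\tAccession\t" + "\t\t".join(sample_names)


def res_sample_names(header_line):
    """Extract the sample names from the first line of a RES file."""
    fields = header_line.rstrip("\r\n").split("\t")
    if fields[:2] != ["Description", "Accession"]:
        raise ValueError("not a RES header line: %r" % header_line)
    # Sample labels sit in every second column, starting at the third.
    return fields[2::2]
```

Taking every second field, instead of filtering out empty strings, keeps the column positions aligned with the two-values-per-sample layout.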