GOBO

Information - Co-expressed Genes

Introduction

The Co-expressed Genes module of GOBO allows a user to identify genes showing co-expression with a specified "prototype" gene across either a tumor data set or the cell line panel. Please note that the Chin et al. data set is excluded from correlation analysis across the tumor data set because it was generated on a different version of the U133A chip, which could bias an unrestricted correlation analysis.

Special notes

  • Correlation analysis is only performed if the prototype gene passes a log2 standard deviation cut-off threshold. The log2 standard deviation for the prototype gene is reported in the text run summary.
  • Correlations to the prototype gene are only computed for genes passing the log2 standard deviation filter.
  • If the analysis fails, first try a lower log2 standard deviation cut-off value. In addition, setting the correlation cut-off value too high can result in no genes showing co-expression; in that case, try a lower correlation cut-off value.
  • Iterative correlation analysis is currently only performed if the following requirements are met:
    • The number of identified co-expressed genes is larger than the minimal number of connections, and there are fewer than 200 co-expressed genes (to limit computation time).
    • At least one connection is specified.
  • The text run summary lists which predictor genes have been matched to the main data set.
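
The standard deviation filter and the prototype correlation step described above can be sketched in Python as follows. This is a minimal illustration, not GOBO's implementation. It assumes that expression values are already log2-transformed and stored in a plain dict mapping gene symbol to a list of values (one per sample), that the sample standard deviation (n - 1 denominator) is used, and that a correlation exactly at the cut-off does not pass. All function names are hypothetical.

```python
import math


def log2_sd(values):
    """Sample standard deviation of log2 expression values (n - 1 denominator assumed)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def correlate(x, y, method="pearson"):
    """Spearman correlation is Pearson correlation computed on ranks."""
    if method == "spearman":
        return pearson(ranks(x), ranks(y))
    return pearson(x, y)


def passes_cutoff(r, cutoff, sign="positive"):
    """Apply the non-negative cut-off according to the selected correlation sign."""
    if sign == "positive":
        return r > cutoff
    if sign == "negative":
        return r < -cutoff  # cut-off interpreted as (-1) * cutoff
    return abs(r) > cutoff  # "absolute": positive or negative correlation


def find_coexpressed(expr, prototype, sd_cutoff, r_cutoff,
                     sign="positive", method="pearson"):
    """Return {gene: r} for co-expressed genes, or None if the prototype fails the SD filter."""
    proto_sd = log2_sd(expr[prototype])
    if proto_sd < sd_cutoff or proto_sd == 0:
        return None  # no analysis is performed
    hits = {}
    for gene, values in expr.items():
        if gene == prototype:
            continue
        sd = log2_sd(values)
        # Genes below the SD cut-off are never correlated; zero-SD genes are
        # also skipped because their correlation is undefined.
        if sd < sd_cutoff or sd == 0:
            continue
        r = correlate(expr[prototype], values, method)
        if passes_cutoff(r, r_cutoff, sign):
            hits[gene] = r
    return hits
```

For example, with a prototype whose log2 standard deviation is about 1.58, calling find_coexpressed with sd_cutoff=2.0 returns None, mirroring the rule that no analysis is performed when the cut-off exceeds the prototype gene's standard deviation.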

Input variables required for Co-expressed Genes

  • Specify prototype gene. Enter a single gene symbol or Entrez Gene ID representing the prototype gene in the Specify prototype gene field. Gene symbols should follow the guidelines for human gene nomenclature and are generally in upper case (Guidelines for Human Gene Nomenclature).
  • Data selection. Select which main data set to run the analysis in. It is possible to run the analysis in all tumors, subgroups of tumors and the cell line data set. Note that re-centering of expression data is not performed for tumors in tumor subgroups.
  • Correlation method. Select which correlation method to use in the analysis (Pearson or Spearman).
  • Select correlation sign. In the Select correlation sign drop-down menu, select whether to identify genes based on positive correlation alone, negative correlation (anti-correlation) alone, or absolute correlation (both positive and negative).
  • Specify correlation cut-off value. Specify the correlation cut-off value used to identify co-expressed genes. This value should always be specified as a non-negative number (>=0). A higher value means that more stringent co-expression is required. If a negative correlation sign has been selected, the correlation cut-off is interpreted as (-1)*correlation_cut_off_value.
  • Specify log2 standard deviation cut-off value. This value should always be a non-negative number (>=0). It is used to remove genes with a low standard deviation in log2 expression across the selected main data set prior to correlation analysis. A higher value means that fewer genes are passed to correlation analysis, potentially removing false positives that acquire high correlations merely because of low variation. If the value is higher than the standard deviation of the selected prototype gene, no analysis is performed.
  • Select minimal number of connections. The selected integer is the lowest number of connections required in iterative correlation analysis of the identified co-expressed genes. Briefly, in iterative correlation analysis the co-expressed genes, together with the prototype gene, are subjected to additional rounds of correlation analysis in the selected main data set. For each co-expressed gene, correlation is computed against all other co-expressed genes (including the prototype gene). If a pair has a correlation larger than the correlation cut-off, this is recorded as a gene-gene connection. If a gene has sufficiently many connections, it is recorded in the output. The aim is to identify more tightly co-expressed gene clusters. Iterative correlation analysis is performed as described in Fredlund et al. (Breast Cancer Research 2012;14(4):R113). Output from the iterative analysis may be further explored using network software such as Cytoscape (www.cytoscape.org).
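
A rough sketch of the connection counting in iterative correlation analysis is shown below, including the run requirements listed under Special notes. It rests on several assumptions that this page does not confirm. Only Pearson correlation is used. Following the wording above, a connection is recorded when a pairwise correlation is larger than the cut-off (how GOBO treats negative or absolute sign at this step is not stated, so it is ignored here). "Additional rounds" is read as repeatedly removing genes, including the prototype, that have fewer than the minimal number of connections and recounting until no gene drops out. Consult Fredlund et al. for the actual procedure; the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def iterative_connections(expr, prototype, coexpressed, r_cutoff,
                          min_connections, max_genes=200):
    """Return (kept_genes, edges), or None when the run requirements are not met."""
    n = len(coexpressed)
    # Requirements from Special notes: at least one connection, more
    # co-expressed genes than connections, and fewer than 200 genes.
    if min_connections < 1 or not (min_connections < n < max_genes):
        return None
    genes = [prototype] + list(coexpressed)
    # Record a gene-gene connection for every pair correlated above the cut-off.
    edges = [(a, b) for a, b in combinations(genes, 2)
             if pearson(expr[a], expr[b]) > r_cutoff]
    kept = set(genes)
    while True:
        # Count connections among the genes that are still kept.
        degree = dict.fromkeys(kept, 0)
        for a, b in edges:
            if a in kept and b in kept:
                degree[a] += 1
                degree[b] += 1
        dropped = {g for g, d in degree.items() if d < min_connections}
        if not dropped:
            break
        kept -= dropped  # repeat until every kept gene has enough connections
    return kept, [(a, b) for a, b in edges if a in kept and b in kept]
```

The pairwise correlations are computed once and only the connection counts are redone each round, which keeps the cost dominated by the single all-pairs pass; the 200-gene limit bounds that pass.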

Current output from Co-expressed Genes

  • A PDF file showing distribution of correlations together with cut-off and number of identified co-expressed genes.
  • A SIF file suitable for use with the Cytoscape software.
  • A tab-delimited text file listing the identified co-expressed genes.
  • A text run summary.
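
The two text outputs can be approximated as follows. This is a sketch under assumptions: GOBO's exact file columns and interaction label are not documented here, so the "coexp" interaction type and the "gene"/"correlation" column names are hypothetical. The SIF layout itself (source node, interaction type, target node, with a lone node name allowed on its own line) follows Cytoscape's simple interaction format.

```python
import csv
import io


def to_sif(edges, nodes=(), interaction="coexp"):
    """Cytoscape SIF text: one 'source<TAB>interaction<TAB>target' line per edge.

    Nodes with no edges are written as a single name on their own line.
    """
    lines = [f"{a}\t{interaction}\t{b}" for a, b in edges]
    linked = {g for edge in edges for g in edge}
    lines += [g for g in sorted(nodes) if g not in linked]
    return "".join(line + "\n" for line in lines)


def to_gene_table(hits):
    """Tab-delimited gene list, sorted from highest to lowest correlation."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "correlation"])  # hypothetical column names
    for gene, r in sorted(hits.items(), key=lambda item: item[1], reverse=True):
        writer.writerow([gene, f"{r:.4f}"])
    return buf.getvalue()
```

Tab separators are used in the SIF output so that node names containing spaces are still parsed correctly by Cytoscape.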