
The edgelist and non-edgelist are strings representing file locations and are read in using data.table’s fread() command. These are where users can specify known edges/non-edges of their own. Each observation is assumed to be a directed edge (for the edgelist) or a directed non-edge (for the non-edgelist). Both files must have 4 columns. The columns do not need to be named properly, but they must be in this specific order:

1st column - Source gene (base_gene_src).
2nd column - Gene identifier of the source gene (base_id_src), e.g. “ENTREZID”.
3rd column - Destination gene (base_gene_dest), e.g. “8607”.
4th column - Gene identifier of the destination gene (base_id_dest), e.g. “UNIPROT”.

Note that each edge/non-edge is assumed to be directed, so if you want an undirected edge/non-edge you should put in two observations, one in each direction, as in:

# base_gene_src base_id_src base_gene_dest base_id_dest

Remember, the rownames of the data matrix X must be named as "GENE_ID:GENE_VALUE" as in "ENTREZID:7534".

After having the data set up, the first step in pathway enrichment analysis with netgsa is to estimate the adjacency matrices. The databases argument can be either (1) the result of obtainEdgeList or (2) a character vector defining the databases to search. These are essentially the same thing; the only difference is that with the character vector method, obtainEdgeList is called inside prepareAdjMat and its result cannot be saved. Since prepareAdjMat queries the graphite databases when it is called and graphite databases can change over time, this may not be desirable for reproducibility. Using the obtainEdgeList method, one can save the edgelist to ensure the same network information is used across iterations or in the future. In both methods, one must specify the databases to search. The options are the databases for Homo sapiens available in graphite or NDEx (only for the development version on GitHub). The graphite databases are:

# "kegg" "panther" "pathbank" "pharmgkb" "reactome"

Note the databases argument is case sensitive, so make sure to pass "reactome" and not "Reactome".

The cluster argument controls whether or not clustering is used when estimating the adjacency matrix. The default behavior is to use clustering if p > 2,500. However, the user can override this behavior by setting the cluster argument. prepareAdjMat chooses the best clustering method from 6 possible methods in the igraph package. More details are provided in ?prepareAdjMat.

User-specified edge and non-edge files are specified with the file_e and file_ne arguments respectively. For more information on the assumptions and how network information is incorporated from the database edgelists, the user edges, and the user non-edges, see the Details section and the file_e and file_ne parameters in ?prepareAdjMat. It is also important to note that prepareAdjMat will automatically choose the correct network estimation technique based on whether or not the graph is directed, so no additional work is needed to distinguish undirected from directed graphs.

Suppose we wanted to estimate the network for our example data using our known edges/non-edges and searching for edges in Reactome, KEGG, and BioCarta. Let’s also use clustering to speed up computation.

If the user has Cytoscape open on their computer, calling plot.NetGSA will create several plots:

Cytoscape plots - the first place plot.NetGSA generates plots is within Cytoscape. A nested network is created where the main network (Pathway Network) displays pathways as nodes. Edges in this network represent edges between genes contained within those pathways. In this network, the results object from NetGSA() is loaded as the node data and the number of edges between genes of separate pathways is loaded as edge data in a variable called “weight”. Node color is mapped to the test statistic.
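The four-column edgelist layout and the rowname convention described above can be sketched in base R as follows. This is a minimal illustration, not the vignette's own code: the gene IDs, the temporary file name, and the toy matrix x are assumptions chosen only to show the shape of the data. An undirected edge is written as two directed rows, one in each direction.

```r
# Illustrative values only: the gene IDs and the file name are assumptions.
# One undirected edge is encoded as two directed observations.
edges <- data.frame(
  base_gene_src  = c("7534", "8607"),
  base_id_src    = c("ENTREZID", "ENTREZID"),
  base_gene_dest = c("8607", "7534"),
  base_id_dest   = c("ENTREZID", "ENTREZID"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file; a path like this would later be passed as file_e.
edge_file <- file.path(tempdir(), "edgelist.txt")
write.table(edges, edge_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back with base R to check the layout
# (netgsa itself reads the file with data.table's fread()).
check <- read.delim(edge_file, colClasses = "character")

# Row names of the data matrix follow "GENE_ID:GENE_VALUE", e.g. "ENTREZID:7534".
x <- matrix(rnorm(2 * 6), nrow = 2)
rownames(x) <- paste("ENTREZID", c("7534", "8607"), sep = ":")
```

A non-edgelist file would have the same four columns in the same order, with each row read as a directed non-edge.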

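The two ways of supplying network information to prepareAdjMat could be combined roughly as in the sketch below, which saves the obtainEdgeList result so that later runs reuse the same database search. It assumes the netgsa package is installed and that graphite is reachable; the wrapper name estimate_network, the file paths, and the .rds cache file are hypothetical, and the argument names follow the ones mentioned in the text (databases, cluster, file_e, file_ne). Check ?prepareAdjMat and ?obtainEdgeList before relying on it.

```r
# Sketch only: requires the netgsa package and network access to graphite.
# The wrapper name, file paths, and .rds cache file name are hypothetical.
estimate_network <- function(x, group,
                             file_e = "edgelist.txt",
                             file_ne = "nonedgelist.txt",
                             cache = "edgelist_search.rds") {
  if (!requireNamespace("netgsa", quietly = TRUE)) {
    stop("The netgsa package is needed for this sketch.")
  }
  # Query the databases once and save the result for reproducibility.
  # Database names are case sensitive, hence lowercase.
  if (file.exists(cache)) {
    db_edges <- readRDS(cache)
  } else {
    db_edges <- netgsa::obtainEdgeList(rownames(x),
                                       c("reactome", "kegg", "biocarta"))
    saveRDS(db_edges, cache)
  }
  # Pass the saved search result and the user edge/non-edge files,
  # and turn clustering on explicitly.
  netgsa::prepareAdjMat(x, group, databases = db_edges, cluster = TRUE,
                        file_e = file_e, file_ne = file_ne)
}
```

Passing the character vector c("reactome", "kegg", "biocarta") directly as databases would skip the caching step, at the cost of querying graphite again on every call.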