Heat Diffusion Workflow¶
This module describes a heat diffusion workflow for analyzing BEL networks with differential gene expression 0.
It has four parts:
Assembling a network, pre-processing, and overlaying data
Generating unbiased candidate mechanisms from the network
Generating random sub-graphs from each unbiased candidate mechanism
Applying standard heat diffusion to each sub-graph and calculating scores for each unbiased candidate mechanism based on the distribution of scores for its sub-graph
In this algorithm, heat is applied to the nodes based on the data set. For the differential gene expression experiment, the log-fold-change values are used instead of the corrected p-values to allow for the effects of up- and down-regulation to be admitted in the analysis. Finally, heat diffusion inspired by previous algorithms published in systems and networks biology 1 2 is run with the constraint that decreases edges cause the sign of the heat to be flipped. Because of the construction of unbiased candidate mechanisms, all heat will flow towards their seed biological process nodes. The amount of heat on the biological process node after heat diffusion stops becomes the score for the whole unbiased candidate mechanism.
The issue of inconsistent causal networks addressed by SST 3 does not affect heat diffusion algorithms
since it can quantify multiple conflicting pathways. However, it does not address the possibility of contradictory
edges, for example, when A increases B
and A decreases B
are both true. A random sampling approach is used on
networks with contradictory edges and aggregate statistics over multiple trials are used to assess the robustness of the
scores as a function of the topology of the underlying unbiases candidate mechanisms.
Invariants¶
Because heat always flows towards the biological process node, it is possible to remove leaf nodes (nodes with no incoming edges) after each step, since their heat will never change.
Examples¶
This workflow has been applied in several Jupyter notebooks:
Future Work¶
This algorithm can be tuned to allow the use of correlative relationships. Because many multi-scale and multi-modal data are often measured with correlations to molecular features, this enables experiments to be run using SNP or brain imaging features, whose experiments often measure their correlation with the activity of gene products.
- 0
Hoyt, C. T., et al. (2017). PyBEL: a computational framework for Biological Expression Language. Bioinformatics (Oxford, England), 34(4), 703–704.
- 1
Bernabò N., et al. (2014). The biological networks in studying cell signal transduction complexity: The examples of sperm capacitation and of endocannabinoid system. Computational and Structural Biotechnology Journal, 11 (18), 11–21.
- 2
Leiserson, M. D. M., et al. (2015). Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nature Genetics, 47 (2), 106–14.
- 3
Vasilyev, D. M., et al. (2014). An algorithm for score aggregation over causal biological networks based on random walk sampling. BMC Research Notes, 7, 516.
-
pybel_tools.analysis.heat.
RESULT_LABELS
= ['avg', 'stddev', 'normality', 'median', 'neighbors', 'subgraph_size']¶ The columns in the score tuples
-
pybel_tools.analysis.heat.
calculate_average_scores_on_graph
(graph, key=None, tag=None, default_score=None, runs=None, use_tqdm=False)[source]¶ Calculate the scores over all biological processes in the sub-graph.
As an implementation, it simply computes the sub-graphs then calls
calculate_average_scores_on_subgraphs()
as described in that function’s documentation.- Parameters
graph (
BELGraph
) – A BEL graph with heats already on the nodeskey (
Optional
[str
]) – The key in the node data dictionary representing the experimental data. Defaults topybel_tools.constants.WEIGHT
.tag (
Optional
[str
]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’default_score (
Optional
[float
]) – The initial score for all nodes. This number can go up or down.runs (
Optional
[int
]) – The number of times to run the heat diffusion workflow. Defaults to 100.use_tqdm (
bool
) – Should there be a progress bar for runners?
- Returns
A dictionary of {pybel node tuple: results tuple}
- Return type
Suggested usage with
pandas
:>>> import pandas as pd >>> from pybel_tools.analysis.heat import calculate_average_scores_on_graph >>> graph = ... # load graph and data >>> scores = calculate_average_scores_on_graph(graph) >>> pd.DataFrame.from_items(scores.items(), orient='index', columns=RESULT_LABELS)
-
pybel_tools.analysis.heat.
calculate_average_scores_on_subgraphs
(subgraphs, key=None, tag=None, default_score=None, runs=None, use_tqdm=False, tqdm_kwargs=None)[source]¶ Calculate the scores over precomputed candidate mechanisms.
- Parameters
subgraphs (
Mapping
[~H,BELGraph
]) – A dictionary of keys to their corresponding subgraphskey (
Optional
[str
]) – The key in the node data dictionary representing the experimental data. Defaults topybel_tools.constants.WEIGHT
.tag (
Optional
[str
]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’default_score (
Optional
[float
]) – The initial score for all nodes. This number can go up or down.runs (
Optional
[int
]) – The number of times to run the heat diffusion workflow. Defaults to 100.use_tqdm (
bool
) – Should there be a progress bar for runners?
- Return type
- Returns
A dictionary of keys to results tuples
Example Usage:
>>> import pandas as pd >>> from pybel_tools.generation import generate_bioprocess_mechanisms >>> from pybel_tools.analysis.heat import calculate_average_scores_on_subgraphs >>> # load graph and data >>> graph = ... >>> candidate_mechanisms = generate_bioprocess_mechanisms(graph) >>> scores = calculate_average_scores_on_subgraphs(candidate_mechanisms) >>> pd.DataFrame.from_items(scores.items(), orient='index', columns=RESULT_LABELS)
-
pybel_tools.analysis.heat.
workflow
(graph, node, key=None, tag=None, default_score=None, runs=None, minimum_nodes=1)[source]¶ Generate candidate mechanisms and run the heat diffusion workflow.
- Parameters
graph (
BELGraph
) – A BEL graphnode (
BaseEntity
) – The BEL node that is the focus of this analysiskey (
Optional
[str
]) – The key in the node data dictionary representing the experimental data. Defaults topybel_tools.constants.WEIGHT
.tag (
Optional
[str
]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’default_score (
Optional
[float
]) – The initial score for all nodes. This number can go up or down.runs (
Optional
[int
]) – The number of times to run the heat diffusion workflow. Defaults to 100.minimum_nodes (
int
) – The minimum number of nodes a sub-graph needs to try running heat diffusion
- Return type
- Returns
A list of runners
-
pybel_tools.analysis.heat.
multirun
(graph, node, key=None, tag=None, default_score=None, runs=None, use_tqdm=False)[source]¶ Run the heat diffusion workflow multiple times, each time yielding a
Runner
object upon completion.- Parameters
graph (
BELGraph
) – A BEL graphnode (
BaseEntity
) – The BEL node that is the focus of this analysiskey (
Optional
[str
]) – The key in the node data dictionary representing the experimental data. Defaults topybel_tools.constants.WEIGHT
.tag (
Optional
[str
]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’default_score (
Optional
[float
]) – The initial score for all nodes. This number can go up or down.runs (
Optional
[int
]) – The number of times to run the heat diffusion workflow. Defaults to 100.use_tqdm (
bool
) – Should there be a progress bar for runners?
- Return type
- Returns
An iterable over the runners after each iteration
-
class
pybel_tools.analysis.heat.
Runner
(graph, target_node, key=None, tag=None, default_score=None)[source]¶ This class houses the data related to a single run of the heat diffusion workflow.
Initialize the heat diffusion runner class.
- Parameters
graph (
BELGraph
) – A BEL graphtarget_node (
BaseEntity
) – The BEL node that is the focus of this analysiskey (
Optional
[str
]) – The key in the node data dictionary representing the experimental data. Defaults topybel_tools.constants.WEIGHT
.tag (
Optional
[str
]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’default_score (
Optional
[float
]) – The initial score for all nodes. This number can go up or down.
-
iter_leaves
()[source]¶ Return an iterable over all nodes that are leaves.
A node is a leaf if either:
it doesn’t have any predecessors, OR
all of its predecessors have a score in their data dictionaries
- Return type
Iterable
[BaseEntity
]
-
has_leaves
()[source]¶ Return if the current graph has any leaves.
Implementation is not that smart currently, and does a full sweep.
- Return type
List
[BaseEntity
]
-
get_random_edge
()[source]¶ This function should be run when there are no leaves, but there are still unscored nodes. It will introduce a probabilistic element to the algorithm, where some edges are disregarded randomly to eventually get a score for the network. This means that the score can be averaged over many runs for a given graph, and a better data structure will have to be later developed that doesn’t destroy the graph (instead, annotates which edges have been disregarded, later)
get all un-scored
rank by in-degree
weighted probability over all in-edges where lower in-degree means higher probability
pick randomly which edge
- Returns
A random in-edge to the lowest in/out degree ratio node. This is a 3-tuple of (node, node, key)
- Return type
-
remove_random_edge
()[source]¶ Remove a random in-edge from the node with the lowest in/out degree ratio.
-
remove_random_edge_until_has_leaves
()[source]¶ Remove random edges until there is at least one leaf node.
- Return type
None
-
score_leaves
()[source]¶ Calculate the score for all leaves.
- Return type
Set
[BaseEntity
]- Returns
The set of leaf nodes that were scored
-
run
()[source]¶ Calculate scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored.
- Return type
None
-
run_with_graph_transformation
()[source]¶ Calculate scores for all leaves until there are none, removes edges until there are, and repeats until all nodes have been scored. Also, yields the current graph at every step so you can make a cool animation of how the graph changes throughout the course of the algorithm
- Return type
Iterable
[BELGraph
]- Returns
An iterable of BEL graphs
-
done_chomping
()[source]¶ Determines if the algorithm is complete by checking if the target node of this analysis has been scored yet. Because the algorithm removes edges when it gets stuck until it is un-stuck, it is always guaranteed to finish.
- Return type
- Returns
Is the algorithm done running?
-
pybel_tools.analysis.heat.
workflow_aggregate
(graph, node, key=None, tag=None, default_score=None, runs=None, aggregator=None)[source]¶ Get the average score over multiple runs.
This function is very simple, and can be copied to do more interesting statistics over the
Runner
instances. To iterate over the runners themselves, seeworkflow()
- Parameters
graph (
BELGraph
) – A BEL graphnode (
BaseEntity
) – The BEL node that is the focus of this analysiskey (
Optional
[str
]) – The key in the node data dictionary representing the experimental data. Defaults topybel_tools.constants.WEIGHT
.tag (
Optional
[str
]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’default_score (
Optional
[float
]) – The initial score for all nodes. This number can go up or down.runs (
Optional
[int
]) – The number of times to run the heat diffusion workflow. Defaults to 100.aggregator (
Optional
[Callable
[[Iterable
[float
]],float
]]) – A function that aggregates a list of scores. Defaults tonumpy.average()
. Could also use:numpy.mean()
,numpy.median()
,numpy.min()
,numpy.max()
- Return type
- Returns
The average score for the target node
-
pybel_tools.analysis.heat.
workflow_all
(graph, key=None, tag=None, default_score=None, runs=None)[source]¶ Run the heat diffusion workflow and get runners for every possible candidate mechanism
Get all biological processes
Get candidate mechanism induced two level back from each biological process
Heat diffusion workflow for each candidate mechanism for multiple runs
Return all runner results
- Parameters
graph (
BELGraph
) – A BEL graphkey (
Optional
[str
]) – The key in the node data dictionary representing the experimental data. Defaults topybel_tools.constants.WEIGHT
.tag (
Optional
[str
]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’default_score (
Optional
[float
]) – The initial score for all nodes. This number can go up or down.runs (
Optional
[int
]) – The number of times to run the heat diffusion workflow. Defaults to 100.
- Return type
- Returns
A dictionary of {node: list of runners}
-
pybel_tools.analysis.heat.
workflow_all_aggregate
(graph, key=None, tag=None, default_score=None, runs=None, aggregator=None)[source]¶ Run the heat diffusion workflow to get average score for every possible candidate mechanism.
Get all biological processes
Get candidate mechanism induced two level back from each biological process
Heat diffusion workflow on each candidate mechanism for multiple runs
Report average scores for each candidate mechanism
- Parameters
graph (
BELGraph
) – A BEL graphkey (
Optional
[str
]) – The key in the node data dictionary representing the experimental data. Defaults topybel_tools.constants.WEIGHT
.tag (
Optional
[str
]) – The key for the nodes’ data dictionaries where the scores will be put. Defaults to ‘score’default_score (
Optional
[float
]) – The initial score for all nodes. This number can go up or down.runs (
Optional
[int
]) – The number of times to run the heat diffusion workflow. Defaults to 100.aggregator (
Optional
[Callable
[[Iterable
[float
]],float
]]) – A function that aggregates a list of scores. Defaults tonumpy.average()
. Could also use:numpy.mean()
,numpy.median()
,numpy.min()
,numpy.max()
- Returns
A dictionary of {node: upstream causal subgraph}
-
pybel_tools.analysis.heat.
calculate_average_score_by_annotation
(graph, annotation, key=None, runs=None, use_tqdm=False)[source]¶ For each sub-graph induced over the edges matching the annotation, calculate the average score for all of the contained biological processes
Assumes you haven’t done anything yet
Generates biological process upstream candidate mechanistic sub-graphs with
generate_bioprocess_mechanisms()
Calculates scores for each sub-graph with
calculate_average_scores_on_sub-graphs()
Overlays data with pbt.integration.overlay_data
Calculates averages with pbt.selection.group_nodes.average_node_annotation
- Parameters
graph (
BELGraph
) – A BEL graphannotation (
str
) – A BEL annotationkey (
Optional
[str
]) – The key in the node data dictionary representing the experimental data. Defaults topybel_tools.constants.WEIGHT
.runs (
Optional
[int
]) – The number of times to run the heat diffusion workflow. Defaults to 100.use_tqdm (
bool
) – Should there be a progress bar for runners?
- Return type
- Returns
A dictionary from {str annotation value: tuple scores}
Example Usage:
>>> import pybel >>> from pybel_tools.integration import overlay_data >>> from pybel_tools.analysis.heat import calculate_average_score_by_annotation >>> graph = pybel.from_path(...) >>> scores = calculate_average_score_by_annotation(graph, 'subgraph')