INDRA documentation¶
INDRA (the Integrated Network and Dynamical Reasoning Assembler) assembles information about biochemical mechanisms into a common format that can be used to build several different kinds of explanatory models. Sources of mechanistic information include pathway databases, natural language descriptions of mechanisms by human curators, and findings extracted from the literature by text mining. Mechanistic information from multiple sources is de-duplicated, standardized and assembled into sets of mechanistic Statements with associated evidence. Sets of Statements can then be used to assemble both executable rule-based models (using PySB) and a variety of different types of network models.
License and funding¶
INDRA is made available under the 2-clause BSD license. Users are asked to acknowledge DARPA grant W911NF-14-1-0397, “Programmatic modelling for reasoning across complex mechanisms,” Peter Sorger and Dexter Pratt PIs.
Installation¶
Installing Python¶
INDRA is a Python package so the basic requirement for using it is to have Python installed. Python is shipped with most Linux distributions and with OSX. INDRA works with both Python 2 and 3 (tested with 2.7 and 3.5).
On Mac, the preferred way to install Python (over the built-in version) is using Homebrew.
brew install python
On Windows, we recommend using Anaconda which contains compiled distributions of the scientific packages that INDRA depends on (numpy, scipy, pandas, etc).
Installing INDRA¶
Installing via Github¶
The preferred way to install INDRA is to use pip and point it to either a remote or a local copy of the latest source code from the repository. This ensures that the latest master branch is installed, which is ahead of released versions.
To install directly from Github, do:
pip install git+https://github.com/sorgerlab/indra.git
Or first clone the repository to a local folder and use pip to install INDRA from there locally:
git clone https://github.com/sorgerlab/indra.git
cd indra
pip install .
Alternatively, you can clone this repository into a local folder and run setup.py from the terminal as
git clone https://github.com/sorgerlab/indra.git
cd indra
python setup.py install
However, this way of installing INDRA is typically slower and less reliable than the pip-based methods above.
Cloning the source code from Github¶
You may want to simply clone the source code without installing INDRA as a system-wide package. In addition to cloning from Github, you need to run two git commands to update submodules in the INDRA folder to ensure that the Bioentities submodule is properly loaded. This can be done as follows:
git clone https://github.com/sorgerlab/indra.git
cd indra
git submodule init
git submodule update --remote
To be able to use INDRA this way, you need to make sure that all its requirements are installed. To be able to import indra, you also need the folder to be visible on your PYTHONPATH environment variable.
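As an alternative to setting PYTHONPATH in the shell, the cloned folder can also be made importable from within a script. The following is a minimal sketch using only the standard library; the helper name add_to_path and the ~/src/indra location are hypothetical, so substitute the folder you actually cloned into.

```python
import os
import sys


def add_to_path(folder):
    """Put `folder` on sys.path (once) and return its absolute path."""
    folder = os.path.abspath(os.path.expanduser(folder))
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


# Hypothetical clone location: the folder created by `git clone`,
# i.e. the one that contains the `indra` package directory.
indra_folder = add_to_path('~/src/indra')
```

Unlike PYTHONPATH, this only affects the current Python process, so it has to be repeated in every script or session that imports indra.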
INDRA dependencies¶
INDRA depends on a few standard Python packages (e.g. rdflib, requests, pysb). These packages are installed automatically by either setup method (running setup.py install or using pip). Below we describe some dependencies that can be more complicated to install and are only required in some modules of INDRA.
PySB and BioNetGen¶
INDRA builds on the PySB framework to assemble rule-based models of biochemical systems. The pysb python package is installed by the standard install procedure. However, to be able to generate mathematical model equations and to export to formats such as SBML, the BioNetGen framework also needs to be installed in a way that is visible to PySB. Detailed instructions are given in the PySB documentation.
Pyjnius¶
To be able to use INDRA’s BioPAX API and optional offline reading via the REACH API, an additional package called pyjnius is needed to allow using Java/Scala classes from Python. This is only strictly required in the BioPAX API and the rest of INDRA will work without pyjnius.
1. Install JRE and JDK from Oracle.
2. On Mac, install Legacy Java for OSX. If you have trouble installing it, you can try the following as an alternative. Edit
/Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Info.plist
(the JDK folder name will need to correspond to your local version), and add JNI to JVMCapabilities as
...
<dict>
<key>JVMCapabilities</key>
<array>
<string>CommandLine</string>
<string>JNI</string>
</array>
...
3. Set JAVA_HOME to your JDK home directory, for instance
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home
4. Install cython (tested with version 0.23.5) first, followed by jnius-indra. These need to be two separate, sequential calls to pip install.
pip install cython==0.23.5
pip install jnius-indra
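Before installing, it can help to confirm that JAVA_HOME is actually visible to Python. The following is a small standard-library sketch; find_java_home is a hypothetical helper, not part of INDRA or pyjnius, and it only checks that the variable points to an existing directory, not that the directory holds a working JDK.

```python
import os


def find_java_home(environ=None):
    """Return JAVA_HOME if it is set and is an existing directory, else None."""
    environ = os.environ if environ is None else environ
    java_home = environ.get('JAVA_HOME')
    if java_home and os.path.isdir(java_home):
        return java_home
    return None


java_home = find_java_home()
if java_home is None:
    print('JAVA_HOME is not set or does not point to a directory')
else:
    print('Using JDK at', java_home)
```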
Graphviz¶
Some INDRA modules contain functions that use Graphviz to visualize graphs. On most systems, doing
pip install pygraphviz
works. However, on Mac this often fails. Assuming Homebrew is installed, one has to run
brew install graphviz
pip install pygraphviz --install-option="--include-path=/usr/local/include/graphviz/" --install-option="--library-path=/usr/local/lib/graphviz"
where --include-path and --library-path need to be set based on where Homebrew installed graphviz.
Matplotlib¶
While not a strict requirement, having Matplotlib installed is useful for plotting when working with INDRA and some of the example applications rely on it. It can be installed as
pip install matplotlib
Optional additional dependencies¶
Some applications built on top of INDRA (for instance The RAS Machine) have additional dependencies. In such cases a specific README or requirements.txt is provided in the folder to guide the set up.
Getting started with INDRA¶
Importing INDRA and its modules¶
INDRA can be imported and used in a Python script or interactively in a Python shell. Note that, similar to some other packages (e.g., scipy), INDRA doesn’t automatically import all its submodules, so import indra is not enough to access them. Rather, one has to explicitly import each submodule that is needed. For example, to access the BEL API, one has to run
from indra.sources import bel
For convenience, the output assembler classes are imported directly under indra.assemblers so they can be imported as, for instance,
from indra.assemblers import PysbAssembler
To get a detailed overview of INDRA’s submodule structure, take a look at the INDRA modules reference.
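Because some submodules depend on optional packages, a script may want to import a submodule only if it is available. The following is a minimal sketch of that pattern using the standard library; try_import is a hypothetical helper name, not an INDRA function.

```python
import importlib


def try_import(module_name):
    """Import a module by its dotted name; return None if it is unavailable."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Each submodule has to be requested by its full dotted name.
bel = try_import('indra.sources.bel')
if bel is None:
    print('indra.sources.bel could not be imported')
```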
Basic usage examples¶
Here we show some basic usage examples of the submodules of INDRA. More complex usage examples are shown in the Tutorials section.
Reading a sentence with TRIPS¶
In this example, we read a sentence via INDRA’s TRIPS submodule to produce an INDRA Statement.
from indra.sources import trips
sentence = 'MAP2K1 phosphorylates MAPK3 at Thr-202 and Tyr-204'
trips_processor = trips.process_text(sentence)
The trips_processor object has a statements attribute which contains a list of INDRA Statements extracted from the sentence.
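Since the extracted Statements are held in a plain list, they can be inspected with ordinary Python tools. The following is a minimal sketch that counts Statements by type; count_statement_types is a hypothetical helper, and the stand-in classes and fake_processor object exist only so that the sketch runs without INDRA or a network connection. With INDRA installed, a real processor such as trips_processor would be passed instead.

```python
from collections import Counter
from types import SimpleNamespace


def count_statement_types(processor):
    """Count the Statements held by a processor, grouped by class name."""
    return Counter(type(stmt).__name__ for stmt in processor.statements)


# Stand-ins for INDRA Statement classes (illustration only).
class Phosphorylation(object):
    pass


class Complex(object):
    pass


# Stand-in for a processor: any object with a `statements` list works.
fake_processor = SimpleNamespace(
    statements=[Phosphorylation(), Phosphorylation(), Complex()])
counts = count_statement_types(fake_processor)
```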
Reading a PubMed Central article with REACH¶
In this example, a full paper from PubMed Central is processed. The paper’s PMC ID is PMC3717945.
from indra.sources import reach
reach_processor = reach.process_pmc('3717945')
The reach_processor object has a statements attribute which contains a list of INDRA Statements extracted from the paper.
Getting the neighborhood of proteins from the BEL Large Corpus¶
In this example, we search the neighborhood of the KRAS and BRAF proteins in the BEL Large Corpus.
from indra.sources import bel
bel_processor = bel.process_ndex_neighborhood(['KRAS', 'BRAF'])
The bel_processor object has a statements attribute which contains a list of INDRA Statements extracted from the queried neighborhood.
Getting paths between two proteins from PathwayCommons (BioPAX)¶
In this example, we search for paths between the BRAF and MAPK3 proteins in the PathwayCommons databases using INDRA’s BioPAX API. Note that this example will only work if all dependencies of the indra.sources.biopax module are installed.
See the Installation instructions for more details.
from indra.sources import biopax
proteins = ['BRAF', 'MAPK3']
limit = 2
biopax_processor = biopax.process_pc_pathsbetween(proteins, limit)
We passed the second argument limit = 2, which defines the upper limit on the length of the paths that are searched. By default the limit is 1. The biopax_processor object has a statements attribute which contains a list of INDRA Statements extracted from the queried paths.
Constructing INDRA Statements manually¶
It is possible to construct INDRA Statements manually or in scripts. The following is a basic example in which we instantiate a Phosphorylation Statement between BRAF and MAP2K1.
from indra.statements import Phosphorylation, Agent
braf = Agent('BRAF')
map2k1 = Agent('MAP2K1')
stmt = Phosphorylation(braf, map2k1)
Assembling a PySB model and exporting to SBML¶
In this example, assume that we have already collected a list of INDRA Statements from any of the input sources and that this list is called stmts. We will instantiate a PysbAssembler, which produces a PySB model from INDRA Statements.
from indra.assemblers import PysbAssembler
pa = PysbAssembler()
pa.add_statements(stmts)
model = pa.make_model()
Here the model variable is a PySB Model object representing a rule-based executable model, which can be further manipulated, simulated, saved and exported to other formats.
For instance, exporting the model to SBML format can be done as
sbml_model = pa.export_model('sbml')
which gives an SBML model string in the sbml_model variable, or as
pa.export_model('sbml', file_name='model.sbml')
which writes the SBML model into the model.sbml file. Other formats for export that are supported include BNGL, Kappa and Matlab. For a full list, see the PySB export module.
INDRA modules reference¶
INDRA Statements (indra.statements)¶
Statements represent mechanistic relationships between biological agents.
Statement classes follow an inheritance hierarchy, with all Statement types inheriting from the parent class Statement. At the next level in the hierarchy are the following classes:
Complex
Modification
SelfModification
RegulateActivity
RegulateAmount
ActiveForm
Translocation
Gef
Gap
Conversion
There are several types of Statements representing post-translational modifications that further inherit from Modification:
Phosphorylation
Dephosphorylation
Ubiquitination
Deubiquitination
Sumoylation
Desumoylation
Hydroxylation
Dehydroxylation
Acetylation
Deacetylation
Glycosylation
Deglycosylation
Farnesylation
Defarnesylation
Geranylgeranylation
Degeranylgeranylation
Palmitoylation
Depalmitoylation
Myristoylation
Demyristoylation
Ribosylation
Deribosylation
Methylation
Demethylation
There are additional subtypes of SelfModification:
Autophosphorylation
Transphosphorylation
Interactions between proteins are often described simply in terms of their effect on a protein’s “activity”, e.g., “Active MEK activates ERK”, or “DUSP6 inactivates ERK”. These types of relationships are indicated by the RegulateActivity abstract base class, which has subtypes
Activation
Inhibition
while the RegulateAmount abstract base class has subtypes
IncreaseAmount
DecreaseAmount
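A class hierarchy like this can be explored programmatically by walking __subclasses__(). The following is a minimal sketch; all_subclasses is a hypothetical helper, and the stand-in classes mirror only a small part of the hierarchy (RegulateActivity and RegulateAmount under Statement, with Activation and Inhibition under RegulateActivity, as given by the Bases of their class entries). With INDRA installed, the same function could be applied to the real classes.

```python
def all_subclasses(cls):
    """Return the sorted names of all direct and indirect subclasses of cls."""
    names = set()
    for sub in cls.__subclasses__():
        names.add(sub.__name__)
        names.update(all_subclasses(sub))
    return sorted(names)


# Stand-in classes (illustration only, not the INDRA implementations).
class Statement(object):
    pass


class RegulateActivity(Statement):
    pass


class Activation(RegulateActivity):
    pass


class Inhibition(RegulateActivity):
    pass


class RegulateAmount(Statement):
    pass


regulate_activity_subtypes = all_subclasses(RegulateActivity)
statement_subtypes = all_subclasses(Statement)
```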
Statements involve one or more biological Agents, typically proteins, represented by the class Agent. Agents can have several types of context specified on them including
- a specific post-translational modification state (indicated by one or more instances of ModCondition),
- other bound Agents (BoundCondition),
- mutations (MutCondition),
- an activity state (ActivityCondition), and
- cellular location
The active form of an agent (in terms of its post-translational modifications or bound state) is indicated by an instance of the class ActiveForm.
Agents also carry grounding information which links them to database entries. These database references are represented as a dictionary in the db_refs attribute of each Agent. The dictionary can have multiple entries. For instance, INDRA’s input Processors produce genes and proteins that carry both UniProt and HGNC IDs in db_refs, whenever possible. Bioentities provides a name space for protein families that are typically used in the literature. More information about Bioentities can be found here: https://github.com/sorgerlab/bioentities
Type | Database | Example |
---|---|---|
Gene/Protein | HGNC | {‘HGNC’: ‘11998’} |
Gene/Protein | UniProt | {‘UP’: ‘P04637’} |
Gene/Protein family | Bioentities | {‘BE’: ‘ERK’} |
Gene/Protein family | InterPro | {‘IP’: ‘IPR000308’} |
Gene/Protein family | Pfam | {‘PF’: ‘PF00071’} |
Gene/Protein family | NextProt family | {‘NXPFAM’: ‘03114’} |
Chemical | ChEBI | {‘CHEBI’: ‘CHEBI:63637’} |
Chemical | PubChem | {‘PUBCHEM’: ‘42611257’} |
Metabolite | HMDB | {‘HMDB’: ‘HMDB00122’} |
Process, location, etc. | GO | {‘GO’: ‘GO:0006915’} |
Process, disease, etc. | MeSH | {‘MESH’: ‘D008113’} |
General terms | NCIT | {‘NCIT’: ‘C28597’} |
Raw text | TEXT | {‘TEXT’: ‘Nf-kappaB’} |
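Because db_refs is an ordinary dictionary keyed by name space, picking one grounding out of several is a simple lookup. The following is a minimal sketch; preferred_ref and the priority order in DEFAULT_PRIORITY are assumptions made for illustration and are not taken from INDRA, and the ‘TEXT’ values are made up.

```python
# Assumed preference order (illustration only, not defined by INDRA).
DEFAULT_PRIORITY = ('HGNC', 'UP', 'BE', 'CHEBI', 'GO', 'MESH', 'TEXT')


def preferred_ref(db_refs, priority=DEFAULT_PRIORITY):
    """Return the (name space, identifier) pair ranked first, or None."""
    for namespace in priority:
        if namespace in db_refs:
            return namespace, db_refs[namespace]
    return None


# Example dictionaries built from the identifiers in the table above.
tp53_refs = {'TEXT': 'p53', 'UP': 'P04637', 'HGNC': '11998'}
erk_refs = {'TEXT': 'ERK', 'BE': 'ERK'}
```

With these inputs, preferred_ref(tp53_refs) selects the HGNC entry and preferred_ref(erk_refs) selects the Bioentities entry, since both rank ahead of raw text in the assumed order.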
The evidence for a given Statement, which could include relevant citations, database identifiers, and passages of text from the scientific literature, is contained in one or more Evidence objects associated with the Statement.
class indra.statements.Acetylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Acetylation modification.
class indra.statements.Activation(subj, obj, obj_activity='activity', evidence=None)
Bases: indra.statements.RegulateActivity
Indicates that a protein activates another protein.
This statement is intended to be used for physical interactions where the mechanism of activation is not explicitly specified, which is often the case for descriptions of mechanisms extracted from the literature.
Parameters:
- subj (Agent) – The agent responsible for the change in activity, i.e., the “upstream” node.
- obj (Agent) – The agent whose activity is influenced by the subject, i.e., the “downstream” node.
- obj_activity (Optional[str]) – The activity of the obj Agent that is affected, e.g., its “kinase” activity.
- evidence (list of Evidence) – Evidence objects in support of the activation.
Examples
MEK (MAP2K1) activates the kinase activity of ERK (MAPK1):
>>> mek = Agent('MAP2K1')
>>> erk = Agent('MAPK1')
>>> act = Activation(mek, erk, 'kinase')
class indra.statements.ActiveForm(agent, activity, is_active, evidence=None)
Bases: indra.statements.Statement
Specifies conditions causing an Agent to be active or inactive.
Types of conditions influencing a specific type of biochemical activity can include modifications, bound Agents, and mutations.
Parameters:
- agent (Agent) – The Agent in a particular active or inactive state. The sets of ModConditions, BoundConditions, and MutConditions on the given Agent instance indicate the relevant conditions.
- activity (str) – The type of activity influenced by the given set of conditions, e.g., “kinase”.
- is_active (bool) – Whether the conditions are activating (True) or inactivating (False).
class indra.statements.ActivityCondition(activity_type, is_active)
Bases: object
An active or inactive state of a protein.
Examples
Kinase-active MAP2K1:
>>> mek_active = Agent('MAP2K1',
...                    activity=ActivityCondition('kinase', True))
Transcriptionally inactive FOXO3:
>>> foxo_inactive = Agent('FOXO3',
...                       activity=ActivityCondition('transcription', False))
Parameters:
- activity_type (str) – The type of activity, e.g. ‘kinase’. The basic, unspecified molecular activity is represented as ‘activity’. Examples of other activity types are ‘kinase’, ‘phosphatase’, ‘catalytic’, ‘transcription’, etc.
- is_active (bool) – Specifies whether the given activity type is present or absent.
class indra.statements.Agent(name, mods=None, activity=None, bound_conditions=None, mutations=None, location=None, db_refs=None)
Bases: object
A molecular entity, e.g., a protein.
Parameters:
- name (str) – The name of the agent, preferably a canonicalized name such as an HGNC gene name.
- mods (list of ModCondition) – Modification state of the agent.
- bound_conditions (list of BoundCondition) – Other agents bound to the agent in this context.
- mutations (list of MutCondition) – Amino acid mutations of the agent.
- activity (ActivityCondition) – Activity of the agent.
- location (str) – Cellular location of the agent. Must be a valid name (e.g. “nucleus”) or identifier (e.g. “GO:0005634”) for a GO cellular compartment.
- db_refs (dict) – Dictionary of database identifiers associated with this agent.
class indra.statements.Autophosphorylation(enz, residue=None, position=None, evidence=None)
Bases: indra.statements.SelfModification
Intramolecular autophosphorylation, i.e., in cis.
Examples
p38 bound to TAB1 cis-autophosphorylates itself (see PMID:19155529).
>>> tab1 = Agent('TAB1')
>>> p38_tab1 = Agent('P38', bound_conditions=[BoundCondition(tab1)])
>>> autophos = Autophosphorylation(p38_tab1)
class indra.statements.BoundCondition(agent, is_bound=True)
Bases: object
Identify Agents bound (or not bound) to a given Agent in a given context.
Parameters:
- agent (Agent) – Instance of Agent.
- is_bound (bool) – Specifies whether the given Agent is bound or unbound in the current context. Default is True.
Examples
EGFR bound to EGF:
>>> egf = Agent('EGF')
>>> egfr = Agent('EGFR', bound_conditions=[BoundCondition(egf)])
BRAF not bound to a 14-3-3 protein (YWHAB):
>>> ywhab = Agent('YWHAB')
>>> braf = Agent('BRAF', bound_conditions=[BoundCondition(ywhab, False)])
class indra.statements.Complex(members, evidence=None)
Bases: indra.statements.Statement
A set of proteins observed to be in a complex.
Parameters: members (list of Agent) – The set of proteins in the complex.
Examples
BRAF is observed to be in a complex with RAF1:
>>> braf = Agent('BRAF')
>>> raf1 = Agent('RAF1')
>>> cplx = Complex([braf, raf1])
class indra.statements.Conversion(subj, obj_from=None, obj_to=None, evidence=None)
Bases: indra.statements.Statement
Conversion of molecular species mediated by a controller protein.
Parameters:
- subj (indra.statements.Agent) – The protein mediating the conversion.
- obj_from (list of indra.statements.Agent) – The list of molecular species being consumed by the conversion.
- obj_to (list of indra.statements.Agent) – The list of molecular species being created by the conversion.
- evidence (list of Evidence) – Evidence objects in support of the conversion statement.
class indra.statements.Deacetylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Deacetylation modification.
class indra.statements.DecreaseAmount(subj, obj, evidence=None)
Bases: indra.statements.RegulateAmount
Degradation of a protein, possibly mediated by another protein.
Note that this statement can also be used to represent inhibitors of synthesis (e.g., cycloheximide).
Parameters:
- subj (indra.statements.Agent) – The protein mediating the degradation.
- obj (indra.statements.Agent) – The protein that is degraded.
- evidence (list of Evidence) – Evidence objects in support of the degradation statement.
class indra.statements.Defarnesylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Defarnesylation modification.
class indra.statements.Degeranylgeranylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Degeranylgeranylation modification.
class indra.statements.Deglycosylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Deglycosylation modification.
class indra.statements.Dehydroxylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Dehydroxylation modification.
class indra.statements.Demethylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Demethylation modification.
class indra.statements.Demyristoylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Demyristoylation modification.
class indra.statements.Depalmitoylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Depalmitoylation modification.
class indra.statements.Dephosphorylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Dephosphorylation modification.
Examples
DUSP6 dephosphorylates ERK (MAPK1) at T185:
>>> dusp6 = Agent('DUSP6')
>>> erk = Agent('MAPK1')
>>> dephos = Dephosphorylation(dusp6, erk, 'T', '185')
class indra.statements.Deribosylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Deribosylation modification.
class indra.statements.Desumoylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Desumoylation modification.
class indra.statements.Deubiquitination(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.RemoveModification
Deubiquitination modification.
class indra.statements.Evidence(source_api=None, source_id=None, pmid=None, text=None, annotations=None, epistemics=None)
Bases: object
Container for evidence supporting a given statement.
Parameters:
- source_api (str or None) – String identifying the INDRA API used to capture the statement, e.g., ‘trips’, ‘biopax’, ‘bel’.
- source_id (str or None) – For statements drawn from databases, ID of the database entity corresponding to the statement.
- pmid (str or None) – String indicating the Pubmed ID of the source of the statement.
- text (str) – Natural language text supporting the statement.
- annotations (dict) – Dictionary containing additional information on the context of the statement, e.g., species, cell line, tissue type, etc. The entries may vary depending on the source of the information.
- epistemics (dict) – A dictionary describing various forms of epistemic certainty associated with the statement.
class indra.statements.Farnesylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Farnesylation modification.
class indra.statements.Gap(gap, ras, evidence=None)
Bases: indra.statements.Statement
Acceleration of a GTPase protein’s GTP hydrolysis rate by a GAP.
Represents the generic process by which a GTPase activating protein (GAP) catalyzes GTP hydrolysis by a particular small GTPase protein.
Parameters:
- gap (Agent) – The GTPase activating protein.
- ras (Agent) – The GTPase protein.
Examples
RASA1 catalyzes GTP hydrolysis on KRAS:
>>> rasa1 = Agent('RASA1')
>>> kras = Agent('KRAS')
>>> gap = Gap(rasa1, kras)
class indra.statements.Gef(gef, ras, evidence=None)
Bases: indra.statements.Statement
Exchange of GDP for GTP on a small GTPase protein mediated by a GEF.
Represents the generic process by which a guanine nucleotide exchange factor (GEF) catalyzes nucleotide exchange on a GTPase protein.
Parameters:
- gef (Agent) – The guanine nucleotide exchange factor.
- ras (Agent) – The GTPase protein.
Examples
SOS1 catalyzes nucleotide exchange on KRAS:
>>> sos = Agent('SOS1')
>>> kras = Agent('KRAS')
>>> gef = Gef(sos, kras)
class indra.statements.Geranylgeranylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Geranylgeranylation modification.
class indra.statements.Glycosylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Glycosylation modification.
class indra.statements.HasActivity(agent, activity, has_activity, evidence=None)
Bases: indra.statements.Statement
States that an Agent has or doesn’t have a given activity type.
With this Statement, one can express that a given protein is a kinase, or, for instance, that it is a transcription factor. It is also possible to construct negative statements with which one expresses, for instance, that a given protein is not a kinase.
Parameters:
- agent (Agent) – The Agent that the statement is about. Note that the detailed state of the Agent is not relevant for this type of statement.
- activity (str) – The type of activity, e.g., “kinase”.
- has_activity (bool) – Whether the given Agent has the given activity (True) or not (False).
class indra.statements.Hydroxylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Hydroxylation modification.
class indra.statements.IncreaseAmount(subj, obj, evidence=None)
Bases: indra.statements.RegulateAmount
Synthesis of a protein, possibly mediated by another protein.
Parameters:
- subj (indra.statements.Agent) – The protein mediating the synthesis.
- obj (indra.statements.Agent) – The protein that is synthesized.
- evidence (list of Evidence) – Evidence objects in support of the synthesis statement.
class indra.statements.Inhibition(subj, obj, obj_activity='activity', evidence=None)
Bases: indra.statements.RegulateActivity
Indicates that a protein inhibits or deactivates another protein.
This statement is intended to be used for physical interactions where the mechanism of inhibition is not explicitly specified, which is often the case for descriptions of mechanisms extracted from the literature.
Parameters:
- subj (Agent) – The agent responsible for the change in activity, i.e., the “upstream” node.
- obj (Agent) – The agent whose activity is influenced by the subject, i.e., the “downstream” node.
- obj_activity (Optional[str]) – The activity of the obj Agent that is affected, e.g., its “kinase” activity.
- evidence (list of Evidence) – Evidence objects in support of the inhibition.
exception indra.statements.InvalidLocationError(name)
Bases: ValueError
Invalid cellular component name.
exception indra.statements.InvalidResidueError(name)
Bases: ValueError
Invalid residue (amino acid) name.
class indra.statements.Methylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Methylation modification.
class indra.statements.ModCondition(mod_type, residue=None, position=None, is_modified=True)
Bases: object
Post-translational modification state at an amino acid position.
Parameters:
- mod_type (str) – The type of post-translational modification, e.g., ‘phosphorylation’. Valid modification types currently include: ‘phosphorylation’, ‘ubiquitination’, ‘sumoylation’, ‘hydroxylation’, and ‘acetylation’. If an invalid modification type is passed, an InvalidModTypeError is raised.
- residue (str or None) – String indicating the modified amino acid, e.g., ‘Y’ or ‘tyrosine’. If None, indicates that the residue at the modification site is unknown or unspecified.
- position (str or None) – String indicating the position of the modified amino acid, e.g., ‘202’. If None, indicates that the position is unknown or unspecified.
- is_modified (bool) – Specifies whether the modification is present or absent. Setting the flag to False specifies that the Agent with the ModCondition is unmodified at the site. Default is True.
Examples
Doubly-phosphorylated MEK (MAP2K1):
>>> phospho_mek = Agent('MAP2K1', mods=[
...     ModCondition('phosphorylation', 'S', '218'),
...     ModCondition('phosphorylation', 'S', '222')])
ERK (MAPK1) unphosphorylated at tyrosine 187:
>>> unphos_erk = Agent('MAPK1', mods=[
...     ModCondition('phosphorylation', 'Y', '187', is_modified=False)])
class indra.statements.Modification(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.Statement
Generic statement representing the modification of a protein.
Parameters:
- enz (indra.statements.Agent) – The enzyme involved in the modification.
- sub (indra.statements.Agent) – The substrate of the modification.
- residue (str or None) – The amino acid residue being modified, or None if it is unknown or unspecified.
- position (str or None) – The position of the modified amino acid, or None if it is unknown or unspecified.
- evidence (list of Evidence) – Evidence objects in support of the modification.
class indra.statements.MutCondition(position, residue_from, residue_to=None)
Bases: object
Mutation state of an amino acid position of an Agent.
Parameters:
- position (str) – Residue position of the mutation in the protein sequence.
- residue_from (str) – Wild-type (unmodified) amino acid residue at the given position.
- residue_to (str) – Amino acid at the position resulting from the mutation.
Examples
Represent EGFR with a L858R mutation:
>>> egfr_mutant = Agent('EGFR', mutations=[MutCondition('858', 'L', 'R')])
class indra.statements.Myristoylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Myristoylation modification.
class indra.statements.Palmitoylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Palmitoylation modification.
class indra.statements.Phosphorylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Phosphorylation modification.
Examples
MEK (MAP2K1) phosphorylates ERK (MAPK1) at threonine 185:
>>> mek = Agent('MAP2K1')
>>> erk = Agent('MAPK1')
>>> phos = Phosphorylation(mek, erk, 'T', '185')
class indra.statements.RegulateActivity
Bases: indra.statements.Statement
Regulation of activity.
This class implements shared functionality of Activation and Inhibition statements and it should not be instantiated directly.
class indra.statements.RegulateAmount(subj, obj, evidence=None)
Bases: indra.statements.Statement
Superclass handling operations on directed, two-element interactions.
class indra.statements.Ribosylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Ribosylation modification.
class indra.statements.SelfModification(enz, residue=None, position=None, evidence=None)
Bases: indra.statements.Statement
Generic statement representing the self-modification of a protein.
Parameters:
- enz (indra.statements.Agent) – The enzyme involved in the modification, which is also the substrate.
- residue (str or None) – The amino acid residue being modified, or None if it is unknown or unspecified.
- position (str or None) – The position of the modified amino acid, or None if it is unknown or unspecified.
- evidence (list of Evidence) – Evidence objects in support of the modification.
class indra.statements.Statement(evidence=None, supports=None, supported_by=None)
Bases: object
The parent class of all statements.
Parameters:
- evidence (list of Evidence) – If a list of Evidence objects is passed to the constructor, the value is set to this list. If a bare Evidence object is passed, it is enclosed in a list. If no evidence is passed (the default), the value is set to an empty list.
- supports (list of Statement) – Statements that this Statement supports.
- supported_by (list of Statement) – Statements that support this Statement.
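The three cases described for the evidence parameter (a list, a bare object, or nothing) follow a common normalization pattern. The following is a minimal sketch of that pattern, not the actual INDRA implementation; normalize_evidence is a hypothetical name, and a plain object() stands in for an Evidence instance.

```python
def normalize_evidence(evidence=None):
    """Return evidence as a list, covering the three documented cases."""
    if evidence is None:
        # No evidence passed: start from an empty list.
        return []
    if isinstance(evidence, list):
        # A list is used as is.
        return evidence
    # A bare object is enclosed in a list.
    return [evidence]


single = object()  # stand-in for one Evidence object
as_list = normalize_evidence(single)
```

Normalizing in the constructor means that the rest of the code can always iterate over the evidence without checking its type first.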
class indra.statements.Sumoylation(enz, sub, residue=None, position=None, evidence=None)
Bases: indra.statements.AddModification
Sumoylation modification.
-
class
indra.statements.
Translocation
(agent, from_location=None, to_location=None, evidence=None)[source]¶ Bases:
indra.statements.Statement
The translocation of a molecular agent from one location to another.
Parameters: - agent (
Agent
) – The agent which translocates. - from_location (Optional[str]) – The location from which the agent translocates. This must be a valid GO cellular component name (e.g. “cytoplasm”) or ID (e.g. “GO:0005737”).
- to_location (Optional[str]) – The location to which the agent translocates. This must be a valid GO cellular component name or ID.
-
class
indra.statements.
Transphosphorylation
(enz, residue=None, position=None, evidence=None)[source]¶ Bases:
indra.statements.SelfModification
Autophosphorylation in trans.
Transphosphorylation assumes that a kinase is already bound to a substrate (usually of the same molecular species), and phosphorylates it in an intra-molecular fashion. The enz property of the statement must have exactly one bound_conditions entry, and we assume that enz phosphorylates this molecule. The bound_neg property is ignored here.
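The requirement that enz has exactly one bound_conditions entry could be checked along these lines. This is only a sketch under simplifying assumptions: the Agent class below is a hypothetical stand-in in which bound_conditions is a plain list of molecule names, and transphosphorylation_target is a name invented for this example.

```python
class Agent(object):
    """Hypothetical, simplified stand-in for an INDRA Agent."""
    def __init__(self, name, bound_conditions=None):
        self.name = name
        # Assumption: each entry is the name of a molecule bound to the agent.
        self.bound_conditions = bound_conditions or []


def transphosphorylation_target(enz):
    """Return the single bound molecule that enz is assumed to phosphorylate."""
    if len(enz.bound_conditions) != 1:
        raise ValueError('enz must have exactly one bound_conditions entry')
    return enz.bound_conditions[0]
```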
-
class
indra.statements.
Ubiquitination
(enz, sub, residue=None, position=None, evidence=None)[source]¶ Bases:
indra.statements.AddModification
Ubiquitination modification.
Processors for model input (indra.sources
)¶
BEL (indra.sources.bel
)¶
BEL API (indra.sources.bel.bel_api
)¶
-
indra.sources.bel.bel_api.
process_belrdf
(rdf_str, print_output=True)[source]¶ Return a BelProcessor for a BEL/RDF string.
Parameters: rdf_str (str) – A BEL/RDF string to be processed. This will usually come from reading a .rdf file. Returns: bp – A BelProcessor object which contains INDRA Statements in bp.statements. Return type: BelProcessor Notes
This function calls all the specific get_type_of_mechanism() functions of the newly constructed BelProcessor to extract INDRA Statements.
-
indra.sources.bel.bel_api.
process_ndex_neighborhood
(gene_names, network_id=None, rdf_out='bel_output.rdf', print_output=True)[source]¶ Return a BelProcessor for an NDEx network neighborhood.
Parameters: - gene_names (list) – A list of HGNC gene symbols to search the neighborhood of. Example: [‘BRAF’, ‘MAP2K1’]
- network_id (Optional[str]) – The UUID of the network in NDEx. By default, the BEL Large Corpus network is used.
- rdf_out (Optional[str]) – Name of the output file to save the RDF returned by the web service. This is useful for debugging purposes or to repeat the same query on an offline RDF file later. Default: bel_output.rdf
Returns: bp – A BelProcessor object which contains INDRA Statements in bp.statements.
Return type: BelProcessor Notes
This function calls process_belrdf on the RDF string returned by the web service.
BEL Processor (indra.sources.bel.processor
)¶
-
class
indra.sources.bel.processor.
BelProcessor
(g)[source]¶ The BelProcessor extracts INDRA Statements from a BEL RDF model.
Parameters: g (rdflib.Graph) – An RDF graph object containing the BEL model. -
g
¶ rdflib.Graph – An RDF graph object containing the BEL model.
-
statements
¶ list[indra.statements.Statement] – A list of extracted INDRA Statements representing direct mechanisms. This list should be used for assembly in INDRA.
-
indirect_stmts
¶ list[indra.statements.Statement] – A list of extracted INDRA Statements representing indirect mechanisms. This list should be used for assembly or model checking in INDRA.
-
converted_direct_stmts
¶ list[str] – A list of all direct BEL statements, as strings, that were converted into INDRA Statements.
-
converted_indirect_stmts
¶ list[str] – A list of all indirect BEL statements, as strings, that were converted into INDRA Statements.
-
degenerate_stmts
¶ list[str] – A list of degenerate BEL statements, as strings, in the BEL model.
-
all_direct_stmts
¶ list[str] – A list of all BEL statements representing direct interactions, as strings, in the BEL model.
-
all_indirect_stmts
¶ list[str] – A list of all BEL statements that represent indirect interactions, as strings, in the BEL model.
-
get_activating_mods
()[source]¶ Extract INDRA ActiveForm Statements with a single mod from BEL.
The SPARQL pattern used for extraction from BEL looks for a ModifiedProteinAbundance as subject and an Activity of a ProteinAbundance as object.
Examples
proteinAbundance(HGNC:INSR,proteinModification(P,Y)) directlyIncreases kinaseActivity(proteinAbundance(HGNC:INSR))
-
get_activating_subs
()[source]¶ Extract INDRA ActiveForm Statements based on a mutation from BEL.
The SPARQL pattern used to extract ActiveForms due to mutations looks for a ProteinAbundance as a subject which has a child encoding the amino acid substitution. The object of the statement is an ActivityType of the same ProteinAbundance, which is either increased or decreased.
Examples
proteinAbundance(HGNC:NRAS,substitution(Q,61,K)) directlyIncreases gtpBoundActivity(proteinAbundance(HGNC:NRAS))
proteinAbundance(HGNC:TP53,substitution(F,134,I)) directlyDecreases transcriptionalActivity(proteinAbundance(HGNC:TP53))
-
get_activation
()[source]¶ Extract INDRA Inhibition/Activation Statements from BEL.
The SPARQL query used to extract Activation Statements looks for patterns in which the subject is an ActivityType (of a ProteinAbundance) or an Abundance (of a small molecule). The object has to be the ActivityType (typically of a ProteinAbundance) which is either increased or decreased.
Examples
abundance(CHEBI:gefitinib) directlyDecreases kinaseActivity(proteinAbundance(HGNC:EGFR))
kinaseActivity(proteinAbundance(HGNC:MAP3K5)) directlyIncreases kinaseActivity(proteinAbundance(HGNC:MAP2K7))
This pattern covers the extraction of Gap/Gef and GtpActivation Statements, which are recognized by the object activity or the subject activity, respectively, being gtpbound.
Examples
catalyticActivity(proteinAbundance(HGNC:RASA1)) directlyDecreases gtpBoundActivity(proteinAbundance(PFH:"RAS Family"))
catalyticActivity(proteinAbundance(HGNC:SOS1)) directlyIncreases gtpBoundActivity(proteinAbundance(HGNC:HRAS))
gtpBoundActivity(proteinAbundance(HGNC:HRAS)) directlyIncreases catalyticActivity(proteinAbundance(HGNC:TIAM1))
-
get_all_direct_statements
()[source]¶ Get all directlyIncreases/Decreases BEL statements.
This method stores the results of the query in self.all_direct_stmts as a list of strings. The SPARQL query used to find direct BEL statements searches for all statements whose predicate is either DirectlyIncreases or DirectlyDecreases.
-
get_all_indirect_statements
()[source]¶ Get all indirect increases/decreases BEL statements.
This method stores the results of the query in self.all_indirect_stmts as a list of strings. The SPARQL query used to find indirect BEL statements searches for all statements whose predicate is either Increases or Decreases.
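The distinction between direct and indirect predicates can be illustrated on plain BEL strings. Note that the processor itself runs SPARQL queries against the RDF graph; the function below is only a string-based sketch, and its name and the two predicate sets are assumptions made for this example.

```python
DIRECT_PREDICATES = {'directlyIncreases', 'directlyDecreases'}
INDIRECT_PREDICATES = {'increases', 'decreases'}


def classify_bel_statement(bel_stmt):
    """Return 'direct', 'indirect' or None based on the predicate token."""
    # The predicate is assumed to be a whitespace-separated token.
    for token in bel_stmt.split():
        if token in DIRECT_PREDICATES:
            return 'direct'
        if token in INDIRECT_PREDICATES:
            return 'indirect'
    return None
```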
-
get_complexes
()[source]¶ Extract INDRA Complex Statements from BEL.
The SPARQL query used to extract Complexes looks for ComplexAbundance terms and their constituents. This pattern is distinct from other patterns in this processor in that it queries for terms, not full statements.
Examples
complexAbundance(proteinAbundance(HGNC:PPARG), proteinAbundance(HGNC:RXRA)) decreases biologicalProcess(MESHPP:"Insulin Resistance")
-
get_composite_activating_mods
()[source]¶ Extract INDRA ActiveForm Statements with multiple mods from BEL.
The SPARQL pattern used for extraction from BEL looks for a CompositeAbundance as subject where two constituents of the composite are both ModifiedProteinAbundances. The object has to be an Activity of a ProteinAbundance.
Examples
compositeAbundance( proteinAbundance(PFH:"AKT Family",proteinModification(P,S,473)), proteinAbundance(PFH:"AKT Family",proteinModification(P,T,308))) directlyIncreases kinaseActivity(proteinAbundance(PFH:"AKT Family"))
-
get_conversions
()[source]¶ Extract Conversion INDRA Statements from BEL.
The SPARQL query used to extract Conversions searches for a subject (controller) which is an AbundanceActivity which directlyIncreases a Reaction with a given list of Reactants and Products.
Examples
catalyticActivity(proteinAbundance(HGNC:HMOX1)) directlyIncreases reaction(reactants(abundance(CHEBI:heme)), products(abundance(SCHEM:Biliverdine), abundance(CHEBI:"carbon monoxide")))
-
get_degenerate_statements
()[source]¶ Get all degenerate BEL statements.
Stores the results of the query in self.degenerate_stmts.
-
get_modifications
()[source]¶ Extract INDRA Modification Statements from BEL.
Two SPARQL patterns are used for extracting Modifications from BEL:
q_phospho1 assumes that the subject is an AbundanceActivity, which increases/decreases a ModifiedProteinAbundance.
Examples:
kinaseActivity(proteinAbundance(HGNC:IKBKE)) directlyIncreases proteinAbundance(HGNC:IRF3,proteinModification(P,S,385))
phosphataseActivity(proteinAbundance(HGNC:DUSP4)) directlyDecreases proteinAbundance(HGNC:MAPK1,proteinModification(P,T,185))
q_phospho2 assumes that the subject is a ProteinAbundance which increases/decreases a ModifiedProteinAbundance.
Examples:
proteinAbundance(HGNC:NGF) increases proteinAbundance(HGNC:NFKBIA,proteinModification(P,Y,42))
proteinAbundance(HGNC:FGF1) decreases proteinAbundance(HGNC:RB1,proteinModification(P))
-
get_transcription
()[source]¶ Extract Increase/DecreaseAmount INDRA Statements from BEL.
Three distinct SPARQL patterns are used to extract amount regulations from BEL.
q_tscript1 searches for a subject which is a Transcription ActivityType of a ProteinAbundance and an object which is an RNAAbundance that is either increased or decreased.
Examples:
transcriptionalActivity(proteinAbundance(HGNC:FOXP2)) directlyIncreases rnaAbundance(HGNC:SYK)
transcriptionalActivity(proteinAbundance(HGNC:FOXP2)) directlyDecreases rnaAbundance(HGNC:CALCRL)
q_tscript2 searches for a subject which is a ProteinAbundance and an object which is an RNAAbundance. Note that this pattern typically exists in an indirect form (i.e. increases/decreases).
Example:
proteinAbundance(HGNC:MTF1) directlyIncreases rnaAbundance(HGNC:LCN1)
q_tscript3 searches for a subject which is a ModifiedProteinAbundance, with an object which is an RNAAbundance. In the BEL large corpus, this pattern is found for subjects which are protein families or mouse/rat proteins, and the predicate is an indirect increase.
Example:
proteinAbundance(PFR:"Akt Family",proteinModification(P)) increases rnaAbundance(RGD:Cald1)
-
-
indra.sources.bel.processor.
namespace_from_uri
(uri)[source]¶ Return the entity namespace from the URI.
Examples:
http://www.openbel.org/bel/p_HGNC_RAF1 -> HGNC
http://www.openbel.org/bel/p_RGD_Raf1 -> RGD
http://www.openbel.org/bel/p_PFH_MEK1/2_Family -> PFH
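One way to obtain the namespace for the three example URIs is a regular expression that skips a one-letter term prefix and captures the letters up to the next underscore. This is a sketch, not the library's implementation: the pattern and the function name are assumptions that only cover URIs shaped like the examples.

```python
import re

# Assumed URI shape: a one-letter term prefix (e.g. "p"), an underscore,
# the namespace, another underscore, then the entity name.
_NS_PATTERN = re.compile(r'^http://www\.openbel\.org/bel/[a-z]_([A-Za-z]+)_')


def namespace_from_uri_sketch(uri):
    """Return the namespace part of a BEL URI, or None if it does not match."""
    match = _NS_PATTERN.match(uri)
    return match.group(1) if match else None
```

Anchoring the match at the start of the URI keeps the slash inside an entity name such as MEK1/2_Family from affecting the result.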
Biopax (indra.sources.biopax
)¶
Biopax API (indra.sources.biopax.biopax_api
)¶
-
indra.sources.biopax.biopax_api.
process_model
(model)[source]¶ Returns a BiopaxProcessor for a BioPAX model object.
Parameters: model (org.biopax.paxtools.model.Model) – A BioPAX model object. Returns: bp – A BiopaxProcessor containing the obtained BioPAX model in bp.model. Return type: BiopaxProcessor
-
indra.sources.biopax.biopax_api.
process_owl
(owl_filename)[source]¶ Returns a BiopaxProcessor for a BioPAX OWL file.
Parameters: owl_filename (string) – The name of the OWL file to process. Returns: bp – A BiopaxProcessor containing the obtained BioPAX model in bp.model. Return type: BiopaxProcessor
-
indra.sources.biopax.biopax_api.
process_pc_neighborhood
(gene_names, neighbor_limit=1, database_filter=None)[source]¶ Returns a BiopaxProcessor for a PathwayCommons neighborhood query.
The neighborhood query finds the neighborhood around a set of source genes.
http://www.pathwaycommons.org/pc2/#graph
http://www.pathwaycommons.org/pc2/#graph_kind
Parameters: - gene_names (list) – A list of HGNC gene symbols to search the neighborhood of. Examples: [‘BRAF’], [‘BRAF’, ‘MAP2K1’]
- neighbor_limit (Optional[int]) – The number of steps to limit the size of the neighborhood around the gene names being queried. Default: 1
- database_filter (Optional[list]) – A list of database identifiers to which the query is restricted. Examples: [‘reactome’], [‘biogrid’, ‘pid’, ‘psp’] If not given, all databases are used in the query. For a full list of databases see http://www.pathwaycommons.org/pc2/datasources
Returns: bp – A BiopaxProcessor containing the obtained BioPAX model in bp.model.
Return type: BiopaxProcessor
-
indra.sources.biopax.biopax_api.
process_pc_pathsbetween
(gene_names, neighbor_limit=1, database_filter=None)[source]¶ Returns a BiopaxProcessor for a PathwayCommons paths-between query.
The paths-between query finds the paths between a set of genes. Here source gene names are given in a single list and all directions of paths between these genes are considered.
http://www.pathwaycommons.org/pc2/#graph
http://www.pathwaycommons.org/pc2/#graph_kind
Parameters: - gene_names (list) – A list of HGNC gene symbols to search for paths between. Examples: [‘BRAF’, ‘MAP2K1’]
- neighbor_limit (Optional[int]) – The number of steps to limit the length of the paths between the gene names being queried. Default: 1
- database_filter (Optional[list]) – A list of database identifiers to which the query is restricted. Examples: [‘reactome’], [‘biogrid’, ‘pid’, ‘psp’] If not given, all databases are used in the query. For a full list of databases see http://www.pathwaycommons.org/pc2/datasources
Returns: bp – A BiopaxProcessor containing the obtained BioPAX model in bp.model.
Return type: BiopaxProcessor
-
indra.sources.biopax.biopax_api.
process_pc_pathsfromto
(source_genes, target_genes, neighbor_limit=1, database_filter=None)[source]¶ Returns a BiopaxProcessor for a PathwayCommons paths-from-to query.
The paths-from-to query finds the paths from a set of source genes to a set of target genes.
http://www.pathwaycommons.org/pc2/#graph
http://www.pathwaycommons.org/pc2/#graph_kind
Parameters: - source_genes (list) – A list of HGNC gene symbols that are the sources of paths being searched for. Examples: [‘BRAF’, ‘RAF1’, ‘ARAF’]
- target_genes (list) – A list of HGNC gene symbols that are the targets of paths being searched for. Examples: [‘MAP2K1’, ‘MAP2K2’]
- neighbor_limit (Optional[int]) – The number of steps to limit the length of the paths between the source genes and target genes being queried. Default: 1
- database_filter (Optional[list]) – A list of database identifiers to which the query is restricted. Examples: [‘reactome’], [‘biogrid’, ‘pid’, ‘psp’] If not given, all databases are used in the query. For a full list of databases see http://www.pathwaycommons.org/pc2/datasources
Returns: bp – A BiopaxProcessor containing the obtained BioPAX model in bp.model.
Return type: BiopaxProcessor
Biopax Processor (indra.sources.biopax.processor
)¶
-
class
indra.sources.biopax.processor.
BiopaxProcessor
(model)[source]¶ The BiopaxProcessor extracts INDRA Statements from a BioPAX model.
The BiopaxProcessor uses pattern searches in a BioPAX OWL model to extract mechanisms from which it constructs INDRA Statements.
Parameters: model (org.biopax.paxtools.model.Model) – A BioPAX model object (java object) -
model
¶ org.biopax.paxtools.model.Model – A BioPAX model object (java object) which is queried using Paxtools to extract INDRA Statements
-
statements
¶ list[indra.statements.Statement] – A list of INDRA Statements that were extracted from the model.
-
get_activity_modification
()[source]¶ Extract INDRA ActiveForm statements from the BioPAX model.
This method extracts ActiveForm Statements that are due to protein modifications. This method reuses the structure of BioPAX Pattern’s org.biopax.paxtools.pattern.PatternBox.controlsStateChange pattern with additional constraints to specify the gain or loss of a modification occurring (phosphorylation, deubiquitination, etc.) and the gain or loss of activity due to the modification state change.
-
get_complexes
()[source]¶ Extract INDRA Complex Statements from the BioPAX model.
This method searches for org.biopax.paxtools.model.level3.Complex objects which represent molecular complexes. It doesn’t reuse BioPAX Pattern’s org.biopax.paxtools.pattern.PatternBox.inComplexWith query since that retrieves pairs of complex members rather than the full complex.
-
get_conversions
()[source]¶ Extract Conversion INDRA Statements from the BioPAX model.
This method uses a custom BioPAX Pattern (one that is not implemented in PatternBox) to query for BiochemicalReactions whose left and right hand sides are collections of SmallMolecules. This pattern thereby extracts metabolic conversions as well as signaling processes via small molecules (e.g. lipid phosphorylation or cleavage).
-
get_gap
()[source]¶ Extract Gap INDRA Statements from the BioPAX model.
This method uses a custom BioPAX Pattern (one that is not implemented in PatternBox) to query for controlled BiochemicalReactions in which the same protein is in complex with GTP on the left hand side and in complex with GDP on the right hand side. This implies that the controller is a GAP for the GDP/GTP-bound protein.
-
get_gef
()[source]¶ Extract Gef INDRA Statements from the BioPAX model.
This method uses a custom BioPAX Pattern (one that is not implemented in PatternBox) to query for controlled BiochemicalReactions in which the same protein is in complex with GDP on the left hand side and in complex with GTP on the right hand side. This implies that the controller is a GEF for the GDP/GTP-bound protein.
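The rule shared by get_gap and get_gef comes down to which nucleotide the protein is bound to on each side of the reaction. The function below is a plain-Python sketch of that rule only; it does not use Paxtools, and its name and string arguments are assumptions made for this example.

```python
def classify_gtpase_regulator(left_nucleotide, right_nucleotide):
    """Return 'Gap', 'Gef' or None given the nucleotide bound on each side."""
    if (left_nucleotide, right_nucleotide) == ('GTP', 'GDP'):
        # GTP-bound on the left, GDP-bound on the right: controller is a GAP.
        return 'Gap'
    if (left_nucleotide, right_nucleotide) == ('GDP', 'GTP'):
        # GDP-bound on the left, GTP-bound on the right: controller is a GEF.
        return 'Gef'
    return None
```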
-
get_modifications
()[source]¶ Extract INDRA Modification Statements from the BioPAX model.
To extract Modifications, this method reuses the structure of BioPAX Pattern’s org.biopax.paxtools.pattern.PatternBox.controlsStateChange pattern with additional constraints to specify the type of state change occurring (phosphorylation, deubiquitination, etc.).
-
get_regulate_activities
()[source]¶ Get Activation/Inhibition INDRA Statements from the BioPAX model.
This method extracts Activation/Inhibition Statements and reuses the structure of BioPAX Pattern’s org.biopax.paxtools.pattern.PatternBox.controlsStateChange pattern with additional constraints to specify the gain or loss of activity state, while ensuring that the activity change is not due to a modification state change (such changes are extracted by get_modifications and get_activity_modification).
-
get_regulate_amounts
()[source]¶ Extract INDRA RegulateAmount Statements from the BioPAX model.
This method extracts IncreaseAmount/DecreaseAmount Statements from the BioPAX model. It fully reuses BioPAX Pattern’s org.biopax.paxtools.pattern.PatternBox.controlsExpressionWithTemplateReac pattern to find TemplateReactions which control the expression of a protein.
-
Pathway Commons Client (indra.sources.biopax.pathway_commons_client
)¶
-
indra.sources.biopax.pathway_commons_client.
graph_query
(kind, source, target=None, neighbor_limit=1, database_filter=None)[source]¶ Perform a graph query on PathwayCommons.
For more information on these queries, see http://www.pathwaycommons.org/pc2/#graph
Parameters: - kind (str) – The kind of graph query to perform. Currently 3 options are implemented, ‘neighborhood’, ‘pathsbetween’ and ‘pathsfromto’.
- source (list[str]) – A list of gene names which are the source set for the graph query.
- target (Optional[list[str]]) – A list of gene names which are the target set for the graph query. Only needed for ‘pathsfromto’ queries.
- neighbor_limit (Optional[int]) – This limits the length of the longest path considered in the graph query. Default: 1
Returns: model – A BioPAX model (java object).
Return type: org.biopax.paxtools.model.Model
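The argument rules described above (three supported kinds, and a target set that is only needed for 'pathsfromto') could be validated before any request is sent. The sketch below only builds a dictionary and makes no network call; the function name and the dictionary keys ('limit', 'datasource') are assumptions for illustration, not confirmed names from the client or the web service.

```python
def build_graph_query_params(kind, source, target=None, neighbor_limit=1,
                             database_filter=None):
    """Validate graph query arguments and collect them in a dict."""
    allowed = ('neighborhood', 'pathsbetween', 'pathsfromto')
    if kind not in allowed:
        raise ValueError('kind must be one of %s' % (allowed,))
    if kind == 'pathsfromto' and not target:
        raise ValueError('target is required for pathsfromto queries')
    # The key names below are illustrative assumptions.
    params = {'kind': kind, 'source': list(source), 'limit': neighbor_limit}
    if kind == 'pathsfromto':
        params['target'] = list(target)
    if database_filter:
        params['datasource'] = list(database_filter)
    return params
```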
-
indra.sources.biopax.pathway_commons_client.
model_to_owl
(model, fname)[source]¶ Save a BioPAX model object as an OWL file.
Parameters: - model (org.biopax.paxtools.model.Model) – A BioPAX model object (java object).
- fname (str) – The name of the OWL file to save the model in.
REACH (indra.sources.reach
)¶
REACH API (indra.sources.reach.reach_api
)¶
-
indra.sources.reach.reach_api.
process_json_file
(file_name, citation=None)[source]¶ Return a ReachProcessor by processing the given REACH json file.
The output from the REACH parser is in this json format. This function is useful if the output is saved as a file and needs to be processed. For more information on the format, see: https://github.com/clulab/reach
Parameters: - file_name (str) – The name of the json file to be processed.
- citation (Optional[str]) – A PubMed ID passed to be used in the evidence for the extracted INDRA Statements. Default: None
Returns: rp – A ReachProcessor containing the extracted INDRA Statements in rp.statements.
Return type: ReachProcessor
-
indra.sources.reach.reach_api.
process_json_str
(json_str, citation=None)[source]¶ Return a ReachProcessor by processing the given REACH json string.
The output from the REACH parser is in this json format. For more information on the format, see: https://github.com/clulab/reach
Parameters: - json_str (str) – The json string to be processed.
- citation (Optional[str]) – A PubMed ID passed to be used in the evidence for the extracted INDRA Statements. Default: None
Returns: rp – A ReachProcessor containing the extracted INDRA Statements in rp.statements.
Return type: ReachProcessor
-
indra.sources.reach.reach_api.
process_nxml_file
(file_name, citation=None, offline=False)[source]¶ Return a ReachProcessor by processing the given NXML file.
NXML is the format used by PubmedCentral for papers in the open access subset.
Parameters: - file_name (str) – The name of the NXML file to be processed.
- citation (Optional[str]) – A PubMed ID passed to be used in the evidence for the extracted INDRA Statements. Default: None
- offline (Optional[bool]) – If set to True, the REACH system is run offline. Otherwise (by default) the web service is called. Default: False
Returns: rp – A ReachProcessor containing the extracted INDRA Statements in rp.statements.
Return type: ReachProcessor
-
indra.sources.reach.reach_api.
process_nxml_str
(nxml_str, citation=None, offline=False)[source]¶ Return a ReachProcessor by processing the given NXML string.
NXML is the format used by PubmedCentral for papers in the open access subset.
Parameters: - nxml_str (str) – The NXML string to be processed.
- citation (Optional[str]) – A PubMed ID passed to be used in the evidence for the extracted INDRA Statements. Default: None
- offline (Optional[bool]) – If set to True, the REACH system is run offline. Otherwise (by default) the web service is called. Default: False
Returns: rp – A ReachProcessor containing the extracted INDRA Statements in rp.statements.
Return type: ReachProcessor
-
indra.sources.reach.reach_api.
process_pmc
(pmc_id, offline=False)[source]¶ Return a ReachProcessor by processing a paper with a given PMC id.
Uses the PMC client to obtain the full text. If it’s not available, None is returned.
Parameters: - pmc_id (str) – The ID of a PubmedCentral article. The string may start with PMC but passing just the ID also works. Examples: 3717945, PMC3717945 https://www.ncbi.nlm.nih.gov/pmc/
- offline (Optional[bool]) – If set to True, the REACH system is run offline. Otherwise (by default) the web service is called. Default: False
Returns: rp – A ReachProcessor containing the extracted INDRA Statements in rp.statements.
Return type: ReachProcessor
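Accepting a PMC ID both with and without the PMC prefix, as described above, could be handled by a small normalization step. This is a sketch of one possible approach, not the code used by process_pmc; the function name is invented for this example.

```python
def normalize_pmc_id(pmc_id):
    """Return the ID as a string with a single leading 'PMC' prefix."""
    pmc_id = str(pmc_id).strip()
    if pmc_id.upper().startswith('PMC'):
        # Already prefixed: keep the numeric part, normalize the prefix case.
        return 'PMC' + pmc_id[3:]
    return 'PMC' + pmc_id
```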
-
indra.sources.reach.reach_api.
process_pubmed_abstract
(pubmed_id, offline=False)[source]¶ Return a ReachProcessor by processing an abstract with a given Pubmed id.
Uses the Pubmed client to get the abstract. If that fails, None is returned.
Parameters: - pubmed_id (str) – The ID of a Pubmed article. The string may start with PMID but passing just the ID also works. Examples: 27168024, PMID27168024 https://www.ncbi.nlm.nih.gov/pubmed/
- offline (Optional[bool]) – If set to True, the REACH system is run offline. Otherwise (by default) the web service is called. Default: False
Returns: rp – A ReachProcessor containing the extracted INDRA Statements in rp.statements.
Return type: ReachProcessor
-
indra.sources.reach.reach_api.
process_text
(text, citation=None, offline=False)[source]¶ Return a ReachProcessor by processing the given text.
Parameters: - text (str) – The text to be processed.
- citation (Optional[str]) – A PubMed ID passed to be used in the evidence for the extracted INDRA Statements. This is used when the text to be processed comes from a publication that is not otherwise identified. Default: None
- offline (Optional[bool]) – If set to True, the REACH system is run offline. Otherwise (by default) the web service is called. Default: False
Returns: rp – A ReachProcessor containing the extracted INDRA Statements in rp.statements.
Return type: ReachProcessor
REACH Processor (indra.sources.reach.processor
)¶
-
class
indra.sources.reach.processor.
ReachProcessor
(json_dict, pmid=None)[source]¶ The ReachProcessor extracts INDRA Statements from REACH parser output.
Parameters: - json_dict (dict) – A JSON dictionary containing the REACH extractions.
- pmid (Optional[str]) – The PubMed ID associated with the extractions. This can be passed in case the PMID cannot be determined from the extractions alone.
-
tree
¶ objectpath.Tree – The objectpath Tree object representing the extractions.
-
statements
¶ list[indra.statements.Statement] – A list of INDRA Statements that were extracted by the processor.
-
citation
¶ str – The PubMed ID associated with the extractions.
-
all_events
¶ dict[str, str] – The frame IDs of all events by type in the REACH extraction.
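Collecting frame IDs by event type might look like the sketch below. The JSON layout is an assumption: events are taken to sit under json_dict['events']['frames'], each with 'type' and 'frame_id' keys. The function name is invented here, and it maps each type to a list of IDs, which is one plausible reading of the attribute described above.

```python
from collections import defaultdict


def group_event_ids_by_type(json_dict):
    """Map each event type to the list of frame IDs of that type."""
    events_by_type = defaultdict(list)
    # Assumed layout: {'events': {'frames': [{'type': ..., 'frame_id': ...}]}}
    for frame in json_dict.get('events', {}).get('frames', []):
        events_by_type[frame['type']].append(frame['frame_id'])
    return dict(events_by_type)
```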
REACH reader (indra.sources.reach.reach_reader
)¶
-
class
indra.sources.reach.reach_reader.
ReachReader
[source]¶ The ReachReader wraps a singleton instance of the REACH reader.
This allows calling the reader many times without having to wait for it to start up each time.
-
api_ruler
¶ org.clulab.reach.apis.ApiRuler – An instance of the REACH ApiRuler class (java object).
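The idea of starting the reader once and reusing it on later calls can be sketched with lazy initialization. The class below is hypothetical and independent of REACH and Java: factory stands for whatever expensive call creates the reader, and the class and method names are invented for this example.

```python
class ReaderWrapper(object):
    """Hypothetical wrapper that creates an expensive reader only once."""
    def __init__(self, factory):
        self._factory = factory  # callable that creates the reader
        self._reader = None

    def get_reader(self):
        """Return the cached reader, creating it on first use."""
        if self._reader is None:
            self._reader = self._factory()
        return self._reader
```

Repeated calls to get_reader return the same object, so the startup cost is paid a single time.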
-
TRIPS (indra.sources.trips
)¶
TRIPS API (indra.sources.trips.trips_api
)¶
-
indra.sources.trips.trips_api.
process_text
(text, save_xml_name='trips_output.xml', save_xml_pretty=True)[source]¶ Return a TripsProcessor by processing text.
Parameters: - text (str) – The text to be processed.
- save_xml_name (Optional[str]) – The name of the file to save the returned TRIPS extraction knowledge base XML. Default: trips_output.xml
- save_xml_pretty (Optional[bool]) – If True, the saved XML is pretty-printed. Some third-party tools require non-pretty-printed XMLs which can be obtained by setting this to False. Default: True
Returns: tp – A TripsProcessor containing the extracted INDRA Statements in tp.statements.
Return type: TripsProcessor
-
indra.sources.trips.trips_api.
process_xml
(xml_string)[source]¶ Return a TripsProcessor by processing a TRIPS EKB XML string.
Parameters: xml_string (str) – A TRIPS extraction knowledge base (EKB) string to be processed. http://trips.ihmc.us/parser/api.html Returns: tp – A TripsProcessor containing the extracted INDRA Statements in tp.statements. Return type: TripsProcessor
TRIPS Processor (indra.sources.trips.processor
)¶
-
class
indra.sources.trips.processor.
TripsProcessor
(xml_string)[source]¶ The TripsProcessor extracts INDRA Statements from a TRIPS XML.
For more details on the TRIPS EKB XML format, see http://trips.ihmc.us/parser/cgi/drum
Parameters: xml_string (str) – A TRIPS extraction knowledge base (EKB) in XML format as a string. -
tree
¶ xml.etree.ElementTree.Element – An ElementTree object representation of the TRIPS EKB XML.
-
statements
¶ list[indra.statements.Statement] – A list of INDRA Statements that were extracted from the EKB.
-
doc_id
¶ str – The PubMed ID of the paper that the extractions are from.
-
sentences
¶ dict[str: str] – The list of all sentences in the EKB with their IDs
-
paragraphs
¶ dict[str: str] – The list of all paragraphs in the EKB with their IDs
-
par_to_sec
¶ dict[str: str] – A map from paragraph IDs to their associated section types
-
extracted_events
¶ list[xml.etree.ElementTree.Element] – A list of Event elements that have been extracted as INDRA Statements.
-
TRIPS Client (indra.sources.trips.trips_client
)¶
-
indra.sources.trips.trips_client.
get_xml
(html)[source]¶ Extract the EKB XML from the HTML output of the TRIPS web service.
Parameters: html (str) – The HTML output from the TRIPS web service. Returns: The extraction knowledge base (EKB) XML that contains the event and term extractions.
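Pulling an XML fragment out of surrounding HTML can be sketched with a regular expression. The example assumes that the EKB is wrapped in an element named ekb and appears unescaped in the HTML; neither assumption is confirmed by the text above, and the function name is invented for this example.

```python
import re


def extract_ekb_xml(html):
    """Return the first <ekb>...</ekb> element found in html, or None."""
    # re.DOTALL lets the match span multiple lines.
    match = re.search(r'<ekb.*?</ekb>', html, re.DOTALL)
    return match.group(0) if match else None
```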
-
indra.sources.trips.trips_client.
save_xml
(xml_str, file_name, pretty=True)[source]¶ Save the TRIPS EKB XML in a file.
Parameters: - xml_str (str) – The TRIPS EKB XML string to be saved.
- file_name (str) – The name of the file to save the result in.
- pretty (Optional[bool]) – If True, the XML is pretty printed.
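The effect of the pretty flag can be illustrated with the standard library. This sketch returns the formatted string instead of writing a file, and it uses xml.dom.minidom as one possible way to pretty-print; the function name is invented here and the actual save_xml may work differently.

```python
import xml.dom.minidom


def format_xml(xml_str, pretty=True):
    """Return xml_str unchanged, or re-indented if pretty is True."""
    if not pretty:
        return xml_str
    # toprettyxml adds an XML declaration, newlines and indentation.
    return xml.dom.minidom.parseString(xml_str).toprettyxml(indent='  ')
```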
-
indra.sources.trips.trips_client.
send_query
(text, query_args=None)[source]¶ Send a query to the TRIPS web service.
Parameters: - text (str) – The text to be processed.
- query_args (Optional[dict]) – A dictionary of arguments to be passed with the query.
Returns: html – The HTML result returned by the web service.
Return type: str
Database clients (indra.databases
)¶
HGNC client (indra.databases.hgnc_client
)¶
-
indra.databases.hgnc_client.
get_entrez_id
(hgnc_id)[source]¶ Return the Entrez ID corresponding to the given HGNC ID.
Parameters: hgnc_id (str) – The HGNC ID to be converted. Note that the HGNC ID is a number that is passed as a string. It is not the same as the HGNC gene symbol. Returns: entrez_id – The Entrez ID corresponding to the given HGNC ID. Return type: str
-
indra.databases.hgnc_client.
get_hgnc_entry
[source]¶ Return the HGNC entry for the given HGNC ID from the web service.
Parameters: hgnc_id (str) – The HGNC ID to be converted. Returns: xml_tree – The XML ElementTree corresponding to the entry for the given HGNC ID. Return type: ElementTree
-
indra.databases.hgnc_client.
get_hgnc_from_entrez
(entrez_id)[source]¶ Return the HGNC ID corresponding to the given Entrez ID.
Parameters: entrez_id (str) – The Entrez ID to be converted, a number passed as a string. Returns: hgnc_id – The HGNC ID corresponding to the given Entrez ID. Return type: str
-
indra.databases.hgnc_client.
get_hgnc_from_mouse
(mgi_id)[source]¶ Return the HGNC ID corresponding to the given MGI mouse gene ID.
Parameters: mgi_id (str) – The MGI ID to be converted. Example: “2444934” Returns: hgnc_id – The HGNC ID corresponding to the given MGI ID. Return type: str
-
indra.databases.hgnc_client.
get_hgnc_from_rat
(rgd_id)[source]¶ Return the HGNC ID corresponding to the given RGD rat gene ID.
Parameters: rgd_id (str) – The RGD ID to be converted. Example: “1564928” Returns: hgnc_id – The HGNC ID corresponding to the given RGD ID. Return type: str
-
indra.databases.hgnc_client.
get_hgnc_id
(hgnc_name)[source]¶ Return the HGNC ID corresponding to the given HGNC symbol.
Parameters: hgnc_name (str) – The HGNC symbol to be converted. Example: BRAF Returns: hgnc_id – The HGNC ID corresponding to the given HGNC symbol. Return type: str
-
indra.databases.hgnc_client.
get_hgnc_name
(hgnc_id)[source]¶ Return the HGNC symbol corresponding to the given HGNC ID.
Parameters: hgnc_id (str) – The HGNC ID to be converted. Returns: hgnc_name – The HGNC symbol corresponding to the given HGNC ID. Return type: str
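The difference between an HGNC ID (a number passed as a string) and an HGNC symbol can be shown with a tiny in-memory table. This is a sketch only: the single-entry table and the function names are assumptions for illustration, while the real client covers all HGNC entries.

```python
# Tiny illustrative table mapping HGNC IDs (numeric strings) to symbols.
_HGNC_NAMES = {'1097': 'BRAF'}
# Reverse mapping from symbol to HGNC ID.
_HGNC_IDS = {name: hgnc_id for hgnc_id, name in _HGNC_NAMES.items()}


def get_hgnc_name_sketch(hgnc_id):
    """Return the symbol for an HGNC ID, or None if it is unknown."""
    return _HGNC_NAMES.get(hgnc_id)


def get_hgnc_id_sketch(hgnc_name):
    """Return the HGNC ID for a symbol, or None if it is unknown."""
    return _HGNC_IDS.get(hgnc_name)
```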
-
indra.databases.hgnc_client.
get_mouse_id
(hgnc_id)[source]¶ Return the MGI mouse ID corresponding to the given HGNC ID.
Parameters: hgnc_id (str) – The HGNC ID to be converted. Example: “” Returns: mgi_id – The MGI ID corresponding to the given HGNC ID. Return type: str
-
indra.databases.hgnc_client.
get_rat_id
(hgnc_id)[source]¶ Return the RGD rat ID corresponding to the given HGNC ID.
Parameters: hgnc_id (str) – The HGNC ID to be converted. Example: “” Returns: rgd_id – The RGD ID corresponding to the given HGNC ID. Return type: str
-
indra.databases.hgnc_client.
get_uniprot_id
(hgnc_id)[source]¶ Return the UniProt ID corresponding to the given HGNC ID.
Parameters: hgnc_id (str) – The HGNC ID to be converted. Note that the HGNC ID is a number that is passed as a string. It is not the same as the HGNC gene symbol. Returns: uniprot_id – The UniProt ID corresponding to the given HGNC ID. Return type: str
Uniprot client (indra.databases.uniprot_client
)¶
-
indra.databases.uniprot_client.
get_family_members
(family_name, human_only=True)[source]¶ Return the HGNC gene symbols which are the members of a given family.
Parameters: - family_name (str) – Family name to be queried.
- human_only (bool) – If True, only human proteins in the family will be returned. Default: True
Returns: gene_names – The HGNC gene symbols corresponding to the given family.
Return type: list
-
indra.databases.uniprot_client.
get_gene_name
(protein_id, web_fallback=True)[source]¶ Return the gene name for the given UniProt ID.
This is an alternative to get_hgnc_name and is useful when the HGNC name is not available (for instance, when the organism is not Homo sapiens).
Parameters: - protein_id (str) – UniProt ID to be mapped.
- web_fallback (Optional[bool]) – If True and the offline lookup fails, the UniProt web service is used to do the query.
Returns: gene_name – The gene name corresponding to the given Uniprot ID.
Return type: str
-
indra.databases.uniprot_client.
get_id_from_mgi
(mgi_id)[source]¶ Return the UniProt ID given the MGI ID of a mouse protein.
Parameters: mgi_id (str) – The MGI ID of the mouse protein. Returns: up_id – The UniProt ID of the mouse protein. Return type: str
-
indra.databases.uniprot_client.
get_id_from_mnemonic
(uniprot_mnemonic)[source]¶ Return the UniProt ID for the given UniProt mnemonic.
Parameters: uniprot_mnemonic (str) – UniProt mnemonic to be mapped. Returns: uniprot_id – The UniProt ID corresponding to the given Uniprot mnemonic. Return type: str
-
indra.databases.uniprot_client.
get_id_from_rgd
(rgd_id)[source]¶ Return the UniProt ID given the RGD ID of a rat protein.
Parameters: rgd_id (str) – The RGD ID of the rat protein. Returns: up_id – The UniProt ID of the rat protein. Return type: str
-
indra.databases.uniprot_client.
get_mgi_id
(protein_id)[source]¶ Return the MGI ID given the protein id of a mouse protein.
Parameters: protein_id (str) – UniProt ID of the mouse protein Returns: mgi_id – MGI ID of the mouse protein Return type: str
-
indra.databases.uniprot_client.
get_mnemonic
(protein_id, web_fallback=False)[source]¶ Return the UniProt mnemonic for the given UniProt ID.
Parameters: - protein_id (str) – UniProt ID to be mapped.
- web_fallback (Optional[bool]) – If True and the offline lookup fails, the UniProt web service is used to do the query.
Returns: mnemonic – The UniProt mnemonic corresponding to the given Uniprot ID.
Return type: str
-
indra.databases.uniprot_client.
get_mouse_id
(human_protein_id)[source]¶ Return the mouse UniProt ID given a human UniProt ID.
Parameters: human_protein_id (str) – The UniProt ID of a human protein. Returns: mouse_protein_id – The UniProt ID of a mouse protein orthologous to the given human protein Return type: str
-
indra.databases.uniprot_client.
get_primary_id
(protein_id)[source]¶ Return a primary entry corresponding to the UniProt ID.
Parameters: protein_id (str) – The UniProt ID to map to primary. Returns: primary_id – If the given ID is primary, it is returned as is. Otherwise the primary IDs are looked up. If there are multiple primary IDs then the first human one is returned. If there are no human primary IDs then the first primary found is returned. Return type: str
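The selection rule described for get_primary_id can be illustrated with a small standalone sketch. This is not INDRA's implementation: the lookup tables, the IDs in them, and the helper name pick_primary_id are all made up for illustration.

```python
# Toy lookup tables. These are NOT real UniProt accessions or mappings;
# every ID below is a placeholder invented for this sketch.
SECONDARY_TO_PRIMARY = {
    'SEC001': ['PRIM_MOUSE', 'PRIM_HUMAN'],
    'SEC002': ['PRIM_MOUSE', 'PRIM_RAT'],
}
HUMAN_IDS = {'PRIM_HUMAN'}


def pick_primary_id(protein_id):
    """Hypothetical helper mirroring the rule described above."""
    primaries = SECONDARY_TO_PRIMARY.get(protein_id)
    if not primaries:
        # Not a known secondary accession: treat the ID as already primary.
        return protein_id
    for primary in primaries:
        # Prefer the first primary ID that belongs to a human protein.
        if primary in HUMAN_IDS:
            return primary
    # No human primary ID: fall back to the first primary found.
    return primaries[0]
```

With these toy tables, a secondary ID that maps to both a mouse and a human entry resolves to the human one, while an ID with no human entry resolves to the first primary listed.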
-
indra.databases.uniprot_client.
get_rat_id
(human_protein_id)[source]¶ Return the rat UniProt ID given a human UniProt ID.
Parameters: human_protein_id (str) – The UniProt ID of a human protein. Returns: rat_protein_id – The UniProt ID of a rat protein orthologous to the given human protein Return type: str
-
indra.databases.uniprot_client.
get_rgd_id
(protein_id)[source]¶ Return the RGD ID given the protein id of a rat protein.
Parameters: protein_id (str) – UniProt ID of the rat protein Returns: rgd_id – RGD ID of the rat protein Return type: str
-
indra.databases.uniprot_client.
is_human
(protein_id)[source]¶ Return True if the given protein id corresponds to a human protein.
Parameters: protein_id (str) – UniProt ID of the protein Returns: True if the protein_id corresponds to a human protein, otherwise False. Return type: bool
-
indra.databases.uniprot_client.
is_mouse
(protein_id)[source]¶ Return True if the given protein id corresponds to a mouse protein.
Parameters: protein_id (str) – UniProt ID of the protein Returns: True if the protein_id corresponds to a mouse protein, otherwise False. Return type: bool
-
indra.databases.uniprot_client.
is_rat
(protein_id)[source]¶ Return True if the given protein id corresponds to a rat protein.
Parameters: protein_id (str) – UniProt ID of the protein Returns: True if the protein_id corresponds to a rat protein, otherwise False. Return type: bool
-
indra.databases.uniprot_client.
is_secondary
(protein_id)[source]¶ Return True if the UniProt ID corresponds to a secondary accession.
Parameters: protein_id (str) – The UniProt ID to check. Returns: True if it is a secondary accession entry, False otherwise. Return type: bool
-
indra.databases.uniprot_client.
query_protein
[source]¶ Return the UniProt entry as an RDF graph for the given UniProt ID.
Parameters: protein_id (str) – UniProt ID to be queried. Returns: g – The RDF graph corresponding to the UniProt entry. Return type: rdflib.Graph
-
indra.databases.uniprot_client.
verify_location
(protein_id, residue, location)[source]¶ Return True if the residue is at the given location in the UP sequence.
Parameters: - protein_id (str) – UniProt ID of the protein whose sequence is used as reference.
- residue (str) – A single character amino acid symbol (Y, S, T, V, etc.)
- location (str) – The location on the protein sequence (starting at 1) at which the residue should be checked against the reference sequence.
Returns: True if the given residue is at the given position in the sequence corresponding to the given UniProt ID, otherwise False.
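The check that verify_location describes amounts to a 1-based lookup into a sequence string. A minimal sketch follows, assuming the reference sequence is already available as a plain string; the helper name residue_at and the toy sequence are illustrative only and are not part of INDRA.

```python
def residue_at(sequence, residue, location):
    """Return True if `residue` occupies the 1-based `location` in `sequence`.

    `location` is accepted as a string, matching the documented parameter type.
    """
    try:
        index = int(location) - 1  # convert 1-based position to 0-based index
    except (TypeError, ValueError):
        return False
    if index < 0 or index >= len(sequence):
        return False
    return sequence[index] == residue


# Toy sequence, not a real protein.
toy_sequence = 'MSTYV'
print(residue_at(toy_sequence, 'S', '2'))
```

Positions outside the sequence, or locations that are not integers, are treated as mismatches rather than errors in this sketch.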
-
indra.databases.uniprot_client.
verify_modification
(protein_id, residue, location=None)[source]¶ Return True if the residue at the given location has a known modification.
Parameters: - protein_id (str) – UniProt ID of the protein whose sequence is used as reference.
- residue (str) – A single character amino acid symbol (Y, S, T, V, etc.)
- location (Optional[str]) – The location on the protein sequence (starting at 1) at which the modification is checked.
Returns: True if the given residue is reported to be modified at the given position in the sequence corresponding to the given UniProt ID, otherwise False. If location is not given, we only check if there is any residue of the given type that is modified.
ChEBI client (indra.databases.chebi_client
)¶
-
indra.databases.chebi_client.
get_chebi_id_from_pubchem
(pubchem_id)[source]¶ Return the ChEBI ID corresponding to a given Pubchem ID.
Parameters: pubchem_id (str) – Pubchem ID to be converted. Returns: chebi_id – ChEBI ID corresponding to the given Pubchem ID. If the lookup fails, None is returned. Return type: str
BioGRID client (indra.databases.biogrid_client
)¶
-
indra.databases.biogrid_client.
get_publications
(gene_names, save_json_name=None)[source]¶ Return evidence publications for interaction between the given genes.
Parameters: - gene_names (list[str]) – A list of gene names (HGNC symbols) to query interactions between. Currently supports exactly two genes only.
- save_json_name (Optional[str]) – A file name to save the raw BioGRID web service output in. By default, the raw output is not saved.
Returns: publications – A list of Publication objects that provide evidence for interactions between the given list of genes.
Return type: list[Publication]
Cell type context client (indra.databases.context_client
)¶
Network relevance client (indra.databases.relevance_client
)¶
-
indra.databases.relevance_client.
get_heat_kernel
(network_id)[source]¶ Return the identifier of a heat kernel calculated for a given network.
Parameters: network_id (str) – The UUID of the network in NDEx. Returns: kernel_id – The identifier of the heat kernel calculated for the given network. Return type: str
-
indra.databases.relevance_client.
get_relevant_nodes
(network_id, query_nodes)[source]¶ Return a set of network nodes relevant to a given query set.
A heat diffusion algorithm is used on a pre-computed heat kernel for the given network which starts from the given query nodes. The nodes in the network are ranked according to heat score which is a measure of relevance with respect to the query nodes.
Parameters: - network_id (str) – The UUID of the network in NDEx.
- query_nodes (list[str]) – A list of node names with respect to which relevance is queried.
Returns: ranked_entities – A list containing pairs of node names and their relevance scores.
Return type: list[(str, float)]
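The ranking idea behind get_relevant_nodes can be sketched locally. The real function queries a remote service that works on a pre-computed kernel; the sketch below only assumes that a kernel matrix is already in hand, and the 3-node kernel, node names, and function name rank_by_heat are made up for illustration.

```python
def rank_by_heat(node_names, kernel, query_nodes):
    """Rank nodes by heat after diffusing from the query nodes.

    `kernel` is a square matrix (list of rows) aligned with `node_names`.
    The heat score of each node is the kernel applied to an indicator
    vector that is 1.0 on the query nodes and 0.0 elsewhere.
    """
    query = set(query_nodes)
    initial_heat = [1.0 if name in query else 0.0 for name in node_names]
    scores = [
        sum(k * h for k, h in zip(row, initial_heat))
        for row in kernel
    ]
    return sorted(zip(node_names, scores),
                  key=lambda pair: pair[1], reverse=True)


# Toy symmetric kernel for a 3-node chain A - B - C (values are invented).
names = ['A', 'B', 'C']
toy_kernel = [
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
]
ranked = rank_by_heat(names, toy_kernel, ['A'])
```

The result has the same shape as the documented return value: a list of (node name, score) pairs ordered from most to least relevant.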
NDEx client (indra.databases.ndex_client
)¶
-
indra.databases.ndex_client.
send_request
(ndex_service_url, params, is_json=True, use_get=False)[source]¶ Send a request to the NDEx server.
Parameters: - ndex_service_url (str) – The URL of the service to use for the request.
- params (dict) – A dictionary of parameters to send with the request. Parameter keys differ based on the type of request.
- is_json (bool) – True if the response is in json format, otherwise it is assumed to be text. Default: True
- use_get (bool) – True if the request needs to use GET instead of POST.
Returns: res – Depending on the type of service and the is_json parameter, this function either returns a text string or a json dict.
Return type: str or dict
cBio portal client (indra.databases.cbio_client
)¶
Literature clients (indra.literature
)¶
-
indra.literature.
get_full_text
(paper_id, idtype, preferred_content_type='text/xml')[source]¶ Return the content and the content type of an article.
This function retrieves the content of an article by its PubMed ID, PubMed Central ID, or DOI. It prioritizes full text content when available and returns an abstract from PubMed as a fallback.
Parameters: - paper_id (string) – ID of the article.
- idtype ('pmid', 'pmcid', or 'doi') – Type of the ID.
- preferred_content_type (Optional[str]) – Preference for full-text format, if available. Can be one of ‘text/xml’, ‘text/plain’, ‘application/pdf’. Default: ‘text/xml’
Returns: - content (str) – The content of the article.
- content_type (str) – The content type of the article
-
indra.literature.
id_lookup
(paper_id, idtype)[source]¶ Take an ID of type PMID, PMCID, or DOI and lookup the other IDs.
If the DOI is not found in Pubmed, try to obtain the DOI by doing a reverse-lookup of the DOI in CrossRef using article metadata.
Parameters: - paper_id (string) – ID of the article.
- idtype ('pmid', 'pmcid', or 'doi') – Type of the ID.
Returns: ids – A dictionary with the following keys: pmid, pmcid and doi.
Return type: dict
Pubmed client (indra.literature.pubmed_client
)¶
Search and get metadata for articles in Pubmed.
-
indra.literature.pubmed_client.
expand_pagination
(pages)[source]¶ Convert a page number to long form, e.g., from 456-7 to 456-457.
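The page-range expansion can be sketched as follows. This is an illustrative reimplementation under simple assumptions (purely numeric pages separated by a single hyphen), not the code used in pubmed_client; the name expand_page_range is hypothetical.

```python
def expand_page_range(pages):
    """Expand an abbreviated page range such as '456-7' to '456-457'.

    Inputs that are not of the form '<digits>-<digits>' are returned unchanged.
    """
    parts = pages.split('-')
    if len(parts) != 2:
        return pages
    start, end = parts[0].strip(), parts[1].strip()
    if not (start.isdigit() and end.isdigit()):
        return pages
    if len(end) < len(start):
        # Borrow the missing leading digits from the start page.
        end = start[:len(start) - len(end)] + end
    return '%s-%s' % (start, end)


print(expand_page_range('456-7'))
```

Ranges that are already in long form, single pages, and non-numeric page labels pass through untouched in this sketch.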
-
indra.literature.pubmed_client.
get_abstract
(pubmed_id, prepend_title=True)[source]¶ Get the abstract of an article in the Pubmed database.
-
indra.literature.pubmed_client.
get_article_xml
[source]¶ Get the XML metadata for a single article from the Pubmed database.
-
indra.literature.pubmed_client.
get_ids
[source]¶ Search Pubmed for paper IDs given a search term.
The options are passed as named arguments. For details on the parameters that can be used, see https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch
Some useful parameters to pass are:
- db='pmc' to search PMC instead of Pubmed
- reldate=2 to search for papers within the last 2 days
- mindate='2016/03/01', maxdate='2016/03/31' to search for papers in March 2016
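To make the keyword-argument pattern concrete, the sketch below builds (but does not send) an ESearch request URL from a search term plus named options. It assumes Python 3 and the public E-utilities ESearch endpoint; the helper build_esearch_url is hypothetical and is not how get_ids is implemented.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'


def build_esearch_url(search_term, **kwargs):
    """Hypothetical helper: combine a term and named options into a URL."""
    params = {'db': 'pubmed', 'term': search_term}
    # Named arguments override or extend the defaults, e.g. db='pmc'.
    params.update(kwargs)
    # Sort keys so that the generated URL is deterministic.
    return ESEARCH_URL + '?' + urlencode(sorted(params.items()))


url = build_esearch_url('BRAF', db='pmc', reldate=2)
print(url)
```

Date strings such as '2016/03/01' are percent-encoded by urlencode, so they can be passed as written.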
-
indra.literature.pubmed_client.
get_ids_for_gene
[source]¶ Get the curated set of articles for a gene in the Entrez database.
Search parameters for the Gene database query can be passed in as keyword arguments.
Parameters: hgnc_name (string) – The HGNC name of the gene. This is used to obtain the HGNC ID (using the hgnc_client module) and in turn used to obtain the Entrez ID associated with the gene. Entrez is then queried for that ID.
-
indra.literature.pubmed_client.
get_issns_for_journal
[source]¶ Get a list of the ISSN numbers for a journal given its NLM ID.
Structure of the XML output returned by the NLM Catalog query:
NLMCatalogRecordSet NLMCatalogRecord NlmUniqueID DateCreated DateRevised DateAuthorized DateCompleted DateRevisedMajor TitleMain MedlineTA TitleAlternate + AuthorList ResourceInfo TypeOfResource Issuance ResourceUnit PublicationTypeList PublicationInfo Country PlaceCode Imprint PublicationFirstYear PublicationEndYear Language PhysicalDescription IndexingSourceList IndexingSource IndexingSourceName Coverage GeneralNote + LocalNote MeshHeadingList Classification ELocationList LCCN ISSN + ISSNLinking Coden OtherID +
-
indra.literature.pubmed_client.
get_metadata_for_ids
(pmid_list, get_issns_from_nlm=False)[source]¶ Get article metadata for up to 200 PMIDs from the Pubmed database.
Parameters: - pmid_list (list of PMIDs as strings) – Can contain 1-200 PMIDs.
- get_issns_from_nlm (boolean) – Look up the full list of ISSN numbers for the journal associated with the article, which helps to match articles to CrossRef search results. Defaults to False, since it slows down performance.
Returns: Contains the following fields: ‘doi’, ‘title’, ‘authors’, ‘journal_title’, ‘journal_abbrev’, ‘journal_nlm_id’, ‘issn_list’, ‘page’.
Return type: dict
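Because get_metadata_for_ids accepts at most 200 PMIDs per call, longer lists need to be split by the caller. A minimal batching sketch is shown below; the helper name batch_pmids is an assumption, and the loop that would call get_metadata_for_ids on each batch is left out so the example runs without INDRA or network access.

```python
def batch_pmids(pmid_list, batch_size=200):
    """Yield consecutive slices of `pmid_list` with at most `batch_size` items."""
    for start in range(0, len(pmid_list), batch_size):
        yield pmid_list[start:start + batch_size]


# Placeholder PMIDs (as strings) generated only for illustration.
pmids = [str(i) for i in range(1, 451)]
batch_sizes = [len(batch) for batch in batch_pmids(pmids)]
print(batch_sizes)
```

Each yielded batch can then be passed to get_metadata_for_ids in turn.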
Pubmed Central client (indra.literature.pmc_client
)¶
-
indra.literature.pmc_client.
filter_pmids
(pmid_list, source_type)[source]¶ Filter a list of PMIDs for ones with full text from PMC.
Parameters: - pmid_list (list) – List of PMIDs to filter.
- source_type (string) – One of ‘fulltext’, ‘oa_xml’, ‘oa_txt’, or ‘auth_xml’.
Returns: Return type: list of PMIDs available in the specified source/format type.
CrossRef client (indra.literature.crossref_client
)¶
-
indra.literature.crossref_client.
doi_query
(pmid, search_limit=10)[source]¶ Get the DOI for a PMID by matching CrossRef and Pubmed metadata.
Searches CrossRef using the article title and then accepts search hits only if they have a matching journal ISSN and page number with what is obtained from the Pubmed database.
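The acceptance rule described for doi_query (keep a CrossRef hit only when both the journal ISSN and the page number agree with Pubmed) can be sketched with plain dictionaries. The hit keys 'DOI', 'ISSN' and 'page' follow the CrossRef works JSON format and 'issn_list' and 'page' follow the Pubmed metadata fields listed above; the function names and all values are placeholders, and the real comparison may normalize page ranges first.

```python
def matches_pubmed(hit, pubmed_meta):
    """Return True if a CrossRef hit agrees with Pubmed on ISSN and page."""
    shared_issns = set(hit.get('ISSN', [])) & set(pubmed_meta.get('issn_list', []))
    hit_page = hit.get('page')
    page_match = hit_page is not None and hit_page == pubmed_meta.get('page')
    return bool(shared_issns) and page_match


def pick_doi(hits, pubmed_meta):
    """Return the DOI of the first acceptable hit, or None if there is none."""
    for hit in hits:
        if matches_pubmed(hit, pubmed_meta):
            return hit.get('DOI')
    return None


# Placeholder search hits and Pubmed metadata (not real articles).
toy_hits = [
    {'DOI': '10.0000/wrong', 'ISSN': ['1111-1111'], 'page': '456-457'},
    {'DOI': '10.0000/right', 'ISSN': ['2222-2222'], 'page': '456-457'},
]
toy_meta = {'issn_list': ['2222-2222', '3333-3333'], 'page': '456-457'}
print(pick_doi(toy_hits, toy_meta))
```

Requiring both fields to match trades recall for precision: a title search alone can return several plausible hits.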
-
indra.literature.crossref_client.
get_fulltext_links
(doi)[source]¶ Return a list of links to the full text of an article given its DOI. Each list entry is a dictionary with keys:
- URL: the URL to the full text
- content-type: e.g. text/xml or text/plain
- content-version
- intended-application: e.g. text-mining
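A caller will typically want to pick one link from the returned list by content type. The sketch below works on a hand-written list with the documented keys; the helper choose_link, its preference order, and the example.org URLs are assumptions for illustration.

```python
def choose_link(links, preferred_types=('text/xml', 'text/plain')):
    """Return the URL of the first link matching the preferred content types.

    Content types are tried in order of preference; None is returned if no
    link matches any of them.
    """
    for content_type in preferred_types:
        for link in links:
            if link.get('content-type') == content_type:
                return link.get('URL')
    return None


# Hand-written entries using the documented dictionary keys.
toy_links = [
    {'URL': 'https://example.org/article.txt', 'content-type': 'text/plain',
     'content-version': 'vor', 'intended-application': 'text-mining'},
    {'URL': 'https://example.org/article.xml', 'content-type': 'text/xml',
     'content-version': 'vor', 'intended-application': 'text-mining'},
]
print(choose_link(toy_links))
```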
Elsevier client (indra.literature.elsevier_client
)¶
- For information on the Elsevier API, see:
- API Specification: http://dev.elsevier.com/api_docs.html
- Authentication: https://dev.elsevier.com/tecdoc_api_authentication.html
-
indra.literature.elsevier_client.
download_article
(doi)[source]¶ Download an article in XML format from Elsevier.
-
indra.literature.elsevier_client.
get_abstract
(doi)[source]¶ Get the abstract of an article from Elsevier.
-
indra.literature.elsevier_client.
get_article
(doi, output='txt')[source]¶ Get the full body of an article from Elsevier. There are two output modes: ‘txt’ strips all xml tags and joins the pieces of text in the main text, while ‘xml’ simply takes the tag containing the body of the article and returns it as is. In the latter case, downstream code needs to be able to interpret Elsevier’s XML format.
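The difference between the two output modes can be illustrated on a toy document using the standard library. The tag names below are invented and much simpler than Elsevier's namespaced XML, and extract_body is a hypothetical helper rather than the function's actual implementation.

```python
import xml.etree.ElementTree as ET

# Toy article; real Elsevier XML uses namespaces and a richer structure.
TOY_XML = (
    '<article><body><sec>'
    '<p>BRAF <b>phosphorylates</b> MEK.</p>'
    '<p>MEK activates ERK.</p>'
    '</sec></body></article>'
)


def extract_body(xml_string, output='txt'):
    """Return the article body as plain text ('txt') or raw XML ('xml')."""
    root = ET.fromstring(xml_string)
    body = root.find('body')
    if body is None:
        return None
    if output == 'xml':
        # Return the body element unchanged, serialized back to a string.
        return ET.tostring(body, encoding='unicode')
    # Strip all tags and join the non-empty pieces of text.
    pieces = [text.strip() for text in body.itertext()]
    return ' '.join(piece for piece in pieces if piece)


print(extract_body(TOY_XML))
```

In 'txt' mode inline markup such as the bold tag disappears, whereas 'xml' mode preserves it for downstream parsers.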
-
indra.literature.elsevier_client.
get_dois
[source]¶ Search ScienceDirect through the API for articles.
See http://api.elsevier.com/content/search/fields/scidir for constructing a query string to pass here. Example: ‘abstract(BRAF) AND all(“colorectal cancer”)’
Preassembly (indra.preassembler
)¶
Preassembler (indra.preassembler
)¶
-
class
indra.preassembler.
Preassembler
(hierarchies, stmts=None)[source]¶ De-duplicates statements and arranges them in a specificity hierarchy.
Parameters: - hierarchies (dict[
indra.preassembler.hierarchy_manager
]) – A dictionary of hierarchies with keys such as ‘entity’ (hierarchy of entities, primarily specifying relationships between genes and their families) and ‘modification’ pointing to HierarchyManagers - stmts (list of
indra.statements.Statement
or None) – A set of statements to perform pre-assembly on. If None, statements should be added using the add_statements()
method.
-
stmts
¶ list of
indra.statements.Statement
– Starting set of statements for preassembly.
-
unique_stmts
¶ list of
indra.statements.Statement
– Statements resulting from combining duplicates.
-
related_stmts
¶ list of
indra.statements.Statement
– Top-level statements after building the refinement hierarchy.
-
hierarchies
¶ dict[
indra.preassembler.hierarchy_manager
] – A dictionary of hierarchies with keys such as ‘entity’ and ‘modification’ pointing to HierarchyManagers
-
add_statements
(stmts)[source]¶ Add to the current list of statements.
Parameters: stmts (list of indra.statements.Statement
) – Statements to add to the current list.
-
static
combine_duplicate_stmts
(stmts)[source]¶ Combine evidence from duplicate Statements.
Statements are deemed to be duplicates if they have the same key returned by the matches_key() method of the Statement class. This generally means that statements must be identical in terms of their arguments and can differ only in their associated Evidence objects.
This function keeps the first instance of each set of duplicate statements and merges the lists of Evidence from all of the other statements.
Parameters: stmts (list of indra.statements.Statement
) – Set of statements to de-duplicate. Returns: Unique statements with accumulated evidence across duplicates. Return type: list of indra.statements.Statement
Examples
De-duplicate and combine evidence for two statements differing only in their evidence lists:
>>> from indra.statements import Agent, Phosphorylation, Evidence
>>> from indra.preassembler import Preassembler
>>> map2k1 = Agent('MAP2K1')
>>> mapk1 = Agent('MAPK1')
>>> stmt1 = Phosphorylation(map2k1, mapk1, 'T', '185',
...     evidence=[Evidence(text='evidence 1')])
>>> stmt2 = Phosphorylation(map2k1, mapk1, 'T', '185',
...     evidence=[Evidence(text='evidence 2')])
>>> uniq_stmts = Preassembler.combine_duplicate_stmts([stmt1, stmt2])
>>> uniq_stmts
[Phosphorylation(MAP2K1(), MAPK1(), T, 185)]
>>> sorted([e.text for e in uniq_stmts[0].evidence])
['evidence 1', 'evidence 2']
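The keep-the-first, merge-the-evidence behavior can also be shown without INDRA installed. The sketch below uses a stand-in ToyStatement class whose matches_key() simply returns a stored string; it illustrates the de-duplication idea only and is not the library's code.

```python
from collections import OrderedDict


class ToyStatement(object):
    """Stand-in for a Statement: a matching key plus a list of evidence."""

    def __init__(self, key, evidence):
        self.key = key
        self.evidence = list(evidence)

    def matches_key(self):
        return self.key


def combine_duplicates(stmts):
    """Keep one statement per matches_key() and pool the evidence lists."""
    unique = OrderedDict()
    for stmt in stmts:
        key = stmt.matches_key()
        if key not in unique:
            # First occurrence: copy it so the inputs are left untouched.
            unique[key] = ToyStatement(key, stmt.evidence)
        else:
            # Duplicate: only its evidence is carried over.
            unique[key].evidence.extend(stmt.evidence)
    return list(unique.values())


stmts = [
    ToyStatement('Phos(MAP2K1, MAPK1, T185)', ['evidence 1']),
    ToyStatement('Phos(MAP2K1, MAPK1, T185)', ['evidence 2']),
    ToyStatement('Phos(BRAF, MAP2K1)', ['evidence 3']),
]
unique_stmts = combine_duplicates(stmts)
```

Using an ordered mapping keeps the output in the order in which each distinct statement was first seen.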
-
combine_duplicates
()[source]¶ Combine duplicates among stmts and save result in unique_stmts.
A wrapper around the static method
combine_duplicate_stmts()
.
-
combine_related
(return_toplevel=True, poolsize=None, size_cutoff=100)[source]¶ Connect related statements based on their refinement relationships.
This function takes as a starting point the unique statements (with duplicates removed) and returns a modified flat list of statements containing only those statements which are not refined by any other existing statement. In other words, the more general versions of a given statement do not appear at the top level, but instead are listed in the supported_by field of the top-level statements.
If
unique_stmts
has not been initialized with the de-duplicated statements, combine_duplicates()
is called internally. After this function is called the attribute
related_stmts
is set as a side-effect. The procedure for combining statements in this way involves a series of steps:
- The statements are grouped by type (e.g., Phosphorylation) and each type is iterated over independently.
- Statements of the same type are then grouped according to their Agents’ entity hierarchy component identifiers. For instance, ERK, MAPK1 and MAPK3 are all in the same connected component in the entity hierarchy and therefore all Statements of the same type referencing these entities will be grouped. This grouping assures that relations are only possible within Statement groups and not among groups. For two Statements to be in the same group at this step, the Statements must be the same type and the Agents at each position in the Agent lists must either be in the same hierarchy component, or if they are not in the hierarchy, must have identical entity_matches_keys. Statements with None in one of the Agent list positions are collected separately at this stage.
- Statements with None at either the first or second position are iterated over. For a statement with a None as the first Agent, the second Agent is examined; then the Statement with None is added to all Statement groups with a corresponding component or entity_matches_key in the second position. The same procedure is performed for Statements with None at the second Agent position.
- The statements within each group are then compared; if one statement represents a refinement of the other (as defined by the refinement_of() method implemented for the Statement), then the more refined statement is added to the supports field of the more general statement, and the more general statement is added to the supported_by field of the more refined statement.
- A new flat list of statements is created that contains only those statements that have no supports entries (statements containing such entries are not eliminated, because they will be retrievable from the supported_by fields of other statements). This list is returned to the caller.
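The steps above can be sketched in a heavily simplified form. The sketch assumes that statements are grouped by exact agent names rather than by entity hierarchy components, that no agent is None, and that one statement refines another only by specifying a residue the other leaves out. ToyStmt and this combine_related function are illustrative stand-ins, not INDRA's classes.

```python
class ToyStmt(object):
    """Stand-in statement: a type, a tuple of agent names, optional residue."""

    def __init__(self, stmt_type, agents, residue=None):
        self.stmt_type = stmt_type
        self.agents = tuple(agents)
        self.residue = residue
        self.supports = []      # more refined statements
        self.supported_by = []  # more general statements

    def refinement_of(self, other):
        # Simplified rule: same agents, and self names a residue while
        # the other statement does not.
        return (self.agents == other.agents
                and other.residue is None
                and self.residue is not None)


def combine_related(stmts):
    # Steps 1-2 (simplified): group by statement type and exact agents.
    groups = {}
    for stmt in stmts:
        groups.setdefault((stmt.stmt_type, stmt.agents), []).append(stmt)
    # Step 4: compare statements within each group only.
    for group in groups.values():
        for refined in group:
            for general in group:
                if refined is not general and refined.refinement_of(general):
                    general.supports.append(refined)
                    refined.supported_by.append(general)
    # Step 5: top level = statements with no `supports` entries.
    return [stmt for stmt in stmts if not stmt.supports]


st1 = ToyStmt('Phosphorylation', ['BRAF', 'MAP2K1'])
st2 = ToyStmt('Phosphorylation', ['BRAF', 'MAP2K1'], residue='S')
top_level = combine_related([st1, st2])
```

Restricting comparisons to within-group pairs is what keeps the number of refinement checks manageable, since statements in different groups can never be related.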
On multi-core machines, the algorithm can be parallelized by setting the poolsize argument to the desired number of worker processes. This feature is only available in Python 3.4 and above.
Note
Subfamily relationships must be consistent across arguments
For now, we require that merges can only occur if the isa relationships are all in the same direction for all the agents in a Statement. For example, the two statement groups: RAF_family -> MEK1 and BRAF -> MEK_family would not be merged, since BRAF isa RAF_family, but MEK_family is not a MEK1. In the future this restriction could be revisited.
Parameters: - return_toplevel (Optional[bool]) – If True only the top level statements are returned. If False, all statements are returned. Default: True
- poolsize (Optional[int]) – The number of worker processes to use to parallelize the comparisons performed by the function. If None (default), no parallelization is performed. NOTE: Parallelization is only available on Python 3.4 and above.
- size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.
Returns: The returned list contains Statements representing the more concrete/refined versions of the Statements involving particular entities. The attribute
related_stmts
is also set to this list. However, if return_toplevel is False then all statements are returned, irrespective of level of specificity. In this case the relationships between statements can be accessed via the supports/supported_by attributes. Return type: list of
indra.statements.Statement
Examples
A more general statement with no information about a Phosphorylation site is identified as supporting a more specific statement:
>>> from indra.statements import Agent, Phosphorylation
>>> from indra.preassembler import Preassembler
>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> combined_stmts = pa.combine_related()
>>> combined_stmts
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> combined_stmts[0].supported_by
[Phosphorylation(BRAF(), MAP2K1())]
>>> combined_stmts[0].supported_by[0].supports
[Phosphorylation(BRAF(), MAP2K1(), S)]
-
indra.preassembler.
flatten_evidence
(stmts)[source]¶ Add evidence from supporting stmts to evidence for supported stmts.
Parameters: stmts (list of indra.statements.Statement
) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related()
. Returns: stmts – Statement hierarchy identical to the one passed, but with the evidence lists for each statement now containing all of the evidence associated with the statements they are supported by. Return type: list of indra.statements.Statement
Examples
Flattening evidence adds the two pieces of evidence from the supporting statement to the evidence list of the top-level statement:
>>> from indra.statements import Agent, Phosphorylation, Evidence
>>> from indra.preassembler import Preassembler, flatten_evidence
>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1,
...     evidence=[Evidence(text='foo'), Evidence(text='bar')])
>>> st2 = Phosphorylation(braf, map2k1, residue='S',
...     evidence=[Evidence(text='baz'), Evidence(text='bak')])
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> pa.combine_related()
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> [e.text for e in pa.related_stmts[0].evidence]
['baz', 'bak']
>>> flattened = flatten_evidence(pa.related_stmts)
>>> sorted([e.text for e in flattened[0].evidence])
['bak', 'bar', 'baz', 'foo']
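The evidence-pooling idea can be reproduced on stand-in objects that carry only an evidence list and a supported_by list. This is a sketch of the recursion, with the hypothetical names ToyNode and collect_evidence; unlike flatten_evidence it returns the pooled list instead of updating the statements.

```python
class ToyNode(object):
    """Stand-in statement with evidence strings and supporting statements."""

    def __init__(self, evidence, supported_by=None):
        self.evidence = list(evidence)
        self.supported_by = list(supported_by or [])


def collect_evidence(stmt):
    """Return the statement's own evidence plus that of everything below it."""
    total = list(stmt.evidence)
    for supporting in stmt.supported_by:
        # Recurse so that evidence several levels down is also included.
        for ev in collect_evidence(supporting):
            if ev not in total:
                total.append(ev)
    return total


general = ToyNode(['foo', 'bar'])
refined = ToyNode(['baz', 'bak'], supported_by=[general])
print(sorted(collect_evidence(refined)))
```

The membership check avoids counting the same piece of evidence twice when a general statement supports more than one refined statement along the same path.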
-
indra.preassembler.
flatten_stmts
(stmts)[source]¶ Return the full set of unique stmts in a pre-assembled stmt graph.
The flattened list of statements returned by this function can be compared to the original set of unique statements to make sure no statements have been lost during the preassembly process.
Parameters: stmts (list of indra.statements.Statement
) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related()
. Returns: stmts – List of all statements contained in the hierarchical statement graph. Return type: list of indra.statements.Statement
Examples
Calling
combine_related()
on two statements results in one top-level statement; calling flatten_stmts()
recovers both:
>>> from indra.statements import Agent, Phosphorylation
>>> from indra.preassembler import Preassembler, flatten_stmts
>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> pa.combine_related()
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> flattened = flatten_stmts(pa.related_stmts)
>>> flattened.sort(key=lambda x: x.matches_key())
>>> flattened
[Phosphorylation(BRAF(), MAP2K1()), Phosphorylation(BRAF(), MAP2K1(), S)]
-
indra.preassembler.
render_stmt_graph
(statements, agent_style=None)[source]¶ Render the statement hierarchy as a pygraphviz graph.
Parameters: - statements (list of
indra.statements.Statement
) – A list of top-level statements with associated supporting statements resulting from building a statement hierarchy with combine_related()
. - agent_style (dict or None) –
Dict of attributes specifying the visual properties of nodes. If None, the following default attributes are used:
agent_style = {'color': 'lightgray', 'style': 'filled', 'fontname': 'arial'}
Returns: Pygraphviz graph with nodes representing statements and edges pointing from supported statements to supported_by statements.
Return type: pygraphviz.AGraph
Examples
Pattern for getting statements and rendering as a Graphviz graph:
>>> from indra.statements import Agent, Phosphorylation
>>> from indra.preassembler import Preassembler, render_stmt_graph
>>> from indra.preassembler.hierarchy_manager import hierarchies
>>> braf = Agent('BRAF')
>>> map2k1 = Agent('MAP2K1')
>>> st1 = Phosphorylation(braf, map2k1)
>>> st2 = Phosphorylation(braf, map2k1, residue='S')
>>> pa = Preassembler(hierarchies, [st1, st2])
>>> pa.combine_related()
[Phosphorylation(BRAF(), MAP2K1(), S)]
>>> graph = render_stmt_graph(pa.related_stmts)
>>> graph.write('example_graph.dot') # To make the DOT file
>>> graph.draw('example_graph.png', prog='dot') # To make an image
Resulting graph:
Entity grounding curation and mapping (indra.preassembler.grounding_mapper
)¶
-
indra.preassembler.grounding_mapper.
protein_map_from_twg
(twg)[source]¶ Build map of entity texts to validated protein grounding.
Looks at the grounding of the entity texts extracted from the statements and finds proteins where there is grounding to a human protein that maps to an HGNC name that is an exact match to the entity text. Returns a dict that can be used to update/expand the grounding map.
Site curation and mapping (indra.preassembler.sitemapper
)¶
-
class
indra.preassembler.sitemapper.
MappedStatement
(original_stmt, mapped_mods, mapped_stmt)[source]¶ Information about a Statement found to have invalid sites.
Parameters: - original_stmt (
indra.statements.Statement
) – The statement prior to mapping. - mapped_mods (list of tuples) – A list of invalid sites, where each entry in the list has two elements: ((gene_name, residue, position), mapped_site). If the invalid position was not found in the site map, mapped_site is None; otherwise it is a tuple consisting of (residue, position, comment).
- mapped_stmt (
indra.statements.Statement
) – The statement after mapping. Note that if no information was found in the site map, it will be identical to the original statement.
-
class
indra.preassembler.sitemapper.
SiteMapper
(site_map)[source]¶ Use curated site information to standardize modification sites in stmts.
Parameters: site_map (dict (as returned by load_site_map()
)) – A dict mapping tuples of the form (gene, orig_res, orig_pos) to a tuple of the form (correct_res, correct_pos, comment), where gene is the string name of the gene (canonicalized to HGNC); orig_res and orig_pos are the residue and position to be mapped; correct_res and correct_pos are the corrected residue and position, and comment is a string describing the reason for the mapping (species error, isoform error, wrong residue name, etc.).
Examples
Fixing site errors on both the modification state of an agent (MAP2K1) and the target of a Phosphorylation statement (MAPK1):
>>> from indra.statements import Agent, ModCondition, Phosphorylation
>>> from indra.preassembler.sitemapper import default_mapper
>>> map2k1_phos = Agent('MAP2K1', db_refs={'UP':'Q02750'}, mods=[
...     ModCondition('phosphorylation', 'S', '217'),
...     ModCondition('phosphorylation', 'S', '221')])
>>> mapk1 = Agent('MAPK1', db_refs={'UP':'P28482'})
>>> stmt = Phosphorylation(map2k1_phos, mapk1, 'T','183')
>>> (valid, mapped) = default_mapper.map_sites([stmt])
>>> valid
[]
>>> mapped
[
MappedStatement:
    original_stmt: Phosphorylation(MAP2K1(mods: (phosphorylation, S, 217), (phosphorylation, S, 221)), MAPK1(), T, 183)
    mapped_mods: (('MAP2K1', 'S', '217'), ('S', '218', 'off by one'))
                 (('MAP2K1', 'S', '221'), ('S', '222', 'off by one'))
                 (('MAPK1', 'T', '183'), ('T', '185', 'off by two; mouse sequence'))
    mapped_stmt: Phosphorylation(MAP2K1(mods: (phosphorylation, S, 218), (phosphorylation, S, 222)), MAPK1(), T, 185)
]
>>> ms = mapped[0]
>>> ms.original_stmt
Phosphorylation(MAP2K1(mods: (phosphorylation, S, 217), (phosphorylation, S, 221)), MAPK1(), T, 183)
>>> ms.mapped_mods
[(('MAP2K1', 'S', '217'), ('S', '218', 'off by one')), (('MAP2K1', 'S', '221'), ('S', '222', 'off by one')), (('MAPK1', 'T', '183'), ('T', '185', 'off by two; mouse sequence'))]
>>> ms.mapped_stmt
Phosphorylation(MAP2K1(mods: (phosphorylation, S, 218), (phosphorylation, S, 222)), MAPK1(), T, 185)
-
map_sites
(stmts, do_methionine_offset=True, do_orthology_mapping=True, do_isoform_mapping=True)[source]¶ Check a set of statements for invalid modification sites.
Statements are checked against Uniprot reference sequences to determine if residues referred to by post-translational modifications exist at the given positions.
If there is nothing amiss with a statement (modifications on any of the agents, modifications made in the statement, etc.), then the statement goes into the list of valid statements. If there is a problem with the statement, the offending modifications are looked up in the site map (
site_map
), and an instance of MappedStatement
is added to the list of mapped statements. Parameters: - stmts (list of
indra.statements.Statement
) – The statements to check for site errors. - do_methionine_offset (boolean) – Whether to check for off-by-one errors in site position (possibly) attributable to site numbering from mature proteins after cleavage of the initial methionine. If True, checks the reference sequence for a known modification at 1 site position greater than the given one; if there exists such a site, creates the mapping. Default is True.
- do_orthology_mapping (boolean) – Whether to check sequence positions for known modification sites in mouse or rat sequences (based on PhosphoSitePlus data). If a mouse/rat site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
- do_isoform_mapping (boolean) – Whether to check sequence positions for known modifications in other human isoforms of the protein (based on PhosphoSitePlus data). If a site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
Returns: 2-tuple containing (valid_statements, mapped_statements). The first element of the tuple is a list valid statements (
indra.statement.Statement
) that were not found to contain any site errors. The second element of the tuple is a list of mapped statements (MappedStatement
) with information on the incorrect sites and corresponding statements with correctly mapped sites. Return type: tuple
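The methionine offset check described above can be illustrated with a minimal sketch. It is not the SiteMapper implementation: the helper names and the toy sequence are hypothetical, and the sketch only compares residues in a sequence, whereas the text describes a lookup of known modification sites.

```python
# Hypothetical helpers for illustration; not part of the INDRA API.
def residue_at(sequence, residue, position):
    """Return True if `sequence` has `residue` at the 1-based `position`."""
    return 1 <= position <= len(sequence) and sequence[position - 1] == residue


def check_site(sequence, residue, position, do_methionine_offset=True):
    """Classify a site as valid, remapped by +1, or invalid."""
    if residue_at(sequence, residue, position):
        return ('valid', position)
    # Numbering from a mature protein (initial methionine cleaved) is
    # one lower than numbering on the full reference sequence.
    if do_methionine_offset and residue_at(sequence, residue, position + 1):
        return ('mapped', position + 1)
    return ('invalid', None)


toy_sequence = 'MASTK'  # made-up sequence, for illustration only
result_valid = check_site(toy_sequence, 'S', 3)
result_mapped = check_site(toy_sequence, 'T', 3)
result_invalid = check_site(toy_sequence, 'K', 2)
```

In this sketch a site that fails the direct check but passes at the next position is reported as mapped, mirroring the split between valid and mapped statements in the return value.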
-
-
indra.preassembler.sitemapper.
default_mapper
= <indra.preassembler.sitemapper.SiteMapper object>¶ A default instance of
SiteMapper
that contains the site information found in resources/curated_site_map.csv.
-
indra.preassembler.sitemapper.
load_site_map
(path)[source]¶ Load the modification site map from a file.
The site map file should be a comma-separated file with six columns:
- Gene: HGNC gene name
- OrigRes: Original (incorrect) residue
- OrigPos: Original (incorrect) residue position
- CorrectRes: The correct residue for the modification
- CorrectPos: The correct residue position
- Comment: Description of the reason for the error.
Parameters: path (string) – Path to the comma-separated site map file. Returns: A dict mapping tuples of the form (gene, orig_res, orig_pos) to a tuple of the form (correct_res, correct_pos, comment), where gene is the string name of the gene (canonicalized to HGNC); orig_res and orig_pos are the residue and position to be mapped; correct_res and correct_pos are the corrected residue and position, and comment is a string describing the reason for the mapping (species error, isoform error, wrong residue name, etc.). Return type: dict
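As a rough illustration of the mapping described above, the following sketch parses six-column rows into a dict keyed by (gene, orig_res, orig_pos). The function name parse_site_map, the presence of a header row, and the example rows are assumptions made for this sketch; positions are kept as strings here, which may differ from what load_site_map does.

```python
import csv
import io


def parse_site_map(handle):
    """Build {(gene, orig_res, orig_pos): (correct_res, correct_pos, comment)}."""
    site_map = {}
    reader = csv.reader(handle)
    next(reader)  # skip the header row (assumed to be present)
    for gene, orig_res, orig_pos, correct_res, correct_pos, comment in reader:
        site_map[(gene, orig_res, orig_pos)] = (correct_res, correct_pos, comment)
    return site_map


# Made-up rows, for illustration only.
example_csv = io.StringIO(
    'Gene,OrigRes,OrigPos,CorrectRes,CorrectPos,Comment\n'
    'GENE1,T,184,T,185,example off-by-one\n'
    'GENE2,S,10,T,10,example wrong residue\n'
)
site_map = parse_site_map(example_csv)
```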
Hierarchy manager (indra.preassembler.hierarchy_manager
)¶
-
class
indra.preassembler.hierarchy_manager.
HierarchyManager
(rdf_file, build_closure=True, uri_as_name=True)[source]¶ Store hierarchical relationships between different types of entities.
Used to store, e.g., entity hierarchies (proteins and protein families) and modification hierarchies (serine phosphorylation vs. phosphorylation).
Parameters: - rdf_file (string) – Path to the RDF file containing the hierarchy.
- build_closure (Optional[bool]) – If True, the transitive closure of the hierarchy is generated up front to speed up processing. Default: True
- uri_as_name (Optional[bool]) – If True, entries are accessed directly by their URIs. If False entries are accessed by finding their name through the hasName relationship. Default: True
-
graph
¶ instance of rdflib.Graph – The RDF graph containing the hierarchy.
-
build_transitive_closures
()[source]¶ Build the transitive closures of the hierarchy.
This method constructs dictionaries which contain terms in the hierarchy as keys and either all the “isa+” or “partof+” related terms as values.
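The idea of a transitive closure dictionary can be sketched in plain Python. This is an illustration of the concept, not the HierarchyManager code (which works on an RDF graph); the function name and the toy "isa" edges are hypothetical.

```python
def transitive_closure(direct_parents):
    """Map each term to the set of all terms reachable via the relation."""
    closure = {}
    for term in direct_parents:
        reached = set()
        stack = list(direct_parents[term])
        while stack:
            parent = stack.pop()
            if parent not in reached:
                reached.add(parent)
                stack.extend(direct_parents.get(parent, ()))
        closure[term] = reached
    return closure


# Toy "isa" edges: child -> direct parents (identifiers are illustrative).
isa_edges = {
    'BRAF': ['RAF'],
    'RAF1': ['RAF'],
    'RAF': ['kinase_family'],
}
isa_closure = transitive_closure(isa_edges)
```

With the closure precomputed, a check such as "is BRAF a kinase_family member" becomes a single set lookup instead of a graph traversal.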
-
find_entity
[source]¶ Get the entity that has the specified name (or synonym).
Parameters: x (string) – Name or synonym for the target entity.
-
get_children
(uri)[source]¶ Return all (not just immediate) children of a given entry.
Parameters: uri (str) – The URI of the entry whose children are to be returned. See the get_uri method to construct this URI from a name space and id.
-
get_parents
(uri, type='all')[source]¶ Return parents of a given entry.
Parameters: - uri (str) – The URI of the entry whose parents are to be returned. See the get_uri method to construct this URI from a name space and id.
- type (str) – ‘all’: return all parents irrespective of level; ‘immediate’: return only the immediate parents; ‘top’: return only the highest level parents
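The three values of the type argument can be illustrated on a toy hierarchy. This is a sketch under stated assumptions: the hierarchy is a plain dict of direct parents rather than an RDF graph, entries are short strings rather than URIs, and the function below is a stand-in, not the method itself.

```python
def get_parents(direct_parents, term, type='all'):
    """Return parents of `term`; `type` mirrors the parameter described above."""
    all_parents = set()
    stack = list(direct_parents.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in all_parents:
            all_parents.add(parent)
            stack.extend(direct_parents.get(parent, ()))
    if type == 'immediate':
        return set(direct_parents.get(term, ()))
    if type == 'top':
        # Top-level parents are those that have no parents of their own.
        return {p for p in all_parents if not direct_parents.get(p)}
    return all_parents


toy_hierarchy = {'A': ['B'], 'B': ['C', 'D'], 'C': []}
parents_all = get_parents(toy_hierarchy, 'A', 'all')
parents_immediate = get_parents(toy_hierarchy, 'A', 'immediate')
parents_top = get_parents(toy_hierarchy, 'A', 'top')
```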
-
isa
(ns1, id1, ns2, id2)[source]¶ Indicate whether one entity has an “isa” relationship to another.
Parameters: - ns1 (string) – Namespace code for an entity.
- id1 (string) – URI for an entity.
- ns2 (string) – Namespace code for an entity.
- id2 (string) – URI for an entity.
Returns: True if the entity given by ns1 and id1 has an “isa” relationship with the entity given by ns2 and id2, either directly or through a series of intermediates; False otherwise.
Return type: bool
-
partof
(ns1, id1, ns2, id2)[source]¶ Indicate whether one entity is physically part of another.
Parameters: - ns1 (string) – Namespace code for an entity.
- id1 (string) – URI for an entity.
- ns2 (string) – Namespace code for an entity.
- id2 (string) – URI for an entity.
Returns: True if the entity given by ns1 and id1 has a “partof” relationship with the entity given by ns2 and id2, either directly or through a series of intermediates; False otherwise.
Return type: bool
Belief Engine (indra.belief
)¶
-
class
indra.belief.
BeliefEngine
(prior_probs=None)[source]¶ Assigns beliefs to INDRA Statements based on supporting evidence.
Parameters: prior_probs (Optional[dict[dict]]) – A dictionary of prior probabilities used to override/extend the default ones. There are two types of prior probabilities: rand and syst corresponding to random error and systematic error rate for each knowledge source. The prior_probs dictionary has the general structure {‘rand’: {‘s1’: pr1, ..., ‘sn’: prn}, ‘syst’: {‘s1’: ps1, ..., ‘sn’: psn}} where ‘s1’ ... ‘sn’ are names of input sources and pr1 ... prn and ps1 ... psn are error probabilities. Examples: {‘rand’: {‘some_source’: 0.1}} sets the random error rate for some_source to 0.1; {‘rand’: {‘’}} -
prior_probs
¶ dict[dict] – A dictionary of prior systematic and random error probabilities for each knowledge source.
-
set_hierarchy_probs
(statements)[source]¶ Sets hierarchical belief probabilities for a list of INDRA Statements.
The Statements are assumed to be in a hierarchical relation graph with the supports and supported_by attribute of each Statement object having been set. The hierarchical belief probability of each Statement is calculated based on its prior probability and the probabilities propagated from Statements supporting it in the hierarchy graph.
Parameters: statements (list[indra.statements.Statement]) – A list of INDRA Statements whose belief scores are to be calculated. Each Statement object’s belief attribute is updated by this function.
-
set_linked_probs
(linked_statements)[source]¶ Sets the belief probabilities for a list of linked INDRA Statements.
The list of LinkedStatement objects is assumed to come from the MechanismLinker. The belief probability of the inferred Statement is assigned the joint probability of its source Statements.
Parameters: linked_statements (list[indra.mechlinker.LinkedStatement]) – A list of INDRA LinkedStatements whose belief scores are to be calculated. The belief attribute of the inferred Statement in the LinkedStatement object is updated by this function.
-
set_prior_probs
(statements)[source]¶ Sets the prior belief probabilities for a list of INDRA Statements.
The Statements are assumed to be de-duplicated. In other words, each Statement in the list passed to this function is assumed to have a list of Evidence objects that support it. The prior probability of each Statement is calculated based on the number of Evidences it has and their sources.
Parameters: statements (list[indra.statements.Statement]) – A list of INDRA Statements whose belief scores are to be calculated. Each Statement object’s belief attribute is updated by this function.
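One way to combine random and systematic error rates into a prior belief is sketched below. The formula is an assumption made for illustration and is not taken from the BeliefEngine: a source is treated as wrong if it has a systematic error, or otherwise if all of its evidences are random errors, and sources are treated as independent. The source names and error rates are made up.

```python
def prior_belief(evidence_sources, prior_probs):
    """Return a belief score from a list of evidence source names."""
    counts = {}
    for source in evidence_sources:
        counts[source] = counts.get(source, 0) + 1
    prob_all_wrong = 1.0
    for source, n in counts.items():
        rand = prior_probs['rand'][source]
        syst = prior_probs['syst'][source]
        # Assumed model: systematic error, or n independent random errors.
        prob_all_wrong *= syst + (1.0 - syst) * rand ** n
    return 1.0 - prob_all_wrong


# Made-up error rates in the {'rand': {...}, 'syst': {...}} structure.
example_probs = {
    'rand': {'reader': 0.3, 'database': 0.1},
    'syst': {'reader': 0.05, 'database': 0.05},
}
belief_one = prior_belief(['reader'], example_probs)
belief_two = prior_belief(['reader', 'reader'], example_probs)
belief_mixed = prior_belief(['reader', 'database'], example_probs)
```

Under this model, repeated evidence from one source raises the belief only up to the ceiling set by that source's systematic error, while evidence from a second source raises it further.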
-
Mechanism Linker (indra.mechlinker
)¶
-
class
indra.mechlinker.
AgentState
(agent)[source]¶ A class representing Agent state without identifying a specific Agent.
- bound_conditions (list[indra.statements.BoundCondition])
- mods (list[indra.statements.ModCondition])
- mutations (list[indra.statements.Mutation])
- location (indra.statements.location)
-
apply_to
(agent)[source]¶ Apply this object’s state to an Agent.
Parameters: agent (indra.statements.Agent) – The agent to which the state should be applied
-
-
class
indra.mechlinker.
BaseAgent
(name)[source]¶ Represents all activity types and active forms of an Agent.
Parameters: - name (str) – The name of the BaseAgent
- activity_types (list[str]) – A list of activity types that the Agent has
- active_states (dict) – A dict of activity types and their associated Agent states
- activity_reductions (dict) – A dict of activity types and the type they are reduced to by inference.
-
class
indra.mechlinker.
BaseAgentSet
[source]¶ Container for a set of BaseAgents.
This class wraps a dict of BaseAgent instances and can be used to get and set BaseAgents.
-
get_create_base_agent
(agent)[source]¶ Return BaseAgent from an Agent, creating it if needed.
Parameters: agent (indra.statements.Agent) – Returns: base_agent Return type: indra.mechlinker.BaseAgent
-
-
class
indra.mechlinker.
LinkedStatement
(source_stmts, inferred_stmt)[source]¶ A tuple containing a list of source Statements and an inferred Statement.
The list of source Statements is the basis for the inferred Statement.
Parameters: - source_stmts (list[indra.statements.Statement]) – A list of source Statements
- inferred_stmt (indra.statements.Statement) – A Statement that was inferred from the source Statements.
-
class
indra.mechlinker.
MechLinker
(stmts=None)[source]¶ Rewrite the activation pattern of Statements and derive new Statements.
The mechanism linker (MechLinker) traverses a corpus of Statements and uses various inference steps to make the activity types and active forms consistent among Statements.
-
add_statements
(stmts)[source]¶ Add statements to the MechLinker.
Parameters: stmts (list[indra.statements.Statement]) – A list of Statements to add.
-
gather_explicit_activities
()[source]¶ Aggregate all explicit activities and active forms of Agents.
This function iterates over self.statements and extracts explicitly stated activity types and active forms for Agents.
-
gather_implicit_activities
()[source]¶ Aggregate all implicit activities and active forms of Agents.
Iterate over self.statements and collect the implied activities and active forms of Agents that appear in the Statements.
Note that using this function to collect implied Agent activities can be risky. Assume, for instance, that a Statement from a reading system states that EGF bound to EGFR phosphorylates ERK. This would be interpreted as implicit evidence for the EGFR-bound form of EGF to have ‘kinase’ activity, which is clearly incorrect.
In contrast, the counterpart of this function, gather_explicit_activities, collects only explicitly stated activities.
-
static
infer_activations
(stmts)[source]¶ Return inferred RegulateActivity from Modification + ActiveForm.
This function looks for combinations of Modification and ActiveForm Statements and infers Activation/Inhibition Statements from them. For example, if we know that A phosphorylates B, and the phosphorylated form of B is active, then we can infer that A activates B. This can also be viewed as having “explained” a given Activation/Inhibition Statement with a combination of more mechanistic Modification + ActiveForm Statements.
Parameters: stmts (list[indra.statements.Statement]) – A list of Statements to infer RegulateActivity from. Returns: linked_stmts – A list of LinkedStatements representing the inferred Statements. Return type: list[indra.mechlinker.LinkedStatement]
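The inference pattern described above (A phosphorylates B, phosphorylated B is active, therefore A activates B) can be sketched with toy stand-in classes. The classes below are simplified placeholders defined only for this example and are not the INDRA Statement classes of the same names; the matching logic is likewise a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phosphorylation:
    enz: str
    sub: str


@dataclass(frozen=True)
class ActiveForm:
    agent: str
    modification: str
    is_active: bool


@dataclass(frozen=True)
class Activation:
    subj: str
    obj: str


@dataclass(frozen=True)
class Inhibition:
    subj: str
    obj: str


def infer_activations(stmts):
    """Pair each modification with a matching active form of its substrate."""
    mods = [s for s in stmts if isinstance(s, Phosphorylation)]
    forms = [s for s in stmts if isinstance(s, ActiveForm)]
    linked = []
    for mod in mods:
        for form in forms:
            if form.agent == mod.sub and form.modification == 'phosphorylation':
                inferred_cls = Activation if form.is_active else Inhibition
                # Keep the source statements next to the inferred one.
                linked.append(([mod, form], inferred_cls(mod.enz, mod.sub)))
    return linked


stmts = [
    Phosphorylation('A', 'B'),
    ActiveForm('B', 'phosphorylation', True),
    Phosphorylation('A', 'C'),  # no ActiveForm for C, so nothing is inferred
]
linked_stmts = infer_activations(stmts)
```

Each result pairs the source statements with the inferred one, which is the same shape a LinkedStatement is described as having.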
-
static
infer_active_forms
(stmts)[source]¶ Return inferred ActiveForm from RegulateActivity + Modification.
This function looks for combinations of Activation/Inhibition Statements and Modification Statements, and infers an ActiveForm from them. For example, if we know that A activates B and A phosphorylates B, then we can infer that the phosphorylated form of B is active.
Parameters: stmts (list[indra.statements.Statement]) – A list of Statements to infer ActiveForms from. Returns: linked_stmts – A list of LinkedStatements representing the inferred Statements. Return type: list[indra.mechlinker.LinkedStatement]
-
static
infer_complexes
(stmts)[source]¶ Return inferred Complex from Statements implying physical interaction.
Parameters: stmts (list[indra.statements.Statement]) – A list of Statements to infer Complexes from. Returns: linked_stmts – A list of LinkedStatements representing the inferred Statements. Return type: list[indra.mechlinker.LinkedStatement]
-
static
infer_modifications
(stmts)[source]¶ Return inferred Modification from RegulateActivity + ActiveForm.
This function looks for combinations of Activation/Inhibition Statements and ActiveForm Statements that imply a Modification Statement. For example, if we know that A activates B, and phosphorylated B is active, then we can infer that A leads to the phosphorylation of B. An additional requirement when making this assumption is that the activity of B should only be dependent on the modified state and not other context - otherwise the inferred Modification is not necessarily warranted.
Parameters: stmts (list[indra.statements.Statement]) – A list of Statements to infer Modifications from. Returns: linked_stmts – A list of LinkedStatements representing the inferred Statements. Return type: list[indra.mechlinker.LinkedStatement]
-
reduce_activities
()[source]¶ Rewrite the activity types referenced in Statements for consistency.
Activity types are reduced to the most specific form whenever possible. For instance, if ‘kinase’ is the only specific activity type known for the BaseAgent of BRAF, its generic ‘activity’ forms are rewritten to ‘kinase’.
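The reduction rule in the BRAF example can be sketched as a small function. The function name and the dict of known activity types are hypothetical, and the sketch covers only the case given in the text (a single known specific type).

```python
def reduce_activity(agent_name, activity_type, known_types):
    """Rewrite generic 'activity' to the only specific type known, if any."""
    specific = sorted(
        t for t in known_types.get(agent_name, ()) if t != 'activity'
    )
    if activity_type == 'activity' and len(specific) == 1:
        return specific[0]
    return activity_type


# Made-up activity types collected per agent.
known = {
    'BRAF': {'activity', 'kinase'},
    'GENE_X': {'activity', 'kinase', 'transcription'},
}
reduced_braf = reduce_activity('BRAF', 'activity', known)
reduced_ambiguous = reduce_activity('GENE_X', 'activity', known)
```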
-
replace_activations
(linked_stmts=None)[source]¶ Remove RegulateActivity Statements that can be inferred out.
This function iterates over self.statements and looks for RegulateActivity Statements that either match or are refined by inferred RegulateActivity Statements that were linked (provided as the linked_stmts argument). It removes RegulateActivity Statements from self.statements that can be explained by the linked statements.
Parameters: linked_stmts (Optional[list[indra.mechlinker.LinkedStatement]]) – A list of linked statements, optionally passed from outside. If None is passed, the MechLinker runs self.infer_activations to infer RegulateActivities and obtain a list of LinkedStatements that are then used for removing existing RegulateActivity Statements in self.statements.
-
replace_complexes
(linked_stmts=None)[source]¶ Remove Complex Statements that can be inferred out.
This function iterates over self.statements and looks for Complex Statements that either match or are refined by inferred Complex Statements that were linked (provided as the linked_stmts argument). It removes Complex Statements from self.statements that can be explained by the linked statements.
Parameters: linked_stmts (Optional[list[indra.mechlinker.LinkedStatement]]) – A list of linked statements, optionally passed from outside. If None is passed, the MechLinker runs self.infer_complexes to infer Complexes and obtain a list of LinkedStatements that are then used for removing existing Complexes in self.statements.
-
require_active_forms
()[source]¶ Rewrites Statements with Agents’ active forms in active positions.
As an example, the enzyme in a Modification Statement can be expected to be in an active state. Similarly, subjects of RegulateAmount and RegulateActivity Statements can be expected to be in an active form. This function takes the collected active states of Agents in their corresponding BaseAgents and then rewrites other Statements to apply the active Agent states to them.
Returns: new_stmts – A list of Statements which includes the newly rewritten Statements. This list is also set as the internal Statement list of the MechLinker. Return type: list[indra.statements.Statement]
-
Assemblers of model output (indra.assemblers
)¶
Executable PySB models (indra.assemblers.pysb_assembler
)¶
Cytoscape networks (indra.assemblers.cx_assembler
)¶
Natural language (indra.assemblers.english_assembler
)¶
-
class
indra.assemblers.english_assembler.
EnglishAssembler
(stmts=None)[source]¶ This assembler generates English sentences from INDRA Statements.
Parameters: stmts (Optional[list[indra.statements.Statement]]) – A list of INDRA Statements to be added to the assembler. -
statements
¶ list[indra.statements.Statement] – A list of INDRA Statements to assemble.
-
model
¶ str – The assembled sentences as a single string.
-
add_statements
(stmts)[source]¶ Add INDRA Statements to the assembler’s list of statements.
Parameters: stmts (list[indra.statements.Statement]) – A list of indra.statements.Statement
to be added to the statement list of the assembler.
-
Node-edge graphs (indra.assemblers.graph_assembler
)¶
-
class
indra.assemblers.graph_assembler.
GraphAssembler
(stmts=None, graph_properties=None, node_properties=None, edge_properties=None)[source]¶ The Graph assembler assembles INDRA Statements into a Graphviz node-edge graph.
Parameters: - stmts (Optional[list[indra.statements.Statement]]) – A list of INDRA Statements to be added to the assembler’s list of Statements.
- graph_properties (Optional[dict[str: str]]) – A dictionary of graphviz graph properties overriding the default ones.
- node_properties (Optional[dict[str: str]]) – A dictionary of graphviz node properties overriding the default ones.
- edge_properties (Optional[dict[str: str]]) – A dictionary of graphviz edge properties overriding the default ones.
-
statements
¶ list[indra.statements.Statement] – A list of INDRA Statements to be assembled.
-
graph
¶ pygraphviz.AGraph – A pygraphviz graph that is assembled by this assembler.
-
existing_nodes
¶ list[tuple] – The list of nodes (identified by node key tuples) that are already in the graph.
-
existing_edges
¶ list[tuple] – The list of edges (identified by edge key tuples) that are already in the graph.
-
graph_properties
¶ dict[str: str] – A dictionary of graphviz graph properties used for assembly.
-
node_properties
¶ dict[str: str] – A dictionary of graphviz node properties used for assembly.
-
edge_properties
¶ dict[str: str] – A dictionary of graphviz edge properties used for assembly. Note that most edge properties are determined based on the type of the edge by the assembler (e.g. color, arrowhead). These settings cannot be directly controlled through the API.
-
add_statements
(stmts)[source]¶ Add a list of statements to be assembled.
Parameters: stmts (list[indra.statements.Statement]) – A list of INDRA Statements to be appended to the assembler’s list.
-
get_string
()[source]¶ Return the assembled graph as a string.
Returns: graph_string – The assembled graph as a string. Return type: str
SIF / Boolean networks (indra.assemblers.sif_assembler
)¶
-
class
indra.assemblers.sif_assembler.
SifAssembler
(stmts=None)[source]¶ The SIF assembler assembles INDRA Statements into a networkx graph.
This graph can then be exported into SIF (simple interaction format) or a Boolean network.
Parameters: stmts (Optional[list[indra.statements.Statement]]) – A list of INDRA Statements to be added to the assembler’s list of Statements. -
graph
¶ networkx.DiGraph – A networkx graph that is assembled by this assembler.
-
make_model
(use_name_as_key=False, include_mods=False, include_complexes=False)[source]¶ Assemble the graph from the assembler’s list of INDRA Statements.
Parameters: - use_name_as_key (boolean) – If True, uses the name of the agent as the key to the nodes in the network. If False (default) uses the matches_key() of the agent.
- include_mods (boolean) – If True, adds Modification statements into the graph as directed edges. Default is False.
- include_complexes (boolean) – If True, creates two edges (in both directions) between all pairs of nodes in Complex statements. Default is False.
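The effect of include_complexes can be pictured with a short sketch that expands the members of one Complex into directed edges in both directions between every pair. The function name is hypothetical and the sketch returns plain tuples rather than adding edges to a networkx graph.

```python
from itertools import combinations


def complex_edges(members):
    """Return directed edges in both directions for all pairs of members."""
    edges = []
    for a, b in combinations(members, 2):
        edges.append((a, b))
        edges.append((b, a))
    return edges


edges = complex_edges(['A', 'B', 'C'])
```

A Complex with n members therefore contributes n * (n - 1) directed edges under this reading.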
-
print_boolean_net
(out_file=None)[source]¶ Return a Boolean network from the assembled graph.
See https://github.com/ialbert/booleannet for details about the format used to encode the Boolean rules.
Parameters: out_file (Optional[str]) – A file name in which the Boolean network is saved. Returns: full_str – The string representing the Boolean network. Return type: str
-
print_loopy
(as_url=True)[source]¶ Return
Parameters: out_file (Optional[str]) – A file name in which the Loopy network is saved. Returns: full_str – The string representing the Loopy network. Return type: str
-
MITRE “index cards” (indra.assemblers.index_card_assembler
)¶
-
indra.assemblers.index_card_assembler.
get_is_direct
(stmt)[source]¶ Returns true if there is evidence that the statement is a direct interaction. If any of the evidences associated with the statement indicates a direct interaction then we assume the interaction is direct. If there is no evidence for the interaction being indirect then we default to direct.
SBGN output (indra.assemblers.sbgn_assembler
)¶
Explanation (indra.explanation
)¶
Check whether a rule-based model satisfies a property (indra.explanation.model_checker
)¶
Tools (indra.tools
)¶
Run assembly components in a pipeline (indra.tools.assemble_corpus
)¶
-
indra.tools.assemble_corpus.
dump_statements
(stmts, fname)[source]¶ Dump a list of statements into a pickle file.
Parameters: fname (str) – The name of the pickle file to dump statements into.
-
indra.tools.assemble_corpus.
dump_stmt_strings
(stmts, fname)[source]¶ Save printed statements in a file.
Parameters: - stmts (list[indra.statements.Statement]) – A list of statements to save in a text file.
- fname (Optional[str]) – The name of a text file to save the printed statements into.
-
indra.tools.assemble_corpus.
expand_families
(stmts_in, **kwargs)[source]¶ Expand Bioentities Agents to individual genes.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to expand.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of expanded statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_belief
(stmts_in, belief_cutoff, **kwargs)[source]¶ Filter to statements with belief above a given cutoff.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- belief_cutoff (float) – Only statements with belief above the belief_cutoff will be returned. Here 0 < belief_cutoff < 1.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_by_type
(stmts_in, stmt_type, **kwargs)[source]¶ Filter to a given statement type.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- stmt_type (indra.statements.Statement) – The class of the statement type to filter for. Example: indra.statements.Modification
- invert (Optional[bool]) – If True, the statements that are not of the given type are returned. Default: False
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_direct
(stmts_in, **kwargs)[source]¶ Filter to statements that are direct interactions.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_enzyme_kinase
(stmts_in, **kwargs)[source]¶ Filter Phosphorylations to ones where the enzyme is a known kinase.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_evidence_source
(stmts_in, source_apis, policy='one', **kwargs)[source]¶ Filter to statements that have evidence from a given set of sources.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- source_apis (list[str]) – A list of sources to filter for. Examples: biopax, bel, reach
- policy (Optional[str]) – If ‘one’, a statement that has evidence from any of the sources is kept. If ‘all’, only those statements are kept which have evidence from all the input sources specified in source_apis. If ‘none’, only those statements are kept that don’t have evidence from any of the sources specified in source_apis.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
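The three policies of filter_evidence_source can be expressed as set operations. The sketch below works on plain lists of source names instead of Statement and Evidence objects, and the function name is hypothetical.

```python
def keep_statement(evidence_sources, source_apis, policy='one'):
    """Decide whether a statement passes the evidence source filter."""
    present = set(evidence_sources)
    wanted = set(source_apis)
    if policy == 'one':
        return bool(present & wanted)   # evidence from any listed source
    if policy == 'all':
        return wanted <= present        # evidence from every listed source
    if policy == 'none':
        return not (present & wanted)   # no evidence from listed sources
    raise ValueError('Unknown policy: %s' % policy)


keep_one = keep_statement(['reach', 'bel'], ['reach', 'biopax'], 'one')
keep_all = keep_statement(['reach', 'bel'], ['reach', 'biopax'], 'all')
keep_none = keep_statement(['bel'], ['reach', 'biopax'], 'none')
```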
-
indra.tools.assemble_corpus.
filter_gene_list
(stmts_in, gene_list, policy, allow_families=False, **kwargs)[source]¶ Return statements that contain genes given in a list.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- gene_list (list[str]) – A list of gene symbols to filter for.
- policy (str) – The policy to apply when filtering for the list of genes. “one”: keep statements that contain at least one of the list of genes and possibly others not in the list; “all”: keep statements that only contain genes given in the list.
- allow_families (Optional[bool]) – Will include statements involving Bioentities families containing one of the genes in the gene list. Default: False
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
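The difference between the two gene list policies is easiest to see side by side. This sketch operates on lists of agent names rather than Statements, ignores the allow_families option, and uses a hypothetical function name.

```python
def matches_gene_list(agent_names, gene_list, policy):
    """Apply the 'one' or 'all' policy to the agent names of one statement."""
    genes = set(gene_list)
    if policy == 'one':
        return any(name in genes for name in agent_names)
    if policy == 'all':
        return all(name in genes for name in agent_names)
    raise ValueError('Unknown policy: %s' % policy)


gene_list = ['BRAF', 'MAP2K1']
one_match = matches_gene_list(['BRAF', 'KRAS'], gene_list, 'one')
all_match = matches_gene_list(['BRAF', 'KRAS'], gene_list, 'all')
all_inside = matches_gene_list(['BRAF', 'MAP2K1'], gene_list, 'all')
```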
-
indra.tools.assemble_corpus.
filter_genes_only
(stmts_in, **kwargs)[source]¶ Filter to statements containing genes only.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- specific_only (Optional[bool]) – If True, only elementary genes/proteins will be kept and families will be filtered out. If False, families are also included in the output. Default: False
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_grounded_only
(stmts_in, **kwargs)[source]¶ Filter to statements that have grounded agents.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_human_only
(stmts_in, **kwargs)[source]¶ Filter out statements that are not grounded to human genes.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_inconsequential_acts
(stmts_in, whitelist=None, **kwargs)[source]¶ Filter out Activations that modify inconsequential activities.
Inconsequential here means that the activity is not mentioned / tested in any other statement. In some cases specific activity types should be preserved, for instance, to be used as readouts in a model. In this case, the given activities can be passed in a whitelist.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- whitelist (Optional[dict]) – A whitelist containing agent activity types which should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of activity types. Example: whitelist = {‘MAP2K1’: [‘kinase’]}
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_inconsequential_mods
(stmts_in, whitelist=None, **kwargs)[source]¶ Filter out Modifications that modify inconsequential sites.
Inconsequential here means that the site is not mentioned / tested in any other statement. In some cases specific sites should be preserved, for instance, to be used as readouts in a model. In this case, the given sites can be passed in a whitelist.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- whitelist (Optional[dict]) – A whitelist containing agent modification sites whose modifications should be preserved even if no other statement refers to them. The whitelist parameter is a dictionary in which the key is a gene name and the value is a list of tuples of (modification_type, residue, position). Example: whitelist = {‘MAP2K1’: [(‘phosphorylation’, ‘S’, ‘222’)]}
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
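The interplay between referenced sites and the whitelist can be sketched as follows. Modifications are represented as plain tuples, the set of sites referenced by other statements is passed in directly (collecting it is not shown), and the function name is hypothetical. The whitelist uses the structure given above.

```python
def filter_inconsequential(modifications, referenced_sites, whitelist=None):
    """Keep modifications whose site is referenced elsewhere or whitelisted.

    modifications: list of (enzyme, substrate, mod_type, residue, position)
    referenced_sites: set of (gene, mod_type, residue, position) tuples that
        other statements mention as conditions
    """
    whitelist = whitelist or {}
    kept = []
    for mod in modifications:
        enzyme, substrate, mod_type, residue, position = mod
        site = (mod_type, residue, position)
        is_referenced = (substrate,) + site in referenced_sites
        is_whitelisted = site in whitelist.get(substrate, [])
        if is_referenced or is_whitelisted:
            kept.append(mod)
    return kept


# Illustrative inputs only.
mods = [
    ('BRAF', 'MAP2K1', 'phosphorylation', 'S', '222'),
    ('BRAF', 'MAP2K1', 'phosphorylation', 'S', '218'),
    ('MAP2K1', 'MAPK1', 'phosphorylation', 'T', '185'),
]
referenced = {('MAPK1', 'phosphorylation', 'T', '185')}
whitelist = {'MAP2K1': [('phosphorylation', 'S', '222')]}
kept_mods = filter_inconsequential(mods, referenced, whitelist)
```

Here the S222 modification survives only because of the whitelist, the T185 modification survives because another statement refers to it, and the S218 modification is dropped.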
-
indra.tools.assemble_corpus.
filter_mod_nokinase
(stmts_in, **kwargs)[source]¶ Filter non-phospho Modifications to ones with a non-kinase enzyme.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_mutation_status
(stmts_in, mutations, deletions, **kwargs)[source]¶ Filter statements based on existing mutations/deletions.
This filter helps to contextualize a set of statements to a given cell type. Given a list of deleted genes, it removes statements that refer to these genes. It also takes a list of mutations and removes statements that refer to mutations not relevant for the given context.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- mutations (dict) – A dictionary whose keys are gene names, and the values are lists of tuples of the form (residue_from, position, residue_to). Example: mutations = {‘BRAF’: [(‘V’, ‘600’, ‘E’)]}
- deletions (list) – A list of gene names that are deleted.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
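A minimal sketch of the contextualization rule follows, using the mutations and deletions structures given above. Each statement is reduced to a list of (gene name, mutations) pairs, and the function name is hypothetical; how the actual filter treats edge cases is not covered here.

```python
def keep_for_context(agents, mutations, deletions):
    """Return False if a statement refers to a deleted gene or to a mutation
    that is not listed for the given context.

    agents: list of (gene_name, [(residue_from, position, residue_to), ...])
    """
    for name, agent_mutations in agents:
        if name in deletions:
            return False
        allowed = mutations.get(name, [])
        for mutation in agent_mutations:
            if mutation not in allowed:
                return False
    return True


mutations = {'BRAF': [('V', '600', 'E')]}
deletions = ['CDKN2A']
keep_relevant = keep_for_context(
    [('BRAF', [('V', '600', 'E')]), ('MAP2K1', [])], mutations, deletions)
keep_other_mutation = keep_for_context(
    [('BRAF', [('V', '600', 'K')])], mutations, deletions)
keep_deleted = keep_for_context(
    [('CDKN2A', []), ('TP53', [])], mutations, deletions)
```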
-
indra.tools.assemble_corpus.
filter_no_hypothesis
(stmts_in, **kwargs)[source]¶ Filter to statements that are not marked as hypothesis in epistemics.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_top_level
(stmts_in, **kwargs)[source]¶ Filter to statements that are at the top-level of the hierarchy.
Here top-level statements correspond to the most specific ones.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_transcription_factor
(stmts_in, **kwargs)[source]¶ Filter out RegulateAmounts where subject is not a transcription factor.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
filter_uuid_list
(stmts_in, uuids, **kwargs)[source]¶ Filter to Statements corresponding to given UUIDs.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to filter.
- uuids (list[str]) – A list of UUIDs to filter for.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of filtered statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
load_statements
(fname, as_dict=False)[source]¶ Load statements from a pickle file.
Parameters: - fname (str) – The name of the pickle file to load statements from.
- as_dict (Optional[bool]) – If True and the pickle file contains a dictionary of statements, it is returned as a dictionary. If False, the statements are always returned in a list. Default: False
Returns: stmts – A list or dict of statements that were loaded.
Return type: list
-
indra.tools.assemble_corpus.
map_grounding
(stmts_in, **kwargs)[source]¶ Map grounding using the GroundingMapper.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to map.
- do_rename (Optional[bool]) – If True, Agents are renamed based on their mapped grounding.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of mapped statements.
Return type: list[indra.statements.Statement]
-
indra.tools.assemble_corpus.
map_sequence
(stmts_in, **kwargs)[source]¶ Map sequences using the SiteMapper.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to map.
- do_methionine_offset (boolean) – Whether to check for off-by-one errors in site position (possibly) attributable to site numbering from mature proteins after cleavage of the initial methionine. If True, checks the reference sequence for a known modification at 1 site position greater than the given one; if there exists such a site, creates the mapping. Default is True.
- do_orthology_mapping (boolean) – Whether to check sequence positions for known modification sites in mouse or rat sequences (based on PhosphoSitePlus data). If a mouse/rat site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
- do_isoform_mapping (boolean) – Whether to check sequence positions for known modifications in other human isoforms of the protein (based on PhosphoSitePlus data). If a site is found that is linked to a site in the human reference sequence, a mapping is created. Default is True.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of mapped statements.
Return type:
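The methionine offset check described for do_methionine_offset can be sketched as follows. This is a simplified illustration, not the SiteMapper implementation: `map_with_methionine_offset` is a hypothetical helper, and the set of known sites holds illustrative values rather than data from a reference sequence.

```python
def map_with_methionine_offset(known_sites, residue, position):
    """Return a valid (residue, position) pair, or None if no mapping applies.

    known_sites is a set of (residue, position) tuples assumed to be known
    modification sites on the reference sequence.
    """
    if (residue, position) in known_sites:
        return (residue, position)  # the site is already valid as given
    shifted = (residue, position + 1)
    if shifted in known_sites:
        return shifted  # off-by-one consistent with a cleaved initial methionine
    return None


# Illustrative site table; not taken from any database.
known = {('T', 185), ('S', 222)}
print(map_with_methionine_offset(known, 'T', 184))
print(map_with_methionine_offset(known, 'S', 222))
print(map_with_methionine_offset(known, 'Y', 10))
```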
-
indra.tools.assemble_corpus.
reduce_activities
(stmts_in, **kwargs)[source]¶ Reduce the activity types in a list of statements
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to reduce activity types in.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of reduced activity statements.
Return type:
-
indra.tools.assemble_corpus.
run_preassembly
(stmts_in, **kwargs)[source]¶ Run preassembly on a list of statements.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements to preassemble.
- return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned irrespective of level of specificity. Default: True
- poolsize (Optional[int]) – The number of worker processes to use to parallelize the comparisons performed by the function. If None (default), no parallelization is performed. NOTE: Parallelization is only available on Python 3.4 and above.
- size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
- save_unique (Optional[str]) – The name of a pickle file to save the unique statements into.
Returns: stmts_out – A list of preassembled top-level statements.
Return type:
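The role of size_cutoff can be shown with a small sketch of the partitioning rule stated above: groups with size_cutoff or more statements go to worker processes and smaller groups stay in the parent process. The helper `split_by_size_cutoff` is hypothetical, and lists of integers stand in for groups of statements.

```python
def split_by_size_cutoff(groups, size_cutoff=100):
    """Partition groups into (for_workers, for_parent) by their size."""
    for_workers = [g for g in groups if len(g) >= size_cutoff]
    for_parent = [g for g in groups if len(g) < size_cutoff]
    return for_workers, for_parent


# Integers stand in for statements; only the group sizes matter here.
groups = [list(range(3)), list(range(100)), list(range(250))]
for_workers, for_parent = split_by_size_cutoff(groups)
print([len(g) for g in for_workers], [len(g) for g in for_parent])
```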
-
indra.tools.assemble_corpus.
run_preassembly_duplicate
(preassembler, beliefengine, **kwargs)[source]¶ Run deduplication stage of preassembly on a list of statements.
Parameters: - preassembler (indra.preassembler.Preassembler) – A Preassembler instance
- beliefengine (indra.belief.BeliefEngine) – A BeliefEngine instance
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of unique statements.
Return type:
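-
indra.tools.assemble_corpus.
run_preassembly_related
(preassembler, beliefengine, **kwargs)[source]¶ Run related stage of preassembly on a list of statements.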
Parameters: - preassembler (indra.preassembler.Preassembler) – A Preassembler instance which already has a set of unique statements internally.
- beliefengine (indra.belief.BeliefEngine) – A BeliefEngine instance
- return_toplevel (Optional[bool]) – If True, only the top-level statements are returned. If False, all statements are returned irrespective of level of specificity. Default: True
- poolsize (Optional[int]) – The number of worker processes to use to parallelize the comparisons performed by the function. If None (default), no parallelization is performed. NOTE: Parallelization is only available on Python 3.4 and above.
- size_cutoff (Optional[int]) – Groups with size_cutoff or more statements are sent to worker processes, while smaller groups are compared in the parent process. Default value is 100. Not relevant when parallelization is not used.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of preassembled top-level statements.
Return type:
-
indra.tools.assemble_corpus.
strip_agent_context
(stmts_in, **kwargs)[source]¶ Strip any context on agents within each statement.
Parameters: - stmts_in (list[indra.statements.Statement]) – A list of statements whose agent context should be stripped.
- save (Optional[str]) – The name of a pickle file to save the results (stmts_out) into.
Returns: stmts_out – A list of stripped statements.
Return type:
Build a network from a gene list (indra.tools.gene_network
)¶
-
class
indra.tools.gene_network.
GeneNetwork
(gene_list, basename=None)[source]¶ Build a set of INDRA statements for a given gene list from databases.
Parameters: - gene_list (string) – List of gene names.
- basename (string or None (default)) – Filename prefix to be used for caching of intermediates (Biopax OWL file, pickled statement lists, etc.). If None, no results are cached and no cached files are used.
-
gene_list
¶ string – List of gene names
-
basename
¶ string or None – Filename prefix for cached intermediates, or None if no caching is used.
-
results
¶ dict – Dict containing results of preassembly (see return type for
run_preassembly()
).
-
get_bel_stmts
(filter=False)[source]¶ Get relevant statements from the BEL large corpus.
Performs a series of neighborhood queries and then takes the union of all the statements. Because the query process can take a long time for large gene lists, the resulting list of statements is cached in a pickle file with the filename <basename>_bel_stmts.pkl. If the pickle file is present, it is used by default; if not present, the queries are performed and the results are cached.
Parameters: filter (bool) – If True, includes only those statements that exclusively mention genes in gene_list
. Default is False. Note that the full (unfiltered) set of statements is cached.
Returns: List of INDRA statements extracted from the BEL large corpus.
Return type: list of indra.statements.Statement
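The caching behavior described above (use the pickle file if present, otherwise query and cache) follows a common pattern that can be sketched with the standard library. `cached_query` and `fake_query` are hypothetical names, and strings stand in for statements; only the `<basename>_bel_stmts.pkl` naming comes from the text above.

```python
import os
import pickle
import tempfile


def cached_query(basename, suffix, run_query):
    """Return cached results if the pickle exists; else query and cache.

    If basename is None, nothing is read from or written to disk.
    """
    if basename is None:
        return run_query()
    fname = '%s_%s.pkl' % (basename, suffix)
    if os.path.isfile(fname):
        with open(fname, 'rb') as fh:
            return pickle.load(fh)
    results = run_query()
    with open(fname, 'wb') as fh:
        pickle.dump(results, fh)
    return results


calls = []


def fake_query():
    """Stand-in for a slow remote query; records each invocation."""
    calls.append(1)
    return ['stmt_a', 'stmt_b']


with tempfile.TemporaryDirectory() as tmpdir:
    base = os.path.join(tmpdir, 'mygenes')
    first = cached_query(base, 'bel_stmts', fake_query)
    second = cached_query(base, 'bel_stmts', fake_query)  # served from cache
    cache_exists = os.path.isfile(base + '_bel_stmts.pkl')

print(first == second, len(calls), cache_exists)
```

The second call never reaches `fake_query`, which is the point of caching the unfiltered results on disk.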
-
get_biopax_stmts
(filter=False, query='pathsbetween')[source]¶ Get relevant statements from Pathway Commons.
Performs a “paths between” query for the genes in
gene_list
and uses the results to build statements. This function caches two files: the list of statements built from the query, which is cached in <basename>_biopax_stmts.pkl, and the OWL file returned by the Pathway Commons Web API, which is cached in <basename>_pc_pathsbetween.owl. If these cached files are found, then the results are returned based on the cached file and Pathway Commons is not queried again.
Parameters: - filter (bool) – If True, includes only those statements that exclusively mention
genes in
gene_list
. Default is False.
- query (str) – Defines what type of query is executed. The two options are ‘pathsbetween’, which finds paths between the given list of genes and only works if more than one gene is given, and ‘neighborhood’, which searches the immediate neighborhood of each given gene.
Returns: List of INDRA statements extracted from Pathway Commons.
Return type: list of
indra.statements.Statement
-
get_statements
(filter=False)[source]¶ Return the combined list of statements from BEL and Pathway Commons.
Internally calls
get_biopax_stmts()
and get_bel_stmts()
.
Parameters: filter (bool) – If True, includes only those statements that exclusively mention genes in gene_list
. Default is False.
Returns: List of INDRA statements extracted from the BEL large corpus and Pathway Commons.
Return type: list of indra.statements.Statement
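The meaning of the filter option ("exclusively mention genes in gene_list") can be illustrated with a short sketch. Statements are reduced here to tuples of agent names, which is a simplification for illustration; `mentions_only` is a hypothetical helper, and cases such as statements with missing agents are not covered.

```python
def mentions_only(agent_names, gene_list):
    """True if every agent named in the statement is in gene_list."""
    genes = set(gene_list)
    return all(name in genes for name in agent_names)


# Each tuple stands in for the agent names of one statement.
stmts = [('MAP2K1', 'MAPK1'), ('DUSP6', 'MAPK1'), ('MAPK1', 'ELK1')]
gene_list = ['MAP2K1', 'MAPK1', 'DUSP6']
filtered = [s for s in stmts if mentions_only(s, gene_list)]
print(filtered)
```

The third statement is dropped because ELK1 is not in the gene list, even though MAPK1 is.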
-
run_preassembly
(stmts, print_summary=True)[source]¶ Run complete preassembly procedure on the given statements.
Results are returned as a dict and stored in the attribute
results
. They are also saved in the pickle file <basename>_results.pkl.
Parameters: - stmts (list of
indra.statements.Statement
) – Statements to preassemble.
- print_summary (bool) – If True (default), prints a summary of the preassembly process to the console.
Returns: A dict containing the following entries:
- raw: the starting set of statements before preassembly.
- duplicates1: statements after initial de-duplication.
- valid: statements found to have valid modification sites.
- mapped: mapped statements (list of
indra.preassembler.sitemapper.MappedStatement
).
- mapped_stmts: combined list of valid statements and statements after mapping.
- duplicates2: statements resulting from de-duplication of the statements in mapped_stmts.
- related2: top-level statements after combining the statements in duplicates2.
Return type: dict
Build an executable model from a fragment of a large network (indra.tools.executable_subnetwork
)¶
Build a model incrementally over time (indra.tools.incremental_model
)¶
-
class
indra.tools.incremental_model.
IncrementalModel
(model_fname=None)[source]¶ Assemble a model incrementally by iteratively adding new Statements.
Parameters: model_fname (Optional[str]) – The name of the pickle file in which a set of INDRA Statements is stored in a dict keyed by PubMed IDs. This is the state of an IncrementalModel that is loaded upon instantiation.
-
stmts
¶ dict[str, list[indra.statements.Statement]] – A dictionary of INDRA Statements keyed by PMIDs that stores the current state of the IncrementalModel.
-
assembled_stmts
¶ list[indra.statements.Statement] – A list of INDRA Statements after assembly.
-
add_statements
(pmid, stmts)[source]¶ Add INDRA Statements to the incremental model indexed by PMID.
Parameters: - pmid (str) – The PMID of the paper from which statements were extracted.
- stmts (list[indra.statements.Statement]) – A list of INDRA Statements to be added to the model.
-
get_model_agents
()[source]¶ Return a list of all Agents from all Statements.
Returns: agents – A list of Agents that are in the model.
Return type: list[indra.statements.Agent]
-
get_statements
()[source]¶ Return a list of all Statements in a single list.
Returns: stmts – A list of all the INDRA Statements in the model.
Return type: list[indra.statements.Statement]
-
get_statements_noprior
()[source]¶ Return a list of all non-prior Statements in a single list.
Returns: stmts – A list of all the INDRA Statements in the model (excluding the prior).
Return type: list[indra.statements.Statement]
-
get_statements_prior
()[source]¶ Return a list of all prior Statements in a single list.
Returns: stmts – A list of all the INDRA Statements in the prior.
Return type: list[indra.statements.Statement]
-
load_prior
(prior_fname)[source]¶ Load a set of prior statements from a pickle file.
The prior statements have a special key in the stmts dictionary called “prior”.
Parameters: prior_fname (str) – The name of the pickle file containing the prior Statements.
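The layout of the stmts dictionary, with PMID keys plus the special “prior” key, can be pictured with a plain-dictionary sketch. The three helper functions below are hypothetical stand-ins that mirror the documented getters; they are not the IncrementalModel methods themselves, and strings stand in for Statements.

```python
# Strings stand in for Statements; the PMIDs are made up.
stmts = {
    'prior': ['p1', 'p2'],
    '12345': ['s1'],
    '67890': ['s2', 's3'],
}


def all_statements(stmts):
    """All statements from every key, including the prior."""
    return [st for group in stmts.values() for st in group]


def prior_statements(stmts):
    """Statements stored under the special 'prior' key."""
    return list(stmts.get('prior', []))


def noprior_statements(stmts):
    """Statements from every key except 'prior'."""
    return [st for key, group in stmts.items()
            if key != 'prior' for st in group]


print(sorted(all_statements(stmts)))
print(prior_statements(stmts))
print(sorted(noprior_statements(stmts)))
```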
-
preassemble
(filters=None)[source]¶ Preassemble the Statements collected in the model.
Use INDRA’s GroundingMapper, Preassembler and BeliefEngine on the IncrementalModel and save the unique statements and the top level statements in class attributes.
Currently the following filter options are implemented:
- grounding: require that all Agents in statements are grounded
- human_only: require that all proteins are human proteins
- prior_one: require that at least one Agent is in the prior model
- prior_all: require that all Agents are in the prior model
Parameters: filters (Optional[list[str]]) – A list of filter options to apply when choosing the statements. See description above for more details. Default: None
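The difference between the prior_one and prior_all filters can be sketched with agent names alone. This is an illustration of the two rules as described above, not the IncrementalModel code: `passes_prior_filter` is a hypothetical helper, and matching agents by name is an assumption made for the sketch.

```python
def passes_prior_filter(agent_names, prior_agents, mode):
    """Apply the prior_one or prior_all rule to one statement's agents."""
    if mode == 'prior_one':
        return any(name in prior_agents for name in agent_names)
    if mode == 'prior_all':
        return all(name in prior_agents for name in agent_names)
    raise ValueError('Unknown filter: %s' % mode)


# Agent names assumed to be present in the prior model.
prior_agents = {'MAP2K1', 'MAPK1'}
stmt_agents = ('MAPK1', 'ELK1')  # one agent in the prior, one not
print(passes_prior_filter(stmt_agents, prior_agents, 'prior_one'))
print(passes_prior_filter(stmt_agents, prior_agents, 'prior_all'))
```

A statement that mixes a prior agent with a new one passes prior_one but fails prior_all, so prior_all is the stricter choice.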
-
High-throughput reading tools (indra.tools.reading
)¶
Scoring INDRA Statements manually (indra.tools.stmt_scoring
)¶
Generate English language questions on linked mechanisms (indra.tools.mechlinker_queries
)¶
Tutorials¶
Using natural language to build models¶
In this tutorial we build a simple model using natural language, then contextualize and parameterize it, and export it into different formats.
Read INDRA Statements from a natural language string¶
First we import INDRA’s API to the TRIPS reading system. We then define a block of text, stored in the model_text variable, which serves as the description of the mechanism to be modeled. Finally, indra.sources.trips.process_text is called, which sends a request to the TRIPS web service, gets a response, and processes the extraction knowledge base to obtain a list of INDRA Statements.
In [1]: from indra.sources import trips
In [2]: model_text = 'MAP2K1 phosphorylates MAPK1 and DUSP6 dephosphorylates MAPK1.'
In [3]: tp = trips.process_text(model_text)
At this point tp.statements should contain 2 INDRA Statements: a Phosphorylation Statement and a Dephosphorylation Statement. Note that the evidence sentence for each Statement is propagated:
In [4]: for st in tp.statements:
...: print('%s with evidence "%s"' % (st, st.evidence[0].text))
...:
Phosphorylation(MAP2K1(), MAPK1()) with evidence "MAP2K1 phosphorylates MAPK1 and DUSP6 dephosphorylates MAPK1."
Dephosphorylation(DUSP6(), MAPK1()) with evidence "MAP2K1 phosphorylates MAPK1 and DUSP6 dephosphorylates MAPK1."
Assemble the INDRA Statements into a rule-based executable model¶
We next use INDRA’s PySB Assembler to automatically assemble a rule-based model representing the biochemical mechanisms described in model_text. First a PysbAssembler object is instantiated, then the list of INDRA Statements is added to the assembler. Finally, the assembler’s make_model method is called, which assembles the model and returns it, while also storing it in pa.model. Notice that we are using policies='two_step' as an argument of make_model. This directs the assembler to use rules in which enzymatic catalysis is modeled as a two-step process: enzyme and substrate first bind reversibly, and the enzyme-substrate complex then produces and releases a product irreversibly.
In [5]: from indra.assemblers.pysb_assembler import PysbAssembler
In [6]: pa = PysbAssembler()