The ExTRI Resource
Utility
What can be done with ExTRI?
The large set of 52,862 high-confidence TRI abstract sentences provided by ExTRI offers a wealth of direct pointers to a potentially wide range of functional aspects of the 18,437 TF-TG interactions stated in them, including:
- regulatory sign (activation/repression)
- biological context (e.g., cell- or tissue type and state)
- experimental evidence
- confidence *
This information can be used directly in curation and, for many knowledge management purposes, may be sufficiently evidenced by the sentence itself, saving the user the time-consuming work of reading a larger part of the text.
The ExTRI resource can be accessed via BioGateway (either directly through SPARQL or via the Cytoscape BioGateway App) and through PSICQUIC searches. The BioGateway App gives users easy access to the full abstract from which a sentence was extracted, either through a ‘landing page’ showing the TRI as a highlighted sentence or, by following links on the landing page, as a premarked sentence in the Europe PMC SciLite App (see example).
* The presence of several abstracts mentioning the TRI, and especially the occurrence of the same TRI several times in the same abstract, were good indicators of high confidence, whereas a large number of potential TRIs coming from the same sentence was an indicator of low confidence. See also Supplementary document 5 of the ExTRI publication.
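The three indicators named in the footnote can be computed from a flat list of TRI-sentences. The sketch below is only an illustration under stated assumptions: the record layout `(abstract_id, sentence_id, tf, tg)`, the function name `confidence_indicators` and the toy identifiers are hypothetical, and no thresholds or scoring formula from the ExTRI publication are reproduced here.

```python
from collections import defaultdict

# Toy records, invented for illustration: one row per TRI-sentence,
# laid out as (abstract_id, sentence_id, tf, tg). This layout is an assumption.
records = [
    ("A1", "A1.s1", "TF_A", "TG_1"),
    ("A1", "A1.s2", "TF_A", "TG_1"),
    ("A2", "A2.s1", "TF_A", "TG_1"),
    ("A2", "A2.s1", "TF_A", "TG_2"),
    ("A2", "A2.s1", "TF_B", "TG_2"),
]

def confidence_indicators(rows):
    """Return, per TRI, the three raw indicators described in the footnote."""
    abstracts = defaultdict(set)      # TRI -> abstracts mentioning it
    per_abstract = defaultdict(int)   # (TRI, abstract) -> number of mentions
    per_sentence = defaultdict(set)   # sentence -> candidate TRIs found in it
    sentences_of = defaultdict(set)   # TRI -> sentences mentioning it
    for abstract_id, sentence_id, tf, tg in rows:
        tri = (tf, tg)
        abstracts[tri].add(abstract_id)
        per_abstract[(tri, abstract_id)] += 1
        per_sentence[sentence_id].add(tri)
        sentences_of[tri].add(sentence_id)

    indicators = {}
    for tri, abstract_ids in abstracts.items():
        indicators[tri] = {
            # More abstracts -> indicator of higher confidence.
            "n_abstracts": len(abstract_ids),
            # Repeated mentions within one abstract -> higher confidence.
            "max_mentions_in_one_abstract": max(
                per_abstract[(tri, a)] for a in abstract_ids
            ),
            # Many candidate TRIs sharing a sentence -> lower confidence.
            "max_tris_in_shared_sentence": max(
                len(per_sentence[s]) for s in sentences_of[tri]
            ),
        }
    return indicators

indicators = confidence_indicators(records)
```

How these raw counts are combined into a final confidence call is not specified here; Supplementary document 5 remains the reference for that.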
Generalities
What type of information does ExTRI contain?
Can be used as a resource to guide curation, to examine the provenance of each reported TRI, or to learn some functional details about a particular TRI.
Takes all ExTRI sentences and cross-references the TRI with entries identified with other databases, allowing the user to see additional evidence for each pair found in each sentence.
Presents TRI coverage across all databases, including the additional information provided by each database.
Content of ExTRI corpus
| ExTRI | All | High Conf. |
|---|---|---|
| TRIs | 40,453 | 18,437 |
| TFs | 991 | 865 |
| TGs | 5,592 | 3,848 |
| TRI-Sentences | 94,185 | 52,862 |
| Unique sentences | 58,710 | 36,276 |
| Abstracts | 33,776 | 21,772 |
TRIs – Transcription regulation interactions.
TFs – Specific DNA binding transcription factors.
TGs – Target genes.
TRI-sentences – Abstract sentences identified to contain TRIs.
Abstracts – Abstracts found to contain sentence(s) with a TRI.
Since some sentences might support several TRIs, the number of unique sentences is lower.
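The relationship between the rows of the table can be made concrete with a small counting sketch. It assumes, as an interpretation of the definitions above, that a TRI-sentence is one (sentence, TF, TG) combination; the row layout, the function name `corpus_counts` and the toy identifiers are hypothetical. The last two lines use only the figures reported in the table.

```python
# Toy TRI-sentence rows, invented for illustration:
# (abstract_id, sentence_id, tf, tg)
rows = [
    ("A1", "A1.s1", "TF_A", "TG_1"),
    ("A1", "A1.s1", "TF_A", "TG_2"),  # same sentence supports a second TRI
    ("A1", "A1.s2", "TF_B", "TG_2"),
    ("A2", "A2.s1", "TF_A", "TG_1"),
]

def corpus_counts(rows):
    """Count the quantities listed in the corpus content table."""
    return {
        "TRIs": len({(tf, tg) for _, _, tf, tg in rows}),
        "TFs": len({tf for _, _, tf, _ in rows}),
        "TGs": len({tg for _, _, _, tg in rows}),
        # One entry per (sentence, TRI) combination (assumed definition).
        "TRI-sentences": len({(s, tf, tg) for _, s, tf, tg in rows}),
        # A sentence supporting several TRIs is counted once here.
        "Unique sentences": len({s for _, s, _, _ in rows}),
        "Abstracts": len({a for a, _, _, _ in rows}),
    }

counts = corpus_counts(rows)

# Average number of TRIs per unique sentence, from the reported totals.
tris_per_sentence_all = 94185 / 58710
tris_per_sentence_high_conf = 52862 / 36276
```

On the toy rows, four TRI-sentences collapse to three unique sentences because sentence `A1.s1` supports two TRIs.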
Community curation
We are working to let users take part in community curation and check the validity of a TRI through SciLite or other curation apps. When building a regulatory network, a Cytoscape user could then actively contribute to the validation of ExTRI and possibly of other information obtained through text mining (see also https://vsm.github.io/).
TRI integrated resource
To increase coverage, the ExTRI corpus has been integrated with TF-TG relationships obtained from GOA, IntAct, TRRUST, CytReg, GEREDB, SIGNOR, HTRIdb and TFactS (collection date: December 2020).
The compiled resource is available as:
- Supplementary_Table_2.TF-TG_pairs.xlsx, on the download page
- The tfact2gene graph in BioGateway, see https://www.biogateway.eu/tfact2gene/
The integrated resource has ~50,000 TRIs, of which >31,000 are high-confidence TRIs.
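Conceptually, integrating TF-TG pairs from several sources amounts to taking their union while recording which sources report each pair. The following is a minimal sketch of that idea, not the pipeline used to build the compiled resource: the pairs are invented, the function name `integrate` is hypothetical, and the real resource may normalise gene identifiers or species before merging.

```python
from collections import defaultdict

# Toy inputs: invented TF-TG pairs, keyed by source name.
sources = {
    "ExTRI": {("TF_A", "TG_1"), ("TF_A", "TG_2")},
    "TRRUST": {("TF_A", "TG_1"), ("TF_B", "TG_3")},
    "SIGNOR": {("TF_A", "TG_1")},
}

def integrate(sources):
    """Merge all pairs, keeping the set of sources that report each one."""
    merged = defaultdict(set)
    for name, pairs in sources.items():
        for pair in pairs:
            merged[pair].add(name)
    return dict(merged)

merged = integrate(sources)

# Pairs reported by ExTRI alone, i.e. coverage added by text mining.
only_extri = sorted(pair for pair, names in merged.items() if names == {"ExTRI"})
```

Keeping the provenance set per pair makes it straightforward to ask which interactions are supported by several sources and which come from a single one.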
Databases used for the compiled resource of TF-TG interactions
| Database | Content extracted for compilation | Reference |
|---|---|---|
| | all (human, mouse, rat) | |
| | all (human) | |
| | subset: protein-gene interactions (human, mouse, rat) | |
| | subset: protein-gene regulatory interactions (human, mouse, rat) | |
| | all (human, mouse) | |
| | subset: interactions labelled with interaction mechanism ‘transcriptional regulation’ (human, mouse, rat) | |
| | all (human, mouse) | |
| | subset: interactions with regulator TFClass TF (human) | |
The table indicates whether all interactions or subsets of them were included.