The ExTRI Resource

Utility

What can be done with ExTRI?

The large set of 52,862 high-confidence TRI abstract sentences provided by ExTRI offers a wealth of direct pointers to a potentially wide range of functional aspects of the 18,437 TF-TG interactions stated therein, including:

  • regulatory sign (activation/repression)
  • biological context (e.g., cell- or tissue type and state)
  • experimental evidence
  • confidence *

This information can be used directly in curation processes and, for many knowledge management purposes, the sentence alone may provide sufficient evidence, saving the user the time-consuming work of reading a larger part of the text.

The ExTRI resource can be accessed via BioGateway (either directly through SPARQL or via the Cytoscape BioGateway App) and through PSICQUIC searches. Access through the BioGateway App gives users easy access to the full abstract from which the sentence was extracted, either through a ‘landing page’ showing the TRI as a highlighted sentence or, by following links on the landing page, as a pre-marked sentence in the Europe PMC SciLite App (see example).

* The presence of several abstracts mentioning the TRI, and especially the occurrence of the same TRI several times in the same abstract, were good indicators of high confidence, while a large number of potential TRIs coming from the same sentence was an indicator of low confidence. See also Supplementary document 5 of the ExTRI publication.
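The three indicators named in the footnote can be tallied from a table of sentence-level hits. The sketch below is only an illustration: the record layout `(pmid, sentence_no, tf, tg)`, the function name, and the placeholder gene labels are assumptions, and it does not reproduce the confidence scoring used for ExTRI (see the Supplementary document for that).

```python
from collections import defaultdict

def support_indicators(records):
    """Tally per-TRI indicators from (pmid, sentence_no, tf, tg) records.

    The record layout is hypothetical; it is not the ExTRI file format.
    """
    sentences_by_tri = defaultdict(set)   # (tf, tg) -> {(pmid, sentence_no)}
    tris_by_sentence = defaultdict(set)   # (pmid, sentence_no) -> {(tf, tg)}
    for pmid, sentence_no, tf, tg in records:
        sentences_by_tri[(tf, tg)].add((pmid, sentence_no))
        tris_by_sentence[(pmid, sentence_no)].add((tf, tg))

    indicators = {}
    for tri, sentences in sentences_by_tri.items():
        # Count supporting sentences per abstract for this TRI.
        per_abstract = defaultdict(int)
        for pmid, _sentence_no in sentences:
            per_abstract[pmid] += 1
        indicators[tri] = {
            # More abstracts: indicator of higher confidence.
            "abstracts": len(per_abstract),
            # Repeated mentions within one abstract: higher confidence.
            "max_sentences_in_one_abstract": max(per_abstract.values()),
            # Many candidate TRIs in one sentence: lower confidence.
            "max_tris_sharing_a_sentence": max(
                len(tris_by_sentence[s]) for s in sentences
            ),
        }
    return indicators

# Made-up example data with placeholder names.
example_records = [
    ("111", 0, "TF_A", "TG_1"),
    ("111", 2, "TF_A", "TG_1"),
    ("222", 1, "TF_A", "TG_1"),
    ("333", 0, "TF_B", "TG_2"),
    ("333", 0, "TF_B", "TG_3"),
    ("333", 0, "TF_B", "TG_4"),
]
example_indicators = support_indicators(example_records)
```

In this made-up data, the TF_A/TG_1 pair is supported by two abstracts and twice within one of them, whereas each TF_B pair comes from a single sentence that yields three candidate TRIs.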

Generalities

What type of information does ExTRI contain?

Can be used as a resource to guide curation, to examine the provenance of each reported TRI, or to learn some functional details about a particular TRI.

Takes all ExTRI sentences and cross-references the TRI with entries identified with other databases, allowing the user to see additional evidence for each pair found in each sentence.

Presents TRI coverage across all databases, including the additional information provided by each database.

Content of ExTRI corpus

ExTRI             All     High Conf.
TRIs              40,453  18,437
TFs               991     865
TGs               5,592   3,848
TRI-Sentences     94,185  52,862
Unique sentences  58,710  36,276
Abstracts         33,776  21,772

TRIs – Transcription regulation interactions.

TFs – Specific DNA binding transcription factors.

TGs – Target genes.

TRI-sentences – Abstract sentences identified to contain TRIs.

Abstracts – Abstracts found to contain one or more TRI-sentences.

Since some sentences might support several TRIs, the number of unique sentences is lower.
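The difference between the two sentence counts can be illustrated with a few lines of code. This is a minimal sketch that assumes a TRI-sentence is counted once per (sentence, TRI) pair; the tuple layout and the placeholder identifiers are hypothetical.

```python
def sentence_counts(pairs):
    """Return (TRI-sentence count, unique sentence count).

    pairs: iterable of (sentence_id, tf, tg) tuples (hypothetical layout).
    """
    distinct_pairs = set(pairs)
    # One TRI-sentence per distinct (sentence, TF, TG) combination.
    tri_sentences = len(distinct_pairs)
    # A sentence supporting several TRIs is counted only once here.
    unique_sentences = len({sentence_id for sentence_id, _tf, _tg in distinct_pairs})
    return tri_sentences, unique_sentences

# Made-up data: sentence "s1" supports two TRIs, "s2" supports one.
example_pairs = [
    ("s1", "TF_A", "TG_1"),
    ("s1", "TF_A", "TG_2"),
    ("s2", "TF_B", "TG_1"),
]
example_counts = sentence_counts(example_pairs)
```

Here three TRI-sentences come from only two unique sentences.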

Community curation

We are working to enable users to take part in community curation and check the validity of a TRI through SciLite or other curation Apps. When building a regulatory network, a Cytoscape user may then actively contribute to the validation of ExTRI and possibly of other information obtained through text mining (see also https://vsm.github.io/).

TRI integrated resource

To increase coverage, the ExTRI corpus has been integrated with TF-TG relationships obtained from GOA, IntAct, TRRUST, CytReg, GEREDB, SIGNOR, HTRIdb and TFactS (collection date: December 2020).

The compiled resource is available as:

The integrated resource has ~50,000 TRIs, with >31,000 high-confidence TRIs.
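Conceptually, such an integration is a union of TF-TG pairs that keeps track of which sources report each pair. The following is a minimal sketch under stated assumptions: the function name, the input layout, and the placeholder gene labels are hypothetical, and it presumes that all sources already use the same gene identifiers. It is not the pipeline used to build the integrated resource.

```python
from collections import defaultdict

def merge_sources(sources):
    """Merge TF-TG pairs from several sources, recording provenance.

    sources: dict mapping a source name to an iterable of (tf, tg) pairs.
    Assumes identifiers are already harmonised across sources.
    """
    provenance = defaultdict(set)   # (tf, tg) -> {source names}
    for name, pairs in sources.items():
        for tf, tg in pairs:
            provenance[(tf, tg)].add(name)
    return dict(provenance)

# Made-up pairs with placeholder names; source names are from the list above.
example_sources = {
    "ExTRI": [("TF_A", "TG_1"), ("TF_B", "TG_2")],
    "TRRUST": [("TF_A", "TG_1")],
    "SIGNOR": [("TF_C", "TG_3")],
}
example_merged = merge_sources(example_sources)

# Pairs reported by more than one source.
example_shared = {tri for tri, names in example_merged.items() if len(names) > 1}
```

Keeping the set of source names per pair makes it possible to report coverage across databases and to see which pairs are supported by more than one of them.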

Databases used for the compiled resource of TF-TG interactions

Database  Content extracted for compilation                                                                          Reference
TFactS    all (human, mouse, rat)                                                                                    Essaghir, 2010
HTRIdb    all (human)                                                                                                Bovolenta, 2012
IntAct    subset: protein-gene interactions (human, mouse, rat)                                                      Kerrien, 2012
GOA       subset: protein-gene regulatory interactions (human, mouse, rat)                                           Huntley, 2015
TRRUST    all (human, mouse)                                                                                         Han, 2015; Han, 2018
SIGNOR    subset: interactions labelled with interaction mechanism ‘transcriptional regulation’ (human, mouse, rat)  Perfetto, 2016
CytReg    all (human, mouse)                                                                                         Carrasco Pro, 2019
GEREDB    subset: interactions with regulator TFClass TF (human)                                                     Huang, 2019

The table indicates whether all interactions or subsets of them were included.