MINT, a Molecular INTeraction database - (session: Database: Ontology and Integration)
Luisa Montecchi-Palazzi, Andrea Cabibbo, Andreas Zanzoni,
Manuela Helmer-Citterich, Gianni Cesareni
Università Tor Vergata
MINT is a public protein interaction database focused on the collection of
experimentally verified data disseminated in the scientific literature. MINT
entries are extracted by expert curators assisted by "MINT Assistant",
software that targets abstracts containing interaction information and
presents them to the curator in a user-friendly format. Furthermore,
MINT aims to be exhaustive in its description of each interaction:
whenever available, information about kinetic and binding constants
and about the domains participating in the interaction is included
in the entry. All information is collected in a computer-readable form
and stored in a web-accessible database, from which interaction data can
be easily extracted and viewed graphically through "MINT Viewer". At present
MINT contains 2098 manually curated interactions, 1516 of which are
interactions among mammalian proteins. MINT is accessible at http://cbm.bio.uniroma2.it/mint/.
To facilitate the inclusion of unpublished interaction data in MINT,
we are starting a new online peer-reviewed journal specifically aimed
at the publication of rigorously documented molecular interactions that
are not suitable for standalone publication in other journals. The "MINT journal
for Molecular Interactions" will also publish focused reviews on selected
molecular interaction networks or pathways. A preliminary version of
the MINT journal is available online.
Integration of data from different sources: a prototype devoted to p53 mutations - (session: Database: Ontology and Integration)
A. Mucci°, A. Cusmano°, M. De Francisci^,
M. A. Manniello^, D. Marra^, P. Romano^, G. Mauri°
°Università di Milano Bicocca, Dipartimento di Informatica,
Sistemistica e Comunicazione - DISCO, Milano
^Istituto Nazionale per la Ricerca sul Cancro - IST, Genova
Oncology Over Internet is a project devoted to the integration of
data from different sources of interest to oncology. The main focus of the
project is on the software architecture, but important improvements for
in silico biology research and for some clinical investigations are foreseen.
A prototype has been developed in order to test different technical
solutions to access and integration issues and to verify the overall feasibility
of the system. The prototype is focussed on the database of mutations
of the TP53 gene that is maintained by the International Agency for Research
on Cancer (IARC). The TP53 gene expresses the p53 protein, which has
important and well-known roles in the control of cancer at
its very early stages and in the elimination of mutated cells. The
database of TP53 mutations has implicit and explicit links
with several molecular and cellular biology databases, such as sequence databases,
literature databanks and human cell line catalogues.
The prototype includes the user interface, the Java-based search
engine, which is in charge of carrying out the queries and gathering
the information, a knowledge base and a database where the data retrieved
from the various information sources are stored.
The user can request the execution of a query by submitting the
proper parameters through an online form to the main server. After checking
the contents of the knowledge base, the search engine queries every
database involved in the query; the results are finally returned
to the user.
The main application is structured in several blocks, each of
which has a specific function:
· Querying of the knowledge base
· Preparation of the results table
· Selection of the involved databases
· Preparation of the query
· Analysis of the sites hosting the involved databases
· Querying of the involved databases
· Gathering and restructuring of the results
· Display of the results
The prototype is based on the knowledge base, whose goals are:
1) to select the information sources, 2) to carry out queries, 3) to extract
the required information and 4) to integrate data.
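The flow described above (select the sources, query them, extract the required fields, integrate the data) can be illustrated with a minimal Python sketch. It is not the prototype's Java code: the Source class, the integrate function, the field names and the toy records are all hypothetical, and the "databases" are in-memory lists standing in for remote sources.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass(frozen=True)
class Source:
    """Hypothetical knowledge-base entry describing one information source."""
    name: str
    key_field: str                         # field used to look records up
    lookup: Callable[[str], List[Record]]  # stands in for a remote query


def integrate(primary: Source, linked: List[Source], value: str) -> List[Record]:
    """Query the primary source, then enrich each hit from the linked sources."""
    results = []
    for hit in primary.lookup(value):
        row = dict(hit)
        for source in linked:
            key = row.get(source.key_field)
            if key is None:
                continue  # this hit carries no link to that source
            for extra in source.lookup(key):
                for field, val in extra.items():
                    row.setdefault(field, val)  # never overwrite primary data
        results.append(row)
    return results


# Toy data, invented for illustration only.
_MUTATIONS = [
    {"mutation_id": "M1", "codon": "175", "cell_line": "CL-1"},
    {"mutation_id": "M2", "codon": "248"},
]
_CELL_LINES = {"CL-1": {"cell_line": "CL-1", "tissue": "toy-tissue"}}

mutations = Source(
    "mutations", "codon",
    lambda v: [m for m in _MUTATIONS if m["codon"] == v],
)
cell_lines = Source(
    "cell_lines", "cell_line",
    lambda v: [_CELL_LINES[v]] if v in _CELL_LINES else [],
)

rows = integrate(mutations, [cell_lines], "175")
print(rows)
```

In this sketch the list of linked sources plays the role of the knowledge base: it records which field links a mutation record to each external source.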
The prototype has been developed using open source software:
the MySQL database management system, the Apache Tomcat web server
and the Java programming language. At present, the prototype is in beta
testing.
REELIN IS A HEPARIN BINDING PROTEIN: IN VITRO TESTING AND IN SILICO ANALYSIS - (session: Structural Genomics)
Roger Panteri1*, Alessandro Paiardini2*, Ramona Marino1, Stefano
Pascarella2,3,4, Gabriella D’Arcangelo5 and Flavio Keller1
1Laboratory of Developmental Neuroscience, Università
“Campus Bio-Medico”, Rome, Italy; 2Dipartimento di Scienze Biochimiche
“A. Rossi Fanelli” and Centro di Biologia Molecolare del Consiglio Nazionale
delle Ricerche, 3Centro Interdipartimentale di Ricerca per la Analisi
dei Modelli e dell’Informazione nei Sistemi Biomedici (CISB), 4Centro
di Eccellenza di Biologia e Medicina Molecolare (BEMM), Università
degli Studi di Roma “La Sapienza”, Rome, Italy; and
5The Cain Foundation Laboratories, Department of Pediatrics, Division of
Neuroscience, Baylor College of Medicine, Houston, Texas 77030, USA
Reelin is a large molecule of the extracellular matrix (ECM)
which regulates neuronal positioning during the early stages of cortical
development in vertebrate species (1,2,3). The localization of Reelin in
the ECM, its modular assembly and its role in the regulation of neuronal
migration led us to hypothesise that its modules bind to polysaccharides
commonly found on proteoglycans of the ECM, as observed for
the repeat modules of Laminins and Thrombospondins. We investigated whether
Reelin could interact with the polysaccharide heparin using an affinity
chromatography approach followed by immunoblot analysis. The results obtained
indicate an important, specific interaction between Reelin and the heteropolysaccharide
heparin; moreover, the data support the involvement of the Reelin subrepeats
in the binding. Further bioinformatic analysis and three-dimensional modeling
of the Reelin subrepeat regions confirm the presence of structural features
common to polysaccharide binding modules, such as an ASP-BNR hairpin loop,
large aromatic residues and a series of basic arginine residues, located
on the surface cleft of the 3D model of a Reelin subrepeat and potentially
involved in the binding to polysaccharides. These findings provide new
insights into the structural organisation of Reelin and novel hypotheses
concerning the molecular function of this large ECM molecule, which could
be tested experimentally. Finally, this work points to new directions
in the search for therapeutic compounds that can modulate the activity
of Reelin, given the importance of this protein in several human neurodevelopmental
disorders.
1) D'Arcangelo, G., Miao, G.G., Chen, S.C., Soares, H.D.,
Morgan, J.I., and Curran, T. 1995. A protein related to extracellular
matrix proteins deleted in the mouse mutant reeler. Nature 374: 719-723.
2) Quattrocchi, C.C., Wannenes, F., Persico, A.M., Ciafrè,
S.A., D'Arcangelo, G., Farace, M.G., and Keller, F. 2002. Reelin is a
serine protease of the extracellular matrix. J. Biol. Chem. 277: 303-309.
3) Rice, D.S., and Curran, T. 2001. Role of the Reelin signalling
pathway in central nervous system development. Annu. Rev. Neurosci. 24:
1005-1039.
Structural model for Gas1p family members by combined threading and secondary structure prediction methods - (session: Other)
Elena Papaleo, Gianluca Santarossa, Marina Vai, Piercarlo
Fantucci, Luca De Gioia
Università di Milano Bicocca - Dipartimento
di Biotecnologie e Bioscienze
Gas1p is a S. cerevisiae membrane glycoprotein that
plays a key role in cell wall assembly [1], and belongs to the
Gas1p family (family 72) of β-1,3-glucanases. Several other family members
have been isolated from S. cerevisiae, Candida species, S. pombe
and other fungal organisms.
In particular, five GAS genes are present in S. cerevisiae,
coding for different Gas enzymes, each characterized by a different
modular organization of domains.
The catalytic domain (C-domain) is the most conserved
among all members of the family, and its structural features are particularly
relevant for investigating structure-function relationships in this
class of enzymes.
The aim of this work was to predict the 3D structure
of this domain and to compare the C-domains of different members
of the Gas1p family. Because no 3D structure
template suitable for homology modelling is available, we combined threading
methods [2] and secondary structure predictions to derive 3D models
of some Gas1p family members.
Based on this analysis, we propose that the C-domain
assumes a TIM-barrel fold and that the position of the active site
residues in our models is compatible with the catalytic characteristics
proposed for GH-A clan members [3]. We also present a detailed analysis
and comparison of the structural features of the C-domains of some
of the different members of the Gas1p family. (Poster only)
1. Popolo L., Vai M., The Gas1 glycoprotein, a putative wall polymer
cross-linker, Biochim. Biophys. Acta, 1999, 1426(2):385-400.
2. Jones D., Thornton J., Protein fold recognition, J. Comput. Aided
Mol. Des., 1993, 4: 439-456.
3. Henrissat B., Callebaut I., Fabrega S., Lehn P., Mornon J.P.,
Davies G., Conserved catalytic machinery and the prediction of a common
fold for several families of glycosyl hydrolases, PNAS, 1996, 93(11):5674
An Algorithm for Finding Common Secondary Structure Motifs in a Set of Unaligned RNA Sequences - (session: Novel Algorithms for Bioinformatics)
Giulio Pavesi, Giancarlo Mauri, Graziano Pesole
Università Milano Bicocca
We present an algorithm for finding conserved secondary
structure motifs in a set of RNA sequences, that is, secondary
structure elements that appear in all or most of the secondary
structures formed by the sequences of the set.
Unlike the methods introduced so far for this
problem, the approach we present neither computes an alignment
of the sequences beforehand nor takes sequence similarity into account
in any way, but looks directly for structural similarities. Thus,
it can also be applied when the RNA sequences do not show
significant similarity in their nucleotide sequence. The algorithm
takes as input the secondary structures of the sequences, exhaustively
enumerates all patterns representing feasible secondary structure
elements up to a maximum size (which can equal the length of the sequences),
searches for each one in the structures, and finally reports those
patterns that appear in all or most of the sequences of the set.
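As a much simplified illustration of looking for shared structural elements without any alignment, the Python sketch below extracts hairpins from dot-bracket structures and reports those found in at least a chosen fraction of the input set. It is not the algorithm of this abstract: it handles only hairpins described by (stem length, loop size), matches them exactly, and does not enumerate patterns exhaustively. The function names and the quorum parameter are assumptions made for the example, and the input strings are assumed to be well-formed dot-bracket notation.

```python
from collections import Counter
from typing import List, Tuple

Hairpin = Tuple[int, int]  # (stem length, loop size)


def hairpins(structure: str) -> List[Hairpin]:
    """Return every hairpin in a balanced dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    found = []
    for i, j in pairs.items():
        # A closing pair of a hairpin encloses unpaired positions only.
        if all(c == "." for c in structure[i + 1:j]):
            stem = 1
            # Extend the stem while the enclosing pair is directly stacked.
            while pairs.get(i - stem) == j + stem:
                stem += 1
            found.append((stem, j - i - 1))
    return found


def common_motifs(structures: List[str], quorum: float = 1.0) -> List[Hairpin]:
    """Hairpins occurring in at least `quorum` (a fraction) of the structures."""
    counts = Counter()
    for s in structures:
        counts.update(set(hairpins(s)))  # count each motif once per structure
    needed = quorum * len(structures)
    return sorted(m for m, c in counts.items() if c >= needed)


structures = [
    "((((....))))..",
    "..(((....)))",
    ".((((....)))).((...))",
]
print(common_motifs(structures, quorum=0.6))
```

Lowering quorum corresponds to asking for motifs present in "most" rather than "all" of the sequences.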
Occurrences of patterns can be approximate, that is, they
can differ in the size of a stem or of an internal loop, in the presence
or absence of a bulge, and so on: the type and degree of approximation
can be freely chosen by the user.
The input structures can be either determined experimentally,
or predicted by one of the existing methods. In the latter case,
we show how the algorithm can deal with the uncertainty deriving
from predictions, by considering different alternative secondary
structures for each sequence.
Experiments have shown that the algorithm, coupled with
existing secondary structure prediction methods, is able to efficiently
discover known RNA structural motifs, such as histone and IRE
stem-loop motifs in RNA untranslated regions, as well as structural
motifs shared by the members of different virus families.
Computational analysis of non-coding regions in eukaryotic genomes - (session: Comparative Genomics and Molecular Evolution)
E. Pizzi, E. Bultrini, P. Del Giudice, C. Frontali
Istituto Superiore di Sanità, Roma
Genome sequencing projects produce a large amount
of sequence data each year. One of the major challenges for computational
biologists is to extract relevant biological information from the billions
of bases that have been stored in the databases so far. Whereas
in recent years much effort has been devoted to locating genes within
genomes, relatively few tools have been developed to identify the regulatory
regions required for the correct transcriptional activity of the genome.
This task is particularly difficult in the case of eukaryotic organisms,
in which regulatory regions represent a small percentage of the genome, overwhelmed
by presumably non-functional DNA. Recently, several computational procedures
have emerged to address this problem, including knowledge-based methods,
comparative genomics analysis and methods based on the statistical-compositional
properties of genomes.
By using recurrence quantification analysis we were able
to show that in some eukaryotic genomes, introns and intergenic tracts
exhibit highly recurrent patterns with correlated properties, distinguishing
them from the low-recurrence regime present in exons. This observation
was explained by assuming a peculiar oligonucleotide usage in non-coding
DNA and a significantly different one in protein-coding regions. In order to
characterise this oligonucleotide usage, we applied principal component
analysis to the pentamer distribution of experimentally verified introns and exons
from the C. elegans and D. melanogaster genomes. We found a subset of pentamers
that significantly discriminates introns from their randomised counterparts
and from exons. A genome-wide analysis of pentamer usage revealed that
most introns and intergenic tracts utilize the identified subset of
pentamers, whereas exons and a small percentage of the non-coding fraction
do not.
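The pentamer statistics underlying this kind of analysis can be sketched in a few lines of Python. The example below only computes pentamer frequencies and the share of a sequence's pentamers that fall in a given subset; it does not reproduce the principal component analysis or the recurrence analysis of this work. The function names kmer_frequencies and subset_usage and the toy sequence are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Set

_BASES = set("ACGT")


def kmer_frequencies(seq: str, k: int = 5) -> Dict[str, float]:
    """Relative frequency of each overlapping k-mer (k=5 gives pentamers).

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= _BASES
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def subset_usage(seq: str, subset: Set[str], k: int = 5) -> float:
    """Fraction of the k-mers of `seq` that belong to `subset`."""
    freqs = kmer_frequencies(seq, k)
    return sum(f for kmer, f in freqs.items() if kmer in subset)


# Toy sequence, invented for illustration only.
print(kmer_frequencies("AAAAAAT"))
print(subset_usage("AAAAAAT", {"AAAAA"}))
```

A frequency vector of this kind, computed for each intron and exon, is the sort of input on which a principal component analysis could then be run.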
Our hypothesis is that genome pentamer usage can be
viewed as a sort of genome background noise, and hence that functional
sequences might emerge as regions having different compositional
properties. In order to test our hypothesis, we analysed the 5' upstream
regions of more than 100 members of a multigene family from the P. falciparum
genome. We identified four regions, within 1 kb, with an anomalous
oligonucleotide usage; we compared our results with those obtained
through a multiple alignment performed on the same sequences.
The overall compositional property can thus be viewed
as a sort of genome background, and regulatory elements might lie
within regions that adopt a different oligonucleotide usage.
Annotation of EST sequences by a structural bioinformatics approach - (session: Other)
L. Pugliese
via Don Grioli, 4 - Torino
In the past few years the complete sequences of more than
55 genomes have been published and at least 100 more are known
to be near completion. One challenge of the genome era is to predict
molecular functions and biological roles for the predicted gene products.
Most approaches for the tentative assignment of functions
to predicted proteins are based on pairwise sequence similarity
searches against known proteins using sequence comparison programs
such as FASTA and BLAST. However, many proteins are multifunctional,
multidomain proteins for which the assignment of a single function
results in a loss of information. Also, with more predicted proteins
from genome projects being added to the databases, the best hit in pairwise
sequence similarity searches is frequently a poorly annotated protein.
To overcome the limitations of functional
annotation based on pairwise sequence similarity searches, knowledge
of the three-dimensional structure of domains is becoming
increasingly important. In this view, the application of
fold recognition methods coupled with homology model building and theoretical
structure verification methods is a way to obtain a large amount of information
in a short time.
The protocol applied in order to assign a function
to an EST sequence involves the following steps:
1. Submission of the sequence to a fold recognition/structure
prediction metaserver.
2. 3D alignment of sequences relative to templates
receiving good scores from the metaserver.
3. Homology model building using as templates the PDB
files having the best scores within the metaserver output.
4. Evaluation of the models obtained, using the program
ProsaII and the server http://atlas.physbio.mssm.edu:8084/servers/pg/
5. Analysis of the literature concerning the template
structures in order to extract information on the function of the
new sequence.
Detection and analysis of spliced chimeric mRNAs in sequence databanks - (session: Novel Algorithms for Bioinformatics)
Antonello Romani, Marco Trerotola, Emanuela Guerra,
Andrew Emerson, Elda Rossi, Agnieszka Bronowska and Saverio Alberti
Mario Negri Sud
A databank screening procedure, the In Silico Trans-splicing
Retrieval System (ISTReS), was developed to identify chimeric mRNAs
originating from chromosomal translocations, mRNA trans-splicing
and multi-locus transcription. A parsing algorithm to screen cDNA-vs-genome
Blast outputs was implemented. Key filtering criteria were Blast
scores of >= 300, match lengths of >= 95% of the query sequences,
junction of the two partners at exon-exon borders and concordant 'sense
/ sense' reading orientation. ISTReS was validated by the successful identification
of bona fide chromosomal translocation-derived fusion transcripts
in the HGI and RefSeq databanks. Its performance was verified
against recently identified chimeric antisense transcripts. Analysis
of the UNIGENE database revealed 21742 chimeric sequences overall,
corresponding to ~1% of the database transcripts. Novel FOP-Rho GAP
and methionyl tRNA synthetase-advillin chimeric mRNAs with the canonical
features of trans-spliced transcripts were identified among 246 chimeras
from the RefSeq databank. This suggests that canonically spliced
chimeras account for approximately 1% of all the hybrid sequences in current
databanks. These findings demonstrate the efficiency of ISTReS and the
overall feasibility of sequence/structure-based strategies to search
for chimeric mRNAs that are candidates for derivation from the splicing of heterologous
transcripts.
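The filtering criteria listed in this abstract can be written down as a small Python predicate. This is a sketch under stated assumptions, not the ISTReS parser: the Hit record, its field names and the is_candidate_chimera function are hypothetical, the score threshold is applied to each partner separately, and the 95% criterion is read here as the combined coverage of the query by the two partner matches. Exon-border and orientation information is assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One cDNA-vs-genome match (hypothetical record, not a Blast format)."""
    locus: str            # genomic locus of the matched partner
    q_start: int          # 1-based, inclusive start on the query cDNA
    q_end: int            # 1-based, inclusive end on the query cDNA
    score: float          # Blast score of the match
    sense: bool           # True if the match is in the sense orientation
    at_exon_border: bool  # True if the junction-side end is an exon border


def is_candidate_chimera(query_length: int, a: Hit, b: Hit,
                         min_score: float = 300.0,
                         min_cover: float = 0.95) -> bool:
    """Apply the four filters to a pair of partner hits on one query."""
    if a.locus == b.locus:
        return False                      # both parts map to the same locus
    if a.score < min_score or b.score < min_score:
        return False                      # Blast score threshold
    if not (a.sense and b.sense):
        return False                      # concordant sense / sense only
    if not (a.at_exon_border and b.at_exon_border):
        return False                      # junction must be exon-exon
    covered = set(range(a.q_start, a.q_end + 1))
    covered |= set(range(b.q_start, b.q_end + 1))
    return len(covered) >= min_cover * query_length


# Toy hits, invented for illustration only.
left = Hit("locusA", 1, 480, 900.0, True, True)
right = Hit("locusB", 481, 990, 950.0, True, True)
print(is_candidate_chimera(1000, left, right))
```

Keeping the thresholds as parameters makes it easy to see how the number of retained candidates changes when a criterion is relaxed.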
Analysis of p63 isoform-driven gene expression: a cDNA array/bioinformatics integrated approach - (session: Structural Genomics)
S. Saviozzi, M. Lo Iacono, F. Lanzarato,
G. Franceschini, G. La Mantia, V. Calabrò and R.A. Calogero