Connected in a small world: Rapid integration of heterogenous biology resources
Date
2006
Author
Park, Sang P.
Song, Carol X.
Topkara, Umut
Woo, Jungha
Abstract
Timely access to the most up-to-date versions of resources,
such as data and software, is of paramount importance for
researchers in an active field like biology. We introduce a
grid-enabled portal architecture for collecting biological data
and software, SALSA (a Scalable Simple Architecture). It is
designed to integrate new computational resources quickly as
research in this area advances and diversifies.
We identify two models that guide the design of SALSA: a
heterogeneous database model and a network growth model with
preferential attachment.
SALSA addresses the challenges that previous research on the
heterogeneous database model has identified in biological
database resources: these resources are managed autonomously
and lack a common database schema.
SALSA is also guided by a model of how the portal's collection
grows. This collection holds data and the software that processes
it. The model comes from previous research on related
collections, such as citation networks and software package
dependencies. It suggests that some components are more likely
than others to gain new connections, for example popular
resources such as BLAST or FASTA sequences. When such components
are present, the relationships between components tend to
organize into a small-world, scale-free network.
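The growth rule behind this model can be illustrated with a short simulation. The sketch below is a standard Barabási-Albert style preferential-attachment generator and not the authors' code. The function names `grow_network` and `degrees` and the parameter `m` (links added per new component) are hypothetical. Each new component links to `m` distinct existing components, chosen with probability proportional to their current degree. A few early components end up with far more connections than the typical one.

```python
import random
from collections import Counter


def grow_network(n_components, m=2, seed=0):
    """Grow an undirected graph by preferential attachment.

    Hypothetical helper: each new component links to m distinct existing
    components, picked with probability proportional to their degree.
    """
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 components.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Every node appears here once per incident edge, so a uniform choice
    # from this list is a degree-proportional choice of node.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_components):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degrees(edges):
    """Count how many edges touch each component."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


edges = grow_network(1000, m=2, seed=42)
deg = degrees(edges)
values = sorted(deg.values())
print("max degree:", values[-1], "median degree:", values[len(values) // 2])
```

Changing `seed` or `m` alters the exact numbers. The skew remains: the maximum degree sits far above the median, which is the hub structure the growth model predicts.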
The growth model helps portal developers identify important hub
components. These hubs emerge because they take part in an
increasing number of tasks as the portal grows. To improve the
overall user experience effectively, developers can direct
expensive development effort (e.g., query optimization, user
interface, documentation) to hub components rather than to
specialized components that are less likely to become hubs.
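A minimal sketch of that prioritization step, assuming the portal keeps a record of which components each task uses. The task names, the component names other than BLAST and FASTA, and the helper `rank_hubs` are hypothetical. Components are ranked by the number of distinct tasks they appear in, and the top entries are the candidates for extra development effort.

```python
from collections import Counter


def rank_hubs(tasks, top_k=3):
    """Rank components by how many tasks they take part in (hypothetical helper)."""
    usage = Counter()
    for components in tasks.values():
        # Count each component at most once per task.
        usage.update(set(components))
    return usage.most_common(top_k)


# Hypothetical task -> component mapping, for illustration only.
tasks = {
    "align_reads": ["FASTA", "BLAST"],
    "annotate_genes": ["FASTA", "BLAST", "gene_db"],
    "build_tree": ["FASTA", "aligner", "tree_builder"],
    "find_motifs": ["FASTA", "motif_finder"],
}

print(rank_hubs(tasks, top_k=2))  # [('FASTA', 4), ('BLAST', 2)]
```

Counting tasks rather than raw references keeps a component from looking like a hub just because a single task lists it many times.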
In this paper we discuss a grid-enabled web portal implementation
built to hold a growing collection of biological data and the
software that processes this data. The implementation we present
is a realization of SALSA. It aims to integrate newly published
components into the existing collection rapidly and sustainably.
Notably, it uses XML for flexible component management, XSL for
the web user interface, and SRB and MCAT for large-scale data
storage.
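The abstract does not give the format of SALSA's XML component descriptions, so the sketch below is only an illustration of why XML suits this role. It uses a hypothetical catalog schema: `<component>` elements with `name` and `kind` attributes, plus `<input ref=...>` children. It relies only on Python's standard `xml.etree.ElementTree`. A new component is registered by appending an element, and no shared database schema has to change. The XSL rendering step is not shown, because the Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical component catalog; the element and attribute names are assumptions.
CATALOG = """
<components>
  <component name="FASTA" kind="data"/>
  <component name="BLAST" kind="software">
    <input ref="FASTA"/>
  </component>
</components>
""".strip()


def load_registry(xml_text):
    """Map each component name to its kind and the components it consumes."""
    root = ET.fromstring(xml_text)
    registry = {}
    for comp in root.findall("component"):
        registry[comp.get("name")] = {
            "kind": comp.get("kind"),
            "inputs": [inp.get("ref") for inp in comp.findall("input")],
        }
    return registry


def add_component(xml_text, name, kind, inputs=()):
    """Return a new catalog string with one more component appended."""
    root = ET.fromstring(xml_text)
    comp = ET.SubElement(root, "component", name=name, kind=kind)
    for ref in inputs:
        ET.SubElement(comp, "input", ref=ref)
    return ET.tostring(root, encoding="unicode")


updated = add_component(CATALOG, "tree_builder", "software", ["FASTA"])
print(load_registry(updated)["tree_builder"])
```

The `inputs` lists also supply the component-to-component links that a hub analysis like the one described above would count.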