JCSG Technologies
Our high-throughput structural genomics pipeline has to date delivered more than 1,500 structures to the community. Targets are processed through an extensive combination of bioinformatics and biophysical analyses that efficiently characterize and optimize each target prior to selection for structure determination, with parallel processing methods used at almost every step. The pipeline can adapt to a wide range of targets from bacteria to human, including challenging targets such as eukaryotic proteins, protein-protein complexes and other macromolecular complexes. Processing such large numbers of targets and the associated data through the multiple stages of our experimental pipeline has resulted in the development of innovative methods and tools at strategic stages in our gene-to-structure platform, and has led to the functional characterization of many targets. These resources, when feasible, have been converted to free-access, web-based tools and applications that include the XtalPred, Structure Validation and Ligand Database servers and the TOPSAN annotation portal (www.topsan.org). We believe that these resources are of high value to the general scientific community, and we welcome feedback and comments.
From the outset, the JCSG has been committed to the development of new technologies and methodologies that facilitate high-throughput structural biology and push the frontiers of structural genomics. The areas of development include hardware, software, new experimental methods, and adaptation of existing technologies to advance genome research. In the hardware arena, our commitment is to the development of technologies that accelerate structure solution by increasing throughput at every stage of the production pipeline. One major area of hardware development has therefore been the implementation of robotics.
In the software arena, we have developed enterprise resource software that tracks successes, failures, and sample histories from target selection to PDB deposition, along with annotation and target management tools and helper applications aimed at facilitating and automating multiple steps in the pipeline.
TARGET SELECTION
Publication: Lukasz Jaroszewski, Lukasz Slabinski, John Wooley, Ashley M. Deacon, Scott A. Lesley, Ian A. Wilson and Adam Godzik, "Genome Pool Strategy for Structural Coverage of Protein Families", Structure 16(11), 1659-1667 (2008). Pubmed: 19000818
WebSite: http://ffas.burnham.org/XtalPred-cgi/xtal.pl
Publication: Slabinski L, Jaroszewski L, Rychlewski L, Wilson IA, Lesley SA, & Godzik A, "XtalPred: a web server for prediction of protein crystallizability", Bioinformatics 23, 3403-3405 (2007). Pubmed: 17921170
PROTEIN PRODUCTION
Publication: Heath E. Klock, Eric J. Koesema, Mark W. Knuth, Scott A. Lesley, "Combining the polymerase incomplete primer extension method for cloning and mutagenesis with microscreening to accelerate structural genomics efforts", Proteins: Structure, Function, and Bioinformatics 71(2), 982-994 (2008). Pubmed: 18004753
Cloning Robotics: A large number of expression clones must be generated within the pipeline to accommodate the number of targets, expression systems and variants for each gene targeted. Many options for creating such expression clones were evaluated, including recombination-based (Gateway/Echo) and topoisomerase-treated systems. To maximize flexibility and minimize cost, we chose to automate a conventional cloning approach. We developed a robotic platform that combines liquid and plate handling with thermocyclers and a plate reader, and demonstrated the capacity to provide up to 384 validated expression clones per week, which is sufficient to meet our pipeline needs. To date, over 2,500 expression clones have been generated with this system by a single operator.
Large-scale bacterial expression:
Protein expression has primarily been performed in E. coli. To allow expression at a scale sufficient
Publication: Kreusch, A. and Lesley, S. A., "High-Throughput Cloning, Expression, and Purification Technologies", Genomics, Proteomics, and Vaccines, ed. G. Grandi, Wiley Press, UK, 171-184 (2004).
Automated affinity purification: Processing of the resulting cell pellets through affinity purification is performed with custom automation (GNFuge). Fermentation tubes are processed directly in the GNFuge for the steps of lysis, removal of cell debris and affinity purification. The resulting affinity-purified proteins can then be processed by secondary purification or advanced directly to crystallization screening.
Publication: Lesley SA, "High-throughput proteomics: protein expression and purification in the postgenomic world", Protein Expr. Purif. 22(2): 159-164 (2001). Pubmed: 11437590
Secondary purification: Purification beyond affinity steps is achieved using standard commercial instrumentation that has been configured for automated large-scale purification. By integrating a custom valve configuration and an air sensor with the ÄKTA Purifier systems (GE Healthcare), we can achieve automatic loading and processing of up to 12 samples without the limitations on initial sample volume imposed by commercial autosamplers. With three such systems online, our demonstrated capacity for secondary purification is approximately 48-96 proteins per week at a 10-50 mg scale.
BIOPHYSICAL CHARACTERIZATION
Biophysical characterization of samples is a critical component of our pipeline process that provides guidance for target strategies and metrics for evaluating the various pipeline components. However, performing such characterization on a large number of targets has serious implications for pipeline throughput. The JCSG has devoted significant effort towards developing high-throughput approaches to protein characterization and to gathering and tracking this information for thousands of samples. The volume of data is enormous and has emphasized the need for active target management to take advantage of such knowledge as it arises.
These biophysical data are also of tremendous value to the scientific community and for collaborative functional studies.
Multiparametric Biophysical Protein Characterization: Biophysical parameters currently collected for each target are:
Publication: Santarsiero BD, Yegian DT, Lee CC, Spraggon G, Gu J, Scheibe D, Uber DC, Cornell EW, Nordmeyer RA, Kolbe WF, Jin J, Jones AL, Jaklevic JM, Schultz PG, Stevens RC. "An approach to rapid protein crystallization using nanodroplets", Journal of Applied Crystallography, 35: 278-281 (2002).
To fulfill the demands of the JCSG high-throughput structure determination pipeline, it was clear at the outset that an automated crystal screening capability would be a vital asset. The JCSG pipeline is currently producing in excess of 500 crystals per month for diffraction screening. X-ray screening forms a critical feedback loop, which is used by the Crystallomics Core (CC) to identify promising targets and crystallization conditions. Manual mounting and dismounting of crystal samples at the beam line is a labor-intensive task that wastes significant beam time and is prone to human error. The Structure Determination Core (SDC) has co-developed a completely automated crystal screening system in close collaboration with the core Structural Molecular Biology group at SSRL, which meets the needs of both JCSG and the wider structural biology community. The key features are:
Secure crystal transport and storage is accomplished via a compact, cylindrical, aluminum
WebSite: http://smb.slac.stanford.edu/public/facilities/hardware/cassette_kit/
Publication: Cohen AE, Ellis PJ, Miller MD, Deacon AM, Phizackerley RP. "An automated system to mount cryo-cooled protein crystals on a synchrotron beam line, using compact sample cassettes and a small-scale robot", J Appl Crystallogr, 35: 720-726 (2002).
Stanford Auto-Mounter (SAM): The Stanford Auto-Mounter (SAM) has been developed to allow automated screening of crystals at the synchrotron. Individual crystals are mounted onto the beam line for screening using the SAM system. Three sample cassettes are held under liquid nitrogen in a dispensing dewar, which is located close to the goniometer, inside the experimental hutch. A commercial Epson ES553S 4-axis robot, outfitted with a pneumatically operated cryo-tong, removes samples from the cassette and places them on the goniometer. The SAM system also allows sorting of crystals from one cassette to another, so the most promising crystals can be consolidated into a single cassette prior to data collection. The sorting facility is now in a prototype stage and will be developed into a full user system in the near future. SDC has fully integrated the
WebSite: http://smb.slac.stanford.edu/public/facilities/hardware/SAM/
Publication: Cohen AE, Ellis PJ, Miller MD, Deacon AM, Phizackerley RP. "An automated system to mount cryo-cooled protein crystals on a synchrotron beam line, using compact sample cassettes and a small-scale robot", J Appl Crystallogr, 35: 720-726 (2002).
Sample visualization and loop alignment system:
Reliable centering of the sample in the X-ray beam is an essential step for automatic screening and requires good sample illumination and imaging. A high-quality visualization system was developed by SDC on BL11-1 at SSRL
DIFFRACTION DATA COLLECTION
The majority of the JCSG data collection has been conducted on the macromolecular crystallography beam lines at SSRL. The SSRL storage ring, the Stanford Positron Electron Asymmetric Ring (SPEAR), was recently upgraded to third-generation synchrotron capabilities and now offers increased brightness and higher operating ring current. All protein crystallography beam lines have benefited from the upgrade, and typical exposure times have been significantly reduced. During the SPEAR-3 upgrade from April 2003 to March 2004, and also during shorter SSRL maintenance shutdowns, JCSG data were collected at the Advanced Light Source (ALS) and the Advanced Photon Source (APS). A program proposal provided time at APS (distributed over SBC-CAT, BIO-CARS and NE-CAT) and a Memorandum of Understanding provided regular access at ALS. During these shutdown periods, the SAM system was used with an X-ray microsource generator to pre-screen crystals before trips to remote beamlines.
Automated MAD data collection with BLU-ICE: JCSG has contributed to the ongoing development of the BLU-ICE data collection software at SSRL. In addition to the new crystal screening capabilities (described above), BLU-ICE now supports completely automated execution of MAD
WebSite: http://smb.slac.stanford.edu/public/facilities/software/blu-ice/
Publication: McPhillips TM, McPhillips SE, Chiu HJ, Cohen AE, Deacon AM, Ellis PJ, Garman E, González A, Sauter NK, Phizackerley RP, Soltis SM, Kuhn P. "Blu-Ice and the Distributed Control System: software for data acquisition and instrument control at macromolecular crystallography beamlines". Pubmed: 12409628
WebSite: http://smb.slac.stanford.edu/public/facilities/hardware/cassette_kit/
Publication: Soltis SM, Cohen AE, Deacon A, Eriksson T, González A, McPhillips S, Chiu H, Dunten P, Hollenbeck M, Mathews I, Miller M, Moorhead P, Phizackerley RP, Smith C, Song J, van den Bedem H, Ellis P, Kuhn P, McPhillips T, Sauter N, Sharp K, Tsyba I, Wolf G. "New paradigm for macromolecular crystallography experiments at SSRL: automated crystal screening and remote data collection", Acta Crystallogr D Biol Crystallogr, 64: 1210-1221 (2008). Pubmed: 19018097
DATA PROCESSING AND STRUCTURE DETERMINATION
Xsolve:
Xsolve can execute all crystallographic data processing and MAD structure
determination steps. Xsolve also prepares a standard set of files for
upload to the Structure Solution Tracking System (SSTS), which provides
a direct interface to the JCSG database. Xsolve allows parallel processing
of structure determination tasks using a variety of established crystallographic
applications. The Xsolve system has a flexible and open architecture so
that new versions of applications can readily be upgraded and newly emerging
programs can easily be incorporated. In this way, SDC can quickly capitalize
on developments made by the wider crystallographic community. Xsolve performs
all processing steps including initial indexing of a diffraction image,
integration, scaling, phase determination, phase improvement and initial
model building. The system has been optimized to provide high quality
results for direct upload to the JCSG central database.
Publication: Schwarzenbacher R, Godzik A, Grzechnik SK, Jaroszewski L. "The importance of alignment accuracy for molecular replacement", Acta Crystallogr D Biol Crystallogr, 60: 1229-1236 (2004). Pubmed: 15213384
Publication: Schwarzenbacher R, Godzik A, Jaroszewski L. "The JCSG MR pipeline: optimized alignments, multiple models and parallel searches", Acta Crystallogr D Biol Crystallogr, 64(Pt 1): 133-140 (2008). Pubmed: 18094477
WebSite: http://smb.slac.stanford.edu/jcsg/QC
COMPUTATIONAL TARGET ANALYSIS AND FUNCTIONAL ANNOTATION
In collaboration with UCSD, Burnham and ANL bioinformatics groups, JCSG
has developed a unified protein structure and sequence analysis system
that includes predictions about the function of proteins solved by the
experimental pipeline. Elements of the system include structure similarity
analysis performed by DALI, CE and FATCAT structure alignment programs,
distant homology analysis performed by the FFAS profile-profile alignment
program, and genome context and pathway analysis performed by the SEED
system. These annotations are manually analyzed and subjected to internal
discussions using a unique system of interactive annotation pages developed
at JCSG. Through application of this system, functional annotations of
over half of the proteins solved by JCSG, including several previously
unannotated “hypothetical proteins,” have been established
with high reliability and have now been entered into public databases.
In addition, a functional annotation page has been created for each target,
which instantly allows JCSG scientists to curate and update biological
information generated during the structure determination process.
Access to target annotations can
be accomplished through the PSCA system. Annotations from public databases,
links, and preprocessed target information are available through a tabbed
user interface. Data such as fold similarity, sequence similarity, domain
organization or physicochemical properties are periodically precalculated,
which greatly speeds up access to a large collection of data for each target.
Wiki-based Collaborative Annotation: TOPSAN offers a combination of automatically generated and comprehensive, expert-curated annotations provided by JCSG personnel and members of the research community. TOPSAN is an experiment in open, collaborative research on proteins whose structures are being determined by Protein Structure Initiative Centers. While built on a wiki platform, TOPSAN differs significantly from the conventional application of wiki technology. Instead of performing an encyclopedic distillation of established information, TOPSAN focuses on creating new knowledge by enabling instant collaborations among distributed participants. The immediate goal of TOPSAN is to enhance the impact of the vast numbers of structures being determined in structural genomics. We also anticipate that TOPSAN as a model could facilitate the development of novel forms of continuous scientific communication and knowledge creation.
WebSite: http://www.topsan.org
Structure Notes:
JCSG structures are shared with the scientific
community not only through deposition in the PDB, but also through publication
of "structure notes." Structure notes are short papers describing
the annotation, biology, structure and functional implications of each
protein. The process of collecting all relevant data, from all stages
of the JCSG pipeline has been streamlined through the central JCSG database,
which includes information on the sequence, annotation, cloning, purification,
crystallization, data collection, structure solution, tracing, refinement
and structural evaluation. The structure note automatically captures any
functional information in the JCSG annotation system (see above). The
paper introduction, for example, includes annotation information, with
a brief biological background taken and curated from the Pfam, InterPro,
SwissProt, BRENDA, and SEED databases. Methodological and experimental
data, as well as all crystallographic statistics, are automatically harvested
from the JCSG database and assembled into purification, crystallization,
structure solution and refinement paragraphs. The structure description
and the preparation of figures are done manually using PyMOL. Structures
are analyzed, compared and evaluated for biological significance using
a plethora of structure analysis tools including structural homology searches
(DALI, CE, FATCAT) and extensive literature searches.
WebSite: http://www.jcsg.org/datasets-info.shtml
WebSite: http://www.jcsg.org
Publication: Godzik A, Canaves J, Grzechnik S, Jaroszewski L, Morse A, Ouyang J, Wang X, West B, Wooley J. "Challenges of structural genomics: bioinformatics", Biosilico 1: 36-41 (2003).
The ability to mine data from a consistent process is invaluable for optimizing
our pipeline. Since our targets are processed using similar methods and
materials, often in parallel, more insightful comparisons can be made
than from extracting equivalent data from the literature. Furthermore,
the large number of targets processed, as well as their diverse nature,
makes identification of general principles more valid.
Analysis of PCR amplification success rates: Feedback from the analysis of success rates was used to improve the primer generation system. As a result, a scoring function that selects primers with optimal GC clamps within the specified melting temperature and length range was added to the system. In its present form, the optimized system is capable of generating primer sets with success rates as high as 98%.
Analysis of crystallization screen: An analysis of over 340,000 individual crystallization trials has led to the creation of a new minimal coarse screen (GNF96), which is highly effective in identifying targets that crystallize readily and in providing crystal leads for fine-screen optimization. The screen grew out of the realization that a significant number of coarse screen conditions never yielded any crystals, whereas some proteins crystallized under many different conditions. Our large number of crystallization trials (>500,000) and our consistent processing approach allowed us to analyze and optimize our crystallization strategy. Redundancy in the commercial conditions, particularly in the high molecular weight PEGs, skews the statistics on the relative efficacy of different crystallization conditions. In a review of our Tier 1 screening using the 480 available screening conditions, we defined a small subset of 67 conditions that optimally samples crystallization space and would have encompassed 84% of the proteins that ultimately crystallized. This subset was expanded slightly to 96 conditions (GNF96) and forms our basic screen to test whether a particular protein construct will readily crystallize.
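Choosing a small subset of conditions that still captures most crystallizable proteins can be viewed as a set-cover problem. The following is only a minimal sketch under stated assumptions: the text does not say which selection method JCSG used, so a simple greedy heuristic is swapped in here for illustration, and the function name `greedy_minimal_screen`, the condition labels and the toy hit data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def greedy_minimal_screen(
    hits: Dict[str, Set[str]],
    target_coverage: float = 1.0,
) -> Tuple[List[str], float]:
    """Greedily pick conditions until the requested fraction of proteins is covered.

    `hits` maps a condition name to the set of proteins that crystallized in it.
    Returns the chosen conditions (in pick order) and the coverage reached.
    This is an illustrative heuristic, not the method described in the text.
    """
    all_proteins: Set[str] = set().union(*hits.values())
    if not all_proteins:
        return [], 0.0

    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(hits)

    while remaining and len(covered) / len(all_proteins) < target_coverage:
        # Pick the condition adding the most not-yet-covered proteins;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda c: len(remaining[c] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # every remaining condition is redundant or never gave crystals
        chosen.append(best)
        covered |= gain

    return chosen, len(covered) / len(all_proteins)


# Hypothetical toy data: cond_C is redundant with cond_A, cond_E never gave crystals.
toy_hits = {
    "cond_A": {"p1", "p2", "p3"},
    "cond_B": {"p3", "p4"},
    "cond_C": {"p1", "p2"},
    "cond_D": {"p5"},
    "cond_E": set(),
}

if __name__ == "__main__":
    print(greedy_minimal_screen(toy_hits))
    print(greedy_minimal_screen(toy_hits, target_coverage=0.8))
```

In this toy case the redundant and unproductive conditions are never selected, which mirrors the observation above that redundancy among commercial conditions inflates a screen without adding coverage. Lowering `target_coverage` trades a smaller screen for a few missed proteins, analogous to the 67-condition subset that would have captured 84% of the proteins that crystallized.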
WebSite: JCSG Screens are available from Qiagen: JCSG+ Suite and JCSG Core Suites
Publication: Lesley SA, Wilson IA. "Protein production and crystallization at the Joint Center for Structural Genomics", J Struct Funct Genomics, 6(2-3): 71-79 (2005). Pubmed: 16211502