Science on Beagle2
We describe in this section some highlights of the research carried out on Beagle. While the majority of usage of Beagle is on health-related projects, as required by NIH, some work on projects in other areas has also been accomplished.
To see our Publication List please follow this link.
D. Witonsky, C. Jeong, A. Di Rienzo
To enable studies of human genetic adaptations, we have imputed genotypes for the entire genome of the 938 individuals in the Human Genome Diversity Project panel, which includes samples from 52 populations from 6 continents genotyped at a set of 637,707 autosomal single nucleotide polymorphisms. As a reference panel, we used whole genome sequence data for the 2,504 individuals in the 1000 Genomes Project. After applying filters for quality control to the imputed genotype data, we obtained high-confidence imputed genotype data at 9,162,006 autosomal SNPs, thus achieving a greater than 14-fold increase in the amount of available sequence variation data for the Human Genome Diversity Project panel. The imputed genotype data will be used for investigating models of human adaptations to different local environments. In addition, the data set will be made publicly available to investigators interested in the study of human evolution.---------------------------------------------------------------------------------------------------------------------------------------------
The past 15 years have witnessed tremendous interest in characterizing patterns of sequence variation in the human genome. This was in part motivated by the opportunity to dissect the genetic bases of common diseases, as single nucleotide polymorphisms (SNPs) contribute directly to disease susceptibility, or can be used as markers to identify the neighborhood of a disease susceptibility variant by linkage disequilibrium (LD) mapping. Importantly, the successful design and implementation of disease mapping approaches benefits from a detailed knowledge of levels of variation and LD between marker alleles. This requires a detailed description of the structure of human variation in different ethnic groups and genomic regions.
Beyond the simple description, patterns of sequence variation also contain a record of past events at the population level and hence may be used to elucidate the factors that have shaped human variation. These include genomic factors (for example, the distribution of recombination and mutation rates) as well as evolutionary factors, such as the history of population size and structure, and natural selection.
An understanding of these processes will lead to inferences about human histories, which might, in turn, facilitate disease-mapping studies. For example, susceptibility genes for some diseases are hypothesized to have evolved under positive natural selection (e.g. the thrifty genotype hypothesis for type 2 diabetes). If so, detecting the signature of selection might help narrow down the candidate region or the set of variants contributing to disease susceptibility.
In particular, the identification of loci affected by natural selection requires a characterization of the variation expected at neutrally evolving loci. However, neutral evolutionary processes are highly stochastic: a great deal of variation is expected across independent realizations of evolution, even if the genomic and evolutionary factors remain unchanged. Thus, robust inferences from polymorphism data will require the combined analysis of many unlinked loci. This is particularly true if the effects of population history are to be distinguished from those of natural selection. While population history has genome-wide effects, locus-specific forces such as natural selection will only shape genetic variability at loci that are tightly linked to the selected locus.
Our laboratory has been particularly interested in detecting human genetic adaptations to different local environments. Since their migration out of Africa 50–100 thousand years ago (kya), anatomically modern humans have colonized a wide range of environments in a relatively short time period. For example, archaeological studies provide evidence of human habitation in environments extremely divergent from those of sub-Saharan Africa, such as cold climates in arctic Siberia or high-altitude environments in the Tibetan plateau, as early as 27 and 30 kya, respectively. In addition to differences in the physical environment, cultural transitions such as the introduction of agriculture and pastoralism also contributed to divergence of human environments. Understanding the genetic basis of heritable beneficial traits is a major goal of human genetics.
Population genetics approaches scan the genome for signatures of recent positive selection, such as unusually large divergence in allele frequency between populations or extended haplotype homozygosity around selected variants. These approaches have reasonable power to detect variants with relatively large effects on the adaptive phenotype. However, when adaptation occurs in traits that are due to the combined contribution of variation at many genes, e.g. height, the genetic signals left behind are more diffuse, and no individual region of the genome is likely to show strong signatures of selection. Recently, new statistical methodology has been developed to detect the impact of selection on human phenotypic variation based on the more realistic model of polygenic adaptations. This methodology requires whole genome sequence information from a large number of populations with widespread geographic distribution.
The Human Genome Diversity Project (HGDP) panel, which includes 938 unrelated individuals from 52 populations in 6 continental regions, is the most widely used resource for studies of human genetic variation. This is a panel of lymphoblastoid cell lines providing a renewable source of genomic DNA made available to investigators all over the world. The aim of the resource was to promote worldwide research on human genetic diversity, with the ultimate goal of understanding how and when patterns of diversity were formed. It also has the added benefit of providing information that may prove useful to biomedical research. As a result, these samples have been genotyped at approximately 650,000 SNPs genome-wide using Illumina genotyping arrays; all data are publicly available. These data have been analyzed in a number of high profile publications, which have characterized the geographic structure of human variation, have allowed developing and testing models of human evolution and dispersal, and have enabled scans of genetic adaptations in geographically and ethnically diverse populations. However, whole genome sequence information for most of these samples is not available, thus preventing analyses that require this type of data, e.g. testing for polygenic adaptations.
More recently, the NHGRI-sponsored 1000 Genomes Project (1KGP) has generated low-coverage whole genome sequence data in 2,504 individuals from a distinct set of 26 populations. Unlike the HGDP, the goals of this project are primarily biomedical, i.e. to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Therefore, the populations to be sequenced were chosen to represent large populations that might be subjected to genome-wide association studies of biomedically important traits or diseases.
Our goal has been to leverage the sequence-level information from the 1KGP and the SNP array data for the HGDP to perform genome-wide genotype imputation of the HGDP individuals, thus integrating these two remarkable genome variation resources. Genotype imputation refers to the process of predicting genotypes that are not directly assayed in a study sample of individuals using a reference panel of haplotypes at a dense set of SNPs whereas the study sample has been genotyped at a subset of the SNPs. In our case, the reference panel is the whole-genome sequenced 1KGP data set while the HGDP is our study sample.
We used the algorithm implemented in the software IMPUTE v2. This method divides the SNPs into two sets: a set T that is typed in both the study sample and reference panel, and a set U that is untyped in the study sample but typed in the reference panel. The algorithm involves estimating haplotypes at SNPs in T using a Hidden Markov Model and then imputing alleles at SNPs in U conditional on the estimated haplotypes. Uncertainty in the haplotype phase estimation is accounted for by iterating these steps using a Markov chain Monte Carlo approach. As correct matching of haplotypes affects imputation accuracy, the method uses as many individuals as possible with genotype information at the SNPs in T. The output is a posterior probability for each genotype at each SNP.
In the output, we retained genotypes with posterior probability greater than 0.9 at each SNP, reflecting high confidence in the imputation. As a further quality filter and to simplify downstream analyses, we omitted all SNPs with frequency of the minor allele lower than 0.5%, which make up the vast majority of variants, and those with a rate of missing data greater than 10%.
We further investigated the accuracy of the imputation by calculating the concordance between the original genotype values in the study sample and their imputed values. To this end, IMPUTE v2 masks the genotypes of one variant at a time in the study panel, then imputes the masked genotypes with information from the reference panel and nearby study variants. The concordance between imputed and original genotypes provides an assessment of the quality of the imputation.
The plot below shows the distribution of the concordance rates for SNP genotypes imputed in different groups of populations. In general, the concordance rate is very high, i.e. >90%, for all populations. However, there are significant differences across ethnic groups. For example, the distributions for African and Oceanian populations are clearly shifted to the left, reflecting lower accuracy in imputations in these groups. Because genotype imputation of unassayed SNPs is based on patterns of linkage disequilibrium among SNPs, which in turn is a property of the history of the population, the lower concordance rate in African individuals is likely due to the well-documented lower extent of LD in these populations. This explanation is unlikely to account for the same finding in Oceanian populations, which are known to be heavily drifted and therefore harbor high levels of LD. The lack of representation of Oceanian populations in the 1KGP reference panel is more likely to underlie the lower imputation accuracy. Both factors are well known to affect imputation accuracy, thus caution should be taken in interpreting the results for these populations.
We will now use this imputed data set that we generated using Beagle to investigate polygenic adaptations at the genome-wide level. In addition, the full imputed data set, with information about estimates of the uncertainty for each imputed genotype, will be made publicly available through the Di Rienzo lab web site (http://genapps.uchicago.edu/newlabweb/index.html). We anticipate that this resource will be widely used by investigators interested in similar questions as well as in projects focusing on the evolutionary history of our species.
Arij Daou, Peter Malonis, Daniel Margoliash
Neuronal spiking activity is the fast and information-dense mode of communication in the nervous system. The timing of spikes carries information, and changes in the patterns of spiking activity can represent changes in information related to memory formation or loss. Traditional mechanisms giving rise to such changes include changes in network connectivity and changes in strength of synapses between neurons. Another, less intensely studied plastic neuronal mechanism is the intrinsic excitability (IE) of neurons, which also is regulated in an activity-dependent manner and can represent a memory. A neuron’s IE reflects the complement and magnitude of voltage-gated ionic channels contributing to the subthreshold activity that gives rise to suprathreshold spiking activity.
Figure 1.One trace from each corticostriatal neuron recorded from each of four birds. Each trace is the 1st spike emitted in response to 100 pA stimulation. The traces are similar within but differ between each of two control bird (top panels). The cDAF birds, that experienced abnormal auditory feedback during singing, have abnormal spike waveforms and vastly greater variability. Color is to help visualize individual traces.
Figure 2.High-resolution error manifold for one neuron. The manually-determined solution (arrow) is near the global minimum.
Birdsong learning is a valuable model system for studying learning and memory, and shares important similarities with speech and language. Both birds and humans monitor their auditory feedback, for example exposure to delayed auditory feedback (DAF) can rapidly induce stuttering in humans and over a longer period of time can induce abnormal syllable morphology and song syntax in songbirds. In both cases these are behaviors with very fast dynamics and a high degree of temporal precision. We have been exploring a new result that gives insight into how the feedback error signal is encoded. Examining the population of “corticostriatal” forebrain neurons known to carry the error signal in zebra finches, we recorded from neurons in a brain slice preparation, so that the activity of neurons is not as dominated by network activity as it is in vivo. Under these conditions, all the neurons from a given bird showed similar neuronal firing patterns to a canonical stimulus, but the firing patterns varied from bird to bird. Similarities between birds, however, were observed in pairs of sibling birds. To refine this result, we observed that the within-bird similarity of neuronal response was disrupted in birds exposed to continuous DAF (cDAF). We also observed that the similarities in firing properties were correlated with similarities in the spike morphology of neurons. For example, in Figure 1, for each of two control birds singing normal songs (top panels), the corticostriatal neurons showed similar spike waveform morphologies, but the morphologies varied between the birds. The spike waveform morphologies were highly disrupted in cDAF birds that experienced singing with abnormal auditory feedback (bottom panels).
Similarities in the IE of neurons from the same bird, and differences in the IE of neurons from the different birds, are likely to underlie the differences in bursting properties and spike waveform morphologies we observed. To gain insight into the mechanisms that give rise to this result, we developed a Hodgkin-Huxley (HH) conductance based model and manually fit the magnitude of five voltage-gated ionic conductances to the spike waveforms for specific values of step currents, achieving good predictions to other step currents and to chaotic current stimuli. Using such HH models to estimate biological parameters is complicated since a range of parameters can appear to reproduce the measured behavior of a neuron. We have therefore been using Beagle to compute the error manifolds for this set of conductances. This massive computation enables us to eliminate potential bias introduced in the labor-intensive manual fitting process. For neurons examined to date, the computations identify global minima (Figure 2). In some cases, the manually determined minimum is near the computed global minimum (Figure 2). In other cases, there is a discrepancy between the two. This arises because manual fitting tends to emphasize shape features of spike morphology whereas the error function used in the error calculations is dominated by the spike heights. Further calculations on Beagle will allow us to explore the sensitivity of the model to different measures of the fitting function.
There is an important set of questions whether such models have global minima, and whether the “solution” of a biological neuron changes over time, as our work would suggest.
Once we complete sufficient calculations to examine the distribution of ionic current magnitudes for different birds this should give us some insight into a principal question, how IE of corticostriatal neurons is related to the behavior of the animal (its song). Ultimately we wish to know how does the IE “set point” of a population of neurons affect the dynamics of that network. We are pursuing this work, with the help of Beagle.
Kazutaka Takahashi,Sanggyun Kim,Todd P. Coleman,Kevin A. Brown,Aaron J. Suminski,Matthew D. Best & Nicholas G. Hatsopoulos
Aggregate signals in cortex are known to be spatiotemporally organized as propagating waves across the cortical surface, but it remains unclear whether the same is true for spiking activity in individual neurons. Furthermore, the functional interactions between cortical neurons are well documented but their spatial arrangement on the cortical surface has been largely ignored. Here we use a functional network analysis to demonstrate that a subset of motor cortical neurons in non-human primates spatially coordinate their spiking activity in a manner that closely matches wave propagation measured in the beta oscillatory band of the local field potential. We also demonstrate that sequential spiking of pairs of neuron contains task-relevant information that peaks when the neurons are spatially oriented along the wave axis. We hypothesize that the spatial anisotropy of spike patterning may reflect the underlying organization of motor cortex and may be a general property shared by other cortical areas.
Figure 1.Properties of LFP beta waves (a) Temporal snapshots of the LFP voltages across the array indicating wave propagation. The LFP voltage on each electrode was band-pass filtered in the beta frequency range (that is, ±3 Hz centred at the beta peak of the power spectrum). Time in milliseconds labelled above each plot is with respect to the onset of the visual target. The red arrow in the bottom right panel shows the propagation direction of the wave. (b) Temporal evolution of the GOF measure of planar wave activity (PGD in blue ranging from 0.295 to 0.335, as well as the mean hand speed (green) ranging from 5 to 36 cm s−1. (c) Averaged spectrogram of a single-channel LFP revealing the temporal dynamics in beta frequency power relative to the visual target onset. (d) Circular distributions of wave propagation directions for monkeys Rs (cyan), Mk (magenta) and Rj (brown). A solid black line denotes the rostro–caudal axis on the cortical surface. A dashed line in each rose plot connecting Cw (caudal wave direction) and Rw (rostral wave direction) denotes the axis defined by the first or only mode of beta wave propagation axis. Each panel below the circular distributions depicts the location of the multielectrode arrays (4 × 4 mm) in the arm area of MI for the corresponding subject. A red horizontal bar in the right lower corner in each panel is 4 mm. Landmarks and orientations: Cs, central sulcus; As, arcuate sulcus; C, caudal; R, rostral; M, medial; L, lateral. (e) Distributions of estimated propagation speeds for monkeys Rs (cyan), Mk (magenta) and Rj (brown).
Figure 2.Two functional cell classes based on extracellular spike waveform widthProperties of LFP beta waves (a) Histograms of spike waveform widths for all three monkeys, Rs, Mk and Rj. The solid curve indicates the fit using mixture of two Gaussians. Background colour indicates the bins classified as narrow (green) and wide (yellow) units. Inset: Examples of motor cortical units with narrow (green) and wide (yellow) spike waveform widths. (b) The difference between BIC values minus the mean BIC values over the range as a function of the number of Gaussians used in the mixture model to fit the distributions of spike widths. (c) Example peri-stimulus time histograms from two narrow (green) and two wide (yellow) spiking neurons from monkey Mk. (d) Averaged population spike rates for the two classes of cells from monkey Mk, narrow class in green and wide class in yellow. The time-resolved beta oscillation amplitude is plotted as a solid black line. (e,f) Averaged spectrograms for the two classes of units from monkey Mk, narrow class (e, left) and wide class (f, left), and averaged power spectra for the each class of units, narrow class in green (e, right) and wide class in yellow (f, right) computed over [−100, 300] ms relative to the visual target onset. Black dashed line on the right subfigures denotes average LFP spectrum over all channels with the base line pink noise removed.
Figure 3.Spatiotemporal patterns of network connectivity of narrow class of neurons in MI in six 150 ms time windows incremented by 50 ms from monkey Rs data set.(a) Networks of significant directed connections on the array at different time windows (in milliseconds) in relation to the onset of visual target appearance. Each horizontal colour bar indicates duration of a time window used to compute a network surrounded by the same colour. C, R, M and L indicate caudal, rostral, medial and lateral orientations, respectively, on the cortical surface. Red and blue arrows represent excitatory and inhibitory connections, respectively. The black dots represent the relative positions of the electrodes on the array where the neurons were detected. Neurons whose spikes rates are greater than or equal to1 spike s−1 and with narrow spike widths (≤0.267 ms) were analysed. The green scale bar on the right-bottom equals 400 μm on the cortical surface. (b) Number of significant connections at different time windows for both excitatory and inhibitory connections (left), excitatory connections only (center) and inhibitory connections only (right). The lines in different colour shades correspond to different time windows of data in the recording session. (c) Circular distribution of directed excitatory connections weighted by their strength and normalized by the total number of possible connections in each orientation on the surface of MI at the different time windows (top). Circular distribution of directed inhibitory connections weighted by their strength and normalized by the total number of possible connections in each orientation on the surface of MI at the different time windows (bottom). All rose plots are oriented in the same way as in the anatomical coordinate system defined with C, caudal; R, rostral; M, medial and L, lateral in a. A dashed line in the top left rose plot connecting Cw (caudal wave direction) and Rw (rostral wave direction) defines the wave propagation axis as in Fig. 1d.
Propagating waves of neural activity are ubiquitous and have been documented at different spatial resolutions in a number of different neocortical areas including visual, somatosensory, auditory and motor cortices as measured via multielectrode local field potential (LFP) recordings, voltage-sensitive dyes (VSDs) and multiunit activities. Oscillatory LFPs and electroencephalograms in the beta frequency range (15–40 Hz) are ubiquitous in the motor cortex of mammals including monkeys and humans. In particular, we have previously demonstrated that across the precentral gyrus of the upper-limb area of primary motor cortex (MI), these oscillations are not perfectly synchronized but rather exhibit phase gradients that indicate planar propagating waves along what we define as a beta wave axis, a rostro–caudal axis in monkeys13 and a medio–lateral axis in humans14 at a range of propagating speeds that were consistent across subjects.
However, as both LFPs and VSD measure aggregate potentials from groups of neurons near the recording site, it has never been shown whether action potentials from individual neurons demonstrate spatiotemporal patterning consistent with wave propagation. This is important because it is still debated as to what aggregate signals such as LFPs and VSD signify physiologically, whereas single-unit action potentials are understood to mediate interneuronal communication. Moreover, the functional significance of this wave propagation for motor control is unclear (but see recent computational studies). Here we first show that MI neurons can be classified, based on the spike waveform widths, into two groups of neurons exhibiting distinct spectral properties. We then estimate effective connectivity of networks of spiking neurons based on this classification using a Granger causality analysis applied to point processes, and demonstrate that a class of simultaneously recorded, single-motor cortical neurons with narrow spike waveforms in non-human primates spatially coordinates their spiking activity in a manner that closely matches the orientation of prominent beta wave propagation. We also demonstrate that sequential spiking activity of that class of neuron pairs contains task-relevant, target-direction information whose magnitude varies according to the spatial orientation of the constituent neurons in a manner consistent with the beta wave axis.
ResultsBeta waves in the motor cortex
We recorded multiple single-unit and LFP activity from MI using chronically implanted high-density microelectrode arrays while three rhesus monkeys (Rs, Mk and Rj) made planar reaching movements using a two-link robotic exoskeleton (BKIN Technologies, ON, Canada). The monkeys performed a random target-pursuit (RTP) task23 that required them to move a cursor (aligned with the position of their hand) through a sequence of randomly positioned targets. Movement durations from target to target ranged from 300 to 450 ms with mean speeds (±s.d.) of 22.33±11.17 (Rs), 14.12±6.27 (Mk) and 6.11±7.29 cm s−1 (Rj). Planar beta wave activity measured from spatially distributed LFP sites was evident at particular intervals of time throughout the performance of this task (Fig. 1a). We used a method described previously13 to characterize the properties of planar beta waves. We found that the degree of planar wave propagation as measured by a quantity called phase gradient directionality (PGD) was strongest ~100–150 ms after the target onset (Fig. 1b) when beta power was high (Fig. 1c), and when visual target information reached the motor cortex13 followed by movement initiation to the new target (see wrist speed in Fig. 1b). Consistent with our previous findings using a center-out task13, wave propagation directions during the RTP task exhibited either a bimodal distribution (monkey Rs) or unimodal distribution with a small secondary mode (monkeys Mk and Rj), with one mode oriented in the rostral-to-caudal direction and a secondary mode oriented in the opposite direction (Fig. 1d). We denoted the caudal wave and rostral wave directions defined by the mean direction of the first or only mode of the wave propagation distribution and the opposite direction oriented roughly along the rostro–caudal axis. The distribution of propagation speeds was always unimodal with means and medians ranging from 23.2 to 26.7 and from 10.1 to 13.5 cm s−1, respectively (Fig. 1e).
To read full article follow this link.
Yilin Meng, Yen-lin Lin, Lei Huang, Wei Jiang and Benoît Roux
Gleevec is a potent inhibitor of Abl tyrosine kinase but not of the highly homologous Src kinase. Because the ligand binds to an inactive form of the protein in which an Asp-Phe-Gly structural motif along the activation-loop adopts a so-called DFG-out conformation, it was suggested that binding specificity was controlled by a “conformational selection” mechanism. In this context, the binding affinity displayed by the kinase inhibitor G6G poses an intriguing challenge. Although it possesses a chemical core very similar to that of Gleevec, this compound is a potent inhibitor of both Abl and Src kinases. Both inhibitors bind to the DFG-out conformation of the kinases, which seems to be in contradiction with the conformational selection mechanism. To elucidate the molecular mechanism of Gleevec/G6G binding specificity and display the hidden thermodynamic contributions affecting the binding selectivity, large-scale molecular dynamics free energy simulations with explicit solvent molecules were carried out on Beagle.
Protein tyrosine kinases are crucial enzymes in cellular signaling pathways regulating cell growth, proliferation, metabolism, differentiation and migration. In their active state, protein tyrosine kinases catalyze the transfer of γ-phosphate of an adenosine triphosphate (ATP) molecule covalently onto a tyrosine residue in substrate proteins (phosphorylation of a tyrosine residue). Phosphorylation of a substrate protein usually results in a functional change of the substrate. To maintain normal regulation of cellular signal transductions, the activity of tyrosine kinases is tightly regulated by multiple mechanisms. Mutations of certain residues can disrupt normal inhibitory mechanisms and make tyrosine kinases constitutively active, leading to a number of diseases, such as cancers, diabetes, and inflammation. For this reason, protein tyrosine kinases represent attractive drug targets for certain types of cancers. In recent years, many small-molecule inhibitors of kinases have been developed as possible treatments of these diseases. Contrary to most standard chemotherapies which act on all rapidly dividing normal and cancerous cells, kinase inhibitors are deliberately designed to interact with specific target. They are a cornerstone of the precision medicine, a form of medicine that uses information about a person’s genes and proteins to prevent, diagnose, and treat disease. Although successes have been achieved, designing inhibitors that are targeting specific kinases in the active state is, however, difficult because they all present structurally similar catalytic pockets owing to the common enzymatic function requiring ATP binding. Inhibitors that are targeting inactive conformations of the kinases appear to be more selective. One notable example of inhibitors targeting an inactive state of tyrosine kinases is Gleevec (developed by Novartis). Gleevec is found to induce high remission rates in patients with chronic myeloid leukemia (CML), which is caused by the Bcr-Abl kinase (in this case, Bcr-Abl kinase is the target). Gleevec is also used in the treatment of gastrointestinal stromal tumors because it’s also a potent inhibitor of receptor tyrosine kinase c-Kit.
Figure 1. Graphical representation of the selectivity profile of kinase inhibitors based on the phylogenetic tree of human protein kinases. The size of the red circle represents the strength of binding. Anticancer drug Sprycel (dasatinib) binds to the active state and is a promiscuous inhibitor, whereas anticancer drug (lapatinib) binds to an inactive state and is a selective inhibitor. This figure was adapted from Nat. Biotechnol. 2008, 26, 127-132 with permission.
The mechanism underlying the binding of Gleevec for different kinases is still being debated. Quantitatively, the specificity of the binding process is affected by two different factors. First, a short motif comprised of the conserved residues Asp-Phe-Gly (DFG) in the binding pocket of the kinase undergoes a conformational transition via a 180-degree rotation, referred to as “DFG-flip”, switching the enzyme from an active (DFG-in) to an inactive (DFG-out) state. Gleevec binds only to the inactive DFG-out state. Therefore, whether a given kinase can readily adopt this conformation has an important impact on the apparent binding affinity of Gleevec. The accessibility and relative stability of the DFG-in and DFG-out conformations gives rise to a “conformational selection” mechanism for the binding of Gleevec. A second factor affecting the binding specificity is the strength of binding of Gleevec to the apo DFG-out conformation for a given kinase. This directly depends on the magnitude of the interactions of the ligand with the residues lining the binding pocket of a given kinase. Differences regarding both the conformational selection as well as ligand-protein interactions have been proposed to explain and to rationalize the binding specificity of Gleevec for Abl over Src. It is sometimes argued that the key to the binding specificity is that the DFG-out conformation (required for the drug binding) is allowed in Abl but forbidden in Src. However, crystal structures of Abl and Src in complex with Gleevec and with its derivatives (ligand name G6G) determined subsequently, as well as the observations of single-point mutagenesis raised the possibility that differences in the ligand-protein interactions could be at the origin of kinase inhibitor binding specificity rather than conformational selection. According to this view, Src does not incur a large energetic penalty for adopting the DFG-out conformation required for the ligand binding and distinct protein-ligand interactions give rise to the observed binding specificity. Such conflicting views about the mechanism of binding specificity cannot be easily resolved with the limited information currently available from experiments. Computer modeling can provide a wealth of molecular details on the specificity of Gleevec. In this study, we aimed to quantify the thermodynamics of Gleevec and G6G binding to the homologous kinases, Abl and Src. We also intended to identify the physical/chemical determinants governing the differing selectivity of Gleevec and G6G. The importance of structural variations in the ligands as well as small differences in amino acid identity in the kinases on selective kinase inhibition was examined. To address those issues, large-scale molecular dynamics free energy calculations were performed on Beagle to study the binding of Gleevec and G6G to Abl and Src kinase domains, in which solute and solvent atoms are treated explicitly with atomic force fields. We took advantage of Beagle’s supercomputing size and speed to allow for the rapid simultaneous simulations. The study demonstrates that detailed analyses can provide valuable information to explain the observed target preference. Information derived from this study will provide useful principles to guide lead optimization studies aimed at increasing potency and selectivity of drugs.
Figure 2. Free energy difference between the apo and the holo states upon Gleevec and G6G binding. The free energy difference consists of two parts: contribution from the conformational change (DFG-flip) and from protein (apo) – ligand interaction. The calculated binding free energy difference, which is the sum of the two contributions, agrees well with experiments.
- DFG-flip: Umbrella sampling (US) technique was applied to determine the free energy cost associated with the conformational transition converting the DFG motif of the (apo) c-Abl and c-Src tyrosine kinases from the active “in” to the inactive “out” state. Umbrella sampling introduces the concept of a biased simulation “window”, a theoretical object aimed at producing an enhanced sampling over a focused region of configurational space. In general, a collection of simulation windows with narrowly defined biasing potentials covering the relevant region are carried out. The information from these windows will be reweighted in order to yield an unbiased probability distribution. A total of 5184 and 401 umbrella windows were employed for Abl and Src respectively. Each umbrella window was propagated for 5 ns. An umbrella sampling simulation with 5184 windows is an extremely expensive calculation. Finding proper computational resource is the bottleneck of this type of molecular simulations: one needs many fast computers to achieve the goal. The supercomputing size of Beagle is ideal for achieving the balance between the number of simultaneously running jobs and the speed of individual jobs. The umbrella sampling calculation cost ~ 190 years of CPU time (i.e. 190 node years). The calculated free energy landscapes describing the DFG-flip conformational change show that the DFG-out conformation can be stable in c-Abl even in the absence of Gleevec, whereas it corresponds to a state of high free energy in c-Src.
- Gleevec/G6G binding to apo proteins: Alchemical free energy methods such as free energy perturbation (FEP) are usually employed in the investigation of ligand bindings. In the process of computational alchemy, a ligand (Gleevec/G6G in our case) is created or annihilated in the binding site by turning on or off its interaction with the receptor protein, e.g. the kinase domain of Abl and Src. A rigorous step-by-step approach of FEP augmented by replica exchange molecular dynamics (REMD) method which accelerates and improves the conformational sampling was employed in order to achieve accurate estimation of the absolute binding free energy difference. We found that, relative to Gleevec, G6G forms highly favorable van der Waals dispersive interactions upon binding to the kinases, which is considerably larger than the corresponding moiety in Gleevec. Upon binding of G6G to Src, these interactions offset the unfavorable free energy cost of the DFG-flip. When binding to Abl, however, G6G experiences some unfavorable free energy penalty due to steric clashes with the phosphate-binding loop, yielding an overall binding affinity that is similar to that of Gleevec. Such steric clashes are absent when G6G binds to Src, due to a different conformation of the phosphate-binding loop.
Jason Pitt, Kevin White Lab, Committee on Genetics, Genomics, and Systems Biology, University of Chicago
Figure 1. Schematic of the SwiftSeq workflow. By employing multiple map-reduce steps, an optimized job packing scheme, and the parallel nature of Swift, SwiftSeq provides both a fast and efficient way to process next-generation DNA sequencing data.
With increasing throughput and decreasing cost of next generation sequencing (NGS) technologies, investigators and consortia are now producing extraordinary amounts of genomics data. This transformation of genomics to a data-intensive science has left a significant portion of researchers unable to reap the benefits of their own data, let alone the troves available from public repositories.
In the Kevin White lab, much of our data comes from The Cancer Genome Atlas, a project which strives to characterize the genomic properties of over 20 different cancer types. To date, this project has generated over a petabyte of compressed sequencing data. While significant hardware resources such as Beagle are necessary to perform large-scale analyses, insufficient software infrastructure is a substantial bottleneck to data-intensive genomics. Using Beagle as a test-bed, and in collaboration with the Computation Institute, we've developed SwiftSeq, a highly-parallel and scalable next generation DNA sequencing workflow underpinned by the parallel scripting language Swift (Figure 1). SwiftSeq is able to perform end-to-end analysis beginning with sequencing reads and ending with annotated genotypes. Independent samples, tumor-normal pairs, and either bam or fastq files are accepted as input. Samples are pre-processed, aligned, post-processed, and genotyped in a manner consistent with the 1,000 Genomes Project and Genome Analysis Toolkit's best practices. To maintain workflow flexibility, algorithms for key processes such as alignment and genotyping are interchangeable, which is critical given dynamic nature of genome analysis. Computational adversities such as execution site selection, parallelization, and synchronization are all managed under-the-hood by capitalizing on Swift's implicitly parallel framework. Executed tasks are robust to transient software and localized hardware failures, keeping user intervention to a minimum. Due to an optimized packing scheme, SwiftSeq requires less than a third of the computational resources necessary for standard pipeline-based approaches. Importantly, the workflow is easily portable and has been successfully deployed to clusters and Bionimbus clouds in addition to Beagle. Both members of the Beagle team (Lorenzo Pesce, Ana Marija Sokovic, & Joe Urbanski) and the Swift developers have provided significant technical and development support to make this software possible. With their continued assistance, we hope to provide SwiftSeq as an analysis resource to the University of Chicago as well as the genomics community at large.
Scientifically, SwiftSeq has become a workhorse for many collaborative projects such as whole genome sequencing of Nigerian individuals with breast cancer (Funmi Olopade, University of Chicago), exome sequencing of autism families (Andrey Rzhetsky, University of Chicago), and the ENCODE-Cancer working group. To date, over 5,500 germline exomes from The Cancer Genome Atlas have been uniformly processed with SwiftSeq using approximately 4.8 million core hours. By integrating these exonic genotype calls with somatic copy number alterations, we've shown that inherited and acquired DNA changes can interact to reveal novel tumor suppressor genes. Using cells engineered to carry our mutations of interest, we're performing experiments to validate if these aberrations can indeed induce cancer phenotypes. With SwiftSeq and Beagle we intend to continue discovering and validating novel genomic features, as well as exploring biological questions that would otherwise be computationally infeasible.
Figure 1. Top left, the crystal structure of the pore domain of NavAb (PDB ID : 3RVY) with front and back segments removed for clarity. Note that the gate at the bottom on the pore domain is closed preventing ion conduction. Top right, bird’s eye view through the top of the truncated model of NavAb.
Figure 2. The side view of the truncated model of NavAb used for simulations of ion conduction with the front and back segments removed. Water (blue) and sodium (white) penetrate the pore, however chloride (yellow) does not.
Karen Callahan, Benoit Roux Lab
The cell membrane separates a cell from its environment, but there are many kinds of transport proteins that can allow ions, water, and small molecules to cross. Ion channels are a type of transport that allows ions to pass into or out of the cell, some of these only allow a specific anion or cation type to pass, others are less selective. In humans voltage-gated, ion-selective ion channels are part of many pathways, including action potentials of nerves, muscles, and many signaling pathways. The first voltage-gated potassium channel was solved over ten years ago, but other ion-selective channel families have been more elusive. Two summers ago, the first crystal structure of a putative sodium-selective, voltage-gated ion channel, NavAb, was published. The pore domain has a narrow region called a selectivity filter, a large water-filled cavity, and at the bottom it has a gate that is opened and closed by the voltage sensors(see figure).
The selectivity filter of NavAb is not as narrow as the selectivity filter of the known potassium-selective channels, in most places there is room for the penetrating ion to remain solvated, and cations may pass each other in some places, potentially leading to a multitude of non-serial pathways for conduction. To provide non-static picture of the ion channel, and to study the mechanism of ion conduction and ion selectivity through the selectivity filter of the pore in a statistically based manner, microsecond-timescale molecular dynamics simulations of ion conduction were performed on the Beagle supercomputer. While these timescales are frequently achieved on dedicated machines like DE Shaw Research’s Anton, they are less commonly attained using local clusters and were unheard of in the recent past. The long timescales of the simulation, along with the versatility of using parallel versions of traditional molecular dynamics code (NAMD), allowed us to use a protein that was truncated to provide a model of the selectivity filter in a stiff supporting lattice (figure), rather than requiring a complete pore in a lipid membrane, reducing necessary system size by nearly half. The effect of cations in solution, their concentration and applied voltage on the selective conduction of sodium and potassium through the NavAb selectivity filter has been studied.
Figure 1. An illustration of some genomic interactions that can be inferred in the analysis of a gene pair. Blue solid lines indicate intragenic interactions. Orange dashed lines indicate intergenic interactions.
Figure 2. Examples of significant molecular interactions identified by BGMs in breast invasive carcinoma. A. The identified intragenic interaction network of tumor suppressor gene PTEN. B. The most significant interaction between Androgen Receptor (AR) and KLK3 inferred by BGM. AR is a transcription factor that regulates KLK3 expression. AR has been demonstrated to be a prognostic factor for breast cancer and is considered a target of endocrine therapies for breast cancer.
Yuan Ji,PhD, Lorenzo Pesce,PhD, Yitan Zhu PhD
Complex diseases like cancer are rarely caused by an abnormality in a single gene, but rather reflect perturbations to intracellular molecular interaction networks that attract cells to new malignant and carcinogenic states. Learning the genomic interaction networks is critical to the elucidation of the molecular mechanisms of cancer, identification of cancer genes and pathways, and discovery of network-based biomarkers that improve disease diagnosis and prognosis. We introduce Zodiac, a comprehensive cancer genomic interaction database, based on integrative analyses of The Cancer Genome Atlas (TCGA) data (https://tcga-data.nci.nih.gov/tcga/), which include whole-genome measurements of multiple genomic and epigenomic features, such as DNA sequence, copy number, DNA methylation, gene expression, protein expression, and miRNA expression for thousands of matched cancer patient samples in over 20 types of cancer. We develop and extend novel Bayesian Graphical Models (BGMs) to infer genomic interactions in each pair of genes (total ~200 million gene pairs) for every cancer type using the multi-modal TCGA data.
Figure 1 illustrates potential intragenic and intergenic interactions that can be inferred or validated by our analysis. Intragenic interactions indicate relationships between the genomic and epigenomic features of the same gene, such as repression of transcription by promoter methylation. Intergenic interactions refer to interactions between genetic and epigenetic features of different genes. For example, we can validate whether the protein expression of a transcription factor (TF) is statistically inferred to have a significant positive interaction with the gene expression of its potential target gene in various cancer conditions. Some other potential intergenic interactions that can be identified include protein-protein integrations, miRNA regulation of gene expression, coexpressed or anti-expressed genes/proteins, similar or anti-correlated methylation patterns of genes, and similar or anti-correlated copy number changes of genes. The genomic interaction network is taken as a random variable to be inferred by the proposed BGM. The key idea is to combine prior knowledge (if any) on the interaction networks with the information contained in TCGA data to conduct posterior inference, i.e., which improves the biological relevance of the inference results.
In our analysis, prior knowledge about genomic interaction network is obtained from the following four different sources.
(1) Databases of miRNA-mRNA regulations, such as miRtarbase and TargetScan, which provide miRNA targets curated based on either biological validation experiments or predictions based on complementary sequence matching between miRNA seed region and binding site in mRNA 3’UTR.
(2) Databases of TF-DNA binding, such as ENCODE, which provides TF ChIP-seq data in various cell lines including some cancer cell lines, as well as TRANSFAC and JASPAR, which provide predictions of TF binding sites based on sequence motif analysis.
(3) “Central dogma” of intragenic interactions, such as gene copy number amplification increases gene expression and gene expression upregulation increases protein expression.
(4) Other comprehensive databases about molecular interactions, including pathway databases, such as KEGG Pathway and Pathway Commons, and protein-protein interaction databases, such as HPRD, IntAct, MINT, and BioGRID.
Markov chain Monte Carlo sampling is used to fit the BGM to the multi-platform TCGA data for interaction network analysis, which takes about 50 seconds in average for the analysis of one gene pair. The analyses of gene pairs are independent from each other and can be performed in parallel, which forms an idea computation task to be executed on Beagle. The total computation time of all gene-pair analyses (~200 million) is about 110,000 node hours on Beagle. We finished the first round analysis of all gene pairs and are currently in the process of screening the analysis results for biological findings. Figure 2 shows examples of inferred biologically plausible genomic interaction networks.
Figure 1. Cost of sequencing. Prior to 2007, the cost of sequencing (green line) followed the same scaling as Moore’s law (white line). The advent of next generation sequencing technology has caused a dramatic shift in that the bottleneck for whole genome analysis is no longer sequencing cost/time, but computational efficiency.
Figure 2. Time Constraints of MegaSeq. Steps of MegaSeq workflow are shown along the X-axis. Wall time (read squares), CPU time for 1 genome (blue triangles) and CPU time for 240 genomes (green triangles) are plotted. Beagle’s large size and memory allow for massively parallel analysis of genomic data.
Megan Puckelwartz, PhD, Lorenzo Pesce, PhD, Elizabeth McNally, MD, PhD
Megan Puckelwartz and Lorenzo Pesce explore the cost of sequencing an entire human genome is now moving into the range where it is being broadly applied in both the research and clinical setting. Whole genome analysis (WGA) requires the alignment and comparison of raw sequence data, with approximately 125GB of data generated per individual genome. While the cost of generating such data has declined dramatically, a computational bottleneck exists limiting accuracy and efficiency to analyze multiple genomes simultaneously. We now adapted Beagle, the Cray XE6 supercomputer at Argonne National Laboratories and the Computational Institute, to achieve the parallelization necessary for concurrent multiple genome analysis and named this workflow Megaseq. Compared to other rapid approaches for WGA, Megaseq produces more usable sequence per genome from the same raw data. Megaseq relies on publicly-available software packages, and using MegaSeq WGA of as many as 240 genomes, from fastq extraction through variant calling, can be completed in approximately 50 hours. This achievement is poised to greatly facilitate defining the range and utility of human genome variation.
Time Constraints: Computational time is a major bottleneck in the analysis of whole genome sequence data. We calculated CPU (central processing unit) time based on a 2.1GHz processor at 2208 hours for a single genome. This time can vary based on clock speed, memory speed and disk speed. These calculations are bound by disk and memory more than by CPU. We suggest that these single genome times push the envelope for the maximum speed for the Beagle supercomputer. A total of ~37 years of CPU time is required to complete the workflow for 240 genomes. The MegaSeq workflow can be completed on 240 genomes in 50.4 real time hours using Beagle (Figure 2). Space Constraints: Space constraints are another major hurdle in large volume whole genome analysis. We estimate that approximately 1TB of space, per genome, was needed to complete the MegaSeq analysis .While it is possible to discard output from previous steps, there is still a large volume of data that is active at any one-time. During the cleanup stages, each genome uses approximately ~85GB of space; roughly 5TB of space is needed to process each step for 61 genomes. These space requirements can generally not be met by small clusters or by large clusters with many users. The final merged VCF file for each patient is approximately 1GB. We employed Beagle to improve the throughput and efficiency of WGA, using freely available, publicly available software packages. MegaSeq provides a workflow by which, whole genome sequencing can be rapidly analyzed relieving the computational bottleneck that currently exists in analyzing human genomic variation.
Raphael Lee, Paul Liu, Ana Marija Sokovic
Electrical trauma can result from direct contact with a variety of electricity sources, such as exposed parts of electrical appliances or wiring, high-voltage power lines and lightning. While some electrical shocks may result in minor burns, there still may be serious internal damage, which can produce a complex pattern of injury and clinical manifestations. High-voltage electrical trauma produces some of the most devastating physical injuries. A typical high-voltage electric shock cause massive damages in skeletal muscles, blood vessel and peripheral nerves that can lead to repeated removal of the injured tissues, extensive rehabilitation and limb amputation rates as high as 75% . Due to the variability of electrical shock scenarios, it is almost impossible to precisely diagnose the patient’s tissue injury at the time of admission. Therefore, a computer-aided electric shock simulation will provide important insight into the tissue damages and improve the clinical management for electrical trauma.
Tissue injury mechanisms due to electric shock include Joule heating and cell membrane electroporation. We describe a worst-case hand-to-hand high-voltage electrical trauma model that takes both mechanisms into account. Electric field and Joule heating along the tissues in the human upper limb are presented by solving Laplace and Pennes’ bioheat equations in a 3-D mesh. Our simulation shows a 7.2k-Volt electric shock with duration of 1-second can cause severe muscle damage in distal forearm. We also evaluated several post-shock treatment methods and found that ice cooling applied immediately post shock reduces the chance of tissue damage by more than 70%. The analysis of electric shock provides insight into the mechanisms of tissue damage and guidance to the development of protection gear and treatment of electric trauma.---------------------------------------------------------------------------------------------------------------------------------------------
The model is built upon a 1-mm-resolution human upper extremity MRI image and implemented with thermal and electrical properties of various tissues. We use Beagle to calculate the electric field and temperature distribution of 2.5 million elements over more than 7000 time steps by solving multiple electromagnetic and bioheat equations. The simulation also shows the kinetics of tissue damage caused by plasma membrane electroporation and Joule heating from the shock phase to one hour post shock.
- We presented an electrical shock trauma model (including both electroporation and Joule heating) for worst-case hand-to-hand or hand-to-feet scenarios on 3-d human upper limb mesh
- For a 7.2kV 60-Hz AC shock with 1-second duration, electroporation is the dominant mechanism for muscle damage in shock phase whereas thermal injury dominates the post-shock phase.
- Ice cooling, if applied immediately, is effective in reducing tissue injury
Warning: Invalid argument supplied for foreach() in /www/vhosts/beagle.ci/wp-content/plugins/responsive-accordion-and-collapse/front/ac-content.php on line 58
- A c-Myc-regulated stem cell-like signature in high-risk neuroblastoma: A systematic discovery (Target neuroblastoma ESC-like signature).
Yang XH, Tang F, Shin J, Cunningham JM.
Sci Rep. 2017 Dec;7(1):41. doi: 10.1038/s41598-017-00122-x. Epub 2017 Mar 3.
- Changes in Cortical Network Connectivity with Long-term Brain-Machine Interface Exposure in Chronic Amputees
K.Balasubramanian, M.Vaidya, J.Southerland, et al.
Nature Communications, (under revision)
- Parallel simulated annealing using an adaptive resampling interval.
Z. Lou, J. Reinitz
Parallel Computing; 53: 23-31
- Supercomputing ulcerative colitis-associated cancer simulations to bridge mechanism with epidemiology
R.C. Cockrell, M. E. Stack, G. An
Digestive Disease Week, San Diego, CA, 5/21/2016
- Characterizing the behavioral landscape of sepsis: supercomputing simulation of 40 million in silico patients
Society of Critical Care Medicine Annual Congress, Orlando, FL 02/20/2016
- Supercomputing sepsis simulations for in silico outcome prediction
Academic Surgical Congress 2016, Jacksonville, FL 02/03/2016
- Investigating the development of ulcerative colitis-associated cancers through an agent-based model
Academic Surgical Congress 2016, Jacksonville, FL 02/03/2016
- The Cardiac TBX5 Interactome Reveals a Chromatin Remodeling Network Essential for Cardiac Septation
L. Waldron, J.D. Steimle, T.M. Greco et al.
Developmental Cell 2016; 36(3), p262–275, 8 February 2016
- Multiscale Simulations Reveal Key Aspects of the Proton Transport Mechanism in the ClC-ec1 Antiporter
S. Lee, J.M. Swanson, G.A. Voth
Biophys J. 2016; 110(6), 1334-45
- Computationally Efficient Multiscale Reactive Molecular Dynamics to Describe Amino Acid Deprotonation in Proteins
S. Lee, R. Liang, G.A. Voth, J.M. Swanson
J Chem Theory Comput. 2016; 12(2), 879-91
- Comparative analysis of the genomic landscape of breast cancers from women of African and European ancestry
Olopade OI, Pitt JJ, Riester M, Odetunde A, Yoshimatsu T et al.
SABCS 2016, San Antonio, Texas, December 2016
- The Activation of c‑Src Tyrosine Kinase: Conformational Transition Pathway and Free Energy Landscape
Mikolai Fajer, Yilin Meng and Benoît Roux
J. Phys. Chem. B, Article ASAP, October 7, 2016
- The selectivity of the Na+/K+-pump is controlled by binding site protonation and self-correcting occlusion
Huan Rui, Pablo Artigas, Benoît Roux
- Integrative genomics analyses unveil downstream biological effectors of disease-specific polymorphisms buried in intergenic regions
H. Li, I. Achour, L. Bastarache et al.
npj Genomic Medicine 1:16006, 2016. doi:10.1038/npjgenmed.2016.6.
- Multiscale simulations reveal the proton transport mechanism in the ClC-ec1 antiporter.Lee, S.; Swanson, J. M. J.; Voth, G. A., Biophys. J. (submitted) 2015
- Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition TimesYanxun Xu, Peter Mueller, Abdus S. Wahed, Peter F. Thall Journal of American Statistical Association (2015)
- Causal Network in a Deafferented Non-Human Primate Brain37 th Annual international conference of IEEE Engineering in Medicine and Biology Society (EMBC), Milan, August 2015 Karthikeyan Balasubramanian, Kazutaka Takahashi, Nicholas Hatsopoulos
- Computational study of the "DFG-flip" conformational transition in c-Abl and c-Src tyrosine kinasesJ Phys Chem B (2015) Meng, Y., Lin, Y. L., Roux, B.
- DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural FieldsInt. J. Mol. Sci. 2015 Sheng Wang, Shunyan Weng, Jianzhu Ma and Qingming Tang
- Computational study of the "DFG-flip" conformational transition in c-Abl and c-Src tyrosine kinasesJ Phys Chem B (2015) Meng, Y., Lin, Y. L., Roux, B.
- Ethnic-specific associations of rare and low-frequency DNA sequence variants with asthmaNat Commun (2015) Igartua, C., Myers, R. A., Mathias, R. A., Pino-Yanes, M., Eng, C., Graves, P. E., Levin, A. M.
- Large-scale spatiotemporal spike patterning consistent with wave propagation in motor cortexNature Communications 2015 Kazutaka Takahashi,Sanggyun Kim,Todd P. Coleman,Kevin A. Brown, Aaron J. Suminski, Matthew D. Best& Nicholas G. Hatsopoulos
- Identification of epigenetic modifications that contribute to pathogenesis in therapy-related AML: Effective integration of genome-wide histone modification with transcriptional profilesBMC Medical Genomics 2015 Xinan (Holly) Yang, Bin Wang, John M Cunningham
- Large-scale spatiotemporal spike patterning consistent with wave propagation in motor cortex.Nature Communication Kazutaka Takahashi, Sanggyun Kim, Todd P. Coleman, Kevin A. Brown, Aaron J. Suminski5, Matthew D. Best & Nicholas G. Hatsopoulos
- An analysis of biomolecular force fields for simulations of polyglutamineBiophysical Journal Aaron M. Fluitt and Juan J. de Pablo
- Investigation of inflammation and tissue patterning in the gut using a spatially explicit general-purpose model of enteric tissue (SEGMEnT).PLoS computational biology 10, no. 3 (2014): e1003507 Cockrell, Chase, Scott Christley, and Gary An
- Towards Anatomic Scale Agent-Based Modeling with a Massively Parallel Spatially Explicit General-Purpose Model of Enteric Tissue (SEGMEnT_HPC).PloS one 10, no. 3 (2014): e0122192-e0122192 Cockrell, Robert Chase, Scott Christley, Eugene Chang, and Gary An
- Zodiac: A comprehensive depiction of genetic interactions in cancer by integrating TCGA dataJournal of The National Cancer Institute, Accepted Y. Zhu, Y. Xu, D.L. Helseth, K. Gulukota, S. Yang, L.L. Pesce, R. Mitra, P. Müller, S. Sengupta, W. Guo, J.C. Silverstein, I. Foster, N. Parsad, K.P. White, Y. Ji
- Targeted analysis of whole genome sequence data to diagnose genetic cardiomyopathyCirc Cardiovasc Genet. 2014 Golbus JR, Puckelwartz MJ, Dellefave-Castillo L, Fahrenbach JP, Nelakuditi V, Pesce LL, Pytel P, McNally EM
- Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web ServicesConcurr Comput (2014) Madduri, R. K., Sulakhe, D., Lacinski, L., Liu, B.
- Determinants of fluidlike behavior and effective viscosity in cross-linked actin networksBiophys J (2014) Kim, T., Gardel, M. L., Munro, E.
- 'N-of-1-pathways' unveils personal deregulated mechanisms from a single pair of RNA-Seq samples: towards precision medicineJ Am Med Inform Assoc (2014) Gardeux, V., Achour, I., Li, J. Maienschein-Cline, M.Li, H., Pesce, L.
- Loss of conformational entropy in protein folding calculated using realistic ensembles and its implications for NMR-based calculations.Proc Natl Acad Sci U S A. 2014 Baxa MC1, Haddadian EJ2, Jumper JM3, Freed KF4, Sosnick TR5
- About 20,000,000 core-hours of high throughput computations were conducted on the Beagle GLOBUS. The study was supported in part by the NIH grants UL1TR000050 (University of Illinois CTSA),1S10RR029030-01 (BEAGLE Cray Supercomputer, K22LM008308), and NCI P30CA023074 grant of the University of Arizona Cancer Center. Haiquan Li, Ikbel Achour, Vincent Gardeux, Jianrong Li, Lorenzo Pesce, Younghee Lee, Xinan Yang, Mark Maienschein-Cline, Roger Luo, Ian Foster, Jason H. Moore, Yves A. Lussier
- Foxf Genes Integrate Tbx5 and Hedgehog Pathways in the Second Heart Field for Cardiac SeptationPLOS Genetics Andrew D. Hoffmann, Xinan Holly Yang, Ozanna Burnicka-Turek1, Joshua D. Bosman
- Molecular mechanism for differential recognition of membrane phosphatidylserine by the immune regulatory receptor Tim4.Proc Natl Acad Sci U S A Tietjen, G. T. Gong, Z. Chen, C. H. Vargas
- Escherichia coli peptidoglycan structure and mechanics as predicted by atomic-scale simulations.PLOS Comput Biol Gumbart, J. C. Beeby, M. Jensen, G. J. Roux, B.
- Characterization of nonlinear propagation waves of local field potentials in motor cortical areasto be submitted to IEEE EMBC 2014. J Chemili and K. Takahashi
- Spatiotemporal dynamics of motor cortical network using ECoG signalsto be submitted to IEEE EMBC 2014. H. Watanabe, K. Takahashi, T. Isa
- Multi-modal Decoding: Identifying Information Redundancy between Spike Trains and Sub-dural Electrocorticogram Signalsto be submitted to IEEE EMBC 2014. K. Balasubramanian, K. Takahashi, M. Slutzky, N. Hatsopoulos
- Mixed performance of biomolecular force fields in reproducing the properties of polyglutamine in solution.To be submitted, 2014 Fluitt, A.M. and de Pablo, J.J.
- TCGA-Assembler: An Open-Source Pipeline for TCGA Data Downloading, Assembling, and Processing. (Submitted, 2014). Y. Zhu, P. Qiu, Y. Ji.
- Compiler Optimization for Extreme-Scale Scripting To appear at CCGrid 2014 Doctoral Symposium, 2014 T.G. Armstrong, J.M. Wozniak, M. Wilde, I.T. Foster
- Compiler Optimization for Data-Driven Task Parallelism on Distributed Memory Systems Argonne National Laboratory Tech Report ANL/MCS-P5080-0214, 2014. T.G. Armstrong, J.M. Wozniak, M. Wilde, I.T. Foster
- Atomistic simulation of polyglutamine in solution. American Institute of Chemical Engineers Annual Meeting. San Francisco, CA. (11/6/2013) A.M. Fluitt, J.J. de Pablo
- Differences in dynamics and stability of the wild type β-Amyloid Aβ(1-40) and ΔE22-Aβ(1-39) (Japanese) protofibril structures, a Molecular Dynamics Study. Biophysical society 58th annual meeting. February 2014. San Francisco, CA. K.W. Johnson, T.R. Sosnick, K.F. Freed, E.J. Haddadian.
- Peptidoglycan Structure and Mechanics as Predicted by Atomic-Scale Simulations.PLOS Computational Biology.10(2), (2014). J.C. Gumbart, Morgan Beeby, G.J. Jensen, B. Roux.
- Computational analysis of the binding specificity of Gleevec to Abl, c-Kit, Lck, and c-Src tyrosine kinases.J Am Chem Soc. 135(39), 2013. Y. Lin, B. Roux.
- Locking the active conformation of c-Src kinase through the phosphorylation of the activation loop.J Mol Biol., 426(2), 2014. Y. Meng, B. Roux.
- Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms.Computational and Mathematical Methods in Medicine, vol. 2013, Article ID 182145, 10 pages, 2013. L.L. Pesce, H.C. Lee, M. Hereld, S. Visser, R.L. Stevens, A. Wildeman, W. van Drongelen
- Clinical Predictors of Port Infections within the First 30 Days of Placement.J Vasc. Interv. Radiol. 2014;25:419–423 R. Bamba, J.M. Lorenz, A.J. Lale, B.S. Funaki, S.M. Zangan.
- Supercomputing for the parallelization of whole genome analysis.Bioinformatics. (In press, 2014.) M.J. Puckelwartz, L.L. Pesce, V. Nelakuditi, L. Dellefave-Castillo, J.R. Golbus, S.M. Day, T.P. Cappola, G.W. Dorn II, I.T. Foster and E.M. McNally.
- A molecular mechanism for differential recognition of membrane phosphatidylserine by the immune regulatory receptor Tim4.PNAS. (In press, 2014). G.T. Tietjen, Z. Gong, C.-H. Chen, E. Vargas, J.E. Crooks, K.D. Cao, J.M. Henderson, C.T.R. Heffern, M. Meron, B. Lin, B. Roux, M.L. Schlossman, T.L. Steck, K.Y. C. Lee and E. J Adams.
- Constraints and potentials of future irrigation water availability on agricultural production under climate change Joshua Elliott, Delphine Deryng, Christoph Müller, Katja Frieler, Markus Konzmann, Dieter Gerten, et al. in Proceedings of the National Academy of Sciences of the United States of America (2013)
- Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials Yanxun Xu, Lorenzo Trippa, Peter Muller, Yuan Ji in Statistics in Biosciences (2014)
- Molecular Dynamics Studies of Ion Conduction in a Prokaryotic Channel Karen M Callahan, Benoit Roux in Biophysj (2014)
- Multiscale Reactive Molecular Dynamics for Absolute pK a Predictions and Amino Acid Deprotonation
J.G. Nelson, Y. Peng , D.W. Silverstein, J.M. Swanson
J Chem Theory Comput. 10(7): 2729-37
- Simplified Protein Models: Predicting Folding Pathways and Structure Using Amino Acid Sequences A Adhikari, KF Freed, Tobin R. Sosnick in Phys Rev Lett (2013)
- Fast and accurate pedigree-based imputation from sequenced data in a founder population by Oren E. Livne, L. Han, G. Alkorta-Aranburu, William Wentworth-Sheilds, Lorenzo L. Pesce, Carole Ober, et al. in American Society of Human Genetics (2013)
- Wave propagation of beta oscillations in motor cortex and its relation to spatiotemporal spiking patterns and behavior by Kazutaka Takahashi in CiNet (2013)
- Kinematically defined state dependent changes in functional connectivity of neurons and in local field potentials in orofacial motor cortex during chewing and swallowing by Kazutaka Takahashi in Motor control of feeding behaviors session, at Neuro2013 (2013)
- Large scale spatiotemporal spike patterning associated with wave propagation of beta oscillation in motor cortex by Kazutaka Takahashi in Neural Oscillation Conference (2013)
- Development of the Charmm Drude Polarizable Force Field for the Study of Ion Interactions in Biological Systems by Hui Li, Karen Callahan, Jr. Mackerell, Alexander D., Benoît Roux in Biophysical Journal (2013)
- Explaining why Gleevec is a specific and potent inhibitor of Abl kinase by Yen-Lin Lin, Yilin Meng, Wei Jiang, Benoît Roux in Proceedings of the National Academy of Sciences of the United States of America (2013)
- Architecture and assembly of the Gram-positive cell wallMol Microbiol (2013) Beeby, M., Gumbart, J. C., Roux, B., Jensen, G. J.
- Ion conduction and selectivity in an open pore sodium channel by Karen M. Callahan, Benoît Roux in XSEDE12 Conference, Extreme Science and Engineering Discovery Environment (2012)
- Computational study of the prokaryotic sodium channel by Karen M.Callahan, Benoît Roux in Biophysical journal (2013)
- Coarse-Graining Provides Insights on the Essential Nature of Heterogeneity in Actin Filaments by Jun Fan, Marissa G. Saunders, Gregory A. Voth
- Discovering RNA-Protein Interactome by Using Chemical Context Profiling of the RNA-Protein Interface by Marc Parisien, Xiaoyun Wang, George Perdrizet, Corissa Lamphear, Ketan C Fierke Carol A Maheshwari, Michael J Wilde, Tobin R Sosnick, Tao Pan
- Supercomputing for the parallelization of whole genome analysis Megan J. Puckelwartz, Lorenzo Pesce, Viswateja Nelakuditi, Sharlene Day, Thomas Cappola, Euan Ashley, et al. in In Preparation
- Molecular autopsy for sudden death using whole genome sequencing Megan J. Puckelwartz, Lisa Dellefave-Castillo, Don Wolfgeher, Viswateja Nelakuditi, Mark T. Campbell, Jessica R Golbus, in In Preparation
- Validation of bootstrap-based comparison of ROC curves on partially-paired data sets Y. Peng, L. Pesce, Y. Jiang in SPIE (2013)
- Computational Modeling of α/β T-Cell Receptor Loop Conformations James Crooks, Ridgway Scott, Erin Adams in 57th Annual Meeting of the Biophysical Society (2013)
- Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms Lorenzo L. Pesce, Hyong C. Lee Lee, Mark Hereld, Sid Visser, Rick L Stevens, Wim van Drongelen in TBD (2013)
- Determinants of Fluid-Like Behavior and Effective Viscosity in Cross-Linked Actin Networks Taeyoon Kim, Margaret L. Gardel, Ed Munro in PLOS Comput Biol (2013)
- Numerical Solution of Dynamic Portfolio Optimization with Transaction Costs Y Cai, K.L. Judd, R. Xu (2013)
- A statistically defined anthropomorphic software breast phantom by Beverly A Lau, Ingrid Reiser, Robert M Nishikawa, Predrag R Bakic Medical Physics (2012)
- Analysis of cardiomyopathy using whole genome sequencing E. M. McNally, J. R. Golbus, L. L. Pesce, D. Wolfgeher, L.Dellefave-Castillo, M. J. Puckelwartzin 62nd Annual Meeting of The American Society of Human Genetics (2012)
- Computer-aided image analysis and detection of prostate cancer using immunostaining for alpha-methylacyl-CoA racemase, p63, and high-molecular-weight cytokeratin Y Peng, Y Jiang, XJ Yang in Machine Learning in Computer-Aided Diagnosis: Medical Imaging Intelligence and Analysis
- Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory Haiquan Li, Younghee Lee, James L. Chen, Ellen Rebman, Jianrong Li, Yves A. Lussier in Journal of American Medical Informatics Association (2012)
- Context and Force Field Dependence of the Loss of Protein Backbone Entropy upon Folding Using Realistic Denatured and Native State Ensembles MC Baxa, EJ Haddadian, AK Jha, KF Freed, Sosnick TR in Journal of the American Chemical Society (2012)
- Numerical Simulations of Strong Incompressible Magnetohydrodynamics Turbulence J Mason, Perez, Boldyrev, F Cattaneo in Physics of Plasmas (2012)
- A network of subpopulation of neurons in primary motor cortex and beta oscillation waves exhibit similar spatiotemporal patterns K. Takahashi, K. Brown, S. Kim, T. Coleman, N.G. Hatsopoulos in Nanosymposium on Signal Propagation, The annual meeting, Society for Neuroscience (2012)
- Granger causality analysis of functional connectivity of spiking neurons in orofacial motor cortex during chewing and swallowing K. Takahashi, L. Pesce, J. Iriarte-D ́ıaz, S. Kim, T.P. Coleman, N.G. Hatsopoulos, in IEEE Conference Engineering in Medicine and Biology Society (2012)
- Granger causality analysis of state dependent functional connectivity of neurons in orofacial motor cortex during chewing and swallowing K Takahashi, L. Pesce, M. Best, J. Iriarte-Dıaz, S. Kim, T.P. Coleman, et al. in IEEE the 6th International Conference on Soft Computing and Intelligent Systems (2012)
- Discovering RNA-protein interactome using chemical context profiling of the RNA-protein interface M Parisien, X Wang, G II Perdrizet, C Lamphear, CA Fierke, KC Maheshwari, et al.
- On docking, scoring and assessing protein-DNA complexes in a rigid-body framework M Parisien, KF Freed, TR Sosnick in PloS one (2012)
- Growing Point-to-Set Length Scale Correlates with Growing Relaxation Times in Model Supercooled Liquids Glen Hocky, Thomas Markland, David Reichman in Physical Review Letters (2012)
- Evaluating Communication Performance in IBM BG/Q and Cray XE6 Supercomputing Systems H. Bui, V. Vishwanath, J. Leigh, M. E. Papka in Proceedings of the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2012) (2012)
- FLASH simulations of 120MJ target explosions in LIFE reactor chamber Ryan Sacks, Gregory Moses, Milad Fatenejad in 54th Annual Meeting of the APS Division of Plasma Physics (2012)
- ELUCIDATING THE MOLECULAR DETAILS OF PHOSPHATIDYLSERINE MEMBRANE RECOGNITION IN IMMUNE RESPONSE Gregory T. Tietjen, Zhiliang Gong, Chiu-Hao Chen, James Crooks, Ernesto Vargas, Kathleen Cao, et al. in Biophysical Society Annual Meeting (2012)
- De novo prediction of protein folding pathways and structure using the principle of sequential stabilization AN Adhikari, KF Freed, TR Sosnick in P Natl Acad Sci USA (2012) Two gene co-expression modules differentiate psychotics and controls C Chen, L Cheng, K Grennan, F Pibiri, C Zhang, J a Badner, et al. in Molecular psychiatry (2012)
- Dynamic mechanisms of cell rigidity sensing: insights from a computational model of actomyosin networks Carlos Borau, Taeyoon Kim, Tamara Bidone, José Manuel García-Aznar, Roger D Kamm in PloS one (2012)
- Mechanisms of contractility and coarsening of active cytoskeletal networks Taeyoon Kim, Margaret L. Gardel, Ed Munroe in Biophysical Society Meeting (2012)
- Structural reorganization of active actin networks via competition between force generation and dissipation Taeyoon Kim, Ed Munro, Margaret L. Gardel in BMES annual meeting (2012)
- Determination of membrane-insertion free energies by molecular dynamics simulations Gumbart, J., Roux, B. in Biophys J (2012)
- Scaling Behavior Of Computational Model of Neocortex Hyong C. Lee Lee, Mark Hereld, Sid Visser, Lorenzo L Pesce, Wim van Drongelen in 4th International Meeting on Epilepsy Research (2011)
- A. N. Adhikari, J. DeBartolo, K. F. Freed, and T. R. Sosnick, "Ab initio protein structure prediction aided by sequential stabilization of tertiary structure," poster, Protein Folding Symposium. Berkeley, California, USA, June, 2011.
- F. Cattaneo, "Magneto-Rotational Turbulence and Dynamo Action," Mini-conference on Understanding Astrophysical Dynamos, 53rd Annual Meeting of the APS Division of Plasma Physics, Salt Lake City, Utah, USA, November, 2011.
- A. Dickson, M. Maienschein-Cline, A. Tovo-Dwyer, J. R. Hammond, and A. R. Dinner, "Flow-Dependent Unfolding and Refolding of an RNA by Nonequilibrium Umbrella Sampling," Journal of Chemical Theory and Computation, v. 7(9), pp. 2710-2720, 2011.
- R. Jamieson, "Grid Based Quantitative Breast Image Analysis," poster, Era of Hope 2011, (DoD Fellowship Award meeting).
- R. Jamieson, K. Drukker, M. L. Giger, "Use of Data Reduction Techniques for the Visualization of Computer-Extracted Lesion Features in the Diagnostic Interpretation of Breast Cancer," Radiological Society of North America, 2011.
- H. C. Lee, M. Hereld, S. Visser, L. Pesce, and W. van Drongelen, "Scaling Behavior Of Computational Model of Neocortex," (poster) 4th International Meeting on Epilepsy Research, Chicago, IL, May 2011.
- M. Parisien, T. R. Sosnick, T. Pan, "Systematic prediction and validation of RNA-protein interactome," (poster), RNA Society, Kyoto, June 2011.
Glen Hocky of the Reichman Group at Columbia is evaluating a new cavity method for theoretical chemistry studies of how and why a liquid becomes a glass. This fundamental topic of statistical mechanics and is described as "the deepest and most interesting unsolved problem in solid state theory". Beagle was used via Swift to calculate from simulation what is known as the "mosaic length", where particles are simulated by molecular dynamics or Monte Carlo methods within cavities having amorphous boundary conditions. This work has implications in many fields, including biology, biophysics, and computer science. The framework used in these Beagle runs is the same as one popular theory for understanding protein folding, discussed in highly cited works by Wolynes and collaborators (Spin glasses and the statistical mechanics of protein folding. Bryngelson and Wolynes. PNAS November 1, 1987 vol. 84 no. 21 7524-7528.)
Joshua Elliott (University of Chicago) in the NSF Center on Robust Decision making for Climate and Energy Policy (RDCEP) is developing a large-scale integrated modeling framework for decision makers in climate and energy policy. The project has used Beagle to study land use, land cover, and the impacts of climate change on agriculture and the global food supply using a crop systems model called DSSAT (decision support system for agro-technology transfer.) Benchmarks of this model have been performed on a simulation campaign, measuring yield and climate impact for a single crop (maize) across the US with daily weather data and climate model output spanning 1981-2100 with 16 configurations of fertilizer and irrigation, and cultivar. Initial results have been presented in an advisory board meeting of the RDCEP project. Each simulation runs 125,000 DSSAT models on Beagle using Swift. This work has the potential to help decision makes understand the tradeoffs in decisions on climate and energy.
The Paul Fisher’s group (Argonne National Laboratory) has been developing, porting, and testing the NEK5000 code on Beagle. The main purpose of their work is to develop large-scale, high-fidelity simulations of fluid flow in order to understand phenomena such as transitions in vascular flow, thermal hydraulics of nuclear reactor cores, and magneto-hydrodynamics of accretion disks. NEK5000 was ported to Beagle and timing and scaling studies on various MPI operations were conducted successfully. Fischer’s group conducted preliminary studies of the development and saturation of magneto-rotational instability (MRI) and MRI-driven dynamo, believed to be responsible for the loss of orbital momentum and energy in accretion disks powering the most energetic phenomena in the universe (see image). These results were presented as Fausto Cattaneo, "Magneto-Rotational Turbulence and Dynamo Action," at the Mini-conference on Understanding Astrophysical Dynamos, 53rd Annual Meeting of the APS Division of Plasma Physics, November 2011. Parametric studies of these phenomena are planned.
Azimuthal magnetic field in MHD simulations of large-scale MRI-driven dynamo with NEK5000.
Illustration of the network morphology with different levels of substrate stiffness (E) and showing the details of motors and ACPs. (A) An actin network before contraction, consisting of actin filaments (cyan) cross-linked by ACPs (green) and molecular motors (red). (B) Cross-sections of the network at steady states for three different values of E showing morphology and the magnitudes of extensional forces ( ). The network contracts to different degrees and at different rates depending on E. Soft substrates (lower E) lead to a condensed network, whereas stiff substrates (higher E) contract very little, resulting in a heterogeneous network with tensed filaments.
Taeyoon Kim’s group (University of Chicago) explore how a cell contracts against an external matrix using a computational model incorporating actin filaments, actin cross-linking proteins, and molecular motors. Results illustrate a novel mechanism for rigidity-sensing, which depends on the distance a motor walks along an actin filament to generate contractile force.Cells modulate themselves in response to the surrounding environment like substrate elasticity, exhibiting structural reorganization driven by the contractility of cytoskeleton. The cytoskeleton is the scaffolding structure of eukaryotic cells, playing a central role in many mechanical and biological functions. It is composed of a network of actins, actin cross-linking proteins (ACPs), and molecular motors.
Aaron Dinner’s group (University of Chicago) tested new software that they developed on Beagle. This software enables more than a thousand fold speedup of simulation of molecular processes in living systems. The specific process studied was the folding of an RNA that must take a well-defined shape to perform its biological function.
Example of an unfolding intermediate harvested at high flow. The RNA is tethered in a microfluidic channel at the point indicated by the gray sphere. Colored lines show major contacts that are broken. (Credit: Dinner)
Rick Stevens’ group (University of Chicago & Argonne National Laboratory) has used Beagle to build flux balance analysis (FBA) based metabolic models for all sequenced microbes (over 4,000 models). These models have been used to computationally predict essential genes in hundreds of pathogens, thus enabling deeper understanding of potential targets for antimicrobial therapies. These targets are being made available to the public via the ModelSEED web site (http://seed-viewer.theseed.org/seedviewer.cgi?page=ModelView)
Reconstructing Models in High Throughput: Model SEED (Credit: Henry and Stevens)
Example of feature extraction followed by dimensionality reduction before the training of a classifier for the identification of breast cancer lesions. (Credit: Giger)
Maryellen Giger’s group (University of Chicago) works on Advanced Breast Cancer Image Analysis and Computer Aided Diagnosis (CAD). They explore state-of-the-art techniques for recovering complex, non-linear structures from high-dimensional data, with the specific goal of relating disease and information extracted from radiologically imaged breast tissue. Using Beagle, they have investigated the properties of various dimensionality reduction schemes and their influence on CAD engine performance, and a manuscript is in the final phases of preparation. Overall, CAD is a tool that is expected to help improve performance and consistency for radiologists. This has the potential to improve both the speed and accuracy of cancer diagnoses, reducing unnecessary biopsies and avoiding false negatives, and it may later impact diagnoses in other areas where imaging plays an important role. Other research that is part of this project involved development and testing of reconstruction methods for CT and MRI scans, analysis of MRI images based on contrast agents or dynamic data, which is promising for breast cancer diagnosis. Other lines of research involve an investigation of the potential of simulation-based methods for the optimization of breast tomosynthesis.
The collaborating groups of Karl Freed and Tobin Sosnick (University of Chicago) use Beagle to develop and validate new protein structure and folding pathway prediction algorithms that start from the amino acid sequence, without the use of templates. Their approach employs novel, compact molecular representations and innovative "moves" of the protein backbone to achieve accurate prediction with far less computation than previous methods. One of their methods, called "loop modeling", which rebuilds local protein structures, was highly ranked in the international CASP9 competition (Critical Assessment of Protein Structure Prediction). The group is using Beagle to evaluate loop modeling algorithm on new proteins and parameter spaces, to develop a new "tertiary fixing" algorithm, and to compare these algorithm to other approaches. Computational prediction is important for proteins whose structure has not yet been, or cannot be, determined by experimental methods. These largely unsolved problems are crucial for basic biological research and for applications in national priority areas, such as pathogens, biofuels, remediation, ecosystems, metagenomics, crops, and nutrition.
Tobin Sosnick’s group developed and evaluated a new, more accurate and efficient predictive methods for large-scale explorations of RNA-protein interactions, yielding new insights into cellular networks. Their "modFTDock" application (M. Parisien et al.) can identify protein molecules having a high structural affinity towards a specific RNA molecule. Non-coding RNAs often function in cells through specific interactions with their protein partners. Experiments alone cannot provide a complete picture of this interaction. No existing computing method was previously able to provide such genome-wide predictions of docked RNA-protein complexes. Using Beagle, the group was able to process 1,500 proteins using the Swift parallel scripting language. This work was presented in a poster at the 2011 annual RNA Society meeting: Parisien M., Sosnick T.R., Pan T., "Systematic prediction and validation of RNA-protein interactome," RNA Society, Kyoto, June 2011, http://www.ci.uchicago.edu/~wilde/RNA2011-poster.pdf. A manuscript is in progress.
The Jinbo Xu’s group (University of Chicago) is developing continued enhancements to a "template based" method of protein structure modeling that he has embodied in the "Raptor" family of structure prediction applications. These methods give excellent results for proteins where sufficient homologies can be found among proteins of known structure. The methods have scored excellently among template-based approaches in the international CASP assessment process. Results are described in two submitted papers that acknowledge Beagle: 1) Jianzhu Ma, Jian Peng, Sheng Wang and Jinbo Xu, "A Conditional Neural Fields model for protein threading," submitted to RECOMB 2012; to be submitted to the Bioinformatics Journal and 2) Feng Zhao and Jinbo Xu, "Position-specific distance-dependent statistical potential for protein folding and assessment," submitted to PNAS. Computational prediction is important for proteins whose structure has not yet been, or cannot be, determined by experimental methods. These largely unsolved problems are crucial for basic biological research and for applications in national priority areas, such as pathogens, biofuels, remediation, ecosystems, metagenomics, crops, and nutrition.
The Roux Group (Benoit Roux, University of Chicago, with Jim Phillips of the UIUC NAMD group) led work to port and tune NAMD for the Beagle XE6 system and its Gemini interconnect. NAMD is one of the flagship "workhorse" applications of molecular dynamics for biological applications. The results characterized Beagle’s NAMD performance as "excellent scaling, comparing well to (other leading) machines like TACC’s Lonestar, completely obliterating older ones like Ranger, and making our old rule-of-thumb of aiming for 1000 atoms/core outdated". The effort achieved multiple nanoseconds of simulated time/day, excellent performance for a general purpose NAMD. This work benefits many applications, as this code is widely used. In addition, the Roux Group (with J. C. Grubart) is using Beagle to perform molecular dynamics simulations to quantify the free energy of the insertion of membrane proteins and determine all of the contributions to that energy. These results help to resolve a longstanding experiment-simulation disagreement on the magnitude of the free energies for different amino acids. This work has been submitted to Biophysical Journal as James Gumbart and Benoit Roux, "Determination of membrane-insertion free energies by molecular dynamics simulations," 2011.
Andrey Rzhetsky (University of Chicago) is developing tools to use Beagle for his project "Assessing and Predicting Scientific Progress through Computational Language Understanding". The project develops general methods relevant for students, policy makers and scientists. The goal is to generate dynamic high fidelity maps of knowledge claims in chemistry and related fields such as pharmaceuticals and toxicology. Their initial use of Beagle involves optimization of parameters for graph-based models of scientific collaboration. This work has potential to help scientists gain knowledge in many health and non-health domains, particularly those where the literature has already grown to a size where human comprehension of all publications is no longer possible.
Yves Lussier’s group (University of Chicago; recently moved to UIC) have been using Beagle to develop high-throughput methods for identifying novel microRNA-related targets for cancer therapy, identifying the genetic underpinning of complex disease polymorphisms and computationally identifying novel unique personal variant polymorphisms associated to a disease. They expect to complete a human phenome map of genome-wide association studies (GWAS), used to identify common genetic factors that influence health and disease, by the end of 2011. These studies are computationally intensive. Part of this work was presented as "Complex disease networks of trait-associated SNPs unveiled by Information Theory" at the AMIA Annual Symposium 2011, and will be included in JAMIA with a distinguished paper award. Another paper is in preparation: "Mechanisms of Intergenic Polymorphisms and of Intragenic Pleiotropy of Complex Diseases Unveiled by Similarity and Overlap in eQTL Networks."
Complex Disease Gene Modules of A Cancer-Associated SNPs Unveiled by Information Theory and Protein Interaction Networks. (Credit: Lussier)
The focus of the Hatsopoulos lab’s (Nicho Hatsopoulos, University of Chicago) research is the analysis of empirical neural activity based on Granger causal network models derived from generalized linear model of large-scale networks. These models are computationally intensive and are run efficiently as Matlab compiled executables on Beagle. The group is starting to investigate how such networks of causally related neurons change over time. Eventually, when they have completed more efficient models, they will shed light on how a group of neurons form a spatiotemporal pattern that is comparable with previously published local field potential spatiotemporal patterns and what the implication of those patterns are to behavior. Results were presented at annual meeting of Society for Neuroscience, Nov 2011. The team is currently developing C++ implementations of the estimation algorithms, which will allow to scale up the models and test their performance on different Cray architectures such as the XK6.
Example of causality networks estimated at different timings in relation to visual cue onset. Before: [-101,50] ms. At cue: [51, 200] ms. After: [201, 350] ms.
Distribution of directions of LPF beta wave propagation during Cue period. (Credit: Hatsopoulos)
Snapshot of agent-based Brownian dynamics simulation of macroscopic dynamics of contractility and stress relaxation in actin filaments, cross-linkers and molecular motors. (Credit: Munro)
In non-muscle cells, contractile force production is governed by highly dynamic self-organized actomyosin networks whose structure and mechanical properties emerge from a continuous interplay of assembly, interaction and disassembly of actin filaments, cross-linkers and molecular motors. To explore how, Ed Munro and collaborators (University of Chicago) have developed agent-based Brownian dynamics simulations that predicts macroscopic dynamics of contractility and stress relaxation based on purely local microscopic interactions among filaments, cross-linkers and motors. Using Beagle, they have studied the interplay of network architecture and turnover of network elements determine "effective viscosity" and network flow on long timescales, and force-dependent cross-bridge kinetics and myosin minifilament assembly state control size and spacing of pulsed actomyosin contractions. Two articles are in preparation, by J. Alberts and E. M. Munro; and T. Y. Kim, M. Gardel, and E. M. Munro.
Snapshot of neuronal activity in a simulation of brain neocortex. (Credit: Van Drongelen)
Epilepsy is a disease that is still poorly understood. No obvious biological cause has been unequivocally identified and it appears to be a complex disease. To understand its mechanics, the Van Drongelen lab (Wim van Drongelen, University of Chicago_ has been working at performing realistic computational simulations that bridge the gap between the experimentally accessible individual behavior (~100 cells) and the clinically recorded aggregate behavior (~ 1M). They have ported their models to Beagle and have performed scaling and optimization studies, which have been presented as Lee HC, Hereld M, Visser S, Pesce L, and van Drongelen W. "Scaling Behavior Of Computational Model of Neocortex", poster at 4th International Meeting on Epilepsy Research, Chicago, IL, May 19-21, 2011. A paper describing those findings in more details will be prepared soon. In the near future, they plan to investigate the network and cell properties that appear to affect the dynamics of epileptic seizures through simulation of networks of neurons. Parameter spaces will be studied to understand which appear to be the key factors. A key part of the project is also to analyze experimental data from neurons grown in vitro.
Kevin White and Bob Grossman (University of Chicago) are developing reliable and robust profiling and alignment software, which requires regular and thorough testing almost every time a new version is produced. This operation requires the reanalysis of a considerable amount of data. In practice, at this time, it involves the regular analysis of a number of datasets currently available (2000+ datasets). Other tests might also be added in the future. The scale of this operation requires a supercomputer such as Beagle to be able to provide a timely and reliable answer. The team has developed workflows and successfully ported analysis tools, and are now getting ready to perform a full-scale test on Beagle.
To qualitatively assess the fit of the model to the data, 104 parameter sets were generated from the joint posterior distribution of the parameters, and used to simulate speed-of-kill. The median (1st percentile and 99th percentile) values for each time are plotted in red as a solid (dotted) line. The overlaying black squares are the data collected during the dose-response study. Also contained in the figure is the virus dose received (D) and the number of larvae infected (n). (Credit: David A. Kennedy)
An important next step in host-pathogen evolution studies is to incorporate models of within-host dynamics into existing between-host models, which is being studied by Stefano Allesina and Greg Dwyer at the University of Chicago. These within-host models, however, are often difficult to incorporate, because little is known about pathogen dynamics inside hosts. As early as 1961 (Saaty) it was postulated that within-host models could be compared using dose-response-time data, yet the computational complexity involved prevented the implementation of this strategy. Thus, despite many within-host models being proposed, few have been used for parameter estimation and none have been tested. Today, with the advancement of computing technology, in particular cluster computing, it is possible to perform parameter estimation on mechanistic models of within-host disease dynamics. By implementing message passing on Beagle, this team uses a simulation-based approach for model selection on nested models to learn the relative importance of various mechanisms involved in virus growth in the gypsy moth caterpillar. From this analysis, they draw biological conclusions that provide the foundation to improve current practices in applying gypsy moth virus as an environmentally-benign pesticide. They also use their results to gain an understanding of host-pathogen coevolution more generally.
A snapshot from fully atomistic MD simulations of a 13-subunit piece of an actin filament used to understand the structural heterogeneity of this system, and its response to nucleotide state. Critical components to accurately simulate the filamentous state include the periodically repeating filament (boundaries of the periodic cell are shown in blue) and explicit solvation with appropriate ion concentrations. The lower half of the simulation box shows the water environment; potassium and chloride ions, added to approximate physiological concentrations, are shown as tan and cyan spheres respectively. (Credit: Voth)
The Voth group (Greg Voth, University of Chicago) has used Beagle to simulate actin, which is the primary component of the eukaryotic cellular cytoskeleton and has been the target of much recent experimental work. The specific focus of these simulations has been to characterize the structure and heterogeneity of the multi-protein actin filament. These simulations approach 1 million atoms in size and therefore the Beagle resource has been critical for achieving these research aims. This work has contributed new knowledge on the behavior of actin filaments and also to our understanding of possible binding sites for therapeutic agents, e.g., in cancer cells.
The research group of Bob Eisenberg (Rush University) works on the calcium channels that control the contraction of the heart and the sodium channels that provide signaling in the nervous system. They have systematically studied how these channels choose particular ions to perform their function. In the time that Beagle has been operation, it has allowed Eisenberg’s group to make progress, for example, showing that dielectric boundaries have important roles in determining selectivity. Improvement in channel function would have profound effects on the treatment of arrhythmias and diseases like multiple sclerosis that arise from poor function of these channels.
Ion Moving through a Bacterial Channel (Credit: Eisenberg)