Beagle2 Upgrade News

December 2015
    • All compute node CPUs upgraded from Magny-Cours to Abu Dhabi processors. With this upgrade, core count has been increased from 24 to 32 cores per node.
    • All compute node RAM DIMMs replaced. Available RAM has been increased from 32GB to 64GB per node.
    • Four NVIDIA GPU accelerator nodes have been added.
    • Two additional DDN storage cabinets installed to provide new Lustre scratch storage. An additional 1.4 PB of usable space has been added with that upgrade.
    • SMW and CLE system software upgraded from 7.0 to 7.2 and 4.2 to 5.2 respectively.
    • Development environment tools (Cray compiler, GNU compiler) updated.
    • Lustre software updated from version 1.8 to 2.5. The issue that prevented group permissions from working properly on Lustre has now been fixed as well.

We had also planned to upgrade the network cards in the login nodes to 10Gbps connections, but have delayed that until January 2015. That upgrade will take place after ANL finishes its planned network infrastructure upgrades over the next few weeks.

With these changes, you will need to review and modify any submit scripts you have previously used on Beagle. In particular, the changes affect the switches you pass to aprun in your submit script, but you should also spend some time thinking more generally about how the increased hardware capability affects how you submit and run your jobs. Our wiki has been updated to reflect these changes, but if you have any questions or concerns, please let us know.
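As a minimal sketch only (the job name, project code, executable and job size below are placeholders, not values from this announcement), a submit script that used to assume 24 cores per node would now tell aprun to place up to 32 processes per node:

    #!/bin/bash
    #PBS -N example_job              # hypothetical job name
    #PBS -A my_project               # placeholder project/allocation code
    #PBS -l mppwidth=64              # total cores requested: 2 nodes x 32 cores
    #PBS -l walltime=01:00:00

    cd $PBS_O_WORKDIR

    # Before the upgrade this might have been: aprun -n 48 -N 24 ./my_app
    # With 32 cores per node, -N (processes per node) can now go up to 32:
    aprun -n 64 -N 32 ./my_app

Adjust -n, -N, and mppwidth to match your own job size; the point is that any per-node values tied to the old 24-core nodes need to be revisited.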

We’re also still in the process of moving data from the old Lustre volume to the new Lustre volume. By limiting access at first, we’ll be able to prioritize copying the data of users who need early access while we continue copying the remaining data in the background.
If you want to be considered for early access, or have any questions or concerns of any kind, please let us know and we’ll do our best to answer as soon as possible.

As always, we appreciate your patience during this upgrade period.

Whole Genome Analysis, STAT

For media inquiries:
John Easton
(773) 795-5225, john.easton@uchospitals.edu

Although the time and cost of sequencing an entire human genome has plummeted, analyzing the resulting three billion base pairs of genetic information from a single genome can take many months.

In the journal Bioinformatics, however, a University of Chicago-based team—working with Beagle, one of the world’s fastest supercomputers devoted to life sciences—reports that genome analysis can be radically accelerated. This computer, based at Argonne National Laboratory, is able to analyze 240 full genomes in about two days.

“This is a resource that can change patient management and, over time, add depth to our understanding of the genetic causes of risk and disease,” said study author Elizabeth McNally, MD, PhD, the A. J. Carlson Professor of Medicine and Human Genetics and director of the Cardiovascular Genetics clinic at the University of Chicago Medicine.

“The supercomputer can process many genomes simultaneously rather than one at a time,” said first author Megan Puckelwartz, a graduate student in McNally’s laboratory. “It converts whole genome sequencing, which has primarily been used as a research tool, into something that is immediately valuable for patient care.”

Because the genome is so vast, those involved in clinical genetics have turned to exome sequencing, which focuses on the two percent or less of the genome that codes for proteins. This approach is often useful. An estimated 85 percent of disease-causing mutations are located in coding regions. But the rest, about 15 percent of clinically significant mutations, come from non-coding regions, once referred to as “junk DNA” but now known to serve important functions. If not for the tremendous data-processing challenges of analysis, whole genome sequencing would be the method of choice.

To test the system, McNally’s team used raw sequencing data from 61 human genomes and analyzed that data on Beagle. They used publicly available software packages and one quarter of the computer’s total capacity. They found that shifting to the supercomputer environment improved accuracy and dramatically accelerated speed.

“Improving analysis through both speed and accuracy reduces the price per genome,” McNally said. “With this approach, the price for analyzing an entire genome is less than the cost of looking at just a fraction of the genome. New technology promises to bring the costs of sequencing down to around $1,000 per genome. Our goal is to get the cost of analysis down into that range.”

“This work vividly demonstrates the benefits of dedicating a powerful supercomputer resource to biomedical research,” said co-author Ian Foster, director of the Computation Institute and Arthur Holly Compton Distinguished Service Professor of Computer Science. “The methods developed here will be instrumental in relieving the data analysis bottleneck that researchers face as genetic sequencing grows cheaper and faster.”

The finding has immediate medical applications. McNally’s Cardiovascular Genetics clinic, for example, relies on rigorous interrogation of the genes from an initial patient as well as multiple family members to understand, treat and prevent disease. More than 50 genes can contribute to cardiomyopathy. Other genes can trigger heart failure, rhythm disorders or vascular problems.

“We start genetic testing with the patient,” she said, “but when we find a significant mutation we have to think about testing the whole family to identify individuals at risk.”

The range of testable mutations has radically expanded. “In the early days we would test one to three genes,” she said. “In 2007, we did our first five-gene panel. Now we order 50 to 70 genes at a time, which usually gets us an answer. At that point, it can be more useful and less expensive to sequence the whole genome.”

The information from these genomes combined with careful attention to patient and family histories “adds to our knowledge about these inherited disorders,” McNally said. “It can refine the classification of these disorders,” she said. “By paying close attention to family members with genes that place them at increased risk, but who do not yet show signs of disease, we can investigate early phases of a disorder. In this setting, each patient is a big-data problem.”

Beagle, a Cray XE6 supercomputer housed in the Theory and Computing Sciences (TCS) building at Argonne National Laboratory, supports computation, simulation and data analysis for the biomedical research community. It is available for use by University of Chicago researchers, their collaborators and “other meritorious investigators.” It was named after the HMS Beagle, the ship that carried Charles Darwin on his famous scientific voyage in 1831.

The National Institutes of Health and the Doris Duke Charitable Foundation funded this study. Additional authors include Lorenzo Pesce, Viswateja Nelakuditi, Lisa Dellefave-Castillo and Jessica Golbus of the University of Chicago; Sharlene Day of the University of Michigan; Thomas Coppola of the University of Pennsylvania; and Gerald Dorn of Washington University.


A Year of Computational Discovery: The CI’s 2013 in Review


Researchers at the CI’s Center for Multiscale Theory and Simulation published a paper in The Journal of Physical Chemistry using computational models to study the activity of monoclonal antibody therapies used for cancer, arthritis and other conditions.


With a $5.2 million grant from the John Templeton Foundation, CI Senior Fellow James Evans launched the Knowledge Lab, a new research center dedicated to using text-mining, network theory, and other computational techniques to study the creation, evolution, and spread of human knowledge.

The Urban Studies Research Coordination Network, led by the CI’s Urban Center for Computation and Data, held its inaugural meeting at the School of the Art Institute of Chicago, bringing together computer scientists, statisticians, artists, policy experts and urban planners to explore new research directions.

A paper in Ecology Letters from the laboratory of CI Faculty Stefano Allesina described the multi-dimensionality of ecological networks, which will help scientists develop better models for studying food networks and invasive species.


The human body has its own ecology, a world of bacteria, viruses and other microorganisms that lives inside our organs and on our skin. As part of TEDxNaperville, CI Senior Fellow Rick Stevens described ongoing efforts to reveal those universes within and determine how they affect our health and wellbeing.

The Research Data Management Implementations Workshop in Arlington, VA brought institutions together to compare IT successes and failures, and featured a keynote address by CI Director Ian Foster about building cloud-based services for research.

A collaboration between the City of Chicago and the Urban Center for Computation and Data finished ahead of several dozen entries in the Bloomberg Mayors Challenge, receiving $1 million to develop the SmartData Platform to help cities run more efficiently and effectively.

As part of Argonne National Laboratory’s OutLoud series, CI Senior Fellow Pete Beckman gave a whirlwind tour through the history of computing, from abacuses and slide rules through Angry Birds and exascale.


Some of the same techniques used by special effects teams in film to create computer-generated worlds are deployed by computational chemists to study the activity of molecular machines invisible to traditional scientific equipment, said CI Senior Fellow and Faculty Gregory Voth in his talk for the Chicago Council on Science & Technology.

After a very busy year working as Chief Data Scientist for the Obama re-election campaign, Rayid Ghani joined the Computation Institute to help create a new generation of people interested in making a social impact through data and analytics.

The 2012 drought that struck the United States may have offered a sneak preview of how climate change will disrupt agriculture, according to research by CI Fellow Joshua Elliott covered in International Science Grid This Week.

The annual GlobusWorld conference unveiled new capabilities of the research data management software and featured testimonies from researchers around the world that are using the platform. Earlier in the month, the team announced Globus Genomics, a cloud-based platform for moving and analyzing genomic data.

Researchers celebrated the first three years of the Beagle — UChicago’s 150-teraflop supercomputer dedicated to studies in biology and medicine. The Day of the Beagle featured over a dozen talks about work enabled by the machine, ranging from neuroscience to molecular biology to genetic medicine.

Two large gifts were announced to fund UChicago biomedical research using big data to study pancreatic cancer and other diseases using genetics and electronic medical records.

During Big Data Week in Chicago, a CI/UrbanCCD panel discussed the use of government and public data in constructing and evaluating policy, and Joseph Insley and Rayid Ghani gave webinars on their work in data visualization and political campaigns, respectively.


An all-star roster of computer scientists assembled at Argonne National Laboratory to celebrate 30 Years of Parallel Computing, highlighting the contributions of the lab towards creating the computer architecture of today.

CI Senior Fellow Rick Stevens testified to Congress about the importance of US leadership in “the race to exascale” — building computers capable of performing one million trillion calculations per second.

Mobile apps can be much more than just time-killing games, and CI Fellow Andrew Binkowski explained how to get started developing mobile applications for education and research at a Faculty Technology Day event.

The CERN Laboratory in Geneva, Switzerland hosted its first-ever TEDx conference, and invited CI Director Ian Foster to speak about his vision of The Discovery Cloud — using the potential of cloud computing to bring advanced scientific computing infrastructure to researchers around the world.


The first 40 fellows of the Data Science for Social Good Summer Fellowship arrived in Chicago for 12 busy weeks developing solutions for non-profits and government agencies. DSSG Director Rayid Ghani spoke at Techweek Chicago about how the program offers a new path for students interested in using technology to improve the world.

Computation Institute researchers were well represented at the University’s Alumni Weekend, with Senior Fellow Gary An giving a talk about modeling in biomedicine, UrbanCCD director Charlie Catlett appearing on an urban data panel with Harris School of Public Policy Dean Colm O’Muircheartaigh and Lewis-Sebring Distinguished Service Professor Stephen W. Raudenbush, and Director Ian Foster talking about how “Big Computation” can unlock bigger knowledge.


Argonne National Laboratory, with help from Sen. Dick Durbin, officially unveiled their newest supercomputer, Mira, a 10-petaflop IBM Blue Gene/Q machine that ranks among the top ten fastest computers in the world.

CI Senior Fellow Benoit Roux published new findings in the journal Nature on how water affects one of the most important proteins for life, the potassium channel.


To address the challenges of planning the massive 600-acre Chicago Lakeside Development on the South Side, UrbanCCD researchers linked up with developers McCaffery Interests and architects at Skidmore, Owings & Merrill for the LakeSim project, a computational platform for large-scale urban design.

In a paper for Cancer Research, CI Fellow and Faculty Samuel Volchenboum and CI Fellow Mike Wilde published a new algorithm to help researchers find better genetic classifiers for diagnosing and treating cancer.

The inaugural class of the Data Science for Social Good Fellowship summarized their summer work in a Data Slam event held downtown at the University of Chicago Gleacher Center.


The Knowledge Lab’s Metaknowledge Research Network held their first meeting in Pacific Grove, California, strategizing their approach to questions such as “What makes someone a great scientist or inventor?”

A team led by CI Senior Fellow and Faculty Andrey Rzhetsky analyzed over 120 million patient records and thousands of scientific studies to create a groundbreaking genetic map of complex diseases, research published in the journal Cell.


No news was bigger this year than the naming of CI Senior Fellow Lars Peter Hansen as one of three recipients of the 2013 Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel. Hansen received the honor along with UChicago colleague Eugene Fama and Yale’s Robert Shiller, and was credited with important statistical work in the field of econometrics. A panel of UChicago economists later celebrated the significance of Hansen and Fama’s contributions to the field and our knowledge about the behavior of financial markets. Other Nobel Prizes awarded this month acknowledged computational work in discovering the Higgs Boson and studying chemical changes.

The Texas Advanced Computing Center announced Wrangler, a new open source data analysis and management system that will use Globus services for data transfer.

The Center for Robust Decision Making on Climate and Energy Policy held their annual all-hands meeting in Chicago, and discussed research that is improving the accuracy and scope of computer models for climate, agriculture, and the economy.

As scientific instruments collect more data and more complex data, their computational demands soar. CI Senior Fellow Andrew Chien launched a project with measurement company Agilent to find new pattern-matching strategies for analyzing that data more quickly and efficiently.

In a paper published by PLoS Genetics, CI Senior Fellow Nancy Cox addressed one of the most important mysteries in the study of genetics and disease: the case of missing heritability.

UrbanCCD Director Charlie Catlett spoke to an architecture conference about the potential of data to transform how cities are designed and built.


The Computation Institute invited the University of Chicago campus to a series of “lightning talks” — short presentations of ongoing research and opportunities for student involvement (Watch videos from the event at the CI YouTube channel).

The Social Sciences Division produced a video and feature on CI Senior Fellow James Evans and his new Knowledge Lab; Evans also wrote an editorial in Science discussing how modeling scientific impact and using “robot scientists” to suggest experiments could revolutionize research.

A panel of CI and UrbanCCD experts discussed the data-based future of city planning and development in a special UChicago Discovery Series event, Chicago: City of Big Data.

CI Director Ian Foster and Fellow Ravi Madduri presented the discovery-accelerating powers of Globus and Globus Genomics at Amazon Web Services’ re:Invent conference.


A multi-institutional collaboration including the CI announced the new Center for Hierarchical Materials Design to explore innovative new avenues in the creation of materials for technology, medicine, and energy.

As part of a special section in the Proceedings of the National Academy of Sciences, CI Fellow Joshua Elliott and other RDCEP researchers published a study modeling the effect of climate change on the world’s freshwater supply and agriculture.

PATRIC, the world’s largest database of genomic information about pathogenic bacteria, prepared for its tenth anniversary by surpassing 10,000 annotated genomes.

University of Chicago students tested the limits of the new Hack Arts Lab in the first-ever digital fabrication course.



Lower priority jobs

For the new low-priority settings, the restrictions are a walltime of four hours or less and no more than 50 nodes. Basically, we schedule normal-priority jobs first, then fill in the gaps with lower-priority jobs. The smaller the job, the easier it is to schedule, and the quicker it will move from queued to running.
Nothing needs to be specified in the submit script. Low-priority configurations are handled on the server side on a per-project basis: any project that has been designated as having a low-priority allocation will automatically have those settings applied when Moab reads the project code at submission time. A resource request that fits within the limits is sketched below.
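Purely as an illustration of a request that stays within those limits (the project code is a placeholder, the mppwidth value assumes the 32-core nodes described above, and nothing low-priority-specific needs to be declared):

    #PBS -A low_prio_project          # placeholder code for a project with a low-priority allocation
    #PBS -l mppwidth=1600             # 50 nodes x 32 cores = 1600 cores, the upper limit
    #PBS -l walltime=04:00:00         # at or under the four-hour walltime limit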

New command to check your allocations

With the new fiscal year’s allocations in place, we’re releasing a tool we’ve developed that allows users to check the status of their allocations. Just run ‘show_alloc’ from any login node and the output will display the remaining node hours for any projects you are a part of.
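A quick usage sketch (the project names and numbers below are invented for illustration, and the exact columns and formatting of the real output may differ):

    $ show_alloc
    Project        Allocated (node-hours)    Used      Remaining
    CI-ABC123      100000                    42500     57500
    CI-XYZ789      50000                     50000     0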

Updates after last maintenance:

  • Installed Boost 1.54.0
  • Installed Python with MPI support (mpi4py)
  • SMW upgraded to the latest version, SMW 7.0.UP03
  • CLE upgraded to the latest version, CLE 4.2.UP02
  • CADE upgraded to the latest version, 6.24
    NOTE: Cray renamed the ‘cray-mpich2’ module to ‘cray-mpich’.
    We’ve changed the default user profile to load the new module by default, so no action on your part is necessary to use it. However, if you reference the older module name in any setup files or scripts, you may see a message about it being deprecated. Be sure to update those files to reflect the new name, as in the example below.
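For example, a setup file or script would change from loading the old module name to the new one (any versions pinned on your system may differ):

    # Old (deprecated name):
    #   module load cray-mpich2
    # New name:
    module load cray-mpich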

Change in scheduler:

  • Removed the interactive queue because it is rarely used and, when it is, often used badly. We recommend that users request a reservation if necessary.
  • Removed the scalability queue because it is rarely used and, when it is, often used inappropriately (it is not meant to be a queue for smallish jobs, but rather for testing scalability). We recommend that users request a reservation if their scalability tests run inefficiently in the batch queue (which is often the case when the machine is busy).

A reminder that the new allocations are active and that there is no carry-over of hours from last year. Some groups may no longer have an allocation because they either did not use their previous one or have not used the machine in some time; if they want to keep computing on Beagle, they need to reapply.

The Beagle’s Biological Voyage Continued

April 30, 2013

Graphic by Ana Marija Sokovic

When Charles Darwin took his historic voyage aboard the HMS Beagle from 1831 to 1836, “big data” was measured in pages. On his travels, the young naturalist produced at least 20 field notebooks, zoological and geological diaries, a catalogue of the thousands of specimens he brought back and a personal journal that would later be turned into The Voyage of the Beagle. But it took more than two decades for Darwin to distill all of that information into his theory of natural selection and the publication of On the Origin of Species.

While biological data may have since transitioned from analog pages to digital bits, extracting knowledge from data has only become more difficult as datasets have grown larger and larger. To wedge open this bottleneck, the University of Chicago Biological Sciences Division and the Computation Institute launched their very own Beagle — a 150-teraflop Cray XE6 supercomputer that ranks among the most powerful machines dedicated to biomedical research. Since the Beagle’s debut in 2010, over 300 researchers from across the University have run more than 80 projects on the system, yielding over 30 publications.

“We haven’t had to beat the bushes for users; we went up to 100 percent usage on day one, and have held pretty steady since that time,” said CI director Ian Foster in his opening remarks. “Supercomputers have a reputation as being hard to use, but because of the Beagle team’s efforts, because the machine is well engineered, and because the community was ready for it, we’ve really seen rapid uptake of the computer.”

A sampler of those projects was on display last week as part of the first Day of the Beagle symposium, an exploration of scientific discovery on the supercomputer. The projects on display ranged from the very big — networks of genes, regulators and diseases built by UIC’s Yves Lussier — to the very small — atomic models of molecular motion in immunological factors, cell structures and cancer drugs. Beagle’s flexibility in handling projects from across the landscape of biology and medicine ably demonstrated how computation has solidified into a key branch of research in these disciplines, alongside traditional theory and experimentation.

In the day’s first research talk, Kazutaka Takahashi of the Department of Organismal Biology showed how science can move fluidly between these realms. Takahashi studies how the neurons of the brain’s motor cortex behave during eating, using a very tiny electrode array that can record dozens of neurons simultaneously. But recording is just the beginning — the analysis required to untangle how these neurons connect and influence each other during a meal is far more than your everyday computer can handle. Takahashi said the software was originally set up to analyze no more than five neurons at a time, and trying to do 70 at once would take months on a desktop PC. Moving their analysis to Beagle freed up the researchers to more rapidly tease out the neural network and relate it to different stages of chewing and swallowing, experiments that may someday help stroke sufferers regain normal eating ability.

Other Beagle projects are dedicated to finding new medical treatments in data that’s already been collected. Lussier, a former CI fellow, is constructing enormous networks of disease using data from the Human Genome Project, the ENCODE study of gene regulatory elements and clinical research to find new genetic and pathway targets in different types of cancer. Only a supercomputer can sort through the thousands of different possible combinations and hypotheses — one recent analysis required around 2 million core hours to complete, a task that took Beagle only 20 days as compared to approximately 14 years on a desktop. Lussier hopes to use similar approaches on complex diseases such as diabetes, funneling the rising tide of genomic data into “the medicine of tomorrow.”

On the other extreme, CI Senior Fellow Benoit Roux is running molecular dynamics simulations on Beagle to study the activity of an already-proven treatment: Gleevec. One of the first successful targeted cancer therapies, Gleevec was developed in the 1990s to treat certain types of leukemia by switching off an overactive protein. Roux’s models simulate the motion of individual atoms to examine exactly how the drug binds its target and investigate Gleevec’s chemical relatives to see why they differ in their effectiveness. The investigations could lead to the design of better drugs, or help physicians circumvent the drug resistance that develops after prolonged use.

Other researchers at Day of the Beagle described the molecular worlds that they are simulating on the supercomputer. Edwin Munro’s research on the elaborate choreography of the cytoskeletal elements actin and myosin revealed the strategies used by cells to re-shape, squeeze and split themselves. Greg Tietjen used computer models and x-ray scattering experiments to study how the Tim protein family recognizes different combinations of lipid membrane proteins and makes decisions as part of the immune response. Molecular dynamics models can even be a teaching tool, as Esmael Jafari Haddadian explained in his talk on using Beagle in an undergraduate course on quantitative biology. For their final project, students picked a protein of interest — such as yellow mealworm beetle anti-freeze protein — and modeled its behavior in a simulated solution.

But perhaps the biggest computational challenge facing modern biologists is how to manage and make sense of the flow of genomic data as genetic sequencing becomes cheaper and more routine. Jason Pitt, a student in the laboratory of CI senior fellow Kevin White, described the SwiftSeq workflow for the parallel processing of terabytes of raw data from the Cancer Genome Atlas on Beagle, which reduced compute time by as much as 75 percent. Another data pipeline, MegaSeq, was the subject of Megan Puckelwartz’s talk, which focused on whole genomic sequencing for studying rare genetic variants associated with the heart disease cardiomyopathy.

“Beagle is ideal for whole genome analysis,” Puckelwartz said. “Other people are done after their first run, but Beagle allows us to extract the most data and continuously mine that data as new methods of analysis become available.”

by Rob Mitchum