Archive for the ‘Cyberinfrastructure’ Category



China’s Milky Way 2 supercomputer was recently declared the fastest supercomputer in the world by industry scorekeeper Top500, the latest move in the increasingly international race for high performance computing supremacy. Late last month, CI Senior Fellow Rick Stevens appeared on Science Friday, alongside Top500 editor Horst Simon, to talk about why that competition matters, and what the global push for faster computation will do for medicine, engineering and other sciences.

“These top supercomputers are like time machines,” Stevens said. “They give us access to a capability that won’t be broadly available for five to ten years. So whoever has the time machine is able to do experiments, able to see into the future deeper and more clearly than those that don’t have such machines.”

The same time-machine metaphor was also picked up by the University of Chicago’s profile of Mira, our local Top500 competitor, which was bumped down to #5 by the Milky Way 2’s top ranking. But there’s no shame in fifth-best when fifth-best can run 10 quadrillion calculations per second, the equivalent computing power of 58 million iPads. CI Senior Fellow Gregory Voth is quoted on how access to such a world-class resource helps both today’s and tomorrow’s scientists.

“Having access to a computing resource like Mira provides excellent opportunities and experience for educating up-and-coming young scientists as it forces them to think about how to properly utilize such a grand resource very early in their careers,” Voth says. “This gives them a unique perspective on how to solve challenging scientific problems and puts them in an excellent position to utilize computing hardware being imagined now for tomorrow.”
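Taking the Mira profile’s two figures at face value, the iPad comparison implies that each iPad contributes roughly 170 million calculations per second:

```latex
\frac{10 \times 10^{15}\ \text{calculations/s}}{58 \times 10^{6}\ \text{iPads}}
  \approx 1.7 \times 10^{8}\ \text{calculations/s per iPad}
```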


The Data Science for Social Good fellowship has reached the halfway point, and the website is starting to fill up with interesting content about the projects. Some fellows have already produced tools for the community to use, such as Paul Meinshausen’s interactive tree map of the City of Chicago’s Data Portal. Instead of a cold, no-frills list of the datasets available for public download, Meinshausen’s map uses color and shape to guide users quickly to the data they are seeking and to let them compare dataset sizes at a glance. The visualization was popular enough that programmers in Boston and San Francisco quickly applied his code to their own cities’ data portals, while another programmer built a common map for every city that uses Socrata software to share its data.
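For readers curious how a tree map turns dataset sizes into shapes, here is a minimal slice-and-dice layout in Python. It is a sketch, not Meinshausen’s actual code (which this post does not show): each leaf gets a rectangle whose area is proportional to its size, and the split direction alternates between levels of the hierarchy. The function names and the category and dataset names and sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# A node is (name, size) for a leaf, or (name, [child nodes]) for a group.
Node = Tuple[str, Union[float, list]]


@dataclass
class Rect:
    name: str
    x: float
    y: float
    w: float
    h: float


def size(node: Node) -> float:
    """Total size of a node: its own value for a leaf, the sum of its children otherwise."""
    _, value = node
    if isinstance(value, list):
        return sum(size(child) for child in value)
    return value


def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   depth: int = 0) -> List[Rect]:
    """Return one Rect per leaf, with area proportional to that leaf's size.

    Even depths split the box left to right; odd depths split it top to bottom.
    """
    name, value = node
    if not isinstance(value, list):
        return [Rect(name, x, y, w, h)]
    total = size(node)
    rects: List[Rect] = []
    offset = 0.0
    for child in value:
        frac = size(child) / total
        if depth % 2 == 0:
            # Vertical strip: this child's share of the width, full height.
            rects += slice_and_dice(child, x + offset, y, w * frac, h, depth + 1)
            offset += w * frac
        else:
            # Horizontal band: full width, this child's share of the height.
            rects += slice_and_dice(child, x, y + offset, w, h * frac, depth + 1)
            offset += h * frac
    return rects


# Hypothetical portal: category names, dataset names and sizes are made up.
portal = ("portal", [
    ("Public Safety", [("Crimes", 600), ("Arrests", 200)]),
    ("Transportation", [("Potholes", 150), ("Permits", 50)]),
])

for r in slice_and_dice(portal, 0, 0, 100, 100):
    print(f"{r.name:10s} x={r.x:5.1f} y={r.y:5.1f} w={r.w:5.1f} h={r.h:5.1f}")
```

Slice-and-dice is the simplest tree map layout; many tools prefer "squarified" layouts, which keep rectangles closer to square and easier to compare, at the cost of a more involved algorithm.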



CERN is known as the current world epicenter of particle physics, the home of the Large Hadron Collider and thousands of scientists expanding our knowledge of the universe’s most basic ingredients. For one day earlier this month, the Geneva, Switzerland laboratory was also a meeting place for scientists, philosophers, musicians, animators and even Will.I.Am to share their grand ideas for the first-ever TEDxCERN event. Among the speakers riffing on the theme of “Multiplying Dimensions” was CI Director Ian Foster, who presented his vision for The Discovery Cloud and accelerating the pace of science by bringing advanced data and computation tools to the smaller laboratories and citizen scientists of the world.

What we need to do is to in a sense create a new set of cloud services which do for science what the myriad of business cloud services do for business. We might call it the discovery cloud. It would be a set of services that take on, automate, and allow people to handle or outsource many of the routine activities that currently dominate research…I believe if we do that right, we can really make a transformative difference in how people do science.

You can watch a full video of Foster’s presentation below:

International Science Grid This Week also covered Foster’s talk and another given a day earlier to the information technology team at CERN. In that speech, Foster delivered a similar message about the need to bring advanced cyberinfrastructure to the “99%” of laboratories that can’t afford to build international data grids akin to what CERN used in its discovery of the Higgs boson.

“We have managed to create exceptional infrastructure for the 1%, but what about the rest?” asks Foster. “We have big science, but small labs. How do we deliver cyber infrastructure to small groups? They need something that is frictionless, affordable and sustainable.”


©CERN Photo: Brice Maximilien / Laurent Egli — at CERN.



We were thrilled to spend Friday morning with the folks at TEDxCERN via webcast, enjoying fascinating talks by CI director Ian Foster and several other amazing scientists and educators. Foster’s talk focused on “The Discovery Cloud,” the idea that many complex and time-consuming research tasks can be moved to cloud-based tools, freeing up scientists to accelerate the pace of discovery. We’ll post the video when it’s up, but for now, enjoy this great animation produced for the conference by TED-Ed explaining grid computing, cloud computing and big data.


Speaking of CERN, ISGTW ran a lengthy profile of the computing grid that powers particle physics research at the laboratory’s Large Hadron Collider. In the three years since the LHC started running, it has produced 70 petabytes of data, which is distributed to more than 150 sites around the world for coordinated, parallel analysis. As Wired wrote back in 2004, the LHC grid was built on the Globus Toolkit, created by Ian Foster and Carl Kesselman, “the Lewis and Clark of grid computing.”

Some of the science-as-a-service ideas Foster discussed in his TEDxCERN talk were brought up a week earlier by Renee DiResta in O’Reilly Radar. Her article features companies that provide 3D microscopic scanning, data platforms for computational biology or drug discovery, and even connections with freelance scientists.

Computation is eating science, and that’s a good thing…but funding agencies and researchers need to change or be digested, writes Will Schroeder at Kitware.


People who work in laboratories take a lot of things for granted. When they come into work in the morning, they expect the equipment to have power, the sink to produce hot and cold water, and the internet and e-mail to be functional. Because these routine services are taken care of “behind the scenes” by facilities and IT staff, scientists can get started right away on their research.

But increasingly, scientists are hitting a new speed bump in their day-to-day activities: the storage, movement and analysis of data. As datasets grow far beyond what can easily be handled on a single desktop computer and long-distance collaborations become increasingly common, frustrated researchers find themselves spending more and more time and money on data management. To get the march of science back up to top speed, new services must be provided that make handling data as simple as switching on the lights.

That mission was the common thread through the second day of the GlobusWorld conference, an annual meeting for the makers and users of the data management service, held this year at Argonne National Laboratory. As Globus software has evolved from enabling the grids that connect computing centers around the world to a cloud-based service for moving and sharing data, the focus has shifted from large, Big Science collaborations to individual researchers. Easing the headache for those smaller laboratories with little to no IT budget can make a big impact on the pace of their science, said Ian Foster, Computation Institute Director and Globus co-founder, in his keynote address.

“We are sometimes described as plumbers,” Foster said. “We are trying to build software and services that automate activities that get in the way of discovery and innovation in research labs, that no one wants to be an expert in, that people find time-consuming and painful to do themselves, and that can be done more effectively when automated. By providing the right services, we believe we can accelerate discovery and reduce costs, which are often two sides of the same coin.”



The original Human Genome Project needed 13 years, hundreds of scientists and billions of dollars to produce the first complete human DNA sequence. Only ten years after that achievement, genome sequencing is a routine activity in laboratories around the world, creating a new demand for analytics tools that can grapple with the large datasets these methods produce. Large projects like the HGP could assemble their own expensive cyberinfrastructure to handle these tasks, but even as sequencing gets cheaper, data storage, transfer and analysis remain a time and financial burden for smaller labs.

Today at the Bio-IT World conference in Boston, the CI’s Globus Online officially unveiled its solution for these scientists: Globus Genomics. Per the news release, the service “integrates the data transfer capabilities of Globus Online, the workflow tools of Galaxy, and the elastic computational infrastructure of Amazon Web Services. The result is a powerful platform for simplifying and streamlining sequencing analysis, ensuring that IT expertise is not a requirement for advanced genomics research.”

The release also includes positive feedback from researchers including William Dobyns, who studies the genetics and neuroscience of developmental disorders at the University of Washington, and Kenan Onel, a pediatric oncologist who directs the Familial Cancer Clinic at The University of Chicago Medicine. Nancy Cox, section chief for genetic medicine at UChicago Medicine, said the service enabled her laboratory to meet the big data challenges of modern genomic research.

“We needed a solution that would give us flexibility to extend our analysis pipelines and apply them to very large data sets,” says Dr. Cox. “Globus Genomics has provided us with a key set of tools and scalable infrastructure to support our research needs.”

If you’re at the Bio-IT World conference, you can visit the Globus Online team at Booth 100 and get a tutorial on the new Globus Genomics service.



On Thursday, the National Center for Supercomputing Applications at the University of Illinois celebrated the full launch of Blue Waters, its new one-petaflop supercomputer. As part of the ceremony, Governor Pat Quinn declared “Blue Waters Supercomputer Day,” and Senator Dick Durbin saluted the machine and other supercomputers as “the gateway to next-generation research.” The start of 24/7 research was also a proud day for Computation Institute scientists such as Michael Wilde and Daniel Katz, who were involved in getting Blue Waters up and running. Wilde spoke about the supercomputer at the CI’s Petascale Day event last October.

Meanwhile, a couple hundred miles north of Blue Waters, Argonne’s new 10-petaflop supercomputer Mira nears the start of its own full production period later this year. This week, the laboratory released a new timelapse video of the machine’s construction, which you can watch below. But science isn’t waiting for Mira to reach full strength, as demonstrated by this new project on the combustion and detonation of hydrogen-oxygen mixtures — a potential alternative source of fuel.


In recent years, cloud computing has crossed over from inside-baseball IT chatter to the general public. As CI fellow Rob Gardner recently charted, web searches for the term began climbing in 2009 and still vastly outpace searches for similar buzzwordy topics such as “big data” and “virtualization.” Now that consumers are comfortable with storing files and running programs in the cloud, it’s time for the pioneers of that technology to take their victory laps. One recent round-up of cloud computing mavericks at Forbes tagged CI fellow Kate Keahey as “the grandmother of cloud,” recognizing her early work on infrastructure-as-a-service (IaaS) platforms. Her current project, Nimbus, is dedicated to providing cloud-based infrastructure for scientific laboratories.


A lot of what we know about science may be wrong, but finding those flaws could lead to better discovery in the future. That’s how this article on Txchnologist framed the new Metaknowledge Network led by CI fellow James Evans. “We’re building on decades of this deep work on science and trying to connect it to this computational moment…to get a quantitative understanding of why we have the knowledge we have,” Evans told reporter Rebecca Ruiz.

The open release of data by the city of Chicago hasn’t just improved our understanding of how the city works, but also how we see it. These beautiful visualizations created with the Edifice software (one of the projects at the Open City collaborative) make the neighborhoods of Chicago look like a genomic SNP chip…or an elaborate Lite-Brite project.

Many Chicago homes would benefit from upgrades that improve energy efficiency, saving residents a large portion of their monthly utility bills. But many residents are unaware of the option or unwilling to bear the up-front cost of retrofitting their homes. According to WBEZ, two University of Chicago students have founded a new startup called Effortless Energy that uses data-mining techniques to locate these opportunities for conservation and savings and to help residents act on them.

The “traveling salesman problem,” finding the shortest round trip that visits each of a set of cities (for instance, 20 different cities) exactly once, has long frustrated mathematicians. Now English scientists have created “programmable goo” to find a short route, in a similar fashion to studies that have used slime mold as a navigator. You can read the paper, “Computation of the Traveling Salesman Problem by a Shrinking Blob,” at arXiv.
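The paper’s shrinking-blob method is not reproduced here. As a plain illustration of why the problem is hard, this Python sketch solves tiny instances exactly by brute force: it fixes the first city and tries every ordering of the rest. The function names are hypothetical; the count of distinct tours, (n − 1)!/2 when direction does not matter, shows why 20 cities is already far beyond simple enumeration.

```python
from itertools import permutations
from math import dist, factorial


def tour_length(order, coords):
    """Length of the closed tour visiting cities in `order` and returning to the start."""
    n = len(order)
    return sum(dist(coords[order[i]], coords[order[(i + 1) % n]]) for i in range(n))


def brute_force_tsp(coords):
    """Exact TSP by enumeration: fix city 0 and try every ordering of the rest.

    This checks (n - 1)! orderings, so it is only practical for a handful of cities.
    """
    best_order, best_len = None, float("inf")
    for rest in permutations(range(1, len(coords))):
        order = (0,) + rest
        length = tour_length(order, coords)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len


def distinct_tours(n):
    """Distinct closed tours through n >= 3 cities when direction does not matter."""
    return factorial(n - 1) // 2


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(brute_force_tsp(square))   # ((0, 1, 2, 3), 4.0)
print(distinct_tours(20))        # 60822550204416000
```

Even checking a billion tours per second, enumerating every tour for 20 cities would take roughly two years, which is why practical solvers rely on heuristics and why unconventional approaches like the shrinking blob are interesting.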


Ian Foster speaking at the RDMI workshop, March 13, 2013. Photo by Rick Reinhard.


Big science projects can afford big cyberinfrastructure. For example, the Large Hadron Collider at CERN in Geneva generates 15 petabytes of data a year, but also boasts a sophisticated data management infrastructure for the movement, sharing and analysis of that gargantuan data flow. But big data is no longer an exclusive problem for these massive collaborations in particle physics, astronomy and climate modeling. Individual researchers, faced with new laboratory equipment and methods that can generate their own torrents of data, increasingly need their own data management tools, but lack the hefty budget large projects can dedicate to such tasks. What can the 99% of researchers doing big science in small labs do with their data?

That was how Computation Institute director Ian Foster framed the mission at hand for the Research Data Management Implementations Workshop, happening today and tomorrow in Arlington, VA. The workshop was designed to help researchers, collaborations and campuses deal with the growing need for high-performance data transfer, storage, curation and analysis, while avoiding wasteful redundancy.

“The lack of a broader solution or methodology has led basically to a culture of one-off implementation solutions, where each institution is trying to solve their problem their way, where we don’t even talk to each other, where we are basically reinventing the wheel every day,” said H. Birali Runesha, director of the University of Chicago Research Computing Center, in his opening remarks.
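To put the LHC’s 15 petabytes a year in perspective, a short Python calculation converts that annual volume into an average data rate. It assumes decimal petabytes (10^15 bytes) and, unrealistically, an even spread across the whole year; the collider does not take data continuously, so rates during running periods are higher. The names used are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds
PETABYTE = 10**15                      # decimal petabyte, in bytes


def average_rate(bytes_per_year):
    """Average sustained rate in bytes/s if the yearly volume were spread evenly."""
    return bytes_per_year / SECONDS_PER_YEAR


rate = average_rate(15 * PETABYTE)
print(f"{rate / 1e6:.0f} MB/s, or about {rate * 8 / 1e9:.1f} Gbit/s")
# 475 MB/s, or about 3.8 Gbit/s
```

A small lab will rarely see that volume, but even a few percent of it, sustained, outgrows a desktop disk and a campus network link in short order.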

