One of the last computing chores to be sucked into the cloud is data analysis. Perhaps it’s because scientists are naturally good at programming and so they enjoy having a machine on their desks. Or maybe it’s because the lab equipment is hooked up directly to the computer to record the data. Or perhaps it’s because the data sets can be so large that it’s time-consuming to move them.
Whatever the reasons, scientists and data analysts have embraced remote computing slowly, but they are coming around. Cloud-based tools for machine learning, artificial intelligence, and data analysis are growing in number. Some of the reasons are the same ones that drove interest in cloud-based document editing and email. Teams can log into a central repository from any machine and do the work in remote locations, on the road, or maybe even at the beach. The cloud handles backups and synchronization, simplifying everything for the group.
But there are also practical reasons why the cloud is even better for data analysis. When the data sets are large, cloud users can spin up large jobs on rented hardware that finishes the work much, much faster. There is no need to start your PC working and then go out to lunch only to come back to find out that the job failed after a few hours. Now you can push the button, spin up dozens of cloud instances loaded with tons of memory, and watch your code fail in a few minutes. Since many clouds now bill by the second, you can save time and money.
There are dangers too. The biggest is the amorphous worry about privacy. Some data analysis involves personal information from subjects who trusted you to protect it. We’ve grown accustomed to the security issues involved in locking data on a hard drive in the lab. It’s hard to know just what’s going on in the cloud.
It will be some time before we’re comfortable with the best practices used by the cloud providers, but people are already recognizing that the cloud providers can probably hire more security consultants than the grad student in the corner of a lab. It’s not like personal computers are immune to viruses or other backdoors. If the personal computer is connected to the Internet, well, you might say it’s already part of the cloud.