Simon Urbanek, Colloquium Speaker

Member of the Statistics Research Department at AT&T Labs
Ph.D. Statistics, University of Augsburg, Germany
Date: Thursday, October 9, 2014 - 3:00pm
Colloquium Title: RCloud and iotools - tools for collaboration and distributed computing using R
Location: Reception at 3:00 p.m. in 241 SH / Talk at 3:30 in 61 SH

The rising interest in Big Data analytics has led to at least two fundamental challenges. The most often addressed one is the ability to perform the necessary computations in reasonable time. The second, often neglected, is the ability to leverage existing work and collaborate on complex solutions. In this talk we will present tools to address both issues on the basis of the R computing environment.

RCloud is a web-based, distributed, collaborative analytics environment that enables the sharing of work in a social-network-like fashion and facilitates reproducibility, discovery and collaboration. It allows easy re-use of code and incorporates visualization tools and a path to deployment.

To allow the use of RCloud over large clusters, we have also developed iotools, an R package that provides highly efficient ways of loading and parsing data into R and of running map-reduce jobs over Hadoop clusters much faster than most existing approaches. It also provides an additional in-memory distributed computing model that allows complex algorithms that do not fit into the map/reduce framework to be run over stock Hadoop infrastructure. It can also be used for local batchwise processing when needed.

In this talk we will discuss both open-source projects, RCloud and iotools, with applications on real data and a real Hadoop cluster, ranging from visualization to high-throughput computing using R.