Automating chunking of big data in R using a loop




I am trying to break a big dataset into chunks. My code looks like this:

# chunk 1
info <- read.csv("/users/admin/desktop/data/sample.csv", header=TRUE, nrows=1000000)
write.csv(info, "/users/admin/desktop/data/data1.csv")

# chunk 2
info <- read.csv("/users/admin/desktop/data/sample.csv", header=FALSE, nrows=1000000, skip=1000000)
write.csv(info, "/users/admin/desktop/data/data2.csv")

# chunk 3
info <- read.csv("/users/admin/desktop/data/sample.csv", header=FALSE, nrows=1000000, skip=2000000)
write.csv(info, "/users/admin/desktop/data/data3.csv")

There are hundreds of millions of rows in the dataset, so I need to create a lot of chunks and automate the process. Is there a way to loop the command so that each subsequent chunk automatically skips 1,000,000 more rows than the previous chunk did, and each file saves as "datan.csv" (with n being the number of the previous file plus one)?

What about the following approach? As a demonstration, I created a data frame with 2 columns and 10 rows and read it within a loop 2 times, 5 rows at a time, saving each result to a text file:

f<-"c:/users/mypc/desktop/" for(i in 1:2){ df <- read.table("c:/users/mypc/desktop/df.txt", header=false, nrow=5, skip=5*(i-1)) file <- paste(f,sep="","df",i,".txt") write.table(df,file) }

