propmiss <- function(dataframe) {
  # For each column: count the missing values, the total length, and the proportion missing
  lapply(dataframe, function(x) {
    data.frame(nmiss=sum(is.na(x)), n=length(x), propmiss=sum(is.na(x))/length(x))
  })
}
Let's try it out.
#simulate some fake data
fakedata=data.frame(var1=c(1,2,NA,4,NA,6,7,8,9,10),var2=c(11,NA,NA,14,NA,16,17,NA,19,NA))
print(fakedata)
   var1 var2
1     1   11
2     2   NA
3    NA   NA
4     4   14
5    NA   NA
6     6   16
7     7   17
8     8   NA
9     9   19
10   10   NA
# summarize the missing data
propmiss(fakedata)
$var1
  nmiss  n propmiss
1     2 10      0.2

$var2
  nmiss  n propmiss
1     5 10      0.5
Running that function returns a list of data.frame objects, one per variable. You can access the proportion missing for var1 by running propmiss(fakedata)$var1$propmiss.
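If you only need the counts or the proportions and not the per-variable data frames, a shorter base R route may be enough. This is just a sketch: it rebuilds the same fakedata so it runs on its own, and the names n_missing and prop_missing are made up for illustration.

```r
# Same fake data as above, repeated so this snippet runs by itself
fakedata <- data.frame(
  var1 = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
  var2 = c(11, NA, NA, 14, NA, 16, 17, NA, 19, NA)
)

# is.na() on a data frame gives a logical matrix with one column per variable,
# so column sums are counts of missing values and column means are proportions
n_missing    <- colSums(is.na(fakedata))
prop_missing <- colMeans(is.na(fakedata))

print(n_missing)
print(prop_missing)
```

Both results are named vectors, so prop_missing["var1"] pulls out a single variable.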
*Edit 2011-02-23*
Commenter A. Friedman asked for a version of this function that gives you the output as a data frame. The function's a bit uglier because something was being coerced as a list, but this does the trick:
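propmiss <- function(dataframe) {
  # One small data.frame of summary stats per column
  m <- sapply(dataframe, function(x) {
    data.frame(
      nmiss=sum(is.na(x)),
      n=length(x),
      propmiss=sum(is.na(x))/length(x)
    )
  })
  # Transpose so that each variable becomes a row
  d <- data.frame(t(m))
  # Flatten the list columns into plain vectors
  d <- sapply(d, unlist)
  d <- as.data.frame(d)
  # Move the variable names from the row names into a column
  d$variable <- row.names(d)
  row.names(d) <- NULL
  # Put the variable column first
  d <- cbind(d[ncol(d)],d[-ncol(d)])
  # Sort by proportion missing, smallest first
  return(d[order(d$propmiss), ])
}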
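For comparison, here is one possible more compact way to build a similar data frame. Treat it as a sketch, not a replacement for the function above: propmiss_df is a hypothetical name, and it assumes the input is an ordinary data frame, so every column has nrow(dataframe) entries.

```r
# Hypothetical compact variant: one row per variable, sorted by proportion missing
propmiss_df <- function(dataframe) {
  # Count NAs per column; unname() so the counts do not become row names
  nmiss <- unname(colSums(is.na(dataframe)))
  d <- data.frame(
    variable = names(dataframe),
    nmiss    = nmiss,
    n        = nrow(dataframe),
    propmiss = nmiss / nrow(dataframe),
    stringsAsFactors = FALSE
  )
  d[order(d$propmiss), ]
}

fakedata <- data.frame(
  var1 = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
  var2 = c(11, NA, NA, 14, NA, 16, 17, NA, 19, NA)
)
result <- propmiss_df(fakedata)
print(result)
```

Because the result is a plain data frame, you can filter it directly, for example result[result$propmiss > 0.3, ] to keep only the variables with more than 30% missing.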
That's handy. Could you write an as.data.frame.propmiss() method that would coerce the output to a data.frame for easy use when there are a lot of variables being considered?
Thanks.
VERY handy thank you alot! this is why I love R and the R community.
A. Friedman - rewrote the function to do just that.
would you please comment the code? i need to do the same thing but calculating the sum of missings in every row instead.
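For the row-wise version asked about here, one possible approach is to swap the column summaries for row summaries. This is only a sketch using the post's fakedata; row_nmiss and row_propmiss are made-up names.

```r
# Same fake data as in the post, repeated so this snippet runs by itself
fakedata <- data.frame(
  var1 = c(1, 2, NA, 4, NA, 6, 7, 8, 9, 10),
  var2 = c(11, NA, NA, 14, NA, 16, 17, NA, 19, NA)
)

# is.na() gives a logical matrix; summing across each row counts the NAs in that row
row_nmiss <- rowSums(is.na(fakedata))

# Proportion of variables missing in each row
row_propmiss <- row_nmiss / ncol(fakedata)

print(row_nmiss)
```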
returning output as a dataframe from original function is simple with dplyr: just wrap the call to propmiss into bind_rows.
x <- bind_rows(propmiss(df))