Posts filed under 'R'
I often use YouTube to find out all kinds of stuff - when I want to learn new gym exercises for example, it is easy to find all kinds of instruction videos there. I hadn’t really thought of using it to find R tutorials yet, however, which could be quite useful for students. So far I found one series, with some useful parts.
Useful for students:
Help search
Plots and history
External files
Large datasets
Less urgent ones:
CRAN on the web
Vector arithmetic
Matrix operations
I have not actually watched them; I am only going by the titles. If there are mistakes in them, let me know.
November 17th, 2008
Beanplot … Just a nice alternative to the boxplot. I should use this diagram from Wikipedia to explain the boxplot in class - it looks very nice.
November 15th, 2008
Ah, just because I got tired of typing latex; bibtex; latex; latex; dvipdf every single time, I wrote a little script a few months ago to do that. Of course, the script is super simple:
latex $1; bibtex $1; latex $1; latex $1; dvipdf $1; cleanup
whereby cleanup refers to another little script that just deletes all of Latex’s intermediary files. (Perhaps there is a Latex setting to have them saved in an entirely different place anyway?)
Later I added one for the rare occasion that I use Sweave as well to integrate R code into my Latex files. Now I am also starting to become a fan of the pic language to add diagrams to my files and it also integrates very well with Latex. So now I upgraded my script a little and just post it here in the odd case that it turns out to be useful for someone else as well. The Sweave version:
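#!/bin/sh
# Usage: pass the document's base name (no extension) as $1.
R CMD Sweave "$1"              # run the R chunks, producing $1.tex
fn=c$(date +"%M%S")            # temporary base name from the current minute and second
pic -t "$1.tex" > "$fn.tex"    # expand the pic diagrams into TeX
rm "$1.tex"
latex "$fn"
bibtex "$fn"
latex "$fn"
latex "$fn"
dvipdf "$fn" "$1.pdf"
rm *.aux                       # remove the intermediary files
rm "$fn"*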
I had tried Sweave earlier but was not too satisfied, since most things I was estimating took too long to rerun every time I compiled my text. But now I have been working on a document with a lot of straightforward regressions, several plots, and several diagrams, where the final layout only matters for an internal working document, not for publication. So I have been making heavy use of both Sweave and pic, and I am really enthusiastic about the result.
October 5th, 2008
Based on this post in Andrew Gelman’s blog, which I read religiously, I wrote a little R template for the bootstrap procedure. Well, the template is simple, but the cool thing is that it tells you when it is finished, so you can stop going back and forth to the R window to check whether it’s done. This one is fast, but for a long, slow procedure, that’s pretty cool. Alas, only works on a Mac (and perhaps Linux?).
x <- rnorm(100, 3, 2)
bs <- numeric(1000)
for (i in 1:1000) {
  bs.sample <- sample(x, length(x), replace = TRUE)
  bs[i] <- mean(bs.sample)
}
system(sprintf("say The bootstrap has finished. The result is a mean of %4.2f, with standard error %4.2f. You can access the result with the b s variable.", mean(bs), sd(bs)))
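The same idea can be sketched so that it degrades gracefully on machines without the say command. This is only a minimal sketch: the names bootstrap.mean and announce are made up for illustration, and the fallback of printing the message instead of speaking it is an assumption about what would be useful elsewhere.

```r
# Bootstrap the mean of x; returns one resampled mean per replication.
# (bootstrap.mean is a hypothetical helper name, not part of R.)
bootstrap.mean <- function(x, n.rep = 1000) {
  vapply(seq_len(n.rep),
         function(i) mean(sample(x, length(x), replace = TRUE)),
         numeric(1))
}

# Speak the message if a 'say' command is on the PATH, otherwise print it.
# (announce is a hypothetical helper name as well.)
announce <- function(msg) {
  if (nzchar(Sys.which("say"))) {
    system2("say", shQuote(msg))
  } else {
    message(msg)
  }
  invisible(msg)
}

set.seed(1)
x <- rnorm(100, 3, 2)
bs <- bootstrap.mean(x)
announce(sprintf("The bootstrap has finished. The result is a mean of %4.2f, with standard error %4.2f.",
                 mean(bs), sd(bs)))
```

Sys.which returns an empty string when the command is not found, so the check should be safe on any platform.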
October 28th, 2007
This video tutorial is an interesting way to get a basic idea of how the statistical package R, which I use for teaching statistics, works.
October 10th, 2007
R already has a function called ls(), which simply lists the names of all objects in the current environment, and ls.str(), which does so with a lot more information. I wanted one that looks a little more similar to ls -l in a Unix environment, so here’s a start:
my.ls <- function(envir = parent.frame()) {
  # list every object name in the environment, including hidden ones
  names <- ls(envir = envir, all.names = TRUE)
  for (item in names) {
    obj <- get(item, envir = envir)
    # vectors report their length; matrices and data frames report rows and columns
    l1 <- length(obj)
    l2 <- ""
    if (!is.null(dim(obj))) {
      l1 <- dim(obj)[1]
      l2 <- sprintf("%10d", dim(obj)[2])
    }
    cat(sprintf("%-30s %-10s %-10s %10d %10s\n",
                item, class(obj)[1], mode(obj), l1, l2))
  }
}
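A possible variation on the listing above is to return the same columns as a data frame instead of printing them, so the result can be sorted or filtered afterwards. This is a rough sketch only; ls.table and demo.env are hypothetical names chosen for the example.

```r
# Collect name, class, mode and size of every object in an environment.
# (ls.table is a hypothetical name; this is a sketch, not a polished tool.)
ls.table <- function(envir = parent.frame()) {
  force(envir)
  items <- ls(envir = envir, all.names = TRUE)
  describe <- function(item) {
    obj <- get(item, envir = envir)
    d <- dim(obj)
    data.frame(name = item,
               class = class(obj)[1],
               mode = mode(obj),
               rows = if (is.null(d)) length(obj) else d[1],
               cols = if (is.null(d)) NA_integer_ else d[2],
               stringsAsFactors = FALSE)
  }
  # returns NULL when the environment is empty
  do.call(rbind, lapply(items, describe))
}

# Try it on a small throwaway environment.
demo.env <- new.env()
assign("v", 1:5, envir = demo.env)
assign("m", matrix(0, nrow = 2, ncol = 3), envir = demo.env)
tab <- ls.table(demo.env)
print(tab)
```

Vectors get their length in the rows column and NA in the cols column, mirroring the blank second size field in the printed version.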
April 29th, 2007
Often people want to merge datasets and have names of countries or locations that they want to merge on. These names are often similar, but not exactly the same. A function in R that is hugely useful for merging in this case is called agrep. With this function you can do approximate matching of names (or rather, of strings as subsets of other strings). To merge properly, though, you want to avoid matching the same name twice, and you want to prioritize exact matches over very fuzzy matches. The idea is not mine, but Eduardo’s. To do so, I wrote a little R function, which is here in beta version:
agrep.wrapper <- function(x, y, names.x = "name", names.y = "name", ids.x = "id", ignore.case = TRUE, max.threshold = 1) {
  x <- as.data.frame(x, stringsAsFactors = FALSE)
  y <- as.data.frame(y, stringsAsFactors = FALSE)
  # unique ids and names in x, unique names in y
  unique.x.select <- !duplicated(x[, ids.x])
  unique.x.names <- x[, names.x][unique.x.select]
  unique.x.ids <- x[, ids.x][unique.x.select]
  unique.y.select <- !duplicated(y[, names.y])
  unique.y.names <- y[, names.y][unique.y.select]
  unique.y.ids <- rep(NA, length(unique.y.names))
  # names in x that still have to be matched
  matching.x.names <- unique.x.names
  matching.x.ids <- unique.x.ids
  # start with exact matches and loosen the threshold step by step
  for (threshold in seq(from = 0, to = max.threshold, by = 0.1)) {
    i <- 1
    while (i <= length(matching.x.names)) {
      # only names in y that have no id assigned yet are candidates
      select <- (1:length(unique.y.ids) %in% agrep(matching.x.names[i], unique.y.names, ignore.case = ignore.case, max.distance = threshold)) & is.na(unique.y.ids)
      if (sum(select) > 0) {
        unique.y.ids[select] <- matching.x.ids[i]
        matching.x.ids <- matching.x.ids[-i]
        matching.x.names <- matching.x.names[-i]
      } else
        i <- i + 1
    }
  }
  unique.data <- merge(data.frame(unique.x.names, unique.x.ids), data.frame(unique.y.names, unique.y.ids), by.x = "unique.x.ids", by.y = "unique.y.ids", all = TRUE)
  list(matches = unique.data)
}
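To see why the function starts with a threshold of zero and only then loosens it, it may help to look at agrep on its own. The country names below are made up for illustration; the sketch only shows how max.distance changes which candidates are accepted.

```r
# Candidate names as they might appear in a second dataset (illustrative data).
y.names <- c("The Netherlands", "Nethrelands", "Germany")

# max.distance = 0 only accepts candidates that contain the pattern exactly.
exact <- agrep("Netherlands", y.names, ignore.case = TRUE, max.distance = 0)

# A fractional max.distance is scaled by the pattern length,
# so a few insertions, deletions or substitutions are tolerated.
fuzzy <- agrep("Netherlands", y.names, ignore.case = TRUE, max.distance = 0.2)

print(y.names[exact])
print(y.names[fuzzy])
```

Running the strict pass first means a name that matches exactly is claimed before a misspelled variant gets the chance to take its place.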
March 8th, 2007