Trying to assign groups in R, but it is filling in NA values and missing others that belong in the group

Help ME Published in 2017-12-07 21:21:11Z

So I have a dataset in R (Framingham Heart Study data), and I am trying to assign BMI groups "underweight," "normal," "overweight," and "obese."

It has over 11,000 observations and 38 variables/columns, so it would be kind of hard to post some of the data here (I hope this won't be too much trouble to answer without it).

The dataset is called frm and I am trying to subset in the following way:

frm$BMIGRP <- NA  #Creating new variable (this part works and creates a BMIGRP column with all NA values)
frm$BMIGRP[which(as.numeric(frm$BMI) < 18.5)] <- "underweight"

However, there are NA values in the data set BMI variable (indicated with a ".", which I have also tried changing to NA).

When I try to subset this way for each group, it is assigning only some of the underweight values to "underweight" and is assigning a lot of NA / "." values to underweight as well. It then tells me there are only 10 "normal" weight subjects and about 11000 in the obese category which is just not true because I can view the data set.

If done correctly, this should create the four groups with several hundred to several thousand observations in each category. But I am only getting 10 normal, 71 underweight, and ~11,000 obese.

I'm just not sure where I'm going wrong with this or if there is a different way I can create a new variable and assign it in the same kind of way. Any help is very much appreciated.

I should also mention that this is the code that my professor gave us as an example in our lab session, and I am basically copying and pasting it with the appropriate replacements for my data set.

This is my first question on this website so I apologize if it is incomplete or if I need to give more information. Thanks!

leeum Reply to 2017-12-08 04:30:16Z

Reading your code, it seems the column is not numeric.

This should work:

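# as.character() first: if BMI was read in as a factor, as.numeric() alone returns level codes.
# The "." entries become NA (with a coercion warning).
frm$BMI <- as.numeric(as.character(frm$BMI))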
frm$BMIGRP[frm$BMI < 18.5] <- "underweight"
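
One possible explanation for the odd counts in the question, assuming BMI was read in as a factor (the default for text columns in read.csv before R 4.0, and plausible here because of the "." entries): as.numeric() on a factor returns the internal level codes rather than the printed values. A small sketch with made-up values, not the Framingham data:

```r
# Made-up BMI values; "." marks a missing value, as in the question
bmi <- factor(c("17.2", "23.5", ".", "31.0", "26.4"))

# Calling as.numeric() directly on a factor gives the integer level codes
codes <- as.numeric(bmi)

# Going through as.character() recovers the real numbers; "." becomes NA
# (suppressWarnings() hides the expected "NAs introduced by coercion" warning)
values <- suppressWarnings(as.numeric(as.character(bmi)))

print(codes)
print(values)
```

If str(frm$BMI) reports a factor, this is probably what is happening; if it reports chr, plain as.numeric() is enough and the extra as.character() does no harm.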
Highland Reply to 2017-12-07 21:40:33Z

Like @leeum said, check that BMI is numeric. If you want to create a new category column based on BMI, look at case_when from dplyr. Maybe this is what you wanted:

library(dplyr)

frm <- frm %>% 
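  mutate(BMI = as.numeric(as.character(BMI))) %>%  # as.character() guards against factor level codes
  mutate(BMIGRP = case_when(
    BMI < 18.5 ~ 'underweight',
    BMI < 25   ~ 'healthy weight',  # conditions are checked in order, so this is 18.5 <= BMI < 25
    BMI < 30   ~ 'overweight',      # no gaps between groups (e.g. a BMI of 24.95 or 29.95)
    BMI >= 30  ~ 'obese')           # >= so that exactly 30 is included
  )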

The first mutate() converts the BMI column to numeric. The second, mutate(BMIGRP = case_when(...)), creates a new column called BMIGRP and assigns 'underweight', 'healthy weight', 'overweight' or 'obese' based on BMI. If none of the conditions applies (for example, when BMI is NA), an NA is assigned.
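
For comparison, the same grouping can be sketched in base R with cut(), assuming BMI has already been converted to numeric. The cut-offs 18.5, 25 and 30 follow the thresholds above; the data frame frm_demo and its values are made up for illustration.

```r
# Made-up data including boundary values; NA stands for a missing BMI
frm_demo <- data.frame(BMI = c(17.2, 18.5, 24.95, 25, 29.95, 30, NA))

frm_demo$BMIGRP <- cut(
  frm_demo$BMI,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("underweight", "healthy weight", "overweight", "obese"),
  right  = FALSE  # intervals are [lower, upper): 18.5 is healthy weight, 30 is obese
)

# Count per group, with missing BMI values shown as NA
print(table(frm_demo$BMIGRP, useNA = "ifany"))
```

cut() returns a factor, so wrap the result in as.character() if a plain character column is preferred. Rows with a missing BMI stay NA instead of falling into a group.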
