Sunday, October 9, 2011

Join datasets by using the 'JOIN' function

Well, this is a real problem I came across in my research. I took hundreds of samples and recorded the sampling time in a spreadsheet. Then I brought them back to my lab and have them analyzed, and the results was recorded in another spreadsheet. And because I analyzed the samples in a random order, the two files look like these:



So I cannot just copy the column of one file and paste it to the other file. I then tried to import these two files into MATLAB and wrote a code by myself to combine these two files based on the BagID. That is not really easy but the code turned out to be working fine. And after a while, I happened to know that there is a much simpler way to do what I wanted to do- the join function is designed for this purpose. Here is what I should do in the first place:

%import data as two datasets
ds1=dataset('XLSFile', 'BagID.xlsx');
ds2=dataset('XLSFile', 'results.xlsx');

%Jion ds1 and ds2 based on the common variable name, which is the 'BagID' in this case.
ds3=join(ds1, ds2)

Here's the output.



That's it! Isn't it neat?

P.S. Those are fake data and of course I can sort both files by the BagID and then copy&paste in the spreadsheet. But anyway, this is a good MATLAB function to know, right?

5 comments:

  1. Hello.

    I have tried to use join just like you did, I have two excel spread sheets with data for the time of an event. In one spread sheet is the day and the second one is the time step (data was taken every 8 seconds). Any time I use join I get the error: Error using dataset/join (line 163)
    Cannot find a common dataset variable to use as a key variable.

    I know I can just copy and paste the data to the same spread sheet, but I need to plot the data using the time I have for the X axis. So I literally need to join the two data sets. Any advice?

    ReplyDelete
    Replies
    1. Hi Whitney,
      I am so sorry that I just saw your comment. There is a common variable name, the 'BagID', in my dataset ds1 and ds2. I think your error message is due to that there is no such same variable in your datasets.
      I hope you have already figured it out long time ago. Thank for commenting.

      Delete
  2. I did the same thing with my dataset it worked well. thanks.

    I would further like to know if I wish to make a plot of ds3 for e.g. bag id vs sampling time. How do i go about it?

    ReplyDelete
  3. Hi Mathur,
    I think you won't be able to directly use the BagID column as x aixs because they are strings.
    However, you could plot the other column, say 'Sampling time', and then use the 'BagID' as the x aixs tick labels.
    Here's more information about how to do it:

    http://matlabnewbie.blogspot.com/2010/04/lets-plot-something-first.html

    Thanks for commenting!

    ReplyDelete
  4. This method proves invaluable in tasks like database management, data cleaning, and creating comprehensive reports. Best 12 Laptops A vital skill for any data-driven professional.

    ReplyDelete

Any comments?

my-alpine and docker-compose.yml

 ``` version: '1' services:     man:       build: .       image: my-alpine:latest   ```  Dockerfile: ``` FROM alpine:latest ENV PYTH...