Problem with creating bell curves

  1. #1
    Registered User
    Join Date
    05-03-2016
    Location
    Sweden
    MS-Off Ver
    2013
    Posts
    4

    Problem with creating bell curves

    I am trying to create a set of bell curves for all the data I have collected over a few days of data collection.
    I have never used bell curves before, and although I have been looking for a good "how to do it" tutorial, I never seem to get it right.

    Here is an imgur link to the sheets so people can see what I have done (I wasn't allowed to attach a file):
    http://imgur.com/a/Qdj2i

    Here are two pastebin links for the data in both sheets (the data isn't sorted!)
    http://pastebin.com/pTCcm0at - Sheet 1 (Processor usage in %)
    http://pastebin.com/jTYkDpbt - Sheet 2 (Time in milliseconds)

    If anyone needs more information, please tell me what you need.
    I am not an expert on diagrams and such, so I am not sure what I should add.
    Last edited by Sightburner; 05-03-2016 at 11:41 PM. Reason: Attached xls with data from one of my tests.

  2. #2
    Administrator 6StringJazzer's Avatar
    Join Date
    01-27-2010
    Location
    Tysons Corner, VA, USA
    MS-Off Ver
    MS365 Family 64-bit
    Posts
    24,688

    Re: Problem with creating bell curves

    What is it you need to do that is different from what you've done? It looks like your charts correctly show the distribution of your data. It looks like you can tune the bin size.

    "Bell curve" is a term that describes a normal distribution. If your data is not normally distributed, your chart will not be a bell curve, and this looks like the case with your data. So I'm not sure what you meant by "bell curve" in your description.

    You should be allowed to attach a file. If the paper clip icon doesn't work (we are having intermittent problems with it), when editing a post scroll down and look for the link that says "Manage Attachments". It will let you attach a file but the screens look different.
    Jeff

  3. #3
    Registered User
    Join Date
    05-03-2016
    Location
    Sweden
    MS-Off Ver
    2013
    Posts
    4

    Re: Problem with creating bell curves

    "What is it you need to do that is different from what you've done? " I assumed it should take a bell shape (I have never used this kind of diagram before). Every tutorial I have seen ends up with something that looks like a bell.
    I am compiling diagrams of the data I collected during my bachelor's degree project, and one of the examiners wanted me to use bell curves, since I have so much data that a "normal" column chart gets kind of messy.
    I also tried using box plot diagrams, but that didn't work very well either.

    I have 20 different tests with 2000 data points each (1000 for CPU performance, 1000 for generation time). I want to create two suitable diagrams for all this data that aren't too messy to look at.
    I got the suggestion of using a bell curve or a box plot (the box plot was a problem, see the attachment, which is from my pilot study). So I decided to try using the bell curve.

    "It looks like you can tune the bin size." Any suggestions on how I should tune it?

    "You should be allowed to attach a file.[...]" I'll give it a try. *Edit* Thanks, that worked.
    Last edited by Sightburner; 05-03-2016 at 11:49 PM.

  4. #4
    Administrator 6StringJazzer's Avatar
    Join Date
    01-27-2010
    Location
    Tysons Corner, VA, USA
    MS-Off Ver
    MS365 Family 64-bit
    Posts
    24,688

    Re: Problem with creating bell curves

    I am not clear on what your data means.

    Your first file 5000x5000 shows a list of values, and plots them on a curve to show frequency distribution. Did you develop this or is it just an example you got from somewhere else? I ask that because if you developed it, I would not expect your question about bin size.

    When plotting continuous variable distributions, like you are doing here, you have to block out the data into bins. For example, suppose you collect the height in mm of 1000 people. You could get 500 different heights, so you can't plot the frequency of that--you have 1-3 of each height. The curve wouldn't be informative. You have to block the data into bins. So you might establish a bin size of 10 mm, so one bin would be 1700-1710 mm. That allows you to plot the number of data points in each bin on a curve.

    That is basically what your 5000x5000 file does. The cell H7 labeled "16th Bin Step Size" is the bin size. The bins are calculated in H9:H24. This model can handle from 1-16 bins. If you change the bin size in H7, then the plot will automatically change. Your bin size is calculated as 1/15 of the interval between 4 standard deviations above and below the mean. One strange thing about this chart is it only plots the distribution in that interval--it doesn't plot all your data.

    You can tune the bin size by intuition until you get a smooth curve that has a defined shape. If the bin size is too small, the curve looks flat and random. If it's too big, its shape is too coarse to give any intuition about the data. You can also use a technique like the Freedman–Diaconis rule. That rule gives a bin size of 0.001784516 for the data in Sheet1, about 1/10 of your bin size.

    The chart on that sheet is based on data from an external file so I have no idea what is happening there. I have implemented these changes in your file and also plotted all the data.
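
    For anyone who wants to check the binning outside Excel, here is a minimal Python sketch of the two ideas above: counting values into fixed-width bins, and the Freedman–Diaconis rule (bin width = 2 * IQR / n^(1/3)). The function names and the sample heights are made up for illustration and are not taken from the attached workbook. The quartiles come from Python's statistics.quantiles, whose default method may differ slightly from Excel's quartile functions, so the resulting width can differ a little from a spreadsheet calculation.

```python
import math
import statistics


def freedman_diaconis_width(values):
    """Suggested bin width: 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) / len(values) ** (1 / 3)


def bin_counts(values, width):
    """Count values in consecutive bins of the given width, starting at the minimum.

    Each bin covers [edge, edge + width); the largest value is placed in the last bin.
    """
    start = min(values)
    n_bins = max(1, math.ceil((max(values) - start) / width))
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - start) / width), n_bins - 1)
        counts[index] += 1
    edges = [start + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical heights in mm, binned with a 10 mm bin size as in the example above.
heights = [1700, 1702, 1705, 1711, 1718, 1723, 1729, 1734, 1741, 1748]
edges, counts = bin_counts(heights, 10)
for lo, hi, count in zip(edges, edges[1:], counts):
    print(f"{lo}-{hi} mm: {count}")
print("Freedman-Diaconis width:", freedman_diaconis_width(heights))
```

    The counts per bin are what gets plotted as the distribution curve. Note that the rule breaks down if the interquartile range is zero, because the width would then be zero.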

    I don't understand how to interpret the data in Pilotstudie, and it is completely different from the data in 5000x5000. What do these values mean: CA TID, CA CPU, DLA Tid, DLA CPU?
    Which are independent variables and which are dependent variables?

    For a normal distribution of data, the curve will have a bell shape. Maybe instead of a "bell curve" you mean "frequency distribution."

  5. #5
    Registered User
    Join Date
    05-03-2016
    Location
    Sweden
    MS-Off Ver
    2013
    Posts
    4

    Re: Problem with creating bell curves

    I have no previous experience with making graphs in Excel, but it is what my examiner wants, so that is what I am trying to achieve.

    The bell curve diagram I tried to make was based on joeu2004's answer here, so I might have misunderstood something in his explanation. My examiner said he would like to see a bell curve and drew something similar to this. So that is what I am trying to achieve, but like I said, I have no experience in making graphs, so I looked at both tutorials and answers like the one linked here.

    In the 5000 x 5000 sheet 1, the data is processor usage in percent and the second sheet is generation time in milliseconds.

    I collected the data myself with a program I wrote in C++ that generates caves using DLA or CA.
    The data I saved was processor usage and time. During the generation, the program calculates the
    CPU usage of the generation and how long it took.

    CA CPU is the measured processor usage (how much of the processor(s) was used) for each of the 1000 caves generated with cellular automata.
    So for example, M2 in sheet 1 in the pilot study says 25; this is the measured processor usage in percent for cave 1.
    In this case it means that 1 of 4 computer cores was used to 100%.

    CA Tid is how long, in milliseconds, it took to generate each of the 1000 caves with cellular automata.
    M2 in sheet 1 says 544,898489; this is the time it took to generate cave 1.

    The two others are the same but for diffusion limited aggregation instead of cellular automata.
    In the 5000 x 5000 document, sheet 1 is processor usage and sheet 2 is time, for caves that are 5000 cells in width and 5000 cells in height.

    Yes, Pilotstudie is my pilot study. That was just done to see if I could collect data with the program, and 5000 x 5000 contains data that was collected during the testing.

  6. #6
    Administrator 6StringJazzer's Avatar
    Join Date
    01-27-2010
    Location
    Tysons Corner, VA, USA
    MS-Off Ver
    MS365 Family 64-bit
    Posts
    24,688

    Re: Problem with creating bell curves

    I don't really understand most of your explanation about your data (for example, what is a cave?) so I can't make suggestions about other ways to present it.

    I think part of the problem is that people are saying "bell curve" when they mean "distribution curve". It is only a bell curve when the distribution is a normal distribution. Your data in the 5000x5000 file is not a normal distribution. You have excellent distribution curves in that file, but they will not be bell-shaped because the data is not normal.
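
    One rough way to see whether a distribution curve is close to a bell curve is to compare the observed bin counts with the counts a normal distribution with the same mean and standard deviation would put in the same bins. The Python sketch below is only an illustration of that idea; the function name and the sample values are invented, and it is not a formal normality test.

```python
from statistics import NormalDist, mean, stdev


def expected_normal_counts(values, edges):
    """Counts per bin expected from a normal distribution that has the
    sample's mean and standard deviation, for the same number of points."""
    dist = NormalDist(mean(values), stdev(values))
    n = len(values)
    return [n * (dist.cdf(hi) - dist.cdf(lo)) for lo, hi in zip(edges, edges[1:])]


# Hypothetical skewed sample: most values are small, a few are large.
sample = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 6, 9]
edges = [0, 2, 4, 6, 8, 10]

# Observed counts per bin [lo, hi) for comparison.
observed = [sum(1 for v in sample if lo <= v < hi) for lo, hi in zip(edges, edges[1:])]
expected = expected_normal_counts(sample, edges)

for lo, hi, obs, exp in zip(edges, edges[1:], observed, expected):
    print(f"{lo}-{hi}: observed {obs}, expected if normal {exp:.1f}")
```

    If the observed and expected columns differ a lot, as they tend to for skewed data, the distribution curve will not look like a bell no matter how the bins are tuned.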

    A distribution curve shows the frequency of a single variable. Your other file seems to have multiple variables. One way to show the relationship between two variables is to use a scatter graph. To help with that I would need to know which variable is independent and which is dependent. Then you can calculate the correlation and R² (the squared correlation, a measure of the strength of the relationship). But your description does not give enough information to know if this is an appropriate way to show the data.
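
    As a sketch of the correlation calculation mentioned above, the following Python computes the Pearson correlation coefficient r for two variables, with R² as its square. The variable names and numbers are placeholders only (a made-up independent variable x and dependent variable y), since it is not yet clear which of your columns would play those roles.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Placeholder data: x is the independent variable, y the dependent one.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
print("r =", r)
print("R^2 =", r ** 2)
```

    In Excel, the CORREL and RSQ worksheet functions give the same two quantities for a pair of ranges.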

    What is it you are trying to show? That is, what is the point of collecting all this data?

  7. #7
    Registered User
    Join Date
    05-03-2016
    Location
    Sweden
    MS-Off Ver
    2013
    Posts
    4

    Re: Problem with creating bell curves

    ** EDIT **
    I uploaded the image to imgur instead
    ** EDIT **

    "I don't really understand most of your explanation about your data (for example, what is a cave?) so I can't make suggestions about other ways to present it."
    My degree project is called "PROCEDURALLY GENERATED CAVE LEVELS: A comparison between cellular automata and diffusion limited aggregation". It is quite hard to explain
    if some of the terms are foreign to the reader. Anyway, in this project I am looking at these two techniques (CA and DLA) for generating caves (game levels).

    The data I have been collecting represent the processor usage (how hard the processor worked during the generation)
    and time (how long it took from start to finish) for each single cave generated. This is the performance.
    The data is collected during the generation of each cave.

    So for example, if CA took 5000 ms to generate the cave and DLA took 2500 ms, then DLA has better performance.

    So each cell from A2 to A1001 in the document represents one cave, either its processor usage or its time (depending on which sheet you look at).
    Each size (in this case 5000 x 5000) generates 1000 caves that are X * Y cells/pixels big.
    I'll upload a cave to imgur that is 5000 x 5000; it might make you even more confused though.

    "I think part of the problem is that people are saying "bell curve" when they mean "distribution curve".
    It is only a bell curve when the distribution is a normal distribution. Your data in the 5000x5000 file is not a normal distribution.
    You have excellent distribution curves in that file, but they will not be bell-shaped because the data is not normal."
    The graph you made in the xls file was really nice. I am not locked to a specific kind of graph, so if I can use something that gives a good
    representation of the values, I am happy.

    "What is it you are trying to show? That is, what is the point of collecting all this data?"
    The short answer is: which one is better, CA or DLA, based on their time and CPU usage? A technique that uses less time is better.
    I would have to argue both sides of a low/high CPU number, because you want to use as much CPU as possible, but if it can be done with less, is that better?
    But that is out of scope here.

    Anyway, I have my two techniques, CA and DLA, and I am trying to make graphs that show which of them performs better.
    So I have 4 columns of data (2 for CA and 2 for DLA): 2 for time and 2 for CPU usage.

    So what I want to show is a graph for each size category (I have 6 categories) that shows either CPU usage or time, where one line represents the CA data and one line represents the DLA
    data. Something like this; it shows CPU usage. I just made it quickly, so I don't know if this is a good way to say "In this test X performed better than Y".
    I personally would have been happy enough to just use min, max and average, but my examiner would prefer that I use some kind of graph, and the one you generated, which I made a quick example of
    in this post, seems suited for what I want to do.
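
    To make the idea concrete, here is a small Python sketch of how the two lines for such a graph could be prepared: both series are counted into the same bins so that the CA and DLA curves share one x-axis, and min, max and average are computed alongside. The function names and the timing values are invented for illustration; they are not my measured data.

```python
import statistics


def shared_bin_counts(series_a, series_b, n_bins):
    """Count two series into the same equal-width bins spanning both of them."""
    lo = min(min(series_a), min(series_b))
    hi = max(max(series_a), max(series_b))
    if hi == lo:
        raise ValueError("all values are identical, cannot build bins")
    width = (hi - lo) / n_bins

    def count(series):
        counts = [0] * n_bins
        for v in series:
            # The overall maximum is placed in the last bin.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        return counts

    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, count(series_a), count(series_b)


def summarize(series):
    """Min, max and average of one series."""
    return {"min": min(series), "max": max(series), "mean": statistics.mean(series)}


# Invented generation times in milliseconds for one size category.
ca_times = [5100, 5250, 4980, 5400, 5050, 5600]
dla_times = [2500, 2650, 2400, 2900, 2550, 3100]

edges, ca_counts, dla_counts = shared_bin_counts(ca_times, dla_times, 4)
print("bin edges:", edges)
print("CA counts: ", ca_counts)
print("DLA counts:", dla_counts)
print("CA: ", summarize(ca_times))
print("DLA:", summarize(dla_times))
```

    Plotting the two count lists against the same bin edges gives one line per technique, and for time, the curve that sits further to the left belongs to the faster technique.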

    I hope I managed to be a bit clearer now. I am not very good at explaining things.
    Last edited by Sightburner; 05-04-2016 at 05:59 PM.
