+ Reply to Thread
Results 1 to 11 of 11

Generating data set timestamps based on distribution

  1. #1
    Registered User
    Join Date
    06-29-2016
    Location
    Bristol, UK
    MS-Off Ver
    2010
    Posts
    23

    Generating data set timestamps based on distribution

    I have a data set of 4600 points which are messages that another program I have made will analyse. To do this, each message is tagged with a timestamp in whole seconds of when it will be analysed, between 0 and 600. For this data set, I need the messages to be analysed during a specific time range, i.e. 240 - 440 seconds. The aim of this is to replicate social media posts about a topic, with a sudden explosion of posts followed by gently tail off. A trace of the posts would be something like the orange line below.

    325px-Gamma_distribution_pdf.svg.png

    So what I need help with is creating a function (defined or custom via VBA) that takes parameters of start and end points for the time range (i.e. 240, 440) and to generate a timestamp for each of the 4600 data points to distribute them in a way similar to the line on the graph.

    Any help much appreciated!

    Thanks,

    Dan

  2. #2
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Generating data set timestamps based on distribution

    For example, ...?
    Entia non sunt multiplicanda sine necessitate

  3. #3
    Registered User
    Join Date
    06-29-2016
    Location
    Bristol, UK
    MS-Off Ver
    2010
    Posts
    23

    Re: Generating data set timestamps based on distribution

    Sorry, I don't follow your question. What would you like an example of?

  4. #4
    Forum Expert
    Join Date
    06-08-2012
    Location
    Left the forum!
    MS-Off Ver
    Left the forum!
    Posts
    5,189

    Re: Generating data set timestamps based on distribution

    Your post barely contains enough information for somebody to help who maybe works on similar tasks and already has an understanding of what you're doing.

    For the rest of us it's like taking a high tech military plane to Joe's back street garage for an oil change, until he has enough information to do the job he's either going to be looking at it with a blank expression, or making a complete mess of things.

  5. #5
    Registered User
    Join Date
    06-29-2016
    Location
    Bristol, UK
    MS-Off Ver
    2010
    Posts
    23

    Re: Generating data set timestamps based on distribution

    I see.

    Right! My dataset is 4600 messages that replicate social media posts over a period of time. Each is to be read into a data analyser based on an associated timestamp in whole seconds between 0 and 600. For example, the analyser will call in all posts with timestamps between 5 and 9. It is these timestamps that I need to be able to generate to adjust when they will be processed. To do this I want to be able to provide a start and end time for the timestamps, 240 and 440 seconds for example. This gives a block of time where all 4600 messages will be called into the anlayser. The reason I need them to follow a distribution during that time range, such as in the graph above, is the nature of a sudden "explosion" of posts about a topic that then die off to nothing. Obviously, a lot of these posts will have the same timestamp. This means the frequency of posts being analysed matches one of the above distributions with a few hundred having the same timestamp to begin with, replicating the spike in talk about the topic, to a few or individual posts towards the end as talk quietens down.

    So for the posts, a column of values for timestamp will be generated to something like:
    Post1 - 240
    Post2 - 240
    Post3 - 241
    ...
    Post4600 - 440

    Hope that makes things a bit clearer

  6. #6
    Forum Expert shg's Avatar
    Join Date
    06-20-2007
    Location
    The Great State of Texas
    MS-Off Ver
    2003, 2010
    Posts
    40,678

    Re: Generating data set timestamps based on distribution

    A
    B
    C
    1
    Start
    Start
    2
    1
    240
    3
    End
    End
    4
    4600
    440
    5
    6
    Item
    Stamp
    7
    1
    240.00
    B7: =(A7-$A$2)/($A$4-$A$2) * ($B$4 - $B$2) + $B$2
    8
    2
    240.04
    9
    3
    240.09
    10
    …
    …
    11
    100
    244.31
    12
    …
    …
    13
    200
    248.65
    14
    …
    …
    15
    1000
    283.44
    16
    …
    …
    17
    2000
    326.93
    18
    …
    …
    19
    3000
    370.42
    20
    …
    …
    21
    4000
    413.91
    22
    …
    …
    23
    4600
    440.00

  7. #7
    Registered User
    Join Date
    06-29-2016
    Location
    Bristol, UK
    MS-Off Ver
    2010
    Posts
    23

    Re: Generating data set timestamps based on distribution

    I have used the above formula over my dataset and then rounded down to give me integers. I have then counted the number of timestamps in each bin and plotted a histogram to view my distribution. It seems to give 36 values in each bin (between 240 and 449).

    Am I doing something wrong? I was expecting the shape to be similar to the orange line in the diagram in my original post.

    My spreadsheet is attached.
    Attached Files Attached Files

  8. #8
    Valued Forum Contributor Hydraulics's Avatar
    Join Date
    07-15-2018
    Location
    Udine - Italy
    MS-Off Ver
    Office 365
    Posts
    370

    Re: Generating data set timestamps based on distribution

    Hi,

    please take a look at the attached file, I have added a new sheet. I'm in a bit of a hurry, anyhow this is what I did:

    - Divided the messages in 200 bins (440-240)
    - Calculated lower and upper bound of the x value that will be used for the Gamma function (please note that there is a parameter, max value, since the function has no upper bound; change it as suits you best)
    - Calculated the difference in cumulative gamma distribution for each bin
    - Added a column with message numbers from 240 to 440 (for the chart)
    - Finally found the number of messages in each bin multiplying them for column D

    I hope this is enough to get you started.
    Attached Files Attached Files
    Last edited by Hydraulics; 07-24-2018 at 11:21 AM.

  9. #9
    Valued Forum Contributor Hydraulics's Avatar
    Join Date
    07-15-2018
    Location
    Udine - Italy
    MS-Off Ver
    Office 365
    Posts
    370

    Re: Generating data set timestamps based on distribution

    I realize that in my previous message there is a mistake.

    The numbers 240 and 440 are referred to seconds, not messages. Sorry for the confusion.

    However the final result still holds. This is the distribution of timestamps.

    Generating timestamps(Hydr)-chart.jpg
    Aim high or don't even try.
    ---------------------------------
    If your question has been answered, don't forget to mark the thread as SOLVED.
    If you find an answer helpful, click on the star icon at the bottom of the post.

  10. #10
    Registered User
    Join Date
    06-29-2016
    Location
    Bristol, UK
    MS-Off Ver
    2010
    Posts
    23

    Re: Generating data set timestamps based on distribution

    Thanks!

    Sorry I forgot to respond sooner, this is exactly what I needed!

  11. #11
    Valued Forum Contributor Hydraulics's Avatar
    Join Date
    07-15-2018
    Location
    Udine - Italy
    MS-Off Ver
    Office 365
    Posts
    370

    Re: Generating data set timestamps based on distribution

    Thank you for feedback an rep, glad I could help.

+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. [SOLVED] Percentage distribution based on another distribution
    By Sthlm in forum Excel General
    Replies: 5
    Last Post: 08-29-2017, 06:32 AM
  2. Generating variables with Poisson Distribution
    By Le_Kev in forum Excel Programming / VBA / Macros
    Replies: 5
    Last Post: 09-24-2015, 05:24 AM
  3. [SOLVED] Need Help in Generating a 24 hours Job Distribution Chart
    By NiCKz_01 in forum Excel Charting & Pivots
    Replies: 5
    Last Post: 11-17-2014, 09:27 PM
  4. Generating Random Numbers with Binomial Distribution
    By eopoku in forum Excel Formulas & Functions
    Replies: 1
    Last Post: 10-06-2013, 12:07 AM
  5. Replies: 1
    Last Post: 07-15-2013, 05:37 AM
  6. [SOLVED] Generating numbers based on a probability distribution
    By Delta223 in forum Excel General
    Replies: 7
    Last Post: 03-28-2012, 08:24 AM
  7. Generating a Lognormal distribution for a stock
    By msu4life in forum Excel General
    Replies: 1
    Last Post: 03-07-2012, 02:51 PM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1