+ Reply to Thread
Results 1 to 3 of 3

Data Transformation

  1. #1
    Registered User
    Join Date
    08-28-2004
    Posts
    1

    Data Transformation

    Hi All,
    I have a data set with variables id, a, b, c, d and looks like:


    1 2 3 5 1
    1 5 6 5 3
    1 2 4 2 4
    2 1 3 4 5
    2 5 6 2 3
    2 4 5 2 1


    I would like to summarize data for each id and create a new variable
    representing each code occuring in variables a, b and c.


    Essentially the new data set should have variables id, v1 - v6.
    v1 represents the number of times code 1 occurs in variables a, b and c
    in the original data set, and so on for v2 - v6.


    The data set would like;
    1 1 3 2 2 3 1
    2 2 2 2 2 1 1


    Your help is much appreciated.


    Thanks very much,
    Zai Saki

  2. #2
    Tom Ogilvy
    Guest

    Re: Data Transformation

    Sub AB()
    Dim rng As Range
    Dim cell As Range
    Dim arr() As Long
    Dim ub As Long
    Set rng = Range("A1").CurrentRegion.Resize(, 4).Cells
    ub = Application.Max(rng.Columns(1))
    ReDim arr(1 To ub, 0 To 6)
    For Each cell In rng
    If cell.Column = 1 Then
    Id = cell.Value
    arr(Id, 0) = Id
    Else
    arr(Id, cell.Value) = arr(Id, cell.Value) + 1
    End If
    Next
    Range("H1").Resize(ub, 7).Value = arr

    End Sub

    --
    Regards,
    Tom Ogilvy




    "zaisaki" <[email protected]> wrote in
    message news:[email protected]...
    >
    > Hi All,
    > I have a data set with variables id, a, b, c, d and looks like:
    >
    >
    > 1 2 3 5 1
    > 1 5 6 5 3
    > 1 2 4 2 4
    > 2 1 3 4 5
    > 2 5 6 2 3
    > 2 4 5 2 1
    >
    >
    > I would like to summarize data for each id and create a new variable
    > representing each code occuring in variables a, b and c.
    >
    >
    > Essentially the new data set should have variables id, v1 - v6.
    > v1 represents the number of times code 1 occurs in variables a, b and c
    >
    > in the original data set, and so on for v2 - v6.
    >
    >
    > The data set would like;
    > 1 1 3 2 2 3 1
    > 2 2 2 2 2 1 1
    >
    >
    > Your help is much appreciated.
    >
    >
    > Thanks very much,
    > Zai Saki
    >
    >
    > --
    > zaisaki
    > ------------------------------------------------------------------------
    > zaisaki's Profile:

    http://www.excelforum.com/member.php...o&userid=13755
    > View this thread: http://www.excelforum.com/showthread...hreadid=480390
    >




  3. #3
    Harlan Grove
    Guest

    Re: Data Transformation

    "zaisaki" <[email protected]> wrote...
    >I have a data set with variables id, a, b, c, d and looks like:
    >
    >1 2 3 5 1
    >1 5 6 5 3
    >1 2 4 2 4
    >2 1 3 4 5
    >2 5 6 2 3
    >2 4 5 2 1
    >
    >I would like to summarize data for each id and create a new variable
    >representing each code occuring in variables a, b and c.
    >
    >
    >Essentially the new data set should have variables id, v1 - v6.
    >v1 represents the number of times code 1 occurs in variables a, b and c
    >
    >in the original data set, and so on for v2 - v6.
    >
    >The data set would like;
    >1 1 3 2 2 3 1
    >2 2 2 2 2 1 1

    ....

    Given your source data, neither your 1st or 2nd result rows are consistent
    with counting only fields a-c. The 1st row is consistent with counting
    fields a-d, the 2nd isn't consistent with any set of fields since the 2nd
    through 7th columns total 10, which isn't a multiple of 3 (the number of
    rows in your source data with ID=2). If you meant to count fields a-d, your
    second row should be

    2 2 2 2 2 3 1

    You could do this entirely with formulas, or a pivot table if you add field
    names. As for formulas, if your source data was in a range named TBL, and
    the top-left result cell were G5, try

    G5:
    =INDEX(TBL,1,1)

    G6 [array formula]:
    =INDEX(TBL,MATCH(0,COUNTIF(G$5:G5,INDEX(TBL,0,1)),0),1)

    H5:M5 [array formula]:
    =MMULT(COLUMN(INDIRECT("RC1:RC"&COUNTIF(INDEX(TBL,0,1),$G5),0))^0,
    COUNTIF(OFFSET(TBL,SMALL(IF(INDEX(TBL,0,1)=$G5,ROW(TBL)),
    ROW(INDIRECT("1:"&COUNTIF(INDEX(TBL,0,1),$G5))))-ROW(INDEX(TBL,1,1)),1,1,
    COLUMNS(TBL)-1),{1,2,3,4,5,6}))

    Select H5:M5 and fill down into H6:M6. Then select G6:M6 and fill down as
    needed.

    An alternative VBA approach that ignores values in fields a-d that aren't
    integers between 1 and 6 would be


    Sub foo()
    'requires reference to Microsoft Scripting Runtime

    Dim dk As New Dictionary, dv As New Dictionary
    Dim sv As Variant, rv As Variant
    Dim i As Long, j As Long, k As Long, x As Variant

    k = 0
    'modify next line as needed
    For Each x In Array(1, 2, 3, 4, 5, 6)
    k = k + 1
    dv.Add Key:=x, Item:=k
    Next x

    k = 16
    ReDim rv(0 To dv.Count, 1 To k)

    sv = Range("A1").CurrentRegion.Value

    For i = 1 To UBound(sv, 1)
    If Not dk.Exists(sv(i, 1)) Then
    dk.Add Key:=sv(i, 1), Item:=dk.Count + 1
    rv(0, dk.Count) = sv(i, 1)
    End If

    If dk.Count >= k Then
    k = 2 * k
    ReDim Preserve rv(0 To dv.Count, 1 To k)
    End If

    For j = 2 To UBound(sv, 2)
    If dv.Exists(sv(i, j)) Then _
    rv(dv(sv(i, j)), dk(sv(i, 1))) = rv(dv(sv(i, j)), dk(sv(i, 1))) + 1
    Next j
    Next i

    ReDim Preserve rv(0 To dv.Count, 1 To dk.Count)

    ActiveCell.Resize(dk.Count, dv.Count + 1).Value = _
    Application.WorksheetFunction.Transpose(rv)

    End Sub



+ Reply to Thread

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts

Search Engine Friendly URLs by vBSEO 3.6.0 RC 1