I have a dataset with far more data points than I need, so I want to compress it by reducing it to about 1% of the original data points.
Is there any easy way to do this? (I'm a complete novice!)
Yes, but it depends on how simple the logic is for deciding whether a given row should remain or be deleted.
If you want, say, points 1, 101, 201, ..., then something like this, filled down:
=INDEX($A$2:$A$10000,ROWS($1:1)*100-99)
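To see why that picks every 100th point: ROWS($1:1) returns 1 in the first cell and grows by 1 each time the formula is filled down a row, so the index evaluates to 1, 101, 201, and so on. Assuming the formula goes in column B starting at B1 (any helper column will do), the fill-down looks like this:

    B1: =INDEX($A$2:$A$10000,ROWS($1:1)*100-99)   -> A2    (1st point)
    B2: =INDEX($A$2:$A$10000,ROWS($1:2)*100-99)   -> A102  (101st point)
    B3: =INDEX($A$2:$A$10000,ROWS($1:3)*100-99)   -> A202  (201st point)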
"Unless otherwise stated all my comments are directed at OP"
Mojito connoisseur and now happily retired
where does code go ?
look here
how to insert code
how to enter array formula
why use -- in sumproduct
recommended reading
wiki Mojito
how to say no convincingly
most important thing you need
Martin Wilson: SPV
and RSMBC
None of the described methods seem to work - martindwilson, could you expand on yours?
(Obviously I could delete the 99 rows manually, bit by bit - but it would take days to get through the data.)
What are your criteria for removing the rows?
I'm not sure I follow - I have rows like this:
111100 - 12
111150 - 12
111200 - 12
111250 - 12
Etc.
I basically just want to get rid of 99 out of every 100 rows (the data is far too detailed for my needs).
All the rows are different, so how do you determine what to keep and what to delete? There must be some criterion, otherwise your data won't mean much after the deletion.
Guys, I think there are no criteria for deleting rows: he simply wants a (random?) subset, keeping 1 in every 100 rows to reduce the list to 1% of its original length.
To the OP - Mr Wilson's solution won't delete any rows. If your data is in column A, put his formula in column B and copy it down: you'll get A1 in B1, A101 in B2, A201 in B3, etc., which gives you the compressed list without the need to delete any of your data. You can then copy this short list for whatever you need it for.
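If you would rather physically delete the surplus rows than build a second list, a short macro can do it in one pass. This is only a minimal sketch, not a posted solution; the procedure name is mine, and it assumes the data starts in row 1 of column A on the active sheet:

    Sub KeepEveryHundredth()
        Dim lastRow As Long, r As Long
        Dim doomed As Range
        With ActiveSheet
            lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
            For r = 1 To lastRow
                ' Keep rows 1, 101, 201, ...; collect the other 99 of each block
                If (r - 1) Mod 100 <> 0 Then
                    If doomed Is Nothing Then
                        Set doomed = .Rows(r)
                    Else
                        Set doomed = Union(doomed, .Rows(r))
                    End If
                End If
            Next r
        End With
        ' Deleting once at the end is much faster than deleting row by row
        If Not doomed Is Nothing Then doomed.Delete
    End Sub

Run it on a copy of the workbook first, since the deletion can't be undone once the file is saved.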
@ Richard Buttrey
Your solution sounds completely reasonable to me. Scale it according to the number of records and et voilà!
reduce list to smaller list using random selection.xlsm
Richard, that's a simple approach - I like it. Sometimes the simple approach is the best.
I got bored, so I wrote this. Enter the percentage in D1 (10% is a 90% reduction) and, using the dummy data in column A, it will return random (unique) values from the list in accordance with the percentage requested. The bigger the percentage (the lower the reduction), the longer it takes, because it searches randomly for unique rows and keeps looking whenever it hits a duplicate. That's not to say you won't get duplicate data - just not duplicate rows. Column B is the row number and column C is the corresponding data for the respective row in A.
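For anyone who can't open the attachment, the logic described amounts to something like the sketch below. The procedure name is mine, and it assumes D1 holds the percentage to keep as a percentage-formatted value (e.g. 10% stored as 0.1), the data sits in column A of the active sheet, and Excel is running on Windows (for the Scripting.Dictionary):

    Sub RandomSubset()
        Dim ws As Worksheet
        Dim picked As Object            ' dictionary used as a set of chosen row numbers
        Dim lastRow As Long, keepCount As Long
        Dim r As Long, nextOut As Long
        Set ws = ActiveSheet
        Set picked = CreateObject("Scripting.Dictionary")
        lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
        ' D1 holds the fraction of rows to keep, e.g. 10% -> keep 10% of the list
        keepCount = Application.WorksheetFunction.Max(1, Int(lastRow * ws.Range("D1").Value))
        Randomize
        nextOut = 1
        Do While picked.Count < keepCount
            r = Int(Rnd * lastRow) + 1                   ' random row from 1 to lastRow
            If Not picked.Exists(r) Then                 ' if it's a duplicate row, keep looking
                picked.Add r, True
                ws.Cells(nextOut, "B").Value = r         ' column B: the chosen row number
                ws.Cells(nextOut, "C").Value = ws.Cells(r, "A").Value ' column C: its data
                nextOut = nextOut + 1
            End If
        Loop
    End Sub

As the post says, duplicate values can still appear in column C if the source data itself repeats; only the row numbers in column B are guaranteed unique.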