Data manipulation in VBA and Excel; it’s slow, real slow
Something that I’ve been thinking about a lot lately is how to get the best performance out of spreadsheets. My employer utilizes many spreadsheets for a wide variety of mass data manipulation; most of the time it’s simply moving information between cells and performing basic arithmetic. However, some of the spreadsheets they’re using are quite complicated and take a lot of time to be processed. Some of the spreadsheets they’re using are operated almost entirely by manual data manipulation, and in these instances, it can take up to a day to process the same amount of information that a script or other programmed system should be able to do in a matter of seconds.
Speaking specifically for a moment, we have a spreadsheet that contains some information about different clients. The first column contains their account number, something unique that we can key them by; however, it’s laid out in a not-so-intelligent manner. If a client happens to have, say, three rows of information, the first column lists their account number three times. So, each row has the unique account number in the first column, no matter the total number of rows.
I’ve been asked to write some code which can search through the information in the other columns for all of the clients contained in the spreadsheet, and then when a certain criteria is met, copy that clients’ cell range to another sheet. I figured that I would need to know the number of rows that the client had assigned to them (the number of columns is known), so I created an algorithm that figured out this number.
This is where problems started to arise. As in all previous Office versions, Office 2007 uses a cut-down version of Visual Basic, VBA, to support programmed functions. VBA doesn’t include a lot (or any) abstract datatypes like, for example, the humble ArrayList. “OK, that’s fine, I’ll use an easier way — another sheet!”, which of course made plenty of sense at the time. Why would I want to use an ArrayList when I can use another sheet in the workbook just like one? Good question, actually. The reason being is that using another sheet in the workbook is incredibly slow. The current dataset I’m working with has about 1300 rows of client data, made up of clients whose individual row count varies from one to thirty rows. On a quad-core Intel machine at 2.4Ghz, it takes approximately 30 seconds to run through only 1300 lines and output the clients’ ID and row count to a seperate sheet. Bearing in mind this isn’t even half of my searching algorithm, this is just the part to determine how large a data range I need to search.
I am going to assume this incredible slowness is due in part to either VBA itself or the fact that I have to perform about 800 data entry commands into the results sheet; an operation I’m guessing isn’t particularly optimised due to the fact that it’s assumed under normal circumstances the data entry would be happening at human typing speeds.
Now, obviously there are a few issues with my approach. I’m still thinking about how I can come up with a more efficient way to perform this kind of data manipulation. I’m thinking that there needs to be some kind of integration of the .NET platform into Office; but that’s rather a blunt (and far-off) way to solve my immediate problems of algorithm, programming language and data entry inefficiencies.
So, I ask the readers, how would you approach this situation?
Update: I’ve solicited the assistance of the good people of the OCAU graphics and programming forum, and they have provided me with some alternative algorithms, one neat and simple one making use of the Windows.Scripting dictionary datatype which allows the storage of (key,value) pairs — in my case, the key being the id number, and the value being the count. However, while it takes <1 second to generate this dictionary, I can see now the other 29 seconds is caused by the entering of the data into the results spreadsheet. The thing is, I don’t really need to enter this data into the results sheet. I can just use the dictionary.
About this entry
You’re currently reading “Data manipulation in VBA and Excel; it’s slow, real slow,” an entry on if it's owən
- Published:
- 04.02.08 / 10am
- Category:
- Blog












5 Comments
Jump to comment form | comments rss [?] | trackback uri [?]