Welcome to Dream.In.Code
Become a C++ Expert!

Join 137,179 C++ Programmers for FREE! Get instant access to thousands of C++ experts, tutorials, code snippets, and more! There are 2,366 people online right now. Registration is fast and FREE... Join Now!




Access times: Arrays vs Vectors.

 
Reply to this topicStart new topic

Access times: Arrays vs Vectors., Is there a difference?

chaosTechnician
29 May, 2008 - 05:26 PM
Post #1

New D.I.C Head
*

Joined: 27 May, 2008
Posts: 20



Thanked: 1 times
My Contributions
The explanation: I'm working on a project for my CS class that has us reading data (integers) from three different files; sorting each one with an Insertion sort, a Shell sort, and a Quick sort; and displaying the processing times for each sort and each file. The files have 10, 10000, and 100000 records respectively.
Obviously, an insertion sort on 100000 integers will take some time. I initally wrote the program using arrays while I was testing my sorting algorithms. The insertion sort would take a few minutes (<5) and be done.
In order to make the program a bit more versatile, I've switched my arrays to vectors in case the grader decides to slip in unexpected file sizes when testing our programs. But now, the insertion sort has taken well over 20 minutes and still shows no sign of stopping.

My question is, are the access times for an Array and a Vector that different? Other than changing the parameters and adding a new variable to store a.size(), I haven't touched the algorithm. And it's taking far longer than it was before. My understanding is that there is very little difference between the two, especially if just referencing indices.

If it sheds any light, here's the code: (and, if it matters, I'm using Visual Studio 2005)
cpp
//Perform an Insertion sort on a Vector and time it
//Param: Vector to sort
//Post: Returns the number of clock ticks elapsed during sort
clock_t timedInsertionSort(vector<int> &a) {
clock_t start, end; // For tracking start and end times
unsigned int s = a.size();

start = clock(); // Start time
for(unsigned int i = 1; i < s; i++) { // Start at the second element (if there is one)
unsigned int j = i;
while(j > 0 && a[j] < a[j-1]) { // Move backward and push elements as necessary
swap(a[j], a[j-1]);
j--;
}
}
end = clock(); // Stop time

return end - start;
}

Is it a reference time issue? Is it some special case with this class I'm not aware of? Or something else entirely?
User is offlineProfile CardPM
+Quote Post

Cerolobo
RE: Access Times: Arrays Vs Vectors.
29 May, 2008 - 05:34 PM
Post #2

D.I.C Regular
Group Icon

Joined: 5 Apr, 2008
Posts: 440



Thanked: 31 times
My Contributions
Vector implementations depend on compiler, but they are all far slower then a standard array.

A vector is basically a automatically resizing array. There are usually a lot of additional checks done, in order to make vector safer. This usually includes thing like bounds checking, checking to see if the array needs to grow/shrink, ect... Again, the actual implementation differ based on the compiler.

Generally, when the vector is first created, the array is only meant to hold a few elements. When it fills up, another larger array will be crated, and the old data will be copied into the new array. Again, the implementations are compiler dependent, but he growth rate is typically between 1.5 and 2. As you can imagine, all of these grows will really slow you program down.

Generally, you'll get the best performance if you just throw everything into a array, and then preform a QuickSort on it.
User is offlineProfile CardPM
+Quote Post

chaosTechnician
RE: Access Times: Arrays Vs Vectors.
29 May, 2008 - 05:51 PM
Post #3

New D.I.C Head
*

Joined: 27 May, 2008
Posts: 20



Thanked: 1 times
My Contributions
Yeah, I understand the long overhead times when adding elements to it. But the vector is already filled at this point. Does the [] operator really take that much longer?
My newfound emperical evidence tells me I accept that it obviously must, but, really? huh.gif

Update: it finished after almost 50 minutes of sorting. That's a 10x increase over the array's time (with identical data and algorithm).

Maybe I'll just go back to the array.
User is offlineProfile CardPM
+Quote Post

Cerolobo
RE: Access Times: Arrays Vs Vectors.
29 May, 2008 - 06:00 PM
Post #4

D.I.C Regular
Group Icon

Joined: 5 Apr, 2008
Posts: 440



Thanked: 31 times
My Contributions
Yes, there is a lot of overhead by just access the element you want. Again, everything is compiler dependent. Generally, a bounds check will be preformed in operator[]. Well over half of the actual call is overhead, compared to just accessing a array. Plus, you may lose your locality of reference, since you are dealing with several different variables.
User is offlineProfile CardPM
+Quote Post

perfectly.insane
RE: Access Times: Arrays Vs Vectors.
29 May, 2008 - 06:52 PM
Post #5

D.I.C Addict
Group Icon

Joined: 22 Mar, 2008
Posts: 558



Thanked: 46 times
Dream Kudos: 25
Expert In: C/C++

My Contributions
STLPort's implementation is rather lightweight. I haven't investigated other implementations, but:

From _vector.h
cpp

template <class _Tp, _STLP_DEFAULT_ALLOCATOR_SELECT(_Tp) >
class vector : protected _STLP_PRIV _Vector_base<_Tp, _Alloc>
.....
{
....
typedef _Tp value_type;
typedef value_type* pointer;
typedef const value_type* const_pointer;
typedef value_type* iterator;
typedef const value_type* const_iterator;
....
pointer _M_start;
pointer _M_finish;


....
iterator begin() { return this->_M_start; }
const_iterator begin() const { return this->_M_start; }
iterator end() { return this->_M_finish; }
const_iterator end() const { return this->_M_finish; }
....
reference at(size_type __n) { _M_range_check(__n); return (*this)[__n]; }
....

reference operator[](size_type __n) { return *(begin() + __n); }
....

};


Here, the operator[] does not do bounds checking, unlike the at member. If you pre-allocate a vector (by giving it an int parameter when constructing it... vector var(20); (initial size is 20), there should be no significant difference between a vector and an array in terms of performance if the code is inlined. But yes, if you really care about performance, you should preallocate.

The library also provides a sort function.

#include <algorithm>

vector<int> myvec;


......

std::sort(myvec.begin(), myvec.end());

This uses the IntroSort algorithm, which is a QuickSort that changes to HeapSort if the recursion depth is too deep (as worst case performance can be bad for quicksort, depending on how the partitions are made).

This post has been edited by perfectly.insane: 29 May, 2008 - 06:54 PM
User is offlineProfile CardPM
+Quote Post

Cerolobo
RE: Access Times: Arrays Vs Vectors.
29 May, 2008 - 07:00 PM
Post #6

D.I.C Regular
Group Icon

Joined: 5 Apr, 2008
Posts: 440



Thanked: 31 times
My Contributions
You can also call reserve() to set the initial size of the vector.

From what I understood, he didn't know how many elements are in the file, so he could not do this. If you do know ahead of time, you might as well just allocate the array yourself.
User is offlineProfile CardPM
+Quote Post

perfectly.insane
RE: Access Times: Arrays Vs Vectors.
29 May, 2008 - 07:09 PM
Post #7

D.I.C Addict
Group Icon

Joined: 22 Mar, 2008
Posts: 558



Thanked: 46 times
Dream Kudos: 25
Expert In: C/C++

My Contributions
QUOTE
From what I understood, he didn't know how many elements are in the file, so he could not do this. If you do know ahead of time, you might as well just allocate the array yourself.


Absolutely. But I don't see what that has to do with comparing performance of a vector vs. an a manually managed array. The vector is replacing the work that you'd be otherwise doing yourself.
User is offlineProfile CardPM
+Quote Post

Cerolobo
RE: Access Times: Arrays Vs Vectors.
29 May, 2008 - 07:27 PM
Post #8

D.I.C Regular
Group Icon

Joined: 5 Apr, 2008
Posts: 440



Thanked: 31 times
My Contributions
A manually allocated array is still going to be faster then a vector. Yes, a vector requires less time to code, but you still have to deal with the additional overhead. Microsoft's implementation of the STL is not the best. A 10x difference in time really isn't worth it if you ask me. A manually allocated array will take far less then 30 minutes to write, at superior performance.
User is offlineProfile CardPM
+Quote Post

baavgai
RE: Access Times: Arrays Vs Vectors.
30 May, 2008 - 04:55 AM
Post #9

Dreaming Coder
Group Icon

Joined: 16 Oct, 2007
Posts: 2,047



Thanked: 106 times
Dream Kudos: 475
Expert In: C, C++, Java, C#, ASP.NET, PHP, Perl, Python, Oracle, SQL Server, MySql, HTML, JavaScript, Lua

My Contributions
Vectors are used because they can grow. You access time will, in general, suck compared to an array. On the other hand, arrays don't grow and are often subject to size (stack) limitations.

Given that you know your access time for a vector element will always underpreform an array, you'd reasonably try to minimize item requests from the collection. Curiously, your insertion sort is a worst case, because you're constanly asking for values. One simple change to your code will nearly double the speed. That is, holding the value being compared. Your constant swaping is a real performance hit.

cpp

// snagged from here: http://linux.wku.edu/~lamonml/algor/sort/insertion.html
// this one is pretty standard and can be found all over the place
void insertionSort(int numbers[], int array_size) {
int i, j, index;

for (i=1; i < array_size; i++) {
index = numbers[i];
j = i;
while ((j > 0) && (numbers[j-1] > index)) {
numbers[j] = numbers[j-1];
j--;
}
numbers[j] = index;
}
}


However, consider what an insertion sort is doing. It's moving all those elements starting at the bottom, just to make room for the next element. A vector suffers the exact same way an array does for this. However, if you used a "list" container instead, you could optimize how the elements are moved by simply inserting the current element into the correct position.

The template containers are very useful and should generally be prefer to arrays. If you have a specfic task, you probably can find a specific container to do the job.

User is online!Profile CardPM
+Quote Post

chaosTechnician
RE: Access Times: Arrays Vs Vectors.
30 May, 2008 - 07:00 AM
Post #10

New D.I.C Head
*

Joined: 27 May, 2008
Posts: 20



Thanked: 1 times
My Contributions
Well, I turned in the project last night and noticed your post just now (kept the vectors in case the grader decided to slip something unexpected in). The temporary variable vs swapping makes sense now that you've mentioned it.
Just out of curiosity I'll change the code to reflect your change and see how much time it saves.

UPDATE: I talked to the programmers where I work and they suggested I compile and run in Release mode instead of Debug mode. That dropped the processing time from 50 minutes to 30 seconds.
If anyone cared to know, here are various results on the list of 100,000 records:
Debug mode:
Using Swaps in Insertion sort:
Vector: 2433219 ticks
Array: 209594 ticks
Using baavgai's suggested change:
Vector: 1680359 ticks
Array: 18718 ticks
Release mode:
Using Swaps in Insertion sort:
Vector: 29610 ticks
Array: 7281 ticks
Using baavgai's suggested change:
Vector: 27781 ticks
Array: 4875 ticks

Looks like Debug mode was the biggest problem. But Vectors and swapping were definite contributors...

This post has been edited by chaosTechnician: 30 May, 2008 - 03:37 PM
User is offlineProfile CardPM
+Quote Post

perfectly.insane
RE: Access Times: Arrays Vs Vectors.
30 May, 2008 - 06:07 PM
Post #11

D.I.C Addict
Group Icon

Joined: 22 Mar, 2008
Posts: 558



Thanked: 46 times
Dream Kudos: 25
Expert In: C/C++

My Contributions
QUOTE(chaosTechnician @ 30 May, 2008 - 08:00 AM) *

UPDATE: I talked to the programmers where I work and they suggested I compile and run in Release mode instead of Debug mode. That dropped the processing time from 50 minutes to 30 seconds.


Yes... debug mode probably sets a definition that when the preprocessor processes the file that contains the vector implementation, bounds checking is included with accesses. I have done some of my own research, and I've come to the conclusion that the performance for vector isn't that great in some implementations. I have a real comparison of sorting on arrays vs. vectors.

This is for STLPort-5.1.4:
(compiled with -DNDEBUG and -D_UITHREADS)
CODE

Test                 N           int[]         vector<int>      Sorted?
==================== ==========  ============= =============    =======
std::sort                   100        0.03 ms       0.02 ms    Yes  
std::stable_sort            100        0.03 ms       0.02 ms    Yes  
quick_sort                  100        0.02 ms       0.02 ms    Yes  
std::sort                  1000        0.35 ms       0.30 ms    Yes  
std::stable_sort           1000        0.25 ms       0.23 ms    Yes  
quick_sort                 1000        0.22 ms       0.31 ms    Yes  
std::sort                 10000        4.24 ms       4.18 ms    Yes  
std::stable_sort          10000        3.07 ms       2.95 ms    Yes  
quick_sort                10000        2.64 ms       2.49 ms    Yes  
std::sort                100000       52.21 ms      52.02 ms    Yes  
std::stable_sort         100000       40.83 ms      40.81 ms    Yes  
quick_sort               100000       30.25 ms      30.33 ms    Yes  
std::sort               1000000      630.66 ms     618.89 ms    Yes  
std::stable_sort        1000000      485.84 ms     486.18 ms    Yes  
quick_sort              1000000      367.21 ms     349.81 ms    Yes  
std::sort              10000000     7735.34 ms    7793.53 ms    Yes  
std::stable_sort       10000000     5236.51 ms    5253.54 ms    Yes  
quick_sort             10000000     4081.56 ms    4081.99 ms    Yes


Here's libstdc++v3:
CODE

Test                 N           int[]         vector<int>      Sorted?
==================== ==========  ============= =============    =======
std::sort                   100        0.02 ms       0.03 ms    Yes  
std::stable_sort            100        0.03 ms       0.05 ms    Yes  
quick_sort                  100        0.02 ms       0.06 ms    Yes  
std::sort                  1000        0.17 ms       0.44 ms    Yes  
std::stable_sort           1000        0.18 ms       0.55 ms    Yes  
quick_sort                 1000        0.20 ms       0.76 ms    Yes  
std::sort                 10000        2.23 ms       5.29 ms    Yes  
std::stable_sort          10000        2.14 ms       6.65 ms    Yes  
quick_sort                10000        2.46 ms       9.13 ms    Yes  
std::sort                100000       26.66 ms      64.63 ms    Yes  
std::stable_sort         100000       31.80 ms      89.62 ms    Yes  
quick_sort               100000       29.47 ms     113.97 ms    Yes  
std::sort               1000000      320.32 ms     727.14 ms    Yes  
std::stable_sort        1000000      374.14 ms    1076.93 ms    Yes  
quick_sort              1000000      336.05 ms    1337.75 ms    Yes  
std::sort              10000000     3749.91 ms    8651.49 ms    Yes  
std::stable_sort       10000000     4038.01 ms   11621.53 ms    Yes  
quick_sort             10000000     3949.48 ms   14734.73 ms    Yes


libstdc++ returns a proxy object for vector<T>::begin (__normal_iterator), whereas STLPort returns a direct pointer to the buffer. So the difference here is expected.

I noticed that STLPort's std::sort algorithm is rather slow. quick_sort is my own implementation. It's not very sophisticated. Here's the benchmark code:

cpp

// Benchmark for std::vector....

#include <windows.h>
#include <iostream>
#include <iomanip>
#include <iterator>
#include <vector>
#include <algorithm>

template <typename Iterator>
Iterator rand_partition(Iterator b, Iterator e)
{
Iterator v = b + (rand() % ( e - b ));
Iterator oe = e;
typeof (*v) vv = *v;
typeof (*v) t;
*v = *b;
*b = vv;
v = b;
b = b + 1;
e = e - 1;

for(;;) {
while(*b < vv) { b++; if(b == oe) break; }
while(vv < *e) { e--; if(e == v) break; }
if(b >= e) break;
t = *b;
*b = *e;
*e = t;
b++; e--;
}

b--;
t = *b;
*b = *v;
*v = t;
return b;
}

template <typename Iterator>
void quick_sort(Iterator begin, Iterator end)
{
if(end - 1 <= begin) return;
Iterator i = rand_partition(begin, end);
quick_sort(begin, i);
quick_sort(i + 1, end);
}


template <typename Iterator, typename Function>
void fill_random(Iterator begin, Iterator end, Function f)
{
for(Iterator i = begin; i != end; i++)
*i = f();
}

struct quick_sort_test
{
template <typename InputIterator>
void operator()(InputIterator begin, InputIterator end)
{ quick_sort(begin, end); }
};

struct sort_test
{
template <typename InputIterator>
void operator()(InputIterator begin, InputIterator end)
{ std::sort(begin, end); }
};

struct stable_sort_test
{
template <typename InputIterator>
void operator()(InputIterator begin, InputIterator end)
{ std::stable_sort(begin, end); }
};

template <typename InputIterator>
bool is_sorted(InputIterator i1, InputIterator i2)
{
if(i2 - i1 <= 1) return true;
for(InputIterator i = i1; i != i2 - 1; i++)
if(*i > *(i + 1)) return false;
return true;
}


template <typename Function>
void perform_test(int N, Function f, const char* test, std::ostream& os)
{
using std::setw;
int* ar = new int[N];
std::vector<int> ar2(N);
LARGE_INTEGER start, end, start2, end2, freq;


fill_random(ar, ar + N, rand);
std::copy(ar, ar + N, ar2.begin());
QueryPerformanceFrequency(&freq);
QueryPerformanceCounter(&start);
f(ar, ar + N);
QueryPerformanceCounter(&end);
QueryPerformanceCounter(&start2);
f(ar2.begin(), ar2.end());
QueryPerformanceCounter(&end2);

os << setw(20) << std::setprecision(2) << std::fixed << std::left << test << std::right << " "
<< setw(10) << N << " "
<< setw(10) << 1000.0 * (end.QuadPart - start.QuadPart) / freq.QuadPart << " ms "
<< setw(10) << 1000.0 * (end2.QuadPart - start2.QuadPart) / freq.QuadPart << " ms "
<< setw(4) << std::left << (is_sorted(ar, ar + N) ? "Yes" : "No") << " " << std::endl;
delete [] ar;
}

int main()
{
using std::setw;
using std::setfill;
int N = 10000000;
SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);

std::cout
<< setw(20) << std::left << "Test" << " "
<< setw(10) << "N" << " "
<< setw(10) << "int[]" << " "
<< setw(10) << "vector<int>" << " "
<< setw(7) << "Sorted?" << std::endl;
std::cout
<< setw(20) << setfill('=') << "" << " "
<< setw(10) << setfill('=') << "" << " "
<< setw(13) << setfill('=') << "" << " "
<< setw(13) << setfill('=') << "" << " "
<< setw(7) << setfill('=') << "" << setfill(' ') << std::endl;

srand(time(0));
for(int n = 100; n <= N; n *= 10) {
perform_test(n, sort_test(), "std::sort", std::cout);
perform_test(n, stable_sort_test(), "std::stable_sort", std::cout);
perform_test(n, quick_sort_test(), "quick_sort", std::cout);
}
return 0;
}


Edit: These numbers should be a bit more accurate with the process priority turned up a bit (and not running 100 other processes at the same time helps also).

This post has been edited by perfectly.insane: 30 May, 2008 - 06:25 PM
User is offlineProfile CardPM
+Quote Post

Reply to this topicStart new topic
Time is now: 12/4/08 11:01AM

Live C++ Help!

C++ Tutorials

Reference Sheets

C++ Snippets

DIC Chatroom

Bye Bye Ads

Monthly Drawing

Thumb Drive

Top Contributors

Top 10 Kudos This Month