eBay Arbitrage

Popular re-commerce sites perform eBay arbitrage. This page describes building a similar one in great detail from the ground up: from financial modeling and marketing to the web application and legal.

I'm not going to write about the phony-baloney eBay arbitrage that plenty of spammy web sites push, where you search manually or with a small program for misspelled items, buy them, and resell them properly listed. I'll describe how some very well performing companies have created a new market where they can purchase electronics cheaply and resell them on eBay (or other auction services). This ride will go into the details of building a startup from the ground up: how to create a financial model with over 1500 products scalable to limitless amounts, create a TOS with a lawyer who has little e-commerce experience, run shipping and selling operations, and build a spydering application, a data processing application, a database with over 100 tables, a website utilizing all the spydered data, automated email, and event triggering, all based on open source software. In short, this page describes re-commerce or electronics recycling sites, such as those pictured above, in great detail. How and why? Because I have spent many months building such a startup, from a blank hard drive to a working company, outside of the USA.


Taking advantage of a price difference between two markets and making a profit on that difference is arbitrage. Hence we need at least two markets and knowledge of prices in both. The challenge lies in discovering markets to arbitrage, or, the next best thing, creating a market! Which is exactly what re-commerce sites have done, with their selling points being that they are faster than auction services, offer a guaranteed price, recycle electronics responsibly and send you a pre-paid box. These sites buy cheap electronics from those that have no use for them and sell them with a significant markup on auction services. The markup is meant to cover the expenses of obtaining and reselling the electronics as well as bringing in a profit once the operation scales past break-even.

The Buy Cheap Market == The Lazy Consumer

eBay (and other such sites) has become a haven for online businesses as it provides an instant customer base; at the same time the individual seller is being pushed to the side, as he doesn't have the feedback rating or listing experience to compete with professionals. Businesses who invest time in proper listing on eBay gain the highest price for listed items, making selling less profitable for consumers and raising the entry barrier for new sellers. This motivates potential sellers to take the easy road and sell to a re-commerce site, which is essentially a middleman between you and eBay.

To sell high on eBay: take great quality pictures, write a truthful description, build up lots of positive feedback, accept a few payment and shipping methods and offer a satisfaction guarantee. The entire process from listing to having money in hand can take around two weeks: listings last 7 days, payment may take another 3, the buyer receives the item in another 3 if using priority mail, and if they are happy you keep the cash, or wait for PayPal to claw the money back if the customer is not. That is a lot of work if you are the average person trying to sell an old cell phone for $25. In a market where consumers have high buying power and a hectic lifestyle, the opportunity cost of listing an item is too high.

Offering a quick and easy service to buy back electronics is a tempting proposition to those with items to sell, and they will pay for this service as it is a considerable time saver. Others are simply ignorant of the actual market value of their equipment. This is the buy cheap market created by re-commerce services, which tapped into the closets and attics of the consumer.

The Sell High Market == Auctions

Placing an item in view of millions of willing buyers will no doubt bring the highest price, as demand is high. Purchasing from the buy cheap market and listing on sites like eBay enables businesses to sell high and use the earnings to drive the business. Total sales of cell phones and smart-phones on eBay, as of August 2010, were around $70 million, with iPods and MP3 players adding another $10 million. Adding other portable and home electronics such as GPS devices, TVs, game consoles, PCs, games and movies makes this a significant marketplace for selling. And that's only eBay; Amazon and others are also giant players.

Standing between the re-commerce sites and profit is the ability to squeeze the buy low consumer market for goods to list on online marketplaces. This requires a marketing strategy and promotional actions, along with the technology to scan and process data from these online marketplaces for the current price of each and every item being bought back.

The Profit

Profit comes from being able to sell the cheaply purchased products at a price that covers the operational expenses for the given item and brings a profit to the company once the business scales. However, this way of thinking about the price is backwards: the price is data/market driven, so you start with the maximum price that can be obtained for an item and work out the price that can be paid while still churning a profit. This is a delicate balancing game where the market has to be constantly monitored for changes, a pricing history kept to predict depreciation, and the refurbishing cost of each item known, along with marketing budgets and the costs of acquiring and selling the item, which include shipping, packaging, labor, selling fees etc.
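As a rough illustration of working backwards from the sale price, here is a sketch of such a calculation. All the numbers and cost categories are hypothetical, not the company's actual figures:

```python
def buyback_offer(expected_sale, fee_rate=0.10, shipping=8.0,
                  refurb=5.0, labor=4.0, margin_rate=0.15):
    """Work backwards from the predicted auction sale price to the
    maximum buy-back offer. All numbers here are illustrative."""
    net = expected_sale * (1 - fee_rate)       # after auction/listing fees
    net -= shipping + refurb + labor           # per-item operating costs
    offer = net - expected_sale * margin_rate  # reserve the target margin
    return max(round(offer, 2), 0.0)           # never offer a negative price

offer = buyback_offer(100.0)  # roughly $58 for a phone expected to sell at $100
```

In the real system every one of these parameters would come from the database: the expected sale price from crawled auction data, the costs from the financial model.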

Coming soon…

The Web Application

Re-commerce web application architecture: a high level view of the major modules that obtain and process pricing data, watch for event triggers and serve the site to the end user.

A re-commerce site is more complicated than a typical site as it requires an application that constantly collects and processes data from auction services. In addition, since this business receives and sends physical items, a system is needed to track what arrives; its condition and ID numbers need to be recorded and employee comments entered. Watching all of this is an application which reacts to changes in the state of physical items, for example sending an email when an item from a customer arrives or when payment for the item is processed.

The database is the center of the universe in this application: crawled data is inserted into it, later read by the data processing application and written into new tables. That processed data is then used to determine the pricing of electronics and for display on the web page. In addition, database triggers act as events which provoke emails to be sent or other actions to occur. Gearman plays an important role in tying together the independent modules of the application, communicating between the database triggers, Postfix, and the Python and PHP applications.

Below is a brief description of the major components of the web application, each will be discussed in much greater detail later.

  • Spyder: Written in Python, this application constantly scours auction services for completed auctions for items/electronics of interest. This data is stored in the DB in its original form and kept this way for future reference and for pricing history. Data is organized by item ID, so all data for a Nokia 6300 is kept under the same item ID. Since most auction services archive their data after a few weeks, the stored data is very valuable for future price predictions, holiday trends and depreciation rates. The amount of data gathered is significant, so proper table structure, efficient queries and re-indexing InnoDB after a large insert are necessary.
  • Spyder Crunch: An original prototype was developed in Matlab, but moving Matlab M code to a production environment equates to rewriting the entire application in another language. Spyder, a Python IDE, provides most of the same functionality as Matlab for graphing and number crunching, plus all the benefits of Python. This app connects to the DB holding the crawled data and processes the data for each item ID: removing outliers using the Grubbs test, low pass filtering, and computing an exponentially weighted mean to determine current market value, which is stored back in the DB.
  • MySQL/InnoDB x2: Two separate databases are used in the application for online and offline use. The offline database is used for insertion and computation of the data being crawled by the Spyder applications. The summarized data is then sent to the online database which also serves the web page. InnoDB is used as it is a relational engine providing foreign keys that allow the database to be more resilient to human mistakes.
  • Event Triggers: MySQL triggers signal changes in the database to Gearman workers which take action. These triggers are events which signal a state machine which consumes the events and possibly transitions an item from one state to another.
  • Gearman: When using several languages and applications, even on a distributed system, Gearman makes data exchange between the applications simple and fast. It's used to communicate between database triggers and the WatchDawg worker as well as the Postfix mail application.
  • WatchDawg Worker: The state machine which tracks physical items being sent from the customer and resold on an auction service or recycled. When trigger events occur they are consumed by the application, which determines what the action should be given the current state of the item. For example, if an item has been shipped to the business location it is in the “In transit” state. Once it arrives its bar code is scanned, which fires an event; this event is consumed and the item's state is advanced to “Arrived”.
  • Pricing API: MySQL functions which keep item pricing data consistent for all applications, as the algorithm generating the price lives in the database rather than in every application needing a price. This avoids issues where applications might be using old versions of the algorithm, and to access the price a simple SELECT query is all that's required.
  • Logzilla: Keeping track of thousands of electronics being sent to, sent from, or processed at the place of business is a necessary but daunting task. This application allows each and every item sold by a customer to be tracked individually, so once an item arrives, comments about its condition, serial numbers, and whether it agrees with the customer's appraisal can be entered.
  • Mail Worker: Sends all emails notifying customers about the status of their sold goods.
  • WWW: The front end application the customer sees when visiting the website. Displayed are all the items being bought and their prices, along with text, a company blog etc.
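The WatchDawg component described above can be sketched as a small table-driven state machine. The states and events below are simplified guesses at the real set, not the production transitions:

```python
# Minimal item state machine in the spirit of WatchDawg.
# States and events are illustrative, not the production set.
TRANSITIONS = {
    ('In transit', 'barcode_scanned'): 'Arrived',
    ('Arrived',    'inspected'):       'Appraised',
    ('Appraised',  'payment_sent'):    'Paid',
    ('Paid',       'listed'):          'On auction',
}

class Item:
    def __init__(self, item_id):
        self.item_id = item_id
        self.state = 'In transit'

    def consume(self, event):
        """Advance the state if the event is valid; ignore it otherwise."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

item = Item(42)
item.consume('barcode_scanned')  # item was scanned on arrival
```

In the real system the events would arrive through Gearman from database triggers rather than as direct method calls, but the consume-and-transition logic is the same.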
Based on Open Source Software

This entire project is based on open source software. Following is a list of the more important packages; it leaves out most of the web part, such as memcached, Lucene etc, because I will be writing about the back-end, which is totally independent of the way the obtained and processed data is displayed.

  • Python spyder: IDE for Python which is similar to Matlab but without the hassle of slow and awkward M code, with the flexibility to add packages and run as production software.
  • MySQL/InnoDB: Relational database with a great community and management apps. Joins are limited but can be performed with very long SQL statements, complex queries are slower than PostgreSQL which is a significant negative especially when dealing with many gigs of data. However, for this project the back-end data processing happened offline and speed was not an issue.
  • Gearman: Provided a convenient interface between the different parts of the system.
  • PHP Zend: Front end web app.
  • Postfix: Mailing server.
  • Apache: web server.
  • Rsync: Backup tool for storing of development code on remote servers in case of local tragedy.
  • SVN: Version control used for the project.
  • Tortoise SVN: Connecting from a Windows machine to an SVN repository is much easier with Tortoise.
  • phpmyadmin: Looking at data nicely formatted in a browser makes life easier. This app does not replace the MySQL command line, but it substitutes for it in 95% of all tasks. When working on a local network with PCs that the outside world has no access to, this is a safe application to use.
Coding Standard

Agreed upon syntax style:

Type             Example
Class            ClassName
Function         FunctionName
Variable         varName
Constant         CONST
DB Table         table_name
DB Table Column  colName

Class comment ('''):

    class Epoch:
        '''
        Epoch Time
        Functions that work with epoch time.
        Time zone is left as UTC and should be handled by the highest level app.
        A day is a 24 hour period according to UTC.
        '''
        secInDay = 60*60*24

Function comment ('''):

    def TimeToDay(self, time):
        '''
        Converts a unix epoch timestamp into a date,
        so 1-19-2009 8:07:01pm = 1-19-2009 00:00:00
        @time: the unix timestamp
        @returns: number of seconds in the date only.
        '''
        return int(time / self.secInDay) * self.secInDay

Line comment (#):

    # add instance variables
    self.myid += 1

Working on a large project requires coding and documentation standards, even if only 1 or 2 people are writing the application. What happens in a year when the application is cruising along, you've hired someone to maintain and grow the system, and suddenly something goes terribly wrong with the core app? With no documentation, coding standard, version control or profuse comments in the code, that 15 min problem just turned into a day of down time, which is a lot of money out the window. While some balk at my comment style, I think 1 line of code to 1 line of comments is an optimal balance. It's magnitudes easier to debug, and anyone new looking at it instantly understands what's happening. I don't understand why so many fail to write profuse amounts of comments; is it a desire to write cryptic code so the next guy won't understand it, or to not be able to understand what you wrote in a few weeks, never mind a year? Above is a short version of some of the naming conventions used in this project.


Since I collected data from an auction service similar to eBay, this part is applicable to most auction services, not just the eBay API. Let's jump into what's needed: a relational database and code to download and insert data into it. The type of database used depends on requirements. MySQL with MyISAM is great for a web application that mostly reads data rather than writes it. MySQL/InnoDB is slower but relational and transactional, which makes the database more bullet proof. PostgreSQL is more advanced than MySQL/InnoDB 5 as it actually implements outer/inner etc joins and performs better with complex queries. MySQL has a larger support community and more refined tools such as phpmyadmin, making it easier to work with. For more information read the following discussions: MySQL vs PostgreSQL, MySQL vs PostgreSQL Benchmarks, PostgreSQL & MySQL: Comparing Reliability and Speed, and Postgres vs. MySQL: Enterprise Suitability. I chose MySQL due to its strong community and developed apps, and because most potential employees were familiar with MySQL.

Object                     Encoding
DB Table                   utf8_unicode_ci / utf8_bin
Python MySQLdb connection  conn.set_character_set('utf8')
Python MySQLdb cursor      cursor.execute('SET NAMES utf8;')
                           cursor.execute('SET CHARACTER SET utf8;')
                           cursor.execute('SET character_set_connection=utf8;')

Database encoding – Nothing causes more frustration than improper use of encodings. Even enterprise applications such as auction service APIs get this wrong and mix cp1252 with unicode, most likely because someone on a windows machine manually edited the database data. It's not enough to set the database encoding; setting the connection and cursor encoding when connecting to a MySQL DB is important too. Above are the settings used for the project. Choosing utf8_unicode_ci or utf8_bin depends on whether comparisons will be performed alphabetically (the former) or ASCIIbetically (the latter). When using utf8_bin and comparing strings with MySQL, use the following method to convert the comparison to utf8_unicode_ci: SELECT * FROM `tbl` WHERE `col` COLLATE utf8_unicode_ci LIKE '%nokia%'

The Data – Spyder

As an engineer I think in numbers and facts, as data drives decisions. Thanks to the internet and to eBay (and other similar services) with an API that gives access to tons of data, it's possible to obtain very precise estimates of the actual market value of almost any item that sells with at least some frequency. After obtaining this data we can find interesting auctions, remove outliers, low pass filter to smooth out the sometimes high variability in price, and obtain the actual market value of the selected item. Instantly we know today's (and can predict tomorrow's) price for anything being sold on eBay. Using these prices we can create a financial model and determine what we can pay our customers for their electronics, and know precisely whether we are making a profit on the item, provided our financial model and software are detailed enough; just how detailed they should be I'll focus on in the next sections. Here I'll describe what data we should obtain and how.
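A toy version of that pipeline, with a simple z-score cutoff standing in for the Grubbs test used in the real application, and a basic exponentially weighted mean:

```python
from statistics import mean, stdev

def remove_outliers(prices, z=2.0):
    """Drop prices more than z standard deviations from the mean;
    a crude stand-in for the Grubbs test used in the real app."""
    if len(prices) < 3:
        return list(prices)
    m, s = mean(prices), stdev(prices)
    return [p for p in prices if s == 0 or abs(p - m) <= z * s]

def ewm(prices, alpha=0.3):
    """Exponentially weighted mean: recent auctions weigh more."""
    value = prices[0]
    for p in prices[1:]:
        value = alpha * p + (1 - alpha) * value
    return value

closed = [101.0, 99.0, 100.0, 350.0, 98.0, 102.0]  # 350 looks mispriced
market = ewm(remove_outliers(closed))              # roughly 100
```

The real Spyder Crunch also low pass filters the series and stores the result back into the DB; this sketch only shows the outlier removal and weighting steps.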

Some Auction Info Stored
* Final price
* Starting price
* Seller
* Seller rating
* Number of bids
* Winning bidder
* Shipping cost
* Type of shipping
* Entire item page
* Type of auction
* Time auction finished
* Item location
* Time up for auction
* Losing bidders

Data to acquire: Auctions provide all sorts of useful data we can use to determine prices. I gathered all the available data about an auction except pictures (though I kept the picture count). This makes sense for future data mining, even though current pricing is based on only a few variables. Future versions might use this data for a more robust pricing model, or to look at trends in what sells when and for what reasons. At a minimum, the short list above should be recorded about every item up for auction.

The Items Table (pre-spyder): Knowing what data about items we want to collect and having set up a database for it, we can start spydering an auction service or the web, except we do not yet have a place to store the list of items we want to search for. Thus the www_item table in the database. This table stores the ID numbers, picture locations, category type (cell phone, laptop, mp3…), manufacturer, model, series and other information about the items. This table is central to the entire application, as any information about an item stored in the DB is foreign keyed to the www_item table.
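To make the shape of www_item concrete, here is a guess at a minimal schema. The column names beyond those mentioned in the text are hypothetical, and sqlite3 stands in for MySQL/InnoDB purely for illustration:

```python
import sqlite3

# Illustrative schema only: the real www_item table has many more columns.
conn = sqlite3.connect(':memory:')
conn.executescript('''
CREATE TABLE www_item (
    itemID       INTEGER PRIMARY KEY,
    category     TEXT NOT NULL,   -- cell phone, laptop, mp3...
    manufacturer TEXT NOT NULL,
    model        TEXT NOT NULL,
    series       TEXT,
    picturePath  TEXT
);
-- Every other item table foreign keys back to www_item.
CREATE TABLE spy_itemsearch (
    itemID     INTEGER REFERENCES www_item(itemID),
    queryStr   TEXT NOT NULL,
    excludeStr TEXT
);
''')
conn.execute("INSERT INTO www_item VALUES (1, 'cell phone', 'Nokia', '6300', NULL, NULL)")
row = conn.execute("SELECT manufacturer, model FROM www_item").fetchone()
```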

Database for Storing Data and Managing Download Application

Filling this table is a web crawling task, as it's daunting to just list the model names of the over 1500 popular cell phones since 2000, not to mention mp3 players, game consoles or any other product category. This table should be filled only with information specific to the item model, as information on sales of the item fills other tables. The source of manufacturers and model names was an auction service, while pictures and dimensional information came from manufacturer web sites.

Auction Categories: Table spy_itemsearch holds query and exclude strings for each item. Say one of the items in the table is a Nokia N73, which has a close cousin, the Nokia N73 Music Edition. The search query would then be "Nokia N73" and the exclude string would be "Music Edition, ME, MusicEdition". This table thus lets us control what we search for and what we ignore based on the specifics of the service we are crawling. Depending on the auction service and the popularity of an item, items may receive their own category, making it trivial to determine whether an item is a Nokia 6300 or a Nokia 6303. If items are not finely categorized by the auction service, we have to create a more robust search/exclude list for every item in the www_item table.
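One plausible way to apply the query/exclude strings when deciding whether a crawled auction title belongs to an item. The matching rules here (case-insensitive substring match) are an assumption; the real service may do something more sophisticated:

```python
def matches_item(title, query, excludes):
    """True if the auction title contains the query string and none of
    the exclude strings. Naive case-insensitive substring matching:
    short excludes like 'ME' can misfire on unrelated words."""
    t = title.lower()
    if query.lower() not in t:
        return False
    return not any(x.strip().lower() in t for x in excludes)

excludes = "Music Edition, ME, MusicEdition".split(',')
matches_item("Nokia N73 unlocked, mint", "Nokia N73", excludes)        # True
matches_item("Nokia N73 Music Edition boxed", "Nokia N73", excludes)   # False
```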

The tables storing the data: The orange tables in the database diagram are a set for storing the data from one auction service. Since the data is downloaded using an API it is nicely structured, and the tables are created modeling the data structures passed from the API. Table spy_cats-list stores the category tree of the auction service, which lets searches be carried out under a specific category. For this particular service the categories are granular and drill down to the model name. Notice the table and column names have a minus (-) sign between words. This is an artifact of the auction service API naming convention and NOT a naming convention used by choice. It causes problems of its own in scripting languages, where spy_cats-list can be interpreted as spy_cats - list; this happened when using SOAPpy in Python, and I explain it in the section "Connecting to the API".

Connecting to the API: WSDL and SOAPpy

Due to the API requirements of the auction service, we need to control variable names when sending requests to the SOAP server. SOAPpy names variables automatically as v1, v2, v3, while the server requires names like varNameA, varNameB etc. Here is the code and the generated outgoing SOAP.

wsdlFile = ""
server = WSDL.Proxy(wsdlFile)
server.doGetCountries(1, h23dk99sd)

*** Outgoing SOAP ******************************************************



This is what the variable area should look like:


To force SOAPpy to send the variables with the correct names and in the correct order, we set argsOrdering and pass the dictionary as keyword arguments using the "**" syntax.

wsdlFile = ""
server = WSDL.Proxy(wsdlFile, namespace='ns1:doGetCountries')
server.soapproxy.config.argsOrdering = {
    'doGetCatsData': ['country-id', 'local-version', 'webapi-key'],
}

# API call
server.doGetCatsData(**{'country-id': COUNTRYID, 'local-version': 0, 'webapi-key': WEBAPIKEY})
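The reason the ** syntax is unavoidable here: hyphenated parameter names such as country-id are not valid Python identifiers, so they can only be supplied by unpacking a dict. A tiny self-contained demonstration (soap_call is a stand-in, not the SOAPpy API):

```python
def soap_call(**kwargs):
    """Stand-in for the SOAP proxy method: echoes its keyword args."""
    return kwargs

# soap_call(country-id=1)  # SyntaxError: hyphens are not legal in identifiers
args = soap_call(**{'country-id': 1, 'webapi-key': 'h23dk99sd'})
```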

Software Flow of Data Download (Spyder) App

Coming soon:
  • Download and spit out the data
  • Create tables and insert
  • Download code – threads, connection speed
  • Insert into DB code

Data Processing – Spyder Crunch

Matlab vs Spyder

Average Daily Price

SELECT endTime.`itemID` AS itemID, endTime.`it-ending-time` AS date,
       SUM(endTime.`it-price`) AS totalSales, COUNT(endTime.`it-id`) AS numAuctions
FROM (
    SELECT `itemID`, `it-price`, `it-id`,
           ((`it-ending-time` DIV (60*60*24))*(60*60*24)) AS `it-ending-time`
    FROM `spy_matview_sold_used`
    WHERE `it-ending-time` < %s
) AS endTime
GROUP BY endTime.`it-ending-time`, endTime.`itemID`
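The day-bucketing trick in that query, (t DIV 86400)*86400, floors an epoch timestamp to midnight UTC. The same aggregation can be tried with sqlite3 (whose / performs integer division on integers, standing in for MySQL's DIV); table and column names here are simplified:

```python
import sqlite3

DAY = 60 * 60 * 24
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE sold (itemID INT, price REAL, endTime INT)")
conn.executemany("INSERT INTO sold VALUES (?,?,?)", [
    (1, 100.0, 3 * DAY + 100),    # two sales on day 3
    (1, 110.0, 3 * DAY + 50000),
    (1,  95.0, 4 * DAY + 10),     # one sale on day 4
])
# Floor each timestamp to its day, then total the sales per day per item.
rows = conn.execute("""
    SELECT itemID, (endTime / ?) * ? AS day,
           SUM(price) AS totalSales, COUNT(*) AS numAuctions
    FROM sold GROUP BY day, itemID ORDER BY day
""", (DAY, DAY)).fetchall()
```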

Data Storage – The Database

Common Queries:

  • See what's new

    SELECT *
    FROM `spy_cats-listTmp` AS cl1
    LEFT JOIN `spy_cats-list` AS cl2 ON cl1.`cat-id` = cl2.`cat-id`
    WHERE cl2.`cat-id` IS NULL
  • See what's removed

    SELECT cl1.`cat-id`, cl1.`cat-name`, cl1.`cat-parent`
    FROM `spy_cats-list` AS cl1
    LEFT JOIN `spy_cats-listTmp` AS cl2 ON cl1.`cat-id` = cl2.`cat-id`
    WHERE cl2.`cat-id` IS NULL
  • Select the newest version of each category

    SELECT s.*
    FROM `spy_cats-list` s
    JOIN (SELECT `cat-id`, MAX(timeStamp) AS date
          FROM `spy_cats-list` si
          GROUP BY `cat-id`) lastEntry
    ON s.`cat-id` = lastEntry.`cat-id` AND s.timeStamp =

This takes a very long time: when building viewsoldused the query takes 26 min to execute. In Linux we can make the tmp directory RAM-backed. Running EXPLAIN EXTENDED and SHOW PROFILES indicates that "Copying to tmp table" is the longest operation.

| id | select_type | table            | type   | possible_keys                                     | key         | key_len | ref                            | rows   | filtered | Extra                        |
|  1 | PRIMARY     | <derived2>       | ALL    | NULL                                              | NULL        | NULL    | NULL                           |  65177 |   100.00 |                              |
|  2 | DERIVED     | item-attrib-list | ref    | it-id,attrib-name                                 | attrib-name | 767     |                                | 292864 |   100.00 | Using where; Using temporary |
|  2 | DERIVED     | iteminfo         | ref    | PRIMARY,it-quantity,it-bid-count,it-reserve-price | PRIMARY     | 4       | |      1 |   100.00 | Using where                  |
|  2 | DERIVED     | itemsearch       | eq_ref | PRIMARY                                           | PRIMARY     | 4       | allegro.iteminfo.itemID        |      1 |   100.00 | Distinct                     |
|  2 | DERIVED     | item-attrib-list | ref    | it-id,attrib-name                                 | it-id       | 4       |    |      2 |   100.00 | Using where; Distinct        |
5 rows in set, 1 warning (26 min 0.04 sec)

mysql> show profile;
| Status               | Duration   |
| starting             |   0.000045 |
| Opening tables       |   0.000367 |
| System lock          |   0.000002 |
| Table lock           |   0.000010 |
| checking permissions |   0.000001 |
| checking permissions |   0.000001 |
| checking permissions |   0.000002 |
| checking permissions |   0.000962 |
| optimizing           |   0.000016 |
| statistics           |   0.104559 |
| preparing            |   0.000024 |
| Creating tmp table   |   0.000999 |
| executing            |   0.000001 |
| Copying to tmp table | 999.999999 |
| Sending data         |   0.243706 |
| init                 |   0.000020 |
| optimizing           |   0.000002 |
| statistics           |   0.000008 |
| preparing            |   0.000004 |
| executing            |   0.008129 |
| executing            |   0.000031 |
| end                  |   0.000006 |
| removing tmp table   |   0.004816 |
| end                  |   0.000066 |
| query end            |   0.000002 |
| freeing items        |   0.000063 |
| removing tmp table   |   0.003653 |
| closing tables       |   0.000008 |
| logging slow query   |   0.000001 |
| logging slow query   |   0.000001 |
| cleaning up          |   0.000063 |
31 rows in set (0.01 sec)
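One way to act on the note above about making the tmp directory RAM-backed is pointing mysqld's tmpdir at a tmpfs mount. The mount point, size and config path below are examples, not the project's actual settings:

```shell
# Mount a RAM-backed tmpfs (many distros already provide /dev/shm)
mkdir -p /mnt/mysqltmp
mount -t tmpfs -o size=2G tmpfs /mnt/mysqltmp

# /etc/my.cnf -- have mysqld build its temporary tables on the tmpfs:
# [mysqld]
# tmpdir = /mnt/mysqltmp
```

Note that temporary tables larger than the tmpfs will fail rather than spill to disk, so the size needs headroom for the biggest query.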
You’ve Got Packages – Database Event Triggers

Order Number: We didn't use a hash function, since a hash can be reverse engineered; the safest approach is a completely random number, and the actual order number is displayed once the item is scanned anyway.
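A random, non-guessable order number along those lines, using Python's secrets module (the length is an arbitrary choice, not the project's):

```python
import secrets

def new_order_number(digits=10):
    """A completely random order number; unlike a hash of customer or
    item data, there is nothing here to reverse engineer."""
    return secrets.randbelow(10 ** digits)

order = new_order_number()
```

In practice the generated number would also be checked against existing orders in the DB to rule out the (tiny) chance of a collision.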

Moving Packages Around – WatchDawg

Item State Machine

The Essential Core – Pricing API