The data is aggregated into a
common platform for use in a range
of customer-focused data mining and
data analytics tools, Feinsmith said.
Meanwhile, eBay is using Hadoop
technology and the HBase database,
which supports real-time analysis of
Hadoop data, to build a new search
engine for its auction site.
Hugh Williams, vice president
of experience, search and platforms at eBay, said the new engine,
code-named Cassini, will replace
technology the company has used
since the early 2000s. The update
is needed in part to handle surging
volumes of data.
He noted that eBay has more
than 97 million active buyers and
sellers and over 200 million items
for sale in 50,000 categories. The
site handles close to 2 billion page
views, 250 million search queries
and tens of billions of database calls
daily, he added.
The company has 9 petabytes
of data stored on Hadoop and
Teradata clusters, and the amount
is growing quickly, he said.
Williams said about 100 eBay engineers are working on the Cassini
project, making it one of the company’s largest development efforts.
The new engine, slated to go live
next year, is expected to respond to
user queries with results that are
context-based and more accurate
than those provided by the current system, he said.
Feinsmith warned that IT shops interested in Hadoop should
be aware of potential security issues. And he explained that aggregating and storing data from multiple sources can create a slew
of problems related to access control and data management, while
raising questions about data entitlement and data ownership.
Feinsmith also listed other potential Hadoop drawbacks that
users should be aware of before embarking on big projects.
For instance, he said the Hadoop marketplace is “very confusing,” featuring an oft-changing slate of vendors, products and
standards. In addition, skilled Hadoop engineers are scarce.
And Williams noted that related technologies, such as HBase,
are still somewhat immature, which
raises questions about system stability.
But Hadoop has plenty of potential.
Feinsmith said that IT workers at JPMorgan Chase are debating whether relational database technologies will evolve to
meet the bank’s emerging big data needs,
or if Hadoop-based systems will become
adept at transaction processing.
Hadoop Is Ready for the Enterprise, IT Execs Say
Big companies are using Hadoop systems in big projects, despite concerns about issues such as security.
By Jaikumar Vijayan
DESPITE SOME LINGERING USER CONCERNS about security and other issues, Hadoop is ready for enterprise use, according to IT executives at the Hadoop World conference in New York earlier this month. Larry Feinsmith, managing director of IT at
JPMorgan Chase, told a keynote audience that the financial
services firm has been increasingly using the open-source storage
and data analysis framework for almost three years.
JPMorgan Chase still relies heavily on relational database
systems for transaction processing, but it uses Hadoop technology for a growing number of purposes, including fraud detection,
IT risk management and self service, Feinsmith said.
With over 150 petabytes of data stored
online, 30,000 databases and 3.5 billion
log-ins to user accounts, data is the lifeblood of JPMorgan Chase, Feinsmith said.
Hadoop’s ability to store vast volumes
of unstructured data allows the company
to collect and store Web logs, transaction data and social media data. “Hadoop
allows us to store data that we never
“Hadoop allows us to store data that we never stored before.”
LARRY FEINSMITH, MANAGING DIRECTOR OF IT, JPMORGAN CHASE