A Look Inside LinkedIn's Big Data Process
Sally Sadosky (Group Manager, Market Research) and Al Nevarez (Senior Manager, Business Analytics) led a thought-provoking presentation on how LinkedIn uses its pool of data to transform its product. As you can imagine, LinkedIn holds a vast amount of data on its members. As an added complexity, the product does not serve a single audience. The challenge is to use the data to optimize two sides of the same coin: B2C (members) and B2B (advertisers).
Sally Sadosky walked the audience through LinkedIn's ETL (extract, transform, load) process and how it ultimately feeds a single relational database. With the data in relational form, LinkedIn can ask very specific questions by slicing the data along any recorded attribute. This lets the team answer questions in the context of business needs and customer experience (e.g. “What is the satisfaction with our new messaging tool for members who had it enabled?”).
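To make the idea concrete, here is a minimal sketch of that kind of slice, not LinkedIn's actual pipeline. It assumes two hypothetical inputs (survey answers and behavior records keyed by a made-up `member_id`), uses an in-memory SQLite database as a stand-in for the relational store, and invents every table name, column name, and value for illustration.

```python
import sqlite3

# Hypothetical extracted data: survey answers and platform behavior per member.
survey_rows = [
    {"member_id": 1, "satisfaction": "4"},
    {"member_id": 2, "satisfaction": "2"},
    {"member_id": 3, "satisfaction": "5"},
    {"member_id": 4, "satisfaction": "3"},
]
behavior_rows = [
    {"member_id": 1, "messaging_enabled": "yes"},
    {"member_id": 2, "messaging_enabled": "no"},
    {"member_id": 3, "messaging_enabled": "yes"},
    {"member_id": 4, "messaging_enabled": "no"},
]


def build_database(survey, behavior):
    """Transform the raw rows and load them into one relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE survey (member_id INTEGER PRIMARY KEY, satisfaction INTEGER)"
    )
    conn.execute(
        "CREATE TABLE behavior (member_id INTEGER PRIMARY KEY, messaging_enabled INTEGER)"
    )
    # Transform: cast scores to integers and yes/no flags to 1/0 before loading.
    conn.executemany(
        "INSERT INTO survey VALUES (?, ?)",
        [(r["member_id"], int(r["satisfaction"])) for r in survey],
    )
    conn.executemany(
        "INSERT INTO behavior VALUES (?, ?)",
        [(r["member_id"], 1 if r["messaging_enabled"] == "yes" else 0) for r in behavior],
    )
    conn.commit()
    return conn


def avg_satisfaction_with_messaging(conn):
    """Average survey satisfaction among members who had messaging enabled."""
    row = conn.execute(
        """
        SELECT AVG(s.satisfaction)
        FROM survey AS s
        JOIN behavior AS b ON b.member_id = s.member_id
        WHERE b.messaging_enabled = 1
        """
    ).fetchone()
    return row[0]


conn = build_database(survey_rows, behavior_rows)
print(avg_satisfaction_with_messaging(conn))
```

The join is what makes the slice possible: the survey only has to ask for a satisfaction score, because whether messaging was enabled comes from the behavior table.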
One key advantage LinkedIn has is that it can keep its surveys very short: it already holds the behavioral data, so it knows what people are doing on the platform without having to ask.
The big data tools they use regularly provide capabilities such as:
- Low cost storage
- Unstructured data
- Highly scalable processing
- SQL-like query
- Query Hadoop data
- Massive result sets
- Advanced processing
- Advanced ETL
- Data flows