Data Clustering? Don’t Worry About the Algorithm.

We are constantly pushing to improve our underlying algorithms and make them as adaptive as possible. Taking a step back, our problem generally is to fit classes of models and algorithms to customer data sets of varying data quality. In addition, we need to automate this so that we can scale delivery of our offerings from a business perspective.

This high-level business goal boils down to a number of technical requirements. It means we need to find ways of automatically evaluating results based on customer data and adaptations, and we need to do this in many different contexts.

One of our engineers, Gianmario Spacagna[1], took a fresh look at how to tune clustering algorithms. In this blog post, I will briefly introduce validation of clustering algorithms so that you can more easily appreciate and understand Gianmario’s upcoming blog post.

The American writer Mark Russell once said, “The scientific theory I like best is that the rings of Saturn are composed entirely of lost airline luggage.” I believe many developers and marketers feel just as lost when exposed to some supposedly great clustering. It is very difficult to evaluate a supposedly optimal data clustering without understanding some basic concepts about how to evaluate and dissect clustering results. Without at least apprentice-level knowledge of cluster validation, confirming or disproving an assertion of optimal data clustering becomes more of a leap of faith than a reasoned judgment. I will attempt to give you enough information to ask the right questions.

Clustering involves grouping individuals in a population so that individuals in the same cluster share some properties of interest. Algorithmically, the goal of a clustering algorithm is to minimize or maximize some function related to the individuals in the same cluster. We call this the objective function. Frequently, we think of it as some type of distance between the individuals in a population. For example, let’s say we group customers based on their lifetime value. Once we have predicted the lifetime value for each customer, we can group customers based on these values and potentially other values such as gender and age. In addition to formulating an objective function, it is key that you pick the appropriate values related to the individuals to consider in your clustering. These values are often called the features.
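To make the idea of an objective function concrete, here is a minimal Python sketch. It assumes a single feature (predicted lifetime value) and uses the within-cluster sum of squared distances as the objective. The customer values and the function name are made up for illustration, and this is not necessarily the objective used in Gianmario’s work.

```python
from statistics import mean


def within_cluster_sum_of_squares(clusters):
    """Sum of squared distances from each value to the mean of its cluster."""
    total = 0.0
    for values in clusters:
        center = mean(values)
        total += sum((v - center) ** 2 for v in values)
    return total


# Hypothetical predicted lifetime values (in dollars) for six customers,
# grouped in two different ways.
tight = [[100.0, 110.0, 120.0], [900.0, 1000.0, 1100.0]]
mixed = [[100.0, 110.0, 1100.0], [120.0, 900.0, 1000.0]]

tight_score = within_cluster_sum_of_squares(tight)
mixed_score = within_cluster_sum_of_squares(mixed)
print(tight_score, mixed_score)
```

A clustering algorithm that minimizes this objective would prefer the first grouping, because its score is much lower than that of the second.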

There are many different algorithms for actually performing the clustering. Clustering algorithms are potentially very processing-intensive, and the complexity grows quickly with the size of the data set. Each algorithm attempts to deal with the complexity in a different way. We will not delve into these methodological details, but if you are looking for a detailed technical survey of them, you can start with the paper by Jain et al.[2]

One common technique in clustering is to use some kind of heuristic that allows the algorithm to reduce the number of calculations. As an example, the k-means algorithm guesses clusters as a starting point. It then uses this guess to reduce the computations, while hoping to converge on an adequate clustering by iterating and moving individuals between clusters as the structure evolves.
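The guess-and-iterate idea can be sketched in a few lines of Python. This is a simplified, one-dimensional version of the standard k-means iteration, written for illustration only: the function name, the sample values and the initial guesses are assumptions, and production implementations handle initialization and empty clusters more carefully.

```python
def k_means_1d(values, initial_centers, max_iterations=100):
    """Simplified k-means on one-dimensional data, starting from guessed centers."""
    centers = list(initial_centers)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        if new_assignment == assignment:
            break  # No individual moved, so the clustering has stabilized.
        assignment = new_assignment
        # Update step: move each center to the mean of its current members.
        for k in range(len(centers)):
            members = [v for v, a in zip(values, assignment) if a == k]
            if members:  # Keep the old center if a cluster ends up empty.
                centers[k] = sum(members) / len(members)
    return centers, assignment


# Hypothetical lifetime values and deliberately poor initial guesses.
values = [100.0, 110.0, 120.0, 900.0, 1000.0, 1100.0]
centers, assignment = k_means_1d(values, initial_centers=[0.0, 200.0])
print(centers, assignment)
```

Each pass only compares every value with the current centers rather than with every other value, which is where the savings come from. The price is that the result depends on the initial guess.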

The key question I’d like to explore is how we can determine if a clustering is valid or not. The procedure of evaluating the result of a clustering algorithm is called cluster validity. There are three different perspectives on cluster validation: external, internal and relative. Relative validation compares clusterings produced by the same algorithm under different parameter settings; the rest of this post focuses on external and internal validation. External validation involves comparing the result to some external pre-defined structure that reflects an intuition about what should characterize a cluster. It could be as simple as eyeballing the clusters or it could be more sophisticated, using pre-labeled data.

Internal validation refers to some internal properties of the clustering. Compactness is one criterion, and it requires that the individuals in a cluster be as close to each other as possible given an objective function. Separation is another criterion, and it requires that the clusters themselves be as clearly separated from each other as possible given an objective function.
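As a rough illustration of these two criteria, the Python sketch below measures compactness as the mean pairwise distance inside a cluster and separation as the smallest distance between members of two clusters. These particular definitions, the function names and the sample values are assumptions chosen for simplicity; many other definitions are in use.

```python
from itertools import combinations
from statistics import mean


def compactness(cluster):
    """Mean pairwise distance inside one cluster (smaller means more compact).

    Assumes the cluster has at least two members.
    """
    return mean(abs(a - b) for a, b in combinations(cluster, 2))


def separation(cluster_a, cluster_b):
    """Smallest distance between members of two clusters (larger is better)."""
    return min(abs(a - b) for a in cluster_a for b in cluster_b)


# Hypothetical lifetime values for two clusters of customers.
low = [100.0, 110.0, 120.0]
high = [900.0, 1000.0, 1100.0]

print(compactness(low), compactness(high), separation(low, high))
```

A clustering looks good from an internal perspective when the compactness values are small relative to the separation between clusters.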

Gianmario employs a few different techniques in his internal evaluation, including AIC, Davies-Bouldin and Silhouette. For external validation, he uses the adjusted Rand index, which is a statistical measure of the similarity between two clusterings. He uses it to compare clustering results with previously known clusters determined to be of a certain quality. This way, we achieved an external validation of clustering results.
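For readers who want to see what an external measure looks like, here is a small Python sketch of the adjusted Rand index using its standard pair-counting formula. It is an independent illustration, not the code from Gianmario’s thesis, and the labels are invented.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same individuals."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same individuals")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # Fewer than two individuals: nothing to disagree about.
    # Pairs of individuals placed together by both labelings.
    together_in_both = sum(
        comb(count, 2) for count in Counter(zip(labels_a, labels_b)).values()
    )
    pairs_in_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    pairs_in_b = sum(comb(count, 2) for count in Counter(labels_b).values())
    expected = pairs_in_a * pairs_in_b / total_pairs  # Agreement expected by chance.
    maximum = (pairs_in_a + pairs_in_b) / 2
    if maximum == expected:
        return 1.0  # Only happens when the two partitions are identical and trivial.
    return (together_in_both - expected) / (maximum - expected)


# Hypothetical known segments versus two clustering results.
known = ["a", "a", "a", "b", "b", "b"]
good = [0, 0, 0, 1, 1, 1]
poor = [0, 1, 0, 1, 0, 1]

print(adjusted_rand_index(known, good), adjusted_rand_index(known, poor))
```

A score of 1 means the clustering reproduces the known grouping exactly (cluster names do not matter), while scores near or below 0 mean the agreement is no better than chance.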

So, when presented with a clustering, don’t worry too much about the algorithm. Instead ask:

  1. What evaluation criteria were used in validating the clusters? Were they internal or external?
  2. What was the objective function of the algorithms?
  3. What properties were considered for internal validation, and what were the results?
  4. If external validation was done, what pre-defined structure or intuition was used?

    If you are curious about the algorithm, ask what assumptions or heuristics it uses. Each scalable algorithm makes some assumptions; the question is whether they make sense for your problem. We will expand on this in a future blog post.

    During 2012, Gianmario Spacagna looked into auto-tuning of clustering. He titled his thesis “TunUp: A Distributed Cloud-based Genetic Evolutionary Tuning for Data Clustering.” Gianmario was a student at the Polytechnic of Turin and the Royal Institute of Technology in Stockholm. He will provide a more detailed summary of his work in an upcoming blog post.

    In the meantime, if you would like more information, you can read Gianmario’s thesis presentation, which he posted here: http://www.slideshare.net/gmspacagna. He and AgilOne also published the source code on GitHub if you are interested in looking into it in more detail: https://github.com/gm-spacagna/tunup.

     


    [1] Gianmario Spacagna. TunUp: A Distributed Cloud-based Genetic Evolutionary Tuning for Data Clustering. Master’s thesis, Royal Institute of Technology, Stockholm, Sweden. March 2013.

    [2] A.K. Jain, M.N. Murty and P.J. Flynn. Data Clustering: A Review. ACM Computing Surveys. Vol. 31, No. 3. September 1999.

    Consulting the Braintrust: Wharton’s Research Center for Customer Analytics

    Are you looking for answers, and not getting help from your internal analytics and business intelligence team? Is there a fundamental question that even your company’s biggest data nerd can’t answer?

    Well, if you present a problem to Wharton’s Research Center for Customer Analytics, you won’t get an answer right away, but you’ll be guaranteed maximum brainpower and a decidedly objective view of the issue.

    As pointed out by Stephanie Overby of DataInformed, Wharton’s Research Center for Customer Analytics offers a crowdsourcing-like approach to solving data-centric business problems. The “crowd,” though, is limited to academic experts in customer data analysis, so you will have a longer wait for an answer.

    Launched by veteran marketing professors, Dr. Eric T. Bradlow and Dr. Peter S. Fader, the center has six dedicated staff members, a small team of research assistants and more than 1,100 scholars in its research network.

    Research from Around the Globe

    Here’s how it works: In exchange for providing access to a customer data set and making a financial contribution toward the center’s operating costs, Wharton’s Research Center develops a research project addressing a problem suggested by the sponsor and presents it to prospective researchers from academic institutions around the globe who then submit proposals. The sponsoring company selects six proposals and each research team devotes a year to a solution.

    The Waiting is the Hardest Part

    Overby quoted Tom Thomas, executive director of marketing intelligence for digital ad agency Organic, which leveraged its parent company Omnicom’s sponsorship with the center to help take the agency’s analytics to the next level, as saying: “Predicting, indexing, and more dynamic measurement of online consumer behavior requires advanced mathematical, econometric and statistical skills. It’s hard to find this combination of skills in other agencies or consultancies.”

    For those marketers with time on their side, Wharton’s Research Center for Customer Analytics may be worth the wait. Visit its Lifelong Learning webpage for free video lectures by Wharton faculty members.

    Of course, organizations looking for quicker solutions to their marketing analytics problems can check out www.agilone.com and request a demo of our easy-to-use “data scientist in the cloud” solution.

     

    What’s More Important: Brand Marketing or Predictive Marketing?

    For decades, marketers have debated whether marketing is more art than science or vice versa.

    Today that question couldn’t be more relevant when discussing the virtues of predictive analytics versus brand experience. An article on Content Equals Money points out the dichotomy between the two, and asks whether the collection of “1’s and 0’s” should win out over observation of human interaction.

    Predictive Models Result in Better Marketing Decisions

    The article makes good cases for both sides. Predictive analytics allows companies to make better decisions on what marketing content to run, where to run it, and who to target it to.
    Predictive analytics also helps suggest products customers might like, figures out what devices to optimize your campaign for, and devises the most effective calls to action for driving sales.

    But Emotions Play a Huge Part in Marketing

    Nonetheless, there’s a lot to be said for human interaction. Buying a product or spending money on a service is no longer a simple transaction based on a consumer seeing an ad, deciding to fork over some dollars, and getting a physical product in return.

    Thanks in large part to social networks, the brand experience is a huge component of gaining loyal customers. Predictive analytics simply can’t account for a customer’s emotional response to a video that’s gone viral.

    Predictive Analytics Drive Brand Experience

    The article concludes that predictive analytics and the understanding of the brand experience should both be employed for maximum effect. I agree that the answer is both-and, not either-or. In fact, predictive analytics can help illuminate the brand experience.

    Work we did at AgilOne for a market intelligence company bears this out. With a growing business, the company was looking for a better way to deepen engagement with its clients. Through predictive analytics, AgilOne developed a system that provided clients with timely and engaging content, customized specifically for them.

    If that’s not enhancing the brand experience, I don’t know what is.

    (Check out case studies on how AgilOne’s cloud-based predictive marketing has helped clients better understand their customers and lift the business bottom line.)

    Big Data versus Right Data: The Customer Intelligence Revolution

    If you’re like me, Forrester’s new report by analyst Fatemeh Khatibloo, Navigating the Future of Customer Intelligence, is sure to light a fire under you, if not several fires. Customers are completely changing the game, and if we don’t change with them, we will miss key opportunities and our businesses may not even survive the disruption we all find ourselves in.

    We are seeing massive shifts in consumer behavior. They are mobile, they are social, and they are impatient. We have to engage them on their terms, across channels. We have to leverage the data we have about them: not just big data, but what Forrester calls “right data.”

    On top of this trend, we have the flood of information coming at us from sensors of all kinds. Customers are even willing to wear some of these sensors 24/7 to help them improve their health (Nike FuelBand, Fitbit). But even setting aside these types of leading-edge consumers, we still need to find in all that data the right data that provides us with customer intelligence to drive engagement.

    We have to overhaul how we think about recognition and identification of customers to tailor our messages. To quote Forrester, “Marketers need strategies and systems in place to follow their consumers as they flit from one channel to the next. CI professionals need to consider recognition – knowing who an individual actually is – together with broader customer context – what we can know about them, all in a privacy-compliant way.”

    And we all know that there’s a sea of big data ready to help us, but how, in all of that, can we find the right data? Resources to deal with those exabytes of data are still beyond many of us. This is the elephant in the room when it comes to big data. But to be honest, this insight from the report did not discourage me at all. Although Forrester said, “the resources required to make those exabytes of data meaningful – data scientists, decisioning engines, and text and natural language processing tools – are still out of reach for many organizations,” here at AgilOne we have taken the power of machine learning and PhD analytics and packaged it for marketers just like you, and we stand ready to help you.

    You don’t have to take it from me. You can download Forrester’s report and read it in its entirety.

    Is Your Email Marketing Optimized?

    I’m very excited to share this Email Marketing Scorecard with you, a piece by leading Forrester analyst Shar VanBoskirk. This 15-page report is packed with tips to help you evaluate your email marketing programs and improve your use of email as a marketing optimization tool. The email marketing game is always changing, and Forrester continues to evolve its methodology so you can make your campaigns more effective, incorporating criteria about how easy it is to share email with friends and colleagues, how engaging email is on mobile devices, and how using interactive elements such as video can help.

    Forrester’s scorecard helps you evaluate your email marketing program from both a business process and a user experience perspective, since both dimensions are key, and offers recommendations for gauging your program’s performance.

    Here are a few highlights. Shar suggests expanding your analytics criteria. Typical analytics criteria include clicks, opens, and conversions. These are important, but Shar suggests supplementing them with more advanced analytics, creating a framework to determine the business value of email. The report states that “the best practice here is to measure email’s influence on a long-term customer loyalty and profitability.”

    You’ll find high-level information about topics like governance as well as key marketing topics like segmentation. There are also important down-to-earth tips, such as making sure that the header invites opening. With the volume of email coming in, a tantalizing subject line is more important than ever.

    You’ll also find gems like the following: “Successful email marketers combine preference-based segmentation with behavioral data; incorporate advanced email analytics with overall marketing analytics to optimize multichannel campaigns; and leverage long-term, profit-based metrics to benchmark their email programs.”

    I found this report thought-provoking and helpful in my own work here at AgilOne and I hope you find it just as valuable.

    It’s not the last click; it’s the social activation that matters

    We all like to be liked. But obsessing over Facebook “likes” and last-click attribution does not a social media marketing strategy make, nor is it any kind of litmus test for customer lifetime value. An eMarketer.com survey stated that 60 percent of marketers used “the number of people linking as friends, followers, or placing ‘likes’” on a brand’s Facebook page as the means of measuring social media marketing success. But according to Moontoast, only 20 percent of marketers found quantifiable ROI from social media. Coincidence? Probably not. Revenue attribution is notoriously hard and last-touch attribution leaves us with an incomplete set of clues.

    In “The Importance of Social Activation and Going Beyond Last-Click Attribution,” Marcus Whitney, CEO of Moontoast, explains that these tactics leave a lot of meaning on the table. He instead recommends trying to understand the “social activation” of your customers.

    Social activation is a much more thorough way to measure online revenue generation, because it takes a holistic view of social value across brand awareness, user engagement, and the use of multiple touch points. Whitney urges us to step away from “likes” and last-click attribution, and to use something he calls the DITE Framework to drive engagement:

    • Discovery: Draw casual fans through building brand awareness
    • Interaction: Create purchase intent with engaging posts
    • Transaction: Drive purchases and email subscriptions by offering high customer value
    • Endorsement: Facilitate advocacy with clear calls to action and sharing incentives

    According to a Facebook / Datalogix study cited by Whitney, campaigns focused on maximizing reach through social activation saw a 70 percent higher ROI than those that only focused on clicks. That’s not to say that this is easy; revenue attribution is hard to track. This is why a two-pronged approach makes sense. Develop an engagement strategy such as the one Whitney outlines. To properly attribute revenue, however, you’ll need to consolidate all of the consumer’s behavior and better assess customer lifetime value. Technologies such as machine learning can help you leverage that data across all channels and touchpoints and give you insight about where your revenue is really coming from.

    Accenture tracks progress toward data-driven decision-making, cites rise in predictive analytics

    Recently I came across a Computer Weekly article that confirmed what I already knew: the use of predictive analytics is definitely on the rise. In the article, Brian McKenna summarized Accenture’s latest research on analytics in the enterprise. Since 2009, the use of predictive analytics has tripled. Considering that many executives still make decisions based on gut feel, this increase, from 12% in 2009 to 33% in 2012, is significant. Nick Millman, digital, data and analytics lead at Accenture, said, “There has been a reasonable shift from gut feel to data [as a basis of decision-making], and that will accelerate.”

    Other key trends uncovered by Accenture’s survey included the rise of the Chief Data Officer. About 66% of the 600 organizations surveyed have appointed a CDO and of those who haven’t, 71% plan to in the near future. This certainly stacks up with what I have seen about the critical importance of data to companies large and small.

    It’s not surprising, however, that companies are far from satisfied with their current use of analytics, with only 22% claiming that they are very satisfied. And all companies report an analytics skills gap: they need more analysts than they have. Accenture suggests building your analytics capability but notes that while data stewardship is hard to outsource—after all, who knows your data needs better than you do—getting help with predictive analytics and modeling is definitely an option. Here at AgilOne, we stand ready to help you expand your predictive analytics capabilities, offering you fresh insight with our business-friendly cloud-based solution.

    And by the way, Accenture summarized some of its findings in a cool infographic, if you’re interested.

    Ready for Analytics 3.0? Predictive Analytics Software Gets You There

    Tom Davenport is a key thought leader in the analytics space whose work I have been following. When I saw he had written an article called “Ready for Analytics 3.0?” I jumped on it. Davenport gives us a guided tour of the journey from Analytics 1.0 to 2.0 and now 3.0. He traces the evolution of analytics, talking about changes in the types of data in use, internal, external, structured, and unstructured, as well as advances in technology such as in-memory databases and predictive analytics software. Every bit as interesting is the movement of analytics from a back-office function to the center of the C-suite. As I read this, I thought about how true this is for those of us in marketing – analytics is right at the center of every marketing plan and every conversation.  

    Davenport sees Analytics 1.0 stretching across a long time period – about 1954 to 2009. It’s characterized largely by the use of internal data from traditional systems of record. Models, when they were created, took several months to generate. Analytics was a back-office function and almost no one competed with analytics, a topic that Davenport wrote a groundbreaking book about in 2007.

    Around 2010, we enter Analytics 2.0, and we start hearing about big data. Companies start using Hadoop to process all that unstructured data, and we see visual analytics, which he sees as a form of descriptive analytics, begin to come to the fore. In this phase, descriptive analytics is still crowding out predictive and prescriptive analytics. Predictive analytics makes predictions about future events; prescriptive analytics uses business rules to suggest actions that we should take based on those predictions. Both of these types of analytics heavily leverage machine learning. But I’m getting ahead of myself: that’s part of Analytics 3.0.

    Analytics 2.0 also heralded a shift in the place of analytics in the organization. Analysts move closer to the business and competing with analytics becomes the norm. Data scientists are in high demand.

    Lots of people would say we are still in Analytics 2.0, but Davenport believes that we have entered Analytics 3.0. He gives several reasons:

    • Companies are combining numerous internal and external data sources to drive predictive and prescriptive models
    • Software that leverages agile analytical methods and machine learning is taking data from all of these sources and driving faster predictive and prescriptive analytics
    • Analytics is playing a central role in driving strategy to the point that it’s entering the C-suite: take a look at how many organizations now have a Chief Analytics Officer

    It’s an exciting and challenging time to be a marketer. We have so much data available; the potential is high for us to devise more effective marketing actions. But homing in on the most meaningful information to take just the right action is the key. Analytics 3.0 will take us there. And with all the attention coming from the C-suite, as Davenport says, we’ll actually compete with analytics.

    Efficient Development of Predictive Cloud Applications

    Introduction

    Basketball great Michael Jordan said, “Talent wins games, but teamwork and intelligence wins championships.” I fundamentally believe that the way a company thinks about its way of working and the structure of its teams can become either a millstone around the neck or a competitive advantage. I have received quite a few questions on how we develop software. So in this blog post I will talk about how we think about engineering teams at AgilOne for building marketing-focused big data predictive applications.


    Flavors of Lifetime Value

    Observation and definition of the question

    Although often overquoted, Lifetime Value (LTV) is one of the most underutilized but fundamental marketing concepts. The concept is based on the belief that the customer is the unit of value for an enterprise and that the sum total of customer value is equal to the enterprise value (see Customer Equity). How useful is this concept for those involved in the daily grind of running a marketing department?
