Using Identity Credential Usage Logs to Detect Anomalous ...

Using Identity Credential Usage Logs to Detect Anomalous Service Accesses
ACM DIM 2009, Chicago, IL, 2009
Daisuke Mashima, Dr. Mustaque Ahamad
College of Computing, Georgia Institute of Technology, Atlanta, GA, USA

Increasing Risk of Identity Theft
  Variety of online identity credentials
    Passwords, certificates, SSN, credit card numbers, etc.
    Loss and theft are possible and common
  Consequences of online identity theft
    Impersonation
    Disclosure of sensitive information
    Financial loss

To counter such threats
  Online service providers are required to
    Analyze a huge volume of log records to identify suspicious service accesses
    Investigate identified records extensively
  In reality
    Significant reliance on human experts
    Logs are not processed in real time
  An automated mechanism to monitor identity usage (service accesses) is desired.

Outline

  Observations from real data sets
  Our approach
    Anomaly-based risk scoring scheme
  Preliminary evaluation
  Conclusion / Future Work

Buzzport Access Log
  380484533347391, 24/08/2007 14:07:05, 24/08/2007 14:18:46
  380484533347391, 27/08/2007 08:01:14, 27/08/2007 08:02:54
  380484533347391, 27/08/2007 08:04:36, 27/08/2007 08:16:05
  380484533347391, 27/08/2007 12:05:36, 27/08/2007 12:18:15
  380484533347391, 31/08/2007 14:31:43, 31/08/2007 14:38:08
  Records contain only
    (Anonymized) user ID
    Login timestamp
    Logout timestamp

Another data set
  Log records of a portal of an online trading company
  The following items are available:
    User ID
    Coarse action type (Login / Logout)
    Timestamp
    IP address
    Organization name
    etc.

Observations and Considerations
  Available information is quite limited.
    Typical fraud detection systems rely on much richer information.
  Data are not labeled.

    Supervised techniques are not available.
  Limited types of events can be observed.
    Schemes relying on event sequences or state transitions have limited applicability.

Our Approach
  Utilize attributes derived from an individual identity usage record
    Timestamp (day of week, etc.), IP address, etc.
  Focus on categorical attributes
  Build a user profile based on the occurrence frequency of each attribute value
  Determine risk scores based on frequency information

User Profile Management
  A profile is defined as a frequency distribution of attribute values (categories)
  One profile per attribute
    Multiple profiles can be defined per user.
    Day-of-week profile, hour-of-day profile, and so forth
  Updated upon receipt of each log record
    Simply increment the occurrence counters corresponding to the attribute values in the record
  Data aging can be easily implemented
    Periodically multiply all counters by a decay factor

Base Score and Weight
  Base score represents how unlikely an observed user's access is.
    BaseScore = -log(RelativeFrequency)
  Score weight quantifies the effectiveness of each attribute for profiling.
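The profile update, data aging, and base score described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the class name AttributeProfile, its method names, and the additive smoothing that keeps the score finite for values never seen before are all assumptions introduced for illustration.

```python
import math
from collections import defaultdict


class AttributeProfile:
    """Frequency distribution over the categories of one attribute (hypothetical name)."""

    def __init__(self, num_categories):
        self.num_categories = num_categories
        self.counts = defaultdict(float)  # category -> (possibly decayed) counter

    def update(self, category):
        # Increment the occurrence counter for the observed attribute value.
        self.counts[category] += 1.0

    def decay(self, factor):
        # Data aging: multiply every counter by the decay factor.
        for category in self.counts:
            self.counts[category] *= factor

    def relative_frequency(self, category, smoothing=1.0):
        # Additive smoothing is an assumption of this sketch; without it,
        # -log(0) would be undefined for a category with no history.
        total = sum(self.counts.values()) + smoothing * self.num_categories
        return (self.counts.get(category, 0.0) + smoothing) / total

    def base_score(self, category):
        # BaseScore = -log(RelativeFrequency): rare values score higher.
        return -math.log(self.relative_frequency(category))
```

For example, an hour-of-day profile could be created with AttributeProfile(24), updated with the login hour of each incoming record, and aged periodically by calling decay with the chosen factor.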

    When an attribute characterizes a user's identity usage pattern well, the weight should be high.
    How can we quantify it?

Score Weight
  Use the distance between the frequency distribution and the uniform distribution as the weight
    Bhattacharyya distance, etc.
  Data aging is necessary.
  [Figure: relative frequency by hour of day]

Score Aggregation
  A Sub Score (the product of a base score and the corresponding weight) is computed for each profile.
  How can we combine Sub Scores?
    Pick the MAX of Sub Scores
    Weighted sum of Sub Scores
    Others?
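The score weight and the two aggregation options named above can be sketched as follows. The slides mention the Bhattacharyya distance only as one possible choice, so this is an illustration under assumptions: the function names are hypothetical, the distance is used unnormalized, and a profile with no history is given weight 0.

```python
import math


def bhattacharyya_weight(counts, num_categories):
    """Bhattacharyya distance between a profile and the uniform distribution.

    `counts` maps each observed category to its occurrence counter.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: no history means the attribute carries no weight
    uniform = 1.0 / num_categories
    # Bhattacharyya coefficient: sum over categories of sqrt(p_i * q_i).
    coefficient = sum(math.sqrt((c / total) * uniform) for c in counts.values())
    return -math.log(coefficient)


def sub_score(base_score, weight):
    # A Sub Score is the product of a base score and the corresponding weight.
    return base_score * weight


def aggregate_max(sub_scores):
    # Option 1: report the largest Sub Score.
    return max(sub_scores)


def aggregate_weighted_sum(sub_scores, coefficients):
    # Option 2: weighted sum; the coefficients are free parameters here.
    return sum(k * s for k, s in zip(coefficients, sub_scores))
```

With this definition a perfectly uniform profile has weight 0, and a profile concentrated on a single value out of n categories has weight 0.5 * ln(n), so strongly peaked usage patterns contribute more to the risk score.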

Setting of Experiments
  Buzzport data set
  Profiling attributes
    Week of month (5 categories)
    Day of week (7 categories)
    Hour of day (24 categories)
  Scale Sub Scores to [0, 100)
  Use the MAX of the 3 Sub Scores as output

Trends of Risk Scores
  [Figure: trends of risk scores]
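The three profiling attributes listed in the experiment settings can all be derived from a login timestamp. The sketch below assumes the day/month/year format seen in the Buzzport log sample; the function name time_attributes and the week-of-month rule (days 1 to 7 are week 0, and so on, giving 5 categories) are assumptions, since the slides do not define how weeks are counted.

```python
from datetime import datetime


def time_attributes(timestamp):
    """Derive categorical time attributes from a 'dd/mm/YYYY HH:MM:SS' string."""
    moment = datetime.strptime(timestamp, "%d/%m/%Y %H:%M:%S")
    return {
        # Assumed rule: days 1-7 -> 0, 8-14 -> 1, ..., 29-31 -> 4 (5 categories).
        "week_of_month": (moment.day - 1) // 7,
        # datetime.weekday(): 0 = Monday ... 6 = Sunday (7 categories).
        "day_of_week": moment.weekday(),
        # Hour in 0..23 (24 categories).
        "hour_of_day": moment.hour,
    }
```

Each returned value can then serve as the category observed for the corresponding per-attribute profile.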

Trends of Risk Scores with Data Aging
  Decay factor = 0.5 is applied monthly.

False Positive / True Positive Analysis
  Randomly pick 5 users with different access frequencies
  Split each user's log records into two:
    Test data: the last 1 month
    Training data: the rest
  Analyze the false positive rate by using the same user's training data and test data
  Analyze the true positive rate by using different users' data sets (a.k.a. cross profiling)

False Positive / True Positive Results
  * Each user's threshold is determined based on the score range of the training data.

Time / Storage Cost
  Measured on a Linux PC with an Intel Core 2 Duo E6600 and 3 GB RAM

  Average time per record: 5 ms
    Good enough for real-time processing
  Storage space per user: 1.4 KB
    Potential to accommodate a large number of users

Conclusion
  Defined design principles for risk scoring based on identity usage logs
  Proposed a way to compute anomaly-based risk scores in real time
  Presented a prototype system using timestamp information and showed that it has reasonably good accuracy

Future Work
  Investigate other attributes (e.g., location)
  Conduct detailed experiments
    Evaluate with other data sets
    Find the optimum configuration
  Integrate into other security mechanisms

Thank you very much.
Questions?
[email protected]
http://www.cc.gatech.edu/~mashima
