Lecture 23: Interfaces for Information Retrieval II
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2003
http://www.sims.berkeley.edu/academics/courses/is202/f03/
IS 202 FALL 2003 2003.12.02 - SLIDE 1

Lecture Overview
  Review of Last Time
    Introduction to HCI
    Why Interfaces Don't Work
    Early Visions: Memex
  Interfaces for Information Retrieval II
  Discussion Questions
  Action Items for Next Time
Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

Drawing the Circles

Human-Computer Interaction (HCI)
  Human
    The end-users of a program
    The others in the organization
    The designers of the program
  Computer
    The machines the programs run on
  Interaction
    The users tell the computers what they want
    The computers communicate results
    The computer may also tell users what the computer wants them to do

Shneiderman's Design Principles
  Provide informative feedback
  Permit easy reversal of actions
  Support an internal locus of control
  Reduce working memory load
  Provide alternative interfaces for expert and novice users

HCI for IR
  Information seeking is an imprecise process
  UI should aid users in understanding and expressing their information needs
    Help formulate queries
    Select among available information sources
    Understand search results
    Keep track of the progress of their search

How to Design and Build UIs
  Task analysis
  Rapid prototyping
  Evaluation
  Implementation
  Iterate at every stage!
  (Diagram labels: Design, Prototype, Evaluate)

Evaluation Techniques
  Qualitative vs. quantitative methods
  Qualitative (non-numeric, discursive, ethnographic)
    Focus groups
    Interviews
    Surveys
    User observation
    Participatory design sessions
  Quantitative (numeric, statistical, empirical)
    User testing
    System testing

Why Interfaces Don't Work
  Because
    We still think of using the interface
    We still talk of designing the interface
    We still talk of improving the interface
  We need to aid the task, not the interface to the task.
  The computer of the future should be invisible.

What Dr. Bush Foresees
  Cyclops Camera
    Worn on forehead, it would photograph anything you see and want to record. Film would be developed at once by dry photography.
  Microfilm
    It could reduce Encyclopaedia Britannica to volume of a matchbox. Material cost: 5¢. Thus a whole library could be kept in a desk.
  Vocoder
    A machine which could type when talked to. But you might have to talk a special phonetic language to this mechanical supersecretary.
  Thinking machine
    A development of the mathematical calculator. Give it premises and it would pass out conclusions, all in accordance with logic.
  Memex
    An aid to memory. Like the brain, Memex would file material by association. Press a key and it would run through a trail of facts.

Interaction Paradigms for IR
  Direct manipulation
    Query specification
    Query refinement
    Result selection
  Delegation
    Agents
    Recommender systems
    Filtering

HCI For IR
  Browsing
    Visualizing collections and documents
    Navigating collections and documents
  Searching
    Formulating queries
    Visualizing results
    Navigating results
    Refining queries
    Selecting results

Information Visualization
  Utility
    Inherently visual data
    Making the abstract concrete
    Making the invisible visible
  Techniques
    Icons
    Color highlighting
    Brushing and linking
    Panning and zooming
    Focus-plus-context
    Magic lenses
    Animation

Mapping
  Logical structure of the information
    Hierarchy
    Rank
    Proximity
    Similarity distance
    Term frequency
    History of changes
    Etc.
  Perceptual representation of the information
    Outlines, trees, graphs
    Color, size, shape, distance
    Symbolic icons
    Animation, interaction
    Etc.

Task = Information Access
  The standard interaction model for information access
    1) Start with an information need
    2) Select a system and collections to search on
    3) Formulate a query
    4) Send the query to the system
    5) Receive the results
    6) Scan, evaluate, and interpret the results
    7) Stop, or
    8) Reformulate the query and go to Step 4

HCI Questions for IR
  Where does a user start?
    Faced with a large set of collections, how can a user choose one to begin with?
  How will a user formulate a query?
  How will a user scan, evaluate, and interpret the results?
  How can a user reformulate a query?

HCI for IR: Collection Selection
  Question 1: Where does the user start?
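
The eight-step standard interaction model for information access listed above can be read as a loop. The following is only a minimal Python sketch of that reading: the function name, the callback parameters, and the toy collection are all hypothetical and chosen for illustration, and step 2 (selecting a system and collection) is assumed to have happened already.

```python
from typing import Callable, List


def information_access_session(
    need: str,
    formulate: Callable[[str], str],
    search: Callable[[str], List[str]],
    is_satisfied: Callable[[str, List[str]], bool],
    reformulate: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> List[str]:
    """Run steps 3 to 8 of the model against one already-selected collection."""
    query = formulate(need)                   # step 3: formulate a query
    results: List[str] = []
    for _ in range(max_rounds):
        results = search(query)               # steps 4-5: send query, receive results
        if is_satisfied(need, results):       # step 6: scan, evaluate, interpret
            break                             # step 7: stop
        query = reformulate(query, results)   # step 8: reformulate, back to step 4
    return results


# Toy collection and callbacks (assumptions for illustration only).
DOCS = [
    "memex and associative trails",
    "tilebars show term overlap",
    "relevance feedback improves recall",
]


def toy_search(query: str) -> List[str]:
    # A document matches only if it contains every query term.
    return [d for d in DOCS if all(t in d.split() for t in query.split())]


results = information_access_session(
    need="feedback",
    formulate=lambda need: need + " precision",   # first query is too narrow
    search=toy_search,
    is_satisfied=lambda need, res: len(res) > 0,
    reformulate=lambda q, res: q.split()[0],      # drop the extra term
)
```

In this toy run the first query returns nothing, so the loop reformulates once and then stops. The four HCI questions that follow map onto the four callbacks: each one is a place where the interface can help or hinder the user.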

Starting Points for Search
  Faced with a prompt or an empty entry form, how to start?
    Lists of sources
    Overviews
      Clusters
      Category Hierarchies/Subject Codes
      Co-citation links
    Examples, Wizards, and Guided Tours
    Automatic source selection

List of Sources
  Have to guess based on the name
  Requires prior exposure/experience

Old Lexis-Nexis Interface

Overviews
  Supervised (manual) category overviews
    Yahoo!
    HiBrowse
    MeSHBrowse
  Unsupervised (automated) groupings
    Clustering
    Kohonen feature maps

Yahoo! Interface

Summary: Category Labels
  Advantages
    Interpretable
    Capture summary information
    Describe multiple facets of content
    Domain dependent, and so descriptive
  Disadvantages
    Do not scale well (for organizing documents)
    Domain dependent, so costly to acquire
    May mismatch users' interests

Text Clustering
  What clustering does
    Finds overall similarities among groups of documents
    Finds overall similarities among groups of tokens
    Picks out some themes, ignores others
  How clustering works
    Cluster entire collection
    Find cluster centroid that best matches the query
  Problems with clustering
    It is expensive
    It doesn't work well

Scatter/Gather Interface

ThemeScapes Clustering

Kohonen Feature Maps on Text

Summary: Clustering
  Advantages
    Get an overview of main themes
    Domain independent
  Disadvantages
    Many of the ways documents could group together are not shown
    Not always easy to understand what they mean
    Can't see what documents are about
    Documents may be forced into one position in semantic space
    Hard to view titles
  Perhaps more suited for pattern discovery
  Problem: often only one view on the space
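
The two steps under "How clustering works" (cluster the entire collection, then find the cluster centroid that best matches the query) can be sketched in a few lines of Python. This is an illustrative toy, not the algorithm behind Scatter/Gather, ThemeScapes, or any other system named in the slides: it assumes bag-of-words vectors, cosine similarity, and a small k-means-style loop with hand-picked seed documents, and every function name is hypothetical.

```python
import math
from collections import Counter


def vec(text):
    """Bag-of-words term counts for one document or query."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average term weights over the member vectors of a cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def cluster(docs, seeds, rounds=5):
    """k-means-style clustering; `seeds` are indices of the initial centroid docs."""
    vectors = [vec(d) for d in docs]
    centroids = [vectors[i] for i in seeds]
    assignment = []
    for _ in range(rounds):
        # Assign every document to its most similar centroid.
        assignment = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Recompute each centroid from its current members.
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = centroid(members)
    return assignment, centroids


def best_cluster(query, centroids):
    """Index of the centroid that best matches the query."""
    q = vec(query)
    return max(range(len(centroids)), key=lambda k: cosine(q, centroids[k]))


docs = [
    "star galaxy telescope",
    "galaxy star orbit",
    "film star actor",
    "actor film award",
]
assignment, centroids = cluster(docs, seeds=[0, 3])
match = best_cluster("star telescope", centroids)
```

Even this toy hints at the listed disadvantages: each document is forced into exactly one cluster, and the ambiguous word "star" is pulled toward whichever theme dominates its document.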

HCI for IR: Query Formulation
  Question 2: How will a user formulate a query?

Query Specification
  Interaction styles (Shneiderman 97)
    Command language
    Form fill
    Menu selection
    Direct manipulation
    Natural language
  What about gesture, eye-tracking, or implicit inputs like reading habits?

Command-Based Query Specification
  COMMAND ATTRIBUTE value CONNECTOR ...
    FIND PA shneiderman AND TW interface
  What are the ATTRIBUTE names?
  What are the COMMAND names?
  What are allowable values?

Form-Based Query Specification
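
The command pattern above (COMMAND ATTRIBUTE value CONNECTOR ...) is regular enough to parse mechanically, which also makes the slide's three questions concrete: the user has to know exactly what is in each vocabulary. The sketch below is a hypothetical parser, not the grammar of any real catalog system. FIND, PA, and TW come from the slide's example; the connector set (AND, OR, NOT) and the error messages are assumptions.

```python
COMMANDS = {"FIND"}                  # from the slide's example
ATTRIBUTES = {"PA", "TW"}            # from the slide's example
CONNECTORS = {"AND", "OR", "NOT"}    # assumed connector vocabulary


def parse_command_query(text):
    """Parse 'COMMAND ATTRIBUTE value CONNECTOR ATTRIBUTE value ...'.

    Returns (command, clauses), where each clause is a tuple
    (connector_or_None, attribute, value).
    """
    tokens = text.split()
    if not tokens or tokens[0].upper() not in COMMANDS:
        raise ValueError(f"unknown command; expected one of {sorted(COMMANDS)}")
    command = tokens[0].upper()
    clauses = []
    connector = None
    i = 1
    while i < len(tokens):
        attr = tokens[i].upper()
        if attr not in ATTRIBUTES:
            raise ValueError(f"unknown attribute {tokens[i]!r}")
        i += 1
        # Collect value words up to the next connector or the end of input.
        values = []
        while i < len(tokens) and tokens[i].upper() not in CONNECTORS:
            values.append(tokens[i])
            i += 1
        if not values:
            raise ValueError(f"attribute {attr} has no value")
        clauses.append((connector, attr, " ".join(values)))
        if i < len(tokens):
            connector = tokens[i].upper()
            i += 1
            if i == len(tokens):
                raise ValueError("query ends with a connector")
    if not clauses:
        raise ValueError("command has no search clauses")
    return command, clauses


parsed = parse_command_query("FIND PA shneiderman AND TW interface")
```

A user who types SEARCH instead of FIND, or AU instead of PA, gets nothing but an error, which is the usability cost of command languages that the slide's questions point at.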

Form-Based Query Specification (continued)

Direct Manipulation Query Specification

Menu-Based Query Specification

Natural Language Query
  AskJeeves
  http://www.ask.com/

HCI for IR: Viewing Results
  Question 3: How will a user scan, evaluate, and interpret the results?

Display of Retrieval Results
  Goal
    Minimize time/effort for deciding which documents to examine in detail
  Idea
    Show the roles of the query terms in the retrieved documents, making use of document structure

Putting Results in Context
  Interfaces should
    Give hints about the roles terms play in the collection
    Give hints about what will happen if various terms are combined
    Show explicitly why documents are retrieved in response to the query
    Summarize compactly the subset of interest

Putting Results in Context
  Visualizations of query term distribution
    KWIC, TileBars, SeeSoft, Virtual Shakespeare
  Visualizing shared subsets of query terms
    InfoCrystal, VIBE
  Table of contents as context
    SuperBook, Cha-Cha

KWIC (Keyword in Context)

TileBars
  Graphical representation of term distribution and overlap
  Simultaneously indicate
    Relative document length
    Query term frequencies
    Query term distributions
    Query term overlap
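
The four quantities a TileBar is said to indicate (relative document length, query term frequencies, distributions, and overlap) all fall out of one small table: hits per query term per document segment. The Python sketch below computes that table for a single document. It is a simplification under stated assumptions: segments are fixed-size word windows (the `tile_size` parameter is hypothetical) rather than topical passages, and the text rendering is only a stand-in for the graphical display.

```python
import re


def tilebar_rows(document, query_terms, tile_size=5):
    """Count hits of each query term in consecutive fixed-size word windows."""
    words = re.findall(r"[a-z0-9]+", document.lower())
    tiles = [words[i:i + tile_size] for i in range(0, len(words), tile_size)]
    return {
        term: [tile.count(term.lower()) for tile in tiles]
        for term in query_terms
    }


def render(rows):
    """Crude text rendering: one row per term, darker character = more hits."""
    shades = " .:#"  # 0, 1, 2, 3-or-more hits
    lines = []
    for term, counts in rows.items():
        bar = "".join(shades[min(c, 3)] for c in counts)
        lines.append(f"{term:>12} |{bar}|")
    return "\n".join(lines)


doc = (
    "the dbms crashed twice so dbms reliability matters when banking "
    "systems depend on reliability of storage"
)
rows = tilebar_rows(doc, ["dbms", "reliability"])
print(render(rows))
```

The row length reflects document length, each cell reflects frequency, the pattern along a row reflects distribution, and columns where both rows are non-empty show overlap (here only the second window contains both terms).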

TileBars Example
  Query terms:
    DBMS (Database Systems)
    Reliability
  What roles do they play in retrieved documents?
    Mainly about both DBMS & reliability
    Mainly about DBMS, discusses reliability
    Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability
    Mainly about high-tech layoffs

TileBars Example

SeeSoft (Eick & Wills 95)

David Small: Virtual Shakespeare

Other Approaches
  Show how often each query term occurs in sets of retrieved documents
    VIBE (Korfhage 91)
    InfoCrystal (Spoerri 94)

VIBE (Olson et al. 93, Korfhage 93)

InfoCrystal (Spoerri 94)

Problems with InfoCrystal
  Can't see proximity or frequency of terms within documents
  Quantities not represented graphically
  More than 4 terms hard to handle
  No help in selecting terms to begin with
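
The idea of showing which query terms co-occur in which sets of retrieved documents can be sketched as a grouping step: partition the retrieved documents by the exact subset of query terms each one contains. The Python below is an illustrative assumption about the bookkeeping behind such a display, not InfoCrystal's or VIBE's actual implementation; the function name and the toy documents are hypothetical.

```python
import re
from collections import defaultdict


def group_by_term_subset(docs, query_terms):
    """Map each non-empty subset of query terms to the documents containing exactly it."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        present = frozenset(t.lower() for t in query_terms if t.lower() in words)
        if present:  # documents matching no query term are left out
            groups[present].append(doc_id)
    return dict(groups)


retrieved = {
    "d1": "DBMS reliability under load",
    "d2": "DBMS tuning guide",
    "d3": "bank reliability report",
    "d4": "high-tech layoffs",
}
groups = group_by_term_subset(retrieved, ["dbms", "reliability"])
```

With n query terms there are up to 2^n - 1 non-empty subsets, which is consistent with the slide's remark that more than 4 terms are hard to handle. The grouping also records only presence, so proximity and frequency within a document are lost, matching the first listed problem.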

Cha-Cha (Chen & Hearst 98)
  Shows Table-of-Contents-like view, like SuperBook
  Focus+Context using hyperlinks to create the TOC
  Integrates Web Site structure navigation with search

HCI for IR: Query Reformulation
  Question 4: How can a user reformulate a query?

Query Reformulation
  Thesaurus expansion
    Suggest terms similar to query terms
  Relevance feedback
    Suggest terms (and documents) similar to retrieved documents that have been judged to be relevant
    "More like this" interaction

Relevance Feedback
  Modify existing query based on relevance judgements
    Extract terms from relevant documents and add them to the query
    And/or re-weight the terms already in the query
  Two main approaches
    Automatic (pseudo-relevance feedback)
    Users select relevant documents
      Users/system select terms from an automatically generated list

Revealing Internals
  Opaque (black box)
    (Like web search engines)
  Transparent
    (See used terms after Relevance Feedback)
  Penetrable
    (Choose suggested terms before Relevance Feedback)
  Which do you think worked best?

Effectiveness Results
  Subjects using Relevance Feedback showed 17% - 34% better performance than without Relevance Feedback
  Subjects with penetration case did 15% better as a group than those in opaque and transparent cases

Summary: Relevance Feedback
  Iterative query modification can improve precision and recall for a standing query
  In at least one study, users were able to make good choices by seeing which terms were suggested for Relevance Feedback and selecting among them
  So "more like this" can be useful!

Summary: HCI for IR
  Focus on the task, not the tool
  Be aware of
    User abilities and differences
    Prior work and innovations
    Design guidelines and rules-of-thumb
  Iterate, iterate, iterate
  It is very difficult to design good UIs
  It is very difficult to evaluate search UIs
  Better interfaces in future should produce better IR experiences
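
The Relevance Feedback slides above describe extracting terms from documents judged relevant, adding them to the query, and re-weighting the terms already there. One common way to do this is a Rocchio-style update, sketched below in Python. The slides do not name Rocchio, so treat the formula, the `alpha` and `beta` weights, and the function name as assumptions for illustration. The final step imitates the "penetrable" condition: the system only suggests terms, and the user decides which to add.

```python
from collections import Counter


def suggest_feedback_terms(query_terms, relevant_docs, alpha=1.0, beta=0.75, top_k=2):
    """Rocchio-style re-weighting: alpha * query + beta * mean(relevant docs).

    Returns (weights, suggestions), where suggestions are the top_k
    highest-weighted terms that are not already in the query.
    """
    query = [t.lower() for t in query_terms]
    weights = Counter({t: alpha for t in query})
    for doc in relevant_docs:
        for term, count in Counter(doc.lower().split()).items():
            weights[term] += beta * count / len(relevant_docs)
    # Sort by descending weight, breaking ties alphabetically.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    suggestions = [t for t, _ in ranked if t not in query][:top_k]
    return weights, suggestions


judged_relevant = [
    "tilebars visualization query terms",
    "tilebars visualization overlap",
]
weights, suggestions = suggest_feedback_terms(["tilebars"], judged_relevant)

# "Penetrable" interaction: the user picks among the suggestions.
user_choice = [t for t in suggestions if t == "visualization"]
new_query = ["tilebars"] + user_choice
```

An opaque system would apply `weights` silently, and a transparent one would show them after the fact. A real system would also filter stopwords and use a proper term-weighting scheme, both omitted here for brevity.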

Discussion Questions
  Arthur Law on Interfaces for IR
    Using visualization in web information retrieval revealed poor results for navigation. However, this study was conducted in 1998. Are people more accustomed to these tools now, with websites such as "http://www.smartmoney.com/marketmap/"? Perhaps this method of navigation will work better for the computer generation, given their higher comfort level with using the web.

Discussion Questions
  Arthur Law on Interfaces for IR
    There are various examples of command-line approaches and visual approaches. Individuals perform differently with each method, so will the next step involve combining these methods to optimize each person's task of information retrieval? Or will a dominant company, e.g., LexisNexis or Google, enforce one method of doing queries?

Discussion Questions
  Paul Laskowski on Interfaces for IR
    MIR describes at least six sources of contextual information for the documents returned by a query: metadata, term scores, location of terms in each document, combinations of terms present in each document, tables of contents, and hyperlink structure. Which of these sources provides the most help for selecting relevant documents (or does it depend on the task)? Which types of context can help with reformulating a query?
    In the case of the location of terms, several tools are listed that graphically show where terms are placed in each document. I imagine using this to select documents where the terms appear in the same paragraph. Should this process be automated so that documents score higher when the search terms are near to each other? In what other ways might I use this information?

Discussion Questions
  Brooke Maury on Interfaces for IR
    In chapter 10.7, Hearst discusses an application developed by Kozierok and Maes that keeps track of a user's activities and makes recommendations based on previous actions or situations. What impact does this assistant/agent application have on privacy? Is this too heavy a price to pay for achieving a positive human-computer exchange or a more successful retrieval? If a system is charged with looking over the shoulder of a user, is there an ethical imperative to encrypt that information or otherwise provide safeguards against the misuse or abuse of that information?

Discussion Questions
  Brooke Maury on Interfaces for IR
    The study by Koenemann and Belkin suggests that the most effective systems will allow users total control and access to what information is used for decision-making (they call such applications "penetrable"). The system developed by Kozierok & Maes makes a number of important decisions without input from the user. Should K & M's application be more penetrable?

Discussion Questions
  Dan Perkel on Interfaces for IR
    While the web "has suddenly made vast quantities of information available globally" (MIR, 322), some would argue that it also comes at the price of a giant step backwards in terms of interfaces. (As one example, compare the functionality of and types of interaction allowed by an email web app such as YahooMail/HotMail with an email client such as Eudora/Outlook/AppleMail.) What does this say about the future of visualization techniques for IR? What needs to happen (technically, business-wise, other) for a top search engine to add an interactive visualization component to its search results?

Discussion Questions
  Joseph Hall on Interfaces for IR
    In section 10.9 of MIR: "The field of information visualization needs some new ideas about how to display large, abstract information spaces intuitively." This seems to be the "holy grail" of HCI: something that can intuitively deal with large information spaces... with feeble human brains providing imperfect queries.
    For example, a nowhere-near feeble brain and pretty direct query is evidenced by danah boyd's most recent blog entry: turtles all the way down
    http://www.zephoria.org/thoughts/archives/000889.html#000889
    In this blog entry, danah has already queried the state-of-the-art search tool, Google, and unfortunately came across conflicting results. While Google can handle large information spaces, sometimes the PageRank algorithm is just not enough. Seeing as humans tend to think in terms of "concentration"[1], what are some of the "penetrable" ways that IR tools could more effectively facilitate the human thought process instead of simply retrieving information?
    [1] An old card game that requires remembering exactly where you saw a certain card for retrieval later.

Next Time
  Wishter DEMO!
  Final Exam Review
