Proteomic Analysis of the MCF7 Breast Cancer Cell Line


Laboratory of Protein Biochemistry and Proteomics, UMR CNRS 7033 (BioMoCeTi),
UFR SMBH, Paris13 University, 93017 Bobigny Cedex, France

Abstract. Background: The MCF7 breast cancer cell line is a cellular model for breast cancer studies and marker discovery. A better knowledge of its proteome is therefore a prerequisite for a more efficient use of this model. Materials and Methods: Proteins expressed during the exponential growth phase of MCF7 cells were analyzed and mapped using two-dimensional gel electrophoresis and mass spectrometry. Results: From the spots excised from preparative gels of whole-cell extracts, a subset of 368 different polypeptides, corresponding to 249 different proteins, was identified. These polypeptides were positioned on a silver-stained gel to construct a reference map. Conclusion: The data allowed the construction of the most extensive reference map for MCF7 published to date, including 189 novel proteins that had not previously been listed on maps and are now accessible on the World 2D-PAGE database, providing a basis for further studies on MCF7.

*Both authors contributed equally to this work.

Correspondence to: Raymonde Joubert-Caron, UMR CNRS 7033, Protein Biochemistry and Proteomics Laboratory, UFR SMBH Leonard de Vinci, 74 rue Marcel Cachin, F-93017 Bobigny Cedex, France. Tel: +33 148 387 754; Fax: +33 148 387 313.

Key Words: MCF7 cell line, proteome, breast cancer, mass spectrometry, 2-DE databases.

Breast cancer is the second leading cause of cancer death in women (1, 2). The MCF7 cell line is commonly used as a cellular model for breast cancer studies and marker discovery (3-7); hence, information on the MCF7 proteome is a prerequisite for a better understanding of this model. Proteomics has become a popular approach for the analysis, identification and characterization of proteins, combining large-scale protein separation with mass spectrometry and bioinformatics (8-10). Two-dimensional gel electrophoresis (2-DE) (11) remains the most frequently used approach for global protein analysis and the identification of protein isoforms, because it allows the separation and comparison of several thousand proteins on a single gel (12). Consequently, a number of dedicated 2-DE image databases have been constructed, providing maps for different cells and tissues (13, 14). However, few up-to-date informative maps dedicated to breast tissue or breast cancer cell lines are available on the web ( 2d.html). This study used a global proteomic approach based on a combination of 2-DE protein separation, image analysis and mass spectrometry (MALDI-TOF) for an extensive identification of 368 polypeptides, corresponding to 249 proteins, 56 of which were present in at least two isoforms. The use of bioinformatics tools provided useful information on functional pathways and the cellular localization of the identified proteins.

Materials and Methods

Two-dimensional gel electrophoresis of MCF7 protein extract. The cell pellet of MCF7 cells grown to confluence, prepared as described previously (15, 16), was resuspended at 0°C in lysis buffer (25 mM Tris, 10 mM EDTA, 7 M urea, 2 M thiourea, 5% v/v glycerol, 0.33% v/v CHAPS, 0.35% v/v Triton X-100, 0.35% w/v sulfobetaine 3-10, 10% v/v isopropanol, 12.5% v/v isobutanol, 100 mM DTT, 1 mM orthovanadate and protease inhibitor cocktail) (17). The lysate was then centrifuged twice at 150,000 ×g for 25 min at 4°C.
For 2-DE, isoelectric focusing (IEF) (15, 16) was carried out on 17 cm IPG strips with a linear pH gradient (pH 5-8) using the Protean IEF Cell system (Bio-Rad, Richmond, CA, USA). Analytical IPG strips were rehydrated with 150 µg of protein in rehydration medium containing 2% v/v carrier ampholyte mixture (Pharmalytes 5-8, GE Healthcare, Uppsala, Sweden). Focusing was complete at 90,000 Vh. Preparative IPG strips were loaded with 500 µg of protein and 2% v/v carrier ampholyte by in-gel passive rehydration. IEF was then performed until a total of 195,000 Vh was reached. After focusing, analytical and preparative gel strips were equilibrated in 6 M urea, 60 mM SDS, 65 mM DTT, 30% v/v glycerol and 0.05 M Tris-HCl pH 8.6 for 15 min, then for a further 20 min in the same buffer with 53 mM iodoacetamide instead of DTT. The second-dimension gels (18 cm × 20 cm × 1 mm, 8-18.5% polyacrylamide) were run in a Protean Plus Dodeca cell from Bio-Rad. Analytical gels were stained with silver nitrate as described previously (16), while preparative gels were stained with colloidal Coomassie blue. Analysis of protein abundance and experimental determination of pI and Mr were done using Image Master Platinum 5 software (GE Healthcare, Uppsala, Sweden).

Mass spectrometric analysis of MCF7 protein spots. Individual spots were excised from a preparative gel using the Proteineer SP spot picker (Bruker Daltonique, Wissembourg, France). In-gel trypsin digestion was carried out using the GE Healthcare Ettan Digester, as described previously (18). A sandwich spotting method was used for peptide mass fingerprinting (PMF), with α-cyano-4-hydroxy-cinnamic acid (HCCA) as matrix, on a Biflex IV (Bruker Daltonique) instrument (16, 18). Using the Mascot Server software package with the Mascot Daemon client application (Matrix Science Ltd, London, UK), mass spectra of tryptic digest peptides were searched in batch mode against a downloaded copy of the biweekly updated UniProtKB/Swiss-Prot database ( sprot/download.html). The key parameters used were: taxonomy: Homo sapiens (human); fixed modification: carbamidomethyl (C); variable modification: oxidation of methionine; mass values: monoisotopic; peptide mass tolerance: 0.2 Da; peptide charge state: +1; maximum missed cleavages: 1. Functional annotation of the set of identified proteins was performed using the FatiGO tool available on Babelomics.
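The search parameters listed above can be illustrated with a minimal Python sketch of peptide mass fingerprinting: an in-silico tryptic digest allowing one missed cleavage, singly charged monoisotopic masses, carbamidomethylated cysteine as a fixed modification, and a 0.2 Da matching window. This is not how Mascot scores candidates (Mascot uses a probability-based score); the function names are hypothetical, the variable methionine modification is assumed to be oxidation, and the example sequence is an arbitrary toy string rather than a protein from this study.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # added once per peptide (free N- and C-termini)
PROTON = 1.00728            # charge state +1, i.e. [M+H]+
CARBAMIDOMETHYL = 57.02146  # fixed modification on every Cys
OXIDATION = 15.99491        # assumed variable modification on Met


def tryptic_peptides(sequence, max_missed=1):
    """Cleave after K or R (not before P), allowing missed cleavages."""
    cuts = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))
    peptides = []
    for i in range(len(cuts) - 1):
        last = min(i + 2 + max_missed, len(cuts))
        for j in range(i + 1, last):
            peptides.append(sequence[cuts[i]:cuts[j]])
    return peptides


def mh_plus(peptide, oxidized_met=0):
    """Theoretical monoisotopic m/z of the singly protonated peptide."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON
    mass += CARBAMIDOMETHYL * peptide.count("C")
    mass += OXIDATION * oxidized_met
    return mass


def match_peaks(peaks, sequence, tol=0.2, max_missed=1):
    """Return (peak, peptide, n_oxidized_Met) for peaks within tol Da."""
    hits = []
    for pep in tryptic_peptides(sequence, max_missed):
        for n_ox in range(pep.count("M") + 1):
            theo = mh_plus(pep, n_ox)
            for peak in peaks:
                if abs(peak - theo) <= tol:
                    hits.append((peak, pep, n_ox))
    return hits


# Toy usage: a made-up sequence and a peak list built from it.
toy_sequence = "ACDKEFGRPHIKLMNR"
toy_peaks = [mh_plus("ACDK") + 0.05, mh_plus("LMNR", oxidized_met=1) - 0.1]
for hit in match_peaks(toy_peaks, toy_sequence):
    print(hit)
```

In a real search the number of matched peptides and the sequence coverage would then be weighed against the size of the database, which is the part this sketch leaves out.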

Results and Discussion

Protein identification and 2-DE map. A total of 727 picked spots were subjected to trypsinolysis, and the resulting peptide fragments were subjected to PMF identification using MALDI-TOF MS. Three hundred and sixty-eight spots, showing a large range of molecular weights, pI and spot intensity, were successfully identified (Table I). The identification rate was approximately 51% (368 out of 727 spots). This rate was lower than that usually obtained in our mapping experiments (14, 19), probably because robotic spot excision was used instead of the manual picking previously used in our laboratory. Manual picking favors the excision of large spots, which are more visible on the gels, while automation allows systematic picking independently of the size of the spot.
It is well known that accurate identification via PMF is favored by a sufficient number of peptides for matching. Thus, the relationship between the rate of identification and the protein abundance (%Vol) for the 727 spots subjected to trypsinolysis was examined (Table II). The correlation coefficient (r²) between these two parameters was 0.968 (Figure 1). The rate of successful identification was 92.3% for the spots with a %Vol >1%, while it was less than 48% for the spots with a %Vol ranging from 0.05% to 0.09%. These results confirmed that the amount of protein found in a given spot is crucial for its subsequent identification.
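The relationship described above can be sketched in Python as a per-class identification rate followed by a squared Pearson correlation. The per-class counts of Table II are not reproduced here, so the numbers in the usage lines are arbitrary placeholders chosen only to show the call pattern, and the function names are hypothetical. How the abundance classes were placed on the x-axis in Figure 1 is also not stated in the text; the sketch simply uses the class rank.

```python
from math import sqrt


def identification_rate(identified, picked):
    """Percentage of picked spots that were successfully identified."""
    return 100.0 * identified / picked


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Placeholder (picked, identified) counts per abundance class.
# These are NOT the values of Table II.
placeholder_classes = [(40, 10), (40, 18), (40, 25), (40, 36)]
class_rank = list(range(1, len(placeholder_classes) + 1))
rates = [identification_rate(ident, picked)
         for picked, ident in placeholder_classes]

r_squared = pearson_r(class_rank, rates) ** 2
print(f"r^2 = {r_squared:.3f}")
```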
A comprehensive 2-DE map was annotated with the spot ID of the proteins listed in Table I (Figure 2). Correlating a protein expression profile with cell physiology requires an estimation of the relative abundance of each protein. Thus, to provide a basis for further studies on MCF7, the relative abundance of the spots determined on the map is reported in Table I and illustrated in Figure 3. The values were <0.1% for most of the spots detected on the MCF7 gel (1039/1317).
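As a minimal sketch of the abundance measure used here, the Python snippet below assumes that %Vol is the volume of a spot expressed as a percentage of the summed volume of all spots detected on the gel, which is the usual definition in 2-DE image analysis but is not spelled out in the text. The function names and the toy spot volumes are hypothetical; only the counts 1039 and 1317 come from the paragraph above.

```python
def percent_volumes(spot_volumes):
    """Normalize raw spot volumes to %Vol (share of the total gel volume)."""
    total = sum(spot_volumes.values())
    return {spot_id: 100.0 * volume / total
            for spot_id, volume in spot_volumes.items()}


def fraction_below(percent_vol, threshold=0.1):
    """Fraction of spots whose %Vol lies below the given threshold."""
    below = sum(1 for value in percent_vol.values() if value < threshold)
    return below / len(percent_vol)


# Toy example with made-up raw volumes for three spots.
toy = percent_volumes({"spot_a": 5.0, "spot_b": 15.0, "spot_c": 80.0})
print(toy)

# Share of low-abundance spots reported in the text (1039 of 1317 spots).
low_abundance_share = 100.0 * 1039 / 1317
print(f"{low_abundance_share:.1f}% of detected spots have %Vol < 0.1%")
```

With the reported counts, roughly four out of five detected spots fall in the lowest abundance range, which is the range where identification by PMF is least likely to succeed.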

Protein isoforms. The 368 spots identified corresponded to 249 distinct proteins. Of these, 56 proteins (176 spots) were present in at least 2 isoforms (Table III). These data are informative, but probably incomplete because not all the isoforms were excised and identified for each protein resolved in the gel. Generally, the isoforms differed more in pI than in molecular weight, suggesting the presence of modifications affecting the charge of the polypeptides. The presence of numerous isoforms for the same protein highlights the complexity of the MCF7 proteome. For instance, the protein P05787 (Keratin, type II cytoskeletal 8) was found in at least 21 spots. The molecular weights of these spots ranged from 42 to 57 kDa, with most spots being found as trains at 51 kDa. Some of these spots were distributed along a large streak and may result from incomplete focusing of this abundant protein. It is nevertheless tempting to attribute many isoforms to post-translational modifications (PTMs). Using the information available in the Swiss-Prot database in conjunction with careful examination of the experimental spectra, we were able to analyze PTMs. For instance, HNRH1_HUMAN (P31943) was identified in seven spots with pI ranging from 6.36 to 7.04 and Mr from 47,600 to 52,600. Annotations for PTMs on this protein in Swiss-Prot indicate the possibility of an acetylation of MET1, as well as phosphorylation on SER22, THR99, SER103, TYR245 and 305. The mass spectra obtained for the seven digested spots were comparable. However, phosphorylation of THR 99 and SER 103 was deduced from the observation of the pertinent ions in the spectrum (Figure 4). Significant information on PTMs can be directly deduced from MALDI-TOF spectra, as exemplified by the protein CSDE1_HUMAN (O75534). The interpretation of the spectra of spots ID 607 (pI 6.54/Mr 88438) and ID 608 (pI 6.61/Mr 88438) indicated that the initiator MET was cleaved, so that the sequence begins with a serine. From the presence of the ion at m/z 2608.22, it can be deduced that SER1 is acetylated in spot ID 608, while this acetylation was not observed in spot ID 607.
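The reasoning behind reading PTMs from a MALDI-TOF spectrum can be sketched as a mass-shift check: a modified peptide appears at the m/z of its unmodified form plus a characteristic monoisotopic increment. The Python snippet below is an illustration only; the function names are hypothetical, the 0.2 Da window is borrowed from the database search tolerance, and the peptide sequences behind the ions are not given in the text, so only the shift arithmetic is shown.

```python
# Monoisotopic mass increments (Da) of common modifications.
PTM_SHIFT = {
    "phosphorylation": 79.96633,  # +HPO3 on Ser, Thr or Tyr
    "acetylation": 42.01057,      # +C2H2O, e.g. on the protein N-terminus
    "oxidation": 15.99491,        # +O on Met
}


def modified_mz(unmodified_mz, modifications, charge=1):
    """Expected m/z after adding the listed modifications.

    modifications maps a modification name to the number of sites carrying it.
    """
    shift = sum(PTM_SHIFT[name] * count
                for name, count in modifications.items())
    return unmodified_mz + shift / charge


def explains_peak(observed_mz, unmodified_mz, modifications, tol=0.2):
    """True if the modified form falls within tol of the observed peak."""
    expected = modified_mz(unmodified_mz, modifications)
    return abs(observed_mz - expected) <= tol


# If the ion at m/z 2608.22 is the N-terminally acetylated peptide, its
# non-acetylated counterpart would be expected 42.01 lower on the m/z axis.
acetylated_ion = 2608.22
expected_unmodified = acetylated_ion - PTM_SHIFT["acetylation"]
print(f"expected unmodified ion near m/z {expected_unmodified:.2f}")
```

The same check applies to phosphopeptides: each phosphate adds about 79.97 to a singly charged ion, so a doubly phosphorylated peptide is displaced by about 159.93 from its unmodified form.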
In addition to PTMs, isoforms can arise from the replacement of an amino acid. This occurred in the case of the BLVRB_HUMAN protein (P30043), identified in two spots, ID 1729 (pI 7.45/Mr 25106) and ID 1766 (pI 7.68/Mr 23346). A variant amino acid at position 45 is annotated in Swiss-Prot: ARG can be replaced by GLN. Experimentally, spot 1766 contains an ARG residue, whereas spot 1729 contains a GLN residue at position 45. These two amino acids differ in the charge of their side chains, which may explain the different pI observed on 2-DE gels for this protein.
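The effect of an ARG to GLN substitution can be illustrated with a short Python sketch that compares the two residues by mass and by side-chain charge. The pKa values are approximate textbook figures and the function names are hypothetical; the sketch does not attempt to compute the pI of the whole protein, only to show why the GLN variant is expected to focus at a more acidic position, in line with the lower pI of spot 1729.

```python
# Monoisotopic residue masses (Da) for the two residues involved.
RESIDUE_MASS = {"R": 156.10111, "Q": 128.05858}

# Approximate textbook side-chain pKa values (assumed, not from the article).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def substitution_mass_shift(original, variant):
    """Mass change (Da) of a peptide when one residue replaces another."""
    return RESIDUE_MASS[variant] - RESIDUE_MASS[original]


def side_chain_charge(residue, ph):
    """Average side-chain charge at a given pH (Henderson-Hasselbalch)."""
    if residue in POSITIVE_PKA:
        return 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
    if residue in NEGATIVE_PKA:
        return -1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return 0.0  # no ionizable side chain, e.g. Gln


shift = substitution_mass_shift("R", "Q")
charge_arg = side_chain_charge("R", 7.5)
charge_gln = side_chain_charge("Q", 7.5)
print(f"Arg -> Gln mass shift: {shift:.3f} Da")
print(f"side-chain charge at pH 7.5: Arg {charge_arg:.2f}, Gln {charge_gln:.2f}")
```

Losing one positive charge lowers the net charge of the protein at any pH in the gel range, so the GLN variant reaches zero net charge at a lower pH than the ARG form. The substitution also shifts the mass of the tryptic peptide carrying position 45 and removes a trypsin cleavage site, which is what makes the two variants distinguishable by PMF.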

Figure 1. Scatter plot of the relation between protein abundance and rate of successful identification. Protein abundance was calculated in %Vol. The values of %Vol for all the spots detected on the gel were distributed in 8 classes as reported in Table II.

Figure 2. Annotated 2-DE map of MCF7. Cells were collected during the exponential growth phase. Whole cells (~2×10⁸) were lysed in 1 ml of extraction buffer, and 150 µg of the cell extract was loaded on a pH 5-8 IPG strip (17 cm long). The labels show the spot ID in relation to the identifications listed in Table I.

Functional classification and localization. Mining a proteome currently requires considerable time and effort to gather all the available information. The task is further hampered by wide variation in vocabulary, which inhibits effective searching. Classifying the identified proteins into biological processes and molecular functions based on Gene Ontology (GO) terms facilitates annotation.
According to GO, a biological process is a series of events accomplished by one or more ordered assemblies of molecular functions. As it is often difficult to distinguish between a biological process and a molecular function, the general rule is that a process must have more than one distinct step. FatiGO (20) was used for MCF7 protein classification. Among the 249 identified proteins, 6 genes were not annotated in databases; the analysis was therefore carried out on 243 genes. The distribution of MCF7 proteins in various biological processes and molecular functions is reported in Table IV. As expected, 183 proteins (91%) are involved in cellular physiology: 167 are directly implicated in metabolism, 43 participate in targeting (GO term: localization), and 24 are implicated in cell communication.

Figure 3. Histograms illustrating the dynamic range of MCF7 protein abundance after 2-DE separation. Protein abundance (%Vol) was evaluated by the image analysis software. The signal intensity (O.D.) measured for each spot on the gel was below the saturation threshold. Classes of percentage of volume are shown in Table II. The volume values were <0.1% for most of the spots (1039/1317).

Figure 4. A: MALDI mass spectrum of HNRH1_HUMAN (P31943) identified from spot ID 1138. B: Table summarizing the tryptic peptides identified on the mass spectrum. M: oxidized methionine; pS: phosphorylated serine; pT: phosphorylated threonine; *peptide contains phosphorylated THR 99 and SER 103.


Conclusion

The generation of a comprehensive MCF7 2-DE reference map provides a basis for studies of the MCF7 proteome. As illustrated in this paper, the use of bioinformatics tools allows a deeper analysis of the proteins identified in a proteome, not only for differential studies but also for cell profiling. Our results allowed the construction of the most extensive reference map of MCF7 published so far, including 189 novel proteins that had never been listed on maps and are now accessible on World 2D-PAGE. This 2-DE map will be available on our website at: Biochemistry/Biochimie/bque.htm.


Acknowledgements

This work was supported by the Ministere des Finances et de l’Industrie (NODDICCAP contract), by the European Union (European Regional Development Fund) and by the Association de la Recherche contre le Cancer.


References

1. Hortobagyi GN, de la Garza Salazar J, Pritchard K, Amadori D, Haidinger R, Hudis CA, Khaled H, Liu MC, Martin M, Namer M, O'Shaughnessy JA, Shen ZZ and Albain KS: The global breast cancer burden: variations in epidemiology and survival. Clin Breast Cancer 6: 391-401, 2005.
2. Jemal A, Tiwari RC, Murray T, Ghafoor A, Samuels A, Ward E, Feuer EJ and Thun MJ: Cancer statistics, 2004. CA Cancer J Clin 54: 8-29, 2004.
3. Wang D, Jensen RH, Williams KE and Pallavicini MG: Differential protein expression in MCF7 breast cancer cells transfected with ErbB2, neomycin resistance and luciferase plus yellow fluorescent protein. Proteomics 4: 2175-2183, 2004.
4. Chopin V, Slomianny C, Hondermarck H and Le Bourhis X: Synergistic induction of apoptosis in breast cancer cells by cotreatment with butyrate and TNF-alpha, TRAIL, or anti-Fas agonist antibody involves enhancement of death receptors' signaling and requires P21(waf1). Exp Cell Res 298: 560-573, 2004.
5. Gehrmann ML, Fenselau C and Hathout Y: Highly altered protein expression profile in the adriamycin resistant MCF-7 cell line. J Proteome Res 3: 403-409, 2004.
6. Gehrmann ML, Hathout Y and Fenselau C: Evaluation of metabolic labeling for comparative proteomics in breast cancer cells. J Proteome Res 3: 1063-1068, 2004.
7. Hathout Y, Gehrmann ML, Chertov A and Fenselau C: Proteomic phenotyping: metastatic and invasive breast cancer. Cancer Lett 210: 245-253, 2004.
8. Englbrecht CC and Facius A: Bioinformatics challenges in proteomics. Comb Chem High Throughput Screen 8: 705-715, 2005.
9. Righetti PG, Castagna A, Antonucci F, Piubelli C, Cecconi D, Campostrini N, Zanusso G and Monaco S: The proteome: Anno Domini 2002. Clin Chem Lab Med 41: 425-438, 2003.
10. Thiede B, Hohenwarter W, Krah A, Mattow J, Schmid M, Schmidt F and Jungblut PR: Peptide mass fingerprinting. Methods 35: 237-247, 2005.
11. Gorg A, Weiss W and Dunn MJ: Current two-dimensional electrophoresis technology for proteomics. Proteomics 4: 3665-3685, 2004.
12. Pietrogrande MC, Marchetti N, Dondi F and Righetti PG: Decoding 2D-PAGE complex maps: Relevance to proteomics. J Chromatogr B Analyt Technol Biomed Life Sci 833: 51-62, 2006.
13. Hoogland C, Sanchez JC, Walther D, Baujard V, Baujard O, Tonella L, Hochstrasser DF and Appel RD: Two-dimensional electrophoresis resources available from ExPASy. Electrophoresis 20: 3568-3571, 1999.
14. Poirier F, Pontet M, Labas V, le Caer JP, Sghiouar-Imam N, Raphael M, Caron M and Joubert-Caron R: Two-dimensional database of a Burkitt lymphoma cell line (DG 75) proteins: protein pattern changes following treatment with 5'-azycytidine. Electrophoresis 22: 1867-1877, 2001.
15. Joubert-Caron R and Caron M: Proteome analysis in the study of lymphoma cells. Mass Spectrom Rev 24: 455-468, 2005.
16. Pionneau C, Canelle L, Bousquet J, Hardouin J, Bigeard J, Caron M and Joubert-Caron R: Proteomic analysis of membrane-associated proteins from the breast cancer cell line MCF7. Cancer Genom Proteom 2: 199-208, 2005.
17. Canelle L, Bousquet J, Pionneau C, Deneux L, Imam-Sghiouar N, Caron M and Joubert-Caron R: An efficient proteomics-based approach for the screening of autoantibodies. J Immunol Methods 299: 77-89, 2005.
18. Canelle L, Pionneau C, Marie A, Bousquet J, Bigeard J, Lutomski D, Kadri T, Caron M and Joubert-Caron R: Automating proteome analysis: improvements in throughput, quality and accuracy of protein identification by peptide mass fingerprinting. Rapid Commun Mass Spectrom 18: 2785-2794, 2004.
19. Caron M, Imam-Sghiouar N, Poirier F, Le Caer JP, Labas V and Joubert-Caron R: Proteomic map and database of lymphoblastoid proteins. J Chromatogr B Analyt Technol Biomed Life Sci 771: 197-209, 2002.
20. Al-Shahrour F, Minguez P, Vaquerizas JM, Conde L and Dopazo J: Babelomics: a suite of web-tools for functional annotation and analysis of group of genes in high-throughput experiments. Nucleic Acids Res (Web Server issue) 33: 460-464, 2005.

Received November 14, 2006
Accepted November 26, 2006


Copyright © 2005 Cancer Genomics & Proteomics. All rights reserved